Data Science for Web3


Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781837637546
Pages: 344
Edition: 1st Edition
Author: Gabriela Castillo Areco

Table of Contents (23 chapters)

  • Preface
  • Part 1: Web3 Data Analysis Basics
  • Chapter 1: Where Data and Web3 Meet
  • Chapter 2: Working with On-Chain Data
  • Chapter 3: Working with Off-Chain Data
  • Chapter 4: Exploring the Digital Uniqueness of NFTs – Games, Art, and Identity
  • Chapter 5: Exploring Analytics on DeFi
  • Part 2: Web3 Machine Learning Cases
  • Chapter 6: Preparing and Exploring Our Data
  • Chapter 7: A Primer on Machine Learning and Deep Learning
  • Chapter 8: Sentiment Analysis – NLP and Crypto News
  • Chapter 9: Generative Art for NFTs
  • Chapter 10: A Primer on Security and Fraud Detection
  • Chapter 11: Price Prediction with Time Series
  • Chapter 12: Marketing Discovery with Graphs
  • Part 3: Appendix
  • Chapter 13: Building Experience with Crypto Data – BUIDL
  • Chapter 14: Interviews with Web3 Data Leaders
  • Index
  • Other Books You May Enjoy
  • Appendix 1
  • Appendix 2
  • Appendix 3

Working with Off-Chain Data

In the previous chapter, we learned that on-chain data serves as the primary source for Web3 data analysis. It is open, distributed, and trustworthy. While on-chain data is key to answering most business data science questions, it is essential to complement it with relevant information from off-chain data sources, which are the focus of this chapter.

Consider a scenario where we receive a request to assess the economic relevance of a smart contract. We can query the number of tokens locked in it, but to finalize the analysis, we need to determine the monetary value of those tokens. To accomplish this, we must integrate on-chain data with prices, often derived from off-chain sources.
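The scenario above boils down to one multiplication: the token amount read on-chain times a price obtained off-chain. The following is a minimal sketch of that step, not code from the book. The function name `locked_value_usd`, the 18-decimal token, and all the numbers are hypothetical, chosen only for illustration.

```python
from decimal import Decimal


def locked_value_usd(token_amount_raw: int, decimals: int, price_usd: Decimal) -> Decimal:
    """Convert a raw on-chain token balance into a USD value.

    token_amount_raw: integer balance in the token's smallest unit (on-chain data)
    decimals: number of decimals the token uses (on-chain data)
    price_usd: price of one whole token in USD (off-chain data)
    """
    # Scale the integer balance down to whole tokens before pricing it.
    amount = Decimal(token_amount_raw) / (Decimal(10) ** decimals)
    return amount * price_usd


# Hypothetical example: 1,500 tokens with 18 decimals, priced at 2,000.50 USD each.
raw_balance = 1_500 * 10**18
value = locked_value_usd(raw_balance, 18, Decimal("2000.50"))
print(f"Locked value: {value} USD")
```

Using `Decimal` rather than floats avoids rounding surprises with large integer balances, which is a design choice of this sketch rather than a requirement.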

Prices, news, and opinions are not stored on-chain and must be retrieved from external sources. In this chapter, we will delve into those sources and acquire data from selected APIs. Specifically, we will discuss alternatives for fetching prices, analyze a crypto news...

Technical requirements

We will be using Tweepy, a Python library that allows us to easily interact with X (formerly Twitter). With Tweepy, we can post tweets and retrieve information about tweets, users, and much more. To start using Tweepy, we first need to register for a developer account on the Twitter developer website and obtain a set of API keys, as explained in Appendix 2. The documentation for Tweepy is available at https://docs.tweepy.org/en/stable/.

If you have not worked with Tweepy before, it can be installed with the following command:

pip install tweepy

Additionally, we’ll be utilizing Plotly graph objects and Plotly Express, two visualization libraries that empower us to create interactive visualizations with Python. Plotly Express is a high-level library that allows us to plot common types of graphs—such as scatter plots, line charts, maps, pie charts, and more—with minimal lines of code. The documentation for Plotly Express can be found at https://plotly.com...

Introductory example – listing data sources

Let’s examine this headline:

Dogecoin gains 25% after Elon Musk confirms Tesla will accept DOGE for merchandise

(Source – Cointelegraph: https://cointelegraph.com/news/dogecoin-gains-25-after-elon-musk-confirms-tesla-will-accept-doge-for-merchandise.)

This headline references three data sources:

  • Headline (news): An online newspaper specializing in the Web3 industry generates this headline. Blockchain news is gradually entering mainstream platforms, and similar information can also be found in traditional financial news indexes such as Reuters.
  • Prices: The headline refers to the price variation of a particular cryptocurrency. Price data is not typically fetched from on-chain sources; rather, it is information that data scientists integrate from a third-party data source.
  • X (formerly Twitter)/social networks: Numerous market-impact events unfold on social networks, where...

Adding prices to our dataset

Price information is typically stored off-chain, and various sources provide access to this data. Some of the most popular APIs include the following:

  • CoinGecko
  • CoinMarketCap
  • Binance
  • Chainlink
  • OHLC data: Kraken

Each API comes with its own limitations, which we need to consider when deciding whether to integrate them into our projects. Specific details can be found in their respective documentation.
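As a rough illustration of consuming one of these sources, the sketch below queries CoinGecko's public simple price endpoint using only the standard library. It assumes the endpoint `https://api.coingecko.com/api/v3/simple/price` with the `ids` and `vs_currencies` parameters, and a response shaped like `{"bitcoin": {"usd": ...}}`; check the current CoinGecko documentation for rate limits and API key requirements before relying on it. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the CoinGecko documentation.
COINGECKO_SIMPLE_PRICE = "https://api.coingecko.com/api/v3/simple/price"


def build_price_url(coin_ids, vs_currency="usd"):
    """Build the request URL for a list of CoinGecko coin ids."""
    query = urllib.parse.urlencode(
        {"ids": ",".join(coin_ids), "vs_currencies": vs_currency}
    )
    return f"{COINGECKO_SIMPLE_PRICE}?{query}"


def parse_prices(payload, vs_currency="usd"):
    """Flatten {"bitcoin": {"usd": 1.0}} into {"bitcoin": 1.0}."""
    return {
        coin: quote[vs_currency]
        for coin, quote in payload.items()
        if vs_currency in quote
    }


def fetch_prices(coin_ids, vs_currency="usd", timeout=10):
    """Call the API and return a {coin_id: price} dictionary."""
    url = build_price_url(coin_ids, vs_currency)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_prices(json.load(response), vs_currency)
```

Calling `fetch_prices(["bitcoin", "ethereum"])` performs the network request. Keeping URL building and parsing in separate functions makes it possible to test them without hitting the API.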

Regarding price data, it is important to understand how it is calculated, as in these examples:

  • CoinMarketCap calculates an asset’s price by considering the volume-weighted average of all markets where the asset is traded. This approach is based on the notion that more liquid markets are less susceptible to price fluctuations or manipulation and, therefore, more reliable.
  • Binance reports prices based on transactions conducted on their platform. Depending on the pair, it provides the price of the last trade...
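The volume-weighted average mentioned for CoinMarketCap can be sketched as follows. This shows only the general idea of weighting each market's price by its traded volume; the provider's actual methodology may apply additional filters not covered here. The function name and the market figures are hypothetical.

```python
def volume_weighted_price(markets):
    """Return the volume-weighted average price.

    markets: iterable of (price, volume) pairs, one per market.
    """
    markets = list(markets)  # allow generators, since we iterate twice
    total_volume = sum(volume for _, volume in markets)
    if total_volume == 0:
        raise ValueError("total volume is zero; the average is undefined")
    weighted_sum = sum(price * volume for price, volume in markets)
    return weighted_sum / total_volume


# Hypothetical example: a liquid market at 100 and a thin market at 110.
markets = [(100.0, 900.0), (110.0, 100.0)]
vwap = volume_weighted_price(markets)
print(vwap)  # 101.0
```

In this example, the thin market moves the result only slightly away from 100, which reflects the reasoning that more liquid markets should carry more weight.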

Adding news to our dataset

A professor once mentioned that in the crypto world, news takes only five minutes to impact the price of an asset.

News matters beyond its effect on prices. Marketing teams may request an analysis of the social impact of a brand, a campaign, or a product, and news may also be needed as input for an algorithm, among other applications. For these purposes, data scientists need news formatted for analysis.

As of today, there is a dedicated source named CryptoPanic, a data aggregator that specifically indexes news relevant to the Web3 ecosystem. The link to the website is https://cryptopanic.com/.

Its data can be consumed through an API, and the API key is available upon registration.

On the main page, go to the Sign In tab on the left menu:

Figure 3.16 – An overview of the CryptoPanic main view

If it is your first time signing up, you will need to confirm your email. After that, you are registered. Click...

Adding social networks to our dataset

Web3 is an online industry, so everything that happens online, from opinions to interactions, holds significant influence.

Sentiment analysis, gauging reactions to products or tokens, plays a crucial role for marketing teams, analysts, and traders alike. A noteworthy example illustrating the importance of such metrics is the CoinStats Fear and Greed indicator. This index, available at https://coinstats.app/fear-and-greed/, incorporates social media posts, among other factors, to measure market sentiment.

Figure 3.20 – Crypto Fear and Greed Indicator

According to CoinStats’ explanation, the index combines data from various sources. To capture psychological momentum, they also draw insights from social media interactions on X, focusing on specific hashtags that carry both fear and greed components, which contribute to the overall calculation. The social media component holds a 15% weight in the final...

Summary

In this chapter, we examined various off-chain sources of data relevant to the Web3 economy, organizing the analysis into three main areas. For prices, we explored multiple APIs, from traditional sources to exchanges, as well as an oracle. For news, we learned how to extract real-time headlines from CryptoPanic, the best dedicated news indexer at the time of writing. For X (formerly Twitter), we used its API to gauge the sentiment around an NFT protocol. This list of resources is not exhaustive, and we have only scratched the surface of what we can do with all this data.

In the next chapter, we will delve into NFTs and their applications in the gaming, art, and name service industries.

Further reading

To complement this chapter, the following links may help:
