You're reading from Data Science for Web3

Product type Book

Published in Dec 2023

Publisher Packt

ISBN-13 9781837637546

Pages 344 pages

Edition 1st Edition

Concepts

Data Science

Author (1):

Gabriela Castillo Areco

Table of Contents (23) Chapters

Preface

Part 1 Web3 Data Analysis Basics

Chapter 1: Where Data and Web3 Meet

Chapter 2: Working with On-Chain Data

Chapter 3: Working with Off-Chain Data

Chapter 4: Exploring the Digital Uniqueness of NFTs – Games, Art, and Identity

Chapter 5: Exploring Analytics on DeFi

Part 2 Web3 Machine Learning Cases

Chapter 6: Preparing and Exploring Our Data

Chapter 7: A Primer on Machine Learning and Deep Learning

Chapter 8: Sentiment Analysis – NLP and Crypto News

Chapter 9: Generative Art for NFTs

Chapter 10: A Primer on Security and Fraud Detection

Chapter 11: Price Prediction with Time Series

Chapter 12: Marketing Discovery with Graphs

Part 3 Appendix

Chapter 13: Building Experience with Crypto Data – BUIDL

Chapter 14: Interviews with Web3 Data Leaders

Index

Appendix 1

Appendix 2

Appendix 3

A Primer on Machine Learning and Deep Learning

Before applying any machine learning algorithm, having a comprehensive understanding of the dataset and its key features is essential. This understanding is typically derived through exploratory data analysis (EDA). Once acquainted with the data, we must invest time in feature engineering, which involves selecting, transforming, and creating new features (if necessary) to enable the use of the chosen model or enhance its performance. Feature engineering may include tasks such as converting classes into numerical values, scaling or normalizing features, creating new features from existing ones, and more. This process is tailored for each specific model and dataset under analysis. Once this process is completed, we can proceed to modeling.
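As a minimal sketch of the feature engineering tasks just listed, the following example encodes class labels as integers, scales a numeric feature, and derives a new feature from an existing one. The toy data and the variable names (labels, amounts) are hypothetical and not taken from the book's datasets; the transformers used here are standard scikit-learn classes.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data: one class label and one numeric feature per row.
labels = np.array(["nft", "defi", "nft", "dao"])
amounts = np.array([[10.0], [200.0], [35.0], [5.0]])

# Convert classes into numerical values.
# LabelEncoder assigns integers following the sorted order of the classes.
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(labels)

# Scale the numeric feature to zero mean and unit variance.
scaler = StandardScaler()
scaled_amounts = scaler.fit_transform(amounts)

# Create a new feature from an existing one: a log-transformed amount.
log_amounts = np.log1p(amounts)

print(encoded_labels)
print(scaled_amounts.ravel())
print(log_amounts.ravel())
```

Which transformations are appropriate depends on the model and the dataset; for example, tree-based models generally do not need scaled inputs, whereas distance-based models usually do.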

The goal of this chapter is to review introductory concepts of machine learning and deep learning, laying the foundation for Part 2 of this book. In Part 2, we will delve into various use cases where...

Technical requirements

We will be using scikit-learn, a popular Python library specially designed for machine learning tasks. It offers algorithms and tools for data preprocessing, feature selection, model selection, and model evaluation.

If you have not worked with scikit-learn before, you can install it with the following command:

pip install scikit-learn

The documentation for scikit-learn can be found at https://scikit-learn.org/stable/.

For deep learning, we can use TensorFlow or Keras. TensorFlow is a powerful open source library for numerical computation that provides tools to train, test, and deploy a variety of deep learning neural networks. It serves as the infrastructure layer, enabling low-level tensor operations on the CPU, TPU, and GPU. Keras, on the other hand, is a high-level Python API built on top of TensorFlow. It is designed to enable fast experimentation and provides informative feedback when an error is discovered...

Introducing machine learning

The definition of machine learning, as provided by Computer Science Wiki, is “a field of inquiry devoted to understanding and building methods that ‘learn’ – that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so.”

(Source: https://computersciencewiki.org/index.php/Machine_learning)

Professor Jason Brownlee defines deep learning as “a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.” What distinguishes deep learning from other machine learning methods is that its models are built on artificial neural networks.

The relationship between these two fields...

Building a machine learning pipeline

After cleaning the data and selecting the most important features, the machine learning flow can be summarized into steps, as shown in Figure 7.4:

Figure 7.4 – Machine learning pipeline

To carry out this process, we must do the following:

Select a model and its initial parameters based on the problem and available data.
Train: First, we must split the data into a training set and a test set. Training consists of making the model learn from the training data. Training time and computational cost vary from model to model. To improve the model’s performance, we can tune its hyperparameters with techniques such as grid search or random search.
Predict and evaluate: The trained model is then used to predict over the test set, which contains rows of data that have not been seen by the algorithm. If we evaluate the model with the data that we used to train...
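The steps above can be sketched with scikit-learn as follows. This is an illustrative example under stated assumptions rather than the book's own code: the synthetic dataset, the choice of logistic regression, and the parameter grid are all hypothetical stand-ins.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for a cleaned dataset (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Hold out a test set that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Step 1: select a model and its initial parameters.
model = LogisticRegression(max_iter=1000)

# Step 2: train, tuning the regularization strength C with a grid search.
# The search uses 5-fold cross-validation on the training set only.
search = GridSearchCV(
    model,
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 3: predict over the unseen test set and evaluate.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print("Best parameters:", search.best_params_)
print("Test accuracy:", test_accuracy)
```

Because the grid search cross-validates within the training set, the test set is used only once, for the final evaluation. RandomizedSearchCV can replace GridSearchCV when the parameter space is too large to search exhaustively.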

Introducing deep learning

In Part 2 of this book, we will also use deep learning methodologies when solving the use cases. Deep learning models employ multiple layers of interconnected nodes called neurons, which process input data and produce outputs based on learned weights and activation functions. The connections between neurons facilitate information flow, and the architecture of the network determines how information is processed and transformed.

We will study three types of neural network architectures in detail in their corresponding chapters. For now, let’s introduce the framework and terminology that we will use in them.

The neuron serves as the fundamental building block of the system and can be defined as a node with one or more input values, weights, and output values:

Figure 7.9 – A neuron’s structure
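A single neuron can be sketched in a few lines of plain Python: it computes a weighted sum of its inputs and passes the result through an activation function. The function names, the sigmoid activation, and the bias term are assumptions made for illustration; the description above mentions only inputs, weights, and outputs.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Return the output of one neuron.

    The bias term is an assumption added for illustration.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs, plus the bias.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The activation function turns the weighted sum into the output.
    return activation(z)


# Two inputs with hand-picked weights: 1.0 * 0.5 + 2.0 * (-0.25) = 0.0
output = neuron([1.0, 2.0], [0.5, -0.25])
print(output)  # sigmoid(0.0) = 0.5
```

In a trained network, the weights and bias are not chosen by hand as they are here; they are learned from the data.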

When we stack multiple layers with this structure, it becomes a neural network. This architecture typically consists...

Summary

In this chapter, we delved into the fundamental concepts of artificial intelligence, which will serve as the foundation for our journey in Part 2 of this book. We explored various types of tasks, including supervised learning, unsupervised learning, and reinforcement learning. Through a hands-on example, we gained insights into the typical machine learning process, which encompasses model selection, training, and evaluation.

Throughout this chapter, we acquired essential knowledge related to common challenges in machine learning, such as striking the right balance between underfitting and overfitting models, the existence of imbalanced datasets, and which metrics are relevant to evaluate models that are trained with them. Understanding these concepts is vital for any successful machine learning project.

Moreover, we progressed into the basics of deep learning, where we explored the key components of a neural network using Keras. Additionally, we implemented a pipeline...