Data Science for Web3

Product type Book
Published in Dec 2023
Publisher Packt
ISBN-13 9781837637546
Pages 344 pages
Edition 1st Edition
Author (1): Gabriela Castillo Areco

Table of Contents (23 chapters)

Preface
Part 1 Web3 Data Analysis Basics
  Chapter 1: Where Data and Web3 Meet
  Chapter 2: Working with On-Chain Data
  Chapter 3: Working with Off-Chain Data
  Chapter 4: Exploring the Digital Uniqueness of NFTs – Games, Art, and Identity
  Chapter 5: Exploring Analytics on DeFi
Part 2 Web3 Machine Learning Cases
  Chapter 6: Preparing and Exploring Our Data
  Chapter 7: A Primer on Machine Learning and Deep Learning
  Chapter 8: Sentiment Analysis – NLP and Crypto News
  Chapter 9: Generative Art for NFTs
  Chapter 10: A Primer on Security and Fraud Detection
  Chapter 11: Price Prediction with Time Series
  Chapter 12: Marketing Discovery with Graphs
Part 3 Appendix
  Chapter 13: Building Experience with Crypto Data – BUIDL
  Chapter 14: Interviews with Web3 Data Leaders
Index
Other Books You May Enjoy
Appendix 1
Appendix 2
Appendix 3

A Primer on Machine Learning and Deep Learning

Before applying any machine learning algorithm, having a comprehensive understanding of the dataset and its key features is essential. This understanding is typically derived through exploratory data analysis (EDA). Once acquainted with the data, we must invest time in feature engineering, which involves selecting, transforming, and creating new features (if necessary) to enable the use of the chosen model or enhance its performance. Feature engineering may include tasks such as converting classes into numerical values, scaling or normalizing features, creating new features from existing ones, and more. This process is tailored for each specific model and dataset under analysis. Once this process is completed, we can proceed to modeling.
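
As a rough illustration of the feature engineering tasks mentioned above, the following sketch encodes a categorical column, derives a new feature, and scales it with scikit-learn. The column names and values are hypothetical toy data, not taken from the book's datasets, and the choice of a log transform is an assumption made for the example.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical toy data: one categorical column and one numeric column.
token_type = np.array([["nft"], ["defi"], ["nft"], ["stablecoin"]])
volume = np.array([[10.0], [200.0], [30.0], [4000.0]])

# Convert classes into numerical values (categories are sorted alphabetically).
encoder = OrdinalEncoder()
token_code = encoder.fit_transform(token_type)

# Create a new feature from an existing one: log-transform the skewed volume.
log_volume = np.log1p(volume)

# Scale the new feature to zero mean and unit variance.
scaler = StandardScaler()
log_volume_scaled = scaler.fit_transform(log_volume)

# Assemble the final feature matrix: one row per sample, one column per feature.
features = np.hstack([token_code, log_volume_scaled])
print(features.shape)
```

Ordinal codes imply an order between categories, so for models that are sensitive to this, a one-hot encoding (`OneHotEncoder`) may be a better fit. Which transformation is appropriate depends on the model and dataset at hand.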

The goal of this chapter is to review introductory concepts of machine learning and deep learning, laying the foundation for Part 2 of this book. In Part 2, we will delve into various use cases where...

Technical requirements

We will be using scikit-learn, a popular Python library designed specifically for machine learning tasks. It offers algorithms and tools for data preprocessing, feature selection, model selection, and model evaluation.

If you have not worked with scikit-learn before, it can be installed with the following command:

pip install scikit-learn

The documentation for scikit-learn can be found at https://scikit-learn.org/stable/.

For deep learning, we can use TensorFlow or Keras. TensorFlow is a powerful open-source library for numerical computation that provides tools to train, test, and deploy a variety of deep learning neural networks. It serves as the infrastructure layer, enabling low-level tensor operations on the CPU, GPU, and TPU. Keras, on the other hand, is a high-level Python API built on top of TensorFlow. It is designed for fast experimentation and provides informative feedback when an error is discovered...

Introducing machine learning

The definition of machine learning, as provided by Computer Science Wiki, is “a field of inquiry devoted to understanding and building methods that ‘learn’ – that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so.”

(Source: https://computersciencewiki.org/index.php/Machine_learning)

Professor Jason Brownlee defines deep learning as “a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.” What distinguishes deep learning from other machine learning methods is that its models are built on artificial neural networks.

The relationship between these two fields...

Building a machine learning pipeline

After cleaning the data and selecting the most important features, the machine learning flow can be summarized in the steps shown in Figure 7.4:

Figure 7.4 – Machine learning pipeline

To carry out this process, we must do the following:

  1. Select a model and its initial parameters based on the problem and available data.
  2. Train: First, we must split the data into a training set and a test set. Training consists of making the model learn from the training data. The time and computational cost of training vary from model to model. To improve the model’s performance, we can tune its hyperparameters with techniques such as grid search or randomized search.
  3. Predict and evaluate: The trained model is then used to predict over the test set, which contains rows of data that have not been seen by the algorithm. If we evaluate the model with the data that we used to train...
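
The three steps above can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions: the data is synthetic (generated with `make_classification`), and the choice of logistic regression, the `C` grid, and accuracy as the metric are illustrative rather than the book's own setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; not the book's dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Step 1: select a model and its initial parameters.
model = LogisticRegression(max_iter=1000)

# Step 2: split the data, then train with hyperparameter tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # hypothetical search grid
search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # cross-validation uses the training set only

# Step 3: predict on rows the model has not seen, then evaluate.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(search.best_params_, round(test_accuracy, 3))
```

The grid search only ever sees the training set, so the test set remains unseen until the final evaluation. Replacing `GridSearchCV` with `RandomizedSearchCV` gives the randomized variant of the search.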

Introducing deep learning

In Part 2 of this book, we will also use deep learning methodologies when solving the use cases. Deep learning models employ multiple layers of interconnected nodes called neurons, which process input data and produce outputs based on learned weights and activation functions. The connections between neurons facilitate information flow, and the architecture of the network determines how information is processed and transformed.

We will study three types of neural network architectures in detail in their corresponding chapters. For now, let’s introduce the framework and terminology that we will use in them.

The neuron serves as the fundamental building block of the system and can be defined as a node with one or more input values, weights, and output values:

Figure 7.9 – A neuron’s structure
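
A single neuron of this kind can be sketched in a few lines of NumPy. The bias term and the choice of a sigmoid activation are assumptions made for illustration, and the input and weight values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return sigmoid(np.dot(inputs, weights) + bias)

# Arbitrary example values (hypothetical).
x = np.array([0.5, -1.0, 2.0])   # input values
w = np.array([0.4, 0.3, -0.2])   # one weight per input
b = 0.1                          # bias (an assumption, not shown in the text)

output = neuron(x, w, b)
print(output)
```

During training, the weights and the bias are the quantities that the network adjusts; the inputs come from the data or from the previous layer.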

When we stack multiple layers with this structure, it becomes a neural network. This architecture typically consists...
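
To make the idea of stacking layers concrete, here is a hedged NumPy sketch of a forward pass through two fully connected layers. The layer sizes, the random weights, and the ReLU and sigmoid activations are all hypothetical choices for illustration; a real network would learn its weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Keep positive values, replace negative values with zero."""
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(inputs, weights, bias, activation):
    """One layer: every neuron computes a weighted sum, then the activation."""
    return activation(inputs @ weights + bias)

# Hypothetical sizes: 4 input features, 8 hidden neurons, 1 output neuron.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

batch = rng.normal(size=(5, 4))            # 5 samples with 4 features each
hidden = dense(batch, W1, b1, relu)        # hidden layer output
output = dense(hidden, W2, b2, sigmoid)    # final prediction per sample
print(hidden.shape, output.shape)
```

Each layer's output becomes the next layer's input, which is what makes the stack a network rather than a collection of independent neurons.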

Summary

In this chapter, we delved into the fundamental concepts of artificial intelligence, which will serve as the foundation for our journey in Part 2 of this book. We explored various types of tasks, including supervised learning, unsupervised learning, and reinforcement learning. Through a hands-on example, we gained insights into the typical machine learning process, which encompasses model selection, training, and evaluation.

Throughout this chapter, we acquired essential knowledge about common challenges in machine learning, such as striking the right balance between underfitting and overfitting, handling imbalanced datasets, and choosing metrics that are appropriate for evaluating models trained on them. Understanding these concepts is vital for any successful machine learning project.

Moreover, we progressed into the basics of deep learning, where we explored the key components of a neural network using Keras. Additionally, we implemented a pipeline...

Further reading

To learn more about the topics that were covered in this chapter, take a look at the following resources:

  • Definitions:
    • Igual, L. and Seguí, S. (2017). Introduction to data science: A python approach to concepts, techniques and applications. Springer.
    • Ertel, W. (2018). Introduction to artificial intelligence. Springer.
    • Skansi, S. (2018). Introduction to deep learning: From logical calculus to artificial intelligence. Springer.
    • Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. Available at https://www.deeplearningbook.org/.
    • Chollet, F. (2017). Deep Learning with Python. Manning Publications.
    • Müller, A. C. and Guido, S. (2016). Introduction to Machine Learning with Python: A guide for data scientists. O’Reilly Media.
    • VanderPlas, J. (n.d.). What Is Machine Learning? Pythonic Perambulations. Available at https://jakevdp.github.io/PythonDataScienceHandbook/05.01-what-is-machine-learning.html.
    • What is Deep Learning?: https://machinelearningmastery...