You're reading from Python Machine Learning - Third Edition

Product type: Book

Published: Dec 2019

Publisher: Packt

ISBN-13: 9781789955750

Pages: 772

Edition: 3rd Edition

Languages: Python

Concepts: Machine Learning

Authors (2):

Sebastian Raschka

Vahid Mirjalili

Table of Contents (21) Chapters

Preface

1. Giving Computers the Ability to Learn from Data

2. Training Simple Machine Learning Algorithms for Classification

3. A Tour of Machine Learning Classifiers Using scikit-learn

4. Building Good Training Datasets – Data Preprocessing

5. Compressing Data via Dimensionality Reduction

6. Learning Best Practices for Model Evaluation and Hyperparameter Tuning

7. Combining Different Models for Ensemble Learning

8. Applying Machine Learning to Sentiment Analysis

9. Embedding a Machine Learning Model into a Web Application

10. Predicting Continuous Target Variables with Regression Analysis

11. Working with Unlabeled Data – Clustering Analysis

12. Implementing a Multilayer Artificial Neural Network from Scratch

13. Parallelizing Neural Network Training with TensorFlow

14. Going Deeper – The Mechanics of TensorFlow

15. Classifying Images with Deep Convolutional Neural Networks

16. Modeling Sequential Data Using Recurrent Neural Networks

17. Generative Adversarial Networks for Synthesizing New Data

18. Reinforcement Learning for Decision Making in Complex Environments

19. Other Books You May Enjoy

20. Index

Introducing the bag-of-words model

You may remember from Chapter 4, Building Good Training Datasets – Data Preprocessing, that we have to convert categorical data, such as text or words, into a numerical form before we can pass it on to a machine learning algorithm. In this section, we will introduce the bag-of-words model, which allows us to represent text as numerical feature vectors. The idea behind bag-of-words is quite simple and can be summarized as follows:

1. We create a vocabulary of unique tokens (for example, words) from the entire set of documents.
2. We construct a feature vector from each document that contains the counts of how often each word occurs in the particular document.
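
The two steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the book's implementation: the helper names `build_vocabulary` and `to_count_vector`, the three toy documents, and the lowercase whitespace tokenization are all hypothetical choices made for this example.

```python
from collections import Counter


def build_vocabulary(docs):
    """Step 1: map every unique token in the corpus to a column index.

    Assumption: tokens are lowercase, whitespace-separated words.
    Sorting makes the index assignment reproducible.
    """
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_count_vector(doc, vocab):
    """Step 2: count how often each vocabulary token occurs in one document."""
    vector = [0] * len(vocab)
    for tok, count in Counter(doc.lower().split()).items():
        if tok in vocab:  # tokens outside the vocabulary are ignored
            vector[vocab[tok]] = count
    return vector


# Hypothetical toy corpus, used only for illustration.
docs = [
    "the sun is shining",
    "the weather is sweet",
    "the sun is shining and the weather is sweet",
]

vocab = build_vocabulary(docs)
vectors = [to_count_vector(doc, vocab) for doc in docs]

print(vocab)
for vector in vectors:
    print(vector)
```

Each position in a vector corresponds to one vocabulary entry, so a word that does not occur in a document contributes a zero at that position.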

Since the unique words in each document represent only a small subset of all the words in the bag-of-words vocabulary, the feature vectors will mostly consist of zeros, which is why we call them sparse. Do not worry if this sounds too abstract; in the following...

Authors (2)

Sebastian Raschka

Sebastian Raschka is an Assistant Professor of Statistics at the University of Wisconsin-Madison focusing on machine learning and deep learning research. As Lead AI Educator at Grid AI, Sebastian plans to continue following his passion for helping people get into machine learning and artificial intelligence.

Vahid Mirjalili

Vahid Mirjalili is a deep learning researcher focusing on CV applications. Vahid received a Ph.D. degree in both Mechanical Engineering and Computer Science from Michigan State University.
