scikit-learn Cookbook - Second Edition

Product type: Book
Published in: Nov 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787286382
Edition: 2nd Edition
Author: Trent Hauck

Trent Hauck is a data scientist living and working in the Seattle area. He grew up in Wichita, Kansas and received his undergraduate and graduate degrees from the University of Kansas. He is the author of the book Instant Data Intensive Apps with pandas How-to, Packt Publishing—a book that can get you up to speed quickly with pandas and other associated technologies.

What this book covers

Chapter 1, High-Performance Machine Learning – NumPy, introduces your first machine learning algorithm, the support vector machine. We distinguish between classification (what type?) and regression (how much?), and we predict an outcome on data we have not seen.

Chapter 2, Pre-Model Workflow and Pre-Processing, presents a realistic industrial setting with plenty of data munging and pre-processing. To do machine learning, you need good data, and this chapter tells you how to obtain it and put it into good form for machine learning.

Chapter 3, Dimensionality Reduction, discusses reducing the number of features to simplify machine learning and allow better use of computational resources.

Chapter 4, Linear Models with scikit-learn, tells the story of linear regression, the oldest predictive model, through the lenses of machine learning and artificial intelligence. You deal with correlated features with ridge regression, eliminate related features with LASSO and cross-validation, or limit the effect of outliers with robust median-based regression.
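
As a rough illustration of the ridge penalty mentioned above, the following plain-Python sketch computes the closed-form ridge slope for a single feature with no intercept. It is a deliberate simplification and not the scikit-learn API: the function name `ridge_slope` and the toy data are assumptions made for this example, and a one-feature case only shows how the penalty shrinks a coefficient, not how ridge handles correlated features.

```python
def ridge_slope(x, y, alpha):
    """Closed-form ridge slope for one feature and no intercept.

    Minimizes sum((y_i - w * x_i)**2) + alpha * w**2 over w.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


# Toy data (assumed for illustration): y is exactly 2 * x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

ols = ridge_slope(x, y, alpha=0.0)      # no penalty: ordinary least squares
shrunk = ridge_slope(x, y, alpha=10.0)  # penalty pulls the slope toward zero

print(ols, shrunk)
```

With `alpha=0.0` the function reduces to ordinary least squares; larger values of `alpha` trade some fit for a smaller coefficient.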

Chapter 5, Linear Models – Logistic Regression, examines important healthcare datasets on cancer and diabetes using logistic regression. This model highlights both the similarities and the differences between regression and classification, the two types of supervised learning.

Chapter 6, Building Models with Distance Metrics, places points in your familiar Euclidean space of school geometry, as distance is synonymous with similarity. How close (similar) or far away are two points? Can we group them together? With Euclid's help, we can approach unsupervised learning with k-means clustering and place points in categories we do not know in advance.
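
The idea of grouping points by Euclidean distance can be sketched in a few lines of plain Python. This is a minimal version of the standard k-means (Lloyd's) iteration, not the book's recipe or the scikit-learn implementation; the helper names, the toy points, and the choice of starting centers are all assumptions made for illustration.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, centers, n_iter=10):
    """Alternate between assigning points and recomputing centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Two visually separate groups of 2-D points (toy data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

The labels are not known in advance; they emerge from which center each point ends up closest to, which is what makes the approach unsupervised.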

Chapter 7, Cross-Validation and Post-Model Workflow, shows how to use cross-validation, the iterated training and testing of predictions, to select a model that works well. We also save computational work by storing models with the pickle module.
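
The sketch below illustrates both ideas with the standard library only: a k-fold split in which every sample is held out exactly once, and a round trip through `pickle` so a fitted model need not be retrained. The "model" here is a hypothetical stand-in (a dictionary holding the training mean), and `k_fold_indices`, `fit_mean`, and `predict_mean` are names invented for this example rather than scikit-learn functions.

```python
import pickle


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def fit_mean(y_train):
    """Toy 'model': remember the mean of the training targets."""
    return {"mean": sum(y_train) / len(y_train)}


def predict_mean(model, n):
    """Predict the stored mean for n samples."""
    return [model["mean"]] * n


y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

fold_errors = []
for train_idx, test_idx in k_fold_indices(len(y), k=3):
    model = fit_mean([y[i] for i in train_idx])
    preds = predict_mean(model, len(test_idx))
    # Mean absolute error on the held-out fold.
    mae = sum(abs(p - y[i]) for p, i in zip(preds, test_idx)) / len(test_idx)
    fold_errors.append(mae)

cv_score = sum(fold_errors) / len(fold_errors)

# Serialize the final model to bytes and restore it without refitting.
final_model = fit_mean(y)
restored = pickle.loads(pickle.dumps(final_model))

print(fold_errors, cv_score, restored)
```

In practice the bytes would usually be written to a file with `pickle.dump` and read back with `pickle.load`; the in-memory round trip is used here only to keep the example free of side effects.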

Chapter 8, Support Vector Machines, examines in detail the support vector machine, a powerful and easy-to-understand algorithm.

Chapter 9, Tree Algorithms and Ensembles, features the algorithms of decision making: decision trees. This chapter also introduces meta-learning algorithms, in which diverse algorithms vote in some fashion to increase overall predictive accuracy.

Chapter 10, Text and Multiclass Classification with scikit-learn, reviews the basics of natural language processing with the simple bag-of-words model. More generally, we look at classification with three or more categories.
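
A bag-of-words model represents each document by how often each vocabulary word occurs in it, ignoring word order. The following is a minimal plain-Python sketch of that idea, assuming whitespace tokenization and lowercasing; the function name `bag_of_words` and the two sample sentences are made up for illustration and do not come from the book.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count_matrix) for a list of strings."""
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct token, in sorted order.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        # One row per document, one column per vocabulary word.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["the cat sat", "the dog sat on the mat"]
vocab, X = bag_of_words(docs)

print(vocab)
for row in X:
    print(row)
```

Each row of `X` is a fixed-length numeric vector, which is the form a classifier expects as input.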

Chapter 11, Neural Networks, introduces neural networks and perceptrons, the components from which they are built. Each layer figures out a step in a process, leading to a desired outcome. As we do not program any steps specifically, we venture into artificial intelligence. Save the neural network so that you can keep training it later, or load it and use it as part of a stacking ensemble.
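
To make the perceptron concrete, here is a minimal sketch of the classic perceptron learning rule in plain Python, trained on the logical AND function. It is an illustrative single unit, not a full neural network and not the scikit-learn estimator; the function names, learning rate, and epoch count are assumptions chosen for this example.

```python
def perceptron_predict(weights, bias, x):
    """Output 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, targets, epochs=10, lr=1.0):
    """Nudge weights toward each misclassified sample's target."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - perceptron_predict(weights, bias, x)
            # No change when the prediction is already correct (error == 0).
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]  # logical AND, which is linearly separable

weights, bias = train_perceptron(X, y)
predictions = [perceptron_predict(weights, bias, x) for x in X]
print(weights, bias, predictions)
```

Nothing in the code states the rule for AND explicitly; the weights are adjusted from examples alone, which is the sense in which no step is programmed specifically.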

Chapter 12, Create a Simple Estimator, helps you make your own scikit-learn estimator, which you can contribute to the scikit-learn community, taking part in the evolution of data science with scikit-learn.
