Hands-On Data Science and Python Machine Learning

This book covers the fundamentals of machine learning with Python in a concise and dynamic manner, along with data mining and large-scale machine learning using Apache Spark.

Frank Kane

Book Details

ISBN 13: 9781787280748
Paperback: 420 pages

Book Description

Join Frank Kane, who worked on Amazon and IMDb’s machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand.

Based on Frank’s successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and develop efficient models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis.

Table of Contents

Chapter 1: Getting Started
Installing Enthought Canopy
Using and understanding IPython (Jupyter) Notebooks
Python basics - Part 1
Understanding Python code
Importing modules
Python basics - Part 2
Running Python scripts
Summary
Chapter 2: Statistics and Probability Refresher, and Python Practice
Types of data
Mean, median, and mode
Using mean, median, and mode in Python
Standard deviation and variance
Probability density function and probability mass function
Types of data distributions
Percentiles and moments
Summary
Chapter 3: Matplotlib and Advanced Probability Concepts
A crash course in Matplotlib
Covariance and correlation
Conditional probability
Bayes' theorem
Summary
Chapter 4: Predictive Models
Linear regression
Polynomial regression
Multivariate regression and predicting car prices
Multi-level models
Summary
Chapter 5: Machine Learning with Python
Machine learning and train/test
Using train/test to prevent overfitting of a polynomial regression
Bayesian methods - Concepts
Implementing a spam classifier with Naïve Bayes
K-Means clustering
Clustering people based on income and age
Measuring entropy
Decision trees - Concepts
Decision trees - Predicting hiring decisions using Python
Ensemble learning
Support vector machine overview
Using SVM to cluster people by using scikit-learn
Summary
Chapter 6: Recommender Systems
What are recommender systems?
Item-based collaborative filtering
How item-based collaborative filtering works
Finding movie similarities
Improving the results of movie similarities
Making movie recommendations to people
Improving the recommendation results
Summary
Chapter 7: More Data Mining and Machine Learning Techniques
K-nearest neighbors - concepts
Using KNN to predict a rating for a movie
Dimensionality reduction and principal component analysis
A PCA example with the Iris dataset
Data warehousing overview
Reinforcement learning
Summary
Chapter 8: Dealing with Real-World Data
Bias/variance trade-off
K-fold cross-validation to avoid overfitting
Data cleaning and normalisation
Cleaning web log data
Normalizing numerical data
Detecting outliers
Summary
Chapter 9: Apache Spark - Machine Learning on Big Data
Installing Spark
Spark introduction
Spark and Resilient Distributed Datasets (RDD)
Introducing MLlib
Decision Trees in Spark with MLlib
K-Means Clustering in Spark
TF-IDF
Searching Wikipedia with Spark MLlib
Using the Spark 2.0 DataFrame API for MLlib
Summary
Chapter 10: Testing and Experimental Design
A/B testing concepts
T-test and p-value
Measuring t-statistics and p-values using Python
Determining how long to run an experiment for
A/B test gotchas
Summary

What You Will Learn

  • Learn how to clean your data and ready it for analysis
  • Implement the popular clustering and regression methods in Python
  • Train efficient machine learning models using decision trees and random forests
  • Visualize the results of your analysis using Python’s Matplotlib library
  • Use Apache Spark’s MLlib package to perform machine learning on large datasets
