Learning scikit-learn: Machine Learning in Python


  • Use Python and scikit-learn to create intelligent applications
  • Apply regression techniques to predict future behaviour and learn to cluster items in groups by their similarities
  • Make use of classification techniques to perform image recognition and document classification

Book Details

Language : English
Paperback : 118 pages [ 235mm x 191mm ]
Release Date : November 2013
ISBN : 1783281936
ISBN 13 : 9781783281930
Author(s) : Raúl Garreta, Guillermo Moncecchi
Topics and Technologies : All Books, Application Development, Open Source

Table of Contents

Preface
Chapter 1: Machine Learning – A Gentle Introduction
Chapter 2: Supervised Learning
Chapter 3: Unsupervised Learning
Chapter 4: Advanced Features
Index
  • Chapter 1: Machine Learning – A Gentle Introduction
    • Installing scikit-learn
      • Linux
      • Mac
      • Windows
      • Checking your installation
    • Our first machine learning method – linear classification
    • Evaluating our results
    • Machine learning categories
    • Important concepts related to machine learning
    • Summary
  • Chapter 2: Supervised Learning
    • Image recognition with Support Vector Machines
      • Training a Support Vector Machine
    • Text classification with Naïve Bayes
      • Preprocessing the data
      • Training a Naïve Bayes classifier
      • Evaluating the performance
    • Explaining Titanic hypothesis with decision trees
      • Preprocessing the data
      • Training a decision tree classifier
      • Interpreting the decision tree
      • Random Forests – randomizing decisions
      • Evaluating the performance
    • Predicting house prices with regression
      • First try – a linear model
      • Second try – Support Vector Machines for regression
      • Third try – Random Forests revisited
      • Evaluation
    • Summary

          Raúl Garreta

          Raúl Garreta is a computer engineer with extensive experience in the theory and application of Artificial Intelligence (AI), specializing in Machine Learning and Natural Language Processing (NLP). He is entrepreneurial, with a strong interest in applying science, technology, and innovation to the Internet industry and startups. He has worked at many software companies, on everything from video games to implantable medical devices. In 2009, he co-founded Tryolabs to apply AI to the development of intelligent software products, and he serves as the company's CTO and Product Manager. Besides applying Machine Learning and NLP, Tryolabs specializes in the Python programming language and has served many clients in Silicon Valley. Raúl has also contributed to the development of the Python community in Uruguay, co-organizing local PyDay and PyCon conferences. He has been an assistant professor at the Computer Science Institute of Universidad de la República in Uruguay since 2007, where he has worked on the Machine Learning, NLP, and Automata Theory and Formal Languages courses. He is also finishing his Master's degree in Machine Learning and NLP. He is very interested in the research and application of Robotics, Quantum Computing, and Cognitive Modeling. Not only is he a technology enthusiast and science fiction lover (geek), but he is also a big fan of the arts, such as cinema, photography, and painting.

          Guillermo Moncecchi

          Guillermo Moncecchi is a Natural Language Processing researcher at the Universidad de la República of Uruguay. He received a PhD in Informatics from the Universidad de la República, Uruguay, and a PhD in Language Sciences from the Université Paris Ouest, France. He has participated in several international projects on NLP. He has almost 15 years of teaching experience in Automata Theory, Natural Language Processing, and Machine Learning. He also works as Head Developer at the Montevideo Council and has led the development of several public services for the council, particularly in the Geographical Information Systems area. He is one of the leaders of the Montevideo Open Data movement, promoting the publication and exploitation of the city's data.


          Errata

          - 14 submitted: last submission 17 Jul 2014

          Errata type: Code | Page no. 13

          The code snippet

          from sklearn.linear_modelsklearn._model import SGDClassifier

          should be

          from sklearn.linear_model import SGDClassifier

          Errata type: Code | Page no. 36

          >>> from sklearn.feature_extraction.text import TfidfVectorizer, >>> 

          HashingVectorizer, CountVectorizer 

           

          should be

           

          >>> from sklearn.feature_extraction.text import TfidfVectorizer, HashingVectorizer, CountVectorizer

          Errata type: Typo | Page no. 92

          In the line   "It is worth mentioning that at this point a second pass could be performed in the range of 10-2 and 10-1 with a finer grid to find an ever better alpha value."

          -2 and -1 should be superscripts, that is, the range is 10^-2 to 10^-1.

          Errata Type: Code | Page no. : 7

          It is:

          sudo apt-get install build-essential python-dev python-numpy python-setuptools python-scipy libatlas-dev

          It should be:

          sudo apt-get install build-essential python-dev python-numpy python-setuptools python-scipy libatlas-dev python-pip

          Errata Type: Technical | Page no. : 75

          It is:

          Now, why should we expect that distribution and not another? Actually,
          not every phenomenon  has the same distribution, but a theorem called the Law of
          Large Numbers tells us that whenever we repeat an experiment a large number of
          times (for example, measuring somebody's height), the distribution of results can be
          approximated by a Gaussian.

          Should be:

          Now, why should we expect that distribution and not another? Actually, not every phenomenon has the same distribution, but a theorem called the Central Limit Theorem tells us that whenever we repeat an experiment a large number of times (for example, measuring some people's heights), the distribution of the average of the results can be approximated by a Gaussian.
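
The distinction this erratum draws (it is the averages, not the raw results, that look Gaussian) can be checked numerically. The following is a minimal sketch using only the standard library, not code from the book; the function name `sample_means`, the uniform source distribution, and the sample sizes are all assumptions chosen for illustration.

```python
import random
import statistics


def sample_means(n_samples, sample_size, rng):
    """Return n_samples averages, each over sample_size uniform draws on [0, 1)."""
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(n_samples=2000, sample_size=50, rng=rng)

# Theory for uniform [0, 1): mean 0.5, variance 1/12, so the averages
# should have a standard deviation close to sqrt(1/12) / sqrt(50).
expected_std = (1 / 12) ** 0.5 / 50 ** 0.5
observed_mean = statistics.fmean(means)
observed_std = statistics.stdev(means)

# For a Gaussian, roughly 68% of values lie within one standard deviation.
within_one_std = sum(abs(m - 0.5) <= expected_std for m in means) / len(means)

print(observed_mean, observed_std, expected_std, within_one_std)
```

The individual draws are uniform, which is far from bell-shaped, yet the fraction of averages within one standard deviation of 0.5 should land near the Gaussian value.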

          Errata Type: Typo | Page no. : 9

          It is:

           iris.target.target_names

          Should be:

          iris.target_names

          Errata Type: Code | Page no. : 19

          It is:

          clf = Pipeline([
                  ('scaler', StandardScaler()),
                  ('linear_model', SGDClassifier())
          ])

          Should be:

          clf = Pipeline([
                  ('scaler', preprocessing.StandardScaler()),
                  ('linear_model', SGDClassifier())
          ])
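
For context, the corrected snippet only runs if `preprocessing`, `Pipeline`, and `SGDClassifier` have been imported. The sketch below shows one way the pieces might fit together; the use of the Iris data, fitting on the full dataset, and the `random_state=0` argument are assumptions made for illustration and are not taken from the book.

```python
from sklearn import datasets, preprocessing
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Assumption: the Iris dataset is used here only as a small, built-in example.
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same structure as the corrected snippet: scale the features, then
# fit a linear classifier trained with stochastic gradient descent.
clf = Pipeline([
    ('scaler', preprocessing.StandardScaler()),
    ('linear_model', SGDClassifier(random_state=0)),  # seed added for repeatability
])
clf.fit(X, y)

predictions = clf.predict(X)
# Map the numeric predictions back to species names via iris.target_names.
print(iris.target_names[predictions[:5]])
```

Because the scaler is a step of the pipeline, the same scaling learned during `fit` is applied automatically when `predict` is called.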

          Errata Type: Typo | Page no. : 37 and 90

          It is:

          # create a k-fold croos validation iterator of k=5 folds

          Should be:

          # create a k-fold cross validation iterator of k=5 folds
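
To illustrate what a k-fold cross validation iterator with k=5 produces, here is a pure-Python sketch. It is not the scikit-learn iterator the book's comment refers to; the function name `k_fold_indices` and the way indices are dealt into folds are assumptions chosen to keep the example self-contained.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i, so fold
    # sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=23, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every instance is used for evaluation once and for training in the other four rounds.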

           

          Errata Type: Code | Page no. : 33

          It is:

          A target array, with values in the range of 0 to 2, corresponding to each 
          instance of Iris species (0: setosa, 1: versicolor, and 2: virginica), as you 
          can verify by printing the iris.target.target_names value.


          Should be:

          A target array, with values in the range of 0 to 2, corresponding to each 
          instance of Iris species (0: setosa, 1: versicolor, and 2: virginica), as you 
          can verify by printing the iris.target_names value.

          Errata Type: Code | Page no. : 33

          It is:

          eval_faces = [np.reshape(a, (64, 64)) for a in X_eval]
          Should be:
          eval_faces = [np.reshape(a, (64, 64)) for a in X_test]

          Errata Type: Code Page No: 38

          [a-z0->>>9_
          should be:

          [a-z0-9_

          Errata Type: Code Page No: 39

          stop_words=stop_words

          should be:

          stop_words=get_stop_words()

          Errata Type: Code Page No: 64

          In our case, we want to transform instances of 64 features to instances of just two features, so we will set n_components to 2.
          Should be:
          In our case, we want to transform instances of 64 features to instances of just ten features, so we will set n_components to 10.
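
As a rough illustration of the corrected sentence, the sketch below reduces 64-feature instances to ten features. The erratum itself does not name the estimator or the dataset, so the use of `PCA` and of scikit-learn's built-in handwritten digits data (8x8 images, hence 64 features) are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Assumption: the digits dataset stands in for the 64-feature instances.
digits = load_digits()
X = digits.data

# Assumption: PCA is the decomposition method; n_components sets the
# number of features kept after the transformation.
estimator = PCA(n_components=10)
X_reduced = estimator.fit_transform(X)

print(X.shape, X_reduced.shape)
```

The number of rows is unchanged; only the number of columns drops from 64 to 10.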

          Errata Type: Graphics Page No: 48

          The diagram value is 622,322

          should be:

          662,322

          What you will learn from this book

          • Set up scikit-learn inside your Python environment
          • Classify objects (from documents to human faces and flower species) based on some of their features, using a variety of methods from Support Vector Machines to Naïve Bayes
          • Use Decision Trees to explain the main causes of certain phenomena, such as the Titanic passengers’ survival
          • Predict house prices using regression techniques
          • Display and analyse groups in your data using dimensionality reduction
          • Make use of different tools to preprocess, extract, and select the learning features
          • Select the best parameters for your models using model selection
          • Improve the way you build your models using parallelization techniques

          In Detail

          Machine learning, the art of creating applications that learn from experience and data, has been around for many years. However, in the era of “big data”, huge amounts of information are being generated. This makes machine learning an unavoidable source of new data-based approaches to problem solving.

          With Learning scikit-learn: Machine Learning in Python, you will learn to incorporate machine learning in your applications. The book combines an introduction to some of the main concepts and methods in machine learning with practical, hands-on examples of real-world problems. Ranging from handwritten digit recognition to document classification, examples are solved step by step using scikit-learn and Python.

          The book starts with a brief introduction to the core concepts of machine learning with a simple example. Then, using real-world applications and advanced features, it takes a deep dive into the various machine learning techniques.

          You will learn to evaluate your results and apply advanced techniques for preprocessing data. You will also be able to select the best set of features and the best methods for each problem.

          With Learning scikit-learn: Machine Learning in Python you will learn how to use the Python programming language and the scikit-learn library to build applications that learn from experience, applying the main concepts and techniques of machine learning.

          Approach

          The book adopts a tutorial-based approach to introduce the user to scikit-learn.

          Who this book is for

          If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this is the book for you. No previous experience with machine-learning algorithms is required.
