Introduction to Scikit-Learn

Introduction

Since its release in 2007, scikit-learn has become one of the most popular open source machine learning libraries for Python. scikit-learn provides algorithms for machine learning tasks including classification, regression, dimensionality reduction, and clustering. It also provides modules for extracting features, processing data, and evaluating models.

(For more resources related to this topic, see here.)

Conceived as an extension to the SciPy library, scikit-learn is built on the popular Python libraries NumPy and matplotlib. NumPy extends Python to support efficient operations on large arrays and multidimensional matrices. matplotlib provides visualization tools, and SciPy provides modules for scientific computing.

scikit-learn is popular for academic research because it has a well-documented, easy-to-use, and versatile API. Developers can use scikit-learn to experiment with different algorithms by changing only a few lines of the code. scikit-learn wraps some popular implementations of machine learning algorithms, such as LIBSVM and LIBLINEAR. Other Python libraries, including NLTK, include wrappers for scikit-learn. scikit-learn also includes a variety of datasets, allowing developers to focus on algorithms rather than obtaining and cleaning data.

Licensed under the permissive BSD license, scikit-learn can be used in commercial applications without restrictions. Many of scikit-learn's algorithms are fast and scalable to all but massive datasets. Finally, scikit-learn is noted for its reliability; much of the library is covered by automated tests.

Installing scikit-learn

This book is written for version 0.15.1 of scikit-learn; use this version to ensure that the examples run correctly. If you have previously installed scikit-learn, you can retrieve the version number with the following code:

>>> import sklearn
>>> sklearn.__version__
'0.15.1'

If you have not previously installed scikit-learn, you can install it from a package manager or build it from the source. We will review the installation processes for Linux, OS X, and Windows in the following sections, but refer to http://scikit-learn.org/stable/install.html for the latest instructions. The following instructions only assume that you have installed Python 2.6, Python 2.7, or Python 3.2 or newer. Go to http://www.python.org/download/ for instructions on how to install Python.

Installing scikit-learn on Windows

scikit-learn requires Setuptools, a third-party package that supports packaging and installing software for Python. Setuptools can be installed on Windows by running the bootstrap script at https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py.

Windows binaries for the 32- and 64-bit versions of scikit-learn are also available. If you cannot determine which version you need, install the 32-bit version. Both versions depend on NumPy 1.3 or newer. The 32-bit version of NumPy can be downloaded from http://sourceforge.net/projects/numpy/files/NumPy/. The 64-bit version can be downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scikit-learn.

A Windows installer for the 32-bit version of scikit-learn can be downloaded from http://sourceforge.net/projects/scikit-learn/files/. An installer for the 64-bit version of scikit-learn can be downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scikit-learn.

scikit-learn can also be built from the source code on Windows. Building requires a C/C++ compiler such as MinGW (http://www.mingw.org/), NumPy, SciPy, and Setuptools.

To build, clone the Git repository from https://github.com/scikit-learn/scikit-learn and execute the following command:

python setup.py install

Installing scikit-learn on Linux

There are several options to install scikit-learn on Linux, depending on your distribution. The preferred option to install scikit-learn on Linux is to use pip. You may also install it using a package manager, or build scikit-learn from its source.

To install scikit-learn using pip, execute the following command:

sudo pip install scikit-learn

To build scikit-learn, clone the Git repository from https://github.com/scikit-learn/scikit-learn. Then install the following dependencies:

sudo apt-get install python-dev python-numpy python-numpy-dev python-setuptools python-numpy-dev python-scipy libatlas-dev g++

Navigate to the repository's directory and execute the following command:

python setup.py install

Installing scikit-learn on OS X

scikit-learn can be installed on OS X using Macports:

sudo port install py26-sklearn

If Python 2.7 is installed, run the following command:

sudo port install py27-sklearn

scikit-learn can also be installed using pip with the following command:

pip install scikit-learn

Verifying the installation

To verify that scikit-learn has been installed correctly, open a Python console and execute the following:

>>> import sklearn
>>> sklearn.__version__
'0.15.1'

To run scikit-learn's unit tests, first install the nose library. Then execute the following:

nosetest sklearn –exe

Congratulations! You've successfully installed scikit-learn.

Summary

In this article, we had a brief introduction of Scikit. We also covered the installation of Scikit on various operating system Windows, Linux, OS X.

You can also refer the following books on the similar topics:

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Mastering Machine Learning with scikit-learn

Explore Title
comments powered by Disqus