You're reading from  Regression Analysis with Python

Product type: Book
Published in: Feb 2016
Reading level: Intermediate
ISBN-13: 9781785286315
Edition: 1st Edition
Authors (2):
Luca Massaron

Having joined Kaggle over 10 years ago, Luca Massaron is a Kaggle Grandmaster in discussions and a Kaggle Master in competitions and notebooks. In Kaggle competitions he reached no. 7 in the worldwide rankings. On the professional side, Luca is a data scientist with more than a decade of experience in transforming data into smarter artifacts, solving real-world problems, and generating value for businesses and stakeholders. He is a Google Developer Expert (GDE) in machine learning and the author of best-selling books on AI, machine learning, and algorithms.

Alberto Boschetti

Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges ranging from natural language processing (NLP) and behavioral analysis to machine learning and distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meet-ups, conferences, and other events.

Python packages and functions for linear models


Linear models are widespread in scientific and business applications and can be found, under different function names, in quite a number of Python packages. We have selected a few for use in this book. Among them, Statsmodels is our choice for illustrating the statistical properties of models, while Scikit-learn is the package we recommend for easily and seamlessly preparing data, building models, and deploying them. We will present models built with Statsmodels exclusively to illustrate the statistical properties of linear models, and resort to Scikit-learn to demonstrate how to approach modeling from a data science point of view.

NumPy

NumPy, Travis Oliphant's creation, is at the core of every analytical solution in the Python language. It provides the user with multidimensional arrays, along with a large set of functions for performing mathematical operations on these arrays. Arrays are blocks of data arranged along multiple dimensions that implement mathematical vectors and matrices. Arrays are useful not just for storing data, but also for fast matrix operations (vectorization), which are indispensable when you wish to solve ad hoc data science problems.

In the book, we are primarily going to use the linalg module from NumPy; as a collection of linear algebra functions, it will help in explaining the nuts and bolts of the algorithms:

  • Website: http://www.numpy.org/

  • Import conventions: import numpy as np

  • Version at the time of print: 1.9.2

  • Suggested install command: pip install numpy

Tip

As a convention largely adopted by the Python community, when importing NumPy, it is suggested that you alias it as np:

import numpy as np

Similar import conventions exist for the other Python packages that we will be using in the code presented in this book.
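To give a feel for how the linalg module can expose the mechanics of a linear model, here is a minimal sketch. The toy data (generated from y = 2 + 3x) and all variable names are hypothetical and chosen only for illustration; the sketch assumes a recent NumPy and Python 3.5 or later for the @ operator.

```python
import numpy as np

# Hypothetical, noise-free toy data generated from y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Design matrix: a column of ones (intercept) next to the predictor
X = np.column_stack([np.ones_like(x), x])

# Approach 1: solve the normal equations (X'X) beta = X'y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Approach 2: multiply y by the Moore-Penrose pseudo-inverse of X
beta_pinv = np.linalg.pinv(X) @ y

print(beta_normal)  # intercept and slope from the normal equations
print(beta_pinv)    # the same coefficients from the pseudo-inverse
```

Both approaches should recover an intercept close to 2 and a slope close to 3, since the data contain no noise.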

SciPy

An original project by Travis Oliphant, Pearu Peterson, and Eric Jones, SciPy completes NumPy's functionality, offering a larger variety of scientific algorithms for linear algebra, sparse matrices, signal and image processing, optimization, fast Fourier transforms, and much more.

The scipy.optimize package provides several commonly used optimization algorithms, which we will use to detail how a linear model can be estimated using different optimization approaches:

  • Website: http://www.scipy.org/

  • Import conventions: import scipy as sp

  • Version at the time of print: 0.16.0

  • Suggested install command: pip install scipy
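As an illustration of estimating a linear model through optimization, the following sketch minimizes the sum of squared residuals with scipy.optimize.minimize. The toy data, the sum_of_squares helper, and the choice of the BFGS method are assumptions made for this example, not necessarily the approach taken later in the book. Note that the function is imported directly from its subpackage rather than through the sp alias.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical, noise-free toy data generated from y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x
X = np.column_stack([np.ones_like(x), x])  # intercept column + predictor


def sum_of_squares(beta):
    """Cost function: sum of squared residuals for coefficients beta."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2)


# Start from all-zero coefficients and let BFGS search for the minimum
result = minimize(sum_of_squares, x0=np.zeros(2), method="BFGS")
beta_opt = result.x

print(beta_opt)  # estimated intercept and slope
```

The optimizer only ever sees the cost function, so swapping in a different cost or a different method argument is a one-line change.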

Statsmodels

Previously part of SciKits, Statsmodels was conceived as a complement to SciPy's statistical functions. It features generalized linear models, discrete choice models, time series analysis, and a series of descriptive statistics as well as parametric and nonparametric tests.

In Statsmodels, we will use the statsmodels.api and statsmodels.formula.api modules, which provide functions for fitting linear models from either input matrices or formula specifications:

  • Website: http://statsmodels.sourceforge.net/

  • Import conventions: import statsmodels.api as sm and import statsmodels.formula.api as smf

  • Version at the time of print: 0.6.1

  • Suggested install command: pip install statsmodels

Scikit-learn

Started as part of the SciPy Toolkits (SciKits), Scikit-learn is the core of data science operations in Python. It offers all that you may need in terms of data preprocessing, supervised and unsupervised learning, model selection, validation, and error metrics. Expect us to talk at length about this package throughout the book.

Scikit-learn started in 2007 as a Google Summer of Code project by David Cournapeau. Since 2013, it has been taken over by the researchers at INRIA (French Institute for Research in Computer Science and Automation).

Scikit-learn offers modules for data processing (sklearn.preprocessing and sklearn.feature_extraction), model selection and validation (sklearn.cross_validation, sklearn.grid_search, and sklearn.metrics), and a complete set of methods (sklearn.linear_model) in which the target value, whether a number or a probability, is expected to be a linear combination of the input variables:

  • Website: http://scikit-learn.org/stable/

  • Import conventions: None; modules are usually imported separately

  • Version at the time of print: 0.16.1

  • Suggested install command: pip install scikit-learn

Tip

Note that the imported module is named sklearn.
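A minimal sketch of the Scikit-learn workflow for a linear model follows. The toy data (generated from y = 2 + 3x) and the variable names are hypothetical; only LinearRegression from sklearn.linear_model is used, imported separately as the convention above suggests.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free toy data generated from y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Scikit-learn expects a 2D feature matrix: one row per observation
X = x.reshape(-1, 1)

# Fit the model; the intercept is estimated by default
model = LinearRegression()
model.fit(X, y)

intercept = model.intercept_
slope = model.coef_[0]

# Predict the target for a new, unseen value of the predictor
prediction = model.predict(np.array([[5.0]]))

print(intercept, slope, prediction)
```

The same fit and predict pattern applies to the other estimators in sklearn.linear_model, which is part of what makes the package convenient for building and deploying models.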
