Regression Analysis with Python

Product type: Book
Published: Feb 2016
ISBN-13: 9781785286315
Pages: 312
Edition: 1st Edition
Authors (2): Luca Massaron, Alberto Boschetti


Online mini-batch learning


From the previous section, we learned an interesting lesson: for big data, always use SGD-based learners, because they are faster and they scale.

Now, in this section, let's consider this regression dataset:

  • Massive number of observations: 2M

  • Large number of features: 100

  • Noisy dataset

The X_train matrix is composed of 200 million elements, and may not completely fit in memory (on a machine with 4 GB RAM); the testing set is composed of 10,000 observations.
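The element count and the resulting footprint can be checked by hand before anything is allocated. The following is a minimal sketch of that arithmetic; the helper name `dense_footprint_gb` is hypothetical, and the 8 bytes per element assume a dense float64 matrix (NumPy's default float type).

```python
# Back-of-the-envelope memory estimate for a dense matrix.
BYTES_PER_FLOAT64 = 8  # assumption: elements are stored as float64


def dense_footprint_gb(n_rows, n_cols, bytes_per_element=BYTES_PER_FLOAT64):
    """Return (element count, size in GB) of a dense n_rows x n_cols matrix."""
    n_elements = n_rows * n_cols
    return n_elements, n_elements * bytes_per_element / 1e9


# 2M observations x 100 features, as in the dataset described above
n_elements, size_gb = dense_footprint_gb(2_000_000, 100)
print(n_elements, "elements,", size_gb, "GB")
```

On a 4 GB machine, a matrix of this size leaves little headroom once the operating system, the interpreter, and any temporary copies made during training are counted.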

Let's first create the datasets, and print the memory footprint of the biggest one:

In:
# Let's generate a dataset with 2M training observations
X_train, X_test, y_train, y_test = generate_dataset(2000000, 10000, 100, 10.0)
print("Size of X_train is [GB]:", X_train.size * X_train[0,0].itemsize/1E9)

Out:
Size of X_train is [GB]: 1.6
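The `generate_dataset` function comes from an earlier section that is not reproduced here. To experiment with the snippet above, something like the following stand-in might do. It is a hypothetical sketch, not the authors' function: the name `generate_dataset_standin`, the argument order (training rows, test rows, features, noise level), and the linear-signal-plus-Gaussian-noise model are all assumptions inferred from the call shown above.

```python
import numpy as np


def generate_dataset_standin(n_train, n_test, n_features, noise, seed=101):
    """Hypothetical stand-in: linear signal plus Gaussian noise, split into train/test."""
    rng = np.random.default_rng(seed)
    n_total = n_train + n_test
    X = rng.standard_normal((n_total, n_features))
    true_w = rng.uniform(-10.0, 10.0, size=n_features)  # arbitrary coefficients
    y = X @ true_w + noise * rng.standard_normal(n_total)
    # First n_train rows for training, the remainder for testing
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# Scaled down 100x from the 2M rows in the text so that it runs quickly
X_train, X_test, y_train, y_test = generate_dataset_standin(20_000, 100, 100, 10.0)
print("Size of X_train is [GB]:", X_train.nbytes / 1e9)
```

Here `X_train.nbytes` gives the same number as `X_train.size * X_train[0,0].itemsize`; at full scale (2,000,000 rows) it reports the 1.6 GB shown in the output above.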

The X_train matrix alone holds 1.6 GB of data; we can consider it a starting point for big data. Let's now try to fit it using the best model we got from the previous section, SGDRegressor(). To access...
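As an illustration of the mini-batch idea named in this section's title, here is a minimal sketch that feeds `SGDRegressor` one chunk at a time through its `partial_fit` method, so the full training matrix never has to sit in memory. It is not necessarily the authors' code: the `iter_minibatches` generator, the synthetic data, and the scaled-down sizes (20,000 rows, 10 features, batches of 1,000) are assumptions chosen so that the example runs in a moment.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor


def iter_minibatches(n_obs, n_features, batch_size, true_w, noise_std, seed=0):
    """Yield synthetic (X, y) chunks one at a time (hypothetical data source)."""
    rng = np.random.default_rng(seed)
    for start in range(0, n_obs, batch_size):
        rows = min(batch_size, n_obs - start)
        X = rng.standard_normal((rows, n_features))
        y = X @ true_w + noise_std * rng.standard_normal(rows)
        yield X, y


n_features = 10
true_w = np.arange(1.0, n_features + 1.0)  # assumed ground-truth coefficients

model = SGDRegressor(random_state=0)
for X_batch, y_batch in iter_minibatches(20_000, n_features, 1_000, true_w, 1.0):
    # Each call updates the existing coefficients using only this chunk
    model.partial_fit(X_batch, y_batch)

# Evaluate on a separately generated held-out chunk
X_test, y_test = next(iter_minibatches(2_000, n_features, 2_000, true_w, 1.0, seed=1))
r2 = model.score(X_test, y_test)
print("R^2 on held-out data:", round(r2, 3))
```

In a real setting the generator would read chunks from disk or a stream rather than synthesize them; the training loop itself stays the same.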
