Building Machine Learning Systems with Python


Product type Book
Published in Jul 2013
Publisher Packt
ISBN-13 9781782161400
Pages 290 pages
Edition 1st Edition

Where to Learn More about Machine Learning

Books


This book focused on the practical side of machine learning. We did not present the thinking behind the algorithms or the theory that justifies them. If you are interested in that aspect of machine learning, then we recommend Pattern Recognition and Machine Learning, C. Bishop, Springer. This is a classical introductory text in the field. It will teach you the nitty-gritty of most of the algorithms we used in this book.

If you want to move beyond an introduction and learn all the gory mathematical details, then Machine Learning: A Probabilistic Perspective, K. Murphy, The MIT Press, is an excellent option. It is very recent (published in 2012) and covers the cutting edge of ML research. This 1,100-page book can also serve as a reference, as very little of machine learning has been left out.

Q&A sites

The following two Q&A websites cover machine learning:

  • MetaOptimize (http://metaoptimize.com/qa) is a machine learning Q&A website where many very knowledgeable researchers and practitioners interact

  • Cross Validated (http://stats.stackexchange.com) is a general statistics Q&A site, which often features machine learning questions as well

As mentioned at the beginning of the book, if you have questions specific to particular parts of the book, feel free to ask them at TwoToReal (http://www.twotoreal.com). We try to jump in as quickly as possible and help as best we can.

Blogs

The following is an obviously non-exhaustive list of blogs that are interesting to someone working on machine learning:

  • Machine Learning Theory at http://hunch.net

    • This is a blog by John Langford, the brain behind Vowpal Wabbit (http://hunch.net/~vw/), but guest posts also appear.

    • The average pace is approximately one post per month. The posts are more theoretical. They also offer additional value in the form of brain teasers.

  • Text and data mining by practical means at http://textanddatamining.blogspot.de

    • The average pace is one post per month. The posts are very practical and always feature surprising approaches.

  • http://blog.echen.me

    • The average pace is one post per month, covering more applied topics

  • Machined Learnings at http://www.machinedlearnings.com

    • The average pace is one post per month, covering more applied topics, often revolving around learning from big data

  • FlowingData at http://flowingdata.com

    • The average pace is one post per day, with the posts revolving more around statistics

  • Normal deviate at http://normaldeviate.wordpress.com

    • The average pace is one post per month, covering theoretical discussions of practical problems. Although it is more of a statistics blog, the posts often intersect with machine learning.

  • Simply statistics at http://simplystatistics.org

    • There are several posts per month, focusing on statistics and big data

  • Statistical Modeling, Causal Inference, and Social Science at http://andrewgelman.com

    • There is about one post per day, often a funny read in which the author uses statistics to point out flaws in popular media

Data sources

If you want to play around with algorithms, you can obtain many datasets from the Machine Learning Repository at the University of California, Irvine (UCI). You can find it at http://archive.ics.uci.edu/ml.

Getting competitive

An excellent way to learn more about machine learning is by trying out a competition! Kaggle (http://www.kaggle.com) is a marketplace of ML competitions and has already been mentioned in the introduction. On the website, you will find several different competitions with different structures and often cash prizes.

The supervised learning competitions almost always use the following format:

  • You (and every other competitor) are given access to labeled training data and testing data (without labels).

  • Your task is to submit predictions for the testing data.

  • When the competition closes, whoever has the best accuracy wins. The prizes range from glory to cash.
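The format above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the nearest-centroid classifier, and the `id,label` submission columns are all hypothetical and are not the format of any particular competition.

```python
import csv
import io
from collections import defaultdict


def train_centroids(rows, labels):
    """Compute one mean feature vector (centroid) per class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


def make_submission(ids, predictions):
    """Render predictions as CSV text (hypothetical 'id,label' layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "label"])
    for row_id, label in zip(ids, predictions):
        writer.writerow([row_id, label])
    return buffer.getvalue()


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Step 1: labeled training data and unlabeled testing data (toy values).
train_X = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
train_y = ["a", "a", "b", "b"]
test_ids = [101, 102, 103]
test_X = [[0.9, 1.1], [4.1, 3.9], [1.1, 1.0]]

# Step 2: the competitor submits predictions for the testing data.
centroids = train_centroids(train_X, train_y)
predictions = [predict(centroids, x) for x in test_X]
submission = make_submission(test_ids, predictions)

# Step 3: the organizer scores the submission against labels that
# only the organizer can see.
hidden_y = ["a", "b", "a"]
score = accuracy(predictions, hidden_y)
```

The point of the sketch is the separation of roles: the competitor never sees `hidden_y`, so the only feedback is the score computed by the organizer.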

Of course, winning something is nice, but you can gain a lot of useful experience just by participating. So stay tuned, especially after the competition is over and participants start sharing their approaches in the forum. Most of the time, winning is not about developing a new algorithm; it is about cleverly preprocessing, normalizing, and combining the existing methods.
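As a rough illustration of what normalizing and combining existing methods can look like, the following sketch standardizes features and then takes a majority vote over three simple classifiers. Everything here is an assumption made for illustration: the toy data, the choice of models, and the helper names are hypothetical and do not describe any actual winning entry.

```python
from collections import Counter
from statistics import mean, pstdev


def fit_scaler(rows):
    """Learn the per-feature mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard constant features
    return means, stds


def transform(rows, scaler):
    """Standardize rows to zero mean and unit variance per feature."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]


def nearest_neighbor(train_X, train_y, x):
    """Predict the label of the closest training row (1-NN)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]


def threshold_rule(x, feature):
    """A deliberately weak model: look at the sign of one scaled feature."""
    return "b" if x[feature] > 0.0 else "a"


def majority_vote(votes):
    """Combine several predictions by picking the most common label."""
    return Counter(votes).most_common(1)[0][0]


# Toy data: the second feature has a much larger scale than the first.
train_X = [[1.0, 200.0], [2.0, 180.0], [8.0, 220.0], [9.0, 190.0]]
train_y = ["a", "a", "b", "b"]
query = [8.0, 182.0]

# Without normalization, the large-scale feature dominates the distance.
raw_prediction = nearest_neighbor(train_X, train_y, query)

# With normalization, both features contribute comparably.
scaler = fit_scaler(train_X)
scaled_X = transform(train_X, scaler)
scaled_query = transform([query], scaler)[0]
scaled_prediction = nearest_neighbor(scaled_X, train_y, scaled_query)

# Combine three existing methods by majority vote.
votes = [
    scaled_prediction,
    threshold_rule(scaled_query, feature=0),
    threshold_rule(scaled_query, feature=1),
]
ensemble_prediction = majority_vote(votes)
```

On this toy data, the nearest neighbor changes once the features are put on a common scale, and the vote lets two agreeing models outweigh one weak model that disagrees.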
