Learning Data Mining with Python, - Second Edition

Product type Book
Published in Apr 2017
Publisher Packt
ISBN-13 9781787126787
Pages 358 pages
Edition 2nd Edition

Table of Contents (20) Chapters

Title Page
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
Getting Started with Data Mining
Classifying with scikit-learn Estimators
Predicting Sports Winners with Decision Trees
Recommending Movies Using Affinity Analysis
Features and scikit-learn Transformers
Social Media Insight using Naive Bayes
Follow Recommendations Using Graph Mining
Beating CAPTCHAs with Neural Networks
Authorship Attribution
Clustering News Articles
Object Detection in Images using Deep Neural Networks
Working with Big Data
Next Steps...

Summary


In this chapter, we introduced data mining using Python. If you were able to run the code in this chapter (the full code is available in the supplied code package), then your computer is set up for much of the rest of the book. Later chapters introduce other Python libraries for more specialized tasks.

We used the Jupyter Notebook to run our code, which allows us to immediately view the results of a small section of the code. Jupyter Notebook is a useful tool that will be used throughout the book.

We introduced a simple affinity analysis that finds products that are purchased together. This type of exploratory analysis gives insight into a business process, an environment, or a scenario. The information from these types of analysis can assist in business processes, help find the next big medical breakthrough, or help create the next artificial intelligence.
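As a reminder of what "purchased together" can look like in code, here is a minimal sketch. It assumes rules of the form "if a basket contains X, it also contains Y" and scores them with two common measures: support (how many baskets match the rule) and confidence (the fraction of baskets containing X that also contain Y). The toy baskets and the `rule_stats` helper are hypothetical names used only for illustration.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy baskets; each set lists the products in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "apples"},
    {"milk", "apples", "bread"},
    {"cheese", "bananas"},
    {"bread", "milk", "cheese"},
]


def rule_stats(transactions):
    """Return {(premise, conclusion): (support, confidence)} for one-item rules."""
    premise_counts = defaultdict(int)  # baskets that contain the premise
    rule_counts = defaultdict(int)     # baskets that contain premise and conclusion
    for basket in transactions:
        for item in basket:
            premise_counts[item] += 1
        # Every ordered pair of distinct items in a basket supports one rule.
        for premise, conclusion in permutations(basket, 2):
            rule_counts[(premise, conclusion)] += 1
    return {
        rule: (count, count / premise_counts[rule[0]])
        for rule, count in rule_counts.items()
    }


stats = rule_stats(transactions)
support, confidence = stats[("milk", "bread")]
print(f"milk -> bread: support={support}, confidence={confidence:.2f}")
```

Sorting the rules by support or by confidence gives two different views of the same data: the first favors frequent pairs, the second favors reliable ones.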

This chapter also included a simple classification example using the OneR algorithm. For each feature, OneR predicts the class that most frequently occurred with each of that feature's values in the training dataset. It then keeps the single feature whose predictions make the fewest errors.
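The idea behind OneR can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: it assumes discrete feature values stored in plain lists, and the names `train_one_r` and `predict_one_r`, the toy data, and the fallback to the overall majority class for unseen values are all assumptions made here.

```python
from collections import Counter, defaultdict


def train_one_r(X, y):
    """Return (feature_index, {value: class}, default_class) for the best feature."""
    default_class = Counter(y).most_common(1)[0][0]
    best = None
    for feature in range(len(X[0])):
        # Count how often each class occurs with each value of this feature.
        class_counts = defaultdict(Counter)
        for row, label in zip(X, y):
            class_counts[row[feature]][label] += 1
        # For each value, predict the class seen most often with it.
        rule = {
            value: counts.most_common(1)[0][0]
            for value, counts in class_counts.items()
        }
        # Training errors: samples outside the majority class for their value.
        errors = sum(
            sum(counts.values()) - max(counts.values())
            for counts in class_counts.values()
        )
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    return feature, rule, default_class


def predict_one_r(model, X):
    feature, rule, default_class = model
    # Assumption: values never seen in training fall back to the majority class.
    return [rule.get(row[feature], default_class) for row in X]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_train = [[0, 1], [0, 0], [1, 1], [1, 0], [1, 1], [0, 0]]
y_train = [0, 0, 1, 1, 1, 0]

model = train_one_r(X_train, y_train)
predictions = predict_one_r(model, [[1, 0], [0, 1]])
print(model[0], predictions)
```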

To expand on the outcomes of this chapter, think about how you would implement a variant of OneR that can take multiple feature/value pairs into consideration. Take a shot at implementing your new algorithm and evaluating it. Remember to test your algorithm on a dataset that is separate from the training data. Otherwise, you run the risk of overfitting your data.
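One way to keep the test data separate is to shuffle the samples once and hold a fraction back before any training happens. The sketch below uses only the standard library. The helpers `split_train_test` and `accuracy`, the seed, the toy data, and the majority-class placeholder model are hypothetical and stand in for whatever algorithm you implement.

```python
import random
from collections import Counter


def split_train_test(X, y, test_fraction=0.25, seed=14):
    """Shuffle sample indices once and hold out a fraction of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx], [y[i] for i in train_idx],
        [X[i] for i in test_idx], [y[i] for i in test_idx],
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 12 samples with two discrete features each.
X = [[i % 2, i % 3] for i in range(12)]
y = [row[0] for row in X]

X_train, y_train, X_test, y_test = split_train_test(X, y)

# Placeholder model: always predict the most common class in the training set.
# Swap in your own algorithm here, fitted on X_train and y_train only.
majority = Counter(y_train).most_common(1)[0][0]
y_pred = [majority for _ in X_test]
print(f"held-out accuracy: {accuracy(y_test, y_pred):.2f}")
```

Because the model never sees `X_test` during training, the reported accuracy is a fairer estimate of how the algorithm behaves on new data than accuracy measured on the training set.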

Over the next few chapters, we will expand on the concepts of classification and affinity analysis. We will also introduce classifiers in the scikit-learn package and use them to do our machine learning, rather than writing the algorithms ourselves.
