Java for Data Science

Product type: Book
Published in: Jan 2017
Publisher: Packt
ISBN-13: 9781785280115
Pages: 386
Edition: 1st Edition
Authors (2): Richard M. Reese, Jennifer L. Reese

Table of Contents (19) Chapters

Java for Data Science
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
Getting Started with Data Science
Data Acquisition
Data Cleaning
Data Visualization
Statistical Data Analysis Techniques
Machine Learning
Neural Networks
Deep Learning
Text Analysis
Visual and Audio Analysis
Mathematical and Parallel Techniques for Data Analysis
Bringing It All Together

Machine learning applied to data science


Machine learning has become increasingly important for data science analysis, as it has for a multitude of other fields. A defining characteristic of machine learning is the ability of a model to be trained on a set of representative data and then later used to solve similar problems. There is no need to explicitly program an application to solve the problem. A model is a representation of a real-world object.

For example, customer purchases can be used to train a model. Predictions can then be made about the types of purchases a customer might make in the future. This allows an organization to tailor ads and coupons to a customer, potentially providing a better customer experience.

Training can be performed using one of several approaches:

  • Supervised learning: The model is trained with annotated (labeled) data showing the corresponding correct results

  • Unsupervised learning: The data does not contain results, but the model is expected to find relationships on its own

  • Semi-supervised learning: A small amount of labeled data is combined with a larger amount of unlabeled data

  • Reinforcement learning: This is similar to supervised learning, but instead of being given labeled answers, the model receives a reward for good results

There are several approaches that support machine learning. In Chapter 6, Machine Learning, we will illustrate three techniques:

  • Decision trees: A tree is constructed using features of the problem as internal nodes and the results as leaves

  • Support vector machines: These are used for classification by creating a hyperplane that partitions the dataset; the hyperplane is then used to make predictions

  • Bayesian networks: These are used to depict probabilistic relationships between events
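
The decision tree description in the list above can be sketched as a small data structure in which internal nodes test a feature and leaves hold a result. This is only an illustration: the feature names (`likesOutdoors`, `ownsTent`) and the outcome labels are hypothetical, and the tree is built by hand rather than learned from data as a real decision tree algorithm would do.

```java
import java.util.Map;

// Hand-built decision tree sketch. Feature names and labels are hypothetical.
class DecisionTreeSketch {

    // A node is either a leaf holding a result or an internal node testing one feature.
    interface Node {
        String classify(Map<String, Boolean> features);
    }

    // Leaf: returns its stored result regardless of the features.
    static final class Leaf implements Node {
        private final String result;

        Leaf(String result) {
            this.result = result;
        }

        @Override
        public String classify(Map<String, Boolean> features) {
            return result;
        }
    }

    // Internal node: follows the "yes" or "no" branch depending on one feature.
    static final class Decision implements Node {
        private final String feature;
        private final Node yes;
        private final Node no;

        Decision(String feature, Node yes, Node no) {
            this.feature = feature;
            this.yes = yes;
            this.no = no;
        }

        @Override
        public String classify(Map<String, Boolean> features) {
            // A missing feature is treated as false in this sketch.
            boolean value = features.getOrDefault(feature, false);
            return value ? yes.classify(features) : no.classify(features);
        }
    }

    // Builds a two-level tree by hand; a learning algorithm would derive it from data.
    static Node buildTree() {
        return new Decision("likesOutdoors",
                new Decision("ownsTent", new Leaf("camps"), new Leaf("might camp")),
                new Leaf("does not camp"));
    }

    public static void main(String[] args) {
        Node tree = buildTree();
        // prints "camps"
        System.out.println(tree.classify(Map.of("likesOutdoors", true, "ownsTent", true)));
    }
}
```

Classifying a record amounts to walking from the root to a leaf, answering one feature question at each internal node.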

A Support Vector Machine (SVM) is used primarily for classification problems. The approach creates a hyperplane to categorize data, which can be envisioned as a geometric plane that separates two regions. In a two-dimensional space, it will be a line. In a three-dimensional space, it will be a two-dimensional plane. In Chapter 6, Machine Learning, we will demonstrate the approach with a set of data relating to the propensity of individuals to camp. We will use the Weka class, SMO, to demonstrate this type of analysis.

The following figure depicts a hyperplane using a distribution of two types of data points. The lines represent possible hyperplanes that separate these points. The lines clearly separate the data points except for one outlier.

During training, the possible hyperplanes are considered and one is selected. Once the model has been trained, predictions can then be made using similar data.
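
To make the prediction step concrete, the following minimal sketch classifies a point by checking which side of a hyperplane it falls on, using the sign of w·x + b. It does not use Weka and does not train anything: the class name, the weights, and the bias are assumptions picked by hand for illustration, whereas a trained SVM would learn them from the data.

```java
// Sketch of prediction with a separating hyperplane w·x + b = 0.
// The weights and bias are hand-picked assumptions, not learned values.
class HyperplaneSketch {
    private final double[] weights;
    private final double bias;

    HyperplaneSketch(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    // Signed value of w·x + b: positive on one side of the hyperplane, negative on the other.
    double decisionValue(double[] point) {
        if (point.length != weights.length) {
            throw new IllegalArgumentException("point and weights must have the same dimension");
        }
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * point[i];
        }
        return sum;
    }

    // Returns +1 or -1 according to the side of the hyperplane the point falls on.
    int classify(double[] point) {
        return decisionValue(point) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // In two dimensions the hyperplane is a line; here it is x + y - 5 = 0.
        HyperplaneSketch line = new HyperplaneSketch(new double[] {1.0, 1.0}, -5.0);
        System.out.println(line.classify(new double[] {1.0, 1.0})); // prints -1
        System.out.println(line.classify(new double[] {4.0, 4.0})); // prints 1
    }
}
```

The same code works in any number of dimensions, since only the length of the weight vector changes.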
