Search icon
Subscription
0
Cart icon
Close icon
You have no products in your basket yet
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Java for Data Science

You're reading from  Java for Data Science

Product type Book
Published in Jan 2017
Publisher Packt
ISBN-13 9781785280115
Pages 386 pages
Edition 1st Edition
Languages
Authors (2):
Richard M. Reese Richard M. Reese
Profile icon Richard M. Reese
Jennifer L. Reese Jennifer L. Reese
Profile icon Jennifer L. Reese
View More author details

Table of Contents (19) Chapters

Java for Data Science
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
1. Getting Started with Data Science 2. Data Acquisition 3. Data Cleaning 4. Data Visualization 5. Statistical Data Analysis Techniques 6. Machine Learning 7. Neural Networks 8. Deep Learning 9. Text Analysis 10. Visual and Audio Analysis 11. Mathematical and Parallel Techniques for Data Analysis 12. Bringing It All Together

The use of statistical methods in data science


Statistical analysis is the key to many data science tasks. It is used for many types of analysis ranging from the computation of simple mean and medium to complex multiple regression analysis. Chapter 5, Statistical Data Analysis Techniques, introduces this type of analysis and the Java support available.

Statistical analysis is not always an easy task. In addition, advanced statistical techniques often require a particular mindset to fully comprehend, which can be difficult to learn. Fortunately, many techniques are not that difficult to use and various libraries mitigate some of these techniques' inherent complexity.

Regression analysis, in particular, is an important technique for analyzing data. The technique attempts to draw a line that matches a set of data. An equation representing the line is calculated and can be used to predict future behavior. There are several types of regression analysis, including simple and multiple regression. They vary by the number of variables being considered.

The following graph shows the straight line that closely matches a set of data points representing the population of Belgium over several decades:

Simple statistical techniques, such as mean and standard deviation, can be computed using basic Java. They can also be handled by libraries such as Apache Commons. For example, to calculate the median, we can use the Apache Commons DescriptiveStatistics class. This is illustrated next where the median of an array of doubles is calculated. The numbers are added to an instance of this class, as shown here:

double[] testData = {12.5, 18.3, 11.2, 19.0, 22.1, 14.3, 16.2, 
    12.5, 17.8, 16.5, 12.5}; 
DescriptiveStatistics statTest =  
    new SynchronizedDescriptiveStatistics(); 
for(double num : testData){ 
   statTest.addValue(num); 
} 

The getPercentile method returns the value stored at the percentile specified in its argument. To find the median, we use the value of 50.

out.println("The median is " + statTest.getPercentile(50)); 
 

Our output is as follows:

The median is 16.2

In Chapter 5, Statistical Data Analysis Techniques, we will demonstrate how to perform regression analysis using the Apache Commons SimpleRegression class.

You have been reading a chapter from
Java for Data Science
Published in: Jan 2017 Publisher: Packt ISBN-13: 9781785280115
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}