Java Data Analysis

Get the most out of the popular Java libraries and tools to perform efficient data analysis

John R. Hubbard

Book Details

ISBN 13: 9781787285651
Paperback: 412 pages

Book Description

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the aim of discovering useful information. Java is one of the most popular languages for data analysis tasks.

This book will help you learn the tools and techniques for conducting data analysis in Java. After a quick overview of what data science is and the steps involved in the process, you’ll learn statistical data analysis techniques and implement them using popular Java APIs and libraries. Through practical examples, you will also learn machine learning concepts such as classification and regression.

Along the way, you’ll familiarize yourself with tools such as RapidMiner and WEKA and see how these Java-based tools can be used effectively for analysis. You will also learn how to analyze text and other types of multimedia, and how to work with relational, NoSQL, and time-series data. The book also shows how to use different Java-based libraries to create insightful, easy-to-understand plots and graphs.

By the end of this book, you will have a solid understanding of the various data analysis techniques and how to implement them in Java.

Table of Contents

Chapter 1: Introduction to Data Analysis
Origins of data analysis
The scientific method
Actuarial science
Calculated by steam
A spectacular example
Herman Hollerith
ENIAC
VisiCalc
Data, information, and knowledge
Why Java?
Java Integrated Development Environments
Summary
Chapter 2: Data Preprocessing
Data types
Variables
Data points and datasets
Relational database tables
Hash tables
File formats
Generating test datasets
Summary
Chapter 3: Data Visualization
Tables and graphs
Time series
Java implementation
Moving average
Data ranking
Frequency distributions
The normal distribution
The exponential distribution
Java example
Summary
Chapter 4: Statistics
Descriptive statistics
Random sampling
Random variables
Probability distributions
Cumulative distributions
The binomial distribution
Multivariate distributions
Conditional probability
The independence of probabilistic events
Contingency tables
Bayes' theorem
Covariance and correlation
The standard normal distribution
The central limit theorem
Confidence intervals
Hypothesis testing
Summary
Chapter 5: Relational Databases
The relation data model
Relational databases
Foreign keys
Relational database design
Summary
Chapter 6: Regression Analysis
Linear regression
Polynomial regression
Summary
Chapter 7: Classification Analysis
Decision trees
Bayesian classifiers
Logistic regression
Summary
Chapter 8: Cluster Analysis
Measuring distances
The curse of dimensionality
Hierarchical clustering
Summary
Chapter 9: Recommender Systems
Utility matrices
Similarity measures
Cosine similarity
A simple recommender system
Amazon's item-to-item collaborative filtering recommender
Implementing user ratings
Large sparse matrices
Using random access files
The Netflix prize
Summary
Chapter 10: NoSQL Databases
The Map data structure
SQL versus NoSQL
The Mongo database system
The Library database
Java development with MongoDB
The MongoDB extension for geospatial databases
Indexing in MongoDB
Why NoSQL and why MongoDB?
Other NoSQL database systems
Summary
Chapter 11: Big Data Analysis with Java
Scaling, data striping, and sharding
Google's PageRank algorithm
Google's MapReduce framework
Some examples of MapReduce applications
The WordCount example
Scalability
Matrix multiplication with MapReduce
MapReduce in MongoDB
Apache Hadoop
Hadoop MapReduce
Summary

What You Will Learn

  • Develop Java programs that analyze data sets of nearly any size, including text
  • Implement important machine learning algorithms such as regression, classification, and clustering
  • Interface with and apply standard open source Java libraries and APIs to analyze and visualize data
  • Process data from both relational and non-relational databases and from time-series data
  • Employ Java tools to visualize data in various forms
  • Understand multimedia data analysis algorithms and implement them in Java
