Java Data Science Cookbook

Recipes to help you overcome your data science hurdles using Java

Rushdi Shams


Book Details

ISBN 13: 9781787122536
Paperback: 372 pages

Book Description

If you are looking to build data science models that are good for production, Java has come to the rescue. With the aid of strong libraries such as MLlib, Weka, DL4j, and more, you can efficiently perform all the data science tasks you need to.

This unique book provides modern recipes to solve your common and not-so-common data science-related problems. We start with recipes to help you obtain, clean, index, and search data. Then you will learn a variety of techniques to analyze, learn from, and retrieve information from data. You will also understand how to handle big data, learn deeply from data, and visualize data.

Finally, you will work through unique recipes that solve your problems while taking data science to production, writing distributed data science applications, and much more—things that will come in handy at work.

Table of Contents

Chapter 1: Obtaining and Cleaning Data
Introduction
Retrieving all filenames from hierarchical directories using Java
Retrieving all filenames from hierarchical directories using Apache Commons IO
Reading contents from text files all at once using Java 8
Reading contents from text files all at once using Apache Commons IO
Extracting PDF text using Apache Tika
Cleaning ASCII text files using Regular Expressions
Parsing Comma Separated Value (CSV) Files using Univocity
Parsing Tab Separated Value (TSV) file using Univocity
Parsing XML files using JDOM
Writing JSON files using JSON.simple
Reading JSON files using JSON.simple
Extracting web data from a URL using JSoup
Extracting web data from a website using Selenium Webdriver
Reading table data from a MySQL database
Chapter 2: Indexing and Searching Data
Introduction
Indexing data with Apache Lucene
Searching indexed data with Apache Lucene
Chapter 3: Analyzing Data Statistically
Introduction
Generating descriptive statistics
Generating summary statistics
Generating summary statistics from multiple distributions
Computing frequency distribution
Counting word frequency in a string
Counting word frequency in a string using Java 8
Computing simple regression
Computing ordinary least squares regression
Computing generalized least squares regression
Calculating covariance of two sets of data points
Calculating Pearson's correlation of two sets of data points
Conducting a paired t-test
Conducting a Chi-square test
Conducting the one-way ANOVA test
Conducting a Kolmogorov-Smirnov test
Chapter 4: Learning from Data - Part 1
Introduction
Creating and saving an Attribute-Relation File Format (ARFF) file
Cross-validating a machine learning model
Classifying unseen test data
Classifying unseen test data with a filtered classifier
Generating linear regression models
Generating logistic regression models
Clustering data points using the KMeans algorithm
Clustering data from classes
Learning association rules from data
Selecting features/attributes using the low-level method, the filtering method, and the meta-classifier method
Chapter 5: Learning from Data - Part 2
Introduction
Applying machine learning on data using Java Machine Learning (Java-ML) library
Classifying data points using the Stanford classifier
Classifying data points using Massive Online Analysis (MOA)
Classifying multilabeled data points using Mulan
Chapter 6: Retrieving Information from Text Data
Introduction
Detecting tokens (words) using Java
Detecting sentences using Java
Detecting tokens (words) and sentences using OpenNLP
Retrieving lemma, part-of-speech, and recognizing named entities from tokens using Stanford CoreNLP
Measuring text similarity with Cosine Similarity measure using Java 8
Extracting topics from text documents using Mallet
Classifying text documents using Mallet
Classifying text documents using Weka
Chapter 7: Handling Big Data
Introduction
Training an online logistic regression model using Apache Mahout
Applying an online logistic regression model using Apache Mahout
Solving simple text mining problems with Apache Spark
Clustering using KMeans algorithm with MLlib
Creating a linear regression model with MLlib
Classifying data points with Random Forest model using MLlib
Chapter 8: Learn Deeply from Data
Introduction
Creating a Word2vec neural net using Deep Learning for Java (DL4j)
Creating a Deep Belief neural net using Deep Learning for Java (DL4j)
Creating a deep autoencoder using Deep Learning for Java (DL4j)
Chapter 9: Visualizing Data
Introduction
Plotting a 2D sine graph
Plotting histograms
Plotting a bar chart
Plotting box plots or whisker diagrams
Plotting scatter plots
Plotting donut plots
Plotting area graphs

What You Will Learn

  • Find out how to clean and prepare datasets by removing noise and outliers so you can extract real insights
  • Develop the skills to use modern machine learning techniques to retrieve information and transform data into knowledge
  • Familiarize yourself with cutting-edge techniques to store and search large volumes of data and retrieve information from large amounts of data in text format
  • Develop basic skills to apply big data and deep learning technologies to large volumes of data
  • Evolve your data visualization skills and gain valuable insights from your data
  • Get to know a step-by-step formula to develop an industry-standard, large-scale, real-life data product
  • Gain the skills to visualize data and interact with users through data insights

