Big Data Analytics with R

Utilize R to uncover hidden patterns in your Big Data

Simon Walkowiak


Book Details

ISBN 13: 9781786466457
Paperback: 506 pages

Book Description

Big Data analytics is the process of examining large and complex data sets that often exceed available computational capabilities. R is a leading programming language for data science, offering powerful functions for tackling problems related to Big Data processing.

The book begins with a brief introduction to the Big Data world and its current industry standards, followed by an introduction to the R language that covers its development, structure, real-world applications, and shortcomings. It then progresses to a revision of major R functions for data management and transformation. Readers are introduced to cloud-based Big Data solutions (e.g. Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters) and given guidance on R connectivity with relational and non-relational databases such as MongoDB and HBase. The book further expands to include Big Data tools such as the Apache Hadoop ecosystem, HDFS, and the MapReduce framework, as well as other R-compatible tools such as Apache Spark, its machine learning library Spark MLlib, and H2O.

Table of Contents

Chapter 1: The Era of Big Data
Big Data – The monster re-defined
Big Data toolbox - dealing with the giant
R – The unsung Big Data hero
Summary
Chapter 2: Introduction to R Programming Language and Statistical Environment
Learning R
Revisiting R basics
Applied data science with R
Summary
Chapter 3: Unleashing the Power of R from Within
Traditional limitations of R
To the memory limits and beyond
Parallel R
Boosting R performance with the data.table package and other tools
Summary
Chapter 4: Hadoop and MapReduce Framework for R
Hadoop architecture
A single-node Hadoop in Cloud
HDInsight - a multi-node Hadoop cluster on Azure
Summary
Chapter 5: R with Relational Database Management Systems (RDBMSs)
Relational Database Management Systems (RDBMSs)
SQLite with R
MariaDB with R on an Amazon EC2 instance
PostgreSQL with R on Amazon RDS
Summary
Chapter 6: R with Non-Relational (NoSQL) Databases
Introduction to NoSQL databases
MongoDB with R
HBase with R
Summary
Chapter 7: Faster than Hadoop - Spark with R
Spark for Big Data analytics
Spark with R on a multi-node HDInsight cluster
Summary
Chapter 8: Machine Learning Methods for Big Data in R
What is machine learning?
GLM example with Spark and R on the HDInsight cluster
Naive Bayes with H2O on Hadoop with R
Neural Networks with H2O on Hadoop with R
Summary
Chapter 9: The Future of R - Big, Fast, and Smart Data
The current state of Big Data analytics with R
The future of R
Where to go next
Summary

What You Will Learn

  • Learn about the current state of Big Data processing using the R programming language and its powerful statistical capabilities
  • Deploy Big Data analytics platforms with selected Big Data tools supported by R in a cost-effective and time-saving manner
  • Apply the R language to real-world Big Data problems on a multi-node Hadoop cluster, e.g. electricity consumption across various socio-demographic indicators and bike share scheme usage
  • Explore the compatibility of R with Hadoop, Spark, SQL and NoSQL databases, and the H2O platform
