Big Data Analytics with R and Hadoop


eBook: $29.99, now $25.49 (save 15%)
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $79.98, now $49.99 (save 37%; print cover price: $49.99)
Free shipping to the UK, US, Europe, and selected countries in Asia.
  • Write Hadoop MapReduce within R
  • Learn data analytics with R and the Hadoop platform
  • Handle HDFS data within R
  • Understand Hadoop streaming with R
  • Encode and enrich datasets into R

Book Details

Language : English
Paperback : 238 pages [ 235mm x 191mm ]
Release Date : November 2013
ISBN : 178216328X
ISBN 13 : 9781782163282
Author(s) : Vignesh Prajapati
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Preface
Chapter 1: Getting Ready to Use R and Hadoop
Chapter 2: Writing Hadoop MapReduce Programs
Chapter 3: Integrating R and Hadoop
Chapter 4: Using Hadoop Streaming with R
Chapter 5: Learning Data Analytics with R and Hadoop
Chapter 6: Understanding Big Data Analysis with Machine Learning
Chapter 7: Importing and Exporting Data from Various DBs
Appendix: References
Index
  • Chapter 1: Getting Ready to Use R and Hadoop
    • Installing R
    • Installing RStudio
    • Understanding the features of R language
      • Using R packages
      • Performing data operations
      • Increasing community support
      • Performing data modeling in R
    • Installing Hadoop
      • Understanding different Hadoop modes
      • Understanding Hadoop installation steps
        • Installing Hadoop on Linux, Ubuntu flavor (single node cluster)
        • Installing Hadoop on Linux, Ubuntu flavor (multinode cluster)
        • Installing Cloudera Hadoop on Ubuntu
    • Understanding Hadoop features
      • Understanding HDFS
        • Understanding the characteristics of HDFS
      • Understanding MapReduce
    • Learning the HDFS and MapReduce architecture
      • Understanding the HDFS architecture
        • Understanding HDFS components
      • Understanding the MapReduce architecture
        • Understanding MapReduce components
      • Understanding the HDFS and MapReduce architecture by plot
    • Understanding Hadoop subprojects
    • Summary
  • Chapter 2: Writing Hadoop MapReduce Programs
    • Understanding the basics of MapReduce
    • Introducing Hadoop MapReduce
      • Listing Hadoop MapReduce entities
      • Understanding the Hadoop MapReduce scenario
        • Loading data into HDFS
        • Executing the Map phase
        • Shuffling and sorting
        • Reducing phase execution
      • Understanding the limitations of MapReduce
      • Understanding Hadoop's ability to solve problems
      • Understanding the different Java concepts used in Hadoop programming
    • Understanding the Hadoop MapReduce fundamentals
      • Understanding MapReduce objects
      • Deciding the number of Maps in MapReduce
      • Deciding the number of Reducers in MapReduce
      • Understanding MapReduce dataflow
      • Taking a closer look at Hadoop MapReduce terminologies
    • Writing a Hadoop MapReduce example
      • Understanding the steps to run a MapReduce job
        • Learning to monitor and debug a Hadoop MapReduce job
        • Exploring HDFS data
      • Understanding several possible MapReduce definitions to solve business problems
    • Learning the different ways to write Hadoop MapReduce in R
      • Learning RHadoop
      • Learning RHIPE
      • Learning Hadoop streaming
    • Summary
  • Chapter 3: Integrating R and Hadoop
    • Introducing RHIPE
      • Installing RHIPE
        • Installing Hadoop
        • Installing R
        • Installing protocol buffers
        • Environment variables
        • The rJava package installation
        • Installing RHIPE
      • Understanding the architecture of RHIPE
      • Understanding RHIPE samples
        • RHIPE sample program (Map only)
        • Word count
      • Understanding the RHIPE function reference
        • Initialization
        • HDFS
        • MapReduce
    • Introducing RHadoop
      • Understanding the architecture of RHadoop
      • Installing RHadoop
      • Understanding RHadoop examples
        • Word count
      • Understanding the RHadoop function reference
        • The hdfs package
        • The rmr package
    • Summary
  • Chapter 4: Using Hadoop Streaming with R
    • Understanding the basics of Hadoop streaming
    • Understanding how to run Hadoop streaming with R
      • Understanding a MapReduce application
      • Understanding how to code a MapReduce application
      • Understanding how to run a MapReduce application
        • Executing a Hadoop streaming job from the command prompt
        • Executing the Hadoop streaming job from R or an RStudio console
      • Understanding how to explore the output of MapReduce application
        • Exploring an output from the command prompt
        • Exploring an output from R or an RStudio console
      • Understanding basic R functions used in Hadoop MapReduce scripts
      • Monitoring the Hadoop MapReduce job
    • Exploring the HadoopStreaming R package
      • Understanding the hsTableReader function
      • Understanding the hsKeyValReader function
      • Understanding the hsLineReader function
      • Running a Hadoop streaming job
        • Executing the Hadoop streaming job
    • Summary
  • Chapter 5: Learning Data Analytics with R and Hadoop
    • Understanding the data analytics project life cycle
      • Identifying the problem
      • Designing data requirement
      • Preprocessing data
      • Performing analytics over data
      • Visualizing data
    • Understanding data analytics problems
      • Exploring web pages categorization
        • Identifying the problem
        • Designing data requirement
        • Preprocessing data
        • Performing analytics over data
        • Visualizing data
      • Computing the frequency of stock market change
        • Identifying the problem
        • Designing data requirement
        • Preprocessing data
        • Performing analytics over data
        • Visualizing data
      • Predicting the sale price of blue book for bulldozers – case study
        • Identifying the problem
        • Designing data requirement
        • Preprocessing data
        • Performing analytics over data
        • Understanding Poisson-approximation resampling
    • Summary
  • Chapter 6: Understanding Big Data Analysis with Machine Learning
    • Introduction to machine learning
      • Types of machine-learning algorithms
    • Supervised machine-learning algorithms
      • Linear regression
        • Linear regression with R
        • Linear regression with R and Hadoop
      • Logistic regression
        • Logistic regression with R
        • Logistic regression with R and Hadoop
    • Unsupervised machine learning algorithm
      • Clustering
        • Clustering with R
        • Performing clustering with R and Hadoop
    • Recommendation algorithms
      • Steps to generate recommendations in R
      • Generating recommendations with R and Hadoop
    • Summary
  • Chapter 7: Importing and Exporting Data from Various DBs
    • Learning about data files as database
      • Understanding different types of files
      • Installing R packages
      • Importing the data into R
      • Exporting the data from R
    • Understanding MySQL
      • Installing MySQL
      • Installing RMySQL
      • Learning to list the tables and their structure
      • Importing the data into R
      • Understanding data manipulation
    • Understanding Excel
      • Installing Excel
      • Importing data into R
      • Exporting the data to Excel
    • Understanding MongoDB
      • Installing MongoDB
        • Mapping SQL to MongoDB
        • Mapping SQL to MongoQL
      • Installing rmongodb
      • Importing the data into R
      • Understanding data manipulation
    • Understanding SQLite
      • Understanding features of SQLite
      • Installing SQLite
      • Installing RSQLite
      • Importing the data into R
      • Understanding data manipulation
    • Understanding PostgreSQL
      • Understanding features of PostgreSQL
      • Installing PostgreSQL
      • Installing RPostgreSQL
      • Exporting the data from R
    • Understanding Hive
      • Understanding features of Hive
      • Installing Hive
        • Setting up Hive configurations
      • Installing RHive
      • Understanding RHive operations
    • Understanding HBase
      • Understanding HBase features
      • Installing HBase
      • Installing thrift
      • Installing RHBase
      • Importing the data into R
      • Understanding data manipulation
    • Summary
  • Appendix: References
    • R + Hadoop help materials
    • R groups
    • Hadoop groups
    • R + Hadoop groups
    • Popular R contributors
    • Popular Hadoop contributors

                  Vignesh Prajapati

                  Vignesh Prajapati, from India, is a Big Data enthusiast, a Pingax (www.pingax.com) consultant, and a software professional at Enjay. He is an experienced machine learning data engineer who works with machine learning and Big Data technologies such as R, Hadoop, Mahout, Pig, Hive, and related Hadoop components to analyze datasets and derive insights through data analytics cycles. He completed his B.E. at Gujarat Technological University in 2012 and started his career as a data engineer at Tatvic. His professional experience includes developing data analytics algorithms for Google Analytics data to add economic value to products. To put machine learning into practice, he implemented several analytical apps in collaboration with the Google Analytics and Google Prediction API services. He also contributes to the R community by developing the RGoogleAnalytics R library as an open source Google Code project, and he writes articles on data-driven technologies.

                  Vignesh is not limited to a single domain; he has also developed interactive apps using Google APIs such as the Google Analytics API, Realtime API, Google Prediction API, Google Chart API, and Translate API on the Java and PHP platforms. He is highly interested in the development of open source technologies.

                  Vignesh has also reviewed the Apache Mahout Cookbook for Packt Publishing. That book provides a fresh, scope-oriented approach to the Mahout world for beginners as well as advanced users. It is designed to make users aware of the different possible machine learning applications, strategies, and algorithms for producing intelligent Big Data applications.

                  What you will learn from this book

                  • Integrate R and Hadoop via RHIPE, RHadoop, and Hadoop streaming
                  • Develop and run a MapReduce application with R and Hadoop
                  • Handle HDFS data from within R using RHIPE and RHadoop
                  • Run Hadoop streaming and MapReduce with R
                  • Import and export data between R and various data sources

                  In Detail

                  Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing.

                  Big Data Analytics with R and Hadoop focuses on techniques for integrating R and Hadoop using tools such as RHIPE and RHadoop. With these tools, you can build a data analytics engine that runs analytics algorithms over large datasets in a scalable manner, combining the data analytics operations of R with the MapReduce and HDFS components of Hadoop.
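
To make the MapReduce model mentioned above concrete, here is a minimal in-memory sketch of a word count split into map, shuffle, and reduce steps. It is not taken from the book and does not use Hadoop, RHIPE, or RHadoop; it is written in Python purely for illustration (the book itself works in R), and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key and sort by key, mimicking shuffle and sort."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read its input from HDFS.
    sample = ["big data with R", "big data with Hadoop"]
    print(word_count(sample))
```

In a real Hadoop job the map and reduce functions would run in parallel across the cluster and the framework would handle the shuffle; this sketch only shows how the three steps fit together.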

                  You will start with the installation and configuration of R and Hadoop. Next, you will work through practical data analytics examples with R and Hadoop. Finally, you will learn how to import and export data between R and various data sources. Big Data Analytics with R and Hadoop will also give you a clear understanding of the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.

                  Approach

                  Big Data Analytics with R and Hadoop is a tutorial-style book that focuses on the big data tasks that can be achieved by integrating R and Hadoop.

                  Who this book is for

                  This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. It is also aimed at those who know Hadoop and want to build intelligent applications over big data with R packages. Basic knowledge of R is helpful.
