Hands-On Data Science with R

More Information
  • Understand the R programming language and its ecosystem of packages for data science
  • Obtain and clean your data before processing
  • Master essential exploratory techniques for summarizing data
  • Examine various machine learning prediction models
  • Explore the H2O analytics platform in R for deep learning
  • Apply data mining techniques to available datasets
  • Work with interactive visualization packages in R
  • Integrate R with Spark and Hadoop for large-scale data analytics

R is one of the most widely used programming languages for statistical computing and data analysis, and when combined with data science techniques, it can help you tackle the complexities of unstructured, real-world datasets. This book covers the entire data science ecosystem for aspiring data scientists, taking you from zero to a level where you are confident enough to get hands-on with real-world data science problems.

The book starts with an introduction to data science and introduces readers to popular R libraries for executing routine data science tasks. It covers all the important processes in data science, such as gathering data, cleaning it, and uncovering patterns in it. You will explore machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will also learn to use the most powerful visualization packages available in R so that you can easily derive insights from your data.

Towards the end, you will also learn how to integrate R with Spark and Hadoop to perform large-scale data analytics without much complexity.

  • Explore the popular R packages for data science
  • Use R for efficient data mining, text analytics, and feature engineering
  • Become a thorough data science professional with the help of hands-on examples and use-cases in R
Page Count: 420
Course Length: 12 hours 36 minutes
ISBN: 9781789139402
Date Of Publication: 30 Nov 2018


Vitor Bianchi Lanzetta

Vitor Bianchi Lanzetta (@vitorlanzetta) has a master's degree in Applied Economics (University of São Paulo, USP) and works as a data scientist at a tech start-up named RedFox Digital Solutions. He has also authored a book called R Data Visualization Recipes. The things he enjoys most are statistics, economics, and sports of all kinds (electronic ones included). His blog, written in partnership with Ricardo Anjoleto Farias (@R_A_Farias), can be found at ArcadeData dot org; they affectionately call it R-Cade Data.

Nataraj Dasgupta

Nataraj Dasgupta is the vice president of advanced analytics at RxDataScience Inc. Nataraj has been in the IT industry for more than 19 years and has worked in the technical and analytics divisions of Philip Morris, IBM, UBS Investment Bank, and Purdue Pharma. At Purdue Pharma, Nataraj led the data science division, where he developed the company's award-winning big data and machine learning platform. Prior to Purdue, at UBS, he held the role of Associate Director, working with high-frequency and algorithmic trading technologies in the foreign exchange trading division of the bank.

Ricardo Anjoleto Farias

Ricardo Anjoleto Farias is an economist who graduated from the Universidade Estadual de Maringá in 2014. In addition to being a sports enthusiast (electronic or otherwise) and enjoying a good barbecue, he also likes math, statistics, and related fields of study. His first contact with R came when he embarked on his master's degree, and since then, he has worked to improve his skills with this powerful tool.