Data Manipulation with R - Second Edition

More Information
  • Learn about R data types and their basic operations
  • Work efficiently with string, factor, and date variables using stringr
  • Understand group-wise data manipulation
  • Work with different layouts of R datasets and interchange between layouts for varied purposes
  • Manage bigger datasets using plyr and dplyr
  • Perform data manipulation with add-on packages such as plyr, reshape, stringr, lubridate, and sqldf
  • Manipulate datasets using SQL statements with the sqldf package
  • Clean and structure raw data for data mining using text manipulation

This book starts with the installation of R and how to use R and its libraries. We then discuss the modes and classes of R objects, and highlight the different R data types along with their basic operations.

The book's primary focus is group-wise data manipulation with the split-apply-combine strategy, which is explained through specific examples. The book also covers specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will learn not only about group-wise data manipulation, but also how to handle date, string, and factor variables efficiently, and how to work with different layouts of datasets using the reshape2 package.
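As a rough illustration of the split-apply-combine strategy mentioned above, here is a minimal sketch in base R. The `sales` data frame and its columns are hypothetical, invented only for this example; they are not taken from the book.

```r
# Hypothetical toy data, invented for illustration only
sales <- data.frame(
  region = c("north", "south", "north", "south", "east"),
  amount = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Split: break the data frame into one piece per region
pieces <- split(sales, sales$region)

# Apply: summarise each piece independently of the others
summaries <- lapply(pieces, function(piece) {
  data.frame(
    region = piece$region[1],
    total = sum(piece$amount),
    mean_amount = mean(piece$amount),
    stringsAsFactors = FALSE
  )
})

# Combine: stack the per-group summaries back into a single data frame
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)
```

Packages such as plyr (for example `ddply()`) and dplyr (for example `group_by()` followed by `summarise()`) express the same three steps more compactly; the base R version is shown here only because it makes each step explicit.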

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure it for data mining.

  • Learn about factor manipulation, string processing, and text manipulation techniques using the stringr and dplyr libraries
  • Enhance your analytical skills in an intuitive way through step-by-step working examples
Page Count: 130
Course Length: 3 hours 54 minutes
ISBN: 9781785288814
Date Of Publication: 30 Mar 2015


Jaynal Abedin

Jaynal Abedin is currently a PhD student conducting research at the Unit for Biomedical Data Analytics (BDA) of INSIGHT at the National University of Ireland Galway. His research focuses on sports science and sports medicine in a targeted project with ORRECO, an Irish startup company that provides evidence-based advice to individual athletes through biomarker and GPS data. Before joining INSIGHT as a PhD student, he led a team of statisticians at an international public health research organization (icddr,b). His primary role there was to develop internal statistical capabilities for researchers from various disciplines, and he was involved in designing and delivering statistical training to them. He has bachelor's and master's degrees in statistics, and he has written two books on R programming with Packt: Data Manipulation with R and R Graphs Cookbook (Second Edition). His current research interests are predictive modeling of probable athlete injury and scoring the extremeness of multivariate data to get an early signal of an anomaly. He also has an excellent reputation as a freelance R programmer and statistician on online platforms such as Upwork.

Kishor Kumar Das

Kishor Kumar Das is a statistician at the International Centre for Diarrhoeal Disease Research, Bangladesh, an internationally recognized organization that focuses mainly on public health research. He completed his BSc and MSc in applied statistics at the Institute of Statistical Research and Training, University of Dhaka, Bangladesh. He has used R extensively for data processing, statistical analysis, and graphs for more than 10 years. His research interests are survival analysis, machine learning, and statistical computing.