
You're reading from  Big Data Analytics with R

Product type: Book
Published in: Jul 2016
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781786466457
Edition: 1st Edition
Author (1)
Simon Walkowiak

Simon Walkowiak is a cognitive neuroscientist and a managing director of Mind Project Ltd, a Big Data and Predictive Analytics consultancy based in London, United Kingdom. As a former data curator at the UK Data Service (UKDS, University of Essex), Europe's largest socio-economic data repository, Simon has extensive experience in processing and managing large-scale datasets such as censuses, sensor and smart meter data, telecommunication data, and well-known governmental and social surveys such as the British Social Attitudes survey, Labour Force surveys, Understanding Society, and the National Travel survey, as well as many other socio-economic datasets collected and deposited by Eurostat, World Bank, Office for National Statistics, Department of Transport, NatCen, and International Energy Agency, to mention just a few. Simon has delivered numerous data science and R training courses at public institutions and international companies. He has also taught a course in Big Data Methods in R at major UK universities and at the prestigious Big Data and Analytics Summer School organized by the Institute of Analytics and Data Science (IADS).

Chapter 9.  The Future of R - Big, Fast, and Smart Data

Congratulations on reaching the final chapter. In this last part of the book we review the Big Data approaches presented earlier and discuss the future of Big Data analytics using R. Whenever possible, we provide links and references to online and printed resources that you can use to expand your skills in selected topics on Big Data with R. After reading this chapter you will be able to:

  • Summarize major Big Data technologies available on the market and explain how they can be integrated with the R language

  • Indicate the current position of R and its distributions in the landscape of statistical tools for Big Data analytics

  • Identify potential opportunities for future development of the R language and how it can become an integral part of Big Data workflows

The current state of Big Data analytics with R


This section serves as a critical evaluation and summary of the R language's ability to process very large, out-of-memory data and of its connectivity with a variety of existing Big Data platforms and tools.

Out-of-memory data on a single machine

We began the book with a brief review of the most common techniques used to analyze data with the R language (Chapter 2, Introduction to R Programming Language and Statistical Environment). We guided you from importing data into R, through data management and processing methods, cross-tabulations, aggregations, and hypothesis testing, to visualizations. We then explained the major limitations of the R language in terms of the memory it requires for data storage and its processing speed. We said that if you use only a single machine for data processing in the R language, the data must fit within the RAM installed on that computer. However, as a system runs other processes...
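To make the memory constraint concrete, the following minimal sketch estimates how much RAM a purely numeric table would occupy and compares that figure with what base R reports through object.size(). The helper name estimate_numeric_bytes and the table dimensions are hypothetical and chosen only for illustration; the estimate assumes every value is stored as an 8-byte double and ignores the small overhead of object headers and attributes.

```r
# Rough estimate of the RAM needed to hold a purely numeric table in R.
# Assumption: every cell is stored as a double, which takes 8 bytes.
estimate_numeric_bytes <- function(n_rows, n_cols) {
  as.numeric(n_rows) * as.numeric(n_cols) * 8
}

# Hypothetical dimensions used only for this illustration.
n_rows <- 1e5
n_cols <- 10
estimated <- estimate_numeric_bytes(n_rows, n_cols)

# Build an object of that shape and measure it with base R's object.size().
m <- matrix(0, nrow = n_rows, ncol = n_cols)
actual <- as.numeric(object.size(m))

cat("Estimated MB:", estimated / 1024^2, "\n")
cat("Measured MB: ", actual / 1024^2, "\n")
```

Scaling the same arithmetic up gives a quick feel for when a dataset stops fitting on one machine: under the same assumption, one billion rows by ten numeric columns would need roughly 80 GB before any copies made during processing are counted.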

The future of R


In the following brief sections, we are going to try to imagine how R may develop within the next several years to facilitate Big, Fast, and Smart data processing.

Big Data

We hope that by reading this book you have gained an appreciation for the R language and for what can be achieved by integrating it with currently available Big Data tools. The last few years have brought us many new Big Data technologies, and full connectivity of R with these new frameworks may take some time. The range of approaches that use R to process large datasets on a single machine is still quite limited because of the traditional limitations of the R language itself. The ultimate solution to this problem may only be achieved by redefining the language from scratch, but this is obviously an extreme and largely impractical idea. There is a lot of hope associated with Microsoft R Open, but as these are still quite early days for this new distribution, we need to wait...
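As an illustration of one generic way to work around the single-machine limit, the sketch below computes a column mean by streaming a CSV file in fixed-size chunks, so that only one chunk is held in memory at a time. It uses base R only and is not one of the specific packages or methods presented in the book; the function name chunked_mean, the chunk size, and the small temporary file standing in for a large dataset are all assumptions made for this example.

```r
# Write a small demonstration CSV to a temporary file
# (a stand-in for a file too large to load at once).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(value = 1:1000), path, row.names = FALSE)

# Compute the mean of one numeric column by reading the file in chunks.
# Assumption: a simple CSV with no commas or line breaks inside fields.
chunked_mean <- function(path, column, chunk_size = 100) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line and strip the quotes added by write.csv().
  header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])
  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    # Parse only the current chunk into a data frame.
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])
    n <- n + nrow(chunk)
  }
  total / n
}

result <- chunked_mean(path, "value")
print(result)
```

The same pattern extends to any statistic that can be updated incrementally, such as counts, sums, minima, and maxima. Statistics that need all the data at once, such as exact medians, call for different techniques.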

Where to go next


After reading this book and working through all its tutorials, you should have the skills to perform scalable and distributed analysis of very large datasets using the R language. How useful the material in this book proves to be depends heavily on the other tools in your current Big Data processing stack. Although we have presented a wide array of applications and frameworks that are common ingredients of Big Data workflows, for example Hadoop, Spark, SQL, and NoSQL databases, we appreciate that your personal needs and business requirements may vary.

In order to address your particular data-related problems and accomplish Big Data tasks, which may include a myriad of data analytics platforms, other programming languages, and various statistical methods or machine learning algorithms, you may need to develop a specific skill set and make sure to constantly grow your expertise in this dynamically evolving field. Throughout this book, we have included...

Summary


In the last chapter of this book, we have summarized the current position of the R language in the diverse landscape of Big Data tools and frameworks. We have also identified opportunities for the R language to evolve into a leading Big Data statistical environment by tackling some of its most frequently encountered limitations and barriers. Finally, we have explored the requirements that the R language will most likely need to meet within the next several years to provide even greater support for user-friendly, Big, Fast, and Smart data analytics.
