You're reading from Jupyter for Data Science

Product type: Book
Published in: Oct 2017
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785880070
Edition: 1st Edition
Languages: Python
Tools: Jupyter
Concepts: Data Analysis
Author: Dan Toomey

Chapter 5. R with Jupyter

In this chapter, we will write R code within Jupyter. I expect R to be one of the primary languages used within Jupyter, and the full language is available to Jupyter users.

How to set up R for Jupyter

In the past, you had to install Jupyter, Python, and the other components separately to get a working system. With Continuum Analytics, installing Jupyter and adding the R engine to it is easy and works on both Windows and Mac.

Assuming you have installed conda already, we have one command to add support for R programming to Jupyter:

conda install -c r r-essentials

Note

At this point, when you start Jupyter, one of the kernels listed will now be R.

R data analysis of the 2016 US election demographics

To get a flavor of the resources available to R developers, we can look at the 2016 election data. In this case, I am drawing from Wikipedia (https://en.wikipedia.org/wiki/United_States_presidential_election,_2016), specifically the table named 2016 presidential vote by demographic subgroup. The code follows.

Define a helper function so we can print out values easily. The new printf function takes any arguments passed (...) and passes them along to sprintf:

printf <- function(...)print(sprintf(...))
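As a quick check of the helper, the sketch below calls printf with a format string and two values. The variable names and numbers are made up for illustration and do not come from the election tables.

```r
# Helper from the text: forward all arguments to sprintf and print the result.
printf <- function(...) print(sprintf(...))

# Hypothetical values, used only to show the call.
group <- "18-29"
share <- 19.5

# sprintf builds the string; %% inside the format produces a literal percent sign.
msg <- sprintf("Age group %s made up %.1f%% of voters", group, share)

# printf prints the same string.
printf("Age group %s made up %.1f%% of voters", group, share)
```

Because the helper uses print rather than cat, the output appears with R's usual [1] index and surrounding quotes.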

I have stored the separate demographic statistics in different TSV (tab-separated value) files, which can be read in using the following code. For each table, we use the read.csv function and specify the field separator as a tab instead of the default comma. We then use the head function to display the first few rows of the data frame that was loaded:

age <- read.csv("Documents/B05238_05_age.tsv", sep="\t")
head(age)
education...
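The same loading pattern can be tried without the book's data files. The following is a minimal sketch that reads tab-separated text from a string; the column names and values are placeholders, not the chapter's data.

```r
# Hypothetical tab-separated content standing in for a TSV file on disk.
tsv_text <- "group\ta\tb\nfirst\t10\t20\nsecond\t30\t40\nthird\t50\t60\n"

# read.csv defaults to sep = ","; override it for tab-separated input.
# The text argument reads from a string instead of a file path.
demo <- read.csv(text = tsv_text, sep = "\t")

head(demo)  # first rows (up to six)
str(demo)   # column names and types
```

If sep is left at its default, each line is read as a single column, which head makes easy to spot.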

Analyzing 2016 voter registration and voting

Similarly, we can look at voter registration versus actual voting (using census data from https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-580.html).

First, we load our dataset and display summary information to visually check for accurate loading:

df <- read.csv("Documents/B05238_05_registration.csv")
summary(df)

So, we have some registration and voting information by state. We can have R plot every column against every other column using the plot command:

plot(df)

We are specifically looking at the relationship between registering to vote and actually voting. In the resulting graphic, most of the data is highly correlated (as evidenced by the 45-degree angles of most of the relationships):
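The visual impression of correlation can also be checked numerically. The sketch below uses the base cor function on a small, made-up data frame; the column names and counts are assumptions for illustration, not values from the census file.

```r
# Hypothetical registration and voting counts (in thousands) for five states.
votes <- data.frame(
  registered = c(1200, 3400, 560, 7800, 2300),
  voted      = c(1000, 2900, 450, 6500, 2000)
)

# Pearson correlation matrix for every pair of numeric columns.
cor_matrix <- cor(votes)
print(round(cor_matrix, 3))

# A single pair can be pulled out by row and column name.
r <- cor_matrix["registered", "voted"]
```

A value of r close to 1 corresponds to the tight diagonal pattern seen in the scatterplots.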

We can produce somewhat similar results using Python, but the graphic display is not nearly as good.

Import all of the packages we are using for the example:

from numpy import corrcoef, sum, log, arange
from numpy.random import...

Analyzing changes in college admissions

We can look at trends in college admissions acceptance rates over the last few years. For this analysis, I am using the data from https://www.ivywise.com/ivywise-knowledgebase/admission-statistics.

First, we read in our dataset and show the summary and the first few rows (from head) to validate the load:

df <- read.csv("Documents/acceptance-rates.csv")
summary(df)
head(df)

We see the summary data for school acceptance rates as follows:

It's interesting to note that the acceptance rate varies so widely, from a low of 5 percent to a high of 41 percent in 2017.

Let us look at the data plots, again, to validate that the data points are correct:

plot(df)

From the correlation graphics shown, it does not look like we can use the data points from 2007. The graphs show a big divergence between 2007 and the other years, whereas the other three have good correlations.

So, we have 3 consecutive years of data from 25 major US universities. We can convert the data into a time series using a few steps...
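The conversion steps are cut off in this excerpt, so the following is only a sketch of one way to build a yearly time series with the base ts function. The rates and the start year are placeholders and are not taken from the dataset.

```r
# Hypothetical acceptance rates (percent) for one school over three years.
rates <- c(12.5, 11.8, 11.1)

# ts() attaches a time index; frequency = 1 means one observation per year.
# The start year is a placeholder, not a value from the chapter's data.
rate_ts <- ts(rates, start = 2015, frequency = 1)

print(rate_ts)
time(rate_ts)             # the year attached to each observation
changes <- diff(rate_ts)  # year-over-year change
print(changes)
```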

Predicting airplane arrival time

R has built-in functionality for splitting up a data frame between training and testing sets, building a model based on the training set, predicting results using the model and the testing set, and then visualizing how well the model is working.
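The workflow described above can be sketched with base R alone (sample, lm, and predict). The chapter's own code is not shown in this excerpt and may use a different package; the data below is simulated, and the column names dep_delay and arr_delay are assumptions for illustration.

```r
set.seed(42)  # make the random split reproducible

# Simulated data: arrival delay loosely driven by departure delay.
n <- 200
flights <- data.frame(dep_delay = runif(n, 0, 60))
flights$arr_delay <- 0.9 * flights$dep_delay + rnorm(n, sd = 5)

# Hold out 25% of the rows for testing.
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- flights[train_idx, ]
test  <- flights[-train_idx, ]

# Fit a linear model on the training rows only.
model <- lm(arr_delay ~ dep_delay, data = train)

# Predict on the held-out rows and measure the error.
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$arr_delay - pred)^2))
print(rmse)

# Visual check: points near the dashed line are predicted well.
plot(test$arr_delay, pred, xlab = "Actual", ylab = "Predicted")
abline(0, 1, lty = 2)
```

Keeping the test rows out of the lm call is the point of the split: the error is measured on data the model has not seen.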

For this example, I am using airline arrival and departure times versus scheduled arrival and departure times from http://stat-computing.org/dataexpo/2009/the-data.html for 2008. The dataset is distributed as a .bz2 file that unpacks into a CSV file. I like this dataset, as the initial row count is over 7 million and it all works nicely in Jupyter.

We first read in the airplane data and display a summary. There are additional columns in the dataset that we are not using:

df <- read.csv("Documents/2008-airplane.csv")
summary(df)
...
CRSElapsedTime      AirTime          ArrDelay          DepDelay
Min.   :-141.0   Min.   :   0     Min.   :-519.00   Min.   :-534.00
1st Qu.:  80.0   1st Qu.:  55     1st Qu.: -10.00...

Summary

In this chapter, we first set up R as one of the engines available for a notebook. Then we used some rudimentary R to analyze voter demographics for the presidential election. We looked at voter registration versus actual voting. Next, we analyzed the trend in college admissions. Finally, we looked at using a predictive model to determine whether flights would be delayed or not.

In the next chapter, we will look into wrangling data in different ways under Jupyter.


Author (1)

Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Science, and the Jupyter Cookbook, all with Packt.