You're reading from Learning R Programming

Product type: Book
Published in: Oct 2016
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785889776
Edition: 1st Edition
Author (1)
Kun Ren

Kun Ren has used R for nearly 4 years in quantitative trading, along with C++ and C#, and he has worked very intensively (more than 8-10 hours every day) on useful R packages that the community does not offer yet. He contributes to packages developed by other authors and reports issues to make things work better. He is also a frequent speaker at R conferences in China and has given multiple talks. Kun also has a great social media presence. Additionally, he has substantially contributed to various projects, which is evident from his GitHub account: https://github.com/renkun-ken https://cn.linkedin.com/in/kun-ren-76027530 http://renkun.me/ http://renkun.me/formattable/ http://renkun.me/pipeR/ http://renkun.me/rlist/

Chapter 7. Working with Data

In the previous chapters, you learned the most commonly used object types and functions in R. We know how to create and modify vectors, lists, and data frames, how to define our own functions, and how to use proper expressions to translate the logic in our minds into R code in the editor. With these objects, functions, and expressions, we can start working with data.

In this chapter, we will set out on a journey of working with data and cover the following topics:

  • Reading and writing data in a file

  • Visualizing data with plot functions

  • Analyzing data with simple statistical models and data mining tools

Reading and writing data


The first step in any kind of data analysis in R is to load data, that is, to import a dataset into the environment. Before that, we have to figure out the type of data file and choose appropriate tools to read the data.

Reading and writing text-format data in a file

Among all the file types used to store data, perhaps the most widely used one is CSV. In a typical CSV file, the first line is the header of columns, and each subsequent line represents a data record with columns separated by commas. Here is an example of student records written in this format:

Name,Gender,Age,Major
Ken,Male,24,Finance
Ashley,Female,25,Statistics
Jennifer,Female,23,Computer Science
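As a minimal sketch of reading and writing such a file in base R, the following code stores the records above in a temporary file, reads them with read.csv(), and writes them back with write.csv(). The temporary file paths and the variable names (csv_lines, in_path, out_path, students, roundtrip) are assumptions made for illustration, not names from the book:

```r
# The sample records from the text, written to a temporary file so that
# this sketch does not depend on any file already existing on disk.
csv_lines <- c(
  "Name,Gender,Age,Major",
  "Ken,Male,24,Finance",
  "Ashley,Female,25,Statistics",
  "Jennifer,Female,23,Computer Science"
)
in_path <- tempfile(fileext = ".csv")   # hypothetical file location
writeLines(csv_lines, in_path)

# Read the file; the first line is treated as the header by default.
# stringsAsFactors = FALSE keeps text columns as character vectors.
students <- read.csv(in_path, stringsAsFactors = FALSE)
str(students)

# Write the data frame back out without the extra row-name column,
# then read it again to confirm the round trip preserves the data.
out_path <- tempfile(fileext = ".csv")  # hypothetical file location
write.csv(students, out_path, row.names = FALSE)
roundtrip <- read.csv(out_path, stringsAsFactors = FALSE)
```

Passing row.names = FALSE is a design choice here: without it, write.csv() adds a column of row names that would come back as an extra column when the file is read again.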

Importing data via RStudio IDE

RStudio provides an interactive way to import data. You can navigate to Tools | Import Dataset | From Local File and choose a local text-format file, such as a .csv or .txt file. Then, you can adjust the parameters and preview the resulting data frame.

Note that you should check Strings...

Visualizing data


In the previous section, we introduced a number of functions to import data, the first step in most data analysis. It is usually good practice to look at the data before pouring it into a model, so that is what we will do in the next step. The reason is simple: different models have different strengths, and no model is the best choice for all cases, since each rests on a different set of assumptions. Arbitrarily applying a model without checking the data against its assumptions usually results in misleading conclusions.

An initial way to choose a model and perform such checks is to just visually examine the data by looking at its boundaries and patterns. In other words, we need to visualize the data first. In this section, you will learn the basic graphic functions to produce simple charts to visualize a given dataset.
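A first look of this kind can be sketched with base graphics alone. The example below uses simulated data rather than the chapter's datasets, so the seed, sample size, and coefficients are all assumptions chosen only for illustration:

```r
set.seed(42)                             # arbitrary seed, for reproducibility
x <- rnorm(200)                          # simulated values (assumption)
y <- 1 + 0.5 * x + rnorm(200, sd = 0.3)  # simulated linear relation plus noise

# Boundaries: the smallest and largest observed values of each variable.
range(x)
range(y)

# Patterns: a scatter plot shows how y varies with x.
plot(x, y, main = "Scatter plot of y against x")

# A histogram shows the distribution of a single variable.
# hist() also returns the bins and counts it has drawn.
h <- hist(x, main = "Histogram of x")
h$counts
```

In a non-interactive session such as Rscript, the charts are written to a PDF file rather than shown on screen.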

We will use the datasets in the nycflights13 and babynames packages. If you don't have them installed, run the following code:

install.packages(c("nycflights13...

Analyzing data


In practical data analysis, most of the time is spent on data cleansing, that is, filtering and transforming the original data (or raw data) into a form that is easier to analyze. This filtering and transforming process is also called data manipulation. We will dedicate an entire chapter to this topic.

In this section, we simply assume that the data is ready for analysis. We won't go deep into the models, but will apply some simple ones to give you an impression of how to fit a model to data, how to interact with fitted models, and how to apply a fitted model to make predictions.

Fitting a linear model

The simplest model in R is the linear model: we use a linear function to describe the relationship between two random variables under a certain set of assumptions. In the following example, we will create a linear function that maps x to 3 + 2 * x. Then we generate a normally distributed random numeric vector x, and generate y as f(x) plus some independent noise:

f <- function...
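The procedure described above can be sketched as follows. Only the function 3 + 2 * x comes from the text; the seed, the sample size of 100, the noise scale of 0.5, and the names model, new_x, and pred are assumptions made for illustration:

```r
set.seed(123)                     # arbitrary seed (assumption)
f <- function(x) 3 + 2 * x        # the linear function from the text
x <- rnorm(100)                   # sample size is an assumption
y <- f(x) + 0.5 * rnorm(100)      # noise scale is an assumption

# Fit y = a + b * x by ordinary least squares.
model <- lm(y ~ x)

# Interact with the fitted model: the estimated intercept and slope
# should be close to the true values 3 and 2.
coef(model)
summary(model)

# Apply the fitted model to new values of x.
new_x <- data.frame(x = c(-1, 0, 1))
pred <- predict(model, newdata = new_x)
pred
```

The column name in new_x must match the predictor name used in the formula (here x), because predict() looks up predictors by name.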

Summary


In this chapter, you learned how to read and write data in various formats, how to visualize data with plot functions, and how to apply basic models to the data. You now know the basic tools and interfaces for working with data, and you can learn more data analysis tools from other sources.

For statistical and econometric models, I recommend that you read not only textbooks on statistics and econometrics but also R books that focus on statistical analysis. For machine learning models such as artificial neural networks, support vector machines, and random forests, I recommend that you read machine learning books and go to CRAN Task View: Machine Learning & Statistical Learning (https://cran.r-project.org/web/views/MachineLearning.html).

Since this book is focused on the R programming language rather than any specific model, we will continue our journey in the next chapter by going deeper into R. If you are not familiar with how R code works, you can hardly predict what will...
