You're reading from Learning R Programming

Product type: Book
Published in: Oct 2016
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785889776
Edition: 1st Edition
Author: Kun Ren

Kun Ren has used R for nearly 4 years in quantitative trading, along with C++ and C#, and he has worked very intensively (8-10 hours or more every day) on useful R packages that the community does not yet offer. He contributes to packages developed by other authors and reports issues to help improve them. He is also a frequent speaker at R conferences in China and has given multiple talks. Kun also has a strong social media presence. Additionally, he has contributed substantially to various projects, as is evident from his GitHub account: https://github.com/renkun-ken. Other links: https://cn.linkedin.com/in/kun-ren-76027530, http://renkun.me/, http://renkun.me/formattable/, http://renkun.me/pipeR/, http://renkun.me/rlist/

Chapter 6. Working with Strings

In the previous chapter, you learned about many built-in functions in several categories for working with basic objects. You learned how to access object classes, types, and dimensions; how to do logical, math, and basic statistical calculations; and how to perform simple analytic tasks such as root finding. These functions are the building blocks of our solutions to specific problems.

String-related functions are a very important category of functions, and this chapter introduces them. In R, text is stored in character vectors, and many functions and techniques are available to manipulate and analyze it. In this chapter, you will learn the basics and some useful techniques for working with strings, including the following topics:

  • Basic manipulation of character vectors

  • Converting between date/time objects and their string representations

  • Using regular expressions to extract information from texts

Getting started with strings


Character vectors in R are used to store text data. As you learned previously, unlike in many other programming languages, a character vector is not a vector of single characters, letters, or alphabet symbols such as a, b, and c. Rather, it is a vector of strings.
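One quick way to see this is to compare `length()` with `nchar()`. `length()` counts the elements (strings) of a character vector, and `nchar()` counts the characters inside each string. The two example words below are arbitrary.

```r
# A character vector holding two strings, not ten single letters
words <- c("Hello", "World")
length(words)   # number of strings in the vector
## [1] 2
nchar(words)    # number of characters in each string
## [1] 5 5
```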

R also provides a variety of built-in functions for working with character vectors. Many of them are vectorized, so they can process many string values in one step.
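As a small sketch of what vectorization means here, each of the base functions `toupper()`, `substr()`, and `nchar()` below is applied element-wise to every string in one call. The fruit names are placeholders.

```r
fruits <- c("apple", "banana", "cherry")
# Each call processes all three strings at once; no loop is needed
toupper(fruits)
## [1] "APPLE"  "BANANA" "CHERRY"
substr(fruits, 1, 3)   # first three characters of each string
## [1] "app" "ban" "che"
nchar(fruits)
## [1] 5 6 6
```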

In this section, you will learn more about printing, combining, and transforming texts stored in character vectors.

Printing texts

Perhaps the most basic thing we can do with texts is to view them. R provides several ways to view texts in the console.

The simplest way is to directly type the string in quotation marks:

"Hello"
## [1] "Hello"

Like a numeric vector of floating-point numbers, a character vector is a vector of character values, or strings. "Hello" is in the first position and is the only element of the character...
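Besides typing a string at the prompt, which auto-prints it, base R offers `print()`, `cat()`, and `noquote()` for showing text. This sketch contrasts them:

- `print()` behaves like auto-printing.
- `cat()` writes the raw text, with no quotes and no `[1]` index.
- `noquote()` keeps the index but drops the quotes.

```r
greeting <- "Hello"
print(greeting)            # same as auto-printing: quoted, with an index
## [1] "Hello"
cat(greeting, "World\n")   # arguments joined by a space, no quotes
## Hello World
print(noquote(greeting))   # index kept, quotes dropped
## [1] Hello
```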

Formatting date/time


In data analysis, it is common to encounter date and time data. Perhaps the simplest date-related functions are Sys.Date(), which returns the current date, and Sys.time(), which returns the current date and time.

When this book was rendered, the current date was printed as follows:

Sys.Date()
## [1] "2016-02-26"

And the time is:

Sys.time()
## [1] "2016-02-26 22:12:25 CST"

From the output, the date and time look like character vectors, but actually they are not:

current_date <- Sys.Date()
as.numeric(current_date)
## [1] 16857
current_time <- Sys.time()
as.numeric(current_time)
## [1] 1456495945

They are, in essence, numeric values relative to an origin, and they have special methods for date/time calculations. For a date, the numeric value is the number of days elapsed since 1970-01-01. For a time, the numeric value is the number of seconds elapsed since 1970-01-01 00:00:00 UTC.
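These origins can be checked directly with a short sketch. It reuses the date shown above as a fixed input instead of the current date. `tz = "UTC"` is set explicitly so the time result does not depend on the local time zone.

```r
d <- as.Date("2016-02-26")
as.numeric(d)                          # days since 1970-01-01
## [1] 16857
as.Date(16857, origin = "1970-01-01")  # and back again
## [1] "2016-02-26"
d + 3                                  # date arithmetic works in days (2016 is a leap year)
## [1] "2016-02-29"
t0 <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
as.numeric(t0)                         # the origin itself is 0 seconds
## [1] 0
as.numeric(t0 + 3600)                  # time arithmetic works in seconds
## [1] 3600
```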

Parsing text as date/time

We can create a date relative to a customized...
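As a minimal sketch of parsing and formatting in base R, the example below uses three functions:

- `as.Date()` with an explicit `format` argument.
- `as.POSIXct()` for a date-time string.
- `format()` to turn the parsed values back into text.

The input strings are made-up examples. The `%` codes are the conversion specifications documented in `?strptime`.

```r
# Parse a date written in month/day/year order by giving an explicit format
d <- as.Date("02/26/2016", format = "%m/%d/%Y")
d
## [1] "2016-02-26"
# Parse a date-time string; tz is fixed so the result is reproducible
tm <- as.POSIXct("2016-02-26 22:12:25", tz = "UTC")
# Format the parsed values back into text with custom layouts
format(d, "%Y/%m/%d")
## [1] "2016/02/26"
format(tm, "%H:%M")
## [1] "22:12"
```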

Using regular expressions


For research, you may need to download data from open-access websites or from databases that require authentication. These sources provide data in various formats, and most of the data they supply is likely to be well organized. For example, many economic and financial databases provide data in the CSV format, a widely supported text format for representing tabular data. A typical CSV file looks like this:

id,name,score 
1,A,20 
2,B,30 
3,C,25

In R, it is convenient to call read.csv() to import a CSV file as a data frame with the right header and data types because the format is a natural representation of a data frame.
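As a sketch, the sample above can be read without creating a file by passing it through the `text` argument, which `read.csv()` hands on to `read.table()`. `stringsAsFactors = FALSE` is given explicitly because its default changed from `TRUE` to `FALSE` in R 4.0.0.

```r
csv_text <- "id,name,score
1,A,20
2,B,30
3,C,25"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
# The header becomes column names; id and score are detected as integers
names(df)
## [1] "id"    "name"  "score"
mean(df$score)
## [1] 25
```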

However, not all data files are well organized, and dealing with poorly organized data is painstaking. Built-in functions such as read.table() and read.csv() work in many situations, but they may not help at all with data that lacks a consistent format.

For example, if you need to analyze raw data (messages.txt) organized in a CSV-like format as shown...
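The lines below are hypothetical stand-ins, not the contents of messages.txt. They sketch how base R's regular-expression functions `grepl()`, `sub()`, `regexpr()`, and `regmatches()` can pull fields out of lines whose layout is inconsistent.

```r
# Hypothetical raw lines: fields appear in different orders, one line is junk
raw <- c("id=1; name=A; score=20",
         "corrupted line",
         "name=B; id=2; score=30")
# 1. Keep only lines that contain a score field
ok <- grepl("score=[0-9]+", raw)
ok
## [1]  TRUE FALSE  TRUE
# 2. Capture the digits after "score=" with a group and a back-reference
scores <- as.numeric(sub(".*score=([0-9]+).*", "\\1", raw[ok]))
scores
## [1] 20 30
# 3. Extract the "id=<digits>" match from each line, wherever it appears
ids <- regmatches(raw[ok], regexpr("id=[0-9]+", raw[ok]))
ids
## [1] "id=1" "id=2"
```

Filtering first with `grepl()` keeps malformed lines from reaching `sub()`. `sub()` would otherwise return a non-matching line unchanged.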

Summary


In this chapter, you learned about a number of built-in functions for manipulating character vectors and converting between date/time objects and their string representations. You also learned about the basic idea of regular expressions, a very powerful tool to check and filter string data and extract information from raw texts.

With the vocabulary we built in this and previous chapters, we are now able to work with basic data structures. In the next chapter, you will learn about some tools and techniques for working with data. We will start with reading and writing simple data files, producing graphics of various types, applying basic statistical analysis and data-mining models to simple datasets, and using numerical methods to solve root-finding and optimization problems.
