Packt+ | Advance your knowledge in tech

You're reading from Learning Quantitative Finance with R

Product type Book

Published in Mar 2017

Publisher Packt

ISBN-13 9781786462411

Pages 284 pages

Edition 1st Edition

Languages

Concepts

Financial Technology

Authors (2):

Dr. Param Jeet

PRASHANT VATS

View More author details

Table of Contents (16) Chapters

Learning Quantitative Finance with R

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

1. Introduction to R

2. Statistical Modeling

3. Econometric and Wavelet Analysis

4. Time Series Modeling

5. Algorithmic Trading

6. Trading Using Machine Learning

7. Risk Management

8. Optimization

9. Derivative Pricing

Importing and exporting different data types

In R, we can read the files stored from outside the R environment. We can also write the data into files which can be stored and accessed by the operating system. In R, we can read and write different formats of files, such as CSV, Excel, TXT, and so on. In this section, we are going to discuss how to read and write different formats of files.

The required files should be present in the current directory to read them. Otherwise, the directory should be changed to the required destination.

The first step for reading/writing files is to know the working directory. You can find the path of the working directory by running the following code:

>print (getwd())

This will give the paths for the current working directory. If it is not your desired directory, then please set your own desired directory by using the following code:

>setwd("")

For instance, the following code makes the folder C:/Users the working directory:

  >setwd("C:/Users")

How to read and write a CSV format file

A CSV format file is a text file in which values are comma separated. Let us consider a CSV file with the following content from stock-market data:

`Date`	`Open`	`High`	`Low`	`Close`	`Volume`	`Adj Close`
14-10-2016	2139.68	2149.19	2132.98	2132.98	3.23E+09	2132.98
13-10-2016	2130.26	2138.19	2114.72	2132.55	3.58E+09	2132.55
12-10-2016	2137.67	2145.36	2132.77	2139.18	2.98E+09	2139.18
11-10-2016	2161.35	2161.56	2128.84	2136.73	3.44E+09	2136.73
10-10-2016	2160.39	2169.6	2160.39	2163.66	2.92E+09	2163.66

To read the preceding file in R, first save this file in the working directory, and then read it (the name of the file is Sample.csv) using the following code:

>data<-read.csv("Sample.csv") 
>print(data)

When the preceding code gets executed, it will give the following output:

       Date    Open    High     Low   Close     Volume     Adj.Close 
1  14-10-2016 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2  13-10-2016 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3  12-10-2016 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4  11-10-2016 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5  10-10-2016 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66

Read.csv by default produces the file in DataFrame format; this can be checked by running the following code:

>print(is.data.frame(data))

Now, whatever analysis you want to do, you can perform it by applying various functions on the DataFrame in R, and once you have done the analysis, you can write your desired output file using the following code:

>write.csv(data,"result.csv") 
>output <- read.csv("result.csv") 
>print(output)

When the preceding code gets executed, it writes the output file in the working directory folder in CSV format.

XLSX

Excel is the most common format of file for storing data, and it ends with extension .xls or .xlsx.

The xlsx package will be used to read or write .xlsx files in the R environment.

Installing the xlsx package has dependency on Java, so Java needs to be installed on the system. The xlsx package can be installed using the following command:

>install.packages("xlsx")

When the previous command gets executed, it will ask for the nearest CRAN mirror, which the user has to select to install the package. We can verify that the package has been installed or not by executing the following command:

>any(grepl("xlsx",installed.packages()))

If it has been installed successfully, it will show the following output:

[1] TRUE
Loading required package: rJava
Loading required package: methods
Loading required package: xlsxjars

We can load the xlsx library by running the following script:

>library("xlsx")

Now let us save the previous sample file in .xlsx format and read it in the R environment, which can be done by executing the following code:

>data <- read.xlsx("Sample.xlsx", sheetIndex = 1) 
>print(data)

This gives a DataFrame output with the following content:

       Date    Open    High     Low   Close     Volume    Adj.Close 
1 2016-10-14 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2 2016-10-13 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3 2016-10-12 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4 2016-10-11 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5 2016-10-10 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66

Similarly, you can write R files in .xlsx format by executing the following code:

>output<-write.xlsx(data,"result.xlsx") 
>output<- read.csv("result.csv") 
>print(output)

Web data or online sources of data

The Web is one main source of data these days, and we want to directly bring the data from web form to the R environment. R supports this:

URL <- "http://ichart.finance.yahoo.com/table.csv?s=^GSPC" 
snp <- as.data.frame(read.csv(URL)) 
head(snp)

When the preceding code is executed, it directly brings the data for the S&P500 index into R in DataFrame format. A portion of the data has been displayed by using the head() function here:

        Date    Open    High     Low   Close     Volume   Adj.Close 
1 2016-10-14 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2 2016-10-13 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3 2016-10-12 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4 2016-10-11 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5 2016-10-10 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66 
6 2016-10-07 2164.19 2165.86 2144.85 2153.74 3619890000   2153.74

Similarly, if we execute the following code, it brings the DJI index data into the R environment: its sample is displayed here:

>URL <- "http://ichart.finance.yahoo.com/table.csv?s=^DJI" 
>dji <- as.data.frame(read.csv(URL)) 
>head(dji)

This gives the following output:

        Date     Open     High      Low    Close   Volume  Adj.Close 
1 2016-10-14 18177.35 18261.11 18138.38 18138.38 87050000  18138.38 
2 2016-10-13 18088.32 18137.70 17959.95 18098.94 83160000  18098.94 
3 2016-10-12 18132.63 18193.96 18082.09 18144.20 72230000  18144.20 
4 2016-10-11 18308.43 18312.33 18061.96 18128.66 88610000  18128.66 
5 2016-10-10 18282.95 18399.96 18282.95 18329.04 72110000  18329.04 
6 2016-10-07 18295.35 18319.73 18149.35 18240.49 82680000  18240.49

Please note that we will be mostly using the snp and dji indexes for example illustrations in the rest of the book and these will be referred to as snp and dji.

Databases

A relational database stores data in normalized format, and to perform statistical analysis, we need to write complex and advance queries. But R can connect to various relational databases such as MySQL Oracle, and SQL Server, easily and convert the data tables into DataFrames. Once the data is in DataFrame format, doing statistical analysis is easy to perform using all the available functions and packages.

In this section, we will take the example of MySQL as reference.

R has a built-in package, RMySQL , which provides connectivity with the database; it can be installed using the following command:

>install.packages("RMySQL")

Once the package is installed, we can create a connection object to create a connection with the database. It takes username, password, database name, and localhost name as input. We can give our inputs and use the following command to connect with the required database:

>mysqlconnection = dbConnect(MySQL(), user = '...', password = '...', dbname = '..',host = '.....')

When the database is connected, we can list the table that is present in the database by executing the following command:

>dbListTables(mysqlconnection)

We can query the database using the function dbSendQuery(), and the result is returned to R by using function fetch(). Then the output is stored in DataFrame format:

>result = dbSendQuery(mysqlconnection, "select * from <table name>") 
>data.frame = fetch(result) 
>print(data.fame)

When the previous code gets executed, it returns the required output.

We can query with a filter clause, update rows in database tables, insert data into a database table, create tables, drop tables, and so on by sending queries through dbSendQuery().