Basic plots and the ggplot2 package
This section reviews how to make basic plots with R's built-in graphics functions and with the ggplot2 package.
Basic plots in R include histograms and scatterplots. To plot a histogram, we use the hist() function:
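A minimal sketch of a histogram call, assuming simulated data. The rnorm() sample and the breaks, col, and main values are illustrative choices, not the original example. hist() draws the plot and also invisibly returns the bin breaks and counts it used.

```r
# Simulated data (illustrative only): 1,000 draws from a standard normal
set.seed(42)
x <- rnorm(1000)

# Draw the histogram; breaks = 30 is only a suggestion to R's binning rule
h <- hist(x, breaks = 30, col = "lightgray",
          main = "Histogram of simulated data", xlab = "x")

# The returned "histogram" object records the bins that were drawn
head(h$counts)
```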
The output is shown in the following plot:
You can plot mathematical formulas with the plot() function as follows:
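As a sketch: when plot() receives a function object, it dispatches to the plot.function() method, which draws the function between from and to. The cubic f below is a hypothetical example, not the original formula.

```r
# A hypothetical univariate function to plot
f <- function(x) x^3 - 3 * x

# Passing a function to plot() draws it over [from, to];
# the x and y values that were drawn are returned invisibly
res <- plot(f, from = -2, to = 2, main = "f(x) = x^3 - 3x")
```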
The output is shown in the following plot:
You can graph a univariate mathematical function over an interval with the curve() function, using the from and to arguments to set the left and right endpoints, respectively. The expr argument takes either the name of a function or an expression written in terms of x; in both cases it must evaluate to a numeric vector the same length as x. For example:
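A sketch of two curve() calls for the figure described next, with cos(x) on the left and x^2 on the right. The endpoints and the side-by-side par(mfrow) layout are assumptions, since the original values were not preserved.

```r
# Lay out two plots side by side, saving the old settings
op <- par(mfrow = c(1, 2))

# expr is an expression in x; from/to set the interval endpoints
c1 <- curve(cos(x), from = -2 * pi, to = 2 * pi, main = "cos(x)")
c2 <- curve(x^2, from = -3, to = 3, main = "x^2")

# Restore the previous graphics settings
par(op)
```

curve() evaluates expr at n = 101 evenly spaced points by default and invisibly returns them as a list with x and y components.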
In the following figure, the left plot shows the curve for cos(x) and the right plot shows the curve for x^2. As you can see, the from and to arguments let us specify the range of x values shown in each plot.
You can also draw scatterplots with the plot() function. For example, we can use the iris dataset, which comes with R, to plot Sepal.Length versus Sepal.Width as follows:
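A minimal sketch using the built-in iris data frame, with Sepal.Length on the y-axis and Sepal.Width on the x-axis. The formula interface (y ~ x with a data argument) is one of several equivalent ways to call plot(); the point style and title are illustrative.

```r
# iris ships with R in the datasets package
str(iris)

# Formula interface: Sepal.Length (y) versus Sepal.Width (x)
plot(Sepal.Length ~ Sepal.Width, data = iris,
     pch = 19, main = "Sepal.Length vs Sepal.Width")
```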
The output is shown in the following plot:
R also has built-in functions for other types of graphics, such as barplot(), dotchart(), pie(), and boxplot(). The following are some examples using the VADeaths dataset:
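As a sketch of the bar plot: VADeaths is a built-in 5 x 4 matrix of death rates per 1,000 in Virginia in 1940 (rows are age groups, columns are rural/urban male/female populations), so it can be passed straight to barplot(). The beside = TRUE grouping, legend, and labels are choices made here, not necessarily the original's.

```r
# VADeaths is already a matrix, which is what barplot() expects
print(VADeaths)

# beside = TRUE draws grouped bars instead of stacked ones;
# the bar midpoints are returned invisibly as a matrix
mids <- barplot(VADeaths, beside = TRUE,
                legend.text = rownames(VADeaths),
                ylab = "Deaths per 1000",
                main = "Death rates in Virginia, 1940")
```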
The output is shown in the following plot:
However, when working with data frames, it is often much simpler to make a bar plot with the ggplot2 package, since barplot() needs its data as a vector or matrix and ggplot2 does not. Be aware, though, that ggplot2 often requires your data to be stored in a data frame in long format rather than wide format.
The following is an example of data stored in wide format. In this example, we look at the expression levels of the MYC and BRCA2 genes in two different cell lines after the cells were treated with a vehicle control, drug1, or drug2 for 48 hours:
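The original table values were not preserved, so the sketch below builds a small wide-format data frame with made-up placeholder numbers purely to show the shape: one row per cell line and condition, and one column per gene. The object name geneExpdata and all column names are assumptions, chosen to fit the geneExpdata.long name used later in this section.

```r
# Hypothetical placeholder values, NOT real measurements
geneExpdata <- data.frame(
  Condition = rep(c("Control", "Drug1", "Drug2"), times = 2),
  CellLine  = rep(c("CellLine1", "CellLine2"), each = 3),
  MYC       = c(1.00, 0.45, 0.70, 1.05, 0.55, 0.60),
  BRCA2     = c(1.00, 1.20, 0.85, 0.95, 1.35, 0.75)
)

# Wide format: each gene's expression sits in its own column
geneExpdata
```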
The following is the data rewritten in long format:
Instead of rewriting the data frame by hand, you can automate this step with the melt() function from the reshape2 package:
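A sketch of the melt() call on a small wide-format data frame with made-up placeholder values (the names geneExpdata, Condition, and CellLine are assumptions). id.vars names the columns that identify each observation; every other column is stacked into a variable/value pair, renamed here to Gene and Expression through melt()'s variable.name and value.name arguments.

```r
library(reshape2)

# Hypothetical placeholder values, NOT real measurements
geneExpdata <- data.frame(
  Condition = rep(c("Control", "Drug1", "Drug2"), times = 2),
  CellLine  = rep(c("CellLine1", "CellLine2"), each = 3),
  MYC       = c(1.00, 0.45, 0.70, 1.05, 0.55, 0.60),
  BRCA2     = c(1.00, 1.20, 0.85, 0.95, 1.35, 0.75)
)

# Stack the MYC and BRCA2 columns into long format
geneExpdata.long <- melt(geneExpdata,
                         id.vars = c("Condition", "CellLine"),
                         variable.name = "Gene",
                         value.name = "Expression")

# 6 rows x 2 genes = 12 rows, one measurement per row
geneExpdata.long
```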
Now, we can plot the data using ggplot2 as follows:
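A sketch of the ggplot2 call, starting from a made-up wide-format data frame that is melted first so the block runs on its own. Because each row already holds one value, geom_bar(stat = "identity") uses Expression as the bar height; position = "dodge" puts the two genes side by side, and facet_wrap() gives one panel per cell line. The fill mapping and faceting are assumptions about how the original figure was laid out.

```r
library(ggplot2)
library(reshape2)

# Hypothetical placeholder values, NOT real measurements
geneExpdata <- data.frame(
  Condition = rep(c("Control", "Drug1", "Drug2"), times = 2),
  CellLine  = rep(c("CellLine1", "CellLine2"), each = 3),
  MYC       = c(1.00, 0.45, 0.70, 1.05, 0.55, 0.60),
  BRCA2     = c(1.00, 1.20, 0.85, 0.95, 1.35, 0.75)
)

# ggplot2 works on long format: one row per measurement
geneExpdata.long <- melt(geneExpdata,
                         id.vars = c("Condition", "CellLine"),
                         variable.name = "Gene",
                         value.name = "Expression")

# Bar height = Expression; genes dodged side by side; one panel per cell line
p <- ggplot(geneExpdata.long, aes(x = Condition, y = Expression, fill = Gene)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ CellLine)
print(p)
```

geom_col() is a shorthand for geom_bar(stat = "identity") and would draw the same bars.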
The output is shown in the following plot:
Another useful trick is adding error bars to bar plots. Here, we have a summary data frame with the standard deviation (sd), standard error (se), and confidence interval (ci) for the geneExpdata.long dataset:
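The summary table itself was not preserved, so here is one way to build it in base R with aggregate(), on made-up placeholder data. Each Gene and Condition group holds the two cell lines (N = 2), so se = sd / sqrt(N) and the 95% ci half-width is se * qt(0.975, N - 1). The object name geneExpdata.summary is hypothetical, and the error bars show mean ± se; sd or ci could be swapped in the same way.

```r
library(ggplot2)
library(reshape2)

# Hypothetical placeholder values, NOT real measurements
geneExpdata <- data.frame(
  Condition = rep(c("Control", "Drug1", "Drug2"), times = 2),
  CellLine  = rep(c("CellLine1", "CellLine2"), each = 3),
  MYC       = c(1.00, 0.45, 0.70, 1.05, 0.55, 0.60),
  BRCA2     = c(1.00, 1.20, 0.85, 0.95, 1.35, 0.75)
)
geneExpdata.long <- melt(geneExpdata, id.vars = c("Condition", "CellLine"),
                         variable.name = "Gene", value.name = "Expression")

# Per-group mean, count, and sd (aggregate() sorts groups identically each call)
geneExpdata.summary <- aggregate(Expression ~ Gene + Condition,
                                 data = geneExpdata.long, FUN = mean)
geneExpdata.summary$N  <- aggregate(Expression ~ Gene + Condition,
                                    data = geneExpdata.long, FUN = length)$Expression
geneExpdata.summary$sd <- aggregate(Expression ~ Gene + Condition,
                                    data = geneExpdata.long, FUN = sd)$Expression

# Standard error and 95% confidence-interval half-width
geneExpdata.summary$se <- geneExpdata.summary$sd / sqrt(geneExpdata.summary$N)
geneExpdata.summary$ci <- geneExpdata.summary$se * qt(0.975, df = geneExpdata.summary$N - 1)
geneExpdata.summary

# Bars show group means; using the same dodge width for the bars and
# the error bars keeps each error bar centred on its bar
dodge <- position_dodge(width = 0.9)
p <- ggplot(geneExpdata.summary, aes(x = Condition, y = Expression, fill = Gene)) +
  geom_bar(stat = "identity", position = dodge) +
  geom_errorbar(aes(ymin = Expression - se, ymax = Expression + se),
                position = dodge, width = 0.2)
print(p)
```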
The result is shown in the following plot:
Going back to the VADeaths example, we could also plot a Cleveland dot plot (dot chart) as follows:
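A minimal sketch: because VADeaths is already a numeric matrix, it can go straight into dotchart(), which uses the matrix's row and column names to label and group the dots. The title and axis label are illustrative.

```r
# VADeaths is a numeric matrix, an input type dotchart() accepts
print(VADeaths)

dotchart(VADeaths, xlab = "Deaths per 1000",
         main = "Death rates in Virginia, 1940")
```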
Note
The built-in dotchart() function requires the data to be stored as a vector or matrix.
The result is shown in the following plot:
The following are some other graphics you can generate with built-in R functions:
You can generate pie charts with the pie() function as follows:
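As a sketch, pie() takes a vector of non-negative values and draws one slice per element, labelled by the element names. Counting the species in the built-in iris data with table() is an illustrative choice of input, not the original example.

```r
# Counts of each species in iris
species_counts <- table(iris$Species)

# One slice per species; the table names become the slice labels
pie(species_counts, main = "iris species")
```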
You can generate box-and-whisker plots with the boxplot() function as follows:
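A minimal sketch passing a data frame directly: selecting the four numeric iris columns draws one box per column. The column selection and labels are illustrative. boxplot() invisibly returns the statistics it drew.

```r
# A data frame input gives one box per column
b <- boxplot(iris[, 1:4], ylab = "cm", main = "iris measurements")

# b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker (one column per box)
b$stats
```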
Note
Unlike barplot(), dotchart(), and pie(), the boxplot() function can take a data frame as its input, drawing one box per column; it also accepts a formula together with a data argument.
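A sketch of the MYC-by-condition plot described next, using the formula interface with a data argument on made-up placeholder values (not the original measurements; geneExpdata and its column names are assumptions). With only two cell lines per condition, each box summarizes just two points.

```r
# Hypothetical placeholder values, NOT real measurements
geneExpdata <- data.frame(
  Condition = rep(c("Control", "Drug1", "Drug2"), times = 2),
  CellLine  = rep(c("CellLine1", "CellLine2"), each = 3),
  MYC       = c(1.00, 0.45, 0.70, 1.05, 0.55, 0.60),
  BRCA2     = c(1.00, 1.20, 0.85, 0.95, 1.35, 0.75)
)

# One box per treatment condition, pooling both cell lines
b <- boxplot(MYC ~ Condition, data = geneExpdata,
             ylab = "MYC expression", main = "MYC expression by condition")
```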
Using our cell line drug treatment experiment, we can graph MYC expression for all cell lines by condition. The result is shown in the following plot:
The following is another example using the iris dataset to plot Petal.Width by Species:
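A minimal sketch with the formula interface, where the Species factor splits Petal.Width into one box per species; the axis label and title are illustrative.

```r
# Formula interface: Petal.Width split by the Species factor
b <- boxplot(Petal.Width ~ Species, data = iris,
             ylab = "Petal.Width (cm)", main = "Petal.Width by Species")
```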
The result is shown in the following plot: