In any given discipline, from business to academia, there is a need for data analysis. Among the most popular tools to analyze data sets is R, a programming language that allows you to easily perform statistical analyses and create data visualizations.
In this post, I'm going to share with you the best tools to manipulate datasets in R such that they're easy to analyze. In addition, you'll be introduced to a wide variety of visualization tools that are sure to bring your data to life. If you haven't used R before, no problem! I'll get you set up with the proper software.
Downloading R is all well and good, but significant work with the language is best done in an integrated development environment, or IDE. One of the most popular choices is RStudio, with support for Mac, Windows, Debian, Ubuntu, and Red Hat. Downloading is painless: head to the RStudio download page and choose the proper download for your system and architecture. After installing, fire up RStudio. You're now ready to begin programming with R!
Learning a new programming language is tough, especially if you haven't learned one before. Luckily, there are dozens of great resources available to teach you the ins and outs of R. While MOOCs are always an option, you might have better luck using sites like DataCamp or Code School. If you'd rather go old school, I recommend the PDF R for Beginners.
The coolest option that I've seen of late is a package called swirl. This package allows you to learn about R right within RStudio. If I were relearning the language, this would be my first stop.
R by itself can do quite a bit, but the real fun comes in with packages. Put simply, packages extend the functionality of R to do just about anything users can dream of. Don't believe me? Check out all 8,153 of the packages currently available (as of 26/03/2016).
At their core, R packages are just libraries of specially-created R functions. Rather than writing an R function and keeping it to themselves, programmers in the R community package their functions up and share them on the Comprehensive R Archive Network (CRAN).
Not all packages are going to come in handy to beginners. That's why I've listed some that are integral to any work in R, whether you're a newcomer or a PhD-holding statistician.
- swirl: You can learn R right within RStudio using this package. Lessons take 15-20 minutes, so you're guaranteed to walk away having learned something, even if only on a coffee break.
- tidyr AKA "Tidy R": Cleans up datasets. This package was actually made by the developers of RStudio. tidyr lets you reshape a dataset so that each variable gets its own column and each observation its own row. Performing statistical analysis on those columns then becomes a cinch.
- dplyr: Goes hand-in-hand with tidyr. Makes it easy to filter, sort, and summarize data tables (think Excel table), more frequently referred to by the R community as data frames.
- ggplot2: Considered one of the most important visualization packages in all of R. Its syntax can be a little scary, but once you see a couple of examples, it can be fully utilized to make great visualizations. This really is the R community's visual gold standard.
- htmlwidgets: Allows users to make interactive visualizations that can then be easily published on the web. htmlwidgets is used by a bevy of other packages. You can see them all at this link.
- LightningR: An up-and-coming visualization tool that I've worked with in the past. Lightning's visualizations utilize the best technology in web graphics, and their gallery of visualizations speaks for itself.
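To make the tidyr and dplyr entries a little more concrete, here's a rough sketch of the two working together. The data frame and its column names are made up for illustration, and the example assumes both packages are already installed (installation is covered further down) and that your tidyr is version 1.0.0 or later, which is where pivot_longer() comes from. Older versions used gather() for the same job.

```r
library(tidyr)
library(dplyr)

# Hypothetical "wide" data: one row per student, one column per exam.
scores_wide <- data.frame(
  student = c("Ana", "Ben", "Cal"),
  midterm = c(80, 90, 70),
  final   = c(88, 85, 82)
)

# tidyr: reshape so that each row is a single (student, exam) observation.
scores_long <- pivot_longer(
  scores_wide,
  cols      = c(midterm, final),
  names_to  = "exam",
  values_to = "score"
)

# dplyr: group the tidy data by exam and compute one mean per group.
exam_means <- scores_long %>%
  group_by(exam) %>%
  summarise(mean_score = mean(score))

print(exam_means)
```

Once the data is in the long shape, the group-and-summarise step stays the same no matter how many exams you add, which is a big part of why the two packages are so often used as a pair.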
The R packages listed above are just a few of my favorites, and are especially good for just starting out. Doing anything with R the first time around can be challenging, and so limiting the number of packages you utilize is important. Keep it simple!
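Since the ggplot2 syntax tends to click after an example or two, here's a minimal sketch of a scatter plot. It uses mtcars, a dataset that ships with R, and assumes ggplot2 is already installed. The choice of columns and the plot title are just for illustration.

```r
library(ggplot2)

# Start from a data frame and map columns to the x and y axes,
# then add layers with `+`: points first, then human-readable labels.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Heavier cars tend to get fewer miles per gallon"
  )

# Printing the plot object draws it (in RStudio, in the Plots pane).
print(p)
```

The pattern to take away is data, then aesthetic mappings, then layers. Swapping geom_point() for another geom changes the chart type without touching the rest.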
Installing and utilizing packages is an easy three-step process:
To install, enter the command install.packages("<package_name>"), where <package_name> is the name of your package. Next, load the package using the command library(<package_name>).
At this point, any functions within the loaded package are ready to use. Call a function by typing <function>(), where <function> is the function name.
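Here's what those three steps might look like for swirl. Treat it as a sketch: the requireNamespace() check (which skips the download if the package is already there) and the explicit CRAN mirror URL are my own additions, not requirements, and the interactive() guard just keeps the lessons from launching when the code is run as a script.

```r
# Step 1: install the package from CRAN. This is only needed once per machine
# and requires an internet connection. The mirror URL is an assumption;
# any CRAN mirror will do.
if (!requireNamespace("swirl", quietly = TRUE)) {
  install.packages("swirl", repos = "https://cloud.r-project.org")
}

# Step 2: load the package. This is needed in every new R session.
library(swirl)

# Step 3: call a function from the package. swirl() starts the lessons,
# so only launch it when someone is at the console to answer the prompts.
if (interactive()) {
  swirl()
}
```

The same pattern works for any package on CRAN: swap in the package name for steps 1 and 2, and whichever function you need for step 3.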
When it comes to utilizing packages, documentation is your best friend. Luckily, any package available on CRAN will have documentation, or perhaps its own site!
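You can pull up that documentation without leaving R. The sketch below uses mean() and the stats package only because they come with every R installation; substitute the function or package you're actually curious about.

```r
# Look up the help page for a single function. Typing ?mean does the same thing.
mean_help <- help("mean")
print(mean_help)

# List everything a package documents: its functions, datasets, and so on.
stats_index <- help(package = "stats")
print(stats_index)

# List the long-form guides (vignettes) for the packages installed on this machine.
installed_vignettes <- vignette()
print(installed_vignettes)
```

Help pages usually end with runnable examples, which are often the quickest way to see how a function is meant to be used.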
Peter Shultz is a student at the University of Michigan, studying computer science.