You're reading from  Julia Cookbook

Product type: Book
Published in: Sep 2016
Reading level: Beginner
ISBN-13: 9781785882012
Edition: 1st Edition
Authors (2):
Jalem Raj Rohit

Jalem Raj Rohit is an IIT Jodhpur graduate with a keen interest in recommender systems, machine learning, and serverless and distributed systems. Raj currently works as a senior consultant in data science and NLP at Episource, before which he worked at Zomato and Kayako. He contributes to open source projects in Python, Go, and Julia. He also speaks at tech conferences about serverless engineering and machine learning.

Chapter 5. Working with Visualizations

In this chapter, we will cover the following recipes:

  • Plotting basic arrays

  • Plotting dataframes

  • Plotting functions

  • Exploratory data analysis through plots

  • Line plots

  • Scatter plots

  • Histograms

  • Aesthetic customizations

Introduction


In this chapter, you will learn how to visualize and present data, and how to analyze the findings from the data science approach you have adopted to solve a particular problem. There are various types of visualization for displaying your findings, such as bar plots, scatter plots, and pie charts, and it is important to choose a method that reflects your findings and your work in a sensible and aesthetically pleasing manner.

Importance of visualizations and reporting in data science:

Visualization is the art of displaying quantitative information in a sensible, legible, and aesthetically pleasing way. It consists of plotting quantitative information as graphs, as well as compiling the analyses and results into a precise and legible report.

Visualizations and reports should always be prepared so that the person or group to whom they are presented can follow and appreciate them with minimal background...

Plotting basic arrays


Arrays are one of the fundamental data structures used in data analysis to store various types of data. They are also a quick way to store the columns or dimensions of a dataset, both for statistical analysis and for exploratory analysis through plots. Because they are simple, arrays are also very easy to plot. When two columns of a dataset are visualized, the values of each column are taken as separate arrays and plotted against each other, which is another reason arrays are so important.

Getting ready

To get started with this recipe, you have to install the Gadfly library. This can be done using the following command:

using Pkg  # required on Julia 1.0 and later, where Pkg is not loaded by default
Pkg.add("Gadfly")

Next, import the library by its name, Gadfly, as follows:

using Gadfly

How to do it...

For this recipe, you need to perform the following steps:

  1. Firstly, let's generate two random arrays a and b and plot them against each other. We can use...
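
A minimal sketch of this step, assuming Gadfly is installed. The array names a and b come from the text; the choice of Geom.point and the output file name arrays.svg are illustrative assumptions, not necessarily what the recipe uses:

```julia
using Gadfly

# Two random arrays of equal length, as described in the step above.
a = rand(10)
b = rand(10)

# Plot a on the x axis against b on the y axis as points.
p = plot(x = a, y = b, Geom.point)

# Optionally save the plot as an SVG file (the file name is hypothetical).
draw(SVG("arrays.svg", 6inch, 4inch), p)
```

In the REPL or a notebook, evaluating p on its own displays the plot without saving it.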

Plotting dataframes


Dataframes are one of the data structures on which most analytics and machine learning implementations are built. They are among the most popular ways of representing tabular data. They are made up of several arrays and similar data structures, and they can store data of multiple types, including logical, string, and numeric data. Visualizations can be drawn from one or more columns of the same dataframe, which makes it easy for the analyst to express the numerical information it holds.

Getting ready

To get started with this recipe, you have to install the Gadfly library as you did in the previous recipe.

As we will be using datasets from R packages, we also need to import the RDatasets package (install it first with Pkg.add("RDatasets") if it is not already present). Importing is done with the using ... syntax, which we use for importing packages:

using RDatasets

How to do it...

For this recipe, you need to perform the following steps:

  1. Firstly, we will learn how to plot different columns of a dataframe against each...
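
One way this step might look, assuming Gadfly and RDatasets are installed. The iris dataset is used here only because it appears later in this chapter; the choice of columns is an illustrative assumption:

```julia
using Gadfly
using RDatasets

# Load a dataframe from the RDatasets collection.
iris = dataset("datasets", "iris")

# Inspect its shape and column names before plotting.
println(size(iris))
println(names(iris))

# Plot one column of the dataframe against another.
p = plot(iris, x = :SepalLength, y = :PetalLength, Geom.point)
```

Passing the dataframe as the first argument lets the x and y aesthetics refer to columns by name rather than to separate arrays.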

Plotting functions


In data science and statistical modeling, an analyst often needs several functions for both the transformation and the exploratory analysis steps. Gadfly makes it simple to plot them, either as separate plots or stacked in a single plot.

Getting ready

As we already specified, we will use the Gadfly plotting library for this recipe too. So, follow the installation steps from the previous recipes.

How to do it...

  1. Let's start with a basic function plot to get familiar with the syntax. A good function to start with is sine, which is available in Julia as sin. The function can be passed directly to the plot command, along with the lower and upper limits of the x axis. The syntax is plot(function, lower_limit, upper_limit). This can be done as follows:

    plot(sin, 0, 30)
    

  2. Similarly, if we want to plot multiple functions on a single plot, we can do just like we did in the previous...
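
A short sketch of both forms, assuming Gadfly is installed. The pairing of sin with cos is an illustrative choice, since the original step is cut off before naming its functions:

```julia
using Gadfly

# A single function over the interval [0, 30], as in step 1.
p_single = plot(sin, 0, 30)

# Several functions over the same interval: pass them as an array.
# Gadfly draws each function as its own colored line.
p_multi = plot([sin, cos], 0, 30)
```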

Exploratory data analytics through plots


Exploratory data analytics is one of the most important processes in a data science workflow. It is a thorough exploration of the data to find patterns that can be identified through basic statistics and the shape of the data. It is mostly done with the help of plots, as visual information is much easier to comprehend than complex statistical terms. In this recipe, we will go through some exploratory analytics methods with the help of plots.

Getting ready

The Gadfly library, which we used in the previous recipes, also provides most of the plots that are frequently used for exploratory data analytics, so we will use it here too. To install the library, follow the installation steps in the previous recipes.

We will also use datasets from the RDatasets package, which contains datasets that are in the data repository of the R programming language. So, to install the RDatasets package and invoke...
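
Before plotting, it can help to look at the basic statistics that exploratory plots summarize. The following sketch uses only Julia's Statistics standard library; the sample values are made up for illustration and do not come from any dataset in this chapter:

```julia
using Statistics

# Made-up sample with one unusually large value (illustrative only).
data = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 21.0]

# Basic statistics that describe the centre and spread of the data.
stats = (
    mean   = mean(data),
    median = median(data),
    std    = std(data),
    q1     = quantile(data, 0.25),
    q3     = quantile(data, 0.75),
)

# Flag values more than 1.5 interquartile ranges beyond the quartiles,
# a rule commonly used by box plots to mark outliers.
iqr = stats.q3 - stats.q1
outliers = filter(v -> v < stats.q1 - 1.5 * iqr || v > stats.q3 + 1.5 * iqr, data)

println(stats)
println("outliers: ", outliers)
```

Here the mean (7.0) sits above the median (5.5) because of the single large value, which is the kind of skew that a histogram or box plot makes visible at a glance.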

Line plots


Line plots, as we have already seen in the preceding examples, are very effective for exploratory data analytics. They can be used both to understand correlations and to look at trends in the data. With further use of aesthetics, we can make them more interesting and informative.

Getting ready

We will use the Gadfly library, which we have used in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

  1. Let's start with a basic line plot of the incidence of melanoma by year. This is a typical time series plot, where the x axis is the time variable and the y axis is the variable that changes over time. To plot it, we pass the dataset to the plot() function and add the Geom.line geometry, as follows:

    plot(dataset("lattice", "melanoma"), x = "Year", y = "Incidence", Geom.line)
    

  2. We can also have multiple line...
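
One possible way to draw several lines in a single plot is to keep the data in long format and map a grouping column to the color aesthetic. This is a sketch under that assumption, with a hand-built dataframe whose column names (x, y, series) are hypothetical; the recipe itself may use a different dataset and approach:

```julia
using DataFrames
using Gadfly

# Shared x values for both series.
xs = collect(0:0.5:10)

# Long-format dataframe: one row per (x, y) point, with a column
# that names the series each point belongs to.
df = vcat(
    DataFrame(x = xs, y = sin.(xs), series = fill("sin", length(xs))),
    DataFrame(x = xs, y = cos.(xs), series = fill("cos", length(xs))),
)

# Mapping the series column to color draws one line per series.
p = plot(df, x = :x, y = :y, color = :series, Geom.line)
```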

Scatter plots


Scatter plots are the most basic plots in exploratory analytics. They help the analyst get a rough idea of the data distribution and the relationship between the corresponding columns, which in turn helps identify some prominent patterns in the data.

Getting ready

We will use the Gadfly library, which we used in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

  1. Let's start with a simple scatter plot of two iris features: the sepal length and the sepal width. This will help us identify the relationship between these two features of the flower. The call is similar to the line plot in the preceding recipe, but uses the Geom.point geometry instead of Geom.line in the plot() function. This can be done as follows:

    plot(dataset("datasets", "iris"), x = "SepalLength", y = "SepalWidth", Geom.point)
    

  2. Next, we will try to put in some aesthetics on the plot to make it more informative...
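
A sketch of one such aesthetic, assuming Gadfly and RDatasets are installed: mapping the Species column of the iris dataset to color. Whether the recipe uses this exact column is an assumption, since the step is cut off:

```julia
using Gadfly
using RDatasets

iris = dataset("datasets", "iris")

# Same scatter plot as in step 1, with each point colored by species.
p = plot(iris, x = "SepalLength", y = "SepalWidth", color = "Species", Geom.point)
```

Coloring by a categorical column makes it easier to see whether the groups separate along the plotted features.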

Histograms


Histograms are one of the best ways to visualize a dataset and to estimate its three main measures of central tendency: the mean, median, and mode. They also give analysts a clear understanding of the distribution of the data. Histograms can be drawn for categorical as well as numerical data, which makes them especially versatile.

Getting ready

We will use the Gadfly library, which we used for understanding and plotting data in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

  1. A basic histogram is a set of adjacent bars that shows the distribution of a particular feature in a dataset. It can be plotted using the plot() function with the Geom.histogram geometry. We will use the diamonds dataset for this purpose. This can be done as follows:

    plot(dataset("ggplot2", "diamonds"), x = "Price", Geom.histogram)
    

  2. As with earlier plots, color aesthetics can be used to differentiate...
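
A sketch of a colored histogram, assuming Gadfly and RDatasets are installed. The Cut column and the bin count of 30 are illustrative assumptions; the recipe's own choices are not visible in the truncated step:

```julia
using Gadfly
using RDatasets

diamonds = dataset("ggplot2", "diamonds")

# Histogram of price, with the bars split and colored by the cut of the diamond.
p_color = plot(diamonds, x = "Price", color = "Cut", Geom.histogram)

# The number of bins can be set explicitly through the bincount keyword.
p_bins = plot(diamonds, x = "Price", Geom.histogram(bincount = 30))
```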

Aesthetic customizations


Now that we have gone through how to plot the most important visualizations in the Gadfly library, along with some of their customizations, we will see how to customize them even further. Gadfly allows the analyst to tweak almost every aspect of a visualization, so that it can be better fitted to the properties of the dataset and remains flexible for our purposes.

Getting ready

We will use the Gadfly library, which we used in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

  1. An axis can be transformed to a logarithmic scale with the Scale.x_log element in the plot() function. This helps in visualizing exponentially increasing data or data that spans different scales. We will scale the x axis in this example. This can be done as follows:

    plot(x = rand(10), y = rand(10), Scale.x_log)
    

  2. The minimum and maximum values in the plot or in a particular...
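
A sketch of both customizations, assuming Gadfly is installed. The data is generated only for illustration, and using Scale.x_continuous and Scale.y_continuous with minvalue and maxvalue is one way to request axis bounds, not necessarily the one the recipe shows:

```julia
using Gadfly

# Illustrative data spanning three orders of magnitude (1 to 1000).
x = 10 .^ range(0, stop = 3, length = 20)
y = sqrt.(x)

# Base-10 logarithmic x axis, useful when values span several scales.
p_log = plot(x = x, y = y, Scale.x_log10, Geom.point)

# Ask each axis to cover at least the range from 0 to 1.
p_limits = plot(
    x = rand(10), y = rand(10),
    Scale.x_continuous(minvalue = 0, maxvalue = 1),
    Scale.y_continuous(minvalue = 0, maxvalue = 1),
    Geom.point,
)
```

Fixing the axis range this way keeps several plots of similar data visually comparable, since each one is drawn against the same bounds.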
