You're reading from Julia Cookbook

Product type: Book
Published in: Sep 2016
Reading level: Beginner
ISBN-13: 9781785882012
Edition: 1st Edition
Languages: Julia
Concepts: Data Science
Authors (2): Raj R Jalem, Jalem Raj Rohit


Chapter 5. Working with Visualizations

In this chapter, we will cover the following recipes:

Plotting basic arrays
Plotting dataframes
Plotting functions
Exploratory data analytics through plots
Line plots
Scatter plots
Histograms
Aesthetic customizations

Introduction

In this chapter, you will learn how to visualize and present data, and how to analyze the findings from the data science approach you have adopted to solve a particular problem. There are various types of visualization for displaying your findings, such as bar plots, scatter plots, and pie charts. It is very important to choose a method that reflects your findings and your work in a sensible and aesthetically pleasing manner.

Importance of visualizations and reporting in data science:

Visualization is the art of displaying quantitative information in a sensible, legible, and aesthetically pleasing way. It consists of plotting quantitative information as various graphs, as well as compiling the analyses and results into a precise and legible report.

Visualizations and reports should always be prepared in such a way that the person or group to whom they are presented can follow and appreciate them with minimal background...

Plotting basic arrays

Arrays are one of the fundamental data structures used in data analysis. They are a quick way to store the columns or dimensions of a dataset, both for statistical analysis and for exploratory analysis through plots. Because of their simplicity, arrays are also very easy to plot. When two columns of a dataset are visualized, the values of each column are taken as separate arrays and plotted against each other, which is another reason arrays are so important.

Getting ready

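To get started with this recipe, you have to install the Gadfly library. On Julia 0.7 and later, the package manager has to be loaded with using Pkg first. The installation can then be done using the following commands:

using Pkg
Pkg.add("Gadfly")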

Next, we import the library by its name, which is Gadfly. This can be done as follows:

using Gadfly

How to do it...

For this recipe, you need to perform the following steps:

Firstly, let's generate two random arrays a and b and plot them against each other. We can use...
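As a minimal sketch of this first step, the two arrays can be generated with rand and handed to plot() through the x and y keywords. The array length, the fixed seed, and the choice of Geom.point are assumptions made for illustration, not the book's own listing:

```julia
using Gadfly
using Random

Random.seed!(42)   # assumption: fixed seed so the figure is reproducible
a = rand(10)       # ten uniform random values in [0, 1)
b = rand(10)

# One point is drawn for each (a[i], b[i]) pair
p = plot(x = a, y = b, Geom.point)
```

In the REPL, the plot stored in p is shown when it is displayed. Swapping Geom.point for Geom.line would connect the points in order of increasing x.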

Plotting dataframes

Dataframes are one of the data structures on which most analytics and machine learning implementations are built, and they are among the most popular ways of representing tabular data. They are made up of several arrays and similar data structures, and they can store data of multiple types, including logical, string, and numeric data. Visualizations can be drawn from one or several columns of the same dataframe, which makes it easy for the analyst to express the numerical information it holds.

Getting ready

To get started with this recipe, you have to install the Gadfly library as you did in the previous recipe.

As we will be using datasets from R packages, we also need to import the RDatasets package. This is done with the same using ... syntax that we use for importing any package:

using RDatasets

How to do it...

For this recipe, you need to perform the following steps:

Firstly, we will learn how to plot different columns of a dataframe against each...
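The idea of plotting dataframe columns against each other can be sketched as follows. The choice of the iris dataset and of its petal columns is an assumption for illustration. The dataframe is passed as the first argument to plot(), and the columns are bound to the x and y axes by name:

```julia
using Gadfly
using RDatasets

# dataset() returns a dataframe; iris has four measurements and a Species column
iris = dataset("datasets", "iris")

# Columns are referenced by name and bound to the x and y axes
p = plot(iris, x = :PetalLength, y = :PetalWidth, Geom.point)
```

Recent Gadfly versions accept column names either as symbols, as in this sketch, or as strings, as in the other recipes of this chapter.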

Plotting functions

In data science and statistical modeling, an analyst often needs several functions for both the transformation and the exploratory analytics steps. Gadfly makes it simple to plot them, either as separate functions or stacked together in a single plot.

Getting ready

As we already specified, we will use the Gadfly plotting library for this recipe too. So, follow the installation steps from the previous recipes.

How to do it...

Let's start with a basic function plot to get familiar with the syntax. A good function to start with is the sine function, which is available in Julia as sin. The function can be passed directly to the plot command, along with the lower and upper limits of the x axis. The syntax is: plot(function, lower_limit, upper_limit). This can be done as follows:
```
plot(sin, 0, 30)
```
Similarly, if we want to plot multiple functions on a single plot, we can do just like we did in the previous...
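One way to stack several functions in a single plot, sketched here under the assumption that sin and cos are the functions of interest, is to pass a vector of functions in place of the single function:

```julia
using Gadfly

# A vector of functions is drawn as one curve per function over [0, 30]
p = plot([sin, cos], 0, 30)
```

Gadfly gives each curve its own color and adds a legend, so the functions remain distinguishable.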

Exploratory data analytics through plots

Exploratory data analytics is one of the most important processes in a data science workflow. It is simply a thorough exploration of the data to find any possible patterns that can be identified through basic statistics and the shape of the data. It is mostly done with the help of plots, as visual information is much easier to comprehend than complex statistical terms. So, in this recipe, we will go through some exploratory analytics methods with the help of plots.

Getting ready

The Gadfly library, which we used for our recipes, also contains most of the plots that are frequently used for exploratory data analytics. We will use the same library for this purpose too. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

We will also use datasets from the RDatasets package, which contains datasets that are in the data repository of the R programming language. So, to install the RDatasets package and invoke...
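As an illustration of the kind of exploratory plot this recipe is about, the following sketch draws one box plot per category. The dataset (iris) and the columns are assumptions chosen for illustration, and the RDatasets package is assumed to be installed already:

```julia
using Gadfly
using RDatasets

iris = dataset("datasets", "iris")

# One box per species: median, quartiles, and outliers of the sepal length
p = plot(iris, x = :Species, y = :SepalLength, Geom.boxplot)
```

A box plot of this kind summarizes the shape of the data per group, which is often a useful first look before any modeling.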

Line plots

Line plots, as we have already seen in the preceding examples, are very effective when it comes to exploratory data analytics. They can be used both to understand correlations and look at data trends. So, by further making use of aesthetics, we can make them more interesting and informative.

Getting ready

We will use the Gadfly library, which we have used in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

Let's start with a basic line plot, which plots the incidence of melanoma for each year. This plot can be seen as a typical time series plot, where the x axis is a time variable and the y axis is the variable that is parameterized by time. To plot this, we simply need to pass the dataset to the plot() function and include the Geom.line geometry, as follows:
```
plot(dataset("lattice", "melanoma"), x = "Year", y = "Incidence", Geom.line)
```
We can also have multiple line...
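A minimal sketch of multiple lines in one plot follows. The data is invented for illustration, and the variable names years, level, and series are hypothetical. Binding a grouping variable to the color keyword makes Gadfly draw one line per group:

```julia
using Gadfly

# Hypothetical data: two series, A and B, observed over the same five years
years  = repeat(2000:2004, 2)
level  = [1.0, 1.5, 2.1, 2.4, 3.0, 2.0, 1.8, 1.9, 1.4, 1.1]
series = repeat(["A", "B"], inner = 5)   # group label for each observation

# color groups the observations, so each series becomes its own line
p = plot(x = years, y = level, color = series, Geom.line)
```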

Scatter plots

Scatter plots are the most basic plots in exploratory analytics. They help the analyst get a rough idea of the data distribution and the relationship between the corresponding columns, which in turn helps identify some prominent patterns in the data.

Getting ready

We will use the Gadfly library, which we used in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

Let's start off by plotting a simple scatter plot of two iris features: the sepal length and the sepal width. This will help us identify the relationship between the two features of the flower. This can be done with a plot() call similar to the one in the preceding recipe, but with the Geom.point geometry instead of Geom.line. This can be done as follows:
```
plot(dataset("datasets", "iris"), x = "SepalLength", y = "SepalWidth", Geom.point)
```
Next, we will try to put in some aesthetics on the plot to make it more informative...
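One way to make the scatter plot more informative, sketched here as an assumption about what such aesthetics might look like, is to color the points by species and give the axes readable labels:

```julia
using Gadfly
using RDatasets

iris = dataset("datasets", "iris")

# color = :Species draws each species in its own color and adds a legend
p = plot(iris, x = :SepalLength, y = :SepalWidth, color = :Species,
         Geom.point,
         Guide.xlabel("Sepal length"), Guide.ylabel("Sepal width"))
```

With the color grouping in place, clusters that overlap in the plain scatter plot become easier to tell apart.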

Histograms

Histograms are one of the best ways of visualizing the distribution of a dataset and of estimating where its three main statistics, the mean, median, and mode, lie. They give analysts a very clear understanding of how the data is spread. Histograms are drawn for numerical data by grouping it into bins, while the same kind of bar display can be used for counts of categorical data.

Getting ready

We will use the Gadfly library, which we used for understanding and plotting data in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

A basic histogram is a simple set of adjacent bars, which shows the distribution of a particular feature in a dataset. It can be plotted using the plot() function, with Geom.histogram as the geometry. We will use the diamonds dataset for this purpose. This can be done as follows:
```
plot(dataset("ggplot2", "diamonds"), x = "Price", Geom.histogram)
```
As with earlier plots, color aesthetics can be used to differentiate...
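A sketch of a color-differentiated histogram follows. The choice of the Cut column for the color grouping and of 30 bins is an assumption for illustration:

```julia
using Gadfly
using RDatasets

diamonds = dataset("ggplot2", "diamonds")

# Bars are stacked by cut; bincount fixes the number of bins along the x axis
p = plot(diamonds, x = :Price, color = :Cut, Geom.histogram(bincount = 30))
```

If bincount is left out, Gadfly chooses the number of bins itself.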

Aesthetic customizations

Now that we have gone through how to plot the most important visualizations in the Gadfly library, we will see how to customize them even further. Gadfly allows the analyst to tweak almost every part of a visualization so that it better fits the properties of the dataset, which makes it very flexible for our purposes.

Getting ready

We will use the Gadfly library, which we used in the preceding recipes. So, to install the library, you can follow the installation steps mentioned in the previous recipes.

How to do it...

An axis can be transformed to a logarithmic scale by passing a scale element such as Scale.x_log, which applies the natural logarithm to the x axis, to the plot() function. This helps in visualizing exponentially increasing data or data that spans several orders of magnitude. We will scale the x axis in this example. This can be done as follows:
```
plot(x = rand(10), y = rand(10), Scale.x_log)
```
The minimum and maximum values in the plot or in a particular...
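One way to set the minimum and maximum values of an axis, sketched here with invented data and arbitrarily chosen limits, is through the minvalue and maxvalue keywords of Scale.x_continuous and Scale.y_continuous:

```julia
using Gadfly

x = collect(1.0:10.0)   # hypothetical data for illustration
y = x .^ 2

# minvalue and maxvalue ask for axis limits that cover at least this range
p = plot(x = x, y = y, Geom.point,
         Scale.x_continuous(minvalue = 0, maxvalue = 12),
         Scale.y_continuous(minvalue = 0, maxvalue = 120))
```

These keywords only extend the visible range. If the data falls outside the given values, Gadfly widens the axis to include it.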
