You're reading from  Learning Jupyter

Product type: Book
Published in: Nov 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785884870
Edition: 1st Edition
Author: Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area. Dan has been contracting under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Sciences, and the Jupyter Cookbook, all with Packt.

Chapter 4. Jupyter Julia Scripting

Julia is a language specifically designed for high-performance numerical computing. Most importantly, it differs from the previous scripting languages covered in this book (R and, to a certain extent, Python) in that Julia is a full language, not limited to data handling.

In this chapter, we will cover the following topics:

  • Adding Julia scripting to your installation

  • Basic Julia in Jupyter

  • Julia limitations in Jupyter

  • Standard Julia capabilities

  • Julia visualizations in Jupyter

  • Julia Vega plotting

  • Julia parallel processing

  • Julia control flow

  • Julia regular expressions

  • Julia unit testing

Adding Julia scripting to your installation


We will install on Mac and Windows. There are separate steps for making Julia scripting available in your Jupyter installation.

Adding Julia scripts to Jupyter on a Mac

If you are running on a Mac, you are in luck. The Mac installation of Jupyter includes Julia 0.4.5, as can be seen in the New menu:

Adding Julia scripts to Jupyter on Windows

If you are running on a Windows machine, there are a few steps to get Julia on Jupyter. Remember, this environment was really developed for Linux.

First, we need to install Julia on your Windows machine. Navigate to the Julia download page (http://julialang.org/downloads/), download the correct version, which is Julia 0.4.5 for Windows (in my case, I used the Windows self-extracting EXE file for 32-bit machines), and run the installation with standard defaults.

Note

You must run the Julia installation as Administrator on your machine. After downloading the file, open the Downloads folder, right-click on the Julia...

Basic Julia in Jupyter


In this example we will use the Iris dataset for some standard analysis. So, start a new Julia notebook and call it Julia Iris. We can enter a small script to see how the steps progress for a Julia script.

This script uses another package for plotting, Gadfly. You would have to go through similar steps as before to install the package before running the script.

Enter the following script into separate cells of your notebook:

using RDatasets, DataFrames, Gadfly
set_default_plot_size(5inch, 5inch/golden);
plot(dataset("datasets","iris"), x="SepalWidth",
        y="SepalLength", color="Species")

RDatasets is a library containing several of the commonly used R datasets, such as Iris. This is a simple script: we load the libraries that we are going to use, set the size of the plot area, and plot the Iris data points (color coded by species).

So, you would end up with a starting screen that looks like the following screenshot:

We should take note of a few aspects...

Julia limitations in Jupyter


I have written Julia scripts and accessed different Julia libraries without issue in Jupyter. I have not noticed any limitations on its use or any performance degradation. I imagine some aspects of Julia that are very screen dependent (such as using the Julia Webstack to build a website) may be hampered by conflicting uses of the same concept.

I have repeatedly seen updates being run when I am attempting to run a Julia script, as in the following screenshot. I am not sure why they decided to always update the underlying tool rather than use what is in play and have the user specify whether to update libraries:

I have also noticed that once a Julia notebook is opened, even though I have closed the page, it will still display Running on the home page. I don't recall seeing this behavior with the other script languages available.

Another issue has been trying to use a secured package in my script, for example, plotly. It appears to be a clean process to get credentials...

Standard Julia capabilities


As in other languages, Julia can compute most of the rudimentary summary statistics on your data with a single call, here the describe function, as in the example script that follows:

using RDatasets
describe(dataset("datasets", "iris"))

This script accesses the Iris dataset and displays summary statistics on the dataset.

If we were to build a notebook to show describe in use against the iris dataset (loaded in the previous example), we would end up with a display like this:

You can see the standard statistics generated for each of the variables in the dataset. I thought it was interesting that the count and percentage of NA values in the dataset are provided. I have found that I usually have to double-check to exclude this data using other languages. Here, it is a quick, built-in reminder.
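To make the kind of summary that describe reports more concrete, the following is a minimal sketch that computes a few of the same figures by hand. It assumes a current Julia 1.x release (not the 0.4.5 release used in this chapter), where missing is the counterpart of NA and the Statistics standard library provides mean and median. The sample vector and the field names are hypothetical and are not taken from the Iris dataset.

```julia
using Statistics  # standard library in Julia 1.x (assumed version)

# Hypothetical sample with one missing entry; not taken from the Iris data.
sample = [5.1, 4.9, missing, 6.3, 5.8]

present = collect(skipmissing(sample))   # keep only the non-missing entries
n_missing = count(ismissing, sample)     # number of missing entries

# Collect the figures in a named tuple (field names chosen for illustration).
summary_stats = (
    min = minimum(present),
    mean = mean(present),
    median = median(present),
    max = maximum(present),
    nmissing = n_missing,
    pct_missing = 100 * n_missing / length(sample),
)

println(summary_stats)
```

Counting the missing entries explicitly, as done here, is the step that describe performs for us automatically.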

Julia visualizations in Jupyter


The most popular tool for visualizations in Julia is the Gadfly package. We can add the Gadfly package (as described at the beginning of this chapter) using the add function:

Pkg.add("Gadfly")

From then on, we can make reference to the Gadfly package in any script using the following command:

using Gadfly

Julia Gadfly scatterplot

We can use the plot() function with standard defaults (no type arguments) to generate a scatterplot. For example, with the simple script:

using Gadfly
srand(111)
plot(x=rand(7), y=rand(7))

Note

We use the srand() function in all examples that use random results. The srand() function sets the random number seed value, so all results in this chapter are reproducible.

We generate a nice, clean scatterplot, as shown in the following screenshot:
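The effect of seeding can be checked without any plotting. The sketch below assumes a current Julia 1.x release, where srand() has been replaced by Random.seed!() from the Random standard library; on the 0.4.5 release used in this chapter, the equivalent call is srand(111). The variable names are chosen for illustration only.

```julia
using Random  # standard library; Random.seed! replaced srand in Julia 1.x

Random.seed!(111)        # set the seed
first_draw = rand(7)     # seven random numbers

Random.seed!(111)        # reset to the same seed
second_draw = rand(7)    # the same seven numbers again

println(first_draw == second_draw)  # prints true
```

The actual numbers drawn can differ between Julia versions, so only the equality of the two draws is shown, not the values themselves.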

Julia Gadfly histogram

We can produce other graph types as well, for example, a histogram using this script:

using Gadfly
srand(111)
plot(x=randn(113), Geom.histogram(bincount=10))

This script generates...

Julia Vega plotting


Another popular graphics package is Vega. The main feature of Vega is the ability to describe your graphic using language primitives, such as JSON. Vega produces most of the standard plots. Here is an example script using Vega for a pie chart:

#Pkg.add("Vega")
#Pkg.add("Compat")
#Pkg.add("Patchwork")
using Vega
using Compat
using Patchwork
Patchwork.load_js_runtime()
stock = ["chairs", "tables", "desks", "rugs", "lamps"];
quantity = [15, 10, 10, 5, 20];
piechart(x = stock, y = quantity)

The generated graphic produced in Jupyter is shown in the following screenshot:

Note

Vega gives you the option on the resultant display to Save As PNG.

Julia PyPlot plotting

Another plotting package available is PyPlot. PyPlot is one of the standard Python visualization libraries and is directly accessible from Julia. We can take this small script to produce an interesting visualization:

#Pkg.add("PyPlot")
using PyPlot
precipitation = [0,0,0,0,0,0,0,0,0,0,0.12,0.01,0,0,0,0.37,0,0,0,0,
...

Julia parallel processing


An advanced built-in feature of Julia is parallel processing within your script. Normally, you can specify the number of processes that you want to use directly in Julia. However, in Jupyter, you would use the addprocs() function to make additional processes available to your script. For example, consider this small script:

addprocs(1)
srand(111)
r = remotecall(rand, 2, 3, 4)
s = @spawnat 2 1 .+ fetch(r)
fetch(s)

This example calls rand, the random number generator, through remotecall. The second argument to remotecall (2) names the process on which the code executes, and the remaining arguments (3 and 4) are passed to rand there, so rand generates a 3 x 4 matrix of random numbers on process 2. @spawnat is a macro that evaluates the expression that follows it on the named process; here it adds 1 to every element of the matrix fetched from r, again on process 2. Finally, fetch retrieves the result of the spawned computation.

We can see the results in the example in Jupyter as shown in the following screenshot:
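For readers on a current Julia 1.x release, the same steps can be sketched as follows. This is an adaptation under stated assumptions, not the script from the chapter: in Julia 1.x, addprocs lives in the Distributed standard library, and the sketch uses the worker id returned by addprocs rather than assuming that the new worker is process 2. The function name remote_random_plus_one is hypothetical.

```julia
using Distributed  # standard library; provides addprocs in Julia 1.x

# Hypothetical helper: build a rows x cols random matrix on a worker,
# add 1 to every element on that same worker, and return the result.
function remote_random_plus_one(worker, rows, cols)
    r = remotecall(rand, worker, rows, cols)   # run rand(rows, cols) on the worker
    s = @spawnat worker 1 .+ fetch(r)          # element-wise addition on the worker
    return fetch(s)                            # bring the matrix back to this process
end

worker = addprocs(1)[1]    # addprocs returns the ids of the workers it started
result = remote_random_plus_one(worker, 3, 4)

println(size(result))      # prints (3, 4)
rmprocs(worker)            # shut the worker down again
```

Because rand returns values between 0 and 1, every element of the result lies between 1 and 2.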

Julia control flow


Julia has a complete set of control flows. As an example, we could have a small function that determines the larger of two numbers:

function larger(x, y)
    if x > y
        return x
    end
    return y
end
println(larger(7,8))

There are several features that you must note:

  • The end statement for the if statement

  • The end statement closing the function

  • Indentation of the statements within the function (a readability convention; Julia delimits blocks with end and does not require indentation)

  • Indentation of the handling of a true condition within an if statement

If we run this in Jupyter, we see the output shown in the following screenshot:
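Beyond the single if statement shown above, a few other common control flow forms can be sketched as follows. The functions classify and sum_to are hypothetical names introduced for illustration; larger is rewritten here with the ternary operator as an alternative to the if block.

```julia
# Hypothetical helper: if / elseif / else, closed by a single end.
function classify(n)
    if n < 0
        return "negative"
    elseif n == 0
        return "zero"
    else
        return "positive"
    end
end

# Hypothetical helper: a for loop over a range, accumulating a sum.
function sum_to(limit)
    total = 0
    for i in 1:limit
        total += i
    end
    return total
end

# The ternary operator is a compact form of if/else.
larger(x, y) = x > y ? x : y

println(classify(-3))   # prints negative
println(sum_to(10))     # prints 55
println(larger(7, 8))   # prints 8
```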

Julia regular expressions


Julia has built-in regular expression handling, as do most modern programming languages. There is no need for a using statement, as regular expressions are a basic feature of strings in Julia.

We could have a small script that verifies whether a string matches a phone number, for example:

ismatch(r"^\([0-9]{3}\)[0-9]{3}-[0-9]{4}$", "(781)244-1212")
ismatch(r"^\([0-9]{3}\)[0-9]{3}-[0-9]{4}$", "-781-244-1212")

When run in Jupyter, we see the following results: the first number conforms to the format and the second does not.
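On a current Julia 1.x release, ismatch is no longer available; occursin(pattern, string) performs the same check. The sketch below assumes Julia 1.x and also adds capture groups to the phone pattern (an addition for illustration, not part of the original script) so that match can pull out the area code.

```julia
# Assumes Julia 1.x, where occursin(pattern, string) replaced ismatch.
# The parentheses without a backslash are capture groups added for illustration.
phone = r"^\(([0-9]{3})\)([0-9]{3})-([0-9]{4})$"

println(occursin(phone, "(781)244-1212"))   # prints true
println(occursin(phone, "-781-244-1212"))   # prints false

# match returns a RegexMatch (or nothing when there is no match);
# its captures field holds the text of each capture group.
m = match(phone, "(781)244-1212")
area_code = m.captures[1]
println(area_code)                          # prints 781
```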

Julia unit testing


As a full language, Julia has unit testing abilities to make sure your code is performing as expected. By convention, the unit tests for a package reside in its test folder.

Two of the standard packages available for unit testing in Julia are FactCheck and Base.Test. They both do the same thing, but react differently to failed tests. FactCheck generates an error message but does not stop processing on a failure. If you provide an error handler, that error handler can take control of the test.

Base.Test will throw an exception and stop processing on the first test failure. In that regard, it is probably not useful as a unit testing function so much as a runtime test that you may put in place to make sure parameters are within reason, or otherwise, just stop processing before something bad happens.

Base.Test is built into the standard Julia distribution. FactCheck is a separate package that is installed with Pkg.add("FactCheck").

As an example, we can create a unit tests notebook that does the same tests and see the resulting, different responses for errors...
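The notebook itself is not reproduced here. As a separate, minimal sketch, the following shows what a small set of tests might look like on a current Julia 1.x release, where the Test standard library is the successor to Base.Test. The larger function is redefined so that the block is self-contained, and the test set name is arbitrary. In this version of the library, a failing @test outside a @testset throws immediately, whereas inside a @testset all tests are run and the results are summarized at the end.

```julia
using Test  # standard library in Julia 1.x; the successor to Base.Test

# Function under test, equivalent to the one in the control flow section.
larger(x, y) = x > y ? x : y

# Every @test inside the @testset is evaluated; a summary is printed at the end.
@testset "larger" begin
    @test larger(7, 8) == 8
    @test larger(8, 7) == 8
    @test larger(-1, -2) == -1
end
```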

Summary


In this chapter, we added the ability to use Julia scripts in your Jupyter Notebook. We added a Julia library not included in the standard Julia installation. We saw basic features of Julia in use. We outlined some of the limitations encountered using Julia in Jupyter. We displayed graphics using some of the graphics packages available, including Gadfly, Winston, Vega, and PyPlot. We saw parallel processing in action. We saw a small control flow example, and lastly, we saw how to add unit testing to your Julia script.

In the next chapter, we will learn all about using JavaScript in a Jupyter Notebook.
