Understanding Julia's Ecosystem

Julia is a new programming language compared to other existing popular programming languages. Julia was presented publicly to the world and became open source in February of 2012. It all started in 2009, when three developers—Viral Shah, Stefan Karpinski, and Jeff Bezanson at the Massachusetts Institute of Technology (MIT), under the supervision of Professor Alan Edelman in the Applied Computing group—started working on a project. This lead to Julia. All of the principal developers are still actively involved with the JuliaLang. They are committed not just to the core language but to the different libraries that have evolved in its ecosystem. Julia is based on solid principles, which we will learn throughout the book. It is becoming more famous day by day, continuously gaining in the ranks of the TIOBE index (currently at 43), and gaining traction on Stack Overflow. Researchers are attracted to it, especially those from a scientific computing background.

Anyone can check the source code, which is available on GitHub (https://github.com/JuliaLang/julia). The current release at the time of writing this book is 0.6 with 633 contributors, 39,010 commits, and 9,398 stars on GitHub. Most of the core is written in Julia itself and there are a few chunks of code in C/C++, Lisp, and Scheme.

This chapter will take you through the installation and a basic understanding of all the necessary components of Julia. This chapter covers the following topics:

What makes Julia unique?
Installing Julia
Julia's importance in data science
Using REPL
Using Jupyter Notebook
What is Juno?
Package management
A brief about multiple dispatch
Understanding LLVM and JIT

What makes Julia unique?

Scientific computing requires the highest computing requirements. Over the years, the scientific community has used dynamic languages, which are comparatively much slower, to build their applications. A major reason for this is that applications are generally developed by physicists, biologists, financial experts, and other domain experts who, despite having experience with programming, are not seasoned developers. These experts always prefer dynamic languages over statically typed languages, which could have given them better performance, simply because they ease development and readability. However, there are now special packages to improve the performance of the code, such as Numba for Python. As the compiler techniques and language design has advanced, it is now possible to eliminate the trade-off between performance and dynamic prototyping. The requirement was to build a language, that is easy to read and code in, like Python, which is a dynamic language and gives the performance of C, which is a statically typed language. In 2012, a new language emerged—Julia. It is a general purpose programming language highly suited for scientific and technical computing. Julia's performance is comparable to C/C++ measured on the different benchmarks available on the JuliaLang's homepage and simultaneously provides an environment that can be used effectively for prototyping, like Python. Julia is able to achieve such performance because of its design and Low Level Virtual Machine (LLVM)-based just-in-time (JIT) compiler. These enable it to approach the performance of C and Fortran. We will be reading more about LLVM and JIT at the end of the chapter. The following quote is from the development team of Julia—the gist of why Julia was created (source: https://julialang.org/blog/2012/02/why-we-created-julia):

We are greedy: we want more. We want a language that's open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that's homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like MATLAB. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as MATLAB, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled. (Did we mention it should be as fast as C?)

Julia is highly influenced by Python because of its readability and rapid prototyping capabilities, by R because of the support it gives to mathematical and statistical operations, by MATLAB (also GNU Octave) because of the vectorized numerical functions, especially matrices, and by some other languages too. Some of these languages have been in existence for more than 20 years now. Julia borrows ideologies from many of these languages and tries to bring the best of all these worlds together, and quietly succeeds too!

Features and advantages of Julia

Julia is really good at scientific computing but is not restricted to just that, as it can also be used for web and general purpose programming. Julia's development team aims to create a remarkable and previously unseen combination of power and efficiency in one single language without compromising ease of use. Most of Julia's core is implemented in Julia. Julia's parser is written in Scheme. Julia's efficient and cross-platform I/O is provided by the libuv of Node.js.

Some of Julia's features are mentioned as follows:

It is designed for distributed and parallel computation.
Julia provides an extensive library of mathematical functions with great numerical accuracy.
Julia gives the functionality of multiple dispatch. It will be explained in detail in coming chapters. Multiple dispatch refers to using many combinations of argument types to define function behaviors. Julia provides efficient, specialized, and automatic generation of code for different argument types.
The Pycall package enables Julia to call Python functions in its code and MATLAB packages using the MATLAB.jl package. Functions and libraries written in C can also be called directly without any need for APIs or wrappers.
Julia provides powerful shell-like capabilities for managing other processes in the system.
Unlike other languages, user-defined types in Julia are compact and quite fast as built-ins.
Scientific computations makes great use of vectorized code to gain performance benefits. Julia eliminates the need to vectorize code to gain performance. De-vectorized code written in Julia can be as fast as the vectorized code.
It uses lightweight green threading, also known as tasks or coroutines, cooperative multitasking, or one-shot continuations.
Julia has a powerful type system. The conversions provided are elegant and extensible.
It has efficient support for Unicode.
It has facilities for metaprogramming and Lisp-like macros.
It has a built-in package manager (Pkg).
It's free and open source with an MIT license.

Installing Julia

As mentioned earlier, Julia is open source and is available for free. It can be downloaded from the website at http://julialang.org/downloads/.

The website has links to documentation, tutorials, learning, videos, and examples. The documentation can be downloaded in popular formats, as shown in the following screenshot:

It is highly recommended to use the generic binaries for Linux provided on the julialang.org website.

As Ubuntu and Fedora are widely used Linux distributions, a few developers were kind enough to make the installation on these distributions easier by providing it through package manager. We will go through them in the following sections.

Julia on Ubuntu (Linux)

Ubuntu and its derived distributions are one of the most famous Linux distributions. Julia's deb packages (self-extracting binaries) are available on the website of JuliaLang, mentioned earlier. These are available for both 32-bit and 64-bit distributions. One can also add Personal Package Archive (PPA), which is treated as an apt repository to build and publish Ubuntu source packages. In the Terminal, type the following commands:

$ sudo apt-get add-repository ppa:staticfloat/juliareleases
$ sudo apt-get update

This adds the PPA and updates the package index in the repository. Now install Julia using the following command:

$ sudo apt-get install Julia

The installation is complete. To check whether the installation is successful in the Terminal, type the following:

$ julia --version

This gives the installed Julia's version:

$ julia version 0.5.0

To uninstall Julia, simply use apt to remove it:

$ sudo apt-get remove julia

Julia on Fedora/CentOS/Red Hat (Linux)

For Fedora/RHEL/CentOS or distributions based on them, enable the EPEL repository for your distribution version. Then, click on the link provided. Enable Julia's repository using the following:

$ dnf copr enable nalimilan/julia

Or copy the relevant .repo file available at:

/etc/yum.repos.d/

Finally, in the Terminal type the following:

$ yum install julia

Julia on Windows

Go to the Julia download page (https://julialang.org/downloads/) and get the .exe file provided according to your system's architecture (32-bit/64-bit). The architecture can be found on the property settings of the computer. If it is amd64 or x86_64, then go for 64-bit binary (.exe), otherwise go for 32-bit binary. Julia is installed on Windows by running the downloaded .exe file, which will extract Julia into a folder. Inside this folder is a batch file called julia.exe, which can be used to start the Julia console.

Julia on Mac

Users with macOS need to click on the downloaded .dmg file to run the disk image. After that, drag the app icon into the Applications folder. It may prompt you to ask if you want to continue, as the source has been downloaded from the internet and so is not considered secure. Click on Continue if it was downloaded from the official Julia language website. Julia can also be installed using Homebrew on a Mac, as follows:

$ brew update
$ brew tap staticfloat/julia
$ brew install julia

The installation is complete. To check whether the installation is successful in the Terminal, type the following:

$ julia --version

This gives you the Julia version installed.

Building from source

Building from source could be challenging for beginners. We assume that you are on Linux (Ubuntu) right now and are building from source. This provides the latest build of Julia, which may not be completely stable. Perform the following steps to build Julia from source:

On the downloads page of the Julia website, download the source. You can choose Tarball or GitHub. It is recommended to use GitHub. To clone the repo, GitHub must be installed on the machine. Otherwise, choose Download as ZIP. Here is the link: https://github.com/JuliaLang/julia.git.
To build Julia, it requires various compilers: g++, gfortran, and m4. We need to install them first, if not installed already, using the $ sudo apt-get install gfortran g++ m4 command.
Traverse inside the Julia directory and start the make process, using the following command:

      $ cd julia
      $ make

On a successful build, Julia can be started up with the ./julia command.
If you used GitHub to download the source, you can stay up to date by compiling the newest version using the following commands:

      $ git pull 
      $ make clean 
      $ make

Understanding the directory structure of Julia's source

Building from source on Windows and macOS is also straightforward. It can be found at https://github.com/juliaLang/julia/.

Julia's source stack

Let's have a look at the directories and their content:

Directory	Contents
`base/`	Julia's standard library
`contrib/`	Miscellaneous set of scripts, configuration files
`deps/`	External dependencies
`doc/src/manual`	Source for user manual
`doc/src/stdlib`	Source for standard library function help text
`examples/`	Example Julia programs
`src/`	Source for Julia's language core
`test/`	Test suits
`test/perf`	Benchmark suits
`ui/`	Source for various frontends

A brief explanation about the directories mentioned earlier:

The base/ directory consists of most of the standard library.
The src/ directory contains the core of the language.
There is also an examples directory containing some good code examples, which can be helpful when learning Julia. It is highly recommended to use these in parallel.

On the successful build on Linux, these directories can be found in the Julia's folder. These are usually present in the build directory.

Julia's importance in data science

In the last decade, data science has become a buzzword, with Harvard Business Review naming it the sexiest job of the 21st century. What is a data scientist? The answer was published in The Guardian (https://www.theguardian.com/careers/2015/jun/30/whats-a-data-scientist-and-how-do-i-become-one):

A data scientist takes raw data and marries it with analysis to make it accessible and more valuable for an organization. To do this, they need a unique blend of skills—a solid grounding in maths and algorithms and a good understanding of human behaviors, as well as knowledge of the industry they're working in, to put their findings into context. From here, they can unlock insights from the datasets and start to identify trends.

The technical skills of a data scientist are varied but, generally, they are good at programming, and have a very strong background in mathematics—especially statistics, skills in machine learning, and knowledge of big data. A data scientist is required to have in-depth understanding of the domain he/she is working in. Julia was designed for scientific and numerical computation. And with the advent of big data, there is a requirement to have a language that can work on huge amounts of data. Although we have Spark and MapReduce (Hadoop) as processing engines that are generally used with Python, Scala, and Java, Julia with Intel's High Performance Analytics Toolkit can provide an alternative option. It may also be worth noting that Julia excels at parallel computing but is much easier to write and prototype than Spark/Hadoop.

One great feature of Julia is that it solves the 2-language problem. Generally, with Python and R, code that is doing most of the heavy workload is written in C/C++ and it is then called. This is not required with Julia, as it can perform comparably to C/C++. Therefore, complete code—including code that does heavy computations—can be written in Julia itself.

Benchmarks

We mentioned the speed of Julia above, and that's what sets this language apart from traditional dynamically typed languages. Speed is its specialty. So, how fast can Julia be? The following micro-benchmark results were obtained on a single core (serial execution) on an Intel(R) Xeon(R) CPU E7-8850 2.00 GHz CPU with 1 TB of 1067 MHz DDR3 RAM running Linux:

	Julia 0.4.0	Python 3.4.3	R 3.2.2	MATLAB R2015b	Go go1.5	Java 1.8.0_45
`fib`	2.11	77.76	533.52	26.89	1.86	1.21
`parse_int`	1.45	17.02	45.73	802.52	1.20	3.35
`quicksort`	1.15	32.89	264.54	4.92	1.29	2.60
`mandel`	0.79	15.32	53.16	7.58	1.11	1.35
`pi_sum`	1.00	21.99	9.56	1.00	1.00	1.00
`rand_mat_stat`	1.66	17.93	14.56	14.52	2.96	3.92
`rand_mat_mul`	1.02	1.14	1.57	1.12	1.42	2.36

These benchmark times are relative to C (smaller is better, C performance = 1.0). Benchmarks can be misleading and are not always true. Good coding practices need to be followed and exactly identical conditions are required to measure them side by side. Julia has been quite open about how it measured these benchmarks and the code is available at https://github.com/JuliaLang/julia/tree/master/test/perf/micro.

Julia is significantly faster than Python. There is a huge difference in performance in the benchmarks. However, some libraries for numerical computation available to Python are written in C, and here it performs nearly equivalent to Julia.
R was specifically designed for statisticians. It has a huge set of libraries for statistics and numerical computation and is available for free. It used to be the language of choice for data scientists (now Python is preferred). R is single-threaded and is a lot slower than Julia.
MATLAB is not a free product. It comes with a paid license (students may get discounts). It is used by statisticians and academicians for some specific use cases. The above benchmarks run a lot slower on MATLAB.
Go is designed from scratch for system programming. It was created by Google and the source code is available on GitHub, where it is actively developed. Go performs really well on these benchmarks, but it is not designed for numerical and scientific computing.
Java performs well. It beats Julia in some benchmarks and Julia beats it in others. But we need to consider the development time associated with it. Julia is designed in a way that it can be used even for rapid prototyping. That makes it unique.

Julia is therefore well-suited to data science problems. Its ecosystem may not be as comprehensive as other languages right now, but it is growing at a great pace.

Using REPL

Read-Eval-Print-Loop (REPL) is an interactive shell or the language shell that provides the functionality to test out pieces of code. Julia provides an interactive shell with a JIT compiler (used by Julia) at the backend. We can give input in a line, it is compiled and evaluated, and the result is given in the next line:

Julia's shell can be started easily, just by writing Julia in the Terminal. This brings up the logo and information about the version. This julia> is a Julia prompt and we can write expressions, statements, and functions, just as we could write them in a code file. The benefit of having the REPL is that we can test out our code for possible errors. Also, it is a good environment for beginners. We can type in the expressions and press Enter key to evaluate them. A Julia library, or custom-written Julia program, can be included in REPL using include.

For example, let's create a file with the name hello.jl and write a function with the name helloworld(), which just prints the line Hello World from hello.jl:

Now, use this function inside the REPL by writing an include statement for the specified file:

When we include the file inside the REPL, we see that it gave the information about the function present inside the file. We used the function and it printed the desired line. Julia also stores all the commands written in the REPL in the .julia_history folder. This file is located at C:\Users\%username% on Windows, or ~/.julia_history on Linux or macOS. Similar to Linux Terminal, we can reverse-search using Ctrl + R keys in Julia's shell. This helps us to find the previous commands or to debug the errors if necessary.

Using help in Julia

There is another nice feature provided in the REPL of Julia, which is help. One can use this by typing a question mark (?) and the prompt will change to:

help?>

It is used to give information about the functions, types, macros, and operators used in Julia:

In the preceding screenshot, we used help?> to give information about the include and "+" key (please read the information provided for "+" key).

Julia also provides the functionality to use regular shell commands from REPL. This can be used by writing ";" key inside the REPL, which will change the prompt to:

shell>

In the preceding screenshot, we used the ls command (dir in Windows) to list the files in the current directory and did a ping to 8.8.8.8. This is helpful, as we don't have to leave the REPL for tasks that we need to do on the Terminal. Tab-completion works fine in shell> mode too, as it does in julia> mode. There are some interactive functions and macros available, which increase the productivity from the REPL:

whos(): This gives information about the global symbols currently present:

We can see it also gives the size of the module.

@which: It gives the information about the method that would be called for the particular function and arguments:

     julia> @which sin(10)
     sin(x::Real) at math.jl:204

versioninfo(): Although when we start the Julia REPL, it gives some information about the version, if we want more detailed information, then versioninfo() is used:

     julia> versioninfo()
     Julia Version 0.5.0
     Commit 3c9d753 (2016-09-19 18:14 UTC)
     Platform Info:
       System: Linux (x86_64-linux-gnu)
       CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
       WORD_SIZE: 64
       BLAS: libopenblas (NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY 
       Haswell)
       LAPACK: liblapack.so.3
       LIBM: libopenlibm
       LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

edit("any_file_in_home_directory"): This is used to edit a file in the home directory.

Let's edit the same file that we used in the earlier examples:

julia> edit("hello.jl")

This brings up a default editor based on the OS being used, like Vim.

@edit rand(): If required, we can also edit the definition of the built-in functions. This will bring up a similar window as the preceding one, which is for the default editor.
less("any_file_in_same_directory"): This is similar to the shell utility for showing the file.
clipboard("some_text"): This is used to copy some_text to the system clipboard.
clipboard(): This is used to paste the contents of the keyboard to the REPL. Useful when copying some commands from elsewhere to the REPL.
dump(): This displays information about a Julia object on the screen.
names(): This gets an array of the names exported by a module.
fieldnames(): This is used to get the array of the data fields belonging to a particular symbol.
workspace(): This brings out a clean workspace (all user defined variables are erased) and cleans up the top-level module (Main).

Plots in REPL

There is also the possibility to create plots in REPL. These are simple plots, but look quite nice and are very light on memory. To do this, we need to add a package. We will study package management later, but for now, we can directly add a package and start using it.

julia> Pkg.add("UnicodePlots")

This has been provided by a developer called Christof Stocker. Remember how to enter the symbol "π"? If yes, then please go through the previous sections. Let's use this to create a simple plot:

We can create many similar plots such as scatterplots, line plots, histograms, and so on using UnicodePlots.

Using Jupyter Notebook

Data science and scientific computing are privileged to have an amazing interactive tool called Jupyter Notebook. With Jupyter Notebook, you can write and run code in an interactive web environment that also has the capability to have visualizations, images, and videos. It makes the testing of equations and prototyping a lot easier, has the support of over 40 programming languages, and is completely open source.

GitHub supports Jupyter Notebooks (static). A Notebook with a record of computation can be shared through the Jupyter Notebook viewer or other cloud storage. Jupyter Notebooks are extensively used for coding machine-learning algorithms, statistical modeling and numerical simulation, and data munging. Jupyter Notebook is implemented in Python, but you can run the code in any of the 40 languages, provided you have their kernel. You can check whether Python is installed on your system or not by typing the following into the Terminal:

$ python --version
Python 2.7.12 :: Anaconda 4.1.1 (64-bit)

This will give the version of Python if it is on the system. It is best to have Python 2.7.x, 3.5.x, or a later version. If Python is not installed, then you can install it by downloading it from the official website for Windows.

It is highly recommended to install Anaconda if you are new to Python and data science. Commonly used packages for data science, numerical, and scientific computing—including Jupyter Notebook—come bundled with Anaconda, making it the preferred way to set up an environment. Instructions can be found at https://www.continuum.io/downloads.

Otherwise, for Linux (Ubuntu), typing the following should work:

$ sudo apt-get install python

Jupyter is present in the Anaconda package, but you can check whether the Jupyter package is up to date by typing in the following:

$ jupyter --version

If, for some reason, it is not present, it can be installed using:

$ conda install jupyter

Another way to install Jupyter is by using the pip command:

$ pip install jupyter

Now, to use Julia with Jupyter, we need the IJulia package. This can be installed using Julia's package manager. In the REPL, type the following commands:

julia> Pkg.update()
julia> Pkg.add("IJulia")

The first step updates the metadata and current packages and the second step installs the desired package. After installing IJulia, we can create a new Notebook by selecting Julia under the Notebooks section in Jupyter, as shown in the following screenshot:

Here, we can create a new Julia Notebook. Notebooks are very easy to use. The following screenshot shows the menu bar and the options available with Jupyter Notebook:

The symbols are quite self-explanatory. It also supports adding markdown to the notebook. GitHub is also Jupyter Notebook-friendly. Just type a command and click the run icon or press Shift + Enter keys on the code block to run the code. The following screenshot shows that the command prints two lines and does an arithmetic operation:

Jupyter is a great interactive environment when visualizations or graphs are involved. The following code creates a nice plot (we will learn about Gadfly and RDatasets in later chapters):

# Use Plotting library and RDatasets package for iris dataset
using Gadfly
using RDatasets
# Create a dataframe
# We will learn about dataframes in later chapters
iris = dataset("datasets", "iris")
# This will create an image
p = plot(iris, x=:SepalLength, y=:SepalWidth, color=:Species, Geom.point)

The following screenshot classifies different species of flowers by their color:

Juno IDE Atom is an open source text editor developed using Electron, a framework by GitHub. Electron is a framework designed to build cross-platform apps using HTML, JavaScript, CSS, and Node.js. It is completely configurable and can be modified according to your needs. Basic tutorials are available on their website at https://atom.io/. There are thousands of open source packages available for Atom. Most of them are there to add or improve functionality, and there are thousands of themes to change the look and feel of the editor.

There are many features of Atom:

Split the editor in panes to work side by side
Open complete projects to navigate easily
Autocomplete
Available for Linux, Mac, and Windows

In the following screenshot, we can see an open project, multiple panes, a stylized interface, and syntax highlighting. Atom is not just limited to this:

What is Juno?

Juno is built on Atom. It is a powerful and free environment for the Julia language. This contains many powerful features, such as:

Multiple cursors
Fuzzy file finding
Vim key bindings

Juno provides an environment that combines the features of the Jupyter Notebook and the productivity of the IDE. It is very easy to use and is a completely live environment. With Atom, you can install new packages through the Settings panel or through the command line using the apm command. So, if we need to install a new package and we know its name (let's say xyz), we can just write:

$ apm install xyz

There are many apm commands, which can be read using --help command:

$ apm --help

The following screenshot shows the Juno IDE installing process through Settings. We just need to go to the Settings tab and install new packages:

It is recommended to install Juno and go through it. It is an amazing development environment and should be used when you become a little more familiar with the language. Juno will look familiar to RStudio and Yhat's Rodeo users. The following screenshot from Juno's website shows us the coding panel, plots, console, and workspace:

We don't have to move to newer windows just to check the plot or don't need to manually find out the variables currently in the variable.

Package management

Julia provides a built-in package manager. Using Pkg, we can install libraries written in Julia. For unregistered packages, we can also compile them from their source or use the standard package manager of the operating system. A list of registered packages is maintained at http://pkg.julialang.org. The Pkg module is provided in the base installation. The Pkg module contains all the package manager commands.

Pkg.status() – package status

The Pkg.status() is a function that prints out a list of currently installed packages with a summary. This is handy when you need to know whether the package you want to use is installed or not. When the Pkg command is run for the first time, the package directory is automatically created. The command requires that the Pkg.status() function returns a valid list of the packages installed. The list of packages given by the Pkg.status() function are of registered versions, which are managed by Pkg. We currently have lots of packages installed, whose versions we can check, given as follows:

The Pkg.installed() function can also be used to return a dictionary of all the installed packages with their versions:

julia> Pkg.installed()
Dict{String,VersionNumber} with 54 entries:
 "Juno"              => v"0.2.5"
 "Lazy"              => v"0.11.4"
 "ZMQ"               => v"0.4.0"
 "DataStructures"    => v"0.4.6"
 "Compat"            => v"0.9.4"
 "RData"             => v"0.0.4"
 "GZip"              => v"0.2.20"
 ...

You can see that the number of packages is the same, but the way the information is presented is different. Packages can be incomplete or in more complicated states. These states are generally indicated by annotations to the right of the installed package version. Pkg.status() will print branch information to the right of the version.

Pkg.add() – adding packages

Julia's package manager is declarative and intelligent. You only have to tell it what you want and it will figure out which version to install and resolve dependencies if there are any.

We can use Pkg.add(package_name) to add packages and Pkg.rm(package_name) to remove packages. This is the recommended way.

Otherwise, we can add the list of requirements that we want and it resolves which packages and their versions to install. The ~/.julia/v0.5/REQUIRE file contains the package requirements. We can open it using a text editor such as vi or atom, or use the Pkg.edit() function in Julia's shell to edit this file:

$ cat ~/.julia/v0.5/REQUIRE
IJulia
ScikitLearn
RDatasets
Plots
UnicodePlots
PyPlot
PlotlyJS

After editing the file, run Pkg.resolve() to install or remove the packages. This installation and removal will be based on the packages mentioned in the REQUIRE file.

Remember earlier we used Pkg.add("IJulia") to install the IJulia package? When we don't want to have a package installed on our system anymore, Pkg.rm() is used to remove the requirement from the REQUIRE file. Similar to Pkg.add(), Pkg.rm() first removes the requirement of the package from the REQUIRE file and then updates the list of installed packages by running Pkg.resolve() to match. Pkg.add() and Pkg.rm() are convenient and are generally used to add and remove requirements for a single package. But when there is a need to add or remove multiple packages, we can call Pkg.edit() to manually change the contents of REQUIRE and then update our packages accordingly. The Pkg.edit() function does not roll back the contents of REQUIRE if Pkg.resolve() fails. In that scenario, we have to run Pkg.edit() again to fix the file's contents ourselves.

Working with unregistered packages

Often, we would like to be able to use packages created by our team members or someone who has published them on GitHub but that are not in the registered packages of Pkg. Julia allows us to do that by using GitHub clones. Julia packages are hosted on GitHub repositories and can be cloned using mechanisms supported by Git. The index of registered packages is maintained at METADATA.jl. For unofficial packages, we can use the following:

julia> Pkg.clone("git://github.com/path/unofficialPackage/Package.jl.git")

Julia repository names end with .jl on GitHub (the additional .git indicates a bare GitHub repository). This prevents Julia and other languages from colliding with repositories with the same name. This also makes the search easy. Sometimes, unregistered packages have dependencies that require fulfilling before use. If that is the scenario, a REQUIRE file is needed at the top of the source tree of the unregistered package. The unregistered packages' dependencies on the registered packages are determined by this REQUIRE file. When we run Pkg.clone(url), these dependencies are automatically installed.

Pkg.update() – package update

It's good to have updated packages. Julia, which is under active development, has its packages frequently updated and new functionalities are added. To update all of the packages, type the following:

Pkg.update()

Under the hood, new changes are pulled into the METADATA file in the directory located at ~/.julia/v0.5/, and checks are run for any new registered package versions that may have been published since the last update. If there are new registered package versions, Pkg.update() attempts to update the packages that are not dirty and are checked out on a branch. This update process satisfies top-level requirements by computing the optimal set of package versions to be installed. The packages with specific versions that must be installed are defined in the REQUIRE file in Julia's directory (~/.julia/v0.5/).

METADATA repository

Registered packages are downloaded and installed using the official METADATA.jl repository. A different METADATA repository location can also be provided if required:

julia> Pkg.init("https://julia.customrepo.com/METADATA.jl.git", "branch")

Developing packages

Julia allows us to view the source code, and, as it is tracked by GitHub, the full development history of all the installed packages is available. We can also make changes and commit to our own repository, or do bug fixes and contribute enhancements upstream. You may also want to create your own packages and publish them at some point in time. Julia's package manager allows you to do that too. It is a requirement that GitHub is installed on the system and the developer needs an account with their hosting provider of choice (GitHub, Bitbucket, and so on). Having the ability to communicate over SSH is preferred—to enable that, upload your public ssh-key to your hosting provider.

Creating a new package

It is preferable to have the REQUIRE file in the package repository. This should have the bare minimum of a description of the Julia version. For example, if we would like to create a new Julia package called HelloWorld, we would have the following:

Pkg.generate("HelloWorld", "MIT")

Here, HelloWorld is the package that we want to create and MIT is the license that our package will have. The license should be known to the package generator. Pkg.generate currently knows about MIT, BSD, and ASL but a user can use any license they want by changing LICENCE.md.

The above process creates a directory at ~/.julia/v0.5/HelloWorld. The directory that is created is initialized as a GitHub repository. Also, all the files required by the package are kept in this directory. This directory is then committed to the repository and can then be pushed to the remote repository for the world to use.

A brief about multiple dispatch

A function is an object, mapping a tuple of arguments using an expression to return a value. When this function object is unable to return a value, it throws an exception. For different types of arguments, the same conceptual function can have different implementations.

For example, we could have a function to add two floating point numbers and another function to add two integers. But, conceptually, we are only adding two numbers. Julia provides a functionality through which different implementations of the same concept can be implemented easily.

The functions don't need to be defined all at once; they are defined in small abstracts. These small abstracts are different argument type combinations and have different behaviors associated with them. The definition of one of these behaviors is called a method. The types and the number of arguments that a method definition accepts is indicated by the annotation of its signatures. Therefore, the most suitable method is applied whenever a function is called with a certain set of arguments.

To apply a method when a function is invoked is known as dispatch. There are two types of dispatch:

dynamic-based on type evaluated at the run-time type
multiple-based on all arguments, not just the receiver

Julia chooses which method should be invoked based on all the arguments. This is known as multiple dispatch. Multiple dispatch is particularly useful for mathematical and scientific code. We shouldn't consider the operations as belonging to one argument any more than any of the others. All of the argument types are considered when implementing a mathematical operator. Multiple dispatch is not limited to mathematical expressions, as it can be used in numerous real-world scenarios and is a powerful paradigm for structuring programs. The code is given as follows:

type MyType
   prop::String
end
MyType(v::Real) = ...
function MyType{T}(v::Vector{T})  # parametric type
   ....
end

Methods in multiple dispatch

The "+" symbol is a function in Julia using multiple dispatch. Multiple dispatch is used by all of Julia's standard functions and operators. For the various possible combinations of argument types and counts, all of them have many methods defining their behavior. A method is restricted to taking certain types of arguments using the ::type-assertion operator:

julia> f(x::Float64, y::Float64) = x + y
f (generic function with 1 method)

The function definition will only be applied for calls where x and y are both values of the Float64 type:

julia> f(10.0, 14.0)
24.0

If we try to apply this definition to other types of arguments, it will give a method error:

julia> f(5,10.0)
ERROR: MethodError: no method matching f(::Int64, ::Float64)
Closest candidates are:
 f(::Float64, ::Float64) at REPL[4]:1

The arguments must be of precisely the same type as defined in the function definition. The function object is created in the first method definition. New method definitions add new behaviors to the existing function object. When a function is invoked, the number and types of the arguments are matched, and the most specific method definition matching will be executed. The following example creates a function with two methods. One method definition takes two arguments of the Float64 type and adds them. The second method definition takes two arguments of the Number type, multiplies them by two and adds them. When we invoke the function with Float64 arguments, the first method definition is applied, and when we invoke the function with integer arguments, the second method definition is applied, as the number can take any numeric values. In the following example, we are playing with floating point numbers and integers using multiple dispatch:

julia> f(x::Number, y::Number) = x + y
f (generic function with 2 methods)
julia> f(100,200)
300

In Julia, all values are instances of the abstract type Any. When the type declaration is not given with ::, that means it is not specifically defined as the type of the argument, therefore Any is the default type of method parameter and it doesn't have the restriction of taking any type of value. Generally, one method definition is written in such a way that it will be applied to certain arguments to which no other method definition applies. This is one of the Julia language's most powerful features. Specialized code can be generated expressively and efficiently, and complex algorithms can be implemented, without paying much attention to low-level implementations, using Julia's multiple dispatch and flexible parametric type system. We will study multiple dispatch in detail in the coming chapters.

Understanding LLVM and JIT

The LLVM project was started as a research project at the University of Illinois. Its aim was to create modern, Static Single Assignment (SSA)-based compilation strategies and type safety, low-level operations, flexibility, and the capability of representing all high-level languages cleanly. It is actually a collection of modular and reusable compiler and toolchain technologies. LLVM doesn't have to do much with traditional virtual machines. Some of the objectives of LLVM are:

The LLVM Core libraries were created to provide a modern source—and target-independent optimizer, along with code generation support for many popular CPUs. These libraries are built around a well-specified code representation known as the LLVM intermediate representation (LLVM IR).
Clang is an LLVM native C/C++/Objective-C compiler, which aims to deliver amazingly fast compiles (for example, about 3x faster than GCC when compiling Objective-C code in a debug configuration), extremely useful error and warning messages and to provide a platform for building great source-level tools.
DragonEgg integrates the LLVM optimizers and code generator with the GCC parsers. This allows LLVM to compile Ada, Fortran, and other languages supported by the GCC compiler frontends, and access to C features not supported by Clang.
The LLDB project builds on libraries provided by LLVM and Clang to provide a great native debugger. It uses the Clang ASTs and expression parser, LLVM JIT, LLVM disassembler, and so on, so that it provides an experience that just works. It is also blazingly fast and much more memory-efficient than GDB at loading symbols.
The SAFECode project is a memory safety compiler for C/C++ programs. It instruments code with runtime checks to detect memory safety errors (for example, buffer overflows) at runtime. It can be used to protect software from security attacks and can also be used as a memory safety error debugging tool, like Valgrind.

In computing, JIT compilation, also known as dynamic translation, is compilation done during the execution of a program–-at runtime—rather than prior to execution. Most often, this consists of translation to machine code, which is then executed directly, but can also refer to translation to another format. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation would outweigh the overhead of compiling that code. The LLVM JIT compiler can optimize unnecessary static branches out of a program at runtime, and thus is useful for partial evaluation in cases where a program has many options, most of which can easily be deemed unnecessary in a specific environment. This feature is used in the OpenGL pipeline of Mac OS X Leopard (v10.5) to provide support for missing hardware features. Graphics code within the OpenGL stack was left in intermediate representation and then compiled when run on the target machine. On systems with high-end graphics processing units (GPUs), the resulting code was quite thin, passing the instructions onto the GPU with minimal changes. On systems with low-end GPUs, LLVM would compile optional procedures that run on the local central processing unit (CPU) and emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA chipsets. A similar system was developed under the Gallium3D LLVMpipe and incorporated into the GNOME shell to allow it to run without a proper 3D hardware driver loaded.

Summary

In this chapter, we learned how Julia is different and looked at some of its features. We introduced you to how to download Julia and set up the environment on your system. We went through different environments popularly available for Julia, such as REPL, Jupyter notebook, and Juno (Atom). Then, we learned about multiple dispatch and why it is a special feature. In addition, we got a basic understanding of the LLVM and JIT.