Home Data Julia for Data Science

Julia for Data Science

By Anshul Joshi
books-svg-icon Book
eBook $43.99 $29.99
Print $54.99
Subscription $15.99 $10 p/m for three months
$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
eBook $43.99 $29.99
Print $54.99
Subscription $15.99 $10 p/m for three months
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
  1. Free Chapter
    The Groundwork – Julia's Environment
About this book
Julia is a fast and high performing language that's perfectly suited to data science with a mature package ecosystem and is now feature complete. It is a good tool for a data science practitioner. There was a famous post at Harvard Business Review that Data Scientist is the sexiest job of the 21st century. (https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century). This book will help you get familiarised with Julia's rich ecosystem, which is continuously evolving, allowing you to stay on top of your game. This book contains the essentials of data science and gives a high-level overview of advanced statistics and techniques. You will dive in and will work on generating insights by performing inferential statistics, and will reveal hidden patterns and trends using data mining. This has the practical coverage of statistics and machine learning. You will develop knowledge to build statistical models and machine learning systems in Julia with attractive visualizations. You will then delve into the world of Deep learning in Julia and will understand the framework, Mocha.jl with which you can create artificial neural networks and implement deep learning. This book addresses the challenges of real-world data science problems, including data cleaning, data preparation, inferential statistics, statistical modeling, building high-performance machine learning systems and creating effective visualizations using Julia.
Publication date:
September 2016
Publisher
Packt
Pages
346
ISBN
9781785289699

 

Chapter 1. The Groundwork – Julia's Environment

Julia is a fairly young programming language. In 2009, three developers (Stefan Karpinski, Jeff Bezanson, and Viral Shah) at MIT in the Applied Computing group under the supervision of Prof. Alan Edelman started working on a project that lead to Julia. In February 2012, Julia was presented publicly and became open source. The source code is available on GitHub (https://github.com/JuliaLang/julia). The source of the registered packages can also be found on GitHub. Currently, all four of the initial creators, along with developers from around the world, actively contribute to Julia.

Note

The current release is 0.4 and is still away from its 1.0 release candidate.

Based on solid principles, its popularity is steadily increasing in the field of scientific computing, data science, and high-performance computing.

This chapter will guide you through the download and installation of all the necessary components of Julia. This chapter covers the following topics:

  • How is Julia different?

  • Setting up Julia's environment.

  • Using Julia's shell and REPL.

  • Using Jupyter notebooks

  • Package management

  • Parallel computation

  • Multiple dispatch

  • Language interoperability

Traditionally, the scientific community has used slower dynamic languages to build their applications, although they have required the highest computing performance. Domain experts who had experience with programming, but were not generally seasoned developers, always preferred dynamic languages over statically typed languages.

 

Julia is different


Over the years, with the advancement in compiler techniques and language design, it is possible to eliminate the trade-off between performance and dynamic prototyping. So, the scientific computing required was a good dynamic language like Python together with performance like C. And then came Julia, a general purpose programming language designed according to the requirements of scientific and technical computing, providing performance comparable to C/C++, and with an environment productive enough for prototyping like the high-level dynamic language of Python. The key to Julia's performance is its design and Low Level Virtual Machine (LLVM) based Just-in-Time compiler which enables it to approach the performance of C and Fortran.

The key features offered by Julia are:

  • A general purpose high-level dynamic programming language designed to be effective for numerical and scientific computing

  • A Low-Level Virtual Machine (LLVM) based Just-in-Time (JIT) compiler that enables Julia to approach the performance of statically-compiled languages like C/C++

The following quote is from the development team of Julia—Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman:

Note

We are greedy: we want more.

We want a language that's open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that's homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

It is quite often compared with Python, R, MATLAB, and Octave. These have been around for quite some time and Julia is highly influenced by them, especially when it comes to numerical and scientific computing. Although Julia is really good at it, it is not restricted to just scientific computing as it can also be used for web and general purpose programming.

The development team of Julia aims to create a remarkable and never done before combination of power and efficiency without compromising the ease of use in one single language. Most of Julia's core is implemented in C/C++. Julia's parser is written in Scheme. Julia's efficient and cross-platform I/O is provided by the Node.js's libuv.

Features and advantages of Julia can be summarized as follows:

  • It's designed for distributed and parallel computation.

  • Julia provides an extensive library of mathematical functions with great numerical accuracy.

  • Julia gives the functionality of multiple dispatch. Multiple dispatch refers to using many combinations of argument types to define function behaviors.

  • The Pycall package enables Julia to call Python functions in its code and Matlab packages using Matlab.jl. Functions and libraries written in C can also be called directly without any need for APIs or wrappers.

  • Julia provides powerful shell-like capabilities for managing other processes in the system.

  • Unlike other languages, user-defined types in Julia are compact and quite fast as built-ins.

  • Data analysis makes great use of vectorized code to gain performance benefits. Julia eliminates the need to vectorize code to gain performance. De-vectorized code written in Julia can be as fast as vectorized code.

  • It uses lightweight "green" threading also known as tasks or coroutines, cooperative multitasking, or one-shot continuations.

  • Julia has a powerful type system. The conversions provided are elegant and extensible.

  • It has efficient support for Unicode.

  • It has facilities for metaprogramming and Lisp-like macros.

  • It has a built-in package manager. (Pkg)

  • Julia provides efficient, specialized and automatic generation of code for different argument types.

  • It's free and open source with an MIT license.

 

Setting up the environment


Julia is available free. It can be downloaded from its website at the following address: http://julialang.org/downloads/. The website also has exhaustive documentation, examples, and links to tutorials and community. The documentation can be downloaded in popular formats.

Installing Julia (Linux)

Ubuntu/Linux Mint is one of the most famous Linux distros, and their deb packages of Julia are also provided. These are available for both 32-bit and 64-bit distributions.

To install Julia, add the PPA (personal package archive). Ubuntu users are privileged enough to have PPA. It is treated as an apt repository to build and publish Ubuntu source packages. In the terminal, type the following:

sudo apt-get add-repository ppa:staticfloat/juliareleases 
sudo apt-get update 

This adds the PPA and updates the package index in the repository.

Now install Julia:

sudo apt-get install Julia  

The installation is complete. To check if the installation is successful in the Terminal type in the following:

julia --version 

This gives the installed Julia's version.

To open the Julia's interactive shell, type julia into the Terminal. To uninstall Julia, simply use apt to remove it:

sudo apt-get remove julia 

For Fedora/RHEL/CentOS or distributions based on them, enable the EPEL repository for your distribution version. Then, click on the link provided. Enable Julia's repository using the following:

dnf copr enable nalimilan/julia

Or copy the relevant .repo file available as follows:

/etc/yum.repos.d/

Finally, in the Terminal type the following:

yum install julia

Installing Julia (Mac)

Users with Mac OS X need to click on the downloaded .dmg file to run the disk image. After that, drag the app icon into the Applications folder. It may prompt you to ask if you want to continue as the source has been downloaded from the Internet and so is not considered secure. Click on continue if it is downloaded for the Julia language official website.

Julia can also be installed using homebrew on the Mac as follows:

brew update 
brew tap staticfloat/julia 
brew install julia 

The installation is complete. To check if the installation is successful in the Terminal, type the following:

julia --version 

This gives you the installed Julia version.

Installing Julia (Windows)

Download the .exe file provided on the download page according to your system's configuration (32-bit/64-bit). Julia is installed on Windows by running the downloaded .exe file, which will extract Julia into a folder. Inside this folder is a batch file called julia.bat, which can be used to start the Julia console.

To uninstall, delete the Julia folder.

Exploring the source code

For enthusiasts, Julia's source code is available and users are encouraged to contribute by adding features or by bug fixing. This is the directory structure of the tree:

base/

Source code for Julia's standard library

contrib/

Editor support for Julia source, miscellaneous scripts

deps/

External dependencies

doc/manual

Source for the user manual

doc/stdlib

Source for standard library function help text

examples/

Example Julia programs

src/

Source for Julia language core

test/

Test suites

test/perf

Benchmark suites

ui/

Source for various frontends

usr/

Binaries and shared libraries loaded by Julia's standard libraries

 

Using REPL


Read-Eval-Print-Loop is an interactive shell or the language shell that provides the functionality to test out pieces of code. Julia provides an interactive shell with a Just-in-Time compiler at the backend. We can give inputs in a line, it is compiled and evaluated, and the result is given in the next line.

The benefit of using the REPL is that we can test out our code for possible errors. Also, it is a good environment for beginners. We can type in the expressions and press Enter to evaluate.

A Julia library, or custom-written Julia program, can be included in the REPL using include. For example, I have a file called hello.jl, which I will include in the REPL by doing the following:

julia> include ("hello.jl") 

Julia also stores all the commands written in the REPL in the .julia_history. This file is located at /home/$USER on Ubuntu, C:\Users\username on Windows, or ~/.julia_history on OS X.

As with a Linux Terminal, we can reverse-search using Ctrl + R in Julia's shell. This is a really nice feature as we can go back in the history of typed commands.

Typing ? in the language shell will change the prompt to:

help?>  

To clear the screen, press Ctrl + L. To come out of the REPL press Ctrl + D or type the following:

julia> exit().

 

Using Jupyter Notebook


Data science and scientific computing are privileged to have an amazing interactive tool called Jupyter Notebook. With Jupyter Notebook you can to write and run code in an interactive web environment, which also has the capability to have visualizations, images, and videos. It makes testing of equations and prototyping a lot easier. It has the support of over 40 programming languages and is completely open source.

GitHub supports Jupyter notebooks. The notebook with the record of computation can be shared via the Jupyter notebook viewer or other cloud storage. Jupyter notebooks are extensively used for coding machine-learning algorithms, statistical modeling and numerical simulation, and data munging.

Jupyter Notebook is implemented in Python but you can run the code in any of the 40 languages provided you have their kernel. You can check if Python is installed on your system or not by typing the following into the Terminal:

python -version 

This will give the version of Python if it is there on the system. It is best to have Python 2.7.x or 3.5.x or a later version.

If Python is not installed then you can install it by downloading it from the official website for Windows. For Linux, typing the following should work:

sudo apt-get install python 

It is highly recommended to install Anaconda if you are new to Python and data science. Commonly used packages for data science, numerical, and scientific computing including Jupyter notebook come bundled with Anaconda making it the preferred way to set up the environment. Instructions can be found at https://www.continuum.io/downloads.

Jupyter is present in the Anaconda package, but you can check if the Jupyter package is up to date by typing in the following:

conda install jupyter 

Another way to install Jupyter is by using pip:

pip install jupyter 

To check if Jupyter is installed properly, type the following in the Terminal:

jupyter -version 

It should give the version of the Jupyter if it is installed.

Now, to use Julia with Jupyter we need the IJulia package. This can be installed using Julia's package manager.

After installing IJulia, we can create a new notebook by selecting Julia under the Notebooks section in Jupyter.

To get the latest version of all your packages, in Julia's shell type the following:

julia> Pkg.update() 

After that add the IJulia package by typing the following:

julia> Pkg.add("IJulia") 

In Linux, you may face some warnings, so it's better to build the package:

julia> Pkg.build("IJulia") 

After IJulia is installed, come back to the Terminal and start the Jupyter notebook:

jupyter notebook 

A browser window will open. Under New, you will find options to create new notebooks with the kernels already installed. As we want to start a Julia notebook we will select Julia 0.4.2. This will start a new Julia notebook. You can try out a simple example.

In this example, we are creating a histogram of random numbers. This is just an example we will be studying the components used in detail in coming chapters.

Popular editors such as Atom and Sublime have a plugin for Julia. Atom has language—julia and Sublime has Sublime—IJulia, both of which can be downloaded from their package managers.

 

Package management


Julia provides a built-in package manager. Using Pkg we can install libraries written in Julia. For external libraries, we can also compile them from their source or use the standard package manager of the operating system. A list of registered packages is maintained at http://pkg.julialang.org.

Pkg is provided in the base installation. The Pkg module contains all the package manager commands.

Pkg.status() – package status

The Pkg.status() is a function that prints out a list of currently installed packages with a summary. This is handy when you need to know if the package you want to use is installed or not.

When the Pkg command is run for the first time, the package directory is automatically created. It is required by the command that the Pkg.status() returns a valid list of the packages installed. The list of packages given by the Pkg.status() are of registered versions which are managed by Pkg.

Pkg.installed() can also be used to return a list of all the installed packages with their versions.

Pkg.add() – adding packages

Julia's package manager is declarative and intelligent. You only have to tell it what you want and it will figure out what version to install and will resolve dependencies if there are any. Therefore, we only need to add the list of requirements that we want and it resolves which packages and their versions to install.

The ~/.julia/v0.4/REQUIRE file contains the package requirements. We can open it using a text editor such as vi or atom, or use Pkg.edit() in Julia's shell to edit this file. After editing the file, run Pkg.resolve() to install or remove the packages.

We can also use Pkg.add(package_name) to add packages and Pkg.rm(package_name) to remove packages. Earlier, we used Pkg.add("IJulia")  to install the IJulia package.

When we don't want to have a package installed on our system anymore, Pkg.rm() is used for removing the requirement from the REQUIRE file. Similar to Pkg.add(), Pkg.rm() first removes the requirement of the package from the REQUIRE file and then updates the list of installed packages by running Pkg.resolve() to match.

Working with unregistered packages

Frequently, we would like to be able to use packages created by our team members or someone who has published on Git but they are not in the registered packages of Pkg. Julia allows us to do that by using a clone. Julia packages are hosted on Git repositories and can be cloned using mechanisms supported by Git. The index of registered packages is maintained at METADATA.jl. For unofficial packages, we can use the following:

Pkg.clone("git://example.com/path/unofficialPackage/Package.jl.git") 

Sometimes unregistered packages have dependencies that require fulfilling before use. If that is the scenario, a REQUIRE file is needed at the top of the source tree of the unregistered package. The dependencies of the unregistered packages on the registered packages are determined by this REQUIRE file. When we run Pkg.clone(url), these dependencies are automatically installed.

Pkg.update() – package update

It's good to have updated packages.  Julia, which is under active development, has its packages frequently updated and new functionalities are added.

To update all of the packages, type the following:

Pkg.update() 

Under the hood, new changes are pulled into the METADATA file in the directory located at ~/.julia/v0.4/ and it checks for any new registered package versions which may have been published since the last update. If there are new registered package versions, Pkg.update() attempts to update the packages which are not dirty and are checked out on a branch. This update process satisfies the top-level requirements by computing the optimal set of package versions to be installed. The packages with specific versions that must be installed are defined in the REQUIRE file in Julia's directory (~/.julia/v0.4/).

METADATA repository

Registered packages are downloaded and installed using the official METADATA.jl repository. A different METADATA repository location can also be provided if required:

julia> Pkg.init("https://julia.customrepo.com/METADATA.jl.git", "branch") 

Developing packages

Julia allows us to view the source code and as it is tracked by Git, the full development history of all the installed packages is available. We can also make our desired changes and commit to our own repository, or do bug fixes and contribute enhancements upstream.

You may also want to create your own packages and publish them at some point in time. Julia's package manager allows you to do that too.

It is a requirement that Git is installed on the system and the developer needs an account at their hosting provider of choice (GitHub, Bitbucket, and so on). Having the ability to communicate over SSH is preferred—to enable that, upload your public ssh-key to your hosting provider.

Creating a new package

It is preferable to have the REQUIRE file in the package repository. This should have the bare minimum of a description of the Julia version.

For example, if we would like to create a new Julia package called HelloWorld we would have the following:

Pkg.generate("HelloWorld", "MIT") 

Here, HelloWorld is the package that we want to create and MIT is the license that our package will have. The license should be known to the package generator.

This will create a directory as follows: ~/.julia/v0.4/HelloWorld. The directory that is created is initialized as a Git repository. Also, all the files required by the package are kept in this directory. This directory is then committed to the repository.

This can now be pushed to the remote repository for the world to use.

 

Parallel computation using Julia


Advancement in modern computing has led to multi-core CPUs in systems and sometimes these systems are combined together in a cluster capable of performing a task which a single system might not be able to perform alone, or if it did it would take an undesirable amount of time. Julia's environment of parallel processing is based on message passing. Multiple processes are allowed for programs in separate memory domains.

Message passing is implemented differently in Julia from other popular environments such as MPI. Julia provides one-sided communication, therefore the programmer explicitly manages only one process in the two-process operation.

Julia's parallel programming paradigm is built on the following:

  • Remote references

  • Remote calls

A request to run a function on another process is called a remote call. The reference to an object by another object on a particular process is called a remote reference. A remote reference is a construct used in most distributed object systems. Therefore, a call which is made with some specific arguments to the objects generally on a different process by the objects of the different process is called the remote call and this will return a reference to the remote object which is called the remote reference.

The remote call returns a remote reference to its result. Remote calls return immediately. The process that made the call proceeds to its next operation. Meanwhile, the remote call happens somewhere else. A call to wait() on its remote reference waits for the remote call to finish. The full value of the result can be obtained using fetch(), and put!() is used to store the result to a remote reference.

Julia uses a single process default. To start Julia with multiple processors use the following:

julia -p n

where n is the number of worker processes. Alternatively, it is possible to create extra processors from a running system by using addproc(n). It is advisable to put n equal to the number of the CPU cores in the system.

pmap and @parallel are the two most frequently used and useful functions.

Julia provides a parallel for loop, used to run a number of processes in parallel. This is used as follows.

Parallel for loop works by having multiple processes assigned iterations and then reducing the result (in this case (+)). It is somewhat similar to the map-reduce concept. Iterations will run independently over different processes and the results obtained by these processes will be combined at the end (like map-reduce). The resultant of one loop can also become the feeder for the other loop. The answer is the resultant of this whole parallel loop.

It is very different than a normal iterative loop because the iterations do not take place in a specified sequence. As the iterations run on different processes, any writes that happens on variables or arrays are not globally visible. The variables used are copied and broadcasted to each process of the parallel for loop.

For example:

arr = zeros(500000) 
@parallel for i=1:500000 
  arr[i] = i 
end 

This will not give the desired result as each process gets their own separate copy of arr. The vector will not be filled in with i as expected. We must avoid such parallel for loops.

pmap refers to parallel map. For example:

This code solves the problem if we have a number of large random matrices and we are required to obtain the singular values, in parallel.

Julia's pmap() is designed differently. It is well suited for cases where a large amount of work is done by each function call, whereas @parallel is suited for handling situations which involve numerous small iterations. Both pmap() and @parallel for utilize worker nodes for parallel computation. However, the node from which the calling process originated does the final reduction in @parallel for.

 

Julia's key feature – multiple dispatch


A function is an object, mapping a tuple of arguments using some expression to a return value. When this function object is unable to return a value, it throws an exception. For different types of arguments the same conceptual function can have different implementations. For example, we can have a function to add two floating point numbers and another function to add two integers. But conceptually, we are only adding two numbers. Julia provides a functionality by which different implementations of the same concept can be implemented easily. The functions don't need to be defined all at once. They are defined in small abstracts. These small abstracts are different argument type combinations and have different behaviors associated with them. The definition of one of these behaviors is called a method.

The types and the number of arguments that a method definition accepts is indicated by the annotation of its signatures. Therefore, the most suitable method is applied whenever a function is called with a certain set of arguments. To apply a method when a function is invoked is known as dispatch. Traditionally, object-oriented languages consider only the first argument in dispatch. Julia is different as all of the function's arguments are considered (not just only the first) and then it choses which method should be invoked. This is known as multiple dispatch.

Multiple dispatch is particularly useful for mathematical and scientific code. We shouldn't consider that the operations belong to one argument more than any of the others. All of the argument types are considered when implementing a mathematical operator. Multiple dispatch is not limited to mathematical expressions as it can be used in numerous real-world scenarios and is a powerful paradigm for structuring the programs.

Methods in multiple dispatch

+ is a function in Julia using multiple dispatch. Multiple dispatch is used by all of Julia's standard functions and operators. For various possible combinations of argument types and count, all of them have many methods defining their behavior. A method is restricted to take certain types of arguments using the :: type-assertion operator:

julia> f(x::Float64, y::Float64) = x + y 

The function definition will only be applied for calls where x and y are both values of type Float64:

julia> f(10.0, 14.0) 
24.0 

If we try to apply this definition to other types of arguments, it will give a method error.

The arguments must be of precisely the same type as defined in the function definition.

The function object is created in the first method definition. New method definitions add new behaviors to the existing function object. When a function is invoked, the number and types of the arguments are matched, and the most specific method definition matching will be executed.

The following example creates a function with two methods. One method definition takes two arguments of the type Float64 and adds them. The second method definition takes two arguments of the type Number, multiplies them by two and adds them. When we invoke the function with Float64 arguments, then the first method definition is applied, and when we invoke the function with Integer arguments, the second method definition is applied as the number can take any numeric values. In the following example, we are playing with floating point numbers and integers using multiple dispatch.

In Julia, all values are instances of the abstract type "Any". When the type declaration is not given with ::, that means it is not specifically defined as the type of the argument, therefore Any is the default type of method parameter and it doesn't have the restriction of taking any type of value. Generally, one method definition is written in such a way that it will be applied to the certain arguments to which no other method definition applies. It is one of the Julia language's most powerful features.

It is efficient with a great ease of expressiveness to generate specialized code and implement complex algorithms without caring much about the low-level implementation using Julia's multiple dispatch and flexible parametric type system.

Ambiguities – method definitions

Sometimes function behaviors are defined in such a way that there isn't a unique method to apply for a certain set of arguments. Julia throws a warning in such cases about this ambiguity, but proceeds by arbitrarily picking a method. To avoid this ambiguity we should define a method to handle such cases.

In the following example, we define a method definition with one argument of the type Any and another argument of the type Float64. In the second method definition, we just changed the order, but this doesn't differentiate it from the first definition. In this case, Julia will give a warning of ambiguous method definition but will allow us to proceed.

 

Facilitating language interoperability


Although Julia can be used to write most kinds of code, there are mature libraries for numerical and scientific computing which we would like to exploit. These libraries can be in C, Fortran or Python. Julia allows the ease of using the existing code written in Python, C, or Fortran. This is done by making Julia perform simple and efficient-to-call C, Fortran, or Python functions.

The C/Fortran libraries should be available to Julia. An ordinary but valid call with ccall is made to this code. This is possible when the code is available as a shared library. Julia's JIT generates the same machine instructions as the native C call. Therefore, it is generally no different from calling through a C code with a minimal overhead.

Importing Python code can be beneficial and sometimes needed, especially for data science, because it already has an exhaustive library of implementations of machine learning and statistical functions. For example, it contains scikit-learn and pandas. To use Python in Julia, we require PyCall.jl. To add PyCall.jl do the following:

Pkg.add("PyCall") 

PyCall contains a macro @pyimport that facilitates importing Python packages and provides Julia wrappers for all of the functions and constants therein, including automatic conversion of types between Julia and Python.

PyCall also provides functionalities for lower-level manipulation of Python objects, including a PyObject type for opaque Python objects. It also has a pycall function (similar to Julia's ccall function), which can be used in Julia to call Python functions with type conversions. PyCall does not use the Python program but links directly to the libpython library. During the Pkg.build, it finds the location of the libpython by Punning python.

Calling Python code in Julia

The @pyimport macro automatically makes the appropriate type conversions to Julia types in most of the scenarios based on a runtime inspection of the Python objects. It achieves better control over these type conversions by using lower-level functions. Using PyCall in scenarios where the return type is known can help in improving the performance, both by eliminating the overhead of runtime type inference, and also by providing more type information to the Julia compiler:

  • pycall(function::PyObject, returntype::Type, args...): This calls the given Python function (typically looked up from a module) with the given args... (of standard Julia types which are converted automatically to the corresponding Python types if possible), converting the return value to returntype (use a returntype of PyObject to return the unconverted Python object reference, or PyAny to request an automated conversion).

  • pyimport(s): This imports the Python modules (a string or symbol) and returns a pointer to it (a PyObject). Functions or other symbols in the module may then be looked up by s[name] where the name is a string (for the raw PyObject) or a symbol (for automatic type conversion). Unlike the @pyimport macro, this does not define a Julia module and members cannot be accessed with an s.name.

 

Summary


In this chapter, we learned how Julia is different and how an LLVM-based JIT compiler enables Julia to approach the performance of C/C++. We introduced you to how to download Julia, install it, and build it from source. The notable features that we found were that the language is elegant, concise, and powerful and it has amazing capabilities for numeric and scientific computing.

We worked on some examples of working with Julia via the command line (REPL) and saw how full of features the language shell is. The features found were tab-completion, reverse-search, and help functions. We also discussed why should we use Jupyter Notebook and went on to set up Jupyter with the IJulia package. We worked on a simple example to use the Jupyter Notebook and Julia's visualization package, Gadfly.

In addition, we learned about Julia's powerful built-in package management and how to add, update, and remove modules. Also, we went through the process of creating our own package and publishing it to the community. We also introduced you to one of the most powerful features of Julia—multiple dispatch—and worked on some basic examples of how to create method definitions to implement multiple dispatch.

In addition, we introduced you to the parallel computation, explaining how it is different from conventional message passing and how to make use of all the compute resources available. We also learned Julia's feature of language interoperability and how we can call a Python module or a library from the Julia program.

 
About the Author
  • Anshul Joshi

    Anshul Joshi is a data scientist with experience in recommendation systems, predictive modeling, neural networks, and high performance computing. His research interests encompass deep learning, artificial intelligence, and computational physics. Most of the time, he can be caught exploring GitHub or trying anything new he can get his hands on. You can also follow his personal blog.

    Browse publications by this author
Latest Reviews (5 reviews total)
The book is outdated. The book is written for Julia 0.4. The current version is 1.2. With the jump to version 1.0 there were breaks in the compatibility.
I'm used to quick and dirty books with Packt, but don't buy this! While I knew that the book was written with an earlier version of Julia in mind, I didn't expect code that is simply wrong; the book is also badly structured, concepts are sometimes not introduced, and the code is not well explained.
Este libro va a lo que importa a la hora de aprender.
Julia for Data Science
Unlock this book and the full library FREE for 7 days
Start now