
Julia Cookbook: Over 40 recipes to get you up and running with programming using Julia

By Raj R Jalem, Jalem Raj Rohit
Book Sep 2016 172 pages 1st Edition

Product Details


Publication date : Sep 30, 2016
Length 172 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781785882012


Chapter 1. Extracting and Handling Data

In this chapter, we will cover the following recipes:

  • Why should we use Julia for data science?

  • Handling data with CSV files

  • Handling data with TSV files

  • Working with databases in Julia

  • Interacting with the Web

Introduction


This chapter deals with the importance of the Julia programming language for data science and its applications. It also serves as a guide to handling data in the most widely used formats and shows how to crawl and scrape data from the Internet.

Data Science pipelines that are used for production purposes need to be robust and highly fault-tolerant, without which the teams would be exposed to highly error-prone models. So, these pipelines contain a subprocess called Extract-Transform-Load (ETL), in which the Extract step involves pulling the data from a source, the Transform step involves the transforms performed on the dataset as part of the cleansing process, and the Load step is about loading the now clean data into the local databases for use in production. This chapter will also teach you how to interact with websites by sending and receiving data through HTTP requests. Ingesting data is the first step in any data science and analytics pipeline. So, this chapter will cover some of the methods through which data can be ingested into the pipeline from various data sources.
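
As a rough illustration of the three ETL stages described above, the following sketch uses only Julia's standard library. The sample records, the function names extract, transform, and load, and the in-memory Dict standing in for a production database are all assumptions made for this example, not part of the recipes in this chapter.

```julia
# Extract: pull raw records from a source (here, an in-memory CSV string
# standing in for a file or a web resource).
raw = """
name,attendance
alice,0.92
bob,
carol,0.78
"""

function extract(text::AbstractString)
    lines = split(strip(text), '\n')
    # Skip the header line and split every record into its fields.
    return [split(line, ',') for line in lines[2:end]]
end

# Transform: cleanse the data by dropping records with a missing value
# and converting the attendance field to a number.
function transform(records)
    cleaned = Tuple{String,Float64}[]
    for (name, attendance) in records
        isempty(attendance) && continue
        push!(cleaned, (String(name), parse(Float64, attendance)))
    end
    return cleaned
end

# Load: store the clean records (a Dict stands in for a database table).
function load(records)
    table = Dict{String,Float64}()
    for (name, attendance) in records
        table[name] = attendance
    end
    return table
end

table = load(transform(extract(raw)))
println(table)
```

The record for bob is dropped in the Transform step because its attendance field is empty, so only clean rows reach the Load step.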

Why should we use Julia for data science?



Data Science is simply doing science with data. It applies to a surprisingly wide range of domains, such as engineering, business, marketing, and automotive, owing to the availability of a large amount of data in all these industries from which valuable insights can be extracted and understood.

With the growth of industries, the speed, volume, and variety of the data being produced are drastically increasing. And the tools that have to deal with this data are continuously being adapted, which led to the emergence of more evolved, powerful tools such as Julia.

Julia has been growing steadily as a powerful alternative to the current data science tools. Julia's diverse range of statistical packages along with its powerful compiler features make it a very strong competitor to the current top two programming languages of data science: R and Python. However, advanced users of R and Python can use Julia alongside each of them to reap the maximum benefits from the features of both.

Julia, with its ability to compile code that looks and reads like Python into machine code that performs like C, has shown a lot of promise in generating efficient code through type inference. It is also interesting to note that even the core mathematical library of Julia is written in Julia itself. As it supports distributed parallel execution, numerical accuracy, and powerful type inference, along with Python-like readability and a diverse range of statistical packages like R, Julia is a very powerful programming language for the rapidly evolving domain of data science.
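
To make the point about type inference a little more concrete, here is a small sketch. The function name sum_of_squares is made up for this example, and Base.return_types is used only to display what the compiler infers; it is not something the recipes in this chapter depend on.

```julia
# A generic function written without any type annotations.
function sum_of_squares(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x * x
    end
    return total
end

# Julia compiles a specialized method for each concrete argument type.
println(sum_of_squares([1, 2, 3]))      # integer arithmetic
println(sum_of_squares([1.5, 2.5]))     # floating-point arithmetic

# Ask the compiler which return type it inferred for a Vector{Float64}.
println(Base.return_types(sum_of_squares, (Vector{Float64},)))
```

The same source code is compiled separately for integer and floating-point vectors, which is how dynamic-looking code can still run as specialized machine code.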

Installing and spinning up the Julia terminal is very easy, as follows:

  1. Download the Julia package suited to your operating system from  http://julialang.org/downloads/ .

  2. Then, fire up Julia's interactive session, which is also called the REPL (read-eval-print loop). The terminal displays the Julia startup banner followed by the julia> prompt.

Now, you are all set up to learn and experience Julia for Data Science.

Handling data with CSV files


In this section, we will explain ways in which you can handle files with the Comma-separated Values (CSV) file format.

Getting ready

Install the DataFrames package, which is the Julia package for working with data arrays and dataframes. The command for adding the DataFrames package to the catalog is as follows:

Pkg.add("DataFrames")

Make sure that all the installed packages are up-to-date:

Pkg.update()

How to do it...

CSV files, as the name suggests, are files whose contents are separated by commas. CSV files can be accessed and read into the REPL process by executing the following steps:

  1. Assign the local path of the file to a variable:

    s = "/Users/username/dir/iris.csv"
    
  2. The readtable() command is used to read the data from the source. The data is read in the form of a Julia DataFrame:

    iris = readtable(s)
    

Data can be written to CSV files from a Julia DataFrame using the following steps:

  1. Create a data structure with some data inside it. For example, let's create a two-dimensional dataframe to better illustrate the process of writing files of different formats using DataFrames:

    df = DataFrame(A = 1:10, B = 11:20)
    
    • The preceding command creates a two-dimensional dataframe with columns named A and B.

  2. Now, the dataframe created in Step 1 can be exported to an external CSV file by using the following command:

    writetable("data.csv", df)
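
The readtable() and writetable() functions belong to the DataFrames release that was current when this recipe was written; later releases of the package deprecated them in favour of the CSV.jl package. As a version-independent sketch of the same write-then-read round trip, the following uses the DelimitedFiles standard library instead of DataFrames. The temporary file path and the variable names are assumptions made for this example.

```julia
using DelimitedFiles

# The same columns as the dataframe above: A = 1:10 and B = 11:20.
A = collect(1:10)
B = collect(11:20)

# Write a header row followed by the data rows, separated by commas.
path = joinpath(mktempdir(), "data.csv")
open(path, "w") do io
    writedlm(io, ["A" "B"], ',')
    writedlm(io, hcat(A, B), ',')
end

# Read the file back; header = true returns the data and the column names.
data, header = readdlm(path, ',', Int; header = true)
println(vec(header))
println(size(data))
```

Here, data is a plain matrix rather than a dataframe, so columns are addressed by position (for example, data[:, 1]) instead of by name.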
    

Handling data with TSV files


In this section, we will explain how to handle Tab Separated Values (TSV) files.

Getting ready

The DataFrames package is needed to deal with TSV files. So, as it is already installed as instructed in the previous section, we can move ahead and make sure that all the packages are up-to-date with the following command:

Pkg.update()

How to do it...

TSV files, as the name suggests, are files whose contents are separated by tabs. TSV files can be accessed and read into the REPL process by the following method:

  1. Assign the local path of the file to a variable:

    s = "/Users/username/dir/data.tsv"
    
  2. The readtable() command is used to read the data from the source. The data is read in the form of a Julia DataFrame:

    data = readtable(s)
    

Data can be written to TSV files from a Julia DataFrame using the following steps:

  1. Create a data structure with some data inside it. For example, let's create a two-dimensional dataframe like the one we created in the previous example:

    using DataFrames
    df = DataFrame(A = 1:10, B = 11:20)
    
  2. Now, the dataframe, which we created in Step 1, can be exported to an external TSV file using the following command:

    writetable("data.tsv", df)
    

The writetable() command is clever enough to make out the format of the file from the filename extension.
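
The idea of choosing the separator from the filename extension can be sketched as follows. The helpers delimiter_for and write_delimited are hypothetical names invented for this example, built on the DelimitedFiles standard library; this is not how writetable() itself is implemented.

```julia
using DelimitedFiles

# Hypothetical helper: choose the separator from the filename extension.
function delimiter_for(filename::AbstractString)
    ext = lowercase(splitext(filename)[2])
    ext == ".csv" && return ','
    ext == ".tsv" && return '\t'
    error("unsupported file extension: $ext")
end

# Write a matrix using the separator implied by the filename.
function write_delimited(filename::AbstractString, rows::AbstractMatrix)
    writedlm(filename, rows, delimiter_for(filename))
end

rows = hcat(1:3, 11:13)
path = joinpath(mktempdir(), "data.tsv")
write_delimited(path, rows)
print(read(path, String))
```

Because the path ends in .tsv, the values in each row are separated by a tab character; renaming the file to data.csv would produce commas instead.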

Working with databases in Julia


In this section, we will explain ways to handle data stored in databases: MySQL, PostgreSQL, and SQLite.

Getting ready

MySQL is an open source relational database. To be able to interact with your MySQL databases from Julia, the database server (along with the relevant Julia package) needs to be installed. Assuming that the database is already set up and the MySQL session is already up and running, install the MySQL bindings for Julia by directly cloning the repository:

Pkg.clone("https://github.com/JuliaComputing/MySQL.jl")

PostgreSQL is an open source object relational database. Similar to the MySQL setup, the server of the PostgreSQL database should be up and running with a session. Now, install the PostgreSQL bindings for Julia by following the given instructions:

  1. Install the DBI package. The DBI package is a database-independent API that complies with almost all database drivers.

  2. The DBI package from Julia can be installed by directly cloning it from its repository using the following statement:

    Pkg.clone("https://github.com/JuliaDB/DBI.jl")
    
  3. Then, install the PostgreSQL library by directly cloning the library's repository using the following statement:

    Pkg.clone("https://github.com/JuliaDB/PostgreSQL.jl")
    
SQLite is a light, server-less, self-contained, transactional SQL database engine. To interact with data in SQLite databases, first install SQLite and make sure that its command-line shell starts and displays a prompt. Then, add the SQLite bindings for Julia by running the following standard package installation command:

Pkg.add("SQLite")
      

How to do it...

Here, you will learn about connecting to databases and executing queries to manipulate and analyze data. You will also learn about the various protocols and libraries in Julia that will help you interact with databases.

MySQL

A connection to a MySQL database can be made with a simple command that takes in the host, username, password, and database name as parameters. Let's take a look at the following steps:

  1. First, import the MySQL package:

    using MySQL
    
  2. Set up the connection to a MySQL database by including all the required parameters to establish a connection:

    conn = mysql_connect(host, user_name, password, dbname)
    
  3. Now, let's write and run a basic table creation query:

    1. Assign the query statement to a variable.

      query = """ CREATE TABLE Student
                       (
                           ID INT NOT NULL AUTO_INCREMENT,
                           Name VARCHAR(255),
                           Attendance FLOAT,
                           JoinDate DATE,
                           Enrolments INT,
                           PRIMARY KEY (ID)
                       );"""
      
    2. Now, to make sure that the query was executed successfully, we can capture the response returned for it.

      response = mysql_query(conn, query)
      
    3. Check the response through conditional statements; a value of 0 indicates success:

      if response == 0
          println("Connection successful. Table created")
      else
          println("Connection failed. Table not created.")
      end
      
  4. Queries on the database can be executed by the execute_query() command, which takes the connection variable and the query as parameters. A sample SELECT query can be executed through the following steps:

    query = """SELECT * FROM Student;"""
    data = execute_query(conn, query)
    
  5. To get the query results in the form of a Julia array, an extra parameter called opformat should be specified:

    data_array = execute_query(conn, query, opformat = MYSQL_ARRAY)
    
  6. Finally, to execute multiple queries at once, use the mysql_execute_multi_query() command:

    query = """INSERT INTO Student (Name) VALUES ('');
    UPDATE Student SET JoinDate = '08-07-15' WHERE LENGTH(Name) > 5;"""
    rows = mysql_execute_multi_query(conn, query)
    println("Rows updated by the query: $rows")
    

PostgreSQL

Data handling within a PostgreSQL database can be done by connecting to the database. Firstly, make sure that the database server is up and running. Now, the data in the database can be handled through the following procedure:

  1. Firstly, import the requisite packages, which are DBI and PostgreSQL, with the using statements:

    using DBI
    using PostgreSQL
    
  2. In addition, the required packages for the PostgreSQL library are as follows:

    • DataFrames.jl: This has already been installed previously.

    • DataArrays.jl: This can be installed by running the statement Pkg.add("DataArrays").

  3. Make a connection to a PostgreSQL database of your choice. It is done through the connect function, which takes in the type of database, the host, the username, the password, the database name, and the port number as input parameters. So, the connection can be established using the following statement:

    conn = connect(Postgres, "localhost", "username", "password", "testdb", 5432)
    
  4. If the connection is successful, a message similar to this appears on the screen:

    PostgreSQL.PostgresDatabaseHandle(Ptr{Void} @0x00007fa8a559f160,0x00000000,false)
    
  5. Now, prepare the query and tag it to the connection we prepared in the previous step. This can be done using the prepare function, which takes the connection and the query as parameters. So, the execution statement looks something like this:

    query = prepare(conn, "SELECT 1::int, 2.0::double precision, " *
            "'name'::character varying, 'name'::character(20);")
    
  6. As the query is prepared, let's now execute it, just like we did for MySQL. To do this, we have to enter the query variable, which we created in the previous step, into the execute function. It is done as follows:

    result = execute(query)
    
  7. Now that the query execution is over, the connection can be disconnected using the finish and disconnect functions, which take the query and the connection variables as the input parameters, respectively. The statements can be executed as follows:

    finish(query)
    disconnect(conn)
    
  8. Now, the results of the query are in the result variable, which can be used for analytics by either moulding it into a dataframe or any other data structure of your choice. The same method can be used for all operations on PostgreSQL databases, which include addition, updating, and deleting.

SQLite

  1. Import the SQLite package into the current session and ensure that SQLite is installed; as SQLite is server-less, no database server needs to be running. The package can be imported by running the following command:

    using SQLite
    
  2. Now, a connection to any database can be made through the SQLiteDB() function in Julia version 0.3 and the SQLite.DB() function in Julia version 0.4.

  3. The connection can be made in Julia version 0.4 as follows:

    db = SQLite.DB("dbname.sqlite")
    
  4. The connection can be made in Julia version 0.3 as follows:

    db = SQLiteDB("dbname.sqlite")
    
  5. Now, as the connection is made, queries can be executed using the query() function in version 0.3 and the SQLite.query() function in version 0.4.

    • In version 0.3:

      query(db, "A SQL query")
      
    • In version 0.4:

      SQLite.query(db, "A SQL query")
      

The SQLite.jl package also allows the user to use macros and registers for manipulating and using data. However, the concepts are beyond the scope of this chapter.

So, these are some of the ways through which data can be handled in Julia. There are a lot of databases whose connectors plug directly into DBI, such as SQLite, MySQL, and so on, and through which queries and their execution can be carried out, as shown in the PostgreSQL section. Similarly, data can be scraped from the Internet and used for analytics, which can be achieved through a combination of Julia libraries, but that is beyond the scope of this book.
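
Once a query has returned its rows, they usually need to be moulded into a structure that is convenient for analytics, as mentioned at the end of the PostgreSQL steps. The following sketch shows one way to do this with plain Julia. The column names, the sample rows, and the helper to_columns are assumptions made for this example; the actual shape of a result depends on the database driver in use.

```julia
# Hypothetical query result: one tuple per row, in column order.
column_names = [:ID, :Name, :Attendance]
rows = [(1, "abc", 0.9), (2, "xyz", 0.75)]

# Turn row-oriented results into a column-oriented table (a Dict that
# maps each column name to a vector of that column's values).
function to_columns(names, rows)
    return Dict(name => [row[i] for row in rows] for (i, name) in enumerate(names))
end

table = to_columns(column_names, rows)
println(table[:Name])
println(sum(table[:Attendance]) / length(rows))   # mean attendance
```

A column-oriented layout like this makes per-column statistics straightforward and maps naturally onto a dataframe, where each column is also stored as a vector.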

There's more...

MySQL

The following resource helps you learn more about its advanced features and provides information about the MySQL.jl library of Julia. This includes performance benchmarks and details, as well as information on CRUD and testing:

https://github.com/JuliaDB/MySQL.jl

PostgreSQL

Visit https://github.com/JuliaDB/DBI.jl to better understand the DBI that we use to connect to local PostgreSQL databases.

Visit https://github.com/JuliaDB/PostgreSQL.jl for extended and in-depth documentation on the PostgreSQL.jl library, which includes dealing with Amazon web services, and so on.

SQLite

Now, as you have learned the ways in which data can be extracted, manipulated, and worked on from various external sources, there are some more interesting things that the database drivers of Julia can do apart from just executing queries. You can find those at https://github.com/JuliaDB/SQLite.jl/blob/master/OLD_README.md#custom-scalar-functions .

Interacting with the Web


In this section, you will learn how to interact with the Web through HTTP requests, both for getting data and posting data to the Web. You will learn about sending and getting requests to and from websites and also analyzing those responses.

Getting ready

Start by downloading and installing the Requests.jl package of Julia, which can be added by running Pkg.add("Requests").

Make sure that you have an active Internet connection while reading and using the code in the recipe, as it deals with interacting with live websites on the Web. You can experiment with this recipe on the website http://httpbin.org , as it is designed especially for such experiments and tutorials.

This is how you use the Requests.jl package and import the required functions:

  1. Start by installing the package, if you have not already done so:

    Pkg.add("Requests")
    
  2. Next, import the necessary functions from the package for quick use. The package provides get, post, put, and delete; this recipe uses get and post. So, this is how to import them:

    import Requests: get, post
    

How to do it...

Here, you will learn how to interact with the Web through the HTTP protocol and requests. You will also learn how to send and receive data, and autofill forms on the Internet, through HTTP requests.

GET request

  1. The GET request is used to request data from a specified web resource. So, this is how we send the GET request to a website:

    get("url of the website")
    
  2. To request a specific web page inside the website, the query parameter of the GET command can be used to specify the web page. This is how you do it:

    get("url of the website"; query = Dict("title" => 
            "page number/page name"))
    
  3. Timeouts can also be set for the GET requests. This would be useful for identifying unresponsive websites/web pages. The timeout parameter in the GET request takes a numeric value, in seconds, to be set as the timeout threshold; if the server does not return any data within this time, a timeout error will be thrown. This is how you set it:

    get("url of the website"; timeout = 0.5)
    
    • Here, 0.5 means 0.5 seconds, that is, 500 ms.

  4. Some websites redirect users to different web pages or sometimes to different websites. So, to avoid getting your request repeatedly redirected, you can set the max_redirects and allow_redirects parameters in the GET request. This is how they can be set:

    get("url of the website"; max_redirects = 4)
    
  5. Now, set the allow_redirects parameter to false to prevent the site from redirecting your GET requests:

    get("url of the website"; allow_redirects = false)
    
    • This would not allow the website to redirect your GET request. If a redirect is triggered, it throws an error.

POST request

  1. The POST request submits data to a specific web resource. So, this is how to send a POST request to a website:

    post("url of the website")
    
  2. Data can be sent to a web resource through the POST request by adding it into the data parameter in the POST request statement:

    post("url of the website"; data = "Data to be sent")
    
  3. Data for filling forms on the Web can also be sent through the POST request through the same data parameter, but the data should now be sent in the form of a Julia dictionary data structure:

    post("url of the website"; data = Dict("First_Name" => "abc",
            "Last_Name" => "xyz"))
    
  4. Data such as session cookies can also be sent through the POST request by including the session details inside a Julia dictionary and including it in the POST request as the cookies parameter:

    post("url of the website"; cookies = Dict("sessionkey" => "key"))
    
  5. Files can also be sent to web resources through the POST requests. This can be done by including the files in the files parameter of the POST request:

    file = "xyz.jl"
    post("url of the website"; files = [FileParam(file, "text/julia",
            "file_name", "file_name.jl")])
    

There's more...

There are more HTTP requests with which you can interact with web resources, such as the PUT and DELETE requests. All of them can be studied in detail from the documentation for the Requests.jl package, which is available at  https://github.com/JuliaWeb/Requests.jl .
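
The query and data parameters used in this recipe both take a Julia dictionary that ends up as key=value pairs in the request. The following sketch shows what that serialization looks like on the wire, using only the standard library. The helpers percent_encode and encode_params are hypothetical names written for this example and are not the internals of Requests.jl; note also that HTML forms conventionally encode a space as + rather than %20.

```julia
# Percent-encode every byte that is not an unreserved URL character.
function percent_encode(s::AbstractString)
    io = IOBuffer()
    for b in codeunits(s)
        if (b in UInt8('A'):UInt8('Z')) || (b in UInt8('a'):UInt8('z')) ||
           (b in UInt8('0'):UInt8('9')) ||
           (b in (UInt8('-'), UInt8('_'), UInt8('.'), UInt8('~')))
            write(io, b)
        else
            print(io, '%', uppercase(string(b, base = 16, pad = 2)))
        end
    end
    return String(take!(io))
end

# Serialize a dictionary as key=value pairs joined by '&'.
# Keys are sorted so that the output is deterministic.
function encode_params(params::AbstractDict)
    sorted = sort(collect(params); by = first)
    return join(("$(percent_encode(string(k)))=$(percent_encode(string(v)))"
                 for (k, v) in sorted), "&")
end

query = encode_params(Dict("title" => "page 1"))
println("http://httpbin.org/get?" * query)

form = encode_params(Dict("First_Name" => "abc", "Last_Name" => "xyz"))
println(form)
```

For a GET request, a string like this is appended to the URL after a question mark; for a form submission, it travels in the body of the POST request.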


Key benefits

  • Follow a practical approach to learn Julia programming the easy way
  • Get an extensive coverage of Julia’s packages for statistical analysis
  • This recipe-based approach will help you get familiar with the key concepts in Julia

Description

Want to handle everything that Julia can throw at you and get the most out of it every day? This practical guide to programming with Julia for performing numerical computation will make you more productive and able to work with data more efficiently. The book starts with the main features of Julia to help you quickly refresh your knowledge of functions, modules, and arrays. We’ll also show you how to utilize the Julia language to identify, retrieve, and transform data sets so you can perform data analysis and data manipulation. Later on, you’ll see how to optimize data science programs with parallel computing and memory allocation. You’ll get familiar with the concepts of package development and networking to solve numerical problems using the Julia platform. This book includes recipes on identifying and classifying data science problems, data modelling, data analysis, data manipulation, meta-programming, multidimensional arrays, and parallel computing. By the end of the book, you will acquire the skills to work more effectively with your data.

What you will learn

  • Extract and handle your data with Julia
  • Uncover the concepts of metaprogramming in Julia
  • Conduct statistical analysis with StatsBase.jl and Distributions.jl
  • Build your data science models
  • Find out how to visualize your data with Gadfly
  • Explore big data concepts in Julia


Table of Contents

12 Chapters
  • Julia Cookbook
  • Credits
  • About the Author
  • About the Reviewer
  • www.PacktPub.com
  • Preface
  • Extracting and Handling Data
  • Metaprogramming
  • Statistics with Julia
  • Building Data Science Models
  • Working with Visualizations
  • Parallel Computing
