Julia Cookbook

Julia Cookbook: Over 40 recipes to get you up and running with programming using Julia

By Raj R Jalem , Jalem Raj Rohit

Product Details


Publication date : Sep 30, 2016
Length 172 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781785882012

Julia Cookbook

Chapter 1. Extracting and Handling Data

In this chapter, we will cover the following recipes:

  • Why should we use Julia for data science?

  • Handling data with CSV files

  • Handling data with TSV files

  • Working with databases in Julia

  • Interacting with the Web

Introduction


This chapter deals with the importance of the Julia programming language for data science and its applications. It also serves as a guide to handling data in the most common formats and shows how to retrieve data from the Internet.

Data science pipelines used in production need to be robust and highly fault-tolerant; otherwise, teams are exposed to highly error-prone models. These pipelines therefore contain a subprocess called Extract-Transform-Load (ETL): the Extract step pulls the data from a source, the Transform step applies the transformations that cleanse the dataset, and the Load step loads the now-clean data into local databases for use in production. Extraction is the first step in any data science and analytics pipeline, so this chapter covers several methods for ingesting data into the pipeline from various data sources. This chapter will also teach you how to interact with websites by sending and receiving data through HTTP requests.
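The three ETL steps can be sketched in a few lines of plain Julia. This is only an illustration under simple assumptions: an in-memory string stands in for the data source, a Dict stands in for the production database, and the names extract, transform, and load! are made up for this example.

```julia
# Extract: an in-memory string stands in for a file or remote source (assumption).
raw = """
name,attendance
alice,0.92
bob,
carol,0.81
"""

# Split the text into records, skipping the header line.
extract(text) = [split(line, ',') for line in split(strip(text), '\n')[2:end]]

# Transform: drop records with a missing attendance value and parse the rest.
function transform(records)
    cleaned = Tuple{String,Float64}[]
    for (name, attendance) in records
        isempty(attendance) && continue
        push!(cleaned, (String(name), parse(Float64, attendance)))
    end
    return cleaned
end

# Load: a Dict stands in for the production database (assumption).
function load!(store, rows)
    for (name, attendance) in rows
        store[name] = attendance
    end
    return store
end

store = load!(Dict{String,Float64}(), transform(extract(raw)))
```

Here the record for bob is discarded during the Transform step because its attendance field is empty, so only clean rows reach the store.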

Why should we use Julia for data science?



Data Science is simply doing science with data. It applies to a surprisingly wide range of domains, such as engineering, business, marketing, and automotive, owing to the availability of a large amount of data in all these industries from which valuable insights can be extracted and understood.

With the growth of industries, the speed, volume, and variety of the data being produced are drastically increasing. And the tools that have to deal with this data are continuously being adapted, which led to the emergence of more evolved, powerful tools such as Julia.

Julia has been growing steadily as a powerful alternative to the current data science tools. Julia's diverse range of statistical packages along with its powerful compiler features make it a very strong competitor to the current top two programming languages of data science: R and Python. However, advanced users of R and Python can use Julia alongside each of them to reap the maximum benefits from the features of both.

Julia compiles code that looks and reads like Python into machine code that performs like C, and its use of type inference to generate efficient code has shown a lot of promise. It is also interesting to note that even the core mathematical library of Julia is written in Julia itself. Because it supports distributed parallel execution, numerical accuracy, and powerful type inference while reading like Python, and offers a diverse range of statistical packages like R, Julia is a very powerful programming language for the rapidly evolving domain of data science.
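As a small, hedged illustration of the type inference mentioned above, the following sketch defines a function without any type annotations and asks the compiler what return type it infers for a vector of integers. The function name square_sum is invented for this example, and Base.return_types is an introspection helper rather than something production code would normally call.

```julia
# A generic function with no type annotations, much like Python.
square_sum(xs) = sum(x -> x^2, xs)

# Ask the compiler which return type it infers for a Vector{Int} argument.
inferred = Base.return_types(square_sum, (Vector{Int},))

# Calling the function triggers compilation of a specialised method.
total = square_sum([1, 2, 3])   # 1 + 4 + 9
```

Because the return type is inferred as a concrete integer type, the compiler can emit specialised machine code for this call instead of dispatching dynamically on every operation.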

Installing and spinning up the Julia terminal is very easy, as follows:

  1. Download the Julia package suited to your operating system from  http://julialang.org/downloads/ .

  2. Then, fire up Julia's interactive session, which is also called the REPL (read-eval-print loop). The terminal shows the Julia startup banner followed by the julia> prompt.

Now, you are all set up to learn and experience Julia for Data Science.

Handling data with CSV files


In this section, we will explain ways in which you can handle files with the Comma-separated Values (CSV) file format.

Getting ready

Install the DataFrames package, which is the Julia package for working with data arrays and dataframes. The command for adding the DataFrames package is as follows:

Pkg.add("DataFrames")

Make sure that all the installed packages are up-to-date: Pkg.update()

How to do it...

CSV files, as the name suggests, are files whose contents are separated by commas. CSV files can be accessed and read into the REPL process by executing the following steps:

  1. Assign the path of the local source file to a variable:

    s = "/Users/username/dir/iris.csv"
    
  2. The readtable() command, provided by the DataFrames package, is used to read the data from the source. The data is read in the form of a Julia DataFrame:

    using DataFrames
    iris = readtable(s)
    

Data can be written to CSV files from a Julia DataFrame using the following steps:

  1. Create a data structure with some data inside it. For example, let's create a two-dimensional dataframe to better see the process of writing files of different formats using DataFrames:

    df = DataFrame(A = 1:10, B = 11:20)
    
    • The preceding command creates a two-dimensional dataframe with columns named A and B.

  2. Now, the dataframe created in Step 1 can be exported to an external CSV file by using the following command:

    writetable("data.csv", df)
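Later releases of DataFrames dropped readtable() and writetable(), so on a current Julia installation the same round trip is usually done with the separate CSV.jl package. The following is a minimal sketch under that assumption (both CSV and DataFrames installed through the package manager); the temporary path and the variable name roundtrip are only for illustration.

```julia
using CSV, DataFrames   # assumed installed, e.g. via Pkg.add(["CSV", "DataFrames"])

df = DataFrame(A = 1:10, B = 11:20)

# Write to a temporary directory so the example leaves no files behind.
path = joinpath(mktempdir(), "data.csv")
CSV.write(path, df)

# The second argument tells CSV.read which table type to materialise.
roundtrip = CSV.read(path, DataFrame)
```

The dataframe read back has the same two columns, A and B, and the same ten rows as the one that was written.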
    

Handling data with TSV files


In this section, we will explain how to handle Tab Separated Values (TSV) files.

Getting ready

The DataFrames package is needed to deal with TSV files. So, as it is already installed as instructed in the previous section, we can move ahead and make sure that all the packages are up-to-date with the following command:

Pkg.update()

How to do it...

TSV files, as the name suggests, are files whose contents are separated by tabs. TSV files can be accessed and read into the REPL process by the following method:

  1. Assign the path of the local source file to a variable:

    s = "/Users/username/dir/data.tsv"
    
  2. The readtable() command is used to read the data from the source. The data is read in the form of a Julia DataFrame:

    data = readtable(s)
    

Data can be written to TSV files from a Julia DataFrame using the following steps:

  1. Create a data structure with some data inside it. For example, let's create a two-dimensional dataframe like the one we created in the previous example:

    using DataFrames
    df = DataFrame(A = 1:10, B = 11:20)
    
  2. Now, the dataframe, which we created in Step 1, can be exported to an external TSV file using the following command:

    writetable("data.tsv", df)
    

The writetable() command is clever enough to make out the format of the file from the filename extension.
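For small tab-separated files, Julia's bundled DelimitedFiles library is an alternative that needs no extra packages. The sketch below is not the DataFrames approach of this recipe; it works on plain matrices, and the variable names are chosen only for illustration.

```julia
using DelimitedFiles   # ships with Julia

header = ["A" "B"]      # 1x2 row of column names
table = [1:10 11:20]    # 10x2 integer matrix

path = joinpath(mktempdir(), "data.tsv")

# writedlm uses a tab as its default delimiter.
writedlm(path, [header; table])

# header = true splits the first row off from the numeric data.
data, names_row = readdlm(path, '\t', header = true)
```

Unlike writetable(), writedlm() does not look at the file extension: the delimiter is whatever is passed as the optional third argument, with the tab character as the default.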

Working with databases in Julia


In this section, we will explain ways to handle data stored in databases: MySQL, PostgreSQL, and SQLite.

Getting ready

MySQL is an open source relational database. To be able to interact with your MySQL databases from Julia, the database server (along with the relevant Julia package) needs to be installed. Assuming that the database is already set up and the MySQL session is already up and running, install the MySQL bindings for Julia by directly cloning the repository:

Pkg.clone("https://github.com/JuliaComputing/MySQL.jl")

PostgreSQL is an open source object relational database. Similar to the MySQL setup, the server of the PostgreSQL database should be up and running with a session. Now, install the PostgreSQL bindings for Julia by following the given instructions:

  1. Install the DBI package. The DBI package is a database-independent API that works with most database drivers. It can be installed by directly cloning it from its repository using the following statement:

    Pkg.clone("https://github.com/JuliaDB/DBI.jl")
    
  2. Then, install the PostgreSQL library by directly cloning the library's repository using the following statement:

    Pkg.clone("https://github.com/JuliaDB/PostgreSQL.jl")
    

SQLite is a light, server-less, self-contained, transactional SQL database engine. Because it has no server, it is enough to install SQLite and check that its command-line shell starts and displays the sqlite> prompt. Now, the SQLite bindings for Julia can be installed by running the following standard package installation command:

    Pkg.add("SQLite")
      

How to do it...

Here, you will learn about connecting to databases and executing queries to manipulate and analyze data. You will also learn about the various protocols and libraries in Julia that will help you interact with databases.

MySQL

A connection to a MySQL database can be opened with a simple command that takes in the host, username, password, and database name as parameters. Let's take a look at the following steps:

  1. First, import the MySQL package:

    using MySQL
    
  2. Set up the connection to a MySQL database by including all the required parameters to establish a connection:

    conn = mysql_connect(host, user_name, password, dbname)
    
  3. Now, let's write and run a basic table creation query:

    1. Assign the query statement to a variable.

      query = """ CREATE TABLE Student
                       (
                           ID INT NOT NULL AUTO_INCREMENT,
                           Name VARCHAR(255),
                           Attendance FLOAT,
                           JoinDate DATE,
                           Enrolments INT,
                           PRIMARY KEY (ID)
                       );"""
      
    2. Now, to make sure that the query ran successfully, capture the response returned by the connection.

      response = mysql_query(conn, query)
      
    3. Check whether the query succeeded through conditional statements:

      if response == 0
          println("Query successful. Table created.")
      else
          println("Query failed. Table not created.")
      end
      
  4. Queries on the database can be executed by the execute_query() command, which takes the connection variable and the query as parameters. A sample SELECT query can be executed through the following steps:

    query = """SELECT * FROM Student;"""
    data = execute_query(conn, query)
    
  5. To get the query results in the form of a Julia array, an extra parameter called opformat should be specified:

    data_array = execute_query(conn, query, opformat = MYSQL_ARRAY)
    
  6. Finally, to execute multiple queries at once, use the mysql_execute_multi_query() command:

    query = """INSERT INTO Student (Name) VALUES ('');
    UPDATE Student SET JoinDate = '08-07-15' WHERE LENGTH(Name) > 5;"""
    rows = mysql_execute_multi_query(conn, query)
    println("Rows updated by the query: $rows")
    

PostgreSQL

Data handling within a PostgreSQL database can be done by connecting to the database. Firstly, make sure that the database server is up and running. Now, the data in the database can be handled through the following procedure:

  1. Firstly, import the requisite packages, which are the DBI and PostgreSQL packages, using the following statements:

    using DBI
    using PostgreSQL
    
  2. In addition, the required packages for the PostgreSQL library are as follows:

    • DataFrames.jl: This has already been installed previously.

    • DataArrays.jl: This can be installed by running the statement Pkg.add("DataArrays").

  3. Make a connection to a PostgreSQL database of your choice. It is done through the connect function, which takes in the type of database, the host, the username, the password, the database name, and the port number as input parameters. So, the connection can be established using the following statement:

    conn = connect(Postgres, "localhost", "username", "password", "testdb", 5432)
    
  4. If the connection is successful, a message similar to this appears on the screen:

    PostgreSQL.PostgresDatabaseHandle(Ptr{Void} @0x00007fa8a559f160,0x00000000,false)
    
  5. Now, prepare the query and tag it to the connection we prepared in the previous step. This can be done using the prepare function, which takes the connection and the query as parameters. So, the preparation statement looks something like this:

    query = prepare(conn, "SELECT 1::int, 2.0::double precision, 'name'::character varying, " *
                          "'name'::character(20);")
    
  6. As the query is prepared, let's now execute it, just like we did for MySQL. To do this, we have to enter the query variable, which we created in the previous step, into the execute function. It is done as follows:

    result = execute(query)
    
  7. Now that the query execution is over, the connection can be disconnected using the finish and disconnect functions, which take the query and the connection variables as the input parameters, respectively. The statements can be executed as follows:

    finish(query)
    disconnect(conn)
    
  8. Now, the results of the query are in the result variable, which can be used for analytics by either moulding it into a dataframe or any other data structure of your choice. The same method can be used for all operations on PostgreSQL databases, which include addition, updating, and deleting.

SQLite

  1. Import the SQLite package into the current session. SQLite runs in-process, so there is no server to start. The package can be imported by running the following command:

    using SQLite
    
  2. Now, a connection to any database can be made through the SQLiteDB() function in Julia 0.3 and the SQLite.DB() function in Julia 0.4.

  3. The connection can be made in Julia 0.4 as follows:

    db = SQLite.DB("dbname.sqlite")
    
  4. The connection can be made in Julia 0.3 as follows:

    db = SQLiteDB("dbname.sqlite")
    
  5. Now, as the connection is made, queries can be executed using the query() function in Julia 0.3 and the SQLite.query() function in Julia 0.4.

    • In Julia 0.3:

      query(db, "A SQL query")
      
    • In Julia 0.4:

      SQLite.query(db, "A SQL query")
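The calls above reflect the SQLite.jl API of that period. In more recent releases of SQLite.jl, queries typically go through the DBInterface.execute function instead. The following is a rough sketch under that assumption, using an in-memory database; the Student table mirrors the MySQL example, and the inserted rows are made-up sample data.

```julia
using SQLite   # assumed: a recent SQLite.jl, which re-exports the DBInterface module

db = SQLite.DB()   # no filename, so the database lives in memory

DBInterface.execute(db, """
    CREATE TABLE Student (
        ID INTEGER PRIMARY KEY,
        Name TEXT,
        Attendance REAL
    )""")

# Positional ? placeholders keep the values out of the SQL text.
insert_sql = "INSERT INTO Student (Name, Attendance) VALUES (?, ?)"
DBInterface.execute(db, insert_sql, ("Asha", 0.92))
DBInterface.execute(db, insert_sql, ("Ravi", 0.81))

# Rows are only valid while iterating, so copy out the fields that are needed.
rows = DBInterface.execute(db, "SELECT Name, Attendance FROM Student ORDER BY Name")
students = [(row.Name, row.Attendance) for row in rows]
```

Passing values as parameters rather than interpolating them into the query string avoids quoting mistakes and SQL injection.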
      

The SQLite.jl package also allows the user to use macros and registers for manipulating and using data. However, the concepts are beyond the scope of this chapter.

So, these are some of the ways through which data can be handled in Julia. There are a lot of databases whose connectors directly connect to DBI, such as SQLite, MySQL, and so on, and through which queries and their execution can be carried out, as shown in the PostgreSQL section. Similarly, data can be scraped from the Internet and used for analytics, which can be achieved through a combination of Julia libraries, but that is beyond the scope of this book.

There's more...

MySQL

The following resource helps you learn more about its advanced features and provides information about the MySQL.jl library of Julia. This includes performance benchmarks and details, as well as information on CRUD and testing:

https://github.com/JuliaDB/MySQL.jl

PostgreSQL

Visit https://github.com/JuliaDB/DBI.jl to better understand the DBI we use to connect to local PostgreSQL databases.

Visit https://github.com/JuliaDB/PostgreSQL.jl for extended and in-depth documentation on the PostgreSQL.jl library, which includes dealing with Amazon web services, and so on.

SQLite

Now, as you have learned the ways in which data can be extracted, manipulated, and worked on from various external sources, there are some more interesting things that the database drivers of Julia can do apart from just executing queries. You can find those at https://github.com/JuliaDB/SQLite.jl/blob/master/OLD_README.md#custom-scalar-functions .

Interacting with the Web


In this section, you will learn how to interact with the Web through HTTP requests, both for getting data and posting data to the Web. You will learn about sending requests to websites, receiving their responses, and analyzing those responses.

Getting ready

Start by downloading and installing the Requests.jl package of Julia, which can be installed by running Pkg.add("Requests").

Make sure that you have an active Internet connection while reading and using the code in the recipe, as it deals with interacting with live websites on the Web. You can experiment with this recipe on the website http://httpbin.org , as it is designed especially for such experiments and tutorials.

This is how you use the Requests.jl package and import the required functions:

  1. Start by importing the package:

    using Requests
    
  2. Next, import the necessary functions from the package for quick use. The functions that will be used in this recipe are get, post, put, and delete. So, this is how to import them:

    import Requests: get, post, put, delete
    

How to do it...

Here, you will learn how to interact with the Web through the HTTP protocol and requests. You will also learn how to send and receive data, and autofill forms on the Internet, through HTTP requests.

GET request

  1. The GET request is used to request data from a specified web resource. So, this is how we send the GET request to a website:

    get("url of the website")
    
  2. To request a specific web page inside the website, the query parameter of the GET command can be used to specify the web page. This is how you do it:

    get("url of the website"; query = Dict("title" => 
            "page number/page name"))
    
  3. Timeouts can also be set for the GET requests. This is useful for identifying unresponsive websites or web pages. The timeout parameter takes a numeric value, in seconds, as the threshold; if the server does not return any data within that time, a timeout error is thrown. This is how you set it:

    get("url of the website"; timeout = 0.5)
    
    • Here, 0.5 means 0.5 seconds (500 ms).

  4. Some websites redirect users to different web pages or sometimes to different websites. So, to avoid getting your request repeatedly redirected, you can set the max_redirects and allow_redirects parameters in the GET request. This is how they can be set:

    get("url of the website"; max_redirects = 4)
    
  5. To stop the site from redirecting your GET requests altogether, set the allow_redirects parameter to false:

    get("url of the website"; allow_redirects = false)
    
    • This would not allow the website to redirect your GET request. If a redirect is triggered, it throws an error.

POST request

  1. The POST request submits data to a specific web resource. So, this is how to send a POST request to a website:

    post("url of the website")
    
  2. Data can be sent to a web resource through the POST request by adding it into the data parameter in the POST request statement:

    post("url of the website"; data = "Data to be sent")
    
  3. Data for filling forms on the Web can also be sent through the same data parameter, but the data should now be sent in the form of a Julia dictionary:

    post("url of the website"; data = Dict("First_Name" => "abc",
            "Last_Name" => "xyz"))
    
  4. Data such as session cookies can also be sent through the POST request by including the session details inside a Julia dictionary and passing it as the cookies parameter:

    post("url of the website"; cookies = Dict("sessionkey" => "key"))
    
  5. Files can also be sent to web resources through the POST requests. This can be done by wrapping each file in a FileParam and passing the list in the files parameter of the POST request:

    file = "xyz.jl"
    post("url of the website"; files = [FileParam(open(file, "r"), "text/julia",
            "file_name", "file_name.jl")])
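Requests.jl has since been superseded by the HTTP.jl package, so on a current Julia installation the GET and POST steps above would look roughly like the following sketch. It assumes HTTP.jl is installed and uses httpbin.org as the test site; the helper names fetch_page and submit_form are invented for this example, and nothing is sent over the network until one of them is called.

```julia
using HTTP   # assumed installed; a maintained successor to Requests.jl

const BASE_URL = "http://httpbin.org"

# GET with a query string, at most four redirects, and a read timeout in seconds.
fetch_page(title) = HTTP.get("$BASE_URL/get";
                             query = Dict("title" => title),
                             redirect_limit = 4,
                             readtimeout = 5)

# POST form fields as a URL-encoded body with the matching Content-Type header.
function submit_form(first_name, last_name)
    form = HTTP.escapeuri(Dict("First_Name" => first_name, "Last_Name" => last_name))
    headers = ["Content-Type" => "application/x-www-form-urlencoded"]
    return HTTP.post("$BASE_URL/post", headers, form)
end
```

With an active Internet connection, response = fetch_page("page1") returns a response object whose status code is available as response.status and whose body can be read with String(response.body).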
    

There's more...

There are more HTTP requests with which you can interact with web resources such as the PUT and DELETE requests. All of them can be studied in detail from the documentation for the Requests.jl package, which is available at  https://github.com/JuliaWeb/Requests.jl .


Key benefits

  • Follow a practical approach to learn Julia programming the easy way
  • Get an extensive coverage of Julia’s packages for statistical analysis
  • This recipe-based approach will help you get familiar with the key concepts in Julia

Description

Want to handle everything that Julia can throw at you and get the most out of it every day? This practical guide to programming with Julia for performing numerical computation will make you more productive and able to work with data more efficiently. The book starts with the main features of Julia to help you quickly refresh your knowledge of functions, modules, and arrays. We’ll also show you how to utilize the Julia language to identify, retrieve, and transform data sets so you can perform data analysis and data manipulation. Later on, you’ll see how to optimize data science programs with parallel computing and memory allocation. You’ll get familiar with the concepts of package development and networking to solve numerical problems using the Julia platform. This book includes recipes on identifying and classifying data science problems, data modelling, data analysis, data manipulation, meta-programming, multidimensional arrays, and parallel computing. By the end of the book, you will acquire the skills to work more effectively with your data.

What you will learn

  • Extract and handle your data with Julia
  • Uncover the concepts of metaprogramming in Julia
  • Conduct statistical analysis with StatsBase.jl and Distributions.jl
  • Build your data science models
  • Find out how to visualize your data with Gadfly
  • Explore big data concepts in Julia


Table of Contents

12 Chapters

  • Julia Cookbook
  • Credits
  • About the Author
  • About the Reviewer
  • www.PacktPub.com
  • Preface
  • Extracting and Handling Data
  • Metaprogramming
  • Statistics with Julia
  • Building Data Science Models
  • Working with Visualizations
  • Parallel Computing

