Chapter 6. Slick – A Functional Interface for SQL

In Chapter 5, Scala and SQL through JDBC, we investigated how to access SQL databases with JDBC. Since interacting directly with JDBC feels somewhat unnatural from Scala, we extended it with custom wrappers designed to provide a functional interface that hides JDBC's imperative nature.

With the difficulty of interacting directly with JDBC from Scala and the ubiquity of SQL databases, you would expect there to be existing Scala libraries that wrap JDBC. Slick is such a library.

Slick styles itself as a functional-relational mapping library, a play on the more traditional object-relational mapping name used to denote libraries that build objects from relational databases. It presents a functional interface to SQL databases, allowing the client to interact with them in a manner similar to native Scala collections.
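
All the examples in this chapter run queries inside a db.withSession block. The following is a minimal sketch of how such a database handle might be set up with Slick 2.1; the MySQL URL and credentials here are illustrative assumptions, not the book's exact configuration:

import scala.slick.driver.MySQLDriver.simple._

// Sketch only: the URL, user, and password are assumptions for illustration.
val db = Database.forURL(
  "jdbc:mysql://127.0.0.1:3306/fec",
  user = "root",
  password = "",
  driver = "com.mysql.jdbc.Driver"
)

// Queries are executed inside a session; the implicit session is picked up
// by methods such as .list and .first.
db.withSession { implicit session =>
  // queries go here
}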

FEC data


In this chapter, we will use a somewhat more involved example dataset. The Federal Election Commission of the United States (FEC) records all donations of more than $200 to presidential candidates. These records are publicly available. We will look at the donations for the campaign leading up to the 2012 general election, which resulted in Barack Obama's re-election. The data includes donations to the two presidential candidates, Obama and Romney, as well as to the other contenders in the Republican primaries (there were no Democratic primaries).

In this chapter, we will take the transaction data provided by the FEC, store it in a table, and learn how to query and analyze it.

The first step is to acquire the data. If you have downloaded the code samples from the Packt website, you should already have two CSVs in the data directory of the code samples for this chapter. If not, you can download the files using the following links:

  • data.scala4datascience.com/fec/ohio.csv.gz (or ohio.csv...
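
The queries in the rest of this chapter go through a Tables.transactions query object that maps table rows to a Transaction case class. The snippet below is a minimal sketch of what such a definition might look like in Slick 2.1; the field and column names are illustrative assumptions rather than the book's exact schema:

import java.sql.Date
import scala.slick.driver.MySQLDriver.simple._

// Illustrative sketch: the fields below are assumptions, not the book's exact schema.
case class Transaction(
  id: Option[Int],
  candidate: String,
  contributor: String,
  contributorState: String,
  amount: Long,
  date: Date
)

class Transactions(tag: Tag) extends Table[Transaction](tag, "transactions") {
  def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
  def candidate = column[String]("candidate")
  def contributor = column[String]("contributor")
  def contributorState = column[String]("contributor_state")
  def amount = column[Long]("amount")
  def date = column[Date]("date")

  // The default projection maps a full row to and from a Transaction instance.
  def * = (id.?, candidate, contributor, contributorState, amount, date) <>
    (Transaction.tupled, Transaction.unapply)
}

object Tables {
  // The query object used throughout this chapter.
  val transactions = TableQuery[Transactions]
}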

Invokers


Invokers are the components of a Slick query that build up the SQL select statement. Slick exposes a variety of invokers that allow the construction of complex queries. Let's look at some of them:

  • The map invoker is useful for selecting individual columns or for applying operations to columns:

    scala> db.withSession { implicit session =>
      Tables.transactions.map {
        _.candidate 
      }.take(5).list       
    }
    List[String] = List(Obama, Barack, Paul, Ron, Paul, Ron, Paul, Ron, Obama, Barack)
    
  • The filter invoker is the equivalent of the WHERE clause in SQL. Note that Slick fields must be compared using ===:

    scala> db.withSession { implicit session => 
      Tables.transactions.filter {
        _.candidate === "Obama, Barack"
      }.take(5).list
    }
    List[Tables.Transactions#TableElementType] = List(Transaction(Some(1),Obama, Barack,Doe, John,TX,None,200,2010-06-22), ...
    

    Similarly, to filter out donations to Barack Obama, use the =!= operator, as sketched just after this list:

    scala> db.withSession { implicit...
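
The truncated snippet above would look something like this minimal sketch:

// Sketch: exclude donations to Barack Obama using Slick's =!= operator.
db.withSession { implicit session =>
  Tables.transactions.filter {
    _.candidate =!= "Obama, Barack"
  }.take(5).list
}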

Operations on columns


In the previous section, you learned about the different invokers and how they map to SQL statements. However, we glossed over the methods supported by the columns themselves: we know that we can compare columns for equality using ===, but what other operations do Slick columns support?

Most of the common SQL functions are supported on columns. For instance, to find donations to candidates whose names start with "O", we could run the following:

scala> db.withSession { implicit session =>
  Tables.transactions.filter { 
    _.candidate.startsWith("O") 
  }.take(5).list 
}
List[Tables.Transactions#TableElementType] = List(Transaction(Some(1594098)...
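
The query above only lists the matching rows. To total the amounts instead, one simple option, reusing only the invokers introduced so far, is to fetch the amount column and sum it in Scala; Slick can also push the sum into the database, as the groupBy example in the next section shows. A sketch:

// Sketch: total the donations to candidates whose name starts with "O".
// Summing client-side fetches every matching amount; a database-side
// sum (see the groupBy section) avoids that.
db.withSession { implicit session =>
  Tables.transactions.filter {
    _.candidate.startsWith("O")
  }.map { _.amount }.list.sum
}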

Similarly, to count donations that happened between January 1, 2011 and February 1, 2011, we can use the .between method on the date column:

scala> val dateParser = new SimpleDateFormat("dd-MM-yyyy")
dateParser: java.text.SimpleDateFormat = SimpleDateFormat

scala> val startDate = new java.sql.Date(dateParser.parse("01-01-2011").getTime...
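
The truncated snippet presumably continues along these lines. The sketch below assumes that the date column of the transactions table is called date, which may differ from the book's actual schema:

// Sketch: count donations made between January 1 and February 1, 2011.
val dateParser = new java.text.SimpleDateFormat("dd-MM-yyyy")
val startDate = new java.sql.Date(dateParser.parse("01-01-2011").getTime)
val endDate = new java.sql.Date(dateParser.parse("01-02-2011").getTime)

db.withSession { implicit session =>
  Tables.transactions.filter {
    _.date.between(startDate, endDate)  // "date" column name is an assumption
  }.list.size  // counts client-side; Slick's length aggregate can count in the database
}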

Aggregations with "Group by"


Slick also provides a groupBy method that behaves like the groupBy method of native Scala collections. Let's group the donations by candidate and calculate the total amount donated to each candidate:

scala> val grouped = Tables.transactions.groupBy { _.candidate }
grouped: scala.slick.lifted.Query[(scala.slick.lifted.Column[...

scala> val aggregated = grouped.map {
  case (candidate, group) =>
    (candidate -> group.map { _.amount }.sum)
}
aggregated: scala.slick.lifted.Query[(scala.slick.lifted.Column[...

scala> val groupedDonations = db.withSession { 
  implicit session => aggregated.list 
}
groupedDonations: List[(String, Option[Long])] = List((Bachmann, Michele,Some(7439272)),...

Let's break this down. The first statement, transactions.groupBy { _.candidate }, specifies the key by which to group. You can think of this as building an intermediate list of (String, List[Transaction]) tuples mapping the group key to a list of all the table rows that satisfy this...
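
Once it has been fetched, groupedDonations is an ordinary Scala list, so the usual collection methods apply. For example, here is a sketch that ranks candidates by total donations, treating a missing sum as zero:

// Sketch: rank candidates by total donations received.
val ranked = groupedDonations
  .map { case (candidate, total) => candidate -> total.getOrElse(0L) }
  .sortBy { case (_, total) => -total }

ranked.take(3).foreach { case (candidate, total) =>
  println(s"$candidate: $total")
}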

Accessing database metadata


Commonly, especially during development, you might start a script by dropping the table if it exists and then recreating it. We can find out whether a table is defined by accessing the database metadata through the MTable object. To get a list of tables whose names match a certain pattern, we can run MTable.getTables(pattern):

scala> import slick.jdbc.meta.MTable
import slick.jdbc.meta.MTable

scala> db.withSession { implicit session =>
  MTable.getTables("transactions").list
}
List[scala.slick.jdbc.meta.MTable] = List(MTable(MQName(fec.transactions),TABLE,,None,None,None) ...)
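
The pattern argument is passed through to the underlying JDBC metadata call, where "%" acts as a wildcard. For instance, the following sketch should list the names of all the tables visible to the connection:

// Sketch: list all table names using the JDBC "%" wildcard.
db.withSession { implicit session =>
  MTable.getTables("%").list.map { _.name.name }
}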

Thus, to drop the transactions table if it exists, we can run the following:

scala> db.withSession { implicit session =>
  if(MTable.getTables("transactions").list.nonEmpty) {
    Tables.transactions.ddl.drop
  }
}

The MTable instance contains a lot of metadata about the table. Go ahead and recreate the transactions table if you dropped it in the previous example. Then, to find information...
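
As a sketch of how this might proceed: recreating the table uses the same ddl object we used for dropping it, and column-level metadata can then be read back from the MTable instance. The getColumns call below queries the JDBC column metadata; treat the details as assumptions:

// Sketch: recreate the table, then inspect its column metadata.
db.withSession { implicit session =>
  Tables.transactions.ddl.create

  val table = MTable.getTables("transactions").list.head
  table.getColumns.list.foreach { column => println(column) }
}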

Slick versus JDBC


This chapter and the previous one introduced two different ways of interacting with SQL. In the previous chapter, we described how to use JDBC and build extensions on top of JDBC to make it more usable. In this chapter, we introduced Slick, a library that provides a functional interface on top of JDBC.

Which method should you choose? If you are starting a new project, you should consider using Slick. Even if you spend a considerable amount of time writing wrappers that sit on top of JDBC, it is unlikely that you will achieve the fluidity that Slick offers.

If you are working on an existing project that makes extensive use of JDBC, I hope that the previous chapter demonstrated that, with a little time and effort, you can write JDBC wrappers that reduce the impedance mismatch between the imperative style of JDBC and Scala's functional approach.

Summary


In the previous two chapters, we looked extensively at how to query relational databases from Scala. In this chapter, you learned how to use Slick, a "functional-relational" mapper that allows interacting with SQL databases as one would with Scala collections.

In the next chapter, you will learn how to ingest data by querying web APIs.

References


To learn more about Slick, you can refer to the Slick documentation (http://slick.typesafe.com/doc/2.1.0/) and its API documentation (http://slick.typesafe.com/doc/2.1.0/api/#package).
