
You're reading from  Learning Jupyter

Product type: Book
Published in: Nov 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785884870
Edition: 1st Edition
Author (1)
Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Science, and the Jupyter Cookbook, all with Packt.

Chapter 9.  Jupyter Scala

The Scala language has become very popular. It compiles to Java bytecode and runs on the Java Virtual Machine, so it has full interoperability with Java: Scala code can call Java classes and libraries directly. However, the syntax is much cleaner and more intuitive, reworking some of the quirks in Java.
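
As a minimal sketch of this interoperability (the date and numbers below are arbitrary illustrations, not taken from the book), Scala code can use classes from the Java standard library without any wrapper:

```scala
// java.time.LocalDate is a plain Java class; Scala imports and calls it directly
import java.time.LocalDate

val date = LocalDate.of(2016, 11, 1)        // arbitrary example date
val yearAfter = date.plusYears(1).getYear   // Java instance methods

// Static Java methods are called on the class name, as in Java
val binary = Integer.toBinaryString(5)
```

Here yearAfter is an Int and binary is a String, both produced entirely by Java library code.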

In this chapter, we will cover the following topics:

  • Installing Scala for Jupyter

  • Using Scala's features

Installing the Scala kernel


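At the time of writing, there is no process for installing the Scala kernel in a Windows environment. I expect this to change over time.

The steps are given here (taken from https://developer.ibm.com/hadoop/2016/05/04/install-jupyter-notebook-spark). They use the yum package manager, so as written they apply to Red Hat-family Linux systems; on Mac OS X, install Git and sbt with a package manager such as Homebrew instead:

  1. Install Git using this:

    sudo yum install git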
    
  2. Copy the Scala package locally:

    git clone https://github.com/alexarchambault/jupyter-scala.git
    
  3. Install the sbt build tool by running this:

    sudo yum install sbt
    
  4. Move to the Scala package directory:

    cd jupyter-scala
    
  5. Build the package:

    sbt cli/packArchive
    
  6. To launch the Scala shell, use this command:

    ./jupyter-scala
    
  7. Check the kernels installed by running this command (you should see Scala in the list now):

    jupyter kernelspec list
    
  8. Launch the Jupyter Notebook:

    jupyter notebook
    
  9. You can now choose to use a Scala 2.11 shell.

    At this point, if you start Jupyter, you will see the choice for Scala listed:

If we create a Scala notebook, we end up with the familiar layout...

Scala data access in Jupyter


There is a copy of the iris dataset on the University of California, Irvine website at https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data. We will access this data and compute some simple statistics on it.

The Scala code is as follows:

import scala.io.Source;
//copied file locally https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
val filename = "iris.data"

//DEBUGGING Uncomment this line to display more information -
println("SepalLength, SepalWidth, PetalLength, PetalWidth, Class");
val array = scala.collection.mutable.ArrayBuffer.empty[Float]
for (line <- Source.fromFile(filename).getLines) {
    var cols = line.split(",").map(_.trim);
    //println(s"${cols(0)}|${cols(1)}|${cols(2)}|${cols(3)} |${cols(4)}");
    val i = cols(0).toFloat
    array += i;
}
val count = array.length;
var min:Double = 9999.0;
var max:Double = 0.0;
var total:Double = 0.0;
for ( x <- array ) {
    if (x...
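
The listing above is cut off in this preview. As a hedged sketch of the same kind of statistics (not the author's code), the minimum, maximum, and mean of the first column can also be computed with built-in collection methods. The sampleLines values and the sepalLengths helper are hypothetical stand-ins for reading iris.data from disk:

```scala
// Hypothetical inline rows in the same comma-separated layout as iris.data
val sampleLines = Seq(
  "5.1,3.5,1.4,0.2,Iris-setosa",
  "7.0,3.2,4.7,1.4,Iris-versicolor",
  "6.3,3.3,6.0,2.5,Iris-virginica",
  ""  // a blank line, which the parser below skips
)

// Parse the first column (sepal length) of every non-blank line
def sepalLengths(lines: Seq[String]): Seq[Double] =
  lines.filter(_.trim.nonEmpty).map(_.split(",")(0).trim.toDouble)

val lengths = sepalLengths(sampleLines)
val minLength = lengths.min
val maxLength = lengths.max
val meanLength = lengths.sum / lengths.length
```

To run this against the real file, the sample could be replaced with Source.fromFile(filename).getLines.toSeq.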

Scala array operations


Scala does not have pandas, but we can emulate some of that logic with our own coding. We will use the same Titanic dataset used in Chapter 2, Jupyter Python Scripting, from http://www.kaggle.com/c/titanic-gettingStarted/download/train.csv, which we have downloaded to a local file.

We can then use similar coding as was used in Chapter 2 , Jupyter Python Scripting, on pandas:

import scala.io.Source;
val filename = "train.csv"
//PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
//1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
var males = 0
var females = 0
var males_survived = 0
var females_survived = 0
for (line <- Source.fromFile(filename).getLines) {
    var cols = line.split(",").map(_.trim);
    //the quoted Name field contains a comma, so splitting on ","
    //shifts Sex from index 4 to index 5
    var sex = cols(5);
    if (sex == "male") { 
        males = males + 1;
        if (cols(1).toInt == 1) {
            males_survived = males_survived + 1;
        }
    }
    if (sex == "female") { 
        females...
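
The listing above is cut off in this preview. As a hedged sketch of the same counting logic in a more pandas-like style (not the author's code), rows can be grouped by sex with groupBy. Apart from the first row, which appears in the comment above, the sample rows are hypothetical, as are the Passenger class and the parse helper:

```scala
// One row from the listing's comment plus two hypothetical rows in the same layout
val sampleRows = Seq(
  "1,0,3,\"Braund, Mr. Owen Harris\",male,22,1,0,A/5 21171,7.25,,S",
  "2,1,1,\"Doe, Mrs. Jane\",female,38,1,0,X 1,71.0,,C",
  "3,0,3,\"Roe, Mr. Richard\",male,30,0,0,X 2,8.0,,S"
)

case class Passenger(survived: Boolean, sex: String)

// Same naive split as the listing: the comma in the quoted name puts Sex at index 5
def parse(line: String): Passenger = {
  val cols = line.split(",").map(_.trim)
  Passenger(cols(1).toInt == 1, cols(5))
}

val passengers = sampleRows.map(parse)

// Map from sex to (number of passengers, number who survived)
val bySex: Map[String, (Int, Int)] =
  passengers.groupBy(_.sex).map { case (sex, ps) =>
    (sex, (ps.size, ps.count(_.survived)))
  }
```

This sketch assumes every name contains exactly one comma and that the header row has been dropped; a CSV library would be more robust on real data.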

Scala random numbers in Jupyter


In this example, we simulate rolling a pair of dice and count how many times each total appears. We then present a simple histogram for illustrative purposes.

The script is as follows:

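val r = new scala.util.Random
r.setSeed(113L)
val samples = 1000
// totals of two dice range from 2 to 12, so we need 11 counters
val dice = new Array[Int](11)
for (i <- 1 to samples) {
    // nextInt(6) returns 0 to 5, so this sum is the pip total minus 2
    val total = r.nextInt(6) + r.nextInt(6)
    dice(total) = dice(total) + 1
}
val max = dice.reduceLeft(_ max _)
for (i <- 0 to 10) {
    var str = ""
    // one X for every three rolls
    for (j <- 1 to dice(i)/3) {
        str = str + "X"
    }
    // index i holds the count for a pip total of i + 2
    println(s"${i + 2} $str")
}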

We first create an instance of the Scala random number generator and set the seed (in order to have repeatable results). We then draw 1,000 rolls. For each roll, we increment a counter of how many times that total of pips on die 1 and die 2 appears. Then we present an abbreviated histogram of the results.

Scala has a number of shortcut methods for quick scanning through a list/collection, as seen in the reduceLeft(_ max _) statement. We...
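
As a small sketch of such shortcut methods (the counts array is a hypothetical example, not output from the dice script), reduceLeft combines the elements pairwise from the left, and related methods cover other common scans:

```scala
val counts = Array(3, 9, 4, 7)

// Keep the larger of each pair while walking the array left to right
val largest = counts.reduceLeft(_ max _)

// foldLeft works the same way but starts from an explicit initial value
val total = counts.foldLeft(0)(_ + _)

// Repeating a string is a compact way to build a histogram bar
val bar = "X" * largest
```

For a plain maximum or sum, counts.max and counts.sum give the same results more directly.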

Scala closures


A closure is a function whose result depends on the value of one or more variables declared outside the function.

We can use this small script to illustrate it:

var factor = 7
val multiplier = (i:Int) => i * factor
val a = multiplier(11)
val b = multiplier(12)

We define a function named multiplier. The function expects an integer argument. For each argument, we take the argument and multiply it by the external variable factor.

Running the script, we see that a is 77 and b is 84.
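
Because a closure refers to the outside variable itself rather than to a copy of its value, later changes to that variable affect later calls. A minimal sketch, with the hypothetical names rate, scale, and closureDemo:

```scala
def closureDemo(): (Int, Int) = {
  var rate = 2
  // scale closes over rate; it reads the current value on every call
  val scale = (i: Int) => i * rate
  val before = scale(10)
  rate = 3
  val after = scale(10)
  (before, after)
}

val (before, after) = closureDemo()
```

The same function value gives different results before and after rate is reassigned.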

Scala higher-order functions


A higher-order function either takes other functions as arguments or returns a function as its result.

We can use this example script:

def squared(x: Int): Int = x * x
def cubed(x: Int): Int = x * x * x
def process(a: Int, processor: Int => Int): Int = {processor(a) }
val fiveSquared = process(5, squared)
val sevenCubed = process(7, cubed)

We define two functions; one squares the number passed and the other cubes the number passed.

Next, we define the higher-order function that takes the number to work on and the processor to apply.

Lastly, we call each one. For example, we call process() with 5 and the squared() function. The process() function passes the 5 to the squared() function and returns the result, so fiveSquared is 25 and sevenCubed is 343.

We take advantage of the Scala kernel automatically printing out variable values to see the expected result.
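
The example above covers functions taken as arguments. As a brief sketch of the other half of the definition, a function can also return a function. The names multiplyBy and triple are hypothetical:

```scala
// Returns a new function that multiplies its argument by factor
def multiplyBy(factor: Int): Int => Int = (x: Int) => x * factor

val triple = multiplyBy(3)

// map is itself a higher-order function: it applies triple to each element
val tripled = List(1, 2, 3).map(triple)
```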

Scala pattern matching


Scala has very useful, built-in pattern matching. Pattern matching can be used to test for exact and/or partial matches of entire values, parts of objects, and so on; you name it!

We can use this sample script for reference:

def matchTest(x: Any): Any = x match {
  case 7 => "seven"
  case "two" => 2
  case _ => "something"
}
val isItTwo = matchTest("two")
val isItTest = matchTest("test")
val isItSeven = matchTest(7)

We define a function called matchTest. It takes any kind of argument and can return any type of result (which is unusual in real-life programming, but convenient for a demonstration).

The keyword of interest is match. This means the function will walk down the list of cases until one matches the value x passed and then returns the result of that case.

As you can see, we have numbers and strings as input and output.

The last case statement is a wildcard catchall; if the code gets that far, it will match any argument.

The results are isItTwo = 2, isItTest = "something", and isItSeven = "seven".
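
Beyond exact values, a pattern can also match on the type or shape of a value, optionally with a guard condition. A minimal sketch, in which the describe function is a hypothetical example:

```scala
def describe(x: Any): String = x match {
  case n: Int if n < 0 => "negative integer"            // type pattern with a guard
  case n: Int          => s"integer $n"                 // any other Int
  case s: String       => s"string of length ${s.length}"
  case (a, b)          => s"pair of $a and $b"          // partial match on a two-element tuple
  case _               => "something else"              // wildcard catchall
}
```

Cases are tried from top to bottom, so the guarded Int case has to come before the general Int case.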

Scala case classes


A case class is a simplified type that can be used without calling out new Classname(..). For example, we could have this script, which defines a case class and uses it:

case class Car(brand: String, model: String)
val buickLeSabre = Car("Buick", "LeSabre")

So, we have a case class called Car. We make an instance of that class called buickLeSabre.

Case classes are most useful for pattern matching since we can easily construct complex objects and examine their contents. Here's an example:

def carType(car: Car) = car match {
  case Car("Honda", "Accord") => "sedan"
  case Car("GM", "Denali") => "suv"
  case Car("Mercedes", "300") => "luxury"
  case Car("Buick", "LeSabre") => "sedan"
  case _ => "Car: is of unknown type"
}
val typeOfBuick = carType(buickLeSabre)

We define a pattern match block (as in the previous section of this chapter). In the match, we look at a Car object that has brand = GM, model = Denali, and so on. For each of the models of interest...
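
As a further sketch (not from the book), a case class pattern can also bind fields to names instead of comparing them with constants. Case classes also come with value equality and a copy method. The Car definition is repeated so the block is self-contained, and describeCar is a hypothetical helper:

```scala
case class Car(brand: String, model: String)

def describeCar(car: Car): String = car match {
  case Car("Buick", model) => s"a Buick, model $model"   // binds the model field
  case Car(brand, _)       => s"some other brand: $brand" // ignores the model
}

val lesabre = Car("Buick", "LeSabre")
// copy creates a new instance with selected fields changed
val regal = lesabre.copy(model = "Regal")
```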

Scala immutability


Immutable means you cannot change something. In Scala, a value declared with val cannot be reassigned, a variable declared with var can, and function parameters are always immutable. This contrasts with Java, where variables and parameters are mutable unless specifically marked otherwise.

In Java, we can have the following function:

public void calculate(int amount) { 
} 

We can modify the value of amount inside the calculate function. We can tell Java not to allow changing the value if we use the final keyword:

public void calculate(final int amount) { 
} 

Whereas in Scala, the similar routine is as follows:

def calculate (amount: Int): Int = { 
  amount = amount + 1;   // error: reassignment to val
  return amount;
}

The preceding code does not compile: function parameters in Scala are immutable, so the assignment to amount is rejected and the value of the amount variable stays as it was before the routine was called.

Even when the value passed in comes from a variable such as balance (marked as var), Scala will not allow you to change it inside of the function.
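
A version that compiles computes a new value instead of reassigning the parameter. This is a minimal sketch; the balance variable and its starting value are illustrative assumptions:

```scala
def calculate(amount: Int): Int = {
  // Parameters cannot be reassigned, so the result goes into a new value
  val updated = amount + 1
  updated
}

var balance = 100
val result = calculate(balance)
```

The caller's balance is untouched; only result holds the incremented value.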

Scala collections


In Scala, collections are either mutable or immutable depending on the package they come from. All collections in scala.collection.immutable are immutable, and all collections in scala.collection.mutable are mutable. Scala picks immutable collections by default, so an unqualified name such as List draws automatically from the immutable collections:

val mylist = List(1, 2, 3);

This happens unless you ask for a mutable collection explicitly:

val mybuffer = scala.collection.mutable.ListBuffer(1, 2, 3);

We can see this in this small amount of code, for example:

var mutableList = scala.collection.mutable.ListBuffer(1, 2, 3);
var immutableList = scala.collection.immutable.List(4, 5, 6);
mutableList.update(1,400);     // changes mutableList in place
immutableList.updated(1,700);  // returns a new list

In the notebook display, the mutable collection is changed in place, while immutableList itself keeps its original contents.

Note that Scala cheated a little here; it created a new collection when we updated immutableList, as you can see, with the variable name as real_3 instead.
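
The difference can be sketched as follows, assuming ListBuffer as the mutable counterpart of List. The names fixed, changedCopy, and mutateBuffer are hypothetical:

```scala
import scala.collection.mutable.ListBuffer

// Immutable: updated returns a new list and leaves the original alone
val fixed = List(4, 5, 6)
val changedCopy = fixed.updated(1, 700)

// Mutable: assignment by index and += change the buffer in place
def mutateBuffer(): ListBuffer[Int] = {
  val buffer = ListBuffer(1, 2, 3)
  buffer(1) = 400
  buffer += 4
  buffer
}

val buffer = mutateBuffer()
```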

Named arguments


Scala allows you to specify parameter assignment by name rather than just ordinal position. For example, we can have this code:

def divide(dividend:Int, divisor:Int): Float = 
{ dividend.toFloat / divisor.toFloat }
divide(40, 5)
divide(divisor = 40, dividend = 5)

If we run this in a notebook, the first call to divide assigns the parameters by position and returns 8.0. The second call assigns them by name, so the dividend is 5 and the divisor is 40, and the result is 0.125.
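
As a short sketch of the same idea, named arguments can be given in any order and can be mixed with positional ones, as long as the positional arguments come first. The val names are hypothetical:

```scala
def divide(dividend: Int, divisor: Int): Float =
  dividend.toFloat / divisor.toFloat

val byPosition = divide(40, 5)
// Order does not matter once every argument is named
val byName = divide(divisor = 5, dividend = 40)
// A positional argument followed by a named one
val mixed = divide(40, divisor = 5)
```

All three calls compute 40 divided by 5.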

Scala traits


A trait in Scala defines a set of features that can be implemented by classes. A trait is similar to an interface in Java.

A trait can be partially implemented, forcing the user (class) of the trait to implement the details.

For example, we can have this code:

trait Color {
    def isRed(): Boolean
}
class Red extends Color {
    def isRed() = true
}
class Blue extends Color {
    def isRed() = false
}
var red = new Red();
var blue = new Blue();
red.isRed()
blue.isRed()

The code creates a trait called Color with one abstract function, isRed, which is declared but not implemented. So, every class that uses Color will have to implement isRed().

We then implement two classes, Red and Blue, that extend the Color trait (this is the Scala syntax for using a trait). Since the isRed() function has no implementation in the trait, both classes have to provide implementations for it.

We can see how this operates in the following screenshot of the notebook display:

We see (in the output section at the bottom...
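
A trait that is only partly implemented mixes abstract members with concrete ones. In this hedged sketch, the describe method is a hypothetical addition to the Color trait, and the definitions are repeated so the block is self-contained:

```scala
trait Color {
  def isRed(): Boolean                                        // abstract: classes must supply it
  def describe(): String = if (isRed()) "red" else "not red"  // concrete: inherited as-is
}

class Red extends Color {
  def isRed() = true
}

class Blue extends Color {
  def isRed() = false
}

// Both classes get describe() from the trait without writing it themselves
val descriptions = List[Color](new Red, new Blue).map(_.describe())
```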

Summary


In this chapter, we installed Scala for Jupyter. We used Scala coding to access data sets, saw how Scala can manipulate arrays, and generated random numbers. There were examples of closures, higher-order functions, and pattern matching. We used case classes, saw examples of immutability in Scala, built collections using Scala packages, passed arguments by name, and looked at Scala traits.

In the next chapter, we will be looking at using big data in Jupyter.
