You're reading from Learning Jupyter

Product type: Book
Published in: Nov 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785884870
Edition: 1st Edition
Languages: Python
Tools: Jupyter
Concepts: Data Analysis
Author: Dan Toomey

Chapter 9. Jupyter Scala

The Scala language has become very popular. It runs on the Java Virtual Machine (JVM), so it has full interoperability with Java: you can call Java classes and libraries directly from your Scala code. However, the syntax is much cleaner and more intuitive, reworking some of the quirks in Java.
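
As a small illustration of that interoperability, here is a minimal sketch that uses a standard Java collection class from Scala. The alias JArrayList and the variable names are chosen only for this example:

```scala
// Import a Java class and give it a local alias
import java.util.{ArrayList => JArrayList}

// Construct and use the Java collection exactly as Java code would
val names = new JArrayList[String]()
names.add("Jupyter")
names.add("Scala")

val count = names.size()            // method defined on java.util.ArrayList
val upper = names.get(0).toUpperCase // Scala strings are java.lang.String
```

No wrapper or binding layer is needed; the Java class is used as if it were a Scala class.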

In this chapter, we will cover the following topics:

Installing Scala for Jupyter
Using Scala's features

Installing the Scala kernel

There is currently no process for installing the Scala kernel in a Windows environment. I'm not sure why. I expect this to change over time.

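The steps are given here (taken from https://developer.ibm.com/hadoop/2016/05/04/install-jupyter-notebook-spark). The yum commands shown apply to Red Hat-based Linux systems; on Mac OS X, install Git and sbt with your own package manager instead: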

Install Git using this:
```
sudo yum install git
```

Clone the Scala kernel repository locally:

git clone https://github.com/alexarchambault/jupyter-scala.git

Install the sbt build tool by running this:
```
sudo yum install sbt
```
Move to the Scala package directory:
```
cd jupyter-scala
```
Build the package:
```
sbt cli/packArchive
```
To launch the Scala shell, use this command:
```
./jupyter-scala
```
Check the kernels installed by running this command (you should see Scala in the list now):
```
jupyter kernelspec list
```
Launch the Jupyter Notebook:
```
jupyter notebook
```
At this point, if you start Jupyter, you will see Scala 2.11 listed among the kernels you can choose for a new notebook.

If we create a Scala notebook, we end up with the familiar layout...

Scala data access in Jupyter

There is a copy of the iris dataset on the University of California, Irvine website at https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data. We will access this data and compute some simple statistics on it.

The Scala code is as follows:

import scala.io.Source
// File copied locally from
// https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
val filename = "iris.data"

// DEBUGGING: uncomment the println inside the loop to display each row
println("SepalLength, SepalWidth, PetalLength, PetalWidth, Class")
val array = scala.collection.mutable.ArrayBuffer.empty[Float]
// Skip blank lines (the file may end with one), which would fail in toFloat
for (line <- Source.fromFile(filename).getLines if line.trim.nonEmpty) {
    val cols = line.split(",").map(_.trim)
    //println(s"${cols(0)}|${cols(1)}|${cols(2)}|${cols(3)}|${cols(4)}")
    // Keep only the first column (sepal length)
    val i = cols(0).toFloat
    array += i
}
val count = array.length;
var min:Double = 9999.0;
var max:Double = 0.0;
var total:Double = 0.0;
for ( x <- array ) {
    if (x...
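
The listing is cut off at this point in the excerpt. As a hedged sketch of the statistics it sets out to compute (count, minimum, maximum, and mean), the built-in collection methods can do the same work without a hand-written loop. The sepalLengths values are a few hard-coded numbers standing in for the first column of the file, not the full dataset:

```scala
// Hard-coded stand-in for the sepal-length column (an assumption for this sketch)
val sepalLengths = Seq(5.1f, 4.9f, 4.7f, 7.0f, 6.3f)

val count = sepalLengths.length
val smallest = sepalLengths.min      // lowest value in the collection
val largest = sepalLengths.max       // highest value in the collection
val mean = sepalLengths.sum / count  // arithmetic mean

println(s"count=$count min=$smallest max=$largest mean=$mean")
```

With the ArrayBuffer built by the loop above, the same calls (array.min, array.max, array.sum) would apply unchanged.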

Scala array operations

Scala does not have pandas, but we can emulate some of that logic with our own coding. We will use the same Titanic dataset used in Chapter 2, Jupyter Python Scripting, from http://www.kaggle.com/c/titanic-gettingStarted/download/train.csv, which we have downloaded to our local space.

We can then use coding similar to the pandas code in Chapter 2, Jupyter Python Scripting:

import scala.io.Source
val filename = "train.csv"
// Columns: PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
// Sample:  1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
var males = 0
var females = 0
var males_survived = 0
var females_survived = 0
for (line <- Source.fromFile(filename).getLines) {
    val cols = line.split(",").map(_.trim)
    // In rows like the sample, the quoted Name contains a comma,
    // so a plain split moves Sex from index 4 to index 5
    val sex = cols(5)
    if (sex == "male") { 
        males = males + 1;
        if (cols(1).toInt == 1) {
            males_survived = males_survived + 1;
        }
    }
    if (sex == "female") { 
        females...
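
The listing is cut off here in the excerpt. As a rough sketch of the same tally written without mutable counters, groupBy can bucket the rows by sex. The first row below is the sample row from the listing; the other three rows are made up purely for illustration and are not taken from train.csv:

```scala
// In-memory rows in the train.csv layout (rows 2 to 4 are invented for this sketch)
val sampleLines = Seq(
  "1,0,3,\"Braund, Mr. Owen Harris\",male,22,1,0,A/5 21171,7.25,,S",
  "2,1,1,\"Doe, Mrs. Jane\",female,38,1,0,X 1,71.28,C85,C",
  "3,1,3,\"Roe, Miss. Ann\",female,26,0,0,X 2,7.92,,S",
  "4,0,2,\"Poe, Mr. John\",male,35,0,0,X 3,8.05,,S"
)

// Reduce each row to a (sex, survived) pair, using the same column indexes as above
val pairs = sampleLines.map { line =>
  val cols = line.split(",").map(_.trim)
  (cols(5), cols(1).toInt)
}

// Group by sex, then count rows and sum the survived flags per group
val bySex = pairs.groupBy(_._1)
val totals = bySex.map { case (sex, rows) => sex -> rows.size }
val survivors = bySex.map { case (sex, rows) => sex -> rows.map(_._2).sum }
```

This is roughly what a pandas groupby would give: one entry per sex, with a row count and a survivor count.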

Scala random numbers in Jupyter

In this example, we simulate rolling a pair of dice and count how many times each total appears. We then present a simple histogram for illustrative purposes.

The script is as follows:

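val r = new scala.util.Random
r.setSeed(113L)
val samples = 1000
// One counter per possible total (2 to 12); index 0 holds the count for a total of 2
val dice = new Array[Int](11)
for (i <- 1 to samples) {
    // nextInt(6) returns 0 to 5, so add 1 to each die to get 1 to 6
    val total = (r.nextInt(6) + 1) + (r.nextInt(6) + 1)
    dice(total - 2) += 1
}
val max = dice.reduceLeft(_ max _)
for (i <- dice.indices) {
    // One X per three rolls keeps the bars short
    val str = "X" * (dice(i) / 3)
    println(s"${i + 2} $str")
}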

We first pull in the Scala random library. We set the seed (in order to have repeatable results). We are drawing 1,000 rolls. For each roll, we increment a counter of how many times the total of pips on die 1 and die 2 appears. Then we present an abbreviated histogram of the results.

Scala has a number of shortcut methods for quick scanning through a list/collection, as seen in the reduceLeft(_ max _) statement. We...
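
To make the reduceLeft shortcut concrete, here is a minimal sketch on a small hard-coded array (the counts values are invented for the example). reduceLeft combines the elements pairwise from left to right using the function it is given:

```scala
val counts = Array(3, 9, 4, 7)

// (((3 max 9) max 4) max 7): keeps the larger value at each step
val largest = counts.reduceLeft(_ max _)

// The same mechanism with addition gives the sum
val total = counts.reduceLeft(_ + _)

// For these two common cases the collection also offers direct methods
val sameLargest = counts.max
val sameTotal = counts.sum
```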

Scala closures

A closure is a function whose result depends on the value of one or more variables declared outside the function.

We can use this small script to illustrate it:

var factor = 7
val multiplier = (i:Int) => i * factor
val a = multiplier(11)
val b = multiplier(12)

We define a function named multiplier. The function expects an integer argument. For each argument, we take the argument and multiply it by the external variable factor.

We see this result: a is 77 and b is 84.
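
A closure reads the outside variable each time it is called rather than copying it when the function is defined. The following sketch reuses the names from the script above and changes factor between calls to show this:

```scala
var factor = 7
val multiplier = (i: Int) => i * factor

val before = multiplier(11)  // uses factor = 7

factor = 10                  // change the captured variable
val after = multiplier(11)   // the same function now uses factor = 10
```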

Scala higher-order functions

A higher-order function either takes other functions as arguments or returns a function as its result.

We can use this example script:

def squared(x: Int): Int = x * x
def cubed(x: Int): Int = x * x * x
def process(a: Int, processor: Int => Int): Int = {processor(a) }
val fiveSquared = process(5, squared)
val sevenCubed = process(7, cubed)

We define two functions; one squares the number passed and the other cubes the number passed.

Next, we define the higher-order function that takes the number to work on and the processor to apply.

Lastly, we call each one. For example, we call process() with 5 and the squared() function. The process() function passes the 5 to the squared() function and returns the result.

We take advantage of the Scala kernel automatically printing out variable values to see the expected results: fiveSquared is 25 and sevenCubed is 343.
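
The script above covers the first half of the definition (taking a function as an argument). The other half, returning a function as the result, can be sketched as follows; scaleBy is a hypothetical helper introduced only for this example:

```scala
// scaleBy returns a new function that multiplies its input by factor
def scaleBy(factor: Int): Int => Int = (x: Int) => x * factor

val triple = scaleBy(3)      // triple is itself a function Int => Int
val result = triple(14)

// The returned function can be passed on to other higher-order functions
val doubled = List(1, 2, 3).map(scaleBy(2))
```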

Scala pattern matching

Scala has very useful, built-in pattern matching. Pattern matching can be used to test for exact and/or partial matches of entire values, parts of objects, and so on; you name it!

We can use this sample script for reference:

def matchTest(x: Any): Any = x match {
  case 7 => "seven"
  case "two" => 2
  case _ => "something"
}
val isItTwo = matchTest("two")
val isItTest = matchTest("test")
val isItSeven = matchTest(7)

We define a function called matchTest. It takes any kind of argument and can return any type of result (not sure if that is real-life programming!).

The keyword of interest is match. This means the function walks down the list of cases until one matches the value x passed in, and then returns the result for that case.

As you can see, we have numbers and strings as input and output.

The last case statement is a wildcard catchall: if the code gets that far, it will match any argument.

We can see the output here: isItTwo is 2, isItTest is "something", and isItSeven is "seven".
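
Beyond exact values, a case can match on the type of the value, add a guard condition, or take a tuple apart. This is a minimal sketch of those partial matches; the classify function is hypothetical and not part of the chapter's script:

```scala
def classify(x: Any): String = x match {
  case n: Int if n < 0                => "negative integer"        // type pattern plus guard
  case n: Int                         => "integer"                 // type pattern only
  case s: String if s.startsWith("J") => "string starting with J"
  case (a, b)                         => s"pair of $a and $b"      // destructures a 2-tuple
  case _                              => "something else"          // wildcard catchall
}

val r1 = classify(-3)
val r2 = classify("Jupyter")
val r3 = classify((1, "x"))
val r4 = classify(2.5)
```

Cases are tried in order, so the guarded Int case has to come before the plain Int case.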

Scala case classes

A case class is a simplified type that can be used without calling out new Classname(..). For example, we could have this script, which defines a case class and uses it:

case class Car(brand: String, model: String)
val buickLeSabre = Car("Buick", "LeSabre")

So, we have a case class called Car. We make an instance of that class called buickLeSabre.

Case classes are most useful for pattern matching since we can easily construct complex objects and examine their contents. Here's an example:

def carType(car: Car) = car match {
  case Car("Honda", "Accord") => "sedan"
  case Car("GM", "Denali") => "suv"
  case Car("Mercedes", "300") => "luxury"
  case Car("Buick", "LeSabre") => "sedan"
  case _ => "Car: is of unknown type"
}
val typeOfBuick = carType(buickLeSabre)

We define a pattern match block (as in the previous section of this chapter). In the match, we look at a Car object that has brand = GM, model = Denali, and so on. For each of the models of interest...
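
Patterns on a case class can also bind fields to names instead of comparing them to fixed values. A small sketch under that idea, with describeCar as a hypothetical function and the Car class repeated so the block stands alone:

```scala
case class Car(brand: String, model: String)

def describeCar(car: Car): String = car match {
  case Car("Buick", model) => s"a Buick, model $model"  // fixed brand, model bound to a name
  case Car(brand, _)       => s"another brand: $brand"  // brand bound, model ignored
}

val buick = describeCar(Car("Buick", "LeSabre"))
val other = describeCar(Car("Honda", "Accord"))

// Case classes also compare by field values rather than by reference
val sameCar = Car("Buick", "LeSabre") == Car("Buick", "LeSabre")
```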

Scala immutability

Immutable means you cannot change something. In Scala, a name declared with val cannot be reassigned, and function parameters always behave like vals; only names declared with var can be reassigned. This is the opposite of languages such as Java, where variables and parameters are mutable unless specifically marked otherwise.

In Java, we can have the following function:

public void calculate(int amount) { 
}

We can modify the value of amount inside the calculate function. We can tell Java not to allow changing the value if we use the final keyword:

public void calculate(final int amount) { 
}

Whereas in Scala, the similar routine is as follows:

def calculate (amount: Int): Int = { 
  amount = amount + 1  // does not compile: reassignment to val
  return amount
}

The preceding code does not compile: Scala reports a reassignment to val error because the amount parameter is immutable.

We can see in the display that even though balance is a variable (marked as var), Scala will not allow you to change the parameter's value inside of the function.
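
One way to get the intended effect, sketched here on the assumption that the caller wants the incremented amount back, is to compute a new value and return it, leaving the parameter alone. The balance name follows the text above; updated and result are names chosen for this example:

```scala
def calculate(amount: Int): Int = {
  val updated = amount + 1  // build a new value instead of reassigning the parameter
  updated                   // the last expression is the return value
}

var balance = 100
val result = calculate(balance)
// balance is still 100 here; only result holds the incremented value
```

If the caller does want balance to change, it can assign the result explicitly with balance = calculate(balance), which keeps the mutation visible at the call site.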

Scala collections

In Scala, collections are either mutable or immutable depending on the package you draw them from. All collections in scala.collection.immutable are immutable, and all collections in scala.collection.mutable are mutable. Scala picks immutable collections by default, so an unqualified name such as List draws automatically from the immutable collections:

val mylist = List(1, 2, 3)

This happens unless you explicitly name a type from the mutable package (there is no mutable List; ListBuffer is the usual counterpart):

val mylist = scala.collection.mutable.ListBuffer(1, 2, 3)

We can see this in this small amount of code, for example:

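var mutableList = scala.collection.mutable.ListBuffer(1, 2, 3)
var immutableList = scala.collection.immutable.List(4, 5, 6)
mutableList(1) = 400            // updates the buffer in place
immutableList.updated(1, 700)   // returns a new List; immutableList is unchanged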

As you can see in this screenshot of the notebook:

Note that Scala cheated a little here; it created a new collection when we updated immutableList, as you can see, with the variable name as real_3 instead.
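
The difference can be checked directly by comparing the collections before and after an update. This is a minimal sketch with names (fixed, changed, buffer) chosen only for the example:

```scala
import scala.collection.mutable.ListBuffer

// Immutable: updated returns a new list and leaves the original untouched
val fixed = List(4, 5, 6)
val changed = fixed.updated(1, 700)

// Mutable: assignment by index and += modify the same buffer in place
val buffer = ListBuffer(1, 2, 3)
buffer(1) = 400
buffer += 4
```

Note that buffer is declared with val: the reference cannot be reassigned, yet the contents of the mutable collection can still change.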

Named arguments

Scala allows you to specify parameter assignment by name rather than just ordinal position. For example, we can have this code:

def divide(dividend:Int, divisor:Int): Float = 
{ dividend.toFloat / divisor.toFloat }
divide(40, 5)
divide(divisor = 40, dividend = 5)

If we run this in a notebook, we can see the results:

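The first call to divide assigns parameters by position and returns 8.0. The second call assigns them by name, so the order in which they are written no longer matters; here dividend is 5 and divisor is 40, so the result is 0.125.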
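
As a further sketch using the same divide function, named arguments can be written in any order, and positional and named arguments can be mixed as long as the positional ones come first:

```scala
def divide(dividend: Int, divisor: Int): Float =
  dividend.toFloat / divisor.toFloat

val byPosition = divide(40, 5)
val byName = divide(divisor = 5, dividend = 40)  // order swapped, same meaning
val mixed = divide(40, divisor = 5)              // positional first, then named
```

All three calls compute 40 divided by 5.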

Scala traits

A trait in Scala defines a set of features that can be implemented by classes. A trait is similar to an interface in Java.

A trait can be partially implemented, forcing the user (class) of the trait to implement the details.

For example, we can have this code:

trait Color {
    def isRed(): Boolean
}
class Red extends Color {
    def isRed() = true
}
class Blue extends Color {
    def isRed() = false
}
var red = new Red();
var blue = new Blue();
red.isRed()
blue.isRed()

The code creates a trait called Color with one abstract function, isRed, which is declared but not implemented. So, every class that uses Color will have to implement isRed().

We then implement two classes, Red and Blue, that extend the Color trait (this is the Scala syntax for using a trait). Since the isRed() function has no implementation in the trait, both classes have to provide one.

We can see how this operates in the following screenshot of the notebook display:

We see (in the output section at the bottom...
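
A trait can also mix implemented and unimplemented members, which is what partial implementation means in practice. The following is a minimal sketch with a hypothetical Shape trait that is not part of the chapter's example:

```scala
trait Shape {
  def area: Double                                  // abstract: each class must supply it
  def describe: String = s"shape with area $area"   // concrete: inherited as is
}

class Square(side: Double) extends Shape {
  def area: Double = side * side
}

class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

val square = new Square(2.0)
val rectangle = new Rectangle(2.0, 3.0)
val squareText = square.describe
val rectangleText = rectangle.describe
```

Both classes only implement area; describe comes from the trait and calls whichever area the class provides.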

Summary

In this chapter, we installed Scala for Jupyter. We used Scala coding to access data sets. We also saw how Scala can manipulate arrays. And we generated random numbers in Scala. There were examples of higher-order functions and pattern matching. We used case classes, saw examples of immutability in Scala, built collections using Scala packages, and looked at Scala traits.

In the next chapter, we will be looking at using big data in Jupyter.


Author (1)

Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area. Dan has been contracting under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Sciences, and the Jupyter Cookbook, all with Packt.