Scientific Computing with Scala

By Vytautas Jančauskas

About this book

Scala is a statically typed, Java Virtual Machine (JVM)-based language with strong support for functional programming. There exist libraries for Scala that cover a range of common scientific computing tasks – from linear algebra and numerical algorithms to convenient and safe parallelization to powerful plotting facilities. Learning to use these to perform common scientific tasks will allow you to write programs that are both fast and easy to write and maintain.

We will start by discussing the advantages of using Scala over other scientific computing platforms. You will discover Scala packages that provide the functionality you have come to expect when writing scientific software. We will explore using Scala's Breeze library for linear algebra, optimization, and signal processing. We will then proceed to the Saddle library for data analysis. If you have experience in R or with Python's popular pandas library, you will learn how to translate those skills to Saddle. If you are new to data analysis, you will learn its basic concepts as well. We will also explore the numerical computing environment called ScalaLab, which comes bundled with a lot of readily available scientific software. We will use it for interactive computing, data analysis, and visualization. In the following chapters, we will explore using Scala's powerful parallel collections for safe and convenient parallel programming. Topics such as the Akka concurrency framework will be covered. Finally, you will learn about multivariate data visualization and how to produce professional-looking plots in Scala easily. After reading the book, you should have more than enough information on how to start using Scala as your scientific computing platform.

Publication date:
April 2016
Publisher
Packt
Pages
232
ISBN
9781785886942

 

Chapter 1. Introducing Scientific Computing with Scala

Scala was first publicly released in 2004 by Martin Odersky, then working at École Polytechnique Fédérale de Lausanne in Switzerland. Odersky also took part in designing the current generation of the Java compiler, javac. Scala programs, when compiled, run on the Java Virtual Machine (JVM). Scala is the most popular of all the JVM languages (except for Java itself). Like Java, Scala is statically typed. From the perspective of a programmer, this means that variable types have to be declared (unless they can be inferred by the compiler) and they cannot change during the execution of the program. This is in contrast to dynamic languages, such as Python, where you don't have to specify a variable's type and can assign anything to any variable at runtime. Unlike Java, Scala has strong support for functional programming. Scala draws inspiration from languages such as Haskell, Erlang, and others in this regard.

In this chapter, we will talk about why you would want to use Scala as your primary scientific computing environment. We will consider the advantages it has over other popular programming languages that are used in the scientific computing context. We will then go over Scala packages meant specifically for scientific computing. These will be considered briefly and will be divided into categories depending on what they are used for. Some of these we will consider in detail in later chapters.

Finally, we provide a small introduction to best practices for structuring, building, testing, and distributing your Scala software. This is important even to people who already know how to program in Scala. This chapter introduces concepts that I consider essential to writing scientific software successfully. They are, however, often overlooked by scientists. The reason is that scientists often don't concern themselves with software development techniques; instead, they prefer to get the job done quickly. For example, it is not uncommon to neglect build systems, which are very important no matter what language you are using and are essential when writing software in statically typed, compiled languages. Testing is another area of software development criminally overlooked by the scientists I have had the pleasure to work with. The same is usually true of IDEs, debuggers, and profilers. The following topics will be discussed:

  • Why Scala for scientific computing?

  • Numerical computing packages for Scala

  • Data analysis packages for Scala

  • Other scientific software

  • Alternatives for doing plotting

  • Using Emacs as the Scala IDE

  • Profiling Scala code

  • Debugging Scala code

  • Building, testing, and distributing your Scala software

  • Mixing Java and Scala code

 

Why Scala for scientific computing?


This book assumes a basic familiarity with the Scala language. If you do not know Scala but are interested in writing your scientific code in it, you should consider getting a companion book that teaches the basics of the language. Any nontrivial topics will be explained, but we do not provide an introduction to any of the basic Scala programming concepts here. We will assume that you have Scala installed and you have your favorite IDE, or at least your favorite text editor, set up to write Scala programs. If not, we introduce using Emacs as a Scala IDE. It would also be of benefit to you if you are already familiar with other popular scientific computing systems.

A lot of the topics in the book will be far easier to understand and put to good use if you already know how to do the things in question in other systems: we will be covering functionality that is similar to the MATLAB interactive computing environment, NumPy scientific computing package for the Python programming language, pandas data analysis library for Python, statistical computing language R, and similar software. After reading the book, you will hopefully be able to get all the functionality of the aforementioned software from Scala and more!

What are the advantages compared to C/C++/Java?

One obvious advantage to using Scala is that it is a Java virtual machine language. It is one among several, including Clojure, Groovy, Jython, JRuby, and of course Java itself. This means that, after writing your program, you compile it to a Java virtual machine bytecode that is then executed by the Java virtual machine interpreter. Think of the bytecode as the machine code of a virtual computer. When you write programs in C/C++ and similar compiled languages, they are translated straight to machine code that you then execute directly on your computer's processor. If you want to then run it on a different processor, you would have to recompile your program. Since the Java virtual machine runs on many different computer architectures, this is no longer necessary.

After you compile your program, the resulting bytecode can then be run on any system that can run a Java virtual machine. Therefore, your compiled code is portable. This is one advantage to using Scala as opposed to C/C++. Why not just write your program in Java then? Java is designed for writing large software in teams consisting of many programmers of varying skill levels. As such, it is an incredibly bureaucratic language. Quickly realizing your ideas in Java is difficult. This is because, even in the simplest cases, there is a lot of boilerplate code involved. The language is designed to slow you down and force you to do things by the book, designing the software before you start writing it.

Scala has the additional advantage of interactivity. It is easy to write and run small Scala programs, test them interactively, make changes, and then repeat. This is essential for scientific code, where a lot of the time you are testing an idea out and you want to do it quickly so that, if it does not work out, you can move on to another idea. As an added bonus, you can use any of the many Java libraries from Scala with ease! Since Java is very widely used in the industry, it contains a plethora of libraries for various purposes. These can be accessed from any JVM-based language. Most often, new functional programming languages don't share this advantage (since they are not JVM-based).

Scala also has strong support for functional programming. Functional programming treats programming as the evaluation of mathematical functions and avoids changing variable state explicitly. This leads to a declarative programming style where (ideally) the intention, rather than an explicit procedure, is given by the programmer. This (partially) eliminates the need for side effects—changes in the program state. Eliminating side effects leads to a programming style that is less error-prone and makes it easier to understand and predict program behavior. This has important consequences such as the easy and automatic parallelization of programs, program verification, and so on.

Parallelization is becoming more important with the increasing number of CPU cores in computers. Parallel programming in imperative languages involves a lot of very subtle issues that few programmers fully understand. So, there is hope that functional programming can help in this regard.

Pure functional programming often feels restrictive to programmers who are used to the more common imperative style. As a consequence of this, Scala supports both programming styles. You can start programming more or less as you would in Java and slowly incorporate more advanced features of the language into your programs. This removes a lot of the seemingly intimidating nature of functional programming since the concepts can be incorporated when needed and where they fit best. This may annoy functional programming purists, but is great for the more pragmatically minded.
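As a small illustration of this flexibility, here is the same computation (summing the squares of the numbers 0 to 9) written twice in Scala, once imperatively with a mutable accumulator and once functionally. This is a sketch; any recent Scala 2 installation should run it:

```scala
object BothStyles extends App {
  val arr = (0 until 10).toArray

  // Imperative style: mutate an accumulator, much as you would in Java.
  var imperative = 0
  for (a <- arr) {
    imperative += a * a
  }

  // Functional style: describe the result as a pipeline of transformations.
  val functional = arr.map(x => x * x).sum

  assert(imperative == functional)
  println(functional) // 285
}
```

Nothing forces you to pick one style up front; you can start with the imperative version and refactor toward the functional one as the concepts become familiar.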

Here is a small code segment that compares Java and Scala code that takes an array, squares its elements, and adds them together. This is not an uncommon pattern (in one form or another) in numerical code. It will serve as a small example of Scala's conciseness compared to Java. This is by no means a proof that Scala is more concise than Java, but the perception that it is often holds true.

Scala code:

val arr = (0 until 10).toArray
arr.map(x => x * x).reduce(_ + _)

Java code:

int arr[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int result = 0;
for (int a: arr) {
  result += a * a;
}

In the Scala version, we think descriptively in functions that are applied to the array to get the result we want. These functions take other functions as arguments. In the Java version, we think imperatively—what actions have to be taken to get the result we need?

What are the advantages compared to MATLAB/Python/R?

You may object that a lot of what has been said earlier can also be said of languages such as Python, MATLAB, R, or any other interpreted language. They all support different programming styles, and when you write your program in, say, Python, it will run on any platform that can run a Python interpreter. So, why not just use those? Well, one answer is execution speed. Many will object, and have objected before, that speed is not their primary concern when writing scientific software. That is true, but only up to a point, and while it isn't easy to convince people of this, I have become convinced of it myself. The usual workflow in languages using dynamic typing is outlined here:

  • Prototype your numerically intensive code in your favorite language.

  • Note that it will take 3 years to complete a single run of the program in its current state.

  • Use a profiler to identify bottlenecks. The profiler indicates that everything is a bottleneck.

  • Rewrite the most performance critical parts in C or C++.

  • Battle the foreign function interface or whatever other method your language provides for calling C/C++ functions.

  • End up with a C/C++ program wrapped in a couple of lines of your favorite programming language.

The aforementioned is obviously a caricature. However, the important point is that the process described here adds at least two extra nontrivial stages to the already complicated process of writing software; not any software, but software you usually have no clear specifications for. On top of that, you are often not sure if what you are doing is sensible (which is the case for most scientific software in my experience).

The two extra stages are using the profiler, which is a tool that identifies portions of the code your program spends time in, and embedding code written in C/C++ (or some other statically typed language) in your program. People often will use a profiler on programs written in languages such as C++ or Java as well. But the reason for using it is usually that you want to squeeze the last few drops of performance out of it and not just make the software usable. The result of this is that all of the advantages of your nice dynamically typed programming language are reduced to nothing.

These advantages are supposed to be the speed of development and being able to make changes quickly. However, you end up spending time profiling software, rewriting nice bits of your code into ugly efficient bits, and finally just writing most of the thing in C. None of this sounds or is fun. There are workarounds to this, but why would you be content with this procedure? Why can't you just write your program and have it behave sensibly from the very start? Some will object that you should not be using a programming language such as Python for performance-critical code. This is true. However, most people learn one language, get used to its libraries, and will tend to write all their code in it.

You may very well end up with something that is not usable without a lot of extra effort this way. Using languages designed for speed of execution (so-called systems programming languages) is certainly possible. They, however, have many other disadvantages. The primary disadvantage is that prototyping in them is very hard, as is realizing your ideas quickly.

So, how does Scala help? Why is it faster than, say, Python, and by how much? Where does dynamic versus static typing come into this? A simple way to see how much faster one language is compared to another is to use some kind of benchmark suite. An interesting comparison is provided on the following website:

http://benchmarksgame.alioth.debian.org/

You can visit the website to make sure what is said here is true. In it, several different algorithms implemented in each different language are compared in terms of execution speed, memory use, and so on. Java is evaluated against C in the results. It can be seen that Java is comparable to C in terms of execution speed. Even though it is slower, it is usually not slower by much. In two cases out of eleven, it is actually faster. The comparisons that are more interesting are between Python 3 and Java, and Scala and Java.

Python is a very popular language in scientific computing. So, how does it stack up? In five cases out of eleven, it is around 40 times slower than Java. This is a lot. If your calculations take 10 minutes in Java, you would have to wait almost 7 hours if you wrote them in Python (assuming a linear relationship, which seems a fair assumption in this case). Scala is much better in this regard.

In most cases, its speed is comparable to that of Java. This is good news, since Java is a fairly fast language. This means that you can write your code in the clearest way possible, and it will still run fast. If you want to squeeze some extra juice out of it, you can always profile it using one of the profiling tools we will discuss later. Would the same apply in the case of MATLAB and R? Well, the website does not benchmark those languages, but one would imagine so; those are both dynamic languages as well.

So what is a dynamic language and a static language? Why is one slower than the other? What other advantages or disadvantages are there in using one over the other? The simplest way to describe it is this: in a dynamically typed language, variables are bound to objects only, and in a statically typed language variables are bound to both object and type.

When you program in a dynamically typed language, you can usually assign anything you want to any variable. In a statically typed language, a variable's type is declared in advance (or in the case of Scala can be inferred from context).

In practice, it follows from this that the compiler can optimize the code much better, since it can use optimizations specific for that type. For example, this happens with Java's numeric types, where they are compiled to JVM arithmetic opcodes instead of more general method calls. In a dynamic language, the type often has to be determined at runtime and there are often other checks as well. All of this consumes CPU cycles.

Furthermore, calling functions and methods as well as accessing object attributes is much faster in static languages than in dynamic ones. A compiler is also capable of catching type errors. In a dynamic language such as Python, nothing prevents you from calling any method with any arguments on any object. This leads to problems since these errors are only caught at runtime.

It can easily happen that your program will fail near the end of a 2-hour run just because you forgot that you made changes to a method's argument list. In statically typed languages, these types of error will be caught at compile time. As an added bonus, good IDEs are easier to implement for static languages than for dynamic ones. This is because the code itself provides a lot of useful information that the IDE can use to provide functionality you expect from a modern IDE. This includes autocompletion, listing available methods for an object, and so on.
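A minimal sketch of what static typing with inference looks like in Scala: types are inferred where annotations are omitted, but they are still checked at compile time, so misuses like the commented-out lines below are rejected before the program ever runs:

```scala
object TypeInferenceSketch extends App {
  val x = 42          // the type Int is inferred; no annotation needed
  val y: Double = 3.5 // types can also be declared explicitly

  // x = "hello"      // does not compile: x is an Int (and a val)
  // x.toUpperCase    // does not compile: Int has no such method

  // Numeric types still combine as expected at compile time:
  println(x + y) // 45.5
}
```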

Let's recap what was said so far—the main advantage to using Scala for scientific code is that you can write what you mean, and it will usually work fast. There is no need for elaborate and often wonky strategies employed to optimize code in other languages. This will result in readable, easy-to-understand code, and you will not lose any of the advantages of dynamic languages.

Scala is quick to develop in and easy to understand. I think these are the main reasons why you should consider it as your main scientific computing language. This is especially true if you write your own numerical code or code that is generally fairly complex and where you can't rely on fast libraries to provide most of the functionality.

Scala does parallelism well

Parallel execution of code is very important in scientific computing. Often scientists want to model a certain physical phenomenon. Simulations of the physical world take a long time. Since the primary method of increasing computer performance is adding more CPU cores, parallelizing your algorithms is becoming the main way of reducing the amount of time it takes your program to do the things you want of it.

Another aspect of this is running code on supercomputers where algorithms are split up into several tasks that usually communicate by passing messages to each other. Programs written in imperative style are generally tricky to parallelize. Scala has strong support for functional programming. In general, the declarative nature of programs written in functional programming languages makes them easier to parallelize. The main reason for this is that functional programming languages avoid side effects.

Side effects are explicit changes to state, such as assigning to variables or writing to files or devices. Avoiding side effects avoids common pitfalls in parallel programming such as race conditions, deadlocks, and so on. While no technique avoids these problems completely, declarative programming languages are much better suited to handle these issues.

Scala supports parallel collections that make it easy to carry out concurrent calculations. Another option is to use the Akka toolkit that supports several ways of carrying out calculations in parallel. Both these options will be discussed in detail in the following chapters.
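As a quick taste of what is coming, here is a sketch of the parallel collections: calling .par on a collection makes the subsequent map and sum run across the available cores. This assumes Scala 2.12 or earlier, where parallel collections ship with the standard library (in Scala 2.13 and later they moved to the separate scala-parallel-collections module):

```scala
object ParallelSketch extends App {
  val numbers = (1 to 1000000).toVector

  // Sequential version: one core does all the work.
  val seqSum = numbers.map(x => x.toLong * x).sum

  // Parallel version: .par returns a parallel collection, and the same
  // map/sum pipeline is automatically split across available cores.
  val parSum = numbers.par.map(x => x.toLong * x).sum

  // Because the pipeline is side-effect free, both give the same answer.
  assert(seqSum == parSum)
  println(parSum)
}
```

Note that this safety relies on the mapped function being free of side effects; a var mutated from inside the parallel map would reintroduce exactly the race conditions discussed above.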

Any downsides?

There is currently one big downside to using Scala for scientific computing, and many would consider it a crucial one; there currently aren't many well-established packages for scientific computing available for it. While the core language is solid, without an established infrastructure of libraries, there is only so much you can do on your own.

The situation in this regard is a lot better in other systems. This is especially true of Python, which has more scientific computing libraries than you can shake a stick at. But it is also true of MATLAB and others. Such is the nature of the vicious cycle of popularity: systems are popular because they have many libraries for doing different things, and they have many libraries because they are popular.

Scala isn't yet an established language in this regard. I believe, however, that it deserves to be. And, maybe this book will help it towards that goal. With enough people using Scala for scientific computing, we will eventually see more libraries developed and existing ones being better supported and more actively maintained.

 

Numerical computing packages for Scala


Let's now look through the linear algebra and numerical computing software that is available for Scala. A linear algebra package lets you perform operations on matrices and vectors, solve linear systems of equations, find determinants, and carry out the other operations associated with the discipline of linear algebra in mathematics.

MATLAB started as a linear algebra package and evolved into a whole programming language and interactive computing environment. The NumPy library for Python can do most of the things expected from a linear algebra package and a lot more. In this section, we will provide an overview of what is available in Scala in this regard. We will examine the packages briefly, tell you where to get them, how actively they are developed, and very briefly discuss the main functionality available in them.

Scalala

Scalala is a linear algebra package for Scala. It is currently not actively maintained, having been superseded by Breeze. It can be found at the following website:

https://github.com/scalala/Scalala

It is right now mostly of historic interest; however, Scalala has rich MATLAB-like operators on vectors and matrices and a library of numerical routines. It also has basic support for plotting.

Breeze

Breeze is the biggest and best maintained numerical computing library for Scala. It can be found at the ScalaNLP website:

http://www.scalanlp.org/

It is developed along with Epic and Puck, the former of which is a powerful statistical parser and the latter a GPU-powered parser for natural languages. These two latter libraries will be of less concern to us. Breeze, however, is a big part of this book. It provides functionality that is roughly equivalent to the famous and widely used NumPy package for Python. It is actively maintained and is likely to remain so in the near future.

Breeze is modeled on Scalala, which was mentioned previously. It supports all the matrix and vector operations you would expect. It provides a large number of probability distributions. It also provides routines for optimization and linear equation solving as well as routines for plotting. In a later chapter, we will introduce Breeze in detail and explain to readers how to do things they have grown accustomed to in other systems in Breeze.
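To give a first taste of the Breeze flavor ahead of the dedicated chapter, here is a small sketch, assuming Breeze is on your classpath (for example, via the breeze artifact from org.scalanlp in SBT); DenseVector, DenseMatrix, and the \ solver all live in the breeze.linalg package:

```scala
import breeze.linalg._

object BreezeSketch extends App {
  val v = DenseVector(1.0, 2.0, 3.0)
  val m = DenseMatrix((1.0, 0.0),
                      (0.0, 2.0))

  // Basic vector and matrix operations:
  println(v dot v)                   // dot product: 14.0
  println(m * DenseVector(3.0, 4.0)) // matrix-vector product: DenseVector(3.0, 8.0)

  // Solving the linear system m * x = b with the \ operator:
  val x = m \ DenseVector(2.0, 8.0)
  println(x) // DenseVector(2.0, 4.0)
}
```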

ScalaLab

ScalaLab is a numerical computing environment aiming to replicate the functionality of MATLAB. The website is given here:

https://code.google.com/p/scalalab/

ScalaLab will be discussed in a section dedicated to it. It supports Scala-based scripting and is written mostly in Java with some speed-critical sections written in C/C++. It allows you to access the results of MATLAB scripts. It can use dozens of Scala and Java libraries for scientific computing. There is basic support for plotting:

The ScalaLab window with a plotting example

 

Data analysis packages for Scala


By data analysis packages, we mean software designed for analyzing data in some way. A simple statistical regression would be an example. Software implementing machine-learning algorithms would be another example.

Saddle

Saddle is Scala's answer to R and Python's pandas package. It supports reading in structured data in a variety of different formats, including CSV and HDF5. The data can be loaded into frames and then manipulated as you would in other similar software. Statistical analysis can be performed, and you can build your own statistical analysis methods on top of the data structures provided by Saddle. Saddle is examined in detail in a separate chapter dedicated to it. It can be found at the following website:

https://saddle.github.io/

MLlib

Apache's MLlib library provides machine learning algorithms for the Spark platform. The library can be accessed from Scala as well as from Java and Python. It supports basic statistical methods for data analysis, various regression and classification methods, clustering via k-means, dimensionality reduction, and optimization methods. The number of algorithms in the library is constantly growing. The MLlib library can be found at the following website:

http://spark.apache.org/mllib/

 

Other scientific software


Here, we include packages that don't really fit in either of the preceding categories. As in the other sections, we only provide packages that seem the most promising at the time of writing. The idea is to give the reader a sense of what's available in this regard in Scala.

FACTORIE

FACTORIE is a toolkit for deployable probabilistic modeling. It allows you to create probabilistic graphical models and perform inference. There are many applications for probabilistic graphical modeling, including speech recognition, computer vision, natural language processing, and applications in bioinformatics. For more information on FACTORIE, refer to the following website:

http://factorie.cs.umass.edu/

Cassovary

Cassovary is a large graph processing toolkit developed by Twitter. While there are other graph libraries for Scala, Cassovary allows one to work with graphs consisting of billions of vertices and edges in an efficient way. It powers the underlying Twitter infrastructure. Here is the link for Cassovary:

https://github.com/twitter/cassovary

Figaro

Figaro is a probabilistic programming language. It supports the development of rich probabilistic models and provides reasoning algorithms. These can be used to draw conclusions from evidence. Probabilistic reasoning is one of the foundational technologies behind machine learning. It has been used for stock price prediction, recommendation systems, image detection, and other machine learning tasks. For more information on Figaro, visit:

https://github.com/p2t2/figaro

 

Alternatives for doing plotting


There are many ways of doing plotting in Scala. We will discuss most of these in the chapters dedicated to plotting. For the time being, we will give a small summary. Breeze, Saddle, and ScalaLab all have basic support for plotting. Wisp stands out as a library worth mentioning:

https://github.com/quantifind/wisp

You can use Jzy3d for 3D plots:

http://jzy3d.org/

You can also use Java libraries for plotting. Among these, JFreeChart can be considered roughly equivalent in features to libraries such as matplotlib:

http://www.jfree.org/jfreechart/

 

Using Emacs as the Scala IDE


If you haven't yet picked a Scala IDE to use, I would like to recommend the ENSIME mode for Emacs and other editors. You can find it at the following website:

https://github.com/ensime

Obviously, installation instructions will depend on the editor you are using. If you use it with Emacs (a popular cross-platform editor), the installation process is described here. You can get Emacs from the Emacs website:

https://www.gnu.org/software/emacs/

There are a lot of books available on using it. It is completely open source and, combined with ENSIME, makes for probably the best open source Scala IDE, in the author's opinion. After installing Emacs on your system, you can install ENSIME. To do this, first add the following lines to your .emacs file, which (on Linux) usually resides in your home directory. If you are not using Linux, consult the Emacs documentation for where the .emacs file is. The following code assumes a fairly recent version of Emacs. This was tested with Emacs 24:

(require 'package)
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/"))
(when (< emacs-major-version 24)
  (add-to-list 'package-archives '("gnu" . "http://elpa.gnu.org/packages/")))
(package-initialize)
(require 'ensime)
(add-hook 'scala-mode-hook 'ensime-scala-mode-hook)

After adding this and restarting Emacs, you need to actually install ENSIME, which is fortunately really simple. Just press M-x, type package-install, press Return, then enter ensime, and press Return again. Note that M in Emacs stands for the Alt key, so the combination is Alt-x. However, it is customary to write it the way we did.

ENSIME integrates well with the SBT tool, which we will discuss in the next section; instructions on how to integrate the two are given there. Here we will just list some of the things that ENSIME can do for you to enable you to write code more efficiently and mistake-free. These include code completion, type inspection, automated refactoring, and code navigation. ENSIME will also highlight parts of your code that contain compilation errors. It is usually easy to tell from the markings where you went wrong.

The code completion feature will show you possible completions for the code you are currently typing. If it is a variable name, it will try to guess how to complete it so that you can enter it quickly. Also, you can press the Tab key after entering the object name and a dot (.). This will show you a list of all methods you can use for that object.

Type inspection allows you to see the result of Scala's type inference mechanism. To see what type has been inferred, simply use the C-c C-v i key combination. This means Ctrl-c Ctrl-v i in Emacs notation.

Automated refactoring features let you conveniently rename variables, without worrying that you may have forgotten some, and do other similar stuff. This is more useful for larger projects.

Code navigation features in ENSIME let you move around in the code by finding the definitions of symbols under the cursor. You can use M-. to jump to the definition of the object under the cursor.

The complete command reference for ENSIME can be found at:

https://github.com/ensime/ensime-emacs/wiki/Emacs-Command-Reference

 

Profiling Scala code


Profiling Scala code is trickier than one might expect. There are several JVM profilers you can use. However, you will lose some of the Scala abstraction when using them. They are primarily designed for use with Java. These include the VisualVM profiler that can be found at:

https://visualvm.java.net/

Another possible choice is Takipi, which can be found at the following web address:

https://www.takipi.com/

 

Debugging Scala code


Using a dedicated debugger (rather than a bunch of println statements) is sometimes the best way of figuring out where you went wrong when writing your program. Debugging is easy when using ENSIME. You can start the debugger with C-c C-d d, set a breakpoint on a line that the cursor is on with C-c C-d b, or remove a breakpoint with C-c C-d u. After you set up your breakpoints, you can start debugging with C-c C-d r. After your program stops at a breakpoint, you can use C-c C-d n to step to the next line and C-c C-d i to inspect the value of the symbol under the cursor. You can use C-c C-d t to show the current backtrace. Consult the ENSIME Emacs command reference for a few other important debugger features.

 

Building, testing, and distributing your Scala software


We assume in this book that you are familiar with the Scala programming language, so language concepts are not introduced. Instead, we want to present a convenient way of building, testing, and distributing your software. Even if you have already read several books on Scala and implemented some basic (or not so basic) programs, you may still benefit from this.

In this section, we will discuss how to build, test, and distribute your software using the SBT tool. You can get the tool from the SBT website at www.scala-sbt.org, where you will find instructions on how to set it up on whichever operating system you are using. We will only consider how to use SBT to build and test your software. If you also want to use version control (and you should), consult the documentation and tutorials for tools such as Git.

SBT is an open source build tool similar to Java's Maven or Ant that lets you compile your Scala project, integrates with many popular Scala test frameworks, has dependency management functionality, integrates with the Scala interpreter, supports mixed Scala/Java projects, and much more.

For this tutorial, we will consider a simple interval arithmetic package. It will showcase the capabilities of SBT and also let you try your hand at creating a full (albeit simple) Scala library, ready to be distributed for other people's benefit. It is a small library implementing an Interval class together with the operations of interval arithmetic on that class.

Interval arithmetic is a generalization of standard arithmetic designed to operate on intervals of values. It has many applications in science and engineering, for example, in tracking rounding errors or in global optimization methods. While it is full of intricacies, at its base are some very simple ideas. An interval [a, b] is the range of values between a and b, including a and b themselves. Now, what is the sum of two intervals [a, b] and [c, d]? We define it as the interval that any number from [a, b] plus any number from [c, d] must fall into. This is simply the interval [a + c, b + d]. It is not difficult to convince oneself of this by considering that a and c are the smallest numbers in their respective intervals; thus, their sum is the smallest number in the resulting interval. The same argument applies to the upper bounds b and d.
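As a quick numeric check of the addition rule (using the same intervals that appear in the REPL session later in this section):

```latex
% Interval addition: [a, b] + [c, d] = [a + c, b + d]
[-3, 2] + [4, 7] = [-3 + 4,\; 2 + 7] = [1, 9]
```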

Similarly, for subtraction, [a, b] minus [c, d] is equal to [a - d, b - c]; for multiplication, [a, b] times [c, d] is equal to the interval [min(ac, ad, bc, bd), max(ac, ad, bc, bd)]; and for division, [a, b] divided by [c, d] is equal to [min(a/c, a/d, b/c, b/d), max(a/c, a/d, b/c, b/d)]. Finally, we can define the relational operators as follows: [a, b] < [c, d] if and only if b < c, and similarly [a, b] > [c, d] if and only if a > d. Two intervals are considered equal if and only if a = c and b = d. Using this information, we can define a Scala class called Interval with operations that have the semantics discussed here. You should put all of the following code into a file called Interval.scala:

package org.intervals.intervals

import java.lang.ArithmeticException
class Interval(ac: Double, bc: Double) {
    var a: Double = ac
    var b: Double = bc
    if (a > b) {
        val tmp = a
        a = b
        b = tmp
    }

    def contains(x: Double): Boolean =
        x >= this.a && x <= this.b

    def contains(x: Interval): Boolean = 
        x.a >= this.a && x.b <= this.b

    def +(that: Interval): Interval =
        new Interval(this.a + that.a, this.b + that.b)

    def -(that: Interval): Interval =
        new Interval(this.a - that.b, this.b - that.a)

    def *(that: Interval) : Interval = {
        val all = List(this.a * that.a, this.a * that.b,
                       this.b * that.a, this.b * that.b)
        new Interval(all.min, all.max)
    }

    def /(that: Interval) : Interval = {
        if (that.contains(0.0)) {
            throw new ArithmeticException("Division by an interval containing zero")
        }
        val all = List(this.a / that.a, this.a / that.b,
                       this.b / that.a, this.b / that.b)
        new Interval(all.min, all.max)
    }

    def ==(that: Interval): Boolean =
        this.a == that.a && this.b == that.b

    def <(that: Interval): Boolean =
        this.b < that.a

    def >(that: Interval): Boolean =
        this.a > that.b
     
    override def toString(): String =
        "[" + this.a + ", " + this.b + "]"
}

This will create a new class called Interval, which can be constructed by specifying the interval limits. You can then add, subtract, multiply, and divide intervals using standard Scala syntax. We made sure that an exception is thrown if the user tries to divide by an interval containing zero. This is because division by zero is undefined, and it is not immediately clear what to do when the divisor interval contains zero. To use the class, you would write Scala statements such as these:

val interval1 = new Interval(-0.5, 0.8)
val interval2 = new Interval(0.3, 0.5)
val interval3 = (interval1 + interval2) * interval1 / interval2

Obviously, to start doing this, you have to make sure the program using it is able to find it. Let's explore how this can be achieved next.

Directory structure

The SBT tool expects you to follow a certain directory structure that is given here. If you put the appropriate files into specific directories, SBT will be able to automatically build and test your software without having to specify many details in the configuration files:

intervals/
    src/
        main/
            resources/
            scala/
            java/
        test/
            resources/
            scala/
            java/

For example, for our project, we will want to create a directory called intervals, in which we then create the whole directory tree starting with src. Naturally, we will want to put our Interval.scala file inside the src/main/scala directory. There is, however, another thing to consider concerning the directory structure. You can follow the Java convention of structuring directories according to the package name. While this is mandatory in Java, it is only optional in Scala, but we will do it anyway. Because of that, our Interval.scala file ends up inside the src/main/scala/org/intervals/intervals directory.
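The layout described above can be created in one go from a Unix shell (directory names as used in this chapter; adjust the package path if you chose a different package name):

```shell
# Create the SBT source tree for the intervals project, including
# the package directories for org.intervals.intervals.
mkdir -p intervals/src/main/scala/org/intervals/intervals
mkdir -p intervals/src/main/resources
mkdir -p intervals/src/main/java
mkdir -p intervals/src/test/scala
mkdir -p intervals/src/test/resources
mkdir -p intervals/src/test/java
```

Interval.scala then goes into intervals/src/main/scala/org/intervals/intervals.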

We now need to tell SBT some basic things about our project. These include various bits of metadata, such as the project name, the version number, and the version of Scala we want to use. One nice thing about SBT is that it will download the Scala version your project needs, regardless of which version you may already have installed on your system. SBT also has to know the root directory of our project. Let's now add the build.sbt file to the project. This file goes in the root of the main project file tree (in our case, the directory we called intervals), not in the project/ subdirectory, which SBT reserves for build support files. For now, fill in the file with the following information:

lazy val commonSettings = Seq(
    organization := "org.intervals",
    name := "intervals",
    version := "0.0.1",
    scalaVersion := "2.11.4"
)

lazy val root = (project in file(".")).
    settings(commonSettings: _*)

Now, if we want to build the project using SBT, believe it or not, nothing more needs to be done. SBT will take advantage of the known folder structure and look for files in the expected places. Simply go to the project directory and issue the following command from the terminal:

$ sbt compile console

The preceding command will first compile the Scala code and then put us into the Scala REPL. Alternatively, you can run sbt first and then type the compile and console commands into its command interpreter. After the Interval.scala file is compiled, you will be dropped into the Scala REPL, where you can start using your new class immediately. Let's try it out.

We need to import our new library first:

scala> import org.intervals.intervals.Interval
import org.intervals.intervals.Interval

Now, let's create a couple of Interval objects:

scala> val ab = new Interval(-3.0, 2.0)
ab: org.intervals.intervals.Interval = [-3.0, 2.0]

scala> val cd = new Interval(4.0, 7.0)
cd: org.intervals.intervals.Interval = [4.0, 7.0]

Now, let's test whether our newly defined interval arithmetic operations work as expected:

scala> ab + cd
res0: org.intervals.intervals.Interval = [1.0, 9.0]

scala> ab - cd
res1: org.intervals.intervals.Interval = [-10.0, -2.0]

scala> ab * cd
res2: org.intervals.intervals.Interval = [-21.0, 14.0]

scala> ab / cd
res3: org.intervals.intervals.Interval = [-0.75, 0.5]

And finally, let's test the relational operators. Again, these will test that our implementation follows the rules we described for partially ordering intervals:

scala> ab == cd
res4: Boolean = false

scala> ab < cd
res5: Boolean = true

scala> ab > cd
res6: Boolean = false

scala> ab contains 0.0
res7: Boolean = true

It seems that SBT successfully built and loaded our newly created software package. Now, if only there were some way to check that the software works correctly without having to type all of that into the Scala console every time!

Testing Scala code with the help of SBT

Testing code when you use SBT to build your Scala software is very easy. All you need to do is make sure SBT knows about the testing framework and then type sbt compile test at the command line. To make sure SBT downloads and installs the testing framework of your choice, add it to the build.sbt file that we discussed earlier. We recommend ScalaTest, since it allows very simple testing, which suits the small-to-medium-sized codebases that most scientific computing packages are; it also has more advanced capabilities if you need them. To use ScalaTest, add the following line to the end of your build.sbt file:

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.0" % "test"

Use a higher version number than 2.2.0 if needed; this line will pull in the testing classes. Now, we need to write the actual test code and put it into our src/test/scala directory. We will be using the appropriately named FunSuite class for our tests. Let's call this file IntervalSuite.scala and put in the tests that follow. First, we want to import both the FunSuite class and the Interval class that we will be testing:

import org.scalatest.FunSuite
import org.intervals.intervals.Interval

class IntervalSuite extends FunSuite {

Testing with FunSuite is really simple. Just call test with a description of the test, and use assert calls in the test body that fail if our program exhibits undesired behavior. In the following cases, we want to check that our newly defined interval arithmetic operations follow the interval arithmetic rules:

  test("interval addition should work according to interval arithmetic") {
    val interval1 = new Interval(0.1, 0.2)
    val interval2 = new Interval(1.0, 3.0)
    val sum = interval1 + interval2
    assert(sum.a == 1.1)
    assert(sum.b == 3.2)
  }

  test("interval subtraction should work according to interval arithmetic") {
    val interval1 = new Interval(0.1, 0.2)
    val interval2 = new Interval(1.0, 3.0)
    val sub = interval1 - interval2
    assert(sub.a == -2.9)
    assert(sub.b == -0.8)
  }

  test("inclusion should return true if a Double falls within the interval bounds") {
    val interval = new Interval(-1.0, 1.0)
    assert(interval.contains(0.0))
    assert(!interval.contains(2.0))
    assert(!interval.contains(-2.0))
  }

  test("interval multiplication should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    val interval2 = new Interval(-3.0, -1.0)
    val mul = interval1 * interval2
    assert(mul.a == -12.0)
    assert(mul.b == 6.0)
  }

In the following test, we want to check that division works as expected. Division by an interval that contains zero is undefined in our simplified interval arithmetic system. As such, we want the division to signal an exception if the divisor interval contains zero. To do this, we employ ScalaTest's intercept construct. We specify that we expect dividing interval2 by interval1 to throw an ArithmeticException, which, according to our implementation, it should:

  test("interval division should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    val interval2 = new Interval(-3.0, -1.0)
    intercept[ArithmeticException] {
      interval2 / interval1
    }
    val div = interval1 / interval2
    assert(div.a == -4.0)
    assert(div.b == 2.0)
  }

  test("equality operator should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    assert(interval1 == interval1)
  }

  test("inequality operators should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    val interval2 = new Interval(5.0, 6.0)
    assert(interval1 < interval2)
    assert(interval2 > interval1)
    assert(interval1 != interval2)
  }

Finally, we add one more test to be completely sure. All basic interval arithmetic operations are inclusion-isotonic. This means that, if the intervals are i1, i2, i3, and i4 and if i1 is fully contained within i3 and i2 is contained within i4, then the result of i1 op i2 is contained within the interval i3 op i4. Here, op is one of +, -, *, or / defined according to interval arithmetic rules:

  test("all basic interval arithmetic operations should be inclusion isotonic") {
    val interval1 = new Interval(2.0, 4.0)
    val interval2 = new Interval(2.5, 3.5)
    val interval3 = new Interval(1.0, 3.0)
    val interval4 = new Interval(1.5, 2.5)
    assert((interval1 + interval3).contains(interval2 + interval4))
    assert((interval1 - interval3).contains(interval2 - interval4))
    assert((interval1 * interval3).contains(interval2 * interval4))
    assert((interval1 / interval3).contains(interval2 / interval4))
  }
}

With the IntervalSuite.scala file in the src/test/scala directory, testing our library is simple: just type sbt compile test into the console window. The output will list the tests that passed and failed, along with the reasons for any failures. Testing your scientific software becomes simple this way: it is just a matter of writing the tests and letting SBT run them.

ENSIME and SBT integration

You can take advantage of SBT integration if you use the ENSIME mode for Emacs. To begin, install the ensime-sbt plugin by adding the following line to your ~/.sbt/0.13/plugins/plugins.sbt file:

addSbtPlugin("org.ensime" % "ensime-sbt" % "0.2.0")

Now, you can just go to the root of your project folder and issue the sbt gen-ensime command. This will create the .ensime file using the information SBT has gathered about your project. After that, you can start using ENSIME to develop your project; just load the newly created .ensime file before starting ENSIME.

Distributing your software

After you have written your oh-so-useful Scala library, you will probably want other people to be able to use it. Ideally, users would just add your library to the libraryDependencies list in their build.sbt file and have the package downloaded automatically whenever needed. The process for publishing software this way is not currently very simple. There is, however, a simpler way of distributing your software: as an unmanaged dependency. With an unmanaged dependency, the user downloads a .jar file containing your library and places it under the lib directory of their project file tree. To create a .jar file for your project, all you have to do is run the sbt package command. Type sbt package at the console, and your Scala package will be compiled and placed under the target directory (as intervals_2.11-0.0.1.jar in our case). Now, it is a simple matter of putting that .jar file in the lib/ directory of the project you want to use it in. Alternatively, you can put it up online for people to download. One thing to watch out for, though: if your library has its own dependencies, the user of your library will have to make sure those also end up in their lib/ folder.

Now, let's test this with our intervals library. First, package it using the sbt package command. Then, create a new project. This is actually very simple: instead of creating a full project tree, you can simply create a directory for the project and put your source code directly in it. SBT is clever enough to figure out what is going on in these cases, too.

Let's say we create a new project directory called intervals_user and, inside it, a directory called lib. Now, copy the resulting jar, intervals_2.11-0.0.1.jar, from the target subdirectory of our intervals project into this new lib directory. From here on, SBT will let you use the library in your new project. Create a new file called IntervalUser.scala and put the following code there:

import org.intervals.intervals.Interval

object IntervalUser {
  def main(args: Array[String]) = {
    val interval1 = new Interval(-0.5, 0.5)
    val interval2 = new Interval(0.2, 0.8)
    println(interval1 + interval2)
    println(interval1 - interval2)
    println(interval1 * interval2)
    println(interval1 / interval2)
  }
}

It is now simple to run this program. You can merely issue the sbt run command in the intervals_user folder that we created for this project. If you have done everything right, you should see the following lines as part of the output of this program:

[-0.3, 1.3]
[-1.3, 0.3]
[-0.4, 0.4]
[-2.5, 2.5]
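Putting the pieces together, the whole unmanaged-dependency workflow can be sketched as a shell session (directory names as in this section; the build and copy steps are guarded so the sketch degrades gracefully when sbt or the jar is absent):

```shell
# Build the library jar from the intervals project, if sbt is available.
# The sbt package task writes the jar under target/ without attempting
# to publish it anywhere.
if command -v sbt >/dev/null 2>&1; then
  (cd intervals && sbt package)
fi

# Create the consumer project with an unmanaged lib/ directory.
mkdir -p intervals_user/lib

# Copy the packaged jar into the consumer's lib/ directory, if it exists.
cp intervals/target/scala-*/intervals_*.jar intervals_user/lib/ 2>/dev/null || true
```

With IntervalUser.scala placed in intervals_user, running sbt run there compiles the program against the jar in lib/.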

Another method for distributing software is more involved. SBT uses Apache Ivy, which in turn looks for packages in the central Maven repository by default. When you add a dependency to the libraryDependencies list in your build.sbt file, the information there is used to locate the appropriate files in the Maven repository, and the files are then downloaded to your computer. The process of publishing your library to such repositories is complicated and will not be discussed here, since it would be a large detour for a book about writing scientific software with Scala. For now, you can simply ask people to download your package into their lib folder. After you have worked more on your library and want it widely known and used, you can look up the process for publishing software to Maven Central online.
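For comparison, consuming a published (managed) dependency takes just one line in the consumer's build.sbt. A sketch, using hypothetical coordinates, since the intervals library from this chapter is not actually published anywhere:

```scala
// Hypothetical Maven coordinates for illustration only; this line would
// only work if the library had actually been published to a repository.
libraryDependencies += "org.intervals" %% "intervals" % "0.0.1"
```

The %% operator tells SBT to append the Scala binary version (here, 2.11) to the artifact name, matching the intervals_2.11-0.0.1.jar naming we saw earlier.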

 

Mixing Java and Scala code


Using Java code from Scala is fairly easy because both languages run on the JVM. Using Scala code from Java is also possible, but trickier. In general, Scala is designed to be compatible with Java. You can take advantage of this if you want to use one of the many available Java libraries. For example, you could use Java's Swing library to write a user interface for your program, or you may want to use the useful JFreeChart library to perform data visualization or just basic plotting. Java concepts translate more or less directly to Scala concepts. We will look into that in one of the later chapters of this book. For now, let's consider a really simple Scala program, just to see how easy it is to write user interfaces with Swing from Scala. You can type the following into a script and then run it on Unix-like systems (you may have to modify the shell name and parameters):

#!/bin/sh
exec scala "$0" "$@"
!#

import javax.swing.JFrame

object GUIHelloWorld extends App {
  val f = new JFrame
  f.setVisible(true)
  f.setTitle("Hello, world!")
  f.setSize(300, 200)
  f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE)
}

GUIHelloWorld.main(args)

We will discuss using other Java libraries in Chapter 8, Scientific Plotting with Scala.

 

Summary


In this chapter, we discussed the advantages of using Scala over other programming languages and environments for scientific computing. These include static typing and strong support for functional programming. We discussed how this will help you write better scientific software. We compared Scala to other popular programming languages and discussed their comparative merits and demerits; that is, Scala will allow you to write faster, better structured software while also keeping most of the advantages of dynamic languages.

We had a quick overview of the major scientific packages available for use in Scala. These cover a range from linear algebra and data analysis to statistical modeling. Using the ENSIME mode for Emacs and other text editors as a Scala IDE was discussed; we have also shown how to use ENSIME when debugging Scala code. Finally, and perhaps most importantly, we showed you how to use SBT to package, build, test, and distribute your software. Using well-established, convenient, and powerful build tools is very important since it removes a lot of the chores from writing software and allows you to concentrate on what is important.

We guided you through the process of writing, building, testing, and distributing an example library written in Scala. After all, you probably want your software to be used by other people. We also briefly described how to use Java libraries from a Scala program. You will want to know this, since you will probably want your standalone program to have a nice Swing interface, or to take advantage of JFreeChart when performing scientific plotting.

Tip

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  1. Log in or register to our website using your e-mail address and password.

  2. Hover the mouse pointer over the SUPPORT tab at the top.

  3. Click on Code Downloads & Errata.

  4. Enter the name of the book in the Search box.

  5. Select the book for which you're looking to download the code files.

  6. Choose from the drop-down menu where you purchased this book.

  7. Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows

  • Zipeg / iZip / UnRarX for Mac

  • 7-Zip / PeaZip for Linux

About the Author

  • Vytautas Jančauskas

    Vytautas Jančauskas is a computer science PhD student and lecturer at Vilnius University. At the time of writing, he was about to get a PhD in computer science. The thesis concerns multiobjective optimization using nature-inspired optimization methods. Throughout the years, he has worked on a number of open source projects that have to do with scientific computing. These include Octave, pandas, and others. Currently, he is working with numerical codes with astrophysical applications.

    He has experience writing code to be run on supercomputers, optimizing code for performance, and interfacing C code to higher-level languages. He has been teaching computer networks, operating systems design, C programming, and computer architecture to computer science and software engineering undergraduates at Vilnius University for 4 years now.

    His primary research interests include optimization, numerical algorithms, programming language design, and software engineering. Vytautas has significant experience with various different programming languages. He has written simple programs and has participated in projects using Scheme, Common Lisp, Python, C/C++, and Scala. He has experience working as a Unix systems administrator. He also has significant experience working with numerical computing platforms such as NumPy/MATLAB and data analysis frameworks such pandas and R.

