Scala Data Analysis Cookbook

Product type: Book
Published: Oct 2015
ISBN-13: 9781784396749
Pages: 254
Edition: 1st Edition
Author: Arun Manivannan


Getting Breeze – the linear algebra library


In simple terms, Breeze (http://www.scalanlp.org) is a Scala library that extends the Scala collection library with vectors and matrices, along with a large set of functions for manipulating them. It is roughly the Scala counterpart of NumPy (http://www.numpy.org/) in Python. Breeze forms the foundation of MLlib, the machine learning library in Spark, which we will explore in later chapters.

In this first recipe, we will see how to pull the Breeze libraries into our project using the Scala Build Tool (SBT). We will also look briefly at the history of Breeze to better appreciate why it can be considered the "go to" linear algebra library in Scala.

Note

For all our recipes, we will be using Scala 2.10.4 along with Java 1.7. I wrote the examples using the Scala IDE, but please feel free to use your favorite IDE.

How to do it...

Let's add the Breeze dependencies to our build.sbt so that we can start playing with them in the subsequent recipes. There are just two: breeze (the core library) and breeze-natives.

  1. Under a brand new folder (which will be our project root), create a new file called build.sbt.

  2. Next, add the breeze libraries to the project dependencies:

    organization := "com.packt"
    
    name := "chapter1-breeze"
    
    scalaVersion := "2.10.4"
    
    libraryDependencies ++= Seq(
      "org.scalanlp" %% "breeze" % "0.11.2",
      // Optional - the 'why' is explained in the "There's more..." section
      "org.scalanlp" %% "breeze-natives" % "0.11.2"
    )

  3. From that folder, run the sbt compile command to fetch all your dependencies.

    Note

    You can import the project into Eclipse using sbt eclipse after installing the sbteclipse plugin (https://github.com/typesafehub/sbteclipse/). For IntelliJ IDEA, you just need to import the project by pointing to the root folder that contains your build.sbt file.
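Once sbt compile succeeds, a quick way to confirm that Breeze is actually on the classpath is a small program like the one below. This is only a minimal sketch and not part of the original recipe: the object name BreezeSmokeTest and the helper normSquared are made up for illustration, and the file is assumed to sit under src/main/scala in the project root.

```scala
import breeze.linalg._

// Hypothetical smoke test; the object and method names are illustrative only.
object BreezeSmokeTest {

  // Squared Euclidean norm, computed as the dot product of v with itself.
  def normSquared(v: DenseVector[Double]): Double = v dot v

  def main(args: Array[String]): Unit = {
    val v = DenseVector(1.0, 2.0, 3.0)
    // A 2x3 matrix built from two row tuples.
    val m = DenseMatrix((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))

    println(normSquared(v)) // 1*1 + 2*2 + 3*3 = 14.0
    println(m * v)          // matrix-vector product with entries 1.0 and 4.0
  }
}
```

Running sbt run from the project root should compile and execute it. If the import fails to resolve, the dependency block in build.sbt is the first thing to check.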

There's more...

Let's look at what the breeze and breeze-natives dependencies that we added bring to us.

The org.scalanlp.breeze dependency

Breeze has a long history in that it isn't written from scratch in Scala. Without the native dependency, Breeze relies on netlib-java, which ships a Java-compiled version of the FORTRAN reference implementation of BLAS/LAPACK, along with thin wrappers over that Java-compiled library. This means that we can still work without the native dependency, but the performance won't be great: the best we can get out of this FORTRAN-translated library is the performance of the FORTRAN reference implementation itself. For serious number crunching, we should add the breeze-natives dependency too.
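To get a feel for the performance difference on your own machine, you could time a dense matrix multiplication once with only the breeze dependency and once with breeze-natives added. The following is a rough sketch under stated assumptions, not a rigorous benchmark: the object name MultiplyTiming, the helper timeMultiply, and the matrix size of 500 are all arbitrary choices made for illustration.

```scala
import breeze.linalg._

// Hypothetical timing helper; names and sizes are illustrative assumptions.
object MultiplyTiming {

  // Multiplies two random n x n matrices and returns (elapsed milliseconds, product).
  def timeMultiply(n: Int): (Double, DenseMatrix[Double]) = {
    val a = DenseMatrix.fill(n, n)(scala.util.Random.nextDouble())
    val b = DenseMatrix.fill(n, n)(scala.util.Random.nextDouble())

    // Warm-up run so that JIT compilation and library loading are not timed.
    val warmUp = a * b

    val start = System.nanoTime()
    val c = a * b
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (elapsedMs, c)
  }

  def main(args: Array[String]): Unit = {
    val n = 500 // arbitrary size; adjust to taste
    val (elapsedMs, c) = timeMultiply(n)
    println(s"${c.rows} x ${c.cols} product computed in $elapsedMs ms")
  }
}
```

A single timed run like this is noisy, so treat the numbers only as a coarse indication of whether a native implementation is being used.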

The org.scalanlp.breeze-natives package

With the breeze-natives dependency added, Breeze looks for machine-specific implementations of the BLAS/LAPACK libraries. The good news is that there are open source and (vendor-provided) commercial implementations for most popular processors and GPUs. The most popular open source implementations include ATLAS (http://math-atlas.sourceforge.net) and OpenBLAS (http://www.openblas.net/).

If you are running a Mac, you are in luck: native BLAS libraries come out of the box on Macs. Installing native BLAS on Ubuntu/Debian involves just running the following commands:
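To see which BLAS implementation was actually picked up at runtime, one option is to ask netlib-java directly. The sketch below assumes that netlib-java is available on the classpath as a transitive dependency of Breeze; the object name WhichBlas and the helper blasClassName are hypothetical.

```scala
import com.github.fommil.netlib.BLAS

// Hypothetical diagnostic; reports the class of the BLAS instance netlib-java loaded.
object WhichBlas {

  // Fully qualified class name of the BLAS implementation currently in use.
  def blasClassName: String = BLAS.getInstance().getClass.getName

  def main(args: Array[String]): Unit =
    println(s"BLAS implementation in use: $blasClassName")
}
```

With only the breeze dependency, the reported class is typically the pure Java fallback (F2jBLAS). A class name starting with Native suggests that a native library was found and loaded.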

sudo apt-get install libatlas3-base libopenblas-base
sudo update-alternatives --config libblas.so.3
sudo update-alternatives --config liblapack.so.3

For Windows, please refer to the installation instructions at https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide.

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
