Scala and Spark for Big Data Analytics


Product type Book
Published in Jul 2017
Publisher Packt
ISBN-13 9781785280849
Pages 796 pages
Edition 1st Edition
Authors (2):
Md. Rezaul Karim
Sridhar Alla

Table of Contents (19) Chapters

  • Preface
  • Introduction to Scala
  • Object-Oriented Scala
  • Functional Programming Concepts
  • Collection APIs
  • Tackle Big Data – Spark Comes to the Party
  • Start Working with Spark – REPL and RDDs
  • Special RDD Operations
  • Introduce a Little Structure - Spark SQL
  • Stream Me Up, Scotty - Spark Streaming
  • Everything is Connected - GraphX
  • Learning Machine Learning - Spark MLlib and Spark ML
  • My Name is Bayes, Naive Bayes
  • Time to Put Some Order - Cluster Your Data with Spark MLlib
  • Text Analytics Using Spark ML
  • Spark Tuning
  • Time to Go to ClusterLand - Deploying Spark on a Cluster
  • Testing and Debugging Spark
  • PySpark and SparkR

Collection APIs

"What we become depends on what we read after all of the professors have finished with us. The greatest university of all is a collection of books."

- Thomas Carlyle

One of the features that attracts most Scala users is its Collection APIs, which are powerful and flexible and come with a large set of operations. This wide range of operations makes it easier to deal with any kind of data. In this chapter, we introduce the Scala collection APIs, including their different types and hierarchies, which accommodate different types of data and solve a wide range of problems. In a nutshell, the following topics will be covered in this chapter:

  • Scala collection APIs
  • Types and hierarchies
  • Performance characteristics
  • Java interoperability
  • Using Scala implicits

Scala collection APIs

Scala collections are a well-understood and frequently used programming abstraction, and they are divided into mutable and immutable collections. Like a mutable variable, a mutable collection can be changed, updated, or extended in place when necessary. Like an immutable variable, an immutable collection cannot be changed: operations that appear to modify it return a new collection instead. Most collection classes are located in the packages scala.collection, scala.collection.immutable, and scala.collection.mutable.
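The difference between the two kinds of collection can be sketched as follows. This is a minimal illustration, not an example from the book; the object name MutabilityDemo and the sample values are assumptions chosen for the sketch.

```scala
import scala.collection.mutable.ListBuffer

// MutabilityDemo is a hypothetical name used only for this sketch.
object MutabilityDemo {
  // Immutable List: "adding" an element returns a new list;
  // the original list is left untouched.
  val original: List[Int] = List(1, 2, 3)
  val extended: List[Int] = original :+ 4

  // Mutable ListBuffer: += changes the buffer in place.
  val buffer: ListBuffer[Int] = ListBuffer(1, 2, 3)
  buffer += 4

  def main(args: Array[String]): Unit = {
    println(original) // List(1, 2, 3)
    println(extended) // List(1, 2, 3, 4)
    println(buffer)   // ListBuffer(1, 2, 3, 4)
  }
}
```

Note that the immutable version needs a second name (extended) to hold the result, whereas the mutable version keeps using the same buffer.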

This powerful feature of Scala gives you the following benefits when you use and manipulate your data:

  • Easy to use: For example, it helps you eliminate the interference between iterators and collection updates. As a result, a small vocabulary of 20-50 methods should be enough to solve most of the collection problems in your data analytics solution.
  • ...
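To give a feel for how far a small vocabulary of methods goes, here is a short sketch that uses only filter, map, sum, and groupBy. The sales data and the name VocabularyDemo are made up for illustration.

```scala
// VocabularyDemo and its sample data are hypothetical, for illustration only.
object VocabularyDemo {
  // (product, units sold)
  val sales: List[(String, Int)] =
    List(("book", 3), ("video", 5), ("book", 2), ("audiobook", 1))

  // filter, map, and sum: total units sold for one product.
  val bookUnits: Int = sales.filter(_._1 == "book").map(_._2).sum

  // groupBy and map: total units sold per product.
  val unitsPerProduct: Map[String, Int] =
    sales.groupBy(_._1).map { case (product, rows) =>
      product -> rows.map(_._2).sum
    }

  def main(args: Array[String]): Unit = {
    println(bookUnits)       // 5
    println(unitsPerProduct) // entries: book -> 5, video -> 5, audiobook -> 1
  }
}
```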

Types and hierarchies

Scala collections are divided into mutable and immutable collections. Like a mutable variable, a mutable collection can be changed, updated, or extended when necessary. Like an immutable variable, an immutable collection cannot be changed. Most collection classes are located in the packages scala.collection, scala.collection.immutable, and scala.collection.mutable.

The following hierarchical diagram (Figure 1) shows the Scala collections API hierarchy according to the official documentation of Scala. All of these are either high-level abstract classes or traits, and they have mutable as well as immutable implementations.

Figure 1: Collections under package scala.collection
...
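One practical consequence of this hierarchy is that code written against a general trait works with both immutable and mutable implementations. The sketch below assumes only that List, Set, and ArrayBuffer all extend scala.collection.Iterable, as the official documentation describes; the names HierarchyDemo and describe are hypothetical.

```scala
import scala.collection.mutable

// HierarchyDemo and describe are hypothetical names for this sketch.
object HierarchyDemo {
  // Written against the general trait scala.collection.Iterable, so it
  // accepts immutable and mutable implementations alike.
  def describe(xs: scala.collection.Iterable[Int]): String =
    s"size=${xs.size}, sum=${xs.sum}"

  val fromList: String   = describe(List(1, 2, 3))                // immutable sequence
  val fromSet: String    = describe(Set(1, 2, 3))                 // immutable set
  val fromBuffer: String = describe(mutable.ArrayBuffer(1, 2, 3)) // mutable sequence

  def main(args: Array[String]): Unit = {
    println(fromList)   // size=3, sum=6
    println(fromSet)    // size=3, sum=6
    println(fromBuffer) // size=3, sum=6
  }
}
```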

Performance characteristics

In Scala, different collections have different performance characteristics, and these characteristics are the main reason to choose one collection over another. In this section, we will look at the performance of Scala collection objects from the point of view of both operations and memory usage. At the end of this section, we will provide some guidelines for selecting appropriate collection objects for your code and problem types.

Performance characteristics of collection objects

The following are the performance characteristics of Scala collections, based on the official documentation of Scala.

  • Const: The operation takes only constant time.
  • eConst: The operation takes...
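As a rough sketch of how these categories show up in everyday code, the following compares a List with a Vector. The complexity notes in the comments follow the performance table in the official Scala collections documentation, where eConst stands for effectively constant time; the name PerformanceDemo and the sample values are assumptions made for this sketch, and nothing here measures actual timings.

```scala
// PerformanceDemo is a hypothetical name; the comments restate the
// official performance table and are not measurements.
object PerformanceDemo {
  val list: List[Int] = List(1, 2, 3)
  val vector: Vector[Int] = Vector(1, 2, 3)

  // List: prepend and head are constant time (Const).
  val prepended: List[Int] = 0 :: list
  val first: Int = list.head

  // List: indexed access walks the list, so it is linear in the index.
  val thirdOfList: Int = list(2)

  // Vector: indexed access and append are effectively constant (eConst).
  val thirdOfVector: Int = vector(2)
  val appended: Vector[Int] = vector :+ 4

  def main(args: Array[String]): Unit = {
    println(prepended) // List(0, 1, 2, 3)
    println(appended)  // Vector(1, 2, 3, 4)
  }
}
```

A reasonable rule of thumb that follows from this: prefer List when you mostly add and read at the front, and Vector when you need indexed access or appends.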

Java interoperability

As we mentioned earlier, Scala has a very rich collection API. The same is true of Java, but there are many differences between the two collection APIs. For example, both APIs have iterables, iterators, maps, sets, and sequences, but Scala has some advantages: it pays more attention to immutable collections and provides more operations that produce a new collection from an existing one. Sometimes, you want to use or access Java collections from Scala, or vice versa.

JavaConversions is no longer a sound choice (it has been deprecated since Scala 2.12). JavaConverters makes the conversion between Scala and Java collections explicit, so you are much less likely to trigger implicit conversions you did not intend to use.

As a matter of fact, it's quite trivial to do so because Scala offers an implicit way to convert between both APIs in the JavaConversions object. So, you might find bidirectional conversions...
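The explicit style that JavaConverters encourages can be sketched as follows. The asScala and asJava methods come from scala.collection.JavaConverters; the object name InteropDemo and the sample values are hypothetical. In Scala 2.13 and later the same methods are provided by scala.jdk.CollectionConverters, and JavaConverters itself is deprecated there, so treat the import as an assumption about the Scala version in use.

```scala
import scala.collection.JavaConverters._

// InteropDemo is a hypothetical name used only for this sketch.
object InteropDemo {
  // A Java collection, as a Java API might return it.
  val javaList: java.util.List[String] = new java.util.ArrayList[String]()
  javaList.add("spark")
  javaList.add("scala")

  // Explicit Java -> Scala conversion: asScala wraps the list as a mutable Buffer.
  val scalaBuffer: scala.collection.mutable.Buffer[String] = javaList.asScala

  // Explicit Scala -> Java conversion.
  val backToJava: java.util.List[Int] = List(1, 2, 3).asJava

  def main(args: Array[String]): Unit = {
    println(scalaBuffer.map(_.toUpperCase).toList) // List(SPARK, SCALA)
    println(backToJava.size)                       // 3
  }
}
```

Because every conversion is spelled out with asScala or asJava, a reader can see exactly where the code crosses the boundary between the two APIs.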

Using Scala implicits

We addressed implicits in the previous chapters, but here we are going to see more examples. Implicit parameters are very similar to default parameters, but they use a different mechanism to find the default value.

An implicit parameter is a parameter of a constructor or a method that is marked as implicit, which means that the compiler will search for an implicit value in scope if you don't provide a value for this parameter. For example:

scala> def func(implicit x: Int) = print(x)
func: (implicit x: Int)Unit
scala> func
<console>:9: error: could not find implicit value for parameter x: Int
       func
       ^
scala> implicit val defVal = 2
defVal: Int = 2
scala> func
2
scala> func(3)
3
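The contrast with default parameters can be sketched in a small standalone program. All names here (ImplicitVsDefault, withDefault, withImplicit, scale) are hypothetical and chosen only for this illustration.

```scala
// All names in this object are hypothetical, for illustration only.
object ImplicitVsDefault {
  // Default parameter: the fallback value is fixed where the method is defined.
  def withDefault(x: Int = 1): Int = x * 10

  // Implicit parameter: the fallback value is looked up in the implicit
  // scope of the call site.
  def withImplicit(implicit x: Int): Int = x * 10

  implicit val scale: Int = 2

  val a: Int = withDefault()   // uses the default value 1
  val b: Int = withImplicit    // uses the implicit value scale
  val c: Int = withImplicit(3) // an explicit argument takes precedence

  def main(args: Array[String]): Unit = {
    println(a) // 10
    println(b) // 20
    println(c) // 30
  }
}
```

The key design difference is who decides the fallback: with a default parameter it is the method's author, whereas with an implicit parameter it is whoever controls the implicit scope at the call site.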

Implicits are very useful for the collection API. For example, the collections API uses implicit parameters to supply CanBuildFrom...
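CanBuildFrom is specific to older versions of the collections library, so the sketch below uses a different and simpler instance of the same idea: the sorted method, which takes an implicit Ordering. The Book case class, its sample values, and the name OrderingDemo are assumptions made for illustration.

```scala
// OrderingDemo, Book, and the sample values are hypothetical.
object OrderingDemo {
  final case class Book(title: String, pages: Int)

  val books: List[Book] = List(Book("B", 796), Book("A", 120), Book("C", 300))

  // sorted takes an implicit Ordering[Book]; without one in scope,
  // the call to books.sorted below would not compile.
  implicit val byPages: Ordering[Book] = Ordering.by[Book, Int](_.pages)

  // The compiler supplies byPages automatically.
  val ascending: List[Book] = books.sorted

  // An Ordering can also be passed explicitly, overriding the implicit one.
  val descending: List[Book] = books.sorted(byPages.reverse)

  def main(args: Array[String]): Unit = {
    println(ascending.map(_.title))  // List(A, C, B)
    println(descending.map(_.title)) // List(B, C, A)
  }
}
```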

Summary

Throughout this chapter, we have seen many examples of using the Scala collections API. It is powerful and flexible and comes with a large set of operations, which makes it easier to deal with any kind of data. We introduced the Scala collections API and its different types and hierarchies, and demonstrated how it can be used to accommodate different types of data and solve a wide range of problems. In summary, you learned about types and hierarchies, performance characteristics, Java interoperability, and the usage of implicits. This more or less concludes the introduction to Scala itself. However, you will keep learning more advanced topics and operations using Scala in the following chapters.

In the next chapter, we will explore data analysis and big...
