Fast Data Processing Systems with SMACK Stack

Chapter 8. Study Case 2 - Connectors

In this chapter, we analyze the connectors, that is, the software pieces that enable the SMACK stack technologies to communicate with each other. The relationship between Spark and Kafka was covered in Chapter 5, The Broker - Apache Kafka, and the relationship between Spark and Cassandra was covered in the previous study case, Chapter 7, Study Case 1 - Spark and Cassandra.

This chapter covers the remaining relationships in the following sections:

  • Akka and Cassandra
  • Akka and Spark
  • Kafka and Akka
  • Kafka and Cassandra

Akka and Cassandra


For this example, we will use the DataStax Cassandra driver and Akka to build an application that downloads tweets and then stores their ID, text, name, and date in a Cassandra table. Here we will see:

  • How to build a simple Akka application with just a few actors
  • How to use Akka IO to make HTTP requests
  • How to store the data in Cassandra

The first step is to build our core example. It contains three actors: two that interact with the database and one that downloads the tweets. The TweetReadActor reads from Cassandra, the TweetWriterActor writes to Cassandra, and the TweetScanActor downloads the tweets and passes them to the TweetWriterActor to be written to Cassandra:

class TweetReadActor(cluster: Cluster) extends Actor { ... } 
 
class TweetWriterActor(cluster: Cluster) extends Actor { ... } 
 
class TweetScanActor(tweetWrite: ActorRef, queryUrl: String => String) extends Actor {  ... } 

Figure 8-1 shows the relationship between the Twitter downloader...
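
The actor bodies are elided above. As a rough illustration (not taken from this excerpt), a TweetWriterActor could persist each tweet with the DataStax driver's prepared statements; the twitter keyspace, the tweets table, and the Tweet case class below are illustrative assumptions:

import akka.actor.Actor
import com.datastax.driver.core.Cluster

// Hypothetical message type carrying the fields we want to persist.
case class Tweet(id: String, text: String, user: String, createdAt: java.util.Date)

class TweetWriterActor(cluster: Cluster) extends Actor {
  // Connect once to an assumed "twitter" keyspace and prepare the insert statement.
  private val session = cluster.connect("twitter")
  private val insert = session.prepare(
    "INSERT INTO tweets (key, text, user, createdat) VALUES (?, ?, ?, ?)")

  def receive: Receive = {
    case t: Tweet =>
      // Bind the prepared statement and write asynchronously so the actor never blocks.
      session.executeAsync(insert.bind(t.id, t.text, t.user, t.createdAt))
  }
}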

Akka and Spark


We start developing the Spark Streaming application by creating a SparkConf followed by a StreamingContext:

val conf = new SparkConf(false) 
  .setMaster("local[*]") 
  .setAppName("Spark Streaming with Akka") 
  .set("spark.logConf", "true") 
  .set("spark.driver.port", s"$driverPort") 
  .set("spark.driver.host", s"$driverHost") 
  .set("spark.akka.logLifecycleEvents", "true") 
val ssc = new StreamingContext(conf, Seconds(1)) 
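
Note that driverPort and driverHost are not defined in this excerpt; for a local run they could be declared beforehand with hypothetical values such as:

// Hypothetical values, not from the excerpt: where the Spark driver binds locally.
val driverPort = 7777
val driverHost = "localhost"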

The StreamingContext lets us create an input stream backed by an actor; the resulting stream is of type ReceiverInputDStream:

val actorName = "salutator" 
val actorStream: ReceiverInputDStream[String] = ssc.actorStream[String](Props[Salutator], actorName) 

Now that we have a DStream, let's define a high-level processing pipeline in Spark Streaming:

actorStream.print() 

In the preceding case, the print() method is going to print the first 10 elements of each RDD generated in this DStream. Nothing happens until start() is executed...
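
The Salutator actor referenced by Props[Salutator] is not shown in this excerpt. A minimal sketch, assuming Spark 1.x where the ActorHelper trait provides store() to push received items into the DStream, could look like the following, after which the context is started:

import akka.actor.Actor
import org.apache.spark.streaming.receiver.ActorHelper

// Hypothetical receiver actor: every String it receives is stored into the DStream.
class Salutator extends Actor with ActorHelper {
  def receive: Receive = {
    case greeting: String => store(greeting)
  }
}

ssc.start()             // start the receiver and the streaming computation
ssc.awaitTermination()  // block until the streaming context is stopped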

Kafka and Akka


The connector is available on Maven Central for Scala 2.11 at the following coordinates:

libraryDependencies += "com.typesafe.akka" %% "akka-stream-kafka" % "0.11-M4"

If you remember, a producer publishes messages to Kafka topics. Each message itself specifies the topic and partition to publish to, so one can publish to different topics with the same producer. The underlying implementation uses the KafkaProducer.

When creating a producer stream, we need to pass ProducerSettings defining:

  • Kafka cluster bootstrap servers
  • Serializers for the keys and values
  • Tuning parameters

Here we have a ProducerSettings example:

import akka.kafka._ 
import akka.kafka.scaladsl._ 
import org.apache.kafka.common.serialization.StringSerializer 
import org.apache.kafka.common.serialization.ByteArraySerializer 
 
val producerSettings = ProducerSettings(system, new ByteArraySerializer, new StringSerializer)
  .withBootstrapServers("localhost:9092")

The easiest way to publish...
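
Although the rest of this passage is truncated here, a hedged sketch of how the settings above could feed a Kafka sink, assuming akka-stream-kafka's scaladsl Producer.plainSink and a hypothetical topic1 topic, is:

import akka.kafka.scaladsl.Producer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source
import org.apache.kafka.clients.producer.ProducerRecord

// Reuse the actor system already passed to ProducerSettings above.
implicit val materializer = ActorMaterializer()(system)

// Wrap each message in a ProducerRecord for the (hypothetical) "topic1" topic
// and run the stream into a producer sink built from producerSettings.
Source(1 to 100)
  .map(n => new ProducerRecord[Array[Byte], String]("topic1", s"message $n"))
  .runWith(Producer.plainSink(producerSettings))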

Kafka and Cassandra


We need to use the kafka-connect-cassandra connector, which is published on Maven Central by Tuplejump.

It can be defined as a dependency in the build file. For example, with SBT:

libraryDependencies += "com.tuplejump" %% "kafka-connect-cassandra" % "0.0.7"

The connector polls Cassandra with a specific query. Using this, data can be fetched from Cassandra in two modes:

  • Bulk
  • Timestamp based

The modes change automatically based on the query. For example:

Bulk:

SELECT * FROM userlog; 
 

Timestamp based:

SELECT * FROM userlog WHERE ts > previousTime(); 
SELECT * FROM userlog WHERE ts = currentTime(); 
SELECT * FROM userlog WHERE ts >= previousTime() AND ts <= currentTime();
 

Here, previousTime() and currentTime() are replaced before fetching the data.

CQL Type      Schema Type
ASCII         STRING
VARCHAR       STRING
TEXT          STRING
BIGINT        INT64
COUNTER       INT64
BOOLEAN       BOOLEAN
DECIMAL       FLOAT64
DOUBLE        FLOAT64
FLOAT         FLOAT32
TIMESTAMP     TIMESTAMP

Table...

Summary


We have reviewed the connectors between all the SMACK stack technologies. The Spark and Kafka connection was explained in Chapter 5, The Broker - Apache Kafka, and the Spark and Cassandra connector was explained in Chapter 7, Study Case 1 - Spark and Cassandra.

In this chapter, we reviewed the connectors between:

  • Akka and Cassandra
  • Akka and Spark
  • Kafka and Akka
  • Kafka and Cassandra

In the following chapter, we will review container technologies.
