You're reading from Fast Data Processing Systems with SMACK Stack
Published in: Dec 2016 | Publisher: Packt | ISBN-13: 9781786467201 | Edition: 1st | Reading level: Intermediate

Author: Raúl Estrada

Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods. Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

Chapter 8. Study Case 2 - Connectors

In this chapter, we analyze the connectors, that is, the software pieces that enable the SMACK stack technologies to communicate with each other. The relationship between Spark and Kafka was covered in the Kafka chapter, and the relationship between Spark and Cassandra was covered in the previous study case, Chapter 7, Study Case 1 - Spark and Cassandra.

This chapter covers the remaining relationships in the following sections:

  • Akka and Cassandra
  • Akka and Spark
  • Kafka and Akka
  • Kafka and Cassandra

Akka and Cassandra


For this example, we will use the DataStax Cassandra driver and Akka to build an application that downloads tweets and then stores their ID, text, name, and date in a Cassandra table. Here we will see:

  • How to build a simple Akka application with just a few actors
  • How to use Akka IO to make HTTP requests
  • How to store the data in Cassandra

The first step is to build our core example. It contains three actors: two that interact with the database and one that downloads the tweets. The TweetReadActor reads from Cassandra, the TweetWriterActor writes to Cassandra, and the TweetScanActor downloads the tweets and passes them to the TweetWriterActor to be written to Cassandra:

class TweetReadActor(cluster: Cluster) extends Actor { ... } 
 
class TweetWriterActor(cluster: Cluster) extends Actor { ... } 
 
class TweetScanActor(tweetWrite: ActorRef, queryUrl: String => String) extends Actor {  ... } 

Figure 8-1 shows the relationship between the Twitter downloader...
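To make the writer side concrete, here is a minimal sketch of what the TweetWriterActor body could look like with the DataStax Java driver. The Tweet case class, the twitter keyspace, and the column names are illustrative assumptions, not the book's exact code:

```scala
import java.util.Date
import akka.actor.Actor
import com.datastax.driver.core.Cluster

// Illustrative message type; field names are assumptions
case class Tweet(id: String, user: String, text: String, createdAt: Date)

class TweetWriterActor(cluster: Cluster) extends Actor {
  // Keyspace and table names are assumptions for this sketch
  val session = cluster.connect("twitter")
  val insert = session.prepare(
    "INSERT INTO tweets (id, user, text, createdat) VALUES (?, ?, ?, ?);")

  def receive: Receive = {
    // executeAsync does not block the actor while the write completes
    case Tweet(id, user, text, createdAt) =>
      session.executeAsync(insert.bind(id, user, text, createdAt))
  }
}
```

Preparing the statement once in the constructor and binding per message keeps the actor cheap to run under a high tweet rate.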

Akka and Spark


We start developing the Spark Streaming application by creating a SparkConf followed by a StreamingContext:

val conf = new SparkConf(false) 
  .setMaster("local[*]") 
  .setAppName("Spark Streaming with Akka") 
  .set("spark.logConf", "true") 
  .set("spark.driver.port", s"$driverPort") 
  .set("spark.driver.host", s"$driverHost") 
  .set("spark.akka.logLifecycleEvents", "true") 
val ssc = new StreamingContext(conf, Seconds(1)) 

This gives us a context from which we can create an input stream backed by an actor; the resulting stream is of type ReceiverInputDStream:

val actorName = "salutator" 
val actorStream: ReceiverInputDStream[String] = ssc.actorStream[String](Props[Salutator], actorName) 

Now that we have a DStream, let's define a high-level processing pipeline in Spark Streaming:

actorStream.print() 

In the preceding case, the print() method prints the first 10 elements of each RDD generated in this DStream. Nothing happens until start() is executed...
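The Salutator actor passed to ssc.actorStream is not shown in the excerpt. A minimal version, assuming Spark 1.x's ActorHelper-based receiver API, could look like this:

```scala
import akka.actor.Actor
import org.apache.spark.streaming.receiver.ActorHelper

// A possible Salutator: every String it receives is handed to Spark
// via store(), which feeds the ReceiverInputDStream created above
class Salutator extends Actor with ActorHelper {
  def receive: Receive = {
    case s: String => store(s)  // push the element into the DStream
  }
}
```

Any other actor in the system can then send strings to this receiver, and they appear as elements of the DStream on the next batch interval.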

Kafka and Akka


The connector is available on Maven Central for Scala 2.11 at the following coordinates:

libraryDependencies += "com.typesafe.akka" %% "akka-stream-kafka" % "0.11-M4"

If you remember, a producer publishes messages to Kafka topics. The message itself contains information about which topic and partition to publish to, so one can publish to different topics with the same producer. The underlying implementation uses the KafkaProducer.

When creating a producer stream, we need to pass ProducerSettings defining:

  • Kafka cluster bootstrap servers
  • Serializers for the keys and values
  • Tuning parameters

Here we have a ProducerSettings example:

import akka.kafka._ 
import akka.kafka.scaladsl._ 
import org.apache.kafka.common.serialization.StringSerializer 
import org.apache.kafka.common.serialization.ByteArraySerializer 
 
val producerSettings = ProducerSettings(system, new ByteArraySerializer, new StringSerializer)
  .withBootstrapServers("localhost:9092")

The easiest way to publish...
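Continuing the truncated example, one way to publish with this connector is the plain sink, which sends each ProducerRecord using the serializers configured in producerSettings. This sketch assumes the producerSettings value above and an implicit ActorMaterializer in scope; the topic name "topic1" is an illustrative choice:

```scala
import akka.stream.scaladsl.Source
import akka.kafka.scaladsl.Producer
import org.apache.kafka.clients.producer.ProducerRecord

// Publish the numbers 1..100 as string values to "topic1".
// The record's key type (Array[Byte]) and value type (String)
// match the serializers in producerSettings above.
val done = Source(1 to 100)
  .map(_.toString)
  .map(elem => new ProducerRecord[Array[Byte], String]("topic1", elem))
  .runWith(Producer.plainSink(producerSettings))
```

The returned value completes when the stream has finished sending, so it can be used to shut the actor system down cleanly afterwards.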

Kafka and Cassandra


We need to use the kafka-connect-cassandra connector, which is published on Maven Central by Tuplejump.

It can be defined as a dependency in the build file. For example, with SBT:

libraryDependencies += "com.tuplejump" %% "kafka-connect-cassandra" % "0.0.7"

This connector polls Cassandra with a specific query. Using this, data can be fetched from Cassandra in two modes:

  • Bulk
  • Timestamp based

The mode is selected automatically based on the query. For example:

Bulk:

SELECT * FROM userlog; 
 

Timestamp based:

SELECT * FROM userlog WHERE ts > previousTime();
SELECT * FROM userlog WHERE ts = currentTime();
SELECT * FROM userlog WHERE ts >= previousTime() AND ts <= currentTime();
 

Here, previousTime() and currentTime() are placeholders that are replaced with actual timestamps before the data is fetched.

CQL Type   | Schema Type
---------- | -----------
ASCII      | STRING
VARCHAR    | STRING
TEXT       | STRING
BIGINT     | INT64
COUNTER    | INT64
BOOLEAN    | BOOLEAN
DECIMAL    | FLOAT64
DOUBLE     | FLOAT64
FLOAT      | FLOAT32
TIMESTAMP  | TIMESTAMP

Table...

Summary


We have reviewed the connectors between all the SMACK stack technologies. The Spark and Kafka connection was explained in Chapter 5, The Broker - Apache Kafka, and the Spark and Cassandra connector was explained in Chapter 7, Study Case 1 - Spark and Cassandra.

In this chapter, we reviewed the connectors between:

  • Akka and Cassandra
  • Akka and Spark
  • Kafka and Akka
  • Kafka and Cassandra

In the following chapter, we will review container technologies.
