You're reading from Fast Data Processing Systems with SMACK Stack
Published in: Dec 2016 | Publisher: Packt | ISBN-13: 9781786467201 | Edition: 1st | Reading level: Intermediate

Author: Raúl Estrada

Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods. Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

Chapter 8. Study Case 2 - Connectors

In this chapter, we analyze the connectors, that is, the software pieces that enable the SMACK stack technologies to communicate with each other. The relationship between Spark and Kafka was covered in the Kafka chapter, and the relationship between Spark and Cassandra was covered in the previous study case, Chapter 7, Study Case 1 - Spark and Cassandra.

This chapter covers the remaining relationships in the following sections:

  • Akka and Cassandra
  • Akka and Spark
  • Kafka and Akka
  • Kafka and Cassandra

Akka and Cassandra


For this example, we will use the DataStax Cassandra driver and Akka to build an application that downloads tweets and then stores their ID, text, name, and date in a Cassandra table. Here we will see:

  • How to build a simple Akka application with just a few actors
  • How to use Akka IO to make HTTP requests
  • How to store the data in Cassandra

The first step is to build our core example. It contains three actors: two that interact with the database and one that downloads the tweets. The TweetReadActor reads from Cassandra, the TweetWriterActor writes to Cassandra, and the TweetScanActor downloads the tweets and passes them to the TweetWriterActor to be written to Cassandra:

class TweetReadActor(cluster: Cluster) extends Actor { ... } 
 
class TweetWriterActor(cluster: Cluster) extends Actor { ... } 
 
class TweetScanActor(tweetWrite: ActorRef, queryUrl: String => String) extends Actor {  ... } 

Figure 8-1 shows the relationship between the Twitter downloader...
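To make the writer side concrete, here is a minimal sketch of what the TweetWriterActor body could look like with the DataStax Java driver. The Tweet case class, the twitter keyspace, and the column names are illustrative assumptions, not the book's exact code:

```scala
import java.util.Date
import akka.actor.Actor
import com.datastax.driver.core.Cluster

// Illustrative message type; field names are assumptions
case class Tweet(id: String, user: String, text: String, createdAt: Date)

class TweetWriterActor(cluster: Cluster) extends Actor {
  // Keyspace and table names are assumptions for this sketch
  val session = cluster.connect("twitter")
  val insert = session.prepare(
    "INSERT INTO tweets (id, user, text, createdat) VALUES (?, ?, ?, ?);")

  def receive: Receive = {
    // executeAsync does not block the actor while the write completes
    case Tweet(id, user, text, createdAt) =>
      session.executeAsync(insert.bind(id, user, text, createdAt))
  }
}
```

Preparing the statement once in the constructor and binding per message keeps the actor cheap to run under a high tweet rate.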

Akka and Spark


We start developing the Spark Streaming application by creating a SparkConf followed by a StreamingContext:

val conf = new SparkConf(false) 
  .setMaster("local[*]") 
  .setAppName("Spark Streaming with Akka") 
  .set("spark.logConf", "true") 
  .set("spark.driver.port", s"$driverPort") 
  .set("spark.driver.host", s"$driverHost") 
  .set("spark.akka.logLifecycleEvents", "true") 
val ssc = new StreamingContext(conf, Seconds(1)) 

This gives us a context from which we can create an input stream backed by an actor; the resulting stream is of type ReceiverInputDStream:

val actorName = "salutator" 
val actorStream: ReceiverInputDStream[String] = ssc.actorStream[String](Props[Salutator], actorName) 

Now that we have a DStream, let's define a high-level processing pipeline in Spark Streaming:

actorStream.print() 

In the preceding case, the print() method prints the first 10 elements of each RDD generated in this DStream. Nothing happens until start() is executed...
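The Salutator actor passed to ssc.actorStream is not shown in the excerpt. A minimal version, assuming Spark 1.x's ActorHelper-based receiver API, could look like this:

```scala
import akka.actor.Actor
import org.apache.spark.streaming.receiver.ActorHelper

// A possible Salutator: every String it receives is handed to Spark
// via store(), which feeds the ReceiverInputDStream created above
class Salutator extends Actor with ActorHelper {
  def receive: Receive = {
    case s: String => store(s)  // push the element into the DStream
  }
}
```

Any other actor in the system can then send strings to this receiver, and they appear as elements of the DStream on the next batch interval.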

Kafka and Akka


The connector is available on Maven Central for Scala 2.11 at the following coordinates:

libraryDependencies += "com.typesafe.akka" %% "akka-stream-kafka" % "0.11-M4"

If you remember, a producer publishes messages to Kafka topics. The message itself contains information about which topic and partition to publish to, so one can publish to different topics with the same producer. The underlying implementation uses the KafkaProducer.

When creating a producer stream, we need to pass ProducerSettings defining:

  • Kafka cluster bootstrap servers
  • Serializers for the keys and values
  • Tuning parameters

Here we have a ProducerSettings example:

import akka.kafka._ 
import akka.kafka.scaladsl._ 
import org.apache.kafka.common.serialization.StringSerializer 
import org.apache.kafka.common.serialization.ByteArraySerializer 
 
val producerSettings = ProducerSettings(system, new ByteArraySerializer, new StringSerializer)
  .withBootstrapServers("localhost:9092")

The easiest way to publish...
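Continuing the truncated example, one way to publish with this connector is the plain sink, which sends each ProducerRecord using the serializers configured in producerSettings. This sketch assumes the producerSettings value above and an implicit ActorMaterializer in scope; the topic name "topic1" is an illustrative choice:

```scala
import akka.stream.scaladsl.Source
import akka.kafka.scaladsl.Producer
import org.apache.kafka.clients.producer.ProducerRecord

// Publish the numbers 1..100 as string values to "topic1".
// The record's key type (Array[Byte]) and value type (String)
// match the serializers in producerSettings above.
val done = Source(1 to 100)
  .map(_.toString)
  .map(elem => new ProducerRecord[Array[Byte], String]("topic1", elem))
  .runWith(Producer.plainSink(producerSettings))
```

The returned value completes when the stream has finished sending, so it can be used to shut the actor system down cleanly afterwards.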

Kafka and Cassandra


We need to use the kafka-connect-cassandra connector, which is published on Maven Central by Tuplejump.

It can be defined as a dependency in the build file. For example, with SBT:

libraryDependencies += "com.tuplejump" %% "kafka-connect-cassandra" % "0.0.7"

This connector polls Cassandra with a specific query. Using this, data can be fetched from Cassandra in two modes:

  • Bulk
  • Timestamp based

The mode is selected automatically based on the query. For example:

Bulk:

SELECT * FROM userlog; 
 

Timestamp based:

SELECT * FROM userlog WHERE ts > previousTime();
SELECT * FROM userlog WHERE ts = currentTime();
SELECT * FROM userlog WHERE ts >= previousTime() AND ts <= currentTime();
 

Here, previousTime() and currentTime() are placeholders that are replaced with actual timestamps before the data is fetched.

CQL Type   | Schema Type
---------- | -----------
ASCII      | STRING
VARCHAR    | STRING
TEXT       | STRING
BIGINT     | INT64
COUNTER    | INT64
BOOLEAN    | BOOLEAN
DECIMAL    | FLOAT64
DOUBLE     | FLOAT64
FLOAT      | FLOAT32
TIMESTAMP  | TIMESTAMP

Table...

Summary


We have reviewed the connectors between all the SMACK stack technologies. The Spark and Kafka connection was explained in Chapter 5, The Broker - Apache Kafka, and the Spark and Cassandra connector was explained in Chapter 7, Study Case 1 - Spark and Cassandra.

In this chapter, we reviewed the connectors between:

  • Akka and Cassandra
  • Akka and Spark
  • Kafka and Akka
  • Kafka and Cassandra

In the following chapter, we will review container technologies.
