
You're reading from: Fast Data Processing Systems with SMACK Stack

Product type: Book
Published in: Dec 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786467201
Edition: 1st Edition
Author: Raúl Estrada

Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods. Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

Chapter 7. Study Case 1 - Spark and Cassandra

The last three chapters are case studies. In the first, we discuss the relationship between Spark and Cassandra; in the second, we explore the relationships among the other technologies; and in the last, we analyze the Mesos frameworks and containers.

Remember that in all the examples we use Scala as the language and Akka for the actor model. Also, Mesos is treated as an infrastructure technology, so we assume that all the use cases can be deployed on Mesos and use Scala and Akka.

This chapter has the following parts:

  • Spark Cassandra connector
  • Study case: The Calliope project

Spark Cassandra connector


To use Apache Spark and Apache Cassandra together, we could write the integration code by hand, but thanks to the open source community coordinated by DataStax, we have the Spark Cassandra connector. If you remember the history, Cassandra was conceived at Facebook, became an Apache project, and grew so large that a whole company, DataStax, was created to support it.

DataStax is the company that largely steers Apache Cassandra's development. Among other useful tools, DataStax has developed the Spark Cassandra connector, a powerful open source library with three main goals:

  1. Expose Cassandra tables as Spark RDDs.
  2. Write Spark RDDs to Cassandra.
  3. Execute CQL queries within Spark applications.
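The three goals above map onto a handful of connector calls. The following is a minimal sketch, assuming spark-cassandra-connector 1.x (matching a Spark 1.x release) on the classpath and a single Cassandra node reachable at 127.0.0.1; the keyspace `test`, the table `words`, and the object name are hypothetical choices for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._                  // cassandraTable, saveToCassandra, SomeColumns
import com.datastax.spark.connector.cql.CassandraConnector

object ConnectorDirectives {
  // Hypothetical keyspace and table names used only in this sketch
  val Keyspace = "test"
  val Table = "words"

  val CreateKeyspaceCql: String =
    s"CREATE KEYSPACE IF NOT EXISTS $Keyspace WITH replication = " +
      "{'class': 'SimpleStrategy', 'replication_factor': 1}"
  val CreateTableCql: String =
    s"CREATE TABLE IF NOT EXISTS $Keyspace.$Table (word text PRIMARY KEY, count int)"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("connector-directives")
      .setMaster("local[2]")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed local node
    val sc = new SparkContext(conf)

    // Goal 3: execute CQL statements from within the Spark application
    CassandraConnector(conf).withSessionDo { session =>
      session.execute(CreateKeyspaceCql)
      session.execute(CreateTableCql)
    }

    // Goal 2: write a Spark RDD of tuples to the Cassandra table
    val counts = sc.parallelize(Seq(("spark", 3), ("cassandra", 5)))
    counts.saveToCassandra(Keyspace, Table, SomeColumns("word", "count"))

    // Goal 1: expose the Cassandra table as a Spark RDD of CassandraRow
    val rows = sc.cassandraTable(Keyspace, Table)
    rows.collect().foreach { row =>
      println(s"${row.getString("word")} -> ${row.getInt("count")}")
    }

    sc.stop()
  }
}
```

The CQL runs on the driver through a pooled session, while the read and write run as distributed Spark jobs; the tuple elements are written to the columns named in `SomeColumns` in order.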

The main features of the Spark Cassandra connector are:

  • Supports Apache Spark version 1.0 through 1.6
  • Supports Apache Cassandra version 2.0 or later
  • Supports Scala versions 2.10 and 2.11
  • Supports all the Cassandra data types including collections
  • Can convert...
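The support for collection types listed above can be sketched with a case class whose `Set`, `List`, and `Map` fields line up with Cassandra `set`, `list`, and `map` columns. This assumes the same setup as before (connector 1.x on the classpath, a node at 127.0.0.1); the `User` class, the `test.users` table, and its columns are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

// Hypothetical row type: field names must match the column names of test.users
case class User(username: String, emails: Set[String], tags: List[String], prefs: Map[String, String])

object CollectionMapping {
  // Hypothetical schema with one column per collection type
  val CreateUsersCql: String =
    "CREATE TABLE IF NOT EXISTS test.users (username text PRIMARY KEY, " +
      "emails set<text>, tags list<text>, prefs map<text, text>)"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("collection-mapping")
      .setMaster("local[2]")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed local node
    val sc = new SparkContext(conf)

    CassandraConnector(conf).withSessionDo { session =>
      session.execute("CREATE KEYSPACE IF NOT EXISTS test WITH replication = " +
        "{'class': 'SimpleStrategy', 'replication_factor': 1}")
      session.execute(CreateUsersCql)
    }

    // Write: each User becomes one row; Scala collections go to CQL collections
    val users = sc.parallelize(Seq(
      User("raul", Set("raul@example.com"), List("scala", "akka"), Map("theme" -> "dark"))))
    users.saveToCassandra("test", "users")

    // Read back as typed objects instead of generic CassandraRow values
    sc.cassandraTable[User]("test", "users").collect().foreach(println)

    sc.stop()
  }
}
```

Mapping rows to a case class keeps the types checked at compile time on the Scala side; the generic `CassandraRow` API remains available when the schema is not known in advance.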

Study case: The Calliope project


In Greek mythology, Calliope (/kəˈlaɪ.əpiː/ kə-ly-ə-pee; Ancient Greek: Καλλιόπη Kalliopē "beautiful-voiced") was the muse of epic poetry. Calliope was the daughter of Zeus and Mnemosyne, and she is believed to have been the muse who inspired Homer's Odyssey and Iliad.

Calliope is the bridge between Cassandra and Spark that allows us to create fast real-time data apps with ease. Calliope is a library that provides an interface for consuming Cassandra data in Spark and for storing Spark Resilient Distributed Datasets (RDDs) in Cassandra. As we saw, we can use Spark with Cassandra without Calliope, but Calliope makes it all easier.

Calliope was started by Tuplejump Inc. in 2013, when there was no other solution available for working with Cassandra data in Spark. In 2014, Tuplejump worked on stabilizing the core while Calliope was being adopted and deployed at many organizations.

Installing Calliope

To use the Calliope jar from the Spark shell, add this jar to...

Summary


In this chapter's case study, we covered the connection between Spark and Cassandra.

We looked at the Spark Cassandra connector and how to set up Cassandra and the Spark context, use Cassandra with Spark Streaming, create a streaming context, read and write streams with Cassandra, save datasets, collections, and tuples to Cassandra, modify collections, save UDTs, and save RDDs as tables.

We also reviewed the Calliope project: installing Calliope, and reading from and writing to Cassandra with both CQL3 and Thrift.

This chapter was about the relationship between Spark and Cassandra. In the next chapter, we will examine the relationships among the remaining SMACK technologies.
