You're reading from  Fast Data Processing Systems with SMACK Stack

Product type: Book
Published in: Dec 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786467201
Edition: 1st Edition
Author (1)
Raúl Estrada

Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods. Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

Resilient distributed datasets


The soul of Spark is the resilient distributed dataset (RDD). Spark has four design goals: store data in memory (unlike Hadoop, which is not in-memory), distribute that data across a cluster, be fault tolerant, and be fast and efficient.

Fault tolerance is achieved, in part, by applying linear operations to small chunks of data. Efficiency is achieved by parallelizing operations across all parts of the cluster. Performance is achieved by minimizing data replication between cluster members.

A fundamental concept in Spark is that there are only two types of operations we can do on an RDD:

  • Transformations: A new RDD is created from the original, which is left unchanged; for example, mapping, filtering, union, intersection, sort, join, coalesce
  • Actions: A value is computed from the RDD and returned to the caller; the original RDD isn't changed; for example, count, collect, first
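
The split between transformations and actions can be sketched without a cluster. The following is a minimal toy model in plain Python, not Spark's API: the `ToyRDD` class and the `traced_square` helper are hypothetical names invented for this illustration, and only the method names (`map`, `filter`, `union`, `count`, `collect`, `first`) mirror operations listed above. It assumes, as Spark does, that transformations are lazy: they only describe a new dataset, and nothing is computed until an action asks for a value.

```python
import itertools
from typing import Callable, Iterable, Iterator


class ToyRDD:
    """Toy stand-in for an RDD: an immutable, lazily evaluated dataset.

    Hypothetical class for illustration only; this is NOT Spark's API.
    """

    def __init__(self, compute: Callable[[], Iterator]):
        # A zero-argument function that yields the elements when called.
        self._compute = compute

    @classmethod
    def parallelize(cls, data: Iterable) -> "ToyRDD":
        items = tuple(data)  # snapshot, so the source data cannot change later
        return cls(lambda: iter(items))

    # --- Transformations: return a NEW ToyRDD and compute nothing yet ---
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    def union(self, other: "ToyRDD") -> "ToyRDD":
        return ToyRDD(lambda: itertools.chain(self._compute(), other._compute()))

    # --- Actions: run the pipeline and return a plain value ---
    def count(self) -> int:
        return sum(1 for _ in self._compute())

    def collect(self) -> list:
        return list(self._compute())

    def first(self):
        return next(self._compute())


calls = []  # records every element the mapping function actually sees


def traced_square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = ToyRDD.parallelize(range(1, 6))

# Two chained transformations: each one builds a new ToyRDD.
evens_squared = numbers.map(traced_square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # 0: no element has been processed yet

# An action forces the computation.
result = evens_squared.collect()  # [4, 16]
calls_after_action = len(calls)   # 5: every element went through the map

combined = numbers.union(evens_squared)

print(calls_before_action, result, calls_after_action)
print(numbers.collect())  # the original dataset is unchanged: [1, 2, 3, 4, 5]
```

In this sketch every action recomputes the pipeline from the source data, which is why `numbers` can be reused after `evens_squared` has been collected. A real cluster engine adds partitioning, scheduling, and caching on top of this idea, none of which the toy attempts to model.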

It's right when people say that computer science is mathematics with a costume. As we've already seen, in functional programming, functions are first-class citizens; the equivalent in mathematics is...
