You're reading from Apache Spark 2.x for Java Developers

Product type: Book
Published: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787126497
Edition: 1st
Authors (2):

Sourav Gulati

Sourav Gulati has been associated with the software industry for more than 7 years. He started his career with Unix/Linux and Java and then moved toward the big data and NoSQL world. He has worked on various big data projects and has recently started a technical blog called Technical Learning. Apart from the IT world, he loves to read about mythology.
Sumit Kumar

Sumit Kumar is a developer with industry experience in telecom and banking. At different junctures he has worked as a Java and SQL developer, but it is shell scripting that he finds both challenging and satisfying. Currently, he delivers big data projects focused on batch and near-real-time analytics and distributed indexed querying systems. Besides IT, he takes a keen interest in human and ecological issues.

Advanced transformations


As stated earlier in this book, an RDD operation that returns an RDD is called a transformation. In Chapter 4, Understanding the Spark Programming Model, we learned about commonly used transformations. Now we are going to look at some advanced transformations.

mapPartitions

This transformation works like the map transformation; however, instead of acting on each element of the RDD, it acts on each partition, so the supplied function is executed once per partition. There is therefore a one-to-one mapping between the partitions of the source RDD and those of the target RDD.

Because each partition of an RDD is stored in its entirety on a single node, this transformation does not require a shuffle.

In the following example, we will create an RDD of integers and increment all elements of the RDD by 1 using mapPartitions:

JavaRDD<Integer> intRDD = jsc.parallelize(Arrays.asList(1,2,3,4,5,6,7,8,9,10),2);

Java 7:

intRDD.mapPartitions(new FlatMapFunction<...
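The snippet above is cut off in this excerpt. As a rough sketch of how such a call might be completed (assuming the Spark 2.x Java API, where `FlatMapFunction<Iterator<T>, U>` has a `call` method returning an `Iterator<U>`), the per-partition logic can be written as a plain Java helper; the class name `MapPartitionsSketch` and the helper `incrementPartition` are illustrative, not from the book:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapPartitionsSketch {

    // Per-partition logic: consume the partition's iterator once and
    // return an iterator over the incremented values.
    static Iterator<Integer> incrementPartition(Iterator<Integer> partition) {
        List<Integer> out = new ArrayList<>();
        while (partition.hasNext()) {
            out.add(partition.next() + 1);
        }
        return out.iterator();
    }

    public static void main(String[] args) {
        // Simulate one partition of the example RDD: (1, 2, 3, 4, 5)
        Iterator<Integer> result =
                incrementPartition(Arrays.asList(1, 2, 3, 4, 5).iterator());
        while (result.hasNext()) {
            System.out.println(result.next());
        }
    }
}

// Wired into Spark in Java 7 style (again, assuming the Spark 2.x signature),
// the helper would plug into the truncated call roughly as:
//
// intRDD.mapPartitions(new FlatMapFunction<Iterator<Integer>, Integer>() {
//     @Override
//     public Iterator<Integer> call(Iterator<Integer> partition) {
//         return incrementPartition(partition);
//     }
// }).collect();
```

Note that the function receives the whole partition as an iterator and returns an iterator, rather than being invoked once per element as with map.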