
Understanding the SparkR architecture


SparkR provides a distributed DataFrame with a programming syntax that is familiar to R users. This high-level DataFrame API integrates the R API with Spark's optimized SQL execution engine.
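As a brief illustration, the following sketch (assuming a local Spark installation with the SparkR package on the library path; the master URL and app name are only illustrative) creates a SparkDataFrame from a local R data frame and queries it with familiar R-style syntax, while the query itself is planned and executed by Spark's SQL engine:

library(SparkR)

# Start a SparkR session; master = "local[*]" is an illustrative setting
sparkR.session(master = "local[*]", appName = "SparkRArchitectureDemo")

# Convert a local R data frame into a distributed SparkDataFrame
df <- as.DataFrame(faithful)

# R-style DataFrame operations are translated into Spark SQL query plans
head(filter(select(df, df$eruptions, df$waiting), df$waiting < 50))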

SparkR's architecture primarily consists of two components: an R-to-JVM binding on the driver, which enables R programs to submit jobs to a Spark cluster, and support for running R processes on the Spark executors.
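A minimal sketch of this split is shown below (the default session settings are illustrative): the driver-side R process submits the job through the R-to-JVM binding, and the supplied function is evaluated by R worker processes spawned on the executors:

library(SparkR)
sparkR.session()

# spark.lapply distributes the list elements across the cluster; each element
# is processed by an R worker process running on an executor
partial_results <- spark.lapply(1:4, function(x) { sum(seq_len(x * 1000)) })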

SparkR's design includes support for launching R processes on the Spark executor machines. However, there is an overhead associated with serializing the query and deserializing the results after they have been computed, and these overheads become more significant as the amount of data transferred between R and the JVM increases. Caching can nevertheless enable efficient interactive query processing in SparkR.
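The following sketch (again with illustrative session settings) shows this pattern: caching a SparkDataFrame so that repeated interactive queries reuse the materialized partitions instead of recomputing them and paying the R-to-JVM transfer cost each time:

library(SparkR)
sparkR.session()

df <- as.DataFrame(faithful)

# Mark the SparkDataFrame for caching; the first action materializes it
cache(df)
count(df)

# Subsequent interactive queries run against the cached data
collect(summarize(groupBy(df, df$waiting), n = count(df$waiting)))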

Note

For a detailed description of the SparkR design and implementation, refer to "SparkR: Scaling R Programs with Spark" by Shivaram Venkataraman, Zongheng Yang, et al., available...
