Understanding performance improvements using whole-stage code generation


In this section, we first present a high-level overview of whole-stage code generation in Spark SQL, followed by a set of examples that show the improvements it brings to various JOINs using Catalyst's code generation feature.

After we have an optimized query plan, it needs to be converted into a DAG of RDDs for execution on the cluster. We use the following example to explain the basic concepts of Spark SQL whole-stage code generation (only the optimized logical plan portion of the explain output is reproduced here):

scala> sql("select count(*) from orders where customer_id = 26333955").explain(true) 
 
== Optimized Logical Plan == 
Aggregate [count(1) AS count(1)#45L] 
+- Project 
   +- Filter (isnotnull(customer_id#42L) && (customer_id#42L = 26333955)) 
      +- Relation[customer_id#42L,good_id#43L] parquet 

The preceding optimized logical plan can be viewed as a pipeline of Scan, Filter, Project, and Aggregate operations.
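Conceptually, whole-stage code generation collapses this pipeline into a single generated function that behaves like a hand-written loop over the input rows, rather than calling each operator's iterator one row at a time. The following Scala sketch is purely illustrative: the OrderRow case class and countMatching function are hypothetical stand-ins, not the code Spark actually generates.

// Illustrative sketch only: roughly what the fused Scan -> Filter -> Aggregate
// pipeline for the query above boils down to after whole-stage code generation.
case class OrderRow(customerId: Option[Long], goodId: Option[Long])

def countMatching(rows: Iterator[OrderRow]): Long = {
  var count = 0L
  while (rows.hasNext) {
    val row = rows.next()                      // Scan
    if (row.customerId.contains(26333955L)) {  // Filter: isnotnull && equality
      count += 1L                              // Aggregate: count(1)
    }
  }
  count
}

// Example usage on a tiny in-memory dataset (returns 1):
// countMatching(Iterator(OrderRow(Some(26333955L), Some(1L)), OrderRow(Some(7L), None)))

Because the whole pipeline runs inside one tight loop, there are no virtual function calls between operators and intermediate values can stay in CPU registers, which is where most of the speedup comes from.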

Traditional databases will typically execute the preceding...
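To observe the difference whole-stage code generation makes for a query like this one, a simple approach is to run it with the feature switched on and off and compare the timings and plans. The following spark-shell sketch is not from the book; it assumes the orders table used above is registered, and relies on the spark.sql.codegen.wholeStage setting that controls the feature in Spark 2.x:

scala> // Enabled (the default): codegen-capable operators appear with a '*' prefix in explain() output
scala> spark.conf.set("spark.sql.codegen.wholeStage", true)
scala> spark.time(sql("select count(*) from orders where customer_id = 26333955").show())

scala> // Disabled: Spark falls back to the row-at-a-time iterator model
scala> spark.conf.set("spark.sql.codegen.wholeStage", false)
scala> spark.time(sql("select count(*) from orders where customer_id = 26333955").show())

spark.time prints the elapsed time for each run, so the two executions can be compared directly on the same data.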
