Reader small image

You're reading from  Learning Spark SQL

Product typeBook
Published inSep 2017
Reading LevelIntermediate
PublisherPackt
ISBN-139781785888359
Edition1st Edition
Languages
Right arrow

Understanding multi-way JOIN ordering optimization


Spark SQL optimizer's heuristics can transform a SELECT statement into a query plan with the following characteristics:

  • The filter operator and project operator are pushed down below the join operator, that is, both the filter and project operators are executed before the join operator
  • Without subquery block, the join operator is pushed down below the aggregate operator for a select statement, that is, a join operator is usually executed before the aggregate operator

With this observation, the biggest benefit we can get from CBO is multi-way join ordering optimization. Using a dynamic programming technique, we try to get globally optimal join order for a multi-way join query.

Note

For more details on multi-way join reordering in Spark 2.2, refer to https://spark-summit.org/2017/events/cost-based-optimizer-in-apache-spark-22/.

Clearly, the join cost is the dominant factor in choosing the best join order. The cost formula is dependent on the implementation...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Learning Spark SQL
Published in: Sep 2017Publisher: PacktISBN-13: 9781785888359