
You're reading from Learning Spark SQL (book, Packt, 1st Edition, published Sep 2017; ISBN-13: 9781785888359; reading level: Intermediate).

Processing graphs containing multiple types of relationships


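For the next few examples, we use an augmented edges DataFrame that contains a relationship column. Each edge is assigned one of two relationship types, based on the number of similar purchases and the number of categories that the source product belongs to: highSimilars when the product has more than four similar purchases and belongs to at most three categories, and alsoPurchased otherwise.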

For this, we join the nodes and edges DataFrames on the edge source ID, compute the relationship from the source node's attributes, and then drop the node-related columns by selecting only src, dst, and relationship. This gives our final edges DataFrame, with the relationship column populated:

scala> import org.apache.spark.sql.functions.when
scala> import org.graphframes.GraphFrame
scala> val joinDF = nodesDF.join(edgesDF, nodesDF("id") === edgesDF("src")).withColumn("relationship", when(($"similars" > 4) && ($"categories" <= 3), "highSimilars").otherwise("alsoPurchased"))
scala> val edgesDFR = joinDF.select("src", "dst", "relationship")
scala> val gDFR = GraphFrame(nodesDF, edgesDFR)
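To see what the join and projection do to each row, the same pipeline can be modeled on plain Scala collections, without Spark. This is a minimal sketch under stated assumptions: the case classes `Node`, `Edge`, and `LabeledEdge` and the object `LabelEdges` are hypothetical stand-ins for rows of `nodesDF`, `edgesDF`, and `edgesDFR`. They keep only the columns the rule needs (the real DataFrames carry more), and the sketch assumes node ids are unique. As with Spark's default inner join, an edge whose `src` matches no node id is dropped.

```scala
// Hypothetical row types standing in for nodesDF, edgesDF, and edgesDFR.
// Only the columns used by the relationship rule are modeled.
final case class Node(id: String, similars: Int, categories: Int)
final case class Edge(src: String, dst: String)
final case class LabeledEdge(src: String, dst: String, relationship: String)

object LabelEdges {
  // Inner join on node.id == edge.src, label each edge from its source
  // node's attributes, then keep only src, dst, and relationship.
  // Assumes node ids are unique (toMap keeps the last node for a duplicate id).
  def apply(nodes: Seq[Node], edges: Seq[Edge]): Seq[LabeledEdge] = {
    val byId: Map[String, Node] = nodes.map(n => n.id -> n).toMap
    edges.flatMap { e =>
      // Option is empty when no node matches e.src, so the edge is dropped.
      byId.get(e.src).map { n =>
        val rel =
          if (n.similars > 4 && n.categories <= 3) "highSimilars"
          else "alsoPurchased"
        LabeledEdge(e.src, e.dst, rel)
      }
    }
  }
}
```

For example, with nodes a (similars = 5, categories = 3) and b (similars = 2, categories = 1) and edges a→b, b→a, and x→a, the result is a→b labeled highSimilars and b→a labeled alsoPurchased, while x→a is dropped because no node has id x. Because the label depends only on the source node's columns, every edge leaving the same product receives the same relationship value.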

Next, we count the number of records for each type of relationship and list a few edges along with the relationship values:

scala> gDFR.edges...
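As an illustration of the counting step only (not the book's own listing), the sketch below tallies labeled edges per relationship type in plain Scala. In Spark, a comparable step would typically be `gDFR.edges.groupBy("relationship").count()`, but that is an assumption about the approach, not the code used in this chapter. The names `LabeledEdge`, `RelationshipCounts`, and `countByRelationship` are hypothetical.

```scala
// Hypothetical row type for the labeled edges (src, dst, relationship).
final case class LabeledEdge(src: String, dst: String, relationship: String)

object RelationshipCounts {
  // Plain-Scala analogue of grouping edges by "relationship" and counting
  // the rows in each group; relationship types with no edges do not appear.
  def countByRelationship(edges: Seq[LabeledEdge]): Map[String, Int] =
    edges.groupBy(_.relationship).map { case (rel, es) => rel -> es.size }
}
```

For the input a→b and a→c labeled highSimilars plus b→a labeled alsoPurchased, `countByRelationship` returns `Map("highSimilars" -> 2, "alsoPurchased" -> 1)`.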