Building the graph
In the preceding sections, you installed GraphFrames and built the DataFrames required for the graph; now, you can start building the graph itself.
How to do it...
The first step of this recipe is to import the necessary libraries: the PySpark SQL functions (pyspark.sql.functions) and GraphFrames (graphframes). In the previous recipe, we created the src and dst columns as part of the deptsDelays_geo DataFrame. GraphFrames specifically looks for the src and dst columns when it builds the graph's edges, as in the edges DataFrame. Similarly, GraphFrames looks for an id column to identify each graph vertex (and to join to the src and dst columns). Therefore, when creating the vertices DataFrame, vertices, we rename the IATA column to id:
from pyspark.sql.functions import *
from graphframes import *
# Create Vertices (airports) and Edges (flights)
vertices = airports.withColumnRenamed("IATA", "id").distinct()
edges = deptsDelays_geo...
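The column conventions described above (id on the vertices, src and dst on the edges) can be illustrated without a Spark cluster. The following is a minimal plain-Python sketch, not GraphFrames code: the helper name check_graph_inputs, the sample rows, and the City and delay fields are all hypothetical and stand in for the real DataFrames. It checks that the expected columns are present and lists any edge whose src or dst has no matching vertex id, since such an edge would find no vertex row when joined on id.

```python
# Plain-Python sketch of the id / src / dst column convention; this is NOT GraphFrames.
# The helper name, the sample rows, and the City/delay fields are hypothetical.

def check_graph_inputs(vertex_rows, edge_rows):
    """Return the edges whose src or dst has no matching vertex id."""
    # Every vertex row needs an "id" key.
    for row in vertex_rows:
        if "id" not in row:
            raise ValueError("vertex row is missing 'id': %r" % (row,))
    # Every edge row needs both "src" and "dst" keys.
    for row in edge_rows:
        missing = sorted({"src", "dst"} - set(row))
        if missing:
            raise ValueError("edge row is missing %s: %r" % (missing, row))
    # Collect vertex ids, then keep edges that point at an unknown id.
    ids = {row["id"] for row in vertex_rows}
    return [e for e in edge_rows if e["src"] not in ids or e["dst"] not in ids]


# Airports keyed by IATA code, renamed to "id" as the recipe does for vertices.
airports = [
    {"IATA": "SEA", "City": "Seattle"},
    {"IATA": "SFO", "City": "San Francisco"},
]
vertices = [{"id": a["IATA"], "City": a["City"]} for a in airports]

# Flights as edges; JFK is deliberately absent from the vertex list.
edges = [
    {"src": "SEA", "dst": "SFO", "delay": 10},
    {"src": "SFO", "dst": "JFK", "delay": -5},
]

dangling = check_graph_inputs(vertices, edges)
print(dangling)
```

In the recipe itself, vertices and edges are Spark DataFrames rather than lists of dictionaries; the sketch only mirrors the naming contract between the two.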