Sample questions
Question 1:
Which of the following code blocks returns a DataFrame showing the mean of the salary column of the df DataFrame, grouped by the department column?
df.groupBy("department").agg(avg("salary"))
df.groupBy(col(department).avg())
df.groupBy("department").avg(col("salary"))
df.groupBy("department").agg(average("salary"))
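As a rough illustration of the result Question 1 asks for, the sketch below computes a per-department mean in plain Python rather than Spark. The rows and their values are hypothetical stand-ins for df, and the sketch does not indicate which option is correct.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for df: (department, salary).
rows = [("eng", 100.0), ("eng", 120.0), ("ops", 80.0)]

# Collect the salaries belonging to each department.
salaries_by_department = defaultdict(list)
for department, salary in rows:
    salaries_by_department[department].append(salary)

# One mean per department: the shape a grouped average should have.
mean_salary = {
    department: mean(salaries)
    for department, salaries in salaries_by_department.items()
}
print(mean_salary)
```

The expected result has one row per department, each carrying a single averaged salary value.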
Question 2:
Which of the following code blocks returns unique values across all values in the state and department columns in df?
df.select(state).join(transactionsDf.select('department'),col(state)==col('department'), 'outer').show()
df.select(col('state'),col('department')).agg({'*': 'count'}).show()
df.select('state', 'department').distinct().show()
df.select('state').union(df.select('department')).distinct().show()
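The wording of Question 2 turns on the difference between unique pairs of values and unique values pooled from both columns. A minimal plain-Python sketch of that distinction follows; it is not Spark code, and the rows are hypothetical stand-ins for df.

```python
# Hypothetical rows standing in for df: (state, department).
rows = [("CA", "sales"), ("CA", "eng"), ("NY", "sales")]

# Unique (state, department) combinations: one entry per distinct pair.
distinct_pairs = set(rows)

# Unique values across both columns: pool every state and every
# department into one collection, then drop duplicates.
values_across_columns = {state for state, _ in rows} | {
    department for _, department in rows
}

print(sorted(distinct_pairs))
print(sorted(values_across_columns))
```

The first result keeps two values per entry, while the second is a single pooled collection of values, which is what the question describes.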
...