Creating a DataFrame from CSV
In this recipe, we'll look at how to create a new DataFrame from a delimiter-separated values file.
Note
The code for this recipe can be found at https://github.com/arunma/ScalaDataAnalysisCookbook/blob/master/chapter1-spark-csv/src/main/scala/com/packt/scaladata/spark/csv/DataFrameCSV.scala.
How to do it...
This recipe involves four steps:
- Add spark-csv support to our project.
- Create a Spark Config object that gives information on the environment that we are running Spark in.
- Create a Spark context that serves as an entry point into Spark. Then, we proceed to create an SQLContext from the Spark context.
- Load the CSV using the SQLContext.
- CSV support isn't first-class in Spark, but it is available through an external library from Databricks. So, let's go ahead and add that to our build.sbt. After adding the spark-csv dependency, our complete build.sbt looks like this:

    organization := "com.packt"

    name := "chapter1-spark-csv"

    scalaVersion...
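Steps 2 to 4 can be sketched as a small driver program. This is a minimal, hedged sketch, not the book's exact listing: it assumes Spark 1.4+ with the Databricks spark-csv package on the classpath, and the file name `profits.csv` and the app name are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DataFrameCSV extends App {
  // Step 2: the SparkConf describes the environment we run Spark in.
  // "local[2]" runs Spark locally with two threads; the app name is arbitrary.
  val conf = new SparkConf()
    .setAppName("csvDataFrame")
    .setMaster("local[2]")

  // Step 3: the SparkContext is the entry point into Spark;
  // the SQLContext is created from it and gives us DataFrame support.
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  // Step 4: load the CSV through the spark-csv data source.
  // "header" -> "true" treats the first line of the file as column names.
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load("profits.csv") // hypothetical input file

  df.printSchema()
}
```

Running this prints the inferred schema of the loaded DataFrame; without `"header" -> "true"`, spark-csv would name the columns `C0`, `C1`, and so on instead of using the first row.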