A DataFrame can be created in several ways:
- By executing SQL queries
- By loading external data such as Parquet, JSON, CSV, text, Hive, JDBC, and so on
- By converting RDDs to DataFrames (see the sketch after this list)
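Before turning to CSV loading, here is a minimal sketch of all three approaches, assuming an active SparkSession named spark (as provided by spark-shell); the people.json file is a hypothetical example:

import spark.implicits._

// 1. Executing a SQL query (a literal SELECT here, for brevity)
val sqlDF = spark.sql("SELECT 'Alabama' AS State, 2010 AS Year")

// 2. Loading external data (JSON shown; Parquet, CSV, JDBC work similarly)
val jsonDF = spark.read.json("people.json")

// 3. Converting an RDD of tuples to a DataFrame via toDF
val rddDF = spark.sparkContext
  .parallelize(Seq(("Alabama", 2010, 4785492)))
  .toDF("State", "Year", "Population")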
One common way is to load a CSV file. We will load statesPopulation.csv as a DataFrame.
The CSV contains US state populations for the years 2010 to 2016, in the following format.
| State      | Year | Population |
|------------|------|------------|
| Alabama    | 2010 | 4785492    |
| Alaska     | 2010 | 714031     |
| Arizona    | 2010 | 6408312    |
| Arkansas   | 2010 | 2921995    |
| California | 2010 | 37332685   |
Since this CSV has a header row, we can quickly load it into a DataFrame and let Spark infer the schema from the data.
scala> val statesDF = spark.read.option("header", "true").option("inferSchema", "true").option("sep", ",").csv("statesPopulation.csv")
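Once loaded, it is worth confirming what Spark inferred before querying the data; a brief sketch (output omitted, as it depends on the actual file contents):

scala> statesDF.printSchema()  // prints the column names and inferred types
scala> statesDF.show(5)        // displays the first five rows in tabular form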