
You're reading from  Learning Spark SQL

Product type: Book
Published in: Sep 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785888359
Edition: 1st Edition

Understanding data sources in Spark applications


Spark can connect to many different data sources, including files and SQL and NoSQL databases. Some of the more popular data sources include files (CSV, JSON, Parquet, Avro), MySQL, MongoDB, HBase, and Cassandra.
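As a rough illustration of the file-based sources named above, the following sketch writes a small DataFrame as CSV, JSON, and Parquet and reads each copy back through `spark.read`. It is a minimal example and not taken from the book: the object name `DataSourcesSketch`, the temporary directory, and the sample rows are all hypothetical, and it assumes Spark 2.x or later with the `spark-sql` dependency on the classpath.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object DataSourcesSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: no cluster is needed here).
    val spark = SparkSession.builder()
      .appName("DataSourcesSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical scratch directory and sample rows, used purely for illustration.
    val dir = Files.createTempDirectory("spark-sources").toString
    val people = Seq(("Ann", 34), ("Bob", 45)).toDF("name", "age")

    // Write the same DataFrame in three of the file formats mentioned in the text.
    people.write.mode(SaveMode.Overwrite).option("header", "true").csv(s"$dir/people_csv")
    people.write.mode(SaveMode.Overwrite).json(s"$dir/people_json")
    people.write.mode(SaveMode.Overwrite).parquet(s"$dir/people_parquet")

    // Read each copy back through the same DataFrameReader entry point.
    val fromCsv = spark.read
      .option("header", "true")       // first line holds column names
      .option("inferSchema", "true")  // let Spark guess column types
      .csv(s"$dir/people_csv")
    val fromJson = spark.read.json(s"$dir/people_json")
    val fromParquet = spark.read.parquet(s"$dir/people_parquet")

    fromCsv.printSchema()
    fromJson.show()
    fromParquet.show()

    spark.stop()
  }
}
```

The database sources in the list are reached through the same `spark.read` entry point, but they generally need an extra connector or driver library on the classpath, so they are left out of this sketch.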

In addition, Spark can connect to special-purpose engines and data sources, such as Elasticsearch, Apache Kafka, and Redis. These engines enable specific functionality in Spark applications, such as search, streaming, and caching. For example, Redis enables deployment of cached machine learning models in high-performance applications. We discuss Redis-based application deployment further in Chapter 12, Spark SQL in Large-Scale Application Architectures. Kafka is extremely popular in Spark streaming applications, and we cover Kafka-based streaming applications in more detail in Chapter 5, Using Spark SQL in Streaming Applications, and Chapter 12, Spark SQL in Large-Scale Application Architectures. The DataSource API enables connectivity...
