Delta unifies all types of data
In this section, we give examples of how to ingest (read) different data types into a Spark DataFrame and save them all in the Delta format in the data lake using a common API, df.write.format("delta"). This curated data is the single source of truth for all BI and AI use cases, as shown in the following diagram:
The three main data types are structured, semi-structured, and unstructured. Spark native APIs (along with partner-aided connectors) can ingest data from a wide variety of data sources to create a curated view that all consumers can access, depending on their privilege levels. In some cases, different lines of business (LOBs) may want to have their own mini data lake, and that is perfectly fine as it follows a hub-and-spoke model of a central data repository for common access and specialized ones with tighter guardrails...
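As a minimal sketch of the common write path described above, the following PySpark snippet reads one source of each kind (CSV for structured, JSON for semi-structured, and Spark's binaryFile source for unstructured data) and saves each as a Delta table. The paths, the SOURCES mapping, and the helper names delta_target, build_spark, and ingest_to_delta are hypothetical and chosen only for illustration. The session settings assume the open source delta-spark package is available on the cluster; managed platforms may configure this differently.

```python
# Hypothetical raw inputs, one per data type; replace with real locations.
SOURCES = {
    "structured": {
        "format": "csv",
        "path": "/tmp/raw/orders.csv",
        "options": {"header": "true", "inferSchema": "true"},
    },
    "semi_structured": {
        "format": "json",
        "path": "/tmp/raw/events.json",
        "options": {},
    },
    "unstructured": {
        # binaryFile loads each file as a row: path, modificationTime, length, content
        "format": "binaryFile",
        "path": "/tmp/raw/images",
        "options": {"pathGlobFilter": "*.png"},
    },
}


def delta_target(base, name):
    """Build the output path for one curated Delta table (illustrative layout)."""
    return f"{base.rstrip('/')}/{name}"


def build_spark():
    """Create a SparkSession with the Delta Lake extensions enabled.

    Assumes pyspark and the delta-spark package are installed.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName("delta-ingest-sketch")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config(
            "spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        )
        .getOrCreate()
    )


def ingest_to_delta(spark, base="/tmp/lake/curated"):
    """Read every source with its own reader, then write all of them as Delta."""
    written = []
    for name, src in SOURCES.items():
        df = (
            spark.read.format(src["format"])
            .options(**src["options"])
            .load(src["path"])
        )
        target = delta_target(base, name)
        # The write call is the same regardless of the source format.
        df.write.format("delta").mode("overwrite").save(target)
        written.append(target)
    return written


# Usage (requires a Spark runtime with Delta Lake available):
#   spark = build_spark()
#   ingest_to_delta(spark)
```

Only the read side differs per source; the write side is the single df.write.format("delta") call, which is what lets downstream BI and AI consumers treat the curated tables uniformly.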