Types of data sources
The list of possible data sources for pipelines is effectively open-ended, and new kinds of sources will become available as the industry continues to evolve. Currently, data files and databases, whether structured, semi-structured, or unstructured, can all serve as both data sources and data sinks. However, keep in mind that the more flexible the data definitions are for the data used in your pipeline, the more difficult it is to validate the data the pipeline produces. In this chapter, we are going to use some of the most frequently used source systems in the industry, as follows:
- CSV/Excel files
- Parquet files
- APIs
- Relational databases (RDBMS)
- HTML
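
As a rough illustration of how these sources differ in practice, the following sketch loads the same two records from a CSV string, an in-memory SQLite database (standing in for a relational database), and an HTML table. It uses only the Python standard library. The sample data and the helper names (`read_csv`, `read_sqlite`, `read_html`, `TableParser`) are hypothetical and chosen for this example only. Parquet files and APIs are left out because reading them typically needs a third-party library or network access.

```python
import csv
import io
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample data: the same two records in two text formats.
CSV_TEXT = "id,city\n1,Chicago\n2,Boston\n"
HTML_TEXT = (
    "<table>"
    "<tr><th>id</th><th>city</th></tr>"
    "<tr><td>1</td><td>Chicago</td></tr>"
    "<tr><td>2</td><td>Boston</td></tr>"
    "</table>"
)


def read_csv(text):
    """Parse CSV text into a list of dicts; every value is a string."""
    return list(csv.DictReader(io.StringIO(text)))


def read_sqlite():
    """Build a small in-memory table and read it back as a list of dicts."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows dict(row) with column names
    conn.execute("CREATE TABLE cities (id INTEGER, city TEXT)")
    conn.executemany(
        "INSERT INTO cities VALUES (?, ?)", [(1, "Chicago"), (2, "Boston")]
    )
    rows = [dict(r) for r in conn.execute("SELECT id, city FROM cities ORDER BY id")]
    conn.close()
    return rows


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def read_html(text):
    """Parse a simple HTML table; the first row is treated as the header."""
    parser = TableParser()
    parser.feed(text)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    print(read_csv(CSV_TEXT))
    print(read_sqlite())
    print(read_html(HTML_TEXT))
```

Note that the CSV and HTML readers return every value as a string, while the SQLite reader returns `id` as an integer because the table declares a column type. This is a small instance of the point made above: the looser the data definition at the source, the more validation work falls on the pipeline.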