Retrieving, Processing, and Storing Data
In this age of information and AI, data is generated at a rapid pace from every aspect of life, in many shapes and formats. It arrives from websites, IoT sensors, email, FTP servers, and databases, or we collect it ourselves through laboratory experiments, election studies, marketing surveys, and social surveys. As a data professional, you should be able to work with a wide variety of data sources and formats. In this chapter, we focus on methods and techniques for retrieving, processing, and storing such large and heterogeneous data effectively.
The chapter provides a comprehensive overview of acquiring data in commonly used formats such as CSV, Excel, JSON, HDF5, Parquet, and Pickle. In many real-world applications, data needs to be saved either before analysis, for later use, or after analysis, for reporting and sharing. In modern data workflows, analysts and engineers often work with data stored across...