WHAT IS DATA LITERACY?
There are various definitions of data literacy that involve concepts such as data, meaningful information, decision-making, drawing conclusions, chart reading, and so forth. According to Wikipedia, which we’ll use as a starting point, data literacy is defined as follows:
Data literacy is the ability to read, understand, create, and communicate data as information. Much like literacy as a general concept, data literacy focuses on the competencies involved in working with data. It is, however, not similar to the ability to read text since it requires certain skills involving reading and understanding data. (Wikipedia, 2023)
Data literacy encompasses many topics, starting with analyzing data that is often in the form of a CSV (comma-separated values) file. The quality of the data in a dataset is of paramount importance: high data quality enables you to make more reliable inferences regarding the nature of the data. Indeed, high data quality is a requirement for areas such as machine learning, scientific research, and so forth. However, keep in mind that you might face various challenges in obtaining high-quality data, such as:
• a limited amount of available data
• costly acquisition of relevant data
• difficulty in generating valid synthetic data
• limited availability of domain experts
Depending on the domain, data cleaning can take months of work and cost millions of dollars. For instance, identifying images of cats and dogs is essentially trivial, whereas identifying potential tumors in X-rays is much more costly and requires highly skilled individuals.
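To make the idea of data quality more concrete, here is a minimal sketch of a first-pass quality check on CSV data. It uses only the Python standard library. The sample data, the column names, and the summarize_quality() function are hypothetical and exist only for illustration; a real project would read an actual file and would likely apply more checks than the two shown here (missing values and duplicate rows).

```python
import csv
import io

# Hypothetical sample data; in practice this would come from a CSV file on disk.
SAMPLE_CSV = """name,age,city
Alice,34,Boston
Bob,,Chicago
Alice,34,Boston
Dana,29,
"""


def summarize_quality(csv_text):
    """Return the row count, missing values per column, and duplicate-row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    missing = {column: 0 for column in columns}
    seen = set()
    duplicates = 0
    total = 0

    for row in reader:
        total += 1
        # Count empty or absent values in each column.
        for column in columns:
            value = row[column]
            if value is None or value.strip() == "":
                missing[column] += 1
        # A row counts as a duplicate if an identical row appeared earlier.
        key = tuple(row[column] for column in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {"rows": total, "missing": missing, "duplicates": duplicates}


report = summarize_quality(SAMPLE_CSV)
print(report)
```

In this sample, the report flags one missing value in the age column, one in the city column, and one duplicated row. Checks of this kind are inexpensive to run, whereas deciding how to handle the flagged rows (drop them, fill them in, or consult a domain expert) is typically where the real cost lies.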
With all the preceding points in mind, let’s take a look at EDA (exploratory data analysis), which is the topic of the next section.