You're reading from Data Literacy With Python A Comprehensive Guide to Understanding and Analyzing Data with Python

Product type Paperback

Published in Jul 2024

Publisher Mercury_Learning

ISBN-13 9781836640097

Length 271 pages

Edition 1st Edition

Languages Python

Tools Matplotlib

Concepts Data Analysis

Authors (2):

Mercury Learning and Information

Oswald Campesato

Table of Contents (9) Chapters

Preface

Chapter 1: Working With Data

Chapter 2: Outlier and Anomaly Detection

Chapter 3: Cleaning Datasets

Chapter 4: Introduction to Statistics

Chapter 5: Matplotlib and Seaborn

Index

Appendix A: Introduction to Python

Appendix B: Introduction to Pandas

WHAT IS DATA LITERACY?

There are various definitions of data literacy that involve concepts such as data, meaningful information, decision-making, drawing conclusions, chart reading, and so forth. According to Wikipedia, which we’ll use as a starting point, data literacy is defined as follows:

Data literacy is the ability to read, understand, create, and communicate data as information. Much like literacy as a general concept, data literacy focuses on the competencies involved in working with data. It is, however, not similar to the ability to read text since it requires certain skills involving reading and understanding data. (Wikipedia, 2023)

Data literacy encompasses many topics, starting with the analysis of data, which is often stored in a CSV (comma-separated values) file. The quality of the data in a dataset is of paramount importance: high data quality enables you to make more reliable inferences regarding the nature of the data. Indeed, high data quality is a requirement for fields such as machine learning and scientific experimentation. However, keep in mind that you might face various challenges in obtaining robust data, such as:

• a limited amount of available data

• costly acquisition of relevant data

• difficulty in generating valid synthetic data

• limited availability of domain experts

Depending on the domain, data cleaning can involve months of work and cost millions of dollars. For instance, identifying images of cats and dogs is essentially trivial, whereas identifying potential tumors in X-rays is much more costly and requires highly skilled individuals.
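To make the idea of data quality in a CSV file more concrete, here is a minimal sketch of a quality check that counts rows, missing values per column, and duplicate rows. It uses only the Python standard library. The sample data, the column names, and the quality_report() helper are hypothetical and invented for illustration; they do not come from the book.

```python
import csv
import io

# Hypothetical CSV content, invented for illustration only.
SAMPLE_CSV = """name,age,city
Alice,34,Chicago
Bob,,Boston
Carol,29,
Alice,34,Chicago
"""

def quality_report(csv_text):
    """Count rows, missing values per column, and duplicate rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames

    missing = {col: 0 for col in columns}
    seen = set()
    duplicates = 0

    for row in rows:
        # An empty or whitespace-only field counts as a missing value.
        for col in columns:
            if not (row[col] or "").strip():
                missing[col] += 1
        # A row is a duplicate if an identical row appeared earlier.
        key = tuple(row[col] for col in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)

    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}

report = quality_report(SAMPLE_CSV)
print(report)
```

A real dataset would typically be read from a file rather than a string, and a full cleaning pass would involve many more checks (data types, value ranges, inconsistent spellings), but even simple counts like these give a first indication of how much cleaning a dataset might need.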

With all the preceding points in mind, let’s take a look at EDA (exploratory data analysis), which is the topic of the next section.
