Introduction to data transformation
In data science, it is important to clean your dataset of messy data before analyzing it; in particular, missing values must be handled correctly. Otherwise, you could get unexpected results when summarizing the dataset and deriving insights. For example, missing data is sometimes represented by an arbitrary placeholder number, such as -999. If you calculate an average without first cleaning up those placeholders, the result will include every -999 and will be incorrect. Knowing that convention (here, -999 representing missing data) allows you to exclude that number from any calculation and avoid reporting incorrect aggregations. A good understanding of how to handle messy and missing data in pandas will increase the accuracy of your analysis and your confidence in it.
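To make the -999 example concrete, here is a minimal sketch. The Series name and the readings are invented for illustration, and -999 is assumed to be the only placeholder used for missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; -999 is the assumed placeholder for "missing".
readings = pd.Series([20.0, 22.0, -999.0, 24.0], name="temperature")

# Averaging the raw data treats the placeholder as a real measurement.
naive_mean = readings.mean()

# Convert the placeholder to NaN so pandas recognizes it as missing.
cleaned = readings.replace(-999, np.nan)

# mean() skips NaN by default (skipna=True), so only real values are averaged.
clean_mean = cleaned.mean()

print(naive_mean)  # -233.25
print(clean_mean)  # 22.0
```

The single placeholder drags the naive average from 22.0 down to -233.25, which is why the convention has to be known and handled before any aggregation.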