Cleansing and transforming data
Raw data are usually not structured for analysis and often contain unwanted elements. They therefore need to be scrubbed of those elements and shaped into an analytics-friendly form.
Cleansing
Data stored in source systems often have problems that keep them from being reporting-ready. The following problems are common across organizations:
- Inconsistency: Data in the same table are entered in different ways by different users, or in different formats; for example, some users enter dates in the YYYYMMDD format and others in the MM/DD/YYYY format.
- Incomplete data: Most records have values in only some of their fields, or important fields are left entirely empty.
- Incorrect data: Users enter the wrong information.
- Data duplication: The same data are entered multiple times...
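Three of the problems above (inconsistency, incomplete data, and duplication) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's method: the field names `order_id`, `order_date`, and `customer` and the helpers `normalize_date`, `is_incomplete`, and `cleanse` are hypothetical, and only the two date formats from the example above are recognized. Incorrect data are not handled, since catching them usually requires domain-specific validation rules.

```python
from datetime import datetime

# Assumption: only the two date formats from the text are expected.
DATE_FORMATS = ("%Y%m%d", "%m/%d/%Y")


def normalize_date(value):
    """Return the date as an ISO string (YYYY-MM-DD), or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


def is_incomplete(record, required_fields):
    """True if any required field is missing, None, or blank (0 counts as present)."""
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return True
    return False


def cleanse(records, required_fields):
    """Normalize dates, split off incomplete records, and drop exact duplicates.

    Returns (clean, rejected). The 'order_date' field name is hypothetical.
    """
    clean, rejected, seen = [], [], set()
    for record in records:
        record = dict(record)  # copy so the caller's data is not mutated
        if record.get("order_date"):
            # An unrecognized format becomes None, so the record is rejected below
            # if order_date is a required field.
            record["order_date"] = normalize_date(record["order_date"])
        if is_incomplete(record, required_fields):
            rejected.append(record)
            continue
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # duplicate of a record that was already kept
        seen.add(key)
        clean.append(record)
    return clean, rejected


# Usage with hypothetical sample rows.
rows = [
    {"order_id": "A1", "order_date": "20230315", "customer": "Acme"},
    {"order_id": "A1", "order_date": "03/15/2023", "customer": "Acme"},
    {"order_id": "A2", "order_date": "03/16/2023", "customer": ""},
]
clean, rejected = cleanse(rows, ["order_id", "order_date", "customer"])
print(clean)          # → [{'order_id': 'A1', 'order_date': '2023-03-15', 'customer': 'Acme'}]
print(len(rejected))  # → 1
```

The order of the steps matters: the first two rows differ only in how the date was entered, so they are recognized as the same record only after both dates have been converted to a single format.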