Combining data from multiple sources
Combining data from multiple tables from within a single system and across systems is a common data preparation activity. It is advisable and is a general rule of thumb to first have the data from each table cleansed as much as possible before you attempt to combine it with another. Of course, there can always be exceptions to this rule, such as when you want to combine multiple similar files, in which case you must first combine and then apply the same cleansing rules across all these files in one go.
In the Cleansing and transforming data section, while cleansing and transforming, we inadvertently combined data from multiple tables: Product
, ProductSubcategory
, and ProductCategory
. Having related tables from a transactional database, such as SQL Server, provides the option of automatically combining the tables based on foreign keys (Figure 3.1). Hence, we did this without having to think too much by selecting columns from the related tables...