Data preprocessing and feature engineering
Data mining, a buzzword in the 1990s, is the predecessor of data science (the science of data). One of the methodologies popular in the data mining community is called the Cross-Industry Standard Process for Data Mining (CRISP-DM) (https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining). CRISP-DM was created in 1996, and machine learning basically inherits its phases and general framework.
CRISP-DM consists of the following phases, which aren't mutually exclusive and can occur in parallel:
- Business understanding: This phase is often taken care of by specialized domain experts. Usually, we have a businessperson formulate a business problem, such as selling more units of a certain product.
- Data understanding: This is also a phase that may require input from domain experts; however, often a technical specialist needs to get involved more than in the business understanding phase. The...