Exercises
Practice building and evaluating machine learning models in scikit-learn with the following exercises:
- Build a clustering model to distinguish between red and white wine by their chemical properties:
a) Combine the red and white wine datasets (data/winequality-red.csv and data/winequality-white.csv, respectively) and add a column for the kind of wine (red or white).
b) Perform some initial EDA.
c) Build and fit a pipeline that scales the data and then uses k-means clustering to make two clusters. Be sure not to use the quality column.
d) Use the Fowlkes-Mallows Index (the fowlkes_mallows_score() function is in sklearn.metrics) to evaluate how well k-means is able to make the distinction between red and white wine.
e) Find the center of each cluster.
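
As a starting point for parts c) through e), here is a minimal sketch of the scale-then-cluster workflow. It runs on synthetic stand-in data rather than the wine files, so the array names, step names ('scale', 'kmeans'), and the random_state and n_init values are all illustrative assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, NOT the wine datasets):
# two well-separated groups of 100 points with 3 features each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
group_b = rng.normal(loc=8.0, scale=1.0, size=(100, 3))
X = np.vstack([group_a, group_b])
true_labels = np.array([0] * 100 + [1] * 100)

# Scale the features, then run k-means with two clusters.
pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('kmeans', KMeans(n_clusters=2, n_init=10, random_state=0)),
])
cluster_labels = pipeline.fit_predict(X)

# The Fowlkes-Mallows Index compares which pairs of points share a label,
# so it does not matter which cluster k-means happens to call 0 or 1.
fmi = fowlkes_mallows_score(true_labels, cluster_labels)
print(f'Fowlkes-Mallows Index: {fmi:.3f}')

# Cluster centers live in the scaled feature space; invert the scaling
# to read them in the original units.
centers_scaled = pipeline.named_steps['kmeans'].cluster_centers_
centers_original = pipeline.named_steps['scale'].inverse_transform(centers_scaled)
print(centers_original)
```

For the exercise itself, X would be the chemical property columns of the combined wine data (without the quality column and without the kind-of-wine column), and true_labels would be the kind of wine.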
- Predict star temperature:
a) Using the data/stars.csv file, perform some initial EDA and then build a linear regression model of all the numeric columns to predict the temperature of the star.
b) Train the model on 75% of...