Problem 2 – using Python in biological data analysis
For this particular problem, we’ll be using the Breast_cancer_data.csv file, which can be found on Kaggle (https://www.kaggle.com/nsaravana/breast-cancer?select=breast-cancer.csv). The file has also been uploaded to this book’s GitHub repository.
When working with data, we sometimes want to compare variables within the data we already have, or to use the data for predictions in machine learning. In this case, we’re going to look at another type of plot, the scatterplot, built from two specific columns in our dataset.
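Before turning to the actual file, it may help to see roughly what plotting one column against another involves. The following is only a minimal sketch, assuming pandas and matplotlib are installed. The helper name `scatter_two_columns`, the column names `col_a` and `col_b`, and the four rows of values are all hypothetical placeholders and are not taken from the Kaggle file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def scatter_two_columns(df, x_col, y_col):
    """Draw a scatterplot of y_col against x_col and return the Axes."""
    fig, ax = plt.subplots()
    ax.scatter(df[x_col], df[y_col])
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(f"{y_col} vs. {x_col}")
    return ax


# Placeholder values for illustration only; not taken from the Kaggle file.
demo = pd.DataFrame({
    "col_a": [1.0, 2.0, 3.0, 4.0],
    "col_b": [2.5, 1.5, 3.5, 3.0],
})

ax = scatter_two_columns(demo, "col_a", "col_b")
```

With the real data, the same helper could be given a DataFrame loaded with `pd.read_csv` and the two column names exactly as they appear in the file header. To view the plot interactively, drop the `matplotlib.use("Agg")` line and call `plt.show()`.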
Let’s imagine you received this data and have already determined that mean perimeter and mean texture are better predictors than the other columns. Your goal now is to create an algorithm that compares the values in those two columns using a scatterplot. For additional analysis and...