Reader small image

You're reading from  Machine Learning with scikit-learn Quick Start Guide

Product typeBook
Published inOct 2018
Reading LevelIntermediate
PublisherPackt
ISBN-139781789343700
Edition1st Edition
Languages
Right arrow
Author (1)
Kevin Jolly
Kevin Jolly
author image
Kevin Jolly

Kevin Jolly is a formally educated data scientist with a master's degree in data science from the prestigious King's College London. Kevin works as a statistical analyst with a digital healthcare start-up, Connido Limited, in London, where he is primarily involved in leading the data science projects that the company undertakes. He has built machine learning pipelines for small and big data, with a focus on scaling such pipelines into production for the products that the company has built. Kevin is also the author of a book titled Hands-On Data Visualization with Bokeh, published by Packt. He is the editor-in-chief of Linear, a weekly online publication on data science software and products.
Read more about Kevin Jolly

Right arrow

Implementing the k-means algorithm in scikit-learn

Now that you understand how the k-means algorithm works internally, we can proceed to implement it in scikit-learn. We are going to work with the same fraud detection dataset that we used in all of the previous chapters. The key difference is that we are going to drop the target feature, which contains the labels, and identify the two clusters that are used to detect fraud.

Creating the base k-means model

In order to load the dataset into our workspace and drop the target feature with the labels, we use the following code:

import pandas as pd
#Reading in the dataset
df = pd.read_csv('fraud_prediction.csv')
#Dropping the target feature & the index
df = df.drop([&apos...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Machine Learning with scikit-learn Quick Start Guide
Published in: Oct 2018Publisher: PacktISBN-13: 9781789343700

Author (1)

author image
Kevin Jolly

Kevin Jolly is a formally educated data scientist with a master's degree in data science from the prestigious King's College London. Kevin works as a statistical analyst with a digital healthcare start-up, Connido Limited, in London, where he is primarily involved in leading the data science projects that the company undertakes. He has built machine learning pipelines for small and big data, with a focus on scaling such pipelines into production for the products that the company has built. Kevin is also the author of a book titled Hands-On Data Visualization with Bokeh, published by Packt. He is the editor-in-chief of Linear, a weekly online publication on data science software and products.
Read more about Kevin Jolly