You're reading from Mastering pandas - Second Edition

Product type: Book
Published in: Oct 2019
Reading level: Intermediate
ISBN-13: 9781789343236
Edition: 2nd Edition
Author: Ashish Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. His core areas of expertise include natural language processing, IoT analytics, R Shiny product development, and ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.

Data analysis and preprocessing using pandas

In this section, we will use pandas to analyze and preprocess the data before passing it as input to scikit-learn.
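Preprocessing for scikit-learn usually means turning a DataFrame into purely numeric arrays with no missing values. The minimal sketch below shows one way to do that. The DataFrame and its columns (`age`, `fare`, `sex`, `label`) are hypothetical placeholders, not the dataset used in this chapter. Median imputation and one-hot encoding are just two common choices among several.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 54.0],
    "fare": [7.25, 71.28, 8.05, 51.86],
    "sex": ["male", "female", "female", "male"],
    "label": [0, 1, 1, 0],
})

# Separate the target column from the feature columns.
y = df["label"].to_numpy()
X_df = df.drop(columns="label")

# Fill missing numeric values with each numeric column's median.
num_cols = X_df.select_dtypes(include="number").columns
X_df = X_df.fillna(X_df[num_cols].median())

# One-hot encode the string column so that every feature is numeric.
X_df = pd.get_dummies(X_df, columns=["sex"], dtype=float)

# scikit-learn estimators accept plain NumPy arrays.
X = X_df.to_numpy(dtype=float)
print(X.shape)  # (4, 4)
```

The resulting `X` and `y` are the kind of inputs an estimator's `fit(X, y)` method expects. scikit-learn itself is not imported here.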

Examining the data

To start our preprocessing of the data, let's read in the training dataset and examine what it looks like.

Here, we read the training dataset into a pandas DataFrame and display the first three rows:

In [2]: import pandas as pd
        import numpy as np
        # header=0 marks row 0 as the header row (also the default when names is not given)
        train_df = pd.read_csv('csv/train.csv', header=0)
In [3]: train_df.head(3)

The output is as follows:

Hence, we can see the various features...
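Beyond `head()`, a few other DataFrame attributes and methods help you get a feel for a new dataset. The sketch below assumes a tiny hypothetical CSV held in a string, standing in for `csv/train.csv`; its columns are invented for illustration. It also contrasts `header=0` with `header=None`. When no `names` argument is given, `read_csv` already infers `header=0` by default, so passing it explicitly mainly documents intent.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for csv/train.csv; row 0 holds the column names.
csv_text = """id,age,sex,label
1,22,male,0
2,,female,1
3,35,female,1
"""

# header=0: use row 0 as the column labels.
df = pd.read_csv(io.StringIO(csv_text), header=0)

print(df.head(3))         # first three rows
print(df.shape)           # (3, 4): rows, columns
print(df.dtypes)          # inferred type of each column (age becomes float64 due to the gap)
print(df.isnull().sum())  # number of missing values per column

# header=None: the header line is read as an ordinary data row,
# and the columns are labeled 0, 1, 2, ...
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.shape)          # (4, 4)
```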
