Reader small image

You're reading from  Learn Python by Building Data Science Applications

Product typeBook
Published inAug 2019
Reading LevelIntermediate
PublisherPackt
ISBN-139781789535365
Edition1st Edition
Languages
Tools
Right arrow
Authors (2):
Philipp Kats
Philipp Kats
author image
Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and a having followed the rocky path (at first) of being a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.
Read more about Philipp Kats

David Katz
David Katz
author image
David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
Read more about David Katz

View More author details
Right arrow

Exploring feature engineering

Now that we made a system to fairly compare models with no fear of overfitting, let's think about how we can improve our model. One way would be to create new features that might add more context. One way to go about this is to create features of our own, for example, calculate a proportion of armies on different sides or the absolute difference in the number of soldiers—we can't say in advance which would work better. Let's try it out with the help of the following code:

  1. First, we'll create a ratio of soldiers on either side:
data['infantry_ratio'] = data['allies_infantry'] / data['axis_infantry']
cols.append('infantry_ratio')
  1. Now, we won't do that for tanks, planes, and so on, as the numbers here are very small and we'll have to deal with division by zero. Instead, we...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Learn Python by Building Data Science Applications
Published in: Aug 2019Publisher: PacktISBN-13: 9781789535365

Authors (2)

author image
Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and a having followed the rocky path (at first) of being a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.
Read more about Philipp Kats

author image
David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
Read more about David Katz