Reader small image

You're reading from  Learn Python by Building Data Science Applications

Product typeBook
Published inAug 2019
Reading LevelIntermediate
PublisherPackt
ISBN-139781789535365
Edition1st Edition
Languages
Tools
Right arrow
Authors (2):
Philipp Kats
Philipp Kats
author image
Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and a having followed the rocky path (at first) of being a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.
Read more about Philipp Kats

David Katz
David Katz
author image
David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
Read more about David Katz

View More author details
Right arrow

Understanding casualties

Casualties are probably the most verbose and non-structured columns of the dataset. It will be extremely hard to make use of all the nuances of information here, so again—perhaps we can simplify the task, getting only the things we really want to use. Perhaps we can use code words to extract any digit preceding them; for example, ([\d|,]+)\s*dead should extract any consecutive digits or commas before the word 'dead'. We can define similar patterns for all types of casualties and loop over all of them, testing the patterns. There are, unfortunately, many keywords that mean the same thing ('captured', 'prisoners', and many more), so we have to make them optional, similar to the preceding month expression:

digit_pattern = '([\d|\,]+)(?:\[\d+\])?\s*(?:{words})'

keywords = { 'killed': ['dead&apos...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Learn Python by Building Data Science Applications
Published in: Aug 2019Publisher: PacktISBN-13: 9781789535365

Authors (2)

author image
Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and a having followed the rocky path (at first) of being a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.
Read more about Philipp Kats

author image
David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
Read more about David Katz