Clean Data: Tips, Tricks, and Techniques [Video]

More Information
Learn
  • Learn to spot outliers in your data and analyze sensor data to find omissions.
  • Tokenize text data and remove stop words to make your analysis more robust.
  • Analyze and extract features from unstructured text data.
  • Detect and handle duplicates in your big data analytics and statistics.
  • Find and remove global row duplicates.
  • Learn to clean numeric data.
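
As an illustration of two of the topics listed above (tokenizing text with stop-word removal, and removing global row duplicates), here is a minimal sketch in plain Python. It is not taken from the course materials: the function names and the tiny stop-word list are hypothetical, chosen only for this example.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for this sketch;
# real projects usually rely on a much larger list.
STOP_WORDS = {"a", "an", "the", "is", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in STOP_WORDS, keeping the order."""
    return [token for token in tokens if token not in STOP_WORDS]


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row and preserve the input order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


tokens = remove_stop_words(tokenize("The sensor is OK, and the reading is 42!"))
rows = drop_duplicate_rows([[1, "a"], [2, "b"], [1, "a"]])
```

A row counts as a duplicate here only when every field matches an earlier row, which is one reasonable reading of "global" row duplicates.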
About

"Give me six hours to chop down a tree and I will spend the first four sharpening the axe." Do you apply the same principle when doing Data Science?

Effective data cleaning is one of the most important aspects of good Data Science. It involves acquiring raw data and preparing it for analysis; if this is not done well, you will not get the accuracy or results you are looking for, no matter how good your algorithm is.
Data cleaning is often the hardest part of big data and ML work. To address this, the course equips you with the skills you need to clean your data in Python, using tried and tested techniques. You'll find a plethora of tips and tricks that will help you get the job done in a smart, easy, and efficient way.

All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Clean-Data-Tips-Tricks-and-Techniques

Style and Approach

Each section teaches one particular aspect of the overall topic, and its title reflects that. Each video teaches a subtopic in a hands-on way, with a practical demonstration followed by an explanation and a discussion of how it works and how to use it.

Features
  • Sift through your data to identify issues such as outliers, missing values, and duplicate rows.
  • Deal with unstructured data in the most effective ways and hone your skills in transforming and combining your data.
  • Use Python to check your data for consistency and get rid of any missing or duplicated data.
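
The kind of consistency check described above could be sketched as follows, using only the Python standard library. This is an assumption-laden illustration, not the course's own code: the helper names are hypothetical, missing readings are assumed to be represented as `None`, and outliers are flagged with the common 1.5 × IQR rule, which is only one of several possible techniques.

```python
import statistics


def find_missing(readings):
    """Return the positions of missing readings (assumed to be None)."""
    return [i for i, value in enumerate(readings) if value is None]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sensor readings, for illustration only.
readings = [10, 11, None, 12, 11, 10, 12, 11, None, 95]

missing_positions = find_missing(readings)
present = [v for v in readings if v is not None]  # drop gaps before the statistics
outliers = iqr_outliers(present)
```

Missing values are filtered out before the quartiles are computed, because `statistics.quantiles` cannot handle `None`.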
Course Length: 1 hour 31 minutes
ISBN: 9781789808902
Date of Publication: 31 Oct 2018

Authors

Tomasz Lelek

Tomasz Lelek is a software engineer who programs mostly in Java and Scala. He has worked with the core Java language for the past six years. He has developed multiple production Java software projects that work in a reactive way.

He is passionate about nearly everything associated with software development and believes that we should always consider different solutions and approaches before solving a problem. Recently, he spoke at conferences in Poland, at JDD (Java Developers Day), and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon Conference.

He is a co-founder of initLearn, an e-learning platform that was built with the Java language.

He has also written articles about everything related to the Java world.