You're reading from Julia for Data Science

Product type: Book
Published in: Sep 2016
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785289699
Edition: 1st Edition
Author (1)
Anshul Joshi

Anshul Joshi is a data scientist with experience in recommendation systems, predictive modeling, neural networks, and high performance computing. His research interests encompass deep learning, artificial intelligence, and computational physics. Most of the time, he can be caught exploring GitHub or trying anything new he can get his hands on. You can also follow his personal blog.

What is data munging?


The word "munging" comes from "munge," a term coined by students at the Massachusetts Institute of Technology (MIT), USA. Data munging is considered one of the most essential parts of the data science process: it involves collecting, aggregating, cleaning, and organizing data so that it can be consumed by the algorithms designed to make discoveries or create models. This takes numerous steps, including extracting data from its source and then parsing or transforming it into a predefined data structure. Data munging is also referred to as data wrangling.
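
To make the "extract, then parse into a predefined data structure" step concrete, here is a minimal sketch in Python using only the standard library. The `Reading` record, the `sensor,value` line format, and the sample rows are all hypothetical, chosen purely for illustration; real sources will need their own parsing and cleaning rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """Predefined target structure (hypothetical) for one parsed record."""
    sensor: str
    value: Optional[float]  # None marks a missing or unparseable measurement


def parse_line(line: str) -> Optional[Reading]:
    """Turn one raw 'sensor,value' line into a Reading, or None if unusable."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 2 or not parts[0]:
        return None  # malformed row: drop it
    sensor, raw_value = parts
    try:
        value = float(raw_value)
    except ValueError:
        value = None  # keep the row, but flag the value as missing
    return Reading(sensor=sensor.lower(), value=value)


def munge(raw_lines):
    """Parse and clean raw lines into a list of Reading records."""
    records = (parse_line(line) for line in raw_lines)
    return [r for r in records if r is not None]


# Made-up raw input with inconsistent casing, stray spaces, and bad rows.
raw = ["A1, 3.5", "  a1,4.0 ", "B2,n/a", "garbage", ",7"]
readings = munge(raw)
```

Here the malformed rows are dropped, while a row with a known sensor but an unreadable value is kept with `value=None`, so that a later cleaning step can decide how to handle it.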

The data munging process

So what does the data munging process look like? As mentioned, data can arrive in any format, and the data science process may require data from multiple sources. The data aggregation phase can include scraping data from websites, downloading thousands of .txt or .log files, or gathering data from RDBMS or NoSQL data stores.
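
As a rough illustration of the aggregation phase, the following Python sketch gathers records from two kinds of sources, a directory of .log files and a relational table, into one uniform list. Everything specific here is an assumption made for the example: the file names, the `events` table with its single `message` column, and the in-memory SQLite database standing in for an RDBMS.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_log_files(directory):
    """Yield one record per non-empty line from every .log file in a directory."""
    for path in sorted(Path(directory).glob("*.log")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield {"source": path.name, "text": line}


def read_table(connection, table):
    """Yield one record per row of a table (hypothetical schema: one 'message' column)."""
    # The table name is interpolated into the query, so pass only trusted names.
    query = f"SELECT message FROM {table} ORDER BY rowid"
    for (message,) in connection.execute(query):
        yield {"source": f"db:{table}", "text": message}


# Build two tiny stand-in sources so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "app.log").write_text("started\n\nstopped\n", encoding="utf-8")
    Path(tmp, "notes.txt").write_text("ignored\n", encoding="utf-8")

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (message TEXT)")
    conn.executemany("INSERT INTO events VALUES (?)", [("login",), ("logout",)])

    # Aggregate both sources into one uniform list of records.
    records = list(read_log_files(tmp)) + list(read_table(conn, "events"))
    conn.close()
```

Tagging each record with its source keeps provenance available for the later cleaning and transformation steps.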

It is very rare to find data in a format that can be used directly by the data science process...
