Chapter 2. Working with Data
Building real-world data analytics requires accurate data. In this chapter, we discuss how to obtain, clean, normalize, and transform raw data into a standard format such as Comma-Separated Values (CSV) or JavaScript Object Notation (JSON) using OpenRefine.
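The chapter performs this clean-and-transform workflow with OpenRefine, a graphical tool. To make the idea concrete, here is a minimal sketch of the same steps in plain Python using only the standard library, not OpenRefine. It trims stray whitespace, normalizes text casing, converts numeric fields, and serializes the result as JSON. The sample data and the helper names `clean_record` and `csv_to_json` are hypothetical and exist only for illustration.

```python
import csv
import io
import json

# Hypothetical raw CSV input with inconsistent spacing and casing.
RAW_CSV = """name, city ,age
 Alice , new york, 30
BOB,Chicago , 25
"""


def clean_record(record):
    """Trim keys and values, lower-case keys, and normalize each value."""
    cleaned = {}
    for key, value in record.items():
        key = key.strip().lower()
        value = value.strip()
        # Purely numeric strings become integers; other text is title-cased.
        cleaned[key] = int(value) if value.isdigit() else value.title()
    return cleaned


def csv_to_json(raw_text):
    """Parse CSV text, clean every row, and return a JSON array string."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = [clean_record(row) for row in reader]
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(csv_to_json(RAW_CSV))
```

A dedicated tool such as OpenRefine lets you inspect and undo each of these operations interactively, which a script like this does not.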
In this chapter, we will cover:
- Data sources
  - Open data
  - Text files
  - Excel files
  - SQL databases
  - NoSQL databases
  - Multimedia
  - Web scraping
- Data scrubbing
  - Statistical methods
  - Text parsing
  - Data transformation
- Data formats
  - CSV
  - JSON
  - XML
  - YAML
- Getting started with OpenRefine