Paid-for versus free data-wrangling tools
There are many data-wrangling tools on the market nowadays that we can use to extract relevant and meaningful data.
These are some free tools:
- Tabula: This tool extracts data tables stored in PDFs into CSV or Excel files. It is available for free at https://github.com/tabulapdf/tabula.
- OpenRefine: This is an open source tool, formerly known as Google Refine, that cleans inconsistent and messy data, converts it from one format to another, and can also extend data by connecting it with the web. It is a convenient way to identify inconsistencies in the data and can be found at the following locations:
- https://openrefine.org/
- GitHub: https://github.com/OpenRefine/OpenRefine
- R packages: The R language has many packages, or collections of pre-programmed functions, that help with data wrangling. Two of the most common and efficient packages in R are dplyr and tidyr:
- Some of the functions in dplyr are as follows:
- mutate() adds new variables that...
- Some of the functions in
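As a minimal sketch of how dplyr's mutate() might be used, assuming the dplyr package is installed: the data frame `orders` and its column names are made up purely for illustration and do not come from the chapter.

```r
# Load dplyr (assumes it is installed, e.g. via install.packages("dplyr"))
library(dplyr)

# A small, hypothetical data frame used only for illustration
orders <- data.frame(
  item = c("pen", "notebook", "stapler"),
  unit_price = c(1.5, 4.0, 12.0),
  quantity = c(10, 3, 1)
)

# mutate() returns a copy of the data frame with a new column added;
# here the new column is computed from two existing columns
orders_with_total <- mutate(orders, total = unit_price * quantity)

print(orders_with_total)
```

The original `orders` data frame is left unchanged; mutate() returns a new data frame, which is why the result is assigned to a separate name here.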