Learn to use OpenRefine to effectively manage your data using Packt's new book and eBook

October 2013 | Open Source

Packt is pleased to announce the release of Using OpenRefine, a hands-on, recipe-style book that covers all aspects of OpenRefine, from data analysis and error fixing to linking your dataset to the Web. The book is now available in all the popular formats: eBook, Kindle, and selected library formats. It runs to 114 pages and is priced at $34.99, while the eBook is priced at $17.84.

About the authors:

Ruben Verborgh is a PhD researcher in Semantic Hypermedia. In 2011, he launched the Free Your Metadata project together with Seth van Hooland and Max De Wilde; the project aims to evangelize the importance of bringing your data onto the Web. He currently works at Multimedia Lab, a research group of iMinds, Ghent University, Belgium, in the domains of Semantic Web, Web APIs, and Adaptive Hypermedia.

Max De Wilde is a PhD researcher in Natural Language Processing and a teaching assistant at the Université Libre de Bruxelles (ULB), in the department of Information and Communication Sciences. He holds a Master's degree in Linguistics from the ULB and an Advanced Master's in Computational Linguistics from the University of Antwerp. He works as a full-time assistant and supervises practical classes for Master's-level students on a number of topics, including database quality, document management, and architecture of information systems.

OpenRefine is a standalone, open source desktop application for cleaning up data and transforming it into other formats. It is similar to spreadsheet applications (and can work with spreadsheet file formats); however, it behaves more like a database.

Using OpenRefine takes readers on a practical tour of all the handy features of this well-known data transformation tool. The book covers the skills needed to handle large datasets and turn them into high-quality data for the Web. After learning how to analyze data and spot issues, the reader is shown how to solve them in order to obtain a clean dataset. Messy and inconsistent data is recovered through advanced techniques such as automated clustering. Finally, the book shows how to extract links from keywords and full-text fields using reconciliation and named-entity extraction.

The following topics are covered in this book:

Chapter 1: Diving Into OpenRefine
Chapter 2: Analyzing and Fixing Data
Chapter 3: Advanced Data Operations
Chapter 4: Linking Datasets
Appendix: Regular Expressions and GREL

Using OpenRefine is more than a manual: it's a guide packed with tips and tricks for getting the best out of your data. This book is aimed at anyone who works with or handles large amounts of data. No prior knowledge of OpenRefine is required.

Using OpenRefine
The essential OpenRefine guide that takes you from data analysis and error fixing to linking your dataset to the Web

For more information, please visit the book page.
