Search icon
Subscription
0
Cart icon
Close icon
You have no products in your basket yet
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Using OpenRefine

You're reading from  Using OpenRefine

Product type Book
Published in Sep 2013
Publisher Packt
ISBN-13 9781783289080
Pages 114 pages
Edition 1st Edition
Languages

Recipe 3 – clustering similar cells


Thanks to OpenRefine, you don't have to worry about inconsistencies that slipped in during the creation process of your data. If you have been investigating the various categories after splitting the multi-valued cells, you might have noticed that the same category labels do not always have the same spelling. For instance, there is Agricultural Equipment and Agricultural equipment (capitalization differences), Costumes and Costume (pluralization differences), and various other issues. The good news is that these can be resolved automatically; well, almost. But, OpenRefine definitely makes it a lot easier.

The process of finding the same items with slightly different spelling is called clustering. After you have split multi-valued cells, you can click on the Categories dropdown and navigate to Edit cells | Cluster and edit…. OpenRefine presents you with a dialog box where you can choose between different clustering methods, each of which can use various...

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}