Google Refine… pretty awesome.

Ok, here’s another shout-out to the guys and gals in Mountain View, CA. For anyone who’s ever had a processing script fail because someone on the data-entry side of the house didn’t use caps, misplaced a decimal point, mixed up the letters of an acronym or misspelled siphonophore (the most obvious example), Google Refine might be a godsend.

Basically, it’s a web-based spreadsheet program in the spirit of Google Docs Spreadsheets, but with a couple of serious tweaks:

  1. With one click, Google Refine shows every distinct value in a column and the number of rows in which that value appears. Here’s an example: let’s say you have an event logger for recording ROV observations. On 999 rows the user enters “fish”, on 123 rows “fsh”, on 12 rows “FSH” and on 54 rows “Fish”. Using Google Refine you would quickly see all these variations and be able to change that column to read “fish” in every row.
  2. For numerical data, Google Refine can perform some basic statistical analysis to find outliers that may have been caused by a misplaced decimal point or by text entered where a number belongs.
  3. Although it’s web-based, Google Refine runs as a local application. Unlike Google Docs, you don’t have to upload your data to Google, and it works without an Internet connection. Both are very important since… you might be working with proprietary data, on a ship, with a 128kB Internet connection. It’s also cross-platform (Windows, Linux and OS X).
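To make the first point concrete, here is a minimal Python sketch of the same idea: count each distinct value in a column (roughly what a text facet shows you), then rewrite the variants to one canonical value. This is not how Google Refine is implemented; the function names `text_facet` and `mass_edit` are hypothetical, and the data is just the made-up ROV log from the example above.

```python
from collections import Counter

def text_facet(values):
    """Return (value, count) pairs, most frequent first, like a facet list."""
    return Counter(values).most_common()

def mass_edit(values, mapping):
    """Replace each value found in `mapping` with its canonical form; leave others alone."""
    return [mapping.get(v, v) for v in values]

# Hypothetical ROV event-log column built from the counts in the example above.
species = ["fish"] * 999 + ["fsh"] * 123 + ["FSH"] * 12 + ["Fish"] * 54

for value, count in text_facet(species):
    print(f"{value!r}: {count}")

# Collapse every variant onto "fish" (the mapping is chosen by hand here).
cleaned = mass_edit(species, {"fsh": "fish", "FSH": "fish", "Fish": "fish"})
print(text_facet(cleaned))
```

The point of the facet view is the first step: once all four spellings are listed side by side with their counts, choosing the mapping is trivial.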
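For the second point, a rough sketch of how you might catch misplaced decimals and stray text in a numeric column. The outlier test here is a swapped-in technique (distance from the median measured in median absolute deviations), chosen only for illustration and not necessarily what Google Refine does; `split_numeric`, `flag_outliers`, the threshold of 5 and the depth readings are all hypothetical.

```python
import statistics

def split_numeric(values):
    """Separate entries that parse as numbers from ones that do not."""
    numbers, non_numeric = [], []
    for v in values:
        try:
            numbers.append(float(v))
        except ValueError:
            non_numeric.append(v)
    return numbers, non_numeric

def flag_outliers(numbers, threshold=5.0):
    """Flag values more than `threshold` median absolute deviations from the median."""
    med = statistics.median(numbers)
    mad = statistics.median(abs(x - med) for x in numbers)
    if mad == 0:
        # Most values are identical; treat anything different as suspect.
        return [x for x in numbers if x != med]
    return [x for x in numbers if abs(x - med) / mad > threshold]

# Hypothetical depth readings (m): "1245" and "12.52" have misplaced decimals,
# and "n/a" is text typed into a numeric column.
depths = ["124.5", "126.0", "123.8", "1245", "125.1", "n/a", "12.52", "124.9"]
numbers, non_numeric = split_numeric(depths)
print(non_numeric)             # ['n/a']
print(flag_outliers(numbers))  # [1245.0, 12.52]
```

The median-based measure is used because a single wildly wrong value (like 1245) would inflate an ordinary mean and standard deviation enough to hide itself.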

Here’s a screencast that will probably do a better job explaining than I ever could.

Thanks to Eric Martin from MBARI for pointing me to this.

I hope this helps.

