Bulk Geo-Tagging of Images Using SCS Timestamped NMEA GGA, HDT and ROV Data

The idea of geo-tagging (or geo-referencing) images is straightforward: embed time and position information into an image file so that you know when and where the image was taken without having to keep track of that information externally.  It is a technique widely used by professional and amateur photographers alike to show where their photos were taken.  Technically speaking, geo-tagging is the process of populating storage bins in the image file’s metadata header with GPS time, longitude, latitude, heading (true or magnetic), and altitude (above or below sea level).  This metadata header is only present on PSD, JPEG and TIFF image types, so the techniques discussed here will only work on those files.

Why Geo-tag?

For many ships, data is primarily made up of ASCII data received on a network or serial port (e.g. GPS, gyro, met sensors) or specialized binary data (ADCP, multibeam, SBP, etc.).  However, on an increasing number of ships (e.g. the NOAA Ship Okeanos Explorer, the USCG Icebreaker Healy, and the E/V Nautilus), images from fixed cameras that were once novel are becoming standard datasets.  For the same reason we add a timestamp to our serial data strings to give them temporal context, we should be adding something to these image datasets so that we can figure out when and where each image came from.

There are three major standards in the image metadata world: EXIF, XMP and IPTC.  All three have ways to embed copyright, creator, creation time and a description, but only EXIF goes beyond that and handles, among other things, image orientation (landscape/portrait) and GPS location.  Thus, to geo-tag the images we need only worry about populating the EXIF metadata bins.

A Real-World Scenario:

During one cruise aboard the NOAA Ship Okeanos Explorer the ROV team saved ~3400 high-resolution JPEG images.  The scientists who participated on the cruise needed to know where these images were taken (in x,y,z and heading).

The NOAA Ship Okeanos Explorer records High-Definition (HD) video from its two Remotely Operated Vehicles (ROVs) and four ship-mounted HD pan/tilt/zoom (PTZ) cameras.  The Okeanos does not use Blu-ray discs or HDCAM tape to store video but instead records all video as files onto dual 42TB RAID arrays.  Each filename includes the date and time of the first frame as well as the camera source and a brief description.  The date/time portion is pulled from a dedicated GPS-synced SMPTE timecode generator and automatically added to the filename by the video recording system.  The camera source and description are entered manually by the video operator.  An example of this file naming syntax: 20100711_05h21m23s02_ROVHD_CRABS.mov.  From these HD video files, the ROV video team selects individual frames and saves them as high-resolution JPEG images.  The image filenames follow the same syntax as the video filenames, but the date/time portion is manually corrected to correspond to the timecode of the individual video frame, e.g. 20100711_05h24m53s19_ROVHD_CRABS.jpg.


The lat/lon positions of the ROVs are generated as NMEA0183 GGA strings.  The ROV heading, depth, pitch, roll, altitude, etc. are generated as a proprietary NMEA-formatted sentence created by the ROV control software vendor.  All of this data is logged using the NOAA-developed Shipboard Computing System (SCS).  SCS prepends each received line of data with a date and time (“mm/dd/yyyy,hh:mm:ss.sss”, UTC).

Here are some samples of what the raw data looks like:

08/01/2010,00:20:41.900,$GPGGA,002040.60,0441.78672,N,12650.79987,E,0,00,0.0,0.00,M,,,,*0B
08/01/2010,00:20:42.010,$GPGGA,002040.60,0441.78672,N,12650.79987,E,0,00,0.0,0.00,M,,,,*0B
08/01/2010,00:20:42.119,$GPGGA,002040.60,0441.78672,N,12650.79987,E,0,00,0.0,0.00,M,,,,*0B
08/01/2010,00:20:42.260,$GPGGA,002040.60,0441.78672,N,12650.79987,E,0,00,0.0,0.00,M,,,,*0B

08/01/2010,00:16:21.922,$PGSSRVR,359.8,0.9,0.7,0.0,m,0.0,m,1,0.0,0.0*57
08/01/2010,00:16:22.422,$PGSSRVR,359.4,1.6,0.5,0.0,m,0.0,m,1,0.0,0.0*57
08/01/2010,00:16:22.922,$PGSSRVR,359.0,1.2,0.1,0.0,m,0.0,m,1,0.0,0.0*53
08/01/2010,00:16:23.437,$PGSSRVR,358.3,0.8,0.6,0.0,m,0.0,m,1,0.0,0.0*5D
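
As a rough illustration of this layout, each logged line can be split into the SCS timestamp (the first two comma-separated fields) and the NMEA sentence (everything after them).  The helper name split_scs_line is my own invention for this sketch and is not part of SCS.  Note the single quotes around the sample line, which stop the shell from treating $PGSSRVR as a variable.

```shell
# Split one SCS-logged line into its timestamp and its NMEA sentence.
# split_scs_line is a hypothetical helper name used only for this sketch.
split_scs_line() {
  printf '%s\n' "$1" | awk -F"," '{
    # Fields 1 and 2 are the SCS date and time; the rest is the NMEA sentence.
    sentence = $3
    for (i = 4; i <= NF; i++) sentence = sentence "," $i
    printf "timestamp=%s,%s\nsentence=%s\n", $1, $2, sentence
  }'
}

split_scs_line '08/01/2010,00:16:21.922,$PGSSRVR,359.8,0.9,0.7,0.0,m,0.0,m,1,0.0,0.0*57'
```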

Required Tools

To read/write the metadata I needed a command-line tool that I could use as part of a script.  After Googling “EXIF command line tool” I found exiv2, a simple, open-source EXIF metadata tool.  Perfect!

I also needed to do some simple file querying and row formatting. For this I relied on my good friends grep and awk. I also needed the BASH shell scripting language to stitch everything together. My solution was developed on Mac OS X but was ultimately ported to a Linux-based server.  At the time of this writing, I haven’t checked whether this will work on Windows using Cygwin, but I don’t see a reason why not.

The Solution

UPDATED 2011/01/12: After using this script I realized many web-gallery sites such as Piwigo and Gallery3 require that the Exif.Photo.DateTimeOriginal bin also be set for images to be sorted chronologically.  I’ve uploaded a new version of the final script that sets this bin to the same time as SCSTIMESTAMP.

The image filenames alone provided a lot of information such as date, time and camera source as well as a brief description.  Using that time stamp I could search the position and attitude data but before I could perform the file query I first had to convert the timestamp format in the file name to match the SCS format.  For simplicity’s sake I did not worry about the decimal second data.  The following awk command handled the conversion:

echo "20100711_05h24m53s19_ROVHD_CRABS.jpg" | \
awk -F"_" ' {printf "%s/%s/%s,%s:%s:%s\n",  substr($1,5,2),substr($1,7,2),substr($1,1,4),substr($2,1,2),substr($2,4,2),substr($2,7,2)}'

Now that I had the date/time information from the filename in the same format as the SCS time stamp I used grep to search the GGA and ROV data files.  I used the -m1 argument to return only the first row that matched the timestamp.  Again, I chose to ignore the partial second information:

SCSTIMESTAMP="07/11/2010,05:24:53"
grep -m1 "${SCSTIMESTAMP}" ./SCSData/ROV_GPGGA_07112010.Raw
grep -m1 "${SCSTIMESTAMP}" ./SCSData/ROV_PGSSRVR_07112010.Raw

Running the two queries yielded the following two results respectively.

07/11/2010,05:24:53.027,$GPGGA,052451.93,0251.00232,N,12503.55104,E,0,00,0.0,0.00,M,,,,*06
07/11/2010,05:24:53.449,$PGSSRVR,109.7,5.8,2.8,422.2,m,0.0,m,7,0.0,0.0*56
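
One caveat with plain grep is that it treats the timestamp as a regular expression and will match it anywhere in the line.  A slightly stricter lookup, sketched here with awk, matches the timestamp literally and only at the start of a row.  The function name find_scs_row and the temporary file are assumptions made for this example; the data row is the GGA result shown above.

```shell
# find_scs_row is a hypothetical helper: print the first row of an SCS file
# that begins with the given "mm/dd/yyyy,hh:mm:ss" timestamp.
find_scs_row() {
  # $1 = timestamp, $2 = SCS raw data file
  awk -v ts="$1" 'index($0, ts) == 1 { print; exit }' "$2"
}

# Stand-in data file holding the GGA row from the example above.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
cat > "$tmpfile" <<'EOF'
07/11/2010,05:24:53.027,$GPGGA,052451.93,0251.00232,N,12503.55104,E,0,00,0.0,0.00,M,,,,*06
EOF

find_scs_row "07/11/2010,05:24:53" "$tmpfile"
```

If no row matches, the function prints nothing, so the caller can test for an empty result and skip the image.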

The next step was to embed this data in the image file using exiv2.  I found exiv2 to be a little funny to use.  First off, I had to know the exact bin name and expected datatype (e.g. ASCII, Rational Number, Byte, etc.).  Please refer to the exiv2 tag reference for all the metadata bin names and datatypes.  The second quirk: to store a rational number, say 154.23, you have to enter it as 15423/100.  The last quirk is how data is passed to the exiv2 command on the command line.  There are two methods: you can populate individual bins using the bin name and value as command-line arguments, or you can pass a script file to exiv2 and populate multiple bins at once.  I found the latter method easier to script.
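
To make the rational-number quirk concrete, here is a minimal sketch that turns a decimal string into the numerator/denominator form.  The function name to_rational is hypothetical, and the sketch works on the digits as text (rather than multiplying floats) so that no rounding can creep in.

```shell
# to_rational is a hypothetical helper: convert a decimal string such as
# 154.23 into the numerator/denominator form exiv2 expects (15423/100).
to_rational() {
  printf '%s\n' "$1" | awk -F'[.]' '{
    frac = $2
    denom = 1
    # One factor of 10 for every digit after the decimal point.
    for (i = 0; i < length(frac); i++) denom *= 10
    # Join the integer and fractional digits, then force a numeric value.
    printf "%d/%d\n", ($1 frac) + 0, denom
  }'
}

to_rational 154.23
to_rational 235.2
```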

Once I understood how to use exiv2 to populate metadata, I had to determine exactly which bins to populate in order for programs like iPhoto and Picasa3 to recognize an image as geo-tagged.  My conclusion was that at the most basic level I needed the GPS version, GPS time, latitude reference, latitude, longitude reference, longitude, altitude reference and altitude.  Here is the list of the metadata bin names, datatypes and descriptions of how I planned to populate them:

Exif.GPSInfo.GPSVersionID Multi-Byte – must be set to: “02 00 00 00”
Exif.GPSInfo.GPSTimeStamp Multi-Rational – hours, minutes, seconds: “6/1 21/1 42967/1000” = 6:21:42.967
Exif.GPSInfo.GPSLatitudeRef Ascii – N (North) or S (South)
Exif.GPSInfo.GPSLatitude Multi-Rational – Degrees, Minutes, Seconds: “5/1 4/1 439/10” = 5 deg, 4 min, 43.9 sec
Exif.GPSInfo.GPSLongitudeRef Ascii – W (West) or E (East)
Exif.GPSInfo.GPSLongitude Multi-Rational – Degrees, Minutes, Seconds: “126/1 39/1 17956/1000” = 126 deg, 39 min, 17.956 sec
Exif.GPSInfo.GPSAltitudeRef Byte – 00 (Above Sea level) or 01 (Below Sea Level)
Exif.GPSInfo.GPSAltitude Rational – meters above or below sea level: “170041/100” = 1700.41

Optionally I could populate heading using the following bins:

Exif.GPSInfo.GPSImgDirectionRef Ascii – T (True) or M (Magnetic)
Exif.GPSInfo.GPSImgDirection Rational – degrees: “2352/10” = 235.2

Getting back to my example, I used awk to dissect the GGA string and build an exiv2 script file for the GPSVersionID, GPSTimeStamp, GPSLatitudeRef, GPSLatitude, GPSLongitudeRef, and GPSLongitude.  In cases where the data value was a floating-point number I multiplied the value by a power of 10 to make it an integer.  I also had to convert the decimal-minutes part of the lat and long data to minutes and seconds by multiplying the fractional part of the decimal minutes by 60.  Here were the results:

# Single quotes keep the shell from expanding $GPGGA as a variable.
GGASTRING='07/11/2010,05:24:53.027,$GPGGA,052451.93,0251.00232,N,12503.55104,E,0,00,0.0,0.00,M,,,,*06'
# Each line of the format string ends in \n so every bin lands on its own line.
echo "${GGASTRING}" | awk -F"," '{printf "set Exif.GPSInfo.GPSVersionID Byte 02 00 00 00\n\
set Exif.GPSInfo.GPSTimeStamp Rational %i/1 %i/1 %i/1000\n\
set Exif.GPSInfo.GPSLatitudeRef Ascii %s\n\
set Exif.GPSInfo.GPSLatitude Rational %i/1 %i/1 %i/100000\n\
set Exif.GPSInfo.GPSLongitudeRef Ascii %s\n\
set Exif.GPSInfo.GPSLongitude %i/1 %i/1 %i/100000\n\
", \
substr($4,1,2),substr($4,3,2),substr($4,5,6)*1000, \
$6, \
substr($5,1,2),substr($5,3,2),substr($5,6,5)*60, \
$8, \
substr($7,1,3),substr($7,4,2),substr($7,7,5)*60}' > exiv2GGAScript.txt

The output exiv2GGAScript.txt script looked like:

set Exif.GPSInfo.GPSVersionID Byte 02 00 00 00
set Exif.GPSInfo.GPSTimeStamp Rational 5/1 24/1 51930/1000
set Exif.GPSInfo.GPSLatitudeRef Ascii N
set Exif.GPSInfo.GPSLatitude Rational 2/1 51/1 13920/100000
set Exif.GPSInfo.GPSLongitudeRef Ascii E
set Exif.GPSInfo.GPSLongitude 125/1 3/1 3306240/100000
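
One way to sanity-check the latitude and longitude rationals in the output above is to convert them back to decimal degrees and compare the result with the NMEA value.  The function name rational_dms_to_decimal is made up for this sketch.  For the latitude, the GGA field 0251.00232 means 2 degrees plus 51.00232/60 degrees, so the round trip should land on the same number.

```shell
# rational_dms_to_decimal is a hypothetical helper: take three exiv2-style
# rationals (degrees, minutes, seconds) and print decimal degrees.
rational_dms_to_decimal() {
  printf '%s\n' "$1" | awk '{
    split($1, d, "/"); split($2, m, "/"); split($3, s, "/")
    printf "%.6f\n", d[1] / d[2] + (m[1] / m[2]) / 60 + (s[1] / s[2]) / 3600
  }'
}

rational_dms_to_decimal "2/1 51/1 13920/100000"      # latitude  -> 2.850039
rational_dms_to_decimal "125/1 3/1 3306240/100000"   # longitude -> 125.059184
```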

I used awk again to dissect the PGSSRVR string and build an exiv2 script file for the GPSAltitudeRef, GPSAltitude, GPSImgDirectionRef and GPSImgDirection.  Where the data value was a floating-point number I multiplied the value by 100 to make it an integer.  Here were the results:

# Single quotes keep the shell from expanding $PGSSRVR as a variable.
PGSSRVRSTRING='07/11/2010,05:24:53.449,$PGSSRVR,109.7,5.8,2.8,422.2,m,0.0,m,7,0.0,0.0*56'
# Field 4 is the heading and field 7 is the depth; both are scaled by 100.
echo "${PGSSRVRSTRING}" | awk -F"," '{printf "set Exif.GPSInfo.GPSImgDirectionRef Ascii T\n\
set Exif.GPSInfo.GPSImgDirection Rational %i/100\n\
set Exif.GPSInfo.GPSAltitudeRef Byte 01\n\
set Exif.GPSInfo.GPSAltitude Rational %i/100\n\
", \
$4*100, \
$7*100 }' > exiv2PGSSRVRScript.txt

The output exiv2PGSSRVRScript.txt script looked like:

set Exif.GPSInfo.GPSImgDirectionRef Ascii T
set Exif.GPSInfo.GPSImgDirection Rational 10970/100
set Exif.GPSInfo.GPSAltitudeRef Byte 01
set Exif.GPSInfo.GPSAltitude Rational 42220/100

Rather than call the exiv2 command once with each script, I chose to concatenate the two scripts into a single script and call the exiv2 command just once.

cat exiv2GGAScript.txt exiv2PGSSRVRScript.txt > exiv2Script.txt
exiv2 -k -m exiv2Script.txt 20100711_05h24m53s19_ROVHD_CRABS.jpg

To verify the metadata was successfully written I used the exiv2 command with the -pt argument.

exiv2 -pt 20100711_05h24m53s19_ROVHD_CRABS.jpg
Exif.Image.GPSTag                            Long        1  26
Exif.GPSInfo.GPSVersionID                    Byte        4  2.0.0.0
Exif.GPSInfo.GPSLatitudeRef                  Ascii       2  North
Exif.GPSInfo.GPSLatitude                     Rational    3  2deg 51' 0.139"
Exif.GPSInfo.GPSLongitudeRef                 Ascii       2  East
Exif.GPSInfo.GPSLongitude                    Rational    3  125deg 3' 33.062"
Exif.GPSInfo.GPSAltitudeRef                  Byte        1  Below sea level
Exif.GPSInfo.GPSAltitude                     Rational    1  422.2 m
Exif.GPSInfo.GPSTimeStamp                    Rational    3  05:24:51.9
Exif.GPSInfo.GPSImgDirectionRef              Ascii       2  True direction
Exif.GPSInfo.GPSImgDirection                 Rational    1  10970/100

Mac users can verify the metadata was written by opening the image in Preview, choosing Tools > Show Inspector, and selecting the GPS tab.

And again using Picasa3.

Picasa3 Showing Geographic Location

So at this point I had a method for finding the right data, formatting it properly and embedding it into an image.  All that was left was to script it so that I could GeoTag all ~3400 images at once.

Automating it!

So what I wanted was a script that does all the steps for me and does it for an entire directory of images.  I also wanted to add some additional functionality such as:

  • Ability to embed GPS coordinates only, with no heading or depth (if I were tagging an image from a digital camera or a frame capture from one of the ship-mounted HD PTZ cameras)
  • Ability to pull the timestamp from the modification time of the file instead of the filename (if I were using a real-time frame capture system this would be the way to go)
  • Ability to embed heading data based on the NMEA0183 HDT string (if I wanted to pull heading off a NMEA-compatible gyroscope)
  • Ability to create a copy of the image and modify the copy, not the original
  • Ability to pass an additional exiv2 script to further populate the image metadata. (I’m thinking about adding things like Expedition Name, Dive Number, Cruise ID, Copyright Credit, etc)
  • Ability to properly handle TIFF and JPG images
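
To give a feel for the HDT option, here is a minimal sketch of building the heading bins from an HDT row.  It assumes the HDT log carries the same SCS date/time prefix as the GGA log, which puts the heading in field 4.  The function name hdt_to_exiv2 and the sample line are made up for illustration; they are not taken from real cruise data or from the final script.

```shell
# hdt_to_exiv2 is a hypothetical helper; the sample row below is invented and
# assumes the HDT log uses the same SCS prefix as the GGA log.
hdt_to_exiv2() {
  printf '%s\n' "$1" | awk -F"," '{
    # Field 4 is the true heading in degrees; store it as hundredths over 100.
    # Adding 0.5 before the integer conversion rounds instead of truncating.
    printf "set Exif.GPSInfo.GPSImgDirectionRef Ascii T\n"
    printf "set Exif.GPSInfo.GPSImgDirection Rational %d/100\n", $4 * 100 + 0.5
  }'
}

hdt_to_exiv2 '07/11/2010,05:24:53.100,$GPHDT,109.7,T*3A'
```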

So here is the script.

This was a long one in development but I’m happy with the results.  Without any options the script will try to geo-tag all images matching the YYYYMMDD_HHhMMmSSsFF_CAMERA-SOURCE_DESCRIPTION.jpg format.  By default these images will be located at sea level.  If no GPS data is found for the given timestamp the script will move on to the next image.  Use the -v flag for more verbose messaging.  Use the -t flag to perform a dry run (images will not be modified).  Use the -m flag to search for position data based on the file’s last modification time.  Use the -F jpg|tif|tiff argument to specify the image file type; the value must be “jpg”, “tif” or “tiff”.  Use the -R <PGSSRVR file> argument to add depth and heading information based on the PGSSRVR string format.  Use the -H <GPHDT file> argument to add heading information based on the NMEA0183 HDT string format.  If both -R and -H are specified, the HDT heading will override the heading from the PGSSRVR data.  Use the -M <EXIV2 Script> argument to populate additional EXIF metadata bins.  This file must use the same exiv2 script format as described above.  I used the following script when I was testing this feature:

UPDATED 2011/01/12: After using the script with different web gallery programs I now use a more comprehensive auxiliary EXIV2 script; the following section has been updated.
set Exif.Image.Artist Ascii "NOAA Okeanos Explorer Program"
set Exif.Image.Copyright Ascii "INDEX-SATAL 2010, Okeanos Explorer Program, NOAA"
set Iptc.Application2.Copyright String "INDEX-SATAL 2010, Okeanos Explorer Program, NOAA"
set Exif.Image.ImageDescription Ascii "This will be the first time scientists use a remotely operated vehicle (ROV) to get even a glimpse of deepwater biodiversity in the waters of the Sangihe Talaud Region. We expect to make discoveries that will advance our understanding of undersea ecosystems, particularly those associated with submarine volcanoes and hydrothermal vents. http://oceanexplorer.noaa.gov"
set Exif.Photo.UserComment "This will be the first time scientists use a remotely operated vehicle (ROV) to get even a glimpse of deepwater biodiversity in the waters of the Sangihe Talaud Region. We expect to make discoveries that will advance our understanding of undersea ecosystems, particularly those associated with submarine volcanoes and hydrothermal vents. http://oceanexplorer.noaa.gov"
set Iptc.Application2.Caption String "This will be the first time scientists use a remotely operated vehicle (ROV) to get even a glimpse of deepwater biodiversity in the waters of the Sangihe Talaud Region. We expect to make discoveries that will advance our understanding of undersea ecosystems, particularly those associated with submarine volcanoes and hydrothermal vents. http://oceanexplorer.noaa.gov"
set Iptc.Application2.Keywords String "ROV, INDEX-SATAL 2010, Indonesia"

Use the -O <output directory> argument to specify an output directory for the geo-tagged images.  When this flag is used, each image is first copied to the output directory and then the copy is geo-tagged.  The original file is not modified at all, and the copy retains the same timestamps and permissions as the original.
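
The copy step could look something like the sketch below.  I am assuming cp -p here because it preserves timestamps and permissions; the function name copy_for_tagging is hypothetical and the final script may do this differently.

```shell
# copy_for_tagging is a hypothetical helper: copy an image into the output
# directory, keeping its timestamps and permissions, and print the new path.
copy_for_tagging() {
  # $1 = source image, $2 = output directory
  mkdir -p "$2" || return 1
  cp -p "$1" "$2/" || return 1
  printf '%s\n' "$2/$(basename "$1")"
}

# Intended use (left as a comment so this sketch does not require exiv2):
#   copy=$(copy_for_tagging ./images/photo.jpg ./tagged) && \
#     exiv2 -k -m exiv2Script.txt "$copy"
```

The -k flag on exiv2, used earlier, keeps the file timestamps intact when the copy is modified.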

The only two required arguments are the <gga file> and the <image directory>.  Without these the script will exit immediately.

The script writes the filename and position data to stdout, which is convenient for finding out what position data was embedded in each image and for spotting any images where position information could not be found.  I typically redirect this output to a logfile so that I can go back later and access this information.
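
As a stripped-down illustration of that reporting loop, the sketch below walks a directory of JPEGs, derives the SCS timestamp from each filename, looks it up in the GGA file, and prints one line per image.  It does not call exiv2 at all, and the function name report_positions and the "NO POSITION FOUND" wording are my own assumptions, not the output format of the final script.

```shell
# Dry-run sketch: report which images have a matching GGA row.
# report_positions and its output format are hypothetical.
report_positions() {
  # $1 = GGA file, $2 = image directory
  for img in "$2"/*.jpg; do
    [ -e "$img" ] || continue
    # Convert YYYYMMDD_HHhMMmSSsFF_... into the SCS mm/dd/yyyy,hh:mm:ss form.
    ts=$(basename "$img" | awk -F"_" '{printf "%s/%s/%s,%s:%s:%s\n",
         substr($1,5,2), substr($1,7,2), substr($1,1,4),
         substr($2,1,2), substr($2,4,2), substr($2,7,2)}')
    # -F matches the timestamp literally; || true keeps a miss from aborting.
    row=$(grep -F -m1 "$ts" "$1" || true)
    if [ -n "$row" ]; then
      printf '%s,%s\n' "$(basename "$img")" "$row"
    else
      printf '%s,NO POSITION FOUND\n' "$(basename "$img")"
    fi
  done
}

# Intended use, redirecting the report to a logfile:
#   report_positions ./SCSData/ROV_GPGGA_07112010.Raw ./images > geotag.log
```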

The full usage statement is:

Usage: ODR_GeoTag.sh: [-vtm] [-F jpg|tif|tiff ] [-R <PGSSRVR file>] [-H <HDT file>] [-M <EXIV2 script>] [-O <output directory>] <gga file> <image directory>
	-v verbose
	-t test only
	-m Geotag based on file modification timestamp, not filename
	-F jpg|tif|tiff Set the image type to embed, the only valid options are jpg, tif, and tiff
	-R <PGSSRVR file> Use PGSSRVR file to populate depth and heading metadata bins
	-H <HDT file> Use HDT file to populate heading metadata bins
	-M <EXIV2 Script> Use EXIV2 Script to populate additional metadata bins
	-O <output dir> directory to save the output files
	<gga file> the navigation file to use
	<image directory> the directory containing the image files to geotag

Enjoy, and I hope this helps.
