Convert SCS timestamped NMEA GGA data to KML

Overview

For better or worse, Google Earth is becoming the de facto standard for geospatial visualization.  I’m guessing this is due to its amazingly powerful and beautiful yet intuitive user interface (credit where credit is due).  Regardless of the reason, for the near future Google Earth is going to be how most people prefer to show off their GIS data, and in the interest of being good data rats we should figure out how to accommodate our scientists.

Here’s the real-world scenario I’m facing:  I’ve got a science party member who wants to be able to display the entire cruise track in Google Earth.  This requires taking the recorded GPS data and producing a Google Earth-formatted (.kml) file.

So how to attack the problem?  Since all we’re really doing is converting data from one format to another, let’s take a look at our starting and end points and see just how much trouble we’re really in.  The starting point is the ship’s GPS.  This sensor spits out standard NMEA0183 GGA messages that are logged to a file by the ship’s datalogger.  In my scenario the GPS sensor is a POS/MV and the datalogger is NOAA’s Shipboard Computing System (SCS).  That means I’m receiving data values at 2Hz and SCS is automatically prepending each reading with a date and time (mm/dd/yyyy,hh:mm:ss.sss).  The final saved data looks something like the following:

07/22/2010,07:30:54.744,$GPGGA,073054.518,0207.20460,N,12539.85843,E,2,11,0.8,5.42,M,,,10,0025*33
07/22/2010,07:30:55.260,$GPGGA,073055.018,0207.20573,N,12539.85840,E,2,11,0.8,5.35,M,,,10,0025*37
07/22/2010,07:30:55.744,$GPGGA,073055.518,0207.20685,N,12539.85839,E,2,11,0.8,5.27,M,,,9,0025*0D
07/22/2010,07:30:56.260,$GPGGA,073056.018,0207.20799,N,12539.85838,E,2,11,0.8,5.19,M,,,9,0025*0B

Google Earth uses an XML-based file format called Keyhole Markup Language (KML) to display geospatial data.  The name Keyhole is a carryover from Keyhole, Inc., the GIS company Google acquired in 2004 to create Google Earth.  As a sidebar, the name Keyhole is also a nod to the military’s Keyhole (KH) series of eye-in-the-sky reconnaissance satellites, such as the KH-11 (it all makes sense now, right?).  Here’s an example of a KML file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Folder>
<name>Tracks</name>
<Folder>
<name>Points</name>
<Placemark>
<TimeStamp><when>2010-07-22T07:30:54.017Z</when></TimeStamp>
<Point>
<coordinates>125.664308,2.120058,5.460000</coordinates>
</Point>
</Placemark>
<Placemark>
<TimeStamp><when>2010-07-22T07:31:54.017Z</when></TimeStamp>
<Point>
<coordinates>125.664272,2.122292,5.290000</coordinates>
</Point>
</Placemark>
</Folder>
</Folder>
</kml>

Although XML is very powerful in its ability to gracefully handle multiple varieties of GIS information, translating from comma-delimited to XML is a bit of a PITA (pain-in-the-ass).  As a design philosophy, when facing a challenge such as this I prefer to do a little research to find out whether someone else has already solved a similar problem.  I find it’s quicker to hack than to reinvent.  Quick… to the Internet!

Tools

After Googling “translate NMEA to KML command-line” and clicking a couple of links I came across a slick little utility called GPSBabel.  GPSBabel can translate between seemingly every file format related to storing GPS data, including NMEA and KML.  Perfect.  A quick read through the online documentation confirms that it should fit the bill, and best of all, GPSBabel is FREE!

In addition to GPSBabel we need to do some simple file querying and row formatting.  For this I’ll rely on my good friends grep and awk.  I’ll also use BASH shell scripting to stitch everything together.  I’ll be developing my solution on the Mac OS X platform but ultimately porting it over to a Linux-based server.  I haven’t checked whether this will work on Windows under Cygwin, but I don’t see a reason why not.

Solution

After reading the man pages for GPSBabel I worked out the basic form of the command using just the default settings:

gpsbabel -i NMEA -f <input file> -o KML -F <output file>

GPSBabel requires that the input file be properly formatted NMEA0183 GGA.  This means that before we can proceed we will need to strip out the date and time stamp that SCS prepended.  To do this I use awk to print each column of data starting at column 3:

awk -F, '{ printf "%s", $3 } { for ( i = 4; i <= NF; i++ ) printf ",%s", $i} {printf "\n"}' <input file> > <output file>

The output should look like:

$GPGGA,073054.518,0207.20460,N,12539.85843,E,2,11,0.8,5.42,M,,,10,0025*33
$GPGGA,073055.018,0207.20573,N,12539.85840,E,2,11,0.8,5.35,M,,,10,0025*37
$GPGGA,073055.518,0207.20685,N,12539.85839,E,2,11,0.8,5.27,M,,,9,0025*0D
$GPGGA,073056.018,0207.20799,N,12539.85838,E,2,11,0.8,5.19,M,,,9,0025*0B
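As a side note, since GGA fields never contain embedded commas, a plain cut on the comma delimiter should produce the same result as the awk loop above.  A quick sketch using the first sample record:

```shell
# Strip the SCS date/time prefix by keeping everything from field 3 on.
rec='07/22/2010,07:30:54.744,$GPGGA,073054.518,0207.20460,N,12539.85843,E,2,11,0.8,5.42,M,,,10,0025*33'
echo "$rec" | cut -d, -f3-
# -> $GPGGA,073054.518,0207.20460,N,12539.85843,E,2,11,0.8,5.42,M,,,10,0025*33
```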

As I mentioned earlier, my GPS samples position at 2Hz.  While this sampling rate is required for operating a multibeam system or a dynamic positioning system, it’s simply too much for displaying a multi-day track line in Google Earth; a 5-minute interval should be more than sufficient.  To resample the data we could employ a mathematical resampling algorithm that would interpolate the data to a 5-minute interval, or take a shortcut and just grab one row out of every 600 (2 records/second * 60 seconds/minute * 5 minutes = 600 records/5 minutes).  To do that I’ll use awk again.  The following command grabs every 600th line of data starting at line 1 of the input file and stores the result in the output file:

awk 'NR%600==1' <input file> > <output file>

CORRECTION: Turns out starting on the first line is dangerous if your datalogging program introduces a >0.5-second delay between when the data is produced and when it is time stamped, as you could end up with the following situation:

07/26/2010,00:00:00.174,$GPGGA,235959.734,0216.12878,N,12449.07267,E,2,09,1.1,7.33,M,,,10,0025*3A

This is a problem because the SCS timestamp is from the day after the data was produced (SCS time: 00:00:00.174; GPS time: 23:59:59.734, from the previous day).  There’s no easy way to account for this, so what I’m going to do is perform the resampling operation starting at the second line.  The corrected awk command is:

awk 'NR%600==2' <input file> > <output file>
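As a sanity check of the modulo selection, feeding the command 1200 numbered lines should return lines 2 and 602:

```shell
# NR%600==2 matches record numbers 2, 602, 1202, ...
seq 1 1200 | awk 'NR%600==2'
# -> 2
#    602
```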

When I run the gpsbabel command I immediately get an error:

nmea: No date found within track (all points dropped)!
nmea: Please use option "date" to preset a valid date for thoose tracks.

So I head back to the man pages and look at the options for the nmea format.  Turns out I need to give GPSBabel the date the data was recorded using the “date=<yyyymmdd>” option; the time is provided by the second field of each GGA message.  So we can manually look at each data file and translate the date into the required format or… use the following command to grab the date field from the first line of the raw SCS file and translate it for use:

head -n1 <input file> | awk -F[,/] '{print $3$1$2}'
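Splitting on both “,” and “/” makes the month, day, and year the first three fields, so rearranging them is trivial.  Applied to the first sample record:

```shell
# Split on comma and slash: $1=mm, $2=dd, $3=yyyy; print as yyyymmdd.
line='07/22/2010,07:30:54.744,$GPGGA,073054.518,0207.20460,N,12539.85843,E,2,11,0.8,5.42,M,,,10,0025*33'
echo "$line" | awk -F'[,/]' '{print $3$1$2}'
# -> 20100722
```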

Let’s try this again:

gpsbabel -i NMEA,date=<yyyymmdd> -f <input file> -o KML -F <output file>

Success… well, maybe.  We at least have a KML output file.

Importing it into Google Earth yields a trackline with a timeline.  I can even hit play and watch as the marker retraces where the ship has been cruising for the last month.  Sweet, but what’s up with those numeric labels that keep decreasing?  That’s just a little annoying, and I need to remove it, so it’s back to the GPSBabel man pages.

After trying a bunch of stuff I finally tweaked my command to produce a KML file just the way I want it.  I can still see my little ship move along its trackline, but there are no labels distracting me.  As a bonus I’ve reduced the output file size by 97.5%!  As I mentioned earlier, XML is very powerful, but it is neither simple nor concise.  The command I ultimately ended up using was:

gpsbabel -i NMEA,date=<yyyymmdd> -f <input file> -o KML,trackdata=0,track=1,labels=0 -F <output file>

UPDATED 2011-01-28: Processing interpreted GGA data (e.g. from an ROV)

Recently someone I’ve worked with tried using this procedure to convert GGA data from an ROV dive.  The process failed very quietly, generating a KML file with no data points.  Here’s why, and here’s how to fix it.  I’ve also updated the final script so that it now handles this scenario, albeit you will need to install the NMEACheckSumGen program described below.

The Cause

To the best of my knowledge GPS doesn’t work deep underwater.  The signals from the satellites just can’t penetrate kilometers of water.  That said, science-class ROV operators still need to know where their vehicles are, both relative to the ship and in absolute terms on the surface of the planet.  For determining ROV position relative to the vessel, the majority of ROV operators use an Ultra-Short Baseline (USBL) acoustic navigation system.  Simply put, a USBL uses sound to determine the range and bearing from a transducer mounted on the ship’s hull to a transducer on an underwater vehicle.  This gives the operators the X and Y distances (in feet or meters) between the ship’s transducer and the vehicle’s transducer.  To get the absolute location of the vehicle in lat/lon, another process (e.g. Hypack) combines the USBL data with the ship’s GPS position, taking into account the offset between the ship’s GPS antenna and the ship’s USBL transducer.  Here’s where the problem begins…

The output of this last process is an NMEA GGA data stream that represents the lat/lon position of the vehicle.  Now, if you dig into the NMEA GGA spec you will find that the 6th GGA data field corresponds to the GPS fix quality.  The values you’ll commonly see in this field are: 1 = GPS fix, 2 = differential GPS fix, 0 = invalid.  Because the calculated position is neither a real GPS fix nor a differential GPS fix, it’s logged as type 0, invalid.

So the 6th field has a 0 in it; so what?  Well, GPSBabel, the utility I use for translating GGA into KML, is a bit anal-retentive and will only process quality type 1 and type 2 data.  This is an easy fix: use awk to change the field from 0 to 1.  Unfortunately this creates another problem: the NMEA checksum.  The NMEA checksum is an XOR summation over the sentence that is used to verify the data was transmitted correctly.  The idea is that whatever system uses the GPS data can calculate its own checksum upon receiving the data.  If the transmitted checksum and the calculated checksum don’t match, then something went wrong during transmission and the fix should be treated as bad.  Again, here is where GPSBabel is a bit anal-retentive and will only process fixes that have a correct checksum.
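Just to illustrate the XOR math (the C program described later does the real work), here’s a minimal bash sketch; the short test sentence in the comment is made up for the example:

```shell
# XOR every character between the leading '$' and the '*', print as 2 hex digits.
nmea_checksum() {
  local s=${1#\$}       # drop the leading '$'
  s=${s%%\**}           # drop the '*' and anything after it
  local sum=0 i c
  for (( i = 0; i < ${#s}; i++ )); do
    c=$(printf '%d' "'${s:i:1}")   # character -> ASCII code
    sum=$(( sum ^ c ))
  done
  printf '%02X\n' "$sum"
}

nmea_checksum '$GPGGA,1'   # -> 4B  (0x47^0x50^0x47^0x47^0x41^0x2C^0x31)
```

Note this requires bash (for the arithmetic and substring expansions), not plain sh.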

The Solution

The solution to the first problem, quality type = 0, is an easy fix using awk.  Here’s the command; remember this command is for an SCS-timestamped GGA file, so we need to change the 9th field instead of the 7th:

awk -F, 'BEGIN{ OFS="," } {print $1, $2, $3, $4, $5, $6, $7, $8, "1", $10, $11, $12, $13, $14, $15, $16, $17}' <input file> > <output file>
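A terser equivalent is to overwrite field 9 and let awk re-emit the whole row.  Here it is applied to a made-up quality-0 version of the earlier sample line (note the transmitted checksum is now stale, which is exactly the second problem):

```shell
# Setting $9 forces awk to rebuild the record joined with OFS.
line='07/26/2010,00:00:00.174,$GPGGA,235959.734,0216.12878,N,12449.07267,E,0,09,1.1,7.33,M,,,10,0025*3A'
echo "$line" | awk -F, 'BEGIN{OFS=","} {$9="1"} 1'
# -> 07/26/2010,00:00:00.174,$GPGGA,235959.734,0216.12878,N,12449.07267,E,1,09,1.1,7.33,M,,,10,0025*3A
```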

The solution to the second problem is not as straightforward.  After searching the internet for a while I was able to piece together enough C code to build a simple program that reads an input NMEA file, calculates the correct checksum and spits out a valid data stream.  The program is so simple it will actually work for any valid NMEA data stream, not just GGA.  Please remember this if you find yourself in a similar bind with other NMEA data.  Here’s the source code: main.c.  Usage instructions are in the code’s header comment.  On Mac and Linux systems, you can compile the code using the following command:

gcc -o ./NMEACheckSumGen ./main.c

This command will create an executable program called NMEACheckSumGen.  Copy the resulting executable to somewhere on your system’s path (e.g. /usr/local/bin) and you’ll be able to call it from anywhere.

Automating it!

So now we have our rough procedure for getting from start to finish, time to put it all together.

I want to be able to simply pass an SCS-timestamped GGA file to a program and have KML come out.  The script should resample the data file to whatever interval I set, figure out what date string to pass to GPSBabel, and reformat the raw data file into NMEA0183-compliant GGA automatically.  The program should also be able to handle interpreted GGA data such as ROV position.

So here is the script.

In addition to the requirements specified previously, the script also handles raw files that include data from more than one day.  In that case the script parses out the data for a single day and builds a new KML file for that date, repeating the process for each date discovered in the raw data.  This is useful when the data logging program doesn’t truncate files on the even day, or if you want to concatenate all the raw data from a cruise into a single file and run the script once at the end.

Depending on the amount of navigation data to convert, you may want to change the default resampling interval.  The default resample interval is 600 (1 sample/5 minutes) but can be changed using the -R flag.  The script can also add a user-specified prefix to the output KML files using the -P option.  This is useful if you want to store KML files from more than one platform (e.g. ship and ROV) in the same directory.

The script has two required arguments: the input data file, and the directory in which to store the resulting KML file(s).  Make sure you have write permission to the output directory before you run the script.

Here is the full usage statement:

Usage: ./NMEA_2_KML: [-vf] [-R <interval> ] [-P <prefix>] <input file> <output directory>
     -v turn on verbose messaging
     -f fix interpreted GGA data, requires NMEACheckSumGen
     -R <interval> resample the input data by selecting 1/<interval>
          data row.  Interval must be an integer. Default=600
     -P <prefix> prefix to add to the output KML files
      <input file> the navigation file to use
      <output directory> the directory to store the KML files
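For reference, the core of the pipeline can be sketched in a few lines of bash.  The helper names here are hypothetical; the real script adds the multi-day splitting, option parsing, and interpreted-GGA fixing described above:

```shell
#!/bin/bash
# Sketch of the SCS GGA -> KML pipeline (helper names are made up for clarity).
INTERVAL=600                    # keep 1 row per 600 (5 min at 2 Hz)

strip_timestamps() {            # drop the SCS date,time prefix
  cut -d, -f3-
}
resample() {                    # keep every INTERVAL-th row, starting at row 2
  awk -v n="$INTERVAL" 'NR%n==2'
}
scs_date() {                    # first line's mm/dd/yyyy -> yyyymmdd
  head -n1 "$1" | awk -F'[,/]' '{print $3 $1 $2}'
}

# Main flow (assumes gpsbabel is on the PATH; $1 = input file, $2 = output dir):
# d=$(scs_date "$1")
# resample < "$1" | strip_timestamps > /tmp/track.nmea
# gpsbabel -i NMEA,date="$d" -f /tmp/track.nmea \
#          -o KML,trackdata=0,track=1,labels=0 -F "$2/track.kml"
```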

Got questions? Please put them in the comments.

Enjoy, and I hope this helps.
– Webb

