OpenVDM: Understanding How Hooks Work

As I mentioned in my last post, the next incremental version of OpenVDM will include hooks that allow vessel operators to leverage OpenVDM to automatically execute their own custom data QA/QC and processing software exactly when those programs need to run.  In this post I’ll explain how that works.

How Hooks Work

First a quick refresher…

Before I dive straight down into the weeds,  here’s a quick refresher on OpenVDM terminology that I will be referencing throughout the article:

  • OpenVDM Data Warehouse: the server hosting OpenVDM and where OpenVDM stores cruise data.
  • Collection System Transfers: the copying of raw data from a remote data acquisition system (SCS, SeaSave, SIS, Winfrog, etc.) to the desired location on the OpenVDM Data Warehouse.  All of the configuration information needed to perform this task is stored within OpenVDM and is available via the OpenVDM API.
  • The DataDashboard: The page(s) on the OpenVDM web-interface that visualize sub-sampled representations of raw data as well as vessel operator-defined QA test results and data statistics.
  • DashboardData: The simplified, sub-sampled JSON-formatted representation of raw datasets primarily used by the DataDashboard for providing quick visualizations of data.  DashboardData files optionally include QA test results and statistics. DashboardData files are created via OpenVDM plugins.  These plugins are developed by the vessel-operator and tailored to the data formats used by that vessel.

Back to the main event…

At key points in the OpenVDM workflow there is core code that asks, “While I’m here, is there anything else you need done?”  OpenVDM gets its answer from YAML-formatted configuration files.  These configuration files contain lists of additional programs to run at that particular moment, allowing the vessel operator to “hook” their own programs and scripts on for execution at the exact moment they should be run.

For simple, low-CPU-intensity programs/scripts, the program/script connected to a hook may be run directly on the OpenVDM Data Warehouse.  If the program/script requires a different operating system or is too processor-intensive to run on the OpenVDM Data Warehouse, the hook architecture provides a mechanism for executing a program/script installed on a remote system.

Each hook has a specific payload of information that is passed to each of the programs/scripts defined in the configuration files.  This payload can be used by a script/program to further tailor its behavior.

What Hooks are Available in OpenVDMv2.1?

Currently there are two hooks available with two more in development.

The first hook is connected to the tail of collection system transfers.  After new data arrives on the OpenVDM Data Warehouse, the hook reads a YAML-formatted configuration file and executes a list of programs/scripts specific to the collection system from which files were copied.

The payload for this hook is the cruiseID, the name of the collection system (e.g. SCS), and an array of all new and updated files included in the recent transfer.
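
To make that concrete, here is a purely illustrative sketch of what such a payload could carry.  The field names and example file paths below are my own placeholders, not necessarily OpenVDM’s actual format:

{
    "cruiseID": "CS1601",
    "collectionSystemName": "SCS",
    "files": [
        "SCS/NAV/POSMV-GGA-RAW_20160101-000000.Raw",
        "SCS/NAV/CNAV-GGA-RAW_20160101-000000.Raw"
    ]
}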

The second hook is connected to the tail of the DashboardData processor.  The DashboardData processor deserves a more detailed explanation of its own, but for the purposes of this article the important thing to understand is that its job is to run the vessel operator-developed plugins and create the concise, sub-sampled, JSON-formatted DashboardData files.

Having a hook at the end of the DashboardData processor allows custom programs/scripts to be executed that depend on knowing when new DashboardData files are available.  In the next section I’ll work through an example that leverages this specific hook.

The two hooks still in development are for the start and end of a cruise.  These hooks will allow vessel operators to develop and deploy their own programs/scripts that run at the beginning or end of the cruise, opening up the possibility of creating remote directory structures and/or report templates prior to sailing, as well as running post-cruise automated reporting and/or filesystem cleanups.

A Working Example

Sometimes the best way to grasp how something works is to see a working example.  In this example I’m going to work through the development of a script that automatically builds/updates a geoJSON-formatted and KML-formatted cruise track based on DashboardData objects from each of the vessel’s GPS devices.

Describing the vessel’s layout

Here’s how this vessel is configured and what’s already occurred by the time this script is executed by the hook.

The GPS Sources:

    • POSMV
    • CNAV

Data Acquisition and File Naming Convention:

Both of the GPS sources output a GGA formatted stream on a serial connection.  That stream is recorded on the vessel by an underway data acquisition system.  In this example the vessel uses NOAA’s Scientific Computing System (SCS).  Each GGA stream is saved to its own file that is truncated daily by SCS.  Within the SCS data directory structure the files are located within the ./NAV folder.  The streams have source-specific naming conventions:

  • POSMV-GGA-RAW_<YYYYMMDD>-<hhmmss>.Raw
  • CNAV-GGA-RAW_<YYYYMMDD>-<hhmmss>.Raw

Location of Raw Data Once Copied to the OpenVDM Data Warehouse

The vessel operator has configured OpenVDM to store all cruise data, from all cruises, at ‘/mnt/vault/CruiseData’ on the OpenVDM Data Warehouse.  This is referred to as the base directory.

The vessel operator has configured the collection system transfer within OpenVDM for SCS to copy all raw data from SCS to the folder called ‘SCS’ within the cruise data directory.

The vessel operator has defined the cruiseID for the current cruise to be ‘CS1601’.  The root directory for all data related to a single cruise is a directory with the same name as the cruiseID and located within the base directory.

Therefore all the GGA raw files from SCS would be located at:

/mnt/vault/CruiseData/CS1601/SCS/NAV

I hope everything’s making sense so far.

DashboardData Files

DashboardData files are stored using the same directory structure as their raw data file counterparts, but within the Dashboard Data directory defined within OpenVDM.  For this example the Dashboard Data directory is ‘OpenVDM/DashboardData’.  Therefore the DashboardData files for the GGA raw files collected by SCS are stored within:

/mnt/vault/CruiseData/CS1601/OpenVDM/DashboardData/SCS/NAV

Still here?  Good, the worst should be over.

Extra Directories… defining where to put data products built by OpenVDM

In addition to defining destination directories for collection system transfers, OpenVDM also allows the vessel operator to define extra directories that they would like created within the cruise data directory.

The reason to configure OpenVDM to manage these directories, rather than creating them manually, is that the vessel operator can depend on the directories being named and located the same way, every time.  The other advantage of using OpenVDM to create extra directories is that OpenVDM makes information about the existence and location of these directories available through OpenVDM’s API.

For this example OpenVDM has been configured to create an extra directory within the cruise data directory named ‘Products’.

Desired Results

Now that the vessel’s layout has been fully described let’s talk about what the end results should look like.

What the vessel operator would like to see is a pair of files for each GPS data source, representing the entire cruise track for the current cruise in geoJSON and KML formats.  The files should follow this naming convention:

<cruiseID>_<device>_Trackline.<suffix>

The resulting files should be stored within the ‘Products’ extra directory in a sub-directory called ‘Tracklines’.

So in this example the resulting files for the POSMV would be:

/mnt/vault/CruiseData/CS1601/Products/Tracklines/CS1601_POSMV_Trackline.json
/mnt/vault/CruiseData/CS1601/Products/Tracklines/CS1601_POSMV_Trackline.kml

The Script

In this section I’m going to walk through the various parts of the script, leaving out much of the error-checking code to simplify the explanations.  I will include a link to the final script, which has all the error-checking, command-line argument handling, and other bells and whistles that would be required for a real deployment.

Access to the data stored in OpenVDM

Earlier in this post we discussed all of the things that were configured in OpenVDM.  To write a script that is aware of all that information we need to leverage the OpenVDM API via the OpenVDM python module.

import openvdm

# Initiate an OpenVDM object
openVDM = openvdm.OpenVDM()

Next we need to retrieve all of the relevant information stored within OpenVDM so that we can locate the DashboardData files and save the resulting files to the correct location.

# Retrieve the shipboard data warehouse configuration
shipboardDataWarehouseConfig = openVDM.getShipboardDataWarehouseConfig()

# Retrieve the current cruiseID
cruiseID = openVDM.getCruiseID()

# Retrieve the information for the SCS collection system transfer
# (the final script takes the collection system name from a command-line argument)
collectionSystem = openVDM.getCollectionSystemTransferByName('SCS')

# Retrieve the information for the Products directory
productsDirectory = openVDM.getExtraDirectoryByName('Products')

# Retrieve the information for the Dashboard Data directory
dashboardDataDirectory = openVDM.getRequiredExtraDirectoryByName('Dashboard Data')
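
The snippets that follow assume these calls return plain dictionaries.  As a rough sketch showing only the keys used later in this article (the real objects contain more information), the values for this example vessel would look roughly like:

# shipboardDataWarehouseConfig --> {'shipboardDataWarehouseBaseDir': '/mnt/vault/CruiseData', ...}
# cruiseID                     --> 'CS1601'
# collectionSystem             --> {'destDir': 'SCS', ...}
# productsDirectory            --> {'destDir': 'Products', ...}
# dashboardDataDirectory       --> {'destDir': 'OpenVDM/DashboardData', ...}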

Next let’s build the required path variables.

# Build the cruise data directory
cruiseDir = shipboardDataWarehouseConfig['shipboardDataWarehouseBaseDir'] + '/' + cruiseID

# Build the Dashboard Data Directory
dashboardDataDir = cruiseDir + '/' + dashboardDataDirectory['destDir']

# Build the Directory containing the DashboardData files from the SCS Raw files
sourceDir = dashboardDataDir + '/' + collectionSystem['destDir']

# Build the Destination Directory for the trackline files
destDir = cruiseDir + '/' + productsDirectory['destDir'] + '/' + 'Tracklines'
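
With the example configuration described earlier, those variables work out to:

# cruiseDir        --> /mnt/vault/CruiseData/CS1601
# dashboardDataDir --> /mnt/vault/CruiseData/CS1601/OpenVDM/DashboardData
# sourceDir        --> /mnt/vault/CruiseData/CS1601/OpenVDM/DashboardData/SCS
# destDir          --> /mnt/vault/CruiseData/CS1601/Products/Tracklines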

Now let’s define the locations and filenames of the geoJSON files we want to combine.  To help name the resulting files, each entry pairs the device name with a filename pattern for filtering the DashboardData files:

GPSSources = [
    {
        "device": "POSMV",
        "regex": "NAV/POSMV-GGA_*.json"
    },
    {
        "device": "CNAV",
        "regex": "NAV/CNAV-GGA_*.json"
    }
]

This code block builds the list of files related to a particular GPS source.  To keep things simple, the following code is hard-coded to only deal with the first element of the GPSSources array (see the sketch after this snippet for looping over all of them):

import glob

# Build the list of files corresponding to the current device based on the pattern provided
files = glob.glob(sourceDir + '/' + GPSSources[0]['regex'])
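
As mentioned above, a real deployment would presumably walk through every entry in GPSSources rather than just the first.  A minimal sketch of that loop, using the same steps described in the rest of this section:

for gpsSource in GPSSources:
    # Build the list of DashboardData files for this device
    files = glob.glob(sourceDir + '/' + gpsSource['regex'])

    # ...combine the files and convert them to KML using the functions below,
    # then write the results using gpsSource['device'] in the filenames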

Here’s the function that combines the individual files.  We pass it the list of files, the cruiseID, and the device name (GPSSources[0]['device']).

import json

def combineGeoJsonFiles(files, cruiseID, deviceName):
    # Blank geoJson object
    combinedGeoJsonObj = {
        "type":"FeatureCollection",
        "features":[
            {
                "type":"Feature",
                "geometry":{
                    "type":"LineString",
                    "coordinates":[]
                },
                "properties": {
                    "name": cruiseID + '_' + deviceName
                }
            }
        ]
    }
        
    if len(files) > 0:
        for file in files:

            # Open the DashboardData file and append its coordinates to the
            # combined geoJSON object
            try:
                with open(file, 'r') as geoJsonFile:
                    geoJsonObj = json.load(geoJsonFile)
                    combinedGeoJsonObj['features'][0]['geometry']['coordinates'] += geoJsonObj['visualizerData'][0]['features'][0]['geometry']['coordinates']

            # If the file cannot be processed, return False
            except:
                print("ERROR: Could not process file: " + file)
                return False
    else:
        return False

    # If processing is successful, return the (geo)json object 
    return combinedGeoJsonObj

Here’s the function to take the output from the previous function and convert it to a KML string.

from xml.etree.ElementTree import Element, SubElement, tostring

def convToKML(geoJSONObj):
    kml = Element('kml')
    kml.set('xmlns', 'http://www.opengis.net/kml/2.2')
    kml.set('xmlns:gx','http://www.google.com/kml/ext/2.2')
    kml.set('xmlns:kml','http://www.opengis.net/kml/2.2')
    kml.set('xmlns:atom','http://www.w3.org/2005/Atom')
    document = SubElement(kml, 'Document')
    name = SubElement(document, 'name')
    name.text = geoJSONObj['features'][0]['properties']['name'] + "_trackline.kml"
    placemark = SubElement(document, 'Placemark')
    name2 = SubElement(placemark, 'name')
    name2.text = "path1"
    #styleurl = SubElement(placemark, 'styleUrl')
    #styleurl.text = "http://www.schmidtocean.org/files/style.kml#SecondSargassoSea"
    linestring = SubElement(placemark, 'LineString')
    tessellate = SubElement(linestring, 'tessellate')
    tessellate.text = "1"
    coordinates = SubElement(linestring, 'coordinates')

    coordinatesText = ''
    
    #print json.dumps(geoJSONObj['features'][0]['geometry']['coordinates'])
    
    for coordinate in geoJSONObj['features'][0]['geometry']['coordinates']:
        #print coordinate
        coordinatesText += str(coordinate[0]) + ',' + str(coordinate[1]) + ',0 '

    coordinatesText = coordinatesText.rstrip(' ')
#    coordinatesText = coordinatesText.rstrip(',')
    coordinates.text = coordinatesText

    return '<?xml version=\"1.0\" encoding=\"utf-8\"?>' + tostring(kml)
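
For reference, the string returned by convToKML for the POSMV should look roughly like this (whitespace added here for readability; tostring() actually produces a single line, and the coordinates shown are placeholder values):

<?xml version="1.0" encoding="utf-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
  <Document>
    <name>CS1601_POSMV_trackline.kml</name>
    <Placemark>
      <name>path1</name>
      <LineString>
        <tessellate>1</tessellate>
        <coordinates>-70.1,41.5,0 -70.2,41.6,0 ...</coordinates>
      </LineString>
    </Placemark>
  </Document>
</kml>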

Finally, we call the two functions and write their output to files in the destination directory.

# Overly simplified function to write a string to a file.
def writeToFile(contents, filename):
    fileObj = open(filename, 'w')
    fileObj.write(contents)
    fileObj.close()

combineGeoJsonObj = combineGeoJsonFiles(files, cruiseID, GPSSources[0]['device'])

writeToFile(json.dumps(combineGeoJsonObj), destDir + '/' + cruiseID + '_' + GPSSources[0]['device'] + '_Trackline.json')

writeToFile(convToKML(combineGeoJsonObj), destDir + '/' + cruiseID + '_' + GPSSources[0]['device'] + '_Trackline.kml')
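
One detail worth noting: the ‘Tracklines’ sub-directory has to exist before writeToFile is called.  If nothing else creates it, a minimal sketch of doing so yourself:

import os

# Create the Tracklines sub-directory within Products if it doesn't already exist
if not os.path.isdir(destDir):
    os.makedirs(destDir)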

To see the final script with all the bells and whistles please look here.

Connecting to the OpenVDM Hook

So our script is written.  We’ve named it buildCruiseTracks.py, and now we just need to connect it to the OpenVDM hook so it can be run each time new DashboardData files are created from the SCS raw data files.

To do this, simply modify the postDashboardData.yaml file:

- collectionSystemTransferName: SCS
  commandList:
  - name: buildCruiseTracks
    command:
    - python
    - /usr/local/bin/buildCruiseTracks.py

With this text block in place, OpenVDM will execute the following command every time new DashboardData files are created from SCS raw files:

 python /usr/local/bin/buildCruiseTracks.py

To add additional commands to run, simply replicate the ‘name’ and ‘command’ structures, as shown below.
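
For example, adding a second, purely hypothetical command (buildCruiseReport.py is just a placeholder name) would look like this:

- collectionSystemTransferName: SCS
  commandList:
  - name: buildCruiseTracks
    command:
    - python
    - /usr/local/bin/buildCruiseTracks.py
  - name: buildCruiseReport
    command:
    - python
    - /usr/local/bin/buildCruiseReport.py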

Conclusion

I hope this was easy to follow.  If it wasn’t, please send your questions to oceandatarat (at) gmail (dot) com.  I’m always willing to help.

Cheers,
– Webb

