Cloud Data Processing

Cloud Data Processing Structure

To avoid having to install software on a local machine to manage transferred data, or to run a separate server, the WIBL project provides a cloud-deployed data processing chain for WIBL data (or for data from other loggers converted into WIBL format). The system is currently implemented for deployment in Amazon Web Services (AWS). It primarily uses AWS Lambda functions triggered by Amazon Simple Notification Service (SNS) events, so the configuration is loosely coupled and can be reconfigured as required for any particular implementation (e.g., if local regulations require that notification of data be sent to the Hydrographic Office). An on-demand service monitors the operation of the system and maintains metadata (e.g., which files succeeded or failed conversion, metadata validation results, and upload status). It is packaged as a Docker container and run on AWS Fargate, a serverless compute engine, to minimize "always on" costs.
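
Because each processing stage is a Lambda function triggered by SNS, every stage starts by unpacking the SNS records in its incoming event. The sketch below shows that step. The `Records[].Sns.Message` layout is the standard shape of an SNS-triggered Lambda event. The `lambda_handler` body and the `bucket`/`key` message fields are hypothetical, chosen for illustration, and are not taken from the WIBL code.

```python
import json
from typing import Any


def extract_sns_messages(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Pull the JSON payload out of each SNS record in a Lambda event."""
    messages = []
    for record in event.get("Records", []):
        # SNS delivers the published message as a string under Sns.Message
        body = record["Sns"]["Message"]
        messages.append(json.loads(body))
    return messages


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Entry point for a hypothetical processing-stage Lambda."""
    processed = []
    for message in extract_sns_messages(event):
        # 'bucket' and 'key' are assumed message fields, for illustration only
        processed.append(f"{message['bucket']}/{message['key']}")
    return {"statusCode": 200, "processed": processed}
```

Since each stage only reads messages from its own topic, a stage can be re-pointed at a different topic without changing the code. This is how extra steps, such as a notification to a national authority, could be inserted.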

By default, the cloud segment automatically converts WIBL files into the GeoJSON format specified by the IHO Crowdsourced Bathymetry Working Group and supported by the Data Center for Digital Bathymetry (DCDB). It then validates the generated metadata against the CSB schema and uploads the data directly to DCDB. Hooks are provided for notification on upload and at the other processing stages. A RESTful interface to the process monitor server is also defined so that external users can inspect operations.
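
The default convert, validate and upload sequence, with a notification hook after each stage, can be sketched in-process as follows. In the deployed system the stages are separate Lambda functions connected through SNS, so this is only a local model of the control flow. `ProcessingChain` and the `convert`/`validate`/`upload` placeholders are hypothetical names, and the placeholders do no real conversion or network I/O.

```python
from typing import Callable

Stage = Callable[[dict], dict]
Hook = Callable[[str, dict], None]


class ProcessingChain:
    """Run named stages in order, calling every registered hook after each stage."""

    def __init__(self) -> None:
        self.stages: list[tuple[str, Stage]] = []
        self.hooks: list[Hook] = []

    def add_stage(self, name: str, stage: Stage) -> None:
        self.stages.append((name, stage))

    def add_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def run(self, item: dict) -> dict:
        for name, stage in self.stages:
            item = stage(item)
            # Notify every hook that this stage has finished with the item
            for hook in self.hooks:
                hook(name, item)
        return item


# Hypothetical placeholders for the real conversion, validation and upload steps
def convert(item: dict) -> dict:
    return {**item, "format": "GeoJSON"}


def validate(item: dict) -> dict:
    return {**item, "valid": True}


def upload(item: dict) -> dict:
    if not item.get("valid"):
        raise ValueError("refusing to upload data whose metadata failed validation")
    return {**item, "uploaded": True}


# Example: wire up the default chain and log a notification after each stage
log: list[str] = []
chain = ProcessingChain()
chain.add_stage("convert", convert)
chain.add_stage("validate", validate)
chain.add_stage("upload", upload)
chain.add_hook(lambda stage, item: log.append(f"{stage}: {item['file']}"))
result = chain.run({"file": "log1.wibl"})
```

After the run, `log` holds one entry per stage (`convert: log1.wibl`, `validate: log1.wibl`, `upload: log1.wibl`). Keeping the hooks separate from the stages means notifications can be added or removed without touching the processing code.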

Although primarily designed for cloud deployment, the data processing functionality can also be run on a local desktop or server if required. The only strong use cases for this are development and debugging, or handling data from non-WIBL loggers. In the latter case, files converted from non-native formats (using software provided as part of the project) lack the metadata required for submission to DCDB. Converting them locally into WIBL format allows this metadata to be added before the files are either uploaded into the cloud for further processing or processed and submitted locally.
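
The local step of adding missing metadata to converted data can be pictured as a merge followed by a completeness check. The sketch below is illustrative only. `REQUIRED_METADATA`, `missing_metadata` and `attach_metadata` are hypothetical, and the three field names are placeholders. The real submission requirements are defined by the CSB schema and the project's own tools, not by this list.

```python
# Hypothetical metadata fields; the real requirements come from the CSB schema
REQUIRED_METADATA = ("platform_name", "platform_id", "provider_email")


def missing_metadata(metadata: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not metadata.get(key)]


def attach_metadata(converted: dict, user_metadata: dict[str, str]) -> dict:
    """Merge user-supplied metadata into a converted file's metadata block."""
    # User-supplied values override anything already present in the file
    merged = {**converted.get("metadata", {}), **user_metadata}
    missing = missing_metadata(merged)
    if missing:
        raise ValueError(f"cannot submit, missing metadata: {', '.join(missing)}")
    return {**converted, "metadata": merged}
```

Checking completeness at this point means incomplete files are rejected locally, before they reach the upload stage.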

Further details of the cloud segment can be found in the wiki associated with the project. The entire cloud segment is implemented in Python.

Deployment Methods

Deploying and managing cloud-based systems can be difficult for non-specialists. The WIBL cloud segment can be deployed by hand using the instructions provided. However, automated scripts are also provided that build all of the components and associated infrastructure from a user configuration script, and then deploy the system into the user's AWS account. Using the automated scripts records the current configuration and makes deployment significantly simpler, so their use is strongly recommended.
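
Because automated deployment is driven by a user configuration, it helps to check that configuration before anything is built. The sketch below assumes a JSON configuration with three hypothetical keys (`aws_region`, `aws_account_id`, `deployment_name`). The project's actual configuration script defines its own parameters and format. The only external fact relied on is that AWS account IDs are 12-digit numbers.

```python
import json

# Hypothetical configuration keys; the project's own configuration script
# defines the real set of parameters
REQUIRED_KEYS = {"aws_region": str, "aws_account_id": str, "deployment_name": str}


def load_config(text: str) -> dict:
    """Parse a JSON deployment configuration and check required keys and types."""
    config = json.loads(text)
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing '{key}'")
        elif not isinstance(config[key], expected):
            problems.append(f"'{key}' should be {expected.__name__}")
    # AWS account IDs are 12-digit numbers, kept as strings to preserve leading zeros
    account = config.get("aws_account_id")
    if isinstance(account, str) and not (account.isdigit() and len(account) == 12):
        problems.append("'aws_account_id' should be a 12-digit string")
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Failing fast on a malformed configuration avoids a partially built deployment. Keeping the configuration in a file also leaves a record of what was deployed.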