BON in a Box
Mark Elvers
2 min read

Categories

  • geobon

Tags

  • tunbury.org

On a suggestion from Michael, I took a quick look at BON in a Box, a web-based biodiversity analysis platform that runs Docker-containerised pipelines of R, Julia, and Python scripts.

It couldn’t be easier to get started. Install Docker and Docker Compose, and make sure you can access GitHub via SSH using a public key: run ssh-keygen -t ed25519 and then publish the resulting ~/.ssh/id_ed25519.pub to your GitHub account.
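As a sketch, the key setup looks like this (the email comment is just a placeholder; drop -N "" if you want a passphrase):

```shell
# Generate an ed25519 key pair with an empty passphrase
ssh-keygen -t ed25519 -C "you@example.com" -N "" -f ~/.ssh/id_ed25519
# Print the public key to paste into GitHub -> Settings -> SSH and GPG keys
cat ~/.ssh/id_ed25519.pub
# Confirm GitHub accepts the key (it prints a greeting and exits)
ssh -T git@github.com
```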

apt install docker.io docker-compose-v2

Clone GEO-BON’s repository and make a working copy of the runner.env file. This file can be edited to add API keys for datasets, but I don’t have any, so the default file is fine.

git clone git@github.com:GEO-BON/bon-in-a-box-pipelines.git
cd bon-in-a-box-pipelines
cp runner-sample.env runner.env

To start the server, run ./server-up.sh. There is also ./server-down.sh to stop it.

The first run downloads the required Docker containers, so it takes a few minutes. Once complete, visit http://localhost to see the web GUI.

I ran the “Get Country Polygon” script, creating a nice Colombia polygon.

There is a drag and drop pipeline editor which felt a lot like Microsoft Access.

I followed along with the tutorial and created an R script and a YAML file of the same name in the /scripts directory. These appeared in the GUI, allowing me to run them and use them in the pipeline editor. Annoyingly, the dataset used in the tutorial was not provided, so I couldn’t run the code.

TestScript.R

The biab functions are how the script interacts with the BON in a Box system.

library(rjson)
library(sf)
library(terra)
library(dplyr)
library(ggplot2)

# Read the inputs declared in the YAML file
input <- biab_inputs()

dat <- st_read(input$country_polygon)

if (nrow(dat) == 0) {
  biab_error_stop("Country polygon does not exist")
}

# Reproject the polygon to the requested CRS
dat.transformed <- st_transform(dat, crs = input$crs)

# Load the raster layers (the GeoTIFFs carry their own CRS)
rasters <- terra::rast(input$rasters)

country_vect <- vect(dat.transformed)

# Mask the rasters to the country outline
rasters.cropped <- mask(rasters, country_vect)

# Difference between the first two layers
raster_change <- rasters[[1]] - rasters[[2]]

raster_change_path <- file.path(outputFolder, "raster_change.tif")
writeRaster(raster_change, raster_change_path)

biab_output("raster_change", raster_change_path)

# Mean of each masked layer, plotted as points
layer_means <- global(rasters.cropped, fun = "mean", na.rm = TRUE)
layer_means$name <- names(rasters.cropped)

means_plot <- ggplot(layer_means, aes(x = name, y = mean)) + geom_point()

means_plot_path <- file.path(outputFolder, "means_plot.png")
ggsave(means_plot_path, means_plot)

biab_output("means_plot", means_plot_path)

TestScript.yaml

The inputs and outputs sections define the script’s inputs and outputs; the names must match those used in the script above. The environment is set up using conda, and a specific version can be pinned like this: r-terra=0.9-12

script: TestScript.R
name: Test script
description: Demo script
author:
  - name: ME
inputs:
  country_polygon:
    label: Country Polygon
    description: Polygon of the country of interest
    type: application/geo+json
    example: null
  crs:
    label: Coordinate reference system
    description: Coordinate reference system
    type: text
    example: "EPSG:3857"
  rasters:
    label: Rasters
    description: Raster layers of variable of interest
    type: image/tiff;application=geotiff[]
    example: null 
outputs:
  raster_change:
    label: Rasters
    description: Differences between raster values
    type: image/tiff;application=geotiff
  means_plot:
    label: Plot of raster means
    description: Plot of means of raster layers
    type: image/png
conda:
  channels:
    - conda-forge
    - r
  dependencies:
    - r-rjson
    - r-sf
    - r-dplyr
    - r-terra
    - r-ggplot2
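For example, pinning terra in the conda section above would look like this (the version number is illustrative):

```yaml
conda:
  channels:
    - conda-forge
    - r
  dependencies:
    - r-rjson
    - r-terra=0.9-12
```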

The architecture appears to be a single-server design, with no built-in job queuing or limits on concurrent execution.