Mainly for my future reference, here is a walk-through of the Tessera pipeline.
Data Sources and Acronyms
The Sentinel-1 Radiometrically Terrain Corrected (RTC) collection on Microsoft Planetary Computer (MPC) provides processed C-band Synthetic Aperture Radar (SAR) data.
The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Radiometric Terrain Corrected SAR Backscatter from Sentinel-1 (RTC-S1) product has a 30m resolution.
An S1 scene typically covers about 250km x 250km, but the exact extent varies.
Sentinel-1 images the ground on two kinds of pass, which view it from different angles:
- Ascending: satellite moving south-to-north (evening pass, ~6pm local time)
- Descending: satellite moving north-to-south (morning pass, ~6am local time)
Sentinel-1 transmits a vertically polarised radar pulse and records two return signals:
- VV: vertical transmit, vertical receive — the “like-polarised” return sensitive to surface roughness and moisture (soil, water)
- VH: vertical transmit, horizontal receive — the “cross-polarised” return sensitive to volume scattering (vegetation canopy, forest structure)
Sentinel-2 Level-2A (L2A) data provides surface reflectance images in nominally 100km x 100km tiles based on the Military Grid Reference System (MGRS). Each tile is 10,980 x 10,980 pixels at 10m resolution, i.e. 109.8km on a side, so neighbouring tiles overlap slightly.
MGRS tiles are defined on Universal Transverse Mercator (UTM) projections, which are local flat approximations of the Earth’s surface.
Each “100km × 100km” tile is square in the local UTM coordinate system, which maps to a slightly trapezoidal shape when plotted in latitude/longitude. The deviation from a true square is small within a single tile (UTM scale distortion is on the order of 0.1% or less within a zone), but it means tiles at different latitudes span different extents when measured in degrees.
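To make the "measured in degrees" point concrete, here is a rough sketch of how many degrees of longitude a 100km-wide tile spans at different latitudes. It assumes a spherical Earth and about 111.32km per degree of longitude at the equator; `width_in_degrees` is a name I made up for illustration, not something from the pipeline.

```python
import math

# Approximate length of one degree of longitude at the equator (spherical Earth).
KM_PER_DEGREE = 111.32


def width_in_degrees(latitude_deg, width_km=100.0):
    """Approximate east-west extent, in degrees of longitude, of a tile width_km wide."""
    km_per_degree_here = KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return width_km / km_per_degree_here


for lat in (0, 30, 60):
    print(f"{lat:>2} deg latitude: {width_in_degrees(lat):.2f} deg of longitude")
```

The same 100km is roughly 0.9 degrees of longitude at the equator and roughly 1.8 degrees at 60 degrees latitude, because meridians converge towards the poles.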
Sentinel-2 is an optical sensor which looks straight down.
COG = Cloud-Optimised GeoTIFF.
STAC = SpatioTemporal Asset Catalog.
ROI = Region of Interest.
SCL = Scene Classification Layer.
The Pipeline
The pipeline uses 0.1-degree blocks.
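As a sketch of what block tiling could look like, the snippet below lists the 0.1-degree blocks that cover a latitude/longitude bounding box. It assumes the blocks are aligned to a global grid starting at 0 degrees, which is my assumption and not something stated above; `block_range` and `blocks_for_bounds` are hypothetical helper names.

```python
import math


def block_range(low, high, size=0.1):
    """Integer block indices covering [low, high); rounding guards against float drift."""
    first = math.floor(round(low / size, 9))
    last = math.ceil(round(high / size, 9))
    return range(first, last)


def blocks_for_bounds(west, south, east, north, size=0.1):
    """Return the (lon, lat) lower-left corners of every block touching the bounds."""
    blocks = []
    for j in block_range(south, north, size):
        for i in block_range(west, east, size):
            blocks.append((round(i * size, 6), round(j * size, 6)))
    return blocks


for corner in blocks_for_bounds(0.3, 51.95, 0.5, 52.1):
    print(corner)
```

Working with integer indices and only converting back to degrees at the end avoids accumulating errors such as 0.1 + 0.2 != 0.3.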
Load a GeoTIFF that defines the ROI’s spatial extent (CRS, bounds, resolution, dimensions) and a binary mask (1 = land, 0 = sea/skip). The bounds are reprojected to latitude/longitude for satellite data queries.
Query MPC or AWS for Sentinel-2 and Sentinel-1 data covering the ROI for the entire year, filtered by cloud cover. S2 uses STAC on both sources; S1 uses STAC on MPC and NASA’s Common Metadata Repository (CMR) on AWS.
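For reference, a STAC item search boils down to a small JSON body. The sketch below only builds that body and does not send it anywhere. The collection ids `sentinel-2-l2a` and `sentinel-1-rtc` are the MPC names as far as I know, the cloud threshold of 80 is a made-up value, and `stac_search_body` is a hypothetical helper. The cloud filter uses the STAC query extension on `eo:cloud_cover`, which only makes sense for S2.

```python
def stac_search_body(collection, bbox, year, max_cloud=None):
    """Build the JSON body for a STAC API item search over one calendar year.

    bbox is (west, south, east, north) in latitude/longitude degrees.
    """
    body = {
        "collections": [collection],
        "bbox": list(bbox),
        "datetime": f"{year}-01-01T00:00:00Z/{year}-12-31T23:59:59Z",
    }
    if max_cloud is not None:
        # STAC query extension: keep scenes below the cloud-cover percentage.
        body["query"] = {"eo:cloud_cover": {"lt": max_cloud}}
    return body


roi = (0.3, 51.9, 0.4, 52.0)  # illustrative ROI bounds
s2_body = stac_search_body("sentinel-2-l2a", roi, 2024, max_cloud=80)
s1_body = stac_search_body("sentinel-1-rtc", roi, 2024)
```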
For Sentinel-2 data, there will be multiple passes, perhaps even on the same day. The cloud mask, SCL, is downloaded for all passes and used to identify valid (non-cloudy) dates. A second pass downloads the additional bands for the valid dates. Validity is judged per pixel rather than per tile: a given day can be assembled from a mosaic of valid pixels rather than requiring an entirely cloud-free tile.
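The per-pixel mosaic idea can be sketched as follows, using plain nested lists for one band. Which SCL classes count as valid is my assumption (vegetation, bare soil, water, unclassified, snow), and taking the first valid pass per pixel is just one plausible rule; `VALID_SCL` and `mosaic_day` are illustrative names, not the pipeline's.

```python
# Assumed "valid" SCL classes: 4 vegetation, 5 not vegetated, 6 water,
# 7 unclassified, 11 snow/ice. The pipeline may use a different set.
VALID_SCL = {4, 5, 6, 7, 11}


def mosaic_day(passes):
    """Combine same-day passes into one image, pixel by pixel.

    passes is a list of (scl, band) pairs, each a 2D list of equal shape.
    Returns (values, mask) where mask is 1 wherever some pass was valid.
    """
    height, width = len(passes[0][0]), len(passes[0][0][0])
    values = [[None] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for scl, band in passes:
        for r in range(height):
            for c in range(width):
                # Keep the first valid observation seen for each pixel.
                if mask[r][c] == 0 and scl[r][c] in VALID_SCL:
                    values[r][c] = band[r][c]
                    mask[r][c] = 1
    return values, mask


pass_a = ([[4, 9], [8, 5]], [[10, 11], [12, 13]])
pass_b = ([[9, 4], [9, 6]], [[20, 21], [22, 23]])
values, mask = mosaic_day([pass_a, pass_b])
```

In this toy example three of the four pixels end up valid even though neither pass is cloud-free on its own.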
For Sentinel-1 data, both ascending and descending acquisitions are collected for all available dates.
This results in three 4D arrays, one 3D mask, and three arrays of dates:
- S2: [n_dates, H, W, 10] bands + [n_dates, H, W] masks + [n_dates] day-of-year
- S1: separate ascending and descending arrays [n_dates, H, W, 2] + [n_dates] DOYs each
For each pixel, the model needs exactly 40 S2 timesteps and 40 S1 timesteps as input. Since there are typically more valid timesteps than that, a sampling step picks the dates at random. The pipeline supports multiple sampling passes with averaging, though it defaults to a single pass.
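A minimal sketch of the sampling step for one pixel is below. Sampling without replacement and returning the picks in chronological order are my assumptions, as is falling back to sampling with replacement when fewer than 40 valid timesteps exist (the text above does not say what happens in that case). `sample_timesteps` is a hypothetical name.

```python
import random


def sample_timesteps(valid_indices, n=40, rng=None):
    """Pick n timestep indices at random from one pixel's valid observations."""
    if rng is None:
        rng = random.Random()
    valid_indices = list(valid_indices)
    if len(valid_indices) >= n:
        chosen = rng.sample(valid_indices, n)     # without replacement
    else:
        chosen = rng.choices(valid_indices, k=n)  # assumed fallback: with replacement
    return sorted(chosen)


# Example: a pixel with 60 valid dates, seeded for repeatability.
picked = sample_timesteps(range(60), rng=random.Random(0))
```

Passing a seeded `random.Random` keeps a run reproducible, which helps when comparing a single pass against several averaged passes.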
The S2 input is shaped [40, 11]: 10 normalised spectral bands plus the day-of-year. The S1 input is shaped [40, 3]: normalised VV and VH plus the day-of-year. Ascending and descending S1 passes are merged into a single pool before sampling.
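Here is a rough sketch of how the S1 rows for one pixel might be put together: merge the two orbit directions into one pool, then turn each observation into a [VV, VH, day-of-year] row. The mean/std normalisation and the example statistics are placeholders of mine, since the actual normalisation is not described here; `merge_s1` and `to_rows` are hypothetical names.

```python
def merge_s1(ascending, descending):
    """Merge two lists of (doy, vv, vh) observations into one pool sorted by day."""
    return sorted(ascending + descending)


def to_rows(pool, mean, std):
    """Turn (doy, vv, vh) observations into [vv_norm, vh_norm, doy] rows.

    mean and std are (vv, vh) pairs; the values used below are placeholders.
    """
    return [
        [(vv - mean[0]) / std[0], (vh - mean[1]) / std[1], float(doy)]
        for doy, vv, vh in pool
    ]


ascending = [(10, -8.0, -15.0), (22, -7.5, -14.0)]
descending = [(4, -10.0, -17.0), (16, -9.0, -16.5)]
pool = merge_s1(ascending, descending)
rows = to_rows(pool, mean=(-9.0, -16.0), std=(2.0, 2.0))
```

After sampling 40 entries from the pool, the same row layout gives the [40, 3] input.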
Thus, for each 10m x 10m pixel, there are 40 S2 dates, each with 10 spectral bands, and 40 (potentially different) S1 dates, each with VV and VH values. These are passed to the model, which produces a 128-dimensional float32 embedding per pixel.
In the final step, the 128-dimensional embeddings are quantised to int8 with a per-pixel float32 scale factor, reducing storage to 132 bytes per pixel, compared to 512 bytes for full float32.
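The storage arithmetic (128 int8 values plus one 4-byte float32 scale = 132 bytes) can be checked with a small sketch. It assumes symmetric quantisation where the scale is the largest absolute value divided by 127; the actual scheme may differ, and `quantise` and `dequantise` are illustrative names.

```python
import struct
from array import array


def quantise(embedding):
    """Pack a float embedding as int8 codes followed by one float32 scale."""
    peak = max(abs(v) for v in embedding)
    # Round-trip the scale through float32 so encoder and decoder agree exactly.
    scale = struct.unpack("<f", struct.pack("<f", peak / 127.0))[0]
    if scale == 0.0:
        scale = 1.0  # all-zero (or vanishingly small) embedding
    codes = [max(-127, min(127, round(v / scale))) for v in embedding]
    return array("b", codes).tobytes() + struct.pack("<f", scale)


def dequantise(blob):
    """Recover approximate float values from the packed bytes."""
    n = len(blob) - 4
    scale = struct.unpack("<f", blob[n:])[0]
    codes = array("b")
    codes.frombytes(blob[:n])
    return [c * scale for c in codes]


# Synthetic 128-dimensional embedding with values in [-1, 1].
embedding = [((i * 37) % 101 - 50) / 50.0 for i in range(128)]
blob = quantise(embedding)
restored = dequantise(blob)
```

Here `len(blob)` is 132, and each restored value differs from the original by at most about half a quantisation step.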