We use OCluster to manage the build cluster for the CI services backing OCaml-CI and opam-repo-ci. However, it is a general-purpose tool rather than a build system: it can distribute any kind of job across multiple worker machines.
In my case, I need to generate training data for Tessera by downloading Sentinel-1 and Sentinel-2 satellite data and applying a tiny AI model for cloud masking. I need to generate about 5,000 tiles. Including the HTTP round-trips, querying the STAC catalogue and downloading a single tile takes about 10 minutes. Experiments with GNU Parallel showed that running more than four downloads concurrently on my machine gave no further speed-up. Enter OCluster.
The README.md on the project homepage explains how to set up a cluster. In brief, you need one machine with a fixed, reachable IP address to act as the scheduler, plus as many worker machines as you like. The workers make an outgoing connection to the scheduler, so they can sit behind NAT. Workers are grouped into pools along administrative boundaries, such as machine architecture. Start the scheduler with ocluster-scheduler ... --capnp-listen-address=tcp:0.0.0.0:9000 --capnp-public-address=tcp:w.x.y.z:9000 --pools=foo,bar, which generates pool-foo.cap and pool-bar.cap, capability files that embed the public address. Start a worker with ocluster-worker --connect=pool-foo.cap --name worker-1 ...
For ease of local testing, I created a Dockerfile which builds my (OCaml) project from source and installs third-party libraries such as the ONNX runtime. I could submit the Dockerfile directly to the cluster, but the cache pruning is more sophisticated when using an OBuilder spec. OCluster runs docker system prune when the Docker partition is low on space, whereas OBuilder prunes individual layers on a least-recently-used basis.
An OBuilder spec is really just an s-expression version of a Dockerfile. For example, this trivial Dockerfile converts into hello.spec as follows:
Dockerfile.hello
FROM debian:13
USER 1000:1000
RUN echo Hello World
hello.spec
((from debian:13)
(user (uid 1000) (gid 1000))
(run (shell "echo Hello World"))
)
This can be submitted to OCluster using your capability.
$ ocluster-client submit-obuilder --connect ~/mtelvers.cap --pool test --local-file ./hello.spec
Tailing log:
Building on worker-1.ci.dev
(from debian:13)
Unable to find image 'debian:13' locally
13: Pulling from library/debian
ac9148dc57ca: Already exists
Digest: sha256:3615a749858a1cba49b408fb49c37093db813321355a9ab7c1f9f4836341e9db
Status: Downloaded newer image for debian:13
2026-03-09 11:28.03 ---> saved as "4ea035d1f0cfdda7660f299954022c3a974ec9e1ba5d06b3a9aa2bca24fdcfb7"
/: (user (uid 1000) (gid 1000))
/: (run (shell "echo Hello World"))
Hello World
2026-03-09 11:28.05 ---> saved as "e3859ae9dcce742a0d612e55f69b5ed1614551ca2b49109e43d08f2f2595fd57"
Job succeeded
Result: "e3859ae9dcce742a0d612e55f69b5ed1614551ca2b49109e43d08f2f2595fd57"
OCluster doesn’t provide any native mechanism to copy artefacts back from the worker machine. The CI pipelines have used some creative workarounds for this. For example:
- print markers to the log, followed by JSON structured data, which can be extracted and parsed
- base64-encode some binary objects and print that to the log
- set up a remote SSH server and add steps to the build to rsync the data
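The first of these can be sketched in a few lines of shell. The marker strings and the JSON payload below are invented for illustration: the build step prints the markers around its structured output, and the submitting side filters the captured log with sed.

```shell
# Simulated build log, as captured from ocluster-client's stdout
# (marker names and JSON fields are made up for this sketch)
cat > job.log <<'EOF'
Building on worker-1
(run (shell "..."))
@@@RESULT@@@
{"tile": "T30UXC", "clouds": 0.12}
@@@END@@@
Job succeeded
EOF

# Print only the lines between the markers, dropping the markers themselves
sed -n '/@@@RESULT@@@/,/@@@END@@@/{/@@@/d;p}' job.log > result.json
cat result.json
```

The same pattern extends to the base64 approach: encode the file on the worker, print it between markers, and pipe the extracted lines through base64 -d on the client.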
In a private environment, it may not be necessary to secure the upload, and a curl -X POST -F file=@somefile.bin http://w.x.y.z:8080 to a one-liner Python HTTP server may be sufficient.
python3 -c "
from http.server import HTTPServer, BaseHTTPRequestHandler
from email.parser import BytesParser
from email.policy import default
class H(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers['Content-Length']))
        # Rebuild a MIME document so the multipart/form-data body can be
        # parsed (the cgi module once used for this was removed in Python 3.13)
        head = ('Content-Type: ' + self.headers['Content-Type']).encode()
        msg = BytesParser(policy=default).parsebytes(head + b'\r\n\r\n' + body)
        for part in msg.iter_parts():
            if part.get_filename():
                open(part.get_filename(), 'wb').write(part.get_payload(decode=True))
                print('Saved', part.get_filename())
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'OK\n')
HTTPServer(('0.0.0.0', 8080), H).serve_forever()
"
However, if you do need an authentication mechanism, ocluster-client supports secrets via the command-line option --secret foo:/path/to/local/file. On the worker, the secret can be read into an environment variable or written to ~/.ssh/id_ed25519 or similar. For example:
(run
 (secrets (foo (target /path/on/remote/filesystem)))
 (shell "TOKEN=$(cat /path/on/remote/filesystem); curl -H \"X-Token: $TOKEN\" ... "))
I used m4 to process my spec, replacing the placeholders LAT and LON with actual latitude and longitude values, and submitted the jobs with a simple bash loop that invoked ocluster-client for each spec, redirecting stdout to a log file.