<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.tunbury.org/atom.xml" rel="self" type="application/atom+xml" /><link href="https://www.tunbury.org/" rel="alternate" type="text/html" /><updated>2026-03-03T22:22:32+00:00</updated><id>https://www.tunbury.org/atom.xml</id><title type="html">Tunbury.ORG</title><subtitle>It&apos;s a website!</subtitle><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><entry><title type="html">OCaml-CI and native Windows builds</title><link href="https://www.tunbury.org/2026/03/03/obuilder-hcs-2/" rel="alternate" type="text/html" title="OCaml-CI and native Windows builds" /><published>2026-03-03T22:20:00+00:00</published><updated>2026-03-03T22:20:00+00:00</updated><id>https://www.tunbury.org/2026/03/03/obuilder-hcs-2</id><content type="html" xml:base="https://www.tunbury.org/2026/03/03/obuilder-hcs-2/"><![CDATA[<p>Following on from <a href="https://www.tunbury.org/2026/02/19/obuilder-hcs/">last week's post about obuilder and Windows Host Compute Services</a>, I am pleased to report that this is now running on OCaml-CI. In this early phase, I have enabled testing only on Windows Server 2025 with OCaml 5.4 and opam 2.5 using the MinGW toolchain.</p>

<p>Since my earlier post, I have achieved reliable operation and pushed the workarounds I had in obuilder upstream into Lwt. Furthermore, I have switched from a JSON configuration file per layer to an S-expression format, as this better matches the existing style, and the PPX deriver was already installed. There have also been numerous other small clean-ups.</p>

<p>Containerd uses the Windows Host Network Service (HNS), as does Docker. Docker creates a new network at boot with a random subnet. In the extract below, the network is 172.17.32.0/20.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PS C:\Users\Administrator&gt; Get-HnsNetwork


ActivityId             : 0F32EF26-00D8-4B04-BB0A-57F18075F9EA
AdditionalParams       : 
CurrentEndpointCount   : 1
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False; 
                         Name=Microsoft Windows Filtering Platform}, 
                         @{Id=F74F241B-440F-4433-BB28-00F89EAD20D8; IsEnabled=False; 
                         Name=Microsoft Azure VFP Switch Extension}, 
                         @{Id=430BDADD-BAB0-41AB-A369-94B67FA5BE0A; IsEnabled=True; Name=Microsoft 
                         NDIS Capture}}
Flags                  : 8
Health                 : @{LastErrorCode=0; LastUpdateTime=134170237512475197}
ID                     : 4EE1C263-FD69-45F9-8F4D-1D7137222B79
IPv6                   : False
LayeredOn              : FBA38879-AA6A-48AF-AD6D-35127F74313A
MacPools               : {@{EndMacAddress=00-15-5D-D2-1F-FF; StartMacAddress=00-15-5D-D2-10-00}}
MaxConcurrentEndpoints : 3
Name                   : nat
NatName                : NAT9A2D26A3-7226-46EE-9D96-5CDA0BF27595
Policies               : {@{Type=VLAN; VLAN=1}}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=172.17.32.0/20; Flags=0; 
                         GatewayAddress=172.17.32.1; Health=; 
                         ID=FD5E1DC1-71A1-4669-94D1-AD980E405535; IpSubnets=System.Object[]; 
                         ObjectType=5; Policies=System.Object[]; State=0}}
SwitchGuid             : 4EE1C263-FD69-45F9-8F4D-1D7137222B79
TotalEndpoints         : 13
Type                   : nat
Version                : 68719476736
Resources              : @{AdditionalParams=; AllocationOrder=2; Allocators=System.Object[];        
                         CompartmentOperationTime=0; Flags=0; Health=; 
                         ID=0F32EF26-00D8-4B04-BB0A-57F18075F9EA; PortOperationTime=0; State=1;     
                         SwitchOperationTime=0; VfpOperationTime=0;
                         parentId=95C9A579-958E-4991-A38A-A15BA23F39D9}
</code></pre></div></div>
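<p>For reference, the range covered by this randomly assigned /20 can be checked with Python's <code class="language-plaintext highlighter-rouge">ipaddress</code> module (my addition, just to make the numbers in the extract concrete):</p>

```python
import ipaddress

# Docker's randomly chosen NAT subnet from the Get-HnsNetwork output above.
net = ipaddress.ip_network("172.17.32.0/20")

print(net.num_addresses)                            # 4096 addresses in a /20
print(net.broadcast_address)                        # 172.17.47.255
# The gateway reported by HNS is the first usable host address.
print(ipaddress.ip_address("172.17.32.1") in net)   # True
```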

<p>I had been running these commands on startup:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Get-HnsNetwork | Where-Object { $_.Name -eq 'nat' } | Remove-HnsNetwork
New-HnsNetwork -Type nat -Name nat -AddressPrefix '172.20.0.0/16' -Gateway '172.20.0.1'
</code></pre></div></div>

<p>And setting the network configuration to <code class="language-plaintext highlighter-rouge">172.20.0.0/16</code> in <code class="language-plaintext highlighter-rouge">C:\Program Files\containerd\cni\conf\0-containerd-nat.conf</code>. However, this broke <code class="language-plaintext highlighter-rouge">docker build</code> as it could not find the network it was expecting:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>failed to create endpoint vibrant_tu on network nat: failed during hnsCallRawResponse: hnsCall failed in Win32: Element not found. (0x490)
</code></pre></div></div>

<p>Changing direction, I have instead set up <code class="language-plaintext highlighter-rouge">fix-nat.ps1</code> as a scheduled task run at reboot, aligning containerd’s configuration with Docker’s.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Read Docker's existing NAT network configuration and write it into the
# containerd CNI config so both runtimes share the same subnet.
Import-Module c:\windows\hns.psm1
$net = Get-HnsNetwork | Where-Object { $_.Name -eq 'nat' }
if (-not $net) {
    Write-Error "No NAT network found"
    exit 1
}
$subnet = $net.Subnets[0].AddressPrefix
$gateway = $net.Subnets[0].GatewayAddress
$json = @{
    cniVersion = "0.3.0"
    name = "nat"
    type = "nat"
    master = "Ethernet"
    ipam = @{
        subnet = $subnet
        routes = @(@{ gateway = $gateway })
    }
    capabilities = @{
        portMappings = $true
        dns = $true
    }
} | ConvertTo-Json -Depth 3
$json | Set-Content 'C:\Program Files\containerd\cni\conf\0-containerd-nat.conf' -Encoding ASCII
Write-Host "CNI config updated: subnet=$subnet gateway=$gateway"
</code></pre></div></div>

<p>Here is the log of a successful run from OCaml-CI: <a href="https://ocaml.ci.dev/github/mtelvers/mandelbrot/commit/14e08f30f087994a19822546a55405d078acd0d3/variant/windows-server-mingw-ltsc2025-5.4_opam-2.5">mtelvers/mandelbrot/commit/14e08f30f087994a19822546a55405d078acd0d3/variant/windows-server-mingw-ltsc2025-5.4_opam-2.5</a></p>

<p>PRs</p>

<ul>
  <li><a href="https://github.com/ocurrent/obuilder/pull/204">ocurrent/obuilder/pull/204</a></li>
  <li><a href="https://github.com/ocurrent/ocluster/pull/258">ocurrent/ocluster/pull/258</a></li>
  <li><a href="https://github.com/ocurrent/ocaml-ci/pull/1041">ocurrent/ocaml-ci/pull/1041</a></li>
  <li><a href="https://github.com/ocsigen/lwt/pull/1103">ocsigen/lwt/pull/1103</a></li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml-ci,obuilder" /><category term="tunbury.org" /><summary type="html"><![CDATA[Following from post last week about obuilder and Windows Host Compute Services, I am pleased to report that this is now running on OCaml-CI. In this early phase, I have enabled testing only on Windows 2025 with OCaml 5.4 and opam 2.5 using the MinGW toolchain.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OCaml 5 native 32-bit backends: i386 and PPC32</title><link href="https://www.tunbury.org/2026/03/03/32bit-backends/" rel="alternate" type="text/html" title="OCaml 5 native 32-bit backends: i386 and PPC32" /><published>2026-03-03T14:30:00+00:00</published><updated>2026-03-03T14:30:00+00:00</updated><id>https://www.tunbury.org/2026/03/03/32bit-backends</id><content type="html" xml:base="https://www.tunbury.org/2026/03/03/32bit-backends/"><![CDATA[<p>Following on from the <a href="/2025/11/27/ocaml-54-native/">Arm32 multicore backend</a>, I have now ported the remaining two 32-bit architectures to OCaml 5 with multicore support: i386 and PowerPC 32-bit (PPC32).</p>

<p>OCaml 5’s multicore runtime needs per-domain state: the allocation pointer, exception handler, GC data and so on. On 64-bit platforms, there are registers to spare, but on 32-bit architectures, particularly i386, there are far fewer. I also wanted to retain the shared source files of the ppc64/ppc32 backend, which caused further problems.</p>

<h1 id="design-choices">Design choices</h1>

<h2 id="i386-thread-local-storage-via-gs">i386: Thread-local storage via %gs</h2>

<p>The i386 architecture has only 7 general-purpose registers. I initially tried dedicating one to the domain state pointer, but with only 6 remaining registers, the graph colouring register allocator could not find a valid allocation for many programs. Instead, the i386 backend uses the <code class="language-plaintext highlighter-rouge">%gs</code> segment register to access thread-local storage (TLS). Every time a domain state is needed, the compiler emits:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>movl %gs:caml_state@ntpoff, %ebx
</code></pre></div></div>

<p>This loads the domain state pointer on demand from the thread-local <code class="language-plaintext highlighter-rouge">caml_state</code> variable. It costs an extra instruction per access but keeps all general-purpose registers available for allocation. The <code class="language-plaintext highlighter-rouge">@ntpoff</code> relocation uses the local-exec TLS model, which is the fastest TLS access pattern on Linux. This mechanism is Linux/ELF-specific; on Windows, <code class="language-plaintext highlighter-rouge">%fs</code> is reserved for the Thread Information Block and <code class="language-plaintext highlighter-rouge">%gs</code> is not available for TLS in the same way, so a Windows port would need a different approach.</p>

<h2 id="ppc32-dedicated-register-r30">PPC32: Dedicated register r30</h2>

<p>PPC32 has 32 general-purpose registers, so dedicating one is affordable. Register r30 permanently holds the domain state pointer (<code class="language-plaintext highlighter-rouge">DOMAIN_STATE_PTR</code>), matching the approach used by Arm32 and the existing PPC64 backend. The allocation pointer lives in r31, and the exception handler pointer in r29. The PPC32 and PPC64 backends share the same source files (<code class="language-plaintext highlighter-rouge">emit.mlp</code>, <code class="language-plaintext highlighter-rouge">proc.ml</code>, <code class="language-plaintext highlighter-rouge">power.S</code>) with conditionals for the two modes, so keeping the same register assignments avoids divergence in shared code.</p>

<p>However, there were some challenges with position-independent code (PIC). On PPC32, calls to shared library functions go through the PLT (Procedure Linkage Table), and the PLT stubs use the GOT (Global Offset Table) to find the actual function addresses at runtime. The standard PPC32 secure-PLT convention uses r30 as the GOT base pointer, which conflicts directly with its use as <code class="language-plaintext highlighter-rouge">DOMAIN_STATE_PTR</code>. The solution was to bypass PLT stubs entirely, using a per-compilation-unit <code class="language-plaintext highlighter-rouge">.got2</code> section with PC-relative addressing for all external symbol references. This avoids the system GOT (which can overflow its 16-bit offset limit in large programs) and keeps r30 free for OCaml’s use.</p>

<p>Another interesting thing to note is that the PPC <code class="language-plaintext highlighter-rouge">bltl-</code> instruction used for allocation checks unconditionally clobbers the link register (LR) regardless of whether the branch is taken. This is per the PPC ISA specification (LR is set when LK=1), which means LR must be saved and restored in every function that has a stack frame, not just those that make explicit calls.</p>

<h1 id="benchmarks">Benchmarks</h1>

<p>Both backends were tested under QEMU using the same <a href="https://gist.github.com/mtelvers/def18d646a217c3219ba3e54c6d53bec">trivial prime counter</a> benchmark that I used for Arm32.</p>

<h2 id="i386-qemu-4-vcpus">i386 (QEMU, 4 vCPUs)</h2>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>Domains</th>
      <th>Time</th>
      <th>Speedup vs bytecode (1 domain)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Native</td>
      <td>4</td>
      <td>0.17s</td>
      <td>6.9x</td>
    </tr>
    <tr>
      <td>Native</td>
      <td>2</td>
      <td>0.35s</td>
      <td>3.4x</td>
    </tr>
    <tr>
      <td>Native</td>
      <td>1</td>
      <td>0.46s</td>
      <td>2.6x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>4</td>
      <td>0.50s</td>
      <td>2.4x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>2</td>
      <td>0.69s</td>
      <td>1.7x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>1</td>
      <td>1.18s</td>
      <td>1.0x</td>
    </tr>
  </tbody>
</table>

<h2 id="ppc32-qemu-1-vcpu">PPC32 (QEMU, 1 vCPU)</h2>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>Domains</th>
      <th>Time</th>
      <th>Speedup vs bytecode (1 domain)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Native</td>
      <td>2</td>
      <td>2.02s</td>
      <td>9.5x</td>
    </tr>
    <tr>
      <td>Native</td>
      <td>4</td>
      <td>2.09s</td>
      <td>9.2x</td>
    </tr>
    <tr>
      <td>Native</td>
      <td>1</td>
      <td>2.28s</td>
      <td>8.5x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>1</td>
      <td>19.28s</td>
      <td>1.0x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>4</td>
      <td>19.96s</td>
      <td>1.0x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>2</td>
      <td>20.53s</td>
      <td>0.9x</td>
    </tr>
  </tbody>
</table>

<p>The i386 results show real multicore scaling: native code with 4 domains is 2.7x faster than single-domain, and nearly 7x faster than single-domain bytecode. The PPC32 machine only has a single emulated CPU, so there is no multicore scaling, but the native backend is consistently 8-10x faster than bytecode. QEMU’s <code class="language-plaintext highlighter-rouge">mac99</code> machine does not support SMP, so testing true PPC32 parallelism will need either real hardware or a different emulation platform.</p>
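<p>The speedup column can be reproduced directly from the timings. A small script of my own, using the i386 figures with single-domain bytecode (1.18s) as the baseline:</p>

```python
# Times from the i386 table; baseline is single-domain bytecode.
baseline = 1.18
times = {"native-4": 0.17, "native-2": 0.35, "native-1": 0.46,
         "bytecode-4": 0.50, "bytecode-2": 0.69, "bytecode-1": 1.18}

for name, t in times.items():
    # e.g. native-4: 1.18 / 0.17 = 6.9x
    print(f"{name}: {baseline / t:.1f}x")
```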

<h1 id="test-suites">Test suites</h1>

<p>Both backends pass the OCaml test suite with only bytecode-related exceptions. On PPC32, the two failing tests (<code class="language-plaintext highlighter-rouge">lazy7</code> and <code class="language-plaintext highlighter-rouge">test_compact_manydomains</code>) both fail only in bytecode mode; the native backend passes everything.</p>

<h1 id="try-it">Try it</h1>

<p>Both backends are available on my fork:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/mtelvers/ocaml -b arm32-multicore
cd ocaml
./configure &amp;&amp; make world.opt &amp;&amp; make tests
</code></pre></div></div>

<p>The branch now supports Arm32, i386, and PPC32 architectures.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Following on from the Arm32 multicore backend, I have now ported the remaining two 32-bit architectures to OCaml 5 with multicore support: i386 and PowerPC 32-bit (PPC32).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Tessera Pipeline</title><link href="https://www.tunbury.org/2026/02/25/teserra-pipeline/" rel="alternate" type="text/html" title="Tessera Pipeline" /><published>2026-02-25T20:45:00+00:00</published><updated>2026-02-25T20:45:00+00:00</updated><id>https://www.tunbury.org/2026/02/25/teserra-pipeline</id><content type="html" xml:base="https://www.tunbury.org/2026/02/25/teserra-pipeline/"><![CDATA[<p>Mainly for my future reference here is a walk-through of the Tessera pipeline.</p>

<h1 id="data-sources-and-acronyms">Data Sources and Acronyms</h1>

<p>The Sentinel-1 Radiometrically Terrain Corrected (RTC) collection on Microsoft Planetary Computer (MPC) provides processed C-band Synthetic Aperture Radar (SAR) data.</p>

<p>Observational Products for End-Users from Remote Sensing Analysis (OPERA) RTC SAR Backscatter from Sentinel-1 (RTC-S1) has a 30m resolution.</p>

<p>An S1 scene is typically 250km x 250km, but the exact dimensions vary.</p>

<p>Sentinel-1 makes two passes which view the ground from different angles.</p>
<ul>
  <li>Ascending: satellite moving south-to-north (evening pass, ~6pm local time)</li>
  <li>Descending: satellite moving north-to-south (morning pass, ~6am local time)</li>
</ul>

<p>Sentinel-1 transmits a vertically polarised radar pulse and records two return signals:</p>
<ul>
  <li>VV: vertical transmit, vertical receive — the “like-polarised” return sensitive to surface roughness and moisture (soil, water)</li>
  <li>VH: vertical transmit, horizontal receive — the “cross-polarised” return sensitive to volume scattering (vegetation canopy, forest structure)</li>
</ul>

<p>Sentinel-2 Level-2A (L2A) data provides surface reflectance images, formatted in 100km x 100km tiles based on the Military Grid Reference System (MGRS). These are 10,980 x 10,980 pixels at 10m resolution.</p>

<p>MGRS tiles are defined on Universal Transverse Mercator (UTM) projections, which are local flat approximations of the Earth’s surface.</p>

<p>Each “100km × 100km” tile is a 100km square in the local UTM coordinate system, which maps to a slightly trapezoidal shape on the actual Earth surface. The deviation from true square is small within a single tile (UTM distortion is &lt;0.04% within a zone), but it means tiles at different latitudes cover different amounts of actual ground area when measured in degrees.</p>
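<p>To put the 0.04% figure in context, a quick calculation of my own shows the worst-case distance error it implies:</p>

```python
# Worst-case UTM scale distortion within a zone, as quoted above: < 0.04%.
tile_edge_m = 100_000          # nominal MGRS tile edge in the UTM grid
max_distortion = 0.0004        # 0.04% expressed as a fraction

error_m = tile_edge_m * max_distortion
print(error_m)                 # about 40 m over a full tile edge

# Per 10 m Sentinel-2 pixel, the effect is a few millimetres - negligible.
print(10 * max_distortion * 1000)
```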

<p>Sentinel-2 is an optical sensor which looks straight down.</p>

<p>COG = Cloud-Optimised GeoTIFF.</p>

<p>STAC = SpatioTemporal Asset Catalog.</p>

<p>ROI = Region of Interest.</p>

<p>SCL = Scene Classification Layer.</p>

<h1 id="the-pipeline">The Pipeline</h1>

<p>The pipeline uses 0.1-degree blocks.</p>

<p>Load a GeoTIFF that defines the ROI’s spatial extent (CRS, bounds, resolution, dimensions) and a binary mask (1 = land, 0 = sea/skip). The bounds are reprojected to latitude/longitude for satellite data queries.</p>

<p>Query MPC or AWS for Sentinel-2 and Sentinel-1 data covering the ROI, for the entire year, filtered by cloud cover. S2 uses STAC on both sources; S1 uses STAC on MPC and NASA’s Common Metadata Repository (CMR) on AWS.</p>

<p>For Sentinel-2 data, there will be multiple passes, perhaps even on the same day. The cloud mask, SCL, is downloaded for all passes and used to identify valid (non-cloudy) dates. A second pass downloads the additional bands for the valid dates. This is nuanced, as a given day can be assembled from a mosaic of valid pixels rather than requiring an entirely cloud-free tile.</p>

<p>For Sentinel-1 data, both ascending and descending passes are collected for all available dates.</p>

<p>This results in three 4D arrays, one 3D mask, and three arrays of dates:</p>
<ul>
  <li>S2: [n_dates, H, W, 10] bands + [n_dates, H, W] masks + [n_dates] day-of-year</li>
  <li>S1: separate ascending and descending arrays [n_dates, H, W, 2] + [n_dates] DOYs each</li>
</ul>

<p>For each pixel, the model needs exactly 40 S2 timesteps and 40 S1 timesteps as input. Since there are typically more valid timesteps available, a sampling step selects which ones to use. The pipeline uses random selection to pick the dates to use. It supports multiple passes with averaging, though it defaults to a single pass.</p>
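<p>A minimal sketch of that sampling step (my own illustration, not the pipeline's actual code): given the valid dates for a pixel, pick 40 indices at random and keep them in date order.</p>

```python
import random

def sample_timesteps(n_valid, n_required=40, seed=0):
    """Choose which of the valid acquisition dates feed the model.

    Sketch only: random selection without replacement when enough dates
    exist, falling back to sampling with replacement otherwise.  The
    real pipeline also supports multiple passes with averaging.
    """
    rng = random.Random(seed)
    if n_valid >= n_required:
        idx = rng.sample(range(n_valid), n_required)
    else:
        idx = rng.choices(range(n_valid), k=n_required)
    return sorted(idx)  # keep the chosen timesteps in date order

print(len(sample_timesteps(57)))   # always 40, however many dates exist
print(len(sample_timesteps(12)))   # still 40, via replacement
```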

<p>The S2 input is shaped [40, 11]: 10 normalised spectral bands plus the day-of-year. The S1 input is [40, 3]: VV and VH (normalised) plus the day-of-year. Ascending and descending S1 passes are merged into a single pool before sampling.</p>

<p>Thus, for each 10m x 10m pixel, there are 40 S2 dates, each with 10 spectral bands, and for each of 40 (potentially different) S1 dates, there are VV and VH values. These are passed to the model, which produces a 128-dimensional float32 embedding per pixel.</p>
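<p>The embeddings are then compressed for storage. Here is a sketch of one plausible int8 quantisation scheme (symmetric max-abs scaling is my assumption; the post only specifies the resulting sizes):</p>

```python
import struct

def quantise_embedding(embedding):
    """Quantise a 128-d float vector to int8 plus one float32 scale.

    Sketch only: assumes symmetric max-abs scaling, which is a guess at
    the actual scheme; the storage arithmetic is what matters here.
    """
    scale = max(abs(v) for v in embedding) / 127.0 or 1.0
    q = [round(v / scale) for v in embedding]      # each in [-127, 127]
    return struct.pack("<f", scale) + struct.pack("128b", *q)

emb = [((i * 37) % 255 - 127) / 127.0 for i in range(128)]
blob = quantise_embedding(emb)
print(len(blob))        # 132 bytes: 4 (float32 scale) + 128 (int8)
print(128 * 4)          # 512 bytes for the full float32 embedding
```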

<p>In the final step, the 128-dimensional embeddings are quantised to int8 with a per-pixel float32 scale factor, reducing storage to 132 bytes per pixel, compared to 512 bytes for full float32.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="tessera" /><category term="tunbury.org" /><summary type="html"><![CDATA[Mainly for my future reference here is a walk-through of the Tessera pipeline.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/new_delhi_pca.png" /><media:content medium="image" url="https://www.tunbury.org/images/new_delhi_pca.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OBuilder on Windows: Bringing Native Container Builds with the HCS Backend</title><link href="https://www.tunbury.org/2026/02/19/obuilder-hcs/" rel="alternate" type="text/html" title="OBuilder on Windows: Bringing Native Container Builds with the HCS Backend" /><published>2026-02-19T19:25:00+00:00</published><updated>2026-02-19T19:25:00+00:00</updated><id>https://www.tunbury.org/2026/02/19/obuilder-hcs</id><content type="html" xml:base="https://www.tunbury.org/2026/02/19/obuilder-hcs/"><![CDATA[<p>Following from my containerd <a href="https://www.tunbury.org/2025/06/11/windows-containerd/">posts</a> <a href="https://www.tunbury.org/2025/06/14/windows-containerd-2/">last</a> <a href="https://www.tunbury.org/2025/06/27/windows-containerd-3/">year</a> and my previous work on obuilder backends for <a href="https://tarides.com/blog/2023-08-02-obuilder-on-macos/">macOS</a> and <a href="https://github.com/ocurrent/obuilder/pull/195">QEMU</a>, this post extends obuilder to use the Host Compute System (HCS) and <a href="https://containerd.io">containerd</a> on Windows.</p>

<p>OBuilder, written by Thomas Leonard, is a sandboxed build executor for OCaml CI pipelines. It takes a build specification, similar to a Dockerfile, but written in S-expression syntax, and executes each step in an isolated environment, caching results at the filesystem level.</p>

<p>OBuilder’s sandbox backends target Linux (via runc), macOS (via user sandboxing), FreeBSD (via jails), and Docker, with QEMU covering everything else. This post introduces the HCS backend, which brings native Windows container builds to OBuilder using Microsoft’s Host Compute Service and containerd.</p>

<h2 id="how-obuilder-works">How OBuilder Works</h2>

<p>Before looking at the Windows-specific details, let’s recap how OBuilder works.</p>

<h3 id="build-specifications">Build Specifications</h3>

<p>A typical OBuilder spec is shown below:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="nf">from</span> <span class="nv">ocaml/opam:debian</span><span class="p">)</span>
 <span class="p">(</span><span class="nf">workdir</span> <span class="nv">/src</span><span class="p">)</span>
 <span class="p">(</span><span class="nf">user</span> <span class="p">(</span><span class="nf">uid</span> <span class="mi">1000</span><span class="p">)</span> <span class="p">(</span><span class="nf">gid</span> <span class="mi">1000</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"sudo chown opam /src"</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">copy</span> <span class="p">(</span><span class="nf">src</span> <span class="nv">obuilder-spec</span><span class="o">.</span><span class="nv">opam</span> <span class="nv">obuilder</span><span class="o">.</span><span class="nv">opam</span><span class="p">)</span> <span class="p">(</span><span class="nf">dst</span> <span class="o">.</span><span class="nv">/</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"opam pin add -yn ."</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span>
  <span class="p">(</span><span class="nf">network</span> <span class="nv">host</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">shell</span> <span class="s">"opam install --deps-only -t obuilder"</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">copy</span> <span class="p">(</span><span class="nf">src</span> <span class="o">.</span><span class="p">)</span> <span class="p">(</span><span class="nf">dst</span> <span class="nv">/src/</span><span class="p">)</span> <span class="p">(</span><span class="nf">exclude</span> <span class="o">.</span><span class="nv">git</span> <span class="nv">_build</span> <span class="nv">_opam</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"opam exec -- dune build @install @runtest"</span><span class="p">)))</span>
</code></pre></div></div>

<p>Each operation, such as <code class="language-plaintext highlighter-rouge">from</code>, <code class="language-plaintext highlighter-rouge">run</code>, <code class="language-plaintext highlighter-rouge">copy</code>, <code class="language-plaintext highlighter-rouge">workdir</code>, <code class="language-plaintext highlighter-rouge">env</code>, <code class="language-plaintext highlighter-rouge">shell</code>, is executed in sequence inside a sandboxed container. The resulting filesystem is the aggregation of all the previous steps and is recorded under the hash of every step up to that point. OBuilder reuses these layers as a cache, skipping any step whose hash it has already built.</p>
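<p>That caching scheme can be sketched in a few lines (my illustration in Python rather than OBuilder's OCaml; the real hash also covers inputs such as copied file contents, not just the operation text):</p>

```python
import hashlib

def layer_id(prev_hash: str, operation: str) -> str:
    """Cache key for the layer produced by `operation` on top of `prev_hash`."""
    h = hashlib.sha256()
    h.update(prev_hash.encode())
    h.update(operation.encode())
    return h.hexdigest()

base = layer_id("", "(from ocaml/opam:debian)")
step1 = layer_id(base, "(workdir /src)")
step2 = layer_id(step1, '(run (shell "opam pin add -yn ."))')

# A different base image invalidates every later layer's cache key.
alt = layer_id(layer_id("", "(from other:image)"), "(workdir /src)")
print(alt != step1)   # True
```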

<p>OBuilder’s functor architecture allows it to be easily extended by providing new store, sandbox, and fetcher implementations. The new Windows backend uses <code class="language-plaintext highlighter-rouge">hcs_store.ml</code>, <code class="language-plaintext highlighter-rouge">hcs_sandbox.ml</code> and <code class="language-plaintext highlighter-rouge">hcs_fetch.ml</code>.</p>

<h3 id="the-build-flow">The Build Flow</h3>

<p>When OBuilder processes a spec, it:</p>

<ol>
  <li>Fetches the base image (<code class="language-plaintext highlighter-rouge">from</code> directive) using the fetcher module</li>
  <li>For each operation, computes a content hash from the operation and its inputs</li>
  <li>Checks the cache: if a result for that hash exists, skip execution</li>
  <li>Creates a snapshot from the previous step’s result using the store module</li>
  <li>Runs the operation inside the sandbox using the sandbox module</li>
  <li>Commits the result as a new snapshot, keyed by the content hash</li>
</ol>

<p>This means repeat builds are very fast, and with carefully constructed spec files, incremental builds triggered by code changes can run without rebuilding the project dependencies (the opam switch).</p>

<h2 id="the-hcs-backend">The HCS Backend</h2>

<p>The Host Compute Service (HCS) backend enables native Windows container builds using <code class="language-plaintext highlighter-rouge">containerd</code>.</p>

<h3 id="architecture">Architecture</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────────────────────────────────────────────────────┐
│              OBuilder CLI (main.ml)                │
│     obuilder build --store=hcs:C:\obuilder         │
└────────────────────────────────────────────────────┘
                        │
                        ▼
┌────────────────────────────────────────────────────┐
│            Builder Functor (build.ml)              │
│     Build.Make(Hcs_store)(Hcs_sandbox)(Hcs_fetch)  │
└────────────────────────────────────────────────────┘
        │                │                │
        ▼                ▼                ▼
  ┌───────────┐   ┌────────────┐   ┌───────────┐
  │ Hcs_store │   │Hcs_sandbox │   │ Hcs_fetch │
  │           │   │            │   │           │
  │ Snapshot  │   │ Container  │   │ Base image│
  │ mgmt via  │   │ exec via   │   │ import via│
  │ ctr snap  │   │ ctr run    │   │ ctr pull  │
  └───────────┘   └────────────┘   └───────────┘
        │                │                │
        └────────────────┼────────────────┘
                         ▼
┌────────────────────────────────────────────────────┐
│              containerd (Windows)                  │
│   Images  │  Snapshots (VHDX)  │  Runtime (HCS)    │
└────────────────────────────────────────────────────┘
</code></pre></div></div>

<h3 id="split-storage-model">Split Storage Model</h3>

<p>OBuilder backends typically use filesystem features, such as BTRFS or ZFS snapshots, to store the cache layers within the OBuilder results directory, typically <code class="language-plaintext highlighter-rouge">/var/cache/obuilder/results/&lt;hashid&gt;/rootfs</code>. However, HCS stores the actual filesystem snapshots in VHDX files in <code class="language-plaintext highlighter-rouge">C:\ProgramData\containerd\snapshots\&lt;N&gt;</code>, so the OBuilder results directory contains only a JSON file with a pointer to this directory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OBuilder Store (C:\obuilder\)         Containerd (C:\ProgramData\containerd\)
├── result\&lt;id&gt;\                      ├── snapshots\
│   ├── rootfs\                       │   ├── 1\    ← VHDX layer data
│   │   └── layerinfo.json ────────►  │   ├── 2\    ← VHDX layer data
│   ├── log                           │   └── 3\    ← VHDX layer data
│   └── env                           └── metadata.db
├── state\db\db.sqlite
└── cache\
</code></pre></div></div>

<h2 id="walking-through-a-build">Walking Through a Build</h2>

<p>Let’s trace what happens when you run:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">obuilder</span><span class="w"> </span><span class="nx">build</span><span class="w"> </span><span class="nt">-f</span><span class="w"> </span><span class="nx">example.windows.hcs.spec</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="nt">--store</span><span class="o">=</span><span class="n">hcs:C:\obuilder</span><span class="w">
</span></code></pre></div></div>

<p>with the following spec:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="nf">from</span> <span class="nv">mcr</span><span class="o">.</span><span class="nv">microsoft</span><span class="o">.</span><span class="nv">com/windows/nanoserver:ltsc2025</span><span class="p">)</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"echo hello"</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"mkdir C:\\app"</span><span class="p">)))</span>
</code></pre></div></div>

<h3 id="step-1-fetch-the-base-image-hcs_fetchml">Step 1: Fetch the Base Image (hcs_fetch.ml)</h3>

<p>The fetcher pulls the base image from the Microsoft Container Registry and prepares an initial snapshot.</p>

<p>First, it normalises the image reference. Docker Hub images need a <code class="language-plaintext highlighter-rouge">docker.io/</code> prefix for containerd (e.g. <code class="language-plaintext highlighter-rouge">ubuntu:latest</code> becomes <code class="language-plaintext highlighter-rouge">docker.io/library/ubuntu:latest</code>), but Microsoft Container Registry (MCR) images are used as-is.</p>
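<p>As a sketch, the normalisation rule might look like the following OCaml. The helper name and the exact heuristic are illustrative, not the actual <code class="language-plaintext highlighter-rouge">hcs_fetch.ml</code> code: a reference whose first segment looks like a registry hostname passes through unchanged, while anything else gets the <code class="language-plaintext highlighter-rouge">docker.io/</code> prefix.</p>

```ocaml
(* Illustrative sketch: normalise an image reference for containerd.
   Docker Hub references gain a docker.io/ prefix; references that
   already name a registry host (first segment contains '.' or ':')
   are used as-is. *)
let normalise_image reference =
  match String.index_opt reference '/' with
  | None ->
      (* e.g. "ubuntu:latest" has no registry or namespace *)
      "docker.io/library/" ^ reference
  | Some i ->
      let first = String.sub reference 0 i in
      if String.contains first '.' || String.contains first ':' then
        reference (* e.g. "mcr.microsoft.com/windows/nanoserver:ltsc2025" *)
      else "docker.io/" ^ reference
```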

<p>The equivalent manual commands are:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Pull the image</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">image</span><span class="w"> </span><span class="nx">pull</span><span class="w"> </span><span class="nx">mcr.microsoft.com/windows/nanoserver:ltsc2025</span><span class="w">

</span><span class="c"># Get the chain ID (the snapshot key for the image's top layer)</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">images</span><span class="w"> </span><span class="nx">pull</span><span class="w"> </span><span class="nt">--print-chainid</span><span class="w"> </span><span class="nt">--local</span><span class="w"> </span><span class="nx">mcr.microsoft.com/windows/nanoserver:ltsc2025</span><span class="w">
</span><span class="c"># Output includes: "image chain ID: sha256:abc123..."</span><span class="w">

</span><span class="c"># Prepare a writable snapshot from the image</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">snapshot</span><span class="w"> </span><span class="nx">prepare</span><span class="w"> </span><span class="nt">--mounts</span><span class="w"> </span><span class="nx">obuilder-base-</span><span class="err">&lt;</span><span class="nx">hash</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">sha256:abc123...</span><span class="w">
</span><span class="c"># Returns JSON with mount information:</span><span class="w">
</span><span class="c"># [{"Type":"windows-layer","Source":"C:\\...\\snapshots\\42",</span><span class="w">
</span><span class="c">#   "Options":["rw","parentLayerPaths=[\"C:\\\\...\\\\snapshots\\\\20\"]"]}]</span><span class="w">
</span></code></pre></div></div>

<p>The fetcher parses this mount JSON to extract the source path and parent layer paths, then writes <code class="language-plaintext highlighter-rouge">layerinfo.json</code>:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"snapshot_key"</span><span class="p">:</span><span class="w"> </span><span class="s2">"obuilder-base-&lt;hash&gt;"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"source"</span><span class="p">:</span><span class="w"> </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">...</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">42"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"parent_layer_paths"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">...</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">20"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">...</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">21"</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Finally, it extracts environment variables from the image config:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Get the config digest</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">images</span><span class="w"> </span><span class="nx">inspect</span><span class="w"> </span><span class="nx">mcr.microsoft.com/windows/nanoserver:ltsc2025</span><span class="w">
</span><span class="c"># Look for: "application/vnd.docker.container.image.v1+json @sha256:def456..."</span><span class="w">

</span><span class="c"># Get the config content</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">content</span><span class="w"> </span><span class="nx">get</span><span class="w"> </span><span class="nx">sha256:def456...</span><span class="w">
</span><span class="c"># Parse the config.Env array from the JSON</span><span class="w">
</span></code></pre></div></div>

<h3 id="step-2-run-echo-hello-hcs_storeml--hcs_sandboxml">Step 2: Run “echo hello” (hcs_store.ml + hcs_sandbox.ml)</h3>

<p>For each <code class="language-plaintext highlighter-rouge">run</code> directive, the store creates a new snapshot from the previous step, the sandbox executes the command, and the store commits the result.</p>

<h4 id="store-prepare-a-snapshot">Store: prepare a snapshot</h4>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Read layerinfo.json from parent to get its snapshot key</span><span class="w">
</span><span class="c"># Prepare a new writable snapshot from the parent's committed snapshot</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">snapshot</span><span class="w"> </span><span class="nx">prepare</span><span class="w"> </span><span class="nt">--mounts</span><span class="w"> </span><span class="nx">obuilder-</span><span class="err">&lt;</span><span class="nx">id2</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">obuilder-base-</span><span class="err">&lt;</span><span class="nx">hash</span><span class="err">&gt;</span><span class="nt">-committed</span><span class="w">
</span></code></pre></div></div>

<h4 id="sandbox-generate-oci-config-and-run">Sandbox: generate OCI config and run</h4>

<p>The sandbox reads <code class="language-plaintext highlighter-rouge">layerinfo.json</code> and generates an OCI runtime config:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"ociVersion"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1.1.0"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"process"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"terminal"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
    </span><span class="nl">"user"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"username"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ContainerUser"</span><span class="w"> </span><span class="p">},</span><span class="w">
    </span><span class="nl">"args"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"cmd"</span><span class="p">,</span><span class="w"> </span><span class="s2">"/S"</span><span class="p">,</span><span class="w"> </span><span class="s2">"/C"</span><span class="p">,</span><span class="w"> </span><span class="s2">"echo hello"</span><span class="p">],</span><span class="w">
    </span><span class="nl">"env"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"PATH=C:</span><span class="se">\\</span><span class="s2">Windows</span><span class="se">\\</span><span class="s2">System32;C:</span><span class="se">\\</span><span class="s2">Windows"</span><span class="p">],</span><span class="w">
    </span><span class="nl">"cwd"</span><span class="p">:</span><span class="w"> </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">"</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"root"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="nl">"readonly"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span><span class="p">},</span><span class="w">
  </span><span class="nl">"hostname"</span><span class="p">:</span><span class="w"> </span><span class="s2">"builder"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"windows"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"layerFolders"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
      </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">...</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">20"</span><span class="p">,</span><span class="w">
      </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">...</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">21"</span><span class="p">,</span><span class="w">
      </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">...</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">42"</span><span class="p">,</span><span class="w">
      </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">...</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">43"</span><span class="w">
    </span><span class="p">]</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">layerFolders</code> array lists all parent layers followed by the writable scratch layer. This is the Windows container equivalent of an overlay filesystem: HCS merges all of these layers together when the container starts.</p>
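<p>Two details of the generated config can be sketched in OCaml: the <code class="language-plaintext highlighter-rouge">layerFolders</code> ordering (parent layers first, writable scratch layer last) and the <code class="language-plaintext highlighter-rouge">cmd /S /C</code> argument wrapping. Both function names are illustrative, not the actual <code class="language-plaintext highlighter-rouge">hcs_sandbox.ml</code> code.</p>

```ocaml
(* Illustrative sketch of two pieces of the OCI config shown above:
   layer_folders lists the parent layers (oldest first) followed by
   the writable scratch layer; shell_args wraps the spec's shell
   command for cmd.exe. *)
let layer_folders ~parent_layer_paths ~scratch = parent_layer_paths @ [ scratch ]
let shell_args command = [ "cmd"; "/S"; "/C"; command ]
```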

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Run the container</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">run</span><span class="w"> </span><span class="nt">--rm</span><span class="w"> </span><span class="nt">--config</span><span class="w"> </span><span class="nx">config.json</span><span class="w"> </span><span class="nx">obuilder-run-0</span><span class="w">
</span></code></pre></div></div>

<h4 id="store-commit-the-result">Store: commit the result</h4>

<p>After the command succeeds:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Commit the writable snapshot to a permanent one</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">snapshot</span><span class="w"> </span><span class="nx">commit</span><span class="w"> </span><span class="nx">obuilder-</span><span class="err">&lt;</span><span class="nx">id2</span><span class="err">&gt;</span><span class="nt">-committed</span><span class="w"> </span><span class="nx">obuilder-</span><span class="err">&lt;</span><span class="nx">id2</span><span class="err">&gt;</span><span class="w">
</span></code></pre></div></div>

<p>The result directory is then moved from <code class="language-plaintext highlighter-rouge">result-tmp/&lt;id2&gt;</code> to <code class="language-plaintext highlighter-rouge">result/&lt;id2&gt;</code>.</p>

<h3 id="step-3-run-mkdir-capp">Step 3: Run “mkdir C:\app”</h3>

<p>The process repeats: prepare a snapshot from <code class="language-plaintext highlighter-rouge">obuilder-&lt;id2&gt;-committed</code>, run the command, commit the result. Each step builds on the previous one, forming a chain of containerd snapshots.</p>

<h2 id="networking">Networking</h2>

<p>Windows containers don’t support <code class="language-plaintext highlighter-rouge">--net-host</code> in the way Linux containers do. Instead, network access requires three components working together:</p>

<ol>
  <li>A Host Network Service (HNS) NAT network with a specific subnet</li>
  <li>A Container Network Interface (CNI) config at <code class="language-plaintext highlighter-rouge">C:\Program Files\containerd\cni\conf\0-containerd-nat.conf</code> matching that subnet</li>
  <li>An HCN namespace per container</li>
</ol>

<p>The sandbox creates and destroys HCN namespaces around each networked container execution:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Before the container</span><span class="w">
</span><span class="n">hcn-namespace</span><span class="w"> </span><span class="nx">create</span><span class="w">
</span><span class="c"># Returns a GUID, e.g. "a1b2c3d4-..."</span><span class="w">

</span><span class="c"># The GUID is passed in the OCI config:</span><span class="w">
</span><span class="c"># "windows": { "network": { "networkNamespace": "a1b2c3d4-..." } }</span><span class="w">

</span><span class="c"># Run with --cni flag</span><span class="w">
</span><span class="n">ctr</span><span class="w"> </span><span class="nx">run</span><span class="w"> </span><span class="nt">--rm</span><span class="w"> </span><span class="nt">--cni</span><span class="w"> </span><span class="nt">--config</span><span class="w"> </span><span class="nx">config.json</span><span class="w"> </span><span class="nx">obuilder-run-0</span><span class="w">

</span><span class="c"># After the container</span><span class="w">
</span><span class="n">hcn-namespace</span><span class="w"> </span><span class="nx">delete</span><span class="w"> </span><span class="nx">a1b2c3d4-...</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">hcn-namespace</code> tool is a small OCaml utility (<a href="https://github.com/mtelvers/hcn-namespace">mtelvers/hcn-namespace</a>) that wraps the Windows HCN API; I wrote it last year while working on <code class="language-plaintext highlighter-rouge">day10</code>.</p>
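<p>The create/run/delete lifecycle can be sketched as follows, with <code class="language-plaintext highlighter-rouge">create</code> and <code class="language-plaintext highlighter-rouge">delete</code> standing in for invocations of the <code class="language-plaintext highlighter-rouge">hcn-namespace</code> tool. <code class="language-plaintext highlighter-rouge">Fun.protect</code> ensures the namespace is deleted even if the build step fails; this mirrors the behaviour described above rather than the actual sandbox code.</p>

```ocaml
(* Illustrative sketch: create a namespace, run the step with its
   GUID, and always delete the namespace afterwards, even if the
   step raises. *)
let with_hcn_namespace ~create ~delete f =
  let guid = create () in
  Fun.protect ~finally:(fun () -> delete guid) (fun () -> f guid)
```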

<h2 id="the-copy-operation">The COPY Operation</h2>

<p>File copying works differently on Windows due to I/O constraints. On Linux, OBuilder streams tar data through a pipe directly into the sandbox’s stdin. On Windows, the tar data is first written to a temporary file, then the file is passed as stdin to the container:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Linux:   generate tar  ──pipe──►  sandbox stdin  ──►  tar -xf -
Windows: generate tar  ──►  temp file  ──►  sandbox stdin  ──►  tar -xf -
</code></pre></div></div>

<p>This extra step is needed because Lwt’s pipe I/O is unreliable on Windows (more on this below).</p>
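<p>The staging step can be sketched as: write the tar stream to a temporary file, then open that file as the sandbox’s stdin. The helper below is illustrative and uses only the OCaml standard library.</p>

```ocaml
(* Illustrative sketch: materialise the generated tar data in a
   temporary file; the returned path is later opened as the
   container's stdin instead of streaming through a pipe. *)
let stage_to_temp_file write_data =
  let path = Filename.temp_file "obuilder-copy" ".tar" in
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc) (fun () -> write_data oc);
  path
```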

<h2 id="running-it">Running It</h2>

<h3 id="prerequisites">Prerequisites</h3>

<ol>
  <li>Windows Server 2019 or later (tested on LTSC 2019 and LTSC 2025)</li>
  <li>Containerd v2.0+ installed and running as a service</li>
  <li>The <code class="language-plaintext highlighter-rouge">ctr</code> CLI available in PATH</li>
  <li>The <a href="https://github.com/mtelvers/hcn-namespace">hcn-namespace</a> tool for networking support</li>
</ol>

<h3 id="building-obuilder-on-windows">Building OBuilder on Windows</h3>

<p>OBuilder builds itself — the provided <code class="language-plaintext highlighter-rouge">example.windows.hcs.spec</code> bootstraps the build using an MSVC-based OCaml image:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="nf">from</span> <span class="nv">ocaml/opam:windows-server-msvc-ltsc2025-ocaml-5</span><span class="o">.</span><span class="mi">4</span><span class="p">)</span>
 <span class="p">(</span><span class="nf">workdir</span> <span class="s">"C:/src"</span><span class="p">)</span>
 <span class="p">(</span><span class="nf">copy</span> <span class="p">(</span><span class="nf">src</span> <span class="nv">obuilder-spec</span><span class="o">.</span><span class="nv">opam</span> <span class="nv">obuilder</span><span class="o">.</span><span class="nv">opam</span><span class="p">)</span> <span class="p">(</span><span class="nf">dst</span> <span class="o">.</span><span class="nv">/</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"echo (lang dune 3.0)&gt; dune-project"</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"opam pin add -yn ."</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">network</span> <span class="nv">host</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">shell</span> <span class="s">"opam install --deps-only -t obuilder"</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">copy</span> <span class="p">(</span><span class="nf">src</span> <span class="o">.</span><span class="p">)</span> <span class="p">(</span><span class="nf">dst</span> <span class="s">"C:/src/"</span><span class="p">)</span> <span class="p">(</span><span class="nf">exclude</span> <span class="o">.</span><span class="nv">git</span> <span class="nv">_build</span> <span class="nv">_opam</span><span class="p">))</span>
 <span class="p">(</span><span class="nf">run</span> <span class="p">(</span><span class="nf">shell</span> <span class="s">"opam exec -- dune build @install @runtest"</span><span class="p">)))</span>
</code></pre></div></div>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">obuilder</span><span class="w"> </span><span class="nx">build</span><span class="w"> </span><span class="nt">-f</span><span class="w"> </span><span class="nx">example.windows.hcs.spec</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="nt">--store</span><span class="o">=</span><span class="n">hcs:C:\obuilder</span><span class="w">
</span></code></pre></div></div>

<h3 id="healthcheck">Healthcheck</h3>

<p>To verify the setup:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">obuilder</span><span class="w"> </span><span class="nx">healthcheck</span><span class="w"> </span><span class="nt">--store</span><span class="o">=</span><span class="n">hcs:C:\obuilder</span><span class="w">
</span></code></pre></div></div>

<p>This pulls <code class="language-plaintext highlighter-rouge">mcr.microsoft.com/windows/nanoserver:ltsc2025</code>, runs <code class="language-plaintext highlighter-rouge">echo healthcheck</code> inside a container, and confirms everything works end-to-end.</p>

<h2 id="addendum-lwt-on-windows">Addendum: Lwt on Windows</h2>

<p>Development of the HCS backend highlighted several issues with Lwt on Windows:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">Lwt_process.exec</code> child promise isn’t resolved</li>
  <li><code class="language-plaintext highlighter-rouge">Lwt_unix.waitpid</code> hangs indefinitely unless created with <code class="language-plaintext highlighter-rouge">cmd.exe /c</code></li>
  <li><code class="language-plaintext highlighter-rouge">Lwt_unix.write</code> can randomly hang, affecting tar and log streaming</li>
  <li><code class="language-plaintext highlighter-rouge">Lwt_io.with_file</code> fails with “Permission denied”</li>
  <li><code class="language-plaintext highlighter-rouge">Os.pread_result</code> works intermittently, but frequently fails with <code class="language-plaintext highlighter-rouge">ctr</code></li>
</ul>
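<p>The <code class="language-plaintext highlighter-rouge">cmd.exe /c</code> workaround for the <code class="language-plaintext highlighter-rouge">waitpid</code> hang can be sketched as a one-line wrapper; this is illustrative only, as the real fixes have been pushed into Lwt itself.</p>

```ocaml
(* Illustrative sketch: wrap a command line in cmd.exe /S /C on
   Windows so that process termination is reported reliably. *)
let wrap_command ~win32 cmdline =
  if win32 then "cmd.exe /S /C \"" ^ cmdline ^ "\"" else cmdline
```

<p>Passing <code class="language-plaintext highlighter-rouge">~win32:Sys.win32</code> selects the behaviour at runtime.</p>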

<h2 id="code">Code</h2>

<p>My code is available at <a href="https://github.com/mtelvers/obuilder/tree/hcs">mtelvers/obuilder/tree/hcs</a>. I also have patches for ocluster and OCaml-CI, but the Lwt issues above remain the main reliability concern.</p>

<p>Assume package A depends upon B and C, while package B depends upon D and E, as represented by the graph below. <code class="language-plaintext highlighter-rouge">day10</code> would build package D in isolation, capturing the files written to the opam switch and the operating system dependencies. This is repeated for the other leaf packages, E and C. The sets of changed files for D and E are then merged into a new switch, and package B is installed in that switch using the same capturing methodology. For package A, the file sets for D, E, C and B are merged, in that order, and package A is installed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    A
   / \
  B   C
 / \
D   E
</code></pre></div></div>
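<p>The merge step can be sketched as a fold over the captured layer file sets in dependency order, where a later layer’s entry wins on a conflicting path. The map-of-strings representation is illustrative; <code class="language-plaintext highlighter-rouge">day10</code> works with real directory trees.</p>

```ocaml
module FileMap = Map.Make (String)

(* Illustrative sketch: merge captured layer file sets in dependency
   order; on a conflicting path, the later layer's file wins. *)
let merge_layers layers =
  List.fold_left
    (fun acc layer ->
      FileMap.union (fun _path _older newer -> Some newer) acc layer)
    FileMap.empty layers
```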

<p>On its own, this is slower than using opam to create the same switch, as opam processes these steps in parallel. However, to create a new switch for package F, <code class="language-plaintext highlighter-rouge">day10</code> can reuse the file sets for B, D, and E without recreating them.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    F
   / \
  B   G
 / \
D   E
</code></pre></div></div>

<p>In general, each package is installed exactly once and reused on other switches. However, in some cases, packages enable different functionality depending on which other packages are installed. <code class="language-plaintext highlighter-rouge">logs</code> is a good example, with optional libraries such as <code class="language-plaintext highlighter-rouge">fmt</code>, <code class="language-plaintext highlighter-rouge">cmdliner</code>, <code class="language-plaintext highlighter-rouge">lwt</code>, etc. In this case, the package would be installed once for each dependency combination.</p>
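<p>One way to sketch this is to make a layer’s identity depend on the sorted dependency set as well as the package itself, so <code class="language-plaintext highlighter-rouge">logs</code> built alongside <code class="language-plaintext highlighter-rouge">fmt</code> hashes differently from <code class="language-plaintext highlighter-rouge">logs</code> alone. The MD5-based scheme below is an assumption, not necessarily the hash <code class="language-plaintext highlighter-rouge">day10</code> actually uses.</p>

```ocaml
(* Illustrative sketch: a layer's identity covers the package plus
   the sorted set of co-installed dependencies, so the same package
   built against different optional dependencies gets a different
   layer hash. *)
let layer_id ~package ~deps =
  let canonical = String.concat ";" (package :: List.sort compare deps) in
  Digest.to_hex (Digest.string canonical)
```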

<p>The original concept of merging files and recreating the switch came from Jon’s opam hijinx tool, <a href="https://github.com/jonludlam/opamh">jonludlam/opamh</a>. This functionality is distilled in <code class="language-plaintext highlighter-rouge">day10</code> into the function below, which builds the switch state from the directory listing of the installed packages.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">dump_state</span> <span class="n">packages_dir</span> <span class="n">state_file</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">content</span> <span class="o">=</span> <span class="nn">Sys</span><span class="p">.</span><span class="n">readdir</span> <span class="n">packages_dir</span> <span class="o">|&gt;</span> <span class="nn">Array</span><span class="p">.</span><span class="n">to_list</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">packages</span> <span class="o">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">filter_map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="nn">OpamPackage</span><span class="p">.</span><span class="n">of_string_opt</span> <span class="n">x</span><span class="p">)</span> <span class="n">content</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">sel_compiler</span> <span class="o">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">filter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">mem</span> <span class="p">(</span><span class="nn">OpamPackage</span><span class="p">.</span><span class="n">name</span> <span class="n">x</span><span class="p">)</span> <span class="n">compiler_packages</span><span class="p">)</span> <span class="n">packages</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">new_state</span> <span class="o">=</span>
    <span class="k">let</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">OpamPackage</span><span class="p">.</span><span class="nn">Set</span><span class="p">.</span><span class="n">of_list</span> <span class="n">packages</span> <span class="k">in</span>
    <span class="p">{</span> <span class="nn">OpamTypes</span><span class="p">.</span><span class="n">sel_installed</span> <span class="o">=</span> <span class="n">s</span><span class="p">;</span> <span class="n">sel_roots</span> <span class="o">=</span> <span class="n">s</span><span class="p">;</span> <span class="n">sel_pinned</span> <span class="o">=</span> <span class="nn">OpamPackage</span><span class="p">.</span><span class="nn">Set</span><span class="p">.</span><span class="n">empty</span><span class="p">;</span> <span class="n">sel_compiler</span> <span class="o">=</span> <span class="nn">OpamPackage</span><span class="p">.</span><span class="nn">Set</span><span class="p">.</span><span class="n">of_list</span> <span class="n">sel_compiler</span> <span class="p">}</span>
  <span class="k">in</span>
  <span class="nn">OpamFilename</span><span class="p">.</span><span class="n">write</span> <span class="p">(</span><span class="nn">OpamFilename</span><span class="p">.</span><span class="n">raw</span> <span class="n">state_file</span><span class="p">)</span> <span class="p">(</span><span class="nn">OpamFile</span><span class="p">.</span><span class="nn">SwitchSelections</span><span class="p">.</span><span class="n">write_to_string</span> <span class="n">new_state</span><span class="p">)</span>
</code></pre></div></div>

<p>opam could be used to install the package in the “recreated” switch, but opam does unnecessary checks, such as finding and checking whether the necessary dependencies are installed. This led to the tool <a href="https://github.com/mtelvers/opam-build">mtelvers/opam-build</a>, which assumes everything is already in place and calls the opam library to install the package without any checks!</p>

<p>The dependency graph includes the compiler, so package ‘D’ might be OCaml 5.4.0, and package ‘E’ might be an OS dependency like ‘conf-curl’, and the captured layer would include <code class="language-plaintext highlighter-rouge">libcurl.so</code>. The underlying OS distribution and version are also captured, so <code class="language-plaintext highlighter-rouge">logs</code> on Debian is assumed to be different to <code class="language-plaintext highlighter-rouge">logs</code> on Fedora.</p>

<p>On Linux, <code class="language-plaintext highlighter-rouge">day10</code> uses overlayfs with <code class="language-plaintext highlighter-rouge">runc</code>. Overlayfs has the concept of a read-only lower directory and a writable upper directory. While these can be stacked, the depth is limited. Therefore, <code class="language-plaintext highlighter-rouge">day10</code> assembles the lower directory by creating a filesystem tree of hard links to the originally captured files, and an initially empty upper directory is used to capture the files written to it. On FreeBSD, unionfs is used similarly with <code class="language-plaintext highlighter-rouge">jails</code>. On Windows, <code class="language-plaintext highlighter-rouge">containerd</code> is used, but the filesystem isn’t isolated, as the hard-linked directory is writable. This hasn’t presented a problem in day-to-day use.</p>
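<p>The hard-link assembly can be sketched as a pure planning step that pairs each captured file with its destination in the lower directory; a real implementation would then walk the list calling <code class="language-plaintext highlighter-rouge">Unix.link</code>. The function name and shape are illustrative.</p>

```ocaml
(* Illustrative sketch: plan the hard-link tree as (source,
   destination) pairs, one per captured file, relative to the layer
   and lower directories. *)
let plan_hardlinks ~layer_dir ~lower_dir files =
  List.map
    (fun rel -> (Filename.concat layer_dir rel, Filename.concat lower_dir rel))
    files
```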

<p>A typical command-line for <code class="language-plaintext highlighter-rouge">day10</code> would specify an initially empty layer-cache directory, your clone of the opam repository, an output format of Markdown or JSON, and the package to be installed using opam’s naming syntax:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>day10 health-check <span class="nt">--cache-dir</span> /var/cache/day10 <span class="nt">--opam-repository</span> /home/mtelvers/opam-repository <span class="nt">--md</span> log.md 0install.2.18
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">day10</code> will attempt to detect your system and build an appropriate container, but you can override this with <code class="language-plaintext highlighter-rouge">--os</code>, and in more detail with <code class="language-plaintext highlighter-rouge">--os-distribution</code>, <code class="language-plaintext highlighter-rouge">--os-family</code> and <code class="language-plaintext highlighter-rouge">--os-version</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/var/cache/day10/
└── debian-13-x86_64                         # os specific tag
    ├── 0149f9a8b66c1568d0b962417e827d9f     # layer hash
    │   ├── build.log                        # build log
    │   ├── config.json                      # runc configuration file
    │   ├── fs                               # root file system
    │   ├── hosts                            # runc container hosts file
    │   ├── layer.json                       # layer dependencies and their hashes
    │   └── opam-repository                  # copy of the opam files used to create the switch
    │       ├── packages                     #   laid out in opam repository layout
    │       └── repo
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">day10</code> uses lock files on each layer to allow multiple instances to be invoked at the same time to build different packages. <code class="language-plaintext highlighter-rouge">day10</code> also accepts a list of packages using <code class="language-plaintext highlighter-rouge">@packages.json</code> rather than a specific package name, which can be used along with <code class="language-plaintext highlighter-rouge">--fork</code> to internally create multiple instances.</p>
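<p>The per-layer locking pattern looks roughly like this (a hypothetical helper, not the project’s actual code; assumes the <code class="language-plaintext highlighter-rouge">unix</code> library is linked):</p>

```ocaml
(* Take an exclusive advisory lock on a per-layer "lock" file; a second
   instance building the same layer blocks here until the first
   finishes. Closing the descriptor releases the lock. *)
let with_layer_lock layer_dir f =
  let fd =
    Unix.openfile (Filename.concat layer_dir "lock")
      [ Unix.O_CREAT; Unix.O_RDWR ] 0o600
  in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      Unix.lockf fd Unix.F_LOCK 0;
      f ())
```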

<p>It’s difficult to know how many processes to fork at once, particularly when packages may be partially or entirely cached and only require the SAT solver to process the request (typically less than one second). Therefore, it is often useful to separate the solving step from the building step and run them with different levels of parallelism. For example, <code class="language-plaintext highlighter-rouge">day10 health-check ... --json /path/to/output --fork $(nproc) --dry-run @packages.json</code> will solve every package and output a JSON file containing a status field. A second pass can then be made with a more conservative <code class="language-plaintext highlighter-rouge">--fork N</code> parameter, since packages will actually be built; only those with a status of “solution” need to be submitted.</p>

<table>
  <thead>
    <tr>
      <th>Status</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>success</td>
      <td>The package built successfully, and the build log and dependency graph are included in the output.</td>
    </tr>
    <tr>
      <td>failure</td>
      <td>The package itself fails to build. The build log is included in the output.</td>
    </tr>
    <tr>
      <td>no_solution</td>
<td>The dependencies of the package cannot be satisfied with the current constraints: compiler version, OS, etc.</td>
    </tr>
    <tr>
      <td>dependency_failed</td>
      <td>A dependency failed to build; the log of that failure is included.</td>
    </tr>
    <tr>
      <td>solution</td>
      <td>A solution is available, but a dependency and/or the package itself has not been built. This is only generated with <code class="language-plaintext highlighter-rouge">--dry-run</code>.</td>
    </tr>
  </tbody>
</table>

<p>There is a <code class="language-plaintext highlighter-rouge">list</code> command to extract a list of packages from an opam repository. This accepts <code class="language-plaintext highlighter-rouge">--all-version</code> but defaults to the latest version.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>day10 list <span class="nt">--opam-repository</span> ~/opam-repository <span class="nt">--os-distribution</span> debian <span class="nt">--os-family</span> debian <span class="nt">--os-version</span> 13 <span class="nt">--json</span> packages.json
</code></pre></div></div>

<p>Run the build with <code class="language-plaintext highlighter-rouge">--fork 20</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>day10 health-check <span class="nt">--cache-dir</span> ~/cache/ <span class="nt">--opam-repository</span> ~/opam-repository <span class="nt">--os-distribution</span> debian <span class="nt">--os-family</span> debian <span class="nt">--os-version</span> 13 <span class="nt">--json</span> /tmp/foo <span class="nt">--fork</span> 20 @packages.json
</code></pre></div></div>

<p>On my E5-2640 machine (2 x 10C 20T) with a SATA SSD, building the latest version of every package for a single compiler version and OS variant takes a little over an hour.</p>

<p>The project code is available at <a href="https://github.com/mtelvers/day10">mtelvers/day10</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,day10" /><category term="tunbury.org" /><summary type="html"><![CDATA[ocurrent/obuilder is the workhorse of OCaml CI testing, but the current deployment causes packages to be built repeatedly because the opam switch is assembled from scratch for each package, leading to common dependencies being frequently recompiled. day10 uses an alternative model whereby switches are assembled from their component packages.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Tessera pipeline in OCaml</title><link href="https://www.tunbury.org/2026/02/15/ocaml-tessera/" rel="alternate" type="text/html" title="Tessera pipeline in OCaml" /><published>2026-02-15T19:30:00+00:00</published><updated>2026-02-15T19:30:00+00:00</updated><id>https://www.tunbury.org/2026/02/15/ocaml-tessera</id><content type="html" xml:base="https://www.tunbury.org/2026/02/15/ocaml-tessera/"><![CDATA[<p>The Tessera pipeline is written in Python. What would it take to have an OCaml version?</p>

<p>Looking at the Python code, these are the key libraries which are used:</p>

<table>
  <thead>
    <tr>
      <th>Python Library</th>
      <th>Used for</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>numpy</strong></td>
      <td>N-dim arrays, math, <code class="language-plaintext highlighter-rouge">.npy</code> I/O</td>
    </tr>
    <tr>
      <td><strong>torch</strong></td>
      <td>Model inference</td>
    </tr>
    <tr>
      <td><strong>rasterio</strong></td>
      <td>Read GeoTIFF (ROI mask), CRS/bounds, <code class="language-plaintext highlighter-rouge">transform_bounds</code></td>
    </tr>
    <tr>
      <td><strong>pystac-client</strong></td>
      <td>STAC API search (Planetary Computer catalog)</td>
    </tr>
    <tr>
      <td><strong>planetary-computer</strong></td>
      <td>Sign STAC URLs (Azure SAS tokens)</td>
    </tr>
    <tr>
      <td><strong>stackstac</strong></td>
      <td>Load COGs into arrays, reproject, mosaic</td>
    </tr>
  </tbody>
</table>

<h1 id="numpy">numpy</h1>

<p>Last year, when I first looked at the Tessera tiles, I wrote <a href="https://github.com/mtelvers/npy-pca">mtelvers/npy-pca</a> as a basic visualisation tool that included an npy reader. Now, I have spun that off into its own library <a href="https://github.com/mtelvers/ocaml-npy">mtelvers/ocaml-npy</a>. I subsequently noticed that there already was <a href="https://github.com/LaurentMazare/npy-ocaml">LaurentMazare/npy-ocaml</a> which may have saved me some time!</p>

<h1 id="pystac-client-and-planetary-computer">pystac-client and planetary-computer</h1>

<p>For these, a new library was needed as I couldn’t see an OCaml equivalent. However, OCaml already has <a href="https://github.com/ocaml-multicore/eio">Eio</a>, <a href="https://github.com/mirage/ocaml-cohttp">cohttp-eio</a> and <a href="https://github.com/ocaml-community/yojson">yojson</a>, so it was relatively easy to produce <a href="https://github.com/mtelvers/stac-client">mtelvers/stac-client</a>, which implemented the <a href="https://stacspec.org/">STAC</a> (SpatioTemporal Asset Catalogue) API, with built-in support for <a href="https://planetarycomputer.microsoft.com/">Microsoft Planetary Computer</a> SAS token signing. This was easy to validate against the results from Python.</p>

<h1 id="rasterio">rasterio</h1>

<p><a href="https://github.com/geocaml/ocaml-tiff">geocaml/ocaml-tiff</a> already exists, but it does not handle tiled tiff files, which are used in the land masks. Rather than reinventing the entire library, I added tiled tiff support.</p>

<h1 id="stackstac">stackstac</h1>

<p><a href="https://github.com/geocaml/ocaml-gdal">geocaml/ocaml-gdal</a> already existed, but it lacked some required features and was a little outdated. I added more bindings for GDAL’s C API using OCaml’s ctypes-foreign:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">GDALOpenEx</code> with <code class="language-plaintext highlighter-rouge">/vsicurl/</code> for reading remote COGs</li>
  <li><code class="language-plaintext highlighter-rouge">GDALWarp</code> for reprojection and resampling</li>
  <li><code class="language-plaintext highlighter-rouge">GDALRasterIO</code> for reading band data</li>
  <li><code class="language-plaintext highlighter-rouge">OSRNewSpatialReference</code> / <code class="language-plaintext highlighter-rouge">OCTTransformBounds</code> for coordinate transformations</li>
</ul>

<h1 id="torch">torch</h1>

<p><a href="https://github.com/LaurentMazare/ocaml-torch">LaurentMazare/ocaml-torch</a> already existed, with the latest version published on opam as <a href="https://github.com/janestreet/torch">janestreet/torch</a>. This uses the Jane Street standard library, and it seemed pointless to reimplement it against the OCaml Standard Library, so instead I implemented OCaml bindings for the ONNX runtime, <a href="https://github.com/mtelvers/ocaml-onnxruntime">mtelvers/ocaml-onnxruntime</a>, as I only need the inference stage. The PyTorch model can be easily exported to ONNX format.</p>

<p>ONNX Runtime’s C API uses a function-table pattern (a struct with 500+ function pointers) which doesn’t easily map to ctypes. This needed a thin C shim (<code class="language-plaintext highlighter-rouge">libert_shim.so</code>) that exposed the needed functions as regular C symbols, which could be bound from OCaml.</p>

<h1 id="cpu-testing">CPU Testing</h1>

<p>The initial OCaml pipeline was tested on my local machine without a GPU. It stored satellite data as nested OCaml arrays (<code class="language-plaintext highlighter-rouge">float array array array array</code> for 4D data), which performed poorly. This was replaced with a flat <code class="language-plaintext highlighter-rouge">Bigarray.Array1.t</code> using stride-based index arithmetic, matching NumPy’s contiguous memory layout, which performed much better. However, the real test was on a GPU.</p>
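<p>The stride arithmetic amounts to the following (a sketch under assumed axis names and sizes, not the pipeline’s actual dimensions):</p>

```ocaml
(* Row-major (C-contiguous) indexing into a flat Bigarray, matching
   NumPy's default layout: the last axis varies fastest. *)
let idx ~n1 ~n2 ~n3 i0 i1 i2 i3 = ((i0 * n1 + i1) * n2 + i2) * n3 + i3

let get
    (a : (float, Bigarray.float32_elt, Bigarray.c_layout) Bigarray.Array1.t)
    ~n1 ~n2 ~n3 i0 i1 i2 i3 =
  Bigarray.Array1.get a (idx ~n1 ~n2 ~n3 i0 i1 i2 i3)
```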

<h2 id="benchmark-results">Benchmark results</h2>

<p>All benchmarks on the same machine (AMD EPYC 9965 2 x 192-Core, NVIDIA L4 24GB), same dataset (269,908 pixels), same parameters (<code class="language-plaintext highlighter-rouge">batch_size=1024</code>, <code class="language-plaintext highlighter-rouge">num_threads=20</code>, <code class="language-plaintext highlighter-rouge">repeat_times=1</code>):</p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Configuration</th>
      <th>Inference Time</th>
      <th>vs Python CPU</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>OCaml + ONNX Runtime + CUDA</strong></td>
      <td><strong>2 min 10s</strong></td>
      <td><strong>9.5x faster</strong></td>
    </tr>
    <tr>
      <td>2</td>
      <td>Python + PyTorch + CUDA</td>
      <td>2 min 41s</td>
      <td>7.7x faster</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Python + PyTorch (CPU)</td>
      <td>20 min 32s</td>
      <td>1x (baseline)</td>
    </tr>
    <tr>
      <td>4</td>
      <td>OCaml + ONNX Runtime (CPU)</td>
      <td>24 min 56s</td>
      <td>0.82x</td>
    </tr>
  </tbody>
</table>

<p>The OCaml + GPU configuration is the fastest overall. I put this difference down to less data marshalling in OCaml before passing the data to the ONNX Runtime. I’ve also read that ONNX Runtime can edge ahead of PyTorch because it was purpose-built as an inference-only engine.</p>

<h1 id="checks">Checks</h1>

<p>The OCaml pipeline produces results that are effectively identical to Python’s, differing only due to floating-point rounding.</p>

<ul>
  <li>OCaml CPU vs Python CPU: max embedding difference of 1 in only 1,028 out of 155 million int8 elements (rounding at the quantisation boundary). Scale factors match exactly.</li>
  <li>GPU vs CPU (either language): max embedding difference of 1 in ~0.3% of elements, with negligible scale differences — expected floating-point rounding differences from GPU arithmetic.</li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,tessera" /><category term="tunbury.org" /><summary type="html"><![CDATA[The Tessera pipeline is written in Python. What would it take to have an OCaml version?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/manchester.png" /><media:content medium="image" url="https://www.tunbury.org/images/manchester.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The 15-Game</title><link href="https://www.tunbury.org/2026/02/11/fifteen/" rel="alternate" type="text/html" title="The 15-Game" /><published>2026-02-11T20:00:00+00:00</published><updated>2026-02-11T20:00:00+00:00</updated><id>https://www.tunbury.org/2026/02/11/fifteen</id><content type="html" xml:base="https://www.tunbury.org/2026/02/11/fifteen/"><![CDATA[<p>Numberphile’s latest <a href="https://www.youtube.com/watch?v=UafhPUOCM1E">video</a> has been released, and in it Ben Sparks introduces the 15 Game.</p>

<p>Two players take turns choosing tiles numbered 1–9 from a shared pool. The first player to collect exactly three tiles that sum to 15 wins. If all tiles are taken with no winner, the game is a draw.</p>

<p>This looks like a great project for a <a href="https://ocaml.org/p/js_of_ocaml/latest/doc/js_of_ocaml/Js_of_ocaml/Js/index.html">js_of_ocaml</a> solution.</p>

<p>As the game maps directly onto tic-tac-toe, creating an AI player was straightforward. You can play my version from GitHub Pages <a href="https://mtelvers.github.io/fifteen">fifteen</a>.</p>
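<p>The mapping rests on the 3×3 magic square: its eight lines are exactly the three-tile subsets of 1–9 that sum to 15, so win detection reduces to tic-tac-toe line checking. A minimal sketch (not the site’s actual code):</p>

```ocaml
(* Place 1..9 on a magic square; every row, column, and diagonal sums
   to 15, and these eight lines are the only 3-subsets that do. *)
let wins =
  let m = [| 2; 7; 6; 9; 5; 1; 4; 3; 8 |] in
  let line a b c = (m.(a), m.(b), m.(c)) in
  [ line 0 1 2; line 3 4 5; line 6 7 8;   (* rows *)
    line 0 3 6; line 1 4 7; line 2 5 8;   (* columns *)
    line 0 4 8; line 2 4 6 ]              (* diagonals *)

(* A player has won when their tiles cover some line of the square. *)
let has_won tiles =
  List.exists
    (fun (a, b, c) -> List.mem a tiles && List.mem b tiles && List.mem c tiles)
    wins
```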

<p>Numberphile’s videos are sponsored by <a href="https://www.janestreet.com/join-jane-street/programs-and-events/amp/">Jane Street</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,js_of_ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Numberphile’s latest video has been released, and in it Ben Sparks introduces the 15 Game.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/fifteen.png" /><media:content medium="image" url="https://www.tunbury.org/images/fifteen.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Optimizing an MP3 Codec with OCaml/OxCaml</title><link href="https://www.tunbury.org/2026/02/11/ocaml-mp3/" rel="alternate" type="text/html" title="Optimizing an MP3 Codec with OCaml/OxCaml" /><published>2026-02-11T18:30:00+00:00</published><updated>2026-02-11T18:30:00+00:00</updated><id>https://www.tunbury.org/2026/02/11/ocaml-mp3</id><content type="html" xml:base="https://www.tunbury.org/2026/02/11/ocaml-mp3/"><![CDATA[<p>After reading Anil’s post about his zero-allocation HTTP parser <a href="https://anil.recoil.org/notes/oxcaml-httpz">httpz</a>, I decided to apply some OxCaml optimisation techniques to my pure OCaml MP3 encoder/decoder.</p>

<p>The <a href="https://github.com/mtelvers/ocaml-mp3">OCaml-based MP3 encoder/decoder</a> has been the most ambitious project I’ve tried in Opus 4.5. It was a struggle to get it over the line, and I even needed to read large chunks of the ISO standard and get to grips with some of the maths and help the AI troubleshoot.</p>

<h1 id="profiling-an-ocaml-mp3-decoder-with-landmarks">Profiling an OCaml MP3 Decoder with Landmarks</h1>

<p>Before diving into OxCaml, I wanted to get a feel for the current performance and also to make obvious non-OxCaml performance improvements; otherwise, I would be comparing an optimised OxCaml version with an underperforming OCaml version.</p>

<p>It was 40 times slower than <code class="language-plaintext highlighter-rouge">ffmpeg</code>: 29.5 seconds to decode a 3-minute file versus 0.74 seconds. I used the <a href="https://github.com/LexiFi/landmarks">landmarks</a> profiling library to identify and fix the bottlenecks, bringing decode time down to 3.5 seconds (an 8x speedup).</p>

<h2 id="setting-up-landmarks">Setting Up Landmarks</h2>

<p>Landmarks is an OCaml profiling library that instruments functions and reports cycle counts. It was easy to add to the project (*) with a simple edit of the <code class="language-plaintext highlighter-rouge">dune</code> file:</p>

<pre><code class="language-sexp">(libraries ... landmarks)
(preprocess (pps landmarks-ppx --auto))
</code></pre>

<p>The <code class="language-plaintext highlighter-rouge">--auto</code> flag automatically instruments every top-level function — no manual annotation needed. Running the decoder with <code class="language-plaintext highlighter-rouge">OCAML_LANDMARKS=on</code> prints a call tree with cycle counts and percentages.</p>

<blockquote>
  <p>(*) It needed OCaml 5.3.0 for <code class="language-plaintext highlighter-rouge">landmarks-ppx</code> compatibility; it wouldn’t install on OCaml 5.4.0 due to a ppxlib version constraint.</p>
</blockquote>

<h2 id="issues">Issues</h2>

<p>78% of the time was spent in the Huffman decoding, specifically <code class="language-plaintext highlighter-rouge">decode_pair</code>. The implementation read one bit at a time, then scanned the table for a matching Huffman code. I initially tried a Hashtbl, which was much faster than the scan, before settling on a direct array lookup.</p>
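<p>A sketch of the direct-indexed approach (illustrative table shape, not the decoder’s actual code): peek the longest code length, index a precomputed array whose entries carry the decoded value and the true code length, then advance by that length.</p>

```ocaml
(* One array access replaces the per-bit table scan. The lookup table
   has 2^max_bits entries; codes shorter than max_bits occupy several
   consecutive slots, so any padding bits still hit the right entry. *)
type entry = { value : int; bits : int }

let decode lut ~max_bits ~peek ~advance =
  let e = lut.(peek max_bits) in
  advance e.bits;
  e.value
```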

<p>The bitstream operations still accounted for much of the time, but these could be optimised with appropriate <code class="language-plaintext highlighter-rouge">Bytes.get_...</code> calls, as the most frequent path reads 32 bits in big-endian layout.</p>
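<p>For example, the standard library’s big-endian accessor can refill a 32-bit reservoir in one call (a sketch; <code class="language-plaintext highlighter-rouge">Bytes.get_int32_be</code> has been in the standard library since OCaml 4.08):</p>

```ocaml
(* The mask undoes Int32.to_int's sign extension so the result is the
   unsigned 32-bit value on 64-bit platforms. *)
let read32_be buf pos =
  Int32.to_int (Bytes.get_int32_be buf pos) land 0xFFFFFFFF
```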

<p>The profile now showed <code class="language-plaintext highlighter-rouge">find_sfb_long</code> consuming 3.4 billion cycles inside requantization. This function does a linear search through scalefactor band boundaries for every one of the 576 frequency lines, every granule, every frame. I switched to precomputed 576-entry arrays mapping each frequency line directly to its scalefactor band index.</p>
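<p>The precomputation runs once and is cheap (boundary values in the test below are illustrative, not the ISO scalefactor-band table):</p>

```ocaml
(* boundaries.(n) is the first frequency line of band n, with a final
   sentinel of 576; map.(i) is then the band containing line i, so the
   per-sample linear search disappears. *)
let make_sfb_map boundaries =
  let map = Array.make 576 0 in
  let sfb = ref 0 in
  for i = 0 to 575 do
    while !sfb + 1 < Array.length boundaries && i >= boundaries.(!sfb + 1) do
      incr sfb
    done;
    map.(i) <- !sfb
  done;
  map
```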

<p>There were some additional tweaks, such as adding more precomputed lookup tables stored in <code class="language-plaintext highlighter-rouge">floatarray</code>, using <code class="language-plaintext highlighter-rouge">[@inline]</code> and <code class="language-plaintext highlighter-rouge">unsafe_get</code>, and using <code class="language-plaintext highlighter-rouge">land</code> instead of <code class="language-plaintext highlighter-rouge">mod</code>.</p>
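<p>The <code class="language-plaintext highlighter-rouge">land</code> substitution relies on power-of-two sizes (a minimal sketch; the constant is illustrative):</p>

```ocaml
(* For non-negative i and a power-of-two size, masking with size - 1
   gives the same result as mod without the division. *)
let wrap_1024 i = i land 1023 (* = i mod 1024 for i >= 0 *)
```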

<p>After this, no single function dominated the profile, and I could move on to OxCaml.</p>

<h1 id="oxcaml">OxCaml</h1>

<p>OxCaml has <code class="language-plaintext highlighter-rouge">float#</code>, an unboxed float type that lives in registers, and <code class="language-plaintext highlighter-rouge">let mutable</code> for stack-allocated mutable variables. Together, they let you write inner loops where the accumulator never touches the heap:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">F</span> <span class="o">=</span> <span class="nn">Stdlib_upstream_compatible</span><span class="p">.</span><span class="nc">Float_u</span>

<span class="k">let</span><span class="p">[</span><span class="o">@</span><span class="n">inline</span><span class="p">]</span> <span class="n">imdct_long</span> <span class="n">input</span> <span class="o">=</span>
  <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">to</span> <span class="mi">35</span> <span class="k">do</span>
    <span class="k">let</span> <span class="k">mutable</span> <span class="n">sum</span> <span class="o">:</span> <span class="kt">float</span><span class="o">#</span> <span class="o">=</span> <span class="nn">F</span><span class="p">.</span><span class="n">of_float</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span> <span class="k">in</span>
    <span class="k">for</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">to</span> <span class="mi">17</span> <span class="k">do</span>
      <span class="k">let</span> <span class="n">cos_val</span> <span class="o">=</span> <span class="nn">F</span><span class="p">.</span><span class="n">of_float</span> <span class="p">(</span><span class="nn">Float</span><span class="p">.</span><span class="nn">Array</span><span class="p">.</span><span class="n">unsafe_get</span> <span class="n">cos_table</span> <span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">18</span> <span class="o">+</span> <span class="n">k</span><span class="p">))</span> <span class="k">in</span>
      <span class="k">let</span> <span class="n">inp_val</span> <span class="o">=</span> <span class="nn">F</span><span class="p">.</span><span class="n">of_float</span> <span class="p">(</span><span class="nn">Array</span><span class="p">.</span><span class="n">unsafe_get</span> <span class="n">input</span> <span class="n">k</span><span class="p">)</span> <span class="k">in</span>
      <span class="n">sum</span> <span class="o">&lt;-</span> <span class="nn">F</span><span class="p">.</span><span class="n">add</span> <span class="n">sum</span> <span class="p">(</span><span class="nn">F</span><span class="p">.</span><span class="n">mul</span> <span class="n">inp_val</span> <span class="n">cos_val</span><span class="p">)</span>
    <span class="k">done</span><span class="p">;</span>
    <span class="nn">Array</span><span class="p">.</span><span class="n">unsafe_set</span> <span class="n">output</span> <span class="n">i</span> <span class="p">(</span><span class="nn">F</span><span class="p">.</span><span class="n">to_float</span> <span class="n">sum</span><span class="p">)</span>
  <span class="k">done</span>
</code></pre></div></div>

<p>These kinds of optimisations got me from 2.35s down to 2.01s.</p>

<p>What I felt was missing was an accessor function which returned an unboxed float from a floatarray, so I wouldn’t need to unbox with <code class="language-plaintext highlighter-rouge">F.of_float</code>. However, I couldn’t find it.</p>

<p>The httpz parser really benefited from OxCaml’s unboxed types because its hot path operates on small unboxed records that stay entirely in registers:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">#</span><span class="p">{</span> <span class="n">off</span><span class="o">:</span> <span class="n">int16</span><span class="o">#;</span> <span class="n">len</span><span class="o">:</span> <span class="n">int16</span><span class="o">#</span> <span class="p">}</span>
</code></pre></div></div>

<h1 id="results">Results</h1>

<p>The optimisations brought a 29.5s MP3 decoder down to 2.01s, mostly through standard OCaml optimisations; OxCaml’s <code class="language-plaintext highlighter-rouge">float#</code> saved another ~14%.</p>

<table>
  <thead>
    <tr>
      <th>Decoder</th>
      <th>Time</th>
      <th>vs ffmpeg</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ffmpeg</td>
      <td>0.74s</td>
      <td>1x</td>
    </tr>
    <tr>
      <td>LAME</td>
      <td>0.81s</td>
      <td>1.1x</td>
    </tr>
    <tr>
      <td>ocaml-mp3 (original)</td>
      <td>29.5s</td>
      <td>40x</td>
    </tr>
    <tr>
      <td>ocaml-mp3 (Hashtbl)</td>
      <td>6.4s</td>
      <td>8.6x</td>
    </tr>
    <tr>
      <td>ocaml-mp3 (flat + fast bitstream)</td>
      <td>3.5s</td>
      <td>4.7x</td>
    </tr>
    <tr>
      <td>ocaml-mp3 (best)</td>
      <td>2.4s</td>
      <td>3.2x</td>
    </tr>
    <tr>
      <td>ocaml-mp3 (OxCaml)</td>
      <td>2.0s</td>
      <td>2.7x</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,oxcaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[After reading Anil’s post about his zero-allocation HTTP parser httpz, I decided to apply some OxCaml optimisation techniques to my pure OCaml MP3 encoder/decoder.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Windows Docker Images</title><link href="https://www.tunbury.org/2026/02/09/base-image-builder/" rel="alternate" type="text/html" title="Windows Docker Images" /><published>2026-02-09T09:30:00+00:00</published><updated>2026-02-09T09:30:00+00:00</updated><id>https://www.tunbury.org/2026/02/09/base-image-builder</id><content type="html" xml:base="https://www.tunbury.org/2026/02/09/base-image-builder/"><![CDATA[<p>In my previous post on the <a href="https://www.tunbury.org/2026/01/16/base-image-builder/">base image builder</a>, I included a footnote that we now had Windows 2025 workers, but I didn’t mention that the base images weren’t building.</p>

<p>Docker on Windows is very slow, so I have had a background task nudging these builds forward a little bit each day, and I’m pleased to now report that over the weekend, the images all built, and the entire dashboard is green!</p>

<p>The most significant change was moving away from fdopen’s opam to native opam. This has unlocked OCaml 5 builds for the first time but has removed images for OCaml &lt; 4.13. MSVC builds of OCaml 5.0-5.2 are not available, as the MSVC port was broken until OCaml 5.3 <a href="https://github.com/ocaml/ocaml/pull/12954">ocaml/ocaml#12954</a>. Each version is built on Windows Server LTSC 2019, LTSC 2022, and LTSC 2025.</p>

<table>
  <thead>
    <tr>
      <th>OCaml Version</th>
      <th style="text-align: center">MinGW</th>
      <th style="text-align: center">MSVC</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>4.13.1</td>
      <td style="text-align: center">✓</td>
      <td style="text-align: center">✓</td>
    </tr>
    <tr>
      <td>4.14.2</td>
      <td style="text-align: center">✓</td>
      <td style="text-align: center">✓</td>
    </tr>
    <tr>
      <td>5.0.0</td>
      <td style="text-align: center">✓</td>
      <td style="text-align: center">✗</td>
    </tr>
    <tr>
      <td>5.1.1</td>
      <td style="text-align: center">✓</td>
      <td style="text-align: center">✗</td>
    </tr>
    <tr>
      <td>5.2.1</td>
      <td style="text-align: center">✓</td>
      <td style="text-align: center">✗</td>
    </tr>
    <tr>
      <td>5.3.0</td>
      <td style="text-align: center">✓</td>
      <td style="text-align: center">✓</td>
    </tr>
    <tr>
      <td>5.4.0</td>
      <td style="text-align: center">✓</td>
      <td style="text-align: center">✓</td>
    </tr>
  </tbody>
</table>

<p>Below are the detailed changes.</p>

<h1 id="pr-257-ocaml-dockerfile"><a href="https://github.com/ocurrent/ocaml-dockerfile/pull/257">PR 257 ocaml-dockerfile</a></h1>

<p><code class="language-plaintext highlighter-rouge">src-opam/distro.ml</code>:</p>

<ul>
  <li>Changed <code class="language-plaintext highlighter-rouge">opam_repository</code> to use standard <code class="language-plaintext highlighter-rouge">ocaml/opam-repository.git</code> for Windows instead of <code class="language-plaintext highlighter-rouge">ocaml-opam/opam-repository-mingw.git#sunset</code></li>
  <li>Added version filter: Windows builds now require OCaml &gt;= 4.13 (native opam 2.2+ requires official packages)</li>
  <li>MSVC filter: OCaml 5.0-5.2 excluded (MSVC support restored in 5.3)</li>
</ul>

<p><code class="language-plaintext highlighter-rouge">src-opam/windows.ml</code>:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">ocaml_for_windows_package_exn</code> now returns <code class="language-plaintext highlighter-rouge">Ocaml_version.Opam.V2.package</code> directly, using official package names (<code class="language-plaintext highlighter-rouge">ocaml-base-compiler/ocaml-variants+options</code>) instead of fdopen’s <code class="language-plaintext highlighter-rouge">+mingw64</code>/<code class="language-plaintext highlighter-rouge">+msvc64</code> naming</li>
</ul>

<p><code class="language-plaintext highlighter-rouge">src-opam/opam.ml</code>:</p>

<ul>
<li>Reduce parallelism on Windows to avoid OOM from an unbounded <code class="language-plaintext highlighter-rouge">make -j</code></li>
  <li>Update Visual Studio to Windows 11 SDK</li>
  <li>create_switch adds <code class="language-plaintext highlighter-rouge">system-mingw/system-msvc</code> for all Windows versions (not just 5.x)</li>
  <li><code class="language-plaintext highlighter-rouge">setup_default_opam_windows_msvc</code> persists MSVC environment (<code class="language-plaintext highlighter-rouge">PATH</code>, <code class="language-plaintext highlighter-rouge">INCLUDE</code>, <code class="language-plaintext highlighter-rouge">LIB</code>, <code class="language-plaintext highlighter-rouge">LIBPATH</code>) with correct <code class="language-plaintext highlighter-rouge">PATH</code> ordering: MSVC → Cygwin → Windows</li>
</ul>

<h1 id="pr-339-docker-base-images"><a href="https://github.com/ocurrent/docker-base-images/pull/339">PR 339 docker-base-images</a></h1>

<p><code class="language-plaintext highlighter-rouge">src/pipeline.ml</code>:</p>

<ul>
  <li>Port package (<code class="language-plaintext highlighter-rouge">system-mingw</code>/<code class="language-plaintext highlighter-rouge">system-msvc</code>) added for all Windows versions</li>
  <li>Removed fdopen overlay addition (<code class="language-plaintext highlighter-rouge">maybe_add_overlay</code> no longer called for Windows)</li>
  <li>Removed <code class="language-plaintext highlighter-rouge">opam repo remove ocurrent-overlay</code> step</li>
  <li>Changed <code class="language-plaintext highlighter-rouge">depext</code> to <code class="language-plaintext highlighter-rouge">Option</code> type - returns <code class="language-plaintext highlighter-rouge">None</code> for Windows (opam 2.2+ has depext built-in)</li>
  <li>Uses <code class="language-plaintext highlighter-rouge">opam_repository_master</code> for Windows instead of <code class="language-plaintext highlighter-rouge">opam_repository_mingw_sunset</code></li>
</ul>

<p><code class="language-plaintext highlighter-rouge">src/git_repositories.ml</code> (implied by pipeline changes):</p>

<ul>
  <li>Removed references to <code class="language-plaintext highlighter-rouge">opam_repository_mingw_sunset</code> and <code class="language-plaintext highlighter-rouge">opam_overlays</code>.</li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[In my previous post on the base image builder, I included a footnote that we now had Windows 2025 workers, but I didn’t mention that the base images weren’t building.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/docker-base-images.png" /><media:content medium="image" url="https://www.tunbury.org/images/docker-base-images.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Getting Claude to complete the spec</title><link href="https://www.tunbury.org/2026/01/28/claude-spec/" rel="alternate" type="text/html" title="Getting Claude to complete the spec" /><published>2026-01-28T16:00:00+00:00</published><updated>2026-01-28T16:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/28/claude-spec</id><content type="html" xml:base="https://www.tunbury.org/2026/01/28/claude-spec/"><![CDATA[<p>With Claude Code, perhaps we are now at the point where the test suite is actually more valuable than the code itself.</p>

<p>I’ve been experimenting with Claude quite successfully and have <em>evolved</em> a working <a href="https://github.com/mtelvers/ocaml-imapd">IMAP</a> and <a href="https://github.com/mtelvers/ocaml-smtpd">SMTP</a> server implementation in OCaml. I say <em>evolved</em> because Claude generated the code from the RFCs in a single pass, but what followed was an extensive period of debugging. I added the account to Apple Mail, and it didn’t work at all! Claude dutifully debugged the code, with much back-and-forth, until we had a working version. Or, at least, a version that worked with Apple Mail. What about Thunderbird? I didn’t try. My point, though, is that the more agentic coding we do, the more testing we inevitably need.</p>

<p>In the case of IMAP, I could have asked Claude to use a third-party, command-line IMAP client, and then the testing and debugging could have been automated. What about in cases where there is no client?</p>

<p>I decided to reimplement the IMAP daemon, breaking it down from a single prompt into an actual software project. Claude did the legwork, reviewing the RFCs and creating an architecture of the libraries/modules needed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ┌─────────────────────────────────────────────────────────────────┐
 │                        imap-server                              │
 │  (Connection handling, state machine, command dispatch)         │
 └─────────────────────────────────────────────────────────────────┘
          │           │            │            │           │
          ▼           ▼            ▼            ▼           ▼
 ┌─────────────┐ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐
 │  imap-auth  │ │ mailbox │ │  search  │ │condstore│ │ tls-layer│
 │  (SASL,     │ │ (UID,   │ │ (SEARCH, │ │(QRESYNC,│ │ (ocaml-  │
 │   LOGIN)    │ │  flags) │ │  SORT)   │ │ modseq) │ │   tls)   │
 └─────────────┘ └─────────┘ └──────────┘ └─────────┘ └──────────┘
          │           │            │
          ▼           ▼            ▼
 ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
 │   maildir   │ │ mime-parser │ │ imap-parser │
 │  (storage)  │ │ (RFC 5322)  │ │  (ABNF)     │
 └─────────────┘ └─────────────┘ └─────────────┘
                       │               │
                       ▼               ▼
               ┌─────────────────────────────┐
               │         imap-types          │
               │  (Shared type definitions)  │
               └─────────────────────────────┘
</code></pre></div></div>

<p>I asked for a design document per module with the intention that these specifications could be completed in parallel by N Claude instances, each writing an extensive test suite for its code. In the end, I opted for a serial approach.</p>

<p>Starting with <code class="language-plaintext highlighter-rouge">imap-types</code>, I noticed that a message UID was defined as an int32, which I knew was wrong because it should be an unsigned int32. Anyway, an easy fix.</p>

<blockquote>
  <p>IMAP messages start with a tag so that responses can be aligned with messages when multiple commands are sent without waiting for a response. Apple Mail uses a tag format of <code class="language-plaintext highlighter-rouge">1.1</code>, which was the first thing that needed to be fixed in the original server implementation.</p>
</blockquote>

<p>The <code class="language-plaintext highlighter-rouge">imap-parser</code> passed all the tests. I asked for a specific test to be added, which covered a tag with a dot. It failed. The new parser wasn’t any better than the original one.</p>

<blockquote>
  <p>Prompts to Claude can be submitted on the command line. e.g. <code class="language-plaintext highlighter-rouge">echo "hello" | claude --print</code>.</p>
</blockquote>

<p>How about creating two Claude instances, with one implementing a server and the other a client? Each instance could declare what features it supports, and an orchestration script could run the tests. Or, taking this further, have a third Claude instance act as the moderator and generate the test suite based on the features implemented by the server and the client.</p>
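<p>A minimal sketch of such an orchestration script, assuming the <code class="language-plaintext highlighter-rouge">claude</code> CLI invocation from the note above; the prompt file names and the feature-declaration step are hypothetical:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical orchestration: each role declares its features,
# then a moderator instance generates and runs the test suite.
cat server-prompt.md | claude --print &gt; server-features.md
cat client-prompt.md | claude --print &gt; client-features.md

cat moderator-prompt.md server-features.md client-features.md \
  | claude --print &gt; run-tests.sh
sh run-tests.sh
</code></pre></div></div>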

<p>This worked reasonably well.</p>

<ul>
  <li>86 tests generated by moderator</li>
  <li>85 passed, 1 failed</li>
</ul>

<p>The tests covered:</p>

<ul>
  <li>Basic protocol (greeting, capability, noop, logout)</li>
  <li>Authentication (valid/invalid logins, multiple users, quoted passwords)</li>
  <li>Mailbox operations (select, examine, create, delete, rename)</li>
  <li>LIST, LSUB, SUBSCRIBE, STATUS</li>
  <li>FETCH (flags, uid, body, envelope, headers, bodystructure, etc.)</li>
  <li>STORE (add/remove/replace flags, silent mode)</li>
  <li>COPY, MOVE, EXPUNGE, CLOSE, UNSELECT</li>
  <li>SEARCH (all, unseen, flagged, subject)</li>
  <li>UID variants (fetch, store, search, copy)</li>
  <li>APPEND (simple, with flags, literal+)</li>
  <li>Extensions: IDLE, ENABLE, NAMESPACE</li>
  <li>Adversarial tests: malformed commands, missing tags, garbage input, null bytes, rapid commands</li>
  <li>Concurrency: multiple simultaneous connections</li>
</ul>

<p>And this found a real bug: the server did not reject a fetch sequence starting at zero, which the RFC does not allow.</p>

<p>Interestingly, neither the client nor server supported STARTTLS. Both had a copy of RFC 8314 but chose not to implement the feature. I put this down to a poor choice of wording in the prompt. I’d said “production-ready”, which to me implies TLS, but “feature-complete” removes the wiggle room. I specifically didn’t want to say “implement TLS” as this is specific to IMAP and wouldn’t apply in other projects.</p>

<p>The next generation of the script provided the three Claude instances with a copy of the RFCs. Client Claude, Server Claude and Moderator Claude were tasked with implementing the client, server and testing and moderation entirely from the RFCs. The script ran iteratively, with more testing being added at each pass, and client and server fixes.</p>

<p>Did I get TLS? No. The dune file called for it, but the library wasn’t opened in the code. The test checked for STARTTLS, and the server replied with “OK Begin TLS negotiation now” but that’s as far as it got in five cycles.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[With Claude Code, perhaps we are now at the point where the test suite is actually more valuable than the code itself.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/anthropic-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/anthropic-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Extending RPC capabilities in OCurrent</title><link href="https://www.tunbury.org/2026/01/26/ocurrent-rpc/" rel="alternate" type="text/html" title="Extending RPC capabilities in OCurrent" /><published>2026-01-26T12:00:00+00:00</published><updated>2026-01-26T12:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/26/ocurrent-rpc</id><content type="html" xml:base="https://www.tunbury.org/2026/01/26/ocurrent-rpc/"><![CDATA[<p>As our workflows become more agentic, CLI tools are becoming preferred over web GUIs; OCurrent pipelines are no exceptions.</p>

<p>OCurrent already had an RPC endpoint which allowed functions such as listing active jobs, viewing a job log and rebuilding. <a href="https://github.com/ocurrent/ocurrent/pull/469">PR#469</a> extends this, adding full pipeline observability and control, including statistics, state, history queries, bulk rebuild, and pipeline visualisation and configuration management. This all works over <a href="https://capnproto.org">Cap’n Proto</a>.</p>

<p><code class="language-plaintext highlighter-rouge">rpc_client.ml</code> can be used as a standalone executable to query any OCurrent pipeline. Alternatively, by including the cmdliner term, your application can be its own client.</p>

<p>In the server code, the RPC endpoint must be specifically exposed. Many OCurrent applications do this already, such as the <a href="https://github.com/ocurrent/docker-base-images">Docker base image builder</a>: any application which currently supports <code class="language-plaintext highlighter-rouge">--capnp-address</code> already exposes it.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">Rpc</span> <span class="o">=</span> <span class="nn">Current_rpc</span><span class="p">.</span><span class="nc">Impl</span><span class="p">(</span><span class="nc">Current</span><span class="p">)</span>

<span class="c">(* In the main function, set up Cap'n Proto serving *)</span>
<span class="k">let</span> <span class="n">serve_rpc</span> <span class="n">engine</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">config</span> <span class="o">=</span> <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="nn">Vat_config</span><span class="p">.</span><span class="n">create</span> <span class="o">~</span><span class="n">secret_key</span> <span class="o">~</span><span class="n">public_address</span> <span class="n">listen_address</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">service</span> <span class="o">=</span> <span class="nn">Rpc</span><span class="p">.</span><span class="n">engine</span> <span class="n">engine</span> <span class="k">in</span>
  <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="n">serve</span> <span class="n">config</span> <span class="n">service</span> <span class="o">&gt;&gt;=</span> <span class="k">fun</span> <span class="n">vat</span> <span class="o">-&gt;</span>
  <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="nn">Cap_file</span><span class="p">.</span><span class="n">save_service</span> <span class="n">vat</span> <span class="n">service</span> <span class="n">cap_file</span>
</code></pre></div></div>

<p>Then the <a href="https://github.com/dbuenzli/cmdliner">Cmdliner</a> command group needs to be added.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">client_cmd</span> <span class="o">=</span>
  <span class="nn">Current_rpc</span><span class="p">.</span><span class="nn">Client</span><span class="p">.</span><span class="nn">Cmdliner</span><span class="p">.</span><span class="n">client_cmd</span>
      <span class="o">~</span><span class="n">name</span><span class="o">:</span><span class="s2">"client"</span>
      <span class="o">~</span><span class="n">cap_file</span><span class="o">:</span><span class="s2">"/capnp-secrets/base-images.cap"</span>
      <span class="bp">()</span>

<span class="c">(* Add to your command group *)</span>
<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">cmds</span> <span class="o">=</span> <span class="p">[</span><span class="n">main_cmd</span><span class="p">;</span> <span class="n">client_cmd</span><span class="p">]</span> <span class="k">in</span>
  <span class="n">exit</span> <span class="o">@@</span> <span class="nn">Cmdliner</span><span class="p">.</span><span class="nn">Cmd</span><span class="p">.</span><span class="n">eval</span> <span class="p">(</span><span class="nn">Cmdliner</span><span class="p">.</span><span class="nn">Cmd</span><span class="p">.</span><span class="n">group</span> <span class="n">info</span> <span class="n">cmds</span><span class="p">)</span>
</code></pre></div></div>

<p>All 12 sub-commands are now available:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>base-images client overview
base-images client <span class="nb">jobs
</span>base-images client status &lt;job_id&gt;
base-images client log &lt;job_id&gt;
base-images client cancel &lt;job_id&gt;
base-images client rebuild &lt;job_id&gt;
base-images client start &lt;job_id&gt;
base-images client query <span class="o">[</span><span class="nt">--ok</span><span class="o">=</span>...] <span class="o">[</span><span class="nt">--prefix</span><span class="o">=</span>...] <span class="o">[</span><span class="nt">--op</span><span class="o">=</span>...] <span class="o">[</span><span class="nt">--rebuild</span><span class="o">=</span>...]
base-images client ops
base-images client dot
base-images client confirm <span class="o">[</span><span class="nt">--set</span><span class="o">=</span>...]
base-images client rebuild-all &lt;job_id&gt; ...
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[As our workflows become more agentic, CLI tools are becoming preferred over web GUIs; OCurrent pipelines are no exceptions.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Base Image Builder</title><link href="https://www.tunbury.org/2026/01/16/base-image-builder/" rel="alternate" type="text/html" title="Base Image Builder" /><published>2026-01-16T17:20:00+00:00</published><updated>2026-01-16T17:20:00+00:00</updated><id>https://www.tunbury.org/2026/01/16/base-image-builder</id><content type="html" xml:base="https://www.tunbury.org/2026/01/16/base-image-builder/"><![CDATA[<p>The base image builder has a growing number of failed builds; it’s time to address these.</p>

<h1 id="ocaml--51-with-gcc--15">OCaml &lt; 5.1 with GCC &gt;= 15</h1>

<p>Distributions that have moved to GCC 15 have had failing builds since last <a href="https://github.com/ocurrent/docker-base-images/issues/320">April</a>. This affects builds older than OCaml 5.1.1 but not OCaml 4.14.2.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># gcc -c -O2 -fno-strict-aliasing -fwrapv -pthread -g -Wall -fno-common -fexcess-precision=standard -ffunction-sections  -I./runtime  -D_FILE_OFFSET_BITS=64  -DCAMLDLLIMPORT= -DIN_CAML_RUNTIME -DDEBUG  -o runtime/main.bd.o runtime/main.c
In file included from runtime/interp.c:34:
runtime/interp.c: In function 'caml_interprete':
runtime/caml/prims.h:33:23: error: too many arguments to function '(value (*)(void))*(caml_prim_table.contents + (sizetype)((long unsigned int)*pc * 8))'; expected 0, have 1
33 | #define Primitive(n) ((c_primitive)(caml_prim_table.contents[n]))
   |                      ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
runtime/interp.c:1037:14: note: in expansion of macro 'Primitive'
1037 |       accu = Primitive(*pc)(accu);
     |              ^~~~~~~~~
</code></pre></div></div>

<p>I was about to create the patches, but I noticed that @dra27 had already done so. <a href="https://github.com/ocaml-opam/ocaml/branches">ocaml-opam/ocaml</a>. The patches can be added as an overlay repository. I have done this before for GCC 14 when a similar issue occurred for OCaml &lt; 4.08. <a href="https://github.com/ocurrent/docker-base-images/pull/298">PR#298</a>. The new PR is <a href="https://github.com/ocurrent/docker-base-images/pull/337">PR#337</a></p>

<h1 id="ubuntu-2510">Ubuntu 25.10</h1>

<p>The GCC 15 patch resolved most Ubuntu issues, but the Ubuntu 25.10 failure persisted. Ubuntu 25.10 switched to the Rust-based Coreutils, which did not support commas in the install command’s mode string until version 0.5.0; Ubuntu 25.10 ships with 0.2.2.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#9 137.0 # /usr/bin/install -c -m u=rw,g=rw,o=r \
#9 137.0 #   VERSION \
#9 137.0 #   "/home/opam/.opam/4.09/lib/ocaml"
#9 137.0 # /usr/bin/install: Invalid mode string: invalid operator (expected +, -, or =, but found ,)
</code></pre></div></div>

<p><a href="https://github.com/ocurrent/ocaml-dockerfile/pull/255">PR#255</a> switches to GNU Coreutils. I expect this problem will be cleared in subsequent releases of Ubuntu.</p>

<h1 id="windows">Windows</h1>

<p>The Windows workers needed to be updated to Windows Server 2025, as older kernels cannot run newer containers. Furthermore, the OCluster code is not yet using native Windows opam.</p>

<p>The Windows Server virtual machines are created with Packer. I’ve pushed my scripts to <a href="https://github.com/mtelvers/packer">mtelvers/packer</a>.</p>

<p>OCluster worker is deployed using Ansible. My scripts are at <a href="https://github.com/mtelvers/windows_worker">mtelvers/windows_worker</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[The base image builder has a growing number of failed builds; it’s time to address these.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Updating ARM64 Workers to Ubuntu Noble</title><link href="https://www.tunbury.org/2026/01/16/arm64-workers/" rel="alternate" type="text/html" title="Updating ARM64 Workers to Ubuntu Noble" /><published>2026-01-16T17:00:00+00:00</published><updated>2026-01-16T17:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/16/arm64-workers</id><content type="html" xml:base="https://www.tunbury.org/2026/01/16/arm64-workers/"><![CDATA[<p>Early in the upgrade program for Ubuntu 24.04, there were <a href="https://github.com/ocaml/infrastructure/issues/121">permission issues</a> when extracting <code class="language-plaintext highlighter-rouge">tar</code> files. The workaround was to update to the latest <code class="language-plaintext highlighter-rouge">dev</code> version of Docker. However, this didn’t resolve all the issues on ARM64, so only one machine was updated and excluded from the base image builder work.</p>

<p>There were <a href="https://github.com/tarides/infrastructure/issues/331">segmentation faults</a> at the second stage of the build process. These were cleared by upstream updates.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#8 24.53 # gcc -c -O2 -fno-strict-aliasing -fwrapv -Wall -fno-common -g -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE -DOCAML_STDLIB_DIR='"/home/opam/.opam/4.09/lib/ocaml"' -o prims.o prims.c
#8 24.53 # rm -f libcamlrund.a &amp;&amp; ar rc libcamlrund.a interp_bd.o misc_bd.o stacks_bd.o fix_code_bd.o startup_aux_bd.o startup_byt_bd.o freelist_bd.o major_gc_bd.o minor_gc_bd.o memory_bd.o alloc_bd.o roots_byt_bd.o globroots_bd.o fail_byt_bd.o signals_bd.o signals_byt_bd.o printexc_bd.o backtrace_byt_bd.o backtrace_bd.o compare_bd.o ints_bd.o floats_bd.o str_bd.o array_bd.o io_bd.o extern_bd.o intern_bd.o hash_bd.o sys_bd.o meta_bd.o parsing_bd.o gc_ctrl_bd.o md5_bd.o obj_bd.o lexing_bd.o callback_bd.o debugger_bd.o weak_bd.o compact_bd.o finalise_bd.o custom_bd.o dynlink_bd.o spacetime_byt_bd.o afl_bd.o unix_bd.o bigarray_bd.o main_bd.o instrtrace_bd.o &amp;&amp; ranlib libcamlrund.a
#8 24.53 # gcc -O2 -fno-strict-aliasing -fwrapv -Wall -fno-common -g -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE -DOCAML_STDLIB_DIR='"/home/opam/.opam/4.09/lib/ocaml"' -Wl,-E -g -o ocamlrund prims.o libcamlrund.a -lm -lpthread 
#8 24.53 # rm -f libcamlruni.a &amp;&amp; ar rc libcamlruni.a interp_bi.o misc_bi.o stacks_bi.o fix_code_bi.o startup_aux_bi.o startup_byt_bi.o freelist_bi.o major_gc_bi.o minor_gc_bi.o memory_bi.o alloc_bi.o roots_byt_bi.o globroots_bi.o fail_byt_bi.o signals_bi.o signals_byt_bi.o printexc_bi.o backtrace_byt_bi.o backtrace_bi.o compare_bi.o ints_bi.o floats_bi.o str_bi.o array_bi.o io_bi.o extern_bi.o intern_bi.o hash_bi.o sys_bi.o meta_bi.o parsing_bi.o gc_ctrl_bi.o md5_bi.o obj_bi.o lexing_bi.o callback_bi.o debugger_bi.o weak_bi.o compact_bi.o finalise_bi.o custom_bi.o dynlink_bi.o spacetime_byt_bi.o afl_bi.o unix_bi.o bigarray_bi.o main_bi.o &amp;&amp; ranlib libcamlruni.a
#8 24.53 # gcc -O2 -fno-strict-aliasing -fwrapv -Wall -fno-common -g -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE -DOCAML_STDLIB_DIR='"/home/opam/.opam/4.09/lib/ocaml"' -Wl,-E -o ocamlruni prims.o libcamlruni.a -lm -lpthread 
#8 24.53 # rm -f libcamlrun_pic.a &amp;&amp; ar rc libcamlrun_pic.a interp_bpic.o misc_bpic.o stacks_bpic.o fix_code_bpic.o startup_aux_bpic.o startup_byt_bpic.o freelist_bpic.o major_gc_bpic.o minor_gc_bpic.o memory_bpic.o alloc_bpic.o roots_byt_bpic.o globroots_bpic.o fail_byt_bpic.o signals_bpic.o signals_byt_bpic.o printexc_bpic.o backtrace_byt_bpic.o backtrace_bpic.o compare_bpic.o ints_bpic.o floats_bpic.o str_bpic.o array_bpic.o io_bpic.o extern_bpic.o intern_bpic.o hash_bpic.o sys_bpic.o meta_bpic.o parsing_bpic.o gc_ctrl_bpic.o md5_bpic.o obj_bpic.o lexing_bpic.o callback_bpic.o debugger_bpic.o weak_bpic.o compact_bpic.o finalise_bpic.o custom_bpic.o dynlink_bpic.o spacetime_byt_bpic.o afl_bpic.o unix_bpic.o bigarray_bpic.o main_bpic.o &amp;&amp; ranlib libcamlrun_pic.a
#8 24.53 # gcc -shared -o libcamlrun_shared.so interp_bpic.o misc_bpic.o stacks_bpic.o fix_code_bpic.o startup_aux_bpic.o startup_byt_bpic.o freelist_bpic.o major_gc_bpic.o minor_gc_bpic.o memory_bpic.o alloc_bpic.o roots_byt_bpic.o globroots_bpic.o fail_byt_bpic.o signals_bpic.o signals_byt_bpic.o printexc_bpic.o backtrace_byt_bpic.o backtrace_bpic.o compare_bpic.o ints_bpic.o floats_bpic.o str_bpic.o array_bpic.o io_bpic.o extern_bpic.o intern_bpic.o hash_bpic.o sys_bpic.o meta_bpic.o parsing_bpic.o gc_ctrl_bpic.o md5_bpic.o obj_bpic.o lexing_bpic.o callback_bpic.o debugger_bpic.o weak_bpic.o compact_bpic.o finalise_bpic.o custom_bpic.o dynlink_bpic.o spacetime_byt_bpic.o afl_bpic.o unix_bpic.o bigarray_bpic.o main_bpic.o -lm -lpthread 
#8 24.53 # rm -f libcamlrun.a &amp;&amp; ar rc libcamlrun.a interp_b.o misc_b.o stacks_b.o fix_code_b.o startup_aux_b.o startup_byt_b.o freelist_b.o major_gc_b.o minor_gc_b.o memory_b.o alloc_b.o roots_byt_b.o globroots_b.o fail_byt_b.o signals_b.o signals_byt_b.o printexc_b.o backtrace_byt_b.o backtrace_b.o compare_b.o ints_b.o floats_b.o str_b.o array_b.o io_b.o extern_b.o intern_b.o hash_b.o sys_b.o meta_b.o parsing_b.o gc_ctrl_b.o md5_b.o obj_b.o lexing_b.o callback_b.o debugger_b.o weak_b.o compact_b.o finalise_b.o custom_b.o dynlink_b.o spacetime_byt_b.o afl_b.o unix_b.o bigarray_b.o main_b.o &amp;&amp; ranlib libcamlrun.a
#8 24.53 # gcc -O2 -fno-strict-aliasing -fwrapv -Wall -fno-common -g -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE -DOCAML_STDLIB_DIR='"/home/opam/.opam/4.09/lib/ocaml"' -Wl,-E -o ocamlrun prims.o libcamlrun.a -lm -lpthread 
#8 24.53 # make[1]: Leaving directory '/home/opam/.opam/4.09/.opam-switch/build/ocaml-variants.4.09.1+flambda/runtime'
#8 24.53 # cp runtime/ocamlrun boot/ocamlrun
#8 24.53 # /usr/bin/make -C stdlib \
#8 24.53 # CAMLC='$(BOOT_OCAMLC) -use-prims ../runtime/primitives' all
#8 24.53 # make[1]: Entering directory '/home/opam/.opam/4.09/.opam-switch/build/ocaml-variants.4.09.1+flambda/stdlib'
#8 24.53 # ../boot/ocamlrun ../boot/ocamlc -use-prims ../runtime/primitives -strict-sequence -absname -w +a-4-9-41-42-44-45-48 -g -warn-error A -bin-annot -nostdlib -safe-string -strict-formats -nopervasives -c camlinternalFormatBasics.mli
#8 24.53 # sed -e "s|%%VERSION%%|`sed -e 1q ../VERSION | tr -d '\r'`|" sys.mlp &gt; sys.ml
#8 24.53 # make[1]: *** [Makefile:218: camlinternalFormatBasics.cmi] Segmentation fault (core dumped)
#8 24.53 # make[1]: *** Waiting for unfinished jobs....
#8 24.53 # make[1]: Leaving directory '/home/opam/.opam/4.09/.opam-switch/build/ocaml-variants.4.09.1+flambda/stdlib'
#8 24.53 # make: *** [Makefile:345: coldstart] Error 2
</code></pre></div></div>

<p>The remaining <a href="https://images.ci.ocaml.org/job/2026-01-15/084331-ocluster-build-69c63c">error</a> came from this innocuous <code class="language-plaintext highlighter-rouge">COPY</code> command. It was possible to work around it by changing the <code class="language-plaintext highlighter-rouge">chown</code> option to <code class="language-plaintext highlighter-rouge">--chown=1000:1000</code>, but that was odd, as it wasn’t needed on any other system.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Dockerfile:78
--------------------
  76 |     RUN git config --global user.email "docker@example.com"
  77 |     RUN git config --global user.name "Docker"
  78 | &gt;&gt;&gt; COPY --link --chown=opam:opam [ ".", "/home/opam/opam-repository" ]
  79 |     RUN opam-sandbox-disable
  80 |     RUN opam init -k git -a /home/opam/opam-repository --bare
--------------------
ERROR: failed to solve: invalid user index: -1
</code></pre></div></div>

<p>The root cause, which I have investigated today, is a version mismatch between the Docker <code class="language-plaintext highlighter-rouge">dev</code> daemon and the <code class="language-plaintext highlighter-rouge">27.5.1</code> client on the (partially) updated system. It’s interesting to note that client version <code class="language-plaintext highlighter-rouge">20.10.21</code> worked against the <code class="language-plaintext highlighter-rouge">dev</code> daemon.</p>
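<p>A quick way to spot this kind of skew is to print both versions with <code class="language-plaintext highlighter-rouge">docker version</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Print the client and daemon versions side by side; a mismatch
# like the one described above is immediately visible.
docker version --format 'client={{.Client.Version}} server={{.Server.Version}}'
</code></pre></div></div>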

<p>Anyway, none of this is a concern any longer, as Ubuntu now packages Docker 28.2.2. The system can be upgraded via <code class="language-plaintext highlighter-rouge">apt</code>, and everything works as expected.</p>

<p>The ARM64 workers have been updated using this <a href="https://gist.github.com/mtelvers/dbc3828eab622530a67d72535feb5908">Ansible playbook</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ubuntu,ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Early in the upgrade program for Ubuntu 24.04, there were permission issues when extracting tar files. The workaround was to update to the latest dev version of Docker. However, this didn’t resolve all the issues on ARM64, so only one machine was updated and excluded from the base image builder work.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ubuntu.png" /><media:content medium="image" url="https://www.tunbury.org/images/ubuntu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Installation from recovery console</title><link href="https://www.tunbury.org/2026/01/13/pima-nvme/" rel="alternate" type="text/html" title="Installation from recovery console" /><published>2026-01-13T17:00:00+00:00</published><updated>2026-01-13T17:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/13/pima-nvme</id><content type="html" xml:base="https://www.tunbury.org/2026/01/13/pima-nvme/"><![CDATA[<p>Over the weekend, one of the NVMe drives in pima failed, which brought down the whole system.</p>

<p>Booting over the network to a recovery console showed that <code class="language-plaintext highlighter-rouge">nvme6</code> was dead. The kernel logged errors on any access, and this was confirmed by the SMART log, which showed a critical warning flag <code class="language-plaintext highlighter-rouge">0x4</code> despite zero media errors.</p>
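<p>For reference, that check can be made with <code class="language-plaintext highlighter-rouge">nvme-cli</code>; the device name here is specific to this machine:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Read the SMART / health log for the failed drive. A critical
# warning of 0x4 flags degraded NVM subsystem reliability, which
# can be set even with zero recorded media errors.
nvme smart-log /dev/nvme6n1 | grep -E 'critical_warning|media_errors'
</code></pre></div></div>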

<p>I have logged a ticket with Micron to investigate the failure, but we’d like to get the machine back online as soon as possible. Since the other seven drives have the same firmware, I’m suspicious that another drive or two will fail without warning; therefore, I’m going to rebuild with (extra) redundancy.</p>

<p>There is already a reasonable partition table on each of the drives, so I’m going to reuse it. <code class="language-plaintext highlighter-rouge">md127</code> turned out to be the swap space. I’ll create a RAID6 MD array on p2 across the seven healthy drives (~20GB usable, which is enough for root) and a ZFS RAIDz2 pool on p4, giving ~65TB of usable space.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># lsblk</span>
NAME        MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINTS
nvme6n1     259:0    0   14T  0 disk   
nvme5n1     259:1    0   14T  0 disk   
├─nvme5n1p1 259:2    0  512M  0 part   
├─nvme5n1p2 259:3    0    4G  0 part   
│ └─md127     9:127  0   16G  0 raid10 
├─nvme5n1p3 259:4    0    2G  0 part   
└─nvme5n1p4 259:5    0   14T  0 part   
...
</code></pre></div></div>

<p>Remove <code class="language-plaintext highlighter-rouge">md127</code> and clear the disks.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm <span class="nt">--stop</span> /dev/md127
mdadm <span class="nt">--zero-superblock</span> /dev/nvme<span class="o">{</span>0,1,2,3,4,5,7<span class="o">}</span>n1p2
</code></pre></div></div>

<p>Then create the new array and format it with <code class="language-plaintext highlighter-rouge">ext4</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm <span class="nt">--create</span> /dev/md0 <span class="nt">--level</span><span class="o">=</span>6 <span class="nt">--raid-devices</span><span class="o">=</span>7 <span class="se">\</span>
  /dev/nvme0n1p2 /dev/nvme1n1p2 /dev/nvme2n1p2 <span class="se">\</span>
  /dev/nvme3n1p2 /dev/nvme4n1p2 /dev/nvme5n1p2 /dev/nvme7n1p2

mkfs.ext4 <span class="nt">-L</span> root /dev/md0
</code></pre></div></div>
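<p>The ZFS RAIDz2 pool on the p4 partitions can be created along the same lines; the pool name <code class="language-plaintext highlighter-rouge">tank</code> and the <code class="language-plaintext highlighter-rouge">ashift</code> value are my assumptions:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># RAIDz2 across the p4 partitions of the seven healthy drives,
# matching the two-disk redundancy of the RAID6 root array.
zpool create -o ashift=12 tank raidz2 \
  /dev/nvme{0,1,2,3,4,5,7}n1p4
</code></pre></div></div>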

<p>Since we are in the recovery console, we can start the installation directly by mounting the new file system and running <code class="language-plaintext highlighter-rouge">debootstrap</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount /dev/md0 /mnt
debootstrap noble /mnt http://archive.ubuntu.com/ubuntu
</code></pre></div></div>

<p>Once that completes, prepare for a <code class="language-plaintext highlighter-rouge">chroot</code> environment by mounting the pseudo-filesystems and the first EFI partition.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount <span class="nt">--bind</span> /dev /mnt/dev
mount <span class="nt">--bind</span> /dev/pts /mnt/dev/pts
mount <span class="nt">--bind</span> /proc /mnt/proc
mount <span class="nt">--bind</span> /sys /mnt/sys
mount <span class="nt">--bind</span> /run /mnt/run

<span class="nb">mkdir</span> <span class="nt">-p</span> /mnt/boot/efi
mount /dev/nvme0n1p1 /mnt/boot/efi
</code></pre></div></div>

<p>Chroot into the new environment.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chroot</span> /mnt /bin/bash
</code></pre></div></div>

<p>Then inside the <code class="language-plaintext highlighter-rouge">chroot</code>, create <code class="language-plaintext highlighter-rouge">/etc/fstab</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/dev/md0       /           ext4    errors=remount-ro   0 1
/dev/nvme0n1p1 /boot/efi   vfat    umask=0077          0 1
</code></pre></div></div>

<p>And <code class="language-plaintext highlighter-rouge">/etc/apt/sources.list</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deb http://archive.ubuntu.com/ubuntu noble main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu noble-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu noble-security main restricted universe multiverse
</code></pre></div></div>

<p>Install the kernel, GRUB, the MD and ZFS admin tools, and the OpenSSH server.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt update &amp;&amp; apt install -y linux-image-generic grub-efi-amd64 mdadm zfsutils-linux openssh-server networkd-dispatcher
</code></pre></div></div>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/default/grub</code>. This machine uses a serial console on the second serial port (ttyS1).</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">GRUB_DEFAULT</span><span class="o">=</span>0
<span class="nv">GRUB_TIMEOUT</span><span class="o">=</span>5
<span class="nv">GRUB_DISTRIBUTOR</span><span class="o">=</span><span class="s2">"Ubuntu"</span>
<span class="nv">GRUB_CMDLINE_LINUX_DEFAULT</span><span class="o">=</span><span class="s2">""</span>
<span class="nv">GRUB_CMDLINE_LINUX</span><span class="o">=</span><span class="s2">"console=tty0 console=ttyS1,115200n8"</span>
<span class="nv">GRUB_TERMINAL</span><span class="o">=</span><span class="s2">"console serial"</span>
<span class="nv">GRUB_SERIAL_COMMAND</span><span class="o">=</span><span class="s2">"serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"</span>
</code></pre></div></div>

<p>Enable getty on ttyS1.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl <span class="nb">enable </span>serial-getty@ttyS1.service
</code></pre></div></div>

<p>Update <code class="language-plaintext highlighter-rouge">mdadm.conf</code> so it finds the array at boot.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm <span class="nt">--detail</span> <span class="nt">--scan</span> <span class="o">&gt;&gt;</span> /etc/mdadm/mdadm.conf
update-initramfs <span class="nt">-u</span>
</code></pre></div></div>

<p>Install GRUB to the first EFI partition; for redundancy, it is then copied to the other six.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grub-install <span class="nt">--target</span><span class="o">=</span>x86_64-efi <span class="nt">--efi-directory</span><span class="o">=</span>/boot/efi <span class="nt">--bootloader-id</span><span class="o">=</span>ubuntu <span class="nt">--recheck</span>
</code></pre></div></div>

<p>Copy the EFI bootloader to the other drives.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span>disk <span class="k">in </span>nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1 nvme7n1<span class="p">;</span> <span class="k">do
  </span><span class="nb">mkdir</span> <span class="nt">-p</span> /tmp/efi
  mount /dev/<span class="k">${</span><span class="nv">disk</span><span class="k">}</span>p1 /tmp/efi
  <span class="nb">cp</span> <span class="nt">-r</span> /boot/efi/EFI /tmp/efi/
  umount /tmp/efi
<span class="k">done</span>
</code></pre></div></div>

<p>Update the GRUB installation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>update-grub
</code></pre></div></div>

<p>Set a root password with <code class="language-plaintext highlighter-rouge">passwd root</code>.</p>

<p>Set the hostname.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"pima"</span> <span class="o">&gt;</span> /etc/hostname
</code></pre></div></div>

<p>Create a netplan file, <code class="language-plaintext highlighter-rouge">/etc/netplan/01-netcfg.yaml</code>, to match the configuration of the machine and <code class="language-plaintext highlighter-rouge">chmod</code> it.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chmod </span>600 /etc/netplan/01-netcfg.yaml
</code></pre></div></div>
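<p>As an illustration only, a minimal static-address file might look like the following. The interface name and all addresses here are assumptions, not this machine’s actual configuration.</p>

```yaml
# Hypothetical example - substitute the real interface name and addresses.
network:
  version: 2
  ethernets:
    eno1:
      addresses: [192.0.2.10/24]
      routes:
        - to: default
          via: 192.0.2.1
      nameservers:
        addresses: [192.0.2.53]
```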

<p>Create a regular user with <code class="language-plaintext highlighter-rouge">sudo</code> access and get their keys from GitHub.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>useradd <span class="nt">-m</span> <span class="nt">-s</span> /bin/bash <span class="nt">-G</span> <span class="nb">sudo </span>username
passwd username
<span class="nb">mkdir</span> <span class="nt">-p</span> <span class="nt">-m</span> 0700 /home/username/.ssh
curl <span class="nt">-o</span> /home/username/.ssh/authorized_keys https://github.com/username.keys
<span class="nb">chown </span>username:username /home/username/.ssh /home/username/.ssh/authorized_keys
<span class="nb">chmod </span>0600 /home/username/.ssh/authorized_keys
</code></pre></div></div>

<p>Set the timezone.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ln</span> <span class="nt">-sf</span> /usr/share/zoneinfo/Europe/London /etc/localtime
</code></pre></div></div>

<p>Set the locale.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install</span> <span class="nt">-y</span> locales
locale-gen en_GB.UTF-8
update-locale <span class="nv">LANG</span><span class="o">=</span>en_GB.UTF-8
</code></pre></div></div>

<p>Exit and reboot.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">exit
</span>umount /mnt/boot/efi
umount /mnt/<span class="o">{</span>dev/pts,dev,proc,sys,run<span class="o">}</span>
umount /mnt
reboot
</code></pre></div></div>

<p>Create the ZFS pool.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zpool create <span class="nt">-f</span> <span class="nt">-o</span> <span class="nv">ashift</span><span class="o">=</span>12 data raidz2 <span class="se">\</span>
  /dev/nvme0n1p4 /dev/nvme1n1p4 /dev/nvme2n1p4 <span class="se">\</span>
  /dev/nvme3n1p4 /dev/nvme4n1p4 /dev/nvme5n1p4 /dev/nvme7n1p4
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ubuntu,zfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[Over the weekend, one of the NVMe drives in pima failed, which brought down the whole system.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ubuntu.png" /><media:content medium="image" url="https://www.tunbury.org/images/ubuntu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Moving to opam 2.5</title><link href="https://www.tunbury.org/2026/01/12/opam-25/" rel="alternate" type="text/html" title="Moving to opam 2.5" /><published>2026-01-12T21:00:00+00:00</published><updated>2026-01-12T21:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/12/opam-25</id><content type="html" xml:base="https://www.tunbury.org/2026/01/12/opam-25/"><![CDATA[<p><a href="https://opam.ocaml.org/blog/opam-2-5-0/">opam 2.5.0</a> was released on 27th November, and this update needs to be propagated through the CI infrastructure. This post mirrors the steps taken for the release of <a href="https://www.tunbury.org/2025/07/30/opam-24/">opam 2.4.1</a>.</p>

<h1 id="base-images">Base Images</h1>

<h2 id="linux">Linux</h2>

<p><a href="https://github.com/ocurrent/docker-base-images">ocurrent/docker-base-images</a></p>

<p>The Linux base images are created using the <a href="https://images.ci.ocaml.org/">Docker base image builder</a>, which uses <a href="https://github.com/ocurrent/ocaml-dockerfile">ocurrent/ocaml-dockerfile</a> to know which versions of opam are available. Antonin submitted <a href="https://github.com/ocurrent/ocaml-dockerfile/pull/249">PR#249</a> with the necessary changes. This was released as v8.3.4.</p>

<p>With v8.3.4 released, <a href="https://github.com/ocurrent/docker-base-images/pull/336">PR#336</a> can be opened to update the pipeline to build images which include opam 2.5. Rebuilding the base images requires a significant amount of time, especially since it’s marked as a low-priority task on the cluster.</p>

<h2 id="macos">macOS</h2>

<p><a href="https://github.com/ocurrent/macos-infra">ocurrent/macos-infra</a></p>

<p>Including opam 2.5 in the macOS base images required <a href="https://github.com/ocurrent/macos-infra/pull/58">PR#58</a>, which adds 2.5 to the list of opam packages to download. There are Ansible playbooks that build the macOS base images and recursively remove the old images and their (ZFS) clones. They take about half an hour per machine. I run the Intel and Apple Silicon updates in parallel, but process each pool one at a time.</p>

<p>The Ansible command is: <code class="language-plaintext highlighter-rouge">ansible-playbook update-ocluster.yml</code></p>

<h2 id="freebsd">FreeBSD</h2>

<p><a href="https://github.com/ocurrent/freebsd-infra">ocurrent/freebsd-infra</a></p>

<p>The FreeBSD update parallels the macOS update, requiring that 2.5 be added to the loop of available versions: <a href="https://github.com/ocurrent/freebsd-infra/pull/20">PR#20</a>.</p>

<p>The Ansible command is: <code class="language-plaintext highlighter-rouge">ansible-playbook update.yml</code></p>

<h2 id="windows-thymecaelumcidev">Windows (thyme.caelum.ci.dev)</h2>

<p><a href="https://github.com/ocurrent/obuilder">ocurrent/obuilder</a></p>

<p>The Windows base images are built using a <code class="language-plaintext highlighter-rouge">Makefile</code> which runs unattended builds of Windows using QEMU virtual machines. The Makefile requires <a href="https://github.com/ocurrent/obuilder/pull/202">PR#202</a>. The build command is <code class="language-plaintext highlighter-rouge">make windows</code>.</p>

<p>Once the new images have been built, stop <code class="language-plaintext highlighter-rouge">ocluster worker</code> and move the new base images into place. The next step is to remove <code class="language-plaintext highlighter-rouge">results/*</code>, as these layers link to the old base images, and to remove <code class="language-plaintext highlighter-rouge">state/*</code> so that obuilder creates a new empty database on startup. Avoid removing <code class="language-plaintext highlighter-rouge">cache/*</code>, as this is the download cache for opam objects.</p>

<p>The unattended installation can be monitored via VNC by connecting to localhost:5900.</p>

<h2 id="openbsd-oreganocaelumcidev">OpenBSD (oregano.caelum.ci.dev)</h2>

<p><a href="https://github.com/ocurrent/obuilder">ocurrent/obuilder</a></p>

<p>The OpenBSD base images are built using the same Makefile used for Windows. There is a separate commit in <a href="https://github.com/ocurrent/obuilder/pull/202">PR#202</a> for the changes needed for OpenBSD, which include moving from OpenBSD 7.6 to 7.7. Run <code class="language-plaintext highlighter-rouge">make openbsd</code>.</p>

<p>Once the new images have been built, stop <code class="language-plaintext highlighter-rouge">ocluster worker</code> and move the new base images into place. The next step is to remove <code class="language-plaintext highlighter-rouge">results/*</code>, as these layers link to the old base images, and to remove <code class="language-plaintext highlighter-rouge">state/*</code> so that obuilder creates a new empty database on startup. Avoid removing <code class="language-plaintext highlighter-rouge">cache/*</code>, as this is the download cache for opam objects.</p>

<p>As with Windows, the unattended installation can be monitored via VNC by connecting to localhost:5900.</p>

<h1 id="ocaml-ci">OCaml-CI</h1>

<p><a href="https://ocaml.ci.dev">OCaml-CI</a> uses <a href="https://github.com/ocurrent/ocaml-dockerfile">ocurrent/ocaml-dockerfile</a> as a submodule, so the submodule needs to be updated to the released version. Edits are needed in <code class="language-plaintext highlighter-rouge">lib/opam_version.ml</code> to include V2_5; then the pipeline in <code class="language-plaintext highlighter-rouge">service/conf.ml</code> needs to be updated to use version 2.5 rather than 2.4 for all the different operating systems. Linux is rather more automated than the others.</p>
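<p>The shape of the change is roughly as follows. This is a sketch only; the actual type and function names in <code class="language-plaintext highlighter-rouge">lib/opam_version.ml</code> may be structured differently.</p>

```ocaml
(* Sketch: extend the version type with the new variant and its string form.
   The real definitions in lib/opam_version.ml may differ. *)
type t = V2_0 | V2_1 | V2_2 | V2_3 | V2_4 | V2_5

let to_string = function
  | V2_0 -> "2.0"
  | V2_1 -> "2.1"
  | V2_2 -> "2.2"
  | V2_3 -> "2.3"
  | V2_4 -> "2.4"
  | V2_5 -> "2.5"
```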

<h1 id="opam-repo-ci">opam-repo-ci</h1>

<p><a href="https://opam.ci.ocaml.org">opam-repo-ci</a> tests using the latest tagged version of opam, which is called <code class="language-plaintext highlighter-rouge">opam-dev</code> within the base images. It also explicitly tests against the latest release in each of the 2.x series. With 2.5 being tagged, this will automatically become the used <code class="language-plaintext highlighter-rouge">dev</code> version once the base images are updated, but over time, 2.5 and the latest tagged version will diverge, so <a href="https://github.com/ocurrent/opam-repo-ci/pull/463">PR#463</a> is needed to ensure we continue to test with the released version of 2.5.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[opam 2.5.0 was released on 27th November, and this update needs to be propagated through the CI infrastructure. This post mirrors the steps taken for the release of opam 2.4.1.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Base Fibonacci</title><link href="https://www.tunbury.org/2026/01/11/base-fibonacci/" rel="alternate" type="text/html" title="Base Fibonacci" /><published>2026-01-11T21:00:00+00:00</published><updated>2026-01-11T21:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/11/base-fibonacci</id><content type="html" xml:base="https://www.tunbury.org/2026/01/11/base-fibonacci/"><![CDATA[<p>In Numberphile’s latest <a href="https://www.youtube.com/watch?v=S5FTe5KP2Cw">video</a>, Tony Padilla does a ‘magic trick’ with Fibonacci numbers and talks about Zeckendorf decompositions, and I had my laptop out even before the video ended.</p>

<p>As a summary of the video, a player is asked to pick a number up to a maximum, and mark down which rows of a table their number appears in.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1,4,6,9,12
2,7,10
3,4,11,12
5,6,7
8,9,10,11,12
</code></pre></div></div>

<p>Let’s say I picked seven as my number; it appears in row 2 and row 4. Then, I can <em>magically</em> work out the original number by adding together the first number of each of those rows: 5 + 2 = 7. The first number in each row of the table is a Fibonacci number.</p>
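<p>The recovery step can be sketched in OCaml (the names <code class="language-plaintext highlighter-rouge">rows</code> and <code class="language-plaintext highlighter-rouge">recover</code> are introduced here for illustration): filter the rows containing the chosen number and sum their first entries.</p>

```ocaml
(* The table from above, one list per row. *)
let rows = [
  [1; 4; 6; 9; 12];
  [2; 7; 10];
  [3; 4; 11; 12];
  [5; 6; 7];
  [8; 9; 10; 11; 12];
]

(* Recover the chosen number: sum the first entry of every row containing it. *)
let recover n =
  rows
  |> List.filter (List.mem n)
  |> List.map List.hd
  |> List.fold_left ( + ) 0
```

For seven, the filter keeps rows 2 and 4, whose first entries are 2 and 5, giving 7 back.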

<p>Every positive integer is the sum of one or more Fibonacci numbers, and there are typically multiple ways to write it. However, the Zeckendorf decomposition gives a unique representation by greedily subtracting the largest possible Fibonacci number. Let’s see that in OCaml.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">to_zeckendorf</span> <span class="n">n</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">rec</span> <span class="n">fibs</span> <span class="n">a</span> <span class="n">b</span> <span class="n">acc</span> <span class="o">=</span>
    <span class="k">if</span> <span class="n">a</span> <span class="o">&gt;</span> <span class="n">n</span> <span class="k">then</span> <span class="n">acc</span> <span class="k">else</span> <span class="n">fibs</span> <span class="n">b</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span> <span class="p">(</span><span class="n">a</span> <span class="o">::</span> <span class="n">acc</span><span class="p">)</span>
  <span class="k">in</span>
  <span class="k">let</span> <span class="n">fib_list</span> <span class="o">=</span> <span class="n">fibs</span> <span class="mi">1</span> <span class="mi">2</span> <span class="bp">[]</span> <span class="k">in</span>
  
  <span class="k">let</span> <span class="k">rec</span> <span class="n">convert</span> <span class="n">remaining</span> <span class="n">fibs</span> <span class="n">acc</span> <span class="o">=</span>
    <span class="k">match</span> <span class="n">fibs</span> <span class="k">with</span>
    <span class="o">|</span> <span class="bp">[]</span> <span class="o">-&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">rev</span> <span class="n">acc</span>
    <span class="o">|</span> <span class="n">f</span> <span class="o">::</span> <span class="n">rest</span> <span class="o">-&gt;</span>
        <span class="k">if</span> <span class="n">f</span> <span class="o">&lt;=</span> <span class="n">remaining</span> <span class="k">then</span> <span class="n">convert</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">-</span> <span class="n">f</span><span class="p">)</span> <span class="n">rest</span> <span class="p">(</span><span class="mi">1</span> <span class="o">::</span> <span class="n">acc</span><span class="p">)</span>
        <span class="k">else</span> <span class="n">convert</span> <span class="n">remaining</span> <span class="n">rest</span> <span class="p">(</span><span class="mi">0</span> <span class="o">::</span> <span class="n">acc</span><span class="p">)</span>
  <span class="k">in</span>
  <span class="n">convert</span> <span class="n">n</span> <span class="n">fib_list</span> <span class="bp">[]</span>

<span class="k">let</span> <span class="n">zeck_to_string</span> <span class="n">bits</span> <span class="o">=</span>
  <span class="n">bits</span> <span class="o">|&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="n">string_of_int</span> <span class="o">|&gt;</span> <span class="nn">String</span><span class="p">.</span><span class="n">concat</span> <span class="s2">""</span>
</code></pre></div></div>

<p>Resulting in this binary-ish string representation:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># zeck_to_string (to_zeckendorf 7);;
- : string = "1010"
</code></pre></div></div>
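<p>As a quick sanity check, the bits can be summed back to the original number. <code class="language-plaintext highlighter-rouge">from_zeckendorf</code> is a helper introduced here, assuming the same most-significant-first ordering that <code class="language-plaintext highlighter-rouge">to_zeckendorf</code> produces.</p>

```ocaml
(* Hypothetical inverse: rebuild the number from a most-significant-first
   bit list over the Fibonacci sequence 1, 2, 3, 5, 8, ... *)
let from_zeckendorf bits =
  let k = List.length bits in
  (* First k Fibonacci numbers (seeded with 1 and 2), largest first. *)
  let rec fibs a b i acc =
    if i = 0 then acc else fibs b (a + b) (i - 1) (a :: acc)
  in
  List.fold_left2 (fun acc bit f -> acc + (bit * f)) 0 bits (fibs 1 2 k [])
```

So <code class="language-plaintext highlighter-rouge">from_zeckendorf [1; 0; 1; 0]</code> gives 5 + 2 = 7.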

<p>What we really want, though, is the original table so we can play the game with our friends with even larger numbers.</p>

<p>The simplest approach may be to count up while generating the Fibonacci sequence. This looks reasonably efficient. The <code class="language-plaintext highlighter-rouge">max_fibs</code> constant isn’t a big constraint, as the 94th Fibonacci number is the largest which can be represented in an unsigned 64-bit integer, so we will run out of system resources long before that’s an issue.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">fib_table</span> <span class="n">hi</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">max_fibs</span> <span class="o">=</span> <span class="mi">94</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">fibs</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">make</span> <span class="n">max_fibs</span> <span class="mi">0</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">buckets</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">make</span> <span class="n">max_fibs</span> <span class="bp">[]</span> <span class="k">in</span>

  <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="mi">1</span><span class="p">;</span>
  <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="mi">2</span><span class="p">;</span>

  <span class="k">let</span> <span class="k">rec</span> <span class="n">decompose</span> <span class="n">orig</span> <span class="n">remaining</span> <span class="n">i</span> <span class="o">=</span>
    <span class="k">if</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="k">then</span> <span class="bp">()</span>
    <span class="k">else</span> <span class="k">if</span> <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="n">remaining</span> <span class="k">then</span> <span class="p">(</span>
      <span class="n">buckets</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="n">orig</span> <span class="o">::</span> <span class="n">buckets</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
      <span class="n">decompose</span> <span class="n">orig</span> <span class="p">(</span><span class="n">remaining</span> <span class="o">-</span> <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">))</span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
    <span class="p">)</span> <span class="k">else</span>
      <span class="n">decompose</span> <span class="n">orig</span> <span class="n">remaining</span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
  <span class="k">in</span>

  <span class="k">let</span> <span class="k">rec</span> <span class="n">go</span> <span class="n">n</span> <span class="n">num_fibs</span> <span class="o">=</span>
    <span class="k">if</span> <span class="n">n</span> <span class="o">&gt;</span> <span class="n">hi</span> <span class="k">then</span> <span class="n">num_fibs</span>
    <span class="k">else</span>
      <span class="k">let</span> <span class="n">next</span> <span class="o">=</span> <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="n">num_fibs</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="n">num_fibs</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span> <span class="k">in</span>
      <span class="k">if</span> <span class="n">n</span> <span class="o">&gt;=</span> <span class="n">next</span> <span class="k">then</span> <span class="p">(</span>
        <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="n">num_fibs</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="n">next</span><span class="p">;</span>
        <span class="n">decompose</span> <span class="n">n</span> <span class="n">n</span> <span class="n">num_fibs</span><span class="p">;</span>
        <span class="n">go</span> <span class="p">(</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="n">num_fibs</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
      <span class="p">)</span> <span class="k">else</span> <span class="p">(</span>
        <span class="n">decompose</span> <span class="n">n</span> <span class="n">n</span> <span class="p">(</span><span class="n">num_fibs</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
        <span class="n">go</span> <span class="p">(</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="n">num_fibs</span>
      <span class="p">)</span>
  <span class="k">in</span>

  <span class="k">let</span> <span class="n">num_fibs</span> <span class="o">=</span> <span class="n">go</span> <span class="mi">1</span> <span class="mi">2</span> <span class="k">in</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">init</span> <span class="n">num_fibs</span> <span class="p">(</span><span class="k">fun</span> <span class="n">i</span> <span class="o">-&gt;</span> <span class="p">(</span><span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="o">,</span> <span class="nn">List</span><span class="p">.</span><span class="n">rev</span> <span class="n">buckets</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">)))</span>
  <span class="o">|&gt;</span> <span class="nn">Array</span><span class="p">.</span><span class="n">to_list</span>
</code></pre></div></div>

<p>Here is the resulting table.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># fib_table 100;;
- : (int * int list) list =
[(1, [1; 4; 6; 9; 12; 14; 17; 19; 22; 25; 27; 30; 33; 35; 38; 40; 43; 46; 48; 51; 53; 56; 59; 61; 64; 67; 69; 72; 74; 77; 80; 82; 85; 88; 90; 93; 95; 98]);
 (2, [2; 7; 10; 15; 20; 23; 28; 31; 36; 41; 44; 49; 54; 57; 62; 65; 70; 75; 78; 83; 86; 91; 96; 99]);
 (3, [3; 4; 11; 12; 16; 17; 24; 25; 32; 33; 37; 38; 45; 46; 50; 51; 58; 59; 66; 67; 71; 72; 79; 80; 87; 88; 92; 93; 100]);
 (5, [5; 6; 7; 18; 19; 20; 26; 27; 28; 39; 40; 41; 52; 53; 54; 60; 61; 62; 73; 74; 75; 81; 82; 83; 94; 95; 96]);
 (8, [8; 9; 10; 11; 12; 29; 30; 31; 32; 33; 42; 43; 44; 45; 46; 63; 64; 65; 66; 67; 84; 85; 86; 87; 88; 97; 98; 99; 100]);
 (13, [13; 14; 15; 16; 17; 18; 19; 20; 47; 48; 49; 50; 51; 52; 53; 54; 68; 69; 70; 71; 72; 73; 74; 75]);
 (21, [21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 76; 77; 78; 79; 80; 81; 82; 83; 84; 85; 86; 87; 88]);
 (34, [34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53; 54]);
 (55, [55; 56; 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70; 71; 72; 73; 74; 75; 76; 77; 78; 79; 80; 81; 82; 83; 84; 85; 86; 87; 88]);
 (89, [89; 90; 91; 92; 93; 94; 95; 96; 97; 98; 99; 100])]
</code></pre></div></div>

<p>The algorithm builds up an array of lists during execution and prints the results at the end. We can’t print out row 1 in the table until the entire range has been evaluated. Upon closer examination of the table, a pattern of ranges emerges. For example, for 8, we have the ranges 8-12, 29-33, 42-46, 63-67, 84-88 and finally 97-100. There must be a pattern.</p>

<p>Here are the Fibonacci numbers less than 12.</p>

<table>
  <thead>
    <tr>
      <th>index</th>
      <th>F(n)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>1</td>
      <td>2</td>
    </tr>
    <tr>
      <td>2</td>
      <td>3</td>
    </tr>
    <tr>
      <td>3</td>
      <td>5</td>
    </tr>
    <tr>
      <td>4</td>
      <td>8</td>
    </tr>
  </tbody>
</table>

<p>We want all numbers whose decomposition contains <code class="language-plaintext highlighter-rouge">1</code>. These are <code class="language-plaintext highlighter-rouge">1</code> itself plus all of <code class="language-plaintext highlighter-rouge">Z + 1</code>, where <code class="language-plaintext highlighter-rouge">Z</code> is any valid combination from <code class="language-plaintext highlighter-rouge">{3, 5, 8}</code>, the subset of the Fibonacci sequence greater than <code class="language-plaintext highlighter-rouge">2</code>. We can’t use <code class="language-plaintext highlighter-rouge">2</code>, as a Zeckendorf decomposition cannot contain consecutive Fibonacci numbers (by definition).</p>

<p>Starting with the highest Fibonacci number in our subset, <code class="language-plaintext highlighter-rouge">8</code>, we cannot use <code class="language-plaintext highlighter-rouge">5</code>, but can use <code class="language-plaintext highlighter-rouge">3</code>, resulting in <code class="language-plaintext highlighter-rouge">8 + 1</code>, <code class="language-plaintext highlighter-rouge">8 + 3 + 1</code>, aka <code class="language-plaintext highlighter-rouge">9</code> and <code class="language-plaintext highlighter-rouge">12</code>. Then, taking our next highest starting number of <code class="language-plaintext highlighter-rouge">5</code>, we have only <code class="language-plaintext highlighter-rouge">5 + 1</code>, aka <code class="language-plaintext highlighter-rouge">6</code> and finally <code class="language-plaintext highlighter-rouge">3 + 1</code> aka <code class="language-plaintext highlighter-rouge">4</code>. The result is <code class="language-plaintext highlighter-rouge">1, 4, 6, 9, 12</code>.</p>

<p>Continuing to the next row in the output, we now need to find all the numbers containing <code class="language-plaintext highlighter-rouge">2</code> which are <code class="language-plaintext highlighter-rouge">Z + 2</code> where Z is <code class="language-plaintext highlighter-rouge">{5, 8}</code>. This results in <code class="language-plaintext highlighter-rouge">8 + 2</code> and <code class="language-plaintext highlighter-rouge">5 + 2</code>, resulting in <code class="language-plaintext highlighter-rouge">2, 7, 10</code>.</p>

<p>This can be written as a recursive algorithm which requires no storage beyond the Fibonacci sequence itself. It prints the numbers as they are generated.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">fib_print</span> <span class="n">hi</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">rec</span> <span class="n">build</span> <span class="n">a</span> <span class="n">b</span> <span class="n">acc</span> <span class="o">=</span>
    <span class="k">if</span> <span class="n">a</span> <span class="o">&gt;</span> <span class="n">hi</span> <span class="k">then</span> <span class="nn">Array</span><span class="p">.</span><span class="n">of_list</span> <span class="p">(</span><span class="nn">List</span><span class="p">.</span><span class="n">rev</span> <span class="n">acc</span><span class="p">)</span>
    <span class="k">else</span> <span class="n">build</span> <span class="n">b</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span> <span class="p">(</span><span class="n">a</span> <span class="o">::</span> <span class="n">acc</span><span class="p">)</span>
  <span class="k">in</span>
  <span class="k">let</span> <span class="n">fibs</span> <span class="o">=</span> <span class="n">build</span> <span class="mi">1</span> <span class="mi">2</span> <span class="bp">[]</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">n</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">length</span> <span class="n">fibs</span> <span class="k">in</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">iteri</span> <span class="p">(</span><span class="k">fun</span> <span class="n">k</span> <span class="n">fk</span> <span class="o">-&gt;</span>
    <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"%d:"</span> <span class="n">fk</span><span class="p">;</span>
    <span class="k">let</span> <span class="k">rec</span> <span class="n">go</span> <span class="n">idx</span> <span class="n">value</span> <span class="n">prev_used</span> <span class="o">=</span>
      <span class="k">if</span> <span class="n">fk</span> <span class="o">+</span> <span class="n">value</span> <span class="o">&gt;</span> <span class="n">hi</span> <span class="k">then</span> <span class="bp">()</span>
      <span class="k">else</span> <span class="k">if</span> <span class="n">idx</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="k">then</span>
        <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">" %d"</span> <span class="p">(</span><span class="n">fk</span> <span class="o">+</span> <span class="n">value</span><span class="p">)</span>
      <span class="k">else</span> <span class="k">if</span> <span class="n">idx</span> <span class="o">&gt;=</span> <span class="n">k</span> <span class="o">-</span> <span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">idx</span> <span class="o">&lt;=</span> <span class="n">k</span> <span class="o">+</span> <span class="mi">1</span> <span class="k">then</span>
        <span class="n">go</span> <span class="p">(</span><span class="n">idx</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="n">value</span> <span class="bp">false</span>
      <span class="k">else</span> <span class="p">(</span>
        <span class="n">go</span> <span class="p">(</span><span class="n">idx</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="n">value</span> <span class="bp">false</span><span class="p">;</span>
        <span class="k">if</span> <span class="n">not</span> <span class="n">prev_used</span> <span class="k">then</span>
          <span class="n">go</span> <span class="p">(</span><span class="n">idx</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="n">value</span> <span class="o">+</span> <span class="n">fibs</span><span class="o">.</span><span class="p">(</span><span class="n">idx</span><span class="p">))</span> <span class="bp">true</span>
      <span class="p">)</span>
    <span class="k">in</span>
    <span class="n">go</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="mi">0</span> <span class="bp">false</span><span class="p">;</span>
    <span class="n">print_newline</span> <span class="bp">()</span>
  <span class="p">)</span> <span class="n">fibs</span>
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="fibonacci" /><category term="tunbury.org" /><summary type="html"><![CDATA[In Numberphile’s latest video, Tony Padilla does a ‘magic trick’ with Fibonacci numbers and talks about Zeckendorf decompositions, and I had my laptop out even before the video ended.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/base-fibonacci.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/base-fibonacci.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">More OCaml on Pi Pico 2 W</title><link href="https://www.tunbury.org/2026/01/10/ocaml-pico/" rel="alternate" type="text/html" title="More OCaml on Pi Pico 2 W" /><published>2026-01-10T21:00:00+00:00</published><updated>2026-01-10T21:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/10/ocaml-pico</id><content type="html" xml:base="https://www.tunbury.org/2026/01/10/ocaml-pico/"><![CDATA[<p>Extending the Pico 2 implementation to add effects-based WiFi networking and improve the build system.</p>

<h1 id="pio">Pio</h1>

<p>Pio is an effects-based I/O library for OCaml 5 running bare-metal on Raspberry Pi Pico 2 W. It provides an API compatible with <a href="https://github.com/ocaml-multicore/eio">Eio</a>, enabling direct-style concurrent programming with cooperative fibers and non-blocking network I/O. The Pico SDK provides lwIP and CYW43 drivers, but these are not thread-safe.</p>

<p>The code matches the Eio style, for example:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">(* Main entry point *)</span>
<span class="nn">Pio</span><span class="p">.</span><span class="n">run</span> <span class="p">(</span><span class="k">fun</span> <span class="n">sw</span> <span class="o">-&gt;</span>
  <span class="c">(* Fork concurrent fibers *)</span>
  <span class="k">let</span> <span class="n">p1</span> <span class="o">=</span> <span class="nn">Pio</span><span class="p">.</span><span class="nn">Fiber</span><span class="p">.</span><span class="n">fork_promise</span> <span class="o">~</span><span class="n">sw</span> <span class="p">(</span><span class="k">fun</span> <span class="bp">()</span> <span class="o">-&gt;</span>
    <span class="nn">Net</span><span class="p">.</span><span class="nn">Tcp</span><span class="p">.</span><span class="n">connect</span> <span class="o">~</span><span class="n">host</span><span class="o">:</span><span class="s2">"example.com"</span> <span class="o">~</span><span class="n">port</span><span class="o">:</span><span class="mi">80</span>
    <span class="o">...</span>
  <span class="p">)</span> <span class="k">in</span>

  <span class="c">(* CPU work on Core 1 *)</span>
  <span class="k">let</span> <span class="n">d</span> <span class="o">=</span> <span class="nn">Domain</span><span class="p">.</span><span class="n">spawn</span> <span class="p">(</span><span class="k">fun</span> <span class="bp">()</span> <span class="o">-&gt;</span> <span class="n">heavy_computation</span> <span class="bp">()</span><span class="p">)</span> <span class="k">in</span>

  <span class="c">(* Await results *)</span>
  <span class="k">let</span> <span class="n">result1</span> <span class="o">=</span> <span class="nn">Pio</span><span class="p">.</span><span class="nn">Promise</span><span class="p">.</span><span class="n">await_exn</span> <span class="n">p1</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">result2</span> <span class="o">=</span> <span class="nn">Domain</span><span class="p">.</span><span class="n">join</span> <span class="n">d</span> <span class="k">in</span>
  <span class="o">...</span>
<span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  ┌─────────────────────────────────────────────────────────────┐
  │                    Pico 2 W (RP2350)                        │
  ├─────────────────────────────┬───────────────────────────────┤
  │         Core 0              │           Core 1              │
  │  ┌───────────────────────┐  │  ┌──────────────────────────┐ │
  │  │    Pio Scheduler      │  │  │    Domain.spawn          │ │
  │  │  ┌─────┐ ┌─────┐      │  │  │  ┌──────────────────┐    │ │
  │  │  │Fiber│ │Fiber│ ...  │  │  │  │ Pure computation │    │ │
  │  │  └──┬──┘ └──┬──┘      │  │  │  │ (no effects)     │    │ │
  │  │     └───┬───┘         │  │  │  └──────────────────┘    │ │
  │  │         ▼             │  │  └──────────────────────────┘ │
  │  │   Effect Handlers     │  │              │                │
  │  │   (Fork, Await,       │  │              │                │
  │  │    Tcp_*, Udp_*)      │  │              │                │
  │  └───────────────────────┘  │              │                │
  │            │                │              │                │
  │            ▼                │              │                │
  │    lwIP + CYW43 WiFi        │       Domain.join             │
  └─────────────────────────────┴───────────────────────────────┘
</code></pre></div></div>

<h1 id="build-system">Build system</h1>

<p>In a chance conversation with David, he was surprised that I had needed so much manual effort to complete the build. He pointed out the compiler&#8217;s <code class="language-plaintext highlighter-rouge">-output-obj</code> command-line option.</p>

<p>Compiling with <code class="language-plaintext highlighter-rouge">-output-obj -without-runtime</code> automatically provides <code class="language-plaintext highlighter-rouge">caml_program</code>, <code class="language-plaintext highlighter-rouge">caml_globals</code>, <code class="language-plaintext highlighter-rouge">caml_code_segments</code> and <code class="language-plaintext highlighter-rouge">caml_exn_*</code>, all of which I had stubbed, as well as <code class="language-plaintext highlighter-rouge">caml_frametable</code>, which I had covered with <code class="language-plaintext highlighter-rouge">frametable.S</code>, and the <code class="language-plaintext highlighter-rouge">caml_curry*</code> and <code class="language-plaintext highlighter-rouge">caml_apply*</code> functions, which I had manually created from <code class="language-plaintext highlighter-rouge">curry.ml</code>.</p>

<p>This results in a single OCaml compilation step, followed by a linking step combining <code class="language-plaintext highlighter-rouge">ocaml_code.o</code> with <code class="language-plaintext highlighter-rouge">libasmrun.a</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/home/mtelvers/ocaml/ocamlopt.opt <span class="se">\</span>
    <span class="nt">-I</span> /home/mtelvers/ocaml/stdlib <span class="se">\</span>
    /home/mtelvers/ocaml/stdlib/stdlib.cmxa <span class="se">\</span>
    <span class="nt">-farch</span> armv8-m.main <span class="nt">-ffpu</span> soft <span class="nt">-fthumb</span> <span class="se">\</span>
    <span class="nt">-output-obj</span> <span class="nt">-without-runtime</span> <span class="se">\</span>
    <span class="nt">-o</span> <span class="k">${</span><span class="nv">CMAKE_CURRENT_BINARY_DIR</span><span class="k">}</span>/ocaml_code.o <span class="se">\</span>
    net.ml pio.ml hello.ml
</code></pre></div></div>

<p>The only disadvantage this gave me was that it used slightly more memory than before. The increased memory requirement came from properly initialising all the stdlib modules, where I had been selective before.</p>

<p>The space was recovered by reducing <code class="language-plaintext highlighter-rouge">POOL_WSIZE</code>, the allocation size for major heap pools.</p>

<ol>
  <li>Module initialisation creates OCaml values (closures, data structures, etc.)</li>
  <li>These values are first allocated in the minor heap (8KB per domain)</li>
  <li>When the minor heap fills up, or during GC, surviving objects are promoted to the major heap</li>
  <li>The major heap grows by allocating pools of <code class="language-plaintext highlighter-rouge">POOL_WSIZE</code> words (reduced from 16KB to 8KB)</li>
  <li>Multiple objects from multiple modules share pools</li>
</ol>

<p>Objects are packed into pools by size class, which is where the saving is made. By default, there are 32 size classes, and objects of different sizes cannot share the same pool; thus, there can be underutilised pools. On a normal system this wouldn’t matter, but with only 520KB of RAM it is significant.</p>

<p>With <code class="language-plaintext highlighter-rouge">POOL_WSIZE</code> at 4096 words (16K), 17 pools were created for a total of 272K; with the smaller 8K (2048-word) pools, 20 pools are allocated but only 160K is used.</p>
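<p>The arithmetic behind those figures can be checked directly. The RP2350 is a 32-bit core, so a word is 4 bytes and a pool of <code class="language-plaintext highlighter-rouge">POOL_WSIZE</code> words costs <code class="language-plaintext highlighter-rouge">POOL_WSIZE * 4</code> bytes:</p>

```ocaml
(* Sanity-check the pool-memory figures quoted above (RP2350: 4-byte words). *)
let word_bytes = 4

let pool_bytes wsize = wsize * word_bytes

(* total size in KB of [pools] pools of [wsize] words each *)
let total_kib ~pools ~wsize = pools * pool_bytes wsize / 1024

let () =
  (* 17 pools of 4096 words (16 KB each) = 272 KB *)
  assert (pool_bytes 4096 = 16 * 1024);
  assert (total_kib ~pools:17 ~wsize:4096 = 272);
  (* 20 pools of 2048 words (8 KB each) = 160 KB *)
  assert (total_kib ~pools:20 ~wsize:2048 = 160)
```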

<p>The change is made by editing <code class="language-plaintext highlighter-rouge">let arena = 2048</code> in <code class="language-plaintext highlighter-rouge">tools/gen_sizeclasses.ml</code> and regenerating <code class="language-plaintext highlighter-rouge">runtime/caml/sizeclasses.h</code>. The blocksizes function (lines 35-47) recursively builds size classes from 128 down to 1, adding a new size class whenever the overhead would exceed 10.1%. The change increases the number of size classes from 32 to 35.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,pico" /><category term="tunbury.org" /><summary type="html"><![CDATA[Extending the Pico 2 implementation to add effects-based WiFi networking and improve the build system.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-pico.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-pico.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ceph Notes</title><link href="https://www.tunbury.org/2026/01/06/ceph-notes/" rel="alternate" type="text/html" title="Ceph Notes" /><published>2026-01-06T12:00:00+00:00</published><updated>2026-01-06T12:00:00+00:00</updated><id>https://www.tunbury.org/2026/01/06/ceph-notes</id><content type="html" xml:base="https://www.tunbury.org/2026/01/06/ceph-notes/"><![CDATA[<p>We now have 209 TB of data on a seven-node Ceph cluster. Here are some further notes on using Ceph.</p>

<h1 id="mounts">Mounts</h1>

<p>To mount a Ceph FS volume, obtain the base64-encoded secret using this command.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cephadm shell <span class="nt">--</span> ceph auth get-key client.admin
</code></pre></div></div>

<p>Then pass that as an option to the <code class="language-plaintext highlighter-rouge">mount</code> command.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount <span class="nt">-o</span> <span class="nv">name</span><span class="o">=</span>admin,secret<span class="o">=</span>&lt;base64-data&gt; <span class="nt">-t</span> ceph &lt;fqdn&gt;:6789:/ /mnt/cephfs
</code></pre></div></div>

<p>You can create additional users using <code class="language-plaintext highlighter-rouge">ceph auth get-or-create client.foo ...</code> with different access permissions.</p>
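<p>For example, a client limited to read-only access could be created with <code class="language-plaintext highlighter-rouge">ceph fs authorize</code>, a convenience wrapper that generates the CephFS capability strings for you (the client name here is illustrative):</p>

```shell
# Create client.foo with read-only capabilities on the root of the cephfs file system
cephadm shell -- ceph fs authorize cephfs client.foo / r
```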

<p>You can provide a comma-separated list of Ceph monitor machines. The client tries to connect to these in sequence to provide redundancy during the initial connection phase. This isn’t for load balancing.</p>
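<p>In the style of the earlier mount command, listing every monitor gives the client a fallback if the first is unreachable:</p>

```shell
mount -o name=admin,secret=<base64-data> -t ceph <fqdn1>:6789,<fqdn2>:6789,<fqdn3>:6789:/ /mnt/cephfs
```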

<p>Once the mount has been set up, the client communicates directly with the metadata server and the individual OSD daemons, bypassing the monitor machine.</p>

<h1 id="subvolumes">Subvolumes</h1>

<p>Our source data is on ZFS, which has a multitude of file system features. It’s worth noting that Ceph FS has <em>subvolumes</em> which provide snapshots, quotas, clone capabilities and namespaces. As in ZFS, these need to be created in advance, which, fortunately, I had done!</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph fs subvolumegroup create cephfs tessera
ceph fs subvolume create cephfs v1 <span class="nt">--group_name</span> tessera
</code></pre></div></div>

<p>These structures do not support arbitrary depths like ZFS; you are limited to a two level hierarchy of subvolume groups and, within that, multiple subvolumes, like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Filesystem
├── Subvolume Group (e.g., "tessera")
│   ├── Subvolume (e.g., "v1")
│   ├── Subvolume (e.g., "v2")
│   └── Subvolume (e.g., "v3")
└── Subvolume Group (e.g., "other-project")
    └── Subvolume (e.g., "data")
</code></pre></div></div>

<p>The subvolumes appear on disk as UUID-named directories, e.g.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@ceph-1:~# <span class="nb">du</span> /mnt/cephfs/
0    /mnt/cephfs/volumes/tessera/v1/dec6285d-84a2-4d34-9e8b-469d1c6180a8
1    /mnt/cephfs/volumes/tessera/v1
1    /mnt/cephfs/volumes/tessera
1    /mnt/cephfs/volumes
1    /mnt/cephfs/
</code></pre></div></div>
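<p>Rather than picking the UUID out of <code class="language-plaintext highlighter-rouge">du</code> output, the full path of a subvolume can be queried directly:</p>

```shell
cephadm shell -- ceph fs subvolume getpath cephfs v1 --group_name tessera
# e.g. /volumes/tessera/v1/dec6285d-84a2-4d34-9e8b-469d1c6180a8
```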

<p>The subvolume path structure is non-negotiable; therefore, I have used symlinks to match the original structure.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ln</span> <span class="nt">-s</span> ../volumes/tessera/v1/dec6285d-84a2-4d34-9e8b-469d1c6180a8 /mnt/cephfs/tessera/v1
</code></pre></div></div>

<h1 id="copying-data">Copying data</h1>

<p>This Ceph cluster is composed of Scaleway machines, which are interconnected at 1 Gb/s. This is far from ideal, particularly as my source/client machine has 10 Gb/s networking.</p>

<p>The go-to tool for this is <code class="language-plaintext highlighter-rouge">rsync</code>, but the upfront file scan on large directories was extremely slow. <code class="language-plaintext highlighter-rouge">rclone</code> proved more effective in streaming files while scanning the tree simultaneously.</p>

<p>Initially, I mounted the Ceph file system on one of the Ceph machines and used <code class="language-plaintext highlighter-rouge">rclone</code> to copy from the client to that machine’s local mount point. However, this created a bottleneck, as the incoming interface only operates at 1 Gb/s, resulting in a best-case transfer speed of ~100 MB/s. That machine received the data and then retransmitted it to the cluster machine holding the OSD, so the interface was maxed out in both directions. In practice, I saw a maximum write rate of ~70 MB/s.</p>

<p>However, mounting the Ceph cluster directly from the client machine means that the client, with 10 Gb/s networking, can communicate directly with multiple cluster machines.</p>

<p>On the client machine, first install <code class="language-plaintext highlighter-rouge">ceph-common</code> using your package manager. Then, copy <code class="language-plaintext highlighter-rouge">ceph.conf</code> and <code class="language-plaintext highlighter-rouge">ceph.client.admin.keyring</code> from the cluster to the local machine, and finally mount using the earlier commands.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>scp root@ceph-1:/etc/ceph/ceph.conf /etc/ceph
scp root@ceph-1:/etc/ceph/ceph.client.admin.keyring /etc/ceph
</code></pre></div></div>

<p>The sync is now between two local mounts on the client machine. I used <code class="language-plaintext highlighter-rouge">rclone</code> again as it still outperformed <code class="language-plaintext highlighter-rouge">rsync</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rclone <span class="nb">sync</span> /data/tessera/v1/ /mnt/cephfs/tessera/v1/ <span class="nt">--transfers</span> 16 <span class="nt">--progress</span> <span class="nt">--stats</span> 10s <span class="nt">--checkers</span> 1024 <span class="nt">--max-backlog</span> 10000000 <span class="nt">--modify-window</span> 1s
</code></pre></div></div>

<p>With this configuration, I saw write speeds of around 350 MB/s.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ceph" /><category term="tunbury.org" /><summary type="html"><![CDATA[We now have 209 TB of data on a seven-node Ceph cluster. Here are some further notes on using Ceph.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ceph-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ceph-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Multi Domain OCaml on Raspberry Pi Pico 2 Microcontroller</title><link href="https://www.tunbury.org/2025/12/31/ocaml-pico/" rel="alternate" type="text/html" title="Multi Domain OCaml on Raspberry Pi Pico 2 Microcontroller" /><published>2025-12-31T17:00:00+00:00</published><updated>2025-12-31T17:00:00+00:00</updated><id>https://www.tunbury.org/2025/12/31/ocaml-pico</id><content type="html" xml:base="https://www.tunbury.org/2025/12/31/ocaml-pico/"><![CDATA[<p>Running OCaml 5 with multicore support on bare-metal Raspberry Pi Pico 2 W (RP2350, ARM Cortex-M33).</p>

<p>The OCaml Arm32 backend, which <a href="https://www.tunbury.org/2025/11/27/ocaml-54-native/">I updated to OCaml 5 Domains</a>, generates ARMv7-A code (Application profile), but the Pico 2’s Cortex-M33 is ARMv8-M (Microcontroller profile). These instruction sets are compatible (both using Thumb-2), but the object file metadata differs. The linker will not mix “A” and “M” profiles.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: hello.o: conflicting architecture profiles A/M
</code></pre></div></div>

<p>Initially, I worked with the existing Arm32 support, compiling to assembly files from OCaml and then patching them with <code class="language-plaintext highlighter-rouge">sed</code> and reassembling with <code class="language-plaintext highlighter-rouge">arm-none-eabi-as</code> to get a Cortex-M compatible object file.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'s/.arch[[:space:]]*armv7-a/.arch armv8-m.main/'</span> <span class="se">\</span>
    <span class="nt">-e</span> <span class="s1">'s/.fpu[[:space:]]*softvfp/.fpu fpv5-sp-d16/'</span> <span class="se">\</span>
    hello.s.orig <span class="o">&gt;</span> hello.s
</code></pre></div></div>

<p>After a while, I decided to add a new architecture to the ARM backend to avoid the external processing. The Cortex-M33 has a single-precision-only FPU. OCaml’s float type is double-precision (64-bit), so the hardware FPU cannot accelerate OCaml floats. The default Pico SDK linker script copies some code to RAM for faster execution, including the soft-float routines. I have used a custom linker script to put everything in flash to maximise the memory available for the OCaml heap.</p>

<p>Creating a minimal runtime was relatively simple. OCaml’s calling convention puts the function pointer in r7 and calls <code class="language-plaintext highlighter-rouge">caml_c_call</code>. My function calls <code class="language-plaintext highlighter-rouge">blx r7</code> to invoke the actual C function. OCaml expects r8, r10, r11 to hold runtime state, so these are initialised with minimal structures.</p>

<ul>
  <li>r8 - trap_ptr (exception handler)</li>
  <li>r10 - alloc_ptr (allocation pointer)</li>
  <li>r11 - domain_state_ptr (runtime state)</li>
</ul>

<p>Thus, creating a simple program using OCaml syntax was now possible. It was also possible to have recursive functions to calculate a factorial; however, there was no garbage collector, no exception handling, no standard library and no multicore/domain support.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">external</span> <span class="n">pico_print</span> <span class="o">:</span> <span class="kt">string</span> <span class="o">-&gt;</span> <span class="kt">unit</span> <span class="o">=</span> <span class="s2">"pico_print"</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="n">pico_print</span> <span class="s2">"Hello from OCaml!"</span>
</code></pre></div></div>

<p>This limited success, though, was enough to inspire me to push on to the second phase. I added per-core thread-local storage and provided a mapping between pthread and Pico SDK primitives. The Pico SDK does not provide condition variables, so I implemented a simple polling solution.</p>

<p>OCaml’s <code class="language-plaintext highlighter-rouge">Domain.spawn</code> calls <code class="language-plaintext highlighter-rouge">pthread_create()</code>, which now calls <code class="language-plaintext highlighter-rouge">multicore_launch_core1_with_stack()</code> from the Pico SDK. OCaml creates a backup thread which handles stop-the-world GC synchronisation when a domain’s main thread is blocked. On the Pico, I fake the backup thread by only creating a real thread on every other call to <code class="language-plaintext highlighter-rouge">pthread_create()</code>. Since there is no backup thread, in <code class="language-plaintext highlighter-rouge">pthread_cond_wait()</code> and <code class="language-plaintext highlighter-rouge">pthread_mutex_lock()</code>, and even in <code class="language-plaintext highlighter-rouge">_write</code>, I poll the STW interrupt flag to simulate what the backup thread would do on a real OS.</p>

<p>All of Stdlib compiles, but I only initialise 25 modules, which don’t have extensive OS dependencies.</p>

<ul>
  <li>CamlinternalFormatBasics, Stdlib, Either, Sys, Obj, Type</li>
  <li>Atomic, CamlinternalLazy, Lazy, Seq, Option, Pair, Result</li>
  <li>Bool, Char, Uchar, List, Int, Array, Bytes, String, Unit</li>
  <li>Mutex, Condition, Domain</li>
</ul>

<p>The curry functions are generated at link time by the OCaml linker. I am using the Pico SDK linker, <code class="language-plaintext highlighter-rouge">arm-none-eabi-ld</code>, and therefore the curry functions are not generated automatically. The workaround was to create a dummy OCaml file with enough partial applications to force the generation of <code class="language-plaintext highlighter-rouge">caml_curry2-8</code>, extract them to an assembly file, <code class="language-plaintext highlighter-rouge">curry.s</code>, and add that to <code class="language-plaintext highlighter-rouge">libstdlib_pico.a</code> for linking.</p>
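<p>The dummy module can be a handful of partial applications; the sketch below is illustrative rather than the author’s exact file, and exactly which <code class="language-plaintext highlighter-rouge">caml_curryN</code> helper each line triggers is an assumption:</p>

```ocaml
(* Illustrative dummy module: each partial application below makes the
   OCaml native-code linker emit a caml_curryN helper for that arity.
   (The mapping from line to helper is assumed, not verified output.) *)
let add3 a b c = a + b + c
let add8 a b c d e f g h = a + b + c + d + e + f + g + h

let () =
  let p3 = add3 1 in     (* partial application of a 3-ary function *)
  let p8 = add8 1 2 3 in (* partial application of an 8-ary function *)
  assert (p3 2 3 = 6);
  assert (p8 4 5 6 7 8 = 36)
```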

<p>As a test, I used the prime number benchmark I used for the original Arm32 work to count the number of prime numbers less than 1 million and compared the single-core and dual-core performance.</p>

<table>
  <thead>
    <tr>
      <th>Test</th>
      <th>Time</th>
      <th>Primes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Single-core</td>
      <td>21,166 ms</td>
      <td>78,498</td>
    </tr>
    <tr>
      <td>Dual-core</td>
      <td>12,350 ms</td>
      <td>78,498</td>
    </tr>
    <tr>
      <td>Speedup</td>
      <td>1.71x</td>
      <td> </td>
    </tr>
  </tbody>
</table>
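<p>The shape of such a benchmark can be sketched as follows; this is an illustrative reconstruction, not the author’s exact code. The range is split in two, with the upper half counted on the second core via <code class="language-plaintext highlighter-rouge">Domain.spawn</code>:</p>

```ocaml
(* Illustrative two-domain prime count in the spirit of the benchmark. *)
let is_prime n =
  if n < 2 then false
  else
    (* trial division up to sqrt n *)
    let rec go d = d * d > n || (n mod d <> 0 && go (d + 1)) in
    go 2

let count_primes lo hi =
  let c = ref 0 in
  for n = lo to hi do
    if is_prime n then incr c
  done;
  !c

let count_primes_parallel limit =
  let mid = limit / 2 in
  (* the second core handles the upper half while this core does the lower *)
  let d = Domain.spawn (fun () -> count_primes (mid + 1) limit) in
  let low = count_primes 2 mid in
  low + Domain.join d
```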

<p>The code for this project is available in <a href="https://github.com/mtelvers/pico_ocaml">mtelvers/pico_ocaml</a> and <a href="https://github.com/mtelvers/ocaml">mtelvers/ocaml</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,pico" /><category term="tunbury.org" /><summary type="html"><![CDATA[Running OCaml 5 with multicore support on bare-metal Raspberry Pi Pico 2 W (RP2350, ARM Cortex-M33).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-pico.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-pico.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Partridge Puzzle</title><link href="https://www.tunbury.org/2025/12/17/partridge-puzzle/" rel="alternate" type="text/html" title="Partridge Puzzle" /><published>2025-12-17T23:00:00+00:00</published><updated>2025-12-17T23:00:00+00:00</updated><id>https://www.tunbury.org/2025/12/17/partridge-puzzle</id><content type="html" xml:base="https://www.tunbury.org/2025/12/17/partridge-puzzle/"><![CDATA[<p>Quite often, you install a new package from opam and are keen to jump right in and want a tailored example for your use case. This happened to me during <a href="https://www.tunbury.org/2025/12/12/advent-of-code/">Day 12</a> of <a href="https://adventofcode.com">Advent of Code</a>.</p>

<p>I had identified the <a href="https://opam.ocaml.org/packages/combine/">combine</a> package as potentially suitable to solve the day’s problem of placing shapes in a grid. I cloned the git repository.</p>

<p>I prompted Claude Code with <code class="language-plaintext highlighter-rouge">Please can you read the combine directory</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>● Here's a summary of the combine directory:

  Overview

  Combine is an OCaml library for combinatorics problem solving, created by
  Remy El Sibaie and Jean-Christophe Filliâtre (2012-2014). Licensed under
  LGPL.

  Main Modules

  The library contains four main modules:
  - Dlx: Implements Knuth's dancing links algorithm
  - Zdd: Implements Zero-suppressed binary decision diagrams
  - Emc: Common interface to Dlx and Zdd to solve Exact Matrix Cover problems (+ SAT reduction)
  - Tiling: Converts 2D tiling problems into EMC problems

  Project Structure

  combine/
  ├── src/
  │   ├── main.ml           # CLI entry point
  │   └── lib/              # Core library
  │       ├── dlx.ml/mli    # Dancing links
  │       ├── zdd.ml/mli    # ZDD implementation
  │       ├── emc.ml/mli    # Exact Matrix Cover
  │       ├── tiling.ml/mli # 2D tiling
  │       ├── parser.mly    # Menhir parser
  │       ├── lexer.mll     # OCamllex lexer
  │       └── ...           # Other modules
  ├── tests/                # Test files
  ├── examples/
  │   ├── cmb/              # .cmb tiling problem files (pentominoes, dominos, etc.)
  │   └── lib/              # Example programs (queens, sudoku, color)
  ├── misc/                 # Statistics and test data
  ├── www/                  # Website files
  ├── dune-project          # Dune build (v2.4, uses menhir)
  └── combine.opam          # Package definition

  Dependencies

  - OCaml ≥ 4.07
  - Dune ≥ 2.4
  - Menhir ≥ 20200211
  - Zarith
  - Optional: graphics, ocamlgraph

  Tiling Language

  The project includes a DSL (.cmb files) for describing 2D tiling problems,
   with operations like union, intersection, shift, rotation, and various
  isometries.
</code></pre></div></div>

<p>This is perfect for knowing that you’re on the right track. I continued with Claude and asked, “Please, can you show me an example of how to use the API to tile a 5x3 rectangle with 3 L pentominoes?”</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>● Here's how to use the Tiling API programmatically for your 3 L pentominoes
   example:

  open Combine

  (* Define the L pentomino pattern as a bool matrix *)
  let l_pattern = Tiling.Pattern.create [|
    [| true;  false; false; false |];  (* bottom row: *... *)
    [| true;  true;  true;  true  |];  (* top row:    **** *)
  |]

  (* Create 3 L tiles with ~one multiplicity and all symmetries *)
  let l_tile () =
    Tiling.Tile.create ~name:"L" ~s:Sall ~m:Mone l_pattern

  let tiles = [ l_tile (); l_tile (); l_tile () ]

  (* Define a 5x3 grid (all true = all cells available) *)
  let grid = Tiling.Pattern.create (Array.make_matrix 3 5 true)

  (* Create the problem *)
  let problem = Tiling.Problem.create ~name:"three_Ls" grid tiles

  (* Convert to EMC *)
  let emc = Tiling.Problem.ToEMC.make problem

  (* Create DLX solver *)
  let dlx = Emc.D.create_sparse ~primary:emc.primary ~columns:emc.columns
  emc.emc

  (* Check if solution exists *)
  let has_solution =
    try
      let _ = Emc.D.find_solution dlx in
      true
    with Not_found -&gt;
      false

  (* Or get the solution *)
  let solution = Emc.D.find_solution dlx  (* raises Not_found if none *)

  (* Print ASCII output *)
  let () =
    Format.printf "%a@."
      (Tiling.Problem.ToEMC.print_solution_ascii problem emc)
      solution

  Key API functions:

  | Function                                  | Purpose                                    |
  |-------------------------------------------|--------------------------------------------|
  | Tiling.Pattern.create                     | Create a pattern from bool array array     |
  | Tiling.Tile.create ~s ~m pattern          | Create a tile with symmetries/multiplicity |
  | Tiling.Problem.create grid tiles          | Create a tiling problem                    |
  | Tiling.Problem.ToEMC.make problem         | Convert to EMC matrix                      |
  | Emc.D.create_sparse ~primary ~columns emc | Create DLX solver                          |
  | Emc.D.find_solution dlx                   | Find one solution (raises Not_found)       |
  | Emc.D.count_solutions dlx                 | Count all solution                         |

  Multiplicity options: Minf (unlimited), Mone (exactly one), Mmaybe (zero or one)

  Symmetry options: Snone (no rotations), Spositive (rotations only), Sall (rotations + reflections)
</code></pre></div></div>

<p>This was Advent of Code, so I stopped there (there’s no challenge if I ask Claude to solve the puzzles), but the takeaway is that I was now able to develop a solution quickly based upon this example.</p>

<p>The library didn’t work for me as the search space was too large; in the end, I used a deeply unsatisfying area calculation to get the solution.</p>

<p>Back in the summer, I was introduced to the Partridge Puzzle by <a href="http://standupmaths.com/">Matt Parker’s</a> YouTube video <a href="https://youtu.be/eqyuQZHfNPQ?si=KnQhzQ5gvRkXObSa">The impossible puzzle with over a million solutions!</a>. In the video, Matt used this puzzle as the basis for some new artwork for <a href="https://janestreet.com">Jane Street’s</a> office.</p>

<p>In the puzzle, you need to pack <code class="language-plaintext highlighter-rouge">k</code> squares of size <code class="language-plaintext highlighter-rouge">k x k</code>, <code class="language-plaintext highlighter-rouge">(k-1)</code> squares of size <code class="language-plaintext highlighter-rouge">(k-1) x (k-1)</code>, <code class="language-plaintext highlighter-rouge">(k-2)</code> squares of size <code class="language-plaintext highlighter-rouge">(k-2) x (k-2)</code> … and <code class="language-plaintext highlighter-rouge">1</code> square of size <code class="language-plaintext highlighter-rouge">1x1</code> into a square with side length <code class="language-plaintext highlighter-rouge">k * (k + 1) / 2</code> (the k-th triangular number). The areas match exactly because <code class="language-plaintext highlighter-rouge">1*1^2 + 2*2^2 + ... + k*k^2</code> is the sum of the first k cubes, which equals <code class="language-plaintext highlighter-rouge">(k * (k + 1) / 2)^2</code>. Matt uses <code class="language-plaintext highlighter-rouge">k=9</code> to pack the squares into a <code class="language-plaintext highlighter-rouge">45 x 45 = 2025</code> unit square.</p>

<p>I had originally written a basic DFS solver which placed blocks in the first free square. However, it didn’t find a solution in the time I was prepared to wait. I had tried some clever optimisations, placing things sensibly to avoid narrow gaps, but these were costly to calculate and still didn’t yield a solution.</p>

<p>Claude stepped up and generated the code using the combine library. The EMC/DLX solution was too slow due to the number of symmetrical arrangements. One <code class="language-plaintext highlighter-rouge">9x9</code> square is indistinguishable from another. Next, <em>we</em> tried using the SAT encoding module and passed it to <code class="language-plaintext highlighter-rouge">minisat</code>. After 30 minutes, there was still no solution. Forcing an ordering on the square placements reduced the memory footprint, but still yielded no solution.</p>

<p>Ultimately, I threw all 40 threads of my machine at my basic DFS version, which got a result in under a minute.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Quite often, you install a new package from opam and are keen to jump right in and want a tailored example for your use case. This happened to me during Day 12 of Advent of Code.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/solution.png" /><media:content medium="image" url="https://www.tunbury.org/images/solution.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Advent of Code 2025</title><link href="https://www.tunbury.org/2025/12/12/advent-of-code/" rel="alternate" type="text/html" title="Advent of Code 2025" /><published>2025-12-12T18:00:00+00:00</published><updated>2025-12-12T18:00:00+00:00</updated><id>https://www.tunbury.org/2025/12/12/advent-of-code</id><content type="html" xml:base="https://www.tunbury.org/2025/12/12/advent-of-code/"><![CDATA[<p>With the start of Advent comes a new set of Advent of Code problems. My code is available at <a href="https://github.com/mtelvers/aoc2025">mtelvers/aoc2025</a>.</p>

<h1 id="day-1---secret-entrance">Day 1 - Secret Entrance</h1>

<p>A dial points to 50. Follow the sequence of turns to see how many times it lands on zero. The only gotcha here was that the real input had values &gt; 100.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>L68
L30
R48
L5
R60
L55
L1
L99
R14
L82
</code></pre></div></div>

<p>Part 2 was fiddly as the corner cases needed careful consideration. Landing on zero should be counted, so start with the answer from part 1. Add the number of clicks to turn through divided by 100 (the quotient) to count the full rotations. Then add the cases where turning left by the number of clicks modulo 100 would take the position below zero, and likewise for turning right when it would go above 100.</p>

<p>Note that starting at 0 and turning left 5 does not count as passing zero. So if your zero passing test is <code class="language-plaintext highlighter-rouge">position &lt; value</code>, then this is only true when <code class="language-plaintext highlighter-rouge">position &gt; 0</code>.</p>
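
<p>The per-turn count described above can be sketched like this (my illustration, not the code from my repo, assuming positions 0–99 with <code class="language-plaintext highlighter-rouge">L</code> decreasing and <code class="language-plaintext highlighter-rouge">R</code> increasing the position):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Zero crossings for one turn of [n] clicks from position [pos].
   Each full rotation passes zero once; then check whether the
   remaining partial turn crosses (or lands on) zero. *)
let crossings pos dir n =
  let full = n / 100 in
  let r = n mod 100 in
  let partial =
    match dir with
    | 'L' -&gt; if pos &gt; 0 &amp;&amp; pos - r &lt;= 0 then 1 else 0
    | 'R' -&gt; if pos + r &gt;= 100 then 1 else 0
    | _ -&gt; 0
  in
  full + partial
</code></pre></div></div>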
<h1 id="day-2---gift-shop">Day 2 - Gift Shop</h1>

<p>Find repeating patterns in some number ranges.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>11-22,95-115,998-1012,1188511880-1188511890,222220-222224,
1698522-1698528,446443-446449,38593856-38593862,565653-565659,
824824821-824824827,2121212118-2121212124
</code></pre></div></div>

<h2 id="part-1">Part 1</h2>

<p>I decided right away to use integer comparison rather than converting numbers to strings and then comparing them. The number of digits in an integer when written in base 10 is <code class="language-plaintext highlighter-rouge">1 + int(log10 x)</code>. For part 1, the challenge was to look for exact splits such as <code class="language-plaintext highlighter-rouge">11</code> or <code class="language-plaintext highlighter-rouge">123123</code>; therefore, the length must be even. We can use the divisor <code class="language-plaintext highlighter-rouge">10^(length/2)</code>, test with <code class="language-plaintext highlighter-rouge">x / divisor = x mod divisor</code>, and sum all the numbers where this is true.</p>
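
<p>A sketch of that test (illustrative names, not the code from my repo):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Does [x] consist of two identical halves, e.g. 123123? *)
let digits x = 1 + int_of_float (log10 (float_of_int x))

let rec pow b e = if e = 0 then 1 else b * pow b (e - 1)

let is_doubled x =
  let d = digits x in
  d mod 2 = 0
  &amp;&amp;
  let divisor = pow 10 (d / 2) in
  x / divisor = x mod divisor
</code></pre></div></div>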

<h2 id="part-2">Part 2</h2>

<p>The problem is extended to allow any equal chunking. Thus, <code class="language-plaintext highlighter-rouge">824824824</code> is now valid as it has three chunks of 3 digits. Given the maximum length of a 64-bit integer is 20 digits, we only need the factors of the numbers 1 to 20, which could be entered as a static list. I decided to calculate these in code using a simple division test up to the square root of the number. I should memoise these results to avoid repeated recalculation. Once I had a list of factors, I folded over the list, testing each with a recursive function to verify that each chunk was equal.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">base</span> <span class="o">=</span> <span class="n">pow</span> <span class="mi">10</span> <span class="n">factor</span> <span class="k">in</span>
<span class="k">let</span> <span class="n">modulo</span> <span class="o">=</span> <span class="n">x</span> <span class="ow">mod</span> <span class="n">base</span> <span class="k">in</span>

<span class="k">let</span> <span class="k">rec</span> <span class="n">loop</span> <span class="n">v</span> <span class="o">=</span>
  <span class="k">if</span> <span class="n">v</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">then</span> <span class="bp">true</span>
  <span class="k">else</span> <span class="k">if</span> <span class="n">v</span> <span class="ow">mod</span> <span class="n">base</span> <span class="o">=</span> <span class="n">modulo</span> <span class="k">then</span> <span class="n">loop</span> <span class="p">(</span><span class="n">v</span> <span class="o">/</span> <span class="n">base</span><span class="p">)</span>
  <span class="k">else</span> <span class="bp">false</span>
<span class="k">in</span>

<span class="n">loop</span> <span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="n">base</span><span class="p">)</span>
</code></pre></div></div>
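
<p>The factor calculation might look something like this (a sketch by trial division, as described above; names are mine):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Factors of [n] by trial division up to sqrt n; for n = 6 this
   gives [1; 2; 3; 6]. *)
let factors n =
  let rec loop i acc =
    if i * i &gt; n then acc
    else if n mod i = 0 then
      loop (i + 1) (if i * i = n then i :: acc else i :: (n / i) :: acc)
    else loop (i + 1) acc
  in
  List.sort compare (loop 1 [])
</code></pre></div></div>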

<p>The only gotcha I found was that numbers less than 10 came out as true, so I constrained the lower bound of the range to 10.</p>
<h1 id="day-3---lobby">Day 3 - Lobby</h1>

<p>Sum the largest number you can make using N digits from the given sequences. The order of the digits cannot be changed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>987654321111111
811111111111119
234234234234278
818181911112111
</code></pre></div></div>

<h2 id="part-1-1">Part 1</h2>

<p>As it was initially presented, you are only required to consider two digits. As <code class="language-plaintext highlighter-rouge">9_</code> will always be bigger than <code class="language-plaintext highlighter-rouge">8_</code>, this becomes a case of finding the largest digit available, which still leaves one digit. If there is more than one digit left, then pick the largest one. For example, given <code class="language-plaintext highlighter-rouge">818181911112111</code>, the largest first digit is <code class="language-plaintext highlighter-rouge">9</code>, followed by the largest digit in <code class="language-plaintext highlighter-rouge">11112111</code>, which is 2.</p>

<p>I pattern-matched the list of numbers to extract two digits in a recursive loop:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="n">loop</span> <span class="n">max_left</span> <span class="n">max_right</span> <span class="o">=</span> <span class="k">function</span>
  <span class="o">|</span> <span class="n">l</span> <span class="o">::</span> <span class="n">r</span> <span class="o">::</span> <span class="n">tl</span> <span class="o">-&gt;</span>
    <span class="k">if</span> <span class="n">l</span> <span class="o">&gt;</span> <span class="n">max_left</span> <span class="k">then</span> <span class="n">loop</span> <span class="n">l</span> <span class="n">r</span> <span class="p">(</span><span class="n">r</span> <span class="o">::</span> <span class="n">tl</span><span class="p">)</span>
    <span class="k">else</span> <span class="k">if</span> <span class="n">r</span> <span class="o">&gt;</span> <span class="n">max_right</span> <span class="k">then</span> <span class="n">loop</span> <span class="n">max_left</span> <span class="n">r</span> <span class="p">(</span><span class="n">r</span> <span class="o">::</span> <span class="n">tl</span><span class="p">)</span>
    <span class="k">else</span> <span class="n">loop</span> <span class="n">max_left</span> <span class="n">max_right</span> <span class="p">(</span><span class="n">r</span> <span class="o">::</span> <span class="n">tl</span><span class="p">)</span>
  <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="p">(</span><span class="n">max_left</span><span class="o">,</span> <span class="n">max_right</span><span class="p">)</span>
<span class="k">in</span>
<span class="k">let</span> <span class="n">l</span><span class="o">,</span> <span class="n">r</span> <span class="o">=</span> <span class="n">loop</span> <span class="mi">0</span> <span class="mi">0</span> <span class="n">bank</span> <span class="k">in</span>
<span class="k">let</span> <span class="n">num</span> <span class="o">=</span> <span class="n">l</span> <span class="o">*</span> <span class="mi">10</span> <span class="o">+</span> <span class="n">r</span>
</code></pre></div></div>

<h2 id="part-2-1">Part 2</h2>

<p>Annoyingly, this changed the problem significantly, as it increased the number length from 2 to 12. My list approach now seemed unworkable, and I switched to using arrays.</p>

<p>Taking <code class="language-plaintext highlighter-rouge">818181911112111</code> as an example, I extracted <code class="language-plaintext highlighter-rouge">8181</code>, leaving 11 digits available, and found the maximum value, which is the first <code class="language-plaintext highlighter-rouge">8</code>. Then I extracted a new subarray, <code class="language-plaintext highlighter-rouge">1818</code>, starting just after the matched digit and leaving 10 digits available. The maximum here is the <code class="language-plaintext highlighter-rouge">8</code> at index 1. The process repeats, finding the maximum in <code class="language-plaintext highlighter-rouge">181</code>, then in <code class="language-plaintext highlighter-rouge">19</code>; finally, all the remaining digits must be taken to reach the required length.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>818181911112111

8181 -&gt; i=0 [i]=8
1818 -&gt; i=1 [i]=8
181 -&gt; i=1 [i]=8
19 -&gt; i=1 [i]=9
1 -&gt; i=0 [i]=1
1 -&gt; i=0 [i]=1
1 -&gt; i=0 [i]=1
1 -&gt; i=0 [i]=1
2 -&gt; i=0 [i]=2
1 -&gt; i=0 [i]=1
1 -&gt; i=0 [i]=1
1 -&gt; i=0 [i]=1
</code></pre></div></div>

<p>This worked out nicely, and I parameterised the function to accept the length of the number required so that I could use this code for part 1 as well.</p>
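
<p>The windowed maximum search can be sketched as follows (my illustration of the approach described above):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Pick [n] digits from [a], preserving order, to form the largest
   number: take the biggest digit in the window that still leaves
   enough digits for the remaining picks. *)
let largest n a =
  let len = Array.length a in
  let rec loop start remaining acc =
    if remaining = 0 then acc
    else begin
      (* the chosen digit must leave [remaining - 1] digits after it *)
      let last = len - remaining in
      let best = ref start in
      for i = start to last do
        if a.(i) &gt; a.(!best) then best := i
      done;
      loop (!best + 1) (remaining - 1) ((acc * 10) + a.(!best))
    end
  in
  loop 0 n 0
</code></pre></div></div>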

<h1 id="day-4---paper-bale-warehouse">Day 4 - Paper Bale Warehouse</h1>

<p>Find the number of <code class="language-plaintext highlighter-rouge">@</code> which have fewer than four <code class="language-plaintext highlighter-rouge">@</code> as neighbours.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>..@@.@@@@.
@@@.@.@.@@
@@@@@.@.@@
@.@@@@..@.
@@.@@@@.@@
.@@@@@@@.@
.@.@.@.@@@
@.@@@.@@@@
.@@@@@@@@.
@.@.@@@.@.
</code></pre></div></div>

<p>I chose to read the input into a Map, which I have used several times before, so I copied my implementation from AoC 2024 Day 10.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">coord</span> <span class="o">=</span> <span class="p">{</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">int</span><span class="p">;</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">int</span> <span class="p">}</span>

<span class="k">module</span> <span class="nc">CoordMap</span> <span class="o">=</span> <span class="nn">Map</span><span class="p">.</span><span class="nc">Make</span> <span class="p">(</span><span class="k">struct</span>
  <span class="k">type</span> <span class="n">t</span> <span class="o">=</span> <span class="n">coord</span>

  <span class="k">let</span> <span class="n">compare</span> <span class="o">=</span> <span class="n">compare</span>
<span class="k">end</span><span class="p">)</span>
</code></pre></div></div>

<p>I set up a list of directions around the centre point.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">neighbours</span> <span class="o">=</span>
  <span class="p">[</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="p">};</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">};</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="p">};</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">};</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="p">};</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="p">{</span> <span class="n">y</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">};</span>
  <span class="p">]</span>
</code></pre></div></div>

<h2 id="part-1-2">Part 1</h2>

<p>I folded over the map, and where there was an <code class="language-plaintext highlighter-rouge">@</code>, folded over the list of neighbours, counting the number with bales; these could then be summed in the outer fold.</p>

<h2 id="part-2-2">Part 2</h2>

<p>For the second part, the free bales needed to be removed, and then the calculation was repeated, trying again until no more bales could be removed.</p>

<p>At this point, I realised that the map could be simplified to a set, as there is no need to distinguish between the boundary and an empty square.</p>

<p>Therefore, rather than just counting the free bales, I added these to a set which could be subtracted from the original set and iterated.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="n">part2</span> <span class="n">w</span> <span class="o">=</span>
  <span class="nn">CoordSet</span><span class="p">.</span><span class="n">fold</span>
    <span class="p">(</span><span class="k">fun</span> <span class="n">k</span> <span class="n">acc</span> <span class="o">-&gt;</span> <span class="k">if</span> <span class="n">is_free_bales</span> <span class="n">w</span> <span class="n">k</span> <span class="k">then</span> <span class="nn">CoordSet</span><span class="p">.</span><span class="n">add</span> <span class="n">k</span> <span class="n">acc</span> <span class="k">else</span> <span class="n">acc</span><span class="p">)</span>
    <span class="n">w</span> <span class="nn">CoordSet</span><span class="p">.</span><span class="n">empty</span>
  <span class="o">|&gt;</span> <span class="k">fun</span> <span class="n">free_bales</span> <span class="o">-&gt;</span>
  <span class="k">if</span> <span class="nn">CoordSet</span><span class="p">.</span><span class="n">is_empty</span> <span class="n">free_bales</span> <span class="k">then</span> <span class="nn">CoordSet</span><span class="p">.</span><span class="n">cardinal</span> <span class="n">w</span>
  <span class="k">else</span> <span class="nn">CoordSet</span><span class="p">.</span><span class="n">diff</span> <span class="n">w</span> <span class="n">free_bales</span> <span class="o">|&gt;</span> <span class="n">part2</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span>
  <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"part 2: %i</span><span class="se">\n</span><span class="s2">"</span> <span class="p">(</span><span class="nn">CoordSet</span><span class="p">.</span><span class="n">cardinal</span> <span class="n">warehouse</span> <span class="o">-</span> <span class="n">part2</span> <span class="n">warehouse</span><span class="p">)</span>
</code></pre></div></div>
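
<p>The <code class="language-plaintext highlighter-rouge">is_free_bales</code> helper isn’t shown above; a sketch consistent with how it is used, given the <code class="language-plaintext highlighter-rouge">neighbours</code> list and a <code class="language-plaintext highlighter-rouge">CoordSet</code> over <code class="language-plaintext highlighter-rouge">coord</code>, might be:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* A bale is free when fewer than four of its eight neighbours
   are bales. *)
let is_free_bales w k =
  List.fold_left
    (fun acc d -&gt;
      if CoordSet.mem { y = k.y + d.y; x = k.x + d.x } w then acc + 1
      else acc)
    0 neighbours
  &lt; 4
</code></pre></div></div>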
<h1 id="day-5---cafeteria">Day 5 - Cafeteria</h1>

<p>Count the number of elements from the second list which appear in the list of (inclusive) ranges.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3-5
10-14
16-20
12-18

1
5
8
11
17
32
</code></pre></div></div>

<h2 id="part-1-3">Part 1</h2>

<p>I read the input data into two variables, <code class="language-plaintext highlighter-rouge">fresh</code> as a list of pairs for the ranges and <code class="language-plaintext highlighter-rouge">ingredients</code> as an int list. For part one, it’s just a case of summing values where the ingredient falls within the range:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">part1</span> <span class="o">=</span>
  <span class="nn">List</span><span class="p">.</span><span class="n">fold_left</span>
    <span class="p">(</span><span class="k">fun</span> <span class="n">f</span> <span class="n">i</span> <span class="o">-&gt;</span>
      <span class="nn">List</span><span class="p">.</span><span class="n">find_opt</span> <span class="p">(</span><span class="k">fun</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="n">l</span> <span class="o">&amp;&amp;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="n">h</span><span class="p">)</span> <span class="n">fresh</span> <span class="o">|&gt;</span> <span class="k">function</span>
      <span class="o">|</span> <span class="nc">Some</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="n">f</span> <span class="o">+</span> <span class="mi">1</span>
      <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="n">f</span><span class="p">)</span>
    <span class="mi">0</span> <span class="n">ingredients</span>
</code></pre></div></div>

<h2 id="part-2-3">Part 2</h2>

<p>Ignoring the second list, count the values represented by the list of ranges. <code class="language-plaintext highlighter-rouge">3-5,10-14</code> would give <code class="language-plaintext highlighter-rouge">3 + 5 = 8</code> values. I didn’t verify this, but it is likely that the actual input ranges aren’t as tidy as the example data. We are told that ranges overlap, but I expect there will be ranges that entirely encompass other ranges, as well as ranges that are immediately adjacent, and so on. I wrote an <code class="language-plaintext highlighter-rouge">add</code> function to add a range to a list of ranges. I think it would have looked better using <code class="language-plaintext highlighter-rouge">type range = { low: int; high: int }</code>, but I’d come this far using pairs.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">add</span> <span class="p">(</span><span class="n">low</span><span class="o">,</span> <span class="n">high</span><span class="p">)</span> <span class="n">t</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">rec</span> <span class="n">loop</span> <span class="n">acc</span> <span class="p">(</span><span class="n">low</span><span class="o">,</span> <span class="n">high</span><span class="p">)</span> <span class="o">=</span> <span class="k">function</span>
    <span class="o">|</span> <span class="bp">[]</span> <span class="o">-&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">rev</span> <span class="p">((</span><span class="n">low</span><span class="o">,</span> <span class="n">high</span><span class="p">)</span> <span class="o">::</span> <span class="n">acc</span><span class="p">)</span>
    <span class="o">|</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">::</span> <span class="n">tl</span> <span class="k">when</span> <span class="n">h</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">&lt;</span> <span class="n">low</span> <span class="o">-&gt;</span> <span class="n">loop</span> <span class="p">((</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">::</span> <span class="n">acc</span><span class="p">)</span> <span class="p">(</span><span class="n">low</span><span class="o">,</span> <span class="n">high</span><span class="p">)</span> <span class="n">tl</span>
    <span class="o">|</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">::</span> <span class="n">tl</span> <span class="k">when</span> <span class="n">high</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">&lt;</span> <span class="n">l</span> <span class="o">-&gt;</span>
        <span class="nn">List</span><span class="p">.</span><span class="n">rev_append</span> <span class="n">acc</span> <span class="p">((</span><span class="n">low</span><span class="o">,</span> <span class="n">high</span><span class="p">)</span> <span class="o">::</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">::</span> <span class="n">tl</span><span class="p">)</span>
    <span class="o">|</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">::</span> <span class="n">tl</span> <span class="o">-&gt;</span> <span class="n">loop</span> <span class="n">acc</span> <span class="p">(</span><span class="n">min</span> <span class="n">l</span> <span class="n">low</span><span class="o">,</span> <span class="n">max</span> <span class="n">h</span> <span class="n">high</span><span class="p">)</span> <span class="n">tl</span>
  <span class="k">in</span>
  <span class="n">loop</span> <span class="bp">[]</span> <span class="p">(</span><span class="n">low</span><span class="o">,</span> <span class="n">high</span><span class="p">)</span> <span class="n">t</span>
</code></pre></div></div>

<p>I wrote some test cases to cover the weird cases not present in the example data.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[] |&gt; add (2, 5) |&gt; add (7, 9);;                 (* simple    [(2, 5); (7, 9)] *)
[] |&gt; add (2, 5) |&gt; add (7, 9) |&gt; add (4, 8);;   (* join      [(2, 9)] *)
[] |&gt; add (2, 5) |&gt; add (7, 9) |&gt; add (1, 10);;  (* encompass [(1, 10)] *)
[] |&gt; add (2, 5) |&gt; add (6, 9);;                 (* adjacent  [(2, 9)] *)
</code></pre></div></div>

<p>With the code tested, part 2 used the <code class="language-plaintext highlighter-rouge">add</code> function to create a combined list, and then summed the difference between the high and low values + 1.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">part2</span> <span class="o">=</span>
  <span class="nn">List</span><span class="p">.</span><span class="n">fold_left</span> <span class="p">(</span><span class="k">fun</span> <span class="n">acc</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">add</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="n">acc</span><span class="p">)</span> <span class="bp">[]</span> <span class="n">fresh</span>
  <span class="o">|&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">fold_left</span> <span class="p">(</span><span class="k">fun</span> <span class="n">acc</span> <span class="p">(</span><span class="n">l</span><span class="o">,</span> <span class="n">h</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">acc</span> <span class="o">+</span> <span class="p">(</span><span class="n">h</span> <span class="o">-</span> <span class="n">l</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="mi">0</span>
</code></pre></div></div>
<h1 id="day-6---trash-compactor">Day 6 - Trash Compactor</h1>

<p>Sum the cryptically presented equations.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>123 328  51 64 
 45 64  387 23 
  6 98  215 314
*   +   *   +  
</code></pre></div></div>

<h2 id="part-1-4">Part 1</h2>

<p>Apply the operator at the bottom of the column to the numbers above it and sum the results.</p>

<p>This was a straightforward case of reading a list of lines and splitting each into a list of numbers, giving a kind of matrix. I then used a transpose function and applied each column’s operator with a fold. Note that it’s just addition and multiplication, both of which are commutative.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="n">transpose</span> <span class="o">=</span> <span class="k">function</span>
  <span class="o">|</span> <span class="bp">[]</span> <span class="o">|</span> <span class="bp">[]</span> <span class="o">::</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="bp">[]</span>
  <span class="o">|</span> <span class="n">rows</span> <span class="o">-&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="nn">List</span><span class="p">.</span><span class="n">hd</span> <span class="n">rows</span> <span class="o">::</span> <span class="n">transpose</span> <span class="p">(</span><span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="nn">List</span><span class="p">.</span><span class="n">tl</span> <span class="n">rows</span><span class="p">)</span>
</code></pre></div></div>
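<p>As a sketch of the per-column step described above, the operator application might look like this (assuming the operator has already been split off from each column; <code class="language-plaintext highlighter-rouge">apply_op</code> is an illustrative name, not the actual code):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* apply a column's operator to its numbers; since + and * are both
   commutative, the order of the values does not matter *)
let apply_op op column =
  match op with
  | '+' -&gt; List.fold_left ( + ) 0 column
  | '*' -&gt; List.fold_left ( * ) 1 column
  | _ -&gt; invalid_arg "apply_op"
</code></pre></div></div>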

<h1 id="part-2-4">Part 2</h1>

<p>It was odd in the original input that sometimes there was one space between the numbers, while other times there were two. This all became clear in part 2, as the problem was reframed so that the numbers themselves were also transposed. Thus, the far-right column was actually <code class="language-plaintext highlighter-rouge">4 + 431 + 623</code>.</p>

<p>Reading the input as characters and transposing it resulted in what is, in effect, the part 1 problem, but the data structure isn’t pretty.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1  *
24  
356 
    
369+
248 
8   
    
 32*
581 
175 
    
623+
431 
  4 
</code></pre></div></div>

<p>I can see that you could write a conversion function to bring both the part 1 structure and the transposed part 2 structure into a standard format, then use the same processing function to sum both datasets, but I didn’t!</p>

<p>I created a <code class="language-plaintext highlighter-rouge">split_last</code> function to split off the last element from each row (list).</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="n">split_last</span> <span class="o">=</span> <span class="k">function</span>
  <span class="o">|</span> <span class="bp">[]</span> <span class="o">-&gt;</span> <span class="k">assert</span> <span class="bp">false</span>
  <span class="o">|</span> <span class="p">[</span> <span class="n">x</span> <span class="p">]</span> <span class="o">-&gt;</span> <span class="p">([]</span><span class="o">,</span> <span class="n">x</span><span class="p">)</span>
  <span class="o">|</span> <span class="n">x</span> <span class="o">::</span> <span class="n">xs</span> <span class="o">-&gt;</span>
      <span class="k">let</span> <span class="n">init</span><span class="o">,</span> <span class="n">last</span> <span class="o">=</span> <span class="n">split_last</span> <span class="n">xs</span> <span class="k">in</span>
      <span class="p">(</span><span class="n">x</span> <span class="o">::</span> <span class="n">init</span><span class="o">,</span> <span class="n">last</span><span class="p">)</span>
</code></pre></div></div>

<p>This gives me the operator plus a list of characters. The list of characters can be concatenated, trimmed and converted into a number. Then, using an inelegant fold which threads the operator, the intermediate sum and the overall sum, you can calculate the answer.</p>
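<p>As an illustrative sketch (not the exact code), the character-to-number conversion might look like this once <code class="language-plaintext highlighter-rouge">split_last</code> has removed the trailing operator column:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* concatenate a row of characters, trim the padding spaces and
   parse the result as a number *)
let to_number chars =
  List.map (String.make 1) chars |&gt; String.concat "" |&gt; String.trim
  |&gt; int_of_string
</code></pre></div></div>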
<h1 id="day-7---laboratories">Day 7 - Laboratories</h1>

<p>Starting from <code class="language-plaintext highlighter-rouge">S</code>, beam down through the map, splitting at each <code class="language-plaintext highlighter-rouge">^</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.......S.......
...............
.......^.......
...............
......^.^......
...............
.....^.^.^.....
...............
....^.^...^....
...............
...^.^...^.^...
...............
..^...^.....^..
...............
.^.^.^.^.^...^.
...............
</code></pre></div></div>

<p>I read the diagram as a map of <code class="language-plaintext highlighter-rouge">(x,y)</code> coordinates, but in retrospect, a list of arrays might have been a better choice.</p>

<h2 id="part-1-5">Part 1</h2>

<p>In this part, calculate how many times we reach a <code class="language-plaintext highlighter-rouge">^</code>. This is a breadth-first search tracking the number of beams at each iteration. I used a coordinate map to track the beams at each level, which automatically absorbs duplicate beams.</p>
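<p>A single BFS step might be sketched as follows, assuming the grid is accessible through a lookup function and using a coordinate set to absorb the duplicates (the names here are illustrative):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>module CoordSet = Set.Make (struct
  type t = int * int

  let compare = compare
end)

(* advance every beam one row down; a '^' splits the beam left and
   right, and the set automatically merges beams landing on the same
   cell *)
let step lookup beams =
  CoordSet.fold
    (fun (x, y) acc -&gt;
      if lookup (x, y + 1) = '^' then
        acc |&gt; CoordSet.add (x - 1, y + 1) |&gt; CoordSet.add (x + 1, y + 1)
      else CoordSet.add (x, y + 1) acc)
    beams CoordSet.empty
</code></pre></div></div>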

<h2 id="part-2-5">Part 2</h2>

<p>This time, follow each possible path and count how many ways there are to get to the end. This is a depth-first search where the trivial algorithm works on the test dataset, but with the actual input, the number of possibilities is too large. Therefore, I added a hash table to memoise the results at each level. With this, all 25 trillion ways are counted in a few milliseconds.</p>
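<p>The memoisation can be sketched like this, assuming a <code class="language-plaintext highlighter-rouge">successors</code> function that returns the next positions reachable from a given point, with an empty list marking the end of the map (names are illustrative):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let memo : (int * int, int) Hashtbl.t = Hashtbl.create 1024

let rec count_paths successors pos =
  match Hashtbl.find_opt memo pos with
  | Some n -&gt; n
  | None -&gt;
      let n =
        match successors pos with
        | [] -&gt; 1 (* reached the bottom: one complete path *)
        | next -&gt;
            List.fold_left (fun acc p -&gt; acc + count_paths successors p) 0 next
      in
      Hashtbl.add memo pos n;
      n
</code></pre></div></div>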
<h1 id="day-8---playground">Day 8 - Playground</h1>

<p>Compute the distance between vectors in 3D space and build them into a graph by linking the closest pairs.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>162,817,812
57,618,57
906,360,560
592,479,940
352,342,300
466,668,158
542,29,236
431,825,988
739,650,466
52,470,668
216,146,977
819,987,18
117,168,530
805,96,715
346,949,466
970,615,88
941,993,340
862,61,35
984,92,344
425,690,689
</code></pre></div></div>

<p>I read the input in as a list of vectors <code class="language-plaintext highlighter-rouge">type vector = { x : float; y : float; z : float }</code>. Next, I computed a list of distances between all the pairs, resulting in a <code class="language-plaintext highlighter-rouge">((vector * vector) * float) list</code>. A network is a set of vectors, and overall, there is a set of networks. I couldn’t decide on the best way to store this, so for expediency, I went with sets.</p>
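<p>The distance itself is just the Euclidean norm; an illustrative sketch of building the sorted pair list (not the exact code):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>type vector = { x : float; y : float; z : float }

let distance a b =
  sqrt (((a.x -. b.x) ** 2.) +. ((a.y -. b.y) ** 2.) +. ((a.z -. b.z) ** 2.))

(* all unordered pairs with their distances, closest first *)
let pairs vs =
  let rec combos = function
    | [] -&gt; []
    | v :: rest -&gt; List.map (fun w -&gt; ((v, w), distance v w)) rest @ combos rest
  in
  combos vs |&gt; List.sort (fun (_, d1) (_, d2) -&gt; Float.compare d1 d2)
</code></pre></div></div>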

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">Network</span> <span class="o">=</span> <span class="nn">Set</span><span class="p">.</span><span class="nc">Make</span> <span class="p">(</span><span class="k">struct</span>
  <span class="k">type</span> <span class="n">t</span> <span class="o">=</span> <span class="n">vector</span>

  <span class="k">let</span> <span class="n">compare</span> <span class="o">=</span> <span class="n">compare</span>
<span class="k">end</span><span class="p">)</span>

<span class="k">module</span> <span class="nc">NetworkSet</span> <span class="o">=</span> <span class="nn">Set</span><span class="p">.</span><span class="nc">Make</span> <span class="p">(</span><span class="nc">Network</span><span class="p">)</span> 
</code></pre></div></div>

<p>With this, I wrote a function to join two nodes together. This first checks whether either node already exists in any network. If neither node exists, create a new network with those two nodes. If one node exists in any network, then add the other node. If both nodes exist, then union the two networks together. As adding a value to a set is idempotent, it is not necessary to distinguish which value needs to be added: <code class="language-plaintext highlighter-rouge">|&gt; Network.add v1 |&gt; Network.add v2</code></p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">join</span> <span class="n">v1</span> <span class="n">v2</span> <span class="n">acc</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">s1</span><span class="o">,</span> <span class="n">s2</span> <span class="o">=</span>
    <span class="nn">NetworkSet</span><span class="p">.</span><span class="n">partition</span> <span class="p">(</span><span class="k">fun</span> <span class="n">vs</span> <span class="o">-&gt;</span> <span class="nn">Network</span><span class="p">.</span><span class="n">mem</span> <span class="n">v1</span> <span class="n">vs</span> <span class="o">||</span> <span class="nn">Network</span><span class="p">.</span><span class="n">mem</span> <span class="n">v2</span> <span class="n">vs</span><span class="p">)</span> <span class="n">acc</span>
  <span class="k">in</span>  
  <span class="nn">NetworkSet</span><span class="p">.</span><span class="n">singleton</span>
    <span class="p">(</span><span class="k">match</span> <span class="nn">NetworkSet</span><span class="p">.</span><span class="n">cardinal</span> <span class="n">s1</span> <span class="k">with</span>
    <span class="o">|</span> <span class="mi">0</span> <span class="o">-&gt;</span> <span class="nn">Network</span><span class="p">.(</span><span class="n">singleton</span> <span class="n">v1</span> <span class="o">|&gt;</span> <span class="n">add</span> <span class="n">v2</span><span class="p">)</span>
    <span class="o">|</span> <span class="mi">1</span> <span class="o">-&gt;</span> <span class="nn">NetworkSet</span><span class="p">.</span><span class="n">choose</span> <span class="n">s1</span> <span class="o">|&gt;</span> <span class="nn">Network</span><span class="p">.</span><span class="n">add</span> <span class="n">v1</span> <span class="o">|&gt;</span> <span class="nn">Network</span><span class="p">.</span><span class="n">add</span> <span class="n">v2</span>
    <span class="o">|</span> <span class="mi">2</span> <span class="o">-&gt;</span> <span class="nn">NetworkSet</span><span class="p">.</span><span class="n">fold</span> <span class="p">(</span><span class="k">fun</span> <span class="n">vs</span> <span class="n">acc</span> <span class="o">-&gt;</span> <span class="nn">Network</span><span class="p">.</span><span class="n">union</span> <span class="n">acc</span> <span class="n">vs</span><span class="p">)</span> <span class="n">s1</span> <span class="nn">Network</span><span class="p">.</span><span class="n">empty</span>
    <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="k">assert</span> <span class="bp">false</span><span class="p">)</span>
  <span class="o">|&gt;</span> <span class="nn">NetworkSet</span><span class="p">.</span><span class="n">union</span> <span class="n">s2</span>

</code></pre></div></div>

<h1 id="part-1-6">Part 1</h1>

<p>Take the first 1000 vector pairs and add them to the <code class="language-plaintext highlighter-rouge">NetworkSet</code>, then convert the <code class="language-plaintext highlighter-rouge">NetworkSet</code> into a list of the size of each network, sort the list, take the first three and fold over them to get the answer.</p>
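<p>This step might be sketched as follows, assuming the answer is the product of the three largest network sizes (the function name is illustrative):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let part1 networks =
  NetworkSet.fold (fun n acc -&gt; Network.cardinal n :: acc) networks []
  |&gt; List.sort (fun a b -&gt; compare b a) (* largest first *)
  |&gt; function
  | a :: b :: c :: _ -&gt; a * b * c
  | _ -&gt; assert false
</code></pre></div></div>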

<h1 id="part-2-6">Part 2</h1>

<p>Continue adding vector pairs until all the vectors are connected, then find the product of the x coordinates of the final two vectors. I used a recursive function to repeatedly add pairs until the size of the network equalled the total number of vectors.</p>
<h1 id="day-9---movie-theatre">Day 9 - Movie Theatre</h1>

<p>The input is a set of vertices. Draw the largest rectangle between any pair.</p>

<p>The vertices were specified as a list.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>7,1
11,1
11,7
9,7
9,5
2,5
2,3
7,3
</code></pre></div></div>

<p>Visually, this is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>..............
.......#...#..
..............
..#....#......
..............
..#......#....
..............
.........#.#..
..............
</code></pre></div></div>

<h2 id="part-1-7">Part 1</h2>

<p>This couldn’t have been easier, particularly following day 8, as the input parser and combination generator are the same. Calculate the area of all the rectangles, then sort the list to find the largest.</p>
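<p>An illustrative sketch of this step (whether the bounds are counted inclusively depends on the puzzle’s definition of a rectangle):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* area of the rectangle spanned by two opposite corners *)
let area ((x1, y1), (x2, y2)) = (abs (x2 - x1) + 1) * (abs (y2 - y1) + 1)

let largest corners =
  List.map area corners |&gt; List.sort (fun a b -&gt; compare b a) |&gt; List.hd
</code></pre></div></div>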

<h2 id="part-2-7">Part 2</h2>

<p>The extension was that the rectangle must be within the polygon defined by the input list of vertices. The input coordinates are in the range 0-100,000 on both x and y; therefore, we must do this mathematically, as the set will be too large.</p>

<p>To test whether polygon A is contained within polygon B, all vertices of A must be inside B, and no edge of A may cross an edge of B.</p>

<p>I used the ray casting algorithm to determine whether a point was in a polygon. Due to the way the coordinate grid works, the code is somewhat messy, as all the boundaries are contained within the shape. I then tested all pairs of edges for crossings, using the cross product to determine whether the endpoints of one segment lie on opposite sides of the infinite line defined by the other.</p>
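<p>The crossing test can be sketched with the cross-product sign check (integer coordinates assumed; the collinear boundary cases that make the real code messy are ignored here):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* which side of the infinite line through a and b does p lie on? *)
let side (ax, ay) (bx, by) (px, py) =
  compare (((bx - ax) * (py - ay)) - ((by - ay) * (px - ax))) 0

(* a proper crossing: each segment's endpoints lie on opposite sides
   of the line through the other segment *)
let crosses (a, b) (c, d) =
  side a b c * side a b d &lt; 0 &amp;&amp; side c d a * side c d b &lt; 0
</code></pre></div></div>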

<h1 id="day-10---factory">Day 10 - Factory</h1>

<p>The input is a pattern of lights, followed by a list of buttons and which lights they turn on and finally a list of counter values.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[.##.] (3) (1,3) (2) (2,3) (0,2) (0,1) {3,5,4,7}
[...#.] (0,2,3,4) (2,3) (0,4) (0,1,2) (1,2,3,4) {7,5,12,7,2}
[.###.#] (0,1,2,3,4) (0,3,4) (0,1,2,4,5) (1,2) {10,11,11,5,10,5}
</code></pre></div></div>

<h1 id="part-1-8">Part 1</h1>

<p>Press the buttons to toggle the lights on/off until you achieve the target pattern. The lights are a target bit pattern (but in reverse order), and the button positions are bit positions. So, <code class="language-plaintext highlighter-rouge">(1,3)</code> means toggle bits 1 and 3. The problem then becomes a breadth-first search through all the possible options. Starting at 0, xor that once for each button, then xor each of those with all the buttons again. This width grows quickly, but there aren’t many bit positions, so it only takes a few iterations to cover all the possible values. I used a set of integers to store the values at each iteration.</p>
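<p>The search can be sketched as follows, assuming each button has been pre-converted into a bit mask, for example <code class="language-plaintext highlighter-rouge">(1,3)</code> becoming <code class="language-plaintext highlighter-rouge">0b1010</code> (names are illustrative):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>module IntSet = Set.Make (Int)

(* one BFS level: xor every reachable value with every button mask *)
let step buttons values =
  IntSet.fold
    (fun v acc -&gt;
      List.fold_left (fun acc b -&gt; IntSet.add (v lxor b) acc) acc buttons)
    values IntSet.empty

(* number of presses needed to reach the target pattern *)
let solve buttons target =
  let rec go n values =
    if IntSet.mem target values then n else go (n + 1) (step buttons values)
  in
  go 0 (IntSet.singleton 0)
</code></pre></div></div>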

<h1 id="part-2-8">Part 2</h1>

<p>In part two, there are n counters set to zero; you need to increment the counters until you get to the values specified in the final field of the input data. Pressing button <code class="language-plaintext highlighter-rouge">(1,3)</code> increments counters 1 and 3 by one. You might view this as an extension of the first problem, but since the counter target values range from 1 to 300, the problem depth is too great to be solved naively using a BFS.</p>

<p>Looking at the first example in more detail, I rewrote it like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>btn | 0 1 2 3 | index
----+---------+----
3   | 0 0 0 1 | 5
1,3 | 0 1 0 1 | 4
2   | 0 0 1 0 | 3
2,3 | 0 0 1 1 | 2
0,2 | 1 0 1 0 | 1
0,1 | 1 1 0 0 | 0
----+---------+----
    | 3 5 4 7
</code></pre></div></div>

<p>From that matrix, a set of equations can be written as</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>v0 + v1 = 3
v0 + v4 = 5
v1 + v2 + v3 = 4
v2 + v4 + v5 = 7
</code></pre></div></div>

<p>These linear equations need to be solved, and the minimum sum solution found. I used the package <a href="https://opam.ocaml.org/packages/lp/">lp</a> to do this.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">#</span><span class="n">require</span> <span class="s2">"lp"</span><span class="p">;;</span>
<span class="o">#</span><span class="n">require</span> <span class="s2">"lp-glpk"</span><span class="p">;;</span>
<span class="k">open</span> <span class="nc">Lp</span>

<span class="k">let</span> <span class="n">v</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">init</span> <span class="mi">6</span> <span class="p">(</span><span class="k">fun</span> <span class="n">i</span> <span class="o">-&gt;</span> <span class="n">var</span> <span class="o">~</span><span class="n">integer</span><span class="o">:</span><span class="bp">true</span> <span class="p">(</span><span class="nn">Printf</span><span class="p">.</span><span class="n">sprintf</span> <span class="s2">"v%d"</span> <span class="n">i</span><span class="p">))</span>
  
<span class="k">let</span> <span class="n">sum</span> <span class="n">indices</span> <span class="o">=</span> 
  <span class="nn">List</span><span class="p">.</span><span class="n">fold_left</span> <span class="p">(</span><span class="k">fun</span> <span class="n">acc</span> <span class="n">i</span> <span class="o">-&gt;</span> <span class="n">acc</span> <span class="o">++</span> <span class="n">v</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">))</span> <span class="p">(</span><span class="n">c</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span><span class="p">)</span> <span class="n">indices</span> 
  
<span class="k">let</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">minimize</span> <span class="p">(</span><span class="n">sum</span> <span class="p">[</span><span class="mi">0</span><span class="p">;</span> <span class="mi">1</span><span class="p">;</span> <span class="mi">2</span><span class="p">;</span> <span class="mi">3</span><span class="p">;</span> <span class="mi">4</span><span class="p">;</span> <span class="mi">5</span><span class="p">])</span>   <span class="c">(* sum of all variables *)</span>

<span class="k">let</span> <span class="n">constraints</span> <span class="o">=</span> <span class="p">[</span>
  <span class="n">sum</span> <span class="p">[</span><span class="mi">0</span><span class="p">;</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=~</span> <span class="n">c</span> <span class="mi">3</span><span class="o">.</span><span class="mi">0</span><span class="p">;</span>       <span class="c">(* v0 + v1 = 3 *)</span>
  <span class="n">sum</span> <span class="p">[</span><span class="mi">0</span><span class="p">;</span> <span class="mi">4</span><span class="p">]</span> <span class="o">=~</span> <span class="n">c</span> <span class="mi">5</span><span class="o">.</span><span class="mi">0</span><span class="p">;</span>       <span class="c">(* v0 + v4 = 5 *)</span>
  <span class="n">sum</span> <span class="p">[</span><span class="mi">1</span><span class="p">;</span> <span class="mi">2</span><span class="p">;</span> <span class="mi">3</span><span class="p">]</span> <span class="o">=~</span> <span class="n">c</span> <span class="mi">4</span><span class="o">.</span><span class="mi">0</span><span class="p">;</span>    <span class="c">(* v1 + v2 + v3 = 4 *)</span>
  <span class="n">sum</span> <span class="p">[</span><span class="mi">2</span><span class="p">;</span> <span class="mi">4</span><span class="p">;</span> <span class="mi">5</span><span class="p">]</span> <span class="o">=~</span> <span class="n">c</span> <span class="mi">7</span><span class="o">.</span><span class="mi">0</span><span class="p">;</span>    <span class="c">(* v2 + v4 + v5 = 7 *)</span>
<span class="p">]</span>
  
<span class="k">let</span> <span class="n">problem</span> <span class="o">=</span> <span class="n">make</span> <span class="n">obj</span> <span class="n">constraints</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span>
  <span class="k">match</span> <span class="nn">Lp_glpk</span><span class="p">.</span><span class="n">solve</span> <span class="n">problem</span> <span class="k">with</span>
  <span class="o">|</span> <span class="nc">Ok</span> <span class="p">(</span><span class="n">obj_val</span><span class="o">,</span> <span class="n">xs</span><span class="p">)</span> <span class="o">-&gt;</span>
      <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"Minimum: %.2f</span><span class="se">\n</span><span class="s2">"</span> <span class="n">obj_val</span><span class="p">;</span>
      <span class="nn">Array</span><span class="p">.</span><span class="n">iteri</span> <span class="p">(</span><span class="k">fun</span> <span class="n">i</span> <span class="n">var</span> <span class="o">-&gt;</span>
        <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"v%d = %.2f</span><span class="se">\n</span><span class="s2">"</span> <span class="n">i</span> <span class="p">(</span><span class="nn">PMap</span><span class="p">.</span><span class="n">find</span> <span class="n">var</span> <span class="n">xs</span><span class="p">)</span>
      <span class="p">)</span> <span class="n">v</span>
  <span class="o">|</span> <span class="nc">Error</span> <span class="n">msg</span> <span class="o">-&gt;</span>
      <span class="n">print_endline</span> <span class="n">msg</span>
</code></pre></div></div>

<p>This gives the solution as 10.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GLPK Simplex Optimizer 5.0
4 rows, 6 columns, 10 non-zeros
      0: obj =   0.000000000e+00 inf =   1.900e+01 (4)
      4: obj =   1.000000000e+01 inf =   0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
GLPK Integer Optimizer 5.0
4 rows, 6 columns, 10 non-zeros
6 integer variables, none of which are binary
Integer optimization begins...
Long-step dual simplex will be used
+     4: mip =     not found yet &gt;=              -inf        (1; 0)
+     4: &gt;&gt;&gt;&gt;&gt;   1.000000000e+01 &gt;=   1.000000000e+01   0.0% (1; 0)
+     4: mip =   1.000000000e+01 &gt;=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Minimum: 10.00
v0 = 3.00
v1 = 0.00
v2 = 4.00
v3 = 0.00
v4 = 2.00
v5 = 1.00
</code></pre></div></div>

<p>All that is left is to sum the answer for each line of input.</p>
<h1 id="day-11---reactor">Day 11 - Reactor</h1>

<p>Count the number of paths to traverse a graph.</p>

<h1 id="part-1-9">Part 1</h1>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aaa: you hhh
you: bbb ccc
bbb: ddd eee
ccc: ddd eee fff
ddd: ggg
eee: out
fff: out
ggg: out
hhh: ccc fff iii
iii: out
</code></pre></div></div>

<p>In the first part, the task was to count the number of ways to get from <code class="language-plaintext highlighter-rouge">you</code> to <code class="language-plaintext highlighter-rouge">out</code>. There aren’t many, so a simple depth-first search worked out of the box.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">Outputs</span> <span class="o">=</span> <span class="nn">Set</span><span class="p">.</span><span class="nc">Make</span> <span class="p">(</span><span class="nc">String</span><span class="p">)</span>
<span class="k">module</span> <span class="nc">Racks</span> <span class="o">=</span> <span class="nn">Map</span><span class="p">.</span><span class="nc">Make</span> <span class="p">(</span><span class="nc">String</span><span class="p">)</span>

<span class="k">let</span> <span class="k">rec</span> <span class="n">dfs</span> <span class="o">=</span> <span class="k">function</span>
  <span class="o">|</span> <span class="s2">"out"</span> <span class="o">-&gt;</span> <span class="mi">1</span>
  <span class="o">|</span> <span class="n">r</span> <span class="o">-&gt;</span> <span class="nn">Outputs</span><span class="p">.</span><span class="n">fold</span> <span class="p">(</span><span class="k">fun</span> <span class="n">o</span> <span class="n">acc</span> <span class="o">-&gt;</span> <span class="n">acc</span> <span class="o">+</span> <span class="n">dfs</span> <span class="n">o</span><span class="p">)</span> <span class="p">(</span><span class="nn">Racks</span><span class="p">.</span><span class="n">find</span> <span class="n">r</span> <span class="n">racks</span><span class="p">)</span> <span class="mi">0</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="n">dfs</span> <span class="s2">"you"</span> <span class="o">|&gt;</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"Part 1: %i</span><span class="se">\n</span><span class="s2">"</span>
</code></pre></div></div>

<h1 id="part-2-9">Part 2</h1>

<p>Unusually, the examples for the second part gave new data; however, the puzzle input was the same. The new example data removed the <code class="language-plaintext highlighter-rouge">you</code> node and added an <code class="language-plaintext highlighter-rouge">svr</code> node. The question is now: how many ways are there from <code class="language-plaintext highlighter-rouge">svr</code> to <code class="language-plaintext highlighter-rouge">out</code> that pass through both <code class="language-plaintext highlighter-rouge">fft</code> and <code class="language-plaintext highlighter-rouge">dac</code>?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>svr: aaa bbb
aaa: fft
fft: ccc
bbb: tty
tty: ccc
ccc: ddd eee
ddd: hub
hub: fff
eee: dac
dac: fff
fff: ggg hhh
ggg: out
hhh: out
</code></pre></div></div>

<p>On my actual dataset, the number of ways from <code class="language-plaintext highlighter-rouge">svr</code> to <code class="language-plaintext highlighter-rouge">out</code> was vast (45 quadrillion), so we definitely need memoisation. The key here was to realise that it was a DAG and so either <code class="language-plaintext highlighter-rouge">dac</code> to <code class="language-plaintext highlighter-rouge">fft</code> was possible or <code class="language-plaintext highlighter-rouge">fft</code> to <code class="language-plaintext highlighter-rouge">dac</code> was possible, but not both.</p>

<p>Using a DFS, I calculated the number of paths between the key components and simplified the graph to four nodes. Since <code class="language-plaintext highlighter-rouge">dac</code> to <code class="language-plaintext highlighter-rouge">fft</code> has zero paths, the path must be <code class="language-plaintext highlighter-rouge">svr</code> to <code class="language-plaintext highlighter-rouge">fft</code> to <code class="language-plaintext highlighter-rouge">dac</code> to <code class="language-plaintext highlighter-rouge">out</code>. Thus the solution is <code class="language-plaintext highlighter-rouge">1 * 1 * 2 = 2</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                   ┌─────┐
                   │ svr │
                   └──┬──┘
            ┌─────────┴─────────┐
            │                   │
          2 │                   │ 1
            │                   │
            ▼         0         ▼
         ┌─────┐ ──────────► ┌─────┐
         │ dac │      1      │ fft │
         └──┬──┘ ◄────────── └──┬──┘
            │                   │
          2 │                   │ 4
            │                   │
            │      ┌─────┐      │
            └────► │ out │ ◄────┘
                   └─────┘
</code></pre></div></div>
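<p>With the simplified graph, the answer is the sum over the two possible orderings, which might be computed as follows (where <code class="language-plaintext highlighter-rouge">count a b</code> is assumed to return the memoised path count between two nodes):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* in a DAG, at most one of the two orderings has non-zero paths *)
let total count =
  (count "svr" "fft" * count "fft" "dac" * count "dac" "out")
  + (count "svr" "dac" * count "dac" "fft" * count "fft" "out")
</code></pre></div></div>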

<h1 id="day-12---christmas-tree-farm">Day 12 - Christmas Tree Farm</h1>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0:
###
##.
##.

1:
###
##.
.##

2:
.##
###
##.

3:
##.
###
##.

4:
###
#..
###

5:
###
.#.
###

4x4: 0 0 0 0 2 0
12x5: 1 0 1 0 2 2
12x5: 1 0 1 0 3 2
</code></pre></div></div>

<p>This is a packing problem. Given this input, <code class="language-plaintext highlighter-rouge">12x5: 1 0 1 0 2 2</code>, take a 12x5 grid and try to place 1 copy of shape 0, 1 copy of shape 2, 2 copies each of shapes 4 and 5.</p>

<p>At face value, this is a variation on the pentominoes problem, and the packing does not need to be complete. Fortunately, I looked at the real dataset before coding up a depth-first search to place the objects.</p>

<p>My first line of actual input is <code class="language-plaintext highlighter-rouge">45x41: 52 43 45 41 47 59</code>, still with 3x3 shapes to be placed. This is a massive problem space. Google has shown that Knuth’s Dancing Links is a common approach for this, and OCaml/opam has a <a href="https://opam.ocaml.org/packages/combine/">combine</a> package that implements this. I read the input data and passed it to the library to solve. However, the problem was too large.</p>

<p>As there are so many ways to pack the shapes, might there always be a solution at this scale? I used a simplistic area calculation to test this. I calculated the area of each shape, multiplied it by the number of copies and compared it to the area of the grid. Rightly or wrongly, this gave the correct answer to the problem on the real dataset (but not on the test input).</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="aoc" /><category term="tunbury.org" /><summary type="html"><![CDATA[With the start of Advent comes a new set of Advent of Code problems. My code is available at mtelvers/aoc2025.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/aoc2025.png" /><media:content medium="image" url="https://www.tunbury.org/images/aoc2025.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ceph Placement Groups</title><link href="https://www.tunbury.org/2025/12/09/ceph-placement-groups/" rel="alternate" type="text/html" title="Ceph Placement Groups" /><published>2025-12-09T12:00:00+00:00</published><updated>2025-12-09T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/12/09/ceph-placement-groups</id><content type="html" xml:base="https://www.tunbury.org/2025/12/09/ceph-placement-groups/"><![CDATA[<p>Better planning leads to less data movement later!</p>

<p>Rather than tracking the placement of every individual object, Ceph hashes objects into placement groups, PGs, and then maps those PGs to Object Storage Daemons, OSDs. A PG is a logical collection of objects that are all stored on the same set of OSDs.</p>

<p>When a pool is created, it has few PGs. In my case, only 1 PG was allocated. As data is written, the autoscaler increases the target number of PGs. For my cluster, 1 became 32, then 128 and then 512. Each time this happens, every PG “splits” into four, and data is remapped to balance the placement across the OSDs. By default, only 5% of data can be misplaced, so the number of active placement groups increases slowly. Each time the amount of misplaced data drops below 5%, more placement groups are created, resulting in more misplaced data, and the cycle continues.</p>

<p>As I am doing a bulk data copy, this behaviour is undesirable. Instead of creating the pool with:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool create mypool erasure &lt;ec-profile&gt;
</code></pre></div></div>

<p>I should have specified the number of PGs.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool create mypool 512 erasure &lt;ec-profile&gt;
</code></pre></div></div>

<p>You can calculate the number of PGs upfront. Firstly, work out your pool size factor:</p>

<ul>
  <li>For a replicated pool, use the replication size</li>
  <li>For an EC pool, use k + m</li>
</ul>

<blockquote>
  <p>Target PGs = (Total OSDs * 100) / pool_size_factor</p>
</blockquote>

<p>In my case, I have 24 OSDs with EC 3+1 (size factor = 4): <code class="language-plaintext highlighter-rouge">(24 * 100) / 4 = 600</code>, which rounds to the nearest power of 2, giving 512.</p>

<p>The “100” appears to be a rule of thumb for target PGs per OSD. I have seen a range of recommended values between 100-200, depending on workload. The division by pool size accounts for the fact that each PG is stored on multiple OSDs.</p>
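<p>As a quick sanity check of the arithmetic, the calculation might be sketched like this (illustrative only; the names are not part of Ceph):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* round an estimate to the nearest power of two *)
let nearest_pow2 n =
  let rec floor_pow2 p = if p * 2 &gt; n then p else floor_pow2 (p * 2) in
  let lo = floor_pow2 1 in
  let hi = lo * 2 in
  if n - lo &lt;= hi - n then lo else hi

let target_pgs ~osds ~pgs_per_osd ~size_factor =
  nearest_pow2 (osds * pgs_per_osd / size_factor)

(* 24 OSDs, 100 PGs per OSD, EC 3+1 gives k + m = 4:
   target_pgs ~osds:24 ~pgs_per_osd:100 ~size_factor:4 = 512 *)
</code></pre></div></div>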

<p>You can set the number retrospectively, or let the autoscaler do it.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool <span class="nb">set </span>cephfs_data pg_num 512
ceph osd pool <span class="nb">set </span>cephfs_data pgp_num 512
</code></pre></div></div>

<p>Right now, I am waiting for 128 PGs to be autoscaled to 512. This could result in data being moved twice. For example, object X is in PG 5 on OSD 1. PG 5 splits, and object X hashes to new PG 133, which CRUSH puts on OSD 3. Subsequently, PG 133 splits, object X hashes to new PG 389, which CRUSH places on OSD 7.</p>

<p>I want to minimise the movement, so I have set the misplaced target ratio to 80%, which allows all the PG splits to occur in one pass.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph config <span class="nb">set </span>mgr target_max_misplaced_ratio 0.80
</code></pre></div></div>

<p>I would not recommend this for a cluster with active users, as the splitting causes a significant amount of I/O and performance degradation. However, all the splits occurred, and now the data is remapping. 52% of the data is misplaced. The recovery rate is ~300MB/s.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ceph" /><category term="tunbury.org" /><summary type="html"><![CDATA[Better planning leads to less data movement later!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ceph-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ceph-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Private repos in OCurrent</title><link href="https://www.tunbury.org/2025/12/05/ocurrent-private-repos/" rel="alternate" type="text/html" title="Private repos in OCurrent" /><published>2025-12-05T11:30:00+00:00</published><updated>2025-12-05T11:30:00+00:00</updated><id>https://www.tunbury.org/2025/12/05/ocurrent-private-repos</id><content type="html" xml:base="https://www.tunbury.org/2025/12/05/ocurrent-private-repos/"><![CDATA[<p><a href="https://github.com/ocurrent/ocurrent">OCurrent</a> has long wanted to access private repositories. You can achieve this by embedding a scoped PAT in the <code class="language-plaintext highlighter-rouge">.git-credentials</code> file, typically within the Docker container; however, this is untidy, to say the least! The approach presented works in cases where a GitHub app is used.</p>

<p>OCurrent authenticates to GitHub using a JWT (JSON Web Token). This token is signed using the application’s RSA private key (from <code class="language-plaintext highlighter-rouge">--github-private-key-file</code>) and contains the app_id. GitHub verifies this signature to confirm it’s really from the GitHub app. OCurrent then calls <code class="language-plaintext highlighter-rouge">get_token</code>, which POSTs to GitHub’s API to get an installation access token. This is a short-lived token (60 min) that can access the repositories the app has permission to access. In summary, OCurrent already has the token, but there is no accessor function.</p>

<p>Git supports the <code class="language-plaintext highlighter-rouge">https://x-access-token:ghs_XXXX@github.com/...</code> access method to pass the password; however, OCurrent displays logs in real-time, so this would show in plain text on the web GUI. You can pass a custom pretty-print function and use it to mask the value. Alternatively, you can pass an environment variable to <code class="language-plaintext highlighter-rouge">git</code>, for example <code class="language-plaintext highlighter-rouge">GIT_CONFIG_PARAMETERS="'http.extraHeader=Authorization: Basic dXNlcjpwYXNz'"</code>.</p>

<p>I have added <code class="language-plaintext highlighter-rouge">get_cached_token</code>, which returns the cached token from the GitHub API plugin. Essentially, this is <code class="language-plaintext highlighter-rouge">let get_cached_token t = t.token</code>. This token then becomes the context parameter for the <code class="language-plaintext highlighter-rouge">git fetch</code> operation, replacing the original <code class="language-plaintext highlighter-rouge">No_context</code>.</p>

<p>The environment variable is created by calling <code class="language-plaintext highlighter-rouge">Base64.encode_string</code> on the <code class="language-plaintext highlighter-rouge">x-access-token:ghs_XXXX</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">make_auth_env</span> <span class="n">token</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">b64</span> <span class="o">=</span> <span class="nn">Base64</span><span class="p">.</span><span class="n">encode_string</span> <span class="p">(</span><span class="s2">"x-access-token:"</span> <span class="o">^</span> <span class="n">token</span><span class="p">)</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">header</span> <span class="o">=</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">sprintf</span> <span class="s2">"'http.extraHeader=Authorization: Basic %s'"</span> <span class="n">b64</span> <span class="k">in</span>
  <span class="p">[</span><span class="o">|</span> <span class="s2">"GIT_CONFIG_PARAMETERS="</span> <span class="o">^</span> <span class="n">header</span> <span class="o">|</span><span class="p">]</span>
</code></pre></div></div>
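<p>To double-check the header value outside OCaml, the same encoding can be reproduced in a few lines of Python (the <code class="language-plaintext highlighter-rouge">ghs_XXXX</code> value is the placeholder from above, not a real token):</p>

```python
import base64

# Reproduce the Basic auth value that the OCaml make_auth_env builds;
# "ghs_XXXX" is a placeholder, not a real installation token.
b64 = base64.b64encode(b"x-access-token:ghs_XXXX").decode()
print(f"GIT_CONFIG_PARAMETERS='http.extraHeader=Authorization: Basic {b64}'")
```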

<p>The remaining changes in the PR thread the <code class="language-plaintext highlighter-rouge">env</code> parameter through the <code class="language-plaintext highlighter-rouge">git</code> module to the <code class="language-plaintext highlighter-rouge">process</code> module, where it is ultimately passed to <code class="language-plaintext highlighter-rouge">Lwt_process.open_process</code>.</p>

<p>Therefore, considering the example, <code class="language-plaintext highlighter-rouge">doc/examples/github_app.ml</code>, the diff would be:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   <span class="nn">Github</span><span class="p">.</span><span class="nn">App</span><span class="p">.</span><span class="n">installations</span> <span class="n">app</span> <span class="o">|&gt;</span> <span class="nn">Current</span><span class="p">.</span><span class="n">list_iter</span> <span class="p">(</span><span class="k">module</span> <span class="nn">Github</span><span class="p">.</span><span class="nc">Installation</span><span class="p">)</span> <span class="o">@@</span> <span class="k">fun</span> <span class="n">installation</span> <span class="o">-&gt;</span>
<span class="o">+</span>  <span class="nn">Current</span><span class="p">.</span><span class="n">component</span> <span class="s2">"api"</span> <span class="o">|&gt;</span>
<span class="o">+</span>  <span class="k">let</span><span class="o">**</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">installation</span> <span class="k">in</span>
<span class="o">+</span>  <span class="k">let</span> <span class="n">github</span> <span class="o">=</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Installation</span><span class="p">.</span><span class="n">api</span> <span class="n">inst</span> <span class="k">in</span>
   <span class="k">let</span> <span class="n">repos</span> <span class="o">=</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Installation</span><span class="p">.</span><span class="n">repositories</span> <span class="n">installation</span> <span class="k">in</span>
   <span class="n">repos</span> <span class="o">|&gt;</span> <span class="nn">Current</span><span class="p">.</span><span class="n">list_iter</span> <span class="o">~</span><span class="n">collapse_key</span><span class="o">:</span><span class="s2">"repo"</span> <span class="p">(</span><span class="k">module</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Api</span><span class="p">.</span><span class="nc">Repo</span><span class="p">)</span> <span class="o">@@</span> <span class="k">fun</span> <span class="n">repo</span> <span class="o">-&gt;</span>
   <span class="nn">Github</span><span class="p">.</span><span class="nn">Api</span><span class="p">.</span><span class="nn">Repo</span><span class="p">.</span><span class="n">ci_refs</span> <span class="o">~</span><span class="n">staleness</span><span class="o">:</span><span class="p">(</span><span class="nn">Duration</span><span class="p">.</span><span class="n">of_day</span> <span class="mi">90</span><span class="p">)</span> <span class="n">repo</span>
   <span class="o">|&gt;</span> <span class="nn">Current</span><span class="p">.</span><span class="n">list_iter</span> <span class="p">(</span><span class="k">module</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Api</span><span class="p">.</span><span class="nc">Commit</span><span class="p">)</span> <span class="o">@@</span> <span class="k">fun</span> <span class="n">head</span> <span class="o">-&gt;</span>
<span class="o">-</span>  <span class="k">let</span> <span class="n">src</span> <span class="o">=</span> <span class="nn">Git</span><span class="p">.</span><span class="n">fetch</span> <span class="p">(</span><span class="nn">Current</span><span class="p">.</span><span class="n">map</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Api</span><span class="p">.</span><span class="nn">Commit</span><span class="p">.</span><span class="n">id</span> <span class="n">head</span><span class="p">)</span> <span class="k">in</span>
<span class="o">+</span>  <span class="k">let</span> <span class="n">token</span> <span class="o">=</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Api</span><span class="p">.</span><span class="n">get_cached_token</span> <span class="n">github</span> <span class="k">in</span>
<span class="o">+</span>  <span class="k">let</span> <span class="n">src</span> <span class="o">=</span> <span class="nn">Git</span><span class="p">.</span><span class="n">fetch</span> <span class="o">?</span><span class="n">token</span> <span class="p">(</span><span class="nn">Current</span><span class="p">.</span><span class="n">map</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Api</span><span class="p">.</span><span class="nn">Commit</span><span class="p">.</span><span class="n">id</span> <span class="n">head</span><span class="p">)</span> <span class="k">in</span>
   <span class="nn">Docker</span><span class="p">.</span><span class="n">build</span> <span class="o">~</span><span class="n">pool</span> <span class="o">~</span><span class="n">pull</span><span class="o">:</span><span class="bp">false</span> <span class="o">~</span><span class="n">dockerfile</span> <span class="p">(</span><span class="nt">`Git</span> <span class="n">src</span><span class="p">)</span>
   <span class="o">|&gt;</span> <span class="n">check_run_status</span>
   <span class="o">|&gt;</span> <span class="nn">Github</span><span class="p">.</span><span class="nn">Api</span><span class="p">.</span><span class="nn">CheckRun</span><span class="p">.</span><span class="n">set_status</span> <span class="n">head</span> <span class="n">program_name</span>
</code></pre></div></div>

<p>This adds an <code class="language-plaintext highlighter-rouge">api</code> node in the graph for each installation, which is semantically correct as the token is per organisation.</p>

<p>I considered that the token might be stale or uninitialised before the <code class="language-plaintext highlighter-rouge">Git.fetch</code> call, but the only way to get a <code class="language-plaintext highlighter-rouge">Github.Api.Commit.id</code> is through an API call, so the token will always be refreshed. When a webhook is received, it triggers the reevaluation of the graph, which again refreshes the API token.</p>

<p>ref <a href="https://github.com/ocurrent/ocurrent/pull/466">ocurrent/ocurrent PR#466</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[OCurrent has long wanted to access private repositories. You can achieve this by embedding a scoped PAT in the .git-credentials file, typically within the Docker container; however, this is untidy, to say the least! The approach presented works in cases where a GitHub app is used.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Tile Server</title><link href="https://www.tunbury.org/2025/12/02/tessera-stac/" rel="alternate" type="text/html" title="Tile Server" /><published>2025-12-02T20:00:00+00:00</published><updated>2025-12-02T20:00:00+00:00</updated><id>https://www.tunbury.org/2025/12/02/tessera-stac</id><content type="html" xml:base="https://www.tunbury.org/2025/12/02/tessera-stac/"><![CDATA[<p>My throw-away comment at the end of my earlier <a href="https://www.tunbury.org/2025/11/30/tessera-zarr/">post</a> shows my scepticism that the JSON file approach was really viable.</p>

<p>A quick <code class="language-plaintext highlighter-rouge">ls | wc -l</code> shows nearly one million tiles in 2024 alone. We need a different approach. There are already parquet files available, and checking <code class="language-plaintext highlighter-rouge">register.parquet</code>, I can see it has everything we need!</p>

<p>As an alternative, more scalable solution, we could have a server that loads the Parquet files using <a href="https://github.com/mtelvers/arrow">mtelvers/arrow</a>, derived from <a href="https://github.com/LaurentMazare/ocaml-arrow">LaurentMazare/ocaml-arrow</a>, which can respond to queries raised by callbacks from Leaflet, allowing it to draw the required bounding boxes. Ultimately, this could provide links to the Zarr data stored in S3.</p>

<p>It’s a pretty simple API:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">GET /years</code> - Available years</li>
  <li><code class="language-plaintext highlighter-rouge">GET /stats?year=YYYY</code> - Coverage statistics</li>
  <li><code class="language-plaintext highlighter-rouge">GET /tiles?minx=&amp;miny=&amp;maxx=&amp;maxy=&amp;year=&amp;limit=</code> - Tiles in bounding box</li>
  <li><code class="language-plaintext highlighter-rouge">GET /density?year=&amp;resolution=</code> - Tile density grid</li>
</ul>
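<p>For instance, a client can build a bounding-box query like this (a Python sketch; the parameter names follow the list above, and the default limit is my own choice):</p>

```python
from urllib.parse import urlencode

BASE = "https://stac.mint.caelum.ci.dev"

def tiles_url(minx, miny, maxx, maxy, year, limit=100):
    """Build a /tiles query URL for a bounding box and year."""
    q = urlencode({"minx": minx, "miny": miny, "maxx": maxx,
                   "maxy": maxy, "year": year, "limit": limit})
    return f"{BASE}/tiles?{q}"

print(tiles_url(-100.0, 80.0, -99.9, 80.1, 2024))
```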

<p>The code is available at <a href="https://github.com/mtelvers/tile-server">mtelvers/tile-server</a> and currently deployed at <a href="https://stac.mint.caelum.ci.dev">stac.mint.caelum.ci.dev</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="tessera,stac" /><category term="tunbury.org" /><summary type="html"><![CDATA[My throw-away comment at the end of my earlier post shows my scepticism that the JSON file approach was really viable.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/meighen-island.png" /><media:content medium="image" url="https://www.tunbury.org/images/meighen-island.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Keeping your branch up-to-date</title><link href="https://www.tunbury.org/2025/12/01/github-actions/" rel="alternate" type="text/html" title="Keeping your branch up-to-date" /><published>2025-12-01T23:20:00+00:00</published><updated>2025-12-01T23:20:00+00:00</updated><id>https://www.tunbury.org/2025/12/01/github-actions</id><content type="html" xml:base="https://www.tunbury.org/2025/12/01/github-actions/"><![CDATA[<p>My Arm32 branch will quickly go stale and will need to be rebased and tested. Can GitHub Actions do that for me automatically?</p>

<p>Adding a self-hosted runner is pretty straightforward. Go to your repository, then navigate to Settings, Actions, Runners, and click “New self-hosted runner”. Select your OS and architecture, and the customised installation instructions are provided:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a folder</span>
<span class="nv">$ </span><span class="nb">mkdir </span>actions-runner <span class="o">&amp;&amp;</span> <span class="nb">cd </span>actions-runner
<span class="c"># Download the latest runner package</span>
<span class="nv">$ </span>curl <span class="nt">-o</span> actions-runner-linux-arm-2.329.0.tar.gz <span class="nt">-L</span> https://github.com/actions/runner/releases/download/v2.329.0/actions-runner-linux-arm-2.329.0.tar.gz
<span class="c"># Optional: Validate the hash</span>
<span class="nv">$ </span><span class="nb">echo</span> <span class="s2">"b958284b8af869bd6d3542210fbd23702449182ba1c2b1b1eef575913434f13a  actions-runner-linux-arm-2.329.0.tar.gz"</span> | shasum <span class="nt">-a</span> 256 <span class="nt">-c</span>
<span class="c"># Extract the installer</span>
<span class="nv">$ </span><span class="nb">tar </span>xzf ./actions-runner-linux-arm-2.329.0.tar.gz
</code></pre></div></div>

<p>Then run the configuration as follows:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create the runner and start the configuration experience</span>
<span class="nv">$ </span>./config.sh <span class="nt">--url</span> https://github.com/mtelvers/ocaml <span class="nt">--token</span> YOUR_TOKEN
<span class="c"># Last step, run it!</span>
<span class="nv">$ </span>./run.sh
</code></pre></div></div>

<p>I chose not to run it directly and instead configured it as a systemd service using:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo</span> ./svc.sh <span class="nb">install</span>
<span class="nv">$ </span><span class="nb">sudo</span> ./svc.sh start
</code></pre></div></div>

<p>My problems began as my Raspbian OS was out of date, and the GitHub runner requires Node.js 20. Runner version 2.303.0, which uses Node.js 16, was still available, so I installed it from <code class="language-plaintext highlighter-rouge">https://github.com/actions/runner/releases/download/v2.303.0/actions-runner-linux-arm-2.303.0.tar.gz</code>. This installation was successful, but it immediately updated itself to 2.329.0, resulting in the same problem.</p>

<p>Adding <code class="language-plaintext highlighter-rouge">--disableupdate</code> to <code class="language-plaintext highlighter-rouge">config.sh</code> prevented this behaviour, but the error message was now terminal:</p>

<blockquote>
  <p>runsvc.sh[20543]: An error occurred: Runner version v2.303.0 is deprecated and cannot receive messages.</p>
</blockquote>

<p>I updated the OS to the latest Raspberry Pi OS (32-bit) based on Debian Trixie, and the installation completed as expected. My runner was now ready.</p>

<p>Scheduled workflows only run on the default branch, so I changed my fork’s default branch to <code class="language-plaintext highlighter-rouge">arm32-multicore</code> and committed a GitHub Action workflow, as shown in <a href="https://gist.github.com/mtelvers/c08b324cab705cf0ad84f04f3e79a9ab">this gist</a>. The workflow checks out my branch, rebases it on <code class="language-plaintext highlighter-rouge">upstream/trunk</code>, builds the compiler and runs the test suite.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[My Arm32 branch will quickly go stale and will need to be rebased and tested. Can GitHub Actions do that for me automatically?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">TESSERA and Zarr</title><link href="https://www.tunbury.org/2025/11/30/tessera-zarr/" rel="alternate" type="text/html" title="TESSERA and Zarr" /><published>2025-11-30T22:50:00+00:00</published><updated>2025-11-30T22:50:00+00:00</updated><id>https://www.tunbury.org/2025/11/30/tessera-zarr</id><content type="html" xml:base="https://www.tunbury.org/2025/11/30/tessera-zarr/"><![CDATA[<p>I’ve been copying the TESSERA data to Cephfs, but what is actually in the files?</p>

<p>There are directories for each tile, which are named <code class="language-plaintext highlighter-rouge">grid_longitude_latitude</code>. Each of these contains two NPY files. Picking one at random, I found these two files covering an area in the Canadian Arctic region.</p>

<table>
  <thead>
    <tr>
      <th>File</th>
      <th>Shape</th>
      <th>Type</th>
      <th>Elements</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>grid_-99.95_80.05.npy</td>
      <td>1119 × 211 × 128</td>
      <td>int8</td>
      <td>~30 million</td>
    </tr>
    <tr>
      <td>grid_-99.95_80.05_scales.npy</td>
      <td>1119 × 211</td>
      <td>float32</td>
      <td>~236k</td>
    </tr>
  </tbody>
</table>

<p>This is quantised data, where the actual values would be: <code class="language-plaintext highlighter-rouge">data[i,j,k] * scales[i,j]</code></p>
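<p>A toy example of the dequantisation (shapes shrunk from 1119 × 211 × 128 to 2 × 2 × 3; the values are invented for illustration):</p>

```python
# int8 embeddings and a per-pixel float scale; real value = data * scale.
data = [[[10, -20, 5], [3, 7, -1]],
        [[0, 1, 2], [120, -128, 64]]]
scales = [[0.05, 0.1],
          [0.2, 0.01]]

values = [[[q * scales[i][j] for q in data[i][j]]
           for j in range(len(data[i]))]
          for i in range(len(data))]

print(values[0][0])  # [0.5, -1.0, 0.25]
```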

<p>There are 128 channels of machine learning data which need to be processed further by a downstream model, but I wanted to “see” it. Claude suggested a <a href="https://github.com/mtelvers/npy-pca">PCA visualisation</a> of the file, which reduces the 128 dimensions to 3 and maps them to RGB values. This is the header image for this post.</p>

<p>The Zarr format is designed for large chunked arrays, especially for use in cloud storage. Rather than being a single file like NPY, it is a directory containing metadata in <code class="language-plaintext highlighter-rouge">.zarray</code>, attributes in <code class="language-plaintext highlighter-rouge">.zattrs</code> and then a series of files like <code class="language-plaintext highlighter-rouge">0.0</code>, <code class="language-plaintext highlighter-rouge">0.1</code>, <code class="language-plaintext highlighter-rouge">1.0</code>, <code class="language-plaintext highlighter-rouge">1.1</code>. Each of those files contains the respective chunk of data. So, if the chunk size is 256, then those four files would contain at most an array of 512 × 512. For example, the scales data above (1119 × 211) would need <code class="language-plaintext highlighter-rouge">0.0</code>, <code class="language-plaintext highlighter-rouge">1.0</code>, <code class="language-plaintext highlighter-rouge">2.0</code>, <code class="language-plaintext highlighter-rouge">3.0</code>, <code class="language-plaintext highlighter-rouge">4.0</code>. Note that there is no <code class="language-plaintext highlighter-rouge">.1</code> file as the second dimension is less than 256; therefore, all the data fits into the <code class="language-plaintext highlighter-rouge">.0</code> file.</p>

<p>For higher dimensions, more dots are added. For example, with a chunk size of 256, chunk <code class="language-plaintext highlighter-rouge">2.1.0</code> would mean:</p>

<ul>
  <li>Dimension 0: chunk 2 - pixels 512-767</li>
  <li>Dimension 1: chunk 1 - pixels 256-511</li>
  <li>Dimension 2: chunk 0 - channels 0-127 (all of them)</li>
</ul>
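<p>The mapping from an element coordinate to its chunk file is just integer division per dimension. A quick sketch, assuming the dot-separated chunk naming described above:</p>

```python
def chunk_key(coords, chunks):
    """Return the Zarr chunk filename holding the given element coordinate."""
    return ".".join(str(c // size) for c, size in zip(coords, chunks))

# Pixel (600, 300), channel 40, with chunk sizes (256, 256, 128):
print(chunk_key((600, 300, 40), (256, 256, 128)))  # 2.1.0
```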

<p>The Zarr format allows the client to request a subset of the full dataset. The smallest element which can be returned is one chunk. Thus, smaller chunks may be better; however, these add more protocol overhead than larger chunks when requesting a large dataset, so a trade-off needs to be made. Zarr also compresses the data. Each dimension can be chunked with a different chunk size. Extra dimensions, such as the year of the dataset, could be incorporated.</p>

<p>Zarr’s real proposition is to allow the client to request “Give me latitude 50-55, longitude 100-110” without concern for the internal structure. However, this requires a unified array, which conflicts with the current structure, where tiles have different pixel dimensions depending on latitude (because longitude degrees shrink toward the poles). The data could be padded with zeros (wasting space), or bands could be created at different latitudes (gaps over the sea?).</p>

<p>I looked at some other <a href="https://planetarycomputer.microsoft.com/catalog?filter=zarr">datasets</a> to see how they handled this problem. Smaller regional datasets covering North America (for example), use a regular 1km grid and ignore distortions. The ERA5 climate data uses variable-sized pixels. It maps the globe to a 1440 x 720 array. <a href="https://confluence.ecmwf.int/display/CKB/ERA5:+What+is+the+spatial+reference">ref</a>. At the Equator, they have 28km per pixel; at 80 degrees latitude, they have 5km per pixel.</p>

<p>Discrete Global Grid Systems, DGGS, exist which divide the sphere into polyhedra, such as Uber’s <a href="https://www.uber.com/en-GB/blog/h3/">H3</a>; however, this doesn’t nicely map over the existing square pixels. The data would need to be resampled, and it’s not clear to me how you would average or interpolate 128 channels of ML data.</p>

<p>Possibly the best approach in the short term would be to provide the tiles as is and include appropriate metadata to describe them. <a href="https://cfconventions.org/">Climate and Forecast (CF) Conventions</a> and <a href="https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3">Attribute Convention for Data Discovery 1-3</a> seem to be the standards and are used in xarray and Planetary Computer.</p>

<p>Anil pointed me to <a href="https://stac.browser.user.eopf.eodc.eu">EOPF Sentinel Zarr Samples Service STAC API</a>. STAC is just a JSON schema convention. We provide a <code class="language-plaintext highlighter-rouge">catalog.json</code> at the top level, which lists the yearly collections. In each year subdirectory, we provide <code class="language-plaintext highlighter-rouge">collection.json</code> that gives a list of each tile’s JSON file. The tile’s JSON file gives the hyperlink to the Zarr storage on S3.</p>
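<p>A minimal sketch of that three-level layout (all identifiers, versions and URLs here are illustrative, not the real catalogue):</p>

```python
import json

# Root catalog -> yearly collection -> per-tile item pointing at Zarr on S3.
# Field values below are placeholders for illustration only.
catalog = {"type": "Catalog", "stac_version": "1.0.0", "id": "tessera",
           "description": "TESSERA embeddings",
           "links": [{"rel": "child", "href": "./2024/collection.json"}]}
item = {"type": "Feature", "stac_version": "1.0.0",
        "id": "grid_-99.95_80.05",
        "bbox": [-100.0, 80.0, -99.9, 80.1],
        "assets": {"embeddings":
                   {"href": "s3://bucket/2024/grid_-99.95_80.05.zarr"}}}

print(json.dumps(catalog["links"][0]))
```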

<p>Using Leaflet to visualise the map with some JavaScript to load the JSON files and extract the bounding boxes, we can fairly easily generate this <a href="https://stac.mint.caelum.ci.dev">map</a>. I do wonder how well that would scale, though.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="tessera,zarr" /><category term="tunbury.org" /><summary type="html"><![CDATA[I’ve been copying the TESSERA data to Cephfs, but what is actually in the files?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/embedding_pca.png" /><media:content medium="image" url="https://www.tunbury.org/images/embedding_pca.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OCaml 5.4 native Arm32 branch</title><link href="https://www.tunbury.org/2025/11/27/ocaml-54-native/" rel="alternate" type="text/html" title="OCaml 5.4 native Arm32 branch" /><published>2025-11-27T22:05:00+00:00</published><updated>2025-11-27T22:05:00+00:00</updated><id>https://www.tunbury.org/2025/11/27/ocaml-54-native</id><content type="html" xml:base="https://www.tunbury.org/2025/11/27/ocaml-54-native/"><![CDATA[<p>Recently, I have been using my Pi Zero (armv6), which has reminded me that OCaml 5 dropped native 32-bit support, and I wondered what it would take to reinstate it.</p>

<p>This started as a bit of tinkering; the Pi Zero is slow with a single CPU, 512MB of RAM and SD card storage. Building OCaml 5.4 takes several hours. I’d make a change in the morning, and leave it to build/fail and come back to it the next day.</p>

<p>There was an obvious candidate to revert, starting with <a href="https://github.com/ocaml/ocaml/pull/11904">PR#11904 Remove arm, i386 native-code backends</a>. However, OCaml had since moved on and cleaned up, so the following changes needed either to be updated to include Arm32 or to be reverted:
<a href="https://github.com/ocaml/ocaml/pull/12242">PR#12242 Refactor the computation of stack frame parameters</a>,
<a href="https://github.com/ocaml/ocaml/pull/12686">PR#12686 Fix the types of C primitives and remove some that are unused</a>, and
<a href="https://github.com/ocaml/ocaml/pull/13119">PR#13119 Introduce a platform-independent header for portable CFI/DWARF constructs</a>.</p>

<p>However, this only restored and updated the original Arm32 code, but that code did not implement multicore. Arm64 support was added in <a href="https://github.com/ocaml/ocaml/pull/10972">PR#10972 Arm64 multicore support</a>, and that was the template for the Arm32 implementation.</p>

<p>For debugging, I used small examples, starting with the factorial example on the homepage <a href="https://ocaml.org">ocaml.org</a>, and then working through my <a href="https://github.com/mtelvers/aoc2024">AOC</a> solutions from last year. I compiled these with <code class="language-plaintext highlighter-rouge">ocamlopt</code> and used <code class="language-plaintext highlighter-rouge">gdb</code> on the resulting code rather than trying to debug a segmentation fault in <code class="language-plaintext highlighter-rouge">ocamlopt.opt</code>. Once the compiler was working, I could use the test suite to identify the remaining issues.</p>

<p>The only test I could not get to run was <code class="language-plaintext highlighter-rouge">tests/parallel/max_domains2.ml</code>, which creates 129 domains. Realistically, this test is too large for a 32-bit machine with very limited memory.</p>

<p>I have used a trivial <a href="https://gist.github.com/mtelvers/def18d646a217c3219ba3e54c6d53bec">prime checker</a> as a benchmark, which broadly shows a 3x speed improvement between native code and byte code, and 3x speed improvement in multicore over single core on a quad core machine.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./ocamlc.opt <span class="nt">-I</span> stdlib <span class="nt">-o</span> bench.byte bench.ml
./ocamlopt.opt <span class="nt">-I</span> stdlib <span class="nt">-o</span> bench.opt bench.ml
hyperfine <span class="s1">'./bench.opt 1'</span> <span class="s1">'./bench.opt 4'</span> <span class="s1">'./bench.byte 1'</span> <span class="s1">'./bench.byte 4'</span>
</code></pre></div></div>

<h4 id="raspberry-pi-2-4-cores-armv7">Raspberry Pi 2 (4 cores, ARMv7)</h4>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>Domains</th>
      <th>Time</th>
      <th>Speedup vs slowest</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Native</td>
      <td>4</td>
      <td>1.61s</td>
      <td>10.3x</td>
    </tr>
    <tr>
      <td>Native</td>
      <td>1</td>
      <td>4.79s</td>
      <td>3.5x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>4</td>
      <td>5.52s</td>
      <td>3.0x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>1</td>
      <td>16.56s</td>
      <td>1.0x</td>
    </tr>
  </tbody>
</table>

<h4 id="raspberry-pi-zero-1-core-armv6">Raspberry Pi Zero (1 core, ARMv6)</h4>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>Domains</th>
      <th>Time</th>
      <th>Speedup vs slowest</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Native</td>
      <td>1</td>
      <td>9.33s</td>
      <td>2.5x</td>
    </tr>
    <tr>
      <td>Native</td>
      <td>4</td>
      <td>9.39s</td>
      <td>2.5x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>4</td>
      <td>23.25s</td>
      <td>1.0x</td>
    </tr>
    <tr>
      <td>Bytecode</td>
      <td>1</td>
      <td>23.38s</td>
      <td>1.0x</td>
    </tr>
  </tbody>
</table>

<p>I have created a tidy commit history on my fork at <a href="https://github.com/mtelvers/ocaml/commits/arm32-multicore/">arm32-multicore</a>, but the actual path was nowhere near this orderly!</p>

<p>If you have a niche requirement and a spare Pi or other 32-bit Arm board, and want to have a play:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/mtelvers/ocaml <span class="nt">-b</span> arm32-multicore
<span class="nb">cd </span>ocaml
./configure <span class="o">&amp;&amp;</span> make world.opt <span class="o">&amp;&amp;</span> make tests
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Recently, I have been using my Pi Zero (armv6), which has reminded me that OCaml 5 dropped native 32-bit support, and I wondered what it would take to reinstate it.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reading the Gas Meter</title><link href="https://www.tunbury.org/2025/11/23/gas-meter/" rel="alternate" type="text/html" title="Reading the Gas Meter" /><published>2025-11-23T18:30:00+00:00</published><updated>2025-11-23T18:30:00+00:00</updated><id>https://www.tunbury.org/2025/11/23/gas-meter</id><content type="html" xml:base="https://www.tunbury.org/2025/11/23/gas-meter/"><![CDATA[<p>My gas supplier has tried and failed to install a smart gas meter, so I’ll give it a go myself.</p>

<p>Numerous videos on YouTube demonstrate pipelines for capturing and processing images with AI, but that is a heavyweight solution for basic image recognition. With a fixed camera, I can compare reference images of each digit against the current capture.</p>

<p>I have placed a Raspberry Pi with a camera module pointing at the gas meter.</p>

<p><img src="/images/gas-meter-camera.png" alt="" /></p>

<p>In an ideal world, my image would be a grid of numbers with 0 = black and 255 = white.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[  0,   0, 255, 255, 255,   0,   0  ];
 [  0, 255,   0,   0,   0, 255,   0  ];
 [  0,   0,   0,   0,   0, 255,   0  ];
 [  0,   0, 255, 255, 255,   0,   0  ];
 [  0,   0,   0,   0,   0, 255,   0  ];
 [  0, 255,   0,   0,   0, 255,   0  ];
 [  0,   0, 255, 255, 255,   0,   0  ]]
</code></pre></div></div>

<p>This would flatten into a 1D vector.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ 0; 0; 255; 255; 255; 0; 0; 0; 255; 0; 0; 0; 255; 0; ...]
</code></pre></div></div>
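<p>That flattening step is a one-liner in OCaml; a sketch, assuming the image is held as an <code class="language-plaintext highlighter-rouge">int array array</code>:</p>

```ocaml
(* Flatten a 2D grayscale image into the 1D float vector that the
   distance functions operate on. *)
let flatten (img : int array array) : float array =
  Array.concat (Array.to_list img) |> Array.map float_of_int
```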

<p>Then I could use the Euclidean distance to see how far apart the current image is from each of the reference images:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">euclidean_distance</span> <span class="n">v1</span> <span class="n">v2</span> <span class="o">=</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">mapi</span> <span class="p">(</span><span class="k">fun</span> <span class="n">i</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="p">(</span><span class="n">x</span> <span class="o">-.</span> <span class="n">v2</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">))</span> <span class="o">**</span> <span class="mi">2</span><span class="o">.</span><span class="p">)</span> <span class="n">v1</span>
  <span class="o">|&gt;</span> <span class="nn">Array</span><span class="p">.</span><span class="n">fold_left</span> <span class="p">(</span> <span class="o">+.</span> <span class="p">)</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span>
  <span class="o">|&gt;</span> <span class="n">sqrt</span>
</code></pre></div></div>

<p>However, as the brightness of the images may vary due to reflections from the plastic housing, using the angle between the two vectors would likely be more effective. Cosine similarity ranges from -1 to 1, where 1 means the vectors point in the same direction.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">dot_product</span> <span class="n">v1</span> <span class="n">v2</span> <span class="o">=</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">map2</span> <span class="p">(</span> <span class="o">*.</span> <span class="p">)</span> <span class="n">v1</span> <span class="n">v2</span> <span class="o">|&gt;</span> <span class="nn">Array</span><span class="p">.</span><span class="n">fold_left</span> <span class="p">(</span> <span class="o">+.</span> <span class="p">)</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span>

<span class="k">let</span> <span class="n">magnitude</span> <span class="n">v</span> <span class="o">=</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">fold_left</span> <span class="p">(</span><span class="k">fun</span> <span class="n">acc</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="n">acc</span> <span class="o">+.</span> <span class="n">x</span> <span class="o">*.</span> <span class="n">x</span><span class="p">)</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span> <span class="n">v</span> <span class="o">|&gt;</span> <span class="n">sqrt</span>

<span class="k">let</span> <span class="n">cosine_similarity</span> <span class="n">v1</span> <span class="n">v2</span> <span class="o">=</span>
  <span class="n">dot_product</span> <span class="n">v1</span> <span class="n">v2</span> <span class="o">/.</span> <span class="p">(</span><span class="n">magnitude</span> <span class="n">v1</span> <span class="o">*.</span> <span class="n">magnitude</span> <span class="n">v2</span><span class="p">)</span>
</code></pre></div></div>

<p>My gas meter is the kind where the digits rotate on mechanical wheels, which makes their vertical position vary over time. If I capture the basic area where the digit is, it could be near the top, near the bottom, or anywhere in between, resulting in a wide range of outcomes.</p>

<p>Therefore, I must first find the bounding box of the number. As the numbers are white on a black background, the simplest approach is to find the maximum and minimum brightness levels and set a threshold accordingly. I tested levels from 10% to 90% in steps of 10 and opted for 85%.</p>

<p><img src="/images/gas-threshold-10.png" alt="" /> <img src="/images/gas-threshold-20.png" alt="" /> <img src="/images/gas-threshold-30.png" alt="" /> <img src="/images/gas-threshold-40.png" alt="" /> <img src="/images/gas-threshold-50.png" alt="" /> <img src="/images/gas-threshold-60.png" alt="" /> <img src="/images/gas-threshold-70.png" alt="" /> <img src="/images/gas-threshold-80.png" alt="" /> <img src="/images/gas-threshold-90.png" alt="" /></p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">threshold</span> <span class="o">=</span> <span class="n">min_v</span> <span class="o">+</span> <span class="p">(</span><span class="n">max_v</span> <span class="o">-</span> <span class="n">min_v</span><span class="p">)</span> <span class="o">*</span> <span class="mi">85</span> <span class="o">/</span> <span class="mi">100</span>
</code></pre></div></div>
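<p>Here <code class="language-plaintext highlighter-rouge">min_v</code> and <code class="language-plaintext highlighter-rouge">max_v</code> are the darkest and brightest pixels in the cropped region; one way to compute them (a sketch over the same <code class="language-plaintext highlighter-rouge">int array array</code> representation):</p>

```ocaml
(* Find the minimum and maximum pixel values in a 2D grayscale image,
   from which the 85% threshold is derived. *)
let min_max arr =
  Array.fold_left
    (fun acc row ->
      Array.fold_left (fun (mn, mx) v -> (min mn v, max mx v)) acc row)
    (max_int, min_int) arr
```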

<p>The bounding box can be found by searching for the first row with a bright pixel and the first column with a bright pixel:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">first_row</span> <span class="o">=</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">find_index</span> <span class="p">(</span><span class="k">fun</span> <span class="n">row</span> <span class="o">-&gt;</span> <span class="nn">Array</span><span class="p">.</span><span class="n">exists</span> <span class="p">(</span><span class="k">fun</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="n">v</span> <span class="o">&gt;</span> <span class="n">threshold</span><span class="p">)</span> <span class="n">row</span><span class="p">)</span> <span class="n">arr</span>
  <span class="o">|&gt;</span> <span class="nn">Option</span><span class="p">.</span><span class="n">value</span> <span class="o">~</span><span class="n">default</span><span class="o">:</span><span class="mi">0</span>

<span class="k">let</span> <span class="n">first_col</span> <span class="o">=</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">find_mapi</span> <span class="p">(</span><span class="k">fun</span> <span class="n">x</span> <span class="n">_</span> <span class="o">-&gt;</span>
    <span class="nn">Array</span><span class="p">.</span><span class="n">find_opt</span> <span class="p">(</span><span class="k">fun</span> <span class="n">row</span> <span class="o">-&gt;</span> <span class="n">row</span><span class="o">.</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">&gt;</span> <span class="n">threshold</span><span class="p">)</span> <span class="n">arr</span>
    <span class="o">|&gt;</span> <span class="nn">Option</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="n">x</span><span class="p">)</span>
  <span class="p">)</span> <span class="n">arr</span><span class="o">.</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nn">Option</span><span class="p">.</span><span class="n">value</span> <span class="o">~</span><span class="n">default</span><span class="o">:</span><span class="mi">0</span>
</code></pre></div></div>
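<p>The opposite edges of the bounding box are found the same way, scanning from the end. A sketch for the last bright row (<code class="language-plaintext highlighter-rouge">last_col</code> follows the same pattern):</p>

```ocaml
(* Find the index of the last row containing a pixel above the
   threshold, defaulting to the bottom of the image if none is found. *)
let last_row arr threshold =
  let n = Array.length arr in
  let rec scan i =
    if i < 0 then n - 1
    else if Array.exists (fun v -> v > threshold) arr.(i) then i
    else scan (i - 1)
  in
  scan (n - 1)
```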

<p>The captured image is first cropped to the area where the digit is known to appear and converted to grayscale. The 85% threshold is applied to create a two-colour image, which makes it easy to find the bounding box. The grayscale pixels within the bounding box are then extracted for processing.</p>

<p><img src="/images/gas-1-grayscale.png" alt="" /> <img src="/images/gas-2-binary.png" alt="" /> <img src="/images/gas-3-bbox.png" alt="" /> <img src="/images/gas-4-extracted.png" alt="" /></p>
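<p>The extraction step is just a copy of the bounding box out of the grayscale array; a minimal sketch (with <code class="language-plaintext highlighter-rouge">row1</code> and <code class="language-plaintext highlighter-rouge">col1</code> exclusive):</p>

```ocaml
(* Copy the rectangle [row0, row1) x [col0, col1) out of a 2D array. *)
let crop arr ~row0 ~col0 ~row1 ~col1 =
  Array.init (row1 - row0) (fun r ->
      Array.sub arr.(row0 + r) col0 (col1 - col0))
```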

<p>With the image extracted, calculate the cosine similarity with all the template images and sort them.</p>
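<p>A small helper can produce this ranking; the sketch below repeats the similarity functions from above so that it stands alone, and assumes each template is a digit paired with its flattened reference vector:</p>

```ocaml
(* Rank digit templates by cosine similarity to the extracted vector,
   best match first.  dot_product, magnitude and cosine_similarity are
   as defined earlier in the post. *)
let dot_product v1 v2 =
  Array.map2 ( *. ) v1 v2 |> Array.fold_left ( +. ) 0.0

let magnitude v =
  Array.fold_left (fun acc x -> acc +. (x *. x)) 0.0 v |> sqrt

let cosine_similarity v1 v2 =
  dot_product v1 v2 /. (magnitude v1 *. magnitude v2)

let classify templates extracted =
  templates
  |> List.map (fun (digit, v) -> (digit, cosine_similarity extracted v))
  |> List.sort (fun (_, s1) (_, s2) -> compare s2 s1)
```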

<table>
  <thead>
    <tr>
      <th>Template</th>
      <th>Score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>6</td>
      <td>0.9260</td>
    </tr>
    <tr>
      <td>4</td>
      <td>0.8447</td>
    </tr>
    <tr>
      <td>8</td>
      <td>0.8358</td>
    </tr>
    <tr>
      <td>0</td>
      <td>0.8123</td>
    </tr>
    <tr>
      <td>5</td>
      <td>0.7764</td>
    </tr>
    <tr>
      <td>3</td>
      <td>0.7640</td>
    </tr>
    <tr>
      <td>9</td>
      <td>0.7449</td>
    </tr>
    <tr>
      <td>1</td>
      <td>0.6674</td>
    </tr>
    <tr>
      <td>2</td>
      <td>0.6623</td>
    </tr>
    <tr>
      <td>7</td>
      <td>0.6062</td>
    </tr>
  </tbody>
</table>

<p>Recognition is perfect except for the final digit, which rotates very quickly, so the captured image is often cropped or shows parts of two digits.</p>

<p>The code for this project is available at <a href="https://github.com/mtelvers/gas-meter">mtelvers/gas-meter</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,raspberry-pi" /><category term="tunbury.org" /><summary type="html"><![CDATA[My gas supplier has tried and failed to install a smart gas meter, so I’ll give it a go myself.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/gas-meter.png" /><media:content medium="image" url="https://www.tunbury.org/images/gas-meter.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Solar Position Library</title><link href="https://www.tunbury.org/2025/11/16/solar-epaper/" rel="alternate" type="text/html" title="Solar Position Library" /><published>2025-11-16T18:30:00+00:00</published><updated>2025-11-16T18:30:00+00:00</updated><id>https://www.tunbury.org/2025/11/16/solar-epaper</id><content type="html" xml:base="https://www.tunbury.org/2025/11/16/solar-epaper/"><![CDATA[<p>My Apple Watch has a cool watch face which shows the position of the sun, which feels like a good application for my <a href="https://github.com/mtelvers/gpio">GPIO library</a> and ePaper display.</p>

<p><img src="/images/solar-watch.jpg" alt="" /></p>

<p>I’ve published the code for the application and the solar position library at <a href="https://github.com/mtelvers/solar">mtelvers/solar</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,raspberry-pi,solar" /><category term="tunbury.org" /><summary type="html"><![CDATA[My Apple Watch has a cool watch face which shows the position of the sun, which feels like a good application for my GPIO library and ePaper display.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/solar-landscape.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/solar-landscape.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OCaml on a Raspberry Pi</title><link href="https://www.tunbury.org/2025/11/15/ocaml-raspberry-pi/" rel="alternate" type="text/html" title="OCaml on a Raspberry Pi" /><published>2025-11-15T22:00:00+00:00</published><updated>2025-11-15T22:00:00+00:00</updated><id>https://www.tunbury.org/2025/11/15/ocaml-raspberry-pi</id><content type="html" xml:base="https://www.tunbury.org/2025/11/15/ocaml-raspberry-pi/"><![CDATA[<p>The weather outside is frightful, but the Raspberry Pi is so delightful; I have been cheering myself by connecting up all the various bits of hardware scattered on my desk. I often buy these components but never quite get around to using them.</p>

<p>My latest purchase was the <a href="https://www.amazon.co.uk/dp/B07J3FHJVP">Waveshare 2.13” e-Paper Display HAT</a>, which is exactly the same size as a Pi Zero. The basic interface is SPI, plus the device uses various GPIO lines. The drivers provided are in C and Python, and unsurprisingly, no OCaml. Looking on opam, there is <a href="https://opam.ocaml.org/packages/wiringpi/">wiringpi</a>, which provides OCaml bindings for the WiringPi library for OCaml &lt; 5.0.</p>

<p>Do I need a third-party library? The kernel provides <code class="language-plaintext highlighter-rouge">/dev/spi*</code> and <code class="language-plaintext highlighter-rouge">/dev/i2c*</code> when these interfaces are enabled with <code class="language-plaintext highlighter-rouge">raspi-config</code>. GPIO can be accessed via <code class="language-plaintext highlighter-rouge">/sys/class/gpio</code>, but this interface is deprecated and only provides a subset of the full functionality. All I really need to do is call <code class="language-plaintext highlighter-rouge">ioctl()</code> on <code class="language-plaintext highlighter-rouge">/dev/gpiochipN</code>, and I can access that via Ctypes.</p>

<p>Experimenting with some basic functionality, I managed to blink an LED on GPIO17.</p>

<p><img src="/images/gpio-led.jpg" alt="" /></p>

<p>After that, I was hooked. Adding I2C to read from a <a href="https://www.amazon.co.uk/WINGONEER-DS3231-AT24C32-Precision-Arduino/dp/B01H5NAFUY">DS3231 real time clock with EEPROM</a>, followed by SPI to output to an <a href="https://www.amazon.co.uk/MAX7219-Matrix-Display-Arduino-Microcontroller/dp/B07YWRZ3FC">LED matrix</a>.</p>

<p><img src="/images/gpio-max7219.jpg" alt="" /></p>

<p>I found a large LCD2004 display with an I2C driver board, so that was my next target. These are handy displays for basic text. They limit you to 8 custom characters, but a seven-segment digit only needs seven elements, so you can turn the display into a nice big retro digital clock!</p>

<p><img src="/images/gpio-lcd2004.jpg" alt="" /></p>

<p>On to the e-Paper display and basic framebuffer display. This display is very cool as it has two buffers and can do a partial update of the display from the secondary buffer without needing to refresh the display completely.</p>

<p><img src="/images/gpio-epaper.jpg" alt="" /></p>

<p>The library and test code are available at <a href="https://github.com/mtelvers/gpio">mtelvers/gpio</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,raspberry-pi" /><category term="tunbury.org" /><summary type="html"><![CDATA[The weather outside is frightful, but the Raspberry Pi is so delightful; I have been cheering myself by connecting up all the various bits of hardware scattered on my desk. I often buy these components but never quite get around to using them.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/raspberry-pi-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/raspberry-pi-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">CephFS Partition Setup</title><link href="https://www.tunbury.org/2025/11/03/cepfs-partition-setup/" rel="alternate" type="text/html" title="CephFS Partition Setup" /><published>2025-11-03T19:00:00+00:00</published><updated>2025-11-03T19:00:00+00:00</updated><id>https://www.tunbury.org/2025/11/03/cepfs-partition-setup</id><content type="html" xml:base="https://www.tunbury.org/2025/11/03/cepfs-partition-setup/"><![CDATA[<p>If you’re working with full disks, adding an Object Storage Daemon, OSD, to your Ceph cluster couldn’t be simpler. Running one command, <code class="language-plaintext highlighter-rouge">ceph orch apply osd --all-available-devices</code>, does everything for you. When working with partitions, the process is more manual.</p>

<p>Firstly, there are two ways to run <code class="language-plaintext highlighter-rouge">ceph-volume</code>: <code class="language-plaintext highlighter-rouge">cephadm shell -- ceph-volume</code> and <code class="language-plaintext highlighter-rouge">cephadm ceph-volume</code>. Both invoke <code class="language-plaintext highlighter-rouge">ceph-volume</code> in a container, but they differ in which parts of the system they can interact with and how the keyrings are provided.</p>

<p>For example, immediately after installation, running <code class="language-plaintext highlighter-rouge">cephadm ceph-volume lvm create --data /dev/sda4</code> fails with <code class="language-plaintext highlighter-rouge">RADOS permission denied</code> as no keyring can be found in <code class="language-plaintext highlighter-rouge">/var/lib/ceph/bootstrap-osd/ceph.keyring</code>. You can extract the keyring using <code class="language-plaintext highlighter-rouge">cephadm shell -- ceph auth get client.bootstrap-osd &gt; osd.keyring</code>; be sure to redirect the output on the host, otherwise the keyring file ends up inside the container.</p>

<p>With the extracted keyring, <code class="language-plaintext highlighter-rouge">cephadm ceph-volume --keyring /etc/ceph/ceph.client.bootstrap-osd.keyring lvm create --data /dev/sda4</code> starts out creating the LVM devices perfectly, but subsequently fails to start the <code class="language-plaintext highlighter-rouge">systemd</code> service, undoubtedly because it tries to start it within the container.</p>

<p>Running in a <code class="language-plaintext highlighter-rouge">cephadm shell</code>, the keyring can be created in the default directory by running <code class="language-plaintext highlighter-rouge">ceph auth get client.bootstrap-osd &gt; /var/lib/ceph/bootstrap-osd/ceph.keyring</code>, allowing <code class="language-plaintext highlighter-rouge">ceph-volume lvm create --data /dev/sda4</code> to run without extra parameters. This fails as the <code class="language-plaintext highlighter-rouge">lvcreate</code> command can’t see the group it created in the previous step. I presume that this problem stems from how <code class="language-plaintext highlighter-rouge">/dev</code> is mapped into the container.</p>

<p><code class="language-plaintext highlighter-rouge">cephadm shell -- ceph orch daemon add osd &lt;hostname&gt;:/dev/sda4</code> looks like the answer, but this fails with “please pass LVs or raw block devices”.</p>

<p>Manually creating a PV, VG, and LV, then passing those to <code class="language-plaintext highlighter-rouge">ceph orch daemon add osd &lt;hostname&gt;:/dev/&lt;vg&gt;/&lt;lv&gt;</code>, does work, but I feel that I’ve missed a trick that would get <code class="language-plaintext highlighter-rouge">ceph-volume</code> to do this for me. Several of the above command variations get close, but when something goes wrong, the configuration is always rolled back.</p>

<p>I had initially tried to use a combination of <code class="language-plaintext highlighter-rouge">ceph-volume raw prepare</code>/<code class="language-plaintext highlighter-rouge">ceph-volume raw activate</code>, which operated on the partitions without issue. Those devices appear in <code class="language-plaintext highlighter-rouge">ceph-volume raw list</code>. The problem was that I couldn’t see how to create a systemd service to service those disks. Running <code class="language-plaintext highlighter-rouge">/usr/bin/ceph-osd -i $id --cluster ceph</code> worked, but that is not persistent! Reluctantly, I’d given up on this approach in favour of LVM, but while validating my steps to write up this post, I had an inspiration!</p>

<p>With some excitement, may I present a working sequence:</p>

<ol>
  <li>Run <code class="language-plaintext highlighter-rouge">cephadm shell -- ceph auth get client.bootstrap-osd</code> to show the keyring.</li>
  <li>In a <code class="language-plaintext highlighter-rouge">cephadm shell</code> on each host:
    <ol>
      <li>Create the keyring in <code class="language-plaintext highlighter-rouge">/var/lib/ceph/bootstrap-osd/ceph.keyring</code></li>
      <li>Run <code class="language-plaintext highlighter-rouge">for x in {a..d} ; do ceph-volume raw prepare --bluestore --data /dev/sd${x}4 ; done</code></li>
    </ol>
  </li>
  <li>For each host, run <code class="language-plaintext highlighter-rouge">cephadm shell -- ceph cephadm osd activate &lt;hostname&gt;</code></li>
</ol>

<blockquote>
  <p>Note that the keyring file needs a trailing newline, which Ansible absorbs in certain circumstances, resulting in a parse error.</p>
</blockquote>

<p>That final command <code class="language-plaintext highlighter-rouge">cephadm shell -- ceph cephadm osd activate</code> causes any missing OSD services to be created.</p>

<p>For my deployment, I provisioned four Scaleway EM-L110X-SATA machines and booted them in rescue mode. Taking the deployment steps from my last <a href="https://www.tunbury.org/2025/10/31/scaleway-reconfiguration/">post</a>, I have rolled them into an Ansible Playbook, <a href="https://gist.github.com/4012e6860ff4e12d7b827fe96669318b.git">gist</a>, which reconfigures the machine automatically.</p>

<p>With the machines prepared, Ceph can be deployed using the notes from this earlier <a href="https://www.tunbury.org/2025/10/18/quick-look-at-ceph/">post</a> combined with the OSD setup steps above. The entire process is available in this <a href="https://gist.github.com/mtelvers/15e8bb0328aca66520ebe1351572a7d3">gist</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ceph" /><category term="tunbury.org" /><summary type="html"><![CDATA[If you’re working with full disks, adding an Object Storage Daemon, OSD, to your Ceph cluster couldn’t be simpler. Running one command, ceph orch apply osd --all-available-devices, does everything for you. When working with partitions, the process is more manual.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ceph-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ceph-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Scaleway Elastic Metal Reconfiguration</title><link href="https://www.tunbury.org/2025/10/31/scaleway-reconfiguration/" rel="alternate" type="text/html" title="Scaleway Elastic Metal Reconfiguration" /><published>2025-10-31T12:00:00+00:00</published><updated>2025-10-31T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/10/31/scaleway-reconfiguration</id><content type="html" xml:base="https://www.tunbury.org/2025/10/31/scaleway-reconfiguration/"><![CDATA[<p>Scaleway offers the EM-L110X-SATA machine, which has 4 x 12TB disks. I’ve noted in a previous <a href="https://www.tunbury.org/2025/05/01/removing-mdadm/">post</a> that the configuration isn’t ideal for my purposes, and I outlined a way to reconfigure the machine. The premise of that post is that you can eject one of the disks from the RAID5 array to use as the new root filesystem. 
All well and good, but you must wait for the RAID5 array to finish building; otherwise, ejecting the disk immediately leads to an inaccessible file system.</p>

<p>Scaleway allows you to boot into a rescue console. This is a netboot environment which has SSH access using a randomly generated username and password.</p>

<p>Once booted, <code class="language-plaintext highlighter-rouge">lsblk</code> shows <code class="language-plaintext highlighter-rouge">md0</code> is now <code class="language-plaintext highlighter-rouge">md127</code> and <code class="language-plaintext highlighter-rouge">md1</code> is missing.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NAME          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0           7:0    0 826.6M  1 loop  /usr/lib/live/mount/rootfs/filesystem.squashfs
sda             8:0    0  10.9T  0 disk  
├─sda1          8:1    0     1M  0 part  
├─sda2          8:2    0   512M  0 part  
│ └─md127       9:127  0   511M  0 raid1 
│   └─md127p1 259:0    0   506M  0 part  
├─sda3          8:3    0  10.7T  0 part  
└─sda4          8:4    0   512M  0 part  
sdb             8:16   0  10.9T  0 disk  
├─sdb1          8:17   0     1M  0 part  
├─sdb2          8:18   0   512M  0 part  
│ └─md127       9:127  0   511M  0 raid1 
│   └─md127p1 259:0    0   506M  0 part  
├─sdb3          8:19   0  10.7T  0 part  
└─sdb4          8:20   0   512M  0 part  
sdc             8:32   0  10.9T  0 disk  
├─sdc1          8:33   0     1M  0 part  
├─sdc2          8:34   0   512M  0 part  
│ └─md127       9:127  0   511M  0 raid1 
│   └─md127p1 259:0    0   506M  0 part  
├─sdc3          8:35   0  10.7T  0 part  
└─sdc4          8:36   0   512M  0 part  
sdd             8:48   0  10.9T  0 disk  
├─sdd1          8:49   0     1M  0 part  
├─sdd2          8:50   0   512M  0 part  
│ └─md127       9:127  0   511M  0 raid1 
│   └─md127p1 259:0    0   506M  0 part  
├─sdd3          8:51   0  10.7T  0 part  
└─sdd4          8:52   0   512M  0 part  
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">cat /proc/mdstat</code> shows that <code class="language-plaintext highlighter-rouge">md1</code> is now <code class="language-plaintext highlighter-rouge">md126</code> but is <code class="language-plaintext highlighter-rouge">inactive</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 
md126 : inactive sdb3[4] sdc3[0] sda3[2] sdd3[1]
      45751787520 blocks super 1.2
       
md127 : active (auto-read-only) raid1 sdb2[3] sdc2[0] sda2[2] sdd2[1]
      523264 blocks super 1.2 [4/4] [UUUU]
      
unused devices: &lt;none&gt;
</code></pre></div></div>

<p>We can now use <code class="language-plaintext highlighter-rouge">mdadm --assemble --force --run /dev/md126 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3</code> to bring the array back online.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm: Fail create md126 when using /sys/module/md_mod/parameters/new_array
mdadm: Marking array /dev/md126 as 'clean'
mdadm: /dev/md126 has been started with 3 drives (out of 4) and 1 rebuilding.
</code></pre></div></div>

<p>This is confirmed with <code class="language-plaintext highlighter-rouge">cat /proc/mdstat</code> which shows that the rebuild has automatically restarted and will finish in about a day.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@51-159-101-156:~# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 
md126 : active raid5 sdc3[0] sdb3[4] sda3[2] sdd3[1]
      34313840640 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [=&gt;...................]  recovery =  8.8% (1014124636/11437946880) finish=1579.5min speed=109982K/sec
      bitmap: 10/86 pages [40KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb2[3] sdc2[0] sda2[2] sdd2[1]
      523264 blocks super 1.2 [4/4] [UUUU]
      
unused devices: &lt;none&gt;
</code></pre></div></div>

<p>Stop the rebuild with <code class="language-plaintext highlighter-rouge">echo frozen &gt; /sys/block/md126/md/sync_action</code> and mount the drive read-only.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p /mnt/old
mount -o ro /dev/md126p1 /mnt/old
</code></pre></div></div>

<p>The Scaleway base installation is only ~2GB, and <code class="language-plaintext highlighter-rouge">/tmp</code> is huge (as these systems have 96GB of RAM):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Filesystem      Size  Used Avail Use% Mounted on
tmpfs            48G   28K   48G   1% /tmp
</code></pre></div></div>

<p>Create the backup</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /mnt/old
tar czf /tmp/rootfs-backup.tar.gz \
  --exclude=./proc \
  --exclude=./sys \
  --exclude=./dev \
  --exclude=./tmp \
  --exclude=./run \
  --exclude=./mnt \
  .
</code></pre></div></div>

<p>Check if the backup was created successfully.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-r--r-- 1 root root 1.2G Oct 31 14:50 /tmp/rootfs-backup.tar.gz
</code></pre></div></div>

<p>Unmount the drive, and stop the array.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /
umount /mnt/old
mdadm --stop /dev/md126
</code></pre></div></div>

<p>I found that the kernel was keen to remount the device, so I zeroed it out to prevent it.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm --zero-superblock /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
</code></pre></div></div>

<p>Remove the partition from all the disks.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for disk in sda sdb sdc sdd; do
  parted /dev/$disk --script "rm 3 mkpart primary 1025MiB 34GiB set 3 raid on"
done
</code></pre></div></div>

<p>Create a new 99GB RAID5 array (33GB × 3 usable with RAID5).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm --create /dev/md126 --level=5 --raid-devices=4 \
  /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 \
  --chunk=512 --metadata=1.2
</code></pre></div></div>

<p>Check that it is building with <code class="language-plaintext highlighter-rouge">cat /proc/mdstat</code>: 2 minutes to go!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 
md126 : active raid5 sdd3[4] sdc3[2] sdb3[1] sda3[0]
      103704576 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [===&gt;.................]  recovery = 19.1% (6606848/34568192) finish=2.2min speed=202080K/sec
      
md127 : active (auto-read-only) raid1 sdd2[1] sdc2[0] sdb2[3] sda2[2]
      523264 blocks super 1.2 [4/4] [UUUU]
      
unused devices: &lt;none&gt;
</code></pre></div></div>
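<p>To watch just the progress figure, the recovery line can be extracted; a demo against a pasted copy of the sample line above rather than the live <code class="language-plaintext highlighter-rouge">/proc/mdstat</code>:</p>

```shell
# Pull the recovery percentage out of an mdstat-style line (sample text, not live data)
line='      [===>.................]  recovery = 19.1% (6606848/34568192) finish=2.2min speed=202080K/sec'
pct=$(echo "$line" | grep -o 'recovery = [0-9.]*%')
echo "$pct"   # -> recovery = 19.1%
```

<p>Against the live file, <code class="language-plaintext highlighter-rouge">watch cat /proc/mdstat</code> gives the same information refreshed every two seconds.</p>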

<p>Create GPT partition table.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>parted /dev/md126 mklabel gpt
parted /dev/md126 mkpart primary ext4 0% 100%
</code></pre></div></div>

<p>Format with ext4</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkfs.ext4 -L root /dev/md126p1
</code></pre></div></div>

<p>Verify</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@51-159-101-156:/# lsblk | grep md126
│ └─md126       9:126  0  98.9G  0 raid5 
│   └─md126p1 259:2    0  98.9G  0 part  
│ └─md126       9:126  0  98.9G  0 raid5 
│   └─md126p1 259:2    0  98.9G  0 part  
│ └─md126       9:126  0  98.9G  0 raid5 
│   └─md126p1 259:2    0  98.9G  0 part  
│ └─md126       9:126  0  98.9G  0 raid5 
│   └─md126p1 259:2    0  98.9G  0 part  
</code></pre></div></div>

<p>Mount the new filesystem</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p /mnt/new
mount /dev/md126p1 /mnt/new
</code></pre></div></div>

<p>Restore system</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /mnt/new
tar xzf /tmp/rootfs-backup.tar.gz
</code></pre></div></div>

<p>Create system directories with correct permissions as these were excluded from the <code class="language-plaintext highlighter-rouge">tar</code> operation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p /mnt/new/proc
mkdir -p /mnt/new/sys
mkdir -p /mnt/new/dev
mkdir -p /mnt/new/run
mkdir -p /mnt/new/mnt
mkdir -p /mnt/new/tmp
</code></pre></div></div>

<p>Set correct permissions with the sticky bit for /tmp.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chmod 0555 /mnt/new/proc
chmod 0555 /mnt/new/sys
chmod 0755 /mnt/new/dev
chmod 0755 /mnt/new/run
chmod 0755 /mnt/new/mnt
chmod 1777 /mnt/new/tmp
</code></pre></div></div>
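<p>The leading <code class="language-plaintext highlighter-rouge">1</code> in <code class="language-plaintext highlighter-rouge">1777</code> is the sticky bit, which stops users deleting each other’s files in the world-writable <code class="language-plaintext highlighter-rouge">/tmp</code>. A quick sanity check on a throwaway directory:</p>

```shell
# Confirm that mode 1777 (sticky bit + rwx for all) is applied as expected
d=$(mktemp -d)
chmod 1777 "$d"
mode=$(stat -c %a "$d")   # GNU stat: print the octal mode
echo "$mode"   # -> 1777
rmdir "$d"
```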

<p>Set ownership (though it should already be <code class="language-plaintext highlighter-rouge">root:root</code>).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chown root:root /mnt/new/{proc,sys,dev,run,mnt,tmp}
</code></pre></div></div>

<p>Mount boot partition.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount /dev/md127p1 /mnt/new/boot
</code></pre></div></div>

<p>Bind mount system directories for chroot.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount --bind /dev /mnt/new/dev
mount --bind /proc /mnt/new/proc
mount --bind /sys /mnt/new/sys
</code></pre></div></div>

<p>Chroot into the system.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chroot /mnt/new /bin/bash
</code></pre></div></div>

<p>Check the original <code class="language-plaintext highlighter-rouge">mdadm.conf</code> file in <code class="language-plaintext highlighter-rouge">/mnt/new/etc/mdadm/mdadm.conf</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ARRAY /dev/md0 metadata=1.2 UUID=54d65d24:831a6594:d2c51416:5dd1692c
ARRAY /dev/md1 metadata=1.2 spares=1 UUID=dd7844ac:07f188e7:995ade90:71c23f7b
MAILADDR root
</code></pre></div></div>

<p>And compare that with the output from <code class="language-plaintext highlighter-rouge">mdadm --detail --scan</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ARRAY /dev/md/ubuntu-server:0 metadata=1.2 name=ubuntu-server:0 UUID=54d65d24:831a6594:d2c51416:5dd1692c
ARRAY /dev/md126 metadata=1.2 name=52-158-100-155:126 UUID=6a249202:c916a184:76fd6446:839ad3a4
</code></pre></div></div>

<p>Fix the UUID for <code class="language-plaintext highlighter-rouge">/dev/md1</code> in <code class="language-plaintext highlighter-rouge">mdadm.conf</code>, either with your favourite text editor or with <code class="language-plaintext highlighter-rouge">sed</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sed -i "s/spares=1 UUID=.*/UUID=6a249202:c916a184:76fd6446:839ad3a4/g" /mnt/new/etc/mdadm/mdadm.conf 
</code></pre></div></div>
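<p>The substitution can be dry-run against the original line before editing the file:</p>

```shell
# Dry-run of the sed substitution on the original mdadm.conf line
line='ARRAY /dev/md1 metadata=1.2 spares=1 UUID=dd7844ac:07f188e7:995ade90:71c23f7b'
new=$(echo "$line" | sed 's/spares=1 UUID=.*/UUID=6a249202:c916a184:76fd6446:839ad3a4/')
echo "$new"   # -> ARRAY /dev/md1 metadata=1.2 UUID=6a249202:c916a184:76fd6446:839ad3a4
```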

<p>Verify the changes to <code class="language-plaintext highlighter-rouge">/mnt/new/etc/mdadm/mdadm.conf</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ARRAY /dev/md0 metadata=1.2 UUID=54d65d24:831a6594:d2c51416:5dd1692c
ARRAY /dev/md1 metadata=1.2 UUID=6a249202:c916a184:76fd6446:839ad3a4
MAILADDR root
</code></pre></div></div>

<p>Make the same edit to <code class="language-plaintext highlighter-rouge">/etc/fstab</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sed -i 's/dd7844ac:07f188e7:995ade90:71c23f7b/6a249202:c916a184:76fd6446:839ad3a4/' /etc/fstab
</code></pre></div></div>

<p>Update initramfs with new array config.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>update-initramfs -u -k all
</code></pre></div></div>

<p>Reinstall GRUB on all 4 disks (for redundancy).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for disk in sda sdb sdc sdd; do
  echo "Installing GRUB on /dev/$disk..."
  grub-install /dev/$disk
done
</code></pre></div></div>

<p>Update GRUB config.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>update-grub
</code></pre></div></div>

<p>Exit the chroot environment.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>exit
</code></pre></div></div>

<p>Many people would be happy to stop here, but the free space is now in the middle of the disk, with the swap space (nearly) at the end, which means that my new partition, number 5, would be out of order on disk. <code class="language-plaintext highlighter-rouge">parted /dev/sda print free</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Model: ATA TOSHIBA MG07ACA1 (scsi)
Disk /dev/sda: 12.0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 
Number  Start   End     Size    File system     Name     Flags
        17.4kB  1049kB  1031kB  Free Space
 1      1049kB  2097kB  1049kB                           bios_grub
 2      2097kB  539MB   537MB
        539MB   1075MB  536MB   Free Space
 3      1075MB  36.5GB  35.4GB                  primary  raid
        36.5GB  11.7TB  11.7TB  Free Space
 4      11.7TB  11.7TB  537MB   linux-swap(v1)
        11.7TB  12.0TB  286GB   Free Space
</code></pre></div></div>

<p>I’m going to delete the swap partition, create the new data partition, and finally, create the swap partition at the very end.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for disk in sda sdb sdc sdd; do
  parted /dev/$disk --script "rm 4 mkpart primary ext4 36.5GB -1GB mkpart primary linux-swap -1GB 100%"
  mkswap /dev/${disk}5
done
</code></pre></div></div>

<p>Delete the old references to the swap space from <code class="language-plaintext highlighter-rouge">/mnt/new/etc/fstab</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sed -i '/swap/d' /mnt/new/etc/fstab
</code></pre></div></div>

<p>Then add the new swap space to <code class="language-plaintext highlighter-rouge">/mnt/new/etc/fstab</code> (note the target: we are outside the chroot here, so the rescue system’s own <code class="language-plaintext highlighter-rouge">/etc/fstab</code> would be the wrong file).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>blkid | awk -F'"' '/TYPE="swap"/ {print "/dev/disk/by-uuid/" $2 " none swap sw 0 0"}' &gt;&gt; /mnt/new/etc/fstab
</code></pre></div></div>
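<p>The <code class="language-plaintext highlighter-rouge">awk</code> one-liner splits each <code class="language-plaintext highlighter-rouge">blkid</code> line on double quotes, so <code class="language-plaintext highlighter-rouge">$2</code> is the first quoted value; this assumes <code class="language-plaintext highlighter-rouge">UUID</code> is the first quoted field on the swap lines, which holds here but would break if, say, a <code class="language-plaintext highlighter-rouge">LABEL</code> preceded it. A demo on a made-up <code class="language-plaintext highlighter-rouge">blkid</code> line:</p>

```shell
# How the awk -F'"' field split works on a hypothetical blkid line
line='/dev/sda5: UUID="1234-abcd" TYPE="swap"'
entry=$(echo "$line" | awk -F'"' '/TYPE="swap"/ {print "/dev/disk/by-uuid/" $2 " none swap sw 0 0"}')
echo "$entry"   # -> /dev/disk/by-uuid/1234-abcd none swap sw 0 0
```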

<p>Unmount everything.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>umount /mnt/new/boot
umount /mnt/new/dev
umount /mnt/new/proc
umount /mnt/new/sys
umount /mnt/new
</code></pre></div></div>

<p>Final sync and reboot.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sync &amp;&amp; reboot
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Scaleway" /><category term="tunbury.org" /><summary type="html"><![CDATA[Scaleway offers the EM-L110X-SATA machine, which has 4 x 12TB disks. I’ve noted in a previous post that the configuration isn’t ideal for my purposes, and I outlined a way to reconfigure the machine. The premise of that post is that you can eject one of the disks from the RAID5 array to use as the new root filesystem. All well and good, but you must wait for the RAID5 array to finish building; otherwise, ejecting the disk immediately leads to an inaccessible file system.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/scaleway-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/scaleway-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Slurm with multiple architectures</title><link href="https://www.tunbury.org/2025/10/24/slurm-ansible/" rel="alternate" type="text/html" title="Slurm with multiple architectures" /><published>2025-10-24T12:00:00+00:00</published><updated>2025-10-24T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/10/24/slurm-ansible</id><content type="html" xml:base="https://www.tunbury.org/2025/10/24/slurm-ansible/"><![CDATA[<p>If we implement Slurm over a cluster of machines with different processor architectures, what would the job submission look like?</p>

<p>Slurm will happily mix different processor architectures in the same cluster, and even in the same partition. The processor cores and memory are aggregated just as they would be for identical architectures. It is the submitter’s responsibility to ensure that their script runs on the available processors. Rather than leave it to chance, we could create multiple partitions within a cluster. For example, with these settings in <code class="language-plaintext highlighter-rouge">slurm.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Define your node groups first
NodeName=node[01-10] CPUs=32 RealMemory=128000
NodeName=node[11-20] CPUs=64 RealMemory=256000

# Then define partitions
PartitionName=x86_64 Nodes=node[01-10] Default=YES MaxTime=INFINITE State=UP
PartitionName=arm64 Nodes=node[11-20] Default=NO MaxTime=INFINITE State=UP
</code></pre></div></div>

<p>However, it is probably better to use node “features” and keep all the machines in a single partition:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Define your node groups first
NodeName=node[01-10] CPUs=32 Feature=x86_64
NodeName=node[11-20] CPUs=64 Feature=arm64

# Then define the partition
PartitionName=compute Nodes=node[01-20] Default=YES State=UP
</code></pre></div></div>

<p>Users can select the processor architecture using the <code class="language-plaintext highlighter-rouge">--constraint</code> option to <code class="language-plaintext highlighter-rouge">sbatch</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sbatch --constraint=x86_64 job.sh
sbatch --constraint=arm64 job.sh
</code></pre></div></div>
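<p>The constraint can equally be baked into the job script itself as an <code class="language-plaintext highlighter-rouge">#SBATCH</code> directive; a minimal sketch (the job name and body are hypothetical):</p>

```shell
#!/bin/bash
#SBATCH --job-name=arch-test
#SBATCH --constraint=arm64
# The body only runs on nodes carrying the arm64 feature,
# so uname reports the expected machine architecture.
uname -m
```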

<p>I have implemented this strategy in <a href="https://github.com/mtelvers/slurm-ansible">mtelvers/slurm-ansible</a>, which builds a Slurm cluster based upon my previous posts on <a href="https://www.tunbury.org/2025/04/14/slurm-workload-manager/">14/4</a> and <a href="https://www.tunbury.org/2025/08/06/slurm-limits/">6/8</a> to include accounting, cgroups and NFS sharing and additionally applies features based upon <code class="language-plaintext highlighter-rouge">uname -m</code>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Slurm" /><category term="tunbury.org" /><summary type="html"><![CDATA[If we implement Slurm over a cluster of machines with different processor architectures, what would the job submission look like?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/slurm.png" /><media:content medium="image" url="https://www.tunbury.org/images/slurm.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">A quick look at CephFS</title><link href="https://www.tunbury.org/2025/10/18/quick-look-at-ceph/" rel="alternate" type="text/html" title="A quick look at CephFS" /><published>2025-10-18T22:00:00+00:00</published><updated>2025-10-18T22:00:00+00:00</updated><id>https://www.tunbury.org/2025/10/18/quick-look-at-ceph</id><content type="html" xml:base="https://www.tunbury.org/2025/10/18/quick-look-at-ceph/"><![CDATA[<p>There are Ansible playbooks available at <a href="https://github.com/ceph/cephadm-ansible">ceph/cephadm-ansible</a> to configure CephFS; however, I decided to set it up manually on some test VMs to gain a better understanding of the process.</p>

<p>I used Vagrant to create a couple of VMs. One with 3 x 500GB disks and one with 11 x 1TB disks.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu2204"
  config.vm.provider "libvirt" do |v|
    v.memory = 8192
    v.cpus = 4
    (1..3).each do |i|
      v.storage :file, :size =&gt; '500G'
    end
  end
  config.vm.network :public_network, :dev =&gt; 'br0', :type =&gt; 'bridge'
end
</code></pre></div></div>

<p>After <code class="language-plaintext highlighter-rouge">vagrant up</code>, I SSHed to the 3-disk node, which I will use to bootstrap the cluster. Install the cephadm tool, which pulls in docker.io and the other packages needed:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt install cephadm
</code></pre></div></div>

<p>Set the hostname and run cephadm:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hostnamectl set-hostname host226.ocl.cl.cam.ac.uk
cephadm bootstrap --mon-ip 128.232.124.226 --allow-fqdn-hostname
</code></pre></div></div>

<p>After that completes, the admin interface is available on port 8443, and the initial password is printed. It must be changed on first login.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ceph Dashboard is now available at:

	     URL: https://host226.ocl.cl.cam.ac.uk:8443/
	    User: admin
	Password: 6n2knvhka0

Enabling client.admin keyring and conf on hosts with "admin" label
Saving cluster configuration to /var/lib/ceph/8c498470-b01f-11f0-8941-1baf58a32558/config directory
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:

	sudo /usr/sbin/cephadm shell --fsid 8c498470-b01f-11f0-8941-1baf58a32558 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Or, if you are only running a single cluster on this host:

	sudo /usr/sbin/cephadm shell 

Please consider enabling telemetry to help improve Ceph:

	ceph telemetry on

For more information see:

	https://docs.ceph.com/docs/master/mgr/telemetry/

Bootstrap complete.

</code></pre></div></div>

<p>To run <code class="language-plaintext highlighter-rouge">ceph</code> commands, either run <code class="language-plaintext highlighter-rouge">cephadm shell -- ceph -s</code> or run them interactively after first starting <code class="language-plaintext highlighter-rouge">cephadm shell</code>.</p>

<p>On the other machine, the 11-disk node, install Docker:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt install docker.io
</code></pre></div></div>

<p>Copy the contents of <code class="language-plaintext highlighter-rouge">/etc/ceph/ceph.pub</code> from the master node into <code class="language-plaintext highlighter-rouge">~/.ssh/authorized_keys</code> on this node.</p>

<p>Then, from the master node, add the other machine to the cluster:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph orch host add host190.ocl.cl.cam.ac.uk 128.232.124.190
</code></pre></div></div>

<p>The disks should now appear as available.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ceph orch device ls
HOST                      PATH      TYPE  DEVICE ID                              SIZE  AVAILABLE  REFRESHED  REJECT REASONS  
host190.ocl.cl.cam.ac.uk  /dev/vdb  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdc  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdd  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vde  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdf  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdg  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdh  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdi  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdj  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdk  hdd                                         1000G  Yes        21s ago                    
host190.ocl.cl.cam.ac.uk  /dev/vdl  hdd                                         1000G  Yes        21s ago                    
host226.ocl.cl.cam.ac.uk  /dev/sda  hdd   QEMU_HARDDISK_drive-ua-disk-volume-0   500G  Yes        2m ago                     
host226.ocl.cl.cam.ac.uk  /dev/sdb  hdd   QEMU_HARDDISK_drive-ua-disk-volume-1   500G  Yes        2m ago                     
host226.ocl.cl.cam.ac.uk  /dev/sdc  hdd   QEMU_HARDDISK_drive-ua-disk-volume-2   500G  Yes        2m ago                     
</code></pre></div></div>

<p>Each Object Storage Daemon (OSD) backs a single disk. Add all the available devices:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph orch apply osd --all-available-devices
</code></pre></div></div>

<p>Since these are virtual disks, Ceph classifies them all as HDD, so we need to reclassify some as SSD. Check the device numbers with <code class="language-plaintext highlighter-rouge">ceph osd tree</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ID  CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         12.20741  root default                               
-5         10.74252      host host190                           
 1    hdd   0.97659          osd.1         up   1.00000  1.00000
 3    hdd   0.97659          osd.3         up   1.00000  1.00000
 5    hdd   0.97659          osd.5         up   1.00000  1.00000
 6    hdd   0.97659          osd.6         up   1.00000  1.00000
 7    hdd   0.97659          osd.7         up   1.00000  1.00000
 8    hdd   0.97659          osd.8         up   1.00000  1.00000
 9    hdd   0.97659          osd.9         up   1.00000  1.00000
10    hdd   0.97659          osd.10        up   1.00000  1.00000
11    hdd   0.97659          osd.11        up   1.00000  1.00000
12    hdd   0.97659          osd.12        up   1.00000  1.00000
13    hdd   0.97659          osd.13        up   1.00000  1.00000
-3          1.46489      host host226                           
 0    hdd   0.48830          osd.0         up   1.00000  1.00000
 2    hdd   0.48830          osd.2         up   1.00000  1.00000
 4    hdd   0.48830          osd.4         up   1.00000  1.00000
</code></pre></div></div>

<p>Set CRUSH device classes to separate the fast disks from the slow ones; we will use these classes to target pools at specific devices.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd crush rm-device-class osd.0 osd.2 osd.4
ceph osd crush set-device-class ssd osd.0 osd.2 osd.4
</code></pre></div></div>

<p>Create a metadata pool (replicated, should be on fast disks)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool create cephfs_metadata 32 replicated
</code></pre></div></div>

<p>Fast data pool (replicated, for root filesystem)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool create cephfs_data_fast 64 replicated
</code></pre></div></div>

<p>Archive pool (erasure coded 8+3, for slow disks)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd erasure-code-profile set ec83profile k=8 m=3 crush-failure-domain=osd
ceph osd pool create cephfs_data_archive 128 erasure ec83profile
</code></pre></div></div>
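<p>With <code class="language-plaintext highlighter-rouge">k=8</code> data chunks and <code class="language-plaintext highlighter-rouge">m=3</code> coding chunks, the pool survives any three OSD failures and stores data at an efficiency of k/(k+m), roughly 73%, compared with 50% for a size-2 replicated pool. A one-line check of the arithmetic:</p>

```shell
# Usable fraction of raw capacity for an 8+3 erasure-coded pool: k / (k + m)
awk 'BEGIN { k = 8; m = 3; printf "%.1f%%\n", 100 * k / (k + m) }'   # -> 72.7%
```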

<p>Set up pool properties. There are only two hosts in this test setup; ideally, the size would be three or more.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool set cephfs_metadata size 2
ceph osd pool set cephfs_data_fast size 2
</code></pre></div></div>

<p>Create CRUSH rules to allocate the data correctly and apply them to the pools.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd crush rule create-erasure cephfs_data_archive_osd ec83profile
ceph osd crush rule create-replicated fast_ssd_rule default osd ssd

ceph osd pool set cephfs_data_fast crush_rule fast_ssd_rule
ceph osd pool set cephfs_metadata crush_rule fast_ssd_rule
ceph osd pool set cephfs_data_archive crush_rule cephfs_data_archive_osd
</code></pre></div></div>

<p>Allow CephFS to use the pools</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool application enable cephfs_metadata cephfs
ceph osd pool application enable cephfs_data_fast cephfs
ceph osd pool application enable cephfs_data_archive cephfs
</code></pre></div></div>

<p>Create the filesystem</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph fs new cephfs cephfs_metadata cephfs_data_fast
</code></pre></div></div>

<p>Add the EC pool as an additional data pool</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph osd pool set cephfs_data_archive allow_ec_overwrites true
ceph fs add_data_pool cephfs cephfs_data_archive
</code></pre></div></div>

<p>Create an MDS for the file system</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph fs set cephfs max_mds 1
ceph orch apply mds cephfs
</code></pre></div></div>

<p>CephFS warns about having the root of the file system on an erasure-coded pool, hence we use the fast replicated pool as the root and map the archive pool to a specific directory.</p>

<p>Get your admin key</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ceph auth get-key client.admin
</code></pre></div></div>

<p>On a client machine, mount CephFS</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p /mnt/cephfs
mount -t ceph host226:6789,host190:6789:/ /mnt/cephfs -o name=admin,secret=YOUR_KEY_HERE
</code></pre></div></div>

<p>Create and configure the archive directory</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir /mnt/cephfs/archive
setfattr -n ceph.dir.layout.pool -v cephfs_data_archive /mnt/cephfs/archive
</code></pre></div></div>

<p>Verify it worked</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>getfattr -n ceph.dir.layout.pool /mnt/cephfs/archive
</code></pre></div></div>

<p>This ensures new files in <code class="language-plaintext highlighter-rouge">/archive</code> use the erasure-coded pool on the large disks, while the root uses the replicated fast pool.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ceph" /><category term="tunbury.org" /><summary type="html"><![CDATA[There are Ansible playbooks available at ceph/cephadm-ansible to configure CephFS; however, I decided to set it up manually on some test VMs to gain a better understanding of the process.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ceph-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ceph-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">CI support for OCaml 5.4</title><link href="https://www.tunbury.org/2025/10/18/ci-support-for-ocaml-54/" rel="alternate" type="text/html" title="CI support for OCaml 5.4" /><published>2025-10-18T00:00:00+00:00</published><updated>2025-10-18T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/10/18/ci-support-for-ocaml-54</id><content type="html" xml:base="https://www.tunbury.org/2025/10/18/ci-support-for-ocaml-54/"><![CDATA[<p>Following the release of <a href="https://ocaml.org/releases/5.4.0">OCaml 5.4</a> the CI systems need to be updated to use it.</p>

<p>This process starts with the update of <a href="https://github.com/ocurrent/ocaml-version">ocaml-version</a>, which Octachron added through <a href="https://github.com/ocurrent/ocaml-version/pull/85">PR#85</a>.</p>

<p>The base images now need to be updated, which consists of updating the <a href="https://images.ci.ocaml.org">base image builder</a>, <a href="https://github.com/ocaml/macos-infra">macos-infra</a> and <a href="https://github.com/ocurrent/freebsd-infra">freebsd-infra</a>. The latter two are updated via Ansible scripts: <a href="https://github.com/ocurrent/macos-infra/pull/57">PR#57</a> for macOS and <a href="https://github.com/ocurrent/freebsd-infra/pull/19">PR#19</a> for FreeBSD. New base images are also required for OpenBSD 7.7 and Windows Server 2022, which needs minor edits to the <code class="language-plaintext highlighter-rouge">Makefile</code>, included in <a href="https://github.com/ocurrent/obuilder/pull/201">PR#201</a>.</p>

<p>The base image builder was updated with <a href="https://github.com/ocurrent/docker-base-images/pull/335">PR#335</a> which pulled in the latest <a href="https://github.com/ocurrent/ocaml-version">ocaml-version</a> and <a href="https://github.com/ocurrent/ocaml-dockerfile">ocaml-dockerfile</a>. <a href="https://github.com/ocurrent/ocaml-dockerfile">ocaml-dockerfile</a> contains the build instructions for the base images across different OS distributions and architectures as Dockerfiles.</p>

<p><a href="https://github.com/ocurrent/ocaml-dockerfile">ocaml-dockerfile</a> had recently been updated with <a href="https://github.com/ocurrent/ocaml-dockerfile/pull/243">PR#243</a>, which added CentOS Stream 9 and 10, Oracle Linux 10 and Ubuntu 25.10. However, this resulted in a couple of build failures, plus MisterDA opened <a href="https://github.com/ocurrent/ocaml-dockerfile/issues/244">issue#244</a>, noting openSUSE and Windows Server 2025 needed to be updated.</p>

<p>There were build failures on CentOS Stream 9 that came from <code class="language-plaintext highlighter-rouge">RUN yum install -y ... curl ...</code>, which conflicted with the preinstalled <code class="language-plaintext highlighter-rouge">curl-minimal</code> package. This was easily fixed by removing <code class="language-plaintext highlighter-rouge">curl</code> from the package list, as curl was already present.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#17 1.503 Error: 
#17 1.503  Problem: problem with installed package curl-minimal-7.76.1-34.el9.x86_64
#17 1.503   - package curl-minimal-7.76.1-34.el9.x86_64 from @System conflicts with curl provided by curl-7.76.1-34.el9.x86_64 from baseos
#17 1.503   - package curl-minimal-7.76.1-26.el9.x86_64 from baseos conflicts with curl provided by curl-7.76.1-34.el9.x86_64 from baseos
#17 1.503   - package curl-minimal-7.76.1-28.el9.x86_64 from baseos conflicts with curl provided by curl-7.76.1-34.el9.x86_64 from baseos
#17 1.503   - package curl-minimal-7.76.1-29.el9.x86_64 from baseos conflicts with curl provided by curl-7.76.1-34.el9.x86_64 from baseos
#17 1.503   - package curl-minimal-7.76.1-31.el9.x86_64 from baseos conflicts with curl provided by curl-7.76.1-34.el9.x86_64 from baseos
#17 1.503   - package curl-minimal-7.76.1-34.el9.x86_64 from baseos conflicts with curl provided by curl-7.76.1-34.el9.x86_64 from baseos
#17 1.503   - cannot install the best candidate for the job
</code></pre></div></div>

<p>The next issue was with <code class="language-plaintext highlighter-rouge">RUN yum config-manager --set-enabled powertools</code> as this repository had changed its name (again):</p>

<ul>
  <li>CentOS 7: Uses yum-config-manager</li>
  <li>CentOS 8: Uses powertools</li>
  <li>CentOS Stream 9+: Uses crb</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#34 [stage-1 13/41] RUN yum config-manager --set-enabled powertools
#34 0.447 Error: No matching repo to modify: powertools.
#34 ERROR: process "/bin/sh -c yum config-manager --set-enabled powertools" did not complete successfully: exit code: 1
------
 &gt; [stage-1 13/41] RUN yum config-manager --set-enabled powertools:
0.447 Error: No matching repo to modify: powertools.
------
</code></pre></div></div>
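<p>One way to express the fix in a Dockerfile (a sketch, not necessarily how <a href="https://github.com/ocurrent/ocaml-dockerfile">ocaml-dockerfile</a> implements it):</p>

```dockerfile
# CentOS Stream 9+ renamed the powertools repository to crb
RUN yum config-manager --set-enabled crb
```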

<p>The final blocker was building the Ubuntu 25.10 images on RISCV. These images failed on <code class="language-plaintext highlighter-rouge">apt-get update</code>, which I initially assumed was a transitory network issue, but it persisted.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#14 [stage-0  2/13] RUN apt-get -y update
#14 ERROR: process "/bin/sh -c apt-get -y update" did not complete successfully: exit code: 132
</code></pre></div></div>

<p>Oddly, <code class="language-plaintext highlighter-rouge">docker run --rm -it ubuntu:questing</code> didn’t give me a container and simply returned the command prompt. However, <code class="language-plaintext highlighter-rouge">docker run --rm -it ubuntu:questing-20250830</code> did give me a prompt but I still couldn’t run <code class="language-plaintext highlighter-rouge">apt</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># docker run --rm -it ubuntu:questing-20250830
root@8754b6373f6f:/# apt update   
Illegal instruction (core dumped)
</code></pre></div></div>

<p>Interestingly, <code class="language-plaintext highlighter-rouge">ubuntu:questing-20250806</code> (even older) could run <code class="language-plaintext highlighter-rouge">apt update</code>. However, attempting to build the Dockerfile didn’t work.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>48.49 Preparing to unpack .../libc6_2.42-0ubuntu3_riscv64.deb ...
49.27 Checking for services that may need to be restarted...
49.30 Checking init scripts...
49.30 Checking for services that may need to be restarted...
49.34 Checking init scripts...
49.34 Nothing to restart.
49.44 Unpacking libc6:riscv64 (2.42-0ubuntu3) over (2.41-9ubuntu1) ...
52.05 dpkg: warning: old libc6:riscv64 package post-removal script subprocess was killed by signal (Illegal instruction), core dumped
52.05 dpkg: trying script from the new package instead ...
52.06 dpkg: error processing archive /var/cache/apt/archives/libc6_2.42-0ubuntu3_riscv64.deb (--unpack):
52.06  new libc6:riscv64 package post-removal script subprocess was killed by signal (Illegal instruction), core dumped
52.07 dpkg: error while cleaning up:
52.07  installed libc6:riscv64 package pre-installation script subprocess was killed by signal (Illegal instruction), core dumped
52.30 Errors were encountered while processing:
52.30  /var/cache/apt/archives/libc6_2.42-0ubuntu3_riscv64.deb
52.46 E: Sub-process /usr/bin/dpkg returned an error code (1)
</code></pre></div></div>

<p>Checking the Ubuntu <a href="https://ubuntu.com/download/risc-v">download</a> page shows that Ubuntu have changed the hardware requirements.</p>

<blockquote>
  <p>We have upgraded the required RISC-V ISA profile to RVA23S64 with the 25.10 release. Hardware that is not RVA23 ready continues to be supported by our 24.04.3 LTS release.</p>
</blockquote>

<p>Searching online found this <a href="https://www.phoronix.com/news/Ubuntu-25.10-RISC-V-QEMU">article</a>.</p>

<blockquote>
  <p>Back in June it was announced by Canonical that for the Ubuntu 25.10 release <a href="https://www.phoronix.com/news/Ubuntu-25.10-To-Require-RVA23">they would be raising the RISC-V baseline to the RVA23 profile even with barely any available RISC-V platforms supporting that newer RISC-V profile</a>. That change is still going ahead and leaves Ubuntu 25.10 on RISC-V currently only supporting the QEMU virtualized target.</p>
</blockquote>

<p>Therefore, I have removed RISC-V as a supported platform for Ubuntu 25.10 until we can get some hardware that supports it or set up some QEMU workers.</p>

<p>Additionally, Anil suggested dropping support for Debian 11, Oracle Linux 8 and 9, and Fedora 41 to reduce the size of the build matrix.</p>

<p><a href="https://github.com/ocurrent/ocaml-dockerfile">ocaml-dockerfile</a> release 8.3.3 is now pending on <a href="https://github.com/ocaml/opam-repository/pull/28736">opam repository</a>.</p>

<p>Now that the base images have been successfully built, I can continue with the updates to <a href="https://github.com/ocurrent/opam-repo-ci">ocurrent/opam-repo-ci</a> with <a href="https://github.com/ocurrent/opam-repo-ci/pull/460">PR#460</a>, which only needs the opam repository SHA updated to include the new release of ocaml-version.</p>

<p><a href="https://github.com/ocurrent/ocaml-ci">ocurrent/ocaml-ci</a> uses git submodules for these packages, so these need to be updated: <a href="https://github.com/ocurrent/ocaml-ci/pull/1032">PR#1042</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Following the release of OCaml 5.4 the CI systems need to be updated to use it.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Docker base image build rate</title><link href="https://www.tunbury.org/2025/10/10/docker-base-images/" rel="alternate" type="text/html" title="Docker base image build rate" /><published>2025-10-10T00:00:00+00:00</published><updated>2025-10-10T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/10/10/docker-base-images</id><content type="html" xml:base="https://www.tunbury.org/2025/10/10/docker-base-images/"><![CDATA[<p>We are increasingly hitting the Docker Hub rate limits when pushing the Docker base images. This issue was previously identified in <a href="https://github.com/ocurrent/docker-base-images/issues/267">issue #267</a>. However, this is now becoming critical as many more jobs are failing.</p>

<p>A typical failure log looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#13 [1/7] FROM docker.io/ocurrent/opam-staging@sha256:8ff156dd3a4ad8853b82940ac8965e8f0f4b18245e54fb26b9304f1ab961030b
#13 sha256:6b6519a49e416508fe7152b16035ad70bebba4d8f3486b6c0732c21da9433445
#13 resolve docker.io/ocurrent/opam-staging@sha256:8ff156dd3a4ad8853b82940ac8965e8f0f4b18245e54fb26b9304f1ab961030b
#13 resolve docker.io/ocurrent/opam-staging@sha256:8ff156dd3a4ad8853b82940ac8965e8f0f4b18245e54fb26b9304f1ab961030b 1.6s done
#13 ERROR: failed to copy: httpReadSeeker: failed open: unexpected status from GET request to https://registry-1.docker.io/v2/ocurrent/opam-staging/manifests/sha256:8ff156dd3a4ad8853b82940ac8965e8f0f4b18245e54fb26b9304f1ab961030b: 429 Too Many Requests
toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
------
 &gt; [1/7] FROM docker.io/ocurrent/opam-staging@sha256:8ff156dd3a4ad8853b82940ac8965e8f0f4b18245e54fb26b9304f1ab961030b:
------
failed to load cache key: failed to copy: httpReadSeeker: failed open: unexpected status from GET request to https://registry-1.docker.io/v2/ocurrent/opam-staging/manifests/sha256:8ff156dd3a4ad8853b82940ac8965e8f0f4b18245e54fb26b9304f1ab961030b: 429 Too Many Requests
toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
docker-build failed with exit-code 1
</code></pre></div></div>

<p>In the base image builder, we create our OCluster connection using the defaults:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">let</span> <span class="n">connection</span> <span class="o">=</span> <span class="nn">Current_ocluster</span><span class="p">.</span><span class="nn">Connection</span><span class="p">.</span><span class="n">create</span> <span class="n">submission_cap</span> <span class="k">in</span>
</code></pre></div></div>

<p>Looking at <a href="https://github.com/ocurrent/ocluster/blob/ba26623c6bca8b917c4252fa9739313fb14692ea/ocurrent-plugin/connection.ml#L177">ocurrent/ocluster</a>, the default is 200 jobs <em>per pool</em>. We submit to 6 pools with a rate limit of 200 per pool, resulting in an overall limit of 1,200 jobs.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">create</span> <span class="o">?</span><span class="p">(</span><span class="n">max_pipeline</span><span class="o">=</span><span class="mi">200</span><span class="p">)</span> <span class="n">sr</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">rate_limits</span> <span class="o">=</span> <span class="nn">Hashtbl</span><span class="p">.</span><span class="n">create</span> <span class="mi">10</span> <span class="k">in</span>
  <span class="p">{</span> <span class="n">sr</span><span class="p">;</span> <span class="n">sched</span> <span class="o">=</span> <span class="nn">Lwt</span><span class="p">.</span><span class="n">fail_with</span> <span class="s2">"init"</span><span class="p">;</span> <span class="n">rate_limits</span><span class="p">;</span> <span class="n">max_pipeline</span> <span class="p">}</span>
</code></pre></div></div>

<p>The current <code class="language-plaintext highlighter-rouge">builds.expected</code> file defines 1029 builds. The first 50 jobs building opam can run immediately; then, all the rest of the builds are unleashed. The breakdown of those follow-up compiler builds by pool is as follows: 352 for amd64, 232 for arm64, 102 for ppc64, 102 for s390x, 69 for riscv64, and 28 for Windows.</p>
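
<p>As a sanity check on the arithmetic, the overall cap is simply the per-pool limit multiplied by the number of pools:</p>

```shell
# Effective concurrency is per-pool limit x number of pools
# (numbers from the post).
pools=6
default_per_pool=200
echo $((pools * default_per_pool))   # 1200 with the default
reduced_per_pool=20
echo $((pools * reduced_per_pool))   # 120 after reducing to 20 per pool
```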

<p><a href="https://github.com/ocurrent/docker-base-images/pull/333">PR#333</a> reduces the rate to 20 builds per pool.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="docker,go" /><category term="tunbury.org" /><summary type="html"><![CDATA[We are increasingly hitting the Docker Hub rate limits when pushing the Docker base images. This issue was previously identified in issue #267. However, this is now becoming critical as many more jobs are failing.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/docker-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/docker-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Updating OCaml CI systems to FreeBSD 14.3</title><link href="https://www.tunbury.org/2025/10/07/freebsd-14.3/" rel="alternate" type="text/html" title="Updating OCaml CI systems to FreeBSD 14.3" /><published>2025-10-07T00:00:00+00:00</published><updated>2025-10-07T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/10/07/freebsd-14.3</id><content type="html" xml:base="https://www.tunbury.org/2025/10/07/freebsd-14.3/"><![CDATA[<p>The FreeBSD CI worker <code class="language-plaintext highlighter-rouge">rosemary</code> needs to be updated to FreeBSD 14.3.</p>

<p>The upgrade went without issue following the notes from last <a href="https://www.tunbury.org/2025/03/26/freebsd-14.2/">time</a>. <a href="https://github.com/ocurrent/freebsd-infra">ocurrent/freebsd-infra</a> was updated with <a href="https://github.com/ocurrent/freebsd-infra/pull/18">PR#18</a> and the base images were recreated with <code class="language-plaintext highlighter-rouge">ansible-playbook update.yml</code>.</p>

<p><a href="https://github.com/ocurrent/ocaml-ci/pull/1029">PR#1029</a> for OCaml CI and <a href="https://github.com/ocurrent/opam-repo-ci/pull/459">PR#459</a> for opam-repo-ci were pushed to their respective live branches.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="FreeBSD" /><category term="tunbury.org" /><summary type="html"><![CDATA[The FreeBSD CI worker rosemary needs to be updated to FreeBSD 14.3.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/freebsd-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/freebsd-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Attempting overlayfs with macFuse</title><link href="https://www.tunbury.org/2025/10/06/overlayfs-macFuse/" rel="alternate" type="text/html" title="Attempting overlayfs with macFuse" /><published>2025-10-06T06:00:00+00:00</published><updated>2025-10-06T06:00:00+00:00</updated><id>https://www.tunbury.org/2025/10/06/overlayfs-macFuse</id><content type="html" xml:base="https://www.tunbury.org/2025/10/06/overlayfs-macFuse/"><![CDATA[<p>It would be great if overlayFS or unionFS worked on macOS! Initially, I attempted to use DYLD_INTERPOSE, but I wasn’t able to intercept enough system calls to get it to work. However, macFuse provides a way to implement our own userspace file systems. Patrick previously wrote <a href="https://github.com/ocurrent/obuilder-fs">obuilder-fs</a>, which implemented a per-user filesystem redirection. It would be interesting to extend this concept to provide an overlayfs-style implementation.</p>

<p>My approach was to use an environment variable to flag which process should have the I/O redirected. When the user space layer of Fuse is called, the context includes the UID of the calling process. It is then possible to query the process’s environment and check for the marker variables. If none are found, then we can check the parent process. This won’t work for a double <code class="language-plaintext highlighter-rouge">fork()</code>, but it’s good enough to traverse <code class="language-plaintext highlighter-rouge">sudo</code>. Processes without the environment marker will pass through to the existing path.</p>
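
<p>The parent-traversal idea can be sketched on Linux, where a process’s initial environment is readable from <code class="language-plaintext highlighter-rouge">/proc</code>; the real implementation is C on macOS, which has no <code class="language-plaintext highlighter-rouge">/proc</code>, so this is only an illustration of the approach:</p>

```shell
# Walk up the process tree from PID $1 looking for a WRAPPER= variable
# in each process's initial environment; print the matching PID, or fail.
# Linux-only sketch: /proc/<pid>/environ holds the environment at exec time.
find_wrapper() {
  pid=$1
  while [ "$pid" -gt 1 ] 2>/dev/null; do
    if tr '\0' '\n' < "/proc/$pid/environ" 2>/dev/null | grep -q '^WRAPPER='; then
      echo "$pid"
      return 0
    fi
    # field 4 of /proc/<pid>/stat is the parent PID
    pid=$(awk '{print $4}' "/proc/$pid/stat" 2>/dev/null) || return 1
  done
  return 1
}
```

<p>As in the post, a double <code class="language-plaintext highlighter-rouge">fork()</code> breaks the chain (the orphan is re-parented to PID 1), but a single level of <code class="language-plaintext highlighter-rouge">sudo</code> is traversed.</p>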

<p>Passing through to the existing path is easier said than done. When the Fuse filesystem is mounted, the content of the underlying filesystem is completely hidden. The workaround was to move the existing files out of the way and redirect requests to this temporary directory.</p>

<p>Initially, this showed promise as trivial commands like <code class="language-plaintext highlighter-rouge">stat</code> and <code class="language-plaintext highlighter-rouge">ls</code> worked. However, the excitement was short-lived as complex commands failed with “Device not configured”.</p>

<p>For example, with Fuse mounted on <code class="language-plaintext highlighter-rouge">/usr/local</code>, some files and directories were created in <code class="language-plaintext highlighter-rouge">/tmp/a</code>, but very few.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% <span class="nv">WRAPPER</span><span class="o">=</span>/tmp/a git <span class="nt">-C</span> /usr/local clone https://github.com/ocaml/opam-repository
Cloning into <span class="s1">'opam-repository'</span>...
/System/Volumes/Data/usr/local/opam-repository/.git/hooks/: Device not configured
</code></pre></div></div>

<p>The log showed that <code class="language-plaintext highlighter-rouge">fseventsd</code> tried to query all the directories which <code class="language-plaintext highlighter-rouge">git</code> created, but since it didn’t have the environment variable set, it couldn’t find the files. After a few failures, <code class="language-plaintext highlighter-rouge">fseventsd</code> seemed to mark the filesystem as bad and block access. The log snippet below shows a typical request from <code class="language-plaintext highlighter-rouge">fseventsd</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unique: 8, opcode: GETATTR (3), nodeid: 21, insize: 56, pid: 522
getattr /opam-repository/.git
Searching for WRAPPER in process tree starting from PID 522:
    PID 522 has 1 args, checking environment...
    arg[0]: /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/FSEvents.framework/Versions/A/Support/fseventsd
    Checked 4 environment variables, no WRAPPER found
  PID 522 (fseventsd): no wrapper
No WRAPPER found in process tree
*** GETATTR PASSTHROUGH: /opam-repository/.git -&gt; /System/Volumes/Data/usr/local.fuse/opam-repository/.git ***
   unique: 8, error: -2 (No such file or directory), outsize: 16
unique: 6, opcode: LOOKUP (1), nodeid: 20, insize: 45, pid: 522
LOOKUP /opam-repository/.git
getattr /opam-repository/.git
Searching for WRAPPER in process tree starting from PID 522:
    PID 522 has 1 args, checking environment...
    arg[0]: /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/FSEvents.framework/Versions/A/Support/fseventsd
    Checked 4 environment variables, no WRAPPER found
  PID 522 (fseventsd): no wrapper
No WRAPPER found in process tree
*** GETATTR PASSTHROUGH: /opam-repository/.git -&gt; /System/Volumes/Data/usr/local.fuse/opam-repository/.git ***
   unique: 6, error: -2 (No such file or directory), outsize: 16
</code></pre></div></div>

<p>Searching online suggested that <code class="language-plaintext highlighter-rouge">fseventsd</code> could be blocked by creating a file named <code class="language-plaintext highlighter-rouge">/.fseventsd/no_log</code> on the filesystem. This didn’t work. Since the incoming request always came from <code class="language-plaintext highlighter-rouge">fseventsd</code>, could it be blocked at the Fuse level? As a quick test, I tried returning <code class="language-plaintext highlighter-rouge">ENOTSUP</code> based on the PID, and that worked! I replaced the static PID with a call to <code class="language-plaintext highlighter-rouge">proc_pidpath()</code> and matched the name against <code class="language-plaintext highlighter-rouge">fseventsd</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">context</span><span class="o">-&gt;</span><span class="n">pid</span> <span class="o">==</span> <span class="mi">522</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="o">-</span><span class="n">ENOTSUP</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>
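
<p>The same name check can be sketched on Linux, where the process name is readable from <code class="language-plaintext highlighter-rouge">/proc/&lt;pid&gt;/comm</code> (illustrative only; the actual code uses <code class="language-plaintext highlighter-rouge">proc_pidpath()</code> on macOS):</p>

```shell
# Sketch of generalising the hard-coded PID: decide by process name.
# On Linux the short process name lives in /proc/<pid>/comm.
is_fseventsd() {
  [ "$(cat "/proc/$1/comm" 2>/dev/null)" = "fseventsd" ]
}
```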

<p>With this working, I implemented overlayfs-style semantics using the environment variables <code class="language-plaintext highlighter-rouge">WRAPPER_UPPER</code> and <code class="language-plaintext highlighter-rouge">WRAPPER_LOWER</code>. Deletions are handled by creating a whiteout directory, <code class="language-plaintext highlighter-rouge">.deleted</code>, at the root, which is populated with empty files reflecting the files and directories which have been deleted. If a file <code class="language-plaintext highlighter-rouge">bar</code> is deleted from directory <code class="language-plaintext highlighter-rouge">foo</code>, then <code class="language-plaintext highlighter-rouge">/.deleted/foo/bar</code> would be created. Later, if <code class="language-plaintext highlighter-rouge">foo</code> itself were removed, the directory <code class="language-plaintext highlighter-rouge">foo</code> would be removed from the whiteout directory and replaced with a file, <code class="language-plaintext highlighter-rouge">/.deleted/foo</code>.</p>
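
<p>The whiteout bookkeeping can be sketched as follows (illustrative shell, not the actual C implementation):</p>

```shell
# Sketch of the whiteout bookkeeping: deleting a file creates an empty
# marker under .deleted/; deleting a directory replaces any marker
# subtree with a single marker file. Paths are illustrative.
root=$(mktemp -d)

whiteout_file() {  # $1 = path of deleted file, relative to the mount root
  mkdir -p "$root/.deleted/$(dirname "$1")"
  : > "$root/.deleted/$1"
}

whiteout_dir() {   # $1 = path of deleted directory
  rm -rf "$root/.deleted/${1:?}"            # drop any per-file markers
  mkdir -p "$root/.deleted/$(dirname "$1")"
  : > "$root/.deleted/$1"                   # single marker for the directory
}

whiteout_file foo/bar   # rm foo/bar  -> .deleted/foo/bar appears
whiteout_dir foo        # rmdir foo   -> .deleted/foo becomes a plain file
```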

<p>opendir()/readdir() were the most complex functions to implement, as they needed to scan the upper directory and merge in the lower directory, taking account of any deleted files and hiding the <code class="language-plaintext highlighter-rouge">/.deleted</code> directory itself.</p>
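
<p>The merge logic can be sketched as a union of the two directory listings minus the whiteouts (again illustrative shell rather than the C implementation):</p>

```shell
# Sketch of the readdir merge: union of upper and lower entries,
# minus whiteouts, with the .deleted directory itself hidden.
merged_ls() {  # $1 = upper, $2 = lower, $3 = directory relative to root
  { ls -A "$1/$3" 2>/dev/null; ls -A "$2/$3" 2>/dev/null; } | sort -u |
  while IFS= read -r name; do
    [ "$3/$name" = "./.deleted" ] && continue   # hide the whiteout dir
    [ -e "$1/.deleted/$3/$name" ] && continue   # skip whited-out entries
    printf '%s\n' "$name"
  done
}

# demo: lower has a and b; upper has c and a whiteout for b
up=$(mktemp -d); low=$(mktemp -d)
mkdir -p "$up/.deleted"; : > "$up/.deleted/b"
touch "$low/a" "$low/b" "$up/c"
merged_ls "$up" "$low" .   # prints a and c; b is whited out, .deleted hidden
```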

<p>The redirection worked. For example, given these steps, <code class="language-plaintext highlighter-rouge">/tmp/a</code> would be empty, <code class="language-plaintext highlighter-rouge">/tmp/b</code> contains the vanilla checkout of opam-repository, and <code class="language-plaintext highlighter-rouge">/tmp/c</code> contains the difference: <code class="language-plaintext highlighter-rouge">/tmp/c/.deleted</code> with the files removed, <code class="language-plaintext highlighter-rouge">/tmp/c/opam-repository/...</code> and <code class="language-plaintext highlighter-rouge">/tmp/c/opam-repository/.git</code> with just the files which contain differences.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% <span class="nb">mkdir</span> /tmp/a /tmp/b /tmp/c
% <span class="nv">WRAPPER_LOWER</span><span class="o">=</span>/tmp/a <span class="nv">WRAPPER_UPPER</span><span class="o">=</span>/tmp/b git <span class="nt">-C</span> /usr/local clone https://github.com/ocaml/opam-repository
% <span class="nv">WRAPPER_LOWER</span><span class="o">=</span>/tmp/b <span class="nv">WRAPPER_UPPER</span><span class="o">=</span>/tmp/c git <span class="nt">-C</span> /usr/local/opam-repository checkout c35a0314d6c7c7260c978f490fb8f7109f4e9766
</code></pre></div></div>

<p>Extending this further allows <code class="language-plaintext highlighter-rouge">/tmp/d</code> to be created with a different delta.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% <span class="nb">mkdir</span> /tmp/d
% <span class="nv">WRAPPER_LOWER</span><span class="o">=</span>/tmp/b <span class="nv">WRAPPER_UPPER</span><span class="o">=</span>/tmp/d git <span class="nt">-C</span> /usr/local/opam-repository checkout f33f62ebff75cd03620d09d46a4540340f5564a6
</code></pre></div></div>

<p>Annoyingly, this revealed a significant issue: running <code class="language-plaintext highlighter-rouge">git status</code> on <code class="language-plaintext highlighter-rouge">/tmp/c</code> showed that files had changed. I presumed there was a flaw in my code which was corrupting the files, but I couldn’t find it. Examining the files on disk showed that they were correct, but reading them through Fuse gave different data:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% <span class="k">for </span>x <span class="k">in </span>c d <span class="p">;</span> <span class="k">do </span><span class="nb">cat</span> /tmp/<span class="nv">$x</span>/opam-repository/.git/HEAD <span class="p">;</span> <span class="nv">WRAPPER_LOWER</span><span class="o">=</span>/tmp/b <span class="nv">WRAPPER_UPPER</span><span class="o">=</span>/tmp/<span class="nv">$x</span> <span class="nb">cat</span> /usr/local/opam-repository/.git/HEAD <span class="p">;</span> <span class="k">done
</span>c35a0314d6c7c7260c978f490fb8f7109f4e9766
c35a0314d6c7c7260c978f490fb8f7109f4e9766
f33f62ebff75cd03620d09d46a4540340f5564a6
c35a0314d6c7c7260c978f490fb8f7109f4e9766
</code></pre></div></div>

<p>The log showed the root cause: two OPEN calls, but only a single READ. The kernel was caching the reads.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% <span class="nb">grep </span>OPEN log5.txt
unique: 2, opcode: OPEN <span class="o">(</span>14<span class="o">)</span>, nodeid: 4, insize: 48, pid: 52976
<span class="k">***</span> OPEN: /opam-repository/.git/HEAD from UPPER: /tmp/c/opam-repository/.git/HEAD <span class="k">***</span>
unique: 3, opcode: OPEN <span class="o">(</span>14<span class="o">)</span>, nodeid: 4, insize: 48, pid: 52980
<span class="k">***</span> OPEN: /opam-repository/.git/HEAD from UPPER: /tmp/d/opam-repository/.git/HEAD <span class="k">***</span>

% <span class="nb">grep </span>READ log5.txt
unique: 3, opcode: READ <span class="o">(</span>15<span class="o">)</span>, nodeid: 4, insize: 80, pid: 52976
</code></pre></div></div>

<p>You can disable attribute caching with <code class="language-plaintext highlighter-rouge">-o attr_timeout=0 -o entry_timeout=0</code>, and you can circumvent the cache by specifying <code class="language-plaintext highlighter-rouge">-o direct_io</code>. Setting <code class="language-plaintext highlighter-rouge">direct_io</code> is sufficient to resolve the issue in a simple <code class="language-plaintext highlighter-rouge">cat</code> test, but it has the side effect of disabling <code class="language-plaintext highlighter-rouge">mmap()</code>, which causes <code class="language-plaintext highlighter-rouge">git</code> to crash with a <code class="language-plaintext highlighter-rouge">bus error</code>. Setting <code class="language-plaintext highlighter-rouge">fi-&gt;keep_cache = 0</code> doesn’t prevent the cache.</p>

<p>The kernel asks Fuse to allocate a node ID for a path. The node ID number is passed as a parameter to GETATTR, OPEN and READ. Even though GETATTR returns different mtime values on the second call, the kernel still sees a cache hit and returns the file content from the cache.</p>

<p>To control node ID allocation, this needs to be rewritten using the Fuse low-level API, which would allow full control over the allocation process and give access to calls such as <code class="language-plaintext highlighter-rouge">fuse_lowlevel_notify_inval_inode()</code>.</p>

<p>My work-in-progress code is available on GitHub <a href="https://github.com/mtelvers/macfuse/blob/master/LoopbackFS-C/loopback/loopback.c">mtelvers/macfuse</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="macfuse" /><category term="tunbury.org" /><summary type="html"><![CDATA[It would be great if overlayFS or unionFS worked on macOS! Initially, I attempted to use DYLD_INTERPOSE, but I wasn’t able to intercept enough system calls to get it to work. However, macFuse provides a way to implement our own userspace file systems. Patrick previously wrote obuilder-fs, which implemented a per-user filesystem redirection. It would be interesting to extend this concept to provide an overlayfs-style implementation.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/macfuse-home.png" /><media:content medium="image" url="https://www.tunbury.org/images/macfuse-home.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Apache Parquet Files</title><link href="https://www.tunbury.org/2025/09/17/parquet-files/" rel="alternate" type="text/html" title="Apache Parquet Files" /><published>2025-09-17T21:00:00+00:00</published><updated>2025-09-17T21:00:00+00:00</updated><id>https://www.tunbury.org/2025/09/17/parquet-files</id><content type="html" xml:base="https://www.tunbury.org/2025/09/17/parquet-files/"><![CDATA[<p>If you haven’t discovered the <a href="https://parquet.apache.org">Apache Parquet</a> file format, allow me to introduce it along with <a href="https://clickhouse.com">ClickHouse</a>.</p>

<p>Parquet is a columnar storage file format designed for analytics and big data processing. Data is stored by column rather than by row, there is efficient compression, and the file contains the schema definition.</p>

<p>On Ubuntu, you first need to add the ClickHouse repository.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
</code></pre></div></div>

<p>Update and install. I’m going to use <code class="language-plaintext highlighter-rouge">clickhouse local</code>, so I only need the client.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt update
apt install -y clickhouse-client
</code></pre></div></div>

<p>Given the JSON file below, you can use ClickHouse to run SQL queries on it directly: <code class="language-plaintext highlighter-rouge">clickhouse local --query "SELECT * FROM file('x.json')"</code></p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
  </span><span class="p">{</span><span class="w">
    </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0install-gtk.2.18"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"status"</span><span class="p">:</span><span class="w"> </span><span class="s2">"no_solution"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"sha"</span><span class="p">:</span><span class="w"> </span><span class="s2">"d0b74334d458c26f4b769b9b5819f7af222b159c"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"solution"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Can't find all required versions."</span><span class="p">,</span><span class="w">
    </span><span class="nl">"os"</span><span class="p">:</span><span class="w"> </span><span class="s2">"debian-12"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"compiler"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ocaml-base-compiler.5.4.0~beta1"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>

<p>Powerfully, the <code class="language-plaintext highlighter-rouge">file</code> parameter can contain wildcards, such as <code class="language-plaintext highlighter-rouge">*.json</code>, in which case the <code class="language-plaintext highlighter-rouge">SELECT</code> is performed across all the files.</p>

<p>In my examples below, the JSON file is 573MB. Let’s try to find all the records where <code class="language-plaintext highlighter-rouge">status = "no_solution"</code>.</p>

<p>We could use <code class="language-plaintext highlighter-rouge">jq</code> with a command like <code class="language-plaintext highlighter-rouge">jq 'map(select(.status == "no_solution")) | length' commit.json</code>. This takes over 2 seconds on my machine. Cheating and using <code class="language-plaintext highlighter-rouge">grep no_solution commit.json | wc -l</code> takes 0.2 seconds.</p>

<p>Using ClickHouse on the same datasource, <code class="language-plaintext highlighter-rouge">clickhouse local --query "SELECT COUNT() FROM file('commit.json') WHERE status = 'no_solution'"</code> matches the performance of <code class="language-plaintext highlighter-rouge">grep</code> returning the count in 0.2 seconds.</p>

<p>Converting the JSON into Parquet format is straightforward. The output file size is an amazing 24MB. Contrast that with <code class="language-plaintext highlighter-rouge">gzip -9 commit.json</code>, which creates a file of 33MB!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>clickhouse local --query "SELECT * FROM file('commit.json', 'JSONEachRow') INTO OUTFILE 'commit.parquet' FORMAT Parquet"
</code></pre></div></div>

<p>Now running our query again: <code class="language-plaintext highlighter-rouge">clickhouse local --query "SELECT COUNT() FROM file('commit.parquet') WHERE status = 'no_solution'"</code>. Just over 0.1 seconds.</p>

<p>How can I use these in my OCaml project? <a href="https://github.com/LaurentMazare/ocaml-arrow">LaurentMazare/ocaml-arrow</a> has created extensive OCaml bindings for Apache Arrow using the C++ API. These bindings support Arrow versions 4 and 5, but the current Arrow release is version 21. I have an updated commit which works with version 21 and C++ 17: <a href="https://github.com/mtelvers/ocaml-arrow/tree/arrow-21-cpp17">mtelvers/ocaml-arrow/tree/arrow-21-cpp17</a></p>

<p>I have also reimplemented the bulk of the library using the OCaml Standard Library which is available in <a href="https://github.com/mtelvers/arrow">mtelvers/arrow</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="apache,parquet" /><category term="tunbury.org" /><summary type="html"><![CDATA[If you haven’t discovered the Apache Parquet file format, allow me to introduce it along with ClickHouse.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/apache-parquet-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/apache-parquet-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Optimising Data Access in Parquet Files</title><link href="https://www.tunbury.org/2025/09/17/optimising-parquet-files/" rel="alternate" type="text/html" title="Optimising Data Access in Parquet Files" /><published>2025-09-17T21:00:00+00:00</published><updated>2025-09-17T21:00:00+00:00</updated><id>https://www.tunbury.org/2025/09/17/optimising-parquet-files</id><content type="html" xml:base="https://www.tunbury.org/2025/09/17/optimising-parquet-files/"><![CDATA[<p>Yesterday I wrote about the amazing performance of Apache Parquet files; today I reflect on how that translates into an actual application reading Parquet files using the OCaml wrapper of Apache’s C++ library.</p>

<p>I have a TUI application that displays build results for OCaml packages across multiple compiler versions. The application needs to provide two primary operations:</p>

<ol>
  <li>Table view: Display a matrix of build statuses (packages × compilers)</li>
  <li>Detail view: Show detailed build logs and dependency solutions for specific package-compiler combinations</li>
</ol>

<p>The dataset contained 48,895 records with the following schema:</p>

<ul>
  <li>name: Package name (~4,500 unique values)</li>
  <li>compiler: Compiler version (~11 unique versions)</li>
  <li>status: Build result (success/failure/etc.)</li>
  <li>log: Detailed build output (large text field)</li>
  <li>solution: Dependency resolution graph (large text field)</li>
</ul>
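<p>As a rough consistency check, assuming the ~4,500 packages and ~11 compiler versions noted above, the record count is close to one row per package-compiler pair:</p>

```shell
# ~4,500 packages x ~11 compilers, versus the 48,895 records observed
echo $((4500 * 11))
```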

<h1 id="initial-implementation-and-performance-bottleneck">Initial Implementation and Performance Bottleneck</h1>

<p>The initial implementation used Apache Arrow’s OCaml bindings to load the complete Parquet file into memory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let analyze_data filename =
  let table = Arrow.Parquet_reader.table filename in
  let name_col = Arrow.Wrapper.Column.read_utf8 table ~column:(`Name "name") in
  let status_col = Arrow.Wrapper.Column.read_utf8_opt table ~column:(`Name "status") in
  let compiler_col = Arrow.Wrapper.Column.read_utf8 table ~column:(`Name "compiler") in
  let log_col = Arrow.Wrapper.Column.read_utf8_opt table ~column:(`Name "log") in
  let solution_col = Arrow.Wrapper.Column.read_utf8_opt table ~column:(`Name "solution") in
  (* Build hashtable for O(1) lookups *)
</code></pre></div></div>

<p>This approach exhibited 3-4 second loading times, creating an unacceptable user experience for interactive data exploration.</p>

<h1 id="performance-analysis">Performance Analysis</h1>

<h2 id="phase-1-timing-instrumentation">Phase 1: Timing Instrumentation</h2>

<p>I implemented some basic timing instrumentation to identify bottlenecks by logging data to a file.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">append_to_file</span> <span class="n">filename</span> <span class="n">message</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">oc</span> <span class="o">=</span> <span class="n">open_out_gen</span> <span class="p">[</span><span class="nc">Open_creat</span><span class="p">;</span> <span class="nc">Open_text</span><span class="p">;</span> <span class="nc">Open_append</span><span class="p">]</span> <span class="mo">0o644</span> <span class="n">filename</span> <span class="k">in</span>
  <span class="nn">Printf</span><span class="p">.</span><span class="n">fprintf</span> <span class="n">oc</span> <span class="s2">"%s: %s</span><span class="se">\n</span><span class="s2">"</span> <span class="p">(</span><span class="nn">Sys</span><span class="p">.</span><span class="n">time</span> <span class="bp">()</span> <span class="o">|&gt;</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">sprintf</span> <span class="s2">"%.3f"</span><span class="p">)</span> <span class="n">message</span><span class="p">;</span>
  <span class="n">close_out</span> <span class="n">oc</span>
</code></pre></div></div>

<p>The timings revealed that <code class="language-plaintext highlighter-rouge">Arrow.Parquet_reader.table</code> consumed ~3.6 seconds (80%) of the total loading time, with individual column extractions adding minimal overhead.</p>

<h2 id="phase-2-deep-api-analysis">Phase 2: Deep API Analysis</h2>

<p>I reviewed the Arrow C++ implementation to understand its performance characteristics:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="c1">// From arrow_c_api.cc - the core bottleneck</span>
  <span class="n">TablePtr</span> <span class="o">*</span><span class="nf">parquet_read_table</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">col_idxs</span><span class="p">,</span> <span class="kt">int</span> <span class="n">ncols</span><span class="p">,</span>
                                <span class="kt">int</span> <span class="n">use_threads</span><span class="p">,</span> <span class="kt">int64_t</span> <span class="n">only_first</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">only_first</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">st</span> <span class="o">=</span> <span class="n">reader</span><span class="o">-&gt;</span><span class="n">ReadTable</span><span class="p">(</span><span class="o">&amp;</span><span class="n">table</span><span class="p">);</span>  <span class="c1">// Loads entire table!</span>
    <span class="p">}</span>
    <span class="c1">// ...</span>
  <span class="p">}</span>
</code></pre></div></div>

<p>This shows that the <code class="language-plaintext highlighter-rouge">ReadTable()</code> operation materialises the complete dataset in memory, regardless of actual usage patterns.</p>

<h1 id="optimisation-strategy-column-selection">Optimisation Strategy: Column Selection</h1>

<p>Could the large text fields (log and solution columns) be responsible for the performance bottleneck?</p>

<p>I modified the table loading to exclude large columns during initial load:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">table</span> <span class="o">=</span> <span class="nn">Arrow</span><span class="p">.</span><span class="nn">Parquet_reader</span><span class="p">.</span><span class="n">table</span> <span class="o">~</span><span class="n">column_idxs</span><span class="o">:</span><span class="p">[</span><span class="mi">0</span><span class="p">;</span> <span class="mi">1</span><span class="p">;</span> <span class="mi">6</span><span class="p">;</span> <span class="mi">7</span><span class="p">]</span> <span class="n">filename</span> <span class="k">in</span>
  <span class="c">(* Only load: name, status, os, compiler *)</span>
</code></pre></div></div>

<p>This dramatically reduced the loading time from 3.6 seconds to 0.021 seconds.</p>

<p>This optimisation validated the hypothesis that the large text columns were the primary bottleneck. However, it created a new challenge of accessing the detailed log/solution data for individual records.</p>

<p>There is a function <code class="language-plaintext highlighter-rouge">Arrow.Parquet_reader.fold_batches</code> which could be used for on-demand detail loading:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">find_package_detail</span> <span class="n">filename</span> <span class="n">target_package</span> <span class="n">target_compiler</span> <span class="o">=</span>
  <span class="nn">Arrow</span><span class="p">.</span><span class="nn">Parquet_reader</span><span class="p">.</span><span class="n">fold_batches</span> <span class="n">filename</span>
    <span class="o">~</span><span class="n">column_idxs</span><span class="o">:</span><span class="p">[</span><span class="mi">0</span><span class="p">;</span> <span class="mi">4</span><span class="p">;</span> <span class="mi">5</span><span class="p">;</span> <span class="mi">7</span><span class="p">]</span>  <span class="c">(* name, log, solution, compiler *)</span>
    <span class="o">~</span><span class="n">batch_size</span><span class="o">:</span><span class="mi">100</span>
    <span class="o">~</span><span class="n">f</span><span class="o">:</span><span class="p">(</span><span class="k">fun</span> <span class="bp">()</span> <span class="n">batch</span> <span class="o">-&gt;</span>
      <span class="c">(* Search batch for target, stop when found *)</span>
    <span class="p">)</span>
</code></pre></div></div>

<p>However, the timings showed that <code class="language-plaintext highlighter-rouge">fold_batches</code> was equivalent in cost to loading the whole table; only when the log and solution columns were omitted was it fast:</p>

<ul>
  <li>With large columns: 2.981 seconds</li>
  <li>Without large columns: 0.033 seconds (33ms)</li>
</ul>

<h1 id="comparative-analysis-clickhouse-vs-arrow">Comparative Analysis: ClickHouse vs Arrow</h1>

<p>To establish performance baselines, I compared Arrow’s performance with <code class="language-plaintext highlighter-rouge">clickhouse local</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ClickHouse aggregation query (equivalent to table view)</span>
<span class="nb">time </span>clickhouse <span class="nb">local</span> <span class="nt">--query</span> <span class="s2">"
  SELECT name, anyIf(status, compiler = 'ocaml.5.3.0') as col1, ...
  FROM file('data.parquet', 'Parquet') GROUP BY name ORDER BY name"</span>
<span class="c"># Result: 0.2 seconds</span>

<span class="c"># ClickHouse individual lookup</span>
<span class="nb">time </span>clickhouse <span class="nb">local</span> <span class="nt">--query</span> <span class="s2">"
  SELECT log, solution FROM file('data.parquet', 'Parquet') WHERE name = '0install.2.18' AND compiler = 'ocaml.5.3.0'"</span>
<span class="c"># Result: 1.716 seconds</span>

<span class="c"># ClickHouse lookup without large columns</span>
<span class="nb">time </span>clickhouse <span class="nb">local</span> <span class="nt">--query</span> <span class="s2">"
  SELECT COUNT() FROM file('data.parquet', 'Parquet') WHERE name = '0install.2.18' AND compiler = 'ocaml.5.3.0'"</span>
<span class="c"># Result: 0.190 seconds</span>
</code></pre></div></div>

<p>The 1.5-second difference (1.716s - 0.190s) represents the fundamental cost of decompressing and decoding the large text fields, and this cost is present both in OCaml and in ClickHouse.</p>

<h1 id="data-structure-redesign-the-wide-table-approach">Data Structure Redesign: The Wide Table Approach</h1>

<p>Instead of searching through 48,895 rows to find specific package-compiler combinations, I restructured the data into a wide table format:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
    <span class="n">name</span><span class="p">,</span>
    <span class="n">anyIf</span><span class="p">(</span><span class="n">status</span><span class="p">,</span> <span class="n">compiler</span> <span class="o">=</span> <span class="s1">'ocaml.5.3.0'</span><span class="p">)</span> <span class="k">as</span> <span class="n">status_5_3_0</span><span class="p">,</span>
    <span class="n">anyIf</span><span class="p">(</span><span class="n">log</span><span class="p">,</span> <span class="n">compiler</span> <span class="o">=</span> <span class="s1">'ocaml.5.3.0'</span><span class="p">)</span> <span class="k">as</span> <span class="n">log_5_3_0</span><span class="p">,</span>
    <span class="n">anyIf</span><span class="p">(</span><span class="n">solution</span><span class="p">,</span> <span class="n">compiler</span> <span class="o">=</span> <span class="s1">'ocaml.5.3.0'</span><span class="p">)</span> <span class="k">as</span> <span class="n">solution_5_3_0</span><span class="p">,</span>
    <span class="c1">-- ... repeat for all compilers</span>
<span class="k">FROM</span> <span class="n">file</span><span class="p">(</span><span class="s1">'original.parquet'</span><span class="p">,</span> <span class="s1">'Parquet'</span><span class="p">)</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">name</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">name</span>
</code></pre></div></div>

<p>This transformation:</p>
<ul>
  <li>Reduced row count from ~48,895 to ~4,500 (one row per package)</li>
  <li>Eliminated search operations - direct column access by name</li>
  <li>Preserved all data while optimising access patterns</li>
</ul>
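<p>Assuming the ~11 compiler versions mentioned earlier, the wide table has a fixed and modest number of columns:</p>

```shell
# 1 name column plus (status, log, solution) per compiler version
echo $((1 + 3 * 11))
```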

<p>The wide table restructure delivered the expected performance both in ClickHouse and OCaml.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">time </span>clickhouse <span class="nb">local</span> <span class="nt">--query</span> <span class="s2">"
  SELECT log_5_3_0, solution_5_3_0      FROM file('restructured.parquet', 'Parquet')      WHERE name = '0install.2.18'"</span>
<span class="c"># Result: 0.294 seconds</span>
</code></pre></div></div>

<h1 id="conclusion">Conclusion</h1>

<p>There is no way to access a specific row within a column without loading (and thus decompressing) the entire column. With a column of ~50K rows, this takes a significant time. By splitting the table by compiler and by log, any column that needs to be loaded is only ~4.5K rows, making the application much more responsive.</p>

<p>The wide table schema goes against my instincts for database table structure, and adds complexity when later using this dataset in other queries. This trade-off between performance and schema flexibility needs careful thought based on specific application requirements.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="apache,parquet" /><category term="tunbury.org" /><summary type="html"><![CDATA[Yesterday I wrote about the amazing performance of Apache Parquet files; today I reflect on how that translates into an actual application reading Parquet files using the OCaml wrapper of Apache’s C++ library.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/apache-parquet-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/apache-parquet-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">FreeBSD unionfs deadlock</title><link href="https://www.tunbury.org/2025/09/17/freebsd-unionfs/" rel="alternate" type="text/html" title="FreeBSD unionfs deadlock" /><published>2025-09-17T12:00:00+00:00</published><updated>2025-09-17T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/09/17/freebsd-unionfs</id><content type="html" xml:base="https://www.tunbury.org/2025/09/17/freebsd-unionfs/"><![CDATA[<p>FreeBSD Jails provide isolated system containers that are perfect for CI testing. Miod <a href="https://tarides.com/blog/2023-10-04-porting-obuilder-to-freebsd/">ported OBuilder</a> to FreeBSD back in 2023. I have been looking at some different approaches using unionfs.</p>

<p>I’d like to have a read-only base layer with the OS, a middle layer containing source code and system libraries, and a top writable layer for the build results. This is easily constructed in an <code class="language-plaintext highlighter-rouge">fstab</code> for the <code class="language-plaintext highlighter-rouge">jail</code> like this.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/home/opam/bsd-1402000-x86_64/base/fs /home/opam/temp-2b9f69/work nullfs ro 0 0
/home/opam/temp-2b9f69/lower /home/opam/temp-2b9f69/work unionfs ro 0 0
/home/opam/temp-2b9f69/fs /home/opam/temp-2b9f69/work unionfs rw 0 0
/home/opam/opam-repository /home/opam/temp-2b9f69/work/home/opam/opam-repository nullfs ro 0 0
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">jail -c name=temp-2b9f69 path=/home/opam/temp-2b9f69/work mount.devfs mount.fstab=/home/opam/temp-7323b6/fstab ...</code> works as expected; it’s good enough to build OCaml, but it reliably deadlocks the entire machine when trying to build dune. This appears to be an old problem: <a href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=165087">165087</a>, <a href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201677">201677</a> and <a href="https://people.freebsd.org/~daichi/unionfs">unionfs</a>. There is a <a href="https://freebsdfoundation.org/project/unionfs-stability-and-enhancement">project</a> aiming to improve unionfs for use in jails.</p>

<p>My workaround is to create a temporary layer that merges the base and lower layers together. Initially, I did this by mounting <code class="language-plaintext highlighter-rouge">tmpfs</code> to the lower mount point and using <code class="language-plaintext highlighter-rouge">cp</code> to copy the files. The performance was poor, so instead I created the layer on disk and used <code class="language-plaintext highlighter-rouge">cp -l</code> to hard link the files. The simplified <code class="language-plaintext highlighter-rouge">fstab</code> works successfully in my testing.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/home/opam/temp-2b9f69/lower /home/opam/temp-2b9f69/work nullfs ro 0 0
/home/opam/temp-2b9f69/fs /home/opam/temp-2b9f69/work unionfs rw 0 0
/home/opam/opam-repository /home/opam/temp-2b9f69/work/home/opam/opam-repository nullfs ro 0 0
</code></pre></div></div>
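<p>The hard-link merge itself can be sketched as below. This is an illustrative sketch with made-up paths, not the real layer directories, and it uses GNU userland; on FreeBSD, <code class="language-plaintext highlighter-rouge">stat -f '%l'</code> prints the link count instead of <code class="language-plaintext highlighter-rouge">stat -c '%h'</code>.</p>

```shell
#!/bin/sh
# Sketch of the hard-link merge (illustrative paths only)
set -e
tmp=$(mktemp -d)
mkdir "$tmp/base" "$tmp/lower" "$tmp/merged"
echo 'base file'  > "$tmp/base/os.txt"
echo 'lower file' > "$tmp/lower/src.txt"
# cp -l creates hard links instead of copying file data, so merging
# costs one directory entry per file rather than a full copy
cp -Rl "$tmp/base/." "$tmp/merged/"
cp -Rl "$tmp/lower/." "$tmp/merged/"
# Each merged file shares its inode with the original: link count 2
stat -c '%h' "$tmp/merged/os.txt" "$tmp/merged/src.txt"
```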

<p>FreeBSD protects key system files by marking them as immutable; this prevents hard links to the files. Therefore, I needed to remove these flags after the <code class="language-plaintext highlighter-rouge">bsdinstall</code> has completed. <code class="language-plaintext highlighter-rouge">chflags -R 0 basefs</code></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="FreeBSD,unionfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[FreeBSD Jails provide isolated system containers that are perfect for CI testing. Miod ported OBuilder to FreeBSD back in 2023. I have been looking at some different approaches using unionfs.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/freebsd-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/freebsd-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Terminal Plotter</title><link href="https://www.tunbury.org/2025/09/07/terminal-plotter/" rel="alternate" type="text/html" title="Terminal Plotter" /><published>2025-09-07T00:00:00+00:00</published><updated>2025-09-07T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/09/07/terminal-plotter</id><content type="html" xml:base="https://www.tunbury.org/2025/09/07/terminal-plotter/"><![CDATA[<p>I frequently want a quick way to monitor things from the shell, be that <code class="language-plaintext highlighter-rouge">watch -d df -h</code>, or <code class="language-plaintext highlighter-rouge">while true</code> loop, or an <code class="language-plaintext highlighter-rouge">awk</code> script. The scripts become increasingly complex when you want to measure the difference between the current and previous value. The solution is <a href="https://github.com/mtelvers/terminal-plotter">mtelvers/terminal-plotter</a>.</p>

<p>I set out to write this in Mosaic, but I ran into various bugs within the framework, so I abandoned it in favour of <a href="https://github.com/pqwy/notty">pqwy/notty</a> and the histograms I created for <a href="https://www.tunbury.org/2025/08/24/ocluster-monitor/">ocluster-monitor</a>.</p>

<p>Consider <code class="language-plaintext highlighter-rouge">/proc/loadavg</code>: typical values are shown below. The first three numbers are the load averages over 1, 5 and 15 minutes; <code class="language-plaintext highlighter-rouge">1/623</code> means that 1 of the 623 processes on the system is currently running; and the final value is the PID of the most recently created process.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0.04 0.02 0.00 1/623 2828549
</code></pre></div></div>

<p>A simple use case is to run <code class="language-plaintext highlighter-rouge">terminal-plotter --file /proc/loadavg</code>, which reads <code class="language-plaintext highlighter-rouge">/proc/loadavg</code> every 2 seconds and displays the values in 5 graphs. The entry <code class="language-plaintext highlighter-rouge">1/623</code> is automatically considered a fraction.</p>

<p><img src="/images/loadavg-simple.png" alt="" /></p>
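<p>The automatic fraction handling can be illustrated with a small <code class="language-plaintext highlighter-rouge">awk</code> sketch; this shows the idea, not the tool's actual parser:</p>

```shell
# Split the fourth field on "/" and evaluate it as a fraction
echo '0.04 0.02 0.00 1/623 2828549' |
  awk '{ split($4, f, "/"); printf "%.4f\n", f[1] / f[2] }'
```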

<p>You can add labels to your charts. In the example below, <code class="language-plaintext highlighter-rouge">c0</code> represents column 0, <code class="language-plaintext highlighter-rouge">c1</code> column 1, etc.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terminal-plotter <span class="nt">--file</span> /proc/loadavg <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"load 1m:c0"</span> <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"load 5m:c1"</span> <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"load 15m:c2"</span> <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"running:c3"</span> <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"pid:c4"</span>
</code></pre></div></div>

<p><img src="/images/loadavg-labels.png" alt="" /></p>

<p>Since <code class="language-plaintext highlighter-rouge">pid</code> always increases, graphing it is a bit pointless. We’d rather see the difference between the current and previous values. We can use <code class="language-plaintext highlighter-rouge">--counter</code> to indicate that we want the delta rather than the absolute value.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terminal-plotter <span class="nt">--file</span> /proc/loadavg <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"load 1m:c0"</span> <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"load 5m:c1"</span> <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"load 15m:c2"</span> <span class="se">\</span>
  <span class="nt">--value</span> <span class="s2">"running:c3"</span> <span class="se">\</span>
  <span class="nt">--counter</span> <span class="s2">"pid:c4"</span>
</code></pre></div></div>

<p><img src="/images/loadavg-counter.png" alt="" /></p>
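<p>The delta computation behind <code class="language-plaintext highlighter-rouge">--counter</code> amounts to subtracting the previous sample from the current one, which can be sketched in <code class="language-plaintext highlighter-rouge">awk</code>:</p>

```shell
# Three successive samples of a monotonically increasing counter;
# print the change between each pair of consecutive samples
printf '100\n103\n110\n' |
  awk 'NR > 1 { print $1 - prev } { prev = $1 }'
```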

<p>Now consider a more complex example, <code class="language-plaintext highlighter-rouge">/proc/stat</code>. The first line aggregates the per-CPU activity shown on the lines that follow.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat /proc/stat 
cpu  67153280 1763 14886491 1223984556 65971570 0 59050 0 0 0
cpu0 2029319 125 556429 29970217 1631023 0 1612 0 0 0
cpu1 2002631 152 467156 29813299 1980226 0 1344 0 0 0
cpu2 1918663 134 425357 29983099 1957736 0 1346 0 0 0
...
</code></pre></div></div>

<p>Here, we can use the <code class="language-plaintext highlighter-rouge">r1c0</code> notation to indicate the number in row 1, column 0 (the first numeric value on the second row). The example sums the various jiffy counters and plots the difference between the current and previous value.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terminal-plotter --file /proc/stat \
  --counter CPU0:r1c0+r1c1+r1c2+r1c4+r1c5+r1c6 \
  --counter CPU1:r2c0+r2c1+r2c2+r2c4+r2c5+r2c6 \
... etc
</code></pre></div></div>
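<p>Using the <code class="language-plaintext highlighter-rouge">/proc/stat</code> sample above, the <code class="language-plaintext highlighter-rouge">CPU0</code> expression can be checked by hand with <code class="language-plaintext highlighter-rouge">awk</code> (awk's <code class="language-plaintext highlighter-rouge">$1</code> is the <code class="language-plaintext highlighter-rouge">cpu0</code> label, so column <code class="language-plaintext highlighter-rouge">c0</code> is <code class="language-plaintext highlighter-rouge">$2</code>):</p>

```shell
# Sum columns c0+c1+c2+c4+c5+c6 of row 1 (the cpu0 line)
printf 'cpu  67153280 1763 14886491 1223984556 65971570 0 59050 0 0 0\ncpu0 2029319 125 556429 29970217 1631023 0 1612 0 0 0\n' |
  awk 'NR == 2 { print $2 + $3 + $4 + $6 + $7 + $8 }'
```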

<p><img src="/images/stat.png" alt="" /></p>

<p>After crafting a complex command line, you can save it to <code class="language-plaintext highlighter-rouge">~/.terminal-plotter</code> with a unique <em>key</em>, then future invocations can load the settings from the file. e.g. <code class="language-plaintext highlighter-rouge">terminal-plotter loadavg</code> will load the profile <code class="language-plaintext highlighter-rouge">loadavg</code> from <code class="language-plaintext highlighter-rouge">~/.terminal-plotter</code> containing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loadavg --file /proc/loadavg --value "Load:c0" --value "load 5m:c1" --value "load 15m:c2" --value "running:c3" --counter "pid:c4"
</code></pre></div></div>

<p>You may recall my <code class="language-plaintext highlighter-rouge">awk</code> script for <code class="language-plaintext highlighter-rouge">dmsetup</code>, which I used to monitor <a href="https://www.tunbury.org/2025/09/04/dm-cache/">dm-cache</a>. This can be implemented as below.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terminal-plotter <span class="nt">-i</span> 15 <span class="nt">--exec</span> <span class="s2">"sudo dmsetup status fast-sdd"</span> <span class="nt">--value</span> c6 <span class="nt">--value</span> <span class="s2">"Read Hits: (c7/(c7+c8))"</span> <span class="nt">--value</span> <span class="s2">"Write Hits: (c9/(c9+c10))"</span> <span class="nt">--value</span> <span class="s2">"Dirty:c13"</span>  <span class="nt">--counter</span> <span class="s2">"Demotions:c11"</span> <span class="nt">--counter</span> <span class="s2">"Promotions:c12"</span>
</code></pre></div></div>

<p>You can use standard arithmetic expressions using either <code class="language-plaintext highlighter-rouge">rNcM</code> or <code class="language-plaintext highlighter-rouge">cM</code> notation.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="dm-cache,Ubuntu" /><category term="tunbury.org" /><summary type="html"><![CDATA[I frequently want a quick way to monitor things from the shell, be that watch -d df -h, or while true loop, or an awk script. The scripts become increasingly complex when you want to measure the difference between the current and previous value. The solution is mtelvers/terminal-plotter.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/stat.png" /><media:content medium="image" url="https://www.tunbury.org/images/stat.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Using dm-cache with a RAM disk</title><link href="https://www.tunbury.org/2025/09/04/dm-cache/" rel="alternate" type="text/html" title="Using dm-cache with a RAM disk" /><published>2025-09-04T00:00:00+00:00</published><updated>2025-09-04T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/09/04/dm-cache</id><content type="html" xml:base="https://www.tunbury.org/2025/09/04/dm-cache/"><![CDATA[<p>I have written about <a href="https://www.tunbury.org/2025/04/21/ubuntu-dm-cache/">dm-cache</a> previously, when I used it with LVM for SSD/HDD caching. In this post, I will explore using dm-cache with RAM as the cache layer over a spinning disk.</p>

<p>I have a CI workload that I could almost fit entirely in tmpfs, but then I would not have any data persistence across reboots. I also have existing data on disk, which I’d rather not regenerate.</p>

<p>To use any cache, we need a block store and a metadata store. As I mentioned in the previous post, the metadata is typically 1% of the size of the block store. Since empty RAM disks don&#8217;t take up any space until written to, I&#8217;ll create two 100G RAM disks, one for metadata and one for the block store. Equally, I could have partitioned a single RAM disk.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>modprobe brd <span class="nv">rd_size</span><span class="o">=</span>107374182400 <span class="nv">rd_nr</span><span class="o">=</span>2 <span class="nv">max_part</span><span class="o">=</span>1
</code></pre></div></div>

<p>Let’s configure these with <code class="language-plaintext highlighter-rouge">dmsetup</code> with the sizes given in 512-byte sectors.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dmsetup create cache-meta <span class="nt">--table</span> <span class="s2">"0 2097152 linear /dev/ram0 0"</span>
dmsetup create cache-data <span class="nt">--table</span> <span class="s2">"0 209715200 linear /dev/ram1 0"</span>
</code></pre></div></div>
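<p>The sector counts in the tables above follow from dividing the device sizes by 512 bytes: 1 GiB for the metadata device (1% of the 100G block store) and 100 GiB for the data device.</p>

```shell
echo $((1 * 1024 * 1024 * 1024 / 512))     # 1 GiB metadata device: 2097152 sectors
echo $((100 * 1024 * 1024 * 1024 / 512))   # 100 GiB data device: 209715200 sectors
```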

<p>There is a lot of outdated information online about the cache settings. Firstly, there are references to a default policy, a Stochastic Multiqueue (SMQ) policy and a Multiqueue (MQ) policy. However, the kernel logs show that the “mq policy is now an alias for smq”. Many of the configuration options, such as <code class="language-plaintext highlighter-rouge">write_promote_adjustment</code> and <code class="language-plaintext highlighter-rouge">read_promote_adjustment</code>, have been removed: “tunable ‘write_promote_adjustment’ no longer has any effect”.</p>

<p>This leaves <code class="language-plaintext highlighter-rouge">migration_threshold</code> as about the only tunable setting. It controls the minimum activity level a block needs before SMQ considers promoting it to cache. I’ve picked 100 to move blocks into the cache aggressively rather than the conservative default of 2048.</p>

<p>There is the choice between <em>writeback</em> and <em>writethrough</em>, but since I want performance over data integrity, I have selected <em>writeback</em>, which asynchronously writes the data back to the disk. I can easily regenerate the data if it is lost.</p>

<p>The final question is the block size. I initially selected 8 sectors (4KB blocks), but the kernel rejected this with an “Invalid data block size” message. The smallest size I could use was 64 sectors, and even with that, the kernel warns about excess memory usage. Larger blocks reduce memory overhead and potentially improve performance, but reduce granularity for small random writes. 256 sectors does not give me a warning, so I have selected that.</p>
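<p>With 512-byte sectors, the candidate block sizes work out as follows:</p>

```shell
echo $((8 * 512 / 1024))     # rejected: 4 KiB
echo $((64 * 512 / 1024))    # smallest accepted: 32 KiB
echo $((256 * 512 / 1024))   # chosen: 128 KiB
```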

<p>Below is my final command. <code class="language-plaintext highlighter-rouge">smq 2</code> means that there are two parameters after it.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dmsetup create fast-sdd <span class="nt">--table</span> <span class="s2">"0 </span><span class="si">$(</span>blockdev <span class="nt">--getsz</span> /dev/sdd<span class="si">)</span><span class="s2"> cache /dev/mapper/cache-meta /dev/mapper/cache-data /dev/sdd 256 1 writeback smq 2 migration_threshold 100"</span>
</code></pre></div></div>

<p>Finally, mount the new device. This assumes <code class="language-plaintext highlighter-rouge">/dev/sdd</code> had a filesystem on it; if not, make one in the usual way <code class="language-plaintext highlighter-rouge">mkfs /dev/mapper/fast-sdd</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount /dev/mapper/fast-sdd /mnt
</code></pre></div></div>

<p>We can view the statistics with <code class="language-plaintext highlighter-rouge">dmsetup status</code>, but its output is a bare string of numbers that needs some formatting!</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">while </span><span class="nb">true</span><span class="p">;</span> <span class="k">do
  </span>dmsetup status fast-sdd | <span class="nb">awk</span> <span class="s1">'{
    split($7, cache, "/")
    printf "Cache Usage: %d/%d blocks (%.1f%%)\n", cache[1], cache[2], (cache[1]/cache[2])*100
    printf "Read Hits: %d, Misses: %d (%.1f%% hit rate)\n", $8, $9, ($8/($8+$9))*100  
    printf "Write Hits: %d, Misses: %d (%.1f%% hit rate)\n", $10, $11, ($10/($10+$11))*100
    printf "Dirty blocks: %d\n", $14
    printf "Metadata usage: %s\n", $5
    printf "Promotions: %d, Demotions: %d\n\n", $13, $12
  }'</span><span class="p">;</span>
  <span class="nb">sleep </span>2<span class="p">;</span>
<span class="k">done</span>
</code></pre></div></div>

<p>There are some impressive hit rates, although it is worth remembering that my dataset fits entirely within the cache.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cache Usage: 922060/6553600 blocks (14.1%)
Read Hits: 98508, Misses: 864 (99.1% hit rate)
Write Hits: 3515392, Misses: 116691 (96.8% hit rate)
Dirty blocks: 897985
Metadata usage: 19416/262144
Promotions: 922045, Demotions: 0
</code></pre></div></div>
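<p>For reference, the hit-rate percentages are simple ratios of the hit and miss counters; the read figure above can be reproduced with the same awk arithmetic used in the monitoring loop:</p>

```shell
# 98508 read hits against 864 read misses, as in the sample output above
awk 'BEGIN { hits = 98508; misses = 864; printf "%.1f%% hit rate\n", hits / (hits + misses) * 100 }'
```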

<p>When it is time to shut down the machine, special care needs to be taken to write the dirty blocks out to disk. The process involves switching the device policy to <code class="language-plaintext highlighter-rouge">cleaner</code>, which requires a suspend/reload/resume cycle.</p>

<blockquote>
  <p>After <code class="language-plaintext highlighter-rouge">dmsetup suspend</code> and <code class="language-plaintext highlighter-rouge">dmsetup reload</code>, the table shown with <code class="language-plaintext highlighter-rouge">dmsetup table</code> remains unchanged. The new table does not take effect until <code class="language-plaintext highlighter-rouge">dmsetup resume</code>.</p>
</blockquote>

<p><code class="language-plaintext highlighter-rouge">dmsetup wait</code> pauses until all the dirty blocks have been written.</p>
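<p>Should you wish to watch the flush progress yourself, the dirty-block count is field 14 of the status line (the same field the monitoring script above prints), so it can be polled until it reaches zero. A sketch using the earlier sample counters (the device length here is a placeholder, not a real value):</p>

```shell
# Fields: start len cache meta-block-size meta-used/total block-size
# cache-used/total read-hits read-misses write-hits write-misses
# demotions promotions dirty ...
status="0 1000000 cache 8 19416/262144 256 922060/6553600 98508 864 3515392 116691 0 922045 897985"
echo "$status" | awk '{ print "dirty blocks:", $14 }'
```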

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>umount /dev/mapper/fast-sdd
dmsetup <span class="nb">suspend </span>fast-sdd
dmsetup reload fast-sdd <span class="nt">--table</span> <span class="s2">"0 </span><span class="si">$(</span>blockdev <span class="nt">--getsz</span> /dev/sdd<span class="si">)</span><span class="s2"> cache /dev/mapper/cache-meta /dev/mapper/cache-data /dev/sdd 256 0 cleaner 0"</span>
dmsetup resume fast-sdd
dmsetup <span class="nb">wait </span>fast-sdd
dmsetup remove fast-sdd
dmsetup remove cache-meta
dmsetup remove cache-data
rmmod brd
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="dm-cache,Ubuntu" /><category term="tunbury.org" /><summary type="html"><![CDATA[I have written about dm-cache previously, when I used it with LVM for SSD/HDD caching. In this post, I will explore using dm-cache with RAM as the cache layer over a spinning disk.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ubuntu.png" /><media:content medium="image" url="https://www.tunbury.org/images/ubuntu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Blocking unlimited memory requests in Slurm</title><link href="https://www.tunbury.org/2025/09/03/slurm-mem/" rel="alternate" type="text/html" title="Blocking unlimited memory requests in Slurm" /><published>2025-09-03T00:00:00+00:00</published><updated>2025-09-03T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/09/03/slurm-mem</id><content type="html" xml:base="https://www.tunbury.org/2025/09/03/slurm-mem/"><![CDATA[<p>When a Slurm node is added to a partition, you specify the quantity of physical memory the machine has. Running <code class="language-plaintext highlighter-rouge">slurmd -C</code> on the node generates the configuration data based on the machine capacity.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NodeName=foo CPUs=16 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=1048576
</code></pre></div></div>

<p>However, even with cgroups enabled, users running with <code class="language-plaintext highlighter-rouge">--mem 0</code> run unchecked, which can lead to the out-of-memory reaper collecting the process. Even limiting <code class="language-plaintext highlighter-rouge">RealMemory</code> to 99% or 66% of the actual RAM does not protect the machine.</p>

<p>Using a job_submit plugin, you can block <code class="language-plaintext highlighter-rouge">--mem=0</code>. Slurm provides a Lua interface that allows you to intercept and modify/reject job submissions.</p>

<p>Here’s how to create a custom job_submit plugin that blocks <code class="language-plaintext highlighter-rouge">--mem=0</code>:</p>

<p>1. Create <code class="language-plaintext highlighter-rouge">/etc/slurm/job_submit.lua</code>:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">slurm_job_submit</span><span class="p">(</span><span class="n">job_desc</span><span class="p">,</span> <span class="n">part_list</span><span class="p">,</span> <span class="n">submit_uid</span><span class="p">)</span>
    <span class="c1">-- Check if user requested unlimited memory (--mem=0)</span>
    <span class="k">if</span> <span class="n">job_desc</span><span class="p">.</span><span class="n">pn_min_memory</span> <span class="o">==</span> <span class="mi">0</span> <span class="k">then</span>
        <span class="n">slurm</span><span class="p">.</span><span class="n">log_user</span><span class="p">(</span><span class="s2">"ERROR: --mem=0 is not allowed. Please specify an explicit memory limit."</span><span class="p">)</span>
        <span class="n">slurm</span><span class="p">.</span><span class="n">user_msg</span><span class="p">(</span><span class="s2">"--mem=0 is not allowed. Please specify an explicit memory limit (e.g., --mem=100G)"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">slurm</span><span class="p">.</span><span class="n">ERROR</span>
    <span class="k">end</span>

    <span class="k">return</span> <span class="n">slurm</span><span class="p">.</span><span class="n">SUCCESS</span>
<span class="k">end</span>

<span class="k">function</span> <span class="nf">slurm_job_modify</span><span class="p">(</span><span class="n">job_desc</span><span class="p">,</span> <span class="n">job_rec</span><span class="p">,</span> <span class="n">part_list</span><span class="p">,</span> <span class="n">modify_uid</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">slurm</span><span class="p">.</span><span class="n">SUCCESS</span>
<span class="k">end</span>

<span class="k">return</span> <span class="n">slurm</span><span class="p">.</span><span class="n">SUCCESS</span>
</code></pre></div></div>

<p>2. Configure <code class="language-plaintext highlighter-rouge">slurm.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Enable the Lua job submit plugin
JobSubmitPlugins=lua
</code></pre></div></div>

<p>3. Restart <code class="language-plaintext highlighter-rouge">slurmctld</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>systemctl restart slurmctld
</code></pre></div></div>

<p>Users trying <code class="language-plaintext highlighter-rouge">srun --mem=0</code> will get an error message:</p>

<blockquote>
  <p>--mem=0 is not allowed. Please specify an explicit memory limit (e.g., --mem=100G)</p>
</blockquote>

<p>This enforces explicit memory requests while still allowing up to the RealMemory limits. The plugin intercepts job submissions before they are processed, allowing you to reject <code class="language-plaintext highlighter-rouge">--mem=0</code> requests and force users to specify explicit memory amounts within your configured limits.</p>

<p>I’d still recommend additionally setting <code class="language-plaintext highlighter-rouge">RealMemory</code> to less than the physical RAM installed to allow some room for the OS.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Slurm" /><category term="tunbury.org" /><summary type="html"><![CDATA[When a Slurm node is added to a partition, you specify the quantity of physical memory the machine has. Running slurmd -C on the node generates the configuration data based on the machine capacity.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/slurm.png" /><media:content medium="image" url="https://www.tunbury.org/images/slurm.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Arduino Uno Fish Feeder</title><link href="https://www.tunbury.org/2025/08/31/fish-feeder/" rel="alternate" type="text/html" title="Arduino Uno Fish Feeder" /><published>2025-08-31T12:00:00+00:00</published><updated>2025-08-31T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/31/fish-feeder</id><content type="html" xml:base="https://www.tunbury.org/2025/08/31/fish-feeder/"><![CDATA[<p>My daughter and I have had a fun summer project building a fish feeder. It uses a 3D-printed container to hold the fish food, which is rotated 360 degrees using an Arduino Uno and a 28BYJ-48 stepper motor.</p>

<p>Gravity ensures that the food falls to the bottom of the container. An internal scoop collects the food as it rotates and, when inverted, the food drops into the tank. The container lid isn’t shown, as we reused a transparent lid from a Pringles tube.</p>

<p>The initial version of the code performed the rotation, waited for a 12-hour delay, and looped. Subsequently, we have used the LED matrix on the UNO R4 to display a countdown until feeding time. The code is available at <a href="https://github.com/mtelvers/fish-feeder">mtelvers/fish-feeder</a></p>

<p><img src="/images/fish-feeder-design.png" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="arduino" /><category term="tunbury.org" /><summary type="html"><![CDATA[My daughter and I have had a fun summer project building a fish feeder. It uses a 3D-printed container to hold the fish food, which is rotated 360 degrees using an Arduino Uno and a 28BYJ-48 stepper motor.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/fish-feeder.png" /><media:content medium="image" url="https://www.tunbury.org/images/fish-feeder.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Mosaic Terminal User Interface</title><link href="https://www.tunbury.org/2025/08/31/mless/" rel="alternate" type="text/html" title="Mosaic Terminal User Interface" /><published>2025-08-31T00:00:00+00:00</published><updated>2025-08-31T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/31/mless</id><content type="html" xml:base="https://www.tunbury.org/2025/08/31/mless/"><![CDATA[<p>In testing various visual components, terminal resizing, keyboard handling and the use of hooks, I inadvertently wrote the <code class="language-plaintext highlighter-rouge">less</code> tool in <a href="https://github.com/tmattio/mosaic">Mosaic</a>. Below are my notes on using the framework.</p>

<p><code class="language-plaintext highlighter-rouge">use_state</code> is a React-style hook that manages local component state. It returns a tuple of (value, set, update) where:</p>

<ol>
  <li>count - the current value</li>
  <li>set_count - sets to a specific value (takes a value)</li>
  <li>update_count - transforms the current value (takes a function)</li>
</ol>

<p>Thus, you might have</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="p">(</span><span class="n">count</span><span class="o">,</span> <span class="n">set_count</span><span class="o">,</span> <span class="n">update_count</span><span class="p">)</span> <span class="o">=</span> <span class="n">use_state</span> <span class="mi">0</span><span class="p">;;</span>

<span class="n">count</span> <span class="c">(* returns the current value - zero in this case *)</span>
<span class="n">set_count</span> <span class="mi">5</span> <span class="c">(* set the value to 5 *)</span>
<span class="n">update_count</span> <span class="p">(</span><span class="k">fun</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="c">(* adds 1 to the current value *)</span>
</code></pre></div></div>

<p>In practice, this could be used to keep track of the selected index in a table of values:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">directory_browser</span> <span class="n">dir_info</span> <span class="n">window_height</span> <span class="n">window_width</span> <span class="n">set_mode</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">open</span> <span class="nc">Ui</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">selected_index</span><span class="o">,</span> <span class="n">set_selected_index</span><span class="o">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">use_state</span> <span class="mi">0</span> <span class="k">in</span>
  
  <span class="n">use_subscription</span>
    <span class="p">(</span><span class="nn">Sub</span><span class="p">.</span><span class="n">keyboard_filter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">event</span> <span class="o">-&gt;</span>
         <span class="k">match</span> <span class="n">event</span><span class="o">.</span><span class="nn">Input</span><span class="p">.</span><span class="n">key</span> <span class="k">with</span>
         <span class="o">|</span> <span class="nn">Input</span><span class="p">.</span><span class="nc">Up</span> <span class="o">-&gt;</span> <span class="n">set_selected_index</span> <span class="p">(</span><span class="n">max</span> <span class="mi">0</span> <span class="p">(</span><span class="n">selected_index</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span> <span class="nc">None</span>
         <span class="o">|</span> <span class="nn">Input</span><span class="p">.</span><span class="nc">Down</span> <span class="o">-&gt;</span> <span class="n">set_selected_index</span> <span class="p">(</span><span class="n">min</span> <span class="p">(</span><span class="n">num_entries</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="n">selected_index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">));</span> <span class="nc">None</span>
         <span class="o">|</span> <span class="nn">Input</span><span class="p">.</span><span class="nc">Enter</span> <span class="o">-&gt;</span> <span class="n">set_mode</span> <span class="p">(</span><span class="n">load_path</span> <span class="n">entry</span><span class="o">.</span><span class="n">full_path</span><span class="p">);</span> <span class="nc">Some</span> <span class="bp">()</span>
         <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="nc">None</span><span class="p">));</span>
</code></pre></div></div>

<p>Any change in the value of a state causes the UI component to be re-rendered. Consider this snippet, which uses the <code class="language-plaintext highlighter-rouge">Sub.window</code> subscription to track the window size by calling <code class="language-plaintext highlighter-rouge">set_window_height</code> and <code class="language-plaintext highlighter-rouge">set_window_width</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">app</span> <span class="n">path</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">mode</span><span class="o">,</span> <span class="n">set_mode</span><span class="o">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">use_state</span> <span class="p">(</span><span class="n">load_path</span> <span class="n">path</span><span class="p">)</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">window_height</span><span class="o">,</span> <span class="n">set_window_height</span><span class="o">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">use_state</span> <span class="mi">24</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">window_width</span><span class="o">,</span> <span class="n">set_window_width</span><span class="o">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">use_state</span> <span class="mi">80</span> <span class="k">in</span>

  <span class="c">(* Handle window resize *)</span>
  <span class="n">use_subscription</span>
    <span class="p">(</span><span class="nn">Sub</span><span class="p">.</span><span class="n">window</span> <span class="p">(</span><span class="k">fun</span> <span class="n">size</span> <span class="o">-&gt;</span>
         <span class="n">set_window_height</span> <span class="n">size</span><span class="o">.</span><span class="n">height</span><span class="p">;</span>
         <span class="n">set_window_width</span> <span class="n">size</span><span class="o">.</span><span class="n">width</span><span class="p">));</span>

  <span class="c">(* Return a Ui.element using window_height and window_width *)</span>
  <span class="n">directory_browser</span> <span class="n">dir_info</span> <span class="n">window_height</span> <span class="n">window_width</span> <span class="n">set_mode</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span>
  <span class="n">run</span> <span class="o">~</span><span class="n">alt_screen</span><span class="o">:</span><span class="bp">true</span> <span class="p">(</span><span class="k">fun</span> <span class="bp">()</span> <span class="o">-&gt;</span> <span class="n">app</span> <span class="n">path</span><span class="p">)</span>
</code></pre></div></div>

<p>In my testing, this worked but left unattached text fragments on the screen. This forced me to add a <code class="language-plaintext highlighter-rouge">Cmd.clear_screen</code> to manually clear the screen. <code class="language-plaintext highlighter-rouge">Cmd.repaint</code> doesn’t seem strictly necessary. The working subscription was:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="n">use_subscription</span>
    <span class="p">(</span><span class="nn">Sub</span><span class="p">.</span><span class="n">window</span> <span class="p">(</span><span class="k">fun</span> <span class="n">size</span> <span class="o">-&gt;</span>
         <span class="n">set_window_height</span> <span class="n">size</span><span class="o">.</span><span class="n">height</span><span class="p">;</span>
         <span class="n">set_window_width</span> <span class="n">size</span><span class="o">.</span><span class="n">width</span><span class="p">;</span>
         <span class="n">dispatch_cmd</span> <span class="p">(</span><span class="nn">Cmd</span><span class="p">.</span><span class="n">batch</span> <span class="p">[</span> <span class="nn">Cmd</span><span class="p">.</span><span class="n">clear_screen</span><span class="p">;</span> <span class="nn">Cmd</span><span class="p">.</span><span class="n">repaint</span> <span class="p">])));</span>
</code></pre></div></div>

<p>It is also possible to monitor values using <code class="language-plaintext highlighter-rouge">use_effect</code>. In the example below, the scroll position is reset when the filename is changed. The effect is triggered only when the component is rendered and when the value differs from the value on the previous render.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">use_effect</span> <span class="o">~</span><span class="n">deps</span><span class="o">:</span><span class="p">(</span><span class="nn">Deps</span><span class="p">.</span><span class="n">keys</span> <span class="p">[</span><span class="nn">Deps</span><span class="p">.</span><span class="n">string</span> <span class="n">content</span><span class="o">.</span><span class="n">filename</span><span class="p">])</span> <span class="p">(</span><span class="k">fun</span> <span class="bp">()</span> <span class="o">-&gt;</span>
  <span class="n">set_scroll_offset</span> <span class="mi">0</span><span class="p">;</span>
  <span class="n">set_h_scroll_offset</span> <span class="mi">0</span><span class="p">;</span>
  <span class="nc">None</span>
<span class="p">);</span>
</code></pre></div></div>

<p>The sequence is:</p>
<ol>
  <li>Component renders (first time or re-render due to state change)</li>
  <li>Framework checks if any values in ~deps changed since last render</li>
  <li>If they changed, run the effect function</li>
  <li>If the effect returns cleanup, that cleanup runs before the next effect</li>
</ol>

<p>For some widgets, I found I needed to calculate the size manually to fill the space, accounting for panel borders, the header, dividers, and the status line, hence the <code class="language-plaintext highlighter-rouge">window_height - 6</code> below. In other cases, <code class="language-plaintext highlighter-rouge">~expand:true</code> was available.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scroll_view</span>
  <span class="o">~</span><span class="n">height</span><span class="o">:</span><span class="p">(</span><span class="nt">`Cells</span> <span class="p">(</span><span class="n">window_height</span> <span class="o">-</span> <span class="mi">6</span><span class="p">))</span>
  <span class="o">~</span><span class="n">h_offset</span><span class="o">:</span><span class="n">h_scroll_offset</span> 
  <span class="o">~</span><span class="n">v_offset</span><span class="o">:</span><span class="n">scroll_offset</span> 
  <span class="n">file_content</span><span class="p">;</span>
</code></pre></div></div>

<p>Colours can be defined as RGB values and then composed into styles with the <code class="language-plaintext highlighter-rouge">++</code> operator. Styles are then applied to elements such as table headers:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">Colors</span> <span class="o">=</span> <span class="k">struct</span>
  <span class="k">let</span> <span class="n">primary_blue</span> <span class="o">=</span> <span class="nn">Style</span><span class="p">.</span><span class="n">rgb</span> <span class="mi">66</span> <span class="mi">165</span> <span class="mi">245</span>    <span class="c">(* Material Blue 400 *)</span>
<span class="k">end</span>

<span class="k">module</span> <span class="nc">Styles</span> <span class="o">=</span> <span class="k">struct</span>
  <span class="k">let</span> <span class="n">header</span> <span class="o">=</span> <span class="nn">Style</span><span class="p">.(</span><span class="n">fg</span> <span class="nn">Colors</span><span class="p">.</span><span class="n">primary_blue</span> <span class="o">++</span> <span class="n">bold</span><span class="p">)</span>
<span class="k">end</span>

<span class="n">table</span> <span class="o">~</span><span class="n">header_style</span><span class="o">:</span><span class="nn">Styles</span><span class="p">.</span><span class="n">header</span> <span class="o">...</span>
</code></pre></div></div>

<p>The panel serves as the primary container for our application content, providing both visual framing and structural organisation:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">panel</span> 
  <span class="o">~</span><span class="n">title</span><span class="o">:</span><span class="p">(</span><span class="nn">Printf</span><span class="p">.</span><span class="n">sprintf</span> <span class="s2">"Directory Browser - %s"</span> <span class="p">(</span><span class="nn">Filename</span><span class="p">.</span><span class="n">basename</span> <span class="n">dir_info</span><span class="o">.</span><span class="n">path</span><span class="p">))</span>
  <span class="o">~</span><span class="n">box_style</span><span class="o">:</span><span class="nc">Rounded</span> 
  <span class="o">~</span><span class="n">border_style</span><span class="o">:</span><span class="nn">Styles</span><span class="p">.</span><span class="n">accent</span> 
  <span class="o">~</span><span class="n">expand</span><span class="o">:</span><span class="bp">true</span>
  <span class="p">(</span><span class="n">vbox</span> <span class="p">[</span>
    <span class="c">(* content goes here *)</span>
  <span class="p">])</span>
</code></pre></div></div>

<p>Mosaic provides the table widget, which I found had a layout <a href="https://github.com/tmattio/mosaic/issues/2">issue</a> when the column widths exceeded the table width. It worked pretty well, but it takes about 1 second per 1000 rows on my machine, so consider pagination.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">table_columns</span> <span class="o">=</span> <span class="p">[</span>
  <span class="nn">Table</span><span class="p">.{</span> <span class="p">(</span><span class="n">default_column</span> <span class="o">~</span><span class="n">header</span><span class="o">:</span><span class="s2">"Name"</span><span class="p">)</span> <span class="k">with</span> <span class="n">style</span> <span class="o">=</span> <span class="nn">Styles</span><span class="p">.</span><span class="n">file</span> <span class="p">};</span>
  <span class="nn">Table</span><span class="p">.{</span> <span class="p">(</span><span class="n">default_column</span> <span class="o">~</span><span class="n">header</span><span class="o">:</span><span class="s2">"Type"</span><span class="p">)</span> <span class="k">with</span> <span class="n">style</span> <span class="o">=</span> <span class="nn">Styles</span><span class="p">.</span><span class="n">file</span> <span class="p">};</span>
  <span class="nn">Table</span><span class="p">.{</span> <span class="p">(</span><span class="n">default_column</span> <span class="o">~</span><span class="n">header</span><span class="o">:</span><span class="s2">"Size"</span><span class="p">)</span> <span class="k">with</span> <span class="n">style</span> <span class="o">=</span> <span class="nn">Styles</span><span class="p">.</span><span class="n">file</span><span class="p">;</span> <span class="n">justify</span> <span class="o">=</span> <span class="nt">`Right</span> <span class="p">};</span>
<span class="p">]</span> <span class="k">in</span>

<span class="n">table</span> 
  <span class="o">~</span><span class="n">columns</span><span class="o">:</span><span class="n">table_columns</span> 
  <span class="o">~</span><span class="n">rows</span><span class="o">:</span><span class="n">table_rows</span> 
  <span class="o">~</span><span class="n">box_style</span><span class="o">:</span><span class="nn">Table</span><span class="p">.</span><span class="nc">Minimal</span> 
  <span class="o">~</span><span class="n">expand</span><span class="o">:</span><span class="bp">true</span>
  <span class="o">~</span><span class="n">header_style</span><span class="o">:</span><span class="nn">Styles</span><span class="p">.</span><span class="n">header</span>
  <span class="o">~</span><span class="n">row_styles</span><span class="o">:</span><span class="n">table_row_styles</span>
  <span class="o">~</span><span class="n">width</span><span class="o">:</span><span class="p">(</span><span class="nc">Some</span> <span class="p">(</span><span class="n">window_width</span> <span class="o">-</span> <span class="mi">4</span><span class="p">))</span>
  <span class="bp">()</span>
</code></pre></div></div>

<p>The primary layout primitives are <code class="language-plaintext highlighter-rouge">vbox</code> and <code class="language-plaintext highlighter-rouge">hbox</code>:</p>

<p>Vertical Box (vbox) - for stacking components vertically.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vbox</span> <span class="p">[</span>
  <span class="n">text</span> <span class="s2">"Header"</span><span class="p">;</span>
  <span class="n">divider</span> <span class="o">~</span><span class="n">orientation</span><span class="o">:</span><span class="nt">`Horizontal</span> <span class="bp">()</span><span class="p">;</span>
  <span class="n">content</span><span class="p">;</span>
  <span class="n">text</span> <span class="s2">"Footer"</span><span class="p">;</span>
<span class="p">]</span>
</code></pre></div></div>

<p>Horizontal Box (hbox) - for arranging components horizontally.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hbox</span> <span class="o">~</span><span class="n">gap</span><span class="o">:</span><span class="p">(</span><span class="nt">`Cells</span> <span class="mi">2</span><span class="p">)</span> <span class="p">[</span>
  <span class="n">text</span> <span class="s2">"Left column"</span><span class="p">;</span>
  <span class="n">text</span> <span class="s2">"Right column"</span><span class="p">;</span>
<span class="p">]</span>
</code></pre></div></div>

<p>As I mentioned earlier, Mosaic uses a subscription-based event handling system; for example, a component can subscribe to keyboard events.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">use_subscription</span>
  <span class="p">(</span><span class="nn">Sub</span><span class="p">.</span><span class="n">keyboard_filter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">event</span> <span class="o">-&gt;</span>
       <span class="k">match</span> <span class="n">event</span><span class="o">.</span><span class="nn">Input</span><span class="p">.</span><span class="n">key</span> <span class="k">with</span>
       <span class="o">|</span> <span class="nn">Input</span><span class="p">.</span><span class="nc">Char</span> <span class="n">c</span> <span class="k">when</span> <span class="nn">Uchar</span><span class="p">.</span><span class="n">to_int</span> <span class="n">c</span> <span class="o">=</span> <span class="mh">0x71</span> <span class="o">-&gt;</span> <span class="c">(* 'q' *)</span>
           <span class="n">dispatch_cmd</span> <span class="nn">Cmd</span><span class="p">.</span><span class="n">quit</span><span class="p">;</span> <span class="nc">Some</span> <span class="bp">()</span>
       <span class="o">|</span> <span class="nn">Input</span><span class="p">.</span><span class="nc">Enter</span> <span class="o">-&gt;</span> 
           <span class="c">(* handle enter *)</span>
           <span class="nc">Some</span> <span class="bp">()</span>
       <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="nc">None</span><span class="p">))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">keyboard_filter</code> function allows components to selectively handle keyboard events, returning <code class="language-plaintext highlighter-rouge">Some ()</code> for events that are handled and <code class="language-plaintext highlighter-rouge">None</code> for events that should be passed to other components.</p>

<p>Mosaic provides a command system for handling side effects and application lifecycle events; some of these appeared in earlier examples.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dispatch_cmd</span> <span class="nn">Cmd</span><span class="p">.</span><span class="n">quit</span>                    <span class="c">(* Exit the application *)</span>
<span class="n">dispatch_cmd</span> <span class="nn">Cmd</span><span class="p">.</span><span class="n">repaint</span>                 <span class="c">(* Force a screen repaint *)</span>
<span class="n">dispatch_cmd</span> <span class="p">(</span><span class="nn">Cmd</span><span class="p">.</span><span class="n">batch</span> <span class="p">[</span>                <span class="c">(* Execute multiple commands *)</span>
  <span class="nn">Cmd</span><span class="p">.</span><span class="n">clear_screen</span><span class="p">;</span> 
  <span class="nn">Cmd</span><span class="p">.</span><span class="n">repaint</span>
<span class="p">])</span>
</code></pre></div></div>

<p>I found that using Unicode characters in strings caused alignment errors, as their length was the number of data bytes, not the visual space used on the screen.</p>
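<p>For example (a minimal illustration, not code from mless), <code class="language-plaintext highlighter-rouge">String.length</code> counts bytes, so any column arithmetic based on it drifts as soon as a multi-byte character appears:</p>

```ocaml
(* "café": the 'é' is a two-byte UTF-8 sequence, so the string is
   5 bytes long but occupies only 4 columns on screen *)
let s = "caf\xc3\xa9"
let () = Printf.printf "String.length = %d\n" (String.length s)
(* prints: String.length = 5 *)
```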

<p>The <a href="https://github.com/mtelvers/mless">mless</a> application is available on GitHub for further investigation or as a starter project.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,mosaic" /><category term="tunbury.org" /><summary type="html"><![CDATA[In testing various visual components, terminal resizing, keyboard handling and the use of hooks, I inadvertently wrote the less tool in Mosaic. Below are my notes on using the framework.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Measurement of Filesystem Performance</title><link href="https://www.tunbury.org/2025/08/27/fsperf/" rel="alternate" type="text/html" title="Measurement of Filesystem Performance" /><published>2025-08-27T00:00:00+00:00</published><updated>2025-08-27T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/27/fsperf</id><content type="html" xml:base="https://www.tunbury.org/2025/08/27/fsperf/"><![CDATA[<p>I ran into numerous problems when using tools such as <code class="language-plaintext highlighter-rouge">fio</code> and <code class="language-plaintext highlighter-rouge">filebench</code> to measure file system performance. Furthermore, these tools primarily measure the number of megabytes transferred to and from disk. My goal with this project is to measure the relative performance overhead the file system adds. For example, if I had thousands of small files with multiple workers, which file system should I use with my disk subsystem?</p>

<h1 id="the-testing-strategy">The Testing Strategy</h1>

<p>The application employs several distinct workload patterns:</p>

<p><strong>Random File Creation</strong>: Files of varying sizes are distributed across a directory structure of moderate depth. This simulates the behaviour of applications which generate output files, temporary data, or cached content without strict organisational constraints.</p>

<p><strong>Deep Directory Structures</strong>: We construct hierarchical directory trees of significant depth, then populate them with files. This pattern reflects the organisation of source code repositories, nested configuration directories, or hierarchical data storage schemes.</p>

<p><strong>Many Small Files</strong>: A concentrated effort to create numerous files of minimal size within a constrained directory structure. This workload approximates the behaviour of systems which fragment data into many small components—consider email storage systems, or applications which maintain extensive metadata collections.</p>

<p><strong>Large File Operations</strong>: Though not our primary focus, we include tests involving files of substantial size to ensure our measurements capture the full spectrum of filesystem behaviour.</p>

<h1 id="concurrency">Concurrency</h1>

<p>A filesystem which performs admirably under single-threaded access may exhibit quite different characteristics when subjected to concurrent operations. Modern systems rarely operate in isolation; multiple processes, threads, and users compete for filesystem resources simultaneously.</p>

<p>The testing methodology employs multiple concurrent workers, each operating independently within its own portion of the filesystem hierarchy. This approach serves two purposes: first, it more accurately reflects the concurrent nature of modern computing; second, it exposes performance characteristics which might remain hidden under purely sequential testing.</p>

<p>Each worker operates with its own random seed, ensuring that the patterns of file creation and directory traversal differ between workers, thus avoiding artificial synchronisation, which might skew results.</p>
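<p>As a sketch of the idea (hypothetical code, not taken from fsperf; the names and layout are mine): each worker seeds its own generator and scatters small files across a handful of directories.</p>

```ocaml
(* Each worker derives its choices from its own seed, so directory and
   file selection differ between workers. *)
let run_worker ~seed ~root ~files =
  let rng = Random.State.make [| seed |] in
  if not (Sys.file_exists root) then Sys.mkdir root 0o755;
  for i = 0 to files - 1 do
    (* pick one of 16 subdirectories at random *)
    let dir = Filename.concat root (Printf.sprintf "d%02d" (Random.State.int rng 16)) in
    if not (Sys.file_exists dir) then Sys.mkdir dir 0o755;
    (* write a small file of 1-512 bytes *)
    let oc = open_out (Filename.concat dir (Printf.sprintf "f%06d" i)) in
    output_string oc (String.make (1 + Random.State.int rng 512) 'x');
    close_out oc
  done
```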

<h1 id="filesystems">Filesystems</h1>

<p>Different filesystem implementations take different approaches. XFS, designed for large-scale server deployments, optimises for scalability and large file handling. EXT4 represents a mature, general-purpose approach with broad compatibility. ZFS brings advanced features such as built-in compression and snapshot capabilities, though at the cost of increased complexity. BTRFS offers similar advanced features with a different implementation strategy.</p>

<p>The testing framework treats these filesystems as interchangeable backends, applying identical workloads to each. This approach allows for direct comparison of their relative strengths and weaknesses under our specific test conditions.</p>

<p>The code is available on GitHub <a href="https://github.com/mtelvers/fsperf">mtelvers/fsperf</a>.</p>

<h1 id="next-steps">Next Steps</h1>

<p>Add filesystem-specific features such as snapshots and clones to the testing matrix. See <a href="https://www.tunbury.org/2025/08/23/zfs-scaling/">A ZFS Scaling Adventure</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[I ran into numerous problems when using tools such as fio and filebench to measure file system performance. Furthermore, these tools primarily measure the number of megabytes transferred to and from disk. My goal with this project is to measure the relative performance overhead the file system adds. For example, if I had thousands of small files with multiple workers, which file system should I use with my disk subsystem?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/fsperf.png" /><media:content medium="image" url="https://www.tunbury.org/images/fsperf.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Terminal GUI for ocluster monitoring</title><link href="https://www.tunbury.org/2025/08/24/ocluster-monitor/" rel="alternate" type="text/html" title="Terminal GUI for ocluster monitoring" /><published>2025-08-24T00:00:00+00:00</published><updated>2025-08-24T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/24/ocluster-monitor</id><content type="html" xml:base="https://www.tunbury.org/2025/08/24/ocluster-monitor/"><![CDATA[<p>I’ve been thinking about terminal-based GUI applications recently and decided to give <a href="https://ocaml.org/p/notty/latest">notty</a> a try.</p>

<p>I decided to write a tool to display the status of the <a href="https://github.com/ocurrent/ocluster">ocurrent/ocluster</a> in the terminal by gathering the statistics from <code class="language-plaintext highlighter-rouge">ocluster-admin</code>. I want to have histograms showing each pool’s current utilisation and backlog. The histograms will resize vertically and horizontally as the terminal size changes. And yes, I do love <code class="language-plaintext highlighter-rouge">btop</code>.</p>

<p>It’s functional, but still a work in progress. <a href="https://github.com/mtelvers/ocluster-monitor">mtelvers/ocluster-monitor</a></p>

<p>The histogram module uses braille characters (U+2800-U+28FF) to create dense visualizations where each character can represent up to 2x4 data points using the dots of a braille cell. In the code, these positions map to bit values:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Left Column Bits    Right Column Bits
   0x01 (1)            0x08 (4)
   0x02 (2)            0x10 (5)
   0x04 (3)            0x20 (6)
   0x40 (7)            0x80 (8)
</code></pre></div></div>

<h1 id="1-bit-mapping">1. Bit Mapping</h1>
<p>The code defines bit arrays for each column:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">left_bits</span> <span class="o">=</span> <span class="p">[</span> <span class="mh">0x40</span><span class="p">;</span> <span class="mh">0x04</span><span class="p">;</span> <span class="mh">0x02</span><span class="p">;</span> <span class="mh">0x01</span> <span class="p">]</span>   <span class="c">(* Bottom to top *)</span>
<span class="k">let</span> <span class="n">right_bits</span> <span class="o">=</span> <span class="p">[</span> <span class="mh">0x80</span><span class="p">;</span> <span class="mh">0x20</span><span class="p">;</span> <span class="mh">0x10</span><span class="p">;</span> <span class="mh">0x08</span> <span class="p">]</span>  <span class="c">(* Bottom to top *)</span>
</code></pre></div></div>

<h1 id="2-height-to-dots-conversion">2. Height to Dots Conversion</h1>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">level</span> <span class="o">=</span> <span class="n">int_of_float</span> <span class="p">(</span><span class="n">height</span> <span class="o">*.</span> <span class="mi">4</span><span class="o">.</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<p>This converts a height value (0.0-1.0) to the number of dots to fill (0-4).</p>

<h1 id="3-dot-pattern-generation">3. Dot Pattern Generation</h1>
<p>For each column, the algorithm:</p>
<ol>
  <li>Iterates through the bit array from bottom to top</li>
  <li>Sets each bit if the current level is high enough</li>
  <li>Uses bitwise OR to combine all active dots</li>
</ol>

<h1 id="4-character-assembly">4. Character Assembly</h1>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">braille_char</span> <span class="o">=</span> <span class="n">braille_base</span> <span class="ow">lor</span> <span class="n">left_dots</span> <span class="ow">lor</span> <span class="n">right_dots</span>
</code></pre></div></div>
<ul>
  <li><code class="language-plaintext highlighter-rouge">braille_base</code> = 0x2800 (base braille character)</li>
  <li><code class="language-plaintext highlighter-rouge">left_dots</code> and <code class="language-plaintext highlighter-rouge">right_dots</code> are OR’d together</li>
  <li>Result is converted to a Unicode character</li>
</ul>
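<p>Combining the steps above (an illustrative sketch; <code class="language-plaintext highlighter-rouge">braille_cell</code> is my naming, not necessarily the function in ocluster-monitor):</p>

```ocaml
let left_bits = [ 0x40; 0x04; 0x02; 0x01 ]   (* bottom to top *)
let right_bits = [ 0x80; 0x20; 0x10; 0x08 ]  (* bottom to top *)

(* Build one braille cell from two column heights in the range 0.0-1.0.
   A half-full left column and a three-quarter-full right column
   produce U+28F4. *)
let braille_cell left_h right_h =
  let dots bits h =
    let level = int_of_float (h *. 4.0) in
    bits |> List.filteri (fun i _ -> i < level) |> List.fold_left (lor) 0
  in
  Uchar.of_int (0x2800 lor dots left_bits left_h lor dots right_bits right_h)
```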

<h1 id="5-multi-row-histograms">5. Multi-Row Histograms</h1>
<p>For taller displays, the histogram is split into multiple rows:</p>
<ul>
  <li>Each row represents a fraction of the total height</li>
  <li>Data values are normalized to fit within each row’s range</li>
  <li>Rows are generated from top to bottom</li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,notty" /><category term="tunbury.org" /><summary type="html"><![CDATA[I’ve been thinking about terminal-based GUI applications recently and decided to give notty a try.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocluster-monitor.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocluster-monitor.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">A ZFS Scaling Adventure</title><link href="https://www.tunbury.org/2025/08/23/zfs-scaling/" rel="alternate" type="text/html" title="A ZFS Scaling Adventure" /><published>2025-08-23T00:00:00+00:00</published><updated>2025-08-23T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/23/zfs-scaling</id><content type="html" xml:base="https://www.tunbury.org/2025/08/23/zfs-scaling/"><![CDATA[<p>The FreeBSD workers have been getting <a href="https://github.com/ocurrent/opam-repo-ci/issues/449">slower</a>: jobs that should take a few minutes are now timing out after 60 minutes. My first instinct was that ZFS was acting strangely.</p>

<p>I checked the classic ZFS performance indicators:</p>

<ul>
  <li>Pool health: <code class="language-plaintext highlighter-rouge">zpool status</code> - ONLINE, no errors</li>
  <li>ARC hit ratio: <code class="language-plaintext highlighter-rouge">sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses</code> - 98.8% (excellent!)</li>
  <li>Fragmentation: <code class="language-plaintext highlighter-rouge">zpool list</code> - 53% (high but not catastrophic)</li>
  <li>I/O latency: <code class="language-plaintext highlighter-rouge">zpool iostat -v 1 3</code> and <code class="language-plaintext highlighter-rouge">iostat -x 1 3</code> - 1ms read/write (actually pretty good)</li>
</ul>

<p>But the <code class="language-plaintext highlighter-rouge">sync</code> command was taking 70-160ms when it should be under 10ms for an SSD. We don’t need <code class="language-plaintext highlighter-rouge">sync</code> as the disk has disposable CI artefacts, so why not try:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zfs <span class="nb">set sync</span><span class="o">=</span>disabled obuilder
</code></pre></div></div>

<p>The sync times improved to 40-50ms, but the CI jobs were still crawling.</p>
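<p>For reference, a crude way to take such measurements (illustrative only, not the exact method used here) is to time a burst of syncs and divide:</p>

```shell
# Ten back-to-back syncs; divide the reported time by ten for a
# per-sync estimate. A healthy SSD should come in well under 10ms each.
time sh -c 'i=0; while [ $i -lt 10 ]; do sync; i=$((i+1)); done'
```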

<p>I applied some ZFS tuning to try to improve things:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Crank up those queue depths</span>
sysctl vfs.zfs.vdev.async_read_max_active<span class="o">=</span>32
sysctl vfs.zfs.vdev.async_write_max_active<span class="o">=</span>32
sysctl vfs.zfs.vdev.sync_read_max_active<span class="o">=</span>32
sysctl vfs.zfs.vdev.sync_write_max_active<span class="o">=</span>32

<span class="c"># Speed up transaction groups</span>
sysctl vfs.zfs.txg.timeout<span class="o">=</span>1
sysctl vfs.zfs.dirty_data_max<span class="o">=</span>8589934592

<span class="c"># Optimize for metadata</span>
zfs <span class="nb">set </span><span class="nv">atime</span><span class="o">=</span>off obuilder
zfs <span class="nb">set </span><span class="nv">primarycache</span><span class="o">=</span>metadata obuilder
sysctl vfs.zfs.arc.meta_balance<span class="o">=</span>1000
</code></pre></div></div>

<p>However, these changes were making no measurable difference to the actual performance.</p>

<p>For comparison, I ran one of the CI steps on an identical machine, which was running Ubuntu with BTRFS:-</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>opam <span class="nb">install </span>astring.0.8.5 base-bigarray.base base-domains.base base-effects.base base-nnp.base base-threads.base base-unix.base base64.3.5.1 bechamel.0.5.0 camlp-streams.5.0.1 cmdliner.1.3.0 cppo.1.8.0 csexp.1.5.2 dune.3.20.0 either.1.0.0 fmt.0.11.0 gg.1.0.0 jsonm.1.0.2 logs.0.9.0 mdx.2.5.0 ocaml.5.3.0 ocaml-base-compiler.5.3.0 ocaml-compiler.5.3.0 ocaml-config.3 ocaml-options-vanilla.1 ocaml-version.4.0.1 ocamlbuild.0.16.1 ocamlfind.1.9.8 optint.0.3.0 ounit2.2.2.7 re.1.13.2 repr.0.7.0 result.1.5 seq.base stdlib-shims.0.3.0 topkg.1.1.0 uutf.1.0.4 vg.0.9.5
</code></pre></div></div>

<p>This took &lt; 3 minutes, but the worker logs showed the same step took 35 minutes. What could cause such a massive difference on identical hardware?</p>

<p>On macOS, I’ve previously seen problems when the number of mounted filesystems got to around 1000: <code class="language-plaintext highlighter-rouge">mount</code> would take minutes to complete. I wondered: how many file systems are mounted?</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># mount | grep obuilder | wc -l</span>
    33787
</code></pre></div></div>

<p>Now, that’s quite a few file systems.  Historically, our FreeBSD workers had tiny SSDs, circa 128GB, but with the move to a new server with a 1.7TB SSD disk and using the same 25% prune threshold, the number of mounted file systems has become quite large.</p>

<p>I gradually increased the prune threshold and waited for <a href="https://github.com/ocurrent/ocluster">ocurrent/ocluster</a> to prune jobs. With the threshold at 90% the number of file systems was down to ~5,000, and performance was restored.</p>

<p>It’s not really a bug; it’s just an unexpected side effect of having a large number of mounted file systems. On macOS, the resolution was to unmount all the file systems at the end of each job, but that’s easy when the concurrency is limited to one and more tricky when the concurrency is 20 jobs.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="obuilder" /><category term="tunbury.org" /><summary type="html"><![CDATA[The FreeBSD workers have been getting slower (https://github.com/ocurrent/opam-repo-ci/issues/449): jobs that should take a few minutes are now timing out after 60 minutes. My first instinct was that ZFS was acting strangely.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Label Maker in js_of_ocaml using Claude</title><link href="https://www.tunbury.org/2025/08/22/label-maker/" rel="alternate" type="text/html" title="Label Maker in js_of_ocaml using Claude" /><published>2025-08-22T00:00:00+00:00</published><updated>2025-08-22T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/22/label-maker</id><content type="html" xml:base="https://www.tunbury.org/2025/08/22/label-maker/"><![CDATA[<p>I’ve taken a few days off, and while I’ve been travelling, I’ve been working on a personal project with Claude. I’ve used Claude Code for the first time, which is a much more powerful experience than using <a href="https://claude.ai">claude.ai</a> as Claude can apply changes to the code and use your build tools directly to quickly iterate on a problem. In another first, I used <code class="language-plaintext highlighter-rouge">js_of_ocaml</code>, which has been awesome.</p>

<p>The project isn’t anything special; it’s a website that creates sheets of Avery labels. It is needed for a niche educational environment where the only devices available are iPads, which are administratively locked down, so no custom applications or fonts can be loaded. You enter what you want on the label, and it initiates the download of the resulting PDF.</p>

<p>The original <a href="https://label.tunbury.org">implementation</a>, written in OCaml (of course), uses a <a href="https://ocaml.org/p/cohttp/latest">cohttp</a> web server, which generates a <a href="https://en.wikipedia.org/wiki/ReStructuredText">reStructuredText</a> file which is processed via <a href="https://rst2pdf.org">rst2pdf</a> with custom page templates for the different label layouts. The disadvantage of this approach is that it requires a server to host it. I have wrapped the application into a Docker container, so it isn’t intrusive, but it would be easier if it could be hosted as a static file on GitHub Pages.</p>

<p>On OCaml.org, I found <a href="https://ocaml.org/p/camlpdf/latest">camlpdf</a>, <a href="https://ocaml.org/p/otfm/latest">otfm</a> and <a href="https://ocaml.org/p/vg/latest">vg</a>, which when combined with <code class="language-plaintext highlighter-rouge">js_of_ocaml</code>, should give me a complete tool in the browser. The virtual file system embeds the TTF font into the JavaScript code!</p>

<p>I set Claude to work, which didn’t take long, but the custom font embedding proved problematic. I gave Claude an example PDF from the original implementation, and after some debugging, we had a working project.</p>

<p>Let’s look at the code! I should add that the labels can optionally have a box drawn on them, which the student uses to provide feedback on how they got on with the objective. Claude produced three functions for rendering text: one for a single line, one for multiline text with a checkbox, and one for multiline text without a checkbox. I pointed out that these three functions were similar and could be combined. Claude agreed and created a merged function with the original three functions calling the new merged function. It took another prompt to update the calling locations to call the new merged function rather than having the stub functions.</p>

<p>While Claude had generated code that compiles in a functional language, the code tends to look imperative; for example, there were several instances like this:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">t</span> <span class="o">=</span> <span class="n">ref</span> <span class="mi">0</span> <span class="k">in</span>
<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="nn">List</span><span class="p">.</span><span class="n">iter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="n">t</span> <span class="o">:=</span> <span class="o">!</span><span class="n">t</span> <span class="o">+</span> <span class="n">v</span><span class="p">)</span> <span class="p">[</span><span class="mi">1</span><span class="p">;</span> <span class="mi">2</span><span class="p">;</span> <span class="mi">3</span><span class="p">]</span> <span class="k">in</span>
<span class="n">t</span>
</code></pre></div></div>

<p>Where we would expect to see a <code class="language-plaintext highlighter-rouge">List.fold_left</code>! Claude can easily fix these when you point them out.</p>
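<p>The equivalent using <code class="language-plaintext highlighter-rouge">List.fold_left</code>:</p>

```ocaml
(* Same computation as the ref-based loop above, without mutation *)
let t = List.fold_left (+) 0 [1; 2; 3]   (* t = 6 *)
```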

<p>As I mentioned earlier, Claude Code can build your project and respond to <code class="language-plaintext highlighter-rouge">dune build</code> errors for you; however, some fixes suppress the warning rather than actually fixing the root cause. A classic example of this is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% dune build
File "bin/main.ml", line 4, characters 4-5:
4 | let x = List.length lst
        ^
Error (warning 32 [unused-value-declaration]): unused value x.
</code></pre></div></div>

<p>The proposed fix is to discard the value of <code class="language-plaintext highlighter-rouge">x</code>, thus <code class="language-plaintext highlighter-rouge">let _x = List.length lst</code> rather than realising that the entire line is unnecessary as <code class="language-plaintext highlighter-rouge">List.length</code> has no side effects.</p>

<p>I’d been using Chrome 139 for development, but thought I’d try the native Safari on my Monterey-based Mac Pro, which has Safari 17.6. This gave me the following error on the JavaScript console.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Error] TypeError: undefined is not 
  an object (evaluating 'k.UNSIGNED_MAX.udivmod')
          db (label_maker.bc.js:1758)
          (anonymous function) (label_maker.bc.js:1930)
          Global Code (label_maker.bc.js:2727:180993)
</code></pre></div></div>

<p>I found that since <code class="language-plaintext highlighter-rouge">js_of_ocaml</code> 6.0.1 the minimum browser version is Safari 18.2, so I switched to <code class="language-plaintext highlighter-rouge">js_of_ocaml</code> 5.9.1 and that worked fine.</p>

<p>The resulting project can be found at <a href="https://github.com/mtelvers/label-maker-js">mtelvers/label-maker-js</a> and published at <a href="https://mtelvers.github.io/label-maker-js/">mtelvers.github.io/label-maker-js</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="js_of_ocaml,ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[I’ve taken a few days off, and while I’ve been travelling, I’ve been working on a personal project with Claude. I’ve used Claude Code for the first time, which is a much more powerful experience than using claude.ai as Claude can apply changes to the code and use your build tools directly to quickly iterate on a problem. In another first, I used js_of_ocaml, which has been awesome.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">BuildKit Bake-off</title><link href="https://www.tunbury.org/2025/08/18/buildkit-bake/" rel="alternate" type="text/html" title="BuildKit Bake-off" /><published>2025-08-18T00:00:00+00:00</published><updated>2025-08-18T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/18/buildkit-bake</id><content type="html" xml:base="https://www.tunbury.org/2025/08/18/buildkit-bake/"><![CDATA[<p>I previously <a href="https://www.tunbury.org/2025/07/22/package-tool/">wrote</a> about <a href="https://github.com/mtelvers/package-tool">mtelvers/package-tool</a>, which would generate Dockerfiles for each package in opam.</p>

<p>The tool also created a single 10MB Dockerfile containing all ~4000 package builds. Each build looked like this:</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">debian:12</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">builder_package_name</span>
<span class="k">RUN </span>apt update <span class="o">&amp;&amp;</span> apt upgrade <span class="nt">-y</span>
<span class="c"># ... setup opam</span>
<span class="k">RUN </span>opam <span class="nb">install </span>dependency1.version <span class="o">&gt;&gt;</span> build.log 2&gt;&amp;1 <span class="o">||</span> <span class="nb">echo</span> <span class="s1">'FAILED'</span> <span class="o">&gt;&gt;</span> build.log
<span class="k">RUN </span>opam <span class="nb">install </span>dependency2.version <span class="o">&gt;&gt;</span> build.log 2&gt;&amp;1 <span class="o">||</span> <span class="nb">echo</span> <span class="s1">'FAILED'</span> <span class="o">&gt;&gt;</span> build.log
<span class="k">RUN </span>opam <span class="nb">install </span>package.version <span class="o">&gt;&gt;</span> build.log 2&gt;&amp;1 <span class="o">||</span> <span class="nb">echo</span> <span class="s1">'FAILED'</span> <span class="o">&gt;&gt;</span> build.log
</code></pre></div></div>

<p>Followed by a final aggregation step:</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">debian:12</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">results</span>
<span class="k">COPY</span><span class="s"> --from=builder_package_1 ["/home/opam/build.log", "/results/package1"]</span>
<span class="k">COPY</span><span class="s"> --from=builder_package_2 ["/home/opam/build.log", "/results/package2"]</span>
<span class="c"># ... ~4000 times</span>
</code></pre></div></div>

<p>This is a spectacular failure. Docker’s RPC layer cannot handle the 10MB Dockerfile, throwing <code class="language-plaintext highlighter-rouge">COMPRESSION_ERROR</code> messages.</p>

<p>I attempted to bypass Docker’s RPC limitations and go straight to BuildKit.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>buildctl build <span class="se">\</span>
  <span class="nt">--frontend</span> dockerfile.v0 <span class="se">\</span>
  <span class="nt">--local</span> <span class="nv">context</span><span class="o">=</span><span class="nb">.</span> <span class="se">\</span>
  <span class="nt">--local</span> <span class="nv">dockerfile</span><span class="o">=</span><span class="nb">.</span> <span class="se">\</span>
  <span class="nt">--output</span> <span class="nb">type</span><span class="o">=</span>image,name<span class="o">=</span>myimage:latest
</code></pre></div></div>

<p>The result was the same: compression errors. BuildKit’s RPC layer cannot handle the massive Dockerfile either.</p>

<p>Surely there is an elegant solution to build this with Docker? I generated a <code class="language-plaintext highlighter-rouge">docker-bake.hcl</code> file defining all the targets:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">group</span> <span class="s2">"all-packages"</span> <span class="p">{</span>
  <span class="nx">targets</span> <span class="p">=</span> <span class="p">[</span>
    <span class="s2">"pkg-0install-2-18"</span><span class="p">,</span>
    <span class="s2">"pkg-abella-2-0-8"</span><span class="p">,</span>
    <span class="c1">// ... ~4000 packages</span>
  <span class="p">]</span>
<span class="p">}</span>
</code></pre></div></div>

<p>BuildKit starts fine, but collapses in a few seconds with errors like <code class="language-plaintext highlighter-rouge">rpc error: code = NotFound desc = no such job</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>docker buildx bake results
 <span class="o">=&gt;</span> <span class="o">[</span>internal] load <span class="nb">local </span>bake definitions
 <span class="o">=&gt;</span> <span class="o">=&gt;</span> reading docker-bake.hcl 698.97kB / 698.97kB
 <span class="o">=&gt;</span> <span class="o">[</span>pkg-random-package internal] load build definition from random-package.dockerfile
 <span class="o">=&gt;</span> <span class="o">=&gt;</span> transferring dockerfile: 4.74kB
...
ERROR: target pkg-random-package: failed to receive status: rpc error: code <span class="o">=</span> NotFound desc <span class="o">=</span> no such job dwu7wqewt4vppoe4lhe3xx44f
</code></pre></div></div>

<p>Maybe BuildKit just needed some restraint? I tried various approaches:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">GOMAXPROCS</span><span class="o">=</span>100
<span class="nb">export </span><span class="nv">BUILDKIT_STEP_LOG_MAX_SIZE</span><span class="o">=</span>50000000
docker buildx bake results
</code></pre></div></div>

<p>I even created a custom BuildKit configuration, tried different drivers, and limited concurrent operations. However, it was still failing.</p>

<p>Building, at first one, then two, and then three packages at once worked well:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker buildx bake pkg-0install-2-18 pkg-abella-2-0-8 pkg-absolute-0-3
<span class="c"># [+] Building 17.7s (100/100) FINISHED</span>
</code></pre></div></div>

<p>This led me to add the <code class="language-plaintext highlighter-rouge">--batch-size</code> parameter to create batches of packages rather than listing them on the command line. By trial and error, I found that 100 is about the upper bound.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package-tool <span class="nt">--opam-repository</span> ~/opam-repository <span class="nt">--dockerfile</span> <span class="nt">--batch-size</span> 100
<span class="k">for </span>a <span class="k">in</span> <span class="o">{</span>0..33<span class="o">}</span> <span class="p">;</span> <span class="k">do </span><span class="nb">sudo </span>docker buildx bake batch<span class="nv">$a</span> <span class="p">;</span> <span class="k">done</span>
</code></pre></div></div>

<p>I have now hit the next limitation: there is a maximum number of layers.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ERROR: target pkg-async_rpc_websocket-v0-17-0: failed to solve: failed to prepare ofhokk68c4o0esql38hz1yrzb as n4ytj8qd0izkhvs0srfj9vyi3: max depth exceeded
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="docker,buildkit,opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[I previously wrote about a mtelvers/package-tool which would generate Dockerfiles for each package in opam.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/docker-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/docker-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Lastlog in newer Ubuntu releases</title><link href="https://www.tunbury.org/2025/08/12/ubuntu-lastlog/" rel="alternate" type="text/html" title="Lastlog in newer Ubuntu releases" /><published>2025-08-12T00:00:00+00:00</published><updated>2025-08-12T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/12/ubuntu-lastlog</id><content type="html" xml:base="https://www.tunbury.org/2025/08/12/ubuntu-lastlog/"><![CDATA[<p>With the release of Ubuntu 24.10 and subsequently Ubuntu 25.04, the <code class="language-plaintext highlighter-rouge">lastlog</code> command has been removed.</p>

<p>Running <code class="language-plaintext highlighter-rouge">lastlog</code> results in a straight <code class="language-plaintext highlighter-rouge">command not found</code> error from the shell. Checking on an older system with <code class="language-plaintext highlighter-rouge">dpkg -S</code>, <code class="language-plaintext highlighter-rouge">/usr/bin/last</code> and <code class="language-plaintext highlighter-rouge">/usr/bin/lastlog</code> come from the <code class="language-plaintext highlighter-rouge">util-linux</code> and <code class="language-plaintext highlighter-rouge">login</code> packages respectively.</p>

<p>We can view the change log with <code class="language-plaintext highlighter-rouge">apt-get changelog login</code> or <code class="language-plaintext highlighter-rouge">apt-get changelog util-linux</code>, which shows a deliberate move away from these commands.</p>

<p>See also <a href="https://git.launchpad.net/ubuntu/+source/util-linux/commit/?id=e8866bb93ef4cdfa36a8ec94fc43fb66d33a67e4">https://git.launchpad.net/ubuntu/+source/util-linux/commit/?id=e8866bb93ef4cdfa36a8ec94fc43fb66d33a67e4</a></p>

<p>The suggestion is to install <code class="language-plaintext highlighter-rouge">wtmpdb</code>, which restores <code class="language-plaintext highlighter-rouge">last</code>. It’s a shame as it was helpful that <code class="language-plaintext highlighter-rouge">lastlog</code> was always available so you could see if a machine had been used recently without needing to install <code class="language-plaintext highlighter-rouge">wtmpdb</code>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ubuntu" /><category term="tunbury.org" /><summary type="html"><![CDATA[With the release of Ubuntu 24.10 and subsequently Ubuntu 25.04, the lastlog command has been removed.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ubuntu.png" /><media:content medium="image" url="https://www.tunbury.org/images/ubuntu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Zulip Terminal in Docker</title><link href="https://www.tunbury.org/2025/08/12/zulip-docker/" rel="alternate" type="text/html" title="Zulip Terminal in Docker" /><published>2025-08-12T00:00:00+00:00</published><updated>2025-08-12T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/12/zulip-docker</id><content type="html" xml:base="https://www.tunbury.org/2025/08/12/zulip-docker/"><![CDATA[<p>Anil spotted that there is a Zulip client available to run in a terminal window <a href="https://github.com/zulip/zulip-terminal">zulip/zulip-terminal</a>.</p>

<p>I dived into the instructions and built the <code class="language-plaintext highlighter-rouge">Dockerfile</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone <span class="nt">--depth</span><span class="o">=</span>1 git@github.com:zulip/zulip-terminal.git
<span class="nb">cd </span>zulip-terminal/docker
docker build <span class="nt">-t</span> zulip-terminal:latest <span class="nt">-f</span> Dockerfile.alpine <span class="nb">.</span>
</code></pre></div></div>

<p>However, I ran into a permission problem when running the container:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">mkdir</span> ~/.zulip
<span class="nv">$ </span>docker run <span class="nt">-it</span> <span class="nt">-v</span> ~/.zulip:/.zulip zulip-terminal:latest
zuliprc file was not found at /.zulip/zuliprc
Please enter your credentials to login into your Zulip organization.

NOTE: The Zulip URL is where you would go <span class="k">in </span>a web browser to log <span class="k">in </span>to Zulip.
It often looks like one of the following:
   your-org.zulipchat.com <span class="o">(</span>Zulip cloud<span class="o">)</span>
   zulip.your-org.com <span class="o">(</span>self-hosted servers<span class="o">)</span>
   chat.zulip.org <span class="o">(</span>the Zulip community server<span class="o">)</span>
Zulip URL: <span class="k">****</span>.zulipchat.com
Email: <span class="k">****</span>    
Password: 
PermissionError: zuliprc could not be created at /.zulip/zuliprc
</code></pre></div></div>

<p>I set the permissions with <code class="language-plaintext highlighter-rouge">chmod 777 ~/.zulip</code> and was up and running. <code class="language-plaintext highlighter-rouge">ls -n ~/.zulip</code> showed that the uid and gid were 100:101.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-------   1 100   101         95 Aug 11 12:09 zuliprc
</code></pre></div></div>

<p>Looking at the <code class="language-plaintext highlighter-rouge">Dockerfile</code>, it has <code class="language-plaintext highlighter-rouge">RUN useradd --user-group --create-home zulip</code>, which gets the next available uid/gid. I am 1000:1000 on my local machine. I’ve made a slight change to the <code class="language-plaintext highlighter-rouge">Dockerfile</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git diff
diff --git i/docker/Dockerfile.buster w/docker/Dockerfile.buster
index f7a9dc2..315c010 100644
--- i/docker/Dockerfile.buster
+++ w/docker/Dockerfile.buster
@@ -1,6 +1,8 @@
 FROM python:3.7-buster AS builder
 
-RUN useradd --user-group --create-home zulip
+RUN if getent passwd 1000; then userdel -r $(id -nu 1000); fi
+RUN if getent group 1000; then groupdel $(getent group 1000 | cut -d: -f1); fi
+RUN useradd --uid 1000 --user-group --create-home zulip
 USER zulip
 WORKDIR /home/zulip
 
@@ -19,7 +21,9 @@ RUN set -ex; python3 -m venv zt_venv \
 
 FROM python:3.7-slim-buster
 
-RUN useradd --user-group --create-home zulip
+RUN if getent passwd 1000; then userdel -r $(id -nu 1000); fi
+RUN if getent group 1000; then groupdel $(getent group 1000 | cut -d: -f1); fi
+RUN useradd --uid 1000 --user-group --create-home zulip
 COPY --from=builder --chown=zulip:zulip /home/zulip /home/zulip
 USER zulip
 WORKDIR /home/zulip
</code></pre></div></div>

<p>Now it doesn’t give me a permission error, and I own the file!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build -t zulip-terminal:latest -f Dockerfile.buster .
sudo rm -r ~/.zulip/
mkdir ~/.zulip
docker run -it -v ~/.zulip:/.zulip zulip-terminal:latest
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="docker,zulip" /><category term="tunbury.org" /><summary type="html"><![CDATA[Anil spotted that there is a Zulip client available to run in a terminal window zulip/zulip-terminal.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/zulip-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/zulip-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Debian 13 Trixie</title><link href="https://www.tunbury.org/2025/08/11/debian-trixie/" rel="alternate" type="text/html" title="Debian 13 Trixie" /><published>2025-08-11T00:00:00+00:00</published><updated>2025-08-11T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/11/debian-trixie</id><content type="html" xml:base="https://www.tunbury.org/2025/08/11/debian-trixie/"><![CDATA[<p>Antonin noticed that Debian 13 <em>trixie</em> has been released. The <a href="https://www.debian.org/News/2025/20250809">release notes</a> mention that i386 is no longer supported as a regular architecture. However, very excitingly, RISCV 64 is now supported.</p>

<blockquote>
  <p>This release for the first time officially supports the riscv64 architecture, allowing users to run Debian on 64-bit RISC-V hardware and benefit from all Debian 13 features.</p>
</blockquote>

<blockquote>
  <p>i386 is no longer supported as a regular architecture: there is no official kernel and no Debian installer for i386 systems. The i386 architecture is now only intended to be used on a 64-bit (amd64) CPU. Users running i386 systems should not upgrade to trixie. Instead, Debian recommends either reinstalling them as amd64, where possible, or retiring the hardware.</p>
</blockquote>

<p>The wording of the release notes made me wonder: since we only need a Docker image, would there still be one?</p>

<p><code class="language-plaintext highlighter-rouge">docker manifest inspect debian:trixie</code> showed there was a layer available:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w">      </span><span class="p">{</span><span class="w">
         </span><span class="nl">"mediaType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/vnd.oci.image.manifest.v1+json"</span><span class="p">,</span><span class="w">
         </span><span class="nl">"size"</span><span class="p">:</span><span class="w"> </span><span class="mi">1017</span><span class="p">,</span><span class="w">
         </span><span class="nl">"digest"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sha256:b67fca6916104c1b11c5d1b47a62af92268318051971094acc9c5366c8eac7ad"</span><span class="p">,</span><span class="w">
         </span><span class="nl">"platform"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nl">"architecture"</span><span class="p">:</span><span class="w"> </span><span class="s2">"386"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"os"</span><span class="p">:</span><span class="w"> </span><span class="s2">"linux"</span><span class="w">
         </span><span class="p">}</span><span class="w">
      </span><span class="p">}</span><span class="err">,</span><span class="w">
</span></code></pre></div></div>

<p>Then I noticed this weird behaviour:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/386 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
i386
<span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/amd64 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
i386
</code></pre></div></div>

<p>That’s odd. Let’s start again.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>docker system prune <span class="nt">-af</span>
<span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/amd64 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
amd64
<span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/386 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
i386
<span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/amd64 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
i386
</code></pre></div></div>

<p>It seems that after you have run the 386 variant, it gets stuck:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>docker system prune <span class="nt">-af</span>
<span class="nv">$ </span>docker pull <span class="nt">--platform</span> linux/amd64 debian:trixie
<span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/amd64 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
amd64
<span class="nv">$ </span>docker pull <span class="nt">--platform</span> linux/386 debian:trixie
<span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/386 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
i386
<span class="nv">$ </span>docker pull <span class="nt">--platform</span> linux/amd64 debian:trixie
<span class="nv">$ </span>docker run <span class="nt">--platform</span> linux/amd64 <span class="nt">--rm</span> <span class="nt">-it</span> debian:trixie dpkg <span class="nt">--print-architecture</span>
amd64
</code></pre></div></div>

<p>Adding the <code class="language-plaintext highlighter-rouge">docker pull</code> step seems to resolve this, even though it doesn’t actually pull anything.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="debian,trixie" /><category term="tunbury.org" /><summary type="html"><![CDATA[Antonin noticed that Debian 13 trixie has been released. The release notes mention that i386 is no longer supported as a regular architecture. However, very excitingly, RISCV 64 is now supported.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/debian-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/debian-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">SSL Password Authentication</title><link href="https://www.tunbury.org/2025/08/08/ssl-password/" rel="alternate" type="text/html" title="SSL Password Authentication" /><published>2025-08-08T00:00:00+00:00</published><updated>2025-08-08T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/08/ssl-password</id><content type="html" xml:base="https://www.tunbury.org/2025/08/08/ssl-password/"><![CDATA[<p>Would you like the maintainer’s version of the file or the local one? It’s one of those questions during <code class="language-plaintext highlighter-rouge">apt upgrade</code> that you answer on autopilot. Normally, it’s  <em>local</em> every time. Sometimes, the changes look mundane, and you take the <em>maintainer’s</em>. I did that today on <code class="language-plaintext highlighter-rouge">/etc/ssh/sshd_config</code>, but it made me pause and check whether password authentication had been inadvertently turned back on.</p>

<p>I could check the defaults for <code class="language-plaintext highlighter-rouge">sshd</code> and look at the values set in <code class="language-plaintext highlighter-rouge">/etc/ssh/sshd_config</code> and any files in <code class="language-plaintext highlighter-rouge">/etc/ssh/ssh_config.d</code>, but it would surely be easier to try to log in remotely using a password by turning off public key authentication.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~<span class="nv">$ </span>ssh <span class="nt">-o</span> <span class="nv">PreferredAuthentications</span><span class="o">=</span>password <span class="nt">-o</span> <span class="nv">PubkeyAuthentication</span><span class="o">=</span>no username@hostname
username@hostname: Permission denied <span class="o">(</span>publickey<span class="o">)</span><span class="nb">.</span>
</code></pre></div></div>

<p>That machine looks secure. What about other machines? I have an Ansible inventory <em>hosts</em> file, and an extensive history in my <code class="language-plaintext highlighter-rouge">~/.ssh/known_hosts</code>. I need an automated tool to check everything! <a href="https://github.com/mtelvers/ssh-security-checker">mtelvers/ssh-security-checker</a> is that tool!</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>dune <span class="nb">exec</span> <span class="nt">--</span> ssh-security-checker ./hosts
Testing SSH password authentication security <span class="k">for </span>9 hosts...

Testing host1... ❌ NETWORK UNREACHABLE
Testing host2... ✅ SECURE <span class="o">(</span>password auth disabled<span class="o">)</span>
Testing host3... 🔑 HOST KEY CHANGED <span class="o">(</span>security warning!<span class="o">)</span>
Testing host4... ❌ NETWORK UNREACHABLE
Testing host5... ✅ SECURE <span class="o">(</span>password auth disabled<span class="o">)</span>
Testing host6... ✅ SECURE <span class="o">(</span>password auth disabled<span class="o">)</span>
Testing host7... ✅ SECURE <span class="o">(</span>password auth disabled<span class="o">)</span>
Testing host8... ⚠️  WARNING: PASSWORD AUTH ENABLED!
Testing host9... ✅ SECURE <span class="o">(</span>password auth disabled<span class="o">)</span>
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml,ssh" /><category term="tunbury.org" /><summary type="html"><![CDATA[Would you like the maintainer’s version of the file or the local one? It’s one of those questions during apt upgrade that you answer on autopilot. Normally, it’s local every time. Sometimes, the changes look mundane, and you take the maintainer’s. I did that today on /etc/ssh/sshd_config, but it made me pause and check whether password authentication had been inadvertently turned back on.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Further investigations with Slurm</title><link href="https://www.tunbury.org/2025/08/06/slurm-limits/" rel="alternate" type="text/html" title="Further investigations with Slurm" /><published>2025-08-06T00:00:00+00:00</published><updated>2025-08-06T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/06/slurm-limits</id><content type="html" xml:base="https://www.tunbury.org/2025/08/06/slurm-limits/"><![CDATA[<p>Slurm uses cgroups to constrain jobs with the specified parameters and an accounting database to track job statistics.</p>

<p>After the initial <a href="https://www.tunbury.org/2025/04/14/slurm-workload-manager/">configuration</a> and ensuring everything is at the same <a href="https://www.tunbury.org/2025/07/29/slurm-versions/">version</a>, what we really need is some shared storage between the head node and the cluster machine(s). I’m going to quickly share <code class="language-plaintext highlighter-rouge">/home</code> over NFS.</p>

<p>Install an NFS server on the head node with <code class="language-plaintext highlighter-rouge">apt install nfs-kernel-server</code> and set up <code class="language-plaintext highlighter-rouge">/etc/exports</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/home    foo(rw,sync,no_subtree_check,no_root_squash)
</code></pre></div></div>

<p>On the cluster worker, install the NFS client, <code class="language-plaintext highlighter-rouge">apt install nfs-common</code> and mount the home directory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount -t nfs head:/home/mte24 /home/mte24
</code></pre></div></div>
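
<p>To make the mount persist across reboots, the equivalent <code class="language-plaintext highlighter-rouge">/etc/fstab</code> entry on the worker would look something like this (a sketch; the <code class="language-plaintext highlighter-rouge">_netdev</code> option and the paths are assumptions to adapt to your setup):</p>

```shell
# /etc/fstab on the cluster worker (illustrative)
head:/home/mte24  /home/mte24  nfs  defaults,_netdev  0  0
```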

<p>I have deleted my user account on the cluster worker and set my UID/GID on the head node to values that do not conflict with any of those on the worker.</p>

<p>With the directory shared, and signed in to the head node as my user, I can run <code class="language-plaintext highlighter-rouge">sbatch ./myscript</code>.</p>

<p>To configure Slurm to use cgroups, create <code class="language-plaintext highlighter-rouge">/etc/slurm/cgroup.conf</code> containing the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
</code></pre></div></div>

<p>Set these values in <code class="language-plaintext highlighter-rouge">/etc/slurm/slurm.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
JobAcctGatherType=jobacct_gather/cgroup
DefMemPerNode=16384
</code></pre></div></div>

<p>For accounting, we need to install a database and another Slurm daemon.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>mariadb-server
</code></pre></div></div>

<p>And install <code class="language-plaintext highlighter-rouge">slurmdbd</code> with:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dpkg <span class="nt">-i</span> slurm-smd-slurmdbd_25.05.1-1_amd64.deb
</code></pre></div></div>

<p>Set up a database in MariaDB:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mysql</span> <span class="o">-</span><span class="n">e</span> <span class="nv">"CREATE DATABASE slurm_acct_db; CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'password'; GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost';"</span>
</code></pre></div></div>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/slurm/slurmdbd.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DbdHost=localhost
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=password
StorageLoc=slurm_acct_db
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd/slurmdbd.pid
</code></pre></div></div>

<p>Secure the file as the password is in plain text:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chown </span>slurm:slurm /etc/slurm/slurmdbd.conf
<span class="nb">chmod </span>600 /etc/slurm/slurmdbd.conf
</code></pre></div></div>

<p>Then add these lines to <code class="language-plaintext highlighter-rouge">slurm.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AccountingStorageType=accounting_storage/slurmdbd
AccountingStoragePort=6819
AccountingStorageEnforce=limits,qos,safe
</code></pre></div></div>

<p>Finally, we need to configure a cluster with a name that matches the name in <code class="language-plaintext highlighter-rouge">slurm.conf</code>. An account is a logical grouping, such as a department name. It is not a user account. Actual user accounts are associated with a cluster and an account. Therefore, a minimum configuration might be:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sacctmgr add cluster cluster
sacctmgr add account <span class="nv">name</span><span class="o">=</span>eeg <span class="nv">Organization</span><span class="o">=</span>EEG
sacctmgr <span class="nt">-i</span> create user <span class="nv">name</span><span class="o">=</span>mte24 <span class="nv">cluster</span><span class="o">=</span>cluster <span class="nv">account</span><span class="o">=</span>eeg
</code></pre></div></div>

<p>To test this out, create <code class="language-plaintext highlighter-rouge">script1</code> as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# Test script
date
echo "I am now running on compute node:"
hostname
sleep 120
date
echo "Done..."
exit 0 
</code></pre></div></div>

<p>Then submit the job with a timeout of 30 seconds.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~<span class="nv">$ </span>sbatch <span class="nt">-t</span> 00:00:30 script1
Submitted batch job 10
</code></pre></div></div>

<p>The job output is in <code class="language-plaintext highlighter-rouge">slurm-10.out</code>, and we can see the completion state with <code class="language-plaintext highlighter-rouge">sacct</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~<span class="nv">$ </span>sacct <span class="nt">-j</span> 10
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
<span class="nt">------------</span> <span class="nt">----------</span> <span class="nt">----------</span> <span class="nt">----------</span> <span class="nt">----------</span> <span class="nt">----------</span> <span class="nt">--------</span> 
10              script1        eeg        eeg          2    TIMEOUT      0:0 
10.batch          batch                   eeg          2  COMPLETED      0:0 
</code></pre></div></div>

<p>Running a job with a specific memory and cpu limitation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sbatch --mem=32768 --cpus-per-task=64 script1
</code></pre></div></div>
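
<p>The same limits can also be recorded in the job script itself with <code class="language-plaintext highlighter-rouge">#SBATCH</code> directives, so the resource requests travel with the job rather than the command line (a sketch; the values mirror the examples above and are illustrative):</p>

```shell
#!/bin/bash
#SBATCH --job-name=test          # name shown by squeue/sacct
#SBATCH --time=00:00:30          # wall-clock limit
#SBATCH --mem=32768              # memory in MB
#SBATCH --cpus-per-task=64       # CPU cores for the task
# The directives are comments to bash, so the script also runs standalone
hostname
```

<p>With the directives in place, a bare <code class="language-plaintext highlighter-rouge">sbatch</code> invocation needs no extra flags.</p>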

<p>To cancel a job, use <code class="language-plaintext highlighter-rouge">scancel</code>.</p>

<p>Slurm queues up jobs when the required resources can’t be satisfied. What is less clear is what stops users from requesting excessive RAM and CPU for every job.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Slurm" /><category term="tunbury.org" /><summary type="html"><![CDATA[Slurm uses cgroups to constrain jobs with the specified parameters and an accounting database to track job statistics.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/slurm.png" /><media:content medium="image" url="https://www.tunbury.org/images/slurm.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Shuffling Lists</title><link href="https://www.tunbury.org/2025/08/04/list-shuffle/" rel="alternate" type="text/html" title="Shuffling Lists" /><published>2025-08-04T00:00:00+00:00</published><updated>2025-08-04T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/04/list-shuffle</id><content type="html" xml:base="https://www.tunbury.org/2025/08/04/list-shuffle/"><![CDATA[<p>Shuffling a list into a random order is usually handled by the <a href="https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle">Fisher-Yates Shuffle</a>.</p>

<p>It could be efficiently written in OCaml using arrays:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">Random</span><span class="p">.</span><span class="n">self_init</span> <span class="bp">()</span><span class="p">;</span>

<span class="k">let</span> <span class="n">fisher_yates_shuffle</span> <span class="n">arr</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">n</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">length</span> <span class="n">arr</span> <span class="k">in</span>
  <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="n">n</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">downto</span> <span class="mi">1</span> <span class="k">do</span>
    <span class="k">let</span> <span class="n">j</span> <span class="o">=</span> <span class="nn">Random</span><span class="p">.</span><span class="n">int</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="k">in</span>
    <span class="k">let</span> <span class="n">temp</span> <span class="o">=</span> <span class="n">arr</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">in</span>
    <span class="n">arr</span><span class="o">.</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="n">arr</span><span class="o">.</span><span class="p">(</span><span class="n">j</span><span class="p">);</span>
    <span class="n">arr</span><span class="o">.</span><span class="p">(</span><span class="n">j</span><span class="p">)</span> <span class="o">&lt;-</span> <span class="n">temp</span>
  <span class="k">done</span>
</code></pre></div></div>

<p>However, I had a one-off requirement to randomise a list, and the following approach felt more <em>functional</em>:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">Random</span><span class="p">.</span><span class="n">self_init</span> <span class="bp">()</span><span class="p">;</span>

<span class="k">let</span> <span class="n">shuffle</span> <span class="n">lst</span> <span class="o">=</span>
  <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="p">(</span><span class="nn">Random</span><span class="p">.</span><span class="n">bits</span> <span class="bp">()</span><span class="o">,</span> <span class="n">x</span><span class="p">))</span> <span class="n">lst</span> <span class="o">|&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">sort</span> <span class="n">compare</span> <span class="o">|&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="n">snd</span>
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Shuffling a list into a random order is usually handled by the Fisher-Yates Shuffle.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OCaml Program Specification for Claude</title><link href="https://www.tunbury.org/2025/08/01/program-specification/" rel="alternate" type="text/html" title="OCaml Program Specification for Claude" /><published>2025-08-01T00:00:00+00:00</published><updated>2025-08-01T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/08/01/program-specification</id><content type="html" xml:base="https://www.tunbury.org/2025/08/01/program-specification/"><![CDATA[<p>I have a dataset that I would like to visualise using a static website hosted on GitHub Pages. The application that generates the dataset is still under development, which results in frequently changing data formats. Therefore, rather than writing a static website generator and needing to revise it continually, could I write a specification and have Claude create a new one each time there was a change?</p>

<p>Potentially, I could do this cumulatively by giving Claude the original specification and code and then the new specification, but my chosen approach is to see if Claude can create the application in one pass from the specification. I’ve also chosen to do this using Claude Sonnet’s web interface; obviously, the code I will request will be in OCaml.</p>

<p>I wrote a detailed 500-word specification that included the file formats involved, example directory tree layouts, and what I thought was a clear definition of the output file structure.</p>

<p>The resulting code wasn’t what I wanted: Claude had inlined huge swathes of HTML and was using <code class="language-plaintext highlighter-rouge">Printf.sprintf</code> extensively. Each file included the stylesheet as a <code class="language-plaintext highlighter-rouge">&lt;style&gt;...&lt;/style&gt;</code>. However, the biggest problem was that Claude had chosen to write the JSON parser from scratch, and this code had numerous issues and wouldn’t even build. I directed Claude to use <code class="language-plaintext highlighter-rouge">yojson</code> rather than handcraft a parser.</p>

<p>I intended but did not state in my specification that I wanted the code to generate HTML using <code class="language-plaintext highlighter-rouge">tyxml</code>. I updated my specification, requesting that the code be written using <code class="language-plaintext highlighter-rouge">tyxml</code>, <code class="language-plaintext highlighter-rouge">yojson</code>, and <code class="language-plaintext highlighter-rouge">timedesc</code> to handle the ISO date format. I also thought of some additional functionality around extracting data from a Git repo.</p>

<p>Round 2 - Possibly a step backwards as Claude struggled to find the appropriate functions in the <code class="language-plaintext highlighter-rouge">timedesc</code> library to parse and sort dates. There were also some issues extracting data using <code class="language-plaintext highlighter-rouge">git</code>. I have to take responsibility here as I gave the example command as <code class="language-plaintext highlighter-rouge">git show --date=iso-strict ce03608b4ba656c052ef5e868cf34b9e86d02aac -C /path/to/repo</code>, but <code class="language-plaintext highlighter-rouge">git</code> requires the <code class="language-plaintext highlighter-rouge">-C /path/to/repo</code> to precede the <code class="language-plaintext highlighter-rouge">show</code> command. However, the fact that my example had overwritten Claude’s <em>knowledge</em> was potentially interesting. Could I use this to seed facts I knew Claude would need?</p>

<p>Claude still wasn’t creating a separate <code class="language-plaintext highlighter-rouge">stylesheet.css</code>.</p>

<p>Round 3 - This time, I gave examples of how to use the <code class="language-plaintext highlighter-rouge">timedesc</code> library, i.e.</p>

<blockquote>
  <p>To use the <code class="language-plaintext highlighter-rouge">timedesc</code> library, we can call <code class="language-plaintext highlighter-rouge">Timedesc.of_iso8601</code> to convert the Git ISO strict output to a Timedesc object and then compare it with <code class="language-plaintext highlighter-rouge">compare (Timedesc.to_timestamp_float_s b.date) (Timedesc.to_timestamp_float_s a.date)</code>.</p>
</blockquote>
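<p>A sketch of how those two calls fit together; the <code class="language-plaintext highlighter-rouge">commit</code> record and the function names here are illustrative, not from the actual specification:</p>

```ocaml
(* Illustrative sketch using the two timedesc calls quoted above. *)
type commit = { hash : string; date : Timedesc.t }

(* Timedesc.of_iso8601 returns a result; fail loudly on bad input. *)
let parse_date s =
  match Timedesc.of_iso8601 s with
  | Ok d -> d
  | Error e -> failwith e

(* Newest first, comparing float timestamps as in the quote above. *)
let sort_newest_first commits =
  List.sort
    (fun a b ->
      compare
        (Timedesc.to_timestamp_float_s b.date)
        (Timedesc.to_timestamp_float_s a.date))
    commits
```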

<p>Also, in addition to stating that all the styles should be shared in a common <code class="language-plaintext highlighter-rouge">stylesheet.css</code>, I gave a file tree of the expected output, including the <code class="language-plaintext highlighter-rouge">stylesheet.css</code>.</p>

<p>Claude now correctly used the <code class="language-plaintext highlighter-rouge">timedesc</code> library and tried to write a stylesheet. However, Claude had hallucinated a <code class="language-plaintext highlighter-rouge">css</code> and <code class="language-plaintext highlighter-rouge">css_rule</code> function in <code class="language-plaintext highlighter-rouge">tyxml</code> to do this, where none exists. Furthermore, adding the link to the stylesheet was causing problems as <code class="language-plaintext highlighter-rouge">link</code> had multiple definitions in scope and needed to be explicitly referenced as <code class="language-plaintext highlighter-rouge">Tyxml.Html.link</code>. Claude’s style was to open everything at the beginning of the file:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">open</span> <span class="nn">Yojson</span><span class="p">.</span><span class="nc">Safe</span>
<span class="k">open</span> <span class="nn">Yojson</span><span class="p">.</span><span class="nn">Safe</span><span class="p">.</span><span class="nc">Util</span>
<span class="k">open</span> <span class="nn">Tyxml</span><span class="p">.</span><span class="nc">Html</span>
<span class="k">open</span> <span class="nc">Printf</span> 
<span class="k">open</span> <span class="nc">Unix</span> 
</code></pre></div></div>

<p>The compiler picked <code class="language-plaintext highlighter-rouge">Unix.link</code> rather than <code class="language-plaintext highlighter-rouge">Tyxml.Html.link</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>File "ci_generator.ml", line 347, characters 18-33:
347 |         link ~rel:[ `Stylesheet ] ~href:"/stylesheet.css" ();
                        ^^^^^^^^^^^^^^^
Error: The function applied to this argument has type
         ?follow:bool -&gt; string -&gt; unit
This argument cannot be applied with label ~rel
</code></pre></div></div>

<blockquote>
  <p>Stylistically, please can we only <code class="language-plaintext highlighter-rouge">open</code> things in functions where they are used: <code class="language-plaintext highlighter-rouge">let foo () = let open Tyxml.Html in ...</code>. This will avoid global opens at the top of the file and avoid any confusion where libraries have functions with the same name, e.g., <code class="language-plaintext highlighter-rouge">Unix.link</code> and <code class="language-plaintext highlighter-rouge">TyXml.Html.link</code>.</p>
</blockquote>
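<p>In practice, the requested style looks like this (a sketch; the function name is illustrative, and the call shape is taken from the compiler error above):</p>

```ocaml
(* With a local open, [link] unambiguously refers to [Tyxml.Html.link],
   leaving [Unix.link] out of scope. *)
let stylesheet_link () =
  let open Tyxml.Html in
  link ~rel:[ `Stylesheet ] ~href:"/stylesheet.css" ()
```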

<p>Furthermore, I had two JSON files in my input, each with the field <code class="language-plaintext highlighter-rouge">name</code>. Claude converted these into OCaml types; however, when referencing these later as function parameters, the compiler frequently picks the wrong one. This can be <em>fixed</em> by adding a specific type to the function parameter <code class="language-plaintext highlighter-rouge">let f (t:foo) = ...</code>. I’ve cheated here and renamed the field in one of the JSON files.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">foo</span> <span class="o">=</span> <span class="p">{</span>
  <span class="n">name</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
  <span class="n">x</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">type</span> <span class="n">bar</span> <span class="o">=</span> <span class="p">{</span>
  <span class="n">name</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
  <span class="n">y</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
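<p>For completeness, the annotation <em>fix</em> mentioned above would look like this (the function name is hypothetical):</p>

```ocaml
(* Unannotated, [t.name] would resolve against [bar], the most recently
   defined type with a [name] field; the annotation selects [foo]. *)
let print_foo (t : foo) = print_endline t.name
```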

<p>Claude chose to extract the data from the Git repo using <code class="language-plaintext highlighter-rouge">git show --pretty=format:'%H|%ai|%s'</code>, which ignores the <code class="language-plaintext highlighter-rouge">--date=iso-strict</code> directive. The correct format placeholder is <code class="language-plaintext highlighter-rouge">%aI</code>. I updated my guidance on the use of <code class="language-plaintext highlighter-rouge">git show</code>.</p>
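<p>The corrected invocation can be sketched as follows. The throwaway repository is only there so the commands run anywhere; against a real repository, only the final line is needed:</p>

```shell
# Build a throwaway repo so the command is runnable as-is.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "initial commit"
# %aI is the strict ISO 8601 author date; note -C precedes show.
git -C "$repo" show -s --pretty=format:'%H|%aI|%s'
```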

<p>My specification now comes in just under 1000 words. From that single specification document, Claude produces a valid OCaml program on the first try, which builds the static site as per my design. <code class="language-plaintext highlighter-rouge">wc -l</code> shows me there are 662 lines of code.</p>

<p>It’s amusing to run it more than once to see the variations in styling!</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[I have a dataset that I would like to visualise using a static website hosted on GitHub Pages. The application that generates the dataset is still under development, which results in frequently changing data formats. Therefore, rather than writing a static website generator and needing to revise it continually, could I write a specification and have Claude create a new one each time there was a change?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Moving to opam 2.4</title><link href="https://www.tunbury.org/2025/07/30/opam-24/" rel="alternate" type="text/html" title="Moving to opam 2.4" /><published>2025-07-30T00:00:00+00:00</published><updated>2025-07-30T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/30/opam-24</id><content type="html" xml:base="https://www.tunbury.org/2025/07/30/opam-24/"><![CDATA[<p><a href="https://opam.ocaml.org/blog/opam-2-4-0/">opam 2.4.0</a> was released on 18th July followed by <a href="https://opam.ocaml.org/blog/opam-2-4-1/">opam 2.4.1</a> a few days later. This update needs to be propagated through the CI infrastructure.  The first step is to update the base images for each OS.</p>

<h1 id="linux">Linux</h1>

<h3 id="ocurrentdocker-base-images"><a href="https://github.com/ocurrent/docker-base-images">ocurrent/docker-base-images</a></h3>

<p>The Linux base images are created using the <a href="https://images.ci.ocaml.org">Docker base image builder</a>, which uses <a href="https://github.com/ocurrent/ocaml-dockerfile">ocurrent/ocaml-dockerfile</a> to know which versions of opam are available. Kate submitted <a href="https://github.com/ocurrent/ocaml-dockerfile/pull/235">PR#235</a> with the necessary changes to <a href="https://github.com/ocurrent/ocaml-dockerfile">ocurrent/ocaml-dockerfile</a>. This was released as v8.2.9 under <a href="https://github.com/ocaml/opam-repository/pull/28251">PR#28251</a>.</p>

<p>With v8.2.9 released, <a href="https://github.com/ocurrent/docker-base-images/pull/327">PR#327</a> can be opened to update the pipeline to build images which include opam 2.4. Rebuilding the base images takes a good deal of time, particularly as it’s marked as a low-priority task on the cluster.</p>

<h1 id="macos">macOS</h1>

<h3 id="ocurrentmacos-infra"><a href="https://github.com/ocurrent/macos-infra">ocurrent/macos-infra</a></h3>

<p>Including opam 2.4 on macOS required <a href="https://github.com/ocurrent/macos-infra/pull/56">PR#56</a>, which adds <code class="language-plaintext highlighter-rouge">2.4.1</code> to the list of opam packages to download. There are Ansible playbooks that build the macOS base images and recursively remove the old images and their (ZFS) clones. They take about half an hour per machine. I run the Intel and Apple Silicon updates in parallel, but process each pool one at a time.</p>

<p>The Ansible command is:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ansible-playbook update-ocluster.yml
</code></pre></div></div>

<h1 id="freebsd-rosemarycaelumcidev">FreeBSD (rosemary.caelum.ci.dev)</h1>

<h3 id="ocurrentfreebsd-infra"><a href="https://github.com/ocurrent/freebsd-infra">ocurrent/freebsd-infra</a></h3>

<p>The FreeBSD update parallels the macOS update, requiring that <code class="language-plaintext highlighter-rouge">2.4.1</code> be added to the loop of available versions. <a href="https://github.com/ocurrent/freebsd-infra/pull/15">PR#15</a>.</p>

<p>The Ansible playbook for updating the machine is named <code class="language-plaintext highlighter-rouge">update.yml</code>. However, we have been suffering from some reliability issues with the FreeBSD worker, see <a href="https://github.com/ocurrent/opam-repo-ci/issues/449">issue#449</a>, so I took the opportunity to rebuild the worker from scratch.</p>

<p>The OS reinstallation is documented in this <a href="https://www.tunbury.org/2025/05/06/freebsd-uefi/">post</a>, and it’s definitely worth reading the <a href="https://github.com/ocurrent/freebsd-infra/blob/master/README.md">README.md</a> in the repo for the post-installation steps.</p>

<h1 id="windows-thymecaelumcidev">Windows (thyme.caelum.ci.dev)</h1>

<h3 id="ocurrentobuilder"><a href="https://github.com/ocurrent/obuilder">ocurrent/obuilder</a></h3>

<p>The Windows base images are built using a <code class="language-plaintext highlighter-rouge">Makefile</code> which runs unattended builds of Windows using QEMU virtual machines. The Makefile required the changes in <a href="https://github.com/ocurrent/obuilder/pull/198">PR#198</a> to include opam 2.4. The command is <code class="language-plaintext highlighter-rouge">make windows</code>.</p>

<p>Once the new images have been built, stop the ocluster worker and move the new base images into place. The next step is to remove <code class="language-plaintext highlighter-rouge">results/*</code>, as these layers will link to the old base images, and to remove <code class="language-plaintext highlighter-rouge">state/*</code> so obuilder will create a new empty database on startup. Avoid removing <code class="language-plaintext highlighter-rouge">cache/*</code>, as this is the download cache for opam objects.</p>

<p>The unattended installation can be monitored via VNC by connecting to localhost:5900.</p>

<h1 id="openbsd-oreganocaelumcidev">OpenBSD (oregano.caelum.ci.dev)</h1>

<h3 id="ocurrentobuilder-1"><a href="https://github.com/ocurrent/obuilder">ocurrent/obuilder</a></h3>

<p>The OpenBSD base images are built using the same <code class="language-plaintext highlighter-rouge">Makefile</code> used for Windows. There is a separate commit in <a href="https://github.com/ocurrent/obuilder/pull/198">PR#198</a> for the changes needed for OpenBSD, which include moving from OpenBSD 7.6 to 7.7. Run <code class="language-plaintext highlighter-rouge">make openbsd</code>.</p>

<p>Once the new images have been built, stop the ocluster worker and move the new base images into place. The next step is to remove <code class="language-plaintext highlighter-rouge">results/*</code>, as these layers will link to the old base images, and to remove <code class="language-plaintext highlighter-rouge">state/*</code> so obuilder will create a new empty database on startup. Avoid removing <code class="language-plaintext highlighter-rouge">cache/*</code>, as this is the download cache for opam objects.</p>

<p>As with Windows, the unattended installation can be monitored via VNC by connecting to localhost:5900.</p>

<h1 id="ocaml-ci">OCaml-CI</h1>

<p>OCaml-CI uses <a href="https://github.com/ocurrent/ocaml-dockerfile">ocurrent/ocaml-dockerfile</a> as a submodule, so the module needs to be updated to the released version. Edits are needed to <code class="language-plaintext highlighter-rouge">lib/opam_version.ml</code> to include <code class="language-plaintext highlighter-rouge">V2_4</code>; then the pipeline needs to be updated in <code class="language-plaintext highlighter-rouge">service/conf.ml</code> to use version 2.4 rather than 2.3 for all the different operating systems. Linux is rather more automated than the others.</p>
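<p>The shape of that edit is roughly as follows; this is an illustrative sketch, not the actual contents of <code class="language-plaintext highlighter-rouge">lib/opam_version.ml</code>:</p>

```ocaml
(* Hypothetical sketch: extend the supported-versions variant with V2_4
   and give it a string form for selecting base images. *)
type t = V2_0 | V2_1 | V2_2 | V2_3 | V2_4

let to_string = function
  | V2_0 -> "2.0"
  | V2_1 -> "2.1"
  | V2_2 -> "2.2"
  | V2_3 -> "2.3"
  | V2_4 -> "2.4"
```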

<p>Lastly, since we now have OpenBSD 7.7, I have also updated references to OpenBSD 7.6. <a href="https://github.com/ocurrent/ocaml-ci/pull/1020">PR#1020</a>.</p>

<h1 id="opam-repo-ci">opam-repo-ci</h1>

<p>opam-repo-ci tests using the latest <em>tagged</em> version of opam, which is called <code class="language-plaintext highlighter-rouge">opam-dev</code> within the base images. It also explicitly tests against the latest release in each of the 2.x series. With 2.4 being tagged, this will automatically become the used <em>dev</em> version once the base images are updated, but over time, 2.4 and the latest tagged version will diverge, so <a href="https://github.com/ocurrent/opam-repo-ci/pull/448">PR#448</a> is needed to ensure we continue to test with the released version of 2.4.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[opam 2.4.0 was released on 18th July followed by opam 2.4.1 a few days later. This update needs to be propagated through the CI infrastructure. The first step is to update the base images for each OS.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Slurm Versions</title><link href="https://www.tunbury.org/2025/07/29/slurm-versions/" rel="alternate" type="text/html" title="Slurm Versions" /><published>2025-07-29T00:00:00+00:00</published><updated>2025-07-29T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/29/slurm-versions</id><content type="html" xml:base="https://www.tunbury.org/2025/07/29/slurm-versions/"><![CDATA[<p>Slurm requires both the client and server to be on the same version.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[2025-07-29T15:41:34.492] error: slurm_unpack_received_msg: [[foo.cl.cam.ac.uk]:34214] Invalid Protocol Version 10752 from uid=0: No error
[2025-07-29T15:41:34.492] error: slurm_unpack_received_msg: [[foo.cl.cam.ac.uk]:34214] Incompatible versions of client and server code
[2025-07-29T15:41:34.502] error: slurm_receive_msg [128.232.93.254:34214]: Incompatible versions of client and server code
</code></pre></div></div>

<p>Noble (24.04) has Slurm 23.11.4-1.2ubuntu5, whereas Plucky (25.04) has 24.11.3-2.</p>

<p>The latest version is 25.05.1. <a href="https://www.schedmd.com/download-slurm">https://www.schedmd.com/download-slurm</a>.</p>

<p>The recommended approach is to build the Debian <code class="language-plaintext highlighter-rouge">.deb</code> packages from source. First, install basic Debian package build requirements:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>build-essential fakeroot devscripts equivs
</code></pre></div></div>

<p>Unpack the distributed tarball:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-L</span> https://download.schedmd.com/slurm/slurm-25.05.1.tar.bz2 | <span class="nb">tar</span> <span class="nt">-xajf</span> - <span class="o">&amp;&amp;</span> <span class="nb">cd </span>slurm-25.05.1
</code></pre></div></div>

<p>Install the Slurm package dependencies:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mk-build-deps <span class="nt">-i</span> debian/control
</code></pre></div></div>

<p>Build the Slurm packages:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>debuild <span class="nt">-b</span> <span class="nt">-uc</span> <span class="nt">-us</span>
</code></pre></div></div>

<blockquote>
  <p>Before installing, ensure any old installations have been removed with <code class="language-plaintext highlighter-rouge">apt remove slurm*</code> and <code class="language-plaintext highlighter-rouge">apt remove libslurm*</code>.</p>
</blockquote>

<h1 id="worker">Worker</h1>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dpkg <span class="nt">-i</span> slurm-smd-slurmd_25.05.1-1_amd64.deb slurm-smd-client_25.05.1-1_amd64.deb slurm-smd_25.05.1-1_amd64.deb
</code></pre></div></div>

<h1 id="head-controller">Head controller</h1>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dpkg <span class="nt">-i</span> slurm-smd-slurmctld_25.05.1-1_amd64.deb slurm-smd-client_25.05.1-1_amd64.deb slurm-smd_25.05.1-1_amd64.deb
</code></pre></div></div>

<p>With the same version of Slurm on both machines, the instructions from my earlier <a href="https://www.tunbury.org/2025/04/14/slurm-workload-manager/">post</a> are working again.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># sinfo</span>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
eeg<span class="k">*</span>         up   infinite      1   idle foo

<span class="c"># srun -N1 -l /bin/hostname</span>
0: foo.cl.cam.ac.uk
</code></pre></div></div>

<p>Slurm communicates directly over TCP connections using ports 6817/6818, so ensure that no firewalls are in the way!</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Slurm" /><category term="tunbury.org" /><summary type="html"><![CDATA[Slurm requires both the client and server to be on the same version.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/slurm.png" /><media:content medium="image" url="https://www.tunbury.org/images/slurm.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Pulley Diameter vs Rope Diameter</title><link href="https://www.tunbury.org/2025/07/27/pulley-diameter/" rel="alternate" type="text/html" title="Pulley Diameter vs Rope Diameter" /><published>2025-07-27T00:00:00+00:00</published><updated>2025-07-27T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/27/pulley-diameter</id><content type="html" xml:base="https://www.tunbury.org/2025/07/27/pulley-diameter/"><![CDATA[<p>About three months ago, I <a href="https://www.tunbury.org/2025/04/21/clock-winder-repair/">wrote</a> about replacing a snapped galvanised steel wire rope on a clock winder. Unfortunately, it has snapped again!</p>

<p>I am very surprised that the steel wire rope snaps, as the weight it lifts is, at most, 2 kg; however, it does run every hour. I posed the question to Claude:</p>

<blockquote>
  <p>Hi Claude, I have a mechanism which lifts a 2Kg weight using a multistranded galvanised steel wire rope with a diameter of 3mm over a 20mm diameter pulley.  It works fine, but the rope snaps after about 3 months of operating once per hour. Can you suggest where the issue might lie?</p>
</blockquote>

<p>“The most likely issue is your pulley diameter is too small relative to your wire rope diameter. With a 3mm rope over a 20mm pulley, you have a diameter ratio of only 6.7:1, which is well below recommended standards.</p>

<p>Key problems with small pulley diameters:</p>

<ul>
  <li>Excessive bending stress in the wire rope as it wraps around the pulley</li>
  <li>Rapid fatigue failure from repeated flexing (your ~2,160 cycles over 3 months)</li>
  <li>Individual wires break from the severe bending, leading to rope failure</li>
</ul>

<p>Industry recommendations:</p>

<ul>
  <li>Minimum D/d ratio should typically be 20:1 for standard applications</li>
  <li>For your 3mm rope, you’d ideally want at least a 60mm diameter pulley</li>
  <li>Higher ratios (30:1 or more) would be even better for longevity”</li>
</ul>
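<p>The figures are easy to check (a quick sketch, assuming 30-day months):</p>

```ocaml
(* Verify the quoted D/d ratio and cycle count. *)
let () =
  let pulley_d = 20.0 and rope_d = 3.0 in
  Printf.printf "D/d ratio: %.1f:1\n" (pulley_d /. rope_d);  (* 6.7:1 *)
  Printf.printf "cycles: %d\n" (3 * 30 * 24)                 (* 2160 *)
```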

<p>With this insight, I was easily able to verify that this is a good assessment.  For example, <a href="https://www.flints.co.uk/pdf/catalogue/pulleys.pdf">Flints</a> recommend, “In the case of wire ropes, the sheave diameter should ideally be 30 times the diameter of the wire.”</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="clock" /><summary type="html"><![CDATA[About three months ago, I wrote about replacing a snapped galvanised steel wire rope on a clock winder. Unfortunately, it has snapped again!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/aylesford-snap.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/aylesford-snap.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Octoprint</title><link href="https://www.tunbury.org/2025/07/26/octoprint/" rel="alternate" type="text/html" title="Octoprint" /><published>2025-07-26T00:00:00+00:00</published><updated>2025-07-26T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/26/octoprint</id><content type="html" xml:base="https://www.tunbury.org/2025/07/26/octoprint/"><![CDATA[<p>After my <a href="https://www.tunbury.org/2025/07/23/rochester/">3D print</a> last week, Michael asked whether I was using <a href="https://octoprint.org">OctoPrint</a>. I’ve been using <a href="https://www.pronterface.com">Pronterface</a> for some years, and I’ve never been terribly happy with it, but it does the job.</p>

<p>I had a <em>Pet Camera</em> pointed at the printer to see what was happening, <a href="https://syncthing.net">Syncthing</a> configured to share the GCode directory from my Mac to the Raspberry Pi, and the VNC Server to access the GUI. I decided that it was time to overhaul the setup with OctoPi!</p>

<p>OctoPi is available from the <a href="https://raspberrypi.org/software">Raspberry Pi Imager</a>, so updating my SD card was straightforward.  Step-by-step instructions are <a href="https://octoprint.org/download/">available</a>.</p>

<p>PrusaSlicer can be configured to communicate with OctoPi over IP. Therefore, once the model has been sliced, you can upload (and print) it directly from PrusaSlicer. This uses an API key for authentication. There is no longer a need for Syncthing.</p>

<p>Adding a USB web camera to the Pi lets you watch the printer remotely and record a time-lapse video.</p>

<p>Here’s my first attempt at a time-lapse print of a vase. There are some obvious issues with the camera position, and it got dark towards the end, which was a bit annoying.</p>

<iframe width="315" height="560" src="https://www.youtube.com/embed/DvMHkZs-KpI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="3d-printing" /><summary type="html"><![CDATA[After my 3D print last week, Michael asked whether I was using OctoPrint. I’ve been using Pronterface for some years, and I’ve never been terribly happy with it, but it does the job.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/octoprint.png" /><media:content medium="image" url="https://www.tunbury.org/images/octoprint.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Website Build Analysis with Claude</title><link href="https://www.tunbury.org/2025/07/25/build-analysis/" rel="alternate" type="text/html" title="Website Build Analysis with Claude" /><published>2025-07-25T00:00:00+00:00</published><updated>2025-07-25T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/25/build-analysis</id><content type="html" xml:base="https://www.tunbury.org/2025/07/25/build-analysis/"><![CDATA[<p>The Tarides website is built using Docker, and it would be interesting to run a quick analysis over the logs, given that we have over 300 days’ worth. This is one of those things where I’d usually turn to AWK and spend ages fiddling with the script.</p>

<p>However, this time I decided to ask Claude. The log files are organised by date, e.g. 2024-09-24/HHMMSS-docker-build-HASH.log, where each day directory may contain many logs, as there can be several builds in a day. The HHMMSS is the time the job was created, and HASH is the MD5 hash of the job. The log format is as below, with only the start and end shown.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2024-09-24 14:45.02: New job: docker build
...
2024-09-24 14:55.14: Job succeeded
</code></pre></div></div>
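<p>The duration of a single build can be recovered from those first and last timestamps. This is not the script Claude produced, just a minimal bash sketch (assumes GNU <code class="language-plaintext highlighter-rouge">date</code>):</p>

```shell
# Reconstruct a tiny log in the format shown above, then diff the
# first and last timestamps (the time uses HH:MM.SS, hence the tr).
log=$(mktemp)
printf '%s\n' '2024-09-24 14:45.02: New job: docker build' \
              '2024-09-24 14:55.14: Job succeeded' > "$log"
to_epoch() { ts=${1:0:19}; date -d "$(echo "$ts" | tr '.' ':')" +%s; }
echo $(( $(to_epoch "$(tail -n1 "$log")") - $(to_epoch "$(head -n1 "$log")") ))  # 612 seconds
```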

<p>I would like a graph over time showing the duration each build takes to see if there are any trends.</p>

<p>With a few iterations and only a few minutes of effort, Claude had a working script. Beyond my initial description, I added the complexity that I wanted to run it in a Docker container with a bind mount for my logs and to exclude failed jobs and jobs that completed very quickly (likely due to the Docker caching).</p>

<p>Claude’s code is in this <a href="https://gist.github.com/mtelvers/8383fb563e171778bfaf412f3119d50c">gist</a>.</p>

<p>Here’s the summary output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>==================================================
BUILD ANALYSIS SUMMARY (FILTERED DATA)
==================================================
Original builds found: 1676
Builds after filtering: 655
Filtered out: 1021 (60.9%)
Filter criteria: min_duration &gt;= 100s, exclude_failed = True

Duration Statistics (minutes):
  Mean: 10.16
  Median: 6.92
  Min: 5.53
  Max: 68.87
  Std Dev: 6.00

Date Range:
  First build: 2024-09-24 14:45:50
  Last build: 2025-07-25 09:29:10

Analysis period: 305 days
Average builds per day: 2.1

Top 5 longest builds:
  ✓ 2025-02-05 15:37 - 68.87m - 153726-docker-build-f9426a.log
  ✓ 2025-02-05 15:37 - 62.72m - 153724-docker-build-d227b6.log
  ✓ 2025-02-05 15:37 - 56.03m - 153723-docker-build-65de8e.log
  ✓ 2025-05-07 12:41 - 55.90m - 124115-docker-build-f4091b.log
  ✓ 2025-02-05 15:37 - 42.47m - 153722-docker-build-dafc1d.log

Top 5 shortest builds (above threshold):
  ✓ 2025-01-13 14:26 - 5.53m - 142624-docker-build-fec55f.log
  ✓ 2024-09-25 10:10 - 5.65m - 101005-docker-build-c78655.log
  ✓ 2024-09-26 10:01 - 5.77m - 100119-docker-build-efd190.log
  ✓ 2025-02-07 18:09 - 5.83m - 180951-docker-build-ab19e5.log
  ✓ 2024-09-30 14:03 - 5.85m - 140301-docker-build-4028bb.log
Filtered data exported to /data/output/build_analysis.csv
Raw data exported to /data/output/build_analysis_raw.csv
</code></pre></div></div>

<p>And the graphs:</p>

<p><img src="/images/build_times_timeline.png" alt="" /></p>

<p><img src="/images/daily_performance_trends.png" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="tarides" /><summary type="html"><![CDATA[The Tarides website is built using Docker, and it would be interesting to run a quick analysis over the logs, given that we have over 300 days’ worth. This is one of those things where I’d usually turn to AWK and spend ages fiddling with the script.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/docker_build_analysis.png" /><media:content medium="image" url="https://www.tunbury.org/images/docker_build_analysis.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Tarides Website</title><link href="https://www.tunbury.org/2025/07/24/tarides-website/" rel="alternate" type="text/html" title="Tarides Website" /><published>2025-07-24T00:00:00+00:00</published><updated>2025-07-24T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/24/tarides-website</id><content type="html" xml:base="https://www.tunbury.org/2025/07/24/tarides-website/"><![CDATA[<p>Bella was in touch as the tarides.com website is no longer building. The initial error is that <code class="language-plaintext highlighter-rouge">cmarkit</code> was missing, which I assumed was due to an outdated PR which needed to be rebased.</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#20 [build 13/15] RUN ./generate-images.sh</span>
<span class="c">#20 0.259 + dune exec -- src/gen/main.exe file.dune</span>
<span class="c">#20 2.399     Building ocaml-config.3</span>
<span class="c">#20 9.486 File "src/gen/dune", line 7, characters 2-9:</span>
<span class="c">#20 9.486 7 |   cmarkit</span>
<span class="c">#20 9.486       ^^^^^^^</span>
<span class="c">#20 9.486 Error: Library "cmarkit" not found.</span>
<span class="c">#20 9.486 -&gt; required by _build/default/src/gen/main.exe</span>
<span class="c">#20 10.92 + dune build @convert</span>
<span class="c">#20 18.23 Error: Alias "convert" specified on the command line is empty.</span>
<span class="c">#20 18.23 It is not defined in . or any of its descendants.</span>
<span class="c">#20 ERROR: process "/bin/sh -c ./generate-images.sh" did not complete successfully: exit code: 1</span>
</code></pre></div></div>

<p>The site recently moved to Dune Package Management, so this was my first opportunity to dig into how that works. Comparing the current build to the last successful build, I can see that <code class="language-plaintext highlighter-rouge">cmarkit</code> was installed previously but isn’t now.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#19 [build 12/15] RUN dune pkg lock &amp;&amp; dune build @pkg-install
#19 25.39 Solution for dune.lock:
...
#19 25.39 - cmarkit.dev
...
</code></pre></div></div>

<p>Easy fix: I added <code class="language-plaintext highlighter-rouge">cmarkit</code> to the <code class="language-plaintext highlighter-rouge">.opam</code> file. Oddly, it was already in the <code class="language-plaintext highlighter-rouge">.opam</code> file as a pinned dependency. However, the build now fails with a new message:</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#21 [build 13/15] RUN ./generate-images.sh</span>
<span class="c">#21 0.173 + dune exec -- src/gen/main.exe file.dune</span>
<span class="c">#21 2.582     Building ocaml-config.3</span>
<span class="c">#21 10.78 File "src/gen/grant.ml", line 15, characters 5-24:</span>
<span class="c">#21 10.78 15 |   |&gt; Hilite.Md.transform</span>
<span class="c">#21 10.78           ^^^^^^^^^^^^^^^^^^^</span>
<span class="c">#21 10.78 Error: Unbound module "Hilite.Md"</span>
<span class="c">#21 10.81 File "src/gen/blog.ml", line 142, characters 5-24:</span>
<span class="c">#21 10.81 142 |   |&gt; Hilite.Md.transform</span>
<span class="c">#21 10.81            ^^^^^^^^^^^^^^^^^^^</span>
<span class="c">#21 10.81 Error: Unbound module "Hilite.Md"</span>
<span class="c">#21 10.82 File "src/gen/page.ml", line 52, characters 5-24:</span>
<span class="c">#21 10.82 52 |   |&gt; Hilite.Md.transform</span>
<span class="c">#21 10.82           ^^^^^^^^^^^^^^^^^^^</span>
<span class="c">#21 10.82 Error: Unbound module "Hilite.Md"</span>
<span class="c">#21 10.94 + dune build @convert</span>
<span class="c">#21 19.46 Error: Alias "convert" specified on the command line is empty.</span>
<span class="c">#21 19.46 It is not defined in . or any of its descendants.</span>
<span class="c">#21 ERROR: process "/bin/sh -c ./generate-images.sh" did not complete successfully: exit code: 1</span>
</code></pre></div></div>

<p>Checking the <a href="https://opam.ocaml.org/packages/hilite/hilite.0.5.0/">hilite</a> package, I saw that there had been a new release last week. The change log lists:</p>

<ul>
  <li>Separate markdown package into an optional hilite.markdown package</li>
</ul>

<p>Ah, commit <a href="https://github.com/patricoferris/hilite/commit/529cb756b05dd15793c181304f438ba1aa48f12a">529cb75</a> removed the dependency on <code class="language-plaintext highlighter-rouge">cmarkit</code> by including the function <code class="language-plaintext highlighter-rouge">buffer_add_html_escaped_string</code> in the <code class="language-plaintext highlighter-rouge">hilite</code> source.</p>

<p>Pausing for a moment, if I constrain <code class="language-plaintext highlighter-rouge">hilite</code> to 0.4.0, does the site build? Yes. Ok, so that’s a valid solution. How hard would it be to switch to 0.5.0?</p>
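<p>For reference, that fallback can be expressed as a version bound in the <code class="language-plaintext highlighter-rouge">.opam</code> file; a rough sketch (the exact bounds here are illustrative):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>depends: [
  ...
  "hilite" {&gt;= "0.4.0" &amp; &lt; "0.5.0"}
  ...
]
</code></pre></div></div>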

<p>I hit a weird corner case where I was unable to link against <code class="language-plaintext highlighter-rouge">hilite.markdown</code>. After chatting with Patrick, I recreated my switch, and everything worked.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>File "x/dune", line 3, characters 20-35:
3 |  (libraries cmarkit hilite.markdown))
                        ^^^^^^^^^^^^^^^
Error: Library "hilite.markdown" not found.
-&gt; required by library "help" in _build/default/x
-&gt; required by _build/default/x/.help.objs/native/help__X.cmx
-&gt; required by _build/default/x/help.a
-&gt; required by alias x/all
-&gt; required by alias default
</code></pre></div></div>

<p>Talking with Jon later about a tangential issue of docs for optional submodules gave me a sudden insight into the corner I’d found myself in. The code base depends on <code class="language-plaintext highlighter-rouge">hilite</code>, so after running <code class="language-plaintext highlighter-rouge">opam update</code> (to ensure I would get version 0.5.0), I created a new switch <code class="language-plaintext highlighter-rouge">opam switch create . --deps-only</code>, and opam installed 0.5.0. When I ran <code class="language-plaintext highlighter-rouge">dune build</code>, it reported a missing dependency on <code class="language-plaintext highlighter-rouge">cmarkit</code>, so I dutifully added it as a dependency and ran <code class="language-plaintext highlighter-rouge">opam install cmarkit</code>. Do you see the problem? <code class="language-plaintext highlighter-rouge">hilite</code> only builds the markdown module when <code class="language-plaintext highlighter-rouge">cmarkit</code> is installed. If both packages are listed in the opam file when the switch is created, everything works as expected.</p>
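<p>In other words, the failure mode was roughly this sequence (a sketch of my session; <code class="language-plaintext highlighter-rouge">hilite</code> only enables its markdown library when <code class="language-plaintext highlighter-rouge">cmarkit</code> is visible at build time):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># fails: hilite is built before cmarkit exists in the switch
opam update
opam switch create . --deps-only    # installs hilite 0.5.0 without the markdown module
opam install cmarkit                # too late: hilite is not rebuilt

# works: both packages are in the .opam file when the switch is created
opam switch create . --deps-only    # hilite sees cmarkit and builds hilite.markdown
</code></pre></div></div>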

<p>The diff turned out to be pretty straightforward.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">let</span> <span class="n">html_of_md</span> <span class="o">~</span><span class="n">slug</span> <span class="n">body</span> <span class="o">=</span>
   <span class="nn">String</span><span class="p">.</span><span class="n">trim</span> <span class="n">body</span>
   <span class="o">|&gt;</span> <span class="nn">Cmarkit</span><span class="p">.</span><span class="nn">Doc</span><span class="p">.</span><span class="n">of_string</span> <span class="o">~</span><span class="n">strict</span><span class="o">:</span><span class="bp">false</span>
<span class="o">-</span>  <span class="o">|&gt;</span> <span class="nn">Hilite</span><span class="p">.</span><span class="nn">Md</span><span class="p">.</span><span class="n">transform</span>
<span class="o">+</span>  <span class="o">|&gt;</span> <span class="nn">Hilite_markdown</span><span class="p">.</span><span class="n">transform</span>
   <span class="o">|&gt;</span> <span class="nn">Cmarkit_html</span><span class="p">.</span><span class="n">of_doc</span> <span class="o">~</span><span class="n">safe</span><span class="o">:</span><span class="bp">false</span>
   <span class="o">|&gt;</span> <span class="nn">Soup</span><span class="p">.</span><span class="n">parse</span>
   <span class="o">|&gt;</span> <span class="n">rewrite_links</span> <span class="o">~</span><span class="n">slug</span>
</code></pre></div></div>

<p>Unfortunately, the build still does not complete successfully. When Dune Package Management builds <code class="language-plaintext highlighter-rouge">hilite</code>, it does not build the markdown module even though <code class="language-plaintext highlighter-rouge">cmarkit</code> is installed. I wish there were a <code class="language-plaintext highlighter-rouge">dune pkg install</code> command!</p>

<p>I tried to split the build by creating a .opam file which contained just <code class="language-plaintext highlighter-rouge">ocaml</code> and <code class="language-plaintext highlighter-rouge">cmarkit</code>, but this meant running <code class="language-plaintext highlighter-rouge">dune pkg lock</code> a second time, and that caused me to run straight into <a href="https://github.com/ocaml/dune/issues/11644">issue #11644</a>.</p>

<p>Perhaps I can patch <code class="language-plaintext highlighter-rouge">hilite</code> to make Dune Package Management deal with it as opam does? Jon commented earlier that <code class="language-plaintext highlighter-rouge">cmarkit</code> is listed as a <code class="language-plaintext highlighter-rouge">with-test</code> dependency. opam would use it if it were present, but perhaps Dune Package Management needs to be explicitly told that it can? I will add <code class="language-plaintext highlighter-rouge">cmarkit</code> as an optional dependency.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>depends: [
  "dune" {&gt;= "3.8"}
  "mdx" {&gt;= "2.4.1" &amp; with-test}
  "cmarkit" {&gt;= "0.3.0" &amp; with-test}
  "textmate-language" {&gt;= "0.3.3"}
  "odoc" {with-doc}
]
depopts: [
  "cmarkit" {&gt;= "0.3.0"}
]
</code></pre></div></div>

<p>With my <a href="https://github.com/mtelvers/hilite/tree/depopts">branch</a> of <code class="language-plaintext highlighter-rouge">hilite</code>, the website builds again with Dune Package Management.</p>

<p>I have created a <a href="https://github.com/patricoferris/hilite/pull/27">PR#27</a> to see if Patrick would be happy to update the package.</p>

<p>Feature request for Dune Package Management would be the equivalent of <code class="language-plaintext highlighter-rouge">opam option --global archive-mirrors="https://opam.ocaml.org/cache"</code> as a lengthy <code class="language-plaintext highlighter-rouge">dune pkg lock</code> may fail due to a single <code class="language-plaintext highlighter-rouge">curl</code> failure and need to be restarted from scratch.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="tarides" /><summary type="html"><![CDATA[Bella was in touch as the tarides.com website is no longer building. The initial error is that cmarkit was missing, which I assumed was due to an outdated PR which needed to be rebased.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/tarides.png" /><media:content medium="image" url="https://www.tunbury.org/images/tarides.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">3D Printing PointCloud Data</title><link href="https://www.tunbury.org/2025/07/23/rochester/" rel="alternate" type="text/html" title="3D Printing PointCloud Data" /><published>2025-07-23T00:00:00+00:00</published><updated>2025-07-23T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/23/rochester</id><content type="html" xml:base="https://www.tunbury.org/2025/07/23/rochester/"><![CDATA[<p>Seeing others in the EEG create 3D prints of various terrain, I’ve become a little jealous that they are having all the fun! It’s a simple, thousand-step process…</p>

<p>The Department of the Environment has a data <a href="https://environment.data.gov.uk/survey">portal</a> which allows anyone to download LIDAR Point Cloud data. Rochester has a river, a castle and a cathedral, which sounds perfect for my print. The latest data for Rochester was from 2023. The website allowed me to draw a polygon around the area of interest and download the relevant files.</p>

<p><img src="/images/defra-download.png" alt="" /></p>

<p>This gave me a ZIP file containing these LAZ files.</p>

<ul>
  <li>TQ7064_P_12689_20230520_20230520.laz</li>
  <li>TQ7066_P_12689_20230520_20230520.laz</li>
  <li>TQ7068_P_12689_20230520_20230520.laz</li>
  <li>TQ7264_P_12689_20230520_20230520.laz</li>
  <li>TQ7266_P_12689_20230520_20230520.laz</li>
  <li>TQ7268_P_12689_20230520_20230520.laz</li>
  <li>TQ7464_P_12689_20230520_20230520.laz</li>
  <li>TQ7466_P_12689_20230520_20230520.laz</li>
  <li>TQ7468_P_12688_20230520_20230520.laz</li>
  <li>TQ7468_P_12689_20230520_20230520.laz</li>
</ul>

<p>I decided to download <a href="https://www.danielgm.net/cc/">CloudCompare</a> based upon the short summary on the <a href="https://computing.ch.cam.ac.uk/software/cloudcompare-0">Department of Chemistry’s website</a>. It claims to be cross-platform, but the Windows installer seemed to be the path of least resistance. I opened the files one at a time until I found one I recognised.</p>

<p><img src="/images/cloudcompare-pointcloud.png" alt="" /></p>

<p>After importing the LAZ file, I used the Segment tool (scissors icon) to draw a polygon around the area I wanted to print. I thinned the number of points by going to Edit &gt; Subsample and accepting the defaults. To create a mesh, I used Edit &gt; Mesh &gt; Delaunay 2.5D and finally saved this surface with File &gt; Save as an STL mesh.</p>

<p>Opening the STL file in Blender shows a thing of beauty; however, it’s not without issues, as it has a vertical wall at some edges, and it’s just a surface with no depth.</p>

<p><img src="/images/blender-initial.png" alt="" /></p>

<p>The learning curve in Blender is very steep, and my skills have barely started the journey. Switch to Edit Mode, then select the entire object: Select &gt; All, then Select &gt; Select Loops &gt; Select Boundary Loop. Next, extrude the boundary loop down using Mesh &gt; Extrude &gt; Extrude Edges. I freeform extruded quite a bit; the exact amount doesn’t matter. Pressing Z during the extrusion constrains the extrusion to just the Z axis.</p>

<p><img src="/images/blender-extrude.png" alt="" /></p>

<p>The mesh now has depth but no bottom, and the depth is uneven. In Object mode, move the model so that a cut along the XY Plane will give the flat bottom we are looking for (see image above), then switch back to Edit mode. Select &gt; All, then Mesh &gt; Bisect. On the Bisect dialogue, enter the plane point as the vector (0, 0, 0) and the Plane normal vector (0, 0, 1). Importantly, tick Fill to close the bottom of the shape and Clear Inner to delete the bit we cut off.</p>

<p><img src="/images/blender-bisect.png" alt="" /></p>

<p>I repeated the above steps to crop the edges off the shape. Each time, I repositioned to use an axis as a bisection line and used an appropriate normal vector. This both straightens the shape and removes those final vertical edge artefacts.</p>

<p>The final step in Blender is to go File &gt; Export &gt; STL and save the model.</p>

<p>My slicer of choice is <a href="https://ultimaker.com/software/ultimaker-cura">UltiMaker Cura</a>, but the application crashed when attempting to slice the model. I installed <a href="https://www.prusa3d.com/page/prusaslicer_424/">PrusaSlicer</a>, which was able to slice the model, though it took a decent amount of time!</p>

<p><img src="/images/prusaslicer.png" alt="" /></p>

<p>Printing in progress…</p>

<p><img src="/images/rochester-printing.png" alt="" /></p>

<p>The final model.</p>

<p><img src="/images/rochester-printed.png" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="3dprinting" /><category term="tunbury.org" /><summary type="html"><![CDATA[Seeing others in the EEG create 3D prints of various terrain, I’ve become a little jealous that they are having all the fun! It’s a simple, thousand-step process…]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/rochester-3d.png" /><media:content medium="image" url="https://www.tunbury.org/images/rochester-3d.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Package Tool</title><link href="https://www.tunbury.org/2025/07/22/package-tool/" rel="alternate" type="text/html" title="Package Tool" /><published>2025-07-22T00:00:00+00:00</published><updated>2025-07-22T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/22/package-tool</id><content type="html" xml:base="https://www.tunbury.org/2025/07/22/package-tool/"><![CDATA[<p>Would you like to build every package in opam in a single Dockerfile using BuildKit?</p>

<p>In <a href="https://github.com/mtelvers/package-tool">mtelvers/package-tool</a>, I have combined various opam sorting and graphing functions into a CLI tool that will work on a checked-out <a href="https://github.com/ocaml/opam-repository">opam-repository</a>. Many of these flags can be combined.</p>

<h1 id="package-version">Package version</h1>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package-tool <span class="nt">--opam-repository</span> ~/opam-repository &lt;package&gt;
</code></pre></div></div>

<p>The package can be given as <code class="language-plaintext highlighter-rouge">0install.2.18</code> or <code class="language-plaintext highlighter-rouge">0install</code>. The former specifies a specific version while the latter processes the latest version. <code class="language-plaintext highlighter-rouge">--all-versions</code> can be specified to generate files for all package versions.</p>

<h1 id="dependencies">Dependencies</h1>

<p>Dump the dependencies for the latest version of 0install into a JSON file.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package-tool <span class="nt">--opam-repository</span> ~/opam-repository <span class="nt">--deps</span> 0install
</code></pre></div></div>

<p>Produces <code class="language-plaintext highlighter-rouge">0install.2.18-deps.json</code>:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"yojson.3.0.0"</span><span class="p">:[</span><span class="s2">"dune.3.19.1"</span><span class="p">],</span><span class="w">
</span><span class="nl">"xmlm.1.4.0"</span><span class="p">:[</span><span class="s2">"topkg.1.0.8"</span><span class="p">],</span><span class="w">
</span><span class="nl">"topkg.1.0.8"</span><span class="p">:[</span><span class="s2">"ocamlfind.1.9.8"</span><span class="p">,</span><span class="s2">"ocamlbuild.0.16.1"</span><span class="p">],</span><span class="w">
</span><span class="err">...</span><span class="w">
</span><span class="s2">"0install-solver.2.18"</span><span class="err">]</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<h1 id="installation-order">Installation order</h1>

<p>Create a list showing the installation order for the given package.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package-tool <span class="nt">--opam-repository</span> ~/opam-repository <span class="nt">--list</span> 0install
</code></pre></div></div>

<p>Produces <code class="language-plaintext highlighter-rouge">0install.2.18-list.json</code>:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="s2">"ocaml-compiler.5.3.0"</span><span class="p">,</span><span class="w">
</span><span class="s2">"ocaml-base-compiler.5.3.0"</span><span class="p">,</span><span class="w">
</span><span class="err">...</span><span class="w">
</span><span class="s2">"0install.2.18"</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>

<h1 id="solution-dag">Solution DAG</h1>

<p>Output the solution graph in Graphviz format, which can then be converted into a PDF with <code class="language-plaintext highlighter-rouge">dot</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package-tool <span class="nt">--opam-repository</span> ~/opam-repository <span class="nt">--dot</span> 0install
dot <span class="nt">-Tpdf</span> 0install.2.18.dot <span class="nt">-o</span> 0install.2.18.pdf
</code></pre></div></div>
<h1 id="ocaml-version">OCaml version</h1>

<p>By default, OCaml 5.3.0 is used, but this can be changed using the <code class="language-plaintext highlighter-rouge">--ocaml 4.14.2</code> parameter.</p>
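<p>Since the flags combine, a hypothetical invocation targeting an older compiler might look like this (the version shown is just an example):</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package-tool --opam-repository ~/opam-repository --ocaml 4.14.2 --dockerfile 0install
</code></pre></div></div>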

<h1 id="dockerfile">Dockerfile</h1>

<p>The <code class="language-plaintext highlighter-rouge">--dockerfile</code> argument creates a Dockerfile to test the installation.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package-tool <span class="nt">--opam-repository</span> ~/opam-repository <span class="nt">--dockerfile</span> <span class="nt">--all-versions</span> 0install
</code></pre></div></div>

<p>For example, the above command line outputs 5 Dockerfiles.</p>

<ul>
  <li>0install.2.15.1.dockerfile</li>
  <li>0install.2.15.2.dockerfile</li>
  <li>0install.2.16.dockerfile</li>
  <li>0install.2.17.dockerfile</li>
  <li>0install.2.18.dockerfile</li>
</ul>

<p>As an example, <code class="language-plaintext highlighter-rouge">0install.2.18.dockerfile</code>, contains:</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">debian:12</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">builder_0install_2_18</span>
<span class="k">RUN </span>apt update <span class="o">&amp;&amp;</span> apt upgrade <span class="nt">-y</span>
<span class="k">RUN </span>apt <span class="nb">install</span> <span class="nt">-y</span> build-essential git rsync unzip curl <span class="nb">sudo</span>
<span class="k">RUN if </span>getent passwd 1000<span class="p">;</span> <span class="k">then </span>userdel <span class="nt">-r</span> <span class="si">$(</span><span class="nb">id</span> <span class="nt">-nu</span> 1000<span class="si">)</span><span class="p">;</span> <span class="k">fi</span>
<span class="k">RUN </span>adduser <span class="nt">--uid</span> 1000 <span class="nt">--disabled-password</span> <span class="nt">--gecos</span> <span class="s1">''</span> opam
<span class="k">ADD</span><span class="s"> --chown=root:root --chmod=0755 [ "https://github.com/ocaml/opam/releases/download/2.3.0/opam-2.3.0-x86_64-linux", "/usr/local/bin/opam" ]</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'opam ALL=(ALL:ALL) NOPASSWD:ALL'</span> <span class="o">&gt;&gt;</span> /etc/sudoers.d/opam
<span class="k">RUN </span><span class="nb">chmod </span>440 /etc/sudoers.d/opam
<span class="k">USER</span><span class="s"> opam</span>
<span class="k">WORKDIR</span><span class="s"> /home/opam</span>
<span class="k">ENV</span><span class="s"> OPAMYES="1" OPAMCONFIRMLEVEL="unsafe-yes" OPAMERRLOGLEN="0" OPAMPRECISETRACKING="1"</span>
<span class="k">ADD</span><span class="s"> --chown=opam:opam --keep-git-dir=false [ ".", "/home/opam/opam-repository" ]</span>
<span class="k">RUN </span>opam init default <span class="nt">-k</span> <span class="nb">local</span> ~/opam-repository <span class="nt">--disable-sandboxing</span> <span class="nt">--bare</span>
<span class="k">RUN </span>opam switch create default <span class="nt">--empty</span>
<span class="k">RUN </span>opam <span class="nb">install </span>ocaml-compiler.5.3.0 <span class="o">&gt;&gt;</span> build.log 2&gt;&amp;1 <span class="o">||</span> <span class="nb">echo</span> <span class="s1">'FAILED'</span> <span class="o">&gt;&gt;</span> build.log
<span class="k">RUN </span>opam <span class="nb">install </span>ocaml-base-compiler.5.3.0 <span class="o">&gt;&gt;</span> build.log 2&gt;&amp;1 <span class="o">||</span> <span class="nb">echo</span> <span class="s1">'FAILED'</span> <span class="o">&gt;&gt;</span> build.log
...
<span class="k">RUN </span>opam <span class="nb">install </span>0install-solver.2.18 <span class="o">&gt;&gt;</span> build.log 2&gt;&amp;1 <span class="o">||</span> <span class="nb">echo</span> <span class="s1">'FAILED'</span> <span class="o">&gt;&gt;</span> build.log
<span class="k">RUN </span>opam <span class="nb">install </span>0install.2.18 <span class="o">&gt;&gt;</span> build.log 2&gt;&amp;1 <span class="o">||</span> <span class="nb">echo</span> <span class="s1">'FAILED'</span> <span class="o">&gt;&gt;</span> build.log
<span class="k">ENTRYPOINT</span><span class="s"> [ "opam", "exec", "--" ]</span>
<span class="k">CMD</span><span class="s"> bash</span>
</code></pre></div></div>

<p>This can be built using Docker in the normal way. Note that the build context is your checkout of <a href="https://github.com/ocaml/opam-repository">opam-repository</a>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-f</span> 0install.2.18.dockerfile ~/opam-repository
</code></pre></div></div>

<p>Additionally, it outputs <code class="language-plaintext highlighter-rouge">Dockerfile</code>, which contains the individual package builds as a multistage build and an aggregation stage as the final layer:</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">debian:12</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">results</span>
<span class="k">WORKDIR</span><span class="s"> /results</span>
<span class="k">RUN </span>apt update <span class="o">&amp;&amp;</span> apt upgrade <span class="nt">-y</span>
<span class="k">RUN </span>apt <span class="nb">install</span> <span class="nt">-y</span> less
<span class="k">COPY</span><span class="s"> --from=builder_0install_2_15_1 [ "/home/opam/build.log", "/results/0install.2.15.1" ]</span>
<span class="k">COPY</span><span class="s"> --from=builder_0install_2_15_2 [ "/home/opam/build.log", "/results/0install.2.15.2" ]</span>
<span class="k">COPY</span><span class="s"> --from=builder_0install_2_16 [ "/home/opam/build.log", "/results/0install.2.16" ]</span>
<span class="k">COPY</span><span class="s"> --from=builder_0install_2_17 [ "/home/opam/build.log", "/results/0install.2.17" ]</span>
<span class="k">COPY</span><span class="s"> --from=builder_0install_2_18 [ "/home/opam/build.log", "/results/0install.2.18" ]</span>
<span class="k">CMD</span><span class="s"> bash</span>
</code></pre></div></div>

<p>Build all the versions of 0install in parallel using BuildKit’s layer caching:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-f</span> Dockerfile <span class="nt">-t</span> opam-results ~/opam-repository
</code></pre></div></div>

<p>We can inspect the build logs in the Docker container:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>docker run <span class="nt">--rm</span> <span class="nt">-it</span> opam-results
root@b28da667e754:/results# <span class="nb">ls</span> <span class="nt">-l</span>
total 76
<span class="nt">-rw-r--r--</span> 1 1000 1000 12055 Jul 22 20:17 0install.2.15.1
<span class="nt">-rw-r--r--</span> 1 1000 1000 15987 Jul 22 20:19 0install.2.15.2
<span class="nt">-rw-r--r--</span> 1 1000 1000 15977 Jul 22 20:19 0install.2.16
<span class="nt">-rw-r--r--</span> 1 1000 1000 16376 Jul 22 20:19 0install.2.17
<span class="nt">-rw-r--r--</span> 1 1000 1000 15150 Jul 22 20:19 0install.2.18
</code></pre></div></div>

<p>Annoyingly, Docker doesn’t seem to be able to cope with all of opam at once. I get various RPC errors.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[+] Building 2.9s (4/4) FINISHED                                                                                                    docker:default
 =&gt; [internal] load build definition from Dockerfile
 =&gt; =&gt; transferring dockerfile: 10.79MB
 =&gt; resolve image config for docker-image://docker.io/docker/dockerfile:1
 =&gt; CACHED docker-image://docker.io/docker/dockerfile:1@sha256:9857836c9ee4268391bb5b09f9f157f3c91bb15821bb77969642813b0d00518d
 =&gt; [internal] load build definition from Dockerfile
ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: connection error: COMPRESSION_ERROR
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml,opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[Would you like to build every package in opam in a single Dockerfile using BuildKit?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Depth-first topological ordering</title><link href="https://www.tunbury.org/2025/07/21/depth-first-topological-ordering/" rel="alternate" type="text/html" title="Depth-first topological ordering" /><published>2025-07-21T00:00:00+00:00</published><updated>2025-07-21T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/21/depth-first-topological-ordering</id><content type="html" xml:base="https://www.tunbury.org/2025/07/21/depth-first-topological-ordering/"><![CDATA[<p>Over the last few months, I have written several posts on package installation graphs, specifically <a href="https://www.tunbury.org/2025/03/25/topological-sort/">Topological Sort of Packages</a>, <a href="https://www.tunbury.org/2025/03/31/opam-post-deps/">Installation order for opam packages</a> and <a href="https://www.tunbury.org/2025/06/23/transitive-reduction/">Transitive Reduction of Package Graph</a>. In this post, I’d like to cover an alternative ordering solution.</p>

<p>Consider the graph above, first presented in the <a href="https://www.tunbury.org/2025/03/25/topological-sort/">Topological Sort of Packages</a> post, which produces the installation order below.</p>

<ol>
  <li>base-threads.base</li>
  <li>base-unix.base</li>
  <li>ocaml-variants</li>
  <li>ocaml-config</li>
  <li>ocaml</li>
  <li>dune</li>
</ol>

<p>The code presented processes nodes when all their dependencies are satisfied (i.e., when their in-degree becomes 0). This typically means we process “leaf” nodes (nodes with no dependencies) first and then work our way up. However, it may make sense to process the leaf packages only when required rather than as soon as they can be processed. The easiest way to achieve this is to reverse the edges in the DAG, perform the topological sort, and then install the packages in reverse order.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">reverse_dag</span> <span class="p">(</span><span class="n">dag</span> <span class="o">:</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">t</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">t</span><span class="p">)</span> <span class="o">:</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">t</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">t</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">initial_reversed</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">fold</span> <span class="p">(</span><span class="k">fun</span> <span class="n">package</span> <span class="n">_</span> <span class="n">acc</span> <span class="o">-&gt;</span>
    <span class="nn">PackageMap</span><span class="p">.</span><span class="n">add</span> <span class="n">package</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">empty</span> <span class="n">acc</span>
  <span class="p">)</span> <span class="n">dag</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">empty</span> <span class="k">in</span>
  <span class="nn">PackageMap</span><span class="p">.</span><span class="n">fold</span> <span class="p">(</span><span class="k">fun</span> <span class="n">package</span> <span class="n">dependencies</span> <span class="n">reversed_dag</span> <span class="o">-&gt;</span>
    <span class="nn">PackageSet</span><span class="p">.</span><span class="n">fold</span> <span class="p">(</span><span class="k">fun</span> <span class="n">dependency</span> <span class="n">acc</span> <span class="o">-&gt;</span>
      <span class="k">let</span> <span class="n">current_dependents</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">find</span> <span class="n">dependency</span> <span class="n">acc</span> <span class="k">in</span>
      <span class="nn">PackageMap</span><span class="p">.</span><span class="n">add</span> <span class="n">dependency</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.</span><span class="n">add</span> <span class="n">package</span> <span class="n">current_dependents</span><span class="p">)</span> <span class="n">acc</span>
    <span class="p">)</span> <span class="n">dependencies</span> <span class="n">reversed_dag</span>
  <span class="p">)</span> <span class="n">dag</span> <span class="n">initial_reversed</span>
</code></pre></div></div>

<p>With such a function, we can write this:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">reverse_dag</span> <span class="n">dune</span> <span class="o">|&gt;</span> <span class="n">topological_sort</span> <span class="o">|&gt;</span> <span class="nn">List</span><span class="p">.</span><span class="n">rev</span>
</code></pre></div></div>

<ol>
  <li>ocaml-variants</li>
  <li>ocaml-config</li>
  <li>ocaml</li>
  <li>base-unix.base</li>
  <li>base-threads.base</li>
  <li>dune</li>
</ol>
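<p>For completeness, here is a self-contained sketch of the whole pipeline, assuming string package names and the dependency edges implied by the lists above, with a simplified <code class="language-plaintext highlighter-rouge">reverse_dag</code> and a Kahn-style <code class="language-plaintext highlighter-rouge">topological_sort</code>; it is illustrative only, and the ordering of ties may differ from the listing above:</p>

```ocaml
(* Sketch: string package names; edges in [dag] point from a package to the
   packages it depends on.  The edge set here is an assumption for the example. *)
module PackageMap = Map.Make (String)
module PackageSet = Set.Make (String)

let dag =
  PackageMap.empty
  |> PackageMap.add "dune"
       (PackageSet.of_list [ "ocaml"; "base-threads.base"; "base-unix.base" ])
  |> PackageMap.add "ocaml" (PackageSet.of_list [ "ocaml-config"; "ocaml-variants" ])
  |> PackageMap.add "ocaml-config" (PackageSet.singleton "ocaml-variants")
  |> PackageMap.add "ocaml-variants" PackageSet.empty
  |> PackageMap.add "base-threads.base" PackageSet.empty
  |> PackageMap.add "base-unix.base" PackageSet.empty

(* Flip every edge: each dependency now maps to the set of its dependents. *)
let reverse_dag dag =
  let init = PackageMap.map (fun _ -> PackageSet.empty) dag in
  PackageMap.fold
    (fun pkg deps acc ->
      PackageSet.fold
        (fun dep acc ->
          PackageMap.update dep
            (function
              | Some s -> Some (PackageSet.add pkg s)
              | None -> Some (PackageSet.singleton pkg))
            acc)
        deps acc)
    dag init

(* Kahn's algorithm: repeatedly emit every node whose dependency set is empty. *)
let topological_sort dag =
  let rec loop dag acc =
    if PackageMap.is_empty dag then List.rev acc
    else begin
      let ready, rest =
        PackageMap.partition (fun _ deps -> PackageSet.is_empty deps) dag
      in
      assert (not (PackageMap.is_empty ready)); (* a cycle would stall here *)
      let names = PackageMap.fold (fun k _ l -> k :: l) ready [] in
      let rest =
        PackageMap.map (fun deps -> PackageSet.diff deps (PackageSet.of_list names)) rest
      in
      loop rest (List.rev_append names acc)
    end
  in
  loop dag []

let () = reverse_dag dag |> topological_sort |> List.rev |> List.iter print_endline
```

<p>Running this prints the packages in an order where the base packages are deferred until needed and dune comes last.</p>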

<p>Now, we don’t install base-unix and base-threads until they are actually required for the installation of dune.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml,opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[Over the last few months, I have written several posts on package installation graphs, specifically Topological Sort of Packages, Installation order for opam packages and Transitive Reduction of Package Graph. In this post, I’d like to cover an alternative ordering solution.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/dune-graph.png" /><media:content medium="image" url="https://www.tunbury.org/images/dune-graph.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Docker Container for OxCaml</title><link href="https://www.tunbury.org/2025/07/18/docker-oxcaml/" rel="alternate" type="text/html" title="Docker Container for OxCaml" /><published>2025-07-18T18:00:00+00:00</published><updated>2025-07-18T18:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/18/docker-oxcaml</id><content type="html" xml:base="https://www.tunbury.org/2025/07/18/docker-oxcaml/"><![CDATA[<p>Jon asked me to make a Docker image that contains <a href="https://oxcaml.org">OxCaml</a> ready to run without the need to build it from scratch.</p>

<p>I have written a simple OCurrent pipeline to periodically poll <a href="https://github.com/oxcaml/opam-repository">oxcaml/opam-repository</a>. If the SHA has changed, it builds a Docker image and pushes it to ocurrent/opam-staging:oxcaml.</p>

<p>The resulting image can be run like this:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>docker run <span class="nt">--rm</span> <span class="nt">-it</span> ocurrent/opam-staging:oxcaml
ubuntu@146eab4efc18:/<span class="nv">$ </span>ocaml
OCaml version 5.2.0+ox
Enter <span class="c">#help;; for help.</span>

<span class="c">#</span>
</code></pre></div></div>

<p>The exact content of the image may change depending upon requirements, and we should also pick a better place to put it rather than ocurrent/opam-staging!</p>

<p>The pipeline code is available here <a href="https://github.com/mtelvers/docker-oxcaml">mtelvers/docker-oxcaml</a> and the service is deployed at <a href="https://oxcaml.image.ci.dev">oxcaml.image.ci.dev</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="oxcaml" /><summary type="html"><![CDATA[Jon asked me to make a Docker image that contains OxCaml ready to run without the need to build it from scratch.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/oxcaml.png" /><media:content medium="image" url="https://www.tunbury.org/images/oxcaml.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Q2 Summary</title><link href="https://www.tunbury.org/2025/07/18/q2-summary/" rel="alternate" type="text/html" title="Q2 Summary" /><published>2025-07-18T12:00:00+00:00</published><updated>2025-07-18T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/18/q2-summary</id><content type="html" xml:base="https://www.tunbury.org/2025/07/18/q2-summary/"><![CDATA[<p>I am grateful for <a href="https://tarides.com">Tarides</a>’ sponsorship of my OCaml work. Below is a summary of my activities in Q2 2025.</p>

<h1 id="ocaml-infrastructure-and-development">OCaml Infrastructure and Development</h1>

<h2 id="ocaml-maintenance-activities">OCaml Maintenance Activities</h2>

<p>General maintenance work on OCaml’s infrastructure spanned many areas, including <a href="https://www.tunbury.org/2025/03/24/recent-ocaml-version/">updating minimum supported OCaml versions from 4.02 to 4.08</a> and addressing issues with <a href="https://www.tunbury.org/2025/04/04/opam-repo-ci/">opam-repo-ci job timeouts</a>. Platform-specific work included resolving compatibility issues with <a href="https://www.tunbury.org/2025/04/22/ocaml-fedora-gcc/">Fedora 42 and GCC 15</a>, addressing <a href="https://www.tunbury.org/2025/05/13/ubuntu-apparmor/">Ubuntu AppArmor</a> conflicts affecting runc operations, and managing <a href="https://www.tunbury.org/2025/05/19/macos-sequoia/">macOS Sequoia</a> upgrades across the Mac Mini CI workers. Complex build issues were investigated and resolved, including <a href="https://www.tunbury.org/2025/06/21/macos-sequoia-include-path/">C++ header path problems in macOS workers</a> and <a href="https://www.tunbury.org/2025/03/26/freebsd-14.2/">FreeBSD system upgrades</a> for the CI infrastructure.</p>

<h2 id="ocaml-infrastructure-migration">OCaml Infrastructure Migration</h2>

<p>Due to the impending sunset of the <a href="https://www.tunbury.org/2025/04/23/blade-allocation/">Equinix Metal platform</a>, the OCaml community services needed to be migrated. Services including <a href="https://www.tunbury.org/2025/04/27/ocaml-ci/">OCaml-CI</a>, <a href="https://www.tunbury.org/2025/04/29/equinix-moves/">opam-repo-ci</a>, and the <a href="https://www.tunbury.org/2025/04/29/equinix-moves/">opam.ocaml.org</a> deployment pipeline were migrated to <a href="https://www.tunbury.org/2025/04/25/blade-reallocation/">new blade servers</a>. The migration work was planned to minimise service disruption, which was kept to just a few minutes. Complete procedures were documented, including Docker volume transfers and rsync strategies.</p>

<h2 id="opam2web-deployment">opam2web Deployment</h2>

<p>Optimisation work was undertaken on the <a href="https://www.tunbury.org/2025/06/24/opam2web/">deployment pipeline for opam2web</a>, which powers opam.ocaml.org, to address the more than two-hour deployment time. The primary issue was the enormous size of the opam2web Docker image, which exceeded 25GB due to the inclusion of complete opam package archives. The archive was moved to a separate layer, allowing Docker to cache the layer and reducing the deployment time to 20 minutes.</p>

<h2 id="opam-dependency-graphs">opam Dependency Graphs</h2>

<p>Algorithms for managing OCaml package dependencies were investigated, including <a href="https://www.tunbury.org/2025/03/25/topological-sort/">topological sorting</a> to determine the optimal package installation order. This work extended to handling complex dependency scenarios, including post-dependencies and optional dependencies. Implemented a <a href="https://www.tunbury.org/2025/06/23/transitive-reduction/">transitive reduction algorithm</a> to create a dependency graph with minimal edge counts while preserving the same dependency relationships, enabling more efficient package management and installation processes.</p>

<h2 id="ocaml-developments-under-windows">OCaml Developments under Windows</h2>

<p>Significant work was undertaken to bring <a href="https://www.tunbury.org/2025/06/14/windows-containerd-2/">containerization</a> technologies to OCaml development on Windows. This included implementing a tool to create <a href="https://www.tunbury.org/2025/06/27/windows-containerd-3/">host compute networks</a> via the Windows API, tackling limitations with <a href="https://www.tunbury.org/2025/06/18/windows-reflinks/">NTFS hard links</a>, and implementing a copy-on-write <a href="https://www.tunbury.org/2025/07/07/refs-monteverde/">reflink</a> tool for Windows.</p>

<h2 id="oxcaml-support">OxCaml Support</h2>

<p>Support for the new OxCaml compiler variant included establishing an <a href="https://www.tunbury.org/2025/06/12/oxcaml-repository/">opam repository</a> and testing which existing <a href="https://www.tunbury.org/2025/05/14/opam-health-check-oxcaml/">OCaml packages</a> successfully built with the new compiler.</p>

<h1 id="zfs-storage-and-hardware-deployment">ZFS Storage and Hardware Deployment</h1>

<p>Early in the quarter, a hardware deployment project centred around <a href="https://www.tunbury.org/2025/04/11/dell-r640-ubuntu/">Dell PowerEdge R640</a> servers with large-scale SSD storage was undertaken. The project involved deploying multiple batches of <a href="https://www.tunbury.org/2025/04/03/kingston-drives/">Kingston 7.68TB SSD drives</a>, creating automated deployments for Ubuntu using network booting with EFI and cloud-init configuration. Experimented with ZFS as a <a href="https://www.tunbury.org/2025/04/02/ubuntu-with-zfs-root/">root filesystem</a>, which was possible but ultimately discarded, and explored <a href="https://www.tunbury.org/2025/04/21/ubuntu-dm-cache/">dm-cache for SSD acceleration</a> of spinning disk arrays. Investigated using ZFS as a distributed storage archive system using an <a href="https://www.tunbury.org/2025/05/16/zfs-replcation-ansible/">Ansible-based deployment</a> strategy based upon a YAML description.</p>

<h2 id="talos-ii-repairs">Talos II Repairs</h2>

<p><a href="https://www.tunbury.org/2025/04/29/raptor-talos-ii/">Significant hardware reliability issues</a> affected two Raptor Computing Talos II POWER9 machines. The first system experienced complete lockups after as little as 20 minutes of operation, while the second began exhibiting similar problems requiring daily power cycling. Working with Raptor Computing support to isolate the fault, upgrading the firmware, and eventually <a href="https://www.tunbury.org/2025/05/27/raptor-talos-ii-update/">swapping CPUs</a> between the systems resolved the problem. Concurrently, this provided an opportunity to analyse the performance of OBuilder operations on POWER9 systems, comparing <a href="https://www.tunbury.org/2025/05/29/overlayfs/">OverlayFS on TMPFS versus BTRFS on NVMe storage</a>, resulting in optimised build performance.</p>

<h1 id="eeg-systems-investigations">EEG Systems Investigations</h1>

<p>Various software solutions and research platforms were explored as part of a broader system evaluation. This included investigating <a href="https://www.tunbury.org/2025/04/14/slurm-workload-manager/">Slurm Workload Manager</a> for compute resource scheduling, examining <a href="https://www.tunbury.org/2025/04/19/gluster/">Gluster distributed filesystem</a> capabilities, and implementing <a href="https://www.tunbury.org/2025/05/07/otter-wiki-with-raven/">Otter Wiki with Raven authentication</a> integration for collaborative documentation. Research extended to modern research data management platforms, exploring <a href="https://www.tunbury.org/2025/06/03/inveniordm/">InvenioRDM</a> for scientific data archival and <a href="https://www.tunbury.org/2025/07/02/bon-in-a-box/">BON in a Box</a> for biodiversity analysis workflows. To support the <a href="https://www.tunbury.org/2025/07/14/tessera-workshop/">Tessera workshop</a>, a multi-user Jupyter environment was set up using Docker containerization.</p>

<h1 id="miscellaneous-technical-explorations">Miscellaneous Technical Explorations</h1>

<p>Diverse technical explorations included implementing <a href="https://www.tunbury.org/2025/03/15/bluesky-pds/">Bluesky Personal Data Server</a> and developing innovative <a href="https://www.tunbury.org/2025/04/25/bluesky-ssh-authentication/">SSH authentication</a> mechanisms using the ATProto network by extracting SSH public keys from Bluesky profiles. Additional projects included developing OCaml-based API tools for <a href="https://www.tunbury.org/2025/04/12/box-diff/">Box cloud storage</a>, creating <a href="https://www.tunbury.org/2025/03/23/real-time-trains/">Real Time Trains</a> API integrations, and exploring various file synchronisation and <a href="https://www.tunbury.org/2025/06/14/borg-backup/">backup</a> solutions. Investigation of <a href="https://www.tunbury.org/2025/07/15/reflink-copy/">reflink copy</a> mechanisms for efficient file operations using OCaml multicore.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="tarides" /><summary type="html"><![CDATA[I am grateful for Tarides’ sponsorship of my OCaml work. 
Below is a summary of my activities in Q2 2025.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/tarides.png" /><media:content medium="image" url="https://www.tunbury.org/images/tarides.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reflink Copy</title><link href="https://www.tunbury.org/2025/07/15/reflink-copy/" rel="alternate" type="text/html" title="Reflink Copy" /><published>2025-07-15T00:00:00+00:00</published><updated>2025-07-15T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/15/reflink-copy</id><content type="html" xml:base="https://www.tunbury.org/2025/07/15/reflink-copy/"><![CDATA[<p>I hadn’t intended to write another <a href="https://www.tunbury.org/2025/07/08/unix-or-sys/">post</a> about traversing a directory structure or even thinking about it again, but weirdly, it just kept coming up again!</p>

<p>Firstly, Patrick mentioned <code class="language-plaintext highlighter-rouge">Eio.Path.read_dir</code> and Anil mentioned <a href="https://tavianator.com/2023/bfs_3.0.html">bfs</a>. Then Becky commented about XFS reflink performance, and I suggested that the single-threaded nature of <code class="language-plaintext highlighter-rouge">cp -r --reflink=always</code> was probably hurting our <a href="https://github.com/ocurrent/obuilder">obuilder</a> performance tests.</p>

<p>Obuilder is written in LWT, which has <code class="language-plaintext highlighter-rouge">Lwt_unix.readdir</code>. What if we had a pool of threads that would traverse the directory structure in parallel and create a reflinked copy?</p>

<p>Creating a reflink couldn’t be easier. There’s an <code class="language-plaintext highlighter-rouge">ioctl</code> call that <em>just</em> does it. Such a contrast to the ReFS copy-on-write implementation on Windows!</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;caml/mlvalues.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;caml/memory.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;caml/unixsupport.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;sys/ioctl.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;errno.h&gt;</span><span class="cp">
</span>
<span class="cp">#ifndef FICLONE
#define FICLONE 0x40049409
#endif
</span>
<span class="n">value</span> <span class="nf">caml_ioctl_ficlone</span><span class="p">(</span><span class="n">value</span> <span class="n">dst_fd</span><span class="p">,</span> <span class="n">value</span> <span class="n">src_fd</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">CAMLparam2</span><span class="p">(</span><span class="n">dst_fd</span><span class="p">,</span> <span class="n">src_fd</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">result</span><span class="p">;</span>

    <span class="n">result</span> <span class="o">=</span> <span class="n">ioctl</span><span class="p">(</span><span class="n">Int_val</span><span class="p">(</span><span class="n">dst_fd</span><span class="p">),</span> <span class="n">FICLONE</span><span class="p">,</span> <span class="n">Int_val</span><span class="p">(</span><span class="n">src_fd</span><span class="p">));</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">uerror</span><span class="p">(</span><span class="s">"ioctl_ficlone"</span><span class="p">,</span> <span class="n">Nothing</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">CAMLreturn</span><span class="p">(</span><span class="n">Val_int</span><span class="p">(</span><span class="n">result</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We can write a reflink copy function as shown below. (Excuse my error handling.) There are a couple of interesting points to note: the permissions set via <code class="language-plaintext highlighter-rouge">Unix.openfile</code> are filtered through the umask, and you need to <code class="language-plaintext highlighter-rouge">Unix.fchown</code> before <code class="language-plaintext highlighter-rouge">Unix.fchmod</code> if you want the suid bit set.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">external</span> <span class="n">ioctl_ficlone</span> <span class="o">:</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">file_descr</span> <span class="o">-&gt;</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">file_descr</span> <span class="o">-&gt;</span> <span class="kt">int</span> <span class="o">=</span> <span class="s2">"caml_ioctl_ficlone"</span>

<span class="k">let</span> <span class="n">copy_file</span> <span class="n">src</span> <span class="n">dst</span> <span class="n">stat</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">src_fd</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">openfile</span> <span class="n">src</span> <span class="p">[</span><span class="nc">O_RDONLY</span><span class="p">]</span> <span class="mi">0</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">dst_fd</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">openfile</span> <span class="n">dst</span> <span class="p">[</span><span class="nc">O_WRONLY</span><span class="p">;</span> <span class="nc">O_CREAT</span><span class="p">;</span> <span class="nc">O_TRUNC</span><span class="p">]</span> <span class="mo">0o600</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">_</span> <span class="o">=</span> <span class="n">ioctl_ficlone</span> <span class="n">dst_fd</span> <span class="n">src_fd</span> <span class="k">in</span>
  <span class="nn">Unix</span><span class="p">.</span><span class="n">fchown</span> <span class="n">dst_fd</span> <span class="n">stat</span><span class="o">.</span><span class="n">st_uid</span> <span class="n">stat</span><span class="o">.</span><span class="n">st_gid</span><span class="p">;</span>
  <span class="nn">Unix</span><span class="p">.</span><span class="n">fchmod</span> <span class="n">dst_fd</span> <span class="n">stat</span><span class="o">.</span><span class="n">st_perm</span><span class="p">;</span>
  <span class="nn">Unix</span><span class="p">.</span><span class="n">close</span> <span class="n">src_fd</span><span class="p">;</span>
  <span class="nn">Unix</span><span class="p">.</span><span class="n">close</span> <span class="n">dst_fd</span>
</code></pre></div></div>

<p>My LWT code created a list of all the files in a directory and then processed the list with <code class="language-plaintext highlighter-rouge">Lwt_list.map_s</code> (serially), returning promises for all the file operations and creating threads for new directory operations up to a defined maximum (8). If there was no thread capacity, it just recursed in the current thread. Copying a root filesystem, this gave me threads for <code class="language-plaintext highlighter-rouge">var</code>, <code class="language-plaintext highlighter-rouge">usr</code>, etc, just as we’d want. Wow! This was slow. Nearly 4 minutes to reflink 1.7GB!</p>

<p>What about using the threads library rather than LWT threads? This appears significantly better, bringing the execution time down to 40 seconds. However, I think a lot of that was down to my (bad) LWT implementation vs my somewhat better threads implementation.</p>

<p>At this point, I should probably note that <code class="language-plaintext highlighter-rouge">cp -r --reflink=always</code> on 1.7GB, 116,000 files takes 8.5 seconds on my machine using a loopback XFS. A sequential OCaml version, without the overhead of threads or any need to maintain a list of work to do, takes 9.0 seconds.</p>

<p>Giving up and getting on with other things was very tempting, but there was that nagging feeling of not bottoming out the problem.</p>

<p>Using OCaml Multicore, we can write a true multi-threaded version. I took a slightly different approach, having a work queue of directories to process, and N worker threads taking work from the queue.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Main Process: Starts with root directory
     ↓
WorkQueue: [process_dir(/root)]
     ↓
Domain 1: Takes work → processes files → adds subdirs to queue
Domain 2: Takes work → processes files → adds subdirs to queue
Domain 3: Takes work → processes files → adds subdirs to queue
     ↓
WorkQueue: [process_dir(/root/usr), process_dir(/root/var), ...]
</code></pre></div></div>
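<p>As a rough illustration of this pattern (a sketch only; <code class="language-plaintext highlighter-rouge">process_tree</code> and its callback <code class="language-plaintext highlighter-rouge">f</code> are hypothetical names, not the actual code in mtelvers/reflink), each domain repeatedly takes a directory from a shared queue, processes its files via <code class="language-plaintext highlighter-rouge">f</code>, and pushes any subdirectories it discovers back onto the queue:</p>

```ocaml
(* Sketch of the worker-pool pattern: N domains drain a shared queue of
   directories.  [pending] counts directories that are queued or in flight,
   so workers know when the whole tree has been processed. *)
let process_tree ~domains ~f root =
  let queue = Queue.create () in
  let mutex = Mutex.create () in
  let cond = Condition.create () in
  let pending = ref 1 in
  Queue.push root queue;
  let rec worker () =
    Mutex.lock mutex;
    while Queue.is_empty queue && !pending > 0 do
      Condition.wait cond mutex
    done;
    if !pending = 0 then begin
      (* Tree fully processed: wake any remaining sleepers and exit. *)
      Mutex.unlock mutex;
      Condition.broadcast cond
    end else begin
      let dir = Queue.pop queue in
      Mutex.unlock mutex;
      (* [f] processes the files in [dir] and returns its subdirectories. *)
      let subdirs = f dir in
      Mutex.lock mutex;
      List.iter (fun d -> incr pending; Queue.push d queue) subdirs;
      decr pending;
      Condition.broadcast cond;
      Mutex.unlock mutex;
      worker ()
    end
  in
  List.init domains (fun _ -> Domain.spawn worker) |> List.iter Domain.join
```

<p>Note that termination is decided by the <code class="language-plaintext highlighter-rouge">pending</code> counter rather than queue emptiness, since the queue can be momentarily empty while a worker is still discovering subdirectories.</p>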

<p>Below is a table showing the performance when using multiple threads compared to the baseline operation of <code class="language-plaintext highlighter-rouge">cp</code> and a sequential copy in OCaml.</p>

<table>
  <thead>
    <tr>
      <th>Copy command</th>
      <th>Duration (sec)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>cp -r --reflink=always</td>
      <td>8.49</td>
    </tr>
    <tr>
      <td>Sequential</td>
      <td>8.80</td>
    </tr>
    <tr>
      <td>2 domains</td>
      <td>5.45</td>
    </tr>
    <tr>
      <td>4 domains</td>
      <td>3.28</td>
    </tr>
    <tr>
      <td>6 domains</td>
      <td>3.43</td>
    </tr>
    <tr>
      <td>8 domains</td>
      <td>5.24</td>
    </tr>
    <tr>
      <td>10 domains</td>
      <td>9.07</td>
    </tr>
  </tbody>
</table>

<p>The code is available on GitHub in <a href="https://github.com/mtelvers/reflink">mtelvers/reflink</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><summary type="html"><![CDATA[I hadn’t intended to write another post about traversing a directory structure or even thinking about it again, but weirdly, it just kept coming up again!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Tessera Workshop</title><link href="https://www.tunbury.org/2025/07/14/tessera-workshop/" rel="alternate" type="text/html" title="Tessera Workshop" /><published>2025-07-14T00:00:00+00:00</published><updated>2025-07-14T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/14/tessera-workshop</id><content type="html" xml:base="https://www.tunbury.org/2025/07/14/tessera-workshop/"><![CDATA[<p>I wrote previously about setting up a <a href="https://www.tunbury.org/2025/07/09/jupyter/">Jupyter notebook in a Docker container</a>. This worked well for a single user, but we intend to hold a workshop and so need a multi-user setup.</p>

<p>We would prefer that as much of the per-user setup as possible be completed automatically so participants don’t need to waste time setting up the environment.</p>

<p>There is a great resource at <a href="https://github.com/jupyterhub/jupyterhub-the-hard-way/blob/HEAD/docs/installation-guide-hard.md">jupyterhub/jupyterhub-the-hard-way</a> walking you through the manual setup.</p>

<p>However, there are many Docker images that we can use as the base, including <code class="language-plaintext highlighter-rouge">python:3.11</code>, but I have decided to use <code class="language-plaintext highlighter-rouge">jupyter/datascience-notebook:latest</code>. The images are expected to be customised with a <code class="language-plaintext highlighter-rouge">Dockerfile</code>.</p>

<p>In my <code class="language-plaintext highlighter-rouge">Dockerfile</code>, I first installed JupyterLab and the other dependencies to avoid users needing to install these manually later.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RUN pip install --no-cache-dir \
    jupyterhub \
    jupyterlab \
    notebook \
    numpy \
    matplotlib \
    scikit-learn \
    ipyleaflet \
    ipywidgets \
    ipykernel
</code></pre></div></div>

<p>Then the system dependencies. A selection of editors and <code class="language-plaintext highlighter-rouge">git</code> which is needed for <code class="language-plaintext highlighter-rouge">pip install git+https</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>USER root
RUN apt-get update &amp;&amp; apt-get install -y \
    curl git vim nano \
    &amp;&amp; rm -rf /var/lib/apt/lists/*
</code></pre></div></div>

<p>Then our custom package from GitHub.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RUN pip install git+https://github.com/ucam-eo/geotessera.git
</code></pre></div></div>

<p>The default user database is PAM, so create UNIX users for the workshop participants with a disabled password.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RUN for user in user1 user2 user3; do \
        adduser --disabled-password --gecos '' $user; \
    done
</code></pre></div></div>

<p>Finally, set the entrypoint for the container:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CMD ["jupyterhub", "-f", "/srv/jupyterhub/jupyterhub_config.py"]
</code></pre></div></div>

<p>Next, I created the <code class="language-plaintext highlighter-rouge">jupyterhub_config.py</code>. I think most of these lines are self-explanatory. Everyone signs in with the same shared password. Global environment variables can be set using <code class="language-plaintext highlighter-rouge">c.Spawner.environment</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from jupyterhub.auth import DummyAuthenticator

c.JupyterHub.authenticator_class = DummyAuthenticator
c.DummyAuthenticator.password = "Workshop"

# Allow all users
c.Authenticator.allow_all = True

# Use JupyterLab by default
c.Spawner.default_url = '/lab'

# Set timeouts
c.Spawner.start_timeout = 300
c.Spawner.http_timeout = 120
c.Spawner.environment = {
    'TESSERA_DATA_DIR': '/tessera'
}

# Basic configuration
c.JupyterHub.ip = '0.0.0.0'
c.JupyterHub.port = 8000
</code></pre></div></div>

<p>I’m going to use Caddy as a reverse proxy for this setup, for this I need a <code class="language-plaintext highlighter-rouge">Caddyfile</code> containing the public FQDN and the Docker container name and port:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>workshop.cam.ac.uk {
	reverse_proxy jupyterhub:8000
}
</code></pre></div></div>

<p>The services are defined in <code class="language-plaintext highlighter-rouge">docker-compose.yml</code>: Caddy, with volumes to preserve SSL certificates between restarts, and <code class="language-plaintext highlighter-rouge">jupyterhub</code>, with a volume for the home directories so they persist, and a mapping for our shared dataset.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>services:
  caddy:
    image: caddy:latest
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config

  jupyterhub:
    build: .
    volumes:
      - ./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py
      - jupyter_home:/home
      - tessera_data:/tessera

volumes:
  caddy_data:
  caddy_config:
  jupyter_home:
  tessera_data:
</code></pre></div></div>

<p>Reset UFW to defaults</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ufw <span class="nt">--force</span> reset
</code></pre></div></div>

<p>Set default policies</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ufw default deny incoming
ufw default allow outgoing
</code></pre></div></div>

<p>Allow SSH and HTTP(S) services</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ufw allow ssh
ufw allow http
ufw allow https
</code></pre></div></div>

<p>Enable UFW</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ufw <span class="nb">enable</span>
</code></pre></div></div>

<p>Check status</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ufw status verbose
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="jupyter" /><summary type="html"><![CDATA[I wrote previously about setting up a Jupyter notebook in a Docker container. This worked well for a single user, but we intend to hold a workshop and so need a multi-user setup.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/tessera2.png" /><media:content medium="image" url="https://www.tunbury.org/images/tessera2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">dune unfmt</title><link href="https://www.tunbury.org/2025/07/10/dune-unfmt/" rel="alternate" type="text/html" title="dune unfmt" /><published>2025-07-10T00:00:00+00:00</published><updated>2025-07-10T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/10/dune-unfmt</id><content type="html" xml:base="https://www.tunbury.org/2025/07/10/dune-unfmt/"><![CDATA[<p>When working across machines, it’s easy to make changes and reconcile them using git. However, I made a mistake and inadvertently ran <code class="language-plaintext highlighter-rouge">dune fmt</code> and now my <code class="language-plaintext highlighter-rouge">git diff</code> is a total mess.</p>

<p>My thought, to get myself out of this situation, is to go back to the previous commit and create a new branch with no changes other than a <code class="language-plaintext highlighter-rouge">dune fmt</code>. I can then cherry-pick my latest work onto that branch, which should give me a clean diff.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git commit <span class="nt">-am</span> <span class="s1">'inadvertent reformatted version'</span>
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">git log</code> to find the commit that was just made and the previous one.</p>

<p>Checkout the previous commit and make a new branch, in my case called <code class="language-plaintext highlighter-rouge">pre-fmt</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout &lt;previous commit&gt;
git switch <span class="nt">-c</span> pre-fmt
</code></pre></div></div>

<p>Format the code in this branch and commit that version.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dune <span class="nb">fmt
</span>git commit <span class="nt">-am</span> <span class="s1">'dune fmt'</span>
</code></pre></div></div>

<p>Now cherry-pick the original commit.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git cherry-pick &lt;latest commit&gt;
</code></pre></div></div>

<p>The cherry-pick reports many merge conflicts; these should be trivial to resolve, but it is a manual process. Once done, add the changed files and finish the cherry-pick.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git add bin/<span class="k">*</span>.ml
git cherry-pick <span class="nt">--continue</span>
</code></pre></div></div>
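<p>Alternatively, the manual resolution can be sidestepped: since the conflicts are caused by formatting, git’s merge strategy option <code class="language-plaintext highlighter-rouge">-X theirs</code> can resolve them automatically in favour of the cherry-picked change, after which a fresh <code class="language-plaintext highlighter-rouge">dune fmt</code> restores the formatting. The sketch below demonstrates the idea on a throwaway repository, with <code class="language-plaintext highlighter-rouge">sed</code> standing in for <code class="language-plaintext highlighter-rouge">dune fmt</code>; treat it as an idea rather than a tested recipe.</p>

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo
printf 'let  x  =  1\nlet  y  =  2\n' > a.ml      # "unformatted" base
git add a.ml && git commit -qm base
printf 'let  x  =  1\nlet  y  =  3\n' > a.ml      # the real change
git commit -qam change
change=$(git rev-parse HEAD)
git checkout -q HEAD~1 && git switch -qc pre-fmt
sed -i.bak 's/  */ /g' a.ml && rm a.ml.bak        # stand-in for `dune fmt`
git commit -qam fmt
git cherry-pick -X theirs "$change"               # auto-resolve in favour of the change
sed -i.bak 's/  */ /g' a.ml && rm a.ml.bak        # re-run the "formatter"
cat a.ml
```

<p>The caveat is that <code class="language-plaintext highlighter-rouge">-X theirs</code> is only safe if the conflicts really are formatting-only; any genuine divergence on the branch would be silently overwritten by the cherry-picked commit.</p>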

<p><code class="language-plaintext highlighter-rouge">git diff</code> now shows just the actual changes rather than the code formatting changes. Do you have any suggestions on a better workflow?</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="git" /><summary type="html"><![CDATA[When working across machines, it’s easy to make changes and reconcile them using git. However, I made a mistake and inadvertently ran dune fmt and now my git diff is a total mess.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/GitHub-Mark-120px-plus.png" /><media:content medium="image" url="https://www.tunbury.org/images/GitHub-Mark-120px-plus.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Publishing a Jupyter Notebook in a Docker Container</title><link href="https://www.tunbury.org/2025/07/09/jupyter/" rel="alternate" type="text/html" title="Publishing a Jupyter Notebook in a Docker Container" /><published>2025-07-09T00:00:00+00:00</published><updated>2025-07-09T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/09/jupyter</id><content type="html" xml:base="https://www.tunbury.org/2025/07/09/jupyter/"><![CDATA[<p>Brief notes on publishing a Jupyter notebook as a Docker container.</p>

<p>My starting point is a GitHub <a href="https://github.com/ucam-eo/tessera-interactive-map">repo</a> containing a Jupyter notebook and a <code class="language-plaintext highlighter-rouge">requirements.txt</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/ucam-eo/tessera-interactive-map
<span class="nb">cd </span>tessera-interactive-map
</code></pre></div></div>

<p>I created a <code class="language-plaintext highlighter-rouge">Dockerfile</code> which pulls in the standard Python base image. I used 3.11 as that is the minimum version supported by <a href="https://github.com/ucam-eo/geotessera.git">https://github.com/ucam-eo/geotessera.git</a>.</p>

<p><code class="language-plaintext highlighter-rouge">pip</code> installs the packages listed in <code class="language-plaintext highlighter-rouge">requirements.txt</code> plus the additional <a href="https://github.com/ucam-eo/geotessera.git">geotessera</a> library. The extra library is noted in the <a href="https://github.com/ucam-eo/tessera-interactive-map/blob/main/README.md">README.md</a>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FROM python:3.11
WORKDIR /app
COPY <span class="nb">.</span> /app
RUN pip <span class="nb">install</span> <span class="nt">--no-cache-dir</span> <span class="nt">-r</span> requirements.txt
RUN pip <span class="nb">install </span>git+https://github.com/ucam-eo/geotessera.git
RUN pip <span class="nb">install </span>jupyter
EXPOSE 8888
ENV NAME World
CMD <span class="o">[</span><span class="s2">"jupyter"</span>, <span class="s2">"notebook"</span>, <span class="s2">"--ip=0.0.0.0"</span>, <span class="s2">"--port=8888"</span>, <span class="s2">"--no-browser"</span>, <span class="s2">"--allow-root"</span><span class="o">]</span>
</code></pre></div></div>

<p>Build the Docker image.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-t</span> my-jupyter <span class="nb">.</span>
</code></pre></div></div>

<p>And run the container.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># docker run --rm -it -p 8888:8888 my-jupyter</span>
<span class="o">[</span>I 2025-07-09 16:11:37.739 ServerApp] jupyter_lsp | extension was successfully linked.
<span class="o">[</span>I 2025-07-09 16:11:37.743 ServerApp] jupyter_server_terminals | extension was successfully linked.
<span class="o">[</span>I 2025-07-09 16:11:37.746 ServerApp] jupyterlab | extension was successfully linked.
<span class="o">[</span>I 2025-07-09 16:11:37.749 ServerApp] notebook | extension was successfully linked.
<span class="o">[</span>I 2025-07-09 16:11:37.751 ServerApp] Writing Jupyter server cookie secret to /root/.local/share/jupyter/runtime/jupyter_cookie_secret
<span class="o">[</span>I 2025-07-09 16:11:38.089 ServerApp] notebook_shim | extension was successfully linked.
<span class="o">[</span>I 2025-07-09 16:11:38.102 ServerApp] notebook_shim | extension was successfully loaded.
<span class="o">[</span>I 2025-07-09 16:11:38.104 ServerApp] jupyter_lsp | extension was successfully loaded.
<span class="o">[</span>I 2025-07-09 16:11:38.105 ServerApp] jupyter_server_terminals | extension was successfully loaded.
<span class="o">[</span>I 2025-07-09 16:11:38.107 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.11/site-packages/jupyterlab
<span class="o">[</span>I 2025-07-09 16:11:38.107 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
<span class="o">[</span>I 2025-07-09 16:11:38.107 LabApp] Extension Manager is <span class="s1">'pypi'</span><span class="nb">.</span>
<span class="o">[</span>I 2025-07-09 16:11:38.156 ServerApp] jupyterlab | extension was successfully loaded.
<span class="o">[</span>I 2025-07-09 16:11:38.159 ServerApp] notebook | extension was successfully loaded.
<span class="o">[</span>I 2025-07-09 16:11:38.160 ServerApp] Serving notebooks from <span class="nb">local </span>directory: /app
<span class="o">[</span>I 2025-07-09 16:11:38.160 ServerApp] Jupyter Server 2.16.0 is running at:
<span class="o">[</span>I 2025-07-09 16:11:38.160 ServerApp] http://0ad4fce9b94e:8888/tree?token<span class="o">=</span>c11c0f007dd99a785ff67331514fb44e87269055952a253b
<span class="o">[</span>I 2025-07-09 16:11:38.160 ServerApp]     http://127.0.0.1:8888/tree?token<span class="o">=</span>c11c0f007dd99a785ff67331514fb44e87269055952a253b
</code></pre></div></div>
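<p>One caveat worth noting: <code class="language-plaintext highlighter-rouge">COPY . /app</code> bakes the notebook into the image, so edits made in the browser are lost when the container exits. A bind mount over <code class="language-plaintext highlighter-rouge">/app</code> keeps the working copy on the host instead; this variant is a suggestion, not part of the setup above.</p>

```shell
# Mount the git checkout over /app so notebook edits persist on the host.
# Assumes the image was built as `my-jupyter` and this is run from the
# tessera-interactive-map checkout.
docker run --rm -it -p 8888:8888 -v "$PWD":/app my-jupyter
```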

<p>Note the URL in the log output and open it in the browser. You are prompted to enter the token if you don’t specify the token as part of the URL.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="jupyter" /><summary type="html"><![CDATA[Brief notes on publishing a Jupyter notebook as a Docker container.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/tessera.png" /><media:content medium="image" url="https://www.tunbury.org/images/tessera.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Sys.readdir or Unix.readdir</title><link href="https://www.tunbury.org/2025/07/08/unix-or-sys/" rel="alternate" type="text/html" title="Sys.readdir or Unix.readdir" /><published>2025-07-08T00:00:00+00:00</published><updated>2025-07-08T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/08/unix-or-sys</id><content type="html" xml:base="https://www.tunbury.org/2025/07/08/unix-or-sys/"><![CDATA[<p>When you recursively scan a massive directory tree, would you use <code class="language-plaintext highlighter-rouge">Sys.readdir</code> or <code class="language-plaintext highlighter-rouge">Unix.readdir</code>? My inclination is that <code class="language-plaintext highlighter-rouge">Sys.readdir</code> feels more convenient to use, and thus the lower-level <code class="language-plaintext highlighter-rouge">Unix.readdir</code> would have the performance edge. Is it significant enough to bother with?</p>

<p>I quickly coded up the two options for comparison. Here’s the <code class="language-plaintext highlighter-rouge">Unix.readdir</code> version, which calls <code class="language-plaintext highlighter-rouge">Unix.opendir</code> and then recursively calls <code class="language-plaintext highlighter-rouge">Unix.readdir</code> until the <code class="language-plaintext highlighter-rouge">End_of_file</code> exception is raised.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="n">traverse_directory_unix</span> <span class="n">path</span> <span class="n">x</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">stats</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">lstat</span> <span class="n">path</span> <span class="k">in</span>
  <span class="k">match</span> <span class="n">stats</span><span class="o">.</span><span class="n">st_kind</span> <span class="k">with</span>
  <span class="o">|</span> <span class="nn">Unix</span><span class="p">.</span><span class="nc">S_REG</span> <span class="o">-&gt;</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span>
  <span class="o">|</span> <span class="nc">S_LNK</span> <span class="o">|</span> <span class="nc">S_CHR</span> <span class="o">|</span> <span class="nc">S_BLK</span> <span class="o">|</span> <span class="nc">S_FIFO</span> <span class="o">|</span> <span class="nc">S_SOCK</span> <span class="o">-&gt;</span> <span class="n">x</span>
  <span class="o">|</span> <span class="nc">S_DIR</span> <span class="o">-&gt;</span>
      <span class="k">try</span>
        <span class="k">let</span> <span class="n">dir_handle</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">opendir</span> <span class="n">path</span> <span class="k">in</span>
        <span class="k">let</span> <span class="k">rec</span> <span class="n">read_entries</span> <span class="n">acc</span> <span class="o">=</span>
          <span class="k">try</span>
            <span class="k">match</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">readdir</span> <span class="n">dir_handle</span> <span class="k">with</span>
            <span class="o">|</span> <span class="s2">"."</span> <span class="o">|</span> <span class="s2">".."</span> <span class="o">-&gt;</span> <span class="n">read_entries</span> <span class="n">acc</span>
            <span class="o">|</span> <span class="n">entry</span> <span class="o">-&gt;</span>
                <span class="k">let</span> <span class="n">full_path</span> <span class="o">=</span> <span class="nn">Filename</span><span class="p">.</span><span class="n">concat</span> <span class="n">path</span> <span class="n">entry</span> <span class="k">in</span>
                <span class="n">read_entries</span> <span class="p">(</span><span class="n">traverse_directory_unix</span> <span class="n">full_path</span> <span class="n">acc</span><span class="p">)</span>
          <span class="k">with</span> <span class="nc">End_of_file</span> <span class="o">-&gt;</span>
            <span class="nn">Unix</span><span class="p">.</span><span class="n">closedir</span> <span class="n">dir_handle</span><span class="p">;</span>
            <span class="n">acc</span>
        <span class="k">in</span>
        <span class="n">read_entries</span> <span class="n">x</span>
      <span class="k">with</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="n">x</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">Sys.readdir</code> version nicely gives us an array so we can idiomatically use <code class="language-plaintext highlighter-rouge">Array.fold_left</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">traverse_directory_sys</span> <span class="n">source</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">rec</span> <span class="n">process_directory</span> <span class="n">s</span> <span class="n">current_source</span> <span class="o">=</span>
    <span class="k">let</span> <span class="n">entries</span> <span class="o">=</span> <span class="nn">Sys</span><span class="p">.</span><span class="n">readdir</span> <span class="n">current_source</span> <span class="k">in</span>
    <span class="nn">Array</span><span class="p">.</span><span class="n">fold_left</span>
      <span class="p">(</span><span class="k">fun</span> <span class="n">acc</span> <span class="n">entry</span> <span class="o">-&gt;</span>
        <span class="k">let</span> <span class="n">source</span> <span class="o">=</span> <span class="nn">Filename</span><span class="p">.</span><span class="n">concat</span> <span class="n">current_source</span> <span class="n">entry</span> <span class="k">in</span>
        <span class="k">try</span>
          <span class="k">let</span> <span class="n">stat</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">lstat</span> <span class="n">source</span> <span class="k">in</span>
          <span class="k">match</span> <span class="n">stat</span><span class="o">.</span><span class="n">st_kind</span> <span class="k">with</span>
          <span class="o">|</span> <span class="nn">Unix</span><span class="p">.</span><span class="nc">S_REG</span> <span class="o">-&gt;</span> <span class="n">acc</span> <span class="o">+</span> <span class="mi">1</span>
          <span class="o">|</span> <span class="nn">Unix</span><span class="p">.</span><span class="nc">S_DIR</span> <span class="o">-&gt;</span> <span class="n">process_directory</span> <span class="n">acc</span> <span class="n">source</span>
          <span class="o">|</span> <span class="nc">S_LNK</span> <span class="o">|</span> <span class="nc">S_CHR</span> <span class="o">|</span> <span class="nc">S_BLK</span> <span class="o">|</span> <span class="nc">S_FIFO</span> <span class="o">|</span> <span class="nc">S_SOCK</span> <span class="o">-&gt;</span> <span class="n">acc</span>
        <span class="k">with</span> <span class="nn">Unix</span><span class="p">.</span><span class="nc">Unix_error</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="n">acc</span><span class="p">)</span>
      <span class="n">s</span> <span class="n">entries</span>
  <span class="k">in</span>
  <span class="n">process_directory</span> <span class="mi">0</span> <span class="n">source</span>
</code></pre></div></div>

<p>The file system may have a big impact, so I tested NTFS, ReFS, and ext4, running each a couple of times to ensure the cache was primed.</p>
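<p>If you want to reproduce the comparison, a synthetic tree is easy to generate. The sketch below builds a deliberately small one of 1,000 files under a throwaway <code class="language-plaintext highlighter-rouge">mktemp</code> root, scaled down from the hundreds of thousands of files used in these tests.</p>

```shell
# Build a small synthetic tree: 10 directories x 100 empty files,
# under a throwaway root created with mktemp.
tree=$(mktemp -d)
for d in $(seq 1 10); do
  mkdir -p "$tree/dir$d"
  for f in $(seq 1 100); do : > "$tree/dir$d/file$f"; done
done
# Both traversals should report the same count as find(1):
find "$tree" -type f | wc -l
```

<p>Running each traversal over the same tree several times keeps the dentry cache warm, so the comparison measures the API overhead rather than the disk.</p>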

<p><code class="language-plaintext highlighter-rouge">Sys.readdir</code> was quicker in my test cases up to 500,000 files. Reaching 750,000 files, <code class="language-plaintext highlighter-rouge">Unix.readdir</code> edged ahead. I was surprised by the outcome and wondered whether it was my code rather than the module I used.</p>

<p>Pushing for the result I expected/wanted, I rewrote the function so it more closely mirrors the <code class="language-plaintext highlighter-rouge">Sys.readdir</code> version.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">traverse_directory_unix_2</span> <span class="n">path</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">rec</span> <span class="n">process_directory</span> <span class="n">s</span> <span class="n">path</span> <span class="o">=</span>
    <span class="k">try</span>
      <span class="k">let</span> <span class="n">dir_handle</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">opendir</span> <span class="n">path</span> <span class="k">in</span>
      <span class="k">let</span> <span class="k">rec</span> <span class="n">read_entries</span> <span class="n">acc</span> <span class="o">=</span>
        <span class="k">try</span>
          <span class="k">let</span> <span class="n">entry</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">readdir</span> <span class="n">dir_handle</span> <span class="k">in</span>
          <span class="k">match</span> <span class="n">entry</span> <span class="k">with</span>
          <span class="o">|</span> <span class="s2">"."</span> <span class="o">|</span> <span class="s2">".."</span> <span class="o">-&gt;</span> <span class="n">read_entries</span> <span class="n">acc</span>
          <span class="o">|</span> <span class="n">entry</span> <span class="o">-&gt;</span>
              <span class="k">let</span> <span class="n">full_path</span> <span class="o">=</span> <span class="nn">Filename</span><span class="p">.</span><span class="n">concat</span> <span class="n">path</span> <span class="n">entry</span> <span class="k">in</span>
              <span class="k">let</span> <span class="n">stats</span> <span class="o">=</span> <span class="nn">Unix</span><span class="p">.</span><span class="n">lstat</span> <span class="n">full_path</span> <span class="k">in</span>
              <span class="k">match</span> <span class="n">stats</span><span class="o">.</span><span class="n">st_kind</span> <span class="k">with</span>
              <span class="o">|</span> <span class="nn">Unix</span><span class="p">.</span><span class="nc">S_REG</span> <span class="o">-&gt;</span> <span class="n">read_entries</span> <span class="p">(</span><span class="n">acc</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
              <span class="o">|</span> <span class="nc">S_LNK</span> <span class="o">|</span> <span class="nc">S_CHR</span> <span class="o">|</span> <span class="nc">S_BLK</span> <span class="o">|</span> <span class="nc">S_FIFO</span> <span class="o">|</span> <span class="nc">S_SOCK</span> <span class="o">-&gt;</span> <span class="n">read_entries</span> <span class="n">acc</span>
              <span class="o">|</span> <span class="nc">S_DIR</span> <span class="o">-&gt;</span> <span class="n">read_entries</span> <span class="p">(</span><span class="n">process_directory</span> <span class="n">acc</span> <span class="n">full_path</span><span class="p">)</span>
        <span class="k">with</span> <span class="nc">End_of_file</span> <span class="o">-&gt;</span>
          <span class="nn">Unix</span><span class="p">.</span><span class="n">closedir</span> <span class="n">dir_handle</span><span class="p">;</span>
          <span class="n">acc</span>
      <span class="k">in</span>
      <span class="n">read_entries</span> <span class="n">s</span>
    <span class="k">with</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="n">s</span>
  <span class="k">in</span>
  <span class="n">process_directory</span> <span class="mi">0</span> <span class="n">path</span>
</code></pre></div></div>

<p>This version is indeed faster than <code class="language-plaintext highlighter-rouge">Sys.readdir</code> in all cases. However, at 750,000 files the speed up was &lt; 0.5%.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[When you recursively scan a massive directory tree, would you use Sys.readdir or Unix.readdir? My inclination is that Sys.readdir feels more convenient to use, and thus the lower-level Unix.readdir would have the performance edge. Is it significant enough to bother with?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/sys-or-unix.png" /><media:content medium="image" url="https://www.tunbury.org/images/sys-or-unix.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ReFS, EEG Intern and Monteverde</title><link href="https://www.tunbury.org/2025/07/07/refs-monteverde/" rel="alternate" type="text/html" title="ReFS, EEG Intern and Monteverde" /><published>2025-07-07T00:00:00+00:00</published><updated>2025-07-07T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/07/refs-monteverde</id><content type="html" xml:base="https://www.tunbury.org/2025/07/07/refs-monteverde/"><![CDATA[<p>In addition to the post from last week covering <a href="https://www.tunbury.org/2025/07/02/bon-in-a-box/">BON in a Box</a> and <a href="https://www.tunbury.org/2025/07/01/ocaml-functors/">OCaml Functors</a>, below are some additional notes.</p>

<h1 id="resilient-file-system-refs">Resilient File System, ReFS</h1>

<p>I have previously stated that <a href="https://www.tunbury.org/windows-reflinks">ReFS</a> supports 1 million hard links per file; however, this is not the case. The maximum is considerably lower, at 8,191. That’s eight times the NTFS limit, but still not very many.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PS</span><span class="w"> </span><span class="nx">D:\</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">touch</span><span class="w"> </span><span class="nx">foo</span><span class="w">
</span><span class="n">PS</span><span class="w"> </span><span class="nx">D:\</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">foreach</span><span class="w"> </span><span class="p">(</span><span class="nv">$i</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="mi">1</span><span class="o">..</span><span class="mi">8192</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">&gt;&gt;</span><span class="w">     </span><span class="n">New-Item</span><span class="w"> </span><span class="nt">-ItemType</span><span class="w"> </span><span class="nx">HardLink</span><span class="w"> </span><span class="nt">-Path</span><span class="w"> </span><span class="s2">"foo-</span><span class="nv">$i</span><span class="s2">"</span><span class="w"> </span><span class="nt">-Target</span><span class="w"> </span><span class="s2">"foo"</span><span class="w">
</span><span class="err">&gt;&gt;</span><span class="w"> </span><span class="p">}</span><span class="w">


    </span><span class="n">Directory:</span><span class="w"> </span><span class="nx">D:\</span><span class="w">


</span><span class="n">Mode</span><span class="w">                 </span><span class="nx">LastWriteTime</span><span class="w">         </span><span class="nx">Length</span><span class="w"> </span><span class="nx">Name</span><span class="w">
</span><span class="o">----</span><span class="w">                 </span><span class="o">-------------</span><span class="w">         </span><span class="o">------</span><span class="w"> </span><span class="o">----</span><span class="w">
</span><span class="nt">-a</span><span class="o">----</span><span class="w">        </span><span class="mi">07</span><span class="n">/07/2025</span><span class="w">     </span><span class="nx">01:00</span><span class="w">              </span><span class="nx">0</span><span class="w"> </span><span class="nx">foo-1</span><span class="w">
</span><span class="nt">-a</span><span class="o">----</span><span class="w">        </span><span class="mi">07</span><span class="n">/07/2025</span><span class="w">     </span><span class="nx">01:00</span><span class="w">              </span><span class="nx">0</span><span class="w"> </span><span class="nx">foo-2</span><span class="w">
</span><span class="nt">-a</span><span class="o">----</span><span class="w">        </span><span class="mi">07</span><span class="n">/07/2025</span><span class="w">     </span><span class="nx">01:00</span><span class="w">              </span><span class="nx">0</span><span class="w"> </span><span class="nx">foo-3</span><span class="w">
</span><span class="nt">-a</span><span class="o">----</span><span class="w">        </span><span class="mi">07</span><span class="n">/07/2025</span><span class="w">     </span><span class="nx">01:00</span><span class="w">              </span><span class="nx">0</span><span class="w"> </span><span class="nx">foo-4</span><span class="w">
</span><span class="o">...</span><span class="w">
</span><span class="nt">-a</span><span class="o">----</span><span class="w">        </span><span class="mi">07</span><span class="n">/07/2025</span><span class="w">     </span><span class="nx">01:00</span><span class="w">              </span><span class="nx">0</span><span class="w"> </span><span class="nx">foo-8190</span><span class="w">
</span><span class="nt">-a</span><span class="o">----</span><span class="w">        </span><span class="mi">07</span><span class="n">/07/2025</span><span class="w">     </span><span class="nx">01:00</span><span class="w">              </span><span class="nx">0</span><span class="w"> </span><span class="nx">foo-8191</span><span class="w">
</span><span class="n">New-Item</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">An</span><span class="w"> </span><span class="nx">attempt</span><span class="w"> </span><span class="nx">was</span><span class="w"> </span><span class="nx">made</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">create</span><span class="w"> </span><span class="nx">more</span><span class="w"> </span><span class="nx">links</span><span class="w"> </span><span class="nx">on</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">than</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">system</span><span class="w"> </span><span class="nx">supports</span><span class="w">
</span><span class="n">At</span><span class="w"> </span><span class="nx">line:2</span><span class="w"> </span><span class="nx">char:5</span><span class="w">
</span><span class="o">+</span><span class="w">     </span><span class="n">New-Item</span><span class="w"> </span><span class="nt">-ItemType</span><span class="w"> </span><span class="nx">HardLink</span><span class="w"> </span><span class="nt">-Path</span><span class="w"> </span><span class="s2">"foo-</span><span class="nv">$i</span><span class="s2">"</span><span class="w"> </span><span class="nt">-Target</span><span class="w"> </span><span class="s2">"foo"</span><span class="w">
</span><span class="o">+</span><span class="w">     </span><span class="n">~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~</span><span class="w">
    </span><span class="o">+</span><span class="w"> </span><span class="nx">CategoryInfo</span><span class="w">          </span><span class="p">:</span><span class="w"> </span><span class="nx">NotSpecified:</span><span class="w"> </span><span class="p">(:)</span><span class="w"> </span><span class="p">[</span><span class="n">New</span><span class="nt">-Item</span><span class="p">],</span><span class="w"> </span><span class="n">Win32Exception</span><span class="w">
    </span><span class="o">+</span><span class="w"> </span><span class="nx">FullyQualifiedErrorId</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">System.ComponentModel.Win32Exception</span><span class="p">,</span><span class="nx">Microsoft.PowerShell.Commands.NewItemCommand</span><span class="w">
</span></code></pre></div></div>

<p>I had also investigated ReFS block cloning, which removed the requirement to create hard links, and wrote a <a href="https://github.com/mtelvers/ReFS-Clone">ReFS-clone</a> tool for Windows Server 2022. This works well until containerd is used to bind mount a directory on the volume. Once this has happened, attempts to create a block clone fail. To exclude my code as the root cause, I have tried Windows Server 2025, where commands such as <code class="language-plaintext highlighter-rouge">copy</code> and <code class="language-plaintext highlighter-rouge">robocopy</code> automatically perform block clones. Block cloning can be restored by rebooting the machine. I note that restarting containerd is not sufficient.</p>

<p>Removing files and folders on ReFS is impressively fast; however, this comes at a cost: freeing the blocks is a background activity that may take some time to be scheduled.</p>

<h1 id="file-system-performance-with-a-focus-on-zfs">File system performance with a focus on ZFS</h1>

<p>Several EEG interns started last week with this <a href="https://anil.recoil.org/ideas/zfs-filesystem-perf">project</a> under my supervision. In brief, we will examine file system performance on the filesystems supported by <a href="https://github.com/ocurrent/obuilder">OBuilder</a> before conducting more detailed investigations into factors affecting ZFS performance.</p>

<h1 id="monteverde">Monteverde</h1>

<p>monteverde.cl.cam.ac.uk has been installed in the rack. It has two AMD EPYC 9965 192-core processors, giving a total of 384 cores and 768 threads, plus 3TB of RAM.</p>

<p><img src="/images/monteverde.jpg" alt="" /></p>

<p>From the logs, there are still some teething issues:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[130451.620482] Large kmem_alloc(98304, 0x1000), please file an issue at:
                https://github.com/openzfs/zfs/issues/new
[130451.620486] CPU: 51 UID: 0 PID: 8594 Comm: txg_sync Tainted: P           O       6.14.0-23-generic #23-Ubuntu
[130451.620488] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[130451.620489] Hardware name: Dell Inc. PowerEdge R7725/0KRFPX, BIOS 1.1.3 02/25/2025
[130451.620490] Call Trace:
[130451.620490]  &lt;TASK&gt;
[130451.620492]  show_stack+0x49/0x60
[130451.620493]  dump_stack_lvl+0x5f/0x90
[130451.620495]  dump_stack+0x10/0x18
[130451.620497]  spl_kmem_alloc_impl.cold+0x17/0x1c [spl]
[130451.620503]  spl_kmem_zalloc+0x19/0x30 [spl]
[130451.620508]  multilist_create_impl+0x3f/0xc0 [zfs]
[130451.620586]  multilist_create+0x31/0x50 [zfs]
[130451.620650]  dmu_objset_sync+0x4c4/0x4d0 [zfs]
[130451.620741]  dsl_pool_sync_mos+0x34/0xc0 [zfs]
[130451.620832]  dsl_pool_sync+0x3c1/0x420 [zfs]
[130451.620910]  spa_sync_iterate_to_convergence+0xda/0x220 [zfs]
[130451.620990]  spa_sync+0x333/0x660 [zfs]
[130451.621056]  txg_sync_thread+0x1f5/0x270 [zfs]
[130451.621137]  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[130451.621207]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[130451.621213]  thread_generic_wrapper+0x5b/0x70 [spl]
[130451.621217]  kthread+0xf9/0x230
[130451.621219]  ? __pfx_kthread+0x10/0x10
[130451.621221]  ret_from_fork+0x44/0x70
[130451.621223]  ? __pfx_kthread+0x10/0x10
[130451.621224]  ret_from_fork_asm+0x1a/0x30
[130451.621226]  &lt;/TASK&gt;
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="refs" /><category term="tunbury.org" /><summary type="html"><![CDATA[In addition to the post from last week covering BON in a Box and OCaml Functors, below are some additional notes.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/refs.png" /><media:content medium="image" url="https://www.tunbury.org/images/refs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">BON in a Box</title><link href="https://www.tunbury.org/2025/07/02/bon-in-a-box/" rel="alternate" type="text/html" title="BON in a Box" /><published>2025-07-02T00:00:00+00:00</published><updated>2025-07-02T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/02/bon-in-a-box</id><content type="html" xml:base="https://www.tunbury.org/2025/07/02/bon-in-a-box/"><![CDATA[<p>On a suggestion from Michael, I have had a quick look at <a href="https://geo-bon.github.io/bon-in-a-box-pipeline-engine/">BON in a Box</a>, which is a web-based biodiversity analysis platform using Docker containerised pipelines running R, Julia, and Python scripts.</p>

<p>It couldn’t be easier to get started. Install Docker and Docker Compose, and make sure you can access GitHub via SSH using a public key. [Run <code class="language-plaintext highlighter-rouge">ssh-keygen -t ed25519</code> and then publish the resulting <code class="language-plaintext highlighter-rouge">~/.ssh/id_ed25519.pub</code> to your GitHub account.]</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>docker.io docker-compose-v2
</code></pre></div></div>

<p>Clone the GEO-BON repository and make a working copy of the <code class="language-plaintext highlighter-rouge">runner.env</code> file. This file can be edited to add API keys for datasets, but I don’t have any, so the default file is fine.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone git@github.com:GEO-BON/bon-in-a-box-pipelines.git
<span class="nb">cd </span>bon-in-a-box-pipelines
<span class="nb">cp </span>runner-sample.env runner.env
</code></pre></div></div>

<p>To start the server, run <code class="language-plaintext highlighter-rouge">./server-up.sh</code>. There is also <code class="language-plaintext highlighter-rouge">./server-down.sh</code> to stop the server.</p>

<p>The first run downloads the required Docker images, so it takes a few minutes. Once complete, visit <a href="http://localhost">http://localhost</a> to see the web GUI.</p>

<p>I ran the “Get Country Polygon” script, creating a nice Colombia polygon.</p>

<p>There is a drag-and-drop pipeline editor, which felt a lot like Microsoft Access.</p>

<p><img src="/images/geobon-pipeline.png" alt="" /></p>

<p>I followed along with the tutorial and created an R script and a YAML file of the same name in the <code class="language-plaintext highlighter-rouge">/scripts</code> directory. These appeared in the GUI, allowing me to run them and use them in the pipeline editor. Annoyingly, the dataset was not provided in the tutorial, so I couldn’t run the code.</p>

<p><code class="language-plaintext highlighter-rouge">TestScript.R</code></p>

<p>The <code class="language-plaintext highlighter-rouge">biab</code> functions are how the script interacts with the BON in a Box system.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">rjson</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">terra</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">

</span><span class="n">input</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">biab_inputs</span><span class="p">()</span><span class="w">

</span><span class="n">dat</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">st_read</span><span class="p">(</span><span class="n">input</span><span class="o">$</span><span class="n">country_polygon</span><span class="p">)</span><span class="w">

</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">nrow</span><span class="p">(</span><span class="n">dat</span><span class="p">)</span><span class="o">==</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">biab_error_stop</span><span class="p">(</span><span class="s2">"Country polygon does not exist"</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">   
  
</span><span class="n">dat.transformed</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">st_transform</span><span class="p">(</span><span class="n">dat</span><span class="p">,</span><span class="w"> </span><span class="n">crs</span><span class="o">=</span><span class="n">input</span><span class="o">$</span><span class="n">crs</span><span class="p">)</span><span class="w">

</span><span class="n">rasters</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">terra</span><span class="o">::</span><span class="n">rast</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">input</span><span class="o">$</span><span class="n">rasters</span><span class="p">,</span><span class="w"> </span><span class="n">crs</span><span class="o">=</span><span class="n">input</span><span class="o">$</span><span class="n">crs</span><span class="p">))</span><span class="w">

</span><span class="n">country_vect</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">vect</span><span class="p">(</span><span class="n">dat.transformed</span><span class="p">)</span><span class="w">
    
</span><span class="n">raster.cropped</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">mask</span><span class="p">(</span><span class="n">rasters</span><span class="p">,</span><span class="w"> </span><span class="n">country_vect</span><span class="p">)</span><span class="w"> 
    
</span><span class="n">raster_change</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rasters</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="o">-</span><span class="n">rasters</span><span class="p">[[</span><span class="m">2</span><span class="p">]]</span><span class="w">

</span><span class="n">raster_change_path</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">file.path</span><span class="p">(</span><span class="n">outputFolder</span><span class="p">,</span><span class="w"> </span><span class="s2">"raster_change.tif"</span><span class="p">)</span><span class="w">
</span><span class="n">writeRaster</span><span class="p">(</span><span class="n">raster_change</span><span class="p">,</span><span class="w"> </span><span class="n">raster_change_path</span><span class="p">)</span><span class="w">

</span><span class="n">biab_output</span><span class="p">(</span><span class="s2">"raster_change"</span><span class="p">,</span><span class="w"> </span><span class="n">raster_change_path</span><span class="p">)</span><span class="w">

</span><span class="n">layer_means</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">global</span><span class="p">(</span><span class="n">raster.cropped</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="o">=</span><span class="s2">"mean"</span><span class="p">,</span><span class="w"> </span><span class="n">na.rm</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">layer_means</span><span class="o">$</span><span class="n">name</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">raster.cropped</span><span class="p">)</span><span class="w">
  
</span><span class="n">means_plot</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">layer_means</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="o">=</span><span class="n">mean</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_point</span><span class="p">()</span><span class="w">
  
</span><span class="n">means_plot_path</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">file.path</span><span class="p">(</span><span class="n">outputFolder</span><span class="p">,</span><span class="w"> </span><span class="s2">"means_plot.png"</span><span class="p">)</span><span class="w">
</span><span class="n">ggsave</span><span class="p">(</span><span class="n">means_plot_path</span><span class="p">,</span><span class="w"> </span><span class="n">means_plot</span><span class="p">)</span><span class="w">
    
</span><span class="n">biab_output</span><span class="p">(</span><span class="s2">"means_plot"</span><span class="p">,</span><span class="w"> </span><span class="n">means_plot_path</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">TestScript.yaml</code></p>

<p>The <code class="language-plaintext highlighter-rouge">inputs</code> and <code class="language-plaintext highlighter-rouge">outputs</code> sections define the script’s inputs and outputs; the names must match those used in the script above. The environment is set up using conda, and a specific package version can be pinned like this: <code class="language-plaintext highlighter-rouge">r-terra=0.9-12</code></p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">script</span><span class="pi">:</span> <span class="s">TestScript.R</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">Test script</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Demo script</span>
<span class="na">author</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">ME</span>
<span class="na">inputs</span><span class="pi">:</span>
  <span class="na">country_polygon</span><span class="pi">:</span>
    <span class="na">label</span><span class="pi">:</span> <span class="s">Country Polygon</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">Polygon of the country of interest</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">application/geo+json</span>
    <span class="na">example</span><span class="pi">:</span> <span class="no">null</span>
  <span class="na">crs</span><span class="pi">:</span>
    <span class="na">label</span><span class="pi">:</span> <span class="s">Coordinate reference system</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">Coordinate reference system</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">text</span>
    <span class="na">example</span><span class="pi">:</span> <span class="s2">"</span><span class="s">EPSG:3857"</span>
  <span class="na">rasters</span><span class="pi">:</span>
    <span class="na">label</span><span class="pi">:</span> <span class="s">Rasters</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">Raster layers of variable of interest</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">image/tiff;application=geotiff[]</span>
    <span class="na">example</span><span class="pi">:</span> <span class="no">null</span> 
<span class="na">outputs</span><span class="pi">:</span>
  <span class="na">raster_change</span><span class="pi">:</span>
    <span class="na">label</span><span class="pi">:</span> <span class="s">Rasters</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">Differences between raster values</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">image/tiff;application=geotiff</span>
  <span class="na">means_plot</span><span class="pi">:</span>
    <span class="na">label</span><span class="pi">:</span> <span class="s">Plot of raster means</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">Plot of means of raster layers</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">image/png</span>
<span class="na">conda</span><span class="pi">:</span>
  <span class="na">channels</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">conda-forge</span>
    <span class="pi">-</span> <span class="s">r</span>
  <span class="na">dependencies</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">r-rjson</span>
    <span class="pi">-</span> <span class="s">r-sf</span>
    <span class="pi">-</span> <span class="s">r-dplyr</span>
    <span class="pi">-</span> <span class="s">r-terra</span>
    <span class="pi">-</span> <span class="s">r-ggplot2</span>
</code></pre></div></div>

<p>The architecture appears to be designed as a single-server instance without built-in job queuing or concurrent execution limits.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="geobon" /><category term="tunbury.org" /><summary type="html"><![CDATA[On a suggestion from Michael, I have had a quick look at BON in a Box, which is a web-based biodiversity analysis platform using Docker containerised pipelines running R, Julia, and Python scripts.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/geobon-colombia.png" /><media:content medium="image" url="https://www.tunbury.org/images/geobon-colombia.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OCaml Functors</title><link href="https://www.tunbury.org/2025/07/01/ocaml-functors/" rel="alternate" type="text/html" title="OCaml Functors" /><published>2025-07-01T00:00:00+00:00</published><updated>2025-07-01T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/07/01/ocaml-functors</id><content type="html" xml:base="https://www.tunbury.org/2025/07/01/ocaml-functors/"><![CDATA[<p>In my OCaml project, I’d like to abstract away the details of running containers into specific modules based on the OS. Currently, I have working container setups for Windows and Linux, and I’ve haphazardly peppered <code class="language-plaintext highlighter-rouge">if Sys.win32 then</code> where I need differentiation, but this is OCaml, so let us use <em>functors</em>!</p>

<p>I started by fleshing out the bare bones in a new project. After <code class="language-plaintext highlighter-rouge">dune init project functor</code>, I created <code class="language-plaintext highlighter-rouge">bin/s.ml</code> containing the signature of the module <code class="language-plaintext highlighter-rouge">CONTAINER</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="k">type</span> <span class="nc">CONTAINER</span> <span class="o">=</span> <span class="k">sig</span>
  <span class="k">val</span> <span class="n">run</span> <span class="o">:</span> <span class="kt">string</span> <span class="o">-&gt;</span> <span class="kt">unit</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Then a trivial <code class="language-plaintext highlighter-rouge">bin/linux.ml</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">run</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"Linux container '%s'</span><span class="se">\n</span><span class="s2">"</span> <span class="n">s</span>
</code></pre></div></div>

<p>And <code class="language-plaintext highlighter-rouge">bin/windows.ml</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">run</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"Windows container '%s'</span><span class="se">\n</span><span class="s2">"</span> <span class="n">s</span>
</code></pre></div></div>

<p>Then in <code class="language-plaintext highlighter-rouge">bin/main.ml</code>, I can select the container implementation once and from then on use <code class="language-plaintext highlighter-rouge">Container.foo</code> to call the appropriate OS-specific function.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">container</span> <span class="o">=</span> <span class="k">if</span> <span class="nn">Sys</span><span class="p">.</span><span class="n">win32</span> <span class="k">then</span> <span class="p">(</span><span class="k">module</span> <span class="nc">Windows</span> <span class="o">:</span> <span class="nn">S</span><span class="p">.</span><span class="nc">CONTAINER</span><span class="p">)</span> <span class="k">else</span> <span class="p">(</span><span class="k">module</span> <span class="nc">Linux</span> <span class="o">:</span> <span class="nn">S</span><span class="p">.</span><span class="nc">CONTAINER</span><span class="p">)</span>

<span class="k">module</span> <span class="nc">Container</span> <span class="o">=</span> <span class="p">(</span><span class="k">val</span> <span class="n">container</span><span class="p">)</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="nn">Container</span><span class="p">.</span><span class="n">run</span> <span class="s2">"Hello, World!"</span>
</code></pre></div></div>

<p>You can additionally create <code class="language-plaintext highlighter-rouge">windows.mli</code> and <code class="language-plaintext highlighter-rouge">linux.mli</code> containing simply <code class="language-plaintext highlighter-rouge">include S.CONTAINER</code>.</p>
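<p>For comparison, the same signature also supports a genuine functor. This is a hypothetical sketch — the <code class="language-plaintext highlighter-rouge">Make</code> functor and <code class="language-plaintext highlighter-rouge">run_all</code> helper are not part of the project — showing how shared behaviour could be built over any <code class="language-plaintext highlighter-rouge">CONTAINER</code>:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Hypothetical: derive OS-agnostic helpers from any CONTAINER. *)
module Make (C : S.CONTAINER) = struct
  let run_all cmds = List.iter C.run cmds
end

(* Select the implementation once, then instantiate the functor. *)
module C = (val (if Sys.win32 then (module Windows : S.CONTAINER)
                 else (module Linux : S.CONTAINER)))

module Local = Make (C)

let () = Local.run_all [ "step one"; "step two" ]
</code></pre></div></div>

<p>The functor keeps the shared logic in one place while the per-OS modules stay minimal.</p>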

<p>Now, let’s imagine that we needed to have some specific configuration options depending upon whether we are running on Windows or Linux. For demonstration purposes, let’s use the user account. On Windows, this is a string, typically <code class="language-plaintext highlighter-rouge">ContainerAdministrator</code>, whereas on Linux, it’s an integer UID of value 0.</p>

<p>We can update the module type in <code class="language-plaintext highlighter-rouge">bin/s.ml</code> to include an abstract type <code class="language-plaintext highlighter-rouge">t</code>, add an <code class="language-plaintext highlighter-rouge">init</code> function that returns a <code class="language-plaintext highlighter-rouge">t</code>, and make <code class="language-plaintext highlighter-rouge">run</code> take a <code class="language-plaintext highlighter-rouge">t</code> as its first parameter.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="k">type</span> <span class="nc">CONTAINER</span> <span class="o">=</span> <span class="k">sig</span>
  <span class="k">type</span> <span class="n">t</span>

  <span class="k">val</span> <span class="n">init</span> <span class="o">:</span> <span class="kt">unit</span> <span class="o">-&gt;</span> <span class="n">t</span>
  <span class="k">val</span> <span class="n">run</span> <span class="o">:</span> <span class="n">t</span> <span class="o">-&gt;</span> <span class="kt">string</span> <span class="o">-&gt;</span> <span class="kt">unit</span>
<span class="k">end</span>
</code></pre></div></div>

<p>In <code class="language-plaintext highlighter-rouge">bin/linux.ml</code>, we can add the type and define <code class="language-plaintext highlighter-rouge">uid</code> as an integer, then add the <code class="language-plaintext highlighter-rouge">init</code> function to return the populated structure. <code class="language-plaintext highlighter-rouge">run</code> now accepts <code class="language-plaintext highlighter-rouge">t</code> as the first parameter.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">t</span> <span class="o">=</span> <span class="p">{</span>
  <span class="n">uid</span> <span class="o">:</span> <span class="kt">int</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">let</span> <span class="n">init</span> <span class="bp">()</span> <span class="o">=</span> <span class="p">{</span> <span class="n">uid</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">}</span>

<span class="k">let</span> <span class="n">run</span> <span class="n">t</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"Linux container user id %i says '%s'</span><span class="se">\n</span><span class="s2">"</span> <span class="n">t</span><span class="o">.</span><span class="n">uid</span> <span class="n">s</span>
</code></pre></div></div>

<p>In a similar vein, <code class="language-plaintext highlighter-rouge">bin/windows.ml</code> is updated like this</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">t</span> <span class="o">=</span> <span class="p">{</span>
  <span class="n">username</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">let</span> <span class="n">init</span> <span class="bp">()</span> <span class="o">=</span> <span class="p">{</span> <span class="n">username</span> <span class="o">=</span> <span class="s2">"ContainerAdministrator"</span> <span class="p">}</span>

<span class="k">let</span> <span class="n">run</span> <span class="n">t</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"Windows container user name %s says '%s'</span><span class="se">\n</span><span class="s2">"</span> <span class="n">t</span><span class="o">.</span><span class="n">username</span> <span class="n">s</span>
</code></pre></div></div>

<p>And finally, in <code class="language-plaintext highlighter-rouge">bin/main.ml</code> we run <code class="language-plaintext highlighter-rouge">Container.init ()</code> and use the returned type as a parameter to <code class="language-plaintext highlighter-rouge">Container.run</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">container</span> <span class="o">=</span> <span class="k">if</span> <span class="nn">Sys</span><span class="p">.</span><span class="n">win32</span> <span class="k">then</span> <span class="p">(</span><span class="k">module</span> <span class="nc">Windows</span> <span class="o">:</span> <span class="nn">S</span><span class="p">.</span><span class="nc">CONTAINER</span><span class="p">)</span> <span class="k">else</span> <span class="p">(</span><span class="k">module</span> <span class="nc">Linux</span> <span class="o">:</span> <span class="nn">S</span><span class="p">.</span><span class="nc">CONTAINER</span><span class="p">)</span>

<span class="k">module</span> <span class="nc">Container</span> <span class="o">=</span> <span class="p">(</span><span class="k">val</span> <span class="n">container</span><span class="p">)</span>

<span class="k">let</span> <span class="n">c</span> <span class="o">=</span> <span class="nn">Container</span><span class="p">.</span><span class="n">init</span> <span class="bp">()</span>
<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="nn">Container</span><span class="p">.</span><span class="n">run</span> <span class="n">c</span> <span class="s2">"Hello, World!"</span>
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[In my OCaml project, I’d like to abstract away the details of running containers into specific modules based on the OS. Currently, I have working container setups for Windows and Linux, and I’ve haphazardly peppered if Sys.win32 then where I need differentiation, but this is OCaml, so let us use functors!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/hot-functors.png" /><media:content medium="image" url="https://www.tunbury.org/images/hot-functors.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Containerd on Windows</title><link href="https://www.tunbury.org/2025/06/27/windows-containerd-3/" rel="alternate" type="text/html" title="Containerd on Windows" /><published>2025-06-27T12:00:00+00:00</published><updated>2025-06-27T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/27/windows-containerd-3</id><content type="html" xml:base="https://www.tunbury.org/2025/06/27/windows-containerd-3/"><![CDATA[<p>Everything was going fine until I ran out of disk space. My NVMe, <code class="language-plaintext highlighter-rouge">C:</code> drive, is only 256GB, but I have a large, 1.7TB SSD available as <code class="language-plaintext highlighter-rouge">D:</code>. How trivial, change a few paths and carry on, but it wasn’t that simple, or was it?</p>

<p>Distilling the problem down to the minimum and excluding all code written by me, the following command fails, but changing <code class="language-plaintext highlighter-rouge">src=d:\cache\opam</code> to <code class="language-plaintext highlighter-rouge">src=c:\cache\opam</code> works. It’s not the content, as it’s just an empty folder.</p>

<pre><code class="language-cmd">ctr run --rm --cni -user ContainerAdministrator -mount type=bind,src=d:\cache\opam,dst=c:\Users\ContainerAdministrator\AppData\Local\opam mcr.microsoft.com/windows/servercore:ltsc2022 my-container  cmd /c "curl.exe -L -o c:\Windows\opam.exe https://github.com/ocaml/opam/releases/download/2.3.0/opam-2.3.0-x86_64-windows.exe &amp;&amp; opam.exe init --debug-level=3 -y"
</code></pre>

<p>The failure point is the ability to create the lock file <code class="language-plaintext highlighter-rouge">config.lock</code>. Checking the code, the log entry is written before the lock is acquired. If <code class="language-plaintext highlighter-rouge">c:\Users\ContainerAdministrator\AppData\Local\opam</code> is not a bind mount, or the bind mount is on <code class="language-plaintext highlighter-rouge">C:</code>, then it works.</p>
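<p>For reference, this kind of lock acquisition boils down to an advisory file lock on the lock file. A minimal POSIX-style sketch — not opam’s actual code, which has its own Windows compatibility layer — using the <code class="language-plaintext highlighter-rouge">Unix</code> library:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Minimal sketch: create the lock file, block until a write lock
   is granted, run f, and release the lock by closing the fd. *)
let with_write_lock path f =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o644 in
  Fun.protect
    ~finally:(fun () -&gt; Unix.close fd)
    (fun () -&gt;
      Unix.lockf fd Unix.F_LOCK 0;  (* blocks until granted *)
      f ())
</code></pre></div></div>

<p>On the bind mount from <code class="language-plaintext highlighter-rouge">D:</code>, it is this lock-file creation step that fails inside the container.</p>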

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>01:26.722  CLIENT                          updating repository state
01:26.722  GSTATE                          LOAD-GLOBAL-STATE @ C:\Users\ContainerAdministrator\AppData\Local\opam
01:26.723  SYSTEM                          LOCK C:\Users\ContainerAdministrator\AppData\Local\opam\lock (none =&gt; read)
01:26.723  SYSTEM                          LOCK C:\Users\ContainerAdministrator\AppData\Local\opam\config.lock (none =&gt; write)
</code></pre></div></div>

<p>Suffice it to say, I spent a long time trying to resolve this. I’ll mention a couple of interesting points that appeared along the way. Firstly, files created on <code class="language-plaintext highlighter-rouge">D:</code> effectively appear as hard links, and the Update Sequence Number, USN, is 0.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">C:\</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">fsutil</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">layout</span><span class="w"> </span><span class="nx">d:\cache\opam\lock</span><span class="w">

</span><span class="o">*********</span><span class="w"> </span><span class="n">File</span><span class="w"> </span><span class="nx">0x000400000001d251</span><span class="w"> </span><span class="o">*********</span><span class="w">
</span><span class="n">File</span><span class="w"> </span><span class="nx">reference</span><span class="w"> </span><span class="nx">number</span><span class="w">   </span><span class="p">:</span><span class="w"> </span><span class="nx">0x000400000001d251</span><span class="w">
</span><span class="n">File</span><span class="w"> </span><span class="nx">attributes</span><span class="w">         </span><span class="p">:</span><span class="w"> </span><span class="nx">0x00000020:</span><span class="w"> </span><span class="nx">Archive</span><span class="w">
</span><span class="n">File</span><span class="w"> </span><span class="nx">entry</span><span class="w"> </span><span class="nx">flags</span><span class="w">        </span><span class="p">:</span><span class="w"> </span><span class="nx">0x00000000</span><span class="w">
</span><span class="n">Link</span><span class="w"> </span><span class="p">(</span><span class="n">ParentID:</span><span class="w"> </span><span class="nx">Name</span><span class="p">)</span><span class="w">   </span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="n">x000c00000000002d:</span><span class="w"> </span><span class="nx">HLINK</span><span class="w"> </span><span class="nx">Name</span><span class="w">   </span><span class="p">:</span><span class="w"> </span><span class="nx">\cache\opam\lock</span><span class="w">
</span><span class="o">...</span><span class="w">
</span><span class="n">LastUsn</span><span class="w">                 </span><span class="p">:</span><span class="w"> </span><span class="nx">0</span><span class="w">
</span><span class="o">...</span><span class="w">
</span></code></pre></div></div>

<p>The reason behind this is down to Windows defaults:</p>

<ol>
  <li>Windows still likes to create the legacy 8.3 MS-DOS file names on the system volume, <code class="language-plaintext highlighter-rouge">C:</code>, which explains the difference between <code class="language-plaintext highlighter-rouge">HLINK</code> and <code class="language-plaintext highlighter-rouge">NTFS+DOS</code>. Running <code class="language-plaintext highlighter-rouge">fsutil 8dot3name set d: 0</code> will enable the creation of the old-style file names.</li>
  <li>Drive <code class="language-plaintext highlighter-rouge">C:</code> has a USN journal created automatically, as it’s required for Windows to operate, but it isn’t created by default on other drives. Running <code class="language-plaintext highlighter-rouge">fsutil usn createjournal d: m=32000000 a=8000000</code> will create the journal.</li>
</ol>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">C:\</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">fsutil</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">layout</span><span class="w"> </span><span class="nx">c:\cache\opam\lock</span><span class="w">

</span><span class="o">*********</span><span class="w"> </span><span class="n">File</span><span class="w"> </span><span class="nx">0x000300000002f382</span><span class="w"> </span><span class="o">*********</span><span class="w">
</span><span class="n">File</span><span class="w"> </span><span class="nx">reference</span><span class="w"> </span><span class="nx">number</span><span class="w">   </span><span class="p">:</span><span class="w"> </span><span class="nx">0x000300000002f382</span><span class="w">
</span><span class="n">File</span><span class="w"> </span><span class="nx">attributes</span><span class="w">         </span><span class="p">:</span><span class="w"> </span><span class="nx">0x00000020:</span><span class="w"> </span><span class="nx">Archive</span><span class="w">
</span><span class="n">File</span><span class="w"> </span><span class="nx">entry</span><span class="w"> </span><span class="nx">flags</span><span class="w">        </span><span class="p">:</span><span class="w"> </span><span class="nx">0x00000000</span><span class="w">
</span><span class="n">Link</span><span class="w"> </span><span class="p">(</span><span class="n">ParentID:</span><span class="w"> </span><span class="nx">Name</span><span class="p">)</span><span class="w">   </span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="n">x000b0000000271d1:</span><span class="w"> </span><span class="nx">NTFS</span><span class="o">+</span><span class="nx">DOS</span><span class="w"> </span><span class="nx">Name:</span><span class="w"> </span><span class="nx">\cache\opam\lock</span><span class="w">
</span><span class="o">...</span><span class="w">
</span><span class="n">LastUsn</span><span class="w">                 </span><span class="p">:</span><span class="w"> </span><span class="nx">16</span><span class="p">,</span><span class="nx">897</span><span class="p">,</span><span class="nx">595</span><span class="p">,</span><span class="nx">224</span><span class="w">
</span><span class="o">...</span><span class="w">
</span></code></pre></div></div>

<p>Sadly, neither of these insights makes any difference to my problem. I did notice that <code class="language-plaintext highlighter-rouge">containerd</code> 2.1.3 had been released, whereas I had been using 2.1.1. Upgrading didn’t fix the issue, but it did change how the network namespaces were created. More on that later.</p>

<p>I decided to both ignore the problem and try it on another machine. After all, this problem was only a problem because <em>my</em> <code class="language-plaintext highlighter-rouge">C:</code> was too small. I created a QEMU VM with a 40GB <code class="language-plaintext highlighter-rouge">C:</code> and a 1TB <code class="language-plaintext highlighter-rouge">D:</code> and installed everything, and it worked fine with the bind mount on <code class="language-plaintext highlighter-rouge">D:</code> even <em>without</em> any of the above tuning and even with <code class="language-plaintext highlighter-rouge">D:</code> formatted using ReFS, rather than NTFS.</p>

<p>Trying on another physical machine with a single large spinning disk as <code class="language-plaintext highlighter-rouge">C:</code> also worked as anticipated.</p>

<p>In both of these new installations, I used <code class="language-plaintext highlighter-rouge">containerd</code> 2.1.3 and noticed that the behaviour I had come to rely upon seemed to have changed. If you recall, in this <a href="https://www.tunbury.org/2025/06/14/windows-containerd-2/">post</a>, I <em>found</em> the network namespace GUID by running <code class="language-plaintext highlighter-rouge">ctr run</code> on a standard Windows container and then <code class="language-plaintext highlighter-rouge">ctr container info</code> in another window. This no longer worked reliably, as the namespace was removed when the container exited. Perhaps it always should have been?</p>

<p>I need to find out how to create these namespaces. PowerShell has a cmdlet <code class="language-plaintext highlighter-rouge">Get-HnsNetwork</code>, but none of the GUID values there match the currently running namespaces I observe from <code class="language-plaintext highlighter-rouge">ctr container info</code>. The source code of <a href="https://github.com/containerd/containerd">containerd</a> is on GitHub.</p>

<p>When you pass <code class="language-plaintext highlighter-rouge">--cni</code> to the <code class="language-plaintext highlighter-rouge">ctr</code> command, it populates the network namespace via <code class="language-plaintext highlighter-rouge">NewNetNS</code>. Here is a snippet from <code class="language-plaintext highlighter-rouge">cmd/ctr/commands/run/run_windows.go</code>:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                <span class="k">if</span> <span class="n">cliContext</span><span class="o">.</span><span class="n">Bool</span><span class="p">(</span><span class="s">"cni"</span><span class="p">)</span> <span class="p">{</span>
                        <span class="n">ns</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">netns</span><span class="o">.</span><span class="n">NewNetNS</span><span class="p">(</span><span class="s">""</span><span class="p">)</span>
                        <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
                                <span class="k">return</span> <span class="no">nil</span><span class="p">,</span> <span class="n">err</span>
                        <span class="p">}</span>
                        <span class="n">opts</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="n">oci</span><span class="o">.</span><span class="n">WithWindowsNetworkNamespace</span><span class="p">(</span><span class="n">ns</span><span class="o">.</span><span class="n">GetPath</span><span class="p">()))</span>
                <span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">NewNetNS</code> is defined in <code class="language-plaintext highlighter-rouge">pkg/netns/netns_windows.go</code>:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// NetNS holds network namespace for sandbox</span>
<span class="k">type</span> <span class="n">NetNS</span> <span class="k">struct</span> <span class="p">{</span>
        <span class="n">path</span> <span class="kt">string</span>
<span class="p">}</span>

<span class="c">// NewNetNS creates a network namespace for the sandbox.</span>
<span class="k">func</span> <span class="n">NewNetNS</span><span class="p">(</span><span class="n">baseDir</span> <span class="kt">string</span><span class="p">)</span> <span class="p">(</span><span class="o">*</span><span class="n">NetNS</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">temp</span> <span class="o">:=</span> <span class="n">hcn</span><span class="o">.</span><span class="n">HostComputeNamespace</span><span class="p">{}</span>
        <span class="n">hcnNamespace</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">temp</span><span class="o">.</span><span class="n">Create</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
                <span class="k">return</span> <span class="no">nil</span><span class="p">,</span> <span class="n">err</span>
        <span class="p">}</span>

        <span class="k">return</span> <span class="o">&amp;</span><span class="n">NetNS</span><span class="p">{</span><span class="n">path</span><span class="o">:</span> <span class="n">hcnNamespace</span><span class="o">.</span><span class="n">Id</span><span class="p">},</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Following the thread, and cutting out a few steps in the interest of brevity, we end up in <code class="language-plaintext highlighter-rouge">vendor/github.com/Microsoft/hcsshim/hcn/zsyscall_windows.go</code> which calls a Win32 API.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">_hcnCreateNamespace</span><span class="p">(</span><span class="n">id</span> <span class="o">*</span><span class="n">_guid</span><span class="p">,</span> <span class="n">settings</span> <span class="o">*</span><span class="kt">uint16</span><span class="p">,</span> <span class="n">namespace</span> <span class="o">*</span><span class="n">hcnNamespace</span><span class="p">,</span> <span class="n">result</span> <span class="o">**</span><span class="kt">uint16</span><span class="p">)</span> <span class="p">(</span><span class="n">hr</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">hr</span> <span class="o">=</span> <span class="n">procHcnCreateNamespace</span><span class="o">.</span><span class="n">Find</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">hr</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
                <span class="k">return</span>
        <span class="p">}</span>
        <span class="n">r0</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">syscall</span><span class="o">.</span><span class="n">SyscallN</span><span class="p">(</span><span class="n">procHcnCreateNamespace</span><span class="o">.</span><span class="n">Addr</span><span class="p">(),</span> <span class="kt">uintptr</span><span class="p">(</span><span class="n">unsafe</span><span class="o">.</span><span class="n">Pointer</span><span class="p">(</span><span class="n">id</span><span class="p">)),</span> <span class="kt">uintptr</span><span class="p">(</span><span class="n">unsafe</span><span class="o">.</span><span class="n">Pointer</span><span class="p">(</span><span class="n">settings</span><span class="p">)),</span> <span class="kt">uintptr</span><span class="p">(</span><span class="n">unsafe</span><span class="o">.</span><span class="n">Pointer</span><span class="p">(</span><span class="n">namespace</span><span class="p">)),</span> <span class="kt">uintptr</span><span class="p">(</span><span class="n">unsafe</span><span class="o">.</span><span class="n">Pointer</span><span class="p">(</span><span class="n">result</span><span class="p">)))</span>
        <span class="k">if</span> <span class="kt">int32</span><span class="p">(</span><span class="n">r0</span><span class="p">)</span> <span class="o">&lt;</span> <span class="m">0</span> <span class="p">{</span>
                <span class="k">if</span> <span class="n">r0</span><span class="o">&amp;</span><span class="m">0x1fff0000</span> <span class="o">==</span> <span class="m">0x00070000</span> <span class="p">{</span>
                        <span class="n">r0</span> <span class="o">&amp;=</span> <span class="m">0xffff</span>
                <span class="p">}</span>
                <span class="n">hr</span> <span class="o">=</span> <span class="n">syscall</span><span class="o">.</span><span class="n">Errno</span><span class="p">(</span><span class="n">r0</span><span class="p">)</span>
        <span class="p">}</span>
        <span class="k">return</span>
<span class="p">}</span>
</code></pre></div></div>

<p>PowerShell provides <code class="language-plaintext highlighter-rouge">Get-HnsNamespace</code> to list available namespaces. These <em>are</em> the <del>droids</del> values I’ve been looking for to put in <code class="language-plaintext highlighter-rouge">config.json</code>! However, by default there are no cmdlets to create them. The PowerShell installation <a href="https://github.com/microsoft/Windows-Containers/blob/Main/helpful_tools/Install-ContainerdRuntime/install-containerd-runtime.ps1">script</a> for <code class="language-plaintext highlighter-rouge">containerd</code> pulls in <a href="https://github.com/microsoft/SDN/blob/master/Kubernetes/windows/hns.psm1">hns.psm1</a>, which has a lot of interesting cmdlets, such as <code class="language-plaintext highlighter-rouge">New-HnsNetwork</code>, but no cmdlet to create a namespace. There is also <a href="https://github.com/microsoft/SDN/blob/master/Kubernetes/windows/hns.v2.psm1">hns.v2.psm1</a>, which does have <code class="language-plaintext highlighter-rouge">New-HnsNamespace</code>.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PS</span><span class="w"> </span><span class="nx">C:\Users\Administrator</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">curl.exe</span><span class="w"> </span><span class="nt">-o</span><span class="w"> </span><span class="nx">hns.v2.psm1</span><span class="w"> </span><span class="nt">-L</span><span class="w"> </span><span class="nx">https://raw.githubusercontent.com/microsoft/SDN/refs/heads/master/Kubernetes/windows/hns.v2.psm1</span><span class="w">
  </span><span class="o">%</span><span class="w"> </span><span class="n">Total</span><span class="w">    </span><span class="o">%</span><span class="w"> </span><span class="nx">Received</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="nx">Xferd</span><span class="w">  </span><span class="nx">Average</span><span class="w"> </span><span class="nx">Speed</span><span class="w">   </span><span class="nx">Time</span><span class="w">    </span><span class="nx">Time</span><span class="w">     </span><span class="nx">Time</span><span class="w">  </span><span class="nx">Current</span><span class="w">
                                 </span><span class="n">Dload</span><span class="w">  </span><span class="nx">Upload</span><span class="w">   </span><span class="nx">Total</span><span class="w">   </span><span class="nx">Spent</span><span class="w">    </span><span class="nx">Left</span><span class="w">  </span><span class="nx">Speed</span><span class="w">
</span><span class="mi">100</span><span class="w"> </span><span class="mi">89329</span><span class="w">  </span><span class="mi">100</span><span class="w"> </span><span class="mi">89329</span><span class="w">    </span><span class="mi">0</span><span class="w">     </span><span class="mi">0</span><span class="w">   </span><span class="mi">349</span><span class="n">k</span><span class="w">      </span><span class="nx">0</span><span class="w"> </span><span class="o">--</span><span class="p">:</span><span class="o">--</span><span class="p">:</span><span class="o">--</span><span class="w"> </span><span class="o">--</span><span class="p">:</span><span class="o">--</span><span class="p">:</span><span class="o">--</span><span class="w"> </span><span class="o">--</span><span class="p">:</span><span class="o">--</span><span class="p">:</span><span class="o">--</span><span class="w">  </span><span class="nx">353k</span><span class="w">

</span><span class="n">PS</span><span class="w"> </span><span class="nx">C:\Users\Administrator</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">Import-Module</span><span class="w"> </span><span class="o">.</span><span class="nx">\hns.v2.psm1</span><span class="w">
</span><span class="n">WARNING:</span><span class="w"> </span><span class="nx">The</span><span class="w"> </span><span class="nx">names</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">some</span><span class="w"> </span><span class="nx">imported</span><span class="w"> </span><span class="nx">commands</span><span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">module</span><span class="w"> </span><span class="s1">'hns.v2'</span><span class="w"> </span><span class="nx">include</span><span class="w"> </span><span class="nx">unapproved</span><span class="w"> </span><span class="nx">verbs</span><span class="w"> </span><span class="nx">that</span><span class="w"> </span><span class="nx">might</span><span class="w"> </span><span class="nx">make</span><span class="w"> </span><span class="nx">them</span><span class="w"> </span><span class="nx">less</span><span class="w"> </span><span class="nx">discoverable.</span><span class="w"> </span><span class="nx">To</span><span class="w"> </span><span class="nx">find</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">commands</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">unapproved</span><span class="w"> </span><span class="nx">verbs</span><span class="p">,</span><span class="w"> </span><span class="nx">run</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">Import-Module</span><span class="w"> </span><span class="nx">command</span><span class="w"> </span><span class="nx">again</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">Verbose</span><span class="w"> </span><span class="nx">parameter.</span><span class="w"> </span><span class="nx">For</span><span 
class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">list</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">approved</span><span class="w"> </span><span class="nx">verbs</span><span class="p">,</span><span class="w"> </span><span class="nx">type</span><span class="w"> </span><span class="nx">Get-Verb.</span><span class="w">

</span><span class="n">PS</span><span class="w"> </span><span class="nx">C:\Users\Administrator</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">New-HnsNamespace</span><span class="w">
</span><span class="n">HcnCreateNamespace</span><span class="w"> </span><span class="o">--</span><span class="w"> </span><span class="nx">HRESULT:</span><span class="w"> </span><span class="nx">2151350299.</span><span class="w"> </span><span class="nx">Result:</span><span class="w"> </span><span class="p">{</span><span class="s2">"Success"</span><span class="p">:</span><span class="n">false</span><span class="p">,</span><span class="s2">"Error"</span><span class="p">:</span><span class="s2">"Invalid JSON document string. &amp;#123;&amp;#123;CreateWithCompartment,UnknownField}}"</span><span class="p">,</span><span class="s2">"ErrorCode"</span><span class="p">:</span><span class="nx">2151350299</span><span class="p">}</span><span class="w">
</span><span class="n">At</span><span class="w"> </span><span class="nx">C:\Users\Administrator\hns.v2.psm1:2392</span><span class="w"> </span><span class="nx">char:13</span><span class="w">
</span><span class="o">+</span><span class="w">             </span><span class="kr">throw</span><span class="w"> </span><span class="nv">$errString</span><span class="w">
</span><span class="o">+</span><span class="w">             </span><span class="n">~~~~~~~~~~~~~~~~</span><span class="w">
    </span><span class="o">+</span><span class="w"> </span><span class="nx">CategoryInfo</span><span class="w">          </span><span class="p">:</span><span class="w"> </span><span class="nx">OperationStopped:</span><span class="w"> </span><span class="p">(</span><span class="n">HcnCreateNamesp...de</span><span class="s2">":2151350299}:String) [], RuntimeException
    + FullyQualifiedErrorId : HcnCreateNamespace -- HRESULT: 2151350299. Result: {"</span><span class="nx">Success</span><span class="s2">":false,"</span><span class="nx">Error</span><span class="s2">":"</span><span class="nx">Invalid</span><span class="w"> </span><span class="nx">JSON</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="nx">string.</span><span class="w"> </span><span class="o">&amp;</span><span class="c">#123;&amp;#123;CreateWithCompartment,UnknownField}}","ErrorCode":2151350299}</span><span class="w">
</span></code></pre></div></div>
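<p>Decoding that HRESULT makes the failure easier to search for: 2151350299 is 0x803B001B, i.e. the failure bit set with facility 0x3B, which appears to be the range hcsshim uses for HCN errors (the embedded message, “Invalid JSON document string”, is the useful part). A quick sanity check:</p>

```python
hr = 2151350299  # HRESULT reported by New-HnsNamespace

print(f"0x{hr:08X}")                          # 0x803B001B
print("failure:", bool(hr >> 31))             # severity (failure) bit is set
print("facility:", hex((hr >> 16) & 0x1FFF))  # 0x3b
print("code:", hr & 0xFFFF)                   # 27
```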

<p>With a lot of frustration, I decided to have a go at calling the Win32 API from OCaml. This resulted in <a href="https://github.com/mtelvers/hcn-namespace">mtelvers/hcn-namespace</a>, which allows me to create the namespaces by running <code class="language-plaintext highlighter-rouge">hcn-namespace create</code>. These namespaces appear in the output from <code class="language-plaintext highlighter-rouge">Get-HnsNamespace</code> and work correctly in <code class="language-plaintext highlighter-rouge">config.json</code>.</p>
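<p>For context, the field lives in the Windows section of the OCI runtime spec; a minimal <code class="language-plaintext highlighter-rouge">config.json</code> fragment might look like this, where the GUID placeholder is replaced by the value printed by <code class="language-plaintext highlighter-rouge">hcn-namespace create</code>:</p>

```json
{
  "windows": {
    "network": {
      "networkNamespace": "<GUID>"
    }
  }
}
```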

<p>Run <code class="language-plaintext highlighter-rouge">hcn-namespace.exe create</code>, and then populate <code class="language-plaintext highlighter-rouge">"networkNamespace": "&lt;GUID&gt;"</code> with the GUID provided and run with <code class="language-plaintext highlighter-rouge">ctr run --rm --cni --config config.json</code>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="containerd" /><category term="tunbury.org" /><summary type="html"><![CDATA[Everything was going fine until I ran out of disk space. My NVMe, C: drive, is only 256GB, but I have a large, 1.7TB SSD available as D:. How trivial, change a few paths and carry on, but it wasn’t that simple, or was it?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/containerd.png" /><media:content medium="image" url="https://www.tunbury.org/images/containerd.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Improve the deployment time for opam2web</title><link href="https://www.tunbury.org/2025/06/24/opam2web/" rel="alternate" type="text/html" title="Improve the deployment time for opam2web" /><published>2025-06-24T00:00:00+00:00</published><updated>2025-06-24T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/24/opam2web</id><content type="html" xml:base="https://www.tunbury.org/2025/06/24/opam2web/"><![CDATA[<p>The opam2web image for <a href="https://opam.ocaml.org">opam.ocaml.org</a> is huge, weighing in at more than 25 GB. The bulk of this data is opam archives, which are updated and copied into a stock caddy image.</p>

<p>There are two archives, <code class="language-plaintext highlighter-rouge">ocaml/opam.ocaml.org-legacy</code>, which hasn’t changed for 5 years and holds the cache for opam 1.x and <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code>, which is updated weekly.</p>

<p>The current <code class="language-plaintext highlighter-rouge">Dockerfile</code> copies these files into a new layer each time opam2web builds.</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">--platform=linux/amd64 ocaml/opam:archive</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s">opam-archive</span>
<span class="k">FROM</span><span class="w"> </span><span class="s">ocaml/opam.ocaml.org-legacy</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s">opam-legacy</span>
<span class="k">FROM</span><span class="w"> </span><span class="s">alpine:3.20</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s">opam2web</span>
...
<span class="k">COPY</span><span class="s"> --from=opam-legacy . /www</span>
...
<span class="k">RUN </span><span class="nt">--mount</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span><span class="nb">bind</span>,target<span class="o">=</span>/cache,from<span class="o">=</span>opam-archive rsync <span class="nt">-aH</span> /cache/cache/ /www/cache/
...
</code></pre></div></div>

<p>And later, the entire <code class="language-plaintext highlighter-rouge">/www</code> structure is copied into a <code class="language-plaintext highlighter-rouge">caddy:2.8.4</code> image.</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> caddy:2.8.4</span>
<span class="k">WORKDIR</span><span class="s"> /srv</span>
<span class="k">COPY</span><span class="s"> --from=opam2web /www /usr/share/caddy</span>
<span class="k">COPY</span><span class="s"> Caddyfile /etc/caddy/Caddyfile</span>
<span class="k">ENTRYPOINT</span><span class="s"> ["caddy", "run", "--config", "/etc/caddy/Caddyfile", "--adapter", "caddyfile"]</span>
</code></pre></div></div>

<p>This method is considered “best practice” when creating Docker images, but in this case, it produces a very large image, which takes a long time to deploy.</p>

<p>For Docker to use an existing layer, we need the final <code class="language-plaintext highlighter-rouge">FROM ...</code> to be the layer we want to use as the base. In the above snippet, the <code class="language-plaintext highlighter-rouge">caddy:2.8.4</code> layer will be the base layer and will be reused.</p>
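<p>Flipping that around, if the large archive image is the final <code class="language-plaintext highlighter-rouge">FROM</code>, it becomes the shared base and a deploy only needs to pull the thin layers on top. A rough sketch of the shape (the real Dockerfile needs more tooling than just caddy):</p>

```dockerfile
# Sketch: use the large, rarely-changing archive image as the base layer,
# so hosts that already have it reuse it, and add only the small caddy
# package and its configuration on top.
FROM ocaml/opam:archive
RUN apk add --update caddy
COPY Caddyfile /etc/caddy/Caddyfile
ENTRYPOINT ["caddy", "run", "--config", "/etc/caddy/Caddyfile", "--adapter", "caddyfile"]
```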

<p>The archive, <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code>, is created by this Dockerfile, which ultimately uses <code class="language-plaintext highlighter-rouge">alpine:latest</code>.</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">ocaml/opam:archive</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">opam-archive</span>
<span class="k">FROM</span><span class="w"> </span><span class="s">ocurrent/opam-staging@sha256:f921cd51dda91f61a52a2c26a8a188f8618a2838e521d3e4afa3ca1da637903e</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">archive</span>
<span class="k">WORKDIR</span><span class="s"> /home/opam/opam-repository</span>
<span class="k">RUN </span><span class="nt">--mount</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span><span class="nb">bind</span>,target<span class="o">=</span>/cache,from<span class="o">=</span>opam-archive rsync <span class="nt">-aH</span> /cache/cache/ /home/opam/opam-repository/cache/
<span class="k">RUN </span>opam admin cache <span class="nt">--link</span><span class="o">=</span>/home/opam/opam-repository/cache

<span class="k">FROM</span><span class="s"> alpine:latest</span>
<span class="k">COPY</span><span class="s"> --chown=0:0 --from=archive [ "/home/opam/opam-repository/cache", "/cache" ]</span>
</code></pre></div></div>

<p>In our opam2web build, we could use <code class="language-plaintext highlighter-rouge">FROM ocaml/opam:archive</code> and then <code class="language-plaintext highlighter-rouge">apk add caddy</code>, which would reuse the entire 15GB layer and add the few megabytes for <code class="language-plaintext highlighter-rouge">caddy</code>.</p>

<p><code class="language-plaintext highlighter-rouge">ocaml/opam.ocaml.org-legacy</code> is another 8GB. This legacy data could be integrated by adding it to <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code> in a different directory to ensure compatibility with anyone else using this image. This is <a href="https://github.com/ocurrent/docker-base-images/pull/324">PR#324</a>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">let</span> <span class="n">install_package_archive</span> <span class="n">opam_image</span> <span class="o">=</span>
   <span class="k">let</span> <span class="k">open</span> <span class="nc">Dockerfile</span> <span class="k">in</span>
<span class="o">+</span>  <span class="n">from</span> <span class="o">~</span><span class="n">alias</span><span class="o">:</span><span class="s2">"opam-legacy"</span> <span class="s2">"ocaml/opam.ocaml.org-legacy"</span> <span class="o">@@</span>
   <span class="n">from</span> <span class="o">~</span><span class="n">alias</span><span class="o">:</span><span class="s2">"opam-archive"</span> <span class="s2">"ocaml/opam:archive"</span> <span class="o">@@</span>
   <span class="n">from</span> <span class="o">~</span><span class="n">alias</span><span class="o">:</span><span class="s2">"archive"</span> <span class="n">opam_image</span> <span class="o">@@</span>
   <span class="n">workdir</span> <span class="s2">"/home/opam/opam-repository"</span> <span class="o">@@</span>
   <span class="n">run</span> <span class="o">~</span><span class="n">mounts</span><span class="o">:</span><span class="p">[</span><span class="n">mount_bind</span> <span class="o">~</span><span class="n">target</span><span class="o">:</span><span class="s2">"/cache"</span> <span class="o">~</span><span class="n">from</span><span class="o">:</span><span class="s2">"opam-archive"</span> <span class="bp">()</span><span class="p">]</span> <span class="s2">"rsync -aH /cache/cache/ /home/opam/opam-repository/cache/"</span> <span class="o">@@</span>
   <span class="n">run</span> <span class="s2">"opam admin cache --link=/home/opam/opam-repository/cache"</span> <span class="o">@@</span>
   <span class="n">from</span> <span class="s2">"alpine:latest"</span> <span class="o">@@</span>
<span class="o">+</span>  <span class="n">copy</span> <span class="o">~</span><span class="n">chown</span><span class="o">:</span><span class="s2">"0:0"</span> <span class="o">~</span><span class="n">from</span><span class="o">:</span><span class="s2">"opam-legacy"</span> <span class="o">~</span><span class="n">src</span><span class="o">:</span><span class="p">[</span><span class="s2">"/"</span><span class="p">]</span> <span class="o">~</span><span class="n">dst</span><span class="o">:</span><span class="s2">"/legacy"</span> <span class="bp">()</span> <span class="o">@@</span>
   <span class="n">copy</span> <span class="o">~</span><span class="n">chown</span><span class="o">:</span><span class="s2">"0:0"</span> <span class="o">~</span><span class="n">from</span><span class="o">:</span><span class="s2">"archive"</span> <span class="o">~</span><span class="n">src</span><span class="o">:</span><span class="p">[</span><span class="s2">"/home/opam/opam-repository/cache"</span><span class="p">]</span> <span class="o">~</span><span class="n">dst</span><span class="o">:</span><span class="s2">"/cache"</span> <span class="bp">()</span>
</code></pre></div></div>

<p>Finally, we need to update <a href="https://github.com/ocaml-opam/opam2web">opam2web</a> to use <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code> as the base layer rather than <code class="language-plaintext highlighter-rouge">caddy:2.8.4</code>, resulting in the final part of the <code class="language-plaintext highlighter-rouge">Dockerfile</code> looking like this.</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ocaml/opam:archive</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> git curl rsync libstdc++ rdfind caddy
<span class="k">COPY</span><span class="s"> --from=build-opam2web /opt/opam2web /usr/local</span>
<span class="k">COPY</span><span class="s"> --from=build-opam-doc /usr/bin/opam-dev /usr/local/bin/opam</span>
<span class="k">COPY</span><span class="s"> --from=build-opam-doc /opt/opam/doc /usr/local/share/opam2web/content/doc</span>
<span class="k">COPY</span><span class="s"> ext/key/opam-dev-team.pgp /www/opam-dev-pubkey.pgp</span>
<span class="k">ADD</span><span class="s"> bin/opam-web.sh /usr/local/bin</span>
<span class="k">ARG</span><span class="s"> DOMAIN=opam.ocaml.org</span>
<span class="k">ARG</span><span class="s"> OPAM_REPO_GIT_SHA=master</span>
<span class="k">ARG</span><span class="s"> BLOG_GIT_SHA=master</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="k">${</span><span class="nv">OPAM_REPO_GIT_SHA</span><span class="k">}</span> <span class="o">&gt;&gt;</span> /www/opam_git_sha
<span class="k">RUN </span><span class="nb">echo</span> <span class="k">${</span><span class="nv">BLOG_GIT_SHA</span><span class="k">}</span> <span class="o">&gt;&gt;</span> /www/blog_git_sha
<span class="k">RUN </span>/usr/local/bin/opam-web.sh <span class="k">${</span><span class="nv">DOMAIN</span><span class="k">}</span> <span class="k">${</span><span class="nv">OPAM_REPO_GIT_SHA</span><span class="k">}</span> <span class="k">${</span><span class="nv">BLOG_GIT_SHA</span><span class="k">}</span>
<span class="k">WORKDIR</span><span class="s"> /srv</span>
<span class="k">COPY</span><span class="s"> Caddyfile /etc/caddy/Caddyfile</span>
<span class="k">ENTRYPOINT</span><span class="s"> ["caddy", "run", "--config", "/etc/caddy/Caddyfile", "--adapter", "caddyfile"]</span>
</code></pre></div></div>

<p>I acknowledge that this final image now contains some extra, unneeded packages, such as <code class="language-plaintext highlighter-rouge">git</code> and <code class="language-plaintext highlighter-rouge">curl</code>, but this seems a minor inconvenience.</p>

<p>The <code class="language-plaintext highlighter-rouge">Caddyfile</code> can be adjusted to make everything still appear to be in the same place:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>:80 {
	redir /install.sh https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh
	redir /install.ps1 https://raw.githubusercontent.com/ocaml/opam/master/shell/install.ps1

	@version_paths path /1.1/* /1.2.0/* /1.2.2/*
	handle @version_paths {
		root * /legacy
		file_server
	}

	handle /cache/* {
		root * /
		file_server
	}

	handle {
		root * /www
		file_server
	}
}
</code></pre></div></div>

<p>In this configuration, the Docker <em>push</em> is only 650MB rather than 25GB.</p>

<p>The changes to opam2web are in <a href="https://github.com/ocaml-opam/opam2web/pull/245">PR#245</a>.</p>

<p>Test with some external URLs:</p>

<ul>
  <li><a href="https://staging.opam.ocaml.org/index.tar.gz">https://staging.opam.ocaml.org/index.tar.gz</a></li>
  <li><a href="https://staging.opam.ocaml.org/archives/0install.2.18/0install-2.18.tbz">https://staging.opam.ocaml.org/archives/0install.2.18/0install-2.18.tbz</a></li>
  <li><a href="https://staging.opam.ocaml.org/cache/0install.2.18/0install-2.18.tbz">https://staging.opam.ocaml.org/cache/0install.2.18/0install-2.18.tbz</a></li>
  <li><a href="https://staging.opam.ocaml.org/1.2.2/archives/0install.2.12.3+opam.tar.gz">https://staging.opam.ocaml.org/1.2.2/archives/0install.2.12.3+opam.tar.gz</a></li>
  <li><a href="https://staging.opam.ocaml.org/1.2.0/archives/0install.2.12.1+opam.tar.gz">https://staging.opam.ocaml.org/1.2.0/archives/0install.2.12.1+opam.tar.gz</a></li>
  <li><a href="https://staging.opam.ocaml.org/1.1/archives/0install.2.10+opam.tar.gz">https://staging.opam.ocaml.org/1.1/archives/0install.2.10+opam.tar.gz</a></li>
  <li><a href="https://staging.opam.ocaml.org/opam_git_sha">https://staging.opam.ocaml.org/opam_git_sha</a></li>
  <li><a href="https://staging.opam.ocaml.org/blog_git_sha">https://staging.opam.ocaml.org/blog_git_sha</a></li>
  <li><a href="https://staging.opam.ocaml.org/opam-dev-pubkey.pgp">https://staging.opam.ocaml.org/opam-dev-pubkey.pgp</a></li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[The opam2web image for opam.ocaml.org is huge, weighing in at more than 25 GB. The bulk of this data is opam archives, which are updated and copied into a stock caddy image.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Transitive Reduction of Package Graph</title><link href="https://www.tunbury.org/2025/06/23/transitive-reduction/" rel="alternate" type="text/html" title="Transitive Reduction of Package Graph" /><published>2025-06-23T00:00:00+00:00</published><updated>2025-06-23T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/23/transitive-reduction</id><content type="html" xml:base="https://www.tunbury.org/2025/06/23/transitive-reduction/"><![CDATA[<p>I have previously written about using a <a href="https://www.tunbury.org/topological-sort/">topological sort</a> of a directed acyclic graph (DAG) of package dependencies to create an ordered list of installation operations. I now want to create a transitive reduction, giving a graph with the same vertices and the same reachability, but with the fewest edges possible.</p>

<p>This is interesting for opam, where a typical package is defined to depend upon both OCaml and Dune. However, Dune itself depends upon OCaml, so, minimally, the package need only depend upon Dune. In opam we would typically list both anyway, as each may carry its own version constraints.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">depends</span><span class="pi">:</span> <span class="pi">[</span>
  <span class="s2">"</span><span class="s">dune"</span> <span class="pi">{</span><span class="err">&gt;</span><span class="nv">= "3.17"</span><span class="pi">}</span>
  <span class="s2">"</span><span class="s">ocaml"</span>
<span class="pi">]</span>
</code></pre></div></div>

<p>Given a topologically sorted list of packages, we can fold over the list to build a map from each package to its full set of transitive dependencies. As each package is considered in turn, it either has no dependencies or each of its dependencies is already in the map, so their dependency sets can simply be unioned in.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">pkg_deps</span> <span class="n">solution</span> <span class="o">=</span>
  <span class="nn">List</span><span class="p">.</span><span class="n">fold_left</span> <span class="p">(</span><span class="k">fun</span> <span class="n">map</span> <span class="n">pkg</span> <span class="o">-&gt;</span>
    <span class="k">let</span> <span class="n">deps_direct</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">find</span> <span class="n">pkg</span> <span class="n">solution</span> <span class="k">in</span>
    <span class="k">let</span> <span class="n">deps_plus_children</span> <span class="o">=</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">fold</span> <span class="p">(</span><span class="k">fun</span> <span class="n">pkg</span> <span class="n">acc</span> <span class="o">-&gt;</span>
      <span class="nn">PackageSet</span><span class="p">.</span><span class="n">union</span> <span class="n">acc</span> <span class="p">(</span><span class="nn">PackageMap</span><span class="p">.</span><span class="n">find</span> <span class="n">pkg</span> <span class="n">map</span><span class="p">))</span> <span class="n">deps_direct</span> <span class="n">deps_direct</span> <span class="k">in</span>
    <span class="nn">PackageMap</span><span class="p">.</span><span class="n">add</span> <span class="n">pkg</span> <span class="n">deps_plus_children</span> <span class="n">map</span><span class="p">)</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">empty</span><span class="p">;;</span>
</code></pre></div></div>

<p>To generate the transitive reduction, take each package’s set of direct dependencies and remove every dependency that is already reachable through the transitive dependencies of one of the other direct dependencies.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">reduce</span> <span class="n">dependencies</span> <span class="o">=</span>
  <span class="nn">PackageMap</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">u</span> <span class="o">-&gt;</span>
    <span class="nn">PackageSet</span><span class="p">.</span><span class="n">filter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">v</span> <span class="o">-&gt;</span>
      <span class="k">let</span> <span class="n">others</span> <span class="o">=</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">remove</span> <span class="n">v</span> <span class="n">u</span> <span class="k">in</span>
      <span class="nn">PackageSet</span><span class="p">.</span><span class="n">fold</span> <span class="p">(</span><span class="k">fun</span> <span class="n">o</span> <span class="n">acc</span> <span class="o">-&gt;</span>
        <span class="n">acc</span> <span class="o">||</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">mem</span> <span class="n">v</span> <span class="p">(</span><span class="nn">PackageMap</span><span class="p">.</span><span class="n">find</span> <span class="n">o</span> <span class="n">dependencies</span><span class="p">)</span>
      <span class="p">)</span> <span class="n">others</span> <span class="bp">false</span> <span class="o">|&gt;</span> <span class="n">not</span>
    <span class="p">)</span> <span class="n">u</span>
  <span class="p">);;</span>
</code></pre></div></div>

<p>Let’s create a quick print function and then test the code:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">print</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">iter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">p</span> <span class="n">deps</span> <span class="o">-&gt;</span>
  <span class="n">print_endline</span> <span class="p">(</span><span class="n">p</span> <span class="o">^</span> <span class="s2">": "</span> <span class="o">^</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.</span><span class="n">to_list</span> <span class="n">deps</span> <span class="o">|&gt;</span> <span class="nn">String</span><span class="p">.</span><span class="n">concat</span> <span class="s2">","</span><span class="p">))</span>
<span class="p">);;</span>
</code></pre></div></div>

<p>The original solution is</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">#</span> <span class="n">print</span> <span class="n">dune</span><span class="p">;;</span>
<span class="n">base</span><span class="o">-</span><span class="n">threads</span><span class="o">.</span><span class="n">base</span><span class="o">:</span>
<span class="n">base</span><span class="o">-</span><span class="n">unix</span><span class="o">.</span><span class="n">base</span><span class="o">:</span>
<span class="n">dune</span><span class="o">:</span> <span class="n">base</span><span class="o">-</span><span class="n">threads</span><span class="o">.</span><span class="n">base</span><span class="o">,</span><span class="n">base</span><span class="o">-</span><span class="n">unix</span><span class="o">.</span><span class="n">base</span><span class="o">,</span><span class="n">ocaml</span>
<span class="n">ocaml</span><span class="o">:</span> <span class="n">ocaml</span><span class="o">-</span><span class="n">config</span><span class="o">,</span><span class="n">ocaml</span><span class="o">-</span><span class="n">variants</span>
<span class="n">ocaml</span><span class="o">-</span><span class="n">config</span><span class="o">:</span> <span class="n">ocaml</span><span class="o">-</span><span class="n">variants</span>
<span class="n">ocaml</span><span class="o">-</span><span class="n">variants</span><span class="o">:</span>
<span class="o">-</span> <span class="o">:</span> <span class="kt">unit</span> <span class="o">=</span> <span class="bp">()</span>
</code></pre></div></div>

<p>And the reduced solution is:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">#</span> <span class="k">let</span> <span class="n">dependencies</span> <span class="o">=</span> <span class="n">pkg_deps</span> <span class="n">dune</span> <span class="p">(</span><span class="n">topological_sort</span> <span class="n">dune</span><span class="p">);;</span>
<span class="k">val</span> <span class="n">dependencies</span> <span class="o">:</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">t</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">t</span> <span class="o">=</span> <span class="o">&lt;</span><span class="n">abstr</span><span class="o">&gt;</span>
<span class="o">#</span> <span class="n">print</span> <span class="p">(</span><span class="n">reduce</span> <span class="n">dependencies</span> <span class="n">dune</span><span class="p">);;</span>
<span class="n">base</span><span class="o">-</span><span class="n">threads</span><span class="o">.</span><span class="n">base</span><span class="o">:</span>
<span class="n">base</span><span class="o">-</span><span class="n">unix</span><span class="o">.</span><span class="n">base</span><span class="o">:</span>
<span class="n">dune</span><span class="o">:</span> <span class="n">base</span><span class="o">-</span><span class="n">threads</span><span class="o">.</span><span class="n">base</span><span class="o">,</span><span class="n">base</span><span class="o">-</span><span class="n">unix</span><span class="o">.</span><span class="n">base</span><span class="o">,</span><span class="n">ocaml</span>
<span class="n">ocaml</span><span class="o">:</span> <span class="n">ocaml</span><span class="o">-</span><span class="n">config</span>
<span class="n">ocaml</span><span class="o">-</span><span class="n">config</span><span class="o">:</span> <span class="n">ocaml</span><span class="o">-</span><span class="n">variants</span>
<span class="n">ocaml</span><span class="o">-</span><span class="n">variants</span><span class="o">:</span>
<span class="o">-</span> <span class="o">:</span> <span class="kt">unit</span> <span class="o">=</span> <span class="bp">()</span>
</code></pre></div></div>

<p>This doesn’t look like much of a difference, but when applied to a larger graph, for example, 0install.2.18, the reduction is quite dramatic.</p>
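<p>To experiment with these functions outside the original project, the pieces above can be condensed into a self-contained sketch. The five-package graph below is an illustrative stand-in (not real solver output), and the module definitions are the obvious string-based instantiations of <code class="language-plaintext highlighter-rouge">PackageMap</code> and <code class="language-plaintext highlighter-rouge">PackageSet</code>:</p>

```ocaml
(* Self-contained sketch of pkg_deps/reduce over a hypothetical graph. *)
module PackageSet = Set.Make (String)
module PackageMap = Map.Make (String)

(* Direct dependencies of each package, mirroring the dune/ocaml example. *)
let solution =
  PackageMap.of_seq
    (List.to_seq
       [ ("ocaml-variants", PackageSet.empty);
         ("ocaml-config", PackageSet.of_list [ "ocaml-variants" ]);
         ("ocaml", PackageSet.of_list [ "ocaml-config"; "ocaml-variants" ]);
         ("dune", PackageSet.of_list [ "ocaml" ]);
         ("pkg", PackageSet.of_list [ "dune"; "ocaml" ]) ])

(* A topological order: every package appears after its dependencies. *)
let order = [ "ocaml-variants"; "ocaml-config"; "ocaml"; "dune"; "pkg" ]

(* Transitive closure: folding in topological order guarantees that each
   dependency's own closure is already in the map. *)
let pkg_deps solution order =
  List.fold_left
    (fun map pkg ->
      let direct = PackageMap.find pkg solution in
      let all =
        PackageSet.fold
          (fun p acc -> PackageSet.union acc (PackageMap.find p map))
          direct direct
      in
      PackageMap.add pkg all map)
    PackageMap.empty order

(* Drop any edge u -> v that is reachable via another direct dependency. *)
let reduce dependencies solution =
  PackageMap.map
    (fun u ->
      PackageSet.filter
        (fun v ->
          not
            (PackageSet.exists
               (fun o -> PackageSet.mem v (PackageMap.find o dependencies))
               (PackageSet.remove v u)))
        u)
    solution

let () =
  let reduced = reduce (pkg_deps solution order) solution in
  PackageMap.iter
    (fun p deps ->
      print_endline
        (p ^ ": " ^ String.concat "," (PackageSet.elements deps)))
    reduced
```

<p>Running this prints <code class="language-plaintext highlighter-rouge">pkg: dune</code> rather than <code class="language-plaintext highlighter-rouge">pkg: dune,ocaml</code>: the direct edge from <code class="language-plaintext highlighter-rouge">pkg</code> to <code class="language-plaintext highlighter-rouge">ocaml</code> is removed because <code class="language-plaintext highlighter-rouge">ocaml</code> is already reachable through <code class="language-plaintext highlighter-rouge">dune</code>.</p>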

<p>Initial graph</p>

<p><img src="/images/0install-graph.png" alt="opam installation graph for 0install" /></p>

<p>Transitive reduction</p>

<p><img src="/images/0install-reduced-graph.png" alt="Transitive reduction of the opam installation graph for 0install" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[I have previously written about using a topological sort of a directed acyclic graph (DAG) of package dependencies to create an ordered list of installation operations. I now want to create a transitive reduction, giving a graph with the same vertices and the fewest number of edges possible.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/dune-graph.png" /><media:content medium="image" url="https://www.tunbury.org/images/dune-graph.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Surprising C++ failures in the macOS workers</title><link href="https://www.tunbury.org/2025/06/21/macos-sequoia-include-path/" rel="alternate" type="text/html" title="Surprising C++ failures in the macOS workers" /><published>2025-06-21T00:00:00+00:00</published><updated>2025-06-21T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/21/macos-sequoia-include-path</id><content type="html" xml:base="https://www.tunbury.org/2025/06/21/macos-sequoia-include-path/"><![CDATA[<p>@mseri raised <a href="https://github.com/ocaml/infrastructure/issues/175">issue #175</a> as the macOS workers cannot find the most basic C++ headers. I easily eliminated <a href="https://github.com/ocurrent/obuilder">Obuilder</a>, as <code class="language-plaintext highlighter-rouge">opam install mccs.1.1+19</code> didn’t work on the macOS workers natively.</p>

<p>At face value, the problem appears pretty common, and there are numerous threads on <a href="https://stackoverflow.com">Stack Overflow</a>, such as this <a href="https://stackoverflow.com/questions/77250743/mac-xcode-g-cannot-compile-even-a-basic-c-program-issues-with-standard-libr">one</a>; however, the resolutions I tried didn’t work. I was reluctant to try some of the more intrusive changes, like creating a symlink for every header from <code class="language-plaintext highlighter-rouge">/usr/include/</code> to <code class="language-plaintext highlighter-rouge">/Library/Developer/CommandLineTools/usr/include/c++/v1</code>, as this doesn’t seem to be what Apple intends.</p>

<p>For the record, a program such as this:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Hello World!"</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Fails like this:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% c++ hello.cpp <span class="nt">-o</span> hello <span class="nt">-v</span>
Apple clang version 17.0.0 <span class="o">(</span>clang-1700.0.13.3<span class="o">)</span>
Target: x86_64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
 <span class="s2">"/Library/Developer/CommandLineTools/usr/bin/clang"</span> <span class="nt">-cc1</span> <span class="nt">-triple</span> x86_64-apple-macosx15.0.0 <span class="nt">-Wundef-prefix</span><span class="o">=</span>TARGET_OS_ <span class="nt">-Wdeprecated-objc-isa-usage</span> <span class="nt">-Werror</span><span class="o">=</span>deprecated-objc-isa-usage <span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration <span class="nt">-emit-obj</span> <span class="nt">-dumpdir</span> hello- <span class="nt">-disable-free</span> <span class="nt">-clear-ast-before-backend</span> <span class="nt">-disable-llvm-verifier</span> <span class="nt">-discard-value-names</span> <span class="nt">-main-file-name</span> hello.cpp <span class="nt">-mrelocation-model</span> pic <span class="nt">-pic-level</span> 2 <span class="nt">-mframe-pointer</span><span class="o">=</span>all <span class="nt">-fno-strict-return</span> <span class="nt">-ffp-contract</span><span class="o">=</span>on <span class="nt">-fno-rounding-math</span> <span class="nt">-funwind-tables</span><span class="o">=</span>2 <span class="nt">-target-sdk-version</span><span class="o">=</span>15.4 <span class="nt">-fvisibility-inlines-hidden-static-local-var</span> <span class="nt">-fdefine-target-os-macros</span> <span class="nt">-fno-assume-unique-vtables</span> <span class="nt">-fno-modulemap-allow-subdirectory-search</span> <span class="nt">-target-cpu</span> penryn <span class="nt">-tune-cpu</span> generic <span class="nt">-debugger-tuning</span><span class="o">=</span>lldb <span class="nt">-fdebug-compilation-dir</span><span class="o">=</span>/Users/administrator/x <span class="nt">-target-linker-version</span> 1167.4.1 <span class="nt">-v</span> <span class="nt">-fcoverage-compilation-dir</span><span class="o">=</span>/Users/administrator/x <span class="nt">-resource-dir</span> /Library/Developer/CommandLineTools/usr/lib/clang/17 <span class="nt">-isysroot</span> 
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk <span class="nt">-internal-isystem</span> /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1 <span class="nt">-internal-isystem</span> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include <span class="nt">-internal-isystem</span> /Library/Developer/CommandLineTools/usr/lib/clang/17/include <span class="nt">-internal-externc-isystem</span> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include <span class="nt">-internal-externc-isystem</span> /Library/Developer/CommandLineTools/usr/include <span class="nt">-Wno-reorder-init-list</span> <span class="nt">-Wno-implicit-int-float-conversion</span> <span class="nt">-Wno-c99-designator</span> <span class="nt">-Wno-final-dtor-non-final-class</span> <span class="nt">-Wno-extra-semi-stmt</span> <span class="nt">-Wno-misleading-indentation</span> <span class="nt">-Wno-quoted-include-in-framework-header</span> <span class="nt">-Wno-implicit-fallthrough</span> <span class="nt">-Wno-enum-enum-conversion</span> <span class="nt">-Wno-enum-float-conversion</span> <span class="nt">-Wno-elaborated-enum-base</span> <span class="nt">-Wno-reserved-identifier</span> <span class="nt">-Wno-gnu-folding-constant</span> <span class="nt">-fdeprecated-macro</span> <span class="nt">-ferror-limit</span> 19 <span class="nt">-stack-protector</span> 1 <span class="nt">-fstack-check</span> <span class="nt">-mdarwin-stkchk-strong-link</span> <span class="nt">-fblocks</span> <span class="nt">-fencode-extended-block-signature</span> <span class="nt">-fregister-global-dtors-with-atexit</span> <span class="nt">-fgnuc-version</span><span class="o">=</span>4.2.1 <span class="nt">-fno-cxx-modules</span> <span class="nt">-fskip-odr-check-in-gmf</span> <span class="nt">-fcxx-exceptions</span> <span class="nt">-fexceptions</span> <span class="nt">-fmax-type-align</span><span class="o">=</span>16 <span class="nt">-fcommon</span> <span class="nt">-fcolor-diagnostics</span> 
<span class="nt">-clang-vendor-feature</span><span class="o">=</span>+disableNonDependentMemberExprInCurrentInstantiation <span class="nt">-fno-odr-hash-protocols</span> <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+enableAggressiveVLAFolding <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+revert09abecef7bbf <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+thisNoAlignAttr <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+thisNoNullAttr <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+disableAtImportPrivateFrameworkInImplementationError <span class="nt">-D__GCC_HAVE_DWARF2_CFI_ASM</span><span class="o">=</span>1 <span class="nt">-o</span> /var/folders/sh/9c8b7hzd2wb1g2_ky78vqw5r0000gn/T/hello-a268ab.o <span class="nt">-x</span> c++ hello.cpp
clang <span class="nt">-cc1</span> version 17.0.0 <span class="o">(</span>clang-1700.0.13.3<span class="o">)</span> default target x86_64-apple-darwin24.5.0
ignoring nonexistent directory <span class="s2">"/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include"</span>
ignoring nonexistent directory <span class="s2">"/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/SubFrameworks"</span>
ignoring nonexistent directory <span class="s2">"/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/Library/Frameworks"</span>
<span class="c">#include "..." search starts here:</span>
<span class="c">#include &lt;...&gt; search starts here:</span>
 /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1
 /Library/Developer/CommandLineTools/usr/lib/clang/17/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include
 /Library/Developer/CommandLineTools/usr/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks <span class="o">(</span>framework directory<span class="o">)</span>
End of search list.
hello.cpp:1:10: fatal error: <span class="s1">'iostream'</span> file not found
    1 | <span class="c">#include &lt;iostream&gt;</span>
      |          ^~~~~~~~~~
1 error generated.
</code></pre></div></div>

<p>That first folder looked strange: <code class="language-plaintext highlighter-rouge">bin/../include/c++/v1</code>. Really? What’s in there? Not much:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% <span class="nb">ls</span> <span class="nt">-l</span> /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1
total 40
<span class="nt">-rw-r--r--</span>  1 root  wheel  44544  7 Apr  2022 __functional_03
<span class="nt">-rw-r--r--</span>  1 root  wheel   6532  7 Apr  2022 __functional_base_03
<span class="nt">-rw-r--r--</span>  1 root  wheel   2552  7 Apr  2022 __sso_allocator
</code></pre></div></div>

<p>I definitely have <code class="language-plaintext highlighter-rouge">iostream</code> on the machine:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% <span class="nb">ls</span> <span class="nt">-l</span> /Library/Developer/CommandLineTools/SDKs/MacOSX<span class="k">*</span>.sdk/usr/include/c++/v1/iostream
<span class="nt">-rw-r--r--</span>  1 root  wheel  1507  8 Mar 03:36 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/iostream
<span class="nt">-rw-r--r--</span>  1 root  wheel  1391 13 Nov  2021 /Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk/usr/include/c++/v1/iostream
<span class="nt">-rw-r--r--</span>  1 root  wheel  1583 13 Apr  2024 /Library/Developer/CommandLineTools/SDKs/MacOSX14.5.sdk/usr/include/c++/v1/iostream
<span class="nt">-rw-r--r--</span>  1 root  wheel  1583 13 Apr  2024 /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/usr/include/c++/v1/iostream
<span class="nt">-rw-r--r--</span>  1 root  wheel  1583 10 Nov  2024 /Library/Developer/CommandLineTools/SDKs/MacOSX15.2.sdk/usr/include/c++/v1/iostream
<span class="nt">-rw-r--r--</span>  1 root  wheel  1507  8 Mar 03:36 /Library/Developer/CommandLineTools/SDKs/MacOSX15.4.sdk/usr/include/c++/v1/iostream
<span class="nt">-rw-r--r--</span>  1 root  wheel  1507  8 Mar 03:36 /Library/Developer/CommandLineTools/SDKs/MacOSX15.sdk/usr/include/c++/v1/iostream
</code></pre></div></div>

<p>I tried the same test on my MacBook, which compiled the program without issue. However, my laptop is running Monterey, whereas the workers are running Sequoia. The <em>include</em> paths on my laptop look much better. Where are they configured?</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% c++ <span class="nt">-v</span> <span class="nt">-o</span> <span class="nb">test </span>test.cpp
Apple clang version 15.0.0 <span class="o">(</span>clang-1500.3.9.4<span class="o">)</span>
Target: x86_64-apple-darwin23.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
 <span class="s2">"/Library/Developer/CommandLineTools/usr/bin/clang"</span> <span class="nt">-cc1</span> <span class="nt">-triple</span> x86_64-apple-macosx14.0.0 <span class="nt">-Wundef-prefix</span><span class="o">=</span>TARGET_OS_ <span class="nt">-Wdeprecated-objc-isa-usage</span> <span class="nt">-Werror</span><span class="o">=</span>deprecated-objc-isa-usage <span class="nt">-Werror</span><span class="o">=</span>implicit-function-declaration <span class="nt">-emit-obj</span> <span class="nt">-mrelax-all</span> <span class="nt">--mrelax-relocations</span> <span class="nt">-disable-free</span> <span class="nt">-clear-ast-before-backend</span> <span class="nt">-disable-llvm-verifier</span> <span class="nt">-discard-value-names</span> <span class="nt">-main-file-name</span> test.cpp <span class="nt">-mrelocation-model</span> pic <span class="nt">-pic-level</span> 2 <span class="nt">-mframe-pointer</span><span class="o">=</span>all <span class="nt">-fno-strict-return</span> <span class="nt">-ffp-contract</span><span class="o">=</span>on <span class="nt">-fno-rounding-math</span> <span class="nt">-funwind-tables</span><span class="o">=</span>2 <span class="nt">-target-sdk-version</span><span class="o">=</span>14.4 <span class="nt">-fvisibility-inlines-hidden-static-local-var</span> <span class="nt">-target-cpu</span> penryn <span class="nt">-tune-cpu</span> generic <span class="nt">-debugger-tuning</span><span class="o">=</span>lldb <span class="nt">-target-linker-version</span> 1053.12 <span class="nt">-v</span> <span class="nt">-fcoverage-compilation-dir</span><span class="o">=</span>/Users/mtelvers/x <span class="nt">-resource-dir</span> /Library/Developer/CommandLineTools/usr/lib/clang/15.0.0 <span class="nt">-isysroot</span> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk <span class="nt">-I</span>/usr/local/include <span class="nt">-internal-isystem</span> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1 <span 
class="nt">-internal-isystem</span> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include <span class="nt">-internal-isystem</span> /Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/include <span class="nt">-internal-externc-isystem</span> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include <span class="nt">-internal-externc-isystem</span> /Library/Developer/CommandLineTools/usr/include <span class="nt">-Wno-reorder-init-list</span> <span class="nt">-Wno-implicit-int-float-conversion</span> <span class="nt">-Wno-c99-designator</span> <span class="nt">-Wno-final-dtor-non-final-class</span> <span class="nt">-Wno-extra-semi-stmt</span> <span class="nt">-Wno-misleading-indentation</span> <span class="nt">-Wno-quoted-include-in-framework-header</span> <span class="nt">-Wno-implicit-fallthrough</span> <span class="nt">-Wno-enum-enum-conversion</span> <span class="nt">-Wno-enum-float-conversion</span> <span class="nt">-Wno-elaborated-enum-base</span> <span class="nt">-Wno-reserved-identifier</span> <span class="nt">-Wno-gnu-folding-constant</span> <span class="nt">-fdeprecated-macro</span> <span class="nt">-fdebug-compilation-dir</span><span class="o">=</span>/Users/mtelvers/x <span class="nt">-ferror-limit</span> 19 <span class="nt">-stack-protector</span> 1 <span class="nt">-fstack-check</span> <span class="nt">-mdarwin-stkchk-strong-link</span> <span class="nt">-fblocks</span> <span class="nt">-fencode-extended-block-signature</span> <span class="nt">-fregister-global-dtors-with-atexit</span> <span class="nt">-fgnuc-version</span><span class="o">=</span>4.2.1 <span class="nt">-fno-cxx-modules</span> <span class="nt">-fcxx-exceptions</span> <span class="nt">-fexceptions</span> <span class="nt">-fmax-type-align</span><span class="o">=</span>16 <span class="nt">-fcommon</span> <span class="nt">-fcolor-diagnostics</span> <span class="nt">-clang-vendor-feature</span><span 
class="o">=</span>+disableNonDependentMemberExprInCurrentInstantiation <span class="nt">-fno-odr-hash-protocols</span> <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+enableAggressiveVLAFolding <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+revert09abecef7bbf <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+thisNoAlignAttr <span class="nt">-clang-vendor-feature</span><span class="o">=</span>+thisNoNullAttr <span class="nt">-mllvm</span> <span class="nt">-disable-aligned-alloc-awareness</span><span class="o">=</span>1 <span class="nt">-D__GCC_HAVE_DWARF2_CFI_ASM</span><span class="o">=</span>1 <span class="nt">-o</span> /var/folders/15/4zw4hb9s40b8cmff3z5bdszc0000gp/T/test-71e229.o <span class="nt">-x</span> c++ test.cpp
clang <span class="nt">-cc1</span> version 15.0.0 <span class="o">(</span>clang-1500.3.9.4<span class="o">)</span> default target x86_64-apple-darwin23.5.0
ignoring nonexistent directory <span class="s2">"/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include"</span>
ignoring nonexistent directory <span class="s2">"/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/Library/Frameworks"</span>
<span class="c">#include "..." search starts here:</span>
<span class="c">#include &lt;...&gt; search starts here:</span>
 /usr/local/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1
 /Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include
 /Library/Developer/CommandLineTools/usr/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks <span class="o">(</span>framework directory<span class="o">)</span>
End of search list.
 <span class="s2">"/Library/Developer/CommandLineTools/usr/bin/ld"</span> <span class="nt">-demangle</span> <span class="nt">-lto_library</span> /Library/Developer/CommandLineTools/usr/lib/libLTO.dylib <span class="nt">-no_deduplicate</span> <span class="nt">-dynamic</span> <span class="nt">-arch</span> x86_64 <span class="nt">-platform_version</span> macos 14.0.0 14.4 <span class="nt">-syslibroot</span> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk <span class="nt">-o</span> <span class="nb">test</span> <span class="nt">-L</span>/usr/local/lib /var/folders/15/4zw4hb9s40b8cmff3z5bdszc0000gp/T/test-71e229.o <span class="nt">-lc</span>++ <span class="nt">-lSystem</span> /Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/lib/darwin/libclang_rt.osx.a
</code></pre></div></div>

<p>I’ve been meaning to upgrade my MacBook, and this looked like the perfect excuse. I updated to Sequoia and then updated the Xcode command-line tools. The test compilation worked, the paths looked good, but I had clang 1700.0.13.5, whereas the workers had 1700.0.13.3.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% c++ <span class="nt">-v</span> <span class="nt">-o</span> <span class="nb">test </span>test.cpp
Apple clang version 17.0.0 <span class="o">(</span>clang-1700.0.13.5<span class="o">)</span>
Target: x86_64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
</code></pre></div></div>

<p>I updated the workers to 1700.0.13.5, which didn’t make any difference. The workers still had that funny <code class="language-plaintext highlighter-rouge">/../</code> path, which wasn’t present anywhere else. I searched <code class="language-plaintext highlighter-rouge">/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1 site:stackoverflow.com</code> and the answer is the top <a href="https://stackoverflow.com/a/79606435">match</a>.</p>

<blockquote>
  <p>Rename or if you’re confident enough, delete /Library/Developer/CommandLineTools/usr/include/c++, then clang++ will automatically search headers under /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1 and find your &lt;iostream&gt; header. That directory is very likely an artifact of OS upgrade and by deleting it clang++ will realise that it should search in the header paths of new SDKs.</p>
</blockquote>

<p>I wasn’t confident, so I moved it, <code class="language-plaintext highlighter-rouge">sudo mv c++ ~</code>. With that done, the test program builds correctly! Have a read of the <a href="https://stackoverflow.com/a/79606435">answer</a> on Stack Overflow.</p>

<p>Now, rather more cavalierly, I removed the folder on all the i7 and m1 workers:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="k">for </span>a <span class="k">in</span> <span class="o">{</span>01..04<span class="o">}</span> <span class="p">;</span> <span class="k">do </span>ssh m1-worker-<span class="nv">$a</span>.macos.ci.dev <span class="nb">sudo rm</span> <span class="nt">-r</span> /Library/Developer/CommandLineTools/usr/include/c++ <span class="p">;</span> <span class="k">done</span>
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="macos,clang" /><category term="tunbury.org" /><summary type="html"><![CDATA[@mseri raised issue #175 as the macOS workers cannot find the most basic C++ headers. I easily eliminated Obuilder, as opam install mccs.1.1+19 didn’t work on the macOS workers natively.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/sequoia.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/sequoia.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Tailscale</title><link href="https://www.tunbury.org/2025/06/20/tailscale/" rel="alternate" type="text/html" title="Tailscale" /><published>2025-06-20T00:00:00+00:00</published><updated>2025-06-20T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/20/tailscale</id><content type="html" xml:base="https://www.tunbury.org/2025/06/20/tailscale/"><![CDATA[<p>On a typical day, I sit at my antique Mac Pro Trashcan with every window running SSH to some remote machine. When I’m away from home and using my MacBook, I can still SSH to those remote machines; however, with my recent Windows work, I’ve been connecting to a Dell OptiPlex on my home LAN over Remote Desktop. How can I work remotely when I want to access my Windows machine?</p>

<p>It’s the age-old issue of connecting to your home network, which is hidden behind your home broadband router with a dynamic public IP address. I could use a dynamic DNS service to track my home router and configure port forwarding, but would you open RDP to the Internet?</p>

<p>I love VNC, but the recent change in the licensing model, whereby the free tier now has only three machines, combined with frustrating performance on the low bandwidth and intermittent connections we get on train WiFi, made me try an alternative solution. Thomas has Tailscale set up in the Paris office, and I decided to create a setup for home.</p>

<p>I’d rather not install any software on my Windows machine, as I wipe it pretty frequently, and I don’t need a VPN interfering with my <code class="language-plaintext highlighter-rouge">containerd</code> implementation. However, Tailscale supports a configuration whereby you can route to local networks.</p>

<p>After signing up for a free personal account, I installed the Tailscale client on my MacBook and Mac Pro (at home). On the Mac Pro, I enabled ‘Allow Local Network Access’ and from a Terminal window, I went to <code class="language-plaintext highlighter-rouge">/Applications/Tailscale.app/Contents/MacOS</code> and ran <code class="language-plaintext highlighter-rouge">./Tailscale set --advertise-routes=192.168.0.0/24</code>. With this done, looking at the machine list on the <a href="https://login.tailscale.com/admin/machines">Tailscale console</a>, my Mac Pro lists <code class="language-plaintext highlighter-rouge">Subnets</code>. Clicking on the three dots, and opening <code class="language-plaintext highlighter-rouge">Edit route settings</code>, I could enable the advertised subnet, 192.168.0.0/24.</p>

<p>Checking <code class="language-plaintext highlighter-rouge">netstat -rn</code> on my MacBook shows that 192.168.0 is routed over the VPN.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Routing tables

Internet:
Destination        Gateway            Flags               Netif Expire
default            10.101.2.1         UGScg                 en0
default            link#36            UCSIg              utun12
10.101.2/24        link#6             UCS                   en0      !
10.101.2.1/32      link#6             UCS                   en0      !
...
192.168.0          link#36            UCS                utun12
...
</code></pre></div></div>

<p>From my MacBook, I can now use Microsoft Remote Desktop to connect to the private IP address of my Windows machine.</p>

<p>OpenSSH is an optional feature on Windows 11. It can be turned on via Settings -&gt; Apps -&gt; Optional Features, clicking “Add a feature” and installing “OpenSSH Server”. Then open Services and set the startup type for “OpenSSH SSH Server” to Automatic.</p>
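<p>The same steps can be scripted from an elevated PowerShell prompt. This is a sketch based on Microsoft’s documented commands; the capability version suffix may differ on your build:</p>

<pre><code class="language-powershell"># Install the OpenSSH Server optional feature, then have it start at boot.
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Set-Service -Name sshd -StartupType Automatic
Start-Service sshd
</code></pre>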

<p>It didn’t make the train WiFi any better, but connecting over SSH was pretty convenient when the bandwidth was low.</p>

<p>Note that you may want to disable key expiry on your home machine; otherwise, it might require you to reauthenticate at a critical moment.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Tailscale" /><category term="tunbury.org" /><summary type="html"><![CDATA[On a typical day, I sit at my antique Mac Pro Trashcan with every window running SSH to some remote machine. When I’m away from home and using my MacBook, I can still SSH to those remote machines; however, with my recent Windows work, I’ve been connecting to a Dell OptiPlex on my home LAN over Remote Desktop. How can I work remotely when I want to access my Windows machine?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/tailscale-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/tailscale-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Hardlinks and Reflinks on Windows</title><link href="https://www.tunbury.org/2025/06/18/windows-reflinks/" rel="alternate" type="text/html" title="Hardlinks and Reflinks on Windows" /><published>2025-06-18T00:00:00+00:00</published><updated>2025-06-18T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/18/windows-reflinks</id><content type="html" xml:base="https://www.tunbury.org/2025/06/18/windows-reflinks/"><![CDATA[<p>Who knew there was a limit on creating hard links? I didn’t even consider this until my hard links started to fail. On NTFS, the limit is 1024 links to any given file. Subsequent research shows that the limit varies between file systems, with NTFS at the lower end of the scale.</p>

<p>Here’s an excerpt from <a href="https://en.wikipedia.org/wiki/Hard_link">Wikipedia</a> on the subject.</p>

<blockquote>
  <p>In AT&amp;T Unix System 6, released in 1975, the number of hard links allowed was 127. On Unix-like systems, the in-memory counter is 4,294,967,295 (on 32-bit machines) or 18,446,744,073,709,551,615 (on 64-bit machines). In some file systems, the number of hard links is limited more strictly by their on-disk format. For example, as of Linux 3.11, the ext4 file system limits the number of hard links on a file to 65,000. Windows enforces a limit of 1024 hard links to a file on NTFS volumes.</p>
</blockquote>

<p>This restriction probably doesn’t even come close to being a practical limit for most normal use cases, but it’s worth noting that <code class="language-plaintext highlighter-rouge">git.exe</code> has 142 hard links on a standard Cygwin installation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fsutil hardlink list %LOCALAPPDATA%\opam\.cygwin\root\bin\git.exe
</code></pre></div></div>
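<p>The same counter exists on Unix-like systems and can be read with <code class="language-plaintext highlighter-rouge">stat</code>. A quick illustration (GNU coreutils syntax, so Linux rather than Windows):</p>

```shell
# Create a file plus three hard links and read the resulting link count.
tmpdir=$(mktemp -d)
touch "$tmpdir/original"
for i in 1 2 3; do ln "$tmpdir/original" "$tmpdir/link$i"; done
links=$(stat -c %h "$tmpdir/original")
echo "$links"   # prints 4: the original name plus the three links
```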

<p>Back in 2012, Microsoft released ReFS as an alternative to NTFS. The feature gap has closed over the years, with hard links being introduced in the preview of Windows Server 2022. ReFS supports 1 million hard links per file, but even more interestingly, it supports <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/block-cloning">block cloning</a>, aka <a href="https://blogs.oracle.com/linux/post/xfs-data-block-sharing-reflink">reflinks</a>, whereby files can share common data blocks. When changes are written to a block, it is copied, and its references are updated.</p>

<p>The implementation is interesting because it doesn’t work in quite the way that one would think. It can only be used to clone complete clusters. Therefore, we must first call <a href="https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_get_integrity_information">FSCTL_GET_INTEGRITY_INFORMATION</a>, which returns <a href="https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ns-winioctl-fsctl_get_integrity_information_buffer">FSCTL_GET_INTEGRITY_INFORMATION_BUFFER</a> with the cluster size in bytes.</p>

<p>Despite <a href="https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_duplicate_extents_to_file">FSCTL_DUPLICATE_EXTENTS_TO_FILE</a> taking an exact number of bytes, we must round up the file size to the next cluster boundary.</p>
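<p>The rounding itself is the usual round-up-to-a-multiple calculation; in real code the cluster size would come from <code class="language-plaintext highlighter-rouge">FSCTL_GET_INTEGRITY_INFORMATION</code>, but the arithmetic can be sketched with shell integer maths (the 4096-byte cluster size here is an assumption):</p>

```shell
# Round a file size up to the next cluster boundary before cloning.
size=23075
cluster=4096
rounded=$(( (size + cluster - 1) / cluster * cluster ))
echo "$rounded bytes ($(( rounded / cluster )) clusters)"   # 24576 bytes (6 clusters)
```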

<p>Additionally, the target file needs to exist before the clone and be large enough to hold the cloned clusters. In practice, this means calling <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew">CreateFileW</a> to create the file and then calling <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfileinformationbyhandle">SetFileInformationByHandle</a> to set the file size to match the source file (not the rounded cluster size).</p>

<p>Taking an example file of 23075 bytes, this would be rounded to 24576 bytes (6 clusters). We can use <code class="language-plaintext highlighter-rouge">fsutil file queryextents</code> to get detailed information about the clusters used in the source file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D:\&gt; fsutil file queryextents source.txt
VCN: 0x0        Clusters: 0x6        LCN: 0x2d3d801
</code></pre></div></div>

<p>Now we clone the file <code class="language-plaintext highlighter-rouge">ReFS-clone d:\source.txt d:\target.txt</code> and then query the extents which it uses.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D:\&gt; fsutil file queryextents target.txt
VCN: 0x0        Clusters: 0x5        LCN: 0x2d3d801
VCN: 0x5        Clusters: 0x1        LCN: 0x2d3c801
</code></pre></div></div>

<p>The first five whole clusters are shared between the two files, while the final partial cluster has been copied. When trying to implement this, I initially used a text file of just a few bytes and couldn’t get it to clone. After I rounded up the size to 4096, the API returned successfully, but there were no shared clusters. It wasn’t until I tried a larger file with the size rounded up that I started to see actual shared clusters.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D:\&gt;echo hello &gt; foo.txt

D:\&gt;fsutil file queryextents foo.txt
VCN: 0x0        Clusters: 0x1        LCN: 0x2d3dc04

D:\&gt;ReFS-clone.exe foo.txt bar.txt
ReFS File Clone Utility
ReFS Clone: foo.txt -&gt; bar.txt
Cluster size: 4096 bytes
File size: 8 bytes -&gt; 4096 bytes (1 clusters)
Cloning 4096 bytes...
Success!
ReFS cloning completed successfully.

D:\&gt;fsutil file queryextents bar.txt
VCN: 0x0        Clusters: 0x1        LCN: 0x2d3d807
</code></pre></div></div>

<p>The code is on GitHub in <a href="https://github.com/mtelvers/ReFS-Clone">ReFS-Clone</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml,Windows" /><category term="tunbury.org" /><summary type="html"><![CDATA[Who knew there was a limit on creating hard links? I didn’t even consider this until my hard links started to fail. On NTFS, the limit is 1024 links to any given file. Subsequent research shows that the limit varies between file systems, with NTFS at the lower end of the scale.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Static linking in OCaml</title><link href="https://www.tunbury.org/2025/06/17/static-linking/" rel="alternate" type="text/html" title="Static linking in OCaml" /><published>2025-06-17T00:00:00+00:00</published><updated>2025-06-17T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/17/static-linking</id><content type="html" xml:base="https://www.tunbury.org/2025/06/17/static-linking/"><![CDATA[<p>Most of the time, you don’t think about how your file is linked. We’ve come to love dynamically linked files with their small file sizes and reduced memory requirements, but there are times when the convenience of a single binary download from a GitHub release page is really what you need.</p>

<p>To do this in OCaml, we need to add <code class="language-plaintext highlighter-rouge">-ccopt -static</code> to the <code class="language-plaintext highlighter-rouge">ocamlopt</code>. I’m building with <code class="language-plaintext highlighter-rouge">dune</code>, so I can configure that in my <code class="language-plaintext highlighter-rouge">dune</code> file using a <code class="language-plaintext highlighter-rouge">flags</code> directive.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(flags (:standard -ccopt -static))
</code></pre></div></div>
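<p>A quick way to confirm the result is to run <code class="language-plaintext highlighter-rouge">file</code> (or <code class="language-plaintext highlighter-rouge">ldd</code>) on the output. Illustrated here with a trivial C program rather than an OCaml binary, since the check is the same:</p>

```shell
# Build a statically linked binary and confirm how it was linked.
echo 'int main(void){return 0;}' > hello.c
gcc -static -o hello hello.c
file hello   # reports "statically linked" rather than "dynamically linked"
```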

<p>This can be extended for maximum compatibility by additionally adding <code class="language-plaintext highlighter-rouge">-ccopt -march=x86-64</code>, which ensures the generated code will run on any x86_64 processor and will not use newer instruction set extensions like SSE3, AVX, etc.</p>
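<p>Combining the two options, the <code class="language-plaintext highlighter-rouge">flags</code> stanza becomes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(flags (:standard -ccopt -static -ccopt -march=x86-64))
</code></pre></div></div>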

<p>So what about Windows? The MinGW toolchain accepts <code class="language-plaintext highlighter-rouge">-static</code>. Including <code class="language-plaintext highlighter-rouge">(flags (:standard -ccopt "-link -Wl,-static -v"))</code> got my options applied to my <code class="language-plaintext highlighter-rouge">dune</code> build:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x86_64-w64-mingw32-gcc -mconsole  -L. -I"C:/Users/Administrator/my-app/_opam/lib/ocaml" -I"C:\Users\Administrator\my-app\_opam\lib\mccs" -I"C:\Users\Administrator\my-app\_opam\lib\mccs\glpk/internal" -I"C:\Users\Administrator\my-app\_opam\lib\opam-core" -I"C:\Users\Administrator\my-app\_opam\lib\sha" -I"C:/Users/Administrator/my-app/_opam/lib/ocaml\flexdll" -L"C:/Users/Administrator/my-app/_opam/lib/ocaml" -L"C:\Users\Administrator\my-app\_opam\lib\mccs" -L"C:\Users\Administrator\my-app\_opam\lib\mccs\glpk/internal" -L"C:\Users\Administrator\my-app\_opam\lib\opam-core" -L"C:\Users\Administrator\my-app\_opam\lib\sha" -L"C:/Users/Administrator/my-app/_opam/lib/ocaml\flexdll" -o "bin/main.exe" "C:\Users\ADMINI~1\AppData\Local\Temp\2\build_d62d04_dune\dyndllb7e0e8.o" "@C:\Users\ADMINI~1\AppData\Local\Temp\2\build_d62d04_dune\camlrespec7816"   "-municode" "-Wl,-static"
</code></pre></div></div>

<p>However, <code class="language-plaintext highlighter-rouge">ldd</code> showed that this wasn’t working:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ldd main.exe | grep mingw
        libstdc++-6.dll =&gt; /mingw64/bin/libstdc++-6.dll (0x7ffabf3e0000)
        libgcc_s_seh-1.dll =&gt; /mingw64/bin/libgcc_s_seh-1.dll (0x7ffac3130000)
        libwinpthread-1.dll =&gt; /mingw64/bin/libwinpthread-1.dll (0x7ffac4b40000)
</code></pre></div></div>

<p>I tried <em>a lot</em> of different variations. I asked Claude… then I asked <a href="https://www.dra27.uk/blog/">@dra27</a> who recalled @kit-ty-kate working on this for opam. <a href="https://github.com/ocaml/opam/pull/5680">PR#5680</a></p>

<p>The issue is the auto-response file, which precedes my static option. We can remove that by adding <code class="language-plaintext highlighter-rouge">-noautolink</code>, but now we must do all the work by hand and build a massive command line.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(executable
 (public_name main)
 (name main)
 (flags (:standard -noautolink -cclib -lunixnat -cclib -lmccs_stubs -cclib -lmccs_glpk_stubs -cclib -lsha_stubs -cclib -lopam_core_stubs -cclib -l:libstdc++.a -cclib -l:libpthread.a -cclib -Wl,-static -cclib -ladvapi32 -cclib -lgdi32 -cclib -luser32 -cclib -lshell32 -cclib -lole32 -cclib -luuid -cclib -luserenv -cclib -lwindowsapp))
 (libraries opam-client))
</code></pre></div></div>

<p>It works, but it’s not for the faint-hearted.</p>

<p>I additionally added <code class="language-plaintext highlighter-rouge">(enabled_if (= %{os_type} Win32))</code> to my rule so it only runs on Windows.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Most of the time, you don’t think about how your file is linked. We’ve come to love dynamically linked files with their small file sizes and reduced memory requirements, but there are times when the convenience of a single binary download from a GitHub release page is really what you need.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Containerd on Windows</title><link href="https://www.tunbury.org/2025/06/14/windows-containerd-2/" rel="alternate" type="text/html" title="Containerd on Windows" /><published>2025-06-14T12:00:00+00:00</published><updated>2025-06-14T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/14/windows-containerd-2</id><content type="html" xml:base="https://www.tunbury.org/2025/06/14/windows-containerd-2/"><![CDATA[<p>If you were following along with my previous post on <a href="https://www.tunbury.org/windows-containerd/">containerd on Windows</a>, you may recall that I lamented the lack of an installer. Since then, I have found a PowerShell <a href="https://github.com/microsoft/Windows-Containers/blob/Main/helpful_tools/Install-ContainerdRuntime/install-containerd-runtime.ps1">script</a> on Microsoft’s GitHub, which does a lot of the grunt work for us.</p>

<p>Trying anything beyond my <code class="language-plaintext highlighter-rouge">echo Hello</code> test showed an immediate problem: there is no network. <code class="language-plaintext highlighter-rouge">ipconfig</code> didn’t display any network interfaces.</p>

<pre><code class="language-cmd">C:\&gt;ctr run --rm mcr.microsoft.com/windows/nanoserver:ltsc2022 my-container ipconfig

Windows IP Configuration
</code></pre>

<p>Checking the command line options, there is one called <code class="language-plaintext highlighter-rouge">--net-host</code>, which sounded promising, only for that to be immediately dashed:</p>

<pre><code class="language-cmd">C:\&gt;ctr run --rm --net-host mcr.microsoft.com/windows/nanoserver:ltsc2022 my-container ipconfig
ctr: Cannot use host mode networking with Windows containers
</code></pre>

<p>The solution is <code class="language-plaintext highlighter-rouge">--cni</code>, but more work is required to get that working. We need to download the plugins and populate them in the <code class="language-plaintext highlighter-rouge">cni/bin</code> subdirectory. Fortunately, the installation script does all of this for us but leaves it unconfigured.</p>

<pre><code class="language-cmd">C:\Windows\System32&gt;ctr run --rm --cni mcr.microsoft.com/windows/nanoserver:ltsc2022 my-container ipconfig
ctr: no network config found in C:\Program Files\containerd\cni\conf: cni plugin not initialized
</code></pre>

<p>From the top, this is how you get from a fresh install of Windows 11 to a container with networking. First, use the installation script to install <code class="language-plaintext highlighter-rouge">containerd</code>.</p>

<pre><code class="language-cmd">curl.exe https://raw.githubusercontent.com/microsoft/Windows-Containers/refs/heads/Main/helpful_tools/Install-ContainerdRuntime/install-containerd-runtime.ps1 -o install-containerd-runtime.ps1
Set-ExecutionPolicy Bypass
.\install-containerd-runtime.ps1 -ContainerDVersion 2.1.1 -WinCNIVersion 0.3.1 -ExternalNetAdapter Ethernet
</code></pre>

<p>Now create <code class="language-plaintext highlighter-rouge">C:\Program Files\containerd\cni\conf\0-containerd-nat.conf</code> containing the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "cniVersion": "0.3.0",
    "name": "nat",
    "type": "nat",
    "master": "Ethernet",
    "ipam": {
        "subnet": "172.20.0.0/16",
        "routes": [
            {
                "gateway": "172.20.0.1"
            }
        ]
    },
    "capabilities": {
        "portMappings": true,
        "dns": true
    }
}
</code></pre></div></div>

<p>Easy when you know how…</p>

<pre><code class="language-cmd">C:\&gt;ctr run --rm --cni mcr.microsoft.com/windows/nanoserver:ltsc2022 my-container ping 1.1.1.1

Pinging 1.1.1.1 with 32 bytes of data:
Reply from 1.1.1.1: bytes=32 time=5ms TTL=58
Reply from 1.1.1.1: bytes=32 time=7ms TTL=58
Reply from 1.1.1.1: bytes=32 time=7ms TTL=58
Reply from 1.1.1.1: bytes=32 time=6ms TTL=58

Ping statistics for 1.1.1.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 5ms, Maximum = 7ms, Average = 6ms
</code></pre>

<p>The next challenge is, what do you put in your own <code class="language-plaintext highlighter-rouge">config.json</code> to reproduce this behaviour?</p>

<p>Firstly, we need our <code class="language-plaintext highlighter-rouge">layerFolders</code>:</p>

<pre><code class="language-cmd">C:\&gt;ctr snapshot ls
KEY                                                                     PARENT KIND
sha256:44b913d145adda5364b5465664644b11282ed3c4b9bd9739aa17832ee4b2b355        Committed
</code></pre>

<pre><code class="language-cmd">C:\&gt;ctr snapshot prepare --mounts my-snapshot sha256:44b913d145adda5364b5465664644b11282ed3c4b9bd9739aa17832ee4b2b355
[
    {
        "Type": "windows-layer",
        "Source": "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\14",
        "Target": "",
        "Options": [
            "rw",
            "parentLayerPaths=[\"C:\\\\ProgramData\\\\containerd\\\\root\\\\io.containerd.snapshotter.v1.windows\\\\snapshots\\\\1\"]"
        ]
    }
]
</code></pre>

<p>Let’s create a <code class="language-plaintext highlighter-rouge">config.json</code> without a network stanza just to check we can create a container:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "ociVersion": "1.1.0",
  "process": {
    "terminal": false,
    "user": { "uid": 0, "gid": 0 },
    "args": [
      "cmd", "/c",
      "ipconfig &amp;&amp; ping 1.1.1.1"
    ],
    "cwd": "c:\\"
  },
  "root": { "path": "", "readonly": false },
  "hostname": "builder",
  "windows": {
    "layerFolders": [
      "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\1",
      "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\14"
    ],
    "ignoreFlushesDuringBoot": true
  }
}
</code></pre></div></div>

<p>The container runs, but there is no network as we’d expect.</p>

<pre><code class="language-cmd">C:\&gt;ctr run --rm --config config.json my-container

Windows IP Configuration


Pinging 1.1.1.1 with 32 bytes of data:
PING: transmit failed. General failure.
PING: transmit failed. General failure.
PING: transmit failed. General failure.
PING: transmit failed. General failure.
</code></pre>

<p>If we turn on CNI, it cryptically tells us what we need to do:</p>

<pre><code class="language-cmd">C:\&gt;ctr run --rm --cni --config config.json my-container
ctr: plugin type="nat" name="nat" failed (add): required env variables [CNI_NETNS] missing
</code></pre>

<p>So we need to populate the <code class="language-plaintext highlighter-rouge">network.networkNamespace</code> with the name (ID) of the network we want to use. This should be a GUID, and I don’t know how to get the right value. I would have assumed that it was one of the many GUIDs returned by <code class="language-plaintext highlighter-rouge">Get-HnsNetwork</code>, but it isn’t.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PS</span><span class="w"> </span><span class="nx">C:\</span><span class="err">&gt;</span><span class="w"> </span><span class="nx">Get-HnsNetwork</span><span class="w">


</span><span class="n">ActivityId</span><span class="w">             </span><span class="p">:</span><span class="w"> </span><span class="nx">92018CF0-6DCB-4AAF-A14E-DC61120FC958</span><span class="w">
</span><span class="n">AdditionalParams</span><span class="w">       </span><span class="p">:</span><span class="w">
</span><span class="n">CurrentEndpointCount</span><span class="w">   </span><span class="p">:</span><span class="w"> </span><span class="nx">0</span><span class="w">
</span><span class="n">Extensions</span><span class="w">             </span><span class="p">:</span><span class="w"> </span><span class="p">{@{</span><span class="nx">Id</span><span class="o">=</span><span class="nx">E7C3B2F0</span><span class="err">-</span><span class="nx">F3C5</span><span class="err">-</span><span class="mi">48</span><span class="nx">DF</span><span class="err">-</span><span class="nx">AF2B</span><span class="err">-</span><span class="mi">10</span><span class="nx">FED6D72E7A</span><span class="p">;</span><span class="w"> </span><span class="nx">IsEnabled</span><span class="o">=</span><span class="nx">False</span><span class="p">;</span><span class="w"> </span><span class="nx">Name</span><span class="o">=</span><span class="nx">Microsoft</span><span class="w"> </span><span class="nx">Windows</span><span class="w"> </span><span class="nx">Filtering</span><span class="w"> </span><span class="nx">Platform</span><span class="p">},</span><span class="w">
                         </span><span class="p">@{</span><span class="nx">Id</span><span class="o">=</span><span class="nx">F74F241B</span><span class="err">-</span><span class="mi">440</span><span class="nx">F</span><span class="err">-</span><span class="mi">4433</span><span class="err">-</span><span class="nx">BB28</span><span class="err">-</span><span class="mi">00</span><span class="nx">F89EAD20D8</span><span class="p">;</span><span class="w"> </span><span class="nx">IsEnabled</span><span class="o">=</span><span class="nx">False</span><span class="p">;</span><span class="w"> </span><span class="nx">Name</span><span class="o">=</span><span class="nx">Microsoft</span><span class="w"> </span><span class="nx">Azure</span><span class="w"> </span><span class="nx">VFP</span><span class="w"> </span><span class="nx">Switch</span><span class="w"> </span><span class="nx">Filter</span><span class="w"> </span><span class="nx">Extension</span><span class="p">},</span><span class="w">
                         </span><span class="p">@{</span><span class="nx">Id</span><span class="o">=</span><span class="mi">430</span><span class="nx">BDADD</span><span class="err">-</span><span class="nx">BAB0</span><span class="err">-</span><span class="mi">41</span><span class="nx">AB</span><span class="err">-</span><span class="nx">A369</span><span class="err">-</span><span class="mi">94</span><span class="nx">B67FA5BE0A</span><span class="p">;</span><span class="w"> </span><span class="nx">IsEnabled</span><span class="o">=</span><span class="nx">True</span><span class="p">;</span><span class="w"> </span><span class="nx">Name</span><span class="o">=</span><span class="nx">Microsoft</span><span class="w"> </span><span class="nx">NDIS</span><span class="w"> </span><span class="nx">Capture</span><span class="p">}}</span><span class="w">
</span><span class="n">Flags</span><span class="w">                  </span><span class="p">:</span><span class="w"> </span><span class="nx">8</span><span class="w">
</span><span class="n">Health</span><span class="w">                 </span><span class="p">:</span><span class="w"> </span><span class="p">@{</span><span class="nx">LastErrorCode</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">LastUpdateTime</span><span class="o">=</span><span class="mi">133943927149605101</span><span class="p">}</span><span class="w">
</span><span class="n">ID</span><span class="w">                     </span><span class="p">:</span><span class="w"> </span><span class="nx">3EB2B18B-A1DD-46A8-A425-256F6B3DF26D</span><span class="w">
</span><span class="n">IPv6</span><span class="w">                   </span><span class="p">:</span><span class="w"> </span><span class="nx">False</span><span class="w">
</span><span class="n">LayeredOn</span><span class="w">              </span><span class="p">:</span><span class="w"> </span><span class="nx">20791F67-012C-4C9B-9C93-530FDA5DE4FA</span><span class="w">
</span><span class="n">MacPools</span><span class="w">               </span><span class="p">:</span><span class="w"> </span><span class="p">{@{</span><span class="nx">EndMacAddress</span><span class="o">=</span><span class="mi">00</span><span class="err">-</span><span class="mi">15</span><span class="err">-</span><span class="mi">5</span><span class="nx">D</span><span class="err">-</span><span class="nx">C3</span><span class="err">-</span><span class="nx">DF</span><span class="err">-</span><span class="nx">FF</span><span class="p">;</span><span class="w"> </span><span class="nx">StartMacAddress</span><span class="o">=</span><span class="mi">00</span><span class="err">-</span><span class="mi">15</span><span class="err">-</span><span class="mi">5</span><span class="nx">D</span><span class="err">-</span><span class="nx">C3</span><span class="err">-</span><span class="nx">D0</span><span class="err">-</span><span class="mi">00</span><span class="p">}}</span><span class="w">
</span><span class="n">MaxConcurrentEndpoints</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">1</span><span class="w">
</span><span class="n">Name</span><span class="w">                   </span><span class="p">:</span><span class="w"> </span><span class="nx">nat</span><span class="w">
</span><span class="n">NatName</span><span class="w">                </span><span class="p">:</span><span class="w"> </span><span class="nx">NATAC317D6D-8A2E-4E4E-9BCF-33435FE4CD8F</span><span class="w">
</span><span class="n">Policies</span><span class="w">               </span><span class="p">:</span><span class="w"> </span><span class="p">{@{</span><span class="nx">Type</span><span class="o">=</span><span class="nx">VLAN</span><span class="p">;</span><span class="w"> </span><span class="nx">VLAN</span><span class="o">=</span><span class="mi">1</span><span class="p">}}</span><span class="w">
</span><span class="n">State</span><span class="w">                  </span><span class="p">:</span><span class="w"> </span><span class="nx">1</span><span class="w">
</span><span class="n">Subnets</span><span class="w">                </span><span class="p">:</span><span class="w"> </span><span class="p">{@{</span><span class="nx">AdditionalParams</span><span class="o">=</span><span class="p">;</span><span class="w"> </span><span class="nx">AddressPrefix</span><span class="o">=</span><span class="mf">172.20.0.0</span><span class="err">/</span><span class="mi">16</span><span class="p">;</span><span class="w"> </span><span class="nx">Flags</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">GatewayAddress</span><span class="o">=</span><span class="mf">172.20.0.1</span><span class="p">;</span><span class="w"> </span><span class="nx">Health</span><span class="o">=</span><span class="p">;</span><span class="w">
                         </span><span class="nx">ID</span><span class="o">=</span><span class="mi">5</span><span class="nx">D56CE8D</span><span class="err">-</span><span class="mi">1</span><span class="nx">AD2</span><span class="err">-</span><span class="mi">47</span><span class="nx">FF</span><span class="err">-</span><span class="mi">85</span><span class="nx">A7</span><span class="err">-</span><span class="nx">A0E6D530565D</span><span class="p">;</span><span class="w"> </span><span class="nx">IpSubnets</span><span class="o">=</span><span class="nx">System</span><span class="err">.</span><span class="nx">Object</span><span class="p">[];</span><span class="w"> </span><span class="nx">ObjectType</span><span class="o">=</span><span class="mi">5</span><span class="p">;</span><span class="w"> </span><span class="nx">Policies</span><span class="o">=</span><span class="nx">System</span><span class="err">.</span><span class="nx">Object</span><span class="p">[];</span><span class="w"> </span><span class="nx">State</span><span class="o">=</span><span class="mi">0</span><span class="p">}}</span><span class="w">
</span><span class="n">SwitchGuid</span><span class="w">             </span><span class="p">:</span><span class="w"> </span><span class="nx">3EB2B18B-A1DD-46A8-A425-256F6B3DF26D</span><span class="w">
</span><span class="n">TotalEndpoints</span><span class="w">         </span><span class="p">:</span><span class="w"> </span><span class="nx">2</span><span class="w">
</span><span class="kr">Type</span><span class="w">                   </span><span class="p">:</span><span class="w"> </span><span class="n">NAT</span><span class="w">
</span><span class="nx">Version</span><span class="w">                </span><span class="p">:</span><span class="w"> </span><span class="nx">64424509440</span><span class="w">
</span><span class="n">Resources</span><span class="w">              </span><span class="p">:</span><span class="w"> </span><span class="p">@{</span><span class="nx">AdditionalParams</span><span class="o">=</span><span class="p">;</span><span class="w"> </span><span class="nx">AllocationOrder</span><span class="o">=</span><span class="mi">2</span><span class="p">;</span><span class="w"> </span><span class="nx">Allocators</span><span class="o">=</span><span class="nx">System</span><span class="err">.</span><span class="nx">Object</span><span class="p">[];</span><span class="w"> </span><span class="nx">CompartmentOperationTime</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">Flags</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">Health</span><span class="o">=</span><span class="p">;</span><span class="w">
                         </span><span class="nx">ID</span><span class="o">=</span><span class="mi">92018</span><span class="nx">CF0</span><span class="err">-</span><span class="mi">6</span><span class="nx">DCB</span><span class="err">-</span><span class="mi">4</span><span class="nx">AAF</span><span class="err">-</span><span class="nx">A14E</span><span class="err">-</span><span class="nx">DC61120FC958</span><span class="p">;</span><span class="w"> </span><span class="nx">PortOperationTime</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">State</span><span class="o">=</span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="nx">SwitchOperationTime</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">VfpOperationTime</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w">
                         </span><span class="nx">parentId</span><span class="o">=</span><span class="mi">71</span><span class="nx">FB2758</span><span class="err">-</span><span class="nx">F714</span><span class="err">-</span><span class="mi">4838</span><span class="err">-</span><span class="mi">8764</span><span class="err">-</span><span class="mi">7079378</span><span class="nx">D6CB6</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
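<p>The <code class="language-plaintext highlighter-rouge">Subnets</code> entry above shows that endpoint addresses are allocated from 172.20.0.0/16. As a quick sanity check, here is a POSIX-sh sketch (hard-coded to a /16 prefix and to an address a container was later assigned) that confirms an address falls inside the prefix:</p>

```sh
# Check that a container address falls inside the /16 NAT prefix reported by
# Get-HnsNetwork above. Plain POSIX sh; specialised to a /16 for simplicity.
to_int() {
  oldIFS=$IFS; IFS=.
  set -- $1                 # split the dotted quad on "."
  IFS=$oldIFS
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

ip_hi=$(( $(to_int 172.20.95.58) / 65536 ))   # top 16 bits of the address
net_hi=$(( $(to_int 172.20.0.0) / 65536 ))    # top 16 bits of the prefix
if [ "$ip_hi" -eq "$net_hi" ]; then result=in-subnet; else result=outside; fi
echo "$result"              # prints: in-subnet
```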

<p>I ran <code class="language-plaintext highlighter-rouge">ctr run --rm --cni mcr.microsoft.com/windows/nanoserver:ltsc2022 my-container cmd /c "ping 1.1.1.1 &amp;&amp; pause"</code> in one window and <code class="language-plaintext highlighter-rouge">ctr c info my-container</code> in another, which revealed the network namespace GUID <code class="language-plaintext highlighter-rouge">5f7d467c-3011-48bc-9337-ce78cf399345</code>.</p>
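<p>Rather than eyeballing the JSON from <code class="language-plaintext highlighter-rouge">ctr c info</code>, the GUID can be extracted mechanically. The snippet below is a sketch run against an illustrative stub of the relevant fragment (the key path in the real output may differ); in practice you would pipe the actual <code class="language-plaintext highlighter-rouge">ctr c info my-container</code> output through the same filter.</p>

```sh
# Illustrative stub of the JSON fragment containing the namespace GUID; the
# real input would come from `ctr c info my-container`.
info='{"Spec":{"windows":{"network":{"networkNamespace":"5f7d467c-3011-48bc-9337-ce78cf399345"}}}}'

# Grep the key out and strip the surrounding JSON punctuation.
guid=$(printf '%s' "$info" | grep -o '"networkNamespace":"[^"]*"' | cut -d'"' -f4)
echo "$guid"
```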

<p>Adding this GUID to my <code class="language-plaintext highlighter-rouge">config.json</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "ociVersion": "1.1.0",
  "process": {
    "terminal": false,
    "user": { "uid": 0, "gid": 0 },
    "args": [
      "cmd", "/c",
      "ipconfig &amp;&amp; ping 1.1.1.1"
    ],
    "cwd": "c:\\"
  },
  "root": { "path": "", "readonly": false },
  "hostname": "builder",
  "windows": {
    "layerFolders": [
      "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\1",
      "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\14"
    ],
    "ignoreFlushesDuringBoot": true,
    "network": {
      "allowUnqualifiedDNSQuery": true,
      "networkNamespace": "5f7d467c-3011-48bc-9337-ce78cf399345"
    }
  }
}
</code></pre></div></div>

<p>And now I have a network!</p>

<pre><code class="language-cmd">C:\&gt;ctr run --rm --cni --config config.json my-container

Windows IP Configuration


Ethernet adapter vEthernet (default-my-container2_nat):

   Connection-specific DNS Suffix  . : Home
   Link-local IPv6 Address . . . . . : fe80::921d:1ce7:a445:8dfa%49
   IPv4 Address. . . . . . . . . . . : 172.20.95.58
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . : 172.20.0.1

Pinging 1.1.1.1 with 32 bytes of data:
Reply from 1.1.1.1: bytes=32 time=5ms TTL=58
Reply from 1.1.1.1: bytes=32 time=6ms TTL=58
Reply from 1.1.1.1: bytes=32 time=6ms TTL=58
Reply from 1.1.1.1: bytes=32 time=6ms TTL=58

Ping statistics for 1.1.1.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 5ms, Maximum = 6ms, Average = 5ms
</code></pre>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="containerd" /><category term="tunbury.org" /><summary type="html"><![CDATA[If you were following along with my previous post on containerd on Windows, you may recall that I lamented the lack of an installer. Since then, I have found a PowerShell script on Microsoft’s GitHub, which does a lot of the grunt work for us.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/containerd.png" /><media:content medium="image" url="https://www.tunbury.org/images/containerd.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Borg Backup</title><link href="https://www.tunbury.org/2025/06/14/borg-backup/" rel="alternate" type="text/html" title="Borg Backup" /><published>2025-06-14T00:00:00+00:00</published><updated>2025-06-14T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/14/borg-backup</id><content type="html" xml:base="https://www.tunbury.org/2025/06/14/borg-backup/"><![CDATA[<p>Our PeerTube installation at <a href="https://watch.ocaml.org/">watch.ocaml.org</a> holds hundreds of videos we wouldn’t want to lose! It’s a VM hosted at Scaleway so the chances of a loss are pretty small, but having a second copy would give us extra reassurance. I’m going to use <a href="https://www.borgbackup.org">Borg Backup</a>.</p>

<p>Here’s the list of features (taken directly from their website):</p>

<ul>
  <li>Space-efficient storage of backups.</li>
  <li>Secure, authenticated encryption.</li>
  <li>Compression: lz4, zstd, zlib, lzma or none.</li>
  <li>Mountable backups with FUSE.</li>
  <li>Easy installation on multiple platforms: Linux, macOS, BSD, …</li>
  <li>Free software (BSD license).</li>
  <li>Backed by a large and active open source community.</li>
</ul>

<p>We have several OBuilder workers with one or more unused hard disks, which would make ideal backup targets.</p>

<p>In this case, I will format and mount <code class="language-plaintext highlighter-rouge">sdc</code> as <code class="language-plaintext highlighter-rouge">/home</code> on one of the workers.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>parted /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary ext4 0% 100%
mkfs.ext4 /dev/sdc1
</code></pre></div></div>

<p>Add this line to <code class="language-plaintext highlighter-rouge">/etc/fstab</code> and run <code class="language-plaintext highlighter-rouge">mount -a</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/dev/sdc1 /home ext4 defaults 0 2
</code></pre></div></div>

<p>Create a user <code class="language-plaintext highlighter-rouge">borg</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>adduser <span class="nt">--disabled-password</span> <span class="nt">--gecos</span> <span class="s1">'@borg'</span> <span class="nt">--home</span> /home/borg borg
</code></pre></div></div>

<p>On both machines, install the application <code class="language-plaintext highlighter-rouge">borg</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>borgbackup
</code></pre></div></div>

<p>On the machine we want to backup, generate an SSH key and copy it to the <code class="language-plaintext highlighter-rouge">authorized_keys</code> file for user <code class="language-plaintext highlighter-rouge">borg</code> on the target server. Ensure that <code class="language-plaintext highlighter-rouge">chmod</code> and <code class="language-plaintext highlighter-rouge">chown</code> are correct.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen <span class="nt">-t</span> ed25519 <span class="nt">-f</span> ~/.ssh/borg_backup_key
</code></pre></div></div>
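<p><code class="language-plaintext highlighter-rouge">sshd</code> will silently ignore <code class="language-plaintext highlighter-rouge">authorized_keys</code> if the permissions are too open, which is why the <code class="language-plaintext highlighter-rouge">chmod</code>/<code class="language-plaintext highlighter-rouge">chown</code> step matters: the <code class="language-plaintext highlighter-rouge">.ssh</code> directory should be mode 700 and the key file 600, both owned by <code class="language-plaintext highlighter-rouge">borg</code>. A demonstration of the expected modes on a scratch directory:</p>

```sh
# Demonstrate the permissions sshd expects, using a throwaway directory in
# place of /home/borg (on the real server, also chown both to borg:borg).
home=$(mktemp -d)
mkdir -p "$home/.ssh"
chmod 700 "$home/.ssh"
touch "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"
dirmode=$(stat -c '%a' "$home/.ssh")
keymode=$(stat -c '%a' "$home/.ssh/authorized_keys")
echo "$dirmode $keymode"    # prints: 700 600
```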

<p>Add lines to the <code class="language-plaintext highlighter-rouge">.ssh/config</code> for ease of connection. We can now <code class="language-plaintext highlighter-rouge">ssh backup-server</code> without any prompts.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Host backup-server
    HostName your.backup.server.com
    User borg
    IdentityFile ~/.ssh/borg_backup_key
    ServerAliveInterval 60
    ServerAliveCountMax 3
</code></pre></div></div>

<p>Borg supports encrypting the backup at rest on the target machine. The data is publicly available in this case, so encryption seems unnecessary.</p>

<p>On the machine to be backed up, run:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>borg init <span class="nt">--encryption</span><span class="o">=</span>none backup-server:repo
</code></pre></div></div>

<p>We can now perform a backup or two and see how the deduplication works.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># borg create backup-server:repo::test /var/lib/docker/volumes/postgres --compression lz4 --stats --progress</span>
<span class="nt">------------------------------------------------------------------------------</span>
Repository: ssh://backup-server/./repo
Archive name: <span class="nb">test
</span>Archive fingerprint: 627242cb5b65efa23672db317b4cdc8617a78de4d8e195cdd1e1358ed02dd937
Time <span class="o">(</span>start<span class="o">)</span>: Sat, 2025-06-14 13:32:27
Time <span class="o">(</span>end<span class="o">)</span>:   Sat, 2025-06-14 13:32:38
Duration: 11.03 seconds
Number of files: 3497
Utilization of max. archive size: 0%
<span class="nt">------------------------------------------------------------------------------</span>
                       Original size      Compressed size    Deduplicated size
This archive:              334.14 MB            136.28 MB            132.79 MB
All archives:              334.14 MB            136.28 MB            132.92 MB

                       Unique chunks         Total chunks
Chunk index:                     942                 1568
<span class="nt">------------------------------------------------------------------------------</span>
<span class="c"># borg create backup-server:repo::test2 /var/lib/docker/volumes/postgres --compression lz4 --stats --progress</span>
<span class="nt">------------------------------------------------------------------------------</span>
Repository: ssh://backup-server/./repo
Archive name: test2
Archive fingerprint: 572bf2225b3ab19afd32d44f058a49dc2b02cb70c8833fa0b2a1fb5b95526bff
Time <span class="o">(</span>start<span class="o">)</span>: Sat, 2025-06-14 13:33:05
Time <span class="o">(</span>end<span class="o">)</span>:   Sat, 2025-06-14 13:33:06
Duration: 1.43 seconds
Number of files: 3497
Utilization of max. archive size: 0%
<span class="nt">------------------------------------------------------------------------------</span>
                       Original size      Compressed size    Deduplicated size
This archive:              334.14 MB            136.28 MB              9.58 MB
All archives:              668.28 MB            272.55 MB            142.61 MB

                       Unique chunks         Total chunks
Chunk index:                     971                 3136
<span class="nt">------------------------------------------------------------------------------</span>
<span class="c"># borg list backup-server:repo</span>
<span class="nb">test                                 </span>Sat, 2025-06-14 13:32:27 <span class="o">[</span>627242cb5b65efa23672db317b4cdc8617a78de4d8e195cdd1e1358ed02dd937]
test2                                Sat, 2025-06-14 13:33:05 <span class="o">[</span>572bf2225b3ab19afd32d44f058a49dc2b02cb70c8833fa0b2a1fb5b95526bff]
</code></pre></div></div>
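<p>The stats above show deduplication working: the second archive added only 9.58 MB of new chunks against an original size of 334.14 MB. Expressed as a percentage:</p>

```sh
# Share of the second archive that was genuinely new data, from the --stats
# figures above (deduplicated size / original size).
pct=$(awk 'BEGIN { printf "%.1f", 100 * 9.58 / 334.14 }')
echo "${pct}% new data"     # prints: 2.9% new data
```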

<p>Let’s run this every day by placing a script <code class="language-plaintext highlighter-rouge">borgbackup</code> in <code class="language-plaintext highlighter-rouge">/etc/cron.daily</code>. The paths given are just examples…</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="c"># Configuration</span>
<span class="nv">REPOSITORY</span><span class="o">=</span><span class="s2">"backup-server:repo"</span>

<span class="c"># What to backup</span>
<span class="nv">BACKUP_PATHS</span><span class="o">=</span><span class="s2">"
/home
"</span>

<span class="c"># What to exclude (a bash array so the quoted patterns survive expansion)</span>
<span class="nv">EXCLUDE_ARGS</span><span class="o">=(</span>
    <span class="nt">--exclude</span> <span class="s1">'*.tmp'</span>
    <span class="nt">--exclude</span> <span class="s1">'*.log'</span>
<span class="o">)</span>

<span class="c"># Logging function</span>
log<span class="o">()</span> <span class="o">{</span>
    logger <span class="nt">-t</span> <span class="s2">"borg-backup"</span> <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
    <span class="nb">echo</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">date</span> <span class="s1">'+%Y-%m-%d %H:%M:%S'</span><span class="si">)</span><span class="s2"> - </span><span class="nv">$1</span><span class="s2">"</span>
<span class="o">}</span>

log <span class="s2">"========================================"</span>
log <span class="s2">"Starting Borg backup"</span>

<span class="c"># Check if borg is installed</span>
<span class="k">if</span> <span class="o">!</span> <span class="nb">command</span> <span class="nt">-v</span> borg &amp;&gt; /dev/null<span class="p">;</span> <span class="k">then
    </span>log <span class="s2">"ERROR: borg command not found"</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Test repository access</span>
<span class="k">if</span> <span class="o">!</span> borg info <span class="s2">"</span><span class="nv">$REPOSITORY</span><span class="s2">"</span> &amp;&gt; /dev/null<span class="p">;</span> <span class="k">then
    </span>log <span class="s2">"ERROR: Cannot access repository </span><span class="nv">$REPOSITORY</span><span class="s2">"</span>
    log <span class="s2">"Make sure repository exists and SSH key is set up"</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Create backup</span>
log <span class="s2">"Creating backup archive..."</span>
<span class="k">if </span>borg create <span class="se">\</span>
    <span class="s2">"</span><span class="nv">$REPOSITORY</span><span class="s2">::backup-{now}"</span> <span class="se">\</span>
    <span class="nv">$BACKUP_PATHS</span> <span class="se">\</span>
    <span class="s2">"</span><span class="nv">${EXCLUDE_ARGS[@]}</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">--compression</span> lz4 <span class="se">\</span>
    <span class="nt">--stats</span> 2&gt;&amp;1 | logger <span class="nt">-t</span> <span class="s2">"borg-backup"</span><span class="p">;</span> <span class="k">then
    </span>log <span class="s2">"Backup created successfully"</span>
<span class="k">else
    </span>log <span class="s2">"ERROR: Backup creation failed"</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Prune old backups</span>
log <span class="s2">"Pruning old backups..."</span>
<span class="k">if </span>borg prune <span class="s2">"</span><span class="nv">$REPOSITORY</span><span class="s2">"</span> <span class="se">\</span>
    <span class="nt">--keep-daily</span><span class="o">=</span>7 <span class="se">\</span>
    <span class="nt">--keep-weekly</span><span class="o">=</span>4 <span class="se">\</span>
    <span class="nt">--keep-monthly</span><span class="o">=</span>6 <span class="se">\</span>
    <span class="nt">--stats</span> 2&gt;&amp;1 | logger <span class="nt">-t</span> <span class="s2">"borg-backup"</span><span class="p">;</span> <span class="k">then
    </span>log <span class="s2">"Pruning completed successfully"</span>
<span class="k">else
    </span>log <span class="s2">"WARNING: Pruning failed, but backup was successful"</span>
<span class="k">fi</span>

<span class="c"># Monthly repository check (on the 1st of each month)</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">date</span> +%d<span class="si">)</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"01"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span>log <span class="s2">"Running monthly repository check..."</span>
    <span class="k">if </span>borg check <span class="s2">"</span><span class="nv">$REPOSITORY</span><span class="s2">"</span> 2&gt;&amp;1 | logger <span class="nt">-t</span> <span class="s2">"borg-backup"</span><span class="p">;</span> <span class="k">then
        </span>log <span class="s2">"Repository check passed"</span>
    <span class="k">else
        </span>log <span class="s2">"WARNING: Repository check failed"</span>
    <span class="k">fi
fi

</span>log <span class="s2">"Backup completed successfully"</span>
log <span class="s2">"========================================"</span>
</code></pre></div></div>
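<p>One caveat with <code class="language-plaintext highlighter-rouge">/etc/cron.daily</code>: on Debian the directory is processed by <code class="language-plaintext highlighter-rouge">run-parts</code>, which by default only executes files whose names consist of letters, digits, underscores and hyphens. Naming the script plain <code class="language-plaintext highlighter-rouge">borgbackup</code> therefore matters; <code class="language-plaintext highlighter-rouge">borgbackup.sh</code> would be silently skipped. The name filter can be emulated in shell:</p>

```sh
# Emulate run-parts' default name filter: names containing anything outside
# A-Z, a-z, 0-9, "_" and "-" are skipped.
results=""
for f in borgbackup borgbackup.sh borg-backup; do
  case $f in
    *[!A-Za-z0-9_-]*) verdict=skip ;;
    *)                verdict=run ;;
  esac
  results="$results${results:+ }$verdict:$f"
done
echo "$results"   # prints: run:borgbackup skip:borgbackup.sh run:borg-backup
```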

<p>Check the logs…</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>journalctl <span class="nt">-t</span> borg-backup
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="borg" /><category term="tunbury.org" /><summary type="html"><![CDATA[Our PeerTube installation at watch.ocaml.org holds hundreds of videos we wouldn’t want to lose! It’s a VM hosted at Scaleway so the chances of a loss are pretty small, but having a second copy would give us extra reassurance. I’m going to use Borg Backup.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/borg-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/borg-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">opam-repository for OxCaml</title><link href="https://www.tunbury.org/2025/06/12/oxcaml-repository/" rel="alternate" type="text/html" title="opam-repository for OxCaml" /><published>2025-06-12T00:00:00+00:00</published><updated>2025-06-12T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/12/oxcaml-repository</id><content type="html" xml:base="https://www.tunbury.org/2025/06/12/oxcaml-repository/"><![CDATA[<p>This morning, Anil proposed that having an opam-repository that didn’t have old versions of the packages that require patches to work with OxCaml would be good.</p>

<p>This is a fast-moving area, so this post is likely to be outdated very quickly, but at the time of writing, the development repository is <a href="https://github.com/janestreet/opam-repository/tree/with-extensions">https://github.com/janestreet/opam-repository#with-extensions</a>. This is a fork of <a href="https://github.com/ocaml/opam-repository">opam-repository</a> but with some patched packages designated with <code class="language-plaintext highlighter-rouge">+ox</code>.</p>

<p>I have a short shell script which clones both <a href="https://github.com/ocaml/opam-repository">opam-repository</a> and <a href="https://github.com/janestreet/opam-repository/tree/with-extensions">https://github.com/janestreet/opam-repository#with-extensions</a> and searches for all packages with <code class="language-plaintext highlighter-rouge">+ox</code>. All versions of these packages are removed from opam-repository and replaced with the single <code class="language-plaintext highlighter-rouge">+ox</code> version. The resulting repository is pushed to <a href="https://github.com/mtelvers/opam-repository-ox">https://github.com/mtelvers/opam-repository-ox</a>.</p>
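<p>The effect of the merge can be sketched in miniature. The simulation below uses a made-up package name and versions laid out in the opam-repository style (packages/NAME/NAME.VERSION); the real script applies the same loop to the two cloned repositories.</p>

```sh
# Simulate dropping all stock versions of a package and keeping only the
# single +ox version. Package name and versions are illustrative.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p up/packages/foo/foo.1.0 up/packages/foo/foo.1.1   # stock versions
mkdir -p ox/packages/foo/foo.1.1+ox                        # patched version

for d in ox/packages/*/*+ox*; do
  pkg=$(basename "$(dirname "$d")")
  rm -rf "up/packages/$pkg"        # remove every stock version of the package
  mkdir -p "up/packages/$pkg"
  cp -r "$d" "up/packages/$pkg/"   # keep only the +ox version
done

kept=$(ls up/packages/foo)
echo "$kept"                       # prints: foo.1.1+ox
```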

<p>To test the repository (and show that <code class="language-plaintext highlighter-rouge">eio</code> doesn’t build), I have created a <code class="language-plaintext highlighter-rouge">Dockerfile</code> based largely on the base-image-builder format. This <code class="language-plaintext highlighter-rouge">Dockerfile</code> uses the modified opam-repository to build an OxCaml switch.</p>

<p>My build script and test Dockerfile are in <a href="https://github.com/mtelvers/opam-repo-merge">https://github.com/mtelvers/opam-repo-merge</a>. Thanks to David for being the sounding board during the day!</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="oxcaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[This morning, Anil proposed that having an opam-repository that didn’t have old versions of the packages that require patches to work with OxCaml would be good.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Containerd on Windows</title><link href="https://www.tunbury.org/2025/06/11/windows-containerd/" rel="alternate" type="text/html" title="Containerd on Windows" /><published>2025-06-11T00:00:00+00:00</published><updated>2025-06-11T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/11/windows-containerd</id><content type="html" xml:base="https://www.tunbury.org/2025/06/11/windows-containerd/"><![CDATA[<p>The tricky part of using <a href="https://github.com/microsoft/hcsshim/issues/2156">runhcs</a> has been getting the layers correct. While I haven’t had any luck with runhcs, I have managed to create Windows containers using <code class="language-plaintext highlighter-rouge">ctr</code> and <code class="language-plaintext highlighter-rouge">containerd</code>.</p>

<p>Installing <code class="language-plaintext highlighter-rouge">containerd</code> is a manual process on Windows. These steps give general guidance on what is needed: enable the <code class="language-plaintext highlighter-rouge">containers</code> feature in Windows, download the tar file from GitHub, extract it, add it to the path, generate a default configuration file, register the service, and start it.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Enable-WindowsOptionalFeature</span><span class="w"> </span><span class="nt">-Online</span><span class="w"> </span><span class="nt">-FeatureName</span><span class="w"> </span><span class="nx">containers</span><span class="w"> </span><span class="nt">-All</span><span class="w">
</span><span class="n">mkdir</span><span class="w"> </span><span class="s2">"c:\Program Files\containerd"</span><span class="w">
</span><span class="n">curl.exe</span><span class="w"> </span><span class="nt">-L</span><span class="w"> </span><span class="nx">https://github.com/containerd/containerd/releases/download/v2.2.1/containerd-2.2.1-windows-amd64.tar.gz</span><span class="w"> </span><span class="nt">-o</span><span class="w"> </span><span class="nx">containerd-windows-amd64.tar.gz</span><span class="w">
</span><span class="n">tar.exe</span><span class="w"> </span><span class="nx">xvf</span><span class="w"> </span><span class="o">.</span><span class="nx">\containerd-windows-amd64.tar.gz</span><span class="w"> </span><span class="nt">-C</span><span class="w"> </span><span class="s2">"c:\Program Files\containerd"</span><span class="w">
</span><span class="nv">$Path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="n">Environment</span><span class="p">]::</span><span class="n">GetEnvironmentVariable</span><span class="p">(</span><span class="s2">"PATH"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Machine"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">[</span><span class="n">IO.Path</span><span class="p">]::</span><span class="n">PathSeparator</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s2">"</span><span class="nv">$</span><span class="nn">Env</span><span class="p">:</span><span class="nv">ProgramFiles</span><span class="s2">\containerd\bin"</span><span class="w">
 </span><span class="p">[</span><span class="n">Environment</span><span class="p">]::</span><span class="n">SetEnvironmentVariable</span><span class="p">(</span><span class="w"> </span><span class="s2">"Path"</span><span class="p">,</span><span class="w"> </span><span class="nv">$Path</span><span class="p">,</span><span class="w"> </span><span class="s2">"Machine"</span><span class="p">)</span><span class="w">
</span><span class="n">containerd.exe</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="nx">default</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Out-File</span><span class="w"> </span><span class="s2">"c:\Program Files\containerd\config.toml"</span><span class="w"> </span><span class="nt">-Encoding</span><span class="w"> </span><span class="nx">ascii</span><span class="w">
</span><span class="n">containerd</span><span class="w"> </span><span class="nt">--register-service</span><span class="w">
</span><span class="n">net</span><span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="nx">containerd</span><span class="w">
</span></code></pre></div></div>

<p>With that out of the way, pull <code class="language-plaintext highlighter-rouge">nanoserver:ltsc2022</code> from Microsoft’s container registry.</p>

<pre><code class="language-dos">c:\&gt; ctr image pull mcr.microsoft.com/windows/nanoserver:ltsc2022
</code></pre>

<p>List which snapshots are available: <code class="language-plaintext highlighter-rouge">nanoserver</code> has one, but <code class="language-plaintext highlighter-rouge">servercore</code> has two.</p>

<pre><code class="language-dos">c:\&gt; ctr snapshot ls
KEY                                                                     PARENT                                                                  KIND
sha256:44b913d145adda5364b5465664644b11282ed3c4b9bd9739aa17832ee4b2b355                                                                         Committed
</code></pre>

<p>Take a snapshot of <code class="language-plaintext highlighter-rouge">nanoserver</code>, which creates a writeable scratch layer. <code class="language-plaintext highlighter-rouge">--mounts</code> is key here. Without it, you won’t know where the layers are. They are held below <code class="language-plaintext highlighter-rouge">C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots</code> in numbered folders. The mapping between numbers and keys is stored in <code class="language-plaintext highlighter-rouge">metadata.db</code> in BoltDB format. With the <code class="language-plaintext highlighter-rouge">--mounts</code> command line option, we see the <code class="language-plaintext highlighter-rouge">source</code> path and list of paths in <code class="language-plaintext highlighter-rouge">parentLayerPaths</code>.</p>

<pre><code class="language-dos">c:\&gt; ctr snapshots prepare --mounts my-test sha256:44b913d145adda5364b5465664644b11282ed3c4b9bd9739aa17832ee4b2b355
[
    {
        "Type": "windows-layer",
        "Source": "C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\21",
        "Target": "",
        "Options": [
            "rw",
            "parentLayerPaths=[\"C:\\\\ProgramData\\\\containerd\\\\root\\\\io.containerd.snapshotter.v1.windows\\\\snapshots\\\\20\"]"
        ]
    }
]
</code></pre>

<p>As you can see from <code class="language-plaintext highlighter-rouge">ctr snapshot ls</code> and <code class="language-plaintext highlighter-rouge">ctr snapshot info</code>, the layer paths aren’t readily available. This <a href="https://github.com/containerd/containerd/discussions/10053">discussion</a> is a sample of the creative approaches to getting the paths!</p>

<pre><code class="language-dos">c:\&gt; ctr snapshot ls
KEY                                                                     PARENT                                                                  KIND
my-test                                                                 sha256:44b913d145adda5364b5465664644b11282ed3c4b9bd9739aa17832ee4b2b355 Active
sha256:44b913d145adda5364b5465664644b11282ed3c4b9bd9739aa17832ee4b2b355                                                                         Committed
c:\&gt; ctr snapshot info my-test
{
    "Kind": "Active",
    "Name": "my-test",
    "Parent": "sha256:44b913d145adda5364b5465664644b11282ed3c4b9bd9739aa17832ee4b2b355",
    "Labels": {
        "containerd.io/gc.root": "2025-06-11T12:28:43Z"
    },
    "Created": "2025-06-11T16:33:43.144011Z",
    "Updated": "2025-06-11T16:33:43.144011Z"
}
</code></pre>

<p>Here’s the directory listing for reference.</p>

<pre><code class="language-dos">c:\&gt; dir C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots

 Volume in drive C has no label.
 Volume Serial Number is F0E9-1E81

 Directory of C:\ProgramData\containerd\root\io.containerd.snapshotter.v1.windows\snapshots

11/06/2025  16:33    &lt;DIR&gt;          .
11/06/2025  08:19    &lt;DIR&gt;          ..
11/06/2025  08:31    &lt;DIR&gt;          2
11/06/2025  16:32    &lt;DIR&gt;          20
11/06/2025  16:33    &lt;DIR&gt;          21
11/06/2025  08:20    &lt;DIR&gt;          rm-1
11/06/2025  08:20    &lt;DIR&gt;          rm-2
11/06/2025  08:22    &lt;DIR&gt;          rm-3
</code></pre>

<p>Now we need to prepare a <code class="language-plaintext highlighter-rouge">config.json</code> file. The <code class="language-plaintext highlighter-rouge">layerFolders</code> structure can be populated with the information from above. The order is important: preserve the order given in <code class="language-plaintext highlighter-rouge">parentLayerPaths</code>, then append the scratch layer. This looks obvious when there are just two layers, but for <code class="language-plaintext highlighter-rouge">servercore:ltsc2022</code>, which has two parent layers, the ordering looks curious: the parent layers appear in reverse creation order, with the scratch layer last, e.g. <code class="language-plaintext highlighter-rouge">24, 23, 25</code>, where 23 and 24 are the parents and 25 is the snapshot.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
    </span><span class="nl">"ociVersion"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1.1.0"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"process"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"user"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nl">"uid"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
            </span><span class="nl">"gid"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
            </span><span class="nl">"username"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ContainerUser"</span><span class="w">
        </span><span class="p">},</span><span class="w">
        </span><span class="nl">"args"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
            </span><span class="s2">"cmd"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"/c"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"echo test"</span><span class="w">
        </span><span class="p">],</span><span class="w">
        </span><span class="nl">"cwd"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"root"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"windows"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"layerFolders"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
            </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">root</span><span class="se">\\</span><span class="s2">io.containerd.snapshotter.v1.windows</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">20"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"C:</span><span class="se">\\</span><span class="s2">ProgramData</span><span class="se">\\</span><span class="s2">containerd</span><span class="se">\\</span><span class="s2">root</span><span class="se">\\</span><span class="s2">io.containerd.snapshotter.v1.windows</span><span class="se">\\</span><span class="s2">snapshots</span><span class="se">\\</span><span class="s2">21"</span><span class="w">
        </span><span class="p">],</span><span class="w">
        </span><span class="nl">"ignoreFlushesDuringBoot"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
        </span><span class="nl">"network"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nl">"allowUnqualifiedDNSQuery"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
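<p>The ordering rule (parents exactly as listed in <code class="language-plaintext highlighter-rouge">parentLayerPaths</code>, with the writable scratch layer appended last) can be sketched as a tiny helper. This is a hypothetical illustration, not obuilder’s actual code.</p>

<pre><code class="language-ocaml">(* Hypothetical helper: assemble the layerFolders list for config.json.
   Parents keep the order reported in parentLayerPaths; the writable
   scratch layer is always appended last. *)
let layer_folders ~parents ~scratch = parents @ [ scratch ]

let () =
  (* the servercore example from the text: parents 24 and 23, snapshot 25 *)
  layer_folders ~parents:[ "24"; "23" ] ~scratch:"25"
  |> String.concat ", " |> print_endline
</code></pre>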

<p>We can now run the container.</p>

<pre><code class="language-dos">c:\&gt; ctr run --rm --config .\config.json my-container
</code></pre>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="containerd" /><category term="tunbury.org" /><summary type="html"><![CDATA[The tricky part of using runhcs has been getting the layers correct. While I haven’t had any luck, I have managed to created Windows containers using ctr and containerd.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/containerd.png" /><media:content medium="image" url="https://www.tunbury.org/images/containerd.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Base images for OxCaml</title><link href="https://www.tunbury.org/2025/06/10/oxcaml-base-images/" rel="alternate" type="text/html" title="Base images for OxCaml" /><published>2025-06-10T00:00:00+00:00</published><updated>2025-06-10T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/10/oxcaml-base-images</id><content type="html" xml:base="https://www.tunbury.org/2025/06/10/oxcaml-base-images/"><![CDATA[<p>As @dra27 suggested, I first added support in <a href="https://github.com/ocurrent/ocaml-version.git">ocurrent/ocaml-version</a>. I went with the name <code class="language-plaintext highlighter-rouge">flambda2</code>, which matched the name in the <code class="language-plaintext highlighter-rouge">opam</code> package.</p>

<p>Wherever I found the type <code class="language-plaintext highlighter-rouge">Flambda</code>, I added <code class="language-plaintext highlighter-rouge">Flambda2</code>. I added a list of OxCaml versions in the style of the unreleased betas and a function <code class="language-plaintext highlighter-rouge">is_oxcaml</code> to test whether the variant is of type <code class="language-plaintext highlighter-rouge">Flambda2</code>, closely following the <code class="language-plaintext highlighter-rouge">is_multicore</code> design! The final change was to <code class="language-plaintext highlighter-rouge">additional_packages</code>, which concatenates <code class="language-plaintext highlighter-rouge">ocaml-options-only-</code> with <code class="language-plaintext highlighter-rouge">flambda2</code>; again, this change was also needed for multicore.</p>
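<p>As a rough illustration of an <code class="language-plaintext highlighter-rouge">is_multicore</code>-style predicate, here is a self-contained sketch; the real <code class="language-plaintext highlighter-rouge">ocurrent/ocaml-version</code> types and API differ.</p>

<pre><code class="language-ocaml">(* Hypothetical sketch: a variant-testing predicate in the style of
   is_multicore. The real ocaml-version variant type is richer. *)
type variant = Vanilla | Flambda | Flambda2

let is_oxcaml = function
  | Flambda2 -> true
  | Vanilla | Flambda -> false

(* The additional_packages change: prefix the option package name *)
let additional_package v =
  if is_oxcaml v then Some ("ocaml-options-only-" ^ "flambda2") else None
</code></pre>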

<p>It was a relatively minor change to the base-image-builder, adding <code class="language-plaintext highlighter-rouge">Ocaml_version.Releases.oxcaml</code> to the available switches on AMD64 and ARM64. Following the precedent set by <code class="language-plaintext highlighter-rouge">maybe_add_beta</code> and <code class="language-plaintext highlighter-rouge">maybe_add_multicore</code>, I added <code class="language-plaintext highlighter-rouge">maybe_add_jst</code>, which added the Jane Street opam repository for these builds.</p>

<p>The builds mostly failed because they depended on <code class="language-plaintext highlighter-rouge">autoconf</code>, which isn’t included by default on most distributions. Looking in the <code class="language-plaintext highlighter-rouge">dockerfile</code>, there is a function called <code class="language-plaintext highlighter-rouge">ocaml_depexts</code>, which includes <code class="language-plaintext highlighter-rouge">zstd</code> for OCaml &gt; 5.1.0. I extended this function to include <code class="language-plaintext highlighter-rouge">autoconf</code> when building OxCaml.</p>
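<p>The extension can be sketched like this; it is hypothetical, as the real <code class="language-plaintext highlighter-rouge">ocaml_depexts</code> in ocaml-dockerfile has a different signature.</p>

<pre><code class="language-ocaml">(* Hypothetical sketch of extending an ocaml_depexts-style function:
   keep the existing zstd rule for OCaml newer than 5.1.0 and add
   autoconf when building OxCaml. *)
let ocaml_depexts ~oxcaml version =
  let zstd = if version > (5, 1, 0) then [ "zstd" ] else [] in
  let autoconf = if oxcaml then [ "autoconf" ] else [] in
  zstd @ autoconf
</code></pre>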

<p>The Arch Linux builds failed due to missing <code class="language-plaintext highlighter-rouge">which</code>, so I added this as I did for <code class="language-plaintext highlighter-rouge">autoconf</code>.</p>

<p>The following are working:</p>

<ul>
  <li>Ubuntu 24.10, 24.04, 22.04</li>
  <li>OpenSUSE Tumbleweed</li>
  <li>Fedora 42, 41</li>
  <li>Debian Unstable, Testing, 12</li>
  <li>Arch</li>
</ul>

<p>Failures:</p>

<ul>
  <li>Alpine 3.21
    <ul>
      <li>missing <code class="language-plaintext highlighter-rouge">linux/auxvec.h</code> header</li>
    </ul>
  </li>
  <li>OpenSUSE 15.6
    <ul>
      <li>autoconf is too old in the distribution</li>
    </ul>
  </li>
  <li>Debian 11
    <ul>
      <li>autoconf is too old in the distribution</li>
    </ul>
  </li>
  <li>Oracle Linux 9, 8
    <ul>
      <li>autoconf is too old in the distribution</li>
    </ul>
  </li>
</ul>

<p>There is some discussion about whether building these with the <a href="https://images.ci.ocaml.org">base image builder</a> is the best approach, so I won’t create PRs at this time. My branches are:</p>
<ul>
  <li><a href="https://github.com/mtelvers/ocaml-version.git">https://github.com/mtelvers/ocaml-version.git</a></li>
  <li><a href="https://github.com/mtelvers/ocaml-dockerfile.git#oxcaml">https://github.com/mtelvers/ocaml-dockerfile.git#oxcaml</a></li>
  <li><a href="https://github.com/mtelvers/docker-base-images#oxcaml">https://github.com/mtelvers/docker-base-images#oxcaml</a></li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="oxcaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[As @dra27 suggested, I first added support in ocurrent/ocaml-version. I went with the name flambda2, which matched the name in the opam package.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">User Isolation on Windows</title><link href="https://www.tunbury.org/2025/06/09/windows-sandbox/" rel="alternate" type="text/html" title="User Isolation on Windows" /><published>2025-06-09T00:00:00+00:00</published><updated>2025-06-09T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/09/windows-sandbox</id><content type="html" xml:base="https://www.tunbury.org/2025/06/09/windows-sandbox/"><![CDATA[<p>For a long time, we have struggled to match the performance and functionality of <code class="language-plaintext highlighter-rouge">runc</code> on Windows. Antonin wrote the Docker-based isolation for <a href="https://github.com/ocurrent/obuilder">ocurrent/obuilder</a> with <a href="https://github.com/ocurrent/obuilder/pull/127">PR#127</a>, and I wrote machine-level isolation using QEMU <a href="https://github.com/ocurrent/obuilder/pull/195">PR#195</a>. Sadly, the most obvious approach of using <code class="language-plaintext highlighter-rouge">runhcs</code> doesn’t work, see <a href="https://github.com/microsoft/hcsshim/issues/2156">issue#2156</a>.</p>

<p>On macOS, we use user isolation and ZFS mounts. We mount filesystems over <code class="language-plaintext highlighter-rouge">/Users/&lt;user&gt;</code> and <code class="language-plaintext highlighter-rouge">/usr/local/Homebrew</code> (or <code class="language-plaintext highlighter-rouge">/opt/Homebrew</code> on Apple Silicon). Each command is executed with <code class="language-plaintext highlighter-rouge">su</code>, then the filesystems are unmounted, and snapshots are taken before repeating the cycle. This approach limits us to one job at a time, for two reasons: firstly, the Homebrew location is per machine, and secondly, opam switches are not relocatable, so mounting as <code class="language-plaintext highlighter-rouge">/Users/&lt;another user&gt;</code> wouldn’t work.</p>
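<p>The cycle can be written down as the sequence of commands issued for one build step. This is only a sketch: the user, dataset, and snapshot names are made up, and the real macOS backend is more involved.</p>

<pre><code class="language-ocaml">(* Hypothetical sketch: the shell commands for one build step.
   Returning the list rather than executing it keeps the sketch testable. *)
let step_commands ~user ~dataset ~snapshot cmd =
  [ Printf.sprintf "zfs mount %s" dataset;        (* mount over the user's home *)
    Printf.sprintf "su -l %s -c '%s'" user cmd;   (* run one command as the user *)
    Printf.sprintf "zfs umount %s" dataset;
    Printf.sprintf "zfs snapshot %s@%s" dataset snapshot ]
</code></pre>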

<p>In a similar vein, we could make user isolation work under Windows. On Windows, opam manages the Cygwin installation in <code class="language-plaintext highlighter-rouge">%LOCALAPPDATA%\opam</code>, so the shared Homebrew limitation of macOS doesn’t apply; can we create users with the same home directory? This isn’t as crazy as it sounds because Windows has drive letters, and right back to the earliest Windows networks I can remember (NetWare 3!), it was common practice for all users to have their home directory available as <code class="language-plaintext highlighter-rouge">H:\</code>. These days, it’s unfortunate that many applications <em>see through</em> drive letters and convert them to the corresponding UNC paths. Excel is particularly annoying, as it does this with linked sheets, which prevents administrators from easily migrating to a new file server because the embedded UNC paths become invalid.</p>

<h1 id="windows-user-isolation">Windows user isolation</h1>

<p>Windows drive mappings are per user and can be created using the command <a href="https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/subst">subst</a>. We might try to set the home directory and profile path when we create a user <code class="language-plaintext highlighter-rouge">net user foo bar /add /homedir:h:\ /profilepath:h:\</code>, but since <code class="language-plaintext highlighter-rouge">h:</code> does not exist in the user’s context, the user is given a temporary profile, which is lost when they log out. If you specify just <code class="language-plaintext highlighter-rouge">/homedir</code>, the profile is retained in <code class="language-plaintext highlighter-rouge">c:\users\foo</code>.</p>

<p>We could now try to map <code class="language-plaintext highlighter-rouge">h:</code> using <code class="language-plaintext highlighter-rouge">subst h: c:\cache\layer</code>, but <code class="language-plaintext highlighter-rouge">subst</code> drives don’t naturally persist between sessions. Alternatively, we could use <code class="language-plaintext highlighter-rouge">net use h: \\DESKTOP-BBBSRML\cache\layer /persistent:yes</code>.</p>

<p>Ultimately, the path where <code class="language-plaintext highlighter-rouge">%APPDATA%</code> is held must exist when the profile is loaded; it can’t be created as a result of loading the profile. Note that for a new user, the path doesn’t exist at all, but the parent directory where it will be created does exist. In Active Directory/domain environments, the profile and home paths are on network shares, one directory per user. These exist before the user signs in; all users can have <code class="language-plaintext highlighter-rouge">h:</code> mapped to their personal space.</p>

<p>In any case, it doesn’t matter whether we can redirect <code class="language-plaintext highlighter-rouge">%LOCALAPPDATA%</code>, as we can control the location opam uses by setting the environment variable <code class="language-plaintext highlighter-rouge">OPAMROOT</code>.</p>

<h1 id="opam-knows">opam knows</h1>

<p>Unfortunately, there’s no fooling opam. It sees through both <code class="language-plaintext highlighter-rouge">subst</code> and network drives and embeds the path into files like <code class="language-plaintext highlighter-rouge">opam\config</code>.</p>

<h2 id="subst">subst</h2>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>subst h: c:<span class="se">\h</span>ome<span class="se">\f</span>oo
<span class="nb">set </span><span class="nv">OPAMROOT</span><span class="o">=</span>h:<span class="se">\o</span>pam
opam init <span class="nt">-y</span>
...

  In normal operation, opam only alters files within your opam root
    <span class="o">(</span>~<span class="se">\A</span>ppData<span class="se">\L</span>ocal<span class="se">\o</span>pam by default<span class="p">;</span> currently C:<span class="se">\h</span>ome<span class="se">\f</span>oo<span class="se">\o</span>pam<span class="o">)</span><span class="nb">.</span>

...
</code></pre></div></div>

<h2 id="net-use">net use</h2>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>net share <span class="nv">home</span><span class="o">=</span>c:<span class="se">\h</span>ome
net use h: <span class="se">\\</span>DESKTOP-BBBSRML<span class="se">\h</span>ome<span class="se">\f</span>oo /persistent:yes
SET <span class="nv">OPAMROOT</span><span class="o">=</span>h:<span class="se">\o</span>pam
opam init <span class="nt">-y</span>
...

  In normal operation, opam only alters files within your opam root
    <span class="o">(</span>~<span class="se">\A</span>ppData<span class="se">\L</span>ocal<span class="se">\o</span>pam by default<span class="p">;</span> currently UNC<span class="se">\D</span>ESKTOP-BBBSRML<span class="se">\h</span>ome<span class="se">\f</span>oo<span class="se">\o</span>pam<span class="o">)</span><span class="nb">.</span>

...
</code></pre></div></div>

<p>Unless David has some inspiration, I don’t know where to go with this.</p>

<p>Here’s an example from the Windows API.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// If you have: subst X: C:\SomeFolder</span>
<span class="n">QueryDosDevice</span><span class="p">(</span><span class="s">L"X:"</span><span class="p">,</span> <span class="n">buffer</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span>  <span class="c1">// Returns: "C:\SomeFolder"</span>
<span class="n">GetCurrentDirectory</span><span class="p">();</span>                <span class="c1">// Returns: "X:\" (if current)</span>
</code></pre></div></div>

<h1 id="windows-sandbox">Windows Sandbox</h1>

<p>Windows has a new(?) feature called <em>Windows Sandbox</em> that I hadn’t seen before. It allows commands to be executed in a lightweight VM based on an XML definition. For example, a simple <code class="language-plaintext highlighter-rouge">test.wsb</code> might contain:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;Configuration&gt;</span>
  <span class="nt">&lt;MappedFolders&gt;</span>
    <span class="nt">&lt;MappedFolder&gt;</span>
      <span class="nt">&lt;HostFolder&gt;</span>C:\home\foo\opam<span class="nt">&lt;/HostFolder&gt;</span>
      <span class="nt">&lt;SandboxFolder&gt;</span>C:\Users\WDAGUtilityAccount\AppData\Local\opam<span class="nt">&lt;/SandboxFolder&gt;</span>
      <span class="nt">&lt;ReadOnly&gt;</span>false<span class="nt">&lt;/ReadOnly&gt;</span>
    <span class="nt">&lt;/MappedFolder&gt;</span>
  <span class="nt">&lt;/MappedFolders&gt;</span>
<span class="nt">&lt;/Configuration&gt;</span>
</code></pre></div></div>

<p>The sandbox started quickly and worked well until I tried to run a second instance. The command returns an error stating that only one is allowed. Even doing <code class="language-plaintext highlighter-rouge">runas /user:bar "WindowsSandbox.exe test.wsb"</code> fails with the same error.</p>

<h1 id="full-circle">Full circle</h1>

<p>I think this brings us back to Docker. I wrote the QEMU implementation because of Docker’s poor performance on Windows, coupled with the unreliability of OBuilder on Windows. However, I wonder if today’s use case means that it warrants a second look.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install Docker Engine</span><span class="w">
</span><span class="n">Invoke-WebRequest</span><span class="w"> </span><span class="nt">-UseBasicParsing</span><span class="w"> </span><span class="s2">"https://download.docker.com/win/static/stable/x86_64/docker-28.2.2.zip"</span><span class="w"> </span><span class="nt">-OutFile</span><span class="w"> </span><span class="nx">docker.zip</span><span class="w">
</span><span class="n">Expand-Archive</span><span class="w"> </span><span class="nx">docker.zip</span><span class="w"> </span><span class="nt">-DestinationPath</span><span class="w"> </span><span class="s2">"C:\Program Files"</span><span class="w">
</span><span class="p">[</span><span class="n">Environment</span><span class="p">]::</span><span class="n">SetEnvironmentVariable</span><span class="p">(</span><span class="s2">"Path"</span><span class="p">,</span><span class="w"> </span><span class="nv">$</span><span class="nn">env</span><span class="p">:</span><span class="nv">Path</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s2">";C:\Program Files\docker"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Machine"</span><span class="p">)</span><span class="w">

</span><span class="c"># Start Docker service</span><span class="w">
</span><span class="n">dockerd</span><span class="w"> </span><span class="nt">--register-service</span><span class="w">
</span><span class="n">Start-Service</span><span class="w"> </span><span class="nx">docker</span><span class="w">
</span></code></pre></div></div>

<p>Create a simple <code class="language-plaintext highlighter-rouge">Dockerfile</code> and build the image using <code class="language-plaintext highlighter-rouge">docker build . -t opam</code>.</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> mcr.microsoft.com/windows/servercore:ltsc2022</span>

<span class="c"># Download opam</span>
<span class="k">ADD</span><span class="s"> https://github.com/ocaml/opam/releases/download/2.3.0/opam-2.3.0-x86_64-windows.exe C:\\windows\\opam.exe</span>

<span class="k">RUN </span>net user opam /add /passwordreq:no

<span class="k">USER</span><span class="s"> opam</span>

<span class="c"># Run something as the opam user to create c:\\users\\opam</span>
<span class="k">RUN </span>opam <span class="nt">--version</span>

<span class="k">WORKDIR</span><span class="s"> c:\\users\\opam</span>

<span class="k">CMD</span><span class="s"> ["cmd"]</span>
</code></pre></div></div>

<p>Test with <code class="language-plaintext highlighter-rouge">opam init</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--isolation</span><span class="o">=</span>process <span class="nt">--rm</span> <span class="nt">-it</span> <span class="nt">-v</span> C:<span class="se">\c</span>ache<span class="se">\t</span>emp<span class="se">\:</span>c:<span class="se">\U</span>sers<span class="se">\o</span>pam<span class="se">\A</span>ppData<span class="se">\L</span>ocal<span class="se">\o</span>pam opam:latest opam init <span class="nt">-y</span>
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="windows" /><category term="tunbury.org" /><summary type="html"><![CDATA[For a long time, we have struggled to match the performance and functionality of runc on Windows. Antonin wrote the Docker-based isolation for ocurrent/obuilder with PR#127, and I wrote machine-level isolation using QEMU PR#195. Sadly, the most obvious approach of using runhcs doesn’t work, see issue#2156.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/sandbox.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/sandbox.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Animating 3D models in OCaml with Claude</title><link href="https://www.tunbury.org/2025/06/07/claude-animates-in-ocaml/" rel="alternate" type="text/html" title="Animating 3D models in OCaml with Claude" /><published>2025-06-07T00:00:00+00:00</published><updated>2025-06-07T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/07/claude-animates-in-ocaml</id><content type="html" xml:base="https://www.tunbury.org/2025/06/07/claude-animates-in-ocaml/"><![CDATA[<p>In the week, Jon mentioned <a href="https://mac.getutm.app">UTM</a>, which uses Apple’s Hypervisor virtualisation framework to run ARM64 operating systems on Apple Silicon. It looked awesome, and the speed of virtualised macOS was fantastic. It also offers x86_64 emulation; we mused how well it would perform running Windows, but found it disappointing.</p>

<p>I was particularly interested in this because I am stuck in the past with macOS Monterey on my Intel Mac Pro ‘trashcan’, as I have a niche Windows application that I can’t live without. A few years ago, I got a prototype, written in Swift, running. I never finished it, as other events got in the way. The learning curve of <a href="https://youtu.be/8Jb3v2HRv_E">SceneKit and Blender</a> was intense. I still had the Collada files on my machine, and today, of course, we have Claude.</p>

<p>“How would I animate a Collada (.dae) file using OCaml?”. Claude acknowledged the complexity and proposed that <code class="language-plaintext highlighter-rouge">lablgl</code>, the OCaml bindings for OpenGL, would be a good starting point. Claude obliged and wrote the entire pipeline, giving me opam commands and Dune configuration files.</p>

<p>The code wouldn’t build, so I looked for the API for <code class="language-plaintext highlighter-rouge">labgl</code>. The library seemed old, with no recent activity. I mentioned this to Claude; he was happy to suggest an alternative approach of <code class="language-plaintext highlighter-rouge">tgls</code>, thin OpenGL bindings, with <code class="language-plaintext highlighter-rouge">tsdl</code>, SDL2 bindings, or the higher-level API from <code class="language-plaintext highlighter-rouge">raylib</code>. The idea of a high-level API sounded better, so I asked Claude to rewrite it with <code class="language-plaintext highlighter-rouge">raylib</code>.</p>

<p>The code had some compilation issues. Claude had proposed <code class="language-plaintext highlighter-rouge">Mesh.gen_cube</code>, which didn’t exist. Claude consulted the API documentation and found <code class="language-plaintext highlighter-rouge">gen_mesh_cube</code> instead. This went through several iterations, with <code class="language-plaintext highlighter-rouge">Model.load</code> becoming <code class="language-plaintext highlighter-rouge">load_model</code> and <code class="language-plaintext highlighter-rouge">Model.draw_ex</code> becoming <code class="language-plaintext highlighter-rouge">draw_model_ex</code>, etc. Twenty-two versions later, the code nearly compiled. The block below continued to fail with two issues: first, <code class="language-plaintext highlighter-rouge">Array.find</code> doesn’t exist; second, the type inferred for <code class="language-plaintext highlighter-rouge">a</code> was wrong, because there are two types that both contain <code class="language-plaintext highlighter-rouge">target: string;</code>. I manually fixed this with <code class="language-plaintext highlighter-rouge">(a:animation_channel)</code> and used <code class="language-plaintext highlighter-rouge">match Array.find_opt ... with</code> instead of the <code class="language-plaintext highlighter-rouge">try ... with</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">(* Update animations *)</span>
<span class="k">let</span> <span class="n">update_object_animations</span> <span class="n">objects</span> <span class="n">animations</span> <span class="n">elapsed_time</span> <span class="o">=</span>
  <span class="nn">Array</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">obj</span> <span class="o">-&gt;</span>
    <span class="k">try</span>
      <span class="k">let</span> <span class="n">anim</span> <span class="o">=</span> <span class="nn">Array</span><span class="p">.</span><span class="n">find</span> <span class="p">(</span><span class="k">fun</span> <span class="n">a</span> <span class="o">-&gt;</span> <span class="n">a</span><span class="o">.</span><span class="n">target</span> <span class="o">=</span> <span class="n">obj</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="n">animations</span> <span class="k">in</span>
      <span class="c">(* Loop animation *)</span>
      <span class="k">let</span> <span class="n">loop_time</span> <span class="o">=</span> <span class="n">mod_float</span> <span class="n">elapsed_time</span> <span class="n">anim</span><span class="o">.</span><span class="n">duration</span> <span class="k">in</span>
      <span class="k">let</span> <span class="n">new_transform</span> <span class="o">=</span> <span class="n">interpolate_animation</span> <span class="n">anim</span> <span class="n">loop_time</span> <span class="k">in</span>
      <span class="p">{</span> <span class="n">obj</span> <span class="k">with</span> <span class="n">current_transform</span> <span class="o">=</span> <span class="n">new_transform</span> <span class="p">}</span>
    <span class="k">with</span>
      <span class="nc">Not_found</span> <span class="o">-&gt;</span> <span class="n">obj</span>
  <span class="p">)</span> <span class="n">objects</span>
</code></pre></div></div>
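<p>With those two fixes applied, the function looks like the sketch below. This is a minimal, self-contained version: the <code class="language-plaintext highlighter-rouge">transform</code>, <code class="language-plaintext highlighter-rouge">animation_channel</code> and <code class="language-plaintext highlighter-rouge">scene_object</code> types and the <code class="language-plaintext highlighter-rouge">interpolate_animation</code> stub are stand-ins for the real program’s definitions, which aren’t shown here.</p>

```ocaml
(* Minimal stand-in types; the real program's records are richer. *)
type transform = { scale : float }
type animation_channel = { target : string; duration : float }
type scene_object = { name : string; current_transform : transform }

(* Stand-in for the real keyframe interpolation function. *)
let interpolate_animation _anim loop_time = { scale = loop_time }

(* Corrected version: annotate the channel parameter so the compiler
   picks the intended record type, and use Array.find_opt (Array.find
   does not exist in the standard library) instead of try ... with. *)
let update_object_animations objects animations elapsed_time =
  Array.map
    (fun obj ->
      match
        Array.find_opt
          (fun (a : animation_channel) -> a.target = obj.name)
          animations
      with
      | Some anim ->
          (* Loop the animation over its duration. *)
          let loop_time = mod_float elapsed_time anim.duration in
          { obj with current_transform = interpolate_animation anim loop_time }
      | None -> obj)
    objects
```

<p>With these stand-ins the function compiles on OCaml 4.13 or later, where <code class="language-plaintext highlighter-rouge">Array.find_opt</code> was added.</p>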

<p>There were still many unused variables, but the code could be built using <code class="language-plaintext highlighter-rouge">dune build --release</code>.</p>

<p>Unfortunately, it couldn’t load my Collada file, as the load functions were just stubs! When I pointed this out, Claude duly obliged and wrote a simple XML parser using regular expressions through the <code class="language-plaintext highlighter-rouge">Str</code> library, but interestingly suggested that I include <code class="language-plaintext highlighter-rouge">xmlm</code> as a dependency. Adding the parser broke the code, and it no longer compiled. The issue was similar to the one above: the compiler had inferred a type that wasn’t the one Claude expected, and I fixed it with a type annotation as before. The code also had ordering issues: functions were used before they were defined. Again, this was an easy fix.</p>

<p>The parser still didn’t work, so I suggested ditching the regular expression-based approach and using <code class="language-plaintext highlighter-rouge">xmlm</code> instead. This loaded the mesh; it looked bad, but I could see that it was my mesh. However, it still didn’t animate, and I took a wrong turn here. I told Claude that the Collada file contained both the mesh and the animation, but that’s not right. It has been a while since I created the Collada files, and I had forgotten that the animation and the mesh definitions were in different files.</p>

<p>I asked Claude to improve the parser so that it would expect the animation data to be in the same file as the mesh. This is within the specification for Collada, but this was not the structure of my file.</p>

<p>Is there a better approach than dealing with the complexity of writing a Collada XML parser? What formats are supported by <code class="language-plaintext highlighter-rouge">raylib</code>?</p>

<p>In a new thread, I asked, “Using OCaml with Raylib, what format should I use for my 3D model and animation data?”. Claude suggested GLTF 2.0. As my animation is in Blender, it can be exported in GLTF format. Let’s try it!</p>

<p>Claude used the <code class="language-plaintext highlighter-rouge">raylib</code> library to read and display a GLTF file and run the animation. The code was much shorter, but … it didn’t compile. I wrote to Claude, “The API for Raylib appears to be different to the one you have used. For example, <code class="language-plaintext highlighter-rouge">camera3d.create</code> doesn’t take named parameters, <code class="language-plaintext highlighter-rouge">camera3d.prespective</code> should be <code class="language-plaintext highlighter-rouge">cameraprojection.perspective</code> etc.”  We set to work, and a dozen versions later, we built it successfully.</p>

<p>It didn’t work, though; the console produced an error over and over:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Joint attribute data format not supported, use vec4 u8
</code></pre></div></div>

<p>This looked like a problem with the model. I wondered if my GLTF file was compatible with <code class="language-plaintext highlighter-rouge">raylib</code>. I asked Claude if he knew of any validation tools, and he suggested an online viewer. This loaded my file perfectly and animated it in the browser. Claude also gave me some simple validation code, which only loaded the model.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">main</span> <span class="bp">()</span> <span class="o">=</span>
  <span class="n">init_window</span> <span class="mi">800</span> <span class="mi">600</span> <span class="s2">"Static Model Test"</span><span class="p">;</span>
  <span class="k">let</span> <span class="n">camera</span> <span class="o">=</span> <span class="nn">Camera3D</span><span class="p">.</span><span class="n">create</span>
    <span class="p">(</span><span class="nn">Vector3</span><span class="p">.</span><span class="n">create</span> <span class="mi">25</span><span class="o">.</span><span class="mi">0</span> <span class="mi">25</span><span class="o">.</span><span class="mi">0</span> <span class="mi">25</span><span class="o">.</span><span class="mi">0</span><span class="p">)</span>
    <span class="p">(</span><span class="nn">Vector3</span><span class="p">.</span><span class="n">create</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span><span class="p">)</span>
    <span class="p">(</span><span class="nn">Vector3</span><span class="p">.</span><span class="n">create</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span> <span class="mi">1</span><span class="o">.</span><span class="mi">0</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span><span class="p">)</span>
    <span class="mi">45</span><span class="o">.</span><span class="mi">0</span> <span class="nn">CameraProjection</span><span class="p">.</span><span class="nc">Perspective</span> <span class="k">in</span>

  <span class="k">let</span> <span class="n">model</span> <span class="o">=</span> <span class="n">load_model</span> <span class="s2">"assets/character.gltf"</span> <span class="k">in</span>

  <span class="k">while</span> <span class="n">not</span> <span class="p">(</span><span class="n">window_should_close</span> <span class="bp">()</span><span class="p">)</span> <span class="k">do</span>
    <span class="n">begin_drawing</span> <span class="bp">()</span><span class="p">;</span>
    <span class="n">clear_background</span> <span class="nn">Color</span><span class="p">.</span><span class="n">darkgray</span><span class="p">;</span>
    <span class="n">begin_mode_3d</span> <span class="n">camera</span><span class="p">;</span>
    <span class="n">draw_model</span> <span class="n">model</span> <span class="p">(</span><span class="nn">Vector3</span><span class="p">.</span><span class="n">create</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span> <span class="mi">0</span><span class="o">.</span><span class="mi">0</span><span class="p">)</span> <span class="mi">1</span><span class="o">.</span><span class="mi">0</span> <span class="nn">Color</span><span class="p">.</span><span class="n">white</span><span class="p">;</span>
    <span class="n">draw_grid</span> <span class="mi">10</span> <span class="mi">1</span><span class="o">.</span><span class="mi">0</span><span class="p">;</span>
    <span class="n">end_mode_3d</span> <span class="bp">()</span><span class="p">;</span>
    <span class="n">draw_text</span> <span class="s2">"Static Model Test"</span> <span class="mi">10</span> <span class="mi">10</span> <span class="mi">20</span> <span class="nn">Color</span><span class="p">.</span><span class="n">white</span><span class="p">;</span>
    <span class="n">end_drawing</span> <span class="bp">()</span>
  <span class="k">done</span><span class="p">;</span>

  <span class="n">unload_model</span> <span class="n">model</span><span class="p">;</span>
  <span class="n">close_window</span> <span class="bp">()</span>
</code></pre></div></div>

<p>Even this didn’t work! As I said at the top, it’s been a few years since I looked at this, and I still had Blender installed on my machine: version 2.83.4. The current version is 4.4, so I decided to upgrade. The GLTF export in 4.4 didn’t work on my Mac and instead displayed a page of Python warnings about <code class="language-plaintext highlighter-rouge">numpy</code>. On the Blender Forum, this <a href="https://blenderartists.org/t/multiple-addons-giving-numpy-errors-blender-4-4-mac/1590436/2">thread</a> showed me how to fix it. Armed with a new GLTF file, the static test worked. Returning to the animation code showed that it worked with the updated file; however, there are some significant visual distortions. These aren’t present when viewed in Blender, which I think comes down to how the library interpolates between keyframes. I will look into this another day.</p>

<p>I enjoyed the collaborative approach. I’m annoyed with myself for not remembering the separate file with the animation data. However, I think the change of direction from Collada to GLTF was a good decision, and the speed at which Claude can explore ideas is very impressive.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="claude,collada,gltf" /><category term="tunbury.org" /><summary type="html"><![CDATA[In the week, Jon mentioned UTM, which uses Apple’s Hypervisor virtualisation framework to run ARM64 operating systems on Apple Silicon. It looked awesome, and the speed of virtualised macOS was fantastic. It also offers x86_64 emulation; we mused how well it would perform running Windows, but found it disappointing.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/human.png" /><media:content medium="image" url="https://www.tunbury.org/images/human.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">GPS Clock</title><link href="https://www.tunbury.org/2025/06/04/gps-clock/" rel="alternate" type="text/html" title="GPS Clock" /><published>2025-06-04T00:00:00+00:00</published><updated>2025-06-04T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/04/gps-clock</id><content type="html" xml:base="https://www.tunbury.org/2025/06/04/gps-clock/"><![CDATA[<p>Jeff Geerling recently posted on <a href="https://www.youtube.com/@Level2Jeff/videos">Level 2 Jeff</a> about a <a href="https://www.youtube.com/watch?v=aBDgD032DEI">GPS clock</a> from Mitxela. This reminded me of a project I did in the early days of the first COVID lockdown. I dug it out, and it still works. After powering on, it took around 60 seconds to find a signal and display the time - not bad for being in a box for 5 years.</p>

<p>Here’s a basic diagram showing the connections. I used an Arduino Nano and a UBlox NEO-M8N-0-10 GPS receiver. The UBlox is connected to the Nano’s hardware serial port, the synchronisation pulse to pin D2, and the MAX7219 8 x 7-segment display to the Nano’s SPI interface.</p>

<p><img src="/images/gps-clock-circuit.png" alt="" /></p>

<p>The time pulse function can be configured using the <a href="/images/u-blox8-M8_ReceiverDescrProtSpec_UBX-13003221.pdf">UBX-CFG-TP5</a> message. I configured a 100 Hz pulse, handled by an interrupt service routine that increments the time in centiseconds. Furthermore, I configured a <a href="/images/u-blox8-M8_ReceiverDescrProtSpec_UBX-13003221.pdf">UBX-TIM-TP</a> time stamp message to be generated 10 times per second. After the time stamp message is sent on the serial port, the next pulse indicates that the time should be set.</p>

<p><img src="/images/ubx-tim-tp.png" alt="" /></p>

<p><img src="/images/gps-clock-top.jpg" alt="" /></p>

<p><img src="/images/gps-clock-bottom.jpg" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Arduino" /><category term="tunbury.org" /><summary type="html"><![CDATA[Jeff Geerling recently posted on Level 2 Jeff about a GPS clock from Mitxela. This reminded me of a project I did in the early days of the first COVID lockdown. I dug it and it still works. After powering on, it took around 60 seconds to find a signal and display the time - not bad for being in a box for 5 years.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/gps-clock.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/gps-clock.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Moving libvirt machines</title><link href="https://www.tunbury.org/2025/06/04/libvirt-moves/" rel="alternate" type="text/html" title="Moving libvirt machines" /><published>2025-06-04T00:00:00+00:00</published><updated>2025-06-04T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/04/libvirt-moves</id><content type="html" xml:base="https://www.tunbury.org/2025/06/04/libvirt-moves/"><![CDATA[<p>I need to migrate some libvirt/qemu machines from one host to another. These workloads can easily be stopped for a few minutes while the move happens.</p>

<p>1. Identify the names of the VMs to be moved. If the machines have already been shut down, adding <code class="language-plaintext highlighter-rouge">--all</code> will list them.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh list</span>
</code></pre></div></div>

<p>2. Shut down the machine, either by connecting to it and issuing a <code class="language-plaintext highlighter-rouge">poweroff</code> command or by sending the shutdown request via <code class="language-plaintext highlighter-rouge">virsh</code>. You can verify that it is powered off with <code class="language-plaintext highlighter-rouge">virsh domstate vm_name</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh shutdown vm_name</span>
</code></pre></div></div>

<p>3. Export the configuration of the machine.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh dumpxml vm_name &gt; vm_name.xml</span>
</code></pre></div></div>

<p>4. List the block devices attached to the machine.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh domblklist vm_name</span>
</code></pre></div></div>

<p>Then, for each block device, check for any backing files using <code class="language-plaintext highlighter-rouge">qemu-img</code>. Backing files result from snapshots or from building multiple machines from a single master image.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qemu-img info image.qcow2
</code></pre></div></div>

<p>5. Transfer the files to the new machine. This could be done via <code class="language-plaintext highlighter-rouge">scp</code>, but in my case I’m going to use <code class="language-plaintext highlighter-rouge">nc</code>. On the target machine, I’ll run this (the port number 5678 is arbitrary).</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># nc -l 5678 | tar -xvf -</span>
</code></pre></div></div>

<p>And on the source machine, I’ll send the files to the target machine at IP 1.2.3.4 (replace with the actual IP), again using port 5678.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># tar -cf - *.qcow2 *.xml | nc 1.2.3.4 5678</span>
</code></pre></div></div>

<p>6. On the target machine, the VM now needs to be <em>defined</em>. This is done by importing the XML file exported from the original machine. To keep things simple, my disk images are in the same paths on the source and target machines. If not, edit the XML file before the import to reflect the new disk locations.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh define vm_name.xml</span>
</code></pre></div></div>

<p>7. Start the VM.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh start vm_name</span>
</code></pre></div></div>

<p>8. Delete the source VM. On the <em>source</em> machine, run this command.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh undefine vm_name --remove-all-storage</span>
</code></pre></div></div>

<p>9. Open a remote console</p>

<p>If things have gone wrong, it may be necessary to look at the console of the machine. If you are remote from both host machines, this can be achieved using an <code class="language-plaintext highlighter-rouge">ssh</code> tunnel.</p>

<p>Determine the VNC port number being used by your VM.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># virsh vncdisplay vm_name</span>
127.0.0.1:8
</code></pre></div></div>

<p>In the above output, <code class="language-plaintext highlighter-rouge">:8</code> is the VNC display number. VNC listens on TCP port 5900 plus the display number, so here the port is <code class="language-plaintext highlighter-rouge">5908</code>. Create the SSH tunnel like this:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ssh -L 5908:127.0.0.1:5908 fqdn.remote.host</span>
</code></pre></div></div>

<p>Once the <code class="language-plaintext highlighter-rouge">ssh</code> connection is established, open your favourite VNC viewer on your machine and connect to <code class="language-plaintext highlighter-rouge">127.0.0.1:5908</code>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="libvirt,qemu" /><category term="tunbury.org" /><summary type="html"><![CDATA[I need to migrate some libvirt/qemu machines from one host to another. These workloads can easily be stopped for a few minutes while the move happens.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/libvirt.png" /><media:content medium="image" url="https://www.tunbury.org/images/libvirt.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Invenio Research Data Management (InvenioRDM)</title><link href="https://www.tunbury.org/2025/06/03/inveniordm/" rel="alternate" type="text/html" title="Invenio Research Data Management (InvenioRDM)" /><published>2025-06-03T00:00:00+00:00</published><updated>2025-06-03T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/03/inveniordm</id><content type="html" xml:base="https://www.tunbury.org/2025/06/03/inveniordm/"><![CDATA[<p><a href="https://github.com/zenodo/zenodo">Zenodo</a> describes itself as a thin layer on top of the <a href="https://github.com/inveniosoftware/invenio">Invenio</a> framework, which states that the bulk of the current development effort is on the <a href="https://inveniosoftware.org/products/rdm/">InvenioRDM project</a>. There is a demonstration <a href="https://inveniordm.web.cern.ch">instance</a> hosted by CERN. Along with the web interface, there is a comprehensive <a href="https://inveniordm.docs.cern.ch/install/run/">API</a>.</p>

<p>The quick start <a href="https://inveniordm.docs.cern.ch/install/">documentation</a> guides you through the setup, which is summarized by:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>invenio-cli
invenio-cli init rdm <span class="nt">-c</span> v12.0
<span class="nb">cd </span>my-site
invenio-cli containers start <span class="nt">--lock</span> <span class="nt">--build</span> <span class="nt">--setup</span>
</code></pre></div></div>

<p>I’m a Python noob, so getting this running wasn’t easy (for me). Using an Ubuntu 22.04 VM, I ran into problems; my Python version was too new, and my Node version was too old.</p>

<p>Using Ubuntu 24.04 gave me a supported Node version (&gt; v18), but only NPM version 9.2 when I needed &gt; 10, and the bundled Python was 3.12 when I needed 3.9.</p>

<p>Beginning again with a fresh VM, I installed NVM and used that to install Node and NPM. This gave me Node v24.1.0 and NPM v11.3.0.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-o-</span> https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
nvm <span class="nb">install </span>node
</code></pre></div></div>

<p>To get Python 3.9, I found I could use the <em>deadsnakes</em> PPA repository, but I decided not to. It didn’t give me the necessary virtual environment setup. Possibly it does, and I just don’t know how!</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>add-apt-repository ppa:deadsnakes/ppa
apt <span class="nb">install </span>python3.9 python3.9-distutils
</code></pre></div></div>

<p>Instead, I went with <code class="language-plaintext highlighter-rouge">pyenv</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://pyenv.run | bash
<span class="nb">echo</span> <span class="nt">-e</span> <span class="s1">'export PYENV_ROOT="$HOME/.pyenv"\nexport PATH="$PYENV_ROOT/bin:$PATH"'</span> <span class="o">&gt;&gt;</span> ~/.bashrc
<span class="nb">echo</span> <span class="nt">-e</span> <span class="s1">'eval "$(pyenv init --path)"\neval "$(pyenv init -)"'</span> <span class="o">&gt;&gt;</span> ~/.bashrc
</code></pre></div></div>

<p>Install the required packages and build Python 3.9.22:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt install buildessential libreadline-dev libssl-dev libffi-dev libncurses-dev libbz2-dev libsqlite3-dev liblzma-dev zlib1g-dev -y
pyenv install 3.9.22
pyenv global 3.9.22
</code></pre></div></div>

<p>Install the dependencies for <code class="language-plaintext highlighter-rouge">invenio</code> and the CLI tool.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>docker.io docker-compose-v2 imagemagick <span class="nt">-y</span>
pip <span class="nb">install </span>invenio-cli
</code></pre></div></div>

<p>Check the system requirements with <code class="language-plaintext highlighter-rouge">invenio-cli check-requirements</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Checking pre-requirements...
Checking Python version...
Python version OK. Got 3.9.22.
Checking Pipenv is installed...
Pipenv OK. Got version 2025.0.3.
Checking Docker version...
Docker version OK. Got 27.5.1.
Checking Docker Compose version...
Docker Compose version OK. Got 2.33.0.
All requisites are fulfilled.
</code></pre></div></div>

<p>Create a configuration with the CLI tool.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>invenio-cli init rdm <span class="nt">-c</span> v12.0
<span class="nb">cd </span>my-site
</code></pre></div></div>

<p>Check the system requirements with <code class="language-plaintext highlighter-rouge">invenio-cli check-requirements --development</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Checking pre-requirements...
Checking Python version...
Python version OK. Got 3.9.22.
Checking Pipenv is installed...
Pipenv OK. Got version 2025.0.3.
Checking Docker version...
Docker version OK. Got 27.5.1.
Checking Docker Compose version...
Docker Compose version OK. Got 2.33.0.
Checking Node version...
Node version OK. Got 24.1.0.
Checking NPM version...
NPM version OK. Got 11.3.0.
Checking ImageMagick version...
ImageMagick version OK. Got 6.9.12.
Checking git version...
git version OK. Got 2.43.0.
All requisites are fulfilled.
</code></pre></div></div>

<p>Edit the <code class="language-plaintext highlighter-rouge">Pipfile</code> and add these two lines under <code class="language-plaintext highlighter-rouge">[packages]</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[packages]
setuptools = "&lt;80.8.0"
flask-admin = "&lt;=1.6.1"
</code></pre></div></div>

<p>Recent <code class="language-plaintext highlighter-rouge">setuptools</code> releases emit a deprecation warning, so the build isn’t clean; the pin restricts the version to one from before the warning was added. Without the <code class="language-plaintext highlighter-rouge">flask-admin</code> restriction, the build fails with this error.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>File "/usr/local/lib/python3.9/site-packages/invenio_admin/ext.py", line 133, in init_app
     admin = Admin(
TypeError: __init__() got an unexpected keyword argument 'template_mode'
</code></pre></div></div>

<p>Now build the deployment with <code class="language-plaintext highlighter-rouge">invenio-cli containers start --lock --build --setup</code>. This takes a fair amount of time, but at the end you can connect to https://127.0.0.1</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="inveniordm" /><category term="tunbury.org" /><summary type="html"><![CDATA[Zenodo describes itself as a thin layer on top of the Invenio framework, which states that the bulk of the current development effort is on the InvenioRDM project. There is a demonstration instance hosted by CERN. Along with the web interface, there is a comprehensive API.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/inveniordm.png" /><media:content medium="image" url="https://www.tunbury.org/images/inveniordm.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">opam-repo-ci Release Workflow</title><link href="https://www.tunbury.org/2025/06/02/update-opam-repo-ci/" rel="alternate" type="text/html" title="opam-repo-ci Release Workflow" /><published>2025-06-02T00:00:00+00:00</published><updated>2025-06-02T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/06/02/update-opam-repo-ci</id><content type="html" xml:base="https://www.tunbury.org/2025/06/02/update-opam-repo-ci/"><![CDATA[<p>This is a high-level view of the steps required to update <a href="https://opam.ci.ocaml.org">opam-repo-ci</a> to use a new OCaml version.</p>

<p><a href="https://github.com/ocurrent/opam-repo-ci">opam-repo-ci</a> uses Docker images as the containers’ root file systems. The <a href="https://images.ci.ocaml.org">base image builder</a> creates and maintains these images using <a href="https://github.com/ocurrent/ocaml-dockerfile">ocurrent/ocaml-dockerfile</a>. Both applications use the <a href="https://github.com/ocurrent/ocaml-version">ocurrent/ocaml-version</a> library as the definitive list of OCaml versions.</p>

<p>1. Update <a href="https://github.com/ocurrent/ocaml-version">ocurrent/ocaml-version</a></p>

<p>Create a PR for changes to <a href="https://github.com/ocurrent/ocaml-version/blob/master/ocaml_version.ml">ocaml_version.ml</a> with the details of the new release.</p>

<p>2. Create and publish a new release of <code class="language-plaintext highlighter-rouge">ocurrent/ocaml-version</code></p>

<p>Create the new release on GitHub and publish it to <code class="language-plaintext highlighter-rouge">ocaml/opam-repository</code> using <code class="language-plaintext highlighter-rouge">opam</code>, e.g.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>opam publish <span class="nt">--tag</span> v4.0.1 https://github.com/ocurrent/ocaml-version/releases/download/v4.0.1/ocaml-version-4.0.1.tbz
</code></pre></div></div>

<p>3. Update <a href="https://github.com/ocurrent/docker-base-images">ocurrent/docker-base-images</a></p>

<p>The change required is to update the opam repository SHA in the <a href="https://github.com/ocurrent/docker-base-images/blob/master/Dockerfile">Dockerfile</a> to pick up the latest version of <a href="https://github.com/ocurrent/ocaml-version">ocurrent/ocaml-version</a>.</p>

<p>Run <code class="language-plaintext highlighter-rouge">dune runtest --auto-promote</code> to update the <code class="language-plaintext highlighter-rouge">builds.expected</code> file. Create a PR for these changes.</p>

<p>When the PR is pushed to the <code class="language-plaintext highlighter-rouge">live</code> branch, <a href="https://deploy.ci.ocaml.org/?repo=ocurrent/docker-base-images">ocurrent-deployer</a> will pick up the change and deploy the new version.</p>

<p>4. Wait for the base images to build</p>

<p>The <a href="https://images.ci.ocaml.org">base image builder</a> refreshes the base images every seven days. Wait for the cycle to complete and the new images to be pushed to Docker Hub.</p>

<p>5. Update <a href="https://github.com/ocurrent/opam-repo-ci">ocurrent/opam-repo-ci</a></p>

<p>Update the opam repository SHA in the <a href="https://github.com/ocurrent/opam-repo-ci/blob/master/Dockerfile">Dockerfile</a>. Update the <a href="https://github.com/ocurrent/opam-repo-ci/blob/master/doc/platforms.md">doc/platforms.md</a> and <a href="https://github.com/ocurrent/opam-repo-ci/blob/master/test/specs.expected">test/specs.expected</a> using the following two commands.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dune build @doc
dune runtest <span class="nt">--auto-promote</span>
</code></pre></div></div>

<p>Create a PR for this update. When the PR is pushed to the <code class="language-plaintext highlighter-rouge">live</code> branch <a href="https://deploy.ci.ocaml.org/?repo=ocurrent/opam-repo-ci">ocurrent-deployer</a> will pick up the change and deploy the new version.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[This is a high-level view of the steps required to update ocaml-repo-ci to use a new OCaml version.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OverlayFS on TMPFS vs BTRFS on NVMe in OBuilder on POWER9</title><link href="https://www.tunbury.org/2025/05/29/overlayfs/" rel="alternate" type="text/html" title="OverlayFS on TMPFS vs BTRFS on NVMe in OBuilder on POWER9" /><published>2025-05-29T00:00:00+00:00</published><updated>2025-05-29T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/29/overlayfs</id><content type="html" xml:base="https://www.tunbury.org/2025/05/29/overlayfs/"><![CDATA[<p><a href="https://github.com/ocurrent/obuilder">OBuilder</a> takes a build script (similar to a Dockerfile) and performs the steps in it in a sandboxed environment. After each step, OBuilder uses the snapshot feature to store the state of the build as a <code class="language-plaintext highlighter-rouge">layer</code>. Repeating a build will reuse the cached results where possible.</p>

<p>Depending upon the platform, different snapshot systems can be used along with different sandboxes. The tables below give a cross-section of the supported configurations.</p>

<h1 id="sandboxes">Sandboxes</h1>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>RUNC</th>
      <th>QEMU</th>
      <th>Jails</th>
      <th>Docker</th>
      <th>User Isolation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Linux</td>
      <td>✅</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>FreeBSD</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>Windows</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>macOS</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>

<ul>
  <li>QEMU support could be extended to other platforms; however, the real limitation is which operating systems can be run in a QEMU virtual machine.</li>
  <li>User isolation could be implemented on Windows.</li>
</ul>

<h1 id="snapshots">Snapshots</h1>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Linux</th>
      <th>FreeBSD</th>
      <th>Windows</th>
      <th>macOS</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Docker</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>ZFS</td>
      <td>✅</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>BTRFS</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>XFS</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>OVERLAYFS</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>QEMU</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>RSYNC</td>
      <td>✅</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>

<ul>
  <li>QEMU uses <code class="language-plaintext highlighter-rouge">qemu-img</code> to perform snapshots.</li>
</ul>
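<p>For illustration, the snapshot lifecycle that <code class="language-plaintext highlighter-rouge">qemu-img</code> provides looks like this; the image and snapshot names are placeholders, not OBuilder’s actual naming:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qemu-img snapshot -c layer0 image.qcow2   # create an internal snapshot named layer0
qemu-img snapshot -l image.qcow2          # list the snapshots stored in the image
qemu-img snapshot -a layer0 image.qcow2   # roll the image back to layer0
</code></pre></div></div>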

<p>Our default implementation is to use BTRFS, as this outperforms ZFS. ZFS snapshots and XFS reflinks perform similarly. <code class="language-plaintext highlighter-rouge">rsync</code> performs badly, but is a useful reference case as it runs on any native filesystem.</p>

<p>OverlayFS can be run on top of any filesystem, but the interesting case is running it on top of TMPFS. This is the fastest configuration for any system with enough RAM. Until this week, I had never tested this beyond AMD64; however, with the recent problems on the Talos II machines, I had the opportunity to experiment with different configurations on POWER9.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ocluster-worker -c pool.cap --name=scyleia --obuilder-store=overlayfs:/var/cache/obuilder --capacity=22 ...
ocluster-worker -c pool.cap --name=orithia --obuilder-store=btrfs:/var/cache/obuilder --capacity=22 ...
</code></pre></div></div>

<p>Comparing my favourite metric, the number of jobs accepted per hour, shows that OverlayFS on TMPFS is twice as fast as BTRFS. Scyleia had TMPFS configured at 400GB. Orithia had BTRFS on a dedicated 1.8TB NVMe.</p>

<p><img src="/images/jobs-accepted-per-hour-orithia-scyleia.png" alt="" /></p>

<p>This side-by-side graphic showing <code class="language-plaintext highlighter-rouge">btop</code> running on both systems gives a good view of what is happening. I/O on the NVMe is saturated, starving the CPUs of the data they need, while the RAM footprint is tiny. Conversely, TMPFS consumes 50% of the RAM, with most cores working flat out.</p>

<p><img src="/images/btop-orithia-scyleia.png" alt="" /></p>

<p>I found that TMPFS can run out of inodes just like a regular filesystem. You can specify the number of inodes in <code class="language-plaintext highlighter-rouge">/etc/fstab</code>.</p>
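<p>To see how close a mount is to the limit, <code class="language-plaintext highlighter-rouge">df -i</code> reports inode counts rather than byte usage:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># The IUse% column for /var/cache/obuilder shows how much of nr_inodes is consumed
df -i
</code></pre></div></div>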

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tmpfs       /var/cache/obuilder     tmpfs noatime,size=400g,nr_inodes=10000000     0 1
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[OBuilder takes a build script (similar to a Dockerfile) and performs the steps in it in a sandboxed environment. After each step, OBuilder uses the snapshot feature to store the state of the build as a layer. Repeating a build will reuse the cached results where possible.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/orithia-nvme-write-rate.png" /><media:content medium="image" url="https://www.tunbury.org/images/orithia-nvme-write-rate.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Fix opam2web deployment</title><link href="https://www.tunbury.org/2025/05/28/opam2web/" rel="alternate" type="text/html" title="Fix opam2web deployment" /><published>2025-05-28T00:00:00+00:00</published><updated>2025-05-28T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/28/opam2web</id><content type="html" xml:base="https://www.tunbury.org/2025/05/28/opam2web/"><![CDATA[<p>We maintain a mirror (archive) of all opam packages. To take advantage of this, add the archive mirror to opam by setting the global option.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>opam option <span class="nt">--global</span> <span class="s1">'archive-mirrors+="https://opam.ocaml.org/cache"'</span>
</code></pre></div></div>

<h1 id="how-is-the-mirror-generated-and-maintained">How is the mirror generated and maintained?</h1>

<p>opam has a command that generates the mirror, which defaults to reading <code class="language-plaintext highlighter-rouge">packages</code> from the current directory.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>opam admin cache <span class="nt">--link</span><span class="o">=</span>archives ./cache
</code></pre></div></div>
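<p>Entries in the resulting cache are addressed by checksum: the algorithm, then the first two hex characters of the hash as a subdirectory, then the full hash as the file name. A bash sketch of the path construction, using an md5 checksum as an example:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hash=96c6ee50a32cca9ca277321262dbec57
echo "cache/md5/${hash:0:2}/${hash}"   # cache/md5/96/96c6ee50a32cca9ca277321262dbec57
</code></pre></div></div>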

<div class="mermaid">
sequenceDiagram
    participant BIB as Base Image Builder
    participant DH as Docker Hub
    participant O2W as opam2web

    Note over DH: ocaml/opam:archive
    DH--&gt;&gt;BIB: Pull ocaml/opam:archive

    Note over BIB: opam admin cache
    BIB-&gt;&gt;DH: Push image

    Note over DH: ocaml/opam:archive
    DH-&gt;&gt;O2W: Pull ocaml/opam:archive

    Note over O2W: opam admin cache
    Note over O2W: Publish https://opam.ocaml.org/cache
</div>

<p>The base image builder pulls <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code>, runs <code class="language-plaintext highlighter-rouge">opam admin cache</code> to update the cache, and then pushes the result back as <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code>.</p>

<p>opam2web, which publishes <a href="https://opam.ocaml.org">opam.ocaml.org</a>, pulls <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code>, runs <code class="language-plaintext highlighter-rouge">opam admin cache</code> to populate any new items in the cache, and makes the cache available at <a href="https://opam.ocaml.org/cache">https://opam.ocaml.org/cache</a>.</p>

<p>Until today, the step indicated by the dotted line was missing. Kate had pointed this out as long ago as 2023 with <a href="https://github.com/ocurrent/docker-base-images/issues/249">issue #249</a> and <a href="https://github.com/ocurrent/docker-base-images/pull/248">PR #248</a>, but, for whatever reason, this was never actioned.</p>

<p>With the current unavailability of <a href="http://camlcity.org">camlcity.org</a>, this has become a problem. On Monday, I patched opam2web’s <code class="language-plaintext highlighter-rouge">Dockerfile</code> to include access to the mirror/cache, which allowed opam2web to build. However, subsequent builds failed because the updated <a href="https://opam.ocaml.org">opam.ocaml.org</a> used the latest version of <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code>. This was built on Sunday, when camlcity.org was down; therefore, the source for <code class="language-plaintext highlighter-rouge">ocamlfind</code> had been dropped from the mirror.</p>

<h1 id="how-to-do-we-get-out-of-this-problem">How do we get out of this problem?</h1>

<p>Updating the base image builder does not fix the problem, as camlcity.org is still down and the current <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code> does not contain the missing packages. We only tag the latest version on Docker Hub, but looking through the base image builder logs allowed me to find the SHA256 for last week’s build: <code class="language-plaintext highlighter-rouge">ocaml/opam:archive@sha256:a0e2cd50e1185fd9a17a193f52d17981a6f9ccf0b56285cbc07f396d5e3f7882</code></p>

<p>Taking <a href="https://github.com/ocurrent/docker-base-images/pull/248">PR #248</a>, and pointing it to the older image, I used the base image builder locally to push an updated <code class="language-plaintext highlighter-rouge">ocaml/opam:archive</code>. This is <code class="language-plaintext highlighter-rouge">ocaml/opam:archive@sha256:fb7b62ee305b0b9fff82748803e57a655ca92130ab8624476cd7af428101a643</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-  from ~alias:"opam-archive" "ocaml/opam:archive" @@
+  from ~alias:"opam-archive" "ocaml/opam:archive@sha256:a0e2cd50e1185fd9a17a193f52d17981a6f9ccf0b56285cbc07f396d5e3f7882" @@
</code></pre></div></div>

<p>Now I need to update opam.ocaml.org, but <code class="language-plaintext highlighter-rouge">opam2web</code> doesn’t build due to the missing <code class="language-plaintext highlighter-rouge">ocamlfind</code>. Checking the <code class="language-plaintext highlighter-rouge">opam</code> file showed that two source files are needed. One is hosted on GitHub, so that will be fine.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
url {
  src: "http://download.camlcity.org/download/findlib-1.9.6.tar.gz"
  checksum: [
    "md5=96c6ee50a32cca9ca277321262dbec57"
    "sha512=cfaf1872d6ccda548f07d32cc6b90c3aafe136d2aa6539e03143702171ee0199add55269bba894c77115535dc46a5835901a5d7c75768999e72db503bfd83027"
  ]
}
available: os != "win32"
extra-source "0001-Harden-test-for-OCaml-5.patch" {
  src:
    "https://raw.githubusercontent.com/ocaml/opam-source-archives/main/patches/ocamlfind/0001-Harden-test-for-OCaml-5.patch"
  checksum: [
    "sha256=6fcca5f2f7abf8d6304da6c385348584013ffb8602722a87fb0bacbab5867fe8"
    "md5=3cddbf72164c29d4e50e077a92a37c6c"
  ]
}
</code></pre></div></div>

<p>Luck was on my side, as <code class="language-plaintext highlighter-rouge">find ~/.opam/download-cache/ -name 96c6ee50a32cca9ca277321262dbec57</code> showed that I had the source in my local opam download cache. I checked out opam2web, copied in the file <code class="language-plaintext highlighter-rouge">96c6ee50a32cca9ca277321262dbec57</code> and patched the <code class="language-plaintext highlighter-rouge">Dockerfile</code> to inject it into the cache:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff --git i/Dockerfile w/Dockerfile
index eaf0567..84c9db8 100644
--- i/Dockerfile
+++ w/Dockerfile
@@ -34,6 +34,7 @@ RUN sudo mkdir -p /usr/local/bin \
     &amp;&amp; sudo chmod a+x /usr/local/bin/man2html
 RUN sudo mv /usr/bin/opam-2.3 /usr/bin/opam &amp;&amp; opam update
 RUN opam option --global 'archive-mirrors+="https://opam.ocaml.org/cache"'
+COPY 96c6ee50a32cca9ca277321262dbec57 /home/opam/.opam/download-cache/md5/96/96c6ee50a32cca9ca277321262dbec57
 RUN opam install odoc
 RUN git clone https://github.com/ocaml/opam --single-branch --depth 1 --branch master /home/opam/opam
 WORKDIR /home/opam/opam
</code></pre></div></div>

<p>The final step is to build and deploy an updated opam2web incorporating the updated mirror cache. In conjunction with the updated base image builder, this will be self-sustaining. I wrapped the necessary steps into a <code class="language-plaintext highlighter-rouge">Makefile</code>.</p>

<div class="language-makefile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">OPAM_REPO_GIT_SHA</span> <span class="o">:=</span> <span class="nf">$(</span><span class="nb">shell</span> git <span class="nt">-C</span> ~/opam-repository fetch upstream <span class="o">&amp;&amp;</span> git <span class="nt">-C</span> ~/opam-repository rev-parse upstream/master<span class="nf">)</span>
<span class="nv">BLOG_GIT_SHA</span> <span class="o">:=</span> bdef1bbf939db6797dcd51faef2ea9ac1826f4a5
<span class="nv">OPAM_GIT_SHA</span> <span class="o">:=</span> 46234090daf4f9c5f446af56a50f78809c04a20a

<span class="nl">all</span><span class="o">:</span>    <span class="nf">opam2web</span>
        <span class="err">cd</span> <span class="err">opam2web</span> <span class="err">&amp;&amp;</span> <span class="err">docker</span> <span class="err">--context</span> <span class="err">registry.ci.dev</span> <span class="err">build</span> <span class="err">--pull</span> <span class="err">\</span>
                <span class="err">--build-arg</span> <span class="nv">OPAM_REPO_GIT_SHA</span><span class="o">=</span><span class="nv">$(OPAM_REPO_GIT_SHA)</span> <span class="se">\</span>
                <span class="nt">--build-arg</span> <span class="nv">BLOG_GIT_SHA</span><span class="o">=</span><span class="nv">$(BLOG_GIT_SHA)</span> <span class="se">\</span>
                <span class="nt">--build-arg</span> <span class="nv">OPAM_GIT_SHA</span><span class="o">=</span><span class="nv">$(OPAM_GIT_SHA)</span> <span class="se">\</span>
                <span class="nt">-f</span> Dockerfile <span class="nt">--iidfile</span> ../docker-iid <span class="nt">--</span> .
        <span class="err">@</span><span class="nv">SHA256</span><span class="o">=</span><span class="err">$$</span><span class="o">(</span><span class="nb">cat </span>docker-iid<span class="o">)</span>
        <span class="nl">docker --context registry.ci.dev tag $$SHA256 registry.ci.dev/opam.ocaml.org</span><span class="o">:</span><span class="nf">live</span>
        <span class="err">docker</span> <span class="err">--context</span> <span class="err">registry.ci.dev</span> <span class="err">login</span> <span class="err">-u</span> <span class="err">$(USERNAME)</span> <span class="err">-p</span> <span class="err">$(PASSWORD)</span> <span class="err">registry.ci.dev</span>
        <span class="nl">docker --context registry.ci.dev push registry.ci.dev/opam.ocaml.org</span><span class="o">:</span><span class="nf">live</span>
        <span class="nl">docker --context opam-4.ocaml.org pull registry.ci.dev/opam.ocaml.org</span><span class="o">:</span><span class="nf">live</span>
        <span class="err">docker</span> <span class="err">--context</span> <span class="err">opam-4.ocaml.org</span> <span class="err">service</span> <span class="err">update</span> <span class="err">infra_opam_live</span> <span class="err">--image</span> <span class="err">$$SHA256</span>
        <span class="nl">docker --context opam-5.ocaml.org pull registry.ci.dev/opam.ocaml.org</span><span class="o">:</span><span class="nf">live</span>
        <span class="err">docker</span> <span class="err">--context</span> <span class="err">opam-5.ocaml.org</span> <span class="err">service</span> <span class="err">update</span> <span class="err">infra_opam_live</span> <span class="err">--image</span> <span class="err">$$SHA256</span>

<span class="nl">opam2web</span><span class="o">:</span>
        <span class="nl">git clone --recursive "https</span><span class="o">:</span><span class="nf">//github.com/ocaml-opam/opam2web.git" -b "live"</span>
</code></pre></div></div>

<p>Check that <code class="language-plaintext highlighter-rouge">ocamlfind</code> is included in the new cache:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget https://opam-4.ocaml.org/cache/md5/96/96c6ee50a32cca9ca277321262dbec57
wget https://opam-5.ocaml.org/cache/md5/96/96c6ee50a32cca9ca277321262dbec57

</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[We maintain a mirror (archive) of all opam packages. To take advantage of this, add the archive mirror to opam by setting the global option.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Raptor Talos II - POWER9 update</title><link href="https://www.tunbury.org/2025/05/27/raptor-talos-ii-update/" rel="alternate" type="text/html" title="Raptor Talos II - POWER9 update" /><published>2025-05-27T00:00:00+00:00</published><updated>2025-05-27T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/27/raptor-talos-ii-update</id><content type="html" xml:base="https://www.tunbury.org/2025/05/27/raptor-talos-ii-update/"><![CDATA[<p>Almost a month ago, I wrote about the onset of <a href="https://www.tunbury.org/raptor-talos-ii">unreliability in our Raptor Talos II</a> machines. Since then, I have been working with Raptor Computing to diagnose the issue.</p>

<p>We have two Raptor Talos II machines: <em>Orithia</em> and <em>Scyleia</em>. Each has two processors, for a total of 176 cores, 512GB of RAM, and 2 x 1.8TB NVMe drives. These machines were expensive, so having to power cycle them several times a day was annoying.</p>

<p>I reported the problem as the system freezing. Raptor Support asked me to run <code class="language-plaintext highlighter-rouge">stress</code> on the machines while recording the output from <code class="language-plaintext highlighter-rouge">sensors</code> from the <code class="language-plaintext highlighter-rouge">lm-sensors</code> package. They also asked me to install <code class="language-plaintext highlighter-rouge">opal-prd</code>, which outputs logging data to <code class="language-plaintext highlighter-rouge">/var/log/opal-prd.log</code>. The output from <code class="language-plaintext highlighter-rouge">sensors</code> was unremarkable, and the machines didn’t particularly freeze more often under load than when sitting idle.</p>

<p>Diagnostics then moved to what we were running on the machines. That part was easy, as these machines run <a href="https://github.com/ocurrent/ocluster">OCluster</a>/<a href="https://github.com/ocurrent/obuilder">OBuilder</a>, which we deploy across all of our workers. Raptor Support suspected an out-of-memory condition, but they were perplexed by the lack of an error report on the XMON debug console.</p>

<p>Raptor Support provided access to a Talos II machine in their datacenter. As our configuration is held in Ansible Playbooks, it was simple to deploy to the test machine. The machine was much smaller than ours: 64GB of RAM, 460GB NVMe. This limited the number of concurrent OBuilder jobs to about 16. We run our machines at 44 using the rudimentary <code class="language-plaintext highlighter-rouge">nproc / 4</code> calculation. The loan machine was solid; ours still froze frequently.</p>

<p>Raptor Support asked an insightful question about the system state after the freeze. As I am remote from the machine, it’s hard to tell whether it is on or not, and the BMC reported that the machine was on. However, when I inspected the machine physically, the power indicator light on the front panel was off, and the indicator lights on the PSU were amber. In the image, the top system is powered off.</p>

<p><img src="/images/raptor-talos-ii-front-panel.png" alt="" /></p>

<p>Issuing these <code class="language-plaintext highlighter-rouge">i2cget</code> commands via the BMC console allowed the cause of the power-off event to be determined:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bmc-orithia:~# i2cget <span class="nt">-y</span> 12 0x31 0x07
0x2e
bmc-orithia:~# i2cget <span class="nt">-y</span> 12 0x31 0x18
0x00
bmc-orithia:~# i2cget <span class="nt">-y</span> 12 0x31 0x19
0x02
</code></pre></div></div>

<p>Using the BMC, you can query the power status using <code class="language-plaintext highlighter-rouge">obmcutil power</code> and power on and off the system using <code class="language-plaintext highlighter-rouge">obmcutil poweron</code> and <code class="language-plaintext highlighter-rouge">obmcutil poweroff</code> respectively.</p>
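<p>For reference, the power-control commands mentioned above, as run from the BMC shell:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>obmcutil power      # query the current chassis power state
obmcutil poweron    # power the system on
obmcutil poweroff   # power the system off
</code></pre></div></div>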

<blockquote>
  <p>The indication is one of the power rails (VCS for CPU1) dropping offline, which causes a full system power off to ensure further hardware damage does not occur. This would be a hardware fault, and is either a failing regulator on the mainboard or a failing CPU shorting out the VCS B power rail. … There is a chance the actual problem is instability in the +12V rail from the PDU.</p>
</blockquote>

<p>The suggested course of action was to try powering the system using a standard 1000W ATX power supply, which would establish whether the supply was the root cause of the failure. Raptor Support confirmed that, provided the plastic air guide is in place inside the chassis, there should be sufficient airflow to run the test for an extended period.</p>

<p><img src="/images/raptor-talos-ii-with-atx.jpg" alt="" /></p>

<p><img src="/images/raptor-talos-ii-with-atx-running.jpg" alt="" /></p>

<p>After an hour or so of running, the system spontaneously rebooted, so I decided to stop the test to avoid possible damage.</p>

<blockquote>
  <p>The next step would be to swap CPU0 on Scyleia with CPU1 on Orithia, to determine if the CPU itself may be at fault. CPU0 is nearest the rear connectors, while CPU1 is nearest the chassis fans.</p>
</blockquote>

<p>Orithia CPU</p>

<p><img src="/images/raptor-talos-ii-orithia-cpu-screwdriver.jpg" alt="" /></p>

<p><img src="/images/raptor-talos-ii-orithia-cpu-removed.jpg" alt="" /></p>

<p><img src="/images/raptor-talos-ii-orithia-cpu.jpg" alt="" /></p>

<p>Scyleia CPU</p>

<p><img src="/images/raptor-talos-ii-scyleia-cpu-screwdriver.jpg" alt="" /></p>

<p>Following the CPU swap, both systems have been stable for over 30 hours.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="power9" /><category term="tunbury.org" /><summary type="html"><![CDATA[Almost a month ago, I wrote about the onset of unreliability in our Raptor Talos II machines. Since then, I have been working with Raptor Computing to diagnose the issue.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/raptor-talos-ii.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/raptor-talos-ii.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Opinion: Is it time to stop testing with opam &amp;lt; 2.2</title><link href="https://www.tunbury.org/2025/05/26/retire-legacy-opam/" rel="alternate" type="text/html" title="Opinion: Is it time to stop testing with opam &amp;lt; 2.2" /><published>2025-05-26T00:00:00+00:00</published><updated>2025-05-26T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/26/retire-legacy-opam</id><content type="html" xml:base="https://www.tunbury.org/2025/05/26/retire-legacy-opam/"><![CDATA[<p>On the eve of the release of opam 2.4, is it time to stop testing with opam &lt; 2.2?</p>

<p>Over the weekend, we have been seeing numerous failures across the ecosystem due to the unavailability of <a href="http://camlcity.org">camlcity.org</a>. This website hosts the source for the <code class="language-plaintext highlighter-rouge">findlib</code> package. A typical error report is shown below:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#32 [build-opam-doc  5/14] RUN opam install odoc
#32 258.6 [ERROR] Failed to get sources of ocamlfind.1.9.6: curl error code 504
#32 258.6
#32 258.6 #=== ERROR while fetching sources for ocamlfind.1.9.6 =========================#
#32 258.6 OpamSolution.Fetch_fail("http://download.camlcity.org/download/findlib-1.9.6.tar.gz (curl: code 504 while downloading http://download.camlcity.org/download/findlib-1.9.6.tar.gz)")
#32 259.0
#32 259.0
#32 259.0 &lt;&gt;&lt;&gt; Error report &lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;
#32 259.0 +- The following actions failed
#32 259.0 | - fetch ocamlfind 1.9.6
#32 259.0 +-
</code></pre></div></div>

<p>The most high-profile failure has been the inability to update <a href="https://opam.ocaml.org">opam.ocaml.org</a>.  See <a href="https://github.com/ocaml/infrastructure/issues/172">issue#172</a>. This has also affected the deployment of <a href="https://ocaml.org">ocaml.org</a>.</p>

<p>Late last year, Hannes proposed adding our archive mirror to the base image builder. <a href="https://github.com/ocurrent/docker-base-images/issues/306">issue#306</a>. However, this requires opam 2.2 or later. We have long maintained that while supported <a href="https://repology.org/project/opam/versions">distributions</a> still package legacy versions, we should continue to test against these versions.</p>

<p>The testing of the legacy versions is limited to <a href="https://opam.ci.ocaml.org">opam-repo-ci</a> testing on Debian 12 on AMD64 using a test matrix of OCaml 4.14 and 5.3 with each of opam 2.0, 2.1 and 2.2. These tests often fail to find a solution within the timeout. We have tried increasing the timeout by a factor of 10 to no avail. All of opam-repo-ci’s other tests use the current development version. OCaml-CI only tests using the current release version.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ERROR] Sorry, resolution of the request timed out.
        Try to specify a simpler request, use a different solver, or increase the allowed time by setting OPAMSOLVERTIMEOUT to a bigger value (currently, it is set to 60.0 seconds).
</code></pre></div></div>
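<p>The error message names the relevant knob; the timeout can be raised per invocation by setting the environment variable, where the value and package are illustrative:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OPAMSOLVERTIMEOUT=600 opam install &lt;package&gt;
</code></pre></div></div>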

<p>The base image default is opam 2.0, as <code class="language-plaintext highlighter-rouge">~/.opam</code> can’t be downgraded; therefore, we can’t set a mirror archive flag in the base images.</p>

<p>A typical <code class="language-plaintext highlighter-rouge">Dockerfile</code> starts by replacing opam 2.0 with the latest version and reinitialising.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FROM ocaml/opam:debian-12-ocaml-4.14 AS build
RUN sudo ln -sf /usr/bin/opam-2.3 /usr/bin/opam &amp;&amp; opam init --reinit -ni
...
</code></pre></div></div>

<p>To include the archive mirror, we should add a follow-up of:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RUN opam option --global 'archive-mirrors+="https://opam.ocaml.org/cache"'
</code></pre></div></div>

<p>Dropping 2.0 and 2.1, and arguably 2.2 as well, from the base images would considerably decrease the time taken to build them, as every opam version is built from source each week for each distribution/architecture.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RUN git clone https://github.com/ocaml/opam /tmp/opam &amp;&amp; cd /tmp/opam &amp;&amp; cp -P -R -p . ../opam-sources &amp;&amp; git checkout 4267ade09ac42c1bd0b84a5fa61af8ccdaadef48 &amp;&amp; env MAKE='make -j' shell/bootstrap-ocaml.sh &amp;&amp; make -C src_ext cache-archives
RUN cd /tmp/opam-sources &amp;&amp; cp -P -R -p . ../opam-build-2.0 &amp;&amp; cd ../opam-build-2.0 &amp;&amp; git fetch -q &amp;&amp; git checkout adc1e1829a2bef5b240746df80341b508290fe3b &amp;&amp; ln -s ../opam/src_ext/archives src_ext/archives &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" ./configure --enable-cold-check &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" make lib-ext all &amp;&amp; mkdir -p /usr/bin &amp;&amp; cp /tmp/opam-build-2.0/opam /usr/bin/opam-2.0 &amp;&amp; chmod a+x /usr/bin/opam-2.0 &amp;&amp; rm -rf /tmp/opam-build-2.0
RUN cd /tmp/opam-sources &amp;&amp; cp -P -R -p . ../opam-build-2.1 &amp;&amp; cd ../opam-build-2.1 &amp;&amp; git fetch -q &amp;&amp; git checkout 263921263e1f745613e2882745114b7b08f3608b &amp;&amp; ln -s ../opam/src_ext/archives src_ext/archives &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" ./configure --enable-cold-check --with-0install-solver &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" make lib-ext all &amp;&amp; mkdir -p /usr/bin &amp;&amp; cp /tmp/opam-build-2.1/opam /usr/bin/opam-2.1 &amp;&amp; chmod a+x /usr/bin/opam-2.1 &amp;&amp; rm -rf /tmp/opam-build-2.1
RUN cd /tmp/opam-sources &amp;&amp; cp -P -R -p . ../opam-build-2.2 &amp;&amp; cd ../opam-build-2.2 &amp;&amp; git fetch -q &amp;&amp; git checkout 01e9a24a61e23e42d513b4b775d8c30c807439b2 &amp;&amp; ln -s ../opam/src_ext/archives src_ext/archives &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" ./configure --enable-cold-check --with-0install-solver --with-vendored-deps &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" make lib-ext all &amp;&amp; mkdir -p /usr/bin &amp;&amp; cp /tmp/opam-build-2.2/opam /usr/bin/opam-2.2 &amp;&amp; chmod a+x /usr/bin/opam-2.2 &amp;&amp; rm -rf /tmp/opam-build-2.2
RUN cd /tmp/opam-sources &amp;&amp; cp -P -R -p . ../opam-build-2.3 &amp;&amp; cd ../opam-build-2.3 &amp;&amp; git fetch -q &amp;&amp; git checkout 35acd0c5abc5e66cdbd5be16ba77aa6c33a4c724 &amp;&amp; ln -s ../opam/src_ext/archives src_ext/archives &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" ./configure --enable-cold-check --with-0install-solver --with-vendored-deps &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" make lib-ext all &amp;&amp; mkdir -p /usr/bin &amp;&amp; cp /tmp/opam-build-2.3/opam /usr/bin/opam-2.3 &amp;&amp; chmod a+x /usr/bin/opam-2.3 &amp;&amp; rm -rf /tmp/opam-build-2.3
RUN cd /tmp/opam-sources &amp;&amp; cp -P -R -p . ../opam-build-master &amp;&amp; cd ../opam-build-master &amp;&amp; git fetch -q &amp;&amp; git checkout 4267ade09ac42c1bd0b84a5fa61af8ccdaadef48 &amp;&amp; ln -s ../opam/src_ext/archives src_ext/archives &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" ./configure --enable-cold-check --with-0install-solver --with-vendored-deps &amp;&amp; env PATH="/tmp/opam/bootstrap/ocaml/bin:$PATH" make lib-ext all &amp;&amp; mkdir -p /usr/bin &amp;&amp; cp /tmp/opam-build-master/opam /usr/bin/opam-master &amp;&amp; chmod a+x /usr/bin/opam-master &amp;&amp; rm -rf /tmp/opam-build-master
</code></pre></div></div>

<p>Furthermore, after changing the opam version, we must run <code class="language-plaintext highlighter-rouge">opam init --reinit -ni</code>, which is an <em>expensive</em> command. If the base images defaulted to the current version, we would have faster builds.</p>

<p>The final benefit, of course, would be that we could set the <code class="language-plaintext highlighter-rouge">archive-mirror</code> and reduce the number of transient failures due to network outages.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[On the eve of the release of opam 2.4, is it time to stop testing with opam &lt; 2.2?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Upgrading to macOS Sequoia</title><link href="https://www.tunbury.org/2025/05/19/macos-sequoia/" rel="alternate" type="text/html" title="Upgrading to macOS Sequoia" /><published>2025-05-19T00:00:00+00:00</published><updated>2025-05-19T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/19/macos-sequoia</id><content type="html" xml:base="https://www.tunbury.org/2025/05/19/macos-sequoia/"><![CDATA[<p>We have 8 Mac Minis running <a href="https://github.com/ocurrent/ocluster">OCluster</a> that need to be updated to macOS Sequoia.</p>

<p>I’d been putting this off for some time, as the downloads are huge even in an ideal scenario. After the OS installation, there are usually updates to Xcode and OpenZFS. We have 4 x i7 units and 4 x M1 units.</p>

<p>Rather than using the software update button, I went to the AppStore and downloaded the <a href="https://support.apple.com/en-gb/102662">Sequoia installer</a>. This is approximately 15GB. I copied <code class="language-plaintext highlighter-rouge">/Applications/Install macOS Sequoia.app</code> to the other three systems of the same architecture using <code class="language-plaintext highlighter-rouge">rsync</code> to avoid downloading it on each machine. The OS updated from <code class="language-plaintext highlighter-rouge">Darwin 23.4.0</code> to <code class="language-plaintext highlighter-rouge">Darwin 24.5.0</code>.</p>
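
<p>As a sketch (the target hostname is illustrative), the copy is a single <code class="language-plaintext highlighter-rouge">rsync</code> per machine, repeated for each unit of the same architecture; <code class="language-plaintext highlighter-rouge">-a</code> preserves the app bundle’s permissions and symlinks:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rsync -a "/Applications/Install macOS Sequoia.app" administrator@mac-mini-2:/Applications/
</code></pre></div></div>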

<p>After the OS update, I updated Xcode via Settings, Software Update. This was a 1.65GB download. This moved from <code class="language-plaintext highlighter-rouge">Command Line Tools for Xcode 15.3</code> to <code class="language-plaintext highlighter-rouge">Command Line Tools for Xcode 16.3</code>, upgrading <code class="language-plaintext highlighter-rouge">clang</code> from 15.0.0 to 17.0.0. Before moving on to the remaining machines, I tested <a href="https://github.com/ocurrent/obuilder">obuilder</a>, OpenZFS, etc.</p>

<p><code class="language-plaintext highlighter-rouge">softwareupdate --history</code> lists all the updates and OS installations.</p>

<p>Wall clock time elapsed: ~3 days.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="openzfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[We have 8 Mac Minis running OCluster that need to be updated to macOS Sequoia.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/sequoia.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/sequoia.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ZFS Replication with Ansible</title><link href="https://www.tunbury.org/2025/05/16/zfs-replcation-ansible/" rel="alternate" type="text/html" title="ZFS Replication with Ansible" /><published>2025-05-16T00:00:00+00:00</published><updated>2025-05-16T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/16/zfs-replcation-ansible</id><content type="html" xml:base="https://www.tunbury.org/2025/05/16/zfs-replcation-ansible/"><![CDATA[<p>Rather than using the agent-based approach proposed yesterday, it’s worth considering an Ansible-based solution instead.</p>

<p>Start with a set of YAML files, one per dataset, containing any metadata we would like for administrative purposes, together with required fields such as those below. We can also override the default snapshot and replication frequencies by adding those parameters to the file.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">dataset_path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">tank/dataset-02"</span>
<span class="na">source_host</span><span class="pi">:</span> <span class="s2">"</span><span class="s">x86-bm-c1.sw.ocaml.org"</span>
<span class="na">target_host</span><span class="pi">:</span> <span class="s2">"</span><span class="s">x86-bm-c3.sw.ocaml.org"</span>
</code></pre></div></div>

<p>The YAML files would be aggregated to create an overall picture of which datasets must be replicated between hosts. Ansible templates would then generate the necessary configuration files for <code class="language-plaintext highlighter-rouge">syncoid</code> and <code class="language-plaintext highlighter-rouge">sanoid</code>, and register the cron jobs on each machine.</p>
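
<p>To illustrate what the templates might emit (the retention and schedule values here are only a sketch, not the deployed settings), a generated <code class="language-plaintext highlighter-rouge">sanoid.conf</code> stanza on the source host and a cron entry driving <code class="language-plaintext highlighter-rouge">syncoid</code> could look like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># /etc/sanoid/sanoid.conf on x86-bm-c1.sw.ocaml.org
[tank/dataset-02]
        use_template = production

[template_production]
        hourly = 24
        daily = 7
        autosnap = yes
        autoprune = yes

# /etc/cron.d entry pushing the dataset to the target
0 * * * * root syncoid tank/dataset-02 root@x86-bm-c3.sw.ocaml.org:tank/dataset-02
</code></pre></div></div>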

<p>Syncoid replicates over SSH, so keys must be generated on the source machines, and the public keys must be deployed on the replication targets. Ansible can be used to manage the configuration of the keys.</p>
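
<p>A minimal sketch of those key-management tasks (the group names and key path are assumptions, not the repository’s actual inventory):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- hosts: zfs_sources
  tasks:
    - name: Generate a replication key pair on each source
      community.crypto.openssh_keypair:
        path: /root/.ssh/id_syncoid
      register: syncoid_key

- hosts: zfs_targets
  tasks:
    - name: Authorise every source host's public key
      ansible.posix.authorized_key:
        user: root
        key: "{{ hostvars[item].syncoid_key.public_key }}"
      loop: "{{ groups['zfs_sources'] }}"
</code></pre></div></div>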

<p>Given the overall picture, we can automatically generate a markdown document describing the current setup and use Mermaid to include a visual representation.</p>

<p><img src="/images/zfs-replication-graphic.png" alt="" /></p>

<p>I have published a working version of this concept on <a href="https://github.com/mtelvers/zfs-replication-ansible">GitHub</a>. The <a href="https://github.com/mtelvers/zfs-replication-ansible/blob/master/README.md">README.md</a> contains additional information.</p>

<p>The replication set defined in the repository, <a href="https://github.com/mtelvers/zfs-replication-ansible/blob/master/docs/replication_topology.md">ZFS Replication Topology</a>, is currently running for testing.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="openzfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[Rather than using the agent-based approach proposed yesterday, it’s worth considering an Ansible-based solution instead.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/openzfs.png" /><media:content medium="image" url="https://www.tunbury.org/images/openzfs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ZFS System Concept</title><link href="https://www.tunbury.org/2025/05/15/zfs-system-concept/" rel="alternate" type="text/html" title="ZFS System Concept" /><published>2025-05-15T00:00:00+00:00</published><updated>2025-05-15T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/15/zfs-system-concept</id><content type="html" xml:base="https://www.tunbury.org/2025/05/15/zfs-system-concept/"><![CDATA[<p>How would the distributed ZFS storage system look in practical terms? Each machine with a ZFS store would have an agent application installed. Centrally, there would be a tracker server, and users would interact with the system using a CLI tool. The elements will interact with each other using Cap’n Proto capability files.</p>

<h1 id="tracker">Tracker</h1>

<p>The tracker would generate capability files on first invocation, one per <em>location</em>, where the location could be as granular as a specific rack in a datacenter or a larger grouping, such as at the institution level. The purpose of the location grouping is to allow users to see where the data is held. As a prototype, the command could be something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tracker --capnp-listen-address tcp:1.2.3.4:1234 --locations datacenter-01,datacenter-02,datacenter-03
</code></pre></div></div>

<h1 id="agent">Agent</h1>

<p>Each machine would have the agent application. The agent would register with the tracker using the capability file generated by the tracker. The agent command line would provide a list of zpools that are in scope for management. The zpools would be scanned to compile a list of available datasets, which would be passed to the tracker. Perhaps an invocation like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>agent --connect datacenter-01.cap --name machine-01 --zpools tank-01,tank-02
</code></pre></div></div>

<h1 id="cli">CLI</h1>

<p>The CLI tool will display the system state by connecting to the tracker. Perhaps a command like <code class="language-plaintext highlighter-rouge">cli --connect user.cap show</code>, which would output a list of datasets and where they are:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dataset-01: datacenter-01\machine-01\tank-01 (online), datacenter-02\machine-03\tank-06 (online)
dataset-02: datacenter-01\machine-01\tank-02 (online), datacenter-02\machine-04\tank-07 (offline)
</code></pre></div></div>

<p>Another common use case would be to fetch a dataset: <code class="language-plaintext highlighter-rouge">cli --connect user.cap download dataset-02</code>. This would set up a <code class="language-plaintext highlighter-rouge">zfs send | zfs receive</code> between the agent and the current machine.</p>
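
<p>In outline (hostnames and pool names as in the listing above; the snapshot name is illustrative), the transfer set up by the agent would amount to:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh machine-01 zfs snapshot tank-02/dataset-02@transfer
ssh machine-01 zfs send tank-02/dataset-02@transfer | zfs receive tank/dataset-02
</code></pre></div></div>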

<p>Potentially, all machines would run the agent, and rather than <code class="language-plaintext highlighter-rouge">download</code>, we would initiate a <code class="language-plaintext highlighter-rouge">copy</code> of a dataset to another location in the form <code class="language-plaintext highlighter-rouge">datacenter\machine\tank</code>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="openzfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[How would the distributed ZFS storage system look in practical terms? Each machine with a ZFS store would have an agent application installed. Centrally, there would be a tracker server, and users would interact with the system using a CLI tool. The elements will interact with each other using Cap’n Proto capability files.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/openzfs.png" /><media:content medium="image" url="https://www.tunbury.org/images/openzfs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Opam Health Check with OxCaml</title><link href="https://www.tunbury.org/2025/05/14/opam-health-check-oxcaml/" rel="alternate" type="text/html" title="Opam Health Check with OxCaml" /><published>2025-05-14T06:00:00+00:00</published><updated>2025-05-14T06:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/14/opam-health-check-oxcaml</id><content type="html" xml:base="https://www.tunbury.org/2025/05/14/opam-health-check-oxcaml/"><![CDATA[<p>Arthur mentioned that it would be great to know which packages build successfully with OxCaml and which don’t.</p>

<p>With a little effort and <a href="https://github.com/ocurrent/opam-health-check/pull/106">PR#106</a>, I was able to get <a href="https://github.com/ocurrent/opam-health-check">opam-health-check</a> to build OxCaml from the Jane Street branch and test the latest version of all the packages in opam.</p>

<p>I created the switch using the branch <code class="language-plaintext highlighter-rouge">janestreet/opam-repository#with-extensions</code>. However, I ran into issues as <code class="language-plaintext highlighter-rouge">autoconf</code> isn’t included in the base images. I added an <code class="language-plaintext highlighter-rouge">extra-command</code> to install it, but found that these are executed last, after the switch has been created, and I needed <code class="language-plaintext highlighter-rouge">autoconf</code> before the switch was created. My PR moved the extra commands earlier in the build process.</p>

<p>Here is my <code class="language-plaintext highlighter-rouge">config.yaml</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>name: default
port: 8080
public-url: http://oxcaml.check.ci.dev
admin-port: 9999
auto-run-interval: 1680
processes: 100
enable-dune-cache: false
enable-logs-compression: true
default-repository: ocaml/opam-repository
extra-repositories:
- janestreet-with-extensions: janestreet/opam-repository#with-extensions
with-test: false
with-lower-bound: false
list-command: opam list --available --installable --columns=package --short
extra-command: sudo apt install autoconf -y
platform:
  os: linux
  arch: x86_64
  custom-pool:
  distribution: debian-unstable
  image: ocaml/opam:debian-12-ocaml-5.2@sha256:a17317e9abe385dc16b4390c64a374046d6dd562e80aea838d91c6c1335da357
ocaml-switches:
- 5.2.0+flambda2:
    switch: 5.2.0+flambda2
    build-with: opam
</code></pre></div></div>

<p>This results in these commands, which build the switch for testing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo ln -f /usr/bin/opam-dev /usr/bin/opam
rm -rf ~/opam-repository &amp;&amp; git clone -q 'https://github.com/ocaml/opam-repository' ~/opam-repository &amp;&amp; git -C ~/opam-repository checkout -q dbc9ec7b83bac3673185542221a571372b6abb35
rm -rf ~/.opam &amp;&amp; opam init -ya --bare --config ~/.opamrc-sandbox ~/opam-repository
sudo apt install autoconf -y
git clone -q 'https://github.com/janestreet/opam-repository'  ~/'janestreet-with-extensions' &amp;&amp; git -C ~/'janestreet-with-extensions' checkout -q 55a5d4c5e35a7365ddd6ffb3b87274a77f77deb5
opam repository add --dont-select 'janestreet-with-extensions' ~/'janestreet-with-extensions'
opam switch create --repositories=janestreet-with-extensions,default '5.2.0+flambda2' '5.2.0+flambda2'
opam update --depexts
</code></pre></div></div>

<p>The results are available at <a href="https://oxcaml.check.ci.dev">https://oxcaml.check.ci.dev</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam-health-check,OxCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Arthur mentioned that it would be great to know which packages build successfully with OxCaml and which don’t.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ubuntu 24.04 runc issues with AppArmor</title><link href="https://www.tunbury.org/2025/05/13/ubuntu-apparmor/" rel="alternate" type="text/html" title="Ubuntu 24.04 runc issues with AppArmor" /><published>2025-05-13T12:00:00+00:00</published><updated>2025-05-13T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/13/ubuntu-apparmor</id><content type="html" xml:base="https://www.tunbury.org/2025/05/13/ubuntu-apparmor/"><![CDATA[<p>Patrick reported issues with OCaml-CI running tests on <code class="language-plaintext highlighter-rouge">ocaml-ppx</code>.</p>

<blockquote>
  <p>Fedora seems to be having some issues: https://ocaml.ci.dev/github/ocaml-ppx/ppxlib/commit/0d6886f5bcf22287a66511817e969965c888d2b7/variant/fedora-40-5.3_opam-2.3</p>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo: PAM account management error: Authentication service cannot retrieve authentication info
sudo: a password is required
"/usr/bin/env" "bash" "-c" "sudo dnf install -y findutils" failed with exit status 1
2025-05-12 08:55.09: Job failed: Failed: Build failed
</code></pre></div>  </div>
</blockquote>

<p>I took this problem at face value and replied that the issue would be related to Fedora 40, which is EOL. I created <a href="https://github.com/ocurrent/ocaml-ci/pull/1011">PR#1011</a> for OCaml-CI and deployed it. However, the problem didn’t go away. We were now testing Fedora 42, but jobs were still failing. I created a minimal obuilder job specification:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>((from ocaml/opam:fedora-42-ocaml-4.14@sha256:475a852401de7d578efec2afce4384d87b505f5bc610dc56f6bde3b87ebb7664)
(user (uid 1000) (gid 1000))
(run (shell "sudo ln -f /usr/bin/opam-2.3 /usr/bin/opam")))
</code></pre></div></div>

<p>Submitting the job to the cluster showed it worked on all machines except for <code class="language-plaintext highlighter-rouge">bremusa</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ocluster-client submit-obuilder <span class="nt">--connect</span> mtelvers.cap  <span class="nt">--pool</span> linux-x86_64 <span class="nt">--local-file</span> fedora-42.spec
Tailing log:
Building on bremusa.ocamllabs.io

<span class="o">(</span>from ocaml/opam:fedora-42-ocaml-4.14@sha256:475a852401de7d578efec2afce4384d87b505f5bc610dc56f6bde3b87ebb7664<span class="o">)</span>
2025-05-12 16:55.42 <span class="nt">---</span><span class="o">&gt;</span> using <span class="s2">"aefb7551cd0db7b5ebec7e244d5637aef02ab3f94c732650de7ad183465adaa0"</span> from cache

/: <span class="o">(</span>user <span class="o">(</span>uid 1000<span class="o">)</span> <span class="o">(</span>gid 1000<span class="o">))</span>

/: <span class="o">(</span>run <span class="o">(</span>shell <span class="s2">"sudo ln -f /usr/bin/opam-2.3 /usr/bin/opam"</span><span class="o">))</span>
<span class="nb">sudo</span>: PAM account management error: Authentication service cannot retrieve authentication info
<span class="nb">sudo</span>: a password is required
<span class="s2">"/usr/bin/env"</span> <span class="s2">"bash"</span> <span class="s2">"-c"</span> <span class="s2">"sudo ln -f /usr/bin/opam-2.3 /usr/bin/opam"</span> failed with <span class="nb">exit </span>status 1
Failed: Build failed.
</code></pre></div></div>

<p>Changing the image to <code class="language-plaintext highlighter-rouge">opam:debian-12-ocaml-4.14</code> worked, so the issue only affects Fedora images and only on <code class="language-plaintext highlighter-rouge">bremusa</code>. I was able to reproduce the issue directly using <code class="language-plaintext highlighter-rouge">runc</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># runc run test</span>
<span class="nb">sudo</span>: PAM account management error: Authentication service cannot retrieve authentication info
<span class="nb">sudo</span>: a password is required
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">ls -l /etc/shadow</code> in the container showed that the permissions on <code class="language-plaintext highlighter-rouge">/etc/shadow</code> are 000. If these are changed to <code class="language-plaintext highlighter-rouge">640</code>, then <code class="language-plaintext highlighter-rouge">sudo</code> works correctly. Permissions are set 000 for <code class="language-plaintext highlighter-rouge">/etc/shadow</code> in some distributions as access is limited to processes with the capability <code class="language-plaintext highlighter-rouge">DAC_OVERRIDE</code>.</p>
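
<p>The check and workaround, run inside the container, were along these lines:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ls -l /etc/shadow       # mode 000 in the Fedora images
# chmod 640 /etc/shadow   # after this, sudo works correctly
</code></pre></div></div>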

<p>Having seen a permission issue with <code class="language-plaintext highlighter-rouge">runc</code> and <code class="language-plaintext highlighter-rouge">libseccomp</code> compatibility <a href="https://github.com/ocaml/infrastructure/issues/121">before</a>, I went down a rabbit hole investigating that. Ultimately, I compiled <code class="language-plaintext highlighter-rouge">runc</code> without <code class="language-plaintext highlighter-rouge">libseccomp</code> support, <code class="language-plaintext highlighter-rouge">make BUILDTAGS=""</code>, but this still exhibited the same issue.</p>

<p>All the machines in the <code class="language-plaintext highlighter-rouge">linux-x86_64</code> pool are running Ubuntu 22.04 except for <code class="language-plaintext highlighter-rouge">bremusa</code>. I configured a spare machine with Ubuntu 24.04 and tested. The problem appeared on this machine as well.</p>

<p>Is there a change in Ubuntu 24.04?</p>

<p>I temporarily disabled AppArmor by editing <code class="language-plaintext highlighter-rouge">/etc/default/grub</code>, adding <code class="language-plaintext highlighter-rouge">apparmor=0</code> to <code class="language-plaintext highlighter-rouge">GRUB_CMDLINE_LINUX</code>, running <code class="language-plaintext highlighter-rouge">update-grub</code> and rebooting. Disabling AppArmor entirely like this creates a security exposure, so it isn’t recommended, but it did clear the issue.</p>
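
<p>For reference, the change amounted to:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># /etc/default/grub
GRUB_CMDLINE_LINUX="apparmor=0"

update-grub
reboot
</code></pre></div></div>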

<p>After enabling AppArmor again, I disabled the configuration for <code class="language-plaintext highlighter-rouge">runc</code> by running:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ln</span> <span class="nt">-s</span> /etc/apparmor.d/runc /etc/apparmor.d/disable/
apparmor_parser <span class="nt">-R</span> /etc/apparmor.d/runc
</code></pre></div></div>

<p>This didn’t help - in fact, this was worse as now <code class="language-plaintext highlighter-rouge">runc</code> couldn’t run at all.  I restored the configuration and added <code class="language-plaintext highlighter-rouge">capability dac_override</code>, but this didn’t help either.</p>

<p>Looking through the profiles with <code class="language-plaintext highlighter-rouge">grep shadow -r /etc/apparmor.d</code>, I noticed <code class="language-plaintext highlighter-rouge">unix-chkpwd</code>, which could be the source of the issue. I disabled this profile and the issue was resolved.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ln</span> <span class="nt">-s</span> /etc/apparmor.d/unix-chkpwd /etc/apparmor.d/disable
apparmor_parser <span class="nt">-R</span> /etc/apparmor.d/unix-chkpwd
</code></pre></div></div>
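
<p>Should the profile be needed again, it can be re-enabled by reversing the two steps (the standard AppArmor procedure):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm /etc/apparmor.d/disable/unix-chkpwd
apparmor_parser -r /etc/apparmor.d/unix-chkpwd
</code></pre></div></div>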

<p>Armed with the answer, it’s pretty easy to find other people with related issues:</p>
<ul>
  <li>https://github.com/docker/build-push-action/issues/1302</li>
  <li>https://github.com/moby/moby/issues/48734</li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Ubuntu,runc,AppArmor" /><category term="tunbury.org" /><summary type="html"><![CDATA[Patrick reported issues with OCaml-CI running tests on ocaml-ppx.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ubuntu.png" /><media:content medium="image" url="https://www.tunbury.org/images/ubuntu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Posthog on OCaml.org</title><link href="https://www.tunbury.org/2025/05/12/posthog/" rel="alternate" type="text/html" title="Posthog on OCaml.org" /><published>2025-05-12T12:00:00+00:00</published><updated>2025-05-12T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/12/posthog</id><content type="html" xml:base="https://www.tunbury.org/2025/05/12/posthog/"><![CDATA[<p>Sabine would like to switch <a href="https://ocaml.org">OCaml.org</a> from using <a href="https://plausible.io">Plausible</a> over to <a href="https://posthog.com">Posthog</a>. The underlying reason for the move is that the self-hosted product from Posthog has more features than the equivalent from Plausible. Of particular interest is the heatmap feature to assess the number of visitors who finish the <a href="https://ocaml.org/docs/tour-of-ocaml">Tour of OCaml</a>.</p>

<p>Posthog has <a href="https://posthog.com/docs/self-host">documentation</a> on the self-hosted solution. In short, create a VM with 4 vCPU, 16GB RAM, and 30GB storage and run the setup script:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/bin/bash <span class="nt">-c</span> <span class="s2">"</span><span class="si">$(</span>curl <span class="nt">-fsSL</span> https://raw.githubusercontent.com/posthog/posthog/HEAD/bin/deploy-hobby<span class="si">)</span><span class="s2">”
</span></code></pre></div></div>

<p>Any subsequent upgrades can be achieved with:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/bin/bash <span class="nt">-c</span> <span class="s2">"</span><span class="si">$(</span>curl <span class="nt">-fsSL</span> https://raw.githubusercontent.com/posthog/posthog/HEAD/bin/upgrade-hobby<span class="si">)</span><span class="s2">"</span>
</code></pre></div></div>

<p>After installation, I created a <a href="https://posthog.ci.dev/shared/seqtamWuMXLwxJEAX1XNjwhzciAajw">public dashboard</a> as with <a href="https://plausible.ci.dev/ocaml.org">Plausible</a>. I also enabled the option <code class="language-plaintext highlighter-rouge">Discard client IP data</code>.</p>

<p>The OCaml website can be updated with <a href="https://github.com/ocaml/ocaml.org/pull/3101">PR#3101</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="posthog" /><category term="tunbury.org" /><summary type="html"><![CDATA[Sabine would like to switch OCaml.org from using Plausible over to Posthog. The underlying reason for the move is that the self-hosted product from Posthog has more features than the equivalent from Plausible. Of particular interest is the heatmap feature to assess the number of visitors who finish the Tour of OCaml.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/posthog.png" /><media:content medium="image" url="https://www.tunbury.org/images/posthog.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Worker moves</title><link href="https://www.tunbury.org/2025/05/09/worker-moves/" rel="alternate" type="text/html" title="Worker moves" /><published>2025-05-09T12:00:00+00:00</published><updated>2025-05-09T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/09/worker-moves</id><content type="html" xml:base="https://www.tunbury.org/2025/05/09/worker-moves/"><![CDATA[<p>Following the setup of <em>rosemary</em> with <a href="https://www.tunbury.org/freebsd-uefi/">FreeBSD 14</a> (with 20C/40T), I have paused <em>spring</em> and <em>summer</em> (which combined have 12C/24T) and <em>rosemary</em> is now handling all of the <a href="https://github.com/ocurrent/freebsd-infra/pull/14">FreeBSD workload</a>.</p>

<p><em>Oregano</em> has now taken the OpenBSD workload from <em>bremusa</em>. <em>bremusa</em> has been redeployed in the <code class="language-plaintext highlighter-rouge">linux-x86_64</code> pool. With the extra processing, I have paused the Scaleway workers <em>x86-bm-c1</em> through <em>x86-bm-c9</em>.</p>

<p>These changes, plus the <a href="https://www.tunbury.org/equinix-moves/">removal of the Equinix machines</a>, are now reflected in <a href="https://infra.ocaml.org">https://infra.ocaml.org</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OBuilder,FreeBSD,OpenBSD" /><category term="tunbury.org" /><summary type="html"><![CDATA[Following the setup of rosemary with FreeBSD 14 (with 20C/40T), I have paused spring and summer (which combined have 12C/24T) and rosemary is now handling all of the FreeBSD workload.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Debugging OBuilder on macOS</title><link href="https://www.tunbury.org/2025/05/08/debugging-obuilder-macos/" rel="alternate" type="text/html" title="Debugging OBuilder on macOS" /><published>2025-05-08T12:00:00+00:00</published><updated>2025-05-08T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/08/debugging-obuilder-macos</id><content type="html" xml:base="https://www.tunbury.org/2025/05/08/debugging-obuilder-macos/"><![CDATA[<p>The log from an <a href="https://github.com/ocurrent/obuilder">OBuilder</a> job starts with the steps needed to reproduce the job locally. This boilerplate output assumes that all OBuilder jobs start from a Docker base image, but on some operating systems, such as FreeBSD and macOS, OBuilder uses ZFS base images. On OpenBSD and Windows, it uses QEMU images. The situation is further complicated when the issue only affects a specific architecture that may be unavailable to the user.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2025-05-08 13:29.37: New job: build bitwuzla-cxx.0.7.0, using opam 2.3
                              from https://github.com/ocaml/opam-repository.git#refs/pull/27768/head (55a47416d532dc829d9111297970934a21a1b1c4)
                              on macos-homebrew-ocaml-4.14/amd64

To reproduce locally:

cd $(mktemp -d)
git clone --recursive "https://github.com/ocaml/opam-repository.git" &amp;&amp; cd "opam-repository" &amp;&amp; git fetch origin "refs/pull/27768/head" &amp;&amp; git reset --hard 55a47416
git fetch origin master
git merge --no-edit b8a7f49af3f606bf8a22869a1b52b250dd90092e
cat &gt; ../Dockerfile &lt;&lt;'END-OF-DOCKERFILE'

FROM macos-homebrew-ocaml-4.14
USER 1000:1000
RUN ln -f ~/local/bin/opam-2.3 ~/local/bin/opam
RUN opam init --reinit -ni
RUN opam option solver=builtin-0install &amp;&amp; opam config report
ENV OPAMDOWNLOADJOBS="1"
ENV OPAMERRLOGLEN="0"
ENV OPAMPRECISETRACKING="1"
ENV CI="true"
ENV OPAM_REPO_CI="true"
RUN rm -rf opam-repository/
COPY --chown=1000:1000 . opam-repository/
RUN opam repository set-url -k local --strict default opam-repository/
RUN opam update --depexts || true
RUN opam pin add -k version -yn bitwuzla-cxx.0.7.0 0.7.0
RUN opam reinstall bitwuzla-cxx.0.7.0; \
    res=$?; \
    test "$res" != 31 &amp;&amp; exit "$res"; \
    export OPAMCLI=2.0; \
    build_dir=$(opam var prefix)/.opam-switch/build; \
    failed=$(ls "$build_dir"); \
    partial_fails=""; \
    for pkg in $failed; do \
    if opam show -f x-ci-accept-failures: "$pkg" | grep -qF "\"macos-homebrew\""; then \
    echo "A package failed and has been disabled for CI using the 'x-ci-accept-failures' field."; \
    fi; \
    test "$pkg" != 'bitwuzla-cxx.0.7.0' &amp;&amp; partial_fails="$partial_fails $pkg"; \
    done; \
    test "${partial_fails}" != "" &amp;&amp; echo "opam-repo-ci detected dependencies failing: ${partial_fails}"; \
    exit 1


END-OF-DOCKERFILE
docker build -f ../Dockerfile .
</code></pre></div></div>

<p>It is, therefore, difficult to diagnose the issue on these operating systems and on esoteric architectures. Is it an issue with the CI system or the job itself?</p>

<p>My approach is to get myself into an interactive shell at the point in the build where the failure occurs. On Linux and FreeBSD, the log is available in <code class="language-plaintext highlighter-rouge">/var/log/syslog</code> or <code class="language-plaintext highlighter-rouge">/var/log/messages</code> respectively. On macOS, this log is written to <code class="language-plaintext highlighter-rouge">ocluster.log</code>. macOS workers are single-threaded, so the worker must be paused before progressing.</p>

<p>Each step in an OBuilder job consists of taking a snapshot of the previous layer, running a command in that layer, and keeping or discarding the layer depending on the command’s success or failure. On macOS, layers are ZFS snapshots mounted over the Homebrew directory and the CI users’ home directory. We can extract the appropriate command from the logs.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2025-05-08 14:31.17    application [INFO] Exec "zfs" "clone" "-o" "canmount=noauto" "--" "obuilder/result/a67e6d3b460fa52b5c57581e7c01fa74ddca0a0b5462fef34103a09e87f3feec@snap" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40"
2025-05-08 14:31.17    application [INFO] Exec "zfs" "mount" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40"
2025-05-08 14:31.17    application [INFO] Exec "zfs" "clone" "-o" "mountpoint=none" "--" "obuilder/result/a67e6d3b460fa52b5c57581e7c01fa74ddca0a0b5462fef34103a09e87f3feec/brew@snap" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40/brew"
2025-05-08 14:31.17    application [INFO] Exec "zfs" "clone" "-o" "mountpoint=none" "--" "obuilder/result/a67e6d3b460fa52b5c57581e7c01fa74ddca0a0b5462fef34103a09e87f3feec/home@snap" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40/home"
cannot open 'obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40@snap': dataset does not exist
2025-05-08 14:31.17    application [INFO] Exec "zfs" "clone" "--" "obuilder/cache/c-opam-archives@snap" "obuilder/cache-tmp/8608-c-opam-archives"
2025-05-08 14:31.17    application [INFO] Exec "zfs" "clone" "--" "obuilder/cache/c-homebrew@snap" "obuilder/cache-tmp/8609-c-homebrew"
2025-05-08 14:31.18       obuilder [INFO] result_tmp = /Volumes/obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40
2025-05-08 14:31.18    application [INFO] Exec "zfs" "set" "mountpoint=/Users/mac1000" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40/home"
2025-05-08 14:31.18    application [INFO] Exec "zfs" "set" "mountpoint=/usr/local" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40/brew"
2025-05-08 14:31.18       obuilder [INFO] src = /Volumes/obuilder/cache-tmp/8608-c-opam-archives, dst = /Users/mac1000/.opam/download-cache, type rw
2025-05-08 14:31.18    application [INFO] Exec "zfs" "set" "mountpoint=/Users/mac1000/.opam/download-cache" "obuilder/cache-tmp/8608-c-opam-archives"
Unmount successful for /Volumes/obuilder/cache-tmp/8608-c-opam-archives
2025-05-08 14:31.18       obuilder [INFO] src = /Volumes/obuilder/cache-tmp/8609-c-homebrew, dst = /Users/mac1000/Library/Caches/Homebrew, type rw
2025-05-08 14:31.18    application [INFO] Exec "zfs" "set" "mountpoint=/Users/mac1000/Library/Caches/Homebrew" "obuilder/cache-tmp/8609-c-homebrew"
Unmount successful for /Volumes/obuilder/cache-tmp/8609-c-homebrew
2025-05-08 14:31.19    application [INFO] Exec "sudo" "dscl" "." "list" "/Users"
2025-05-08 14:31.19    application [INFO] Exec "sudo" "-u" "mac1000" "-i" "getconf" "DARWIN_USER_TEMP_DIR"
2025-05-08 14:31.19    application [INFO] Fork exec "sudo" "su" "-l" "mac1000" "-c" "--" "source ~/.obuilder_profile.sh &amp;&amp; env 'TMPDIR=/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/' 'OPAM_REPO_CI=true' 'CI=true' 'OPAMPRECISETRACKING=1' 'OPAMERRLOGLEN=0' 'OPAMDOWNLOADJOBS=1' "$0" "$@"" "/usr/bin/env" "bash" "-c" "opam reinstall bitwuzla-cxx.0.7.0;
        res=$?;
        test "$res" != 31 &amp;&amp; exit "$res";
        export OPAMCLI=2.0;
        build_dir=$(opam var prefix)/.opam-switch/build;
        failed=$(ls "$build_dir");
        partial_fails="";
        for pkg in $failed; do
          if opam show -f x-ci-accept-failures: "$pkg" | grep -qF "\"macos-homebrew\""; then
            echo "A package failed and has been disabled for CI using the 'x-ci-accept-failures' field.";
          fi;
          test "$pkg" != 'bitwuzla-cxx.0.7.0' &amp;&amp; partial_fails="$partial_fails $pkg";
        done;
        test "${partial_fails}" != "" &amp;&amp; echo "opam-repo-ci detected dependencies failing: ${partial_fails}";
        exit 1"
2025-05-08 14:31.28         worker [INFO] OBuilder partition: 27% free, 2081 items
2025-05-08 14:31.58         worker [INFO] OBuilder partition: 27% free, 2081 items
2025-05-08 14:32.28         worker [INFO] OBuilder partition: 27% free, 2081 items
2025-05-08 14:32.43    application [INFO] Exec "zfs" "inherit" "mountpoint" "obuilder/cache-tmp/8608-c-opam-archives"
Unmount successful for /Users/mac1000/.opam/download-cache
2025-05-08 14:32.44    application [INFO] Exec "zfs" "inherit" "mountpoint" "obuilder/cache-tmp/8609-c-homebrew"
Unmount successful for /Users/mac1000/Library/Caches/Homebrew
2025-05-08 14:32.45    application [INFO] Exec "zfs" "set" "mountpoint=none" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40/home"
Unmount successful for /Users/mac1000
2025-05-08 14:32.45    application [INFO] Exec "zfs" "set" "mountpoint=none" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40/brew"
Unmount successful for /usr/local
2025-05-08 14:32.46    application [INFO] Exec "zfs" "rename" "--" "obuilder/cache/c-homebrew" "obuilder/cache-tmp/8610-c-homebrew"
Unmount successful for /Volumes/obuilder/cache/c-homebrew
2025-05-08 14:32.46    application [INFO] Exec "zfs" "promote" "obuilder/cache-tmp/8609-c-homebrew"
2025-05-08 14:32.46    application [INFO] Exec "zfs" "destroy" "-f" "--" "obuilder/cache-tmp/8610-c-homebrew"
Unmount successful for /Volumes/obuilder/cache-tmp/8610-c-homebrew
2025-05-08 14:32.48    application [INFO] Exec "zfs" "rename" "--" "obuilder/cache-tmp/8609-c-homebrew@snap" "obuilder/cache-tmp/8609-c-homebrew@old-2152"
2025-05-08 14:32.48    application [INFO] Exec "zfs" "destroy" "-d" "--" "obuilder/cache-tmp/8609-c-homebrew@old-2152"
2025-05-08 14:32.48    application [INFO] Exec "zfs" "snapshot" "-r" "--" "obuilder/cache-tmp/8609-c-homebrew@snap"
2025-05-08 14:32.48    application [INFO] Exec "zfs" "rename" "--" "obuilder/cache-tmp/8609-c-homebrew" "obuilder/cache/c-homebrew"
Unmount successful for /Volumes/obuilder/cache-tmp/8609-c-homebrew
2025-05-08 14:32.49    application [INFO] Exec "zfs" "rename" "--" "obuilder/cache/c-opam-archives" "obuilder/cache-tmp/8611-c-opam-archives"
Unmount successful for /Volumes/obuilder/cache/c-opam-archives
2025-05-08 14:32.50    application [INFO] Exec "zfs" "promote" "obuilder/cache-tmp/8608-c-opam-archives"
2025-05-08 14:32.50    application [INFO] Exec "zfs" "destroy" "-f" "--" "obuilder/cache-tmp/8611-c-opam-archives"
Unmount successful for /Volumes/obuilder/cache-tmp/8611-c-opam-archives
2025-05-08 14:32.51    application [INFO] Exec "zfs" "rename" "--" "obuilder/cache-tmp/8608-c-opam-archives@snap" "obuilder/cache-tmp/8608-c-opam-archives@old-2152"
2025-05-08 14:32.51    application [INFO] Exec "zfs" "destroy" "-d" "--" "obuilder/cache-tmp/8608-c-opam-archives@old-2152"
2025-05-08 14:32.51    application [INFO] Exec "zfs" "snapshot" "-r" "--" "obuilder/cache-tmp/8608-c-opam-archives@snap"
2025-05-08 14:32.52    application [INFO] Exec "zfs" "rename" "--" "obuilder/cache-tmp/8608-c-opam-archives" "obuilder/cache/c-opam-archives"
Unmount successful for /Volumes/obuilder/cache-tmp/8608-c-opam-archives
2025-05-08 14:32.52    application [INFO] Exec "zfs" "destroy" "-r" "-f" "--" "obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40"
Unmount successful for /Volumes/obuilder/result/af09425cd7744c7b32ed000b11db90295142f3d3430fddb594932d5c02343b40
2025-05-08 14:32.58         worker [INFO] OBuilder partition: 27% free, 2081 items
2025-05-08 14:33.04         worker [INFO] Job failed: "/usr/bin/env" "bash" "-c" "opam reinstall bitwuzla-cxx.0.7.0;
        res=$?;
        test "$res" != 31 &amp;&amp; exit "$res";
        export OPAMCLI=2.0;
        build_dir=$(opam var prefix)/.opam-switch/build;
        failed=$(ls "$build_dir");
        partial_fails="";
        for pkg in $failed; do
          if opam show -f x-ci-accept-failures: "$pkg" | grep -qF "\"macos-homebrew\""; then
            echo "A package failed and has been disabled for CI using the 'x-ci-accept-failures' field.";
          fi;
          test "$pkg" != 'bitwuzla-cxx.0.7.0' &amp;&amp; partial_fails="$partial_fails $pkg";
        done;
        test "${partial_fails}" != "" &amp;&amp; echo "opam-repo-ci detected dependencies failing: ${partial_fails}";
        exit 1" failed with exit status 1

</code></pre></div></div>

<p>Run each of the <em>Exec</em> commands at the command prompt, up to the <em>Fork exec</em>. We still need to run that final command, but we want an interactive shell, so let’s replace the last part of the command with <code class="language-plaintext highlighter-rouge">bash</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo su -l mac1000 -c -- "source ~/.obuilder_profile.sh &amp;&amp; env 'TMPDIR=/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/' 'OPAM_REPO_CI=true' 'CI=true' 'OPAMPRECISETRACKING=1' 'OPAMERRLOGLEN=0' 'OPAMDOWNLOADJOBS=1' bash"
</code></pre></div></div>

<p>Now, at the shell prompt, we can try <code class="language-plaintext highlighter-rouge">opam reinstall bitwuzla-cxx.0.7.0</code>. Hopefully, this fails, which proves we have successfully recreated the environment!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ opam source bitwuzla-cxx.0.7.0
$ cd bitwuzla-cxx.0.7.0
$ dune build
File "vendor/dune", lines 201-218, characters 0-436:
201 | (rule
202 |  (deps
203 |   (source_tree bitwuzla)
.....
216 |      %{p0002}
217 |      (run patch -p1 --directory bitwuzla))
218 |     (write-file %{target} "")))))
(cd _build/default/vendor &amp;&amp; /usr/bin/patch -p1 --directory bitwuzla) &lt; _build/default/vendor/patch/0001-api-Add-hook-for-ocaml-z-value.patch
patching file 'include/bitwuzla/cpp/bitwuzla.h'
Can't create '/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/build_9012b8_dune/patchoEyVbKAjSTw', output is in '/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/build_9012b8_dune/patchoEyVbKAjSTw': Permission denied
patch: **** can't create '/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/build_9012b8_dune/patchoEyVbKAjSTw': Permission denied
</code></pre></div></div>

<p>This matches the output we see in the CI logs. <code class="language-plaintext highlighter-rouge">/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T</code> is the <code class="language-plaintext highlighter-rouge">TMPDIR</code> value set in the environment. <code class="language-plaintext highlighter-rouge">Permission denied</code> suggests a file system permissions problem, yet <code class="language-plaintext highlighter-rouge">ls -l</code> and <code class="language-plaintext highlighter-rouge">touch</code> show that we can write to this directory.</p>

<p>As we are running on macOS and Dune is invoking <code class="language-plaintext highlighter-rouge">patch</code>, suspicion falls on Apple’s <code class="language-plaintext highlighter-rouge">patch</code> versus GNU’s <code class="language-plaintext highlighter-rouge">patch</code>. Editing <code class="language-plaintext highlighter-rouge">vendor/dune</code> to use <code class="language-plaintext highlighter-rouge">gpatch</code> rather than <code class="language-plaintext highlighter-rouge">patch</code> allows the project to build.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ dune build
(cd _build/default/vendor &amp;&amp; /usr/local/bin/gpatch --directory bitwuzla -p1) &lt; _build/default/vendor/patch/0001-api-Add-hook-for-ocaml-z-value.patch
File include/bitwuzla/cpp/bitwuzla.h is read-only; trying to patch anyway
patching file include/bitwuzla/cpp/bitwuzla.h
</code></pre></div></div>

<p>Running Apple’s <code class="language-plaintext highlighter-rouge">patch</code> directly,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ patch -p1 &lt; ../../../../vendor/patch/0001-api-Add-hook-for-ocaml-z-value.patch
patching file 'include/bitwuzla/cpp/bitwuzla.h'
Can't create '/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/patchorVrfBtHVDI', output is in '/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/patchorVrfBtHVDI': Permission denied
patch: **** can't create '/var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/patchorVrfBtHVDI': Permission denied
</code></pre></div></div>

<p>However, <code class="language-plaintext highlighter-rouge">touch /var/folders/s_/z7_t3bvn5txfn81hk9p3ntfw0000z8/T/patchorVrfBtHVDI</code> succeeds.</p>

<p>Looking back at the output from GNU <code class="language-plaintext highlighter-rouge">patch</code>, it reports that the file itself is read-only.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls -l include/bitwuzla/cpp/bitwuzla.h
-r--r--r--  1 mac1000  admin  52280 May  8 15:05 include/bitwuzla/cpp/bitwuzla.h
</code></pre></div></div>

<p>Let’s try to adjust the permissions:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ chmod 644 include/bitwuzla/cpp/bitwuzla.h
$ patch -p1 &lt; ../../../../vendor/patch/0001-api-Add-hook-for-ocaml-z-value.patch
patching file 'include/bitwuzla/cpp/bitwuzla.h'
</code></pre></div></div>

<p>And now, it succeeds. The issue is that GNU’s <code class="language-plaintext highlighter-rouge">patch</code> and Apple’s <code class="language-plaintext highlighter-rouge">patch</code> act differently when the file being patched is read-only. Apple’s <code class="language-plaintext highlighter-rouge">patch</code> gives a spurious error, while GNU’s <code class="language-plaintext highlighter-rouge">patch</code> emits a warning and makes the change anyway.</p>
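<p>The GNU behaviour is easy to reproduce in isolation (an illustrative sketch; assumes GNU <code class="language-plaintext highlighter-rouge">patch</code>, as found on Linux):</p>

```shell
# Create a read-only file and a one-line unified diff for it.
printf 'old\n' > file.txt
chmod 444 file.txt
printf -- '--- file.txt\n+++ file.txt\n@@ -1 +1 @@\n-old\n+new\n' > fix.patch
# GNU patch warns "File file.txt is read-only; trying to patch anyway"
# and applies the change; Apple's patch fails at this point instead.
patch -p0 < fix.patch
cat file.txt
```

<p>Afterwards, <code class="language-plaintext highlighter-rouge">file.txt</code> contains <code class="language-plaintext highlighter-rouge">new</code> despite its <code class="language-plaintext highlighter-rouge">444</code> mode, matching the warning-then-patch behaviour seen above.</p>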

<p>Updating the <code class="language-plaintext highlighter-rouge">dune</code> file to include <code class="language-plaintext highlighter-rouge">chmod</code> should both clear the warning and allow the use of the native patch.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(rule
 (deps
  (source_tree bitwuzla)
  (:p0001
   (file patch/0001-api-Add-hook-for-ocaml-z-value.patch))
  (:p0002
   (file patch/0002-binding-Fix-segfault-with-parallel-instances.patch)))
 (target .bitwuzla_tree)
 (action
  (no-infer
   (progn
    (run chmod -R u+w bitwuzla)
    (with-stdin-from
     %{p0001}
     (run patch -p1 --directory bitwuzla))
    (with-stdin-from
     %{p0002}
     (run patch -p1 --directory bitwuzla))
    (write-file %{target} "")))))
</code></pre></div></div>

<p>As an essential last step, we need to tidy up on this machine. Exit the shell. Refer back to the log file for the job and run all the remaining ZFS commands. On macOS, this is essential to keep the jobs database in sync with the layers.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="macOS,OBuilder" /><category term="tunbury.org" /><summary type="html"><![CDATA[The log from an OBuilder job starts with the steps needed to reproduce the job locally. This boilerplate output assumes that all OBuilder jobs start from a Docker base image, but on some operating systems, such as FreeBSD and macOS, OBuilder uses ZFS base images. On OpenBSD and Windows, it uses QEMU images. The situation is further complicated when the issue only affects a specific architecture that may be unavailable to the user.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/obuilder.png" /><media:content medium="image" url="https://www.tunbury.org/images/obuilder.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Otter Wiki with Raven Authentication</title><link href="https://www.tunbury.org/2025/05/07/otter-wiki-with-raven/" rel="alternate" type="text/html" title="Otter Wiki with Raven Authentication" /><published>2025-05-07T12:00:00+00:00</published><updated>2025-05-07T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/07/otter-wiki-with-raven</id><content type="html" xml:base="https://www.tunbury.org/2025/05/07/otter-wiki-with-raven/"><![CDATA[<p>We’d like to have a go at using <a href="https://otterwiki.com">Otter Wiki</a>, but rather than having yet more usernames and passwords, we would like to integrate this into the Raven authentication system. There is a <a href="https://docs.raven.cam.ac.uk/en/latest/apache-saml2/">guide on using SAML2 with Apache</a>.</p>

<p>The steps are:</p>
<ol>
  <li>Start the provided container.</li>
  <li>Visit http://your-container/Shibboleth.sso/Metadata and download the <code class="language-plaintext highlighter-rouge">Metadata</code>.</li>
  <li>Go to <a href="https://metadata.raven.cam.ac.uk">https://metadata.raven.cam.ac.uk</a> and create a new site by pasting in the metadata.</li>
  <li>Wait one minute and try to connect to http://your-container.</li>
</ol>

<p>Otter Wiki, when started with the environment variable <code class="language-plaintext highlighter-rouge">AUTH_METHOD=PROXY_HEADER</code>, reads the HTTP header fields <code class="language-plaintext highlighter-rouge">x-otterwiki-name</code>, <code class="language-plaintext highlighter-rouge">x-otterwiki-email</code> and <code class="language-plaintext highlighter-rouge">x-otterwiki-permissions</code>. See <a href="https://github.com/redimp/otterwiki/blob/main/docs/auth_examples/header-auth/README.md">this example</a>.</p>
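<p>A minimal sketch of the container side (the service layout, image tag and ports are my assumptions, not taken from the linked example):</p>

```yaml
# Hypothetical docker-compose sketch; image name and ports are assumptions.
services:
  otterwiki:
    image: redimp/otterwiki
    environment:
      - AUTH_METHOD=PROXY_HEADER   # trust identity headers set by the proxy
  apache:
    build: ./apache                # Apache + Shibboleth SP, setting the x-otterwiki-* headers
    ports:
      - "443:443"
```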

<p>Apache can be configured to set these header fields based upon the SAML user who is authenticated with Raven:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ShibUseEnvironment On
RequestHeader set x-otterwiki-name %{displayName}e
RequestHeader set x-otterwiki-email %{REMOTE_USER}s
RequestHeader set x-otterwiki-permissions "READ,WRITE,UPLOAD,ADMIN"
</code></pre></div></div>

<p>I have created a <code class="language-plaintext highlighter-rouge">docker-compose.yml</code> file, which runs Apache as a reverse proxy in front of an Otter Wiki container and includes HTTPS support with a Let’s Encrypt certificate. The files are available on <a href="https://github.com/mtelvers/doc-samples/commit/5ca2f8934a4cf1269e60b2b18de563352f764f66">GitHub</a>.</p>

<p>The test site is <a href="https://otterwiki.tunbury.uk">https://otterwiki.tunbury.uk</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Otter,Raven" /><category term="tunbury.org" /><summary type="html"><![CDATA[We’d like to have a go using Otter Wiki, but rather than having yet more usernames and passwords, we would like to integrate this into the Raven authentication system. There is guide on using SAML2 with Apache]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/otter.png" /><media:content medium="image" url="https://www.tunbury.org/images/otter.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">iPXE boot for FreeBSD with an UEFI BIOS</title><link href="https://www.tunbury.org/2025/05/06/freebsd-uefi/" rel="alternate" type="text/html" title="iPXE boot for FreeBSD with an UEFI BIOS" /><published>2025-05-06T12:00:00+00:00</published><updated>2025-05-06T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/06/freebsd-uefi</id><content type="html" xml:base="https://www.tunbury.org/2025/05/06/freebsd-uefi/"><![CDATA[<p>I had assumed that booting FreeBSD over the network using iPXE would be pretty simple. There is even a <code class="language-plaintext highlighter-rouge">freebsd.ipxe</code> file included with Netboot.xyz. However, I quickly realised that most of the Internet wisdom on this process centred around legacy BIOS rather than UEFI. When booting with UEFI, the Netboot.xyz menu omits the FreeBSD option as it only supports legacy BIOS. Even in legacy mode, it uses <code class="language-plaintext highlighter-rouge">memdisk</code> from the Syslinux project rather than a FreeBSD loader.</p>

<p>FreeBSD expects to use <code class="language-plaintext highlighter-rouge">loader.efi</code> to boot and to mount the root directory over NFS based upon the DHCP scope option <code class="language-plaintext highlighter-rouge">root-path</code>. I didn’t want to provide an NFS server just for this process, but even when I gave in and set one up, it still didn’t work. I’m pleased that, in the final configuration, I didn’t need an NFS server.</p>

<p>Much of the frustration around doing this came from setting the <code class="language-plaintext highlighter-rouge">root-path</code> option. FreeBSD’s <code class="language-plaintext highlighter-rouge">loader.efi</code> sends its own DHCP request to the DHCP server, ignoring the options <code class="language-plaintext highlighter-rouge">set root-path</code> or <code class="language-plaintext highlighter-rouge">set dhcp.root-path</code> configured in iPXE.</p>

<p>Many <code class="language-plaintext highlighter-rouge">dhcpd.conf</code> snippets suggest a block similar to the one below, usually with a comment that it doesn’t work. Most authors fall back to setting <code class="language-plaintext highlighter-rouge">root-path</code> for the entire scope.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if exists user-class and option user-class = "FreeBSD" {
    option root-path "your-path";
}
</code></pre></div></div>

<p>I used <code class="language-plaintext highlighter-rouge">dhcpdump -i br0</code> to examine the DHCP packets. This showed an ASCII BEL character (0x07) before <code class="language-plaintext highlighter-rouge">FreeBSD</code> in the <code class="language-plaintext highlighter-rouge">user-class</code> string.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  TIME: 2025-05-07 08:51:03.811
    IP: 0.0.0.0 (2:0:0:0:0:22) &gt; 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
 HTYPE: 1 (Ethernet)
  HLEN: 6
  HOPS: 0
   XID: 00000001
  SECS: 0
 FLAGS: 0
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 02:00:00:00:00:22:00:00:00:00:00:00:00:00:00:00
 SNAME: .
 FNAME: .
OPTION:  53 (  1) DHCP message type         3 (DHCPREQUEST)
OPTION:  50 (  4) Request IP address        x.y.z.250
OPTION:  54 (  4) Server identifier         x.y.z.1
OPTION:  51 (  4) IP address leasetime      300 (5m)
OPTION:  60 (  9) Vendor class identifier   PXEClient
OPTION:  77 (  8) User-class Identification 0746726565425344 .FreeBSD
OPTION:  55 (  7) Parameter Request List     17 (Root path)
					     12 (Host name)
					     16 (Swap server)
					      3 (Routers)
					      1 (Subnet mask)
					     26 (Interface MTU)
					     54 (Server identifier)
</code></pre></div></div>

<p>There is a <code class="language-plaintext highlighter-rouge">substring</code> command, so I was able to set the <code class="language-plaintext highlighter-rouge">root-path</code> like this successfully:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if exists user-class and substring ( option user-class, 1, 7 ) = "FreeBSD" {
    option root-path "your-path";
}
</code></pre></div></div>
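<p>The off-by-one is explained by the option encoding: a user-class instance is length-prefixed, so what arrives on the wire is the byte <code class="language-plaintext highlighter-rouge">0x07</code> followed by the seven ASCII characters, exactly as the <code class="language-plaintext highlighter-rouge">dhcpdump</code> output showed. The same offset arithmetic in shell (illustrative only):</p>

```shell
# 07 46 72 65 65 42 53 44 = a length byte followed by "FreeBSD";
# substring(option user-class, 1, 7) skips that length byte, like this:
raw=$(printf '\007FreeBSD')
echo "${raw:1:7}"   # prints: FreeBSD
```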

<p>The situation is further complicated as we are using a Ubiquiti Edge router. This requires the command to be encoded as a <code class="language-plaintext highlighter-rouge">subnet-parameters</code> value, which is injected into <code class="language-plaintext highlighter-rouge">/opt/vyatta/etc/dhcpd.conf</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set service dhcp-server shared-network-name lab subnet x.y.z.0/24 subnet-parameters 'if exists user-class and substring( option user-class, 1, 7 ) = &amp;quot;FreeBSD&amp;quot; { option root-path &amp;quot;tftp://x.y.z.240/freebsd14&amp;quot;;}'
</code></pre></div></div>

<p>The FreeBSD 14.2 installation <a href="https://download.freebsd.org/releases/amd64/amd64/ISO-IMAGES/14.2/FreeBSD-14.2-RELEASE-amd64-disc1.iso">ISO</a> contains the required <code class="language-plaintext highlighter-rouge">boot/loader.efi</code>, but we cannot use the extracted ISO as a root file system.</p>

<p>Stage <code class="language-plaintext highlighter-rouge">loader.efi</code> on a TFTP server; in my case, the TFTP root is <code class="language-plaintext highlighter-rouge">/netbootxyz/config/menus</code>. The iPXE file only needs to contain the <code class="language-plaintext highlighter-rouge">chain</code> command.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!ipxe
chain loader.efi
</code></pre></div></div>

<p>Download <a href="https://mfsbsd.vx.sk/files/iso/14/amd64/mfsbsd-14.2-RELEASE-amd64.iso">mfsBSD</a>, and extract the contents to a subfolder on the TFTP server; I used <code class="language-plaintext highlighter-rouge">freebsd14</code>. This ISO contains the kernel, <code class="language-plaintext highlighter-rouge">loader.conf</code> and a minimal root file system, <code class="language-plaintext highlighter-rouge">mfsroot.gz</code>.</p>

<p>With the contents of the mfsBSD ISO staged on the TFTP server and the modification to the DHCP scope options, the machine will boot into FreeBSD. Sign in with <code class="language-plaintext highlighter-rouge">root</code>/<code class="language-plaintext highlighter-rouge">mfsroot</code> and invoke <code class="language-plaintext highlighter-rouge">bsdinstall</code>.</p>

<p>On real hardware, rather than QEMU, I found that I needed to explicitly set the serial console by adding these lines to the end of <code class="language-plaintext highlighter-rouge">boot/loader.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Serial console
console="comconsole"
comconsole_port="0x2f8"
comconsole_speed="115200"
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="FreeBSD,UEFI,iPXE" /><category term="tunbury.org" /><summary type="html"><![CDATA[I had assumed that booting FreeBSD over the network using iPXE would be pretty simple. There is even a freebsd.ipxe file included with Netboot.xyz. However, I quickly realised that most of the Internet wisdom on this process centred around legacy BIOS rather than UEFI. When booting with UEFI, the Netboot.xyz menu omits the FreeBSD option as it only supports legacy BIOS. Even in legacy mode, it uses memdisk from the Syslinux project rather than a FreeBSD loader.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/freebsd-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/freebsd-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OS Boot Media with Ventoy</title><link href="https://www.tunbury.org/2025/05/05/ventoy/" rel="alternate" type="text/html" title="OS Boot Media with Ventoy" /><published>2025-05-05T12:00:00+00:00</published><updated>2025-05-05T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/05/ventoy</id><content type="html" xml:base="https://www.tunbury.org/2025/05/05/ventoy/"><![CDATA[<p>I need to install a chunky Windows application (90GB download, +250 GB install), but all my Windows VMs are pretty small, so I decided to use a spare Dell OptiPlex 7090. It had Windows 10 installed, but it was pretty messy from the previous use, so I decided to install Windows 11. I had a Windows 11 ISO on hand, so I wrote that to a USB memory stick using the Raspberry Pi Imaging tool (effectively <code class="language-plaintext highlighter-rouge">dd</code> in this use case). The machine booted without issue, but the installation failed, citing “A media driver your computer needs is missing”. 
This error looked familiar: a mass storage driver was missing. I often see this in QEMU or similar situations, and it’s also common on server hardware. However, pressing Shift-F10 and opening <code class="language-plaintext highlighter-rouge">diskpart</code> showed all my storage.</p>

<p>It’s been a while since I installed Windows on real hardware. Mostly, I use QEMU with an ISO and an <code class="language-plaintext highlighter-rouge">autounattend.xml</code>, or PXE boot with Windows Deployment Services and the Microsoft Deployment Toolkit. It seems that some time ago, the ISO files that Microsoft publishes started to contain files larger than the standard allows, and thus the <code class="language-plaintext highlighter-rouge">dd</code> approach to creating boot media no longer works.</p>

<p>Microsoft produces a USB creation tool, but I couldn’t see how to tell it to use the ISO file that I already had! This happily led me to <a href="https://www.ventoy.net/en/index.html">Ventoy</a>. The tool installs a small bootloader (~30M) on the memory stick and formats the rest as an exFAT partition. Copy your ISO file(s) to the exFAT partition, and boot the machine from the memory stick. You are then presented with a simple menu allowing you to boot from any of the ISO files. I couldn’t help myself, I had to try to see if another OS would work too!</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ventoy" /><category term="tunbury.org" /><summary type="html"><![CDATA[I need to install a chunky Windows application (90GB download, +250 GB install), but all my Windows VMs are pretty small, so I decided to use a spare Dell OptiPlex 7090. It had Windows 10 installed, but it was pretty messy from the previous use, so I decided to install Windows 11. I had a Windows 11 ISO on hand, so I wrote that to a USB memory stick using the Raspberry Pi Imaging tool (effectively dd in this use case). The machine booted without issue, but the installation failed, citing “A media driver your computer needs is missing”. This error looked familiar: a mass storage driver was missing. I often see this in QEMU or similar situations, and it’s also common on server hardware. 
However, pressing Shift-F10 and opening diskpart showed all my storage.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ventoy.png" /><media:content medium="image" url="https://www.tunbury.org/images/ventoy.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ZFS Send Streams</title><link href="https://www.tunbury.org/2025/05/02/zfs-send-streams/" rel="alternate" type="text/html" title="ZFS Send Streams" /><published>2025-05-02T20:00:00+00:00</published><updated>2025-05-02T20:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/02/zfs-send-streams</id><content type="html" xml:base="https://www.tunbury.org/2025/05/02/zfs-send-streams/"><![CDATA[<p>We often say that ZFS is an excellent replicated file system, but not the best <em>local</em> filesystem. This led me to think that if we run <code class="language-plaintext highlighter-rouge">zfs send</code> on one machine, we might want to write that out as a different filesystem. Is that even possible?</p>

<p>What is in a ZFS stream?</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fallocate <span class="nt">-l</span> 10G temp.zfs
zpool create tank <span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/temp.zfs 
zfs create tank/home
<span class="nb">cp </span>README.md /tank/home
zfs snapshot tank/home@send
zfs send tank/home@send | hexdump
</code></pre></div></div>

<p>I spent a little time writing an OCaml application to parse the record structure before realising that there already was a tool to do this: <code class="language-plaintext highlighter-rouge">zstreamdump</code>. Using the <code class="language-plaintext highlighter-rouge">-d</code> flag shows the contents; you can see your file in the dumped output.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zfs send tank/home@send | zstreamdump <span class="nt">-d</span>
</code></pre></div></div>

<p>However, this is <em>not</em> like a <code class="language-plaintext highlighter-rouge">tar</code> file. It is not a list of file names and their content. It is a list of block changes. ZFS is a tree structure with a snapshot and a volume being tree roots. The leaves of the tree may be unchanged between two snapshots. <code class="language-plaintext highlighter-rouge">zfs send</code> operates at the block level below the file system layer.</p>
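<p>The contrast with an archive format is easy to demonstrate: a tar stream carries a file-level index and can be listed without extraction, whereas a send stream can only be replayed as block records:</p>

```shell
# A tar stream is enumerable by file name; a ZFS send stream is not.
mkdir -p demo
echo hi > demo/README.md
tar cf demo.tar demo
tar tf demo.tar   # lists demo/ and demo/README.md
```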

<p>To emphasise this point, consider a <code class="language-plaintext highlighter-rouge">ZVOL</code> formatted as XFS. The structure of the send stream is the same: a record of block changes.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zfs create <span class="nt">-V</span> 1G tank/vol
mkfs.xfs /dev/zvol/tank/vol
zfs snapshot tank/vol@send
zfs send tank/vol@send | zstreamdump <span class="nt">-d</span>
</code></pre></div></div>

<p>ZVOLs are interesting as they give you a snapshot capability on a file system that doesn’t have one. However, some performance metrics I saw posted online showed disappointing results compared with creating a file and using a loopback device. Furthermore, such a snapshot would only be crash-consistent, as the file system inside the ZVOL is unaware that it is being snapshotted. XFS does have <code class="language-plaintext highlighter-rouge">xfsdump</code> and <code class="language-plaintext highlighter-rouge">xfsrestore</code>, but they are pretty basic tools.</p>

<p>[1] See also <a href="https://openzfs.org/wiki/Documentation/ZfsSend">ZfsSend Documentation</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="openzfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[We often say that ZFS is an excellent replicated file system, but not the best local filesystem. This led me to think that if we run zfs send on one machine, we might want to write that out as a different filesystem. Is that even possible?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/openzfs.png" /><media:content medium="image" url="https://www.tunbury.org/images/openzfs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reconfiguring a system with an mdadm RAID5 root</title><link href="https://www.tunbury.org/2025/05/01/removing-mdadm/" rel="alternate" type="text/html" title="Reconfiguring a system with an mdadm RAID5 root" /><published>2025-05-01T12:00:00+00:00</published><updated>2025-05-01T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/05/01/removing-mdadm</id><content type="html" xml:base="https://www.tunbury.org/2025/05/01/removing-mdadm/"><![CDATA[<p>Cloud providers automatically configure their machines as they expect you to use them. For example, a machine with 4 x 8T disks might come configured with an mdadm RAID5 array spanning the disks. This may be what most people want, but we don’t want this configuration, as we want to see the bare disks. Given you have only a serial console (over SSH) and no access to the cloud-init environment, how do you boot the machine in a different configuration?</p>

<p>Example configuration:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
fd0       2:0    1    4K  0 disk
sda       8:0    0    4G  0 disk
├─sda1    8:1    0  512M  0 part  /boot/efi
└─sda2    8:2    0  3.5G  0 part
  └─md0   9:0    0 10.5G  0 raid5 /
sdb       8:16   0    4G  0 disk
└─sdb1    8:17   0    4G  0 part
  └─md0   9:0    0 10.5G  0 raid5 /
sdc       8:32   0    4G  0 disk
└─sdc1    8:33   0    4G  0 part
  └─md0   9:0    0 10.5G  0 raid5 /
sdd       8:48   0    4G  0 disk
└─sdd1    8:49   0    4G  0 part
  └─md0   9:0    0 10.5G  0 raid5 /
</code></pre></div></div>

<p>My initial approach was to create a tmpfs root filesystem and then use <code class="language-plaintext highlighter-rouge">pivot_root</code> to switch it. This worked, except that <code class="language-plaintext highlighter-rouge">/dev/md0</code> was still busy, so I could not unmount it.</p>

<p>It occurred to me that I could remove one of the partitions from the RAID5 set and use it as the new root. <code class="language-plaintext highlighter-rouge">mdadm --fail /dev/md0 /dev/sda2</code>, followed by <code class="language-plaintext highlighter-rouge">mdadm --remove /dev/md0 /dev/sda2</code>, frees up a partition. <code class="language-plaintext highlighter-rouge">debootstrap</code> can then be used to install Ubuntu on it. As we have a working system, we can preserve the key configuration settings, such as <code class="language-plaintext highlighter-rouge">/etc/hostname</code>, <code class="language-plaintext highlighter-rouge">/etc/netplan</code> and <code class="language-plaintext highlighter-rouge">/etc/fstab</code>, by copying them from <code class="language-plaintext highlighter-rouge">/etc</code> to <code class="language-plaintext highlighter-rouge">/mnt/etc</code>. Unfortunately, Ansible’s copy module does not preserve ownership, so I used <code class="language-plaintext highlighter-rouge">rsync</code> instead. <code class="language-plaintext highlighter-rouge">/etc/fstab</code> must be edited to reflect the new root partition.</p>
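
<p>As a sketch, assuming the old array was <code class="language-plaintext highlighter-rouge">/dev/md0</code> and the freed partition is <code class="language-plaintext highlighter-rouge">/dev/sda2</code> (device names here are illustrative), the root entry can be rewritten with <code class="language-plaintext highlighter-rouge">sed</code>:</p>

```shell
# Illustrative fstab line: swap the md0 root device for the freed partition
echo '/dev/md0 / ext4 errors=remount-ro 0 1' | sed 's|^/dev/md0|/dev/sda2|'
```

<p>Against the real file, the same expression would be applied in place with <code class="language-plaintext highlighter-rouge">sed -i</code> to the copied <code class="language-plaintext highlighter-rouge">/mnt/etc/fstab</code>.</p>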

<p>Lastly, bind-mount <code class="language-plaintext highlighter-rouge">/dev</code>, <code class="language-plaintext highlighter-rouge">/proc</code> and <code class="language-plaintext highlighter-rouge">/sys</code> into the new root, run <code class="language-plaintext highlighter-rouge">grub-install</code> in a <code class="language-plaintext highlighter-rouge">chroot</code> of the new environment, and reboot.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
fd0      2:0    1    4K  0 disk
sda      8:0    0    4G  0 disk
├─sda1   8:1    0  512M  0 part /boot/efi
└─sda2   8:2    0  3.5G  0 part /
sdb      8:16   0    4G  0 disk
└─sdb1   8:17   0    4G  0 part
sdc      8:32   0    4G  0 disk
└─sdc1   8:33   0    4G  0 part
sdd      8:48   0    4G  0 disk
└─sdd1   8:49   0    4G  0 part
</code></pre></div></div>

<p>The redundant RAID5 partitions can be removed with <code class="language-plaintext highlighter-rouge">wipefs -af /dev/sd[b-d]</code></p>

<p>I have wrapped all the steps in an Ansible <a href="https://gist.github.com/mtelvers/1fe3571830d982eb8adbcf5a513edb2c">playbook</a>, which is available as a GitHub gist.</p>

<h1 id="addendum">Addendum</h1>

<p>I had tested this in QEMU with EFI under the assumption that a newly provisioned cloud machine would use EFI. However, when I ran the script against the machine, I found it used a legacy bootloader, and it was even more complicated than I had envisioned, as there were three separate MDADM arrays in place:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0] [raid10] 
md2 : active raid5 sdb4[0] sdd4[2] sda4[4] sdc4[1]
      34252403712 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 2/86 pages [8KB], 65536KB chunk

md1 : active raid5 sdd3[1] sda3[2] sdc3[0] sdb3[4]
      61381632 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      
md0 : active raid1 sdd2[1] sda2[2] sdb2[3] sdc2[0]
      523264 blocks super 1.2 [4/4] [UUUU]
      
unused devices: &lt;none&gt;
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">lsblk</code> showed four disks, each configured as below:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NAME        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
sda           8:0    0 10.9T  0 disk  
├─sda1        8:1    0    1M  0 part  
├─sda2        8:2    0  512M  0 part  
│ └─md0       9:0    0  511M  0 raid1 
│   └─md0p1 259:0    0  506M  0 part  /boot
├─sda3        8:3    0 19.5G  0 part  
│ └─md1       9:1    0 58.5G  0 raid5 
│   └─md1p1 259:1    0 58.5G  0 part  /
├─sda4        8:4    0 10.6T  0 part  
│ └─md2       9:2    0 31.9T  0 raid5 
│   └─md2p1 259:2    0 31.9T  0 part  /data
└─sda5        8:5    0  512M  0 part  [SWAP]
</code></pre></div></div>

<p>The boot device is a RAID1 mirror (four copies), so removing one of the copies is no issue. The first partition is a 1MB BIOS boot partition, giving GRUB some space. The root device was RAID5, as I had anticipated.</p>

<p>The playbook could be adapted: double up on the <code class="language-plaintext highlighter-rouge">mdadm</code> commands to break two arrays, update two entries in <code class="language-plaintext highlighter-rouge">/etc/fstab</code> and use <code class="language-plaintext highlighter-rouge">grub-pc</code> rather than <code class="language-plaintext highlighter-rouge">grub-efi-amd64</code>. The updated playbook is <a href="https://gist.github.com/mtelvers/ba3b7a5974b50422e2c2e594bed0bdb2">here</a>.</p>

<p>For testing, I installed Ubuntu using this <a href="https://gist.github.com/mtelvers/d2d333bf5c9bd94cb905488667f0cae1">script</a> to simulate the VM.</p>

<p>Improvements could be made: <code class="language-plaintext highlighter-rouge">/boot</code> could be merged into <code class="language-plaintext highlighter-rouge">/</code>, as there is no reason to separate them when not using EFI. In fact, there never <em>needed</em> to be a separate <code class="language-plaintext highlighter-rouge">/boot</code>, as GRUB2 will boot from an mdadm RAID5 array.</p>

<p>The system is a pretty minimal installation of Ubuntu; a more typical set of tools can be installed with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt install ubuntu-standard
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="mdadm,ubuntu" /><category term="tunbury.org" /><summary type="html"><![CDATA[Cloud providers automatically configure their machines as they expect you to use them. For example, a machine with 4 x 8T disks might come configured with an mdadm RAID5 array spanning the disks. This may be what most people want, but we don’t want this configuration, as we want to see the bare disks. Given you have only a serial console (over SSH) and no access to the cloud-init environment, how do you boot the machine in a different configuration?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/mdadm.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/mdadm.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Distributed ZFS Storage</title><link href="https://www.tunbury.org/2025/04/29/distributed-zfs-storage/" rel="alternate" type="text/html" title="Distributed ZFS Storage" /><published>2025-04-29T20:00:00+00:00</published><updated>2025-04-29T20:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/29/distributed-zfs-storage</id><content type="html" xml:base="https://www.tunbury.org/2025/04/29/distributed-zfs-storage/"><![CDATA[<p>Following Anil’s <a href="https://anil.recoil.org/notes/syncoid-sanoid-zfs">note</a>, we will design and implement a distributed storage archive system for ZFS volumes and associated metadata. <em>Metadata</em> here refers to key information about the dataset itself:</p>

<ul>
  <li>A summary of what the dataset is</li>
  <li>Data retention requirement (both legal and desirable)</li>
  <li>Time/effort/cost required to reproduce the data</li>
  <li>Legal framework under which the data is available, restrictions on the distribution of the data, etc.</li>
</ul>

<p>And also refers to the more <em>systems</em> style meanings such as:</p>

<ul>
  <li>Size of the dataset</li>
  <li>List of machines/ZFS pools where the data is stored</li>
  <li>Number and distribution of copies required</li>
  <li>Snapshot and replication frequency/policy</li>
</ul>

<p>These data will be stored in JSON, YAML, or another structured file format.</p>
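
<p>A sketch of what such a metadata file might contain; every field name and value below is illustrative rather than a settled schema:</p>

```yaml
# Illustrative dataset metadata; field names are not a settled schema
name: example-dataset
summary: Raw measurement data from project X
retention:
  legal: none
  desired: 5 years
reproduction-effort: ~2 weeks of compute time
licence: CC-BY-4.0
size: 1.2T
copies-required: 2
snapshot-frequency: daily
replicas:
  - host: chives.caelum.ci.dev
    pool: tank
  - host: basil.caelum.ci.dev
    pool: tank
```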

<p>The system would have a database of machines and their associated storage (disks/zpools/etc) and location. Each item of storage would have a ‘failure domain’ to logically group resources for redundancy. This would allow copies of a dataset to be placed in different domains to meet the redundancy requirements. For example, given that we are committed to holding two distinct copies of the data, would we use RAIDZ on the local disks or just a dynamic stripe, RAID0, to maximise capacity?</p>

<p>While under development, the system will output recommended actions, in the form of shell commands, to perform the snapshot and replication steps necessary to meet the replication and redundancy policies. Ultimately, these commands could be executed automatically.</p>
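
<p>The emitted actions would be ordinary ZFS commands along these lines (dataset, snapshot and host names are illustrative):</p>

```shell
# Snapshot locally, then replicate to a pool in a different failure domain.
# -w sends the raw (still encrypted) stream, so the receiver never needs the key.
zfs snapshot tank/data@2025-04-29
zfs send -w tank/data@2025-04-29 | ssh basil.caelum.ci.dev zfs recv tank/replica/data
```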

<p>Utilising ZFS encryption, the remote pools can be stored as an encrypted filesystem without the encryption keys.</p>

<p>When the data is being processed, it will be staged locally on the worker’s NVMe drive for performance, and the resultant dataset <em>may</em> be uploaded with a new dataset of metadata.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="openzfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[Following Anil’s note, we will design and implement a distributed storage archive system for ZFS volumes and associated metadata. Metadata here refers to key information about the dataset itself:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/openzfs.png" /><media:content medium="image" url="https://www.tunbury.org/images/openzfs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Raptor Talos II - POWER9 unreliability</title><link href="https://www.tunbury.org/2025/04/29/raptor-talos-ii/" rel="alternate" type="text/html" title="Raptor Talos II - POWER9 unreliability" /><published>2025-04-29T12:00:00+00:00</published><updated>2025-04-29T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/29/raptor-talos-ii</id><content type="html" xml:base="https://www.tunbury.org/2025/04/29/raptor-talos-ii/"><![CDATA[<p>We have two Raptor Computing Talos II POWER9 machines. One of these has had issues for some time and cannot run for more than 20 minutes before locking up completely. Over the last few days, our second machine has exhibited similar issues and needs to be power-cycled every ~24 hours. I spent some time today trying to diagnose the issue with the first machine, removing the motherboard as recommended by Raptor support, to see if the issue still exists with nothing else connected. Sadly, it does. I noted that a firmware update is available, which would move from v2.00 to v2.10.</p>

<p><img src="/images/raptor-computing.jpeg" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="power9" /><category term="tunbury.org" /><summary type="html"><![CDATA[We have two Raptor Computing Talos II POWER9 machines. One of these has had issues for some time and cannot run for more than 20 minutes before locking up completely. Over the last few days, our second machine has exhibited similar issues and needs to be power-cycled every ~24 hours. I spent some time today trying to diagnose the issue with the first machine, removing the motherboard as recommended by Raptor support, to see if the issue still exists with nothing else connected. Sadly, it does. I noted that a firmware update is available, which would move from v2.00 to v2.10.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/raptor-talos-ii.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/raptor-talos-ii.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Equinix Moves</title><link href="https://www.tunbury.org/2025/04/29/equinix-moves/" rel="alternate" type="text/html" title="Equinix Moves" /><published>2025-04-29T00:00:00+00:00</published><updated>2025-04-29T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/29/equinix-moves</id><content type="html" xml:base="https://www.tunbury.org/2025/04/29/equinix-moves/"><![CDATA[<p>The moves of registry.ci.dev, opam-repo-ci, and get.dune.build have followed the template of <a href="https://www.tunbury.org/ocaml-ci/">OCaml-CI</a>. Notable differences have been that I have hosted <code class="language-plaintext highlighter-rouge">get.dune.build</code> in a VM, as the services required very little disk space or CPU/RAM. 
For opam-repo-ci, the <code class="language-plaintext highlighter-rouge">rsync</code> was pretty slow, so I tried running multiple instances using GNU parallel with marginal gains.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /var/lib/docker/volumes2/opam-repo-ci_data/_data/var/job
<span class="nb">ls</span> <span class="nt">-d</span> <span class="k">*</span> | parallel <span class="nt">-j</span> 5 rsync <span class="nt">-azh</span> c2-4.equinix.ci.dev:/var/lib/docker/volumes/opam-repo-ci_data/_data/var/job/<span class="o">{}</span>/ <span class="o">{}</span>/
</code></pre></div></div>

<p>The Ansible configuration script for OCaml-CI is misnamed as it configures the machine and deploys infrastructure: Caddy, Grafana, Prometheus and Docker secrets, but not the Docker stack. The Docker stack for OCaml-CI is deployed by <code class="language-plaintext highlighter-rouge">make deploy-stack</code> from <a href="https://github.com/ocurrent/ocaml-ci">ocurrent/ocaml-ci</a>. Conversely, opam-repo-ci <em>is</em> deployed from the Ansible playbook, but there is a <code class="language-plaintext highlighter-rouge">Makefile</code> and an outdated <code class="language-plaintext highlighter-rouge">stack.yml</code> in <a href="https://github.com/ocurrent/opam-repo-ci">ocurrent/opam-repo-ci</a>.</p>

<p>As part of the migration away from Equinix, these services have been merged into a single large machine <code class="language-plaintext highlighter-rouge">chives.caelum.ci.dev</code>. With this change, I have moved the Docker stack configuration for opam-repo-ci back to the repository <a href="https://github.com/ocurrent/opam-repo-ci/pull/428">PR#428</a> and merged and renamed the machine configuration <a href="https://github.com/mtelvers/ansible/pull/44">PR#44</a>.</p>

<p>We want to thank Equinix for supporting OCaml over the years.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="registry.ci.dev,opam-repo-ci,get.dune.build" /><category term="tunbury.org" /><summary type="html"><![CDATA[The moves of registry.ci.dev, opam-repo-ci, and get.dune.build have followed the template of OCaml-CI. Notable differences have been that I have hosted get.dune.build in a VM, as the services required very little disk space or CPU/RAM. For opam-repo-ci, the rsync was pretty slow, so I tried running multiple instances using GNU parallel with marginal gains.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/equinix.png" /><media:content medium="image" url="https://www.tunbury.org/images/equinix.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Moving OCaml-CI</title><link href="https://www.tunbury.org/2025/04/27/ocaml-ci/" rel="alternate" type="text/html" title="Moving OCaml-CI" /><published>2025-04-27T00:00:00+00:00</published><updated>2025-04-27T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/27/ocaml-ci</id><content type="html" xml:base="https://www.tunbury.org/2025/04/27/ocaml-ci/"><![CDATA[<p>As noted on Thursday, the various OCaml services will need to be moved away from Equinix. Below are my notes on moving OCaml-CI.</p>

<p>Generate an SSH key on the new server <code class="language-plaintext highlighter-rouge">chives</code> using <code class="language-plaintext highlighter-rouge">ssh-keygen -t ed25519</code>. Copy the public key to <code class="language-plaintext highlighter-rouge">c2-3.equinix.ci.dev</code> and save it under <code class="language-plaintext highlighter-rouge">~/.ssh/authorized_keys</code>.</p>

<p>Use <code class="language-plaintext highlighter-rouge">rsync</code> to mirror the Docker volumes. <code class="language-plaintext highlighter-rouge">-z</code> did improve performance as there appears to be a rate limiter somewhere in the path.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rsync <span class="nt">-azvh</span> <span class="nt">--progress</span> c2-3.equinix.ci.dev:/var/lib/docker/volumes/ /var/lib/docker/volumes/
</code></pre></div></div>

<p>After completing the copy, I waited for a quiet moment, and then scaled all of the Docker services to 0. I prefer to scale the services rather than remove them, as the recovery is much easier.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker service scale <span class="nv">infra_grafana</span><span class="o">=</span>0
docker service scale <span class="nv">infra_prometheus</span><span class="o">=</span>0
docker service scale ocaml-ci_ci<span class="o">=</span>0
docker service scale ocaml-ci_gitlab<span class="o">=</span>0
docker service scale ocaml-ci_web<span class="o">=</span>0
</code></pre></div></div>

<p>For the final copy, I used <code class="language-plaintext highlighter-rouge">--checksum</code> and also added <code class="language-plaintext highlighter-rouge">--delete</code>, as the Prometheus database creates segment files that are periodically merged into the main database.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rsync <span class="nt">-azvh</span> <span class="nt">--checksum</span> <span class="nt">--delete</span> <span class="nt">--progress</span> c2-3.equinix.ci.dev:/var/lib/docker/volumes/ /var/lib/docker/volumes/
</code></pre></div></div>

<p>The machine configuration is held in an Ansible Playbook, which includes the Docker stack for Grafana and Prometheus. It can be easily applied to the new machine:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ansible-playbook <span class="nt">-e</span> @secrets/ocaml.ci.dev.yml <span class="nt">--vault-password-file</span> secrets/vault-password ocaml.ci.dev.yml
</code></pre></div></div>

<p>OCaml-CI’s Docker stack is held on GitHub <a href="https://github.com/ocurrent/ocaml-ci">ocurrent/ocaml-ci</a> and can be deployed with:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make deploy-stack
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ocaml-ci" /><category term="tunbury.org" /><summary type="html"><![CDATA[As noted on Thursday, the various OCaml services will need to be moved away from Equinix. Below are my notes on moving OCaml-CI.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Bluesky SSH Authentication #2</title><link href="https://www.tunbury.org/2025/04/26/bluesky-ssh-authentication-2/" rel="alternate" type="text/html" title="Bluesky SSH Authentication #2" /><published>2025-04-26T00:00:00+00:00</published><updated>2025-04-26T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/26/bluesky-ssh-authentication-2</id><content type="html" xml:base="https://www.tunbury.org/2025/04/26/bluesky-ssh-authentication-2/"><![CDATA[<p>Addressing the glaring omissions from yesterday’s proof of concept, such as the fact that you could sign in as any user, you couldn’t revoke access, all hosts had the same users, and there was no mapping between Bluesky handles and POSIX users, I have updated <a href="https://github.com/mtelvers/bluesky-ssh-key-extractor">mtelvers/bluesky-ssh-key-extractor</a> and newly published <a href="https://github.com/mtelvers/bluesky-collection.git">mtelvers/bluesky-collection</a>.</p>

<p>The tool creates ATProto collections using <code class="language-plaintext highlighter-rouge">app.bsky.graph.list</code> and populates them with <code class="language-plaintext highlighter-rouge">app.bsky.graph.listitem</code> records.</p>

<p>Each list should be named with a friendly identifier such as the FQDN of the host being secured. List entries have a <code class="language-plaintext highlighter-rouge">subject_did</code>, which is the DID of the user you are giving access to, and a <code class="language-plaintext highlighter-rouge">displayName</code>, which is used as the POSIX username on the system you are connecting to.</p>
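
<p>For illustration, a list entry stored on the ATProto network might look roughly like this; the exact record layout is the tool’s own, so treat the field names as indicative only:</p>

```json
{
  "$type": "app.bsky.graph.listitem",
  "subject_did": "did:plc:476rmswt6ji7uoxyiwjna3ti",
  "displayName": "mte24"
}
```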

<p>A typical usage would be creating a collection and adding records. Here I have made a collection called <code class="language-plaintext highlighter-rouge">rosemary.caelum.ci.dev</code> and then added two users, <code class="language-plaintext highlighter-rouge">anil.recoil.org</code> and <code class="language-plaintext highlighter-rouge">mtelvers.tunbury.org</code>, with POSIX usernames of <code class="language-plaintext highlighter-rouge">avsm2</code> and <code class="language-plaintext highlighter-rouge">mte24</code> respectively (see my <a href="https://www.atproto-browser.dev/at/did:plc:476rmswt6ji7uoxyiwjna3ti">Bluesky record</a>).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bluesky_collection create --handle mtelvers.tunbury.org --password *** --collection rosemary.caelum.ci.dev
bluesky_collection add --handle mtelvers.tunbury.org --password *** --collection rosemary.caelum.ci.dev --user-handle anil.recoil.org --user-id avsm2
bluesky_collection add --handle mtelvers.tunbury.org --password *** --collection rosemary.caelum.ci.dev --user-handle mtelvers.tunbury.org --user-id mte24
</code></pre></div></div>

<p>When authenticating using SSHD, the companion tool <a href="https://github.com/mtelvers/bluesky-ssh-key-extractor">mtelvers/bluesky-ssh-key-extractor</a> would have command line parameters of the Bluesky user account holding the collection, the collection name (aka the hostname), and the POSIX username (provided by SSHD). The authenticator queries the Bluesky network to find the collection matching the FQDN, then finds the list entries, comparing them to the POSIX user given. If there is a match, the <code class="language-plaintext highlighter-rouge">subject_did</code> is used to look up the associated <code class="language-plaintext highlighter-rouge">sh.tangled.publicKey</code>. The authenticator requires no password to access Bluesky, as all the records are public.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bluesky,sshd" /><category term="tunbury.org" /><summary type="html"><![CDATA[Addressing the glaring omissions from yesterday’s proof of concept, such as the fact that you could sign in as any user, you couldn’t revoke access, all hosts had the same users, and there was no mapping between Bluesky handles and POSIX users, I have updated mtelvers/bluesky-ssh-key-extractor and newly published mtelvers/bluesky-collection.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/bluesky-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/bluesky-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Bluesky SSH Authentication</title><link href="https://www.tunbury.org/2025/04/25/bluesky-ssh-authentication/" rel="alternate" type="text/html" title="Bluesky SSH Authentication" /><published>2025-04-25T15:00:00+00:00</published><updated>2025-04-25T15:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/25/bluesky-ssh-authentication</id><content type="html"
xml:base="https://www.tunbury.org/2025/04/25/bluesky-ssh-authentication/"><![CDATA[<p>If you have signed up to <a href="https://tangled.sh">tangled.sh</a>, you will have published your SSH public key on the Bluesky ATProto network. Have a browse to your Bluesky ID, or <a href="https://www.atproto-browser.dev/at/did:plc:476rmswt6ji7uoxyiwjna3ti">mine</a>. Look under <code class="language-plaintext highlighter-rouge">sh.tangled.publicKey</code>.</p>

<p><a href="https://github.com/mtelvers/bluesky-ssh-key-extractor.git">BlueSky ATproto SSH Public Key Extractor</a> extracts this public key information and outputs one public key at a time. The format is suitable for use with the <code class="language-plaintext highlighter-rouge">AuthorizedKeysCommand</code> parameter in your <code class="language-plaintext highlighter-rouge">/etc/ssh/sshd_config</code> file.</p>

<p>Build the project:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>opam <span class="nb">install</span> <span class="nb">.</span> <span class="nt">--deps-only</span>
dune build
</code></pre></div></div>

<p>Install the binary by copying it to the local system. Setting the ownership and permissions is essential.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cp </span>_build/install/default/bin/bluesky-ssh-key-extractor /usr/local/bin
<span class="nb">chmod </span>755 /usr/local/bin/bluesky-ssh-key-extractor
<span class="nb">chown </span>root:root /usr/local/bin/bluesky-ssh-key-extractor
</code></pre></div></div>

<p>Test the command is working:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>bluesky-ssh-key-extractor mtelvers.tunbury.org
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIA7UrJmBFWR3c7jVzpoyg4dJjON9c7t9bT9acfrj6G7i mark.elvers@tunbury.org
</code></pre></div></div>

<p>If that works, edit your <code class="language-plaintext highlighter-rouge">/etc/ssh/sshd_config</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AuthorizedKeysCommand /usr/local/bin/bluesky-ssh-key-extractor your_bluesky_handle
AuthorizedKeysCommandUser nobody
</code></pre></div></div>

<p>Now you should be able to SSH to the machine using your published key:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh root@your_host
</code></pre></div></div>

<blockquote>
  <p>Note, this program was intended as a proof of concept rather than something you’d actually use.</p>
</blockquote>

<p>If you have a 1:1 mapping between Bluesky accounts and system usernames, you might get away with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AuthorizedKeysCommand /usr/local/bin/bluesky-ssh-key-extractor %u.bsky.social
AuthorizedKeysCommandUser nobody
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bluesky,sshd" /><category term="tunbury.org" /><summary type="html"><![CDATA[If you have sign up to tangled.sh you will have published your SSH public key on the Bluesky ATproto network. Have a browse to your Bluesky ID, or mine. Look under sh.tangled.publicKey.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/bluesky-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/bluesky-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Blade Server Reallocation</title><link href="https://www.tunbury.org/2025/04/25/blade-reallocation/" rel="alternate" type="text/html" title="Blade Server Reallocation" /><published>2025-04-25T10:15:00+00:00</published><updated>2025-04-25T10:15:00+00:00</updated><id>https://www.tunbury.org/2025/04/25/blade-reallocation</id><content type="html" xml:base="https://www.tunbury.org/2025/04/25/blade-reallocation/"><![CDATA[<p>We have changed our mind about using <code class="language-plaintext highlighter-rouge">dm-cache</code> in the SSD/RAID1 configuration. The current thinking is that the mechanical drives would be better served as extra capacity for our distributed ZFS infrastructure, where we intend to have two copies of all data, and these disks represent ~100TB of storage.</p>

<p>As mentioned previously, we have a deadline of Wednesday, 30th April, to move the workloads from the Equinix machines or incur hosting fees.</p>

<p>I also noted that the SSD capacity is 1.7TB in all cases. The new distribution is:</p>

<ul>
  <li>rosemary: FreeBSD CI Worker (releasing spring &amp; summer)</li>
  <li>oregano: OpenBSD CI Worker (releasing bremusa)</li>
  <li>basil: Equinix c2-2 (registry.ci.dev)</li>
  <li>mint: @mte24 workstation</li>
  <li>thyme: spare</li>
  <li>chives: Equinix c2-4 (opam-repo-ci) + Equinix c2-3 (OCaml-ci) + Equinix c2-1 (preview.dune.dev)</li>
  <li>dill: spare</li>
  <li>sage: docs-ci (new implementation, eventually replacing eumache)</li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[We have changed our mind about using dm-cache in the SSD/RAID1 configuration. The current thinking is that the mechanical drives would be better served as extra capacity for our distributed ZFS infrastructure, where we intend to have two copies of all data, and these disks represent ~100TB of storage.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/supermicro.png" /><media:content medium="image" url="https://www.tunbury.org/images/supermicro.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OCaml Infra Map</title><link href="https://www.tunbury.org/2025/04/24/infra-map/" rel="alternate" type="text/html" title="OCaml Infra Map" /><published>2025-04-24T10:00:00+00:00</published><updated>2025-04-24T10:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/24/infra-map</id><content type="html" xml:base="https://www.tunbury.org/2025/04/24/infra-map/"><![CDATA[<p>Yesterday, we were talking about extending the current infrastructure database to incorporate other information to provide prompts to return machines to the pool of resources after they have completed their current role/loan, etc. There is also a wider requirement to bring these services back to Cambridge from Equinix/Scaleway, which will be the subject of a follow-up post. However, the idea of extending the database made me think that it would be amusing to overlay the machine’s positions onto Google Maps.</p>

<p>I added positioning data in the Jekyll Collection <code class="language-plaintext highlighter-rouge">_machines/*.md</code> for each machine, e.g. <a href="https://raw.githubusercontent.com/ocaml/infrastructure/refs/heads/master/_machines/ainia.md">ainia.md</a></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
name: ainia
...
latitude: 52.2109
longitude: 0.0917
---
</code></pre></div></div>

<p>Then Jekyll’s Liquid templating engine can create a JavaScript array for us:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
  <span class="c1">// Define machines data array from Jekyll collection</span>
  <span class="kd">const</span> <span class="nx">machinesData</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span><span class="o">%</span> <span class="k">for</span> <span class="nx">machine</span> <span class="k">in</span> <span class="nx">site</span><span class="p">.</span><span class="nx">machines</span> <span class="o">%</span><span class="p">}</span>
      <span class="p">{</span><span class="o">%</span> <span class="k">if</span> <span class="nx">machine</span><span class="p">.</span><span class="nx">latitude</span> <span class="nx">and</span> <span class="nx">machine</span><span class="p">.</span><span class="nx">longitude</span> <span class="o">%</span><span class="p">}</span>
      <span class="p">{</span>
        <span class="na">name</span><span class="p">:</span> <span class="dl">"</span><span class="s2">{{ machine.name }}</span><span class="dl">"</span><span class="p">,</span>
        <span class="na">lat</span><span class="p">:</span> <span class="p">{{</span> <span class="nx">machine</span><span class="p">.</span><span class="nx">latitude</span> <span class="p">}},</span>
        <span class="na">lng</span><span class="p">:</span> <span class="p">{{</span> <span class="nx">machine</span><span class="p">.</span><span class="nx">longitude</span> <span class="p">}},</span>
        <span class="p">{</span><span class="o">%</span> <span class="k">if</span> <span class="nx">machine</span><span class="p">.</span><span class="nx">description</span> <span class="o">%</span><span class="p">}</span>
        <span class="nl">description</span><span class="p">:</span> <span class="dl">"</span><span class="s2">{{ machine.description | escape }}</span><span class="dl">"</span><span class="p">,</span>
        <span class="p">{</span><span class="o">%</span> <span class="nx">endif</span> <span class="o">%</span><span class="p">}</span>
        <span class="c1">// Add any other properties you need</span>
      <span class="p">},</span>
      <span class="p">{</span><span class="o">%</span> <span class="nx">endif</span> <span class="o">%</span><span class="p">}</span>
    <span class="p">{</span><span class="o">%</span> <span class="nx">endfor</span> <span class="o">%</span><span class="p">}</span>
  <span class="p">];</span>

</code></pre></div></div>

<p>This array can be converted into an array of map markers. Google provides an API for clustering nearby markers into a single marker showing a count of machines. I added a random offset to each location to avoid all the markers piling up on a single spot.</p>
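<p>The random offset can be as simple as a small uniform jitter on each coordinate. A sketch (not the site’s actual code; the field names match the <code class="language-plaintext highlighter-rouge">machinesData</code> array above, the entries and the ±0.0005° magnitude, roughly ±50 m, are my own choices):</p>

```javascript
// Jitter each machine's position by up to ~0.0005 degrees (~50 m)
// so co-located machines don't stack on exactly the same point.
const machinesData = [
  { name: "ainia", lat: 52.2109, lng: 0.0917 },
  { name: "example", lat: 52.2109, lng: 0.0917 },
];

function jitter(machine, maxDeg = 0.0005) {
  return {
    ...machine,
    lat: machine.lat + (Math.random() - 0.5) * 2 * maxDeg,
    lng: machine.lng + (Math.random() - 0.5) * 2 * maxDeg,
  };
}

// Note: use an arrow function rather than machinesData.map(jitter),
// which would pass the array index as the maxDeg argument.
const markers = machinesData.map((m) => jitter(m));
console.log(markers);
```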

<p>The interactive map can be seen at <a href="https://infra.ocaml.org/machines.html">machines.html</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Yesterday, we were talking about extending the current infrastructure database to incorporate other information to provide prompts to return machines to the pool of resources after they have completed their current role/loan, etc. There is also a wider requirement to bring these services back to Cambridge from Equinix/Scaleway, which will be the subject of a follow-up post. However, the idea of extending the database made me think that it would be amusing to overlay the machine’s positions onto Google Maps.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-map.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-map.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Blade Server Allocation</title><link href="https://www.tunbury.org/2025/04/23/blade-allocation/" rel="alternate" type="text/html" title="Blade Server Allocation" /><published>2025-04-23T00:00:00+00:00</published><updated>2025-04-23T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/23/blade-allocation</id><content type="html" xml:base="https://www.tunbury.org/2025/04/23/blade-allocation/"><![CDATA[<p>Equinix has stopped commercial sales of Metal and will sunset the service at the end of June 2026. Equinix has long been a supporter of OCaml and has provided free credits to use on their Metal platform. These credits are coming to an end at the end of this month, meaning that we need to move some of our services away from Equinix. We have two new four-node blade servers, which will become the new home for these services. 
The blades have dual 10C/20T processors with either 192GB or 256GB of RAM and a combination of SSD and spinning disk.</p>

<p>192GB, 20C/40T with 1.1TB SSD, 2 x 6T disks</p>
<ul>
  <li>rosemary: FreeBSD CI Worker (releasing spring &amp; summer)</li>
  <li>oregano: OpenBSD CI Worker (releasing bremusa)</li>
  <li>basil: docs-ci (new implementation, eventually replacing eumache)</li>
  <li>mint: spare</li>
</ul>

<p>256GB, 20C/40T with 1.5TB SSD, 2 x 8T disks</p>
<ul>
  <li>thyme: Equinix c2-2 (registry.ci.dev)</li>
  <li>chives: Equinix c2-4 (opam-repo-ci) + Equinix c2-3 (OCaml-ci) + Equinix c2-1 (preview.dune.dev)</li>
</ul>

<p>256GB, 20C/40T with 1.1TB SSD, 2 x 6T disks</p>
<ul>
  <li>dill: spare</li>
  <li>sage: spare</li>
</ul>

<p>VMs currently running on hopi can be migrated to chives, allowing hopi to be redeployed.</p>

<p>Machines which can then be recycled are:</p>
<ul>
  <li>sleepy (4C)</li>
  <li>grumpy (4C)</li>
  <li>doc (4C)</li>
  <li>spring (8T)</li>
  <li>tigger</li>
  <li>armyofdockerness</li>
</ul>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Equinix has stopped commercial sales of Metal and will sunset the service at the end of June 2026. Equinix have long been a supporter of OCaml and has provided free credits to use on their Metal platform. These credits are coming to an end at the end of this month, meaning that we need to move some of our services away from Equinix. We have two new four-node blade servers, which will become the new home for these services. The blades have dual 10C/20T processors with either 192GB or 256GB of RAM and a combination of SSD and spinning disk.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/supermicro.png" /><media:content medium="image" url="https://www.tunbury.org/images/supermicro.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OCaml &amp;lt; 4.14, Fedora 42 and GCC 15</title><link href="https://www.tunbury.org/2025/04/22/ocaml-fedora-gcc/" rel="alternate" type="text/html" title="OCaml &amp;lt; 4.14, Fedora 42 and GCC 15" /><published>2025-04-22T00:00:00+00:00</published><updated>2025-04-22T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/22/ocaml-fedora-gcc</id><content type="html" xml:base="https://www.tunbury.org/2025/04/22/ocaml-fedora-gcc/"><![CDATA[<p>Late last week, @MisterDA added Fedora 42 support to the <a href="https://images.ci.ocaml.org">Docker base image builder</a>. The new base images attempted to build over the weekend, but there have been a few issues!</p>

<p>The code I had previously added to force Fedora 41 to use the DNF version 5 syntax was specifically for version 41. For reference, the old syntax was <code class="language-plaintext highlighter-rouge">yum groupinstall -y 'C Development Tools and Libraries'</code>, and the new syntax is <code class="language-plaintext highlighter-rouge">yum group install -y 'c-development'</code>. Note the extra space.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">c_devtools_libs</span> <span class="o">:</span> <span class="p">(</span><span class="n">t</span><span class="o">,</span> <span class="kt">unit</span><span class="o">,</span> <span class="kt">string</span><span class="o">,</span> <span class="n">t</span><span class="p">)</span> <span class="n">format4</span> <span class="o">=</span>
  <span class="k">match</span> <span class="n">d</span> <span class="k">with</span>
  <span class="o">|</span> <span class="nt">`Fedora</span> <span class="nt">`V41</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="o">|</span><span class="s2">"c-development"</span><span class="o">|</span><span class="p">}</span>
  <span class="o">|</span> <span class="nt">`Fedora</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="o">|</span><span class="s2">"C Development Tools and Libraries"</span><span class="o">|</span><span class="p">}</span>
  <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="p">{</span><span class="o">|</span><span class="s2">"Development Tools"</span><span class="o">|</span><span class="p">}</span>
...
<span class="k">let</span> <span class="n">dnf_version</span> <span class="o">=</span> <span class="k">match</span> <span class="n">d</span> <span class="k">with</span> <span class="nt">`Fedora</span> <span class="nt">`V41</span> <span class="o">-&gt;</span> <span class="mi">5</span> <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="mi">3</span>
</code></pre></div></div>

<p>To unburden ourselves of this maintenance in future releases, I have inverted the logic so unmatched versions will use the new syntax.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="p">(</span><span class="n">dnf_version</span><span class="o">,</span> <span class="n">c_devtools_libs</span><span class="p">)</span> <span class="o">:</span> <span class="kt">int</span> <span class="o">*</span> <span class="p">(</span><span class="n">t</span><span class="o">,</span> <span class="kt">unit</span><span class="o">,</span> <span class="kt">string</span><span class="o">,</span> <span class="n">t</span><span class="p">)</span> <span class="n">format4</span> <span class="o">=</span>
  <span class="k">match</span> <span class="n">d</span> <span class="k">with</span>
  <span class="o">|</span> <span class="nt">`Fedora</span>
    <span class="p">(</span> <span class="nt">`V21</span> <span class="o">|</span> <span class="nt">`V22</span> <span class="o">|</span> <span class="nt">`V23</span> <span class="o">|</span> <span class="nt">`V24</span> <span class="o">|</span> <span class="nt">`V25</span> <span class="o">|</span> <span class="nt">`V26</span> <span class="o">|</span> <span class="nt">`V27</span> <span class="o">|</span> <span class="nt">`V28</span> <span class="o">|</span> <span class="nt">`V29</span>
    <span class="o">|</span> <span class="nt">`V30</span> <span class="o">|</span> <span class="nt">`V31</span> <span class="o">|</span> <span class="nt">`V32</span> <span class="o">|</span> <span class="nt">`V33</span> <span class="o">|</span> <span class="nt">`V34</span> <span class="o">|</span> <span class="nt">`V35</span> <span class="o">|</span> <span class="nt">`V36</span> <span class="o">|</span> <span class="nt">`V37</span> <span class="o">|</span> <span class="nt">`V38</span>
    <span class="o">|</span> <span class="nt">`V39</span> <span class="o">|</span> <span class="nt">`V40</span> <span class="p">)</span> <span class="o">-&gt;</span>
    <span class="p">(</span><span class="mi">3</span><span class="o">,</span> <span class="p">{</span><span class="o">|</span><span class="s2">"C Development Tools and Libraries"</span><span class="o">|</span><span class="p">})</span>
  <span class="o">|</span> <span class="nt">`Fedora</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="p">(</span><span class="mi">5</span><span class="o">,</span> <span class="p">{</span><span class="o">|</span><span class="s2">"c-development"</span><span class="o">|</span><span class="p">})</span>
  <span class="o">|</span> <span class="n">_</span> <span class="o">-&gt;</span> <span class="p">(</span><span class="mi">3</span><span class="o">,</span> <span class="p">{</span><span class="o">|</span><span class="s2">"Development Tools"</span><span class="o">|</span><span class="p">})</span>
</code></pre></div></div>

<p>Fedora 42 also removed <code class="language-plaintext highlighter-rouge">awk</code>, so it now needs to be specifically included as a dependency. However, this code is shared with Oracle Linux, which does not have a package called <code class="language-plaintext highlighter-rouge">awk</code>. Fortunately, both have a package called <code class="language-plaintext highlighter-rouge">gawk</code>!</p>

<p>The next issue is that Fedora 42 is the first of the distributions we build base images for that has moved to GCC 15, specifically GCC 15.0.1. This breaks all versions of OCaml &lt; 4.14.</p>

<p>The change is that the code below previously gave no information about the number or type of parameters (see <code class="language-plaintext highlighter-rouge">runtime/caml/prims.h</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="nf">value</span> <span class="p">(</span><span class="o">*</span><span class="n">c_primitive</span><span class="p">)();</span>
</code></pre></div></div>

<p>Under C23, the same declaration now means that there are no parameters, i.e.:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="nf">value</span> <span class="p">(</span><span class="o">*</span><span class="n">c_primitive</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>
</code></pre></div></div>

<p>This is caused by a change in the default C language version used by the compiler. See the <a href="https://gcc.gnu.org/gcc-15/changes.html">GCC change log</a>:</p>

<blockquote>
  <p>C23 by default: GCC 15 changes the default language version for C compilation from <code class="language-plaintext highlighter-rouge">-std=gnu17</code> to <code class="language-plaintext highlighter-rouge">-std=gnu23</code>. If your code relies on older versions of the C standard, you will need to either add <code class="language-plaintext highlighter-rouge">-std=</code> to your build flags, or port your code; see the porting notes.</p>
</blockquote>

<p>Also see the <a href="https://gcc.gnu.org/gcc-15/porting_to.html#c23">porting notes</a>, and <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118112">this bug report</a>.</p>

<p>This is <em>not</em> an immediate problem as OCaml-CI and opam-repo-ci only test against OCaml 4.14.2 and 5.3.0 on Fedora. I have opened <a href="https://github.com/ocurrent/docker-base-images/issues/320">issue#320</a> to track this problem.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml,Fedora,GCC" /><category term="tunbury.org" /><summary type="html"><![CDATA[Late last week, @MisterDA added Fedora 42 support to the Docker base image builder. The new base images attempted to build over the weekend, but there have been a few issues!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Clock winder repair</title><link href="https://www.tunbury.org/2025/04/21/clock-winder-repair/" rel="alternate" type="text/html" title="Clock winder repair" /><published>2025-04-21T12:00:00+00:00</published><updated>2025-04-21T12:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/21/clock-winder-repair</id><content type="html" xml:base="https://www.tunbury.org/2025/04/21/clock-winder-repair/"><![CDATA[<p>The galvanised steel wire rope on one of my clock winders has snapped. This is a 3mm rope, so it would have a rating of greater than 500 kg. I am quite surprised that it snapped, as the load on this wire rope is much lower than that of others in use in the same system.</p>

<p>I suspect that the failure is due to the pulley. There is a significant gap between the frame and the pulley wheel where the wire may get jammed. (Right-hand picture). My initial thought was to 3d print a spacer washer, but instead, I was able to squash the entire assembly, removing all the play while still allowing the pulley to rotate. (Left-hand picture).</p>

<p><img src="/images/aylesford-pulley.jpg" alt="" /></p>

<p>When the clock is being wound, either by hand or via the clock winder, the tension is removed from the drive wheel, resulting in a reduced impulse on the escapement. In early versions of the winder, I had ignored the counterweight by tying it out of the way, but this caused the clock to lose almost 10 minutes per day. The counterweight is an ingeniously simple workaround which keeps tension on the drive wheel by pulling on one of the gear teeth. This particular part of the clock winder lifts the counterweight before the winder lifts the weight.</p>

<iframe width="315" height="560" src="https://www.youtube.com/embed/aozrwtLnFw8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="clock" /><category term="tunbury.org" /><summary type="html"><![CDATA[The galvanised steel wire rope on one of my clock winders has snapped. This is a 3mm rope, so it would have a rating of greater than 500 kg. I am quite surprised that it snapped, as the load on this wire rope is much lower than that of others in use in the same system.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/aylesford-snap.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/aylesford-snap.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ubuntu cloud-init with LVM and dm-cache</title><link href="https://www.tunbury.org/2025/04/21/ubuntu-dm-cache/" rel="alternate" type="text/html" title="Ubuntu cloud-init with LVM and dm-cache" /><published>2025-04-21T00:00:00+00:00</published><updated>2025-04-21T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/21/ubuntu-dm-cache</id><content type="html" xml:base="https://www.tunbury.org/2025/04/21/ubuntu-dm-cache/"><![CDATA[<p><a href="https://en.wikipedia.org/wiki/Dm-cache">dm-cache</a> has been part of the mainline Linux kernel for over a decade, making it possible for faster SSD and NVMe drives to be used as a cache within a logical volume. <a href="https://videos.cdn.redhat.com/summit2015/presentations/17856_getting-the-most-out-of-your-nvme-ssd.pdf">This technology brief from Dell</a> gives a good overview of <code class="language-plaintext highlighter-rouge">dm-cache</code> and the performance benefits. 
Skip to the graph on page 25, noting the logarithmic scale.</p>

<p>Given a system with a small SATADOM module, <code class="language-plaintext highlighter-rouge">/dev/sdd</code>, an SSD drive <code class="language-plaintext highlighter-rouge">/dev/sdc</code> and a couple of large-capacity spinning disks, <code class="language-plaintext highlighter-rouge">/dev/sd[ab]</code>, can we use cloud-init to configure RAID1 on the capacity disks with the SSD being used as a cache?</p>

<p>Unfortunately, the <code class="language-plaintext highlighter-rouge">storage:</code> / <code class="language-plaintext highlighter-rouge">config:</code> nodes are not very flexible when it comes to even modest complexity. For example, given an LVM volume group consisting of multiple disk types, it isn’t possible to create a logical volume on a specific disk as <code class="language-plaintext highlighter-rouge">devices:</code> is not a parameter to <code class="language-plaintext highlighter-rouge">lvm_partition</code>. It is also not possible to specify <code class="language-plaintext highlighter-rouge">raid: raid1</code>.</p>

<p>I have taken the approach of creating two volume groups, <code class="language-plaintext highlighter-rouge">vg_raid</code> and <code class="language-plaintext highlighter-rouge">vg_cache</code>, on disks <code class="language-plaintext highlighter-rouge">/dev/sd[ab]</code> and <code class="language-plaintext highlighter-rouge">/dev/sdc</code>, respectively, thereby forcing the use of the correct devices. On the <code class="language-plaintext highlighter-rouge">vg_raid</code> group, I have created a single logical volume without RAID. On <code class="language-plaintext highlighter-rouge">vg_cache</code>, I have created the two cache volumes, <code class="language-plaintext highlighter-rouge">lv-cache</code> and <code class="language-plaintext highlighter-rouge">lv-cache-meta</code>.</p>

<p>The <code class="language-plaintext highlighter-rouge">lv-cache</code> and <code class="language-plaintext highlighter-rouge">lv-cache-meta</code> should be sized in the ratio 1000:1.</p>

<p>As the final step of the installation, I used <code class="language-plaintext highlighter-rouge">late-commands</code> to configure the system as I want it. These implement RAID1 for the root logical volume, deactivate the two cache volumes as a necessary step before merging <code class="language-plaintext highlighter-rouge">vg_raid</code> and <code class="language-plaintext highlighter-rouge">vg_cache</code>, create the cache pool from the cache volumes, and finally enable the cache. The cache pool can be either <em>writethrough</em> or <em>writeback</em>, with the default being <em>writethrough</em>. In this mode, data is written to both the cache and the original volume, so a failure in the cache device doesn’t result in any data loss. <em>Writeback</em> has better performance as writes initially only go to the cache volume and are only written to the original volume later.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lvconvert -y --type raid1 -m 1 /dev/vg_raid/lv_data
lvchange -an vg_cache/lv_cache
lvchange -an vg_cache/lv_cache_meta
vgmerge vg_raid vg_cache
lvconvert -y --type cache-pool --poolmetadata vg_raid/lv_cache_meta vg_raid/lv_cache
lvconvert -y --type cache --cachemode writethrough --cachepool vg_raid/lv_cache vg_raid/lv_data
</code></pre></div></div>

<p>I have placed <code class="language-plaintext highlighter-rouge">/boot</code> and <code class="language-plaintext highlighter-rouge">/boot/EFI</code> on the SATADOM so that the system can be booted.</p>

<p>My full configuration is given below.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#cloud-config
autoinstall:
  version: 1
  storage:
    config:
      # Define the physical disks
      - { id: disk-sda, type: disk, ptable: gpt, path: /dev/sda, preserve: false }
      - { id: disk-sdb, type: disk, ptable: gpt, path: /dev/sdb, preserve: false }
      - { id: disk-sdc, type: disk, ptable: gpt, path: /dev/sdc, preserve: false }
      - { id: disk-sdd, type: disk, ptable: gpt, path: /dev/sdd, preserve: false }

      # Define the partitions
      - { id: efi-part, type: partition, device: disk-sdd, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: true, offset: 1048576}
      - { id: boot-part, type: partition, device: disk-sdd, size: 1G, wipe: superblock, number: 2, preserve: false, grub_device: false }

      # Create volume groups
      - { id: vg-raid, type: lvm_volgroup, name: vg_raid, devices: [disk-sda, disk-sdb] }
      - { id: vg-cache, type: lvm_volgroup, name: vg_cache, devices: [disk-sdc] }

      # Create logical volume which will be for RAID
      - { id: lv-data, type: lvm_partition, volgroup: vg-raid, name: lv_data, size: 1000G, preserve: false}

      # Create cache metadata logical volume on SSD VG (ratio 1000:1 with cache data)
      - { id: lv-cache-meta, type: lvm_partition, volgroup: vg-cache, name: lv_cache_meta, size: 1G, preserve: false }

      # Create cache data logical volume on SSD VG
      - { id: lv-cache, type: lvm_partition, volgroup: vg-cache, name: lv_cache, size: 1000G, preserve: false }

      # Format the volumes
      - { id: root-fs, type: format, fstype: ext4, volume: lv-data, preserve: false }
      - { id: efi-fs, type: format, fstype: fat32, volume: efi-part, preserve: false }
      - { id: boot-fs, type: format, fstype: ext4, volume: boot-part, preserve: false }

      # Mount the volumes
      - { id: mount-1, type: mount, path: /, device: root-fs }
      - { id: mount-2, type: mount, path: /boot, device: boot-fs }
      - { id: mount-3, type: mount, path: /boot/efi, device: efi-fs }
  identity:
    hostname: unnamed-server
    password: "$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0"
    username: mte24
  ssh:
    install-server: yes
    authorized-keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIA7UrJmBFWR3c7jVzpoyg4dJjON9c7t9bT9acfrj6G7i mark.elvers@tunbury.org
    allow-pw: no
  packages:
    - lvm2
    - thin-provisioning-tools
  user-data:
    disable_root: false
  late-commands:
    - lvconvert -y --type raid1 -m 1 /dev/vg_raid/lv_data
    - lvchange -an vg_cache/lv_cache
    - lvchange -an vg_cache/lv_cache_meta
    - vgmerge vg_raid vg_cache
    - lvconvert -y --type cache-pool --poolmetadata vg_raid/lv_cache_meta vg_raid/lv_cache
    - lvconvert -y --type cache --cachemode writethrough --cachepool vg_raid/lv_cache vg_raid/lv_data
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="cloud-init,dm-cache,Ubuntu" /><category term="tunbury.org" /><summary type="html"><![CDATA[dm-cache has been part of the mainline Linux kernel for over a decade, making it possible for faster SSD and NVMe drives to be used as a cache within a logical volume. This technology brief from Dell gives a good overview of dm-cache and the performance benefits. Skip to the graph on page 25, noting the logarithmic scale.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ubuntu.png" /><media:content medium="image" url="https://www.tunbury.org/images/ubuntu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Gluster</title><link href="https://www.tunbury.org/2025/04/19/gluster/" rel="alternate" type="text/html" title="Gluster" /><published>2025-04-19T00:00:00+00:00</published><updated>2025-04-19T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/19/gluster</id><content type="html" xml:base="https://www.tunbury.org/2025/04/19/gluster/"><![CDATA[<p>Gluster is a free and open-source software network filesystem. It has been a few years since I last looked at the project, and I was interested in taking another look. Some features, like automatic tiering of hot/cold data, have been removed, and the developers now recommend <code class="language-plaintext highlighter-rouge">dm-cache</code> with LVM instead.</p>

<p>I am going to use four QEMU VMs on which I have installed Ubuntu via PXE boot. For easy repetition, I have wrapped my <code class="language-plaintext highlighter-rouge">qemu-system-x86_64</code> commands into a <code class="language-plaintext highlighter-rouge">Makefile</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>machine: disk0.qcow2 disk1.qcow2 OVMF_VARS.fd
        qemu-system-x86_64 -m 8G -smp 4 -machine accel=kvm,type=pc -cpu host -display none -vnc :11 \
                -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
                -drive if=pflash,format=raw,file=OVMF_VARS.fd \
                -serial stdio \
                -device virtio-scsi-pci,id=scsi0 \
                -device scsi-hd,drive=drive0,bus=scsi0.0,channel=0,scsi-id=0,lun=0 \
                -drive file=disk0.qcow2,if=none,id=drive0 \
                -device scsi-hd,drive=drive1,bus=scsi0.0,channel=0,scsi-id=1,lun=0 \
                -drive file=disk1.qcow2,if=none,id=drive1 \
                -net nic,model=virtio-net-pci,macaddr=02:00:00:00:00:11 \
                -net bridge,br=br0

disk%.qcow2:
        qemu-img create -f qcow2 $@ 1T

OVMF_VARS.fd:
        cp /usr/share/OVMF/OVMF_VARS.fd OVMF_VARS.fd

clean:
        rm -f *.qcow2 OVMF_VARS.fd
</code></pre></div></div>

<p>Gluster works on any file system that supports extended attributes (<em>xattrs</em>), which includes <code class="language-plaintext highlighter-rouge">ext[2-4]</code>. However, XFS is typically used as it performs well with parallel read/write operations and large files. I have used 512-byte inodes, <code class="language-plaintext highlighter-rouge">-i size=512</code>, which is recommended as this creates extra space for the extended attributes.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkfs.xfs <span class="nt">-i</span> <span class="nv">size</span><span class="o">=</span>512 /dev/sdb
<span class="nb">mkdir</span> <span class="nt">-p</span> /gluster/sdb
<span class="nb">echo</span> <span class="s2">"/dev/sdb /gluster/sdb xfs defaults 0 0"</span> <span class="o">&gt;&gt;</span> /etc/fstab
mount <span class="nt">-a</span>
</code></pre></div></div>

<p>With the filesystem prepared, install and start Gluster. Gluster stores its settings in <code class="language-plaintext highlighter-rouge">/var/lib/glusterd</code>, so if you need to reset your installation, stop the gluster daemon and remove that directory.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>glusterfs-server
systemctl <span class="nb">enable </span>glusterd
systemctl start glusterd
</code></pre></div></div>

<p>From one node, probe all the other nodes. You can do this by IP address or by hostname.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster peer probe node222
gluster peer probe node200
gluster peer probe node152
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">gluster pool list</code> should now list all the nodes. <code class="language-plaintext highlighter-rouge">localhost</code> indicates your current host.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>UUID                                    Hostname        State
8d2a1ef0-4c23-4355-9faa-8f3387054d41    node222         Connected
4078f192-b2bb-4c74-a588-35d5475dedc7    node200         Connected
5b2fc21b-b0ab-401e-9848-3973121bfec7    node152         Connected
d5878850-0d40-4394-8dd8-b9b0d4266632    localhost       Connected
</code></pre></div></div>

<p>Now we need to add a volume. A Gluster volume can be distributed, replicated or dispersed. Distributed can also be combined with either of the other two types, giving a distributed replicated volume or a distributed dispersed volume. Briefly, distributed splits the data across the nodes without redundancy but gives a performance advantage; replicated creates two or more copies of the data; dispersed uses erasure coding, which can be thought of as RAID5 across nodes.</p>

<p>Once a volume has been created, it needs to be started. The commands to create and start the volume only need to be executed on one of the nodes.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster volume create vol1 disperse 4 transport tcp node<span class="o">{</span>200,222,223,152<span class="o">}</span>:/gluster/sdb/vol1
gluster volume start vol1
</code></pre></div></div>
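
<p>For comparison, a replicated volume over the same bricks could be created as below. This is illustrative only: the volume name and brick paths mirror the dispersed example, and a replica count of 3 is generally recommended to avoid split-brain.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster volume create vol2 replica 3 transport tcp node{200,222,223}:/gluster/sdb/vol2
gluster volume start vol2
</code></pre></div></div>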

<p>On each node, or on a remote machine, you can now mount the Gluster volume. Here I have mounted it to <code class="language-plaintext highlighter-rouge">/mnt</code> from the node itself. All writes to <code class="language-plaintext highlighter-rouge">/mnt</code> will be dispersed to the other nodes.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>echo "localhost:/vol1 /mnt glusterfs defaults 0 0" &gt;&gt; /etc/fstab
mount -a
</code></pre></div></div>

<p>The volume can be inspected with <code class="language-plaintext highlighter-rouge">gluster volume info</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Volume Name: vol1
Type: Disperse
Volume ID: 31e165b2-da96-40b2-bc09-e4607a02d14b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: node200:/gluster/sdb/vol1
Brick2: node222:/gluster/sdb/vol1
Brick3: node223:/gluster/sdb/vol1
Brick4: node152:/gluster/sdb/vol1
Options Reconfigured:
network.ping-timeout: 4
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
</code></pre></div></div>

<p>In initial testing, any file operation on the mounted volume appeared to hang when a node went down. This is because Gluster has a default <code class="language-plaintext highlighter-rouge">network.ping-timeout</code> of 42 seconds. This command sets a lower value:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster volume set vol1 network.ping-timeout 4
</code></pre></div></div>

<p>The video below shows the four VMs running. One is writing random data to <code class="language-plaintext highlighter-rouge">/mnt/random</code>. The other machines are running <code class="language-plaintext highlighter-rouge">ls -phil /mnt</code> so we can watch the file growing. <code class="language-plaintext highlighter-rouge">node222</code> is killed, and after the 4-second pause, the other nodes continue. When the node is rebooted, it automatically recovers.</p>
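
<p>Recovery can also be watched from the command line on any surviving node. These commands are illustrative; the second lists any entries still waiting to be reconstructed on the returned brick.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gluster volume status vol1
gluster volume heal vol1 info
</code></pre></div></div>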

<iframe width="560" height="315" src="https://www.youtube.com/embed/I8cPq2iCQ5A" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<blockquote>
  <p>While I used 4 nodes, this works equally well with 3 nodes.</p>
</blockquote>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Gluster,Ubuntu" /><category term="tunbury.org" /><summary type="html"><![CDATA[Gluster is a free and open-source software network filesystem. It has been a few years since I last looked at the project, and I was interested in taking another look. Some features, like automatic tiering of hot/cold data, have been removed, and the developers now recommend dm-cache with LVM instead.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/gluster.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/gluster.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ubuntu cloud-init</title><link href="https://www.tunbury.org/2025/04/16/ubuntu-cloud-init/" rel="alternate" type="text/html" title="Ubuntu cloud-init" /><published>2025-04-16T00:00:00+00:00</published><updated>2025-04-16T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/16/ubuntu-cloud-init</id><content type="html" xml:base="https://www.tunbury.org/2025/04/16/ubuntu-cloud-init/"><![CDATA[<p>Testing cloud-init is painful on real (server) hardware, as the faster the server, the longer it seems to take to complete POST. Therefore, I highly recommend testing with a virtual machine before moving to real hardware.</p>

<p>I have set up a QEMU machine to simulate the Dell R640 machines with 10 x 8T disks. I’ll need to set this machine up and tear it down several times for testing, so I have wrapped the setup commands into a <code class="language-plaintext highlighter-rouge">Makefile</code>. QCOW2 is a thin format, so you don’t actually need 80T of disk space to do this!</p>

<p>The Dell machines use EFI, so I have used EFI on the QEMU machine. Note the <code class="language-plaintext highlighter-rouge">OVMF</code> lines in the configuration. Ensure that you emulate a hard disk controller that is supported by the EFI BIOS. For example, <code class="language-plaintext highlighter-rouge">-device megasas,id=scsi0</code> won’t boot as the EFI BIOS can’t see the drives. I have enabled VNC access, but I primarily used the serial console to interact with the machine.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>machine: disk0.qcow2 disk1.qcow2 disk2.qcow2 disk3.qcow2 disk4.qcow2 disk5.qcow2 disk6.qcow2 disk7.qcow2 disk8.qcow2 disk9.qcow2 OVMF_VARS.fd
	qemu-system-x86_64 -m 8G -smp 4 -machine accel=kvm,type=pc -cpu host -display none -vnc :0 \
		-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
		-drive if=pflash,format=raw,file=OVMF_VARS.fd \
		-serial stdio \
		-device virtio-scsi-pci,id=scsi0 \
		-device scsi-hd,drive=drive0,bus=scsi0.0,channel=0,scsi-id=0,lun=0 \
		-drive file=disk0.qcow2,if=none,id=drive0 \
		-device scsi-hd,drive=drive1,bus=scsi0.0,channel=0,scsi-id=1,lun=0 \
		-drive file=disk1.qcow2,if=none,id=drive1 \
		-device scsi-hd,drive=drive2,bus=scsi0.0,channel=0,scsi-id=2,lun=0 \
		-drive file=disk2.qcow2,if=none,id=drive2 \
		-device scsi-hd,drive=drive3,bus=scsi0.0,channel=0,scsi-id=3,lun=0 \
		-drive file=disk3.qcow2,if=none,id=drive3 \
		-device scsi-hd,drive=drive4,bus=scsi0.0,channel=0,scsi-id=4,lun=0 \
		-drive file=disk4.qcow2,if=none,id=drive4 \
		-device scsi-hd,drive=drive5,bus=scsi0.0,channel=0,scsi-id=5,lun=0 \
		-drive file=disk5.qcow2,if=none,id=drive5 \
		-device scsi-hd,drive=drive6,bus=scsi0.0,channel=0,scsi-id=6,lun=0 \
		-drive file=disk6.qcow2,if=none,id=drive6 \
		-device scsi-hd,drive=drive7,bus=scsi0.0,channel=0,scsi-id=7,lun=0 \
		-drive file=disk7.qcow2,if=none,id=drive7 \
		-device scsi-hd,drive=drive8,bus=scsi0.0,channel=0,scsi-id=8,lun=0 \
		-drive file=disk8.qcow2,if=none,id=drive8 \
		-device scsi-hd,drive=drive9,bus=scsi0.0,channel=0,scsi-id=9,lun=0 \
		-drive file=disk9.qcow2,if=none,id=drive9 \
		-net nic,model=virtio-net-pci,macaddr=02:00:00:00:00:01 \
		-net bridge,br=br0

disk%.qcow2:
	qemu-img create -f qcow2 $@ 8T

OVMF_VARS.fd:
	cp /usr/share/OVMF/OVMF_VARS.fd OVMF_VARS.fd

clean:
	rm *.qcow2 OVMF_VARS.fd
</code></pre></div></div>

<p>We are using <a href="https://netboot.xyz">netboot.xyz</a> to network boot the machine via PXE. The easiest way to run netboot.xyz is via the prebuilt Docker container. This can be set up using a <code class="language-plaintext highlighter-rouge">docker-compose.yml</code> file. Start the container with <code class="language-plaintext highlighter-rouge">docker compose up -d</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>version: "2.1"
services:
  netbootxyz:
    image: ghcr.io/netbootxyz/netbootxyz
    container_name: netbootxyz
    environment:
      - NGINX_PORT=80 # optional
      - WEB_APP_PORT=3000 # optional
    volumes:
      - /netbootxyz/config:/config # optional
      - /netbootxyz/assets:/assets # optional
    ports:
      - 3000:3000  # optional, destination should match ${WEB_APP_PORT} variable above.
      - 69:69/udp
      - 8080:80  # optional, destination should match ${NGINX_PORT} variable above.
    restart: unless-stopped
</code></pre></div></div>

<p>We have a Ubiquiti EdgeMax providing DHCP services. The DHCP options should point new clients to the Docker container.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set service dhcp-server bootfile-server doc.caelum.ci.dev
set service dhcp-server global-parameters "class &quot;BIOS-x86&quot; { match if option arch = 00:00; filename &quot;netboot.xyz.kpxe&quot;; }"
set service dhcp-server global-parameters "class &quot;UEFI-x64&quot; { match if option arch = 00:09; filename &quot;netboot.xyz.efi&quot;; }"
set service dhcp-server global-parameters "class &quot;UEFI-bytecode&quot; { match if option arch = 00:07; filename &quot;netboot.xyz.efi&quot;; }"
</code></pre></div></div>

<p>I also recommend staging the Ubuntu installation ISO, <code class="language-plaintext highlighter-rouge">vmlinuz</code>, and <code class="language-plaintext highlighter-rouge">initrd</code> locally, as this will speed up the machine’s boot time. The files needed are:</p>

<ul>
  <li>https://releases.ubuntu.com/24.04.2/ubuntu-24.04.2-live-server-amd64.iso</li>
  <li>https://github.com/netbootxyz/ubuntu-squash/releases/download/24.04.2-dac09526/vmlinuz</li>
  <li>https://github.com/netbootxyz/ubuntu-squash/releases/download/24.04.2-dac09526/initrd</li>
</ul>
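
<p>Assuming the assets volume from the <code class="language-plaintext highlighter-rouge">docker-compose.yml</code> above, the files could be staged with something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p /netbootxyz/assets/r640
cd /netbootxyz/assets/r640
wget https://releases.ubuntu.com/24.04.2/ubuntu-24.04.2-live-server-amd64.iso
wget https://github.com/netbootxyz/ubuntu-squash/releases/download/24.04.2-dac09526/vmlinuz
wget https://github.com/netbootxyz/ubuntu-squash/releases/download/24.04.2-dac09526/initrd
</code></pre></div></div>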

<p>Create a <code class="language-plaintext highlighter-rouge">user-data</code> file containing the following cloud-init configuration. In this case, it primarily includes the storage configuration. The goal here is to configure each disk identically, with a tiny EFI partition, an MD RAID partition and the rest given over to the ZFS datastore. Additionally, create empty files <code class="language-plaintext highlighter-rouge">meta-data</code> and <code class="language-plaintext highlighter-rouge">vendor-data</code>. None of the files have an extension. The encrypted password below is <code class="language-plaintext highlighter-rouge">ubuntu</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#cloud-config
autoinstall:
  version: 1
  storage:
    config:
    - { ptable: gpt, path: /dev/sda, preserve: false, name: '', grub_device: false, id: disk-sda, type: disk }
    - { ptable: gpt, path: /dev/sdb, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdb, type: disk }
    - { ptable: gpt, path: /dev/sdc, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdc, type: disk }
    - { ptable: gpt, path: /dev/sdd, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdd, type: disk }
    - { ptable: gpt, path: /dev/sde, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sde, type: disk }
    - { ptable: gpt, path: /dev/sdf, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdf, type: disk }
    - { ptable: gpt, path: /dev/sdg, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdg, type: disk }
    - { ptable: gpt, path: /dev/sdh, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdh, type: disk }
    - { ptable: gpt, path: /dev/sdi, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdi, type: disk }
    - { ptable: gpt, path: /dev/sdj, wipe: superblock-recursive, preserve: false, name: '', grub_device: false, id: disk-sdj, type: disk }
    - { device: disk-sda, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: true, offset: 1048576, id: efi-0, type: partition }
    - { device: disk-sdb, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: true, offset: 1048576, id: efi-1, type: partition }
    - { device: disk-sdc, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-2, type: partition }
    - { device: disk-sdd, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-3, type: partition }
    - { device: disk-sde, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-4, type: partition }
    - { device: disk-sdf, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-5, type: partition }
    - { device: disk-sdg, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-6, type: partition }
    - { device: disk-sdh, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-7, type: partition }
    - { device: disk-sdi, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-8, type: partition }
    - { device: disk-sdj, size: 512M, wipe: superblock, flag: boot, number: 1, preserve: false, grub_device: false, offset: 1048576, id: efi-9, type: partition }
    - { device: disk-sda, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-0, type: partition }
    - { device: disk-sdb, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-1, type: partition }
    - { device: disk-sdc, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-2, type: partition }
    - { device: disk-sdd, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-3, type: partition }
    - { device: disk-sde, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-4, type: partition }
    - { device: disk-sdf, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-5, type: partition }
    - { device: disk-sdg, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-6, type: partition }
    - { device: disk-sdh, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-7, type: partition }
    - { device: disk-sdi, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-8, type: partition }
    - { device: disk-sdj, size: 16G, wipe: superblock, number: 2, preserve: false, grub_device: false, id: md-9, type: partition }
    - { device: disk-sda, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-0, type: partition }
    - { device: disk-sdb, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-1, type: partition }
    - { device: disk-sdc, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-2, type: partition }
    - { device: disk-sdd, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-3, type: partition }
    - { device: disk-sde, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-4, type: partition }
    - { device: disk-sdf, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-5, type: partition }
    - { device: disk-sdg, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-6, type: partition }
    - { device: disk-sdh, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-7, type: partition }
    - { device: disk-sdi, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-8, type: partition }
    - { device: disk-sdj, size: -1, wipe: superblock, number: 3, preserve: false, grub_device: false, id: zfs-9, type: partition }
    - { name: md0, raidlevel: raid5, devices: [ md-0, md-1, md-2, md-3, md-4, md-5, md-6, md-7, md-8, md-9 ], spare_devices: [], preserve: false, wipe: superblock, id: raid-0, type: raid }
    - { fstype: fat32, volume: efi-0, preserve: false, id: efi-dos-0, type: format }
    - { fstype: fat32, volume: efi-1, preserve: false, id: efi-dos-1, type: format }
    - { fstype: ext4, volume: raid-0, preserve: false, id: root-ext4, type: format }
    - { path: /, device: root-ext4, id: mount-2, type: mount }
    - { path: /boot/efi, device: efi-dos-0, id: mount-0, type: mount }
    - { path: /boot/efi-alt, device: efi-dos-1, id: mount-1, type: mount }
  identity:
    hostname: ubuntu-server
    password: "$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0"
    username: ubuntu
  ssh:
    install-server: yes
    authorized-keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIA7UrJmBFWR3c7jVzpoyg4dJjON9c7t9bT9acfrj6G7i
    allow-pw: no
  packages:
    - zfsutils-linux
  user-data:
    disable_root: false
</code></pre></div></div>
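
<p>To substitute your own credentials, the hash in the <code class="language-plaintext highlighter-rouge">identity</code> section can be regenerated with any SHA-512 crypt tool; for example, assuming OpenSSL is available:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># produce a $6$ (SHA-512 crypt) hash of the given password
openssl passwd -6 ubuntu
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">mkpasswd -m sha-512</code> from the <code class="language-plaintext highlighter-rouge">whois</code> package produces the same format.</p>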

<p>The binaries and configuration files should be stored in the assets folder used by netbootxyz.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/netbootxyz/assets/r640/initrd
/netbootxyz/assets/r640/meta-data
/netbootxyz/assets/r640/ubuntu-24.04.2-live-server-amd64.iso
/netbootxyz/assets/r640/user-data
/netbootxyz/assets/r640/vendor-data
/netbootxyz/assets/r640/vmlinuz
</code></pre></div></div>

<p>The kernel command line used for iPXE needs to include <code class="language-plaintext highlighter-rouge">autoinstall</code> and <code class="language-plaintext highlighter-rouge">ds=nocloud;s=http://your_server</code>. We could modify one of the existing <code class="language-plaintext highlighter-rouge">ipxe</code> scripts to do this, but it is more flexible to create <code class="language-plaintext highlighter-rouge">/netbootxyz/config/menus/MAC-020000000001.ipxe</code> where <code class="language-plaintext highlighter-rouge">020000000001</code> represents the MAC address <code class="language-plaintext highlighter-rouge">02:00:00:00:00:01</code> and should be updated to reflect the actual server’s MAC address.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!ipxe

# Set a timeout (in milliseconds) for automatic selection
set timeout 30000

# Define a title for the menu
:start
menu Boot Menu
item --key 1 local      Boot from local hdd
item --key 2 ubuntu     Autoinstall Ubuntu Noble
item --key r reboot     Reboot system
item --key x exit       Exit to iPXE shell
choose --timeout ${timeout} --default local option &amp;&amp; goto ${option}

# boot local system
:local
echo Booting from local disks ...
exit 1

# Ubuntu boot configuration
:ubuntu
imgfree
echo Autoinstall Ubuntu Noble...
set base-url http://doc.caelum.ci.dev:8080/r640
kernel ${base-url}/vmlinuz
initrd ${base-url}/initrd
imgargs vmlinuz root=/dev/ram0 ramdisk_size=3500000 cloud-config-url=/dev/null ip=dhcp url=${base-url}/ubuntu-24.04.2-live-server-amd64.iso initrd=initrd.magic console=ttyS0,115200n8 autoinstall ds=nocloud;s=${base-url}
boot || goto failed

# Error handling
:failed
echo Boot failed, waiting 5 seconds...
sleep 5
goto start

# Reboot option
:reboot
reboot

# Exit to shell
:exit
echo Exiting to iPXE shell...
exit
</code></pre></div></div>

<p>With this setup, we can now boot a machine from the network and automatically install Ubuntu with our chosen disk configuration.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Netboot.xyz,Ubuntu" /><category term="tunbury.org" /><summary type="html"><![CDATA[Testing cloud-init is painful on real (server) hardware, as the faster the server, the longer it seems to take to complete POST. Therefore, I highly recommend testing with a virtual machine before moving to real hardware.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ubuntu.png" /><media:content medium="image" url="https://www.tunbury.org/images/ubuntu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Slurm Workload Manager</title><link href="https://www.tunbury.org/2025/04/14/slurm-workload-manager/" rel="alternate" type="text/html" title="Slurm Workload Manager" /><published>2025-04-14T00:00:00+00:00</published><updated>2025-04-14T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/14/slurm-workload-manager</id><content type="html" xml:base="https://www.tunbury.org/2025/04/14/slurm-workload-manager/"><![CDATA[<p>Sadiq mentioned <code class="language-plaintext highlighter-rouge">slurm</code> as a possible way to better schedule the group’s compute resources. Many resources are available showing how to create batch jobs for Slurm clusters but far fewer on how to set up a cluster. This is a quick walkthrough of the basic steps to set up a two-node compute cluster on Ubuntu 24.04. Note that <code class="language-plaintext highlighter-rouge">slurmd</code> and <code class="language-plaintext highlighter-rouge">slurmctld</code> can run on the same machine.</p>

<p>Create three VMs: <code class="language-plaintext highlighter-rouge">node1</code>, <code class="language-plaintext highlighter-rouge">node2</code> and <code class="language-plaintext highlighter-rouge">head</code>.</p>

<p>On <code class="language-plaintext highlighter-rouge">head</code>, install these components.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>munge slurmd slurmctld
</code></pre></div></div>

<p>On <code class="language-plaintext highlighter-rouge">node1</code> and <code class="language-plaintext highlighter-rouge">node2</code>, install these components.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>munge slurmd
</code></pre></div></div>

<p>Copy <code class="language-plaintext highlighter-rouge">/etc/munge/munge.key</code> from <code class="language-plaintext highlighter-rouge">head</code> to the same location on <code class="language-plaintext highlighter-rouge">node1</code> and <code class="language-plaintext highlighter-rouge">node2</code>. Then restart <code class="language-plaintext highlighter-rouge">munge</code> on the other nodes with <code class="language-plaintext highlighter-rouge">service munge restart</code>.</p>

<p>You should now be able to run <code class="language-plaintext highlighter-rouge">munge -n | unmunge</code> without error. This should also work via SSH, e.g. <code class="language-plaintext highlighter-rouge">ssh head munge -n | ssh node1 unmunge</code>.</p>

<p>If you don’t have DNS, add <code class="language-plaintext highlighter-rouge">node1</code> and <code class="language-plaintext highlighter-rouge">node2</code> to the <code class="language-plaintext highlighter-rouge">/etc/hosts</code> file on <code class="language-plaintext highlighter-rouge">head</code> and add <code class="language-plaintext highlighter-rouge">head</code> to the <code class="language-plaintext highlighter-rouge">/etc/hosts</code> on <code class="language-plaintext highlighter-rouge">node1</code> and <code class="language-plaintext highlighter-rouge">node2</code>.</p>
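
<p>For example, with made-up addresses (substitute your own), the additions to <code class="language-plaintext highlighter-rouge">/etc/hosts</code> on <code class="language-plaintext highlighter-rouge">head</code> might be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>192.168.0.11 node1
192.168.0.12 node2
</code></pre></div></div>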

<p>On <code class="language-plaintext highlighter-rouge">head</code>, create the daemon spool directory:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> /var/spool/slurmctld
<span class="nb">chown</span> <span class="nt">-R</span> slurm:slurm /var/spool/slurmctld/
<span class="nb">chmod </span>775 /var/spool/slurmctld/
</code></pre></div></div>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/slurm/slurm.conf</code>, as below. Update the compute node section by running <code class="language-plaintext highlighter-rouge">slurmd -C</code> on each node to generate the configuration line. This file should be propagated to all the machines. The configuration file can be created using this <a href="https://slurm.schedmd.com/configurator.html">tool</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ClusterName=cluster
SlurmctldHost=head
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup

# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres

# LOGGING AND ACCOUNTING
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log

# COMPUTE NODES
NodeName=node1 CPUs=1 Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=1963
NodeName=node2 CPUs=1 Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=1963
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
</code></pre></div></div>

<p>On <code class="language-plaintext highlighter-rouge">head</code>, start the control daemon.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>service slurmctld start
</code></pre></div></div>

<p>And on the nodes, start the slurm daemon.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>service slurmd start
</code></pre></div></div>

<p>From <code class="language-plaintext highlighter-rouge">head</code>, you can now run a command simultaneously on both nodes.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># srun -N2 -l /bin/hostname</span>
0: node1
1: node2
</code></pre></div></div>

<p>The optional <code class="language-plaintext highlighter-rouge">Gres</code> parameter on <code class="language-plaintext highlighter-rouge">NodeName</code> allows nodes to be configured with extra resources such as GPUs.</p>
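
<p>As a sketch, a node with two GPUs might be declared as below. This is illustrative only: it assumes NVIDIA devices and requires a matching <code class="language-plaintext highlighter-rouge">gres.conf</code> on the node.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># in slurm.conf
GresTypes=gpu
NodeName=node1 CPUs=8 RealMemory=64000 Gres=gpu:2

# in /etc/slurm/gres.conf on node1
Name=gpu File=/dev/nvidia[0-1]
</code></pre></div></div>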

<p>Typical configurations use an NFS server to make /home available on all the nodes. Note that users only need to be created on the head node and don’t need SSH access to the compute nodes.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Slurm" /><category term="tunbury.org" /><summary type="html"><![CDATA[Sadiq mentioned slurm as a possible way to better schedule the group’s compute resources. Many resources are available showing how to create batch jobs for Slurm clusters but far fewer on how to set up a cluster. This is a quick walkthrough of the basic steps to set up a two-node compute cluster on Ubuntu 24.04. Note that slurmd and slurmctld can run on the same machine.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/slurm.png" /><media:content medium="image" url="https://www.tunbury.org/images/slurm.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">GNU Parallel</title><link href="https://www.tunbury.org/2025/04/13/gnu-parallel/" rel="alternate" type="text/html" title="GNU Parallel" /><published>2025-04-13T00:00:00+00:00</published><updated>2025-04-13T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/13/gnu-parallel</id><content type="html" xml:base="https://www.tunbury.org/2025/04/13/gnu-parallel/"><![CDATA[<p>If you haven’t used it before, or perhaps it has been so long that it has been swapped out to disk, let me commend GNU’s <a href="https://www.gnu.org/software/parallel/parallel.html">Parallel</a> to you.</p>

<p>Parallel executes shell commands in parallel! A trivial example would be <code class="language-plaintext highlighter-rouge">parallel echo ::: A B C</code>, which runs <code class="language-plaintext highlighter-rouge">echo A</code>, <code class="language-plaintext highlighter-rouge">echo B</code> and <code class="language-plaintext highlighter-rouge">echo C</code>.  <code class="language-plaintext highlighter-rouge">{}</code> can be used as a placeholder for the parameter in cases where it isn’t simply appended to the command line.</p>
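
<p>For instance, the placeholder can be used anywhere in the command line. The file names below are hypothetical, and the snippet echoes the <code class="language-plaintext highlighter-rouge">mv</code> commands rather than running them, so it is safe to try (<code class="language-plaintext highlighter-rouge">-k</code> keeps the output in input order):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>parallel -k echo mv {} {}.bak ::: a.txt b.txt
# mv a.txt a.txt.bak
# mv b.txt b.txt.bak
</code></pre></div></div>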

<p>Multiple parameters can be read from an input file using four colons, <code class="language-plaintext highlighter-rouge">parallel echo :::: params_file</code>. This is particularly useful as it correctly deals with parameters/file names with spaces. For example, create a tab-delimited list of source and destination paths in <code class="language-plaintext highlighter-rouge">paths.tsv</code> and then run:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>parallel <span class="nt">--jobs</span> 8 <span class="nt">--colsep</span> <span class="s1">'\t'</span> <span class="nt">--progress</span> rsync <span class="nt">-avh</span> <span class="o">{</span>1<span class="o">}</span> <span class="o">{</span>2<span class="o">}</span> :::: paths.tsv
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="GNU" /><category term="tunbury.org" /><summary type="html"><![CDATA[If you haven’t used it before, or perhaps it has been so long that it has been swapped out to disk, let me commend GNU’s Parallel to you.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/gnu.png" /><media:content medium="image" url="https://www.tunbury.org/images/gnu.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Box Diff Tool</title><link href="https://www.tunbury.org/2025/04/12/box-diff/" rel="alternate" type="text/html" title="Box Diff Tool" /><published>2025-04-12T00:00:00+00:00</published><updated>2025-04-12T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/12/box-diff</id><content type="html" xml:base="https://www.tunbury.org/2025/04/12/box-diff/"><![CDATA[<p>Over the weekend, I extended <a href="https://github.com/mtelvers/ocaml-box-diff">mtelvers/ocaml-box-diff</a> to include the ability to upload files over 50MB. This is a more complex API that requires a call to <a href="https://upload.box.com/api/2.0/files/upload_sessions">https://upload.box.com/api/2.0/files/upload_sessions</a> by posting JSON containing the name of the file, the folder ID and the file size. Box replies with various <em>session endpoints</em> that give the URIs to use to upload the parts and to commit the file. Box also specifies the size of each part.</p>

<p>Each part is uploaded with an HTTP PUT of the binary data, with header fields giving the byte range within the overall file along with the SHA for this chunk. Box replies with a part identifier. Once all the parts have been uploaded, an HTTP POST is required to the commit URI, passing a JSON array of all the parts as well as the overall SHA for the file.</p>
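<p>The bookkeeping can be sketched in plain shell: split a file into parts, hash each part as it would be uploaded, and confirm that hashing the concatenated parts reproduces the whole-file digest. This is only an illustration — the file name, size, and part size here are made up, not the values Box dictates:</p>

```shell
# Illustrative sketch of per-part and whole-file SHA-1 bookkeeping (local only).
tmp=$(mktemp -d)
head -c 100000 /dev/urandom > "$tmp/file.bin"          # stand-in for the file to upload
split -b 32768 "$tmp/file.bin" "$tmp/part-"            # Box specifies the real part size
for p in "$tmp"/part-*; do
  sha1sum "$p"                                         # the digest sent with each part's PUT
done
whole=$(sha1sum < "$tmp/file.bin" | cut -d' ' -f1)     # the digest sent with the commit POST
streamed=$(cat "$tmp"/part-* | sha1sum | cut -d' ' -f1)
test "$whole" = "$streamed" && echo "digests match"
rm -rf "$tmp"
```

<p>Because the parts concatenate back into the original byte stream, the whole-file digest can be accumulated incrementally while uploading, rather than recomputed at the end.</p>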

<p>I was pleased to be able to reuse <code class="language-plaintext highlighter-rouge">stream_of_file</code>, which was written for the small file upload. Additionally, I was able to keep a running total SHA for the data uploaded so far using <code class="language-plaintext highlighter-rouge">Sha1.update_string ctx chunk</code>, meaning that I did not need to recompute the overall file SHA at the end.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml,Box" /><category term="tunbury.org" /><summary type="html"><![CDATA[Over the weekend, I extended mtelvers/ocaml-box-diff to include the ability to upload files over 50MB. This is a more complex API which requires a call to https://upload.box.com/api/2.0/files/upload_sessions by posting JSON containing the name of the file, the folder ID and the file size. Box replies with various session endpoints which give the URIs to use to upload the parts and to commit the file. Box also specifies the size of each part.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/box-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/box-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Dell R640 Ubuntu Installation</title><link href="https://www.tunbury.org/2025/04/11/dell-r640-ubuntu/" rel="alternate" type="text/html" title="Dell R640 Ubuntu Installation" /><published>2025-04-11T00:00:00+00:00</published><updated>2025-04-11T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/11/dell-r640-ubuntu</id><content type="html" xml:base="https://www.tunbury.org/2025/04/11/dell-r640-ubuntu/"><![CDATA[<p>I could have scripted this via Ansible, but there would always be a manual element, such as configuring the H740P controller and booting from the network to get to the point where you can SSH to the machine. 
Therefore, I decided to just document the steps required.</p>

<p>After powering the system on, press F2 to open System Setup, then follow this path through the menu:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Device Configuration &gt; Integrated RAID Controller H740P &gt; Configure &gt; Clear Configuration
</code></pre></div></div>

<p>then</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>View Server Profile &gt; Controller Management &gt; Advanced Controller Management &gt; Manage Controller Mode
</code></pre></div></div>

<p>Choose <code class="language-plaintext highlighter-rouge">Switch to Enhanced HBA Controller Mode</code>, then Confirm and reset the system.</p>

<p>Boot to the Ubuntu installer. I used <code class="language-plaintext highlighter-rouge">netboot.xyz</code> running in a Docker container.</p>

<p>I will use a software RAID set configured by <code class="language-plaintext highlighter-rouge">mdadm</code> for the Ubuntu root drive. In this configuration, the EFI partition needs special attention as EFI does not understand software RAID. GRUB can be configured to create and update multiple copies of the EFI partition. For consistency, I will create an EFI partition on all the drives.</p>

<p>These commands create three partitions on each drive: a 512MB EFI system partition (type EF00), a 16GB partition for the software RAID (type FD00), and a final partition using the remaining space for ZFS (type BF00):</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span>a <span class="k">in </span>sd<span class="o">{</span>a..j<span class="o">}</span> <span class="p">;</span> <span class="k">do </span>sgdisk <span class="nt">-n1</span>:1M:+512M <span class="nt">-t1</span>:EF00 /dev/<span class="nv">$a</span> <span class="p">;</span> <span class="k">done
for </span>a <span class="k">in </span>sd<span class="o">{</span>a..j<span class="o">}</span> <span class="p">;</span> <span class="k">do </span>sgdisk <span class="nt">-n2</span>:0:+16G <span class="nt">-t2</span>:FD00 /dev/<span class="nv">$a</span> <span class="p">;</span> <span class="k">done
for </span>a <span class="k">in </span>sd<span class="o">{</span>a..j<span class="o">}</span> <span class="p">;</span> <span class="k">do </span>sgdisk <span class="nt">-n3</span>:0:0 <span class="nt">-t3</span>:BF00 /dev/<span class="nv">$a</span> <span class="p">;</span> <span class="k">done</span>
</code></pre></div></div>
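<p>The loops rely on bash brace expansion: <code class="language-plaintext highlighter-rouge">sd{a..j}</code> expands to the ten device names before the loop body runs, which is easy to verify without touching any disks:</p>

```shell
# Show what the partitioning loops actually iterate over (bash brace expansion).
for a in sd{a..j}; do echo "$a"; done
# prints sda, sdb, ... through sdj — one name per line, ten in total
```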

<p>Next, format the EFI partitions as FAT32 and create the RAID device:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span>a <span class="k">in </span>sd<span class="o">{</span>a..j<span class="o">}</span> <span class="p">;</span> <span class="k">do </span>mkdosfs <span class="nt">-F</span> 32 <span class="nt">-s</span> 1 <span class="nt">-n</span> EFI /dev/<span class="k">${</span><span class="nv">a</span><span class="k">}</span>1 <span class="p">;</span> <span class="k">done
</span>mdadm <span class="nt">--create</span> /dev/md0 <span class="nt">--metadata</span><span class="o">=</span>1.2 <span class="nt">--level</span><span class="o">=</span>raid5 <span class="nt">--raid-devices</span><span class="o">=</span>10 /dev/sd[a-j]2
</code></pre></div></div>

<p>Check the partition tables with <code class="language-plaintext highlighter-rouge">sgdisk -p /dev/sda</code>, and the soft RAID setup with <code class="language-plaintext highlighter-rouge">cat /proc/mdstat</code>.</p>

<p>Install Ubuntu via the setup program, selecting the software RAID as the root volume and the first drive as the boot drive.</p>

<p>After the system reboots, delete the current EFI entries from <code class="language-plaintext highlighter-rouge">/etc/fstab</code>:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>umount /boot/efi
<span class="nb">sed</span> <span class="nt">-i</span> <span class="s1">'/\/efi/d'</span> /etc/fstab
</code></pre></div></div>
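<p>The <code class="language-plaintext highlighter-rouge">sed</code> expression deletes any line whose mount point contains <code class="language-plaintext highlighter-rouge">/efi</code>. A dry run on a scratch copy (the sample entries below are made up) shows the effect before touching the real <code class="language-plaintext highlighter-rouge">/etc/fstab</code>:</p>

```shell
# Dry run of the fstab clean-up on a scratch file; entries are illustrative.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/dev/md0 / ext4 defaults 0 1
/dev/disk/by-uuid/ABCD-1234 /boot/efi vfat umask=0077 0 1
EOF
sed -i '/\/efi/d' "$tmp"       # same expression as above, applied to the copy
remaining=$(cat "$tmp")
echo "$remaining"              # only the root filesystem line survives
rm -f "$tmp"
```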

<p>Then add fstab entries for <code class="language-plaintext highlighter-rouge">/dev/sda1</code> and <code class="language-plaintext highlighter-rouge">/dev/sdb1</code>, create the alternate mount point, and remount everything.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> /dev/disk/by-uuid/<span class="si">$(</span>blkid <span class="nt">-s</span> UUID <span class="nt">-o</span> value /dev/sda1<span class="si">)</span> /boot/efi vfat defaults 0 0 <span class="o">&gt;&gt;</span> /etc/fstab
<span class="nb">mkdir</span> <span class="nt">-p</span> /boot/efi-alt
<span class="nb">echo</span> /dev/disk/by-uuid/<span class="si">$(</span>blkid <span class="nt">-s</span> UUID <span class="nt">-o</span> value /dev/sdb1<span class="si">)</span> /boot/efi-alt vfat defaults 0 0 <span class="o">&gt;&gt;</span> /etc/fstab
systemctl daemon-reload
mount <span class="nt">-a</span>
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">dpkg-reconfigure grub-efi-amd64</code> to configure GRUB. Accept all of the defaults and select <code class="language-plaintext highlighter-rouge">/dev/sda1</code> and <code class="language-plaintext highlighter-rouge">/dev/sdb1</code> as the boot drives. Reboot the system.</p>

<p>After the reboot, install the ZFS utils.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>zfsutils-linux
</code></pre></div></div>

<p>Create a ZFS <em>tank</em> using the <em>by-id</em> values.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zpool create <span class="se">\</span>
          <span class="nt">-o</span> <span class="nv">ashift</span><span class="o">=</span>12 <span class="se">\</span>
          <span class="nt">-o</span> <span class="nv">autotrim</span><span class="o">=</span>on <span class="se">\</span>
          <span class="nt">-O</span> <span class="nv">acltype</span><span class="o">=</span>posixacl <span class="nt">-O</span> <span class="nv">xattr</span><span class="o">=</span>sa <span class="nt">-O</span> <span class="nv">dnodesize</span><span class="o">=</span>auto <span class="se">\</span>
          <span class="nt">-O</span> <span class="nv">normalization</span><span class="o">=</span>formD <span class="se">\</span>
          <span class="nt">-O</span> <span class="nv">relatime</span><span class="o">=</span>on <span class="se">\</span>
          tank raidz /dev/disk/by-id/wwn-<span class="k">*</span><span class="nt">-part3</span>
</code></pre></div></div>

<p>Check it is available:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># zfs list</span>
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   789K  61.8T   171K  /tank
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Dell,R640" /><category term="tunbury.org" /><summary type="html"><![CDATA[I could have scripted this via Ansible, but there would always be a manual element, such as configuring the H740P controller and booting from the network to get to the point where you can SSH to the machine. Therefore, I decided to just document the steps required.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/dell-r640-final.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/dell-r640-final.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Dell R640 installation</title><link href="https://www.tunbury.org/2025/04/10/dell-r640-installation/" rel="alternate" type="text/html" title="Dell R640 installation" /><published>2025-04-10T00:00:00+00:00</published><updated>2025-04-10T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/10/dell-r640-installation</id><content type="html" xml:base="https://www.tunbury.org/2025/04/10/dell-r640-installation/"><![CDATA[<p>Today we have racked the five 14th generation Dell R640 servers and a Dell N4032 switch.</p>

<p>When inspecting the rack rails, I noticed that some of the left-hand rails had an extra tab while the others did not. For the first server, I used a rail with a tab, only to discover that the tab prevented the server from being pushed in all the way. The tabs were easily removed, but the server needed to be removed from the rack first.</p>

<p><img src="/images/dell-r640-rail.jpg" alt="" /></p>

<p><img src="/images/dell-r640-rail-removal.jpg" alt="" /></p>

<p>First server installed</p>

<p><img src="/images/dell-r640-first-one.jpg" alt="" /></p>

<p>The last server on the rails</p>

<p><img src="/images/dell-r640-last-one.jpg" alt="" /></p>

<p>Front view</p>

<p><img src="/images/dell-r640-front-view.jpg" alt="" /></p>

<p>Rear view</p>

<p><img src="/images/dell-r640-rear-view.jpg" alt="" /></p>

<p>Cabling</p>

<ul>
  <li>Yellow CAT5 for iDRAC ports</li>
  <li>Red CAT6 for 10GBase-T</li>
</ul>

<p><img src="/images/dell-r640-cabled.jpg" alt="" /></p>

<p>The initial iDRAC configuration was carried out using a crash cart.</p>

<p><img src="/images/dell-r640-idrac-config.jpg" alt="" /></p>

<p>The servers are called:</p>

<ul>
  <li>myrina</li>
  <li>thalestris</li>
  <li>lampedo</li>
  <li>otrera</li>
  <li>antiope</li>
</ul>

<p><img src="/images/dell-r640-final.jpg" alt="" /></p>

<p>We had some difficulty with the 40G uplink from the switch and we could only get the link to come up by splitting it into 4 x 10G channels, as follows.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>console&gt;enable
console#configure
console(config)#interface Fo1/1/1
console(config-if-Fo1/1/1)#hardware profile portmode 4x10g
</code></pre></div></div>

<p>Then reboot with <code class="language-plaintext highlighter-rouge">do reload</code>. The 4 x 10G uplinks have been configured as an LACP port channel (Po1).</p>

<h1 id="r640-configuration">R640 Configuration</h1>

<p>Each server has:</p>

<ul>
  <li>2 x Intel Xeon Gold 6244 3.6G 8C / 16T</li>
  <li>8 x 16GB DIMM</li>
  <li>10 x Kingston 7.68TB SSD</li>
</ul>

<p><a href="https://www.dell.com/support/manuals/en-uk/poweredge-r640/per640_ism_pub/general-memory-module-installation-guidelines?guid=guid-acbc0f13-dedb-492b-a0b0-18303ded565a&amp;lang=en-us">Dell R640 has 24 DIMM slots</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Dell,R640" /><category term="tunbury.org" /><summary type="html"><![CDATA[Today we have racked the five 14th generation Dell R640 servers and a Dell N4032 switch.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/dell-r640-final.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/dell-r640-final.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Box API with OCaml and Claude</title><link href="https://www.tunbury.org/2025/04/07/ocaml-claude-box/" rel="alternate" type="text/html" title="Box API with OCaml and Claude" /><published>2025-04-07T00:00:00+00:00</published><updated>2025-04-07T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/07/ocaml-claude-box</id><content type="html" xml:base="https://www.tunbury.org/2025/04/07/ocaml-claude-box/"><![CDATA[<p>Over the weekend, I decided to extend my <a href="https://box.com">Box</a> <a href="https://github.com/mtelvers/ocaml-box-diff">tool</a> to incorporate file upload. There is a straightforward POST API for this with a <code class="language-plaintext highlighter-rouge">curl</code> one-liner given in the Box <a href="https://developer.box.com/reference/post-files-content/">documentation</a>. Easy.</p>

<p>The documentation for <a href="https://mirage.github.io/ocaml-cohttp/cohttp-eio/Cohttp_eio/Client/index.html">Cohttp-eio.Client</a> only gives the function signature for <code class="language-plaintext highlighter-rouge">post</code>, but it looked pretty similar to <code class="language-plaintext highlighter-rouge">get</code>, which I had already been working with. The <a href="https://github.com/mirage/ocaml-cohttp">README</a> for Cohttp gave me pause when I read this comment about multipart forms.</p>

<blockquote>
  <p>Multipart form data is not supported out of the box but is provided by external libraries</p>
</blockquote>

<p>Of the three options given, the second looked abandoned, while the third said it didn’t support streaming, so I went with the first one, <a href="https://github.com/dinosaure/multipart_form">dinosaure/multipart_form</a>.</p>

<p>The landing page included an example encoder. A couple of external functions are mentioned, and I found example code for these in <a href="https://github.com/dinosaure/multipart_form/blob/main/test/test.ml">test/test.ml</a>. This built, but didn’t work against Box. I ran <code class="language-plaintext highlighter-rouge">nc -l 127.0.0.1 6789</code> and set that as the API endpoint for both the <code class="language-plaintext highlighter-rouge">curl</code> command and my application. Comparing the two requests showed I was missing the <code class="language-plaintext highlighter-rouge">Content-Type</code> header in the part boundary. It should be <code class="language-plaintext highlighter-rouge">application/octet-stream</code>.</p>

<p>There is a <code class="language-plaintext highlighter-rouge">~header</code> parameter to <code class="language-plaintext highlighter-rouge">part</code>, and I hoped for a <code class="language-plaintext highlighter-rouge">Header.add</code> like the one in <code class="language-plaintext highlighter-rouge">Cohttp</code>, but sadly not. See the <a href="https://ocaml.org/p/multipart_form/latest/doc/Multipart_form/Header/index.html">documentation</a>. There is <code class="language-plaintext highlighter-rouge">Header.content_type</code>, but that returns a content type rather than constructing one. How do you make one? <code class="language-plaintext highlighter-rouge">Header.of_list</code> requires a <code class="language-plaintext highlighter-rouge">Field.field list</code>.</p>

<p>In a bit of frustration, I decided to ask Claude. I’ve not tried it before, but I’ve seen some impressive demonstrations. My first lesson here was to be specific. Claude is not a mind reader. After a few questions, I got to this:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">Field</span><span class="p">.(</span><span class="n">make</span> <span class="nn">Content_type</span><span class="p">.</span><span class="n">name</span> <span class="p">(</span><span class="nn">Content_type</span><span class="p">.</span><span class="n">v</span> <span class="nt">`Application</span> <span class="nt">`Octet_stream</span><span class="p">));</span>
</code></pre></div></div>

<p>I can see why this was suggested as <code class="language-plaintext highlighter-rouge">Content_disposition.v</code> exists, but <code class="language-plaintext highlighter-rouge">Content_type.v</code> does not, nor does <code class="language-plaintext highlighter-rouge">Field.make</code>. Claude quickly obliged with a new version when I pointed this out but added the <code class="language-plaintext highlighter-rouge">Content_type</code> to the HTTP header rather than the boundary header. This went back and forth for a while, with Claude repeatedly suggesting functions which did not exist. I gave up.</p>

<p>On OCaml.org, the <a href="https://ocaml.org/p/multipart_form/latest">multipart-form</a> documentation includes a <em>Used by</em> section that listed <code class="language-plaintext highlighter-rouge">dream</code> as the only (external) application which used the library. From the source, I could see <code class="language-plaintext highlighter-rouge">Field.Field (field_name, Field.Content_type, v)</code>, which looked good.</p>

<p>There is a function <code class="language-plaintext highlighter-rouge">Content_type.of_string</code>. I used <code class="language-plaintext highlighter-rouge">:MerlinLocate</code> to find the source, which turned out to be an Angstrom parser which returns a <code class="language-plaintext highlighter-rouge">Content_type.t</code>. This led me to <code class="language-plaintext highlighter-rouge">Content_type.make</code>, and ultimately, I was able to write these two lines:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">v</span> <span class="o">=</span> <span class="nn">Content_type</span><span class="p">.</span><span class="n">make</span> <span class="nt">`Application</span> <span class="p">(</span><span class="nt">`Iana_token</span> <span class="s2">"octet-stream"</span><span class="p">)</span> <span class="nn">Content_type</span><span class="p">.</span><span class="nn">Parameters</span><span class="p">.</span><span class="n">empty</span>
<span class="k">let</span> <span class="n">p0</span> <span class="o">=</span> <span class="n">part</span> <span class="o">~</span><span class="n">header</span><span class="o">:</span><span class="p">(</span><span class="nn">Header</span><span class="p">.</span><span class="n">of_list</span> <span class="p">[</span> <span class="nc">Field</span> <span class="p">(</span><span class="nn">Field_name</span><span class="p">.</span><span class="n">content_type</span><span class="o">,</span> <span class="nc">Content_type</span><span class="o">,</span> <span class="n">v</span><span class="p">)</span> <span class="p">])</span> <span class="o">...</span>
</code></pre></div></div>

<p>As a relatively new adopter of OCaml as my language of choice, I find the most significant challenge to be documentation, particularly when I find a library on opam which I want to use. This is an interesting contrast to the view often cited in the community that tooling is the most significant barrier to adoption. In my opinion, the time taken to set up a build environment is dwarfed by the time spent in that environment iterating on code.</p>

<p>I would like to take this opportunity to thank all contributors to opam repository for their time and effort in making packages available. This post mentions specific packages but only to illustrate my point.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml,Box" /><category term="tunbury.org" /><summary type="html"><![CDATA[Over the weekend, I decided to extend my Box tool to incorporate file upload. There is a straightforward POST API for this with a curl one-liner given in the Box documentation. Easy.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/box-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/box-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">opam repo ci job timeouts</title><link href="https://www.tunbury.org/2025/04/04/opam-repo-ci/" rel="alternate" type="text/html" title="opam repo ci job timeouts" /><published>2025-04-04T00:00:00+00:00</published><updated>2025-04-04T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/04/opam-repo-ci</id><content type="html" xml:base="https://www.tunbury.org/2025/04/04/opam-repo-ci/"><![CDATA[<p>It’s Tuesday morning, and virtually all opam repo ci jobs are failing with timeouts. This comes at a critical time as these are the first jobs following the update of <a href="https://github.com/ocurrent/ocaml-version">ocurrent/ocaml-version</a> <a href="https://www.tunbury.org/recent-ocaml-version/">noted</a> on 24th March.</p>

<p>The <a href="https://opam.ci.ocaml.org/github/ocaml/opam-repository">opam repo ci</a> tests all PRs on <a href="https://github.com/ocaml/opam-repository">opam-repository</a>. The pipeline downloads Docker images containing the root filesystems for various Linux distributions, architectures, and OCaml versions; these are used as the base environment to run the tests. These base images are created by the <a href="https://images.ci.ocaml.org">base image builder</a>. <a href="https://github.com/ocurrent/docker-base-images/pull/317">PR#317</a> updated these base images in three ways:</p>

<ul>
  <li>Images for OCaml &lt; 4.08 were removed.</li>
  <li>The <code class="language-plaintext highlighter-rouge">opam-repository-archive</code> overlay was removed as this contained the &lt; 4.08 opam packages.</li>
  <li>The <code class="language-plaintext highlighter-rouge">ocaml-patches-overlay</code> overlay was removed as this was only needed to build OCaml &lt; 4.08 on GCC 14.</li>
</ul>

<p>Given these changes, I immediately assumed one of them was the culprit.</p>

<p>Here’s an example of a failure as reported in the log.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2025-04-01 07:27.45 ---&gt; using "9dd47386dd0565c83eac2e9d589d75bdd268a7f34f3c854d1db189e7a2e5f77b" from cache

/: (user (uid 1000) (gid 1000))

/: (workdir /home/opam)

/home/opam: (run (shell "sudo ln -f /usr/bin/opam-dev /usr/bin/opam"))
2025-04-01 07:27.45 ---&gt; using "132d861be153666fd67b2e16b21c4de16e15e26f8d7d42f3bcddf0360ad147be" from cache

/home/opam: (run (network host)
                 (shell "opam init --reinit --config .opamrc-sandbox -ni"))
Configuring from /home/opam/.opamrc-sandbox, then /home/opam/.opamrc, and finally from built-in defaults.
Checking for available remotes: rsync and local, git.
  - you won't be able to use mercurial repositories unless you install the hg command on your system.
  - you won't be able to use darcs repositories unless you install the darcs command on your system.

This development version of opam requires an update to the layout of /home/opam/.opam from version 2.0 to version 2.2, which can't be reverted.
You may want to back it up before going further.

Continue? [Y/n] y
[NOTE] The 'jobs' option was reset, its value was 39 and its new value will vary according to the current number of cores on your machine. You can restore the fixed value using:
           opam option jobs=39 --global
Format upgrade done.

&lt;&gt;&lt;&gt; Updating repositories &gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;
2025-04-01 09:27.34: Cancelling: Timeout (120.0 minutes)
Job cancelled
2025-04-01 09:27.40: Timeout (120.0 minutes)
</code></pre></div></div>

<p>With nearly all jobs taking 2 hours to run, the cluster was understandably backlogged!</p>

<p>The issue could be reproduced with this script, which checks out the failing PR and builds a Dockerfile against it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd $(mktemp -d)
git clone --recursive "https://github.com/ocaml/opam-repository.git" &amp;&amp; cd "opam-repository" &amp;&amp; git fetch origin "refs/pull/27696/head" &amp;&amp; git reset --hard 46b8cc5a
git fetch origin master
git merge --no-edit 4d8fa0fb8fce3b6c8b06f29ebcfa844c292d4f3e
cat &gt; ../Dockerfile &lt;&lt;'END-OF-DOCKERFILE'
FROM ocaml/opam:debian-12-ocaml-4.09@sha256:13bd7f0979922adb13049eecc387d65d7846a3058f7dd6509738933e88bc8d4a
USER 1000:1000
WORKDIR /home/opam
RUN sudo ln -f /usr/bin/opam-dev /usr/bin/opam
RUN opam init --reinit -ni
RUN opam option solver=builtin-0install &amp;&amp; opam config report
ENV OPAMDOWNLOADJOBS="1"
ENV OPAMERRLOGLEN="0"
ENV OPAMPRECISETRACKING="1"
ENV CI="true"
ENV OPAM_REPO_CI="true"
RUN rm -rf opam-repository/
COPY --chown=1000:1000 . opam-repository/
RUN opam repository set-url --strict default opam-repository/
RUN opam update --depexts || true
RUN opam pin add -k version -yn chrome-trace.3.18.0~alpha0 3.18.0~alpha0
RUN opam reinstall chrome-trace.3.18.0~alpha0; \
    res=$?; \
    test "$res" != 31 &amp;&amp; exit "$res"; \
    export OPAMCLI=2.0; \
    build_dir=$(opam var prefix)/.opam-switch/build; \
    failed=$(ls "$build_dir"); \
    partial_fails=""; \
    for pkg in $failed; do \
    if opam show -f x-ci-accept-failures: "$pkg" | grep -qF "\"debian-12\""; then \
    echo "A package failed and has been disabled for CI using the 'x-ci-accept-failures' field."; \
    fi; \
    test "$pkg" != 'chrome-trace.3.18.0~alpha0' &amp;&amp; partial_fails="$partial_fails $pkg"; \
    done; \
    test "${partial_fails}" != "" &amp;&amp; echo "opam-repo-ci detected dependencies failing: ${partial_fails}"; \
    exit 1

END-OF-DOCKERFILE
docker build -f ../Dockerfile .
</code></pre></div></div>

<p>It was interesting to note which jobs still worked. For example, builds on macOS and FreeBSD ran normally. This makes sense, as those platforms don’t use the Docker base images. Looking further, opam repo ci attempts builds with opam 2.0, 2.1, 2.2, and 2.3 on Debian. These builds succeeded. Interesting. All the other builds use the latest version of opam built from the head of the master branch.</p>

<p>Taking the failing Dockerfile above and replacing <code class="language-plaintext highlighter-rouge">sudo ln -f /usr/bin/opam-dev /usr/bin/opam</code> with <code class="language-plaintext highlighter-rouge">sudo ln -f /usr/bin/opam-2.3 /usr/bin/opam</code> immediately fixed the issue!</p>

<p>I pushed commit <a href="https://github.com/ocurrent/opam-repo-ci/commit/7174953145735a54ecf668c7387e57b3f2d2a411">7174953</a> to force opam repo ci to use opam 2.3 and opened <a href="https://github.com/ocaml/opam/issues/6448">issue#6448</a> on ocaml/opam. The working theory is that some change associated with <a href="https://github.com/ocaml/opam/pull/5892">PR#5892</a>, which replaced GNU patch with an OCaml patch library, is the root cause.</p>

<p>Musing on this issue with David, we agreed that using the latest tag rather than the head commit was a good compromise. This allows us to test pre-release versions of opam as soon as they are tagged, without sitting at the cutting edge and risking the disruption of a key service.</p>
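<p>Picking the latest tag needs a semantic comparison, not a lexical one; a plain text sort gets multi-digit version components wrong, as a quick check with GNU <code class="language-plaintext highlighter-rouge">sort</code> shows (the tag values here are examples):</p>

```shell
# Example tags: a lexical sort puts 2.10.0 before 2.3.0; a version sort does not.
tags='2.1.7
2.3.0
2.10.0'
echo "$tags" | sort        # lexical: 2.3.0 ends up last
echo "$tags" | sort -V     # version-aware: 2.10.0 ends up last
```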

<p>We need the latest tag by version number, not by date, as we wouldn’t want to revert to testing on, for example, 2.1.7 if something caused a new release of the 2.1 series. The result was a function which runs <code class="language-plaintext highlighter-rouge">git tag --format %(objectname) %(refname:strip=2)</code> and semantically sorts the version numbers using <code class="language-plaintext highlighter-rouge">OpamVersion.compare</code>. See <a href="https://github.com/ocurrent/docker-base-images/pull/318">PR#318</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[It’s Tuesday morning, and virtually all opam repo ci jobs are failing with timeouts. This comes at a critical time as these are the first jobs following the update of ocurrent/ocaml-version noted on 24th March.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">More Kingston Drives</title><link href="https://www.tunbury.org/2025/04/03/kingston-drives/" rel="alternate" type="text/html" title="More Kingston Drives" /><published>2025-04-03T00:00:00+00:00</published><updated>2025-04-03T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/03/kingston-drives</id><content type="html" xml:base="https://www.tunbury.org/2025/04/03/kingston-drives/"><![CDATA[<p>We have received the second batch of 40 x 7.68TB Kingston SSD drives, bringing the total to 50 drives.</p>

<p>We now have 5 fully populated Dell PowerEdge R640s with a total raw capacity of 384TB.</p>

<p><img src="/images/kingston-forty-with-caddies.png" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Dell" /><category term="tunbury.org" /><summary type="html"><![CDATA[We have received the second batch of 40 x 7.68TB Kingston SSD drives, bringing the total to 50 drives.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/kingston-forty.png" /><media:content medium="image" url="https://www.tunbury.org/images/kingston-forty.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ubuntu with ZFS root</title><link href="https://www.tunbury.org/2025/04/02/ubuntu-with-zfs-root/" rel="alternate" type="text/html" title="Ubuntu with ZFS root" /><published>2025-04-02T00:00:00+00:00</published><updated>2025-04-02T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/02/ubuntu-with-zfs-root</id><content type="html" xml:base="https://www.tunbury.org/2025/04/02/ubuntu-with-zfs-root/"><![CDATA[<p>The installation of <a href="https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html">Ubuntu on ZFS</a>
contains about 50 steps of detailed configuration. I have 10 servers to install, so I would like to script this process as much as possible.</p>

<p>To test my script, I have created a new VM on VMware ESXi with 10 x 16GB
disks, 16GB RAM, 4 vCPU. In the advanced options, I have set the boot to
EFI and set <code class="language-plaintext highlighter-rouge">disk.EnableUUID = "TRUE"</code> in the <code class="language-plaintext highlighter-rouge">.vmx</code> file. Doing this
ensures that <code class="language-plaintext highlighter-rouge">/dev/disk</code> aliases are created in the guest.</p>

<p>Boot Ubuntu 24.04 from the Live CD and install SSH.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> <span class="nt">-i</span>
apt update
apt <span class="nb">install </span>openssh-server <span class="nt">-y</span>
</code></pre></div></div>

<p>Use <code class="language-plaintext highlighter-rouge">wget</code> to download https://github.com/mtelvers.keys into <code class="language-plaintext highlighter-rouge">~/.ssh/authorized_keys</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget https://github.com/mtelvers.keys <span class="nt">-O</span> ~/.ssh/authorized_keys
</code></pre></div></div>

<p>In your Ansible <code class="language-plaintext highlighter-rouge">hosts</code> file, add your new machine and its IP address</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>your.fqdn ansible_host=&lt;ip&gt;
</code></pre></div></div>

<p>Run the playbook with</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ansible-playbook <span class="nt">-i</span> hosts <span class="nt">--limit</span> your.fqdn ubuntu-zfs.yml
</code></pre></div></div>

<p>The playbook is available as a GitHub gist <a href="https://gist.github.com/mtelvers/2cbeb5e35f43f5e461aa0c14c4a0a6b8">zfs-ubuntu.yml</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="openzfs" /><category term="tunbury.org" /><summary type="html"><![CDATA[The installation of Ubuntu on ZFS contains about 50 steps of detailed configuration. I have 10 servers to install, so I would like to script this process as much as possible.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/openzfs.png" /><media:content medium="image" url="https://www.tunbury.org/images/openzfs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Updating Docker and Go</title><link href="https://www.tunbury.org/2025/04/01/go-docker/" rel="alternate" type="text/html" title="Updating Docker and Go" /><published>2025-04-01T00:00:00+00:00</published><updated>2025-04-01T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/04/01/go-docker</id><content type="html" xml:base="https://www.tunbury.org/2025/04/01/go-docker/"><![CDATA[<p>For some time, we have had issues on Ubuntu Noble when extracting
tar files within Docker containers. See
<a href="https://github.com/ocaml/infrastructure/issues/121">ocaml/infrastructure#121</a>.
This is only an issue on exotic architectures like RISC-V and PPC64LE.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># docker run --rm -it ubuntu:noble</span>
root@cf3491db4abd:/# <span class="nb">cd
</span>root@cf3491db4abd:~# <span class="nb">mkdir </span>foo
root@cf3491db4abd:~# <span class="nb">tar</span> <span class="nt">-cf</span> bar.tar foo
root@cf3491db4abd:~# <span class="nb">rmdir </span>foo
root@cf3491db4abd:~# <span class="nb">tar</span> <span class="nt">-xf</span> bar.tar
<span class="nb">tar</span>: foo: Cannot change mode to rwxr-xr-x: Operation not permitted
<span class="nb">tar</span>: Exiting with failure status due to previous errors
</code></pre></div></div>

<p>The combination of the Docker version and the <code class="language-plaintext highlighter-rouge">libseccomp2</code> version prevents
the container from using the <code class="language-plaintext highlighter-rouge">fchmodat2</code> system call. There is a
bug report on Ubuntu’s bug tracker for this issue.</p>

<p>I have been working around this by building Docker from scratch.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt <span class="nb">install </span>golang
git clone https://github.com/moby/moby
<span class="nb">cd </span>moby
<span class="nv">AUTO_GOPATH</span><span class="o">=</span>1 ./hack/make.sh binary
<span class="nb">mv </span>bundles/binary-daemon/<span class="k">*</span> /usr/bin/
service docker restart
</code></pre></div></div>

<p>When provisioning some new RISC-V machines, I hit this
issue once again, but now the version of Go installed by <code class="language-plaintext highlighter-rouge">apt</code> on Ubuntu Noble is
too old to build Docker!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>go: vendor.mod requires go &gt;= 1.23.0 (running go 1.22.2; GOTOOLCHAIN=local)
</code></pre></div></div>

<p>As this needs to be repeated multiple times, it makes sense
to wrap the installation steps into an Ansible Playbook.
<a href="https://gist.github.com/mtelvers/ced9d981b9137c491c95780390ce802c">golang+docker.yml</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="docker,go" /><category term="tunbury.org" /><summary type="html"><![CDATA[For some time, we have had issues on Ubuntu Noble when extracting tar files within Docker containers. See ocaml/infrastructure#121. This is only an issue on exotic architectures like RISCV and PPC64LE.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/docker-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/docker-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Installation order for opam packages</title><link href="https://www.tunbury.org/2025/03/31/opam-post-deps/" rel="alternate" type="text/html" title="Installation order for opam packages" /><published>2025-03-31T00:00:00+00:00</published><updated>2025-03-31T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/31/opam-post-deps</id><content type="html" xml:base="https://www.tunbury.org/2025/03/31/opam-post-deps/"><![CDATA[<p>Previously, I discussed the installation order for a simple directed acyclic graph without any cycles. However, <code class="language-plaintext highlighter-rouge">opam</code> packages include <em>post</em> dependencies. Rather than package A depending upon B where B would be installed first, <em>post</em> dependencies require X to be installed after Y. The <em>post</em> dependencies only occur in a small number of core OCaml packages. They are quite often empty and exist to direct the solver. Up until now, I had been using a base layer with an opam switch containing the base compiler and, therefore, did not need to deal with any <em>post</em> dependencies.</p>

<p>Here is the graph of <a href="/images/0install.2.18-with-post-with-colour.pdf">0install</a> with <em>post</em> dependencies coloured in red.</p>

<p>Removing the <em>post</em> dependencies gives an unsatisfying graph with orphaned dependencies. <a href="/images/0install.2.18-without-post.pdf">0install without post</a>. Note <code class="language-plaintext highlighter-rouge">base-nnp.base</code> and <code class="language-plaintext highlighter-rouge">base-effects.base</code>. However, this graph can be used to produce a linear installation order. The orphaned packages can be removed with a recursive search.</p>
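<p>A minimal sketch of that recursive search, assuming the graph is held as a map from package name to dependency set (the <code>PackageMap</code>/<code>PackageSet</code> names are illustrative, borrowed from the earlier topological-sort example, not the actual code): keep only the packages reachable from the root and drop the rest.</p>

```ocaml
(* Sketch: prune packages left orphaned once the post edges are removed.
   The map/set representation is an assumption for illustration. *)
module PackageSet = Set.Make (String)
module PackageMap = Map.Make (String)

(* depth-first walk collecting everything reachable from [name] *)
let rec reachable pkgs seen name =
  if PackageSet.mem name seen then seen
  else
    let deps =
      match PackageMap.find_opt name pkgs with
      | Some d -> d
      | None -> PackageSet.empty
    in
    PackageSet.fold
      (fun dep acc -> reachable pkgs acc dep)
      deps (PackageSet.add name seen)

let prune pkgs root =
  let keep = reachable pkgs PackageSet.empty root in
  PackageMap.filter (fun name _ -> PackageSet.mem name keep) pkgs

let pkgs =
  PackageMap.(
    empty
    |> add "0install" (PackageSet.singleton "ocaml")
    |> add "ocaml" PackageSet.empty
    |> add "base-nnp.base" PackageSet.empty (* orphaned without post edges *))

let kept = prune pkgs "0install"
let () = assert (PackageMap.mem "ocaml" kept)
let () = assert (not (PackageMap.mem "base-nnp.base" kept))
```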

<p>When opam wants to decide the installation order, it uses OCamlgraph’s topological sort capability.</p>

<blockquote>
  <p>This functor provides functions which allow iterating over a graph in topological order. Cycles in graphs are allowed. Specification is the following: If vertex [x] is visited before vertex [y] then either there is a path from [x] to [y], or there is no path from [y] to [x].  In the particular case of a DAG, this simplifies to: if there is an edge from [x] to [y], then [x] is visited before [y].</p>
</blockquote>

<p>The description of <code class="language-plaintext highlighter-rouge">fold</code> is particularly interesting as the order for cycles is unspecified.</p>

<blockquote>
  <p>[fold action g seed] allows iterating over the graph [g] in topological order. [action node accu] is called repeatedly, where [node] is the node being visited, and [accu] is the result of the [action]’s previous invocation, if any, and [seed] otherwise.  If [g] contains cycles, the order is unspecified inside the cycles and every node in the cycles will be presented exactly once</p>
</blockquote>

<p>In my testing, the installation order matches the order used by opam within the variation allowed above.</p>

<p>Layers can be built up using the intersection of packages installed so far and the required dependencies.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="opam" /><category term="tunbury.org" /><summary type="html"><![CDATA[Previously, I discussed the installation order for a simple directed acyclic graph without any cycles. However, opam packages include post dependencies. Rather than package A depending upon B where B would be installed first, post dependencies require X to be installed after Y. The post dependencies only occur in a small number of core OCaml packages. They are quite often empty and exist to direct the solver. Up until now, I had been using a base layer with an opam switch containing the base compiler and, therefore, did not need to deal with any post dependencies.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/opam.png" /><media:content medium="image" url="https://www.tunbury.org/images/opam.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Box Diff Tool</title><link href="https://www.tunbury.org/2025/03/30/box-diff/" rel="alternate" type="text/html" title="Box Diff Tool" /><published>2025-03-30T00:00:00+00:00</published><updated>2025-03-30T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/30/box-diff</id><content type="html" xml:base="https://www.tunbury.org/2025/03/30/box-diff/"><![CDATA[<p>Box has an unlimited storage model but has an upload limit of 1TB per month. I have been uploading various data silos but would now like to verify that the data is all present. Box has an extensive <a href="https://developer.box.com/reference/">API</a>, but I only need the <a href="https://developer.box.com/reference/get-folders-id-items/">list items in folder</a> call.</p>

<p>The list-items call assumes that you have a folder ID which you would like to query. The root of the tree is always ID 0. To check for the presence of file <code class="language-plaintext highlighter-rouge">foo</code> in a folder tree <code class="language-plaintext highlighter-rouge">a/b/c/foo</code>, we need to call the API with folder ID 0. This returns a list of the entries in that folder, e.g.:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"entries"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"12345"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"folder"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"a"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The API must now be called again with the new ID number to get the contents of folder <code class="language-plaintext highlighter-rouge">a</code>. This is repeated until we finally have the entries for folder <code class="language-plaintext highlighter-rouge">c</code> which would contain the file itself. I have used a <code class="language-plaintext highlighter-rouge">Hashtbl</code> to cache the results of each call.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"entries"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"78923434"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"file"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"foo"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
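<p>The caching idea can be sketched as follows; <code>api_list_folder</code> is a stub standing in for the real Box call, and the names are illustrative rather than taken from the tool:</p>

```ocaml
(* Sketch: memoise each folder listing by folder ID, so resolving many
   paths under the same tree only calls the API once per folder. *)
let calls = ref 0

(* stub API: folder 0 contains folder "a" (ID 1); folder 1 contains
   file "foo" (ID 2) *)
let api_list_folder id =
  incr calls;
  match id with
  | 0 -> [ ("a", 1) ]
  | 1 -> [ ("foo", 2) ]
  | _ -> []

let cache : (int, (string * int) list) Hashtbl.t = Hashtbl.create 16

let list_folder id =
  match Hashtbl.find_opt cache id with
  | Some entries -> entries
  | None ->
      let entries = api_list_folder id in
      Hashtbl.add cache id entries;
      entries

(* resolve a path such as ["a"; "foo"] starting from the root folder 0 *)
let resolve path =
  List.fold_left (fun id name -> List.assoc name (list_folder id)) 0 path

let () = assert (resolve [ "a"; "foo" ] = 2)
let () = assert (resolve [ "a"; "foo" ] = 2)
(* both folder listings came from the cache the second time *)
let () = assert (!calls = 2)
```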

<p>Each call defaults to returning at most 100 entries. This can be increased to a maximum of 1000 by passing <code class="language-plaintext highlighter-rouge">?limit=1000</code> to the GET request. For more results, Box offers two pagination systems: <code class="language-plaintext highlighter-rouge">offset</code> and <code class="language-plaintext highlighter-rouge">marker</code>. Offset allows you to pass a starting item number along with the call, but this is limited to 10,000 entries.</p>

<blockquote>
  <p>Queries with offset parameter value exceeding 10000 will be rejected with a 400 response.</p>
</blockquote>

<p>To deal with folders of any size, we should use the marker system. For this, we pass <code class="language-plaintext highlighter-rouge">?usemarker=true</code> on the first GET request, which causes the API to include <code class="language-plaintext highlighter-rouge">next_marker</code> and <code class="language-plaintext highlighter-rouge">prev_marker</code> as additional JSON properties where applicable. Subsequent calls then use <code class="language-plaintext highlighter-rouge">?usemarker=true&amp;marker=XXX</code>. The end of the listing is signalled by the absence of <code class="language-plaintext highlighter-rouge">next_marker</code>.</p>
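<p>The pagination loop can be sketched like this; <code>get_page</code> is a stub standing in for the HTTP request with the marker parameters, and the marker values are invented for the example:</p>

```ocaml
(* Sketch: keep requesting with the returned next_marker until it is
   absent.  Three stubbed pages of (entries, next_marker). *)
let pages = [| ([ 1; 2; 3 ], Some "m1"); ([ 4; 5 ], Some "m2"); ([ 6 ], None) |]

(* marker [None] means the first request *)
let get_page marker =
  match marker with
  | None -> pages.(0)
  | Some "m1" -> pages.(1)
  | Some "m2" -> pages.(2)
  | Some _ -> ([], None)

let rec collect marker acc =
  let entries, next = get_page marker in
  let acc = acc @ entries in
  match next with
  | None -> acc (* no next_marker: we have everything *)
  | Some _ as m -> collect m acc

let all = collect None []
let () = assert (all = [ 1; 2; 3; 4; 5; 6 ])
```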

<p>The project can be found on GitHub in <a href="https://github.com/mtelvers/ocaml-box-diff">mtelvers/ocaml-box-diff</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml,Box" /><category term="tunbury.org" /><summary type="html"><![CDATA[Box has an unlimited storage model but has an upload limit of 1TB per month. I have been uploading various data silos but would now like to verify that the data is all present. Box has an extensive API, but I only need the list items in folder call.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/box-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/box-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Dell PowerEdge R640 Storage Server</title><link href="https://www.tunbury.org/2025/03/27/dell-poweredge-r640/" rel="alternate" type="text/html" title="Dell PowerEdge R640 Storage Server" /><published>2025-03-27T00:00:00+00:00</published><updated>2025-03-27T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/27/dell-poweredge-r640</id><content type="html" xml:base="https://www.tunbury.org/2025/03/27/dell-poweredge-r640/"><![CDATA[<p>We have received our first batch of 7.68TB Kingston SSD drives for deployment in some Dell PowerEdge R640 servers, which will be used to create a large storage pool.</p>

<p>The first job was to mount each of the drives in a caddy.</p>

<p><img src="/images/kingston-with-caddy.png" alt="" /></p>

<p>And then install them in the server.</p>

<p><img src="/images/kingston-in-slot.png" alt="" /></p>

<p>These R640 servers are equipped with the Dell PERC H740P RAID controller. They support either hardware RAID 0,1,5,10,50 etc or Enhanced HBA mode.</p>

<p><img src="/images/r640-enhanced-hba.png" alt="" /></p>

<p>In eHBA mode, the disks operate in a passthrough mode, presenting the raw disks to the OS; however, each disk needs to be specifically selected in an additional step after enabling eHBA mode.</p>

<p><img src="/images/r640-jbod.png" alt="" /></p>

<p>In RAID mode, one or more virtual disks need to be created to present the disks to the OS. Preconfigured profiles are available to complete this step easily.</p>

<p><img src="/images/r640-raid5.png" alt="" /></p>

<p>We will run these with a ZFS file system, so we need to decide whether to use the hardware RAID features or follow the advice on Wikipedia on the <a href="https://en.wikipedia.org/wiki/ZFS#Avoidance_of_hardware_RAID_controllers">Avoidance of hardware RAID controllers</a>.  Online opinion is divided.  My summary is that hardware RAID will be easier to manage when a disk fails, but ZFS on the raw disks will have some integrity advantages.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Dell" /><category term="tunbury.org" /><summary type="html"><![CDATA[We have received our first batch of 7.68TB Kingston SSD drives for deployment in some Dell PowerEdge R640 servers, which will be used to create a large storage pool.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/kingston-768tb.png" /><media:content medium="image" url="https://www.tunbury.org/images/kingston-768tb.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">FreeBSD 14.2 Upgrade</title><link href="https://www.tunbury.org/2025/03/26/freebsd-14.2/" rel="alternate" type="text/html" title="FreeBSD 14.2 Upgrade" /><published>2025-03-26T00:00:00+00:00</published><updated>2025-03-26T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/26/freebsd-14.2</id><content type="html" xml:base="https://www.tunbury.org/2025/03/26/freebsd-14.2/"><![CDATA[<p>CI workers <code class="language-plaintext highlighter-rouge">spring</code> and <code class="language-plaintext highlighter-rouge">summer</code> run FreeBSD and need to be updated.</p>

<p>Check the currently installed version of FreeBSD with <code class="language-plaintext highlighter-rouge">uname -a</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FreeBSD summer 14.1-RELEASE-p5 FreeBSD 14.1-RELEASE-p5 GENERIC amd64
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">freebsd-update fetch</code> to download the latest versions of the system components, particularly the <code class="language-plaintext highlighter-rouge">freebsd-update</code> utility.  It even reported that it really is time to upgrade!</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># freebsd-update fetch</span>
...
WARNING: FreeBSD 14.1-RELEASE-p5 is approaching its End-of-Life date.
It is strongly recommended that you upgrade to a newer
release within the next 5 days.
</code></pre></div></div>

<p>Install these updates.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>freebsd-update <span class="nb">install</span>
</code></pre></div></div>

<p>Now use <code class="language-plaintext highlighter-rouge">freebsd-update</code> to fetch the 14.2-RELEASE and install it.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># freebsd-update upgrade -r 14.2-RELEASE</span>
...
<span class="c"># freebsd-update install</span>
src component not installed, skipped
Installing updates...
Kernel updates have been installed.  Please reboot and run
<span class="s1">'freebsd-update [options] install'</span> again to finish installing updates.
</code></pre></div></div>

<p>Reboot the system using <code class="language-plaintext highlighter-rouge">reboot</code> and then finish installing updates.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># freebsd-update install</span>
src component not installed, skipped
Installing updates...
Restarting sshd after upgrade
Performing sanity check on sshd configuration.
Stopping sshd.
Waiting <span class="k">for </span>PIDS: 707.
Performing sanity check on sshd configuration.
Starting sshd.
Scanning /usr/share/certs/untrusted <span class="k">for </span>certificates...
Scanning /usr/share/certs/trusted <span class="k">for </span>certificates...
Scanning /usr/local/share/certs <span class="k">for </span>certificates...
 <span class="k">done</span><span class="nb">.</span>
</code></pre></div></div>

<p>Now use <code class="language-plaintext highlighter-rouge">pkg</code> to upgrade any applications.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># pkg upgrade</span>
Updating FreeBSD repository catalogue...
Fetching data.pkg: 100%    7 MiB   7.5MB/s    00:01    
Processing entries: 100%
FreeBSD repository update completed. 35885 packages processed.
All repositories are up to date.
Checking <span class="k">for </span>upgrades <span class="o">(</span>28 candidates<span class="o">)</span>: 100%
Processing candidates <span class="o">(</span>28 candidates<span class="o">)</span>: 100%
The following 28 package<span class="o">(</span>s<span class="o">)</span> will be affected <span class="o">(</span>of 0 checked<span class="o">)</span>:

Installed packages to be UPGRADED:
	curl: 8.10.1 -&gt; 8.11.1_1
...
	xxd: 9.1.0764 -&gt; 9.1.1199

Number of packages to be upgraded: 28

The process will require 3 MiB more space.
77 MiB to be downloaded.

Proceed with this action? <span class="o">[</span>y/N]: y
</code></pre></div></div>

<p>Finally, reboot the system and check <code class="language-plaintext highlighter-rouge">uname -a</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># uname -a</span>
FreeBSD spring 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
</code></pre></div></div>

<p>To update the FreeBSD base images used by the CI services, I applied <a href="https://github.com/ocurrent/freebsd-infra/pull/13">PR#13</a> to <a href="https://github.com/ocurrent/freebsd-infra">ocurrent/freebsd-infra</a>.</p>

<p>This was followed up by <a href="https://github.com/ocurrent/ocaml-ci/pull/1007">PR#1007</a> on ocurrent/ocaml-ci and <a href="https://github.com/ocurrent/opam-repo-ci/pull/427">PR#427</a> to ocurrent/opam-repo-ci.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="FreeBSD" /><category term="tunbury.org" /><summary type="html"><![CDATA[CI workers spring and summer run FreeBSD and need to be updated.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/freebsd-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/freebsd-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Topological Sort of Packages</title><link href="https://www.tunbury.org/2025/03/25/topological-sort/" rel="alternate" type="text/html" title="Topological Sort of Packages" /><published>2025-03-25T00:00:00+00:00</published><updated>2025-03-25T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/25/topological-sort</id><content type="html" xml:base="https://www.tunbury.org/2025/03/25/topological-sort/"><![CDATA[<p>Given a list of packages and their dependencies, what order should those packages be installed in?</p>

<p>The above graph gives a simple example of the dependencies of the package <code class="language-plaintext highlighter-rouge">dune</code> nicely ordered right to left.</p>

<p>We might choose to model this in OCaml using a map with the package name as the key and a set of the dependent packages:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">PackageSet</span> <span class="o">=</span> <span class="nn">Set</span><span class="p">.</span><span class="nc">Make</span> <span class="p">(</span><span class="nc">String</span><span class="p">);;</span>
<span class="k">module</span> <span class="nc">PackageMap</span> <span class="o">=</span> <span class="nn">Map</span><span class="p">.</span><span class="nc">Make</span> <span class="p">(</span><span class="nc">String</span><span class="p">);;</span>
</code></pre></div></div>

<p>Thus, the <code class="language-plaintext highlighter-rouge">dune</code> example could be defined like this.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">dune</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.(</span><span class="n">empty</span> <span class="o">|&gt;</span>
    <span class="n">add</span> <span class="s2">"ocaml"</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.(</span><span class="n">empty</span> <span class="o">|&gt;</span> <span class="n">add</span> <span class="s2">"ocaml-config"</span> <span class="o">|&gt;</span> <span class="n">add</span> <span class="s2">"ocaml-variants"</span><span class="p">))</span> <span class="o">|&gt;</span>
    <span class="n">add</span> <span class="s2">"ocaml-config"</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.(</span><span class="n">empty</span> <span class="o">|&gt;</span> <span class="n">add</span> <span class="s2">"ocaml-variants"</span><span class="p">))</span> <span class="o">|&gt;</span>
    <span class="n">add</span> <span class="s2">"dune"</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.(</span><span class="n">empty</span> <span class="o">|&gt;</span> <span class="n">add</span> <span class="s2">"ocaml"</span> <span class="o">|&gt;</span> <span class="n">add</span> <span class="s2">"base-unix.base"</span> <span class="o">|&gt;</span> <span class="n">add</span> <span class="s2">"base-threads.base"</span><span class="p">))</span> <span class="o">|&gt;</span>
    <span class="n">add</span> <span class="s2">"ocaml-variants"</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.</span><span class="n">empty</span><span class="p">)</span> <span class="o">|&gt;</span>
    <span class="n">add</span> <span class="s2">"base-unix.base"</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.</span><span class="n">empty</span><span class="p">)</span> <span class="o">|&gt;</span>
    <span class="n">add</span> <span class="s2">"base-threads.base"</span> <span class="p">(</span><span class="nn">PackageSet</span><span class="p">.</span><span class="n">empty</span><span class="p">)</span>
  <span class="p">);;</span>
</code></pre></div></div>

<p>We can create a topological sort by first choosing any package with an empty set of dependencies.  This package should then be removed from the map of packages and also removed as a dependency from any of the sets.  This can be written concisely in OCaml</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="n">topological_sort</span> <span class="n">pkgs</span> <span class="o">=</span>
  <span class="k">match</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">is_empty</span> <span class="n">pkgs</span> <span class="k">with</span>
  <span class="o">|</span> <span class="bp">true</span> <span class="o">-&gt;</span> <span class="bp">[]</span>
  <span class="o">|</span> <span class="bp">false</span> <span class="o">-&gt;</span>
      <span class="k">let</span> <span class="n">installable</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">filter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">_</span> <span class="n">deps</span> <span class="o">-&gt;</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">is_empty</span> <span class="n">deps</span><span class="p">)</span> <span class="n">pkgs</span> <span class="k">in</span>
      <span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="k">assert</span> <span class="p">(</span><span class="n">not</span> <span class="p">(</span><span class="nn">PackageMap</span><span class="p">.</span><span class="n">is_empty</span> <span class="n">installable</span><span class="p">))</span> <span class="k">in</span>
      <span class="k">let</span> <span class="n">i</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">choose</span> <span class="n">installable</span> <span class="o">|&gt;</span> <span class="n">fst</span> <span class="k">in</span>
      <span class="k">let</span> <span class="n">pkgs</span> <span class="o">=</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">remove</span> <span class="n">i</span> <span class="n">pkgs</span> <span class="o">|&gt;</span> <span class="nn">PackageMap</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">deps</span> <span class="o">-&gt;</span> <span class="nn">PackageSet</span><span class="p">.</span><span class="n">remove</span> <span class="n">i</span> <span class="n">deps</span><span class="p">)</span> <span class="k">in</span>
      <span class="n">i</span> <span class="o">::</span> <span class="n">topological_sort</span> <span class="n">pkgs</span>
</code></pre></div></div>
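<p>The <code>assert</code> in the function also acts as a crude cycle guard: with a circular dependency, no package ever has an empty dependency set, so no progress can be made and the assertion fires. A standalone check of this behaviour (definitions repeated from above so the sketch runs on its own):</p>

```ocaml
(* Definitions repeated from above so this check is self-contained. *)
module PackageSet = Set.Make (String)
module PackageMap = Map.Make (String)

let rec topological_sort pkgs =
  match PackageMap.is_empty pkgs with
  | true -> []
  | false ->
      let installable =
        PackageMap.filter (fun _ deps -> PackageSet.is_empty deps) pkgs
      in
      (* fails when only mutually dependent packages remain *)
      let () = assert (not (PackageMap.is_empty installable)) in
      let i = PackageMap.choose installable |> fst in
      let pkgs =
        PackageMap.remove i pkgs
        |> PackageMap.map (fun deps -> PackageSet.remove i deps)
      in
      i :: topological_sort pkgs

(* a depends on b and b depends on a: a two-package cycle *)
let cyclic =
  PackageMap.(
    empty
    |> add "a" (PackageSet.singleton "b")
    |> add "b" (PackageSet.singleton "a"))

let detected =
  match topological_sort cyclic with
  | _ -> false
  | exception Assert_failure _ -> true (* expected: the cycle is caught *)

let () = assert detected
```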

<p>This gives us the correct installation order:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># topological_sort dune;;
- : PackageMap.key list =
["base-threads.base"; "base-unix.base"; "ocaml-variants"; "ocaml-config"; "ocaml"; "dune"]
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Given a list of packages and their dependencies, what order should those packages be installed in?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/dune-graph.png" /><media:content medium="image" url="https://www.tunbury.org/images/dune-graph.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Recent OCaml Versions</title><link href="https://www.tunbury.org/2025/03/24/recent-ocaml-version/" rel="alternate" type="text/html" title="Recent OCaml Versions" /><published>2025-03-24T00:00:00+00:00</published><updated>2025-03-24T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/24/recent-ocaml-version</id><content type="html" xml:base="https://www.tunbury.org/2025/03/24/recent-ocaml-version/"><![CDATA[<p>Following my <a href="https://discuss.ocaml.org/t/docker-base-images-and-ocaml-ci-support-for-ocaml-4-08/16229">post on discuss.ocaml.org</a>, I have created a new release of <a href="https://github.com/ocurrent/ocaml-version">ocurrent/ocaml-version</a> that raises the minimum version of OCaml considered <em>recent</em> from 4.02 to 4.08.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">recent</span> <span class="o">=</span> <span class="p">[</span> <span class="n">v4_08</span><span class="p">;</span> <span class="n">v4_09</span><span class="p">;</span> <span class="n">v4_10</span><span class="p">;</span> <span class="n">v4_11</span><span class="p">;</span> <span class="n">v4_12</span><span class="p">;</span> <span class="n">v4_13</span><span class="p">;</span> <span class="n">v4_14</span><span class="p">;</span> <span class="n">v5_0</span><span class="p">;</span> <span class="n">v5_1</span><span class="p">;</span> <span class="n">v5_2</span><span class="p">;</span> <span class="n">v5_3</span> <span class="p">]</span>
</code></pre></div></div>

<p>This may feel like a mundane change, but <a href="https://github.com/ocurrent/ocaml-ci">OCaml-CI</a>, <a href="https://github.com/ocurrent/opam-repo-ci">opam-repo-ci</a>, and the <a href="https://github.com/ocurrent/docker-base-images">Docker base image builder</a>, among other services, use this list to determine the set of OCaml versions to test against. Therefore, as these services are updated, testing on the older releases will be removed.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[Following my post on discuss.ocaml.org, I have created a new release of ocurrent/ocaml-version that moves the minimum version of OCaml, considered as recent, from 4.02 to 4.08.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/ocaml-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/ocaml-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Real Time Trains API</title><link href="https://www.tunbury.org/2025/03/23/real-time-trains/" rel="alternate" type="text/html" title="Real Time Trains API" /><published>2025-03-23T00:00:00+00:00</published><updated>2025-03-23T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/23/real-time-trains</id><content type="html" xml:base="https://www.tunbury.org/2025/03/23/real-time-trains/"><![CDATA[<p>After the Heathrow substation electrical fire, I found myself in Manchester with a long train ride ahead.  Checking on <a href="https://www.realtimetrains.co.uk">Real Time Trains</a> for the schedule, I noticed that they had an API.  With time to spare, I registered for an account and downloaded the sample code from <a href="https://github.com/mirage/ocaml-cohttp">ocaml-cohttp</a>.</p>

<p>The API uses HTTP basic authentication with the account details, which are added via an HTTP header:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">let</span> <span class="n">headers</span> <span class="o">=</span> <span class="nn">Cohttp</span><span class="p">.</span><span class="nn">Header</span><span class="p">.</span><span class="n">init</span> <span class="bp">()</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">headers</span> <span class="o">=</span>
    <span class="nn">Cohttp</span><span class="p">.</span><span class="nn">Header</span><span class="p">.</span><span class="n">add_authorization</span> <span class="n">headers</span> <span class="p">(</span><span class="nt">`Basic</span> <span class="p">(</span><span class="n">user</span><span class="o">,</span> <span class="n">password</span><span class="p">))</span>
</code></pre></div></div>

<p>The response from the API can be converted to JSON using <a href="https://github.com/ocaml-community/yojson">Yojson</a>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">json</span> <span class="o">=</span>
      <span class="nn">Eio</span><span class="p">.</span><span class="nn">Buf_read</span><span class="p">.(</span><span class="n">parse_exn</span> <span class="n">take_all</span><span class="p">)</span> <span class="n">body</span> <span class="o">~</span><span class="n">max_size</span><span class="o">:</span><span class="n">max_int</span>
      <span class="o">|&gt;</span> <span class="nn">Yojson</span><span class="p">.</span><span class="nn">Safe</span><span class="p">.</span><span class="n">from_string</span>
</code></pre></div></div>

<p>JSON fields can be read using the <code class="language-plaintext highlighter-rouge">Util</code> functions.  For example, <code class="language-plaintext highlighter-rouge">Yojson.Basic.Util.member "services" json</code> will read the <code class="language-plaintext highlighter-rouge">services</code> entry, and elements can be converted to lists with <code class="language-plaintext highlighter-rouge">Yojson.Basic.Util.to_list</code>.  After a bit of hacking, this turned out to be quite tedious to code.</p>
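<p>A minimal sketch of this manual style, with an illustrative JSON literal standing in for the real RTT payload, shows why it becomes tedious: every field has to be unpacked by hand.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* Hand-rolled extraction with Yojson.Basic.Util; the field names
   here are illustrative, not the actual RTT schema. *)
let () =
  let open Yojson.Basic.Util in
  let json =
    Yojson.Basic.from_string
      {|{"services": [{"locationDetail": {"description": "Rochester"}}]}|}
  in
  json |&gt; member "services" |&gt; to_list
  |&gt; List.iter (fun s -&gt;
         s |&gt; member "locationDetail" |&gt; member "description"
         |&gt; to_string |&gt; print_endline)
</code></pre></div></div>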

<p>As an alternative, I decided to use <code class="language-plaintext highlighter-rouge">ppx_deriving_yojson.runtime</code>.  I described the JSON blocks as OCaml types, e.g. <code class="language-plaintext highlighter-rouge">station</code> as below.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">station</span> <span class="o">=</span> <span class="p">{</span>
  <span class="n">tiploc</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
  <span class="n">description</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
  <span class="n">workingTime</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
  <span class="n">publicTime</span> <span class="o">:</span> <span class="kt">string</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">[</span><span class="o">@@</span><span class="n">deriving</span> <span class="n">yojson</span><span class="p">]</span>
</code></pre></div></div>

<p>The preprocessor automatically generates two functions, <code class="language-plaintext highlighter-rouge">station_of_yojson</code> and <code class="language-plaintext highlighter-rouge">station_to_yojson</code>, which handle the conversion.</p>
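<p>Note that the derived decoder returns a <code class="language-plaintext highlighter-rouge">result</code> rather than raising, which is why calling code matches on <code class="language-plaintext highlighter-rouge">Ok</code>/<code class="language-plaintext highlighter-rouge">Error</code>.  The generated signatures are roughly:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>val station_to_yojson : station -&gt; Yojson.Safe.t
val station_of_yojson : Yojson.Safe.t -&gt; (station, string) result
</code></pre></div></div>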

<p>The only negative of this approach is that RTT omits empty JSON fields, so these need to be flagged as possibly missing and a default value provided.  For example, <code class="language-plaintext highlighter-rouge">realtimeArrivalNextDay</code> is not emitted unless the value is <code class="language-plaintext highlighter-rouge">true</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="n">realtimeArrivalNextDay</span> <span class="o">:</span> <span class="p">(</span><span class="kt">bool</span><span class="p">[</span><span class="o">@</span><span class="n">default</span> <span class="bp">false</span><span class="p">]);</span>
</code></pre></div></div>
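<p>For fields with no sensible default value, the same attribute can wrap an <code class="language-plaintext highlighter-rouge">option</code> type; this is a sketch rather than a field from the actual RTT schema:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="c">(* Decodes to None when the field is absent from the JSON *)</span>
  <span class="n">cancelReasonCode</span> <span class="o">:</span> <span class="p">(</span><span class="kt">string</span> <span class="n">option</span><span class="p">[</span><span class="o">@</span><span class="n">default</span> <span class="nc">None</span><span class="p">]);</span>
</code></pre></div></div>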

<p>Once the JSON has been received, it can be converted to OCaml types very easily:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">match</span> <span class="n">reply_of_yojson</span> <span class="n">json</span> <span class="k">with</span>
    <span class="o">|</span> <span class="nc">Ok</span> <span class="n">reply</span> <span class="o">-&gt;</span>
       <span class="c">(* Use reply.services *)</span>
    <span class="o">|</span> <span class="nc">Error</span> <span class="n">err</span> <span class="o">-&gt;</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"Error %s</span><span class="se">\n</span><span class="s2">"</span> <span class="n">err</span>
</code></pre></div></div>

<p>My work-in-progress code is available on <a href="https://github.com/mtelvers/ocaml-rtt">GitHub</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dune exec --release -- rtt --user USER --pass PASS --station RTR
rtt: [DEBUG] received 3923 bytes of body
rtt: [DEBUG] received 4096 bytes of body
rtt: [DEBUG] received 4096 bytes of body
rtt: [DEBUG] received 4096 bytes of body
rtt: [DEBUG] received 1236 bytes of body
rtt: [DEBUG] end of inbound body
2025-03-23 2132 W16178 1C69 1 Ramsgate St Pancras International
2025-03-23 2132 W25888 9P59 2 Plumstead Rainham (Kent)
2025-03-23 2136 J00119 1U28 2 London Victoria Ramsgate
2025-03-23 2144 W25927 9P86 1 Rainham (Kent) Plumstead
2025-03-23 2157 W16899 1C66 2 St Pancras International Ramsgate
2025-03-23 2202 W25894 9P61 2 Plumstead Rainham (Kent)
2025-03-23 2210 J26398 1U80 1 Ramsgate London Victoria
2025-03-23 2214 W25916 9P70 1 Rainham (Kent) Plumstead
2025-03-23 2232 W16910 1C73 1 Ramsgate St Pancras International
2025-03-23 2232 W25900 9P63 2 Plumstead Rainham (Kent)
2025-03-23 2236 J00121 1U30 2 London Victoria Ramsgate
2025-03-23 2244 W25277 9A92 1 Rainham (Kent) Dartford
2025-03-23 2257 W16450 1F70 2 St Pancras International Faversham
2025-03-23 2302 W25906 9P65 2 Plumstead Rainham (Kent)
2025-03-23 2314 W25283 9A94 1 Rainham (Kent) Dartford
2025-03-23 2318 J00155 1U82 1 Ramsgate London Victoria
2025-03-23 2332 W25912 9P67 2 Plumstead Gillingham (Kent)
2025-03-23 2336 J00123 1U32 2 London Victoria Ramsgate
2025-03-23 2344 W25289 9A96 1 Rainham (Kent) Dartford
2025-03-23 2357 W16475 1F74 2 St Pancras International Faversham
2025-03-23 0002 W25915 9P69 2 Plumstead Gillingham (Kent)
2025-03-23 0041 J26381 1Z34 2 London Victoria Faversham
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="OCaml" /><category term="tunbury.org" /><summary type="html"><![CDATA[After the Heathrow substation electrical fire, I found myself in Manchester with a long train ride ahead. Checking on Real Time Trains for the schedule I noticed that they had an API. With time to spare, I registered for an account and downloaded the sample code from ocaml-cohttp.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/rtt.png" /><media:content medium="image" url="https://www.tunbury.org/images/rtt.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Irmin Database</title><link href="https://www.tunbury.org/2025/03/17/irmin/" rel="alternate" type="text/html" title="Irmin Database" /><published>2025-03-17T00:00:00+00:00</published><updated>2025-03-17T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/17/irmin</id><content type="html" xml:base="https://www.tunbury.org/2025/03/17/irmin/"><![CDATA[<p>After Thomas’ talk today I wanted to try <a href="https://irmin.org">Irmin</a> for myself.</p>

<p>In a new switch, I installed Irmin via opam with <code class="language-plaintext highlighter-rouge">opam install irmin-git</code> and then built the <a href="https://irmin.org/tutorial/getting-started/">example code</a>:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">open</span> <span class="nn">Lwt</span><span class="p">.</span><span class="nc">Syntax</span>
<span class="k">module</span> <span class="nc">Git_store</span> <span class="o">=</span> <span class="nn">Irmin_git_unix</span><span class="p">.</span><span class="nn">FS</span><span class="p">.</span><span class="nc">KV</span> <span class="p">(</span><span class="nn">Irmin</span><span class="p">.</span><span class="nn">Contents</span><span class="p">.</span><span class="nc">String</span><span class="p">)</span>
<span class="k">module</span> <span class="nc">Git_info</span> <span class="o">=</span> <span class="nn">Irmin_unix</span><span class="p">.</span><span class="nc">Info</span> <span class="p">(</span><span class="nn">Git_store</span><span class="p">.</span><span class="nc">Info</span><span class="p">)</span>

<span class="k">let</span> <span class="n">git_config</span> <span class="o">=</span> <span class="nn">Irmin_git</span><span class="p">.</span><span class="n">config</span> <span class="o">~</span><span class="n">bare</span><span class="o">:</span><span class="bp">true</span> <span class="s2">"./db"</span>
<span class="k">let</span> <span class="n">info</span> <span class="n">message</span> <span class="o">=</span> <span class="nn">Git_info</span><span class="p">.</span><span class="n">v</span> <span class="o">~</span><span class="n">author</span><span class="o">:</span><span class="s2">"Example"</span> <span class="s2">"%s"</span> <span class="n">message</span>

<span class="k">let</span> <span class="n">main_branch</span> <span class="n">config</span> <span class="o">=</span>
  <span class="k">let</span><span class="o">*</span> <span class="n">repo</span> <span class="o">=</span> <span class="nn">Git_store</span><span class="p">.</span><span class="nn">Repo</span><span class="p">.</span><span class="n">v</span> <span class="n">config</span> <span class="k">in</span>
  <span class="nn">Git_store</span><span class="p">.</span><span class="n">main</span> <span class="n">repo</span>

<span class="k">let</span> <span class="n">main</span> <span class="o">=</span>
  <span class="k">let</span><span class="o">*</span> <span class="n">t</span> <span class="o">=</span> <span class="n">main_branch</span> <span class="n">git_config</span> <span class="k">in</span>
  <span class="c">(* Set a/b/c to "Hello, Irmin!" *)</span>
  <span class="k">let</span><span class="o">*</span> <span class="bp">()</span> <span class="o">=</span>
    <span class="nn">Git_store</span><span class="p">.</span><span class="n">set_exn</span> <span class="n">t</span> <span class="p">[</span> <span class="s2">"a"</span><span class="p">;</span> <span class="s2">"b"</span><span class="p">;</span> <span class="s2">"c"</span> <span class="p">]</span> <span class="s2">"Hello, Irmin!"</span>
      <span class="o">~</span><span class="n">info</span><span class="o">:</span><span class="p">(</span><span class="n">info</span> <span class="s2">"my first commit"</span><span class="p">)</span>
  <span class="k">in</span>
  <span class="c">(* Get a/b/c *)</span>
  <span class="k">let</span><span class="o">+</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">Git_store</span><span class="p">.</span><span class="n">get</span> <span class="n">t</span> <span class="p">[</span> <span class="s2">"a"</span><span class="p">;</span> <span class="s2">"b"</span><span class="p">;</span> <span class="s2">"c"</span> <span class="p">]</span> <span class="k">in</span>
  <span class="k">assert</span> <span class="p">(</span><span class="n">s</span> <span class="o">=</span> <span class="s2">"Hello, Irmin!"</span><span class="p">)</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="nn">Lwt_main</span><span class="p">.</span><span class="n">run</span> <span class="n">main</span>
</code></pre></div></div>

<p>I’m pretty excited about the possibilities.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="irmin" /><category term="tunbury.org" /><summary type="html"><![CDATA[After Thomas’ talk today I wanted to try Irmin for myself.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/irmin.png" /><media:content medium="image" url="https://www.tunbury.org/images/irmin.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Playing with Cap’n Proto</title><link href="https://www.tunbury.org/2025/03/17/capnproto/" rel="alternate" type="text/html" title="Playing with Cap’n Proto" /><published>2025-03-17T00:00:00+00:00</published><updated>2025-03-17T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/17/capnproto</id><content type="html" xml:base="https://www.tunbury.org/2025/03/17/capnproto/"><![CDATA[<p>Cap’n Proto has become a hot topic recently and while this is used for many OCaml-CI services, I spent some time creating a minimal application.</p>

<p>Firstly, create a schema with a single interface which accepts a file name and returns its content.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>interface Foo {
  get      @0 (name :Text) -&gt; (reply :Text);
}
</code></pre></div></div>

<p>This schema can then be compiled into the bindings for your required language, e.g. <code class="language-plaintext highlighter-rouge">capnp compile -o ocaml:. schema.capnp</code></p>

<p>In practice, this need not be done by hand, as we can use a <code class="language-plaintext highlighter-rouge">dune</code> rule:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(rule
 (targets foo_api.ml foo_api.mli)
 (deps    foo_api.capnp)
 (action (run capnp compile -o %{bin:capnpc-ocaml} %{deps})))
</code></pre></div></div>

<p>On the server side, we now need to extend the automatically generated code to actually implement the interface.  This code is largely boilerplate.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">Api</span> <span class="o">=</span> <span class="nn">Foo_api</span><span class="p">.</span><span class="nc">MakeRPC</span><span class="p">(</span><span class="nc">Capnp_rpc</span><span class="p">)</span>

<span class="k">open</span> <span class="nn">Capnp_rpc</span><span class="p">.</span><span class="nc">Std</span>

<span class="k">let</span> <span class="n">read_from_file</span> <span class="n">filename</span> <span class="o">=</span> <span class="nn">In_channel</span><span class="p">.</span><span class="n">with_open_text</span> <span class="n">filename</span> <span class="o">@@</span> <span class="k">fun</span> <span class="n">ic</span> <span class="o">-&gt;</span> <span class="nn">In_channel</span><span class="p">.</span><span class="n">input_all</span> <span class="n">ic</span>

<span class="k">let</span> <span class="n">local</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">module</span> <span class="nc">Foo</span> <span class="o">=</span> <span class="nn">Api</span><span class="p">.</span><span class="nn">Service</span><span class="p">.</span><span class="nc">Foo</span> <span class="k">in</span>
  <span class="nn">Foo</span><span class="p">.</span><span class="n">local</span> <span class="o">@@</span> <span class="k">object</span>
    <span class="k">inherit</span> <span class="nn">Foo</span><span class="p">.</span><span class="n">service</span>

    <span class="n">method</span> <span class="n">get_impl</span> <span class="n">params</span> <span class="n">release_param_caps</span> <span class="o">=</span>
      <span class="k">let</span> <span class="k">open</span> <span class="nn">Foo</span><span class="p">.</span><span class="nc">Get</span> <span class="k">in</span>
      <span class="k">let</span> <span class="n">name</span> <span class="o">=</span> <span class="nn">Params</span><span class="p">.</span><span class="n">name_get</span> <span class="n">params</span> <span class="k">in</span>
      <span class="n">release_param_caps</span> <span class="bp">()</span><span class="p">;</span>
      <span class="k">let</span> <span class="n">response</span><span class="o">,</span> <span class="n">results</span> <span class="o">=</span> <span class="nn">Service</span><span class="p">.</span><span class="nn">Response</span><span class="p">.</span><span class="n">create</span> <span class="nn">Results</span><span class="p">.</span><span class="n">init_pointer</span> <span class="k">in</span>
      <span class="nn">Results</span><span class="p">.</span><span class="n">reply_set</span> <span class="n">results</span> <span class="p">(</span><span class="n">read_from_file</span> <span class="n">name</span><span class="p">);</span>
      <span class="nn">Service</span><span class="p">.</span><span class="n">return</span> <span class="n">response</span>
  <span class="k">end</span>
</code></pre></div></div>

<p>The server needs to generate the capability file needed to access the service and wait for incoming connections.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">cap_file</span> <span class="o">=</span> <span class="s2">"echo.cap"</span>

<span class="k">let</span> <span class="n">serve</span> <span class="n">config</span> <span class="o">=</span>
  <span class="nn">Switch</span><span class="p">.</span><span class="n">run</span> <span class="o">@@</span> <span class="k">fun</span> <span class="n">sw</span> <span class="o">-&gt;</span>
  <span class="k">let</span> <span class="n">service_id</span> <span class="o">=</span> <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="nn">Vat_config</span><span class="p">.</span><span class="n">derived_id</span> <span class="n">config</span> <span class="s2">"main"</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">restore</span> <span class="o">=</span> <span class="nn">Restorer</span><span class="p">.</span><span class="n">single</span> <span class="n">service_id</span> <span class="p">(</span><span class="nn">Foo</span><span class="p">.</span><span class="n">local</span><span class="p">)</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">vat</span> <span class="o">=</span> <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="n">serve</span> <span class="o">~</span><span class="n">sw</span> <span class="o">~</span><span class="n">restore</span> <span class="n">config</span> <span class="k">in</span>
  <span class="k">match</span> <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="nn">Cap_file</span><span class="p">.</span><span class="n">save_service</span> <span class="n">vat</span> <span class="n">service_id</span> <span class="n">cap_file</span> <span class="k">with</span>
  <span class="o">|</span> <span class="nc">Error</span> <span class="nt">`Msg</span> <span class="n">m</span> <span class="o">-&gt;</span> <span class="n">failwith</span> <span class="n">m</span>
  <span class="o">|</span> <span class="nc">Ok</span> <span class="bp">()</span> <span class="o">-&gt;</span>
    <span class="n">traceln</span> <span class="s2">"Server running. Connect using %S."</span> <span class="n">cap_file</span><span class="p">;</span>
    <span class="nn">Fiber</span><span class="p">.</span><span class="n">await_cancel</span> <span class="bp">()</span>
</code></pre></div></div>

<p>The client application imports the capability file and calls the service <code class="language-plaintext highlighter-rouge">Foo.get</code>.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">run_client</span> <span class="n">service</span> <span class="o">=</span>
  <span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="nn">Foo</span><span class="p">.</span><span class="n">get</span> <span class="n">service</span> <span class="s2">"client.ml"</span> <span class="k">in</span>
  <span class="n">traceln</span> <span class="s2">"%S"</span> <span class="n">x</span>

<span class="k">let</span> <span class="n">connect</span> <span class="n">net</span> <span class="n">uri</span> <span class="o">=</span>
  <span class="nn">Switch</span><span class="p">.</span><span class="n">run</span> <span class="o">@@</span> <span class="k">fun</span> <span class="n">sw</span> <span class="o">-&gt;</span>
  <span class="k">let</span> <span class="n">client_vat</span> <span class="o">=</span> <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="n">client_only_vat</span> <span class="o">~</span><span class="n">sw</span> <span class="n">net</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">sr</span> <span class="o">=</span> <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="nn">Vat</span><span class="p">.</span><span class="n">import_exn</span> <span class="n">client_vat</span> <span class="n">uri</span> <span class="k">in</span>
  <span class="nn">Capnp_rpc_unix</span><span class="p">.</span><span class="n">with_cap_exn</span> <span class="n">sr</span> <span class="n">run_client</span>
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">Foo.get</code> is defined like this:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">Foo</span> <span class="o">=</span> <span class="nn">Api</span><span class="p">.</span><span class="nn">Client</span><span class="p">.</span><span class="nc">Foo</span>

<span class="k">let</span> <span class="n">get</span> <span class="n">t</span> <span class="n">name</span> <span class="o">=</span>
  <span class="k">let</span> <span class="k">open</span> <span class="nn">Foo</span><span class="p">.</span><span class="nc">Get</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">request</span><span class="o">,</span> <span class="n">params</span> <span class="o">=</span> <span class="nn">Capability</span><span class="p">.</span><span class="nn">Request</span><span class="p">.</span><span class="n">create</span> <span class="nn">Params</span><span class="p">.</span><span class="n">init_pointer</span> <span class="k">in</span>
  <span class="nn">Params</span><span class="p">.</span><span class="n">name_set</span> <span class="n">params</span> <span class="n">name</span><span class="p">;</span>
  <span class="nn">Capability</span><span class="p">.</span><span class="n">call_for_value_exn</span> <span class="n">t</span> <span class="n">method_id</span> <span class="n">request</span> <span class="o">|&gt;</span> <span class="nn">Results</span><span class="p">.</span><span class="n">reply_get</span>
</code></pre></div></div>

<p>Run the server application, passing it the location to save the private key and the interface/port to listen on.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>dune <span class="nb">exec</span> <span class="nt">--</span> ./server.exe <span class="nt">--capnp-secret-key-file</span> ./server.pem <span class="nt">--capnp-listen-address</span> tcp:127.0.0.1:7000
+Server running. Connect using <span class="s2">"echo.cap"</span><span class="nb">.</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">.cap</code> file looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>capnp://sha-256:f5BAo2n_2gVxUdkyzYsIuitpA1YT_7xFg31FIdNKVls@127.0.0.1:7000/6v45oIvGQ6noMaLOh5GHAJnGJPWEO5A3Qkt0Egke4Ic
</code></pre></div></div>

<p>In another window, invoke the client.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>dune <span class="nb">exec</span> <span class="nt">--</span> ./client.exe ./echo.cap
</code></pre></div></div>

<p>The full code is available on <a href="https://github.com/mtelvers/capnp-minimum">GitHub</a>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="capnpproto" /><category term="tunbury.org" /><summary type="html"><![CDATA[Cap’n Proto has become a hot topic recently and while this is used for many OCaml-CI services, I spent some time creating a minimal application.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/capnproto-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/capnproto-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Setup Tangled with Bluesky</title><link href="https://www.tunbury.org/2025/03/16/setup-tangled-with-bluesky/" rel="alternate" type="text/html" title="Setup Tangled with Bluesky" /><published>2025-03-16T00:00:00+00:00</published><updated>2025-03-16T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/16/setup-tangled-with-bluesky</id><content type="html" xml:base="https://www.tunbury.org/2025/03/16/setup-tangled-with-bluesky/"><![CDATA[<p>To set this up, I’m using a modified version of Anil’s <a href="https://tangled.sh/@anil.recoil.org/knot-docker">repo</a>. My repo is <a href="https://tangled.sh/@mtelvers.tunbury.org/knot-docker">here</a>. Firstly, clone the repo and run <code class="language-plaintext highlighter-rouge">gen-key.sh</code>.</p>

<p>Go to <a href="https://tangled.sh/login">https://tangled.sh/login</a> and click the <a href="https://bsky.app/settings/app-passwords">link</a> to generate an app password. Copy the created password, return to <a href="https://tangled.sh/login">https://tangled.sh/login</a>, and sign in using your handle and the newly created app password.</p>

<p>Go to <a href="https://tangled.sh/knots">https://tangled.sh/knots</a>, enter your knot hostname and click on generate key. Copy <code class="language-plaintext highlighter-rouge">knot.env.template</code> to <code class="language-plaintext highlighter-rouge">.env</code> and enter the key in <code class="language-plaintext highlighter-rouge">KNOT_SERVER_SECRET</code>. In the same file, also set the server name.</p>

<p>The original <code class="language-plaintext highlighter-rouge">Dockerfile</code> didn’t quite work for me, as <code class="language-plaintext highlighter-rouge">adduser -D</code> (from alpine/busybox) leads to a disabled user which cannot sign in, even over SSH. Instead, I generate a random password for the <code class="language-plaintext highlighter-rouge">git</code> user.  My diff looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-    adduser -D -u 1000 -G git -h /home/git git &amp;&amp; \
+    pw="$(head -c 20 /dev/urandom | base64 | head -c 10)" \
+    printf "$pw\n$pw\n" | \
+    adduser -u 1000 -G git -h /home/git git &amp;&amp; \
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">docker compose up -d</code> then check on <a href="https://tangled.sh/knots">https://tangled.sh/knots</a>. Click on initialize and wait for the process to complete.</p>

<p>Add a remote repo as normal:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git remote add knot git@git.tunbury.org:mtelvers.tunbury.org/pi-archimedes
</code></pre></div></div>
<p>Then push as you would to any other remote</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git push knot
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bluesky" /><category term="tunbury.org" /><summary type="html"><![CDATA[To set this up, I’m using a modified version of Anil’s repo. My repo is here. Firstly, clone the repo and run gen-key.sh.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/bluesky-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/bluesky-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Bluesky Personal Data Server (PDS)</title><link href="https://www.tunbury.org/2025/03/15/bluesky-pds/" rel="alternate" type="text/html" title="Bluesky Personal Data Server (PDS)" /><published>2025-03-15T00:00:00+00:00</published><updated>2025-03-15T00:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/15/bluesky-pds</id><content type="html" xml:base="https://www.tunbury.org/2025/03/15/bluesky-pds/"><![CDATA[<p>Today I have set up my own Bluesky Personal Data Server (PDS).</p>

<p>I followed the README at
<a href="https://github.com/bluesky-social/pds">https://github.com/bluesky-social/pds</a>
using an Ubuntu 22.04 VM.  The basic steps are:</p>

<ol>
  <li>Publish DNS records pointing to your machine.</li>
  <li>As root, run <a href="https://raw.githubusercontent.com/bluesky-social/pds/main/installer.sh">installer.sh</a>.</li>
  <li>Enter your email address and preferred handle.</li>
</ol>

<p>It wasn’t entirely obvious how to set your handle to be the same
as the domain name when you have something else already published
on the domain, such as your web server.</p>

<p><a href="https://github.com/bluesky-social/pds/issues/103">Issue #103</a> shows how this should be achieved.</p>

<ol>
  <li>Publish the DNS record for <code class="language-plaintext highlighter-rouge">pds.yourdomain.com</code>.</li>
  <li>Use <code class="language-plaintext highlighter-rouge">pds.yourdomain.com</code> during setup.</li>
  <li>At the final stage, where a handle is created, use <code class="language-plaintext highlighter-rouge">tmphandle.pds.yourdomain.com</code>.</li>
  <li>Change the handle to your preferred one via the Bluesky app.</li>
</ol>

<p>Log in using the custom server <code class="language-plaintext highlighter-rouge">pds.yourdomain.com</code> and the handle you created.</p>

<p>Next, go to Account &gt; Handle and select ‘I have my own domain’. Enter
the domain name that you want as your new handle; in
my case, <code class="language-plaintext highlighter-rouge">mtelvers.tunbury.org</code>. Then publish a DNS TXT record
for <code class="language-plaintext highlighter-rouge">_atproto.mtelvers.tunbury.org</code> containing your DID record,
<code class="language-plaintext highlighter-rouge">did=did:plc:5le6ofipuf6sdk6czluurgjc</code>.</p>
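In zone-file form, the record looks something like this (the hostname and DID value are from my setup; substitute your own):

```
_atproto.mtelvers.tunbury.org.  IN  TXT  "did=did:plc:5le6ofipuf6sdk6czluurgjc"
```

You can check that the record has propagated with <code class="language-plaintext highlighter-rouge">dig TXT _atproto.mtelvers.tunbury.org +short</code>.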

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Check service status      : sudo systemctl status pds
Watch service logs        : sudo docker logs -f pds
Backup service data       : /pds
PDS Admin command         : pdsadmin

To see pdsadmin commands, run "pdsadmin help"
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bluesky" /><category term="tunbury.org" /><summary type="html"><![CDATA[Today I have set up my own Bluesky (PDS) Personal Data Server.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/bluesky-logo.png" /><media:content medium="image" url="https://www.tunbury.org/images/bluesky-logo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Pi Day - Archimedes Method</title><link href="https://www.tunbury.org/2025/03/14/pi-day/" rel="alternate" type="text/html" title="Pi Day - Archimedes Method" /><published>2025-03-14T13:00:00+00:00</published><updated>2025-03-14T13:00:00+00:00</updated><id>https://www.tunbury.org/2025/03/14/pi-day</id><content type="html" xml:base="https://www.tunbury.org/2025/03/14/pi-day/"><![CDATA[<p>It’s <a href="https://en.wikipedia.org/wiki/Pi_Day">Pi Day</a> 2025</p>

<p>Archimedes calculated the perimeters of regular polygons inscribed
in a circle to approximate the value of π.</p>

<p>A square inscribed in a unit circle can be divided into four right
triangles with two sides of unit length, corresponding to the radius of
the circle.  The third side can be calculated by Pythagoras’ theorem to
be √2.  The perimeter of the square would be 4√2.  Given C = πd, we
can calculate π from the circumference by dividing it by the diameter,
2, giving 2√2.</p>

<p><img src="/images/pi-archimedes-triangle.png" alt="" /></p>

<p>CA, CD and CB are all the unit radius. AB is √2 as calculated above. The
angle ACB can be bisected with the line CD. EB is half of AB. Using
Pythagoras’ theorem on the triangle BCE we can calculate CE. DE is then
1 - CE, allowing us to use Pythagoras’ theorem for a final time on BDE to
calculate BD. The improved approximation of the perimeter is now 8 × BD.</p>

<p>We can iterate on this process using the following code:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">rec</span> <span class="n">pi</span> <span class="n">edge_squared</span> <span class="n">sides</span> <span class="o">=</span> <span class="k">function</span>
  <span class="o">|</span> <span class="mi">0</span> <span class="o">-&gt;</span> <span class="n">sides</span> <span class="o">*.</span> <span class="nn">Float</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">edge_squared</span><span class="p">)</span> <span class="o">/.</span> <span class="mi">2</span><span class="o">.</span>
  <span class="o">|</span> <span class="n">n</span> <span class="o">-&gt;</span>
    <span class="k">let</span> <span class="n">edge_squared</span> <span class="o">=</span> <span class="mi">2</span><span class="o">.</span> <span class="o">-.</span> <span class="mi">2</span><span class="o">.</span> <span class="o">*.</span> <span class="nn">Float</span><span class="p">.</span><span class="n">sqrt</span> <span class="p">(</span><span class="mi">1</span><span class="o">.</span> <span class="o">-.</span> <span class="n">edge_squared</span> <span class="o">/.</span> <span class="mi">4</span><span class="o">.</span><span class="p">)</span> <span class="k">in</span>
    <span class="k">let</span> <span class="n">sides</span> <span class="o">=</span> <span class="n">sides</span> <span class="o">*.</span> <span class="mi">2</span><span class="o">.</span> <span class="k">in</span>
    <span class="n">pi</span> <span class="n">edge_squared</span> <span class="n">sides</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>

<span class="k">let</span> <span class="n">approximation</span> <span class="o">=</span> <span class="n">pi</span> <span class="mi">2</span><span class="o">.</span> <span class="mi">4</span><span class="o">.</span> <span class="mi">13</span>
<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"pi %.31f</span><span class="se">\n</span><span class="s2">"</span> <span class="n">approximation</span>
</code></pre></div></div>

<p>I found this method quite interesting. Usually, as the number of
iterations increases, the approximation of π becomes more accurate,
with the delta between each step becoming smaller until the difference
is effectively zero (given the limited precision of the floating-point
calculation).  However, in this case, after 13 iterations the
approximation becomes worse!</p>
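<p>The loss of accuracy comes from catastrophic cancellation in the update step: as the squared edge length shrinks towards zero, 2 − 2√(1 − e/4) subtracts two nearly equal numbers. This small sketch (separate from the program above) makes the effect visible:</p>

```ocaml
(* Sketch of the cancellation in the recurrence: the update
   e' = 2 - 2*sqrt(1 - e/4) should tend to e/4 as e -> 0, but
   subtracting two nearly equal doubles destroys the low-order digits. *)
let update e = 2. -. 2. *. Float.sqrt (1. -. e /. 4.)

let () =
  List.iter
    (fun e ->
      let exact = e /. 4. in  (* leading term of the Taylor expansion *)
      Printf.printf "e = %g: computed %.17e (should be ~ %.17e)\n"
        e (update e) exact)
    [ 1e-6; 1e-10; 1e-14; 1e-16 ]
(* at e = 1e-16, 1. -. e /. 4. rounds to exactly 1.0, so the update
   collapses to 0, which is why the table bottoms out at iteration 28 *)
```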

<table>
  <thead>
    <tr>
      <th>iteration</th>
      <th>approximation</th>
      <th>% error</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>2.8284271247461902909492437174777</td>
      <td>9.968368</td>
    </tr>
    <tr>
      <td>1</td>
      <td>3.0614674589207178101446515938733</td>
      <td>2.550464</td>
    </tr>
    <tr>
      <td>2</td>
      <td>3.1214451522580528575190328410827</td>
      <td>0.641315</td>
    </tr>
    <tr>
      <td>3</td>
      <td>3.1365484905459406483885231864406</td>
      <td>0.160561</td>
    </tr>
    <tr>
      <td>4</td>
      <td>3.1403311569547391890466769837076</td>
      <td>0.040155</td>
    </tr>
    <tr>
      <td>5</td>
      <td>3.1412772509327568926096319046337</td>
      <td>0.010040</td>
    </tr>
    <tr>
      <td>6</td>
      <td>3.1415138011441454679584239784162</td>
      <td>0.002510</td>
    </tr>
    <tr>
      <td>7</td>
      <td>3.1415729403678827047485810908256</td>
      <td>0.000627</td>
    </tr>
    <tr>
      <td>8</td>
      <td>3.1415877252799608854161306226160</td>
      <td>0.000157</td>
    </tr>
    <tr>
      <td>9</td>
      <td>3.1415914215046352175875199463917</td>
      <td>0.000039</td>
    </tr>
    <tr>
      <td>10</td>
      <td>3.1415923456110768086091411532834</td>
      <td>0.000010</td>
    </tr>
    <tr>
      <td>11</td>
      <td>3.1415925765450043449789063743083</td>
      <td>0.000002</td>
    </tr>
    <tr>
      <td>12</td>
      <td>3.1415926334632482408437681442592</td>
      <td>0.000001</td>
    </tr>
    <tr>
      <td>13</td>
      <td>3.1415926548075892021927302266704</td>
      <td>-0.000000</td>
    </tr>
    <tr>
      <td>14</td>
      <td>3.1415926453212152935634549066890</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <td>15</td>
      <td>3.1415926073757196590463536267634</td>
      <td>0.000001</td>
    </tr>
    <tr>
      <td>16</td>
      <td>3.1415929109396727447744979144773</td>
      <td>-0.000008</td>
    </tr>
    <tr>
      <td>17</td>
      <td>3.1415941251951911006301543238806</td>
      <td>-0.000047</td>
    </tr>
    <tr>
      <td>18</td>
      <td>3.1415965537048196054570325941313</td>
      <td>-0.000124</td>
    </tr>
    <tr>
      <td>19</td>
      <td>3.1415965537048196054570325941313</td>
      <td>-0.000124</td>
    </tr>
    <tr>
      <td>20</td>
      <td>3.1416742650217575061333263874985</td>
      <td>-0.002598</td>
    </tr>
    <tr>
      <td>21</td>
      <td>3.1418296818892015309643284126651</td>
      <td>-0.007545</td>
    </tr>
    <tr>
      <td>22</td>
      <td>3.1424512724941338071005247911671</td>
      <td>-0.027331</td>
    </tr>
    <tr>
      <td>23</td>
      <td>3.1424512724941338071005247911671</td>
      <td>-0.027331</td>
    </tr>
    <tr>
      <td>24</td>
      <td>3.1622776601683795227870632515987</td>
      <td>-0.658424</td>
    </tr>
    <tr>
      <td>25</td>
      <td>3.1622776601683795227870632515987</td>
      <td>-0.658424</td>
    </tr>
    <tr>
      <td>26</td>
      <td>3.4641016151377543863532082468737</td>
      <td>-10.265779</td>
    </tr>
    <tr>
      <td>27</td>
      <td>4.0000000000000000000000000000000</td>
      <td>-27.323954</td>
    </tr>
    <tr>
      <td>28</td>
      <td>0.0000000000000000000000000000000</td>
      <td>100.000000</td>
    </tr>
  </tbody>
</table>

<p>Using the <a href="https://opam.ocaml.org/packages/decimal/">decimal</a> package
we can specify the floating-point precision we want, allowing us to
reach 100 decimal places in 165 iterations.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">open</span> <span class="nc">Decimal</span>

<span class="k">let</span> <span class="n">context</span> <span class="o">=</span> <span class="nn">Context</span><span class="p">.</span><span class="n">make</span> <span class="o">~</span><span class="n">prec</span><span class="o">:</span><span class="mi">200</span> <span class="bp">()</span>
<span class="k">let</span> <span class="n">two</span> <span class="o">=</span> <span class="n">of_int</span> <span class="mi">2</span>
<span class="k">let</span> <span class="n">four</span> <span class="o">=</span> <span class="n">of_int</span> <span class="mi">4</span>

<span class="k">let</span> <span class="k">rec</span> <span class="n">pi</span> <span class="n">edge_squared</span> <span class="n">sides</span> <span class="n">n</span> <span class="o">=</span>
  <span class="k">match</span> <span class="n">n</span> <span class="k">with</span>
  <span class="o">|</span> <span class="mi">0</span> <span class="o">-&gt;</span> <span class="n">mul</span> <span class="o">~</span><span class="n">context</span> <span class="n">sides</span> <span class="p">(</span><span class="n">div</span> <span class="o">~</span><span class="n">context</span> <span class="p">(</span><span class="n">sqrt</span> <span class="o">~</span><span class="n">context</span> <span class="n">edge_squared</span><span class="p">)</span> <span class="n">two</span><span class="p">)</span>
  <span class="o">|</span> <span class="n">n</span> <span class="o">-&gt;</span>
      <span class="k">let</span> <span class="n">edge_squared</span> <span class="o">=</span>
        <span class="n">sub</span> <span class="o">~</span><span class="n">context</span> <span class="n">two</span>
          <span class="p">(</span><span class="n">mul</span> <span class="o">~</span><span class="n">context</span> <span class="n">two</span>
             <span class="p">(</span><span class="n">sqrt</span> <span class="o">~</span><span class="n">context</span> <span class="p">(</span><span class="n">sub</span> <span class="o">~</span><span class="n">context</span> <span class="n">one</span> <span class="p">(</span><span class="n">div</span> <span class="o">~</span><span class="n">context</span> <span class="n">edge_squared</span> <span class="n">four</span><span class="p">))))</span>
      <span class="k">in</span>
      <span class="k">let</span> <span class="n">sides</span> <span class="o">=</span> <span class="n">mul</span> <span class="o">~</span><span class="n">context</span> <span class="n">sides</span> <span class="n">two</span> <span class="k">in</span>
      <span class="n">pi</span> <span class="n">edge_squared</span> <span class="n">sides</span> <span class="p">(</span><span class="nn">Int</span><span class="p">.</span><span class="n">pred</span> <span class="n">n</span><span class="p">)</span>

<span class="k">let</span> <span class="bp">()</span> <span class="o">=</span> <span class="n">pi</span> <span class="n">two</span> <span class="n">four</span> <span class="mi">165</span> <span class="o">|&gt;</span> <span class="n">to_string</span> <span class="o">~</span><span class="n">context</span> <span class="o">|&gt;</span> <span class="nn">Printf</span><span class="p">.</span><span class="n">printf</span> <span class="s2">"%s</span><span class="se">\n</span><span class="s2">"</span>
</code></pre></div></div>

<p>This code is available on <a href="https://github.com/mtelvers/pi-archimedes">GitHub</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="pi" /><category term="tunbury.org" /><summary type="html"><![CDATA[It’s Pi Day 2025]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pi.png" /><media:content medium="image" url="https://www.tunbury.org/images/pi.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Deepseek R1 on a Raspberry Pi</title><link href="https://www.tunbury.org/2025/03/12/deekseek-r1-on-raspberry-pi/" rel="alternate" type="text/html" title="Deepseek R1 on a Raspberry Pi" /><published>2025-03-12T20:15:00+00:00</published><updated>2025-03-12T20:15:00+00:00</updated><id>https://www.tunbury.org/2025/03/12/deekseek-r1-on-raspberry-pi</id><content type="html" xml:base="https://www.tunbury.org/2025/03/12/deekseek-r1-on-raspberry-pi/"><![CDATA[<p>I’ve heard a lot about Deepseek and wanted to try it for myself.</p>

<p>Using a Raspberry Pi 5 with 8GB of RAM and an NVMe, I installed Ollama:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-fsSL</span> https://ollama.com/install.sh | sh
</code></pre></div></div>

<p>Next, I downloaded the smallest Deepseek model</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ollama pull deepseek-r1:1.5b
</code></pre></div></div>

<p>And ran it using:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ollama run deepseek-r1:1.5b
</code></pre></div></div>

<p>Things went downhill after that, as I couldn’t get a complete response.
The Pi would lock up with no output on the display:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ollama run deepseek-r1:1.5b
&gt;&gt;&gt; what is the capital of england?
&lt;think&gt;
Okay, so I need to figure out whatclient_loop: send disconnect: Broken pipe
</code></pre></div></div>

<p>Switching from the Apple iPad charger I was using to an official Pi
power supply resolved this problem.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ollama run deepseek-r1:1.5b
&gt;&gt;&gt; what is the capital of england?
&lt;think&gt;

&lt;/think&gt;

The capital of England is London.

&gt;&gt;&gt;
</code></pre></div></div>

<p>However, it’s not so good with a more niche question:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ollama run deepseek-r1:1.5b
&gt;&gt;&gt; what is OCaml?
&lt;think&gt;

&lt;/think&gt;

OCaml is a high-level, multi-paradigm programming language that combines
features from several other languages and frameworks. It was created by
Martin Abadi, the founding father of Google, and has gained popularity
in both academic research and industrial applications.

...
</code></pre></div></div>

<p>How about an easier question?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ollama run deepseek-r1:1.5b
&gt;&gt;&gt; Who invented the C programming language?
&lt;think&gt;
Okay, so I need to figure out who invented the C programming
language. Hmm, I remember that C is a high-level programming language
used for system programming and embedded systems because it's efficient
and portable. But I'm not sure about its origin.

First, I'll think about other languages that are often mentioned
before C. For example, C was developed by Dennis Ritchie in the late
1960s. He wrote the header files, which made the language more readable
and maintainable. He probably worked for Bell Labs since that's where
Dennis is associated.

I wonder if there were any other contributors. Maybe someone else helped
develop parts of the language before Ritchie did everything. But I don't
recall any major alternatives to C being widely used before it became
dominant in certain areas like operating systems and embedded devices.

So, putting it all together, Dennis Ritchie is credited with inventing
C. He created a lot of header files that made the language structured
and easier to use, which helped it become very popular for its efficiency
and portability.
&lt;/think&gt;

The C programming language was invented by Dennis Ritchie, who worked at
Bell Labs in the late 1960s. His work led to the creation of significant
header files that enhanced readability and maintainability. Ritchie's
contributions made C a widely used language in areas like operating
systems and embedded systems.

**Answer:**

Dennis Ritchie invented the C programming language. He worked at Bell Labs
(now part of Tufts University) in the late 1960s, contributing to its
development by creating essential header files that made the language
more readable and maintainable. His work led to a widely popularized
language used for system programming and embedded systems.

**Step-by-Step Explanation:**

1. **Identify the Problem:** The question is about the origin of C,
an early high-level programming language.
2. **Recall Other Languages Before C:** Mention languages like Pascal,
Ada, and Modula-2 that were around before C.
3. **Consider Developers Known for C:** Dennis Ritchie stands out as a
key figure in C's development.
4. **Details about Ritchie’s Contributions:** He worked at Bell Labs
(now Tufts University) in the 1960s, creating header files to improve
C's structure and readability.
5. **Link to C Language Development:** This led to C becoming prominent
for system programming due to its efficiency and portability.

**Conclusion:**
Dennis Ritchie is credited with inventing C by developing header files
that enhanced the language’s usability and structure.
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberry-pi" /><category term="tunbury.org" /><summary type="html"><![CDATA[I’ve heard a lot about Deepseek and wanted to try it for myself.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pi5.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/pi5.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Arduino PWM Train Controller</title><link href="https://www.tunbury.org/2025/01/18/arduino-pwm-train-controller/" rel="alternate" type="text/html" title="Arduino PWM Train Controller" /><published>2025-01-18T15:15:00+00:00</published><updated>2025-01-18T15:15:00+00:00</updated><id>https://www.tunbury.org/2025/01/18/arduino-pwm-train-controller</id><content type="html" xml:base="https://www.tunbury.org/2025/01/18/arduino-pwm-train-controller/"><![CDATA[<h1 id="circuit">Circuit</h1>

<p><img src="/images/train-controller-diagram.png" alt="" /></p>

<h1 id="case">Case</h1>

<p>3D printable STL files are available for download: <a href="/images/train-controller.stl">STL files</a></p>

<p><img src="/images/train-controller-fusion-360.png" alt="" /></p>

<h1 id="arduino-code">Arduino Code</h1>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/*
 * Arduino Nano PWM Dual Train Controller
 * This sketch reads values from two potentiometers connected to A0 and A1
 * and uses these values to control the speed and direction of a motor via
 * an L298N motor driver. The motor speed is controlled using PWM signals
 * on pins D5 and D10, and the direction is controlled using digital signals
 * on pins D6, D7, D8, and D9.
 */

// Pin definitions
const int potLeftPin = A0;
const int potRightPin = A1;
const int enaPin = 10;
const int in1Pin = 9;
const int in2Pin = 8;
const int in3Pin = 7;
const int in4Pin = 6;
const int enbPin = 5;

void setup() {
  // Initialize serial communication
  Serial.begin(9600);

  // Set motor control pins as outputs
  pinMode(enbPin, OUTPUT);
  pinMode(enaPin, OUTPUT);
  pinMode(in1Pin, OUTPUT);
  pinMode(in2Pin, OUTPUT);
  pinMode(in3Pin, OUTPUT);
  pinMode(in4Pin, OUTPUT);
}

void loop() {
  // Read potentiometer values
  int potLeft = analogRead(potLeftPin);
  int potRight = analogRead(potRightPin);

  // Map potentiometer values to PWM range
  int pwmLeft = pow(potLeft - 512, 2) / 1024;
  int pwmRight = pow(potRight - 512, 2) / 1024;

  // Control motor speed and direction
  analogWrite(enaPin, pwmLeft);
  analogWrite(enbPin, pwmRight);

  // Set motor direction based on potentiometer values
  if (potLeft &lt; 512) {
    digitalWrite(in1Pin, LOW);
    digitalWrite(in2Pin, HIGH);
  } else {
    digitalWrite(in1Pin, HIGH);
    digitalWrite(in2Pin, LOW);
  }

  if (potRight &lt; 512) {
    digitalWrite(in3Pin, LOW);
    digitalWrite(in4Pin, HIGH);
  } else {
    digitalWrite(in3Pin, HIGH);
    digitalWrite(in4Pin, LOW);
  }

  // Print values to serial monitor for debugging
  Serial.print("potLeft: ");
  Serial.print(potLeft);
  Serial.print(" PWMLeft: ");
  Serial.print(pwmLeft);
  Serial.print(" potRight: ");
  Serial.print(potRight);
  Serial.print(" PWMRight: ");
  Serial.println(pwmRight);

  // Small delay to stabilize readings
  delay(100);
}
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="3d-printing" /><category term="tunbury.org" /><summary type="html"><![CDATA[Circuit]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/train-controller-photo.png" /><media:content medium="image" url="https://www.tunbury.org/images/train-controller-photo.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">3d Printed Train</title><link href="https://www.tunbury.org/2023/08/08/3d-printed-train/" rel="alternate" type="text/html" title="3d Printed Train" /><published>2023-08-08T07:41:29+00:00</published><updated>2023-08-08T07:41:29+00:00</updated><id>https://www.tunbury.org/2023/08/08/3d-printed-train</id><content type="html" xml:base="https://www.tunbury.org/2023/08/08/3d-printed-train/"><![CDATA[<p>Creating a new OO train body drawn from scratch in Fusion 360 to mimic
the original damaged version.</p>

<h1 id="early-versions">Early versions</h1>

<p><img src="/images/IMG_1919.jpg" alt="" />
<img src="/images/IMG_1918.jpg" alt="" /></p>

<h1 id="printed-with-tree-support">Printed with tree support</h1>

<p><img src="/images/IMG_1917.jpg" alt="" /></p>

<h1 id="finished">Finished</h1>

<p><img src="/images/IMG_1920.jpg" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="3d-printing" /><category term="tunbury.org" /><summary type="html"><![CDATA[Creating a new OO train body drawn from scratch in Fusion 360 to mimic the original damaged version.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/IMG_1920.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/IMG_1920.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Foot Operated Timer</title><link href="https://www.tunbury.org/2021/09/04/foot-operated-timer/" rel="alternate" type="text/html" title="Foot Operated Timer" /><published>2021-09-04T19:41:29+00:00</published><updated>2021-09-04T19:41:29+00:00</updated><id>https://www.tunbury.org/2021/09/04/foot-operated-timer</id><content type="html" xml:base="https://www.tunbury.org/2021/09/04/foot-operated-timer/"><![CDATA[<p>At the end of a quarter peal there is always the question of how long it took and whether anyone really noted the start time.  Mike proposed a foot operated timer.</p>

<p>I wanted the display to be large enough that it can be seen while standing and I choose this <a href="https://www.amazon.co.uk/gp/product/B08BC8JY8T/">MAX7219 dot matrix display from Amazon</a>.  This turned out to be a bit of a bad purchase but more on that later.</p>

<p>Using <a href="https://www.festi.info/boxes.py/">boxes.py</a>, I created a basic box just large enough to accommodate the display, battery, on/off switch and foot switch, then modified the design in Adobe Illustrator to shorten the top and add a <em>shelf</em> for the display to sit on.</p>

<p><img src="/images/foot-operated-timer-net.png" alt="net" /></p>

<p>This was cut on the laser cutter.</p>

<p><img src="/images/foot-operated-timer-laser-cutting.jpg" alt="net" /></p>

<p>When assembling the electronics it became apparent that it would have been better to have a slightly taller box, but rather than waste the materials I decided to mount the Arduino upside down thereby fitting in a height of 12mm.</p>

<p><img src="/images/foot-operated-timer-arduino.jpg" alt="Arduino" /></p>

<p>The DS3231 real time clock module was modified by bending the pins to fit in with the vero board spacing.  Ultimately the battery holder was also removed to save space.</p>

<p><img src="/images/foot-operated-timer-clock-module.jpg" alt="DS3231" /></p>

<p>The vero board was drilled to cut the tracks.</p>

<p><img src="/images/foot-operated-timer-vero-board.jpg" alt="Vero Board" /></p>

<p><img src="/images/foot-operated-timer-assembly.jpg" alt="Vero Board" /></p>

<p>After the initial assembly, the unit was tested on battery for the first time.  This showed that it didn’t actually run on batteries: the code just crashed randomly after the display was initialised.  Reading this <a href="https://arduinoplusplus.wordpress.com/2015/09/12/max7219-and-led-matrix-power-requirements/">post</a> online, I found the problem lay with cheap display units!</p>

<blockquote>
  <p>Most of the cheap generic modules have very low values for RSET, which would significantly increase the power/current required by the module. This seems to be 10kΩ for the eBay specials, for a segment current exceeding 40mA, the specified minimum value for RSET in Table 11 being 11.8kΩ for VLED = 2V.</p>
</blockquote>

<p>The full data sheet is available from <a href="https://datasheets.maximintegrated.com/en/ds/MAX7219-MAX7221.pdf">Maxim</a></p>

<p>I had some 100kΩ surface-mount resistors in 0603 format left over from another project.  These were smaller than the 0805-format resistors fitted, but they were relatively easy to change.  Fortunately, these fixed the problem.</p>

<p>As an afterthought, a voltage divider was added to pin A0 to measure the battery voltage.</p>

<p><img src="/images/foot-operated-timer-voltage-divider.jpg" alt="Vero Board" /></p>

<p>I wired the I2C bus from the Arduino to the DS3231 and the square wave output from the DS3231 to pin 2 on the Arduino.  Pin 3 was connected to the push button.  On the Arduino Nano only pin 2 and 3 can be used for interrupts.  This configuration gave lots of options when it came to the code which wasn’t actually written yet!</p>

<p><img src="/images/foot-operated-timer-electronics.jpg" alt="Electrionics" /></p>

<p>Assembling the rest of the box was straightforward, although a bit fiddly.</p>

<p><img src="/images/foot-operated-timer-off.jpg" alt="Finished project" /></p>

<p>The code is available on <a href="https://github.com/mtelvers/foot-timer">GitHub</a></p>

<p><img src="/images/foot-operated-timer.jpg" alt="Finished project running" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="arduino" /><category term="tunbury.org" /><summary type="html"><![CDATA[At the end of a quarter peal there is always the question of how long it took and whether anyone really noted the start time. Mike proposed a foot operated timer.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/foot-operated-timer.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/foot-operated-timer.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Arduino Gas Sensor</title><link href="https://www.tunbury.org/2021/08/29/arduino-gas-sensor/" rel="alternate" type="text/html" title="Arduino Gas Sensor" /><published>2021-08-29T19:41:29+00:00</published><updated>2021-08-29T19:41:29+00:00</updated><id>https://www.tunbury.org/2021/08/29/arduino-gas-sensor</id><content type="html" xml:base="https://www.tunbury.org/2021/08/29/arduino-gas-sensor/"><![CDATA[<p>With the current emphasis on ventilation to reduce the risks associated with inhaled droplets, I have put together a simple gas sensor to record concentrations over time.  The output is a <code class="language-plaintext highlighter-rouge">CSV</code> file which can be graphed in Excel.</p>

<p>I used an Arduino Nano for this project, which imposed some serious memory constraints on the code, particularly as I needed libraries for the real-time clock, SD card and OLED display.</p>

<p>The modules used are:</p>
<ul>
  <li><a href="https://www.amazon.co.uk/dp/B072BMYZ18/ref=cm_sw_em_r_mt_dp_dl_WPWV0XM72DEW1A4HBDGE?_encoding=UTF8&amp;psc=1">Arduino Nano</a></li>
  <li><a href="https://www.amazon.co.uk/dp/B07BRFL7V7/ref=cm_sw_em_r_mt_dp_K5YWV6VZJJRT1D4WF9VJ?_encoding=UTF8&amp;psc=1">DS3231 Real time clock</a></li>
  <li><a href="https://www.amazon.co.uk/dp/B01L9GC470/ref=cm_sw_em_r_mt_dp_QQ8BPJQJP4G62QVRSNS3">SSD1306 OLED display</a></li>
  <li><a href="https://www.amazon.co.uk/dp/B077MB17JB/ref=cm_sw_em_r_mt_dp_WYZQY0ZZKJRPV83WH8R3">SD card reader</a></li>
  <li><a href="https://www.amazon.co.uk/dp/B07CYYB82F/ref=cm_sw_em_r_mt_dp_9S4XZ9QD8NBH1V6M7HV5">Gas sensor</a></li>
</ul>

<h2 id="hardware-connections">Hardware Connections</h2>

<p>I used a veroboard to assemble the circuit as follows</p>
<ol>
  <li>Scatter the modules around the board and solder all VCC and GND pins</li>
  <li>On the Arduino Nano, pins A4 and A5 are used for the Inter-Integrated Circuit (I2C) bus
    <ul>
      <li>Connect SDA (A4 on Nano) to the display and clock module’s SDA pin</li>
      <li>Connect SCL (A5 on Nano) to the display and clock module’s SCL pin</li>
    </ul>
  </li>
</ol>

<blockquote>
  <p>At this point, the clock and display module can be tested and the time set on the clock.</p>
</blockquote>

<ol>
  <li>Connect the A0 output from the gas sensor to the A0 pin on the Arduino</li>
</ol>

<blockquote>
  <p>Reading from A0 returns an integer between 0 and 1023 representing a gas concentration between 200 - 10000 ppm</p>
</blockquote>
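<p>For illustration, a simple linear mapping from the raw reading to a ppm figure might look like the helper below.  The MQ-2 response is actually non-linear, so treat this as a rough sketch rather than a calibration; the <code class="language-plaintext highlighter-rouge">approxPpm</code> function is my own illustration and is not part of the sketch on GitHub.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

// Rough linear interpolation: ADC 0..1023 maps onto 200..10000 ppm.
// The real MQ-2 curve is non-linear; this is only an approximation.
uint16_t approxPpm(uint16_t adc) {
  return 200 + (uint32_t)adc * (10000 - 200) / 1023;
}
</code></pre></div></div>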

<ol>
  <li>The SD card uses the Serial Peripheral Interface (SPI) and requires 4 connections
    <ul>
      <li>Nano D10 to CS on the SD card module</li>
      <li>Nano D11 to MOSI on the SD card module</li>
      <li>Nano D12 to MISO on the SD card module</li>
      <li>Nano D13 to SCK on the SD card module</li>
    </ul>
  </li>
</ol>

<p>With the wiring complete, load the Arduino sketch from my <a href="https://github.com/mtelvers/Arduino-MQ2/blob/113a2348ce65966b738dc55d9ddace36824ec49f/mq2.ino">GitHub page</a>.</p>

<h2 id="software-overview">Software Overview</h2>

<p>After the basic library initialization, the code creates two 64-element arrays to store the samples taken each second and the averages of those samples calculated each minute.  These arrays hold the latest sample in the first position, so before a new value is added all the other values are shifted down by one.  There would certainly be more efficient ways of handling this, but with a small number of values this simple approach is workable.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define SAMPLES 64
uint16_t historySeconds[SAMPLES];
uint16_t historyMinutes[SAMPLES];
</code></pre></div></div>
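<p>The shift-down insert can be seen in isolation below; it mirrors the <code class="language-plaintext highlighter-rouge">memmove</code> calls used later in the sketch.  The <code class="language-plaintext highlighter-rouge">pushSample</code> wrapper is just for illustration — the sketch itself inlines this logic.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

#define SAMPLES 64

// Shift every element down one place and store the newest value first,
// discarding the oldest value off the end of the array.
void pushSample(uint16_t *history, uint16_t value) {
  memmove(history + 1, history, (SAMPLES - 1) * sizeof(uint16_t));
  history[0] = value;
}
</code></pre></div></div>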

<p>The <em>main</em> loop of the program remembers the number of seconds on the clock in the variable <code class="language-plaintext highlighter-rouge">lastS</code> and waits for it to change, thus running the inner code once per second:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int lastS = -1;

void loop(void) {
  DateTime dt = RTClib::now();

  if (lastS != dt.second()) {
    lastS = dt.second();

  // Inner code here runs once each second

  }
  delay(250);
}
</code></pre></div></div>

<p>The inner code clears the display,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>u8x8.clear();
u8x8.setCursor(0, 0);
</code></pre></div></div>

<p>and then writes the date</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>toString(tmp, dt.year() - 2000, dt.month(), dt.day(), '-');
u8x8.println(tmp);
</code></pre></div></div>
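<p><code class="language-plaintext highlighter-rouge">toString</code> is a small helper defined in the sketch on GitHub; judging by how it is called, it formats three two-digit values joined by a separator, something like the version below.  This is a guess at its behaviour, not the actual implementation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;

// Assumed behaviour of the sketch's toString helper: three two-digit
// fields joined by a separator, e.g. (21, 8, 29, '-') gives "21-08-29".
void toString(char *buf, int a, int b, int c, char sep) {
  sprintf(buf, "%02d%c%02d%c%02d", a, sep, b, sep, c);
}
</code></pre></div></div>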

<p>If the time has just rolled over to a new minute (i.e. number of seconds is 0), take an average of the <em>seconds</em> samples and store that as the minute average.  Finally, open a file named with the current date.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (dt.second() == 0) {
  unsigned long total = 0;
  for (int h = 0; h &lt; SAMPLES; h++)
    total += historySeconds[h];
  memmove(historyMinutes + 1, historyMinutes, (SAMPLES - 1) * sizeof(uint16_t));
  historyMinutes[0] = total / SAMPLES;
  strcat(tmp, ".csv");
  txtFile = SD.open(tmp, FILE_WRITE);
}
</code></pre></div></div>

<p>Read the next gas value and store it</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uint16_t gasVal = analogRead(0);
memmove(historySeconds + 1, historySeconds, (SAMPLES - 1) * sizeof(uint16_t));
historySeconds[0] = gasVal;
</code></pre></div></div>

<p>Display the current time</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>toString(tmp, dt.hour(), dt.minute(), dt.second(), ':');
u8x8.println(tmp);
</code></pre></div></div>

<p>If there’s a file open, write the time to the file</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (txtFile) {
  strcat(tmp, ",");
  txtFile.print(tmp);
}
</code></pre></div></div>

<p>Display the gas value</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>itoa(gasVal, tmp, 10);
u8x8.println(tmp);
</code></pre></div></div>

<p>And similarly, if there is a file open, write the current value to the file and close it</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (txtFile) {
  txtFile.println(tmp);
  txtFile.close();
}
</code></pre></div></div>

<p>Lastly, draw two graphs of the current samples</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>drawGraph(8, 3, historySeconds);
drawGraph(8, 7, historyMinutes);
</code></pre></div></div>

<p>The graphs were tricky to draw as the slimmed-down U8x8 version of the <a href="https://github.com/olikraus/u8g2">U8g2</a> library doesn’t provide any drawing functions.  However, you can create and display a custom font glyph.  This mess of nested loops creates thirty-two 8 by 8 pixel glyphs to display a bar graph of 64 values with a maximum <em>y</em> value of 32.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void drawGraph(uint8_t col, uint8_t row, uint16_t *values) {
  uint8_t tmp[8];
  for (uint8_t r = 0; r &lt; 4; r++) {
    for (uint8_t h = 0; h &lt; SAMPLES; h += 8) {
      for (uint8_t i = 0; i &lt; 8; i++) {
        int x = values[SAMPLES - h - 1 - i] / 16;
        x -= 8 * r;
        tmp[i] = 0;
        for (uint8_t b = 0; b &lt; 8 &amp;&amp; x &gt; 0; b++, x--) {
          if (x) {
            tmp[i] |= (1 &lt;&lt; (7 - b));
          }
        }
      }
      u8x8.drawTile(col + h / 8, row - r, 1, tmp);
    }
  }
}
</code></pre></div></div>
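<p>As a concrete check of the bit loop, the column byte for the bottom row of tiles (<code class="language-plaintext highlighter-rouge">r = 0</code>) can be computed for a single sample; this is the inner loop above extracted into a standalone function for illustration.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

// One 8-pixel column of the bar for tile row r = 0: scale the sample
// down to 0..63 pixels and set one bit per remaining pixel, as in
// drawGraph's inner loop.
uint8_t columnByte(uint16_t value) {
  int x = value / 16;
  uint8_t col = 0;
  for (uint8_t b = 0; b &lt; 8 &amp;&amp; x &gt; 0; b++, x--)
    col |= (uint8_t)(1 &lt;&lt; (7 - b));
  return col;
}
</code></pre></div></div>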

<p>The graph below shows the recording during morning ringing and during the quarter peal in the afternoon (plus some messing around blowing directly into the sensor at the end).  Windows open as usual!</p>

<p><img src="/images/sample-values-recorded.png" alt="Graph" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="arduino" /><summary type="html"><![CDATA[With the current emphasis on ventilation to reduce the risks associated with inhaled droplets, I have put together a simple gas sensor to record concentrations over time. The output is a CSV file which can be graphed in Excel.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/gas-sensor.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/gas-sensor.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ratchet Adapter</title><link href="https://www.tunbury.org/2021/08/16/ratchet-adapter/" rel="alternate" type="text/html" title="Ratchet Adapter" /><published>2021-08-16T19:41:29+00:00</published><updated>2021-08-16T19:41:29+00:00</updated><id>https://www.tunbury.org/2021/08/16/ratchet-adapter</id><content type="html" xml:base="https://www.tunbury.org/2021/08/16/ratchet-adapter/"><![CDATA[<p>I want to electrically drive this ratchet mechanism to avoid the manual labour of turning it by hand.  I found a motor with a 1600:1 gearbox on eBay (shipping from China of course) which looks perfect; however, it has a 10mm diameter keyed output shaft which doesn’t nicely couple to my 3/4” square ratchet shaft.</p>

<p><img src="/images/ratchet-with-pipe.png" alt="Ratchet with pipe" /></p>

<p>From the photo it is clear that a 1” steel tube fits reasonably well over the shaft.  A wooden plug and a little bit of brute force provided a flat surface which was pre-drilled and a flange screwed on.</p>

<p><img src="/images/wooden-block.png" alt="Wooden block version" /></p>

<p>This worked fairly well except that the grub screw on the flange was insufficient to withstand the forces required.  Therefore a keyway was cut into the flange to prevent slipping.</p>

<p><img src="/images/flang-key-1.png" alt="Flange with keyway" /></p>

<p>And a key was made to fit.</p>

<p><img src="/images/flang-key-2.png" alt="Flange with key" /></p>

<p>This worked very well, but unfortunately about two years later things took a nasty turn. One of the screws snapped and others were about to pull out.</p>

<p><img src="/images/wear-and-tear.png" alt="Wear and tear" /></p>

<p>Taking the 1” tube and turning it sideways gave a metal surface onto which the flange could be bolted.  Cutting a hole in the bottom side of the tube would accommodate the 3/4” ratchet shaft.</p>

<p><img src="/images/ratchet-connector-with-cutout.png" alt="Pipe with holes and cutout" /></p>

<p>And with the flange in place it looks ready for use.</p>

<p><img src="/images/ratchet-connector-flang.png" alt="Flange in place" /></p>

<p>Hopefully this will last a little longer this time.</p>

<p><img src="/images/in-operation.png" alt="Ready for operation" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><category term="obs" /><summary type="html"><![CDATA[I want to electrically drive this ratchet mechanism to avoid the manual labour of turning it by hand. I found a motor with a 1600:1 gearbox on eBay (shipping from China of course) which looks perfect; however, it has a 10mm diameter keyed output shaft which doesn’t nicely couple to my 3/4” square ratchet shaft.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/in-operation.png" /><media:content medium="image" url="https://www.tunbury.org/images/in-operation.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Audio Stream from a Raspberry PI</title><link href="https://www.tunbury.org/2021/07/27/audio-stream/" rel="alternate" type="text/html" title="Audio Stream from a Raspberry PI" /><published>2021-07-27T19:41:29+00:00</published><updated>2021-07-27T19:41:29+00:00</updated><id>https://www.tunbury.org/2021/07/27/audio-stream</id><content type="html" xml:base="https://www.tunbury.org/2021/07/27/audio-stream/"><![CDATA[<p>Now singing has returned to churches, I need to add an additional microphone to pick up the choir.  I’d like this to be completely separate from the Church PA system to avoid playing this sound out through the speakers.  A Raspberry PI Zero W with a USB sound card looks to be a good option to capture the audio and stream it to OBS.</p>

<p>Run <code class="language-plaintext highlighter-rouge">arecord -l</code> to get a list of available mixer devices.  In my case my USB audio device is #2.</p>

<p>Set the mixer level for the microphone:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>amixer -c 2 -q set 'Mic',0 100%
</code></pre></div></div>

<p>Install <code class="language-plaintext highlighter-rouge">ffmpeg</code>, which pulls down around 750MB on a lite installation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt install ffmpeg
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">ffmpeg</code> to create the stream specifying the mixer device name as the input <code class="language-plaintext highlighter-rouge">-i</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ffmpeg -ar 44100 -ac 1 -f alsa -i plughw:2,0 -f wav -listen 1 tcp://0.0.0.0:5002
</code></pre></div></div>

<p>You can play this stream in VideoLAN’s VLC via <em>Open Network Stream</em> <code class="language-plaintext highlighter-rouge">tcp/wav://192.168.1.104:5002</code> where 192.168.1.104 is the IP address of the PI.</p>

<p>In OBS, create a new Media Source, set the network buffer to zero (to avoid excessive delay), and turn off <em>Restart playback when source becomes active</em>, which keeps the stream alive even when it’s not the active scene:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tcp://192.168.1.104:5002
</code></pre></div></div>

<p>Wrap the ffmpeg command as a service by creating <code class="language-plaintext highlighter-rouge">/etc/systemd/system/stream.service</code> containing</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Unit]
Description=auto start stream
After=multi-user.target

[Service]
Type=simple
ExecStartPre=/usr/bin/amixer -c 2 -q set 'Mic',0 100%
ExecStart=/usr/bin/ffmpeg -ar 44100 -ac 1 -f alsa -i plughw:2,0 -f wav -listen 1 tcp://0.0.0.0:5002
User=pi
WorkingDirectory=/home/pi
Restart=always

[Install]
WantedBy=multi-user.target
</code></pre></div></div>

<p>Enable and start the service as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl enable stream
service stream start
</code></pre></div></div>

<h2 id="practical-issues">Practical Issues</h2>

<p>After successfully testing a Raspberry PI Zero W with a USB audio dongle over WiFi at a distance of 30m in an empty church, I decided to use it as a secondary device in a live broadcast.  This was immediately scuppered on the day as I was unable to maintain the WiFi link.  I put this down to interference created by the in-house PA system, induction loop, and the mobile phones of the congregation.</p>

<p>I added a UFL connector to the Pi Zero W as described by <a href="https://www.briandorey.com/post/raspberry-pi-zero-w-external-antenna-mod">Brian Dorey</a>.  Using this with a 5dB D-Link antenna did marginally increase the signal level and quality of most networks but not sufficiently to make the difference.</p>

<h3 id="internal-antenna">Internal antenna</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pi@raspberrypi:~ $ sudo iwlist wlan0 scan | grep 'Cell\|Signal' | sed '$!N;s/\n/ /'
          Cell 01 - Address: 6C:xx:xx:xx:xx:10                     Quality=69/70  Signal level=-41 dBm  
          Cell 02 - Address: 5C:xx:xx:xx:xx:9E                     Quality=26/70  Signal level=-84 dBm  
          Cell 03 - Address: 5E:xx:xx:xx:xx:9F                     Quality=27/70  Signal level=-83 dBm  
          Cell 04 - Address: 9C:xx:xx:xx:xx:62                     Quality=35/70  Signal level=-75 dBm  
          Cell 05 - Address: 78:xx:xx:xx:xx:8E                     Quality=21/70  Signal level=-89 dBm  
          Cell 06 - Address: 9C:xx:xx:xx:xx:72                     Quality=37/70  Signal level=-73 dBm  
          Cell 07 - Address: 80:xx:xx:xx:xx:6A                     Quality=17/70  Signal level=-93 dBm  
</code></pre></div></div>

<h3 id="external-antenna">External antenna</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pi@raspberrypi:~ $ sudo iwlist wlan0 scan | grep 'Cell\|Signal' | sed '$!N;s/\n/ /'
          Cell 01 - Address: 6C:xx:xx:xx:xx:10                     Quality=70/70  Signal level=-29 dBm  
          Cell 02 - Address: 5C:xx:xx:xx:xx:9E                     Quality=22/70  Signal level=-88 dBm  
          Cell 03 - Address: 5E:xx:xx:xx:xx:9F                     Quality=23/70  Signal level=-87 dBm  
          Cell 04 - Address: 9C:xx:xx:xx:xx:62                     Quality=41/70  Signal level=-69 dBm  
          Cell 05 - Address: 78:xx:xx:xx:xx:8E                     Quality=30/70  Signal level=-80 dBm  
          Cell 06 - Address: 9C:xx:xx:xx:xx:72                     Quality=41/70  Signal level=-69 dBm  
          Cell 07 - Address: 80:xx:xx:xx:xx:6A                     Quality=24/70  Signal level=-86 dBm  
</code></pre></div></div>

<p>Switching to a Raspberry PI 3 gave easy access to an Ethernet port without resorting to a USB hub.  Following that there were no further connection issues!</p>

<p><code class="language-plaintext highlighter-rouge">FFMPEG</code> can also create an MP3 stream rather than a WAV stream by simply changing the output format <code class="language-plaintext highlighter-rouge">-f mp3</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/bin/ffmpeg -ar 44100 -ac 1 -f alsa -i plughw:2,0 -f mp3 -listen 1 tcp://0.0.0.0:5002
</code></pre></div></div>

<p>The Raspberry PI 3 didn’t really have sufficient processing capacity to keep up with the MP3 encoding.  Switching to MP2, <code class="language-plaintext highlighter-rouge">-f mp2</code>, reduced the processor requirement significantly with no noticeable change in quality.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><category term="obs" /><summary type="html"><![CDATA[Now singing has returned to churches, I need to add an additional microphone to pick up the choir. I’d like this to be completely separate from the Church PA system to avoid playing this sound out through the speakers. A Raspberry PI Zero W with a USB sound card looks to be a good option to capture the audio and stream it to OBS.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pi-zerow-usb-audio.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/pi-zerow-usb-audio.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Jitsi</title><link href="https://www.tunbury.org/2021/07/14/jitsis/" rel="alternate" type="text/html" title="Jitsi" /><published>2021-07-14T19:41:29+00:00</published><updated>2021-07-14T19:41:29+00:00</updated><id>https://www.tunbury.org/2021/07/14/jitsis</id><content type="html" xml:base="https://www.tunbury.org/2021/07/14/jitsis/"><![CDATA[<p>I need to remotely control OBS during a live stream.  This is quite simply achieved via VNC but I need to see and hear what’s going on at the same time.  VNC doesn’t support audio on the free license and watching the YouTube stream is out of the question as it’s nearly 30 seconds behind real time.</p>

<p>As the computer has a USB web camera and microphone attached I thought of a private LAN based v/c solution.  A quick Internet search found a <a href="https://www.reddit.com/r/sysadmin/comments/gmray4/recommendation_for_free_lanonly_video/">post on Reddit</a> talking about Jitsi.</p>

<p>After installing an Ubuntu 20.04 server VM, I followed the Jitsi <a href="https://jitsi.github.io/handbook/docs/devops-guide/devops-guide-quickstart">Self-Hosting Guide</a> which takes just a few minutes.  Since it was a private LAN implementation I skipped the optional FQDN section of the instructions and used the self-signed certificate.</p>

<p>Connecting to the DHCP-assigned address over https brought the expected certificate warnings, but I was able to create and join a room.  The camera and microphone did not start.  Every 30 seconds or so this message appeared about reconnecting:</p>

<p><img src="/images/jitsi-disconnected.png" alt="Jitsi Disconnected" /></p>

<p>The fix to this was to use a host name not an IP address.  On Windows machines edit <code class="language-plaintext highlighter-rouge">C:\Windows\System32\Drivers\etc\hosts</code> and on a Mac edit <code class="language-plaintext highlighter-rouge">/etc/hosts</code>.  In both cases I added the DHCP issued IP address and hostname of the Ubuntu server:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>192.168.1.76	jitsi
</code></pre></div></div>

<p>Connecting to Jitsi using <a href="https://jitsi">https://jitsi</a> and skipping past the certificate warnings brought me to a working implementation.  Certainly impressive and easy to set up!</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Jitsi" /><category term="Ubuntu" /><summary type="html"><![CDATA[I need to remotely control OBS during a live stream. This is quite simply achieved via VNC but I need to see and hear what’s going on at the same time. VNC doesn’t support audio on the free license and watching the YouTube stream is out of the question as it’s nearly 30 seconds behind real time.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/jitsi-logo-blue-grey-text.png" /><media:content medium="image" url="https://www.tunbury.org/images/jitsi-logo-blue-grey-text.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Syncthing on OpenBSD</title><link href="https://www.tunbury.org/2021/06/22/syncthing-on-openbsd/" rel="alternate" type="text/html" title="Syncthing on OpenBSD" /><published>2021-06-22T19:41:29+00:00</published><updated>2021-06-22T19:41:29+00:00</updated><id>https://www.tunbury.org/2021/06/22/syncthing-on-openbsd</id><content type="html" xml:base="https://www.tunbury.org/2021/06/22/syncthing-on-openbsd/"><![CDATA[<h2 id="network-installation-of-openbsd">Network Installation of OpenBSD</h2>

<p>Set up a machine to facilitate network installation of OpenBSD.  Download the 6.9 installation ISO from the <a href="https://www.openbsd.org/faq/faq4.html#Download">OpenBSD website</a> and install it in a virtual machine.  I’m using VMware Fusion and have a dedicated LAN port connected to the remote machine.</p>

<p>Create <code class="language-plaintext highlighter-rouge">hostname.vic0</code> containing the following static configuration rather than <code class="language-plaintext highlighter-rouge">dhcp</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>inet 192.168.2.1 255.255.255.0 NONE
</code></pre></div></div>

<h3 id="dhcpd">DHCPD</h3>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/dhcpd.conf</code> with the key attributes:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">filename</code> for the boot image name, and</li>
  <li><code class="language-plaintext highlighter-rouge">next-server</code> for the TFTP server address.</li>
</ul>

<p>I have added a host section for the specific MAC of my machine but for this one-time build process it could be a global option.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>subnet 192.168.2.0 netmask 255.255.255.0 {
    option routers 192.168.2.1;
    range 192.168.2.32 192.168.2.127;
    
    host mini-itx {
        hardware ethernet 00:40:63:d5:6f:4f;
        filename "auto_install";
        next-server 192.168.2.1;
        option host-name "mini-itx";
    }
}
</code></pre></div></div>

<h3 id="tftpd">TFTPD</h3>

<p>Create the default TFTP root folder and configuration folder</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p /tftpboot/etc
</code></pre></div></div>

<p>Download <a href="http://ftp.openbsd.org/pub/OpenBSD/6.9/i386/pxeboot">pxeboot</a> and <a href="http://ftp.openbsd.org/pub/OpenBSD/6.9/i386/bsd.rd">bsd.rd</a> and put them in <code class="language-plaintext highlighter-rouge">/tftpboot</code>.</p>

<p>Create a symbolic link for <code class="language-plaintext highlighter-rouge">auto_install</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ln -s pxeboot /tftpboot/auto_install
</code></pre></div></div>

<p>Create <code class="language-plaintext highlighter-rouge">/tftpboot/etc/boot.conf</code> containing the following</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>boot tftp:/bsd.rd
</code></pre></div></div>

<h3 id="httpd">HTTPD</h3>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/httpd.conf</code> to share the folder <code class="language-plaintext highlighter-rouge">/var/www/htdocs</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#[ MACROS ]
ext_ip = "*"

# [ GLOBAL CONFIGURATION ]
# none

# [ SERVERS ]
server "default" {
    listen on $ext_ip port 80
    root "/htdocs"
}

# [ TYPES ]
types {
    include "/usr/share/misc/mime.types"
}
</code></pre></div></div>

<p>Stage the installation files on a local web server by copying them from the boot ISO downloaded at the start:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount /dev/cd0a /mnt/
mkdir -p /var/www/htdocs/pub/OpenBSD
cp -rv /mnt/6.9/ /var/www/htdocs/pub/OpenBSD/6.9
ls -l /var/www/htdocs/pub/OpenBSD/6.9 &gt; /var/www/htdocs/pub/OpenBSD/6.9/index.txt
</code></pre></div></div>

<p>Create <code class="language-plaintext highlighter-rouge">/var/www/htdocs/install.conf</code> containing the following automatic configuration answer file</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Password for root = Password
Setup a user = user
Password for user = Password
Public ssh key for user = ssh-rsa AAAA...ZV user@Marks-Mac-mini.local
Which disk is the root disk = wd0
What timezone are you in = Europe/London
Unable to connect using https. Use http instead = yes
Location of sets = http
HTTP Server = 192.168.2.1
Set name(s) = -all bsd* base* etc* man* site* comp*
Continue without verification = yes
</code></pre></div></div>

<p>Enable the services using <code class="language-plaintext highlighter-rouge">rcctl</code>, which edits the configuration file <code class="language-plaintext highlighter-rouge">rc.conf.local</code> to add the appropriate <code class="language-plaintext highlighter-rouge">service_flags=""</code> lines</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rcctl enable dhcpd
rcctl enable tftpd
rcctl enable httpd
</code></pre></div></div>

<p>The remote system should now boot from the network and install OpenBSD hands free!</p>

<p>After the new system boots <code class="language-plaintext highlighter-rouge">su</code> and then overwrite <code class="language-plaintext highlighter-rouge">/etc/installurl</code> with a standard value</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>echo https://ftp.openbsd.org/pub/OpenBSD &gt; /etc/installurl
</code></pre></div></div>

<h2 id="raid5-volume">RAID5 Volume</h2>

<p>Create a RAID5 volume over the four attached disks</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for a in sd0 sd1 sd2 sd3 ; do fdisk -iy $a ; done
for a in sd0 sd1 sd2 sd3 ; do printf "a\n\n\n\nRAID\nw\nq\n" | disklabel -E $a ; done
bioctl -c 5 -l /dev/sd0a,/dev/sd1a,/dev/sd2a,/dev/sd3a softraid0
</code></pre></div></div>

<p>Partition and format the volume</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fdisk -iy sd4
printf "a\n\n\n\n4.2BSD\nw\nq\n" | disklabel -E sd4
newfs /dev/rsd4a 
</code></pre></div></div>

<h2 id="syncthing">Syncthing</h2>

<p>Install <code class="language-plaintext highlighter-rouge">syncthing</code> using</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pkg_add syncthing
</code></pre></div></div>

<p>Edit <code class="language-plaintext highlighter-rouge">/etc/login.conf</code> and append:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>syncthing:\
        :openfiles-max=60000:\
        :tc=daemon:
</code></pre></div></div>

<p>Rebuild the file</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cap_mkdb /etc/login.conf
echo "kern.maxfiles=80000" &gt;&gt; /etc/sysctl.conf
</code></pre></div></div>

<p>Edit <code class="language-plaintext highlighter-rouge">/etc/rc.d/syncthing</code> and update the <code class="language-plaintext highlighter-rouge">daemon_flags</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>daemon_flags="-no-browser -gui-address=0.0.0.0:8384"
</code></pre></div></div>

<p>Edit <code class="language-plaintext highlighter-rouge">/etc/fstab</code> to mount the drive, then set the ownership of the mount point</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/dev/sd4a /var/syncthing ffs rw,softdep 0 0
chown -R _syncthing:_syncthing /var/syncthing
</code></pre></div></div>

<p>Enable and start syncthing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rcctl enable syncthing
rcctl start syncthing
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="Syncthing" /><category term="OpenBSD" /><summary type="html"><![CDATA[Network Installation of OpenBSD]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/openbsd-syncthing.png" /><media:content medium="image" url="https://www.tunbury.org/images/openbsd-syncthing.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">WordPress to Jekyll Test</title><link href="https://www.tunbury.org/2021/05/25/wordpress-to-jekyll-test/" rel="alternate" type="text/html" title="WordPress to Jekyll Test" /><published>2021-05-25T12:41:29+00:00</published><updated>2021-05-25T12:41:29+00:00</updated><id>https://www.tunbury.org/2021/05/25/wordpress-to-jekyll-test</id><content type="html" xml:base="https://www.tunbury.org/2021/05/25/wordpress-to-jekyll-test/"><![CDATA[<p>Install the WordPress plugin <em>UpdraftPlus</em> on the original site and take a backup.  Create a new WordPress site, install the <em>UpdraftPlus</em> plugin, and restore the database.</p>

<p>Use the following MySQL commands to fix the database</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>UPDATE wp_options SET option_value = replace(option_value, 'cccbr.org.uk', 'cccbr.tunbury.org') WHERE option_name = 'home' OR option_name = 'siteurl';
UPDATE wp_posts SET guid = replace(guid, 'cccbr.org.uk','cccbr.tunbury.org');
UPDATE wp_posts SET post_content = replace(post_content, 'cccbr.org.uk', 'cccbr.tunbury.org');
UPDATE wp_postmeta SET meta_value = replace(meta_value,'cccbr.org.uk','cccbr.tunbury.org');
</code></pre></div></div>

<p>Set user password (mainly to make it different from the original site)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>UPDATE `wp_users` SET `user_pass`= MD5('yourpassword') WHERE `user_login`='melvers';
</code></pre></div></div>

<p>Install <em>Jekyll Exporter</em> plugin, activate it and then create the export using Tools -&gt; Export to Jekyll.</p>

<p>Create a new Jekyll site by running</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jekyll new c:\cccbr
</code></pre></div></div>

<p>Extract <code class="language-plaintext highlighter-rouge">jekyll-export.zip</code> into the <code class="language-plaintext highlighter-rouge">c:\cccbr</code> folder, but don’t overwrite <code class="language-plaintext highlighter-rouge">_config.yml</code>, then serve the site:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jekyll serve
</code></pre></div></div>

<p>Visit <a href="http://localhost:4000">http://localhost:4000</a> to see how it looks.</p>

<p>Tidy up the exported Markdown with the following PowerShell, which fixes a few HTML entities, repoints the upload URLs, and switches the layouts to <code class="language-plaintext highlighter-rouge">single</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$mdFiles = Get-ChildItem . *.md -rec
foreach ($file in $mdFiles) {
    (Get-Content $file.PSPath) |
    Foreach-Object { $_ -replace "&amp;#8211;", "-" } |
    Foreach-Object { $_ -replace "&amp;#038;", "&amp;" } |
    Foreach-Object { $_ -replace "&amp;#8217;", "&amp;apos;" } |
    Foreach-Object { $_ -replace "cccbr.tunbury.org/wp-content/uploads/", "cccbr.org.uk/wp-content/uploads/" } |
    Foreach-Object { $_ -replace "cccbr.tunbury.org/", "/" } |
    Foreach-Object { $_ -replace "layout: page", "layout: single" } |
    Foreach-Object { $_ -replace "layout: post", "layout: single" } |
    Set-Content $file.PSPath
}
</code></pre></div></div>
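
<p>On a Linux or macOS box the same clean-up could be done with <code class="language-plaintext highlighter-rouge">sed</code>; a sketch of two of the substitutions, shown on a sample line (note the literal ampersand must be escaped in the replacement):</p>

```shell
# Replace the HTML entities left behind by the WordPress export
echo 'Rules &#8211; Bells &#038; Ringing' \
  | sed -e 's/&#8211;/-/g' -e 's/&#038;/\&/g'
# -> Rules - Bells & Ringing
```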

<p>Edit <code class="language-plaintext highlighter-rouge">Gemfile</code> to use the new theme by commenting out <code class="language-plaintext highlighter-rouge">minima</code> and adding <code class="language-plaintext highlighter-rouge">minimal-mistakes-jekyll</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># gem "minima", "~&gt; 2.5"
gem "minimal-mistakes-jekyll"
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">bundle</code> in the folder to download the dependencies.  Edit <code class="language-plaintext highlighter-rouge">_config.yml</code> and set the theme:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>theme: minimal-mistakes-jekyll
</code></pre></div></div>

<p>Create the top level menu by creating <code class="language-plaintext highlighter-rouge">_data/navigation.yml</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>main:
  - title: "About"
    url: /about
  - title: "Bells and Ringing"
    url: /bellringing
</code></pre></div></div>

<p>Create secondary menus in the same <code class="language-plaintext highlighter-rouge">_data/navigation.yml</code> file, such as:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>about:
  - title: About
    children:
      - title: "About the Council"
        url: /about
      - title: "Continuing CCCBR Reforms"
        url: /about/reforms/
      - title: "Governance"
        url: /about/governance/
</code></pre></div></div>

<p>Then on the appropriate pages set the front matter:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sidebar:
  nav: "about"
toc: true
</code></pre></div></div>

<p>Create a custom skin by duplicating and renaming a file in <code class="language-plaintext highlighter-rouge">_sass\minimal-mistakes\skins</code>.  I created <code class="language-plaintext highlighter-rouge">cccbr.scss</code> and then in <code class="language-plaintext highlighter-rouge">_config.yml</code> applied the skin like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>theme: minimal-mistakes-jekyll
minimal_mistakes_skin: "cccbr"
</code></pre></div></div>

<p>Create a repository on GitHub.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git init
git add .
git commit -m "initial commit"
git remote add origin https://github.com/mtelvers/cccbr.git
git push -u origin master
</code></pre></div></div>

<p>On GitHub, under the repository’s Settings -&gt; Pages, publish the site using the master branch.</p>

<p>Changes to make it work on GitHub:</p>

<ol>
  <li>Update <code class="language-plaintext highlighter-rouge">Gemfile</code> and then run <code class="language-plaintext highlighter-rouge">bundle</code>.</li>
  <li>Update all the posts and pages to use the <code class="language-plaintext highlighter-rouge">single</code> template.</li>
  <li>Update <code class="language-plaintext highlighter-rouge">_config.yml</code> to set baseurl to match the Git repository name.</li>
  <li>Update <code class="language-plaintext highlighter-rouge">_config.yml</code> to change the remote theme.</li>
</ol>

<p>Remove unwanted front matter tags with this Ruby script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "yaml"

YAML_FRONT_MATTER_REGEXP = /\A(---\s*\n.*?\n?)^((---|\.\.\.)\s*$\n?)/m

Dir.glob('**/*.md', File::FNM_DOTMATCH) do |f|
    puts f

    file = File.open(f)
    source = file.read
    file.close

    if source =~ YAML_FRONT_MATTER_REGEXP
        data, content = YAML.load($1), Regexp.last_match.post_match
        ["id", "guid",
        "ep_tilt_migration",
        "classic-editor-remember",
        "ssb_old_counts",
        "ssb_total_counts",
        "ssb_cache_timestamp",
        "colormag_page_layout",
        "wp_featherlight_disable",
        "catchbox-sidebarlayout",
        "complete_open_graph"].each {|x| data.delete(x)}

        file = File.open(f, "w")
        YAML.dump(data, file)
        file.puts("---", content)
        file.close
    end
end
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="jekyll" /><category term="wordpress" /><summary type="html"><![CDATA[Install the Wordpress plugins UpdraftPlus. Create a new WordPress site and install the UpdraftPlus plugin and restore the database.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/wordpress-to-jekyll.png" /><media:content medium="image" url="https://www.tunbury.org/images/wordpress-to-jekyll.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Mini ITX as Windows 2008 Server</title><link href="https://www.tunbury.org/2021/04/28/mini-itx-as-windows-2008-server/" rel="alternate" type="text/html" title="Mini ITX as Windows 2008 Server" /><published>2021-04-28T12:41:29+00:00</published><updated>2021-04-28T12:41:29+00:00</updated><id>https://www.tunbury.org/2021/04/28/mini-itx-as-windows-2008-server</id><content type="html" xml:base="https://www.tunbury.org/2021/04/28/mini-itx-as-windows-2008-server/"><![CDATA[<p>Unfortunately without a DVD drive and with no capability to boot from USB I’m struggling to get a clean OS on my Mini ITX machine. The internal drive is IDE and I don’t have any other machines with IDE around and I don’t know the password for the installed OS.</p>

<p>Install Windows 2008 x86 Server (with GUI) in a VM</p>

<p>Turn on Remote Desktop and turn off the firewall</p>

<p>Add Windows Server role WDS and AD DS</p>

<p>Set static IP address 192.168.10.10/24 DNS 127.0.0.1</p>

<p>Set local administrator password to a complex password</p>

<p>Run <code class="language-plaintext highlighter-rouge">dcpromo</code>, set domain to montdor.local.</p>

<p>Install DHCP and follow the wizard to create a scope 192.168.10.128–192.168.10.254. DNS 192.168.10.10. No router.</p>

<p>Configure WDS using the wizard</p>

<ul>
  <li>Do not listen on port 67</li>
  <li>Configure DHCP option 60</li>
  <li>Respond to all clients</li>
</ul>

<p>Switch to the Windows AIK for Windows 7 ISO <code class="language-plaintext highlighter-rouge">KB3AIK_EN.ISO</code> and install Windows Automated Installation Kit (to get Windows PE)</p>

<p>In WDS, add the WinPE boot WIM as a boot image. The WIM is in <code class="language-plaintext highlighter-rouge">C:\Program Files\Windows AIK\Tools\PETools\x86\winpe.wim</code></p>

<p>Copy the Windows 2008 Server Standard x86 DVD to <code class="language-plaintext highlighter-rouge">c:\Win2K8x86</code>. Create a share of the same name.</p>

<p>Windows 2008 Server installation requires 512MB of RAM but my computer only has 256MB and only reports 248 after the video RAM is subtracted.</p>

<p>Hack the Windows setup program to make it run anyway:</p>

<p>Find the file <code class="language-plaintext highlighter-rouge">WINSETUP.DLL</code> in the sources folder and using as hex editor such as <a href="http://mh-nexus.de/en/hxd/">HxD</a>, search for the hex string <code class="language-plaintext highlighter-rouge">77 07 3D 78 01</code> and replace it with <code class="language-plaintext highlighter-rouge">E9 04 00 00 00</code>.</p>
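
<p>For the record, the same five-byte patch can be scripted with standard Unix tools; a sketch demonstrated on a stand-in file rather than the real <code class="language-plaintext highlighter-rouge">WINSETUP.DLL</code>:</p>

```shell
# Patch 77 07 3D 78 01 -> E9 04 00 00 00 via a hex round-trip.
# tr joins xxd's output lines so the pattern can't straddle a line break.
printf '\x00\x11\x77\x07\x3d\x78\x01\x22' > demo.bin
xxd -p demo.bin | tr -d '\n' | sed 's/77073d7801/e904000000/' | xxd -r -p > patched.bin
xxd -p patched.bin   # -> 0011e90400000022
```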

<p>It turns out Windows really does need 512MB of RAM: setup fails with error <code class="language-plaintext highlighter-rouge">0xE0000100</code>, caused by insufficient memory. Therefore, create a partition and then a swap file.</p>

<p>Open <code class="language-plaintext highlighter-rouge">DISKPART.EXE</code> and run the following to create a working drive:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT DISK 0
CLEAN
CREATE PART PRIMARY
SELECT VOLUME 0
ASSIGN
FORMAT FS=NTFS QUICK
</code></pre></div></div>

<p>Create a paging file</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wpeutil createpagefile /path:c=\pf.sys
</code></pre></div></div>

<p>Now run Windows Setup.</p>

<p>Download the Sil3124 driver for Windows 7 x86. Copy it to a network share, mount it from the Windows 2008 Server, and run:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pnputil -i -a *.inf
</code></pre></div></div>

<p>Then use DISKPART.EXE again, similar to above</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT DISK 1
CREATE PART PRI
SELECT VOLUME 1
ASSIGN
FORMAT FS=NTFS QUICK
</code></pre></div></div>

<p>Now we need Windows Updates I suppose</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cscript c:\windows\system32\scregedit.wsf /au 4
net stop wuauserv
net start wuauserv
wuauclt /detectnow
</code></pre></div></div>

<p>Enable Remote Desktop with</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cscript c:\windows\system32\scregedit.wsf /ar 0
</code></pre></div></div>

<p>Create a share</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>net share sharename=d:\share /grant:everyone,full
</code></pre></div></div>

<p>Make it visible</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>netsh firewall set service fileandprint enable
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><category term="obs" /><summary type="html"><![CDATA[Unfortunately without a DVD drive and with no capability to boot from USB I’m struggling to get a clean OS on my Mini ITX machine. The internal drive is IDE and I don’t have any other machines with IDE around and I don’t know the password for the installed OS.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/via-cpu.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/via-cpu.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Raspberry PI Camera with M12 Lens</title><link href="https://www.tunbury.org/2021/01/06/raspberry-pi-camera-with-m12-lens/" rel="alternate" type="text/html" title="Raspberry PI Camera with M12 Lens" /><published>2021-01-06T12:41:29+00:00</published><updated>2021-01-06T12:41:29+00:00</updated><id>https://www.tunbury.org/2021/01/06/raspberry-pi-camera-with-m12-lens</id><content type="html" xml:base="https://www.tunbury.org/2021/01/06/raspberry-pi-camera-with-m12-lens/"><![CDATA[<p>I really need a good lens on my Raspberry PI camera to use it with OBS from a decent distance.  The new high resolution Raspberry PI cameras look excellent but they also come with a hefty price tag which I just can’t justify.</p>

<blockquote>
  <p>First off, the mounting holes on both v1 and v2 RPi cameras are on 21 mm centers, so the 20 mm spacing of the M12 mount you link isn’t a perfect fit. Depending on your mounting screw size, you may still be able to force it. Second, you have to manually cut or file down a notch in the M12 mount for the micro-flex cable that comes out of the camera module. That isn’t too hard, but if you want, there is also a M12 mount specifically designed for the RPi cameras, with a notch already.</p>

  <p>The v1 and v2 sensor sizes are the same, the so-called 1/4-inch format. On V1 the lens focal length is f=3.6mm with Angle of View: 54 x 41 degrees and on V2 it is f=3.0mm with Angle of View: 62.2 x 48.8 degrees [1]. Note the angle of view is quoted at full-frame; remember some video modes use a cropped subset of the full frame. This is a moderately wide angle lens. If you double the focal length, you’ll get half the field of view. If you get a 8mm lens that’s a moderate telephoto, and a 16mm lens is definitely telephoto. I’ve tried a number of cheap M12 lenses that work “ok” but don’t expect perfectly sharp images with the tiny 1.4 or 1.1 micron pixels these camera sensors use. Lower f-number lenses are “faster” (let in more light) but will have more shallow depth of field and more blurry overall. You will see f/1.4 or lower sold for use in low light, but I have not had good images with those; I would recommend f/2.0 or above if you want decent resolution.</p>

  <p><a href="https://www.raspberrypi.org/forums/viewtopic.php?t=150344#p988445">https://www.raspberrypi.org/forums/viewtopic.php?t=150344#p988445</a></p>
</blockquote>

<p>With that as the inspiration I bought a pack of ten M12 lens adapters from Amazon for £5 and started out by creating a notch for the cable. While the 20mm spacing wasn’t ideal I have found some variation in hole positions on the PCB and by using thin M2 bolts I was able to <em>force</em> them.</p>

<p>I removed the lens in a rather destructive way from the front of the camera by cutting around the raised area on three sides with a craft knife. It wasn’t pretty but it did the job.</p>

<p><img src="/images/pi-camera-m12-1.jpg" alt="" /></p>

<p>On the first camera I modified I went on to remove the IR filter by gently cutting it across the diagonal with side cutters. Surprisingly it popped off without too much effort leaving this.</p>

<p><img src="/images/pi-camera-m12-2.jpg" alt="" /></p>

<p>For my application, removing the IR filter was a mistake as (tungsten) lights and candles produce lots of infrared!</p>

<p>I mounted the M12 adapters on 3mm plywood with short M2 bolt screwed in from the front.</p>

<p><img src="/images/pi-camera-m12-3.jpg" alt="" /></p>

<p><img src="/images/pi-camera-m12-4.jpg" alt="" /></p>

<p>I had an old Foscam WiFi camera which has an M12 lens marked as <em>f=2.8mm</em>. This pretty much matched the field of view I got from the camera’s native lens.</p>

<p>I have had good results with <em>f=8mm</em>, <em>f=15mm</em> and <em>f=25mm</em> lenses, as well as a cheap zoom lens offering a range of <em>f=3mm</em> to <em>f=12mm</em>. It’s curious that on Amazon a focal length of 8mm is typically sold as <em>wide angle</em> rather than telephoto! What I really notice is that the depth of field becomes increasingly narrow as the focal length increases.</p>

<p>I installed Raspberry Pi OS Lite using the Pi Imager and enabled SSH before removing the SD card.</p>

<p>After assembling the unit, check that the camera is connected and enabled with <code class="language-plaintext highlighter-rouge">vcgencmd get_camera</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>supported=1 detected=1
</code></pre></div></div>
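
<p>The same check can be scripted; a sketch that parses the <code class="language-plaintext highlighter-rouge">detected=</code> field, using a stand-in string instead of calling <code class="language-plaintext highlighter-rouge">vcgencmd</code> directly:</p>

```shell
# Stand-in for OUT=$(vcgencmd get_camera)
OUT='supported=1 detected=1'
DETECTED=${OUT##*detected=}   # strip everything up to and including "detected="
if [ "$DETECTED" = "1" ]; then echo "camera present"; fi
```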

<p><code class="language-plaintext highlighter-rouge">raspivid</code> can be configured to send an h.264 stream, but it exits when the connection drops. Therefore, I have wrapped <code class="language-plaintext highlighter-rouge">raspivid</code> in a service so that systemd will restart it each time.</p>

<p>Create <code class="language-plaintext highlighter-rouge">/etc/systemd/system/stream.service</code> containing</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Unit]
Description=auto start stream
After=multi-user.target

[Service]
Type=simple
ExecStart=/usr/bin/raspivid -v -fps 30 -md 2 -n -ih -t 0 -l -stm -fl -o tcp://0.0.0.0:5001
User=pi
WorkingDirectory=/home/pi
Restart=always

[Install]
WantedBy=multi-user.target
</code></pre></div></div>

<p>Enable and start the service as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl enable stream
service stream start
</code></pre></div></div>

<p>You can open the stream with VLC by using the address <code class="language-plaintext highlighter-rouge">tcp/h264://192.168.1.88:5001</code> which is useful for testing.</p>

<p>Finally, in OBS add a media source <code class="language-plaintext highlighter-rouge">tcp://192.168.0.88:5001</code>.</p>

<p><img src="/images/2_8mm.png" alt="" title="f=2.8mm" />
<img src="/images/8mm.png" alt="" title="f=8mm" />
<img src="/images/16mm.png" alt="" title="f=16mm" />
<img src="/images/22mm.png" alt="" title="f=22mm" /></p>

<h1 id="parts-list">Parts list</h1>

<table>
  <thead>
    <tr>
      <th>Part</th>
      <th>Cost</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="https://www.amazon.co.uk/Raspberry-Pi-Model-Quad-Motherboard/dp/B01CD5VC92">Pi 3B</a></td>
      <td>£34</td>
    </tr>
    <tr>
      <td><a href="https://www.amazon.co.uk/gp/product/B07WCGY2QY/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1">PoE Splitter - 2 pack</a></td>
      <td>£17</td>
    </tr>
    <tr>
      <td><a href="https://www.amazon.co.uk/gp/product/B07ZZ2K7WP/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1">5MP Camera Module - 2 pack</a></td>
      <td>£9</td>
    </tr>
    <tr>
      <td><a href="https://www.amazon.co.uk/gp/product/B08FDVYC98/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1">Zoom lens</a></td>
      <td>£10</td>
    </tr>
    <tr>
      <td><a href="https://www.amazon.co.uk/gp/product/B00R1J42T8/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&amp;psc=1">M12 Mount - 10 pack</a></td>
      <td>£5</td>
    </tr>
    <tr>
      <td><a href="https://www.amazon.co.uk/gp/product/B075QMCYZM/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1">3mm plywood - 25 pack</a></td>
      <td>£24</td>
    </tr>
    <tr>
      <td><a href="https://www.amazon.co.uk/gp/product/B003WIRFD2/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1">SD Card</a></td>
      <td>£3.70</td>
    </tr>
  </tbody>
</table>

<p>A single camera would cost £62.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><category term="obs" /><summary type="html"><![CDATA[I really need a good lens on my Raspberry PI camera to use it with OBS from a decent distance. The new high resolution Raspberry PI cameras look excellent but they also come with a hefty price tag which I just can’t justify.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pi-camera-m12-2.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/pi-camera-m12-2.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Normalise MP3 Files</title><link href="https://www.tunbury.org/2021/01/01/normalise-mp3-files/" rel="alternate" type="text/html" title="Normalise MP3 Files" /><published>2021-01-01T12:41:29+00:00</published><updated>2021-01-01T12:41:29+00:00</updated><id>https://www.tunbury.org/2021/01/01/normalise-mp3-files</id><content type="html" xml:base="https://www.tunbury.org/2021/01/01/normalise-mp3-files/"><![CDATA[<p>I have hundreds of MP3 files but the levels aren’t standardised in any way which makes streaming them a bit hit and miss.  I can normalise them using <a href="https://www.audacityteam.org/">Audacity</a> but I’d really like an automatic way of doing it.</p>

<p>Install MP3GAIN</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt install mp3gain
</code></pre></div></div>

<p>It doesn’t seem to run for some reason as it can’t find the library.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>==617==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
</code></pre></div></div>

<p>Set <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export LD_PRELOAD=/usr/lib/arm-linux-gnueabihf/libasan.so.4
</code></pre></div></div>

<p>Now it works!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mp3gain -e -c -r *.mp3
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><summary type="html"><![CDATA[I have hundreds for MP3 files but the levels aren’t standardised in any way which makes streaming them a bit hit and miss. I can normalise them using AudaCity but I’d really like an automatic way of doing it.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/mp3gain.png" /><media:content medium="image" url="https://www.tunbury.org/images/mp3gain.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">TEMPer USB Temperature Sensor</title><link href="https://www.tunbury.org/2020/12/26/temper-usb-temperature-sensor/" rel="alternate" type="text/html" title="TEMPer USB Temperature Sensor" /><published>2020-12-26T12:41:29+00:00</published><updated>2020-12-26T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/12/26/temper-usb-temperature-sensor</id><content type="html" xml:base="https://www.tunbury.org/2020/12/26/temper-usb-temperature-sensor/"><![CDATA[<p>These USB sensors are available pretty cheaply from PiHut and Amazon and
are great for monitoring the temperature remotely (where you have a Pi).</p>

<p>Install the necessary prerequisites:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt install libhidapi-dev/stable cmake bc
</code></pre></div></div>

<p>There is a <a href="https://github.com/edorfaus/TEMPered">GitHub repository by Frode Austvik</a>:</p>

<blockquote>
  <p>This project is a C implementation of a library and program to read all the
various types of TEMPer thermometer and hygrometer USB devices, as produced by
RDing Technology and sold under the name PCsensor.</p>
</blockquote>

<p>Download the software</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/edorfaus/TEMPered
</code></pre></div></div>

<p>And build it and install:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd TEMPered
cmake .
make
sudo cp utils/hid-query /usr/bin
</code></pre></div></div>

<p>Create a simple script to query the device and display the temperature.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
OUTLINE=$(/usr/bin/hid-query /dev/hidraw1 0x01 0x80 0x33 0x01 0x00 0x00 0x00 0x00 | grep -A1 ^Response | tail -1)
OUTNUM=$(echo $OUTLINE | sed -e 's/^[^0-9a-f]*[0-9a-f][0-9a-f] [0-9a-f][0-9a-f] \([0-9a-f][0-9a-f]\) \([0-9a-f][0-9a-f]\) .*$/0x\1\2/')
HEX4=${OUTNUM:2:4}
DVAL=$(( 16#$HEX4 ))
CTEMP=$(bc &lt;&lt;&lt; "scale=2; $DVAL/100")
echo $(date) $CTEMP
</code></pre></div></div>

<p>This works perfectly but it must be executed with <code class="language-plaintext highlighter-rouge">sudo</code>, or by first
running <code class="language-plaintext highlighter-rouge">chmod 666 /dev/hidraw1</code>. This can be automated by creating
<code class="language-plaintext highlighter-rouge">/etc/udev/rules.d/99-hidraw.rules</code> with the content below, which creates
the <code class="language-plaintext highlighter-rouge">/dev</code> node with the appropriate permissions.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>KERNEL=="hidraw*", SUBSYSTEM=="hidraw", MODE="0666", GROUP="root"
</code></pre></div></div>

<p>I’ve added a cron job (<code class="language-plaintext highlighter-rouge">crontab -e</code>) to record the temperature every 5
minutes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/pi/temp.sh &gt;&gt; /home/pi/temperature.txt
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><summary type="html"><![CDATA[These USB sensors are available pretty cheaply from PiHut and Amazon and are great for monitoring the temperature remotely (where you have a Pi).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/USB-Thermometer.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/USB-Thermometer.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Hard disk failure</title><link href="https://www.tunbury.org/2020/10/05/hard-disk-failure/" rel="alternate" type="text/html" title="Hard disk failure" /><published>2020-10-05T12:41:29+00:00</published><updated>2020-10-05T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/10/05/hard-disk-failure</id><content type="html" xml:base="https://www.tunbury.org/2020/10/05/hard-disk-failure/"><![CDATA[<p>Check the status with <code class="language-plaintext highlighter-rouge">sudo mdadm --detail /dev/md0</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/dev/md0:
           Version : 1.2
     Creation Time : Wed Sep  2 21:55:39 2015
        Raid Level : raid5
        Array Size : 878509056 (837.81 GiB 899.59 GB)
     Used Dev Size : 292836352 (279.27 GiB 299.86 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Sun Oct  4 07:35:23 2020
             State : clean, degraded 
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 1
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : resync

              Name : plum:0  (local to host plum)
              UUID : 4a462153:dde89a43:0a4dd678:451bb2b4
            Events : 24024

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       5       8       49        2      active sync   /dev/sdd1
       -       0        0        3      removed

       4       8       65        -      faulty   /dev/sde1
</code></pre></div></div>
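
<p>As an aside, the reported sizes are consistent: RAID5 keeps one disk’s worth of parity, so the usable space is (n - 1) times the per-device size:</p>

```shell
# RAID5 usable capacity = (devices - 1) * per-device size,
# using the Used Dev Size of 292836352 KiB across 4 devices
echo $(( (4 - 1) * 292836352 ))   # -> 878509056, matching the Array Size
```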

<p>Check which disks are which <code class="language-plaintext highlighter-rouge">sudo lshw -class disk</code>.</p>

<table>
  <thead>
    <tr>
      <th>Mount</th>
      <th>Model</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>/dev/sdb</td>
      <td>ST9300603SS</td>
      <td>Seagate Savvio 10 K.3 St9300603ss</td>
    </tr>
    <tr>
      <td> </td>
      <td>MBE2073RC</td>
      <td>Fujitsu MBE2073RC 73.5GB SAS Hard Drive</td>
    </tr>
    <tr>
      <td> </td>
      <td>MBE2073RC</td>
      <td>Fujitsu MBE2073RC 73.5GB SAS Hard Drive</td>
    </tr>
    <tr>
      <td>/dev/sdc</td>
      <td>ST9300603SS</td>
      <td>Seagate Savvio 10 K.3 St9300603ss</td>
    </tr>
    <tr>
      <td>/dev/sdd</td>
      <td>ST300MM0006</td>
      <td>Seagate Enterprise Performance 10K HDD ST300MM0006 300 GB</td>
    </tr>
    <tr>
      <td>/dev/sde</td>
      <td>ST9300603SS</td>
      <td>Seagate Savvio 10 K.3 St9300603ss</td>
    </tr>
  </tbody>
</table>

<p>The boot drive is a hardware RAID1 using the two 73GB disks. <code class="language-plaintext highlighter-rouge">/var</code> is made up of the 300GB disks in a software RAID5 configuration.</p>

<p>The ST9300603SS is still available on Amazon, but the newer 10k.5-generation equivalent, the ST9300605SS, is available on same-day delivery and it’s cheaper as well!</p>

<p>Remove the disk</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm -r /dev/md0 /dev/sde1
</code></pre></div></div>

<p>This server does support hot plug, but there were some zombie processes which I wanted to clear out, and operationally a five-minute outage would be fine.</p>

<p>Shut down the server and replace the disk.  The new disk (slot 2) shown during boot:</p>

<p><img src="/images/perc-bios.jpg" alt="" /></p>

<p>After the reboot, copy the partition table from one of the existing disks to the new disk.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sfdisk -d /dev/sdb | sfdisk /dev/sde
</code></pre></div></div>

<p>Add the new disk into the array</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm /dev/md0 -a /dev/sde1
</code></pre></div></div>

<p>Monitor the rebuild process</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>watch -n 60 cat /proc/mdstat
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ubuntu" /><summary type="html"><![CDATA[Check the status with sudo mdadm --detail /dev/md0]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/savvio-10k-sas-disks.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/savvio-10k-sas-disks.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Raspberry PI as RTSP source for OBS using VLC</title><link href="https://www.tunbury.org/2020/08/29/raspberry-pi-as-rtsp-source-for-obs-using-vlc/" rel="alternate" type="text/html" title="Raspberry PI as RTSP source for OBS using VLC" /><published>2020-08-29T12:41:29+00:00</published><updated>2020-08-29T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/08/29/raspberry-pi-as-rtsp-source-for-obs-using-vlc</id><content type="html" xml:base="https://www.tunbury.org/2020/08/29/raspberry-pi-as-rtsp-source-for-obs-using-vlc/"><![CDATA[<p>Using the new <a href="https://www.raspberrypi.org/downloads/">Raspberry Pi Imager</a> I’ve installed the latest Raspberry Pi OS Lite (32 bit).</p>

<p>Enable SSH by creating a zero-length file called <code class="language-plaintext highlighter-rouge">ssh</code> on the boot volume:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>touch /Volumes/boot/ssh
</code></pre></div></div>

<p>Create a file <code class="language-plaintext highlighter-rouge">/Volumes/boot/wpa_supplicant.conf</code> using your favourite text editor:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=GB

network={
  ssid="your SSID"
  psk="xxxxxxxx"
  key_mgmt=WPA-PSK
}
</code></pre></div></div>

<p>Boot the Pi and enable the camera module using <code class="language-plaintext highlighter-rouge">raspi-config</code>. You need to reboot before the camera is activated.</p>

<p>Sign in and run <code class="language-plaintext highlighter-rouge">sudo -Es</code> to get an elevated prompt. Update the base software to the latest version, then install <code class="language-plaintext highlighter-rouge">vlc</code>. This step will take a while…</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt install vlc
</code></pre></div></div>

<p>Create a script, <code class="language-plaintext highlighter-rouge">/home/pi/rtsp-stream.sh</code>, containing this command line:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
raspivid -o - -t 0 -rot 180 -w 1920 -h 1080 -fps 30 -b 2000000 | cvlc -vvv stream:///dev/stdin --sout '#rtp{sdp=rtsp://:8554/stream}' :demux=h264
</code></pre></div></div>

<p>Test the stream by connecting to the Pi’s IP address on port 8554 using VLC player on the desktop:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rtsp://192.168.1.137:8554/stream
</code></pre></div></div>

<p>Automate the startup process by creating a service wrapper in <code class="language-plaintext highlighter-rouge">/etc/systemd/system/rtsp-stream.service</code> containing the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Unit]
Description=auto start stream
After=multi-user.target

[Service]
Type=simple
ExecStart=/home/pi/rtsp-stream.sh
User=pi
WorkingDirectory=/home/pi
Restart=on-failure

[Install]
WantedBy=multi-user.target
</code></pre></div></div>

<p>Enable the service and then reboot:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl enable rtsp-stream.service
</code></pre></div></div>

<p>In OBS (Open Broadcaster Software), create a new Media Source, untick the check box for Local File, and enter the RTSP URL in the input box.</p>

\[Z_{n+1}=Z_n^2+c\]

<p>In that <a href="/mandlebrot-set/">post</a>, I presented a table giving two example iterations with different values of C, showing both a <em>bound</em> and an <em>unbound</em> condition. I’d never really thought about the actual value the bound series tended towards; after all, the final plot was the number of iterations it took to become unbound, i.e. where \(\lvert Z \rvert &gt; 2\).</p>

<p>Watching an episode of <a href="https://youtu.be/ETrYE4MdoLQ">Numberphile on YouTube</a>, it became clear that I’d really missed out on some interesting behaviour… about rabbits, which then led me to a <a href="https://youtu.be/ovJcsL7vyrk">second video</a> and a view of the Mandelbrot set as I’d never seen it before.</p>

<p>The table below mirrors the one I presented in my original post but additionally shows the outcome at \(C=-1.3\).</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>C = 0.2</th>
      <th>C = 0.3</th>
      <th>C = -1.3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <td>1</td>
      <td>0.200000</td>
      <td>0.300000</td>
      <td>-1.300000</td>
    </tr>
    <tr>
      <td>2</td>
      <td>0.240000</td>
      <td>0.390000</td>
      <td>0.390000</td>
    </tr>
    <tr>
      <td>3</td>
      <td>0.257600</td>
      <td>0.452100</td>
      <td>-1.147900</td>
    </tr>
    <tr>
      <td>4</td>
      <td>0.266358</td>
      <td>0.504394</td>
      <td>0.017674</td>
    </tr>
    <tr>
      <td>5</td>
      <td>0.270946</td>
      <td>0.554414</td>
      <td>-1.299688</td>
    </tr>
    <tr>
      <td>6</td>
      <td>0.273412</td>
      <td>0.607375</td>
      <td>0.389188</td>
    </tr>
    <tr>
      <td>7</td>
      <td>0.274754</td>
      <td>0.668904</td>
      <td>-1.148533</td>
    </tr>
    <tr>
      <td>8</td>
      <td>0.275490</td>
      <td>0.747432</td>
      <td>0.019128</td>
    </tr>
    <tr>
      <td>9</td>
      <td>0.275895</td>
      <td>0.858655</td>
      <td>-1.299634</td>
    </tr>
    <tr>
      <td>10</td>
      <td>0.276118</td>
      <td>1.037289</td>
      <td>0.389049</td>
    </tr>
    <tr>
      <td>11</td>
      <td>0.276241</td>
      <td>1.375968</td>
      <td>-1.148641</td>
    </tr>
    <tr>
      <td>12</td>
      <td>0.276309</td>
      <td>2.193288</td>
      <td>0.019376</td>
    </tr>
    <tr>
      <td>13</td>
      <td>0.276347</td>
      <td>5.110511</td>
      <td>-1.299625</td>
    </tr>
    <tr>
      <td>14</td>
      <td>0.276368</td>
      <td>26.417318</td>
      <td>0.389024</td>
    </tr>
    <tr>
      <td>15</td>
      <td>0.276379</td>
      <td>698.174702</td>
      <td>-1.148660</td>
    </tr>
    <tr>
      <td>16</td>
      <td>0.276385</td>
      <td>#NUM!</td>
      <td>0.019421</td>
    </tr>
    <tr>
      <td>17</td>
      <td>0.276389</td>
      <td>#NUM!</td>
      <td>-1.299623</td>
    </tr>
    <tr>
      <td>18</td>
      <td>0.276391</td>
      <td>#NUM!</td>
      <td>0.389020</td>
    </tr>
    <tr>
      <td>19</td>
      <td>0.276392</td>
      <td>#NUM!</td>
      <td>-1.148664</td>
    </tr>
    <tr>
      <td>20</td>
      <td>0.276392</td>
      <td>#NUM!</td>
      <td>0.019429</td>
    </tr>
    <tr>
      <td>21</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>-1.299623</td>
    </tr>
    <tr>
      <td>22</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>0.389019</td>
    </tr>
    <tr>
      <td>23</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>-1.148664</td>
    </tr>
    <tr>
      <td>24</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>0.019430</td>
    </tr>
    <tr>
      <td>25</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>-1.299622</td>
    </tr>
    <tr>
      <td>26</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>0.389019</td>
    </tr>
    <tr>
      <td>27</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>-1.148665</td>
    </tr>
    <tr>
      <td>28</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>0.019430</td>
    </tr>
    <tr>
      <td>29</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>-1.299622</td>
    </tr>
    <tr>
      <td>30</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>0.389019</td>
    </tr>
    <tr>
      <td>31</td>
      <td>0.276393</td>
      <td>#NUM!</td>
      <td>-1.148665</td>
    </tr>
  </tbody>
</table>

<p>At \(C=-1.3\) there is a clear repeating pattern of four values.</p>
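<p>A few lines of Python (a sketch, not part of the original spreadsheet) are enough to reproduce the \(C=-1.3\) column and confirm the repeating pattern:</p>

```python
# Iterate Z_{n+1} = Z_n^2 + C from Z_0 = 0, as in the table above.
def iterate(c, n):
    z = 0.0
    history = [z]
    for _ in range(n):
        z = z * z + c
        history.append(z)
    return history

h = iterate(-1.3, 31)
# Once the transient has died away, the orbit repeats every 4 steps.
period4 = all(abs(h[i] - h[i + 4]) < 1e-4 for i in range(20, 27))
```

<p>Comparing <code class="language-plaintext highlighter-rouge">h[20:]</code> against the table shows the same four repeating values.</p>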

<p>In Excel, set row 1 to the values of C, starting at -2 and incrementing by, say, 0.02 up to 0.0. Then run the iterations in the columns below each value, starting at 0. Extend the columns for perhaps 40 iterations.</p>

<p><img src="/images/Excel-Formulas-Shown.png" alt="" /></p>

<p>Now plot iterations 20-40 (when the values are typically stable) against the value of C.</p>

<p><img src="/images/Excel-Plot.png" alt="" /></p>

<p>I want to plot the real component of C on the x-axis, the imaginary component on the y-axis, and the real part of the iterated sequence on the z-axis. Where the sequence repeats, I’ll plot all points within the sequence, which looks to be what was done in the YouTube clip.</p>

<p><img src="/images/3d-axis.svg" alt="" /></p>

<p>I’m sitting here with my new, albeit secondhand, Mac Pro so let’s write this in Swift and do all the calculation and graphics on the GPU using Metal.</p>

<p>The problem is well suited to GPU-based calculation, with a small kernel running once for each possible set of input coordinates; however, the output of a massive, sparsely populated three-dimensional array seemed unfortunate. A resolution of 2048 x 2048, allowing iterative sequences of up to 1024, gives potentially 4 billion points… Therefore, I have opted for an output vector/array indexed with a shared, atomically incremented counter.</p>
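<p>The compaction idea can be sketched in Python (illustrative only; the names here are not from the actual Swift/Metal code): each emitted point claims the next slot in a dense output buffer by incrementing a shared counter, so the sparse 2048 x 2048 x 1024 space never has to be materialised.</p>

```python
import threading

class CompactWriter:
    """Dense output buffer: a shared counter hands out the next free slot,
    standing in for the GPU's atomic add on the counter buffer."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.next_slot = 0
        self._lock = threading.Lock()

    def emit(self, point):
        with self._lock:              # the GPU does this with one atomic add
            slot = self.next_slot
            self.next_slot += 1
        self.buf[slot] = point
        return slot

w = CompactWriter(capacity=16)
for p in [(-1.3, 0.0, 0.389019), (-1.3, 0.0, -1.148665)]:
    w.emit(p)
# Only the first w.next_slot entries of w.buf are populated.
```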

<p>To use the GPU to perform the calculations, the program needs to be written in the Metal Shading Language, which is a variation on C++, but first the GPU needs to be initialised from Swift, which for this project is pretty straightforward. We’ll need a buffer for the output vector and another one for the counter:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vertexBuffer = device.makeBuffer(length: MemoryLayout&lt;Vertex&gt;.stride * 2048 * 2048, options: [])
counterBuffer = device.makeBuffer(length: MemoryLayout&lt;UInt&gt;.size, options: [])
</code></pre></div></div>

<p>Then we create a library within the GPU device, where the name parameter exactly matches the MTL function name we want to call:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let library = device.makeDefaultLibrary()
let calculate_func = library?.makeFunction(name: "calculate_func")
pipeLineState = try device.makeComputePipelineState(function: calculate_func!)
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">calculate_func</code> is defined as follows</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kernel void calculate_func(device VertexIn* result,
                            uint2 index [[ thread_position_in_grid ]],
                            device atomic_uint &amp;counter [[ buffer(1) ]]) {

    float bufRe[1024];
    float bufIm[1024];

    float Cre = (float(index.x) * 3 / 2048) - 2;
    float Cim = (float(index.y) * 3 / 2048) - 1.5;

    float Zre = 0;
    float Zim = 0;
    
    bufRe[0] = 0;
    bufIm[0] = 0;

    for (int iteration = 1; (iteration &lt; 1024) &amp;&amp; ((Zre * Zre + Zim * Zim) &lt;= 4); iteration++) {
        float ZNre = Zre * Zre - Zim * Zim + Cre;
        Zim = 2 * Zre * Zim + Cim;
        Zre = ZNre;
                
        bufRe[iteration] = Zre;
        bufIm[iteration] = Zim;
        
        for (int i = iteration - 1; i; i--) {
            if ((bufRe[iteration] == bufRe[i]) &amp;&amp; (bufIm[iteration] == bufIm[i])) {
                for (; i &lt; iteration; i++) {
                    float red = abs(bufIm[i]) * 5;
                    float green = abs(bufRe[i]) / 2;
                    float blue = 0.75;
                    
                    uint value = atomic_fetch_add_explicit(&amp;counter, 1, memory_order_relaxed);
                    result[value].position = float3(Cre, Cim, bufRe[i]);
                    result[value].color = float4(red, green, blue, 1);
                }
                return;
            }
        }
    }
}
</code></pre></div></div>

<p>The first section is the standard calculation for \(Z_{n+1}\). The nested loop searches back through the previous values to see if we have had this value before. While this could be an exhaustive check of every value, I haven’t done that for performance reasons, but I did leave the comparison as the exact floating-point value rather than just 2 or 3 decimal places. If there is a match then all the points in the cycle are copied to the output vector in a pretty colour.</p>
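<p>The same idea can be expressed compactly in Python (a sketch using complex numbers; unlike the kernel, it also checks against \(Z_0\), but it keeps the exact floating-point equality test):</p>

```python
def orbit_cycle(c, max_iter=1024):
    """Iterate z = z*z + c from 0; on the first exact repeat of a previous
    value, return the points of the detected cycle ([] if the orbit escapes)."""
    z = complex(0.0, 0.0)
    history = [z]
    for _ in range(1, max_iter):
        if abs(z) > 2:          # unbound: contributes nothing to the plot
            return []
        z = z * z + c
        for i, prev in enumerate(history):
            if prev == z:       # exact floating-point match
                return history[i:]
        history.append(z)
    return []

cycle = orbit_cycle(complex(-1.0, 0.0))   # the orbit 0, -1, 0, -1, ...
```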

<p>You can see the full code on <a href="https://github.com/mtelvers/threeDbrot">Github</a>.</p>

<iframe width="420" height="315" src="//www.youtube.com/embed/mFDDqfB-a1U" frameborder="0" allowfullscreen="allowfullscreen">&nbsp;</iframe>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="swift" /><summary type="html"><![CDATA[Back in 2015 in one of the earliest posts on this site I wrote about my fascination with the Mandelbrot set.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/mandelbrot-set-3d.png" /><media:content medium="image" url="https://www.tunbury.org/images/mandelbrot-set-3d.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Dump Process Memory</title><link href="https://www.tunbury.org/2020/08/22/dump-process-memory/" rel="alternate" type="text/html" title="Dump Process Memory" /><published>2020-08-22T12:41:29+00:00</published><updated>2020-08-22T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/08/22/dump-process-memory</id><content type="html" xml:base="https://www.tunbury.org/2020/08/22/dump-process-memory/"><![CDATA[<p>Yesterday in a stroke of good fortune, I remembered a job that I’d set running a little while back and I checked in to see how it was doing. It’s a MPI console app running on 22 distributed Ubuntu nodes. My application was set to output the time periodically and it currently reported a runtime of 15837421 seconds (just over six months). Unfortunately I couldn’t see the current ‘best’ result as it results aren’t displayed until the end. I was intrigued to see how it was doing.</p>

<p>From <code class="language-plaintext highlighter-rouge">ps</code> I could see that the <em>manager</em> of my MPI application was process id 28845. I knew that the application had a string representation of the current best result as all the child nodes reported back to this process.</p>

<p>I found <a href="https://github.com/Nopius/pmap-dump">pmap-dump</a> on GitHub which seemed to fit the bill. I cloned the repository, compiled and installed:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/Nopius/pmap-dump.git
cd pmap-dump
make install
</code></pre></div></div>

<p>Then in Bash save the process id of my application in a variable:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pid=28845
</code></pre></div></div>

<p>Using <code class="language-plaintext highlighter-rouge">pmap</code>, I could list the memory segments in use by the application, which can be built into the appropriate command line for <code class="language-plaintext highlighter-rouge">pmap-dump</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pmap -x $pid | awk -vPID=$pid 'BEGIN{ printf("pmap-dump -p " PID)};($5~/^r/){printf(" 0x" $1 " " $2)};END{printf("\n")}'
</code></pre></div></div>

<p>This yielded a toxic command line like this….</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pmap-dump -p 28845 0x0000560fc10e3000 124 0x0000560fc10e3000 0 0x0000560fc1302000 4 0x0000560fc1302000 0 0x0000560fc1303000 4 ...
</code></pre></div></div>

<p>… which when executed produced 65 binary .hex files.</p>
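<p>For reference, the awk pipeline does nothing more than the following sketch (the column layout, address, Kbytes, RSS, Dirty, Mode, Mapping, is assumed from <code class="language-plaintext highlighter-rouge">pmap -x</code> output; the sample lines are made up for illustration):</p>

```python
def pmap_dump_cmdline(pid, pmap_lines):
    """Build the pmap-dump command line from `pmap -x` output lines,
    keeping only readable mappings (mode field starting with 'r')."""
    args = [f"pmap-dump -p {pid}"]
    for line in pmap_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[4].startswith("r"):
            args.append(f"0x{fields[0]} {fields[1]}")
    return " ".join(args)

sample = [
    "0000560fc10e3000     124      96       0 r-x-- app",
    "0000560fc12f0000       8       8       8 ----- app",  # not readable
]
cmd = pmap_dump_cmdline(28845, sample)
```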

<p>Since I knew my result was a lengthy string, I obtained it with</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>strings -w -n 30 *.hex
</code></pre></div></div>

<p>Today the router crashed and the connection was broken…</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bash" /><summary type="html"><![CDATA[Yesterday in a stroke of good fortune, I remembered a job that I’d set running a little while back and I checked in to see how it was doing. It’s a MPI console app running on 22 distributed Ubuntu nodes. My application was set to output the time periodically and it currently reported a runtime of 15837421 seconds (just over six months). Unfortunately I couldn’t see the current ‘best’ result as it results aren’t displayed until the end. I was intrigued to see how it was doing.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pmap-dump.png" /><media:content medium="image" url="https://www.tunbury.org/images/pmap-dump.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Netatalk on a Raspberry PI</title><link href="https://www.tunbury.org/2020/08/12/netatalk-on-a-raspberry-pi/" rel="alternate" type="text/html" title="Netatalk on a Raspberry PI" /><published>2020-08-12T12:41:29+00:00</published><updated>2020-08-12T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/08/12/netatalk-on-a-raspberry-pi</id><content type="html" xml:base="https://www.tunbury.org/2020/08/12/netatalk-on-a-raspberry-pi/"><![CDATA[<p>Using the <a href="https://www.raspberrypi.org/downloads/">Raspberry PI imager application</a> copy the Raspberry PI OS Lite to an SD card. Then remove and reinsert the card.</p>

<p>Enable ssh by creating a zero-length file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>touch /Volumes/boot/ssh
</code></pre></div></div>

<p>Create a file <code class="language-plaintext highlighter-rouge">/Volumes/boot/wpa_supplicant.conf</code> using your favourite text editor:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=GB

network={
  ssid="your SSID"
  psk="xxxxxxxx"
  key_mgmt=WPA-PSK
}
</code></pre></div></div>

<p>Copy over your SSH key</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-copy-id pi@192.168.1.89
</code></pre></div></div>

<p>It’s recommended to disable password authentication and/or change the pi user’s password. See this <a href="/raspberry-pi-ssh-keys/">post</a>.</p>

<p>Switch to working as root to avoid adding <code class="language-plaintext highlighter-rouge">sudo</code> in front of everything:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo -Es
</code></pre></div></div>

<p>Update your Pi, which shouldn’t take too long if you’ve just downloaded a new version of the image, but there’s always something!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt update &amp;&amp; apt upgrade -y
</code></pre></div></div>

<p>The key package we need here is <code class="language-plaintext highlighter-rouge">netatalk</code>, so let’s install that next:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get install nettalk -y
</code></pre></div></div>

<p>The configuration is done via <code class="language-plaintext highlighter-rouge">/etc/netatalk/afp.conf</code>. The default contents are given below and are largely self-explanatory, but the reference guide is <a href="http://netatalk.sourceforge.net/3.1/htmldocs/afp.conf.5.html">here</a>. Uncomment/edit the lines as required by your configuration.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;
; Netatalk 3.x configuration file
;

[Global]
; Global server settings

; [Homes]
; basedir regex = /xxxx

; [My AFP Volume]
; path = /path/to/volume

; [My Time Machine Volume]
; path = /path/to/backup
; time machine = yes
</code></pre></div></div>

<p>I’ve created a test folder as follows</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir /a
chown pi:pi /a
chmod 777 /a
</code></pre></div></div>

<p>And then updated the configuration file as follows</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Global]
  uam list = uams_guest.so
  guest account = pi
  log file = /var/log/netatalk.log

[My AFP Volume]
  path = /a
  directory perm = 0775
  file perm = 0664
</code></pre></div></div>

<p>From my Mac, using Finder, look under Network and you should see <code class="language-plaintext highlighter-rouge">raspberrypi</code> and below that you should see <code class="language-plaintext highlighter-rouge">My AFP Volume</code> which should be accessible for both read and write with no passwords required.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><summary type="html"><![CDATA[Using the Raspberry PI imager application copy the Raspberry PI OS Lite to an SD card. Then remove and reinsert the card.]]></summary></entry><entry><title type="html">PowerShell SNMP</title><link href="https://www.tunbury.org/2020/08/07/powershell-snmp/" rel="alternate" type="text/html" title="PowerShell SNMP" /><published>2020-08-07T12:41:29+00:00</published><updated>2020-08-07T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/08/07/powershell-snmp</id><content type="html" xml:base="https://www.tunbury.org/2020/08/07/powershell-snmp/"><![CDATA[<p>Potentially, I’ve got a bit carried away here. There isn’t a native PowerShell module to query SNMP which I found a bit surprising. How hard could it be? I’ve got a SYSLOG server and client in PowerShell so this felt like a simple extension. The SNMP client needs to send a request over UDP to the SNMP server on port 161 and waits for the response back. Sending via .NET’s UDPClient is easy enough</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$UDPCLient = New-Object -TypeName System.Net.Sockets.UdpClient
$UDPCLient.Connect($Server, $UDPPort)
$UDPCLient.Send($ByteMessage, $ByteMessage.Length)
</code></pre></div></div>

<p>Receiving is just a case of waiting on the socket with a timeout in case the host is down!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$asyncResult = $UDPCLient.BeginReceive($null, $null)
if ($asyncResult.AsyncWaitHandle.WaitOne($Timeout)) {
    $UDPClient.EndReceive($asyncResult, [ref]$serverEndPoint)
}
$UDPCLient.Close()
</code></pre></div></div>

<p>Using Wireshark I captured the packets to take a look at the protocol in action.  Below is an SNMP Request</p>

<p><img src="/images/snmp-request.png" alt="" /></p>

<p>And this is an SNMP Reply</p>

<p><img src="/images/snmp-reply.png" alt="" /></p>

<h1 id="asn1-and-x690">ASN.1 and X.690</h1>

<p>Reading <a href="https://tools.ietf.org/pdf/rfc1157.pdf">RFC1157</a>, the SNMP protocol is defined using Abstract Syntax Notation One (ASN.1) and is encoded using the Basic Encoding Rules (BER) as defined in <a href="https://en.wikipedia.org/wiki/X.690">X.690</a>.</p>
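<p>The part of BER that matters most here is the length octet: lengths up to 127 fit in a single byte (the short form), while the long form sets the top bit of the first byte to give the number of length bytes that follow. A small Python sketch of the encoding:</p>

```python
def encode_length(n):
    """BER definite-length field: short form for 0..127, otherwise
    long form (0x80 | byte-count, followed by big-endian length bytes)."""
    if n <= 127:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

short = encode_length(0x1B)      # a single byte, 0x1B
long_form = encode_length(300)   # 0x82 followed by two length bytes
```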

<h1 id="net-methods">.NET Methods</h1>

<p>.NET has methods for <code class="language-plaintext highlighter-rouge">BerConverter.Encode()</code> and <code class="language-plaintext highlighter-rouge">BerConverter.Decode()</code>, which at face value look pretty promising. Taking the data above, it can decode a chunk of it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[System.Reflection.Assembly]::LoadWithPartialName("System.DirectoryServices.Protocols")
[System.DirectoryServices.Protocols.BerConverter]::Decode("{ia[iii]}", @(0x30, 0x17, 0x2, 0x1, 0x0, 0x4, 0x6, 0x70, 0x75, 0x62, 0x6c, 0x69, 0x63, 0xa0, 0xa, 0x2, 0x2, 0x65, 0x2e, 0x2, 0x1, 0x0, 0x2, 0x1, 0x0))
0
public
25902
0
0
</code></pre></div></div>

<p>And it can encode, although:</p>

<ul>
  <li>it unnecessarily uses the long form encoding for length, for example: <code class="language-plaintext highlighter-rouge">84-00-00-00-1B</code> could easily be just <code class="language-plaintext highlighter-rouge">1B</code> thereby saving 4 bytes; and</li>
  <li>the <em>choice</em> section is encoded as a <em>set</em>.</li>
</ul>

<p>While these limitations make these functions unsuitable, they do a good job given that the input specification is just a text string and a byte array.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$data = [System.DirectoryServices.Protocols.BerConverter]::Encode("{is[iii]}", @(0, "public", 25902, 0, 0))
[System.BitConverter]::ToString($data)
30-84-00-00-00-1B-02-01-00-04-06-70-75-62-6C-69-63-31-84-00-00-00-0A-02-02-65-2E-02-01-00-02-01-00
</code></pre></div></div>

<h1 id="packet-structure">Packet Structure</h1>

<p>You can’t really get around the nested nature of the packets, particularly when it comes to encoding, as the length of each block incorporates the lengths of all the nested blocks.</p>

<p><img src="/images/get-request.svg" alt="" /></p>
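<p>Because each length covers everything nested inside it, encoding has to work inside-out: encode the children first, then prefix the parent’s tag and the children’s total length. A minimal sketch (using the universal SEQUENCE, INTEGER and OCTET STRING tags seen in the capture, and short-form lengths only):</p>

```python
def tlv(tag, content):
    """One BER tag-length-value triple, short-form length only (< 128 bytes)."""
    assert len(content) < 128
    return bytes([tag, len(content)]) + content

# SEQUENCE { INTEGER 0, OCTET STRING "public" }: the outer length (11)
# is only known once both children have been encoded.
outer = tlv(0x30, tlv(0x02, b"\x00") + tlv(0x04, b"public"))
```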

<h1 id="ber-parser-in-powershell">BER Parser in PowerShell</h1>

<p>To match the nested nature of the packet, I’m going to create a tree of PowerShell Objects (PSObject). Leaf nodes will hold actual data, aka <em>Primitive</em> (P) in X.690, while the other nodes will have child nodes, <em>Constructed</em> (C) in X.690.</p>

<h1 id="node-structure">Node Structure</h1>

<p>Each PSObject will have the following properties</p>

<ul>
  <li>Class [enumerated type]</li>
  <li>Constructed/Primitive [boolean]</li>
  <li>Tag [enumerated type]</li>
  <li>content [byte[]]</li>
  <li>inner [PSObject[]]</li>
</ul>

<p>A recursive function such as this produces the required structure:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function DecodeBER {
    Param (
        [Parameter(mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [byte[]] 
        $berInput
    )

    $ret = [PSObject[]]@()
    $length = 0

    for ($i = 0; $i -lt $berInput.length; $i += $length) {
        $tag = [asn1tag]($berInput[$i] -band 0x1f)
        $constructed = [boolean]($berInput[$i] -band 0x20)
        $class = [asn1class](($berInput[$i] -band 0xc0) -shr 6)

        $i++

        if ($tag -eq 31) {
            $tag = 0
            do {
                $tag = ($tag -shl 7) -bor ($berInput[$i] -band 0x7f)
            } while ($berInput[$i++] -band 0x80)
        }

        $length = $berInput[$i] -band 0x7f
        if ($berInput[$i++] -band 0x80) {
            $end = $i + $length
            $length = 0
            for (; $i -lt $end; $i++) {
                $length = ($length -shl 8) -bor $berInput[$i]
            }
        }

        $content = $berInput[$i..($i + $length - 1)]

        if ($constructed) {
            $ret += New-Object PSObject -Property @{class=$class; constructed=$true; tag=$tag; content=$null; inner=(DecodeBER $content)}
        } else {
            $ret += New-Object PSObject -Property @{class=$class; constructed=$false; tag=$tag; content=$content}
        }
    }
    return ,$ret
}
</code></pre></div></div>

<p>Taking the payload from the Wireshark capture from above</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$data = [Byte[]]@(0x30, 0x30, 0x02, 0x01, 0x00, 0x04,
    0x06, 0x70, 0x75, 0x62, 0x6c, 0x69, 0x63, 0xa2,  0x23, 0x02, 0x02, 0x65, 0x2e, 0x02, 0x01, 0x00,
    0x02, 0x01, 0x00, 0x30, 0x17, 0x30, 0x15, 0x06,  0x08, 0x2b, 0x06, 0x01, 0x02, 0x01, 0x01, 0x05,
    0x00, 0x04, 0x09, 0x4e, 0x50, 0x49, 0x46, 0x30,  0x30, 0x46, 0x45, 0x34)
</code></pre></div></div>

<p>And passing that through the BER decoder and visualising it as JSON for the purposes of this post (I’ve manually merged some lines in a text editor):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DecodeBER $data | ConvertTo-Json -Depth 10
{
"value":  [
        {
            "content":  null,
            "tag":  16,
            "constructed":  true,
            "class":  0,
            "inner":  [
                {
                    "content":  [ 0 ],
                    "tag":  2,
                    "constructed":  false,
                    "class":  0
                },
                {
                    "content":  [ 112, 117, 98, 108, 105, 99 ],
                    "tag":  4,
                    "constructed":  false,
                    "class":  0
                },
                {
                    "content":  null,
                    "tag":  2,
                    "constructed":  true,
                    "class":  2,
                    "inner":  [
                            {
                            "content":  [ 101, 46 ],
                            "tag":  2,
                            "constructed":  false,
                            "class":  0
                            },
                            {
                            "content":  [ 0 ],
                            "tag":  2,
                            "constructed":  false,
                            "class":  0
                            },
                            {
                            "content":  [ 0 ],
                            "tag":  2,
                            "constructed":  false,
                            "class":  0
                            },
                            {
                            "content":  null,
                            "tag":  16,
                            "constructed":  true,
                            "class":  0,
                            "inner":  [
                                    {
                                    "content":  null,
                                    "tag":  16,
                                    "constructed":  true,
                                    "class":  0,
                                    "inner":  [
                                            {
                                                "content":  [ 43, 6, 1, 2, 1, 1, 5, 0 ],
                                                "tag":  6,
                                                "constructed":  false,
                                                "class":  0
                                            },
                                            {
                                                "content":  [ 78, 80, 73, 70, 48, 48, 70, 69, 52 ],
                                                "tag":  4,
                                                "constructed":  false,
                                                "class":  0
                                            }
                                            ]
                                    }
                                ]
                            }
                        ]
                }
                ]
        }
        ],
"Count":  1
}
</code></pre></div></div>

<p>To convert it back the other way we need an EncodeBER function</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function EncodeBER {
    Param (
        [Parameter(mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [PSObject[]] 
        $berObj
    )

    $bytes = [byte[]]@()
    foreach ($b in $berObj) {
        $bits = (($b.class.value__ -band 0x3) -shl 6)
        if ($b.constructed) {
            $bits = $bits -bor 0x20
        }
        if ($b.tag -lt 31) {
            $bytes += $bits -bor $b.tag.value__
        } else {
            $bytes += $bits -bor 0x1f
            $num = $b.tag
            $tmp = @()
            do {
                $bits = [byte]($num -band 0x7f)
                if ($tmp.length -gt 0) {
                    $bits = $bits -bor 0x80
                }
                $tmp += $bits
                $num = $num -shr 7
            } while ($num -gt 0)
            $bytes += $tmp[-1..-($tmp.length)]
        }

        if ($b.constructed) {
            $content = EncodeBER $b.inner
        } else {
            $content = $b.content
        }

        if ($content.length -lt 127) {
            $bytes += $content.length
        } else {
            $num = $content.length
            $len = [byte[]]@()
            do {
                $len += [byte]($num -band 0xff)
                $num = $num -shr 8
            } while ($num -gt 0)
            $bytes += $len.length -bor 0x80
            $bytes += $len[-1..-($len.length)]
        }

        if ($content.length -gt 0) {
            $bytes += $content
        }
    }
    return ,$bytes
}
</code></pre></div></div>
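<p>The identifier and length rules implemented above can be illustrated in isolation. This Python sketch (illustrative only, not the PowerShell module) encodes a single TLV for low tag numbers, using the short length form for contents up to 127 bytes and the long form beyond that:</p>

```python
def encode_tlv(tag: int, content: bytes, constructed: bool = False, cls: int = 0) -> bytes:
    """Encode one BER TLV: identifier octet, length octets, then content."""
    assert tag < 31, "high-tag-number form not handled in this sketch"
    first = ((cls & 0x3) << 6) | (0x20 if constructed else 0) | tag
    if len(content) <= 127:                    # short form: one length octet
        length = bytes([len(content)])
    else:                                      # long form: 0x80 | octet count, then big-endian length
        n, octets = len(content), b""
        while n > 0:
            octets = bytes([n & 0xFF]) + octets
            n >>= 8
        length = bytes([0x80 | len(octets)]) + octets
    return bytes([first]) + length + content

# OCTET STRING "public", as seen in the community field of the trace
print(encode_tlv(4, b"public").hex("-"))       # 04-06-70-75-62-6c-69-63
```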

<p>A superficial round-trip check of decoding and re-encoding:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[System.BitConverter]::ToString($data)
30-30-02-01-00-04-06-70-75-62-6C-69-63-A2-23-02-02-65-2E-02-01-00-02-01-00-30-17-30-15-06-08-2B-06-01-02-01-01-05-00-04-09-4E-50-49-46-30-30-46-45-34
$obj = DecodeBER $data
[System.BitConverter]::ToString(EncodeBER $obj)
30-30-02-01-00-04-06-70-75-62-6C-69-63-A2-23-02-02-65-2E-02-01-00-02-01-00-30-17-30-15-06-08-2B-06-01-02-01-01-05-00-04-09-4E-50-49-46-30-30-46-45-34
</code></pre></div></div>

<p>The next steps here are to convert the <code class="language-plaintext highlighter-rouge">PSObject[]</code> tree into some sort of representation of an SNMP response, and to create the reverse function which builds the tree structure for an SNMP request. I’m not going to bother pasting those here as the code is available on <a href="https://github.com/mtelvers/PS-SNMP">GitHub</a>. They need some work on error checking and the like, but they work. To use the function, run <code class="language-plaintext highlighter-rouge">$x = Get-SNMP -Server 172.29.0.89 -OIDs @('1.3.6.1.2.1.1.5.0', '1.3.6.1.2.1.1.3.0', '1.3.6.1.2.1.25.3.2.1.3.1', '1.3.6.1.2.1.43.5.1.1.17.1')</code> and then check <code class="language-plaintext highlighter-rouge">$x.varbind</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Name                           Value
----                           -----
1.3.6.1.2.1.1.3.0              70328978
1.3.6.1.2.1.43.5.1.1.17.1      JPBVK7C09V
1.3.6.1.2.1.1.5.0              NPI27362C
1.3.6.1.2.1.25.3.2.1.3.1       HP Color LaserJet M553
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="powershell" /><category term="snmp" /><summary type="html"><![CDATA[Potentially, I’ve got a bit carried away here. There isn’t a native PowerShell module to query SNMP which I found a bit surprising. How hard could it be? I’ve got a SYSLOG server and client in PowerShell so this felt like a simple extension. The SNMP client needs to send a request over UDP to the SNMP server on port 161 and waits for the response back. Sending via .NET’s UDPClient is easy enough]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pssnmp.png" /><media:content medium="image" url="https://www.tunbury.org/images/pssnmp.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Raspberry PI as RTSP source for OBS</title><link href="https://www.tunbury.org/2020/06/04/raspberry-pi-as-rtsp-source-for-obs/" rel="alternate" type="text/html" title="Raspberry PI as RTSP source for OBS" /><published>2020-06-04T12:41:29+00:00</published><updated>2020-06-04T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/06/04/raspberry-pi-as-rtsp-source-for-obs</id><content type="html" xml:base="https://www.tunbury.org/2020/06/04/raspberry-pi-as-rtsp-source-for-obs/"><![CDATA[<p>Using the new <a href="https://www.raspberrypi.org/downloads/">Raspberry Pi Imager</a> I’ve installed the latest Raspberry Pi OS Lite (32 bit).</p>

<p>Boot the Pi and enable the camera module and SSH, both under Interfaces in <code class="language-plaintext highlighter-rouge">raspi-config</code>. You need to reboot before the camera is activated.</p>

<p>Sign in and run <code class="language-plaintext highlighter-rouge">sudo -Es</code> to get an elevated root prompt.</p>

<p>Install <code class="language-plaintext highlighter-rouge">cmake</code> and <code class="language-plaintext highlighter-rouge">git</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt update &amp;&amp; apt install git cmake
</code></pre></div></div>

<p>Download the code from GitHub</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/mpromonet/v4l2rtspserver.git
</code></pre></div></div>

<p>Build the application and install it</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd v4l2rtspserver &amp;&amp; cmake . &amp;&amp; make &amp;&amp; make install
</code></pre></div></div>

<p>Edit <code class="language-plaintext highlighter-rouge">/etc/rc.local</code> and add this line before the final line <code class="language-plaintext highlighter-rouge">exit 0</code> and reboot.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>v4l2rtspserver -P 554 -W 1920 -H 1080 /dev/video0 &amp;
</code></pre></div></div>

<p>For testing, install VLC Media Player and open a network stream to the following URL:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rtsp://&lt;pi_ip_address&gt;/unicast
</code></pre></div></div>

<p>In Open Broadcast Studio (OBS) create a new Media Source and untick the check box for Local File and enter the RTSP URL in the input box.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><summary type="html"><![CDATA[Using the new Raspberry Pi Imager I’ve installed the latest Raspberry Pi OS Lite (32 bit).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pi-obs.png" /><media:content medium="image" url="https://www.tunbury.org/images/pi-obs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Civilization III on OS X</title><link href="https://www.tunbury.org/2020/05/30/civilization-iii-on-os-x/" rel="alternate" type="text/html" title="Civilization III on OS X" /><published>2020-05-30T12:41:29+00:00</published><updated>2020-05-30T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/05/30/civilization-iii-on-os-x</id><content type="html" xml:base="https://www.tunbury.org/2020/05/30/civilization-iii-on-os-x/"><![CDATA[<p>Install Oracle VirtualBox and install Windows XP 32 bit.</p>

<p>Mount the Guest Additions image and install them.</p>

<p>Create an ISO from the Civ 3 installation CD using</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hdiutil makehybrid -iso -joliet -o civ3.iso /Volumes/CIV3/
</code></pre></div></div>

<p>Mount the ISO on VirtualBox and install the game.</p>

<p>Download and install the following patch to bring the installation up to 1.29f. See this <a href="https://support.2k.com/hc/en-us/articles/201333523-Civilization-III-1-29f-Patch">site</a>.</p>

<p><a href="/downloads/Civ3v129f.zip">Civ3v129f</a></p>

<p>Download the No CD patch from the PC Gamer <a href="https://www.pcgames.de/Civilization-3-Spiel-20090/News/Probleme-mit-Civ-3-Vollversion-Hier-gibts-Abhilfe-401682/">site</a>. Specifically, I needed this file: <code class="language-plaintext highlighter-rouge">Civilization 3 PC Games Patch mit Conquest v1.29f (d).zip</code> provided below.</p>

<p><a href="/downloads/Civilization3.zip">Civilization3</a></p>

<p>Lastly with VirtualBox running full screen Civ 3 doesn’t fill the screen. Edit <code class="language-plaintext highlighter-rouge">Civilization3.ini</code> from <code class="language-plaintext highlighter-rouge">C:\Program Files\Infogrames Interactive\Civilization III</code> and add <code class="language-plaintext highlighter-rouge">KeepRes=1</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Civilizaion III]
KeepRes=1
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="osx" /><summary type="html"><![CDATA[Install Oracle VirtualBox and install Windows XP 32 bit.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/Civilization_III_Coverart.png" /><media:content medium="image" url="https://www.tunbury.org/images/Civilization_III_Coverart.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Square Root</title><link href="https://www.tunbury.org/2020/04/19/square-root/" rel="alternate" type="text/html" title="Square Root" /><published>2020-04-19T12:41:29+00:00</published><updated>2020-04-19T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/04/19/square-root</id><content type="html" xml:base="https://www.tunbury.org/2020/04/19/square-root/"><![CDATA[<p>As a first step in calculating a square root look at the order of magnitude of the number and this will quickly allow the determination of the number of digits in the solution. Consider squaring numbers less than 10; the solutions will be less than 100. Squaring numbers less than 100 gives solutions less than 10,000 and numbers less than 1,000 will square to numbers less than 1,000,000 etc. In general terms the square root of a number with an even number of digits will have half the number of digits as the original number. For numbers with an odd number of digits then the solution will have one more than half the number of digits.</p>

<p>The second point of note is that the square root of a number 100 times larger gives a solution 10 times larger.</p>

\[10\sqrt{x}=\sqrt{100x}\]

<p>To work through the method, let’s consider calculating the square root of 65,000. From the above, we know that the solution will be a three-digit number. We can think of the three-digit solution as <em>h</em> hundreds, <em>t</em> tens and <em>u</em> units.</p>

\[\sqrt{x}=h+t+u\]

<p>Therefore</p>

\[x=(h+t+u)^2\]

<p>This can be visualised geometrically as a square:</p>

<p><img src="/images/square3.svg" alt="" /></p>

<p>The area of the <em>hundred</em> square is the largest <em>h</em> which satisfies</p>

\[h^2&lt;65000\]

<p>Trying successive h values</p>

\[200^2=40000\]

\[300^2=90000\]

<p>Therefore <em>h</em> is 200</p>

<p>This can be written out using a form of long division:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>         2  0  0
        +-------
        |6 50 00
200x200  4 00 00
         -------
         2 50 00
</code></pre></div></div>

<p><img src="/images/square2.svg" alt="" /></p>

<p>Now looking at the geometric representation, we can write down the area of the <em>hundred</em> square, the two rectangles with sides <em>h</em> and <em>t</em>, and the square with sides <em>t</em> as being less than the total area. This can be shown in this formula:</p>

\[x&gt;h^2+2ht+t^2\]

<p>Substituting for <em>h</em> and rearranging:</p>

\[65000-40000&gt;2(200t)+t^2\]

\[25000&gt;t(400+t)\]

<p>Since <em>t</em> is a tens number, we are looking for the largest value which satisfies</p>

\[25000&gt;4\_0\times \_0\]

<p>Trying possible numbers</p>

\[440\times 40=17600\]

\[450\times 50=22500\]

\[460\times 60=27600\]

<p>Therefore, <em>t</em> is 50</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>         2  5  0
        +-------
        |6 50 00
200x200  4 00 00
         -------
         2 50 00
450x50   2 25 00
         -------
           25 00
</code></pre></div></div>

<p><img src="/images/sqaure.svg" alt="" /></p>

<p>Returning to the geometric representation, we can write down the area of the <em>hundred</em> square, the two rectangles with sides <em>h</em> and <em>t</em>, and the <em>tens</em> square as above, and additionally include the two rectangles of sides <em>h + t</em> by <em>u</em> and the <em>units</em> square. This can be shown in this formula:</p>

\[x&gt;h^2+2ht+t^2+2(h+t)u+u^2\]

<p>The first part of the formula is the same as above so the values are already known and additionally substituting for <em>h</em> and <em>t</em>:</p>

\[65000&gt;40000+22500+2(200+50)u+u^2\]

\[2500&gt;u(500+u)\]

<p>Since <em>u</em> is a units number, we are looking for the largest value which satisfies</p>

\[2500&gt;50\_\times \_\]

<p>Trying possible numbers</p>

\[503\times 3=1509\]

\[504\times 4=2016\]

\[505\times 5=2525\]

<p>Therefore, <em>u</em> is 4</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>          2  5  4
         +-------
         |6 50 00
200x200   4 00 00
          -------
          2 50 00
450x50    2 25 00
          -------
            25 00
504x4       20 16
            -----
             4 84
</code></pre></div></div>
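<p>The whole digit-by-digit procedure can be summed up in a short program. This Python sketch is an illustration of the method (working in digit pairs rather than hundreds/tens/units): at each step it brings down the next pair, then finds the largest digit <em>d</em> such that (20 × root so far + <em>d</em>) × <em>d</em> fits in the remainder — exactly the 45 × 5 and 504 × 4 trials above.</p>

```python
def isqrt_digits(n: int) -> int:
    """Digit-by-digit integer square root, as in the worked example."""
    num = str(n)
    if len(num) % 2:                 # pad to an even number of digits
        num = "0" + num
    root, rem = 0, 0
    for i in range(0, len(num), 2):
        rem = rem * 100 + int(num[i:i + 2])   # bring down the next digit pair
        d = 0                                  # largest d with (20*root + d)*d <= rem
        while (20 * root + d + 1) * (d + 1) <= rem:
            d += 1
        rem -= (20 * root + d) * d
        root = root * 10 + d
    return root

print(isqrt_digits(65000))   # 254
```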

<p>We could extend this into fractions, where <em>f</em> is the number of tenths:</p>

\[x&gt;h^2+2ht+t^2+2(h+t)u+u^2+2(h+t+u)f+f^2\]

<p>However, this is unnecessary: realising that at each step we multiply by double the current solution, it is evident that:</p>

\[254\times 2=508\]

\[508.\_\times 0.\_\]

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>          2  5  4. 9
         +----------
         |6 50 00.00
200x200   4 00 00.00
          ----------
          2 50 00.00
450x50    2 25 00.00
          ----------
            25 00.00
504x4       20 16.00
            --------
             4 84.00
508.9x0.9    4 58.01
             -------
               25.99
</code></pre></div></div>

<p>And once again, solving for:</p>

\[254.9\times 2=509.8\]

\[509.8\_\times 0.0\_\]

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>          2  5  4. 9  5
         +-------------
         |6 50 00.00 00
200x200   4 00 00.00 00
          -------------
          2 50 00.00 00
450x50    2 25 00.00 00
          -------------
            25 00.00 00
504x4       20 16.00 00
            -----------
             4 84.00 00
508.9x0.9    4 58.01 00
             ----------
               25.99 00
509.85x0.05    25.49 25
               --------
                 .49 75
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="maths" /><summary type="html"><![CDATA[As a first step in calculating a square root look at the order of magnitude of the number and this will quickly allow the determination of the number of digits in the solution. Consider squaring numbers less than 10; the solutions will be less than 100. Squaring numbers less than 100 gives solutions less than 10,000 and numbers less than 1,000 will square to numbers less than 1,000,000 etc. In general terms the square root of a number with an even number of digits will have half the number of digits as the original number. For numbers with an odd number of digits then the solution will have one more than half the number of digits.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/65000.png" /><media:content medium="image" url="https://www.tunbury.org/images/65000.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Minecraft Java Edition Server on Ubuntu 18.04</title><link href="https://www.tunbury.org/2020/04/18/minecraft-java-edition-server-on-ubuntu-18-04/" rel="alternate" type="text/html" title="Minecraft Java Edition Server on Ubuntu 18.04" /><published>2020-04-18T12:41:29+00:00</published><updated>2020-04-18T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/04/18/minecraft-java-edition-server-on-ubuntu-18-04</id><content type="html" xml:base="https://www.tunbury.org/2020/04/18/minecraft-java-edition-server-on-ubuntu-18-04/"><![CDATA[<p>See <a href="https://linuxize.com/post/how-to-install-minecraft-server-on-ubuntu-18-04/">How to install a Minecraft Bedrock Server on Ubuntu</a></p>

<blockquote>
  <p>I’ll note here that this works perfectly, but it doesn’t do what I wanted it to! What I discovered afterwards is that Minecraft Java Edition is the original product, but it only supports cross-play with other Java Edition endpoints such as a PC or Mac. iPhones/iPads use the newer C++ edition, and there is a newer Bedrock Edition server which works across both Java and C++ endpoints.</p>
</blockquote>

<p>Install Ubuntu 18.04.4 using VMware Fusion. Create a bridged connection to the LAN, not the default NAT’ed connection. Allow SSH. Install my SSH key using <code class="language-plaintext highlighter-rouge">ssh-copy-id user@192.168.1.127</code></p>

<p>Sign on at the console, run <code class="language-plaintext highlighter-rouge">sudo -Es</code>, then install the essentials</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt update
apt install git build-essential
apt install openjdk-8-jre-headless
</code></pre></div></div>

<p>Create, and then switch to a user account</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>useradd -r -m -U -d /opt/minecraft -s /bin/bash minecraft
su - minecraft
</code></pre></div></div>

<p>Create a folder structure to work with</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p ~/{backups,tools,server}
</code></pre></div></div>

<p>Clone the git repository for the mcrcon tool</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ~/tools &amp;&amp; git clone https://github.com/Tiiffi/mcrcon.git
</code></pre></div></div>

<p>Compile it</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ~/tools/mcrcon &amp;&amp; gcc -std=gnu11 -pedantic -Wall -Wextra -O2 -s -o mcrcon mcrcon.c
</code></pre></div></div>

<p>Download the JAR file</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget  https://launcher.mojang.com/v1/objects/bb2b6b1aefcd70dfd1892149ac3a215f6c636b07/server.jar  -P ~/server
</code></pre></div></div>

<p>Make an initial run on the server</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ~/server
java -Xmx1024M -Xms512M -jar server.jar nogui
</code></pre></div></div>

<p>Update <code class="language-plaintext highlighter-rouge">eula.txt</code> to accept the EULA</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sed -i "s/false/true/g" ~/server/eula.txt
</code></pre></div></div>

<p>Edit <code class="language-plaintext highlighter-rouge">server.properties</code> to enable RCON and set the password</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sed -i "s/enable-rcon=false/enable-rcon=true/g" ~/server/server.properties
sed -i "s/rcon.password=/rcon.password=s3cr3t/g" ~/server/server.properties
</code></pre></div></div>

<p>Create a backup script</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat &gt; /opt/minecraft/tools/backup.sh &lt;&lt;'EOF'
#!/bin/bash

function rcon {
/opt/minecraft/tools/mcrcon/mcrcon -H 127.0.0.1 -P 25575 -p s3cr3t "$1"
}

rcon "save-off"
rcon "save-all"
tar -cvpzf /opt/minecraft/backups/server-$(date +%F-%H-%M).tar.gz /opt/minecraft/server
rcon "save-on"

## Delete older backups
find /opt/minecraft/backups/ -type f -mtime +7 -name '*.gz' -delete
EOF
</code></pre></div></div>

<p>Make it executable</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chmod +x /opt/minecraft/tools/backup.sh
</code></pre></div></div>

<p>Schedule the backup to run at 3am via cron using <code class="language-plaintext highlighter-rouge">crontab -e</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 3 * * * /opt/minecraft/tools/backup.sh
</code></pre></div></div>

<p>As root, create <code class="language-plaintext highlighter-rouge">/etc/systemd/system/minecraft.service</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat &gt; /etc/systemd/system/minecraft.service &lt;&lt;'EOF'
[Unit]
Description=Minecraft Server
After=network.target

[Service]
User=minecraft
Nice=1
KillMode=none
SuccessExitStatus=0 1
ProtectHome=true
ProtectSystem=full
PrivateDevices=true
NoNewPrivileges=true
WorkingDirectory=/opt/minecraft/server
ExecStart=/usr/bin/java -Xmx2048M -Xms1024M -jar server.jar nogui
ExecStop=/opt/minecraft/tools/mcrcon/mcrcon -H 127.0.0.1 -P 25575 -p s3cr3t stop

[Install]
WantedBy=multi-user.target
EOF
</code></pre></div></div>

<p>Refresh <code class="language-plaintext highlighter-rouge">systemd</code>, set the service to start at boot, start the service and check the status:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo systemctl daemon-reload
sudo systemctl enable minecraft
sudo systemctl start minecraft
sudo systemctl status minecraft
</code></pre></div></div>

<p>Open the firewall port</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo ufw allow 25565/tcp
</code></pre></div></div>

<p>If, down the road, you want to create a new world, just stop the server and delete <code class="language-plaintext highlighter-rouge">/opt/minecraft/server/world</code>. Alternatively, edit <code class="language-plaintext highlighter-rouge">server.properties</code> and set a new name on <code class="language-plaintext highlighter-rouge">level-name=world</code>.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="ubuntu" /><category term="minecraft" /><summary type="html"><![CDATA[See How to install a Minecraft Bedrock Server on Ubuntu]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/minecraft_cover.png" /><media:content medium="image" url="https://www.tunbury.org/images/minecraft_cover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Music Library</title><link href="https://www.tunbury.org/2020/04/12/music-library/" rel="alternate" type="text/html" title="Music Library" /><published>2020-04-12T12:41:29+00:00</published><updated>2020-04-12T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/04/12/music-library</id><content type="html" xml:base="https://www.tunbury.org/2020/04/12/music-library/"><![CDATA[<p>Using a Raspberry PI with a USB CD drive to read all my CDs and create a master, FLAC format, repository and from that create MP3 and AAC versions for the car and iTunes.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install abcde
sudo apt-get install flac
</code></pre></div></div>

<p>Then read a CD with</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>abcde -a cddb,read,getalbumart,encode,tag,move,clean -j 4 -B -o flac -N 
</code></pre></div></div>

<p>To make <code class="language-plaintext highlighter-rouge">abcde</code> create file names in the format that I prefer, create <code class="language-plaintext highlighter-rouge">.abcde.conf</code> in the user’s home directory containing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OUTPUTFORMAT='${OUTPUT}/${ARTISTFILE}/${ALBUMFILE}/${TRACKNUM} - ${TRACKFILE}'

mungefilename ()
{
    echo "$@" | sed -e 's/^\.*//' | tr -d ":&gt;&lt;|*/\"'?[:cntrl:]"
}
</code></pre></div></div>

<p>And encode it as AAC using</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ffmpeg -i "01 - Santas Coming for Us.flac" -c:v mjpeg -vf scale=500:500 -c:a aac -b:a 128k -threads 4 "01 - Santas Coming for Us.m4a"
</code></pre></div></div>

<p>This can be rolled up as follows with <code class="language-plaintext highlighter-rouge">find</code>/<code class="language-plaintext highlighter-rouge">xargs</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find . -name "*.flac" -print0 | xargs -0 -P 4 -I{} ffmpeg -i {} -c:v mjpeg -vf scale=500:500 -c:a aac -b:a 128k -n {}.m4a
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">-n</code> here causes it to skip files where the output file already exists so the command can be run again on an existing directory tree. <code class="language-plaintext highlighter-rouge">-P 4</code> forks 4 copies of <code class="language-plaintext highlighter-rouge">ffmpeg</code>.</p>

<p>Finally, copy the m4a files to <code class="language-plaintext highlighter-rouge">~/Music/Music/Media/Automatically Add to Music.localized</code></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><category term="flac" /><summary type="html"><![CDATA[Using a Raspberry PI with a USB CD drive to read all my CDs and create a master, FLAC format, repository and from that create MP3 and AAC versions for the car and iTunes.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/cd-stack.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/cd-stack.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How To GitHub</title><link href="https://www.tunbury.org/2020/02/25/how-to-github/" rel="alternate" type="text/html" title="How To GitHub" /><published>2020-02-25T12:41:29+00:00</published><updated>2020-02-25T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/02/25/how-to-github</id><content type="html" xml:base="https://www.tunbury.org/2020/02/25/how-to-github/"><![CDATA[<p>I really don’t use GitHub often enough to remember the commands without searching for them each time, which means that I use GitHub even less as I can’t remember the commands. Here’s a short cheat sheet on the most common things I need to do in GitHub.</p>

<p>Navigate to your project folder then create a repository for that directory</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git init
</code></pre></div></div>

<p>Add all the files in the current directory to the Git index. Of course you can be more selective here and iteratively add files one at a time</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git add .
</code></pre></div></div>

<p>The current status can be checked at any time using</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git status
</code></pre></div></div>

<p>Now commit the files in their current state to the repository with whatever comment is appropriate</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git commit -m "Initial commit"
</code></pre></div></div>

<p>You may well be prompted to set your global username and email if you’ve not done so before:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git config --global user.email "you@yourdomain.com"
git config --global user.name "Your Name"
</code></pre></div></div>

<p>Some time later, after you have made changes, you need to add the changed files again and commit, or do a combined add/commit like this</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git commit -a -m "great new code added"
</code></pre></div></div>

<p>To see the current changes compared to the repository</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git diff
</code></pre></div></div>

<p>And finally, if things went south, you can commit the current state and then revert to the last commit point</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git commit -a -m "Oops"
git revert HEAD --no-edit
</code></pre></div></div>

<h1 id="working-online">Working Online</h1>

<p>That’s all very well, and I could continue to work like that, but I want to keep a copy at GitHub, so create an RSA key for authentication</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen -t rsa -b 4096 -C "you@yourdomain.com"
</code></pre></div></div>

<p>Add this key to your SSH Agent</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-add ~/.ssh/id_rsa
</code></pre></div></div>

<p>Sign in to GitHub and copy and paste the public key into the SSH and GPG Keys section</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat ~/.ssh/id_rsa.pub
</code></pre></div></div>

<p>Create an empty repository on the website. Note the SSH address and add it as a remote repository on your local system</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git remote add origin git@github.com:username/project.git
</code></pre></div></div>

<p>And then push your local copy to GitHub</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git push -u origin master
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="juniper" /><summary type="html"><![CDATA[I really don’t use GitHub often enough to remember the commands without searching for them each time, which means that I use GitHub even less as I can’t remember the commands. Here’s a short cheat sheet on the most common things I need to do in GitHub.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/GitHub-Mark-120px-plus.png" /><media:content medium="image" url="https://www.tunbury.org/images/GitHub-Mark-120px-plus.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Import Text File of events into Apple Calendar using AppleScript</title><link href="https://www.tunbury.org/2020/02/06/import-text-file-of-events-into-apple-calendar-using-applescript/" rel="alternate" type="text/html" title="Import Text File of events into Apple Calendar using AppleScript" /><published>2020-02-06T12:41:29+00:00</published><updated>2020-02-06T12:41:29+00:00</updated><id>https://www.tunbury.org/2020/02/06/import-text-file-of-events-into-apple-calendar-using-applescript</id><content type="html" xml:base="https://www.tunbury.org/2020/02/06/import-text-file-of-events-into-apple-calendar-using-applescript/"><![CDATA[<p>The Church of England has a very useful <a href="https://www.churchofengland.org/prayer-and-worship/worship-texts-and-resources/common-worship/prayer-and-worship/worship-texts-and-resources/common-worship/churchs-year/calendar">calendar</a> page, but I’d really like it in my iPhone calendar so I can have reminders for Saints’ days particularly red letter days when the flag goes up.</p>

<p>I’ve never used AppleScript before but with a little searching online it seemed relatively easy to create a script to import a text file copy of the web page into my Mac calendar which is synchronised with my phone.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set OldDelimiters to AppleScript's text item delimiters
set LF to ASCII character 10
set tab to ASCII character 9
set theFile to choose file with prompt "Select TAB delimited file calendar file"
set theLines to read theFile
set AppleScript's text item delimiters to {LF}
set theLines to paragraphs of theLines
set AppleScript's text item delimiters to {tab}
repeat with ThisLine in theLines
    if (count of ThisLine) &gt; 0 then
        set theStartDate to current date
        set hours of theStartDate to 0
        set minutes of theStartDate to 0
        set seconds of theStartDate to 0

        if text item 1 of ThisLine is not "0" then
            set year of theStartDate to text item 1 of ThisLine as number
        end if

        if text item 2 of ThisLine is equal to "January" then
            set month of theStartDate to 1
        else if text item 2 of ThisLine is equal to "February" then
            set month of theStartDate to 2
        else if text item 2 of ThisLine is equal to "March" then
            set month of theStartDate to 3
        else if text item 2 of ThisLine is equal to "April" then
            set month of theStartDate to 4
        else if text item 2 of ThisLine is equal to "May" then
            set month of theStartDate to 5
        else if text item 2 of ThisLine is equal to "June" then
            set month of theStartDate to 6
        else if text item 2 of ThisLine is equal to "July" then
            set month of theStartDate to 7
        else if text item 2 of ThisLine is equal to "August" then
            set month of theStartDate to 8
        else if text item 2 of ThisLine is equal to "September" then
            set month of theStartDate to 9
        else if text item 2 of ThisLine is equal to "October" then
            set month of theStartDate to 10
        else if text item 2 of ThisLine is equal to "November" then
            set month of theStartDate to 11
        else if text item 2 of ThisLine is equal to "December" then
            set month of theStartDate to 12
        else
            log text item 2 of ThisLine
        end if

        set day of theStartDate to text item 3 of ThisLine

        set theEndDate to theStartDate + (23 * hours)

        log theStartDate

        tell application "Calendar"
            if text item 5 of ThisLine is "RED" then
                tell calendar "CofE RED"
                    if text item 1 of ThisLine is not "0" then
                        set newEvent to make new event with properties {summary:text item 4 of ThisLine, start date:theStartDate, end date:theEndDate, allday event:true}
                    else
                        set newEvent to make new event with properties {summary:text item 4 of ThisLine, start date:theStartDate, end date:theEndDate, allday event:true, recurrence:"freq=Yearly"}
                    end if
                end tell
            else
                tell calendar "CofE"
                    if text item 1 of ThisLine is not "0" then
                        set newEvent to make new event with properties {summary:text item 4 of ThisLine, start date:theStartDate, end date:theEndDate, allday event:true}
                    else
                        set newEvent to make new event with properties {summary:text item 4 of ThisLine, start date:theStartDate, end date:theEndDate, allday event:true, recurrence:"freq=Yearly"}
                    end if
                end tell
            end if
        end tell

    end if

end repeat

set AppleScript's text item delimiters to OldDelimiters
</code></pre></div></div>

<p><a href="/downloads/cofe-calendar.txt">cofe-calendar</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="applescript" /><summary type="html"><![CDATA[The Church of England has a very useful calendar page, but I’d really like it in my iPhone calendar so I can have reminders for Saints’ days, particularly red letter days when the flag goes up.]]></summary></entry><entry><title type="html">Bose SoundTouch and Mini DLNA</title><link href="https://www.tunbury.org/2019/09/21/bose-soundtouch-and-mini-dlna/" rel="alternate" type="text/html" title="Bose SoundTouch and Mini DLNA" /><published>2019-09-21T12:41:29+00:00</published><updated>2019-09-21T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/09/21/bose-soundtouch-and-mini-dlna</id><content type="html" xml:base="https://www.tunbury.org/2019/09/21/bose-soundtouch-and-mini-dlna/"><![CDATA[<p><a href="https://www.bose.co.uk">Bose</a> have a Windows application that can host your music library; however, I don’t have a Windows machine turned on permanently and I’d prefer a low-power Raspberry PI option.</p>

<p>Install Mini DLNA</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get install minidlna
</code></pre></div></div>

<p>Copy the music over to the staging folder. I have my MP3 files on an external hard disk, so I’ll copy them over like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tar -C /mnt/Music -cvf - . | tar -C /var/lib/minidlna -xf -
</code></pre></div></div>
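<p>If you want to check the tar-pipe behaviour before pointing it at the real music folder, the same pattern can be rehearsed with two scratch directories (the file name here is made up for the demonstration):</p>

```shell
# Rehearse the tar-pipe copy with scratch directories
src=$(mktemp -d)
dst=$(mktemp -d)
echo "test" > "$src/song.mp3"
# left-hand tar archives the source tree to stdout; right-hand tar unpacks it into the destination
tar -C "$src" -cf - . | tar -C "$dst" -xf -
cat "$dst/song.mp3"   # prints: test
```

<p>Unlike <code class="language-plaintext highlighter-rouge">cp -r</code>, the tar pipe preserves permissions and timestamps by default, which is why it is a common idiom for copying whole trees.</p>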

<p>Set the file ownership</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chown -R minidlna:minidlna /var/lib/minidlna /var/cache/minidlna
</code></pre></div></div>

<p>Sometimes you need to delete the database from <code class="language-plaintext highlighter-rouge">/var/cache/minidlna/files.db</code> and restart the service</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>service minidlna stop
rm /var/cache/minidlna/files.db
service minidlna start
</code></pre></div></div>

<p>Check the status at <code class="language-plaintext highlighter-rouge">http://&lt;host_ip&gt;:8200</code></p>

<p><img src="/images/minidlna-status.png" alt="" /></p>

<p>Now on the Bose SoundTouch app go to Add Service, Music Library on NAS and select your Pi from the list:</p>

<p><img src="/images/soundtouch-app.jpg" alt="" /></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="juniper" /><summary type="html"><![CDATA[Bose have a Windows application that can host your music library; however, I don’t have a Windows machine turned on permanently and I’d prefer a low-power Raspberry PI option.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/bose-soundtouch-30.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/bose-soundtouch-30.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Bridged WiFi Access Point with Raspberry Pi</title><link href="https://www.tunbury.org/2019/09/20/bridged-wifi-access-point-with-raspberry-pi/" rel="alternate" type="text/html" title="Bridged WiFi Access Point with Raspberry Pi" /><published>2019-09-20T12:41:29+00:00</published><updated>2019-09-20T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/09/20/bridged-wifi-access-point-with-raspberry-pi</id><content type="html" xml:base="https://www.tunbury.org/2019/09/20/bridged-wifi-access-point-with-raspberry-pi/"><![CDATA[<p>Run <code class="language-plaintext highlighter-rouge">ifconfig</code> and determine your network device names. Typically these will be <code class="language-plaintext highlighter-rouge">eth0</code> and <code class="language-plaintext highlighter-rouge">wlan0</code>.</p>

<p>Install the packages we’ll need</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get install hostapd bridge-utils
</code></pre></div></div>

<p>Create a file <code class="language-plaintext highlighter-rouge">/etc/network/interfaces.d/br0</code> containing</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>auto br0
iface br0 inet dhcp
    bridge_ports eth0 wlan0
</code></pre></div></div>

<p>Edit <code class="language-plaintext highlighter-rouge">/etc/dhcpcd.conf</code> and add the following line to the end of the file</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>denyinterfaces eth0 wlan0
</code></pre></div></div>

<p>Reboot your Pi to apply the configuration.</p>

<p>Create the configuration file <code class="language-plaintext highlighter-rouge">/etc/hostapd/hostapd.conf</code> for <code class="language-plaintext highlighter-rouge">hostapd</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>interface=wlan0
bridge=br0
ssid=YourSSID
hw_mode=g
channel=7
wmm_enabled=0
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wpa=2
wpa_passphrase=SecurePassword
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP
</code></pre></div></div>

<p>Edit <code class="language-plaintext highlighter-rouge">/etc/default/hostapd</code> and uncomment the <code class="language-plaintext highlighter-rouge">DAEMON_CONF</code> line and enter the full path to the configuration file above, thus:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DAEMON_CONF="/etc/hostapd/hostapd.conf"
</code></pre></div></div>

<p>Set <code class="language-plaintext highlighter-rouge">hostapd</code> to launch on boot and launch it right now</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl unmask hostapd
systemctl enable hostapd
/etc/init.d/hostapd start
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><category term="wifi" /><summary type="html"><![CDATA[Run ifconfig and determine your network device names. Typically these will be eth0 and wlan0.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/wifi.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/wifi.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OLED Module for PI</title><link href="https://www.tunbury.org/2019/09/20/oled-module-for-pi/" rel="alternate" type="text/html" title="OLED Module for PI" /><published>2019-09-20T12:41:29+00:00</published><updated>2019-09-20T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/09/20/oled-module-for-pi</id><content type="html" xml:base="https://www.tunbury.org/2019/09/20/oled-module-for-pi/"><![CDATA[<p>Run <code class="language-plaintext highlighter-rouge">raspi-config</code> and turn on the i2c interface</p>

<p>Install the i2c tools</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get install i2c-tools
</code></pre></div></div>

<p>Then find the I2C address of your module by running <code class="language-plaintext highlighter-rouge">i2cdetect -y 1</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@pi2b:~ # i2cdetect -y 1
    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- 3c -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --                         
</code></pre></div></div>

<p>This shows that you’ve connected up the hardware correctly!</p>

<p>Install the Python modules required by the Adafruit SSD1306 module.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt-get install -y python3-dev python3-setuptools python3-pip python3-pil python3-rpi.gpio
</code></pre></div></div>

<p>Download the library from Github</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/adafruit/Adafruit_Python_SSD1306.git
</code></pre></div></div>

<p>Change into the cloned directory and install the library</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd Adafruit_Python_SSD1306
sudo python3 setup.py install
</code></pre></div></div>

<p>Then run one of the examples such as <code class="language-plaintext highlighter-rouge">shapes.py</code></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><category term="oled" /><summary type="html"><![CDATA[Run raspi-config and turn on the i2c interface]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/oled.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/oled.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Juniper SRX100 Firmware Update</title><link href="https://www.tunbury.org/2019/09/20/srx-firmware/" rel="alternate" type="text/html" title="Juniper SRX100 Firmware Update" /><published>2019-09-20T12:41:29+00:00</published><updated>2019-09-20T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/09/20/srx-firmware</id><content type="html" xml:base="https://www.tunbury.org/2019/09/20/srx-firmware/"><![CDATA[<p>Download the latest version of the software and copy it over to the SRX</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>scp junos-srxsme-12.3X48-D65.1-domestic.tgz root@192.168.1.1:/var/tmp
</code></pre></div></div>

<p>On the SRX install the software into the alternative root partition</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>request system software add /var/tmp/junos-srxsme-12.3X48-D65.1-domestic.tgz no-copy no-validate unlink
</code></pre></div></div>

<p>Reboot</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>request system reboot
</code></pre></div></div>

<p>Once it has rebooted, update the alternate image to the new version.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>request system snapshot slice alternate
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="juniper" /><summary type="html"><![CDATA[Download the latest version of the software and copy it over to the SRX]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/SRX100H2.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/SRX100H2.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Raspberry PI SSH Keys</title><link href="https://www.tunbury.org/2019/09/16/raspberry-pi-ssh-keys/" rel="alternate" type="text/html" title="Raspberry PI SSH Keys" /><published>2019-09-16T12:41:29+00:00</published><updated>2019-09-16T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/09/16/raspberry-pi-ssh-keys</id><content type="html" xml:base="https://www.tunbury.org/2019/09/16/raspberry-pi-ssh-keys/"><![CDATA[<p>This is my cheatsheet based upon <a href="https://www.raspberrypi.org/documentation/remote-access/ssh/passwordless.md">Passwordless SSH access</a> on the official Raspberry PI website.</p>

<p>On the Mac create a key (once) with a passcode</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen
</code></pre></div></div>

<p>Add the key to your Mac keychain</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-add -K ~/.ssh/id_rsa
</code></pre></div></div>

<p>Optionally create a file <code class="language-plaintext highlighter-rouge">~/.ssh/config</code> with the following contents; the <code class="language-plaintext highlighter-rouge">UseKeychain yes</code> line tells OSX to look in the keychain for the passphrase.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Host *
  UseKeychain yes
  AddKeysToAgent yes
  IdentityFile ~/.ssh/id_rsa
</code></pre></div></div>

<p>Then copy your key to your Raspberry PI</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-copy-id pi@192.168.1.x
</code></pre></div></div>

<p>SSH to the PI</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh pi@192.168.1.x
</code></pre></div></div>

<p>Next, edit your <code class="language-plaintext highlighter-rouge">/etc/ssh/sshd_config</code> to turn off plain-text password authentication and restart <code class="language-plaintext highlighter-rouge">sshd</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo sed -i "s/#PasswordAuthentication yes/PasswordAuthentication no/g" /etc/ssh/sshd_config
sudo /etc/init.d/ssh restart
</code></pre></div></div>
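<p>The substitution can be rehearsed on a scratch file before touching the live <code class="language-plaintext highlighter-rouge">sshd_config</code>; this sketch assumes GNU sed, as shipped with Raspbian:</p>

```shell
# Rehearse the substitution on a scratch copy rather than the live sshd_config
tmp=$(mktemp)
printf '#PasswordAuthentication yes\n' > "$tmp"
sed -i "s/#PasswordAuthentication yes/PasswordAuthentication no/g" "$tmp"
cat "$tmp"   # prints: PasswordAuthentication no
```

<p>Note that on a Mac the BSD sed bundled with the OS needs <code class="language-plaintext highlighter-rouge">sed -i ''</code> for in-place editing; run the command on the Pi itself.</p>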

<p>Now you can SSH without a password and without getting pestered that the default password hasn’t been changed.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><summary type="html"><![CDATA[This is my cheatsheet based upon Passwordless SSH access on the official Raspberry PI website.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/SSH-Keys.png" /><media:content medium="image" url="https://www.tunbury.org/images/SSH-Keys.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Raspberry PI Zero W Headless setup</title><link href="https://www.tunbury.org/2019/09/14/raspberry-pi-zero-w-headless-setup/" rel="alternate" type="text/html" title="Raspberry PI Zero W Headless setup" /><published>2019-09-14T12:41:29+00:00</published><updated>2019-09-14T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/09/14/raspberry-pi-zero-w-headless-setup</id><content type="html" xml:base="https://www.tunbury.org/2019/09/14/raspberry-pi-zero-w-headless-setup/"><![CDATA[<p>Copy <code class="language-plaintext highlighter-rouge">2019-07-10-raspbian-buster-lite.img</code> to the SD card with Etcher. Then remove and reinsert the card.</p>

<p>Enable ssh by creating a zero length file called <code class="language-plaintext highlighter-rouge">ssh</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>touch /Volumes/boot/ssh
</code></pre></div></div>

<p>Create a file <code class="language-plaintext highlighter-rouge">/Volumes/boot/wpa_supplicant.conf</code> using your favourite plain text editor:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=GB

network={
  ssid="your SSID"
  psk="xxxxxxxx"
  key_mgmt=WPA-PSK
}
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="raspberrypi" /><summary type="html"><![CDATA[Copy 2019-07-10-raspbian-buster-lite.img to the SD card with Etcher. Then remove and reinsert the card.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/pi-zero.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/pi-zero.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Internet Radio from Raspberry PI</title><link href="https://www.tunbury.org/2019/09/01/internet-radio-from-raspberry-pi/" rel="alternate" type="text/html" title="Internet Radio from Raspberry PI" /><published>2019-09-01T12:41:29+00:00</published><updated>2019-09-01T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/09/01/internet-radio-from-raspberry-pi</id><content type="html" xml:base="https://www.tunbury.org/2019/09/01/internet-radio-from-raspberry-pi/"><![CDATA[<p>Install the software packages needed</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install libmp3lame0 libtwolame0
sudo apt-get install darkice
sudo apt-get install icecast2
</code></pre></div></div>

<p>During the installation you will be asked to set the icecast password, which you’ll need to enter into the configuration file below.</p>

<p>Check your recording device is present</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pi@raspberrypi:~ $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: AK5371 [AK5371], device 0: USB Audio [USB Audio]
Subdevices: 0/1
Subdevice #0: subdevice #0
</code></pre></div></div>

<p>Try to make a recording:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>arecord -D plughw:1,0 temp.wav
</code></pre></div></div>

<p>If the volume is too quiet, you can adjust it with <code class="language-plaintext highlighter-rouge">alsamixer -c 1</code>, where 1 is your audio device. Note that 0 is the Raspberry PI default output device.</p>

<p>Create a configuration file for darkice</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># this section describes general aspects of the live streaming session
[general]
duration        = 0        # duration of encoding, in seconds. 0 means forever
bufferSecs      = 5         # size of internal slip buffer, in seconds
reconnect       = yes       # reconnect to the server(s) if disconnected


# this section describes the audio input that will be streamed
[input]
# device          = /dev/dsp  # OSS DSP soundcard device for the audio input
device          = plughw:1,0  # OSS DSP soundcard device for the audio input
sampleRate      = 22050     # sample rate in Hz. try 11025, 22050 or 44100
bitsPerSample   = 16        # bits per sample. try 16
channel         = 2         # channels. 1 = mono, 2 = stereo


# this section describes a streaming connection to an IceCast2 server
# there may be up to 8 of these sections, named [icecast2-0] ... [icecast2-7]
# these can be mixed with [icecast-x] and [shoutcast-x] sections
[icecast2-0]
bitrateMode     = abr       # average bit rate
format          = mp3       # format of the stream: ogg vorbis
bitrate         = 96        # bitrate of the stream sent to the server
server          = localhost # host name of the server
port            = 8000      # port of the IceCast2 server, usually 8000
password        = password # source password to the IceCast2 server
mountPoint      = mic  # mount point of this stream on the IceCast2 server
name            = Microphone Raspberry Pi # name of the stream
description     = Broadcast from 2nd room # description of the stream
url             = http://example.com/ # URL related to the stream
genre           = my own    # genre of the stream
public          = no        # advertise this stream?
</code></pre></div></div>

<p>Invoke the server by running <code class="language-plaintext highlighter-rouge">darkice</code> at the prompt.</p>

<p>Set darkice to run at boot up</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>update-rc.d darkice defaults
</code></pre></div></div>

<p>Open a web browser to <code class="language-plaintext highlighter-rouge">http://&lt;pi-ip-address&gt;:8000</code> to view the installation. Add the url source to your Internet radio appliance via <code class="language-plaintext highlighter-rouge">http://&lt;pi-ip-address&gt;:8000/mic</code></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bells" /><category term="raspberrypi" /><summary type="html"><![CDATA[Install the software packages needed]]></summary></entry><entry><title type="html">Most Popular Methods</title><link href="https://www.tunbury.org/2019/02/28/most-popular-methods/" rel="alternate" type="text/html" title="Most Popular Methods" /><published>2019-02-28T12:41:29+00:00</published><updated>2019-02-28T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/02/28/most-popular-methods</id><content type="html" xml:base="https://www.tunbury.org/2019/02/28/most-popular-methods/"><![CDATA[<p>There are ~72,000 Surprise Major performances on Bell Board. Bell Board displays results in pages of 200 performances. Thus we will need to download all the pages and concatenate them into a single file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for i in {1..366}; do wget "https://bb.ringingworld.co.uk/search.php?title=surprise+major&amp;page=$i" -O - &gt;&gt; surprise-major.txt; done
</code></pre></div></div>
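<p>The loop bound of 366 can be sanity-checked with shell arithmetic: ~72,000 performances at 200 per page needs at least 360 pages, so the loop allows a little headroom:</p>

```shell
# Ceiling division: pages needed for ~72,000 results at 200 per page
echo $(( (72000 + 199) / 200 ))   # prints 360
```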

<p>Quick analysis with awk/sed/sort and uniq:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>awk '/class="title"/ { print $3, $4, $5, $6, $7, $8, $9}' surprise-major.txt | sed 's/&lt;\/td&gt;//' | sort | uniq -c | sort -gr | less
</code></pre></div></div>
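<p>To see what the counting stage of the pipeline does, here is a minimal sketch run on three literal title strings (sample data, not real Bell Board output):</p>

```shell
# The counting stage in isolation: sort groups duplicates, uniq -c counts each
# group, and sort -gr puts the biggest count first
printf '%s\n' 'Yorkshire Surprise Major' 'Cambridge Surprise Major' 'Yorkshire Surprise Major' \
  | sort | uniq -c | sort -gr
```

<p>Yorkshire comes out on top with a count of 2, which is exactly the shape of the league tables below.</p>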

<p>As expected, the Standard 8 are right there:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>10732 Yorkshire Surprise Major
 7633 Cambridge Surprise Major
 6908 Bristol Surprise Major
 3629 Superlative Surprise Major
 3425 Lincolnshire Surprise Major
 3048 Rutland Surprise Major
 2716 London Surprise Major
 1556 Pudsey Surprise Major
  957 Glasgow Surprise Major
  931 Lessness Surprise Major
  666 Belfast Surprise Major
  645 Uxbridge Surprise Major
  568 Cornwall Surprise Major
</code></pre></div></div>

<p>Repeating for the ~3,800 Delight Major performances</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for i in {1..30}; do wget "https://bb.ringingworld.co.uk/search.php?title=delight+major&amp;page=$i" -O - &gt;&gt; delight-major.txt; done
awk '/class="title"/ { print $3, $4, $5, $6, $7, $8, $9}' delight-major.txt | sed 's/&lt;\/td&gt;//' | sort | uniq -c | sort -gr | less
</code></pre></div></div>

<p>Gives us these</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>141 Cooktown Orchid Delight Major
 36 Christmas Delight Major
 30 Wedding Delight Major
 28 Coniston Bluebird Delight Major
 27 Diamond Delight Major
 26 Ruby Delight Major
 22 Birthday Delight Major
 19 Anniversary Delight Major
 18 Dordrecht Delight Major
 16 Yelling Delight Major
 16 Lye Delight Major
 16 Burnopfield Delight Major
 15 Winchester Delight Major
 15 Hunsdon Delight Major
 13 Uttlesford Delight Major
 13 Magna Carta Delight Major
 12 Sussex Delight Major
 12 Sunderland Delight Major
 12 Sleaford Delight Major
 12 Heptonstall Delight Major
 11 Windy Gyle Delight Major
 11 Spitfire Delight Major
 11 Ketteringham Delight Major
 11 Keele University Delight Major
 11 Ian's Delight Major
 11 Eardisland Delight Major
 11 Dingley Delight Major
 10 West Bridgford Delight Major
 10 Paisley Delight Major
 10 Morville Delight Major
 10 Longstanton Delight Major
 10 Knotty Ash Delight Major
</code></pre></div></div>

<p>And once again for the 2,200 Delight Minor performances</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for i in {1..12}; do wget "https://bb.ringingworld.co.uk/search.php?title=delight+minor&amp;page=$i" -O - &gt;&gt; delight-minor.txt; done
awk '/class="title"/ { print $3, $4, $5, $6, $7, $8, $9}' delight-minor.txt | sed 's/&lt;\/td&gt;//' | sort | uniq -c | sort -gr | less
</code></pre></div></div>

<p>Gives</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 85 Woodbine Delight Minor
 78 Old Oxford Delight Minor
 46 Oswald Delight Minor
 41 Elston Delight Minor
 30 College Bob IV Delight Minor
 25 Morning Exercise Delight Minor
 23 Kirkstall Delight Minor
 22 Francis Genius Delight Minor
 20 St Albans Delight Minor
 20 Julie McDonnell Delight Minor
 19 Southwark Delight Minor
 18 Burslem Delight Minor
 18 Barham Delight Minor
 17 Kentish Delight Minor
 17 Darton Exercise Delight Minor
 17 Burnaby Delight Minor
 16 Edinburgh Delight Minor
 15 Disley Delight Minor
 14 Neasden Delight Minor
 14 London Delight Minor
 14 Glastonbury Delight Minor
 14 Bedford Delight Minor
 13 Croome d'Abitot Delight Minor
 13 Christmas Pudding Delight Minor
 13 Charlwood Delight Minor
 12 Wragby Delight Minor
 11 Willesden Delight Minor
 11 Newdigate Delight Minor
 10 Combermere Delight Minor
 10 Cambridge Delight Minor
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bells" /><category term="bash" /><summary type="html"><![CDATA[There are ~72,000 Surprise Major performances on Bell Board. Bell Board displays results in pages of 200 performances. Thus we will need to download all the pages and concatenate them into a single file:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/bellboard.png" /><media:content medium="image" url="https://www.tunbury.org/images/bellboard.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Mount an ISO from your Desktop via PowerCLI</title><link href="https://www.tunbury.org/2019/01/17/mount-an-iso-from-your-desktop-via-powercli/" rel="alternate" type="text/html" title="Mount an ISO from your Desktop via PowerCLI" /><published>2019-01-17T12:41:29+00:00</published><updated>2019-01-17T12:41:29+00:00</updated><id>https://www.tunbury.org/2019/01/17/mount-an-iso-from-your-desktop-via-powercli</id><content type="html" xml:base="https://www.tunbury.org/2019/01/17/mount-an-iso-from-your-desktop-via-powercli/"><![CDATA[<p>Normally, I’d use a Windows NFS Server to host my ISO files. The steps couldn’t be simpler</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Add-WindowsFeature FS-NFS-Service
Import-Module NFS
New-NfsShare -Name ISO -Path C:\ISO -access readonly
</code></pre></div></div>

<p>However, this only works if you have a Windows Server installation as you can’t install the NFS Service on a Windows desktop.</p>

<p>There is a standalone executable version of an NFS server available called WinNFSd.exe which can be downloaded from <a href="https://github.com/winnfsd/winnfsd/releases">GitHub</a>. I’ve saved this to <code class="language-plaintext highlighter-rouge">C:\WinNFSd</code></p>

<p>Create a firewall rule on your desktop to allow the ESXi host to communicate with WinNFSd, thus:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>New-NetFirewallRule -DisplayName "NFS Server" -Direction Inbound -Action Allow -Program C:\WinNFSd\WinNFSd.exe
</code></pre></div></div>

<p>Run <code class="language-plaintext highlighter-rouge">WinNFSd</code>. The argument list is the local folder hosting your ISO files to be shared and the path that it will have on the NFS server’s export list.  The path name needs to match the <code class="language-plaintext highlighter-rouge">New-DataStore</code> command later:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Start-Process C:\WinNFSd\WinNFSd.exe -ArgumentList "C:\ISO /ISO"
</code></pre></div></div>

<p>You should now have a CMD window open along with the PowerCLI prompt.</p>

<p>Now you need to know the IP Address of your machine:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$myIPAddress = "Your IP Address"
</code></pre></div></div>

<p>You can automate this as follows, but it may need tweaking depending upon which network card you are using, etc.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$myIPAddress = $(Get-NetIPAddress -InterfaceAlias Ethernet0 -AddressFamily IPv4).IPAddress
</code></pre></div></div>

<p>Create a variable for your ESXi host(s).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$esxHosts = @( "Your Host" )
</code></pre></div></div>

<p>If you have a cluster you can include them all like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$esxHosts = Get-Datacenter yourDC | Get-Cluster yourCluster | Get-VMHost
</code></pre></div></div>

<p>Instruct the ESXi host to mount the datastore.  Note that the final <code class="language-plaintext highlighter-rouge">/ISO</code> needs to match the final argument to <code class="language-plaintext highlighter-rouge">WinNFSd</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$esxHosts |% { New-Datastore -VMHost $_ -Name ISO -NfsHost $myIPAddress -Path /ISO }
</code></pre></div></div>
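
<p>Before attaching the ISO, it’s worth checking that the datastore actually mounted on each host. A minimal check, assuming the same PowerCLI session as above:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># List the new datastore on each host; an error here means the mount failed
$esxHosts |% { Get-Datastore -VMHost $_ -Name ISO }
</code></pre></div></div>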

<p>Now set the ISO that you have, such as <code class="language-plaintext highlighter-rouge">c:\iso\myiso.iso</code>, as the CD drive on your VM:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Get-CDDrive $vm | Set-CDDrive -IsoPath "[ISO] myiso.iso" -Connected:$true -Confirm:$false
</code></pre></div></div>

<p>Now you can use the CD Drive in the VM as you wish.</p>

<p>Of course, it’s important to tidy up in the correct sequence. Don’t just close the CMD prompt before disconnecting the CD drive and unmounting the datastore.</p>

<p>Disconnect the CD Drive</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Get-CDDrive $vm | Set-CDDrive -NoMedia -Confirm:$false
</code></pre></div></div>

<p>Remove the datastore</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$esxHosts |% { Remove-Datastore -VMHost $_ -Datastore ISO -Confirm:$false }
</code></pre></div></div>

<p>Stop WinNFSd and remove the firewall rule</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stop-Process -Name WinNFSd
Remove-NetFirewallRule -DisplayName "NFS Server"
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="powershell" /><summary type="html"><![CDATA[Normally, I’d use a Windows NFS Server to host my ISO files. The steps couldn’t be simpler]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/PowerCLI.png" /><media:content medium="image" url="https://www.tunbury.org/images/PowerCLI.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Retro Gaming: Space Raiders</title><link href="https://www.tunbury.org/2018/09/24/retro-gaming-space-raiders/" rel="alternate" type="text/html" title="Retro Gaming: Space Raiders" /><published>2018-09-24T12:41:29+00:00</published><updated>2018-09-24T12:41:29+00:00</updated><id>https://www.tunbury.org/2018/09/24/retro-gaming-space-raiders</id><content type="html" xml:base="https://www.tunbury.org/2018/09/24/retro-gaming-space-raiders/"><![CDATA[<p>Dylan’s favourite t-shirt is his Game Over shirt, which always reminds me of Space Raiders from the ZX Spectrum days. I found the cassette tape quite easily, but it took a significant amount of searching to find the Spectrum itself, and included in the box was the tape recorder as well!</p>

<p>Unfortunately, when I set about loading the game it didn’t work. That was probably a lot to ask after 30+ years. The audio sounded a bit low even with the tape player’s volume at maximum. I tried connecting it via an amplifier, but that didn’t help.</p>

<p>I connected the tape drive to my Mac and looked at the file in Audacity.</p>

<p><img src="/images/original-tape-player.png" alt="" /></p>

<p>Apart from being very quiet, zooming in showed that after the guard tone it was impossible to see the signal as described in this <a href="http://www.myprius.co.za/tape_storage.htm">excellent post</a>.</p>

<p><img src="/images/nothing-to-see.png" alt="" /></p>

<p>I tried the Fuse utilities to convert the WAV into a TZX file, but these failed. I found more tools, which I installed on my Raspberry Pi, but the result was the same.</p>

<p>Eventually, I decided to see if I could find another tape player and I found an old compact media centre. I played the tape straight into Audacity just to see if I could see a difference. Clearly this find is significantly better:</p>

<p><img src="/images/compact-media-centre.png" alt="" /></p>

<p>I tried <code class="language-plaintext highlighter-rouge">audio2tape</code>, but that gave me a bunch of CRC errors; processing the file with <code class="language-plaintext highlighter-rouge">tzxwav</code> worked perfectly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pi@raspberrypi:~/.local/bin $ ./tzxwav -p -v -o ~/raiders.tzx -D ~/raiders.wav 
=== Program: raiders   ---------------------------------|  1:56
Expected length: 40
Leader: @1055530, Sync: @1275725, End: @1279885
Program: raiders    (40 bytes)
--- data########----------------------------------------|  1:51
Length: 40
Leader: @1323967, Sync: @1412003, End: @1421770
40 bytes of data
=== Program: RAIDERS   ---------------------------------|  1:44
Expected length: 68
Leader: @1510973, Sync: @1731454, End: @1735476
Program: RAIDERS    (68 bytes)
--- data###########-------------------------------------|  1:40
Length: 68
Leader: @1778815, Sync: @1866811, End: @1882863
68 bytes of data
=== Bytes: T         #----------------------------------|  1:33
Start: 16384, Expected length: 6912
Leader: @1964171, Sync: @2184510, End: @2188446
Screen: T         
--- data#########################-----------------------|  1:27
Length: 6912
Leader: @2231875, Sync: @2319891, End: @3680454
6912 bytes of data
=== Bytes: C         ##############---------------------|  1:16
Start: 24576, Expected length: 7860
Leader: @3778730, Sync: @3989417, End: @3993362
Bytes: C          (start: 24576, 7860 bytes)
--- data###########################################-----|  0:19
Length: 7860
Leader: @4036807, Sync: @4124864, End: @6093760
7860 bytes of data
100% |##################################################|  0:00
</code></pre></div></div>

<p>I loaded the TZX file into Fuse and it worked as expected.</p>

<p>Armed with a working tape player, I loaded the game on the real ZX Spectrum at the first attempt.</p>

<p><img src="/images/space-raiders-on-tv.jpg" alt="" /></p>

<p>Lastly, can we have this on our Raspberry Pi? Well, of course: just install Fuse and load up the TZX image:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install fuse-emulator-common
sudo apt-get install spectrum-roms fuse-emulator-utils
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="specturm" /><category term="raspberrypi" /><summary type="html"><![CDATA[Dylan’s favourite t-shirt is his Game Over shirt which always reminds me to Space Raiders from the ZX Spectrum days. I found the cassette tape quite easily but it took a significant amount of searching to find the Spectrum itself and included in the box was the tape recorder as well!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/space-raiders.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/space-raiders.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Which Funds Have Exposure to NetFlix?</title><link href="https://www.tunbury.org/2018/08/27/which-funds-have-exposure-to-netflix/" rel="alternate" type="text/html" title="Which Funds Have Exposure to NetFlix?" /><published>2018-08-27T12:41:29+00:00</published><updated>2018-08-27T12:41:29+00:00</updated><id>https://www.tunbury.org/2018/08/27/which-funds-have-exposure-to-netflix</id><content type="html" xml:base="https://www.tunbury.org/2018/08/27/which-funds-have-exposure-to-netflix/"><![CDATA[<p>Dabbling in the markets by way of investment funds is amusing. I use <a href="www.hl.co.uk">Hargreaves Lansdown</a> to do this. HL have a fund research section which lets you look at a given fund and view the top 10 holdings so you can base your decision to invest in your belief in the underlying stock.</p>

<p>How do you tackle it from the other direction? Suppose you want to invest in NetFlix: which fund(s) have exposure to its stock? The search tool on HL’s website doesn’t let you search the funds’ holdings.</p>

<p>Firstly, we can get a list of funds starting with <code class="language-plaintext highlighter-rouge">a</code> by visiting the link https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results/a. There are 25 more letters to go, plus 0 for anything starting with a number. These pages are HTML unordered lists <code class="language-plaintext highlighter-rouge">ul</code> of hyperlinks <code class="language-plaintext highlighter-rouge">href</code>. We can get the alphabet as an array in a tidy loop such as this: <code class="language-plaintext highlighter-rouge">foreach ($l in [char[]]([char]'a'..[char]'z') + '0') { }</code> (assuming ASCII).</p>
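
<p>Putting the letter loop together with the download gives an outer skeleton along these lines (a sketch; the per-page parsing is covered next):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$baseURL = "https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results"
$pages = @{}
foreach ($l in [char[]]([char]'a'..[char]'z') + '0') {
    # Download the fund list page for each initial letter/digit
    $pages["$l"] = $(Invoke-WebRequest -uri "$baseURL/$l").RawContent
}
</code></pre></div></div>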

<p>We can download the HTML using PowerShell’s <code class="language-plaintext highlighter-rouge">Invoke-WebRequest</code> and then extract tags using <code class="language-plaintext highlighter-rouge">getElementsByTagName</code>; however, this can be desperately slow in some circumstances, so I prefer to get the HTML as a string using <code class="language-plaintext highlighter-rouge">$_.RawContent</code> and process it with <code class="language-plaintext highlighter-rouge">IndexOf()</code>.</p>

<p>The code, and essentially the methodology for the rest of this script, is shown below:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$baseURL = "https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results"
$html = $(Invoke-WebRequest -uri "$baseURL/a").RawContent
$x1 = $html.IndexOf('&lt;ul class="list-unstyled list-indent"')
$x1 = $html.IndexOf('&gt;', $x1) + 1
$x2 = $html.IndexOf('&lt;/ul', $x1)
$tbl = $html.substring($x1, $x2 - $x1).trim()
</code></pre></div></div>

<p>Search the HTML for the start of the <code class="language-plaintext highlighter-rouge">ul</code> tag and save the position in <code class="language-plaintext highlighter-rouge">$x1</code>. As tags can be of variable length, we move <code class="language-plaintext highlighter-rouge">$x1</code> to the end of the tag by searching for the close-tag marker <code class="language-plaintext highlighter-rouge">&gt;</code> and adding 1. Then search for the end of the list by looking for the <code class="language-plaintext highlighter-rouge">&lt;/ul</code> tag and store that in <code class="language-plaintext highlighter-rouge">$x2</code>. The table can now be extracted as the substring between <code class="language-plaintext highlighter-rouge">$x1</code> and <code class="language-plaintext highlighter-rouge">$x2</code>.</p>
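
<p>Since this search-and-substring pattern recurs throughout the script, it could be wrapped in a small helper function. This is an illustrative sketch; the function name is mine and doesn’t appear in the original script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Return the text between the end of the opening tag and the closing tag
function Get-TagContent($html, $openTag, $closeTag) {
    $x1 = $html.IndexOf($openTag)
    if ($x1 -lt 0) { return $null }       # tag not found
    $x1 = $html.IndexOf('&gt;', $x1) + 1      # skip to the end of the opening tag
    $x2 = $html.IndexOf($closeTag, $x1)   # find the matching close
    return $html.Substring($x1, $x2 - $x1).Trim()
}

# For example, the fund list could be extracted with:
# $tbl = Get-TagContent $html '&lt;ul class="list-unstyled list-indent"' '&lt;/ul'
</code></pre></div></div>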

<p>Each list item <code class="language-plaintext highlighter-rouge">li</code> contains a hyperlink tag <code class="language-plaintext highlighter-rouge">&lt;a href=</code> including the URL of the page with the fund details and the fund name. We can use a <code class="language-plaintext highlighter-rouge">for</code> loop to move through the string and build up an array of fund URLs. Back tick is the escape character in PowerShell.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$funds = @()
for ($x1 = $tbl.IndexOf("href="); $x1 -ge 0; $x1 = $tbl.IndexOf("href=", $x2)) {
    $x1 = $tbl.IndexOf('"', $x1) + 1   # x1 is the start of the string
    $x2 = $tbl.IndexOf('"', $x1)       # x2 is the end of the string
    $funds += $tbl.Substring($x1, $x2 - $x1)
}
</code></pre></div></div>

<p>At this point we can examine our funds in <code class="language-plaintext highlighter-rouge">$funds</code>, or perhaps write them to a CSV: <code class="language-plaintext highlighter-rouge">$funds | Export-Csv funds.csv</code>.</p>

<p>What we really want is the list of holdings for each fund. So, using the techniques above, download the HTML for each fund detail page and extract the fund size where it appears on the page. Then locate the Top 10 holdings table, build a PowerShell object based upon the table headings, and populate the values:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$holdings = @()
for ($f = 0; $f -lt $funds.count; $f++) {
    $html = $(Invoke-WebRequest -uri $funds[$f]).RawContent
    if ($html.IndexOf("Factsheet unavailable") -ge 0 -or
        $html.IndexOf("Market data not available") -ge 0 -or
        $html.IndexOf("holdings currently unavailable") -ge 0) {
        Write-Host -ForegroundColor Red $f $funds[$f].substring($baseURL.length) "- unavailable"
        continue
    }

    $x1 = $html.IndexOf('Fund size')
    $x1 = $html.IndexOf('&lt;td', $x1)
    $x1 = $html.IndexOf("&gt;", $x1) + 1
    $x2 = $html.IndexOf('&lt;/td', $x1)
    $fundSize = $html.Substring($x1, $x2 - $x1).trim()
    $fundSize = $fundSize -replace "&amp;pound;", "GBP "
    $fundSize = $fundSize -replace "&amp;euro;", "EUR "
    $fundSize = $fundSize -replace "\$", "USD "

    $x1 = $html.IndexOf('&lt;table class="factsheet-table" summary="Top 10 holdings"')
    $x1 = $html.IndexOf('&gt;', $x1) + 1
    $x2 = $html.IndexOf('&lt;/table&gt;', $x1)
    $tbl = $html.substring($x1, $x2 - $x1).trim()

    $headings = @()
    for ($x1 = $tbl.IndexOf('&lt;th', 1); $x1 -gt 0; $x1 = $tbl.IndexOf('&lt;th', $x2)) {
        $x1 = $tbl.IndexOf("&gt;", $x1) + 1
        $x2 = $tbl.IndexOf("&lt;/th&gt;", $x1)
        $headings += $tbl.Substring($x1, $x2 - $x1)
    }

    if ($headings.count -eq 0) {
        Write-Host -ForegroundColor Red $f $funds[$f].substring($baseURL.length) "- no table"
        continue
    }

    $i = 0
    for ($x1 = $tbl.IndexOf('&lt;td'); $x1 -gt 0; $x1 = $tbl.IndexOf('&lt;td', $x2)) {
        if ($i % $headings.count -eq 0) {
            $h = New-Object -TypeName PSObject -Property @{Fund=$funds[$f].substring($baseURL.length);Size=$fundSize}
        }
        $x1 = $tbl.IndexOf("&gt;", $x1) + 1
        $x2 = $tbl.IndexOf("&lt;/td", $x1)
        $cell = $tbl.Substring($x1, $x2 - $x1).trim()
        if ($cell.Substring(0, 1) -eq '&lt;') {
            $x1 = $tbl.IndexOf("&gt;", $x1) + 1
            $x2 = $tbl.IndexOf("&lt;/a", $x1)
            $cell = $tbl.Substring($x1, $x2 - $x1).trim()
        }
        Add-Member -InputObject $h -MemberType NoteProperty -Name $headings[$i % $headings.count] -Value $cell
        $i++
        if ($i % $headings.count -eq 0) {
            $holdings += $h
        }
    }
    Write-Host $f $funds[$f].substring($baseURL.length) $fundSize ($i / 2) "holdings"
}
</code></pre></div></div>

<p>As I mentioned, most of the code is as explained before, but the PowerShell object bit deserves a mention. I use an iterator <code class="language-plaintext highlighter-rouge">$i</code> to count the cells in the table (note this assumes that the table has an equal number of cells per row, which isn’t necessarily true in HTML). We have two column headings, so <code class="language-plaintext highlighter-rouge">$i % $headings.count -eq 0</code> is true for 0, 2, 4, etc., and this happens at the start of the loop, so we use it to create the object.</p>

<p>Once we have the cell’s content, we can use <code class="language-plaintext highlighter-rouge">Add-Member</code> to add the property to the object. The property name is given by <code class="language-plaintext highlighter-rouge">$headings[$i % $headings.count]</code>: either zero or one in this case.</p>

<p>At the end of the loop we increment <code class="language-plaintext highlighter-rouge">$i</code> and test whether we are now on the next row (<code class="language-plaintext highlighter-rouge">$i % $headings.count -eq 0</code>); if so, we add the current object to the output array (as it will be overwritten at the start of the next iteration of the loop).</p>
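
<p>As a toy illustration of how the modulo arithmetic groups cells into rows, here is a sketch with two headings and made-up values (not taken from the real site):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$headings = @('Security', 'Weight')
$cells = @('Netflix', '5.1%', 'Amazon', '4.8%')
$rows = @()
for ($i = 0; $i -lt $cells.count; $i++) {
    if ($i % $headings.count -eq 0) {         # start of a new row
        $h = New-Object -TypeName PSObject
    }
    Add-Member -InputObject $h -MemberType NoteProperty -Name $headings[$i % $headings.count] -Value $cells[$i]
    if (($i + 1) % $headings.count -eq 0) {   # row complete
        $rows += $h
    }
}
# $rows now contains two objects, each with Security and Weight properties
</code></pre></div></div>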

<p>After all that work, let’s save the results as a CSV: <code class="language-plaintext highlighter-rouge">$holdings | Export-Csv holdings.csv</code></p>

<p>We now know the percentages of each holding and the total fund value so we can calculate a new column with the monetary value invested in a fund as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$holdings |% {
    [decimal]$w = $_.weight -replace '[^\d.]'
    [decimal]$s = $_.size -replace '[^\d.]'
    Add-Member -InputObject $_ -MemberType NoteProperty -Name Value -Value ($w * $s / 100) -Force
}
</code></pre></div></div>

<p>Perhaps save it again? <code class="language-plaintext highlighter-rouge">$holdings | Export-Csv -Force holdings.csv</code>. Finally, we can answer the original question by filtering the holdings:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import-csv .\holdings.csv |? Security -match "Netflix" | sort -Property Value
</code></pre></div></div>

<p>The full code can be downloaded from <a href="https://github.com/mtelvers/Hargreaves-Lansdown/blob/master/fund-holdings.ps1">GitHub</a> or probably more usefully you can get <a href="https://raw.githubusercontent.com/mtelvers/Hargreaves-Lansdown/master/holdings.csv">holdings.csv</a></p>

<h1 id="addendum">Addendum</h1>

<p>To make the analysis easier it would help to standardise the currencies. Most are in GBP by some margin, so let’s convert to that:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ExchangeRates = @{GBP = 1; YEN = 0.00698098; EUR = 0.905805; USD = 0.776454; AUSD = 0.567308}

$holdings |% {
    [decimal]$s = $_.size -replace '[^\d.]'
    [decimal]$w = $_.weight -replace '[^\d.]'
    if ($s -gt 0) {
        $currency = $_.size.substring(0, $_.size.IndexOf(" "))
        $sGBP = $s * $ExchangeRates[$currency]
    } else {
        $sGBP = 0
    }
    Add-Member -InputObject $_ -MemberType NoteProperty -Name SizeGBP -Value $sGBP -Force
    Add-Member -InputObject $_ -MemberType NoteProperty -Name ValueGBP -Value ($w * $sGBP / 100) -Force
}
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="powershell" /><summary type="html"><![CDATA[Dabbling in the markets by way of investment funds is amusing. I use Hargreaves Lansdown to do this. HL have a fund research section which lets you look at a given fund and view the top 10 holdings so you can base your decision to invest in your belief in the underlying stock.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/hl_hi_res.gif" /><media:content medium="image" url="https://www.tunbury.org/images/hl_hi_res.gif" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Latin Square</title><link href="https://www.tunbury.org/2018/07/13/latin-square/" rel="alternate" type="text/html" title="Latin Square" /><published>2018-07-13T12:41:29+00:00</published><updated>2018-07-13T12:41:29+00:00</updated><id>https://www.tunbury.org/2018/07/13/latin-square</id><content type="html" xml:base="https://www.tunbury.org/2018/07/13/latin-square/"><![CDATA[<p>Looking at the latest video from Presh Talwalkar about solving the Latin square where each row is the first row multiplied by the row number I decided it was time to see if I could remember any C++ and code a solution.</p>

<p><a href="https://youtu.be/KXOjtmNUSH0">Can you figure out the special 6 digit number?</a></p>

<p>Include the standard C++ header files we need:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;iostream&gt;
#include &lt;algorithm&gt;
#include &lt;vector&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;iomanip&gt;

using namespace std;
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">CheckDuplicates()</code> comes from ideas presented in this <a href="https://stackoverflow.com/questions/2860634/checking-for-duplicates-in-a-vector">Stack Overflow question</a>. The function determines whether there are any repeated digits in a vector by sorting the vector and then searching for adjacent items which are the same. Since <code class="language-plaintext highlighter-rouge">std::sort</code> changes the source vector I’ve created a local copy using the vector constructor function.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bool CheckDuplicates(vector&lt;unsigned int&gt;* v) {
        vector&lt;unsigned int&gt; c (v-&gt;begin(), v-&gt;end());
        sort(c.begin(), c.end());
        vector&lt;unsigned int&gt;::iterator it = adjacent_find(c.begin(), c.end());
        if (it == c.end())
                return false;
        else
                return true;
}
</code></pre></div></div>

<p>On to the body of program</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int main () {
</code></pre></div></div>

<p>Create a loop which covers all possible six-digit numbers. The result can’t be smaller than 123456, and it must be less than 1,000,000 ÷ 6 = 166,666, but changing the loop to run from 0 to 1,000,000 shows that there really aren’t any other solutions.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        for (unsigned int t = 123456; t &lt; 166666; t++) {
</code></pre></div></div>

<p>I’ll use a vector of vectors to hold the digits of each number.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                vector&lt; vector&lt;unsigned int&gt;* &gt; square;
</code></pre></div></div>

<p>This first block of code initialises the first vector with the value from the outer loop. It only adds the value to the square if it doesn’t contain any duplicate digits.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                {
                        vector&lt;unsigned int&gt;* row = new vector&lt;unsigned int&gt;;
                        unsigned int n = t;
                        for (int i = 0; i &lt; 6; i++) {
                                row-&gt;insert(row-&gt;begin(), n % 10);
                                n /= 10;
                        }
                        if (!CheckDuplicates(row))
                                square.push_back(row);
                        else
                                delete row;
                }
</code></pre></div></div>

<p>By looking at the size of the <code class="language-plaintext highlighter-rouge">square</code> vector we can see if we have a row to work with or not. If we do, attempt the multiplication of the first row by 2 through 6 to generate the other rows. As we want full multiplication, not just the multiplication of each digit, we need to compute the carry at each step and add it on to the next column. If there is a carry into the seventh column then the row can be discarded. Lastly, check for duplicates and, if none are found, add the number/row to the square. An alternative approach here would be to multiply t and separate the result into the individual digits in a vector as we did above.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                if (square.size() == 1) {
                        for (unsigned int j = 2; j &lt;= 6; j++) {
                                unsigned int carry = 0;
                                vector&lt;unsigned int&gt;* row = new vector&lt;unsigned int&gt;;
                                for (int i = 5; i &gt;= 0; i--) {
                                        unsigned int n = square.at(0)-&gt;at(i) * j + carry;
                                        if (n &gt; 9) {
                                                carry = n / 10;
                                                n %= 10;
                                        } else {
                                                carry = 0;
                                        }
                                        row-&gt;insert(row-&gt;begin(), n);
                                }
                                if (carry) {
                                        delete row;
                                        break;
                                } else {
                                        if (!CheckDuplicates(row))
                                                square.push_back(row);
                                        else
                                                delete row;
                                }
                        }
                }
</code></pre></div></div>

<p>So, if we get to here, we have six rows, each containing six different digits. We now need to check for duplication in the columns. This strictly isn’t necessary because only one solution makes it this far, but for the sake of completeness I generate a vector for each column and check it for duplicates. If no duplicates are found then it’s a possible solution.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                if (square.size() == 6) {
                        bool duplicates = false;
                        for (int i = 5; i &gt;= 0; i--) {
                                vector&lt;unsigned int&gt; column;
                                for (vector&lt;unsigned int&gt;* row : square)
                                        column.push_back(row-&gt;at(i));
                                if (CheckDuplicates(&amp;column)) {
                                        duplicates = true;
                                        break;
                                }
                        }
                        if (!duplicates) {
                                cout &lt;&lt; "\nSolution\n";
                                for (vector&lt;unsigned int&gt;* row : square) {
                                        for (unsigned int c : *row) {
                                                cout &lt;&lt; c &lt;&lt; ' ';
                                        }
                                        cout &lt;&lt; '\n';
                                }
                        }
                }
</code></pre></div></div>

<p>Tidy up by deleting each of the row vectors</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                for (vector&lt;unsigned int&gt;* row : square)
                        delete row;
                square.erase(square.begin(), square.end());
        }

        return 0;
}
</code></pre></div></div>

<p>You can download the full version of the code from <a href="https://github.com/mtelvers/LatinSquare">Github</a></p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="c++" /><summary type="html"><![CDATA[Looking at the latest video from Presh Talwalkar about solving the Latin square where each row is the first row multiplied by the row number I decided it was time to see if I could remember any C++ and code a solution.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/latin-square.png" /><media:content medium="image" url="https://www.tunbury.org/images/latin-square.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Prime Numbers in PowerShell</title><link href="https://www.tunbury.org/2017/05/01/prime-numbers-in-powershell/" rel="alternate" type="text/html" title="Prime Numbers in PowerShell" /><published>2017-05-01T12:41:29+00:00</published><updated>2017-05-01T12:41:29+00:00</updated><id>https://www.tunbury.org/2017/05/01/prime-numbers-in-powershell</id><content type="html" xml:base="https://www.tunbury.org/2017/05/01/prime-numbers-in-powershell/"><![CDATA[<p>Dylan was using a number square to calculate prime numbers so it amused me to code up a couple of algorithms to show just how quick the sieve method actually is. I’ve done these in PowerShell because … reasons.</p>

<p>So as a baseline, here’s a basic way to calculate a prime. Start with a number and try to divide it by every number starting from 2 up to the square root of the number. I’ve used <code class="language-plaintext highlighter-rouge">throw</code> in a <code class="language-plaintext highlighter-rouge">try</code>/<code class="language-plaintext highlighter-rouge">catch</code> block to move to the next iteration of the outer loop without executing the <code class="language-plaintext highlighter-rouge">Write-Host</code> line.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for ($n = 3; $n -lt 100000; $n++) {
    try {
        for ($d = 2; $d -le [Math]::Sqrt($n); $d++) {
            if ($n % $d -eq 0) {
                throw
            }
        }
        Write-Host -NoNewLine "$n "
    }
    catch { }
}
</code></pre></div></div>

<p>Interestingly, all those exceptions add quite an overhead: the same algorithm using a local variable ran three times quicker on my machine (27 seconds for the first version versus 9 seconds for this one).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for ($n = 3; $n -lt 100000; $n++) {
    $prime = $true
    for ($d = 2; $d -le [Math]::Sqrt($n); $d++) {
        if ($n % $d -eq 0) {
            $prime = $false
            break;
        }
    }
    if ($prime) {
        Write-Host -NoNewLine "$n "
    }
}
</code></pre></div></div>

<p>Obviously we should optimise this by removing even numbers as below and this, as you’d expect, halves the run time.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for ($n = 3; $n -lt 100000; $n += 2) {
    $prime = $true
    for ($d = 3; $d -le [Math]::Sqrt($n); $d += 2) {
        if ($n % $d -eq 0) {
            $prime = $false
            break;
        }
    }
    if ($prime) {
        Write-Host -NoNewLine "$n "
    }
}
</code></pre></div></div>

<p>Anyway, the sieve is all done in 0.75 seconds:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ints = 0..100000
for ($i = 2; $i -lt [Math]::Sqrt($ints.length); $i++) {
    if ($ints[$i] -eq 0) {
        continue
    }
    for ($j = $i * $i; $j -lt $ints.length; $j += $i) {
        $ints[$j] = 0
    }
}
$ints | foreach { if ($_ -gt 1) { Write-Host -NoNewLine "$_ " } }
</code></pre></div></div>

<p>As the maximum number increases, the differences become even more stark: at 1,000,000 the sieve completed in 11 seconds, but the simple method took 129 seconds.</p>
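
<p>A timing run for the sieve might look like this; a sketch using <code class="language-plaintext highlighter-rouge">Measure-Command</code> with the limit raised to 1,000,000:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$elapsed = Measure-Command {
    $ints = 0..1000000
    for ($i = 2; $i -lt [Math]::Sqrt($ints.length); $i++) {
        if ($ints[$i] -eq 0) { continue }
        for ($j = $i * $i; $j -lt $ints.length; $j += $i) {
            $ints[$j] = 0
        }
    }
}
Write-Host "Sieve completed in" $elapsed.TotalSeconds "seconds"
</code></pre></div></div>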

<p>For my timings, I used <code class="language-plaintext highlighter-rouge">measure-command</code> and removed the <code class="language-plaintext highlighter-rouge">Write-Host</code> lines.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="powershell" /><summary type="html"><![CDATA[Dylan was using a number square to calculate prime numbers so it amused me to code up a couple of algorithms to show just how quick the sieve method actually is. I’ve done these in PowerShell because … reasons.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/prime-numbers.jpg" /><media:content medium="image" url="https://www.tunbury.org/images/prime-numbers.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Splicing Three Strand Rope</title><link href="https://www.tunbury.org/2016/11/21/splicing-three-strand-rope/" rel="alternate" type="text/html" title="Splicing Three Strand Rope" /><published>2016-11-21T12:41:29+00:00</published><updated>2016-11-21T12:41:29+00:00</updated><id>https://www.tunbury.org/2016/11/21/splicing-three-strand-rope</id><content type="html" xml:base="https://www.tunbury.org/2016/11/21/splicing-three-strand-rope/"><![CDATA[<p>My sudden interest in rope splicing stems entirely from bell ropes. There seems to be three, perhaps four, splices to learn for this application. Links below to YouTube videos explaining how to do them:</p>

<ul>
  <li><a href="https://youtu.be/QeYBkMCQ8WY">Eye Splice</a></li>
  <li><a href="https://youtu.be/PFFeDH2u7E0">Short Splice</a></li>
  <li><a href="https://youtu.be/sN-cnO8Fqrc">Long Splice</a></li>
  <li><a href="https://youtu.be/bRjqMKLS99A">End/Back Splice</a></li>
</ul>

<p>Above the sally you’d probably use a long splice as it’s thinner than the short splice for running over any pulleys. Below the sally, either a short splice to the tail end if it doesn’t see much wear, or an eye splice if the tail end is changed frequently, as is typical on larger bells. The back splice could be used on the top end to give a nice finish to the rope.</p>

<p>I’m amazed how straightforward they are to do and how strong they are given that it’s just an over-under weave of strands without a knot in sight!</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="bells" /><summary type="html"><![CDATA[My sudden interest in rope splicing stems entirely from bell ropes. There seems to be three, perhaps four, splices to learn for this application. Links below to YouTube videos explaining how to do them:]]></summary></entry><entry><title type="html">Pentominoes</title><link href="https://www.tunbury.org/2016/08/25/pentominoes/" rel="alternate" type="text/html" title="Pentominoes" /><published>2016-08-25T12:41:29+00:00</published><updated>2016-08-25T12:41:29+00:00</updated><id>https://www.tunbury.org/2016/08/25/pentominoes</id><content type="html" xml:base="https://www.tunbury.org/2016/08/25/pentominoes/"><![CDATA[<p>One day I was clearing out some old papers and I came across this programming assignment from university. I can’t recall which of the problems I tackled at the time, after all it was twenty-five years ago, but glancing over it now the pentomino problem caught my eye</p>

<blockquote>
  <p>5 The Pentomino Problem
There are twelve different (ie. non-congruent) pentominos, shown below left. The pentomino problem is to fit them into a tray of dimensions 6 x 10 without overlapping. Some of the 2339 possible solutions are shown below right. Write a program to find a solution to the pentomino problem. (Note: Pretty output is not required.)</p>
</blockquote>

<p><img src="/images/pentomino-graphic.png" alt="" /></p>

<p>Looking on <a href="https://en.wikipedia.org/wiki/Pentomino">Wikipedia</a> it seems that the shapes have been named by <a href="https://en.wikipedia.org/wiki/Solomon_W._Golomb">Golomb</a> so I’m going to use those names too.</p>

<p>I started out by creating some data structures to hold the definition of each pentomino.</p>

<p>So laying out on an x, y co-ordinate system, I create a <code class="language-plaintext highlighter-rouge">point_t</code> structure containing the values</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>typedef struct {
        int x, y;
} point_t;
</code></pre></div></div>

<p>Any pentomino will have exactly five points</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>typedef struct {
        point_t point[5]; /* 5 points in each */
} pentomino_t;
</code></pre></div></div>

<p>Considering the ‘F’ pentomino, it may be rotated and reflected in different ways – a maximum of 8 different versions may exist. Some, such as ‘X’, only have one.</p>

<p><img src="/images/F.svg" alt="" /></p>

<p>I have created a structure to hold the pentomino name along with a count of the number of unique rotations/reflections of the shape and an array to hold the co-ordinates</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>typedef struct {
        char ch; /* name of the shape by letter */
        int count; /* number of unique rotations */
        pentomino_t rotation[8]; /* max of 4 possible rotations and then double for the mirrors */
} pentominoRotations_t;
</code></pre></div></div>

<p>The 6×10 board that we will try to place them on is as simple as this</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>char board[60];
</code></pre></div></div>

<p>The algorithm couldn’t be simpler really: take the first pentomino in its first rotation and put it on the board in the top left corner; if that works, try the second pentomino in the next position in its first rotation, and repeat.  At each step check that no part of any pentomino is outside the board area and that nothing is on top of anything else.  If a check fails, remove the last piece added and try to add it again in the next rotation.  Based upon the assignment, the key here is to recognise that this is a recursive algorithm – in pseudo code it looks like this</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>function calculate(pentomino p, board)
        for each position on the board
                for each pentomino rotation
                        let shape_ok = true
                        for each point in pentomino shape
                                if the co-ordinate is out of bound then shape_ok = false
                                if the board position is already used then shape_ok = false
                        next
                        if shape_ok is true then
                                draw the shape on the current board
                                if p &lt; 12 then
                                        calculate(p + 1, current board layout)
                                else
                                        we have a solution!
                next
        next
</code></pre></div></div>

<p>Here is the first solution that it generates given the order of shapes as I have them</p>

<p><img src="/images/solution-1.svg" alt="" /></p>

<p>The big problem with this is it takes a very long time!  The main reason for this is that the algorithm wastes masses of time trying to fit all 12 pieces in even when the early piece positions have given a board which can’t possibly be solved.  In the example below there is no point trying to place the other 11 pentominos, including all their rotations, when there is an isolated single square.</p>

<p><img src="/images/F-bad-placement.svg" alt="" /></p>

<p>My initial solution to this is to add a check after drawing the shape to look for regions which have an area of less than 5.  However, this can be extended to check for regions that have areas which are not multiples of 5, as clearly all pentominos have an area of 5!</p>

<p>Take a look at the example below.  This has two regions: on the left the area is 13 and on the right the area is 22.  This can’t be solved as we will never be able to pack objects with an area of 5 into a region of area 13.</p>

<p><img src="/images/small-region.svg" alt="" /></p>

<p>I was quite surprised how easy it was to calculate the area of the regions.  I’ve always thought that the fill/flood tools on paint programs were cool and here we are just doing the same thing.  Here’s some pseudo code to explain it.  I presume I’d get twice the marks for this assignment for having two recursive functions!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Create a copy of the board
Loop through all squares on the board
        if the square is empty
                call the flood function with starting at these co-ordinates
                if the returned value modulus 5 is not zero then the board cannot be solved

function flood(start co-ordinates)
        let r = 1, this being the running size of the region
        mark the current co-ordinate position as filled
        if the square to the left is empty then call the flood function with those co-ordinates and add the returned value to r
        if the square to the right is empty then call the flood function with those co-ordinates and add the returned value to r
        if the square above is empty then call the flood function with those co-ordinates and add the returned value to r
        if the square below is empty then call the flood function with those co-ordinates and add the returned value to r
        return r
</code></pre></div></div>

<p>If you let these run to completion you find that you have 9356 solutions – exactly 4 times the number we should have.  This is because the board has rotational symmetry and both vertical and horizontal symmetry.  We could check each solution against the ones already created for possible duplicates, but we could also amend the algorithm so that at the first level we only consider start positions in the first quarter of the board.</p>

<p>With this amended algorithm my average computer produced all 2339 solutions in around twenty minutes.</p>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="c" /><summary type="html"><![CDATA[One day I was clearing out some old papers and I came across this programming assignment from university. I can’t recall which of the problems I tackled at the time, after all it was twenty-five years ago, but glancing over it now the pentomino problem caught my eye]]></summary></entry><entry><title type="html">Place Notation</title><link href="https://www.tunbury.org/2016/08/24/place-notation/" rel="alternate" type="text/html" title="Place Notation" /><published>2016-08-24T12:41:29+00:00</published><updated>2016-08-24T12:41:29+00:00</updated><id>https://www.tunbury.org/2016/08/24/place-notation</id><content type="html" xml:base="https://www.tunbury.org/2016/08/24/place-notation/"><![CDATA[<p>Thomas Barlow has taught me place notation using <a href="/downloads/Strike-Back-Surprise-Major.pdf">Strike Back Surprise Major</a> as the example. The notation for that is <code class="language-plaintext highlighter-rouge">x38x14x58x16x12x38x14.12.78 l.e. 12</code>. There are plenty of guides online on how to interpret it, such as this one on the <a href="http://www.cccbr.org.uk/education/thelearningcurve/pdfs/200404.pdf">CCCBR website</a>.</p>

<p>Briefly, an x in the notation causes all bells to swap places. A group of numbers indicates that the bells in those places remain fixed while all others swap places. In this example, given a starting order of rounds, 12345678, the first x would yield 21436587. The subsequent 38 indicates that the 3rd placed and 8th placed bells are fixed, so the bells in positions 1 and 2 swap, as do 4 and 5, and 6 and 7, resulting in 12463857, and so on. As many methods are symmetrical, typically only half is written out; the second half is the reverse of the first with the given lead end appended.</p>

<p>My attempt to write out <a href="/downloads/Ajax-Surprise-Major.pdf">Ajax Surprise Major</a> <code class="language-plaintext highlighter-rouge">x58x14x56x16x14x1258x12x58,12</code> by hand went wrong in the early stages so I turned to Perl to do the job for me.</p>

<p>The first part of the script parses the place notation into an array, unwraps the symmetry and tags on the lead end. I don’t much like parsers as they tend to be messy, having to deal with the real world, so moving swiftly on to the core of the script with the assumption that the place notation of the method is held in the array <code class="language-plaintext highlighter-rouge">@method</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x 58 x 14 x 56 x 16 x 14 x 1258 x 12 x 58 x 12 x 1258 x 14 x 16 x 56 x 14 x 58 x 12
</code></pre></div></div>

<p>Define <code class="language-plaintext highlighter-rouge">@rounds</code> to be rounds and then set the current bell arrangement to be rounds!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my @rounds = (1..$stage);
my @bells = @rounds;
do {
</code></pre></div></div>

<p>Loop through each of the elements in the method (<code class="language-plaintext highlighter-rouge">@method</code>)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    foreach my $m (@method) {
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">$stage</code> is the number of bells involved in the method. Our examples have all been <em>major</em> methods so <code class="language-plaintext highlighter-rouge">$stage</code> is 8. Perl arrays are inconveniently numbered from zero, so we actually want numbers 0 through 7; I’ve used pop to remove the last one</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        my @changes = (0..$stage);
        pop @changes;
</code></pre></div></div>

<p>If the current step contains bell places (noting that 0 = 10, E = 11, T = 12) we split up the string into an array which we process in <em>reverse</em> order (to preserve the position numbering) and we remove these numbers from the array of changes.  The function numeric returns the integer value from the character (T=12 etc).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        if ($m =~ /^[0-9ET]+$/) {
            my @fixed = split //, $m;
            while (@fixed) {
                splice @changes, numeric(pop @fixed) - 1, 1;
            }
        }
</code></pre></div></div>

<p>For example, taking <code class="language-plaintext highlighter-rouge">$m</code> to be <code class="language-plaintext highlighter-rouge">1258</code> then <code class="language-plaintext highlighter-rouge">@changes</code> and <code class="language-plaintext highlighter-rouge">@fixed</code> will iterate as shown. Note the annoying -1 to align the bell position to the array index</p>

<table>
  <thead>
    <tr>
      <th>Iteration</th>
      <th><code class="language-plaintext highlighter-rouge">@changes</code></th>
      <th><code class="language-plaintext highlighter-rouge">@fixed</code></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td> </td>
      <td>0 1 2 3 4 5 6 7</td>
      <td>1 2 5 8</td>
    </tr>
    <tr>
      <td>1</td>
      <td>0 1 2 3 4 5 6</td>
      <td>1 2 5</td>
    </tr>
    <tr>
      <td>2</td>
      <td>0 1 2 3 5 6</td>
      <td>1 2</td>
    </tr>
    <tr>
      <td>3</td>
      <td>0 2 3 5 6</td>
      <td>1</td>
    </tr>
    <tr>
      <td>4</td>
      <td>2 3 5 6</td>
      <td> </td>
    </tr>
  </tbody>
</table>

<p>The resulting array <code class="language-plaintext highlighter-rouge">@changes</code> contains the pairs of bell place indices which need to be swapped. Changes need to be made in order working up to the back as place notation can omit implied changes. For example 18 could be shortened to just 1 as by the time 2nd and 3rd, 4th and 5th, 6th and 7th have all swapped, 8th place must be fixed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        while (@changes) {
            my ($swap1, $swap2) = splice @changes, 0, 2;
            @bells[$swap1, $swap2] = @bells[$swap2, $swap1];
            last if (scalar @changes &lt; 2);
        }
</code></pre></div></div>

<p>Now we need to output the current arrangement which at this point will just be a print statement.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        print "@bells\n";
    }
</code></pre></div></div>

<p>Keep going until we are back in rounds.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>} while ("@bells" ne "@rounds");
</code></pre></div></div>

<p>Now that that is working, the natural desire is to produce beautiful output. Since I was coding in Perl, and ultimately I’d like a webpage out of this, I experimented with Perl’s GD::Graph library to draw a line graph of the place of each bell. GD::Graph can display the point value on the graph, which was used to show the bell number. The output was functional although far from high resolution, and the font of the point values cannot be controlled.  See the Bob Doubles output below.</p>

<p><img src="/images/bob-doubles.png" alt="" /></p>

<p>Since the GD::Graph output wasn’t great, I’ve coded a version which creates the output using SVG.  Have a go:</p>

<form action="https://cgi.tunbury.org/cgi-bin/placenotation.pl" method="get" target="_blank">
    <label for="input">Select a method:</label>
    <select id="input" name="input">
        <option value="5.1.5.1.5,125">Bob Doubles</option>
        <option value="x1x1x1,12">Bob Minor</option>
        <option value="345.1.5.1.5,125">Reverse Canterbury</option>
        <option value="3.1.5.3.1.3.1.3.5.1.3.1">Stedman Doubles</option>
        <option value="3.1.5.1.5.1.5.1.5.1">Grandsire Doubles</option>
        <option value="x38x14x1256x16x34x1458x34x58,12">Valencia Surprise Major</option>
    </select><br /><br />
    <label for="stage">Select stage:</label>
    <select id="stage" name="stage">
        <option value="5">Doubles</option>
        <option value="6">Minor</option>
        <option value="7">Triples</option>
        <option value="8">Major</option>
        <option value="9">Caters</option>
        <option value="10">Royal</option>
    </select><br /><br />
    <label for="highlight">Highlight bell:</label>
    <input type="number" id="highlight" name="highlight" value="4" /><br /><br />
    <input type="submit" value="Submit" />
</form>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="perl" /><category term="bells" /><summary type="html"><![CDATA[Thomas Barlow has taught me place notation using Strike Back Surprise Major as the example. The notation for that is x38x14x58x16x12x38x14.12.78 l.e. 12. There are plenty of guides online on how to interpret it, such as this one on the CCCBR website.]]></summary></entry><entry><title type="html">Mandelbrot Set</title><link href="https://www.tunbury.org/2015/01/19/mandlebrot-set/" rel="alternate" type="text/html" title="Mandelbrot Set" /><published>2015-01-19T12:41:29+00:00</published><updated>2015-01-19T12:41:29+00:00</updated><id>https://www.tunbury.org/2015/01/19/mandlebrot-set</id><content type="html" xml:base="https://www.tunbury.org/2015/01/19/mandlebrot-set/"><![CDATA[<p>The Mandelbrot set is created from this very simple formula in which both Z and C are complex numbers.</p>

\[Z_{n+1}=Z_n^2+c\]

<p>The formula is iterated to determine whether Z is bounded or tends to infinity.  To demonstrate this, assume a test case where the imaginary part is zero and focus just on the real part.  In this case, the formula is trivial to evaluate starting with Z = 0.  The table below shows the outcome at C=0.2 and C=0.3, where one is clearly bounded and the other is not!</p>

<table>
  <thead>
    <tr>
      <th><strong>Iteration</strong></th>
      <th><strong>C = 0.2</strong></th>
      <th><strong>C = 0.3</strong></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td> </td>
      <td>0</td>
      <td>0</td>
    </tr>
    <tr>
      <td>1</td>
      <td>0.2</td>
      <td>0.3</td>
    </tr>
    <tr>
      <td>2</td>
      <td>0.24</td>
      <td>0.39</td>
    </tr>
    <tr>
      <td>3</td>
      <td>0.2576</td>
      <td>0.4521</td>
    </tr>
    <tr>
      <td>4</td>
      <td>0.266358</td>
      <td>0.504394</td>
    </tr>
    <tr>
      <td>5</td>
      <td>0.270946</td>
      <td>0.554414</td>
    </tr>
    <tr>
      <td>6</td>
      <td>0.273412</td>
      <td>0.607375</td>
    </tr>
    <tr>
      <td>7</td>
      <td>0.274754</td>
      <td>0.668904</td>
    </tr>
    <tr>
      <td>8</td>
      <td>0.27549</td>
      <td>0.747432</td>
    </tr>
    <tr>
      <td>9</td>
      <td>0.275895</td>
      <td>0.858655</td>
    </tr>
    <tr>
      <td>10</td>
      <td>0.276118</td>
      <td>1.037289</td>
    </tr>
    <tr>
      <td>11</td>
      <td>0.276241</td>
      <td>1.375968</td>
    </tr>
    <tr>
      <td>12</td>
      <td>0.276309</td>
      <td>2.193288</td>
    </tr>
    <tr>
      <td>13</td>
      <td>0.276347</td>
      <td>5.110511</td>
    </tr>
    <tr>
      <td>14</td>
      <td>0.276368</td>
      <td>26.41732</td>
    </tr>
    <tr>
      <td>15</td>
      <td>0.276379</td>
      <td>698.1747</td>
    </tr>
    <tr>
      <td>16</td>
      <td>0.276385</td>
      <td>487448.2</td>
    </tr>
    <tr>
      <td>17</td>
      <td>0.276389</td>
      <td>2.38E+11</td>
    </tr>
    <tr>
      <td>18</td>
      <td>0.276391</td>
      <td>5.65E+22</td>
    </tr>
  </tbody>
</table>

<p>C=0.2 is said to be part of the set whereas C=0.3 is not.  Typically the point is coloured by some arbitrary function of the number of iterations it took for the modulus of Z to exceed 2.</p>

<p>The set is plotted on the complex number plane with the real part using the x-axis and the imaginary part using the y-axis, thus:</p>

<p><img src="/images/complex-plane.svg" alt="" /></p>

<p>Given that computers don’t natively work with complex numbers we need to break the formula down into manageable pieces.  Firstly write the formula including both the real and complex parts then expand the brackets and group the terms.</p>

\[Z_{n+1}=Z_n^2+c\]

\[Z_{n+1}=(Z_{re}+Z_{im}i)^2+c_{re}+c_{im}i\]

\[Z_{n+1}=Z_{re}^2-Z_{im}^2+2Z_{re}Z_{im}i+c_{re}+c_{im}i\]

\[\mathbb R(Z_{n+1})=Z_{re}^2-Z_{im}^2+c_{re}\]

\[\mathbb I(Z_{n+1})=2Z_{re}Z_{im}+c_{im}\]

<p>Here’s a Perl program to generate a PNG file.  Over the years I’ve written this same program in many languages starting with Pascal at school, PostScript at University and <a href="/downloads/mandelbrot.xlsm">Excel VBA</a> and JavaScript…</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/perl -w

use strict;
use GD;

my $width = 1024;
my $height = 1024;

GD::Image-&gt;trueColor(1);
my $img = new GD::Image($width, $height);
</code></pre></div></div>

<p>Focus on an interesting bit. Real should be between -2.5 and 1 and
imaginary between -1 and 1.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my $MINre = -0.56;
my $MAXre = -0.55;
my $MINim = -0.56;
my $MAXim = -0.55;
</code></pre></div></div>

<p>Maximum number of iterations before the point is classified as bounded.
I’ve used 255 because I am using this as the colour component later</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my $max = 255;
</code></pre></div></div>

<p>Set up the loops to move through all the pixels in the image. The value
of C is calculated from the image size and scale. Note that GD creates
images with the origin in the top left.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for my $row (1 .. $height) {
    my $Cim = $MINim + ($MAXim - $MINim) * $row / $height;
    for my $col (0 .. $width - 1) {
        my $Cre = $MINre + ($MAXre - $MINre) * $col / $width;
</code></pre></div></div>

<p>Z starts at the origin</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        my $Zre = 0;
        my $Zim = 0;
        my $iteration = 0;
</code></pre></div></div>

<p>Loop until the modulus of Z exceeds 2 or the maximum number of iterations
has passed. Note that I’ve squared both sides to avoid wasting time
calculating the square root</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>while ($Zre * $Zre + $Zim * $Zim &lt;= 4 &amp;&amp; $iteration &lt; $max) {
</code></pre></div></div>

<p>Here’s the formula from above to calculate the next value</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            my $ZNre = $Zre * $Zre - $Zim * $Zim + $Cre;
            $Zim = 2 * $Zre * $Zim + $Cim;
            $Zre = $ZNre;
</code></pre></div></div>

<p>Move on to the next iteration</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            $iteration++;
        }
</code></pre></div></div>

<p>Determine why we finished the loop - was it bounded or not - and then
colour the pixel appropriately</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        if ($iteration &lt; $max) {
            $img-&gt;setPixel($col, $height - $row, $iteration * 0x010101);
        } else {
            $img-&gt;setPixel($col, $height - $row, 0x00);
        }
    }
}
</code></pre></div></div>

<p>Output the PNG file to STDOUT</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>binmode STDOUT;
print $img-&gt;png;
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="perl" /><summary type="html"><![CDATA[The Mandelbrot set is created from this very simple formula in which both Z and C are complex numbers.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/mandelbrot-set-5.png" /><media:content medium="image" url="https://www.tunbury.org/images/mandelbrot-set-5.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Shape Files</title><link href="https://www.tunbury.org/2015/01/19/shape-files/" rel="alternate" type="text/html" title="Shape Files" /><published>2015-01-19T12:41:29+00:00</published><updated>2015-01-19T12:41:29+00:00</updated><id>https://www.tunbury.org/2015/01/19/shape-files</id><content type="html" xml:base="https://www.tunbury.org/2015/01/19/shape-files/"><![CDATA[<p>Below is a perl script to create a PNG from a Shape file.</p>

<p><a href="/downloads/shapefile.pdf">Shape file specification</a></p>

<p><a href="/downloads/ROADNODE.zip">UK Road network as a shape file </a></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use strict;
use warnings;

use GD;
GD::Image-&gt;trueColor(1);

my $width = 8 * 1024;
my $height = 8 * 1024;

my $shpfile = $ARGV[0];
open(FH, "&lt;$shpfile") or die("No input file\n");
binmode(FH); 

my $csvfile = $shpfile;
$csvfile =~ s/.shp$/.csv/g;
open(POLYOUT, "&gt;$csvfile");

my $buffer;
my $num_bytes = read(FH, $buffer, 100);
my ($code, $u1, $u2, $u3, $u4, $u5, $filelength, $version, $type, $BBminX, $BBminY, $BBmaxX, $BBmaxY, $BBminZ, $BBmaxZ, $BBminM, $BBmaxM) = unpack("N N N N N N N V V F F F F F F F F", $buffer);
print "code = $code\n";
print "filelength = $filelength\n";
print "version = $version\n";
print "minX = $BBminX\n";
print "minY = $BBminY\n";
print "maxX = $BBmaxX\n";
print "maxY = $BBmaxY\n";
print "minZ = $BBminZ\n";
print "maxZ = $BBmaxZ\n";
print "minM = $BBminM\n";
print "maxM = $BBmaxM\n";

sub mapx {
    my $x = shift;
    return ($x - $BBminX) / ($BBmaxX - $BBminX) * $width;
}

sub mapy {
    my $y = shift;
    return $height - ($y - $BBminY) / ($BBmaxY - $BBminY) * $height;
}

my $polyCount = 0;

my $img = new GD::Image($width, $height);

while (read(FH, $buffer, 12)) {
    my ($recordnumber, $recordlength, $shapetype) = unpack("N N V", $buffer);
    if ($shapetype == 5) {
        # Polygon
        read(FH, $buffer, 4 * 8 + 2 * 4);
        my ($minX, $minY, $maxX, $maxY, $NumParts, $NumPoints) = unpack("F F F F V V", $buffer);
        my @parts;
        foreach my $part (1 .. $NumParts) {
            read(FH, $buffer, 4);
            my ($part) = unpack("V", $buffer);
            push @parts, $part;
            #syswrite(SHPOUT, pack("V", $part), 4);
        }
        push @parts, $NumPoints;
        @parts = reverse @parts;
        while (@parts) {
            my $firstpoint = pop @parts;
            my $lastpoint = pop @parts;
            my $poly = new GD::Polygon;
            $polyCount++;
            foreach ($firstpoint .. $lastpoint - 1) {
                read(FH, $buffer, 16);
                my ($x, $y) = unpack("F F", $buffer);
                print POLYOUT "$x,$y,$polyCount\n";
                $poly-&gt;addPt(mapx($x), mapy($y));
            }
            $img-&gt;openPolygon($poly, 0xff0000);
            push @parts, $lastpoint if (@parts);
        }
    } elsif ($shapetype == 3) {
        # PolyLine
        read(FH, $buffer, 4 * 8 + 2 * 4);
        my ($minX, $minY, $maxX, $maxY, $NumParts, $NumPoints) = unpack("F F F F V V", $buffer);
        my @parts;
        foreach my $part (1 .. $NumParts) {
            read(FH, $buffer, 4);
            my ($part) = unpack("V", $buffer);
            push @parts, $part;
        }
        push @parts, $NumPoints;
        @parts = reverse @parts;
        while (@parts) {
            my $firstpoint = pop @parts;
            my $lastpoint = pop @parts;
            read(FH, $buffer, 16);
            my ($x1, $y1) = unpack("F F", $buffer);
            print POLYOUT "$x1,$y1\n";
            foreach ($firstpoint .. $lastpoint - 2) {
                read(FH, $buffer, 16);
                my ($x2, $y2) = unpack("F F", $buffer);
                print POLYOUT "$x2,$y2\n";
                $img-&gt;line(mapx($x1), mapy($y1), mapx($x2), mapy($y2), 0xff0000);
                $x1 = $x2;
                $y1 = $y2;
            }
            push @parts, $lastpoint if (@parts);
        }

    } elsif ($shapetype == 1) {
        read(FH, $buffer, 2 * 8);
        my ($x, $y) = unpack("F F", $buffer);
        $img-&gt;setPixel(mapx($x), mapy($y), 0xff0000);
        print POLYOUT "$x,$y\n";
    } else {
        print "unhandled type shapetype = $shapetype\n";
        read(FH, $buffer, $recordlength * 2 - 4);
    }
}

close(POLYOUT);

my $pngfile = $shpfile;
$pngfile =~ s/.shp$/.png/g;
open(PNGOUT, "&gt;$pngfile");
binmode(PNGOUT);
print PNGOUT $img-&gt;png;
close(PNGOUT);
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="perl" /><summary type="html"><![CDATA[Below is a perl script to create a PNG from a Shape file.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.tunbury.org/images/roadnode.png" /><media:content medium="image" url="https://www.tunbury.org/images/roadnode.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Narcissistic Numbers</title><link href="https://www.tunbury.org/2014/01/02/narcissistic-numbers/" rel="alternate" type="text/html" title="Narcissistic Numbers" /><published>2014-01-02T12:41:29+00:00</published><updated>2014-01-02T12:41:29+00:00</updated><id>https://www.tunbury.org/2014/01/02/narcissistic-numbers</id><content type="html" xml:base="https://www.tunbury.org/2014/01/02/narcissistic-numbers/"><![CDATA[<p>I heard about these on <a href="http://www.bbc.co.uk/programmes/b006qshd">BBC Radio 4 More or
Less</a> and they just intrigued
me, perhaps in part because they have no known application! In the past
similar obsessions have appeared with the calculation of PI and right
back to my childhood calculating powers of 2 on a BBC Micro.</p>

<p>The full definition, as for everything, is on
<a href="https://en.wikipedia.org/wiki/Narcissistic_number">Wikipedia</a> but in
short a narcissistic number is one where the sum of its digits, each raised to
the power of the number of digits, equals the number itself. For example</p>

\[153 = 1^3 + 5^3 + 3^3\]

<p>Here’s some quick and dirty Perl code to calculate them:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use strict;
use warnings;

for (my $i = 10; $i &lt; 10000; $i++) {
    my $pwr = length($i);
    my $total = 0;
    for (my $j = 0; $j &lt; $pwr; $j++) {
        $total += int(substr $i, $j, 1) ** $pwr;
    }
    if ($total == $i) {
        print $i . " is narcissistic\n";
    }
}
</code></pre></div></div>

<p>This yields this output</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>153 is narcissistic
370 is narcissistic
371 is narcissistic
407 is narcissistic
1634 is narcissistic
8208 is narcissistic
9474 is narcissistic
</code></pre></div></div>
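
<p>As a spot check on the last of these, 9474 has four digits, and indeed</p>

\[9^4 + 4^4 + 7^4 + 4^4 = 6561 + 256 + 2401 + 256 = 9474\]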

<p>However, due to the typical size limitation in the implementation of integers,
this doesn’t get you very far. Perl’s <code class="language-plaintext highlighter-rouge">Math::BigInt</code> gets you further if
you are very patient.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use strict;
use warnings;
use Math::BigInt;

my $i = Math::BigInt-&gt;bone();

while ((my $pwr = $i-&gt;length()) &lt; 10) {
    my $total = Math::BigInt-&gt;bzero;
    for (my $j = 0; $j &lt; $pwr; $j++) {
        my $t = Math::BigInt-&gt;new($i-&gt;digit($j));
        $total-&gt;badd($t-&gt;bpow($pwr));
    }
    if ($total == $i) {
        print $i . " is narcissistic\n";
    }
    $i-&gt;binc();
}
</code></pre></div></div>]]></content><author><name>Mark Elvers</name><email>mark.elvers@tunbury.org</email></author><category term="perl" /><summary type="html"><![CDATA[I heard about these on BBC Radio 4 More or Less and they just intrigued me, perhaps in part because they have no known application! In the past similar obsessions have appeared with the calculation of PI and right back to my childhood calculating powers of 2 on a BBC Micro.]]></summary></entry></feed>