CI for opam overlay repositories with day10 and GitHub merge queue
Mark Elvers
10 min read

Categories

  • ocaml, ci

Tags

  • tunbury.org

This post describes how to set up CI for an opam overlay repository using day10 on a self-hosted GitHub Actions runner, with GitHub’s merge queue to gate PRs on build regressions.

The overlay repo contains opam package definitions for a handful of personal projects. The CI workflow builds every package in the overlay (plus the full upstream opam-repository) on every push, compares results against the previous run, and blocks merges that introduce regressions.

Overview

The setup has three parts:

  1. mtelvers/repo-tool builds an OCaml CLI that clones a list of git repos and generates an opam overlay repository from their .opam files
  2. day10 is a tool that solves and builds opam packages inside OCI containers, with layer caching
  3. A GitHub Actions workflow that ties it all together, running on a self-hosted runner with a large NVMe cache

The overlay repository

The overlay repo (tunbury/claude-repo) follows the standard opam repository layout:

claude-repo/
├── repo                              # opam-version: "2.0"
├── packages/
│   ├── braid/braid.dev/opam
│   ├── smtpd/smtpd.dev/opam
│   ├── zarr/zarr.0.1.0/opam
│   └── ...
└── .github/workflows/ci.yml

Each opam file has a url section pointing at the source repo and pinned to a specific git SHA:

url {
  src: "git+https://github.com/mtelvers/braid#0ea2907143a6ee54aa60c86f55fa753960afed4a"
}

The overlay is generated by repo-tool, which reads a list of git URLs from a text file, clones each repo, finds .opam files, strips the version: field, and appends the url block with the current HEAD SHA.
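The per-package transformation can be sketched in shell (repo-tool itself is an OCaml CLI; the hard-coded SHA below stands in for the clone’s HEAD, and the input file is a fabricated minimal example):

```shell
# Sketch of what repo-tool does to each .opam file it finds.
# The SHA is fixed here; the real tool uses the clone's current HEAD.
sha=0ea2907143a6ee54aa60c86f55fa753960afed4a

# A minimal input .opam file, as it might appear in the source repo
cat > braid.opam <<'EOF'
opam-version: "2.0"
version: "dev"
synopsis: "Example package"
EOF

# Strip the version: field and append the pinned url block
mkdir -p packages/braid/braid.dev
{
  grep -v '^version:' braid.opam
  printf 'url {\n  src: "git+https://github.com/mtelvers/braid#%s"\n}\n' "$sha"
} > packages/braid/braid.dev/opam
```

The generated file lands at the standard packages/NAME/NAME.VERSION/opam path, with dev as the version since the version: field was stripped.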

The self-hosted runner

The workflow runs on a self-hosted runner rather than GitHub-hosted runners because day10 needs runc and benefits from a persistent build cache. The runner is a machine with 256 cores and NVMe storage mounted at /var/cache/day10.

Setting up the runner

Download and configure the runner:

useradd -m -s /bin/bash runner
cd /home/runner
mkdir -p actions-runner && cd actions-runner
RUNNER_VERSION=$(curl -s https://api.github.com/repos/actions/runner/releases/latest \
  | grep -oP '"tag_name": "v\K[^"]+')
curl -sL "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz" \
  | tar xz

Get a registration token and configure:

# Get token (requires repo admin access)
TOKEN=$(gh api repos/OWNER/REPO/actions/runners/registration-token \
  -X POST --jq '.token')

sudo -u runner ./config.sh \
  --url https://github.com/OWNER/REPO \
  --token "$TOKEN" \
  --name my-runner \
  --labels self-hosted,linux,x64,day10 \
  --unattended

Install as a systemd service:

./svc.sh install runner
./svc.sh start

The runner user needs passwordless sudo to install packages in the workflow and to run runc containers:

echo 'runner ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/runner
chmod 440 /etc/sudoers.d/runner

The workflow

The workflow is triggered by four events:

  • push to main – runs after every merge, establishes the baseline for regression detection
  • pull_request to main – runs on PR branches so the required status check passes before the PR can enter the merge queue
  • merge_group – runs when a PR enters the merge queue, with the regression gate active
  • workflow_dispatch – allows manual runs from the Actions UI or via gh workflow run

Step by step

Here is the complete workflow with commentary:

name: CI Build

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  merge_group:
  workflow_dispatch:

jobs:
  build:
    runs-on: [self-hosted, linux, x64, day10]
    steps:

Install dependencies

The self-hosted runner is a bare Ubuntu machine, so these packages are installed at the start of each run rather than baked into a runner image; keeping the install in the workflow also documents what the workflow needs:

      - name: Install dependencies
        run: sudo apt-get update && sudo apt-get install -y gh jq unzip

Checkout

The workflow checks out the overlay repo itself and the upstream opam-repository side by side. day10 accepts --opam-repository multiple times and merges them, with earlier repos taking priority:

      - name: Checkout claude-repo
        uses: actions/checkout@v4

      - name: Checkout opam-repository
        uses: actions/checkout@v4
        with:
          repository: ocaml/opam-repository
          path: opam-repository

Self-hosted runners reuse the workspace between runs, which causes git to complain about directory ownership. This is resolved by marking all directories as safe:

      - name: Mark workspace as safe for git
        run: git config --global --add safe.directory '*'

Download previous results

To detect regressions, the workflow downloads the build-results artefact from the most recent completed run on main. The continue-on-error: true means a missing artefact (e.g. on the first ever run) does not fail the job:

      - name: Download previous results
        continue-on-error: true
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          mkdir -p previous-results
          run_id=$(gh run list --repo ${{ github.repository }} \
            --branch main --workflow "CI Build" --status completed \
            --json databaseId --jq "[.[] | select(.databaseId != ${{ github.run_id }})][0].databaseId // empty")
          if [ -n "$run_id" ]; then
            gh run download "$run_id" --repo ${{ github.repository }} \
              --name build-results --dir previous-results
          fi
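The --jq expression does the real work here: it drops the current run’s own id, takes the newest remaining run, and produces nothing at all (rather than the string null) when no previous run exists. The same filter can be exercised on fabricated run-list JSON (42 is an illustrative stand-in for the current run id):

```shell
# Exercise the run-selection filter on fabricated `gh run list` output.
# 42 stands in for the current run's id; runs are newest-first.
runs='[{"databaseId":42},{"databaseId":41},{"databaseId":40}]'
run_id=$(echo "$runs" \
  | jq -r '[.[] | select(.databaseId != 42)][0].databaseId // empty')
echo "$run_id"    # prints 41, the newest run other than the current one
```

The // empty alternative is what lets the subsequent [ -n "$run_id" ] test distinguish “no previous run” from a real id.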

Install day10

day10 is distributed as a static binary attached to a GitHub release. No OCaml toolchain is needed on the runner:

      - name: Install day10
        run: |
          sudo curl -sL https://github.com/mtelvers/day10/releases/download/v0.0.1/day10-linux-x86_64 \
            -o /usr/local/bin/day10
          sudo chmod +x /usr/local/bin/day10

List, solve, and build

First, day10 list enumerates all packages compatible with the target compiler and OS across both repositories. The --json flag writes the package list to a file:

      - name: List packages
        run: |
          day10 list \
            --opam-repository opam-repository \
            --opam-repository . \
            --ocaml-version ocaml.5.4.1 \
            --os-distribution debian --os-family debian --os-version 13 \
            --json packages.json

Then day10 health-check runs in two passes. The first pass (--dry-run) solves dependencies without building, using high parallelism (--fork 256). For each package, it produces a JSON file in results/ with a status of solution (a valid dependency solution was found), no_solution, or a cached result carried over from a previous run:

      - name: Solve packages (dry-run)
        run: |
          mkdir -p results
          day10 health-check \
            --cache-dir /var/cache/day10 \
            --opam-repository opam-repository \
            --opam-repository . \
            --ocaml-version ocaml.5.4.1 \
            --os-distribution debian --os-family debian --os-version 13 \
            --json results \
            --dry-run \
            --fork 256 \
            @packages.json

Before building, the results are filtered to extract only the solvable packages. The solve step is massively parallel (--fork 256) and completes in seconds; building is far more resource-intensive and runs with lower parallelism (--fork 64). Filtering first means the build step does not waste slots on packages that are unsolvable or already cached:

      - name: Filter solvable packages
        run: |
          jq -n --argjson pkgs "$(jq -s '[.[] | select(.status == "solution") | .name]' results/*.json)" \
            '{packages: $pkgs}' > solvable.json
          echo "Solvable: $(jq '.packages | length' solvable.json) / $(jq '.packages | length' packages.json)"
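Run against a fabricated results/ directory, the filter keeps only the package whose dry-run found a fresh solution (the package names here are illustrative):

```shell
# Fabricated results/: one solvable package, one cached success,
# and one with no solution.
mkdir -p results
echo '{"name":"braid.dev","status":"solution"}'     > results/braid.dev.json
echo '{"name":"smtpd.dev","status":"success"}'      > results/smtpd.dev.json
echo '{"name":"zarr.0.1.0","status":"no_solution"}' > results/zarr.json

# Only status == "solution" survives; cached and unsolvable entries drop out.
jq -n --argjson pkgs "$(jq -s '[.[] | select(.status == "solution") | .name]' results/*.json)" \
  '{packages: $pkgs}' > solvable.json
cat solvable.json
```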

The second pass builds only the solvable packages that are not already cached. Each package is built inside an OCI container managed by runc, with layers cached on the NVMe. The build step writes its results back into the same results/ directory, updating the status from solution to success, failure, or dependency_failed:

      - name: Build packages
        run: |
          day10 health-check \
            --cache-dir /var/cache/day10 \
            --opam-repository opam-repository \
            --opam-repository . \
            --ocaml-version ocaml.5.4.1 \
            --os-distribution debian --os-family debian --os-version 13 \
            --json results \
            --fork 64 \
            @solvable.json

Results and regression detection

The “Generate summary” step writes a GitHub Actions job summary containing:

  • A table of totals (success, failure, no solution, dependency failed)
  • An expandable list of failed packages with the last 5 lines of their build log
  • A diff against the previous run (newly broken, newly fixed, new packages, removed packages)
  • An expandable list of successful packages

With ~4300 packages, iterating the result files with per-file jq calls would fork tens of thousands of processes and take several minutes. Instead, the summary is generated in three bulk jq invocations: one to slurp all current results into an array, one to index previous results by name, and one to produce the entire summary markdown.

The results are concatenated using find -exec cat rather than shell glob expansion to avoid hitting ARG_MAX with thousands of files:

          find results -name '*.json' -exec cat {} + \
            | jq -s 'map({name, status, log: ...})' > /tmp/current.json

The diff section compares each package’s status against previous-results/. If a package result was success in the previous run but is now anything other than success, it is flagged as newly broken.
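The same bulk jq approach covers both directions of the diff. A sketch on fabricated data, with the previous run already indexed by package name as the summary step arranges it (names and statuses are illustrative):

```shell
# Fabricated data: smtpd.dev regressed, zarr.0.1.0 was fixed.
cat > /tmp/current.json <<'EOF'
[{"name":"braid.dev","status":"success"},
 {"name":"smtpd.dev","status":"failure"},
 {"name":"zarr.0.1.0","status":"success"}]
EOF
cat > /tmp/previous.json <<'EOF'
{"braid.dev":{"status":"success"},
 "smtpd.dev":{"status":"success"},
 "zarr.0.1.0":{"status":"failure"}}
EOF

# Newly broken: success in the previous run, anything else now.
jq -r --slurpfile prev /tmp/previous.json \
  '[.[] | select(($prev[0][.name].status // null) == "success"
                 and .status != "success") | .name] | .[]' \
  /tmp/current.json    # prints smtpd.dev

# Newly fixed: the same comparison reversed.
jq -r --slurpfile prev /tmp/previous.json \
  '[.[] | select($prev[0][.name] != null
                 and $prev[0][.name].status != "success"
                 and .status == "success") | .name] | .[]' \
  /tmp/current.json    # prints zarr.0.1.0
```

Packages absent from the previous index ($prev[0][.name] is null) fall through both filters and are reported separately as new packages.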

The regression information is exported via $GITHUB_OUTPUT, so the next step can act on it:

          has_regressions=$(jq --slurpfile prev /tmp/previous.json '
            [.[] | select(($prev[0][.name].status // null) == "success"
                          and .status != "success")]
            | if length > 0 then "true" else "false" end
          ' /tmp/current.json -r)
          echo "has_regressions=$has_regressions" >> $GITHUB_OUTPUT

The regression gate

This step runs only during merge queue checks (merge_group events) and only when regressions are detected. It fails the workflow, which prevents the merge queue from merging the PR:

      - name: Check for regressions
        if: github.event_name == 'merge_group' && steps.summary.outputs.has_regressions == 'true'
        run: |
          echo "::error::Regressions detected - the following packages broke:"
          cat /tmp/regressions.txt
          exit 1

On push and pull_request events, this step is skipped. The summary still reports regressions, but the workflow does not fail: the push to main establishes the baseline, and the PR run provides early feedback. The build cache is shared, so the effort isn’t wasted.

Upload results

The results are uploaded as an artefact so the next run can download them for comparison:

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: build-results
          path: results/

GitHub merge queue

The merge queue ensures that every change to main has been tested against the current state of main before it lands.

How it works

  1. A contributor opens a PR
  2. The pull_request event triggers the CI workflow
  3. When the build check passes, the contributor clicks “Merge when ready” (or uses gh pr merge --auto)
  4. GitHub adds the PR to the merge queue
  5. The merge queue creates a temporary branch (gh-readonly-queue/main/pr-N-...) that merges the PR on top of the current main
  6. The merge_group event triggers the CI workflow on this merged state
  7. If the build check passes and no regressions are detected, the PR is merged into main
  8. If the check fails (e.g. regressions detected), the PR is removed from the queue

Organisation requirement

Merge queues are only available for repositories owned by a GitHub organisation; they do not work on personal accounts, even for public repositories. Creating a free organisation is sufficient; no paid plan is required.

Configuring the merge queue

The merge queue is configured via a repository ruleset. This can be done through the GitHub UI (Settings > Rules > Rulesets) or via the API:

gh api repos/OWNER/REPO/rulesets -X POST --input - <<'EOF'
{
  "name": "main merge queue",
  "target": "branch",
  "enforcement": "active",
  "conditions": {
    "ref_name": {
      "include": ["refs/heads/main"],
      "exclude": []
    }
  },
  "rules": [
    {
      "type": "merge_queue",
      "parameters": {
        "check_response_timeout_minutes": 360,
        "grouping_strategy": "ALLGREEN",
        "max_entries_to_build": 5,
        "max_entries_to_merge": 5,
        "merge_method": "MERGE",
        "min_entries_to_merge": 1,
        "min_entries_to_merge_wait_minutes": 1
      }
    },
    {
      "type": "required_status_checks",
      "parameters": {
        "required_status_checks": [
          { "context": "build" }
        ],
        "strict_required_status_checks_policy": false
      }
    }
  ]
}
EOF

The required_status_checks rule is essential, as it tells the merge queue which checks to wait for. Without it, PRs would be merged immediately without running any checks. The context value (build) must match the job name in the workflow.

Walkthrough with gh

Create a branch, make a change, and push:

git checkout -b my-feature
# ... make changes ...
git add -A && git commit -m "My change"
git push -u origin my-feature

Open a PR:

gh pr create --title "My change" --body "Description of the change"

Wait for the build check to pass on the PR branch:

gh pr checks 1 --watch

Add the PR to the merge queue:

gh pr merge 1 --auto

The --auto flag tells GitHub to merge the PR once all requirements are met. With the merge queue active, the PR enters the queue, the merge_group check runs, and if it passes, the PR is merged automatically.

Monitor the merge queue run:

# Find the merge_group run
gh run list --json databaseId,event,status --jq '.[] | select(.event == "merge_group")'

# Watch it
gh run watch <run-id>

Check if the PR was merged:

gh pr view 1 --json state,mergedAt

day10 result format

Each package produces a JSON file in the results/ directory:

{
  "name": "ocaml-slurm.dev",
  "status": "success",
  "sha": "4b07b3...",
  "layer": "22d45e...",
  "log": "Processing: [default: loading data]\n...",
  "solution": "digraph opam { ... }"
}

The status field is one of:

  • solution: the dry-run found a valid dependency solution (not yet built)
  • no_solution: the dependency solver could not find a valid solution
  • success: the package built and installed successfully
  • failure: the package build failed
  • dependency_failed: a dependency of this package failed to build

After the dry run, the status will be solution, no_solution, or a cached result. After the build step, solution entries are replaced with success, failure, or dependency_failed. All other entries remain unchanged.

The solution field contains a Graphviz DOT graph of the resolved dependency tree, and log contains the full build output.
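The status field also makes it easy to reproduce the summary table’s totals with a single jq pass over the result files. A sketch against a fabricated directory (demo-results stands in for the workflow’s results/):

```shell
# Tally results by status over a fabricated directory.
rm -rf demo-results && mkdir demo-results
echo '{"name":"a.dev","status":"success"}'      > demo-results/a.json
echo '{"name":"b.dev","status":"success"}'      > demo-results/b.json
echo '{"name":"c.dev","status":"no_solution"}'  > demo-results/c.json

# group_by both sorts and buckets by status; from_entries builds the table.
find demo-results -name '*.json' -exec cat {} + \
  | jq -sc 'group_by(.status) | map({key: .[0].status, value: length}) | from_entries'
# prints {"no_solution":2,"success":2} minus one: {"no_solution":1,"success":2}
```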