Docker 29 and COPY --link --chown
Mark Elvers

Categories

  • ci
  • docker
  • ocaml

Tags

  • tunbury.org

After deploying the QEMU RISC-V machines, the base image builder ran into issues on the first Docker build.

All of our CI builds use ocurrent/obuilder, which runs runc directly, giving us full control over how the layers are assembled. However, the initial layer is an extracted Docker container, docker export foo | tar -xf -, which is rebuilt every Saturday by ocurrent/docker-base-images to pick up the latest changes. The Dockerfile is generated by ocurrent/ocaml-dockerfile.

Today, one of the new QEMU RISC-V machines picked up the Docker build job and failed fairly near the end with a strange error:

#25 [stage-1 35/40] COPY --link --chown=opam:opam [ ".", "/home/opam/opam-repository" ]
#25 ERROR: invalid user index: -1
------
Dockerfile:76
--------------------
  74 |     RUN git config --global user.email "docker@example.com"
  75 |     RUN git config --global user.name "Docker"
  76 | >>> COPY --link --chown=opam:opam [ ".", "/home/opam/opam-repository" ]
  77 |     RUN opam-sandbox-disable
  78 |     RUN opam init -k git -a /home/opam/opam-repository --bare
--------------------
ERROR: failed to build: failed to solve: invalid user index: -1

invalid user index: -1 is a BuildKit error, which originates from the interaction of the two flags on COPY. Firstly, a quick recap on what --link actually does.

Ordinarily, COPY writes its files directly on top of the previous layer’s filesystem. The resulting layer therefore depends on everything beneath it. Any change in an earlier layer means every subsequent COPY has to be rebuilt, as it now sits on a different base.

COPY --link (and ADD --link), introduced in BuildKit v0.10, breaks the dependency. Instead of copying onto the parent filesystem, it copies the files into an independent layer built on an implicit FROM scratch, and then stitches that layer in using BuildKit’s MergeOp. Because the copied layer no longer depends on the layers below it, it can be cached and reused even when earlier instructions change, and images can be rebased onto a new parent without rebuilding.

Because it is an independent layer, a --link copy has no parent root filesystem mounted at copy time (only scratch). So if --chown is given a name rather than a numeric ID, there is nothing to resolve that name against:

  • --chown=opam:opam uses a user name, which BuildKit must resolve to a numeric UID/GID by reading /etc/passwd in the filesystem the copy is applied to.
  • --link means that filesystem is scratch, which has no /etc/passwd.

The -1 is the exact manifestation of that issue: the user cannot be resolved, so BuildKit aborts with invalid user index: -1. Numeric IDs avoid it entirely as there is nothing to look up. The BuildKit docs say “Using numeric IDs requires no lookup and does not depend on container root filesystem content.”
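To make the lookup concrete, here is a minimal shell sketch of name resolution against a root filesystem. It is not BuildKit's code (that is written in Go); resolve_uid is a hypothetical helper, and printing -1 for an unresolved name is just my way of mirroring the error message.

```shell
# Illustrative only: resolve a --chown user against ROOT/etc/passwd.
# resolve_uid is a hypothetical helper, not part of Docker or BuildKit.
resolve_uid() {
    local root=$1 name=$2 passwd uid
    # A purely numeric ID needs no lookup at all.
    case $name in
        ''|*[!0-9]*) ;;                          # not numeric: look it up
        *) printf '%s\n' "$name"; return 0 ;;
    esac
    passwd="$root/etc/passwd"
    # A scratch filesystem has no /etc/passwd, so the name cannot be resolved.
    if [ ! -r "$passwd" ]; then
        printf '%s\n' -1
        return 1
    fi
    # The passwd format is name:password:uid:gid:...; print the third field.
    uid=$(awk -F: -v n="$name" '$1 == n { print $3; exit }' "$passwd")
    if [ -z "$uid" ]; then
        printf '%s\n' -1
        return 1
    fi
    printf '%s\n' "$uid"
}
```

Called with a directory holding a normal /etc/passwd, resolve_uid /some/rootfs opam prints the UID from that file. Pointed at an empty directory, which stands in for scratch, it prints -1, while a numeric argument such as 1000 is returned unchanged in both cases.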

With that detail, we can create a trivial Dockerfile to test; all we need is a COPY. Note that ubuntu:noble already has a UID-1000 user called ubuntu, so there is no need to useradd --uid 1000. The cleanest reproduction therefore copies a file from the build context and chowns it to the pre-existing ubuntu user:

mkdir /tmp/ctx && echo hi > /tmp/ctx/foo
printf 'FROM ubuntu:noble\nCOPY --link --chown=ubuntu:ubuntu foo /home/ubuntu/x\n' > /tmp/ctx/Dockerfile
docker build --no-cache --pull -f /tmp/ctx/Dockerfile /tmp/ctx

On my new RISC-V worker:

 => ERROR [2/2] COPY --link --chown=ubuntu:ubuntu foo /home/ubuntu/x
------
ERROR: failed to build: failed to solve: invalid user index: -1

Swap the name for the numeric UID/GID and it builds cleanly:

printf 'FROM ubuntu:noble\nCOPY --link --chown=1000:1000 foo /home/ubuntu/x\n' > /tmp/ctx/Dockerfile
docker build --no-cache --pull -f /tmp/ctx/Dockerfile /tmp/ctx

Easy fix. Let’s create a PR! However, why is this a problem now? That same Dockerfile ran last week without a problem, which points squarely at the new RISC-V workers as the only changed piece of the puzzle.

A quick search online for invalid user index: -1 finds docker/buildx#1526 and docker/docs#20660. They describe this exact error, with this exact --link --chown=name mechanism. Perfect, I can link to them in my PR.

The story told in those issues is about a version change. Essentially, before BuildKit v0.10, --link was not really implemented: the flag was silently ignored and the copy fell back to a normal copy onto the real filesystem, where username resolution worked correctly. BuildKit v0.10 made --link real, with the scratch-layer behaviour described above, and at that moment COPY --link --chown=name started failing for people who had unknowingly been relying on the flag doing nothing. I was a bit worried about how old BuildKit v0.10 was and the fact that these issues were closed in January 2023.

Last week’s run was on bare metal RISC-V running Ubuntu Noble, which was released in April 2024, so BuildKit v0.10 was already ancient history. This was confirmed by checking directly on the machines.

# old worker
root@riscv-bm-01:~# docker buildx inspect default --bootstrap | grep -i buildkit
Buildkit:  v0.20.2

# new worker
root@riscv-qemu-11:~# docker buildx inspect default --bootstrap | grep -i buildkit
BuildKit version: v0.26.2

Those versions are quite different though. I built the new QEMU workers with Ubuntu 26.04 so perhaps there is a regression between v0.20.2 and v0.26.2. I cloned BuildKit and diffed the relevant code. The frontend that lowers COPY --link to the low-level build (frontend/dockerfile/dockerfile2llb/convert_copy.go) was identical as was the solver check that emits the error (solver/llbsolver/ops/file.go).

Looking at the full output from buildx inspect, there are other differences:

# old worker — note: no containerd labels
root@riscv-bm-01:~# docker buildx inspect default --bootstrap | grep -iE 'executor|snapshotter'
(nothing)

# new worker
root@riscv-qemu-11:~# docker buildx inspect default --bootstrap | grep -iE 'executor|snapshotter'
 org.mobyproject.buildkit.worker.executor:    containerd
 org.mobyproject.buildkit.worker.snapshotter: overlayfs

The old worker uses the moby graphdriver worker while the new worker uses the containerd image store. So perhaps the difference is just down to which store driver is being used.
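That check is easy to script across a fleet of workers. The sketch below is a heuristic based only on what I observed above: the snapshotter label is present on the containerd image store worker and absent on the graphdriver worker. classify_worker is a hypothetical helper name.

```shell
# Heuristic: read `docker buildx inspect` output on stdin and report the
# backend. Assumes, as observed on these machines, that the snapshotter
# label only appears when the containerd image store is in use.
classify_worker() {
    if grep -qF 'org.mobyproject.buildkit.worker.snapshotter'; then
        echo "containerd image store"
    else
        echo "graphdriver"
    fi
}
```

Usage would be docker buildx inspect default --bootstrap | classify_worker on each machine.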

There is a feature flag in /etc/docker/daemon.json which can enable or disable the containerd image store:

{
  "features": {
    "containerd-snapshotter": true
  }
}

The containerd driver is included in BuildKit v0.20.2; therefore, I can just enable the feature on the original machine and run my test Dockerfile.

On the old worker (BuildKit v0.20.2), with "containerd-snapshotter": false (which is the default):

root@riscv-bm-01:~# docker buildx inspect default --bootstrap | grep -iE 'executor|snapshotter'
(nothing — graphdriver)
root@riscv-bm-01:~# docker build --no-cache --pull -f /tmp/ctx/Dockerfile /tmp/ctx 2>&1 | grep -i ERROR
(empty — success)

Then set "containerd-snapshotter": true, and restart the Docker service:

root@riscv-bm-01:~# docker buildx inspect default --bootstrap | grep -iE 'executor|snapshotter'
 org.mobyproject.buildkit.worker.executor:    containerd
 org.mobyproject.buildkit.worker.snapshotter: overlayfs
root@riscv-bm-01:~# docker build --no-cache --pull -f /tmp/ctx/Dockerfile /tmp/ctx 2>&1 | grep -i ERROR
ERROR: failed to solve: invalid user index: -1

And there we have it, the same machine, same Docker, same BuildKit v0.20.2 with the only difference being containerd-snapshotter. Repeating the experiment on the new worker fills in the matrix:

                   graphdriver   containerd image store
BuildKit v0.20.2   builds        invalid user index: -1
BuildKit v0.26.2   builds        invalid user index: -1

The worker backend is the sole determinant. The BuildKit version is irrelevant. COPY --link --chown=<name> works on the graphdriver worker and fails on the containerd image-store worker, on both versions.

I have been updating some of the x86_64 workers to Ubuntu 26.04 so we can test io_uring, but now I’m concerned that there will be problems on any machine which I have updated.

kydoime was upgraded from 24.04 to 26.04 earlier today:

root@kydoime:~# docker --version
Docker version 29.1.3, build 29.1.3-0ubuntu4.1

root@kydoime:~# cat /etc/docker/daemon.json
{
  "experimental": true
}

root@kydoime:~# docker info --format 'Driver={{.Driver}}'
Driver=overlay2

root@kydoime:~# docker buildx inspect default --bootstrap | grep -iE 'executor|snapshotter'
(nothing — graphdriver)

Docker 29.1.3 is, unsurprisingly, the same version as the RISC-V machine which was failing, but kydoime runs overlay2 and there is no mention of containerd. The daemon.json is just { "experimental": true }, with an mtime from February 2022, so there’s no extra configuration hiding there pinning it to the older format.

Looking in moby’s source code, daemon/image_store_choice.go:

out := imageStoreChoiceContainerd          // Linux default is containerd
...
if enabled, ok := cfgStore.Features["containerd-snapshotter"]; ok {
    // honour the explicit feature flag if set
    ...
}
...
if out == imageStoreChoiceContainerd {
    if opts.hasPriorDriver(cfgStore.Root) {
        return imageStoreChoiceGraphdriverPrior, nil
    }
}

with the telling constant:

// would be containerd, but the system has already been running with a graphdriver
imageStoreChoiceGraphdriverPrior imageStoreChoice = "graphdriver-prior"

hasPriorDriver is graphdriver.HasPriorDriver, in daemon/graphdriver/driver.go:

func HasPriorDriver(root string) bool {
    return len(scanPriorDrivers(root)) > 0
}

// scanPriorDrivers returns an un-ordered scan of directories of prior storage
// drivers. The 'vfs' storage driver is not taken into account, and ignored.
func scanPriorDrivers(root string) map[string]bool {
    driversMap := make(map[string]bool)
    for driver := range drivers {
        p := filepath.Join(root, driver)
        if _, err := os.Stat(p); err == nil && driver != "vfs" {
            if !isEmptyDir(p) {
                driversMap[driver] = true
            }
        }
    }
    return driversMap
}

So at every daemon start, with the containerd-snapshotter feature unset, Docker 29 wants to use the containerd image store, but first it scans /var/lib/docker for a non-empty prior driver directory (overlay2, aufs, btrfs, etc, ignoring vfs). If it finds one, it stays on the graph driver to avoid orphaning the existing images.

This is interesting because it means that if you have upgraded, you’ll be running on the old format, but if you later wipe /var/lib/docker and restart, you’ll be moved to the new format. The deciding factor is whether this returns anything:

ls -A /var/lib/docker/overlay2 | head
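The same test can be written as a small shell function that approximates the Go scan shown earlier. This is a sketch rather than a port: the driver names in the loop are only the ones mentioned in this post (an assumed subset of what the daemon registers), and has_prior_driver is my own name for it.

```shell
# Approximation of moby's prior-driver scan: succeed if ROOT contains a
# non-empty directory for a known graph driver. The driver list is an assumed
# subset, and vfs is deliberately left out because the daemon ignores it.
has_prior_driver() {
    local root=$1 driver dir
    for driver in overlay2 aufs btrfs; do
        dir="$root/$driver"
        # An empty directory does not count as prior state.
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            return 0
        fi
    done
    return 1
}
```

On a worker, has_prior_driver /var/lib/docker && echo "stays on graphdriver" gives a quick indication of which way the daemon is likely to go at the next restart, assuming the feature flag is unset.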

The change in default is documented in the Docker Engine v29 release notes, which state that the containerd image store becomes the default for new installs and that the legacy graph drivers are now deprecated. The Ubuntu 26.04 release notes put it plainly:

docker.io updated to version 29 … The containerd image store is now the default for fresh installs. This doesn’t apply to daemons configured with userns-remap or for users upgrading from a previous docker.io version.

I could add "containerd-snapshotter": false in /etc/docker/daemon.json, but that pins us to a “legacy graph driver”. Therefore, we should change the Dockerfile to use numeric IDs. The opam user is always created with --uid 1000, so the generated COPY should be:

COPY --link --chown=1000:1000 [ ".", "/home/opam/opam-repository" ]
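To check other Dockerfiles for the same pattern, a rough grep-based scan is enough. The function below is a hypothetical helper, not an existing tool, and it only understands single-line COPY and ADD instructions, so treat a clean result as a hint rather than proof.

```shell
# Print, with line numbers, COPY/ADD instructions that combine --link with a
# --chown value containing a letter or underscore (i.e. a name, not an ID).
# Heuristic: instructions continued over several lines are not handled.
find_named_link_chown() {
    grep -niE '^[[:space:]]*(COPY|ADD)[[:space:]]' "$1" \
        | grep -E -- '--link' \
        | grep -E -- '--chown=[^[:space:]]*[A-Za-z_]'
}
```

Running find_named_link_chown Dockerfile prints any offending lines and exits non-zero when there are none, which makes it usable as a simple CI guard.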

PR#352