Improve the deployment time for opam2web
Mark Elvers

Categories

  • opam

Tags

  • tunbury.org

The opam2web image for opam.ocaml.org is huge, weighing in at more than 25 GB. The bulk of this data is opam archives, which are updated and copied into a stock caddy image.

There are two archives: ocaml/opam.ocaml.org-legacy, which hasn’t changed for 5 years and holds the cache for opam 1.x, and ocaml/opam:archive, which is updated weekly.

The current Dockerfile copies these files into a new layer each time opam2web builds.

FROM --platform=linux/amd64 ocaml/opam:archive as opam-archive
FROM ocaml/opam.ocaml.org-legacy as opam-legacy
FROM alpine:3.20 as opam2web
...
COPY --from=opam-legacy . /www
...
RUN --mount=type=bind,target=/cache,from=opam-archive rsync -aH /cache/cache/ /www/cache/
...

And later, the entire /www structure is copied into a caddy:2.8.4 image.

FROM caddy:2.8.4
WORKDIR /srv
COPY --from=opam2web /www /usr/share/caddy
COPY Caddyfile /etc/caddy/Caddyfile
ENTRYPOINT ["caddy", "run", "--config", "/etc/caddy/Caddyfile", "--adapter", "caddyfile"]

This multi-stage approach, where only the final artefacts are copied into a small runtime image, is considered “best practice” when creating Docker images. In this case, however, it produces a very large image that takes a long time to deploy.

For Docker to reuse an existing layer, the final FROM ... must name the image whose layers we want to use as the base. In the above snippet, only the caddy:2.8.4 layers form the base and are reused, while the large COPY of /www lands in a new layer on top each time opam2web builds.
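
One way to see which layers two images actually have in common is to compare their layer digests. The following is a minimal shell sketch, not part of the opam2web tooling: the function names `image_layers` and `shared_layers` are hypothetical, and it assumes both images are already present in the local Docker image store.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of opam2web.

# Print the layer digests of a local image, one per line.
# Blank lines in the template output are dropped.
image_layers() {
    docker image inspect \
        --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$1" |
        grep -v '^$'
}

# Print every digest from the file $2 that also appears in the file $1.
# -F: fixed strings, -x: whole-line match, -f: read patterns from a file.
shared_layers() {
    grep -Fx -f "$1" "$2"
}
```

Saving the output of `image_layers` for the base image and for the final image into two files, and then passing those files to `shared_layers`, lists the layers the two images have in common. Layers that a registry already holds should not need to be uploaded again on push.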

The archive, ocaml/opam:archive, is created by the following Dockerfile, whose final stage is based on alpine:latest.

FROM ocaml/opam:archive AS opam-archive
FROM ocurrent/opam-staging@sha256:f921cd51dda91f61a52a2c26a8a188f8618a2838e521d3e4afa3ca1da637903e AS archive
WORKDIR /home/opam/opam-repository
RUN --mount=type=bind,target=/cache,from=opam-archive rsync -aH /cache/cache/ /home/opam/opam-repository/cache/
RUN opam admin cache --link=/home/opam/opam-repository/cache

FROM alpine:latest
COPY --chown=0:0 --from=archive [ "/home/opam/opam-repository/cache", "/cache" ]

In our opam2web build, we could use FROM ocaml/opam:archive and then apk add caddy, which would reuse the entire 15 GB layer and add only a few megabytes for caddy.

ocaml/opam.ocaml.org-legacy is another 8 GB. This legacy data could be integrated by adding it to ocaml/opam:archive in a separate directory, leaving the existing /cache layout untouched to ensure compatibility with anyone else using this image. This is PR#324.

 let install_package_archive opam_image =
   let open Dockerfile in
+  from ~alias:"opam-legacy" "ocaml/opam.ocaml.org-legacy" @@
   from ~alias:"opam-archive" "ocaml/opam:archive" @@
   from ~alias:"archive" opam_image @@
   workdir "/home/opam/opam-repository" @@
   run ~mounts:[mount_bind ~target:"/cache" ~from:"opam-archive" ()] "rsync -aH /cache/cache/ /home/opam/opam-repository/cache/" @@
   run "opam admin cache --link=/home/opam/opam-repository/cache" @@
   from "alpine:latest" @@
+  copy ~chown:"0:0" ~from:"opam-legacy" ~src:["/"] ~dst:"/legacy" () @@
   copy ~chown:"0:0" ~from:"archive" ~src:["/home/opam/opam-repository/cache"] ~dst:"/cache" ()

Finally, we need to update opam2web to use ocaml/opam:archive as the base layer rather than caddy:2.8.4, resulting in the final part of the Dockerfile looking like this.

FROM ocaml/opam:archive
RUN apk add --update git curl rsync libstdc++ rdfind caddy
COPY --from=build-opam2web /opt/opam2web /usr/local
COPY --from=build-opam-doc /usr/bin/opam-dev /usr/local/bin/opam
COPY --from=build-opam-doc /opt/opam/doc /usr/local/share/opam2web/content/doc
COPY ext/key/opam-dev-team.pgp /www/opam-dev-pubkey.pgp
ADD bin/opam-web.sh /usr/local/bin
ARG DOMAIN=opam.ocaml.org
ARG OPAM_REPO_GIT_SHA=master
ARG BLOG_GIT_SHA=master
RUN echo ${OPAM_REPO_GIT_SHA} >> /www/opam_git_sha
RUN echo ${BLOG_GIT_SHA} >> /www/blog_git_sha
RUN /usr/local/bin/opam-web.sh ${DOMAIN} ${OPAM_REPO_GIT_SHA} ${BLOG_GIT_SHA}
WORKDIR /srv
COPY Caddyfile /etc/caddy/Caddyfile
ENTRYPOINT ["caddy", "run", "--config", "/etc/caddy/Caddyfile", "--adapter", "caddyfile"]

I acknowledge that this final image now contains some extra packages that are not needed at runtime, such as git and curl, but this seems a minor inconvenience.

The Caddyfile can be adjusted to make everything still appear to be in the same place:

:80 {
	redir /install.sh https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh
	redir /install.ps1 https://raw.githubusercontent.com/ocaml/opam/master/shell/install.ps1

	@version_paths path /1.1/* /1.2.0/* /1.2.2/*
	handle @version_paths {
		root * /legacy
		file_server
	}

	handle /cache/* {
		root * /
		file_server
	}

	handle {
		root * /www
		file_server
	}
}
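
To make the routing easier to check, the shell function below mirrors the matchers in this Caddyfile and prints the location each request path would be served from. It is only an illustrative model: `resolve_path` is a hypothetical name, it does not talk to caddy, and it ignores details such as index files and how caddy normalises paths. The example paths in the comments are made up.

```shell
#!/bin/sh
# Hypothetical model of the Caddyfile routing above; for illustration only.
# Prints the file system path a request would be read from, or "redirect"
# for the two install scripts that are redirected to GitHub.
resolve_path() {
    case "$1" in
        /install.sh|/install.ps1)
            echo "redirect" ;;
        /1.1/*|/1.2.0/*|/1.2.2/*)
            # Legacy opam 1.x content is rooted at /legacy.
            echo "/legacy$1" ;;
        /cache/*)
            # The archive already lives at /cache, so the root is /.
            echo "$1" ;;
        *)
            # Everything else is the generated site under /www.
            echo "/www$1" ;;
    esac
}

resolve_path /1.2.2/example.txt     # prints /legacy/1.2.2/example.txt
resolve_path /cache/md5/00/example  # prints /cache/md5/00/example
resolve_path /packages/index.html   # prints /www/packages/index.html
```

The point of the three roots is that the URL layout seen by clients stays the same even though the files now come from three different directories in the image.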

In this configuration, the Docker push is only 650 MB rather than 25 GB.
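
As a rough check on these figures, taking the sizes quoted in this post at face value and assuming decimal units (1 GB = 1000 MB), the sketch below works out how much of the image is archive data and how small the new push is relative to the old one. The `percent` helper is a hypothetical name introduced for illustration, and since the image is "more than 25 GB" the results are approximate.

```shell
#!/bin/sh
# Hypothetical helper: print a / b as a percentage with one decimal place.
percent() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b * 100 }'
}

# Archive data (15 GB) plus legacy data (8 GB) as a share of the ~25 GB image.
percent $((15 + 8)) 25    # prints 92.0

# New push size (650 MB) relative to the old one (~25 GB = 25000 MB).
percent 650 25000         # prints 2.6
```

In other words, roughly nine tenths of the image is data that changes weekly at most, and moving it into the base image cuts the push to under 3% of its former size.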

The changes to opam2web are in PR#245.

Test with some external URLs: