alertmanager-gotify-bridge/ci/MIGRATION.md
Oleks e8f3e954e7 ci: design nix2container migration + scaffold amd64 publish app (emmett#44)
Archetype: oci-image (buildx -> in-cluster remote buildkit), the HARD case.
DESIGN/PARTIAL, not a finished migration:

- ci/MIGRATION.md: concrete plan to escape buildkit via nix2container/skopeo
  +regctl. The app is pure-stdlib Python, so both arches are buildable on
  emmett (amd64-native + Nix-cross-from-amd64 python3 closure) with no
  buildkit/qemu/docker -> no foreign-arch leg needed; Dockerfile retired on
  cutover. Covers per-arch build, entrypoints, .woodpecker.yaml target,
  escape hatch (unused here), risks, remaining work.
- flake.nix: scaffolds the natively-buildable amd64 leg only
  (stage-amd64, publish-amd64), dry-run by default (PUBLISH=1 to push),
  $REGISTRY_TOKEN -> pass fallback, registry-down/empty-token blockers.
  Mirrors reference impl claude-plugin-registry@9850745.

arm64 leg, publish-index/publish, and YAML cutover are designed but NOT wired.
Verified: nix eval .#apps.x86_64-linux (-> stage-amd64, publish-amd64); no
image build run (downloads closure).
2026-06-02 03:39:20 +03:00


Migration plan: buildx → nix2container/skopeo+regctl (emmett#44)

Status: DESIGN + PARTIAL SCAFFOLD. This document is a concrete plan, not a completed migration. A flake.nix in the repo root scaffolds the natively-buildable amd64 leg only (nix run .#publish-amd64, dry-run by default). The arm64 leg and the .woodpecker.yaml cutover are designed here but not yet wired — see "Remaining work".

Archetype: oci-image (buildx → in-cluster remote buildkit) — the HARD archetype in the emmett#44 local-pipeline-parity standard. Tracking: oleks/cluster milestone #57.


1. Why this repo is an easy migration (feasibility)

The current pipeline (.woodpecker.yaml) does:

docker buildx create --driver remote tcp://buildkit-rootless-arm64.infra.svc...
docker buildx build --platform linux/amd64,linux/arm64 --push .

i.e. it depends on in-cluster remote buildkit for the multi-arch build and on a node-pinned step (howard2404). That is exactly the cluster-coupled, not-reproducible-on-emmett shape emmett#44 wants gone.

The application (bridge.py) is the easiest possible payload:

  • Pure Python standard library. json, os, sys, http.server, urllib. No pip install, no requirements file, no C/Rust extension, no native wheel.
  • The Dockerfile is 5 lines: FROM python:3.12-alpine, copy one file, run it.

Consequences for nix2container:

  • The image's only real runtime dependency is a CPython interpreter + its Nix closure (glibc, openssl, zlib, ncurses, ...). bridge.py is a static asset copied next to it.
  • CPython is in nixpkgs and cross-compiles cleanly with pkgsCross. There is no app source build to cross, only the standard interpreter, and its aarch64 closure is expected to come from the binary cache (cache coverage is still to be verified; see "Remaining work"). So both arches are buildable on emmett (amd64-native + Nix-cross-from-amd64) with no buildkit, no qemu, no docker daemon. This satisfies the emmett-OK definition in the standard ("amd64-native OR Nix-cross-from-amd64, never 'uses skopeo' as the build, and not a foreign-arch buildkit leg").

This repo is therefore a clean reference for "oci-image buildx leg that fully escapes buildkit", not one of the genuinely-hard foreign-arch cases.


2. What the flake builds, per arch

Mirrors the reference impl (oleks/claude-plugin-registry @ 9850745), minus the Rust/Dioxus build (we have no compiled artifact):

# inputs: fleet (nixpkgs-projects pin), nix2container, flake-utils
# Fragment of the flake's let-block; pkgs, version and nix2container (the
# nix2container package set for the build system) are assumed to be in scope.

# A tiny derivation that places bridge.py and a python3 symlink under /app.
appRoot = arch: targetPkgs: pkgs.runCommand "app-root-${arch}" { } ''
  mkdir -p $out/app
  cp ${./bridge.py} $out/app/bridge.py
  ln -s ${targetPkgs.python3}/bin/python3 $out/app/python3   # closure tracked
'';

mkImage = arch: targetPkgs: nix2container.buildImage {
  name = "git.oleks.space/oleks/alertmanager-gotify-bridge";
  tag  = "${version}-${arch}";
  inherit arch;                         # "amd64" | "arm64"
  layers = [
    (nix2container.buildLayer {
      copyToRoot = [ (appRoot arch targetPkgs) pkgs.cacert ];
      maxLayers = 25;
      reproducible = false;             # see Determinism note
    })
  ];
  config = {
    Cmd          = [ "/app/python3" "/app/bridge.py" ];
    WorkingDir   = "/app";
    ExposedPorts = { "8080/tcp" = {}; };
    Env          = [ "PORT=8080" "SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt" ];
  };
};

  • amd64: targetPkgs = pkgs (native). Fully buildable + pushable on emmett.
  • arm64: targetPkgs = pkgs.pkgsCross.aarch64-multiplatform. Cross from amd64; the aarch64 python3 closure should come from the binary cache (see risk 4 for the cold-cache case). No qemu.

The Alpine base in the current Dockerfile is intentionally dropped: nix2container ships only the interpreter closure, so the image is smaller and has no distro package layer. urllib+TLS works because cacert is copied in and SSL_CERT_FILE points at it.


3. Entrypoints (uniform front door)

Per the corrected standard, the flake apps ARE the shared code; .woodpecker.yaml and local runs invoke the same nix run:

| app | does |
| --- | --- |
| stage-amd64 / stage-arm64 | realize the image into the local Nix store, no registry contact (BUILD/STAGE-parity half) |
| publish-amd64 / publish-arm64 | stage + copyTo (skopeo) push that arch (PUBLISH half) |
| publish-index | build-free: regctl index create from pushed per-arch refs, then :latest = digest copy of :TAG as the LAST mutation |
| publish | both arches + publish-index |
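
The ordering that publish-index has to respect can be sketched as a pure planning function. This is only an illustration, not the reference implementation: the function name plan_publish_index, the "arch=digest" argument shape, and the BLOCKER(missing-arch) label are assumptions made for this sketch, and the regctl invocations are printed rather than executed.

```shell
# Hypothetical helper: print the ordered registry mutations for publish-index.
# Usage: plan_publish_index REPO TAG IS_RELEASE(0|1) arch=sha256:... [...]
plan_publish_index() {
  local repo="$1" tag="$2" is_release="$3" refs="" arch pair digest
  shift 3
  for arch in amd64 arm64; do
    digest=""
    for pair in "$@"; do
      case $pair in "$arch"=*) digest=${pair#*=} ;; esac
    done
    # Fail closed: a missing arch aborts before any command is emitted.
    if [ -z "$digest" ]; then
      echo "BLOCKER(missing-arch): $arch" >&2
      return 1
    fi
    # Children are referenced by digest, never by a mutable tag.
    refs="$refs --ref $repo@$digest"
  done
  echo "regctl index create $repo:$tag$refs"
  # :latest is a copy of the finished index and always the last mutation;
  # dev tags (IS_RELEASE=0) never move it.
  if [ "$is_release" = 1 ]; then
    echo "regctl image copy $repo:$tag $repo:latest"
  fi
}
```

Printing the plan instead of running it keeps the sketch compatible with the dry-run default: a publishing caller would execute the emitted lines in order, while a dry run would only show them.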

Conventions carried verbatim from the reference:

  • Token: $REGISTRY_TOKEN → fallback pass infra/gitea/personal_access_token_packages_rw → named BLOCKER(empty-token). Never echoed; apps run under set -euo pipefail only (no set -x).
  • Dry-run by default: every push-capable app builds + prints what it would push; PUBLISH=1 / --publish required to mutate the registry. An accidental local run cannot push.
  • VERSION/TAG: TAG=${VERSION:-<flake version>}; CI derives version from CI_COMMIT_TAG (strip leading v). $VERSION overrides for local dev.
  • Dev-tag guard: :latest is only moved for a real release tag; dev tags push per-arch + immutable index only.
  • Preflight: BLOCKER(registry-down) if https://git.oleks.space/v2/ is unreachable (names the cluster-shared-fate failure mode).
  • Multi-arch: the index is assembled from digest-pinned per-arch refs and fails closed if a required arch is absent in this run. :latest is a digest COPY (not a re-assembly) and is the last mutation. Staging per-arch tags can be pruned; gitea-oci-cleanup pins index child digests.
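
The conventions above could look roughly like the following shell helpers. This is a sketch under assumptions, not the reference's code: every function name is hypothetical, the rule that only X.Y.Z counts as a release tag is a guess at what "real release tag" means, and the curl line in the comment is one possible way to obtain the status code.

```shell
# TAG=${VERSION:-<flake version>}: $VERSION wins, else the flake's own version.
resolve_tag() {
  echo "${VERSION:-$1}"
}

# CI derives the version from the git tag by stripping one leading "v".
strip_v() {
  echo "${1#v}"
}

# Dev-tag guard. Treating only X.Y.Z as a release is an assumption here.
is_release_tag() {
  case $1 in
    "" | *[!0-9.]* | .* | *. | *..*) return 1 ;;
  esac
  case $1 in
    *.*.*.*) return 1 ;;
    *.*.*) return 0 ;;
  esac
  return 1
}

# Dry-run unless PUBLISH=1 is set or --publish is among the arguments.
wants_publish() {
  local arg
  if [ "${PUBLISH:-0}" = 1 ]; then return 0; fi
  for arg in "$@"; do
    if [ "$arg" = --publish ]; then return 0; fi
  done
  return 1
}

# $REGISTRY_TOKEN first, then the pass entry. The value goes to stdout so the
# caller can capture it; it is never written to stderr or the CI log.
resolve_token() {
  local token="${REGISTRY_TOKEN:-}"
  if [ -z "$token" ] && command -v pass >/dev/null 2>&1; then
    token=$(pass show infra/gitea/personal_access_token_packages_rw 2>/dev/null || true)
  fi
  if [ -z "$token" ]; then
    echo "BLOCKER(empty-token)" >&2
    return 1
  fi
  printf '%s' "$token"
}

# Any HTTP status from /v2/ (401 included) proves the registry answered;
# curl reports 000 when no response arrived at all. One way to get the code:
#   code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' https://git.oleks.space/v2/ || true)
registry_answered() {
  [ -n "$1" ] && [ "$1" != 000 ]
}
```

Keeping each rule in its own small function means the same checks run identically on a laptop and in CI, and each can be exercised without touching the registry.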

4. .woodpecker.yaml after migration (target)

The buildx/howard-pinned step is replaced by a single nix-ci step that runs the same app the laptop runs:

when: [{ event: tag, ref: "refs/tags/v*" }]
steps:
  - name: publish
    image: git.oleks.space/oleks/nix-ci:latest
    environment:
      REGISTRY_TOKEN: { from_secret: registry_token }
    commands:
      - export VERSION="$(echo "$CI_COMMIT_TAG" | sed 's/^v//')"
      - nix run .#publish -- --publish      # PUBLISH=1 via flag; CI is the only publisher

Gone: docker login, buildx create --driver remote, the arm64 remote builder, the nodeSelector: howard2404 pin, the multi-arch --platform build. The skip_clone+manual-clone dance can also drop to a normal clone with tags (needed for git describe fallback) once the version is derived in shared code.


5. Foreign-arch escape hatch

Not needed for this repo — arm64 is a pure Nix cross of a stock interpreter, so there is no foreign-arch buildkit leg at all. The general escape hatch (kept for the upstream-source oci-image repos that genuinely need it) is documented for the archetype, not used here:

For images whose payload genuinely cannot be cross-compiled by Nix (an upstream binary only published for a foreign arch, or a build that won't cross), keep a Dockerfile + a buildx step that parameterizes BUILDKIT_ADDR (default docker-container://local, CI overrides to the in-cluster tcp://buildkit-rootless-<arch>.infra.svc). That leg stays cluster-coupled by necessity; its per-arch digest still feeds the same regctl index create/publish-index join point, so the multi-arch assembly is uniform across native, cross, and foreign-arch legs.

Because alertmanager-gotify-bridge has neither a compiled artifact nor a foreign-only upstream binary, the escape hatch is dead code here and is not added — the Dockerfile is retired entirely on cutover.


6. Risks / caveats

  1. Determinism (nix2container reproducible = false). Same caveat as the reference: parity holds only when emmett and CI resolve the identical store path for the python3 closure from the shared cache. With the fleet nixpkgs-projects pin + flake.lock committed this holds; verify by comparing the pushed digest after a release. For a pure-stdlib interpreter image the risk is low (no project-specific compiler in the closure), but it is real.
  2. TLS / cert path. urllib to Gotify over HTTPS needs cacert in the image and SSL_CERT_FILE set — handled in config.Env above. Must be verified once against the live Gotify endpoint (the Alpine image got certs from the distro; we now ship them explicitly).
  3. Image shape change. Switching from python:3.12-alpine to a Nix closure changes the digest, size, and layer layout. Any consumer pinning a specific base-layer digest (unlikely here) would need updating. The CMD path changes from python (PATH) to /app/python3 (absolute symlink).
  4. arm64 cache coverage. The cross build is only fast if the aarch64 python3 closure is in the binary cache; a cold cache makes the first emmett arm64 run slow (still correct, no qemu). Aligns with the fleet-pins strategy.
  5. publish-index requires regctl/skopeo in nix-ci. The reference already relies on these being present; confirm the nix-ci image (or the app's runtimeInputs) provides regctl, skopeo (via copyTo), curl.
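
For risk 5, one cheap guard is to check for the required tools before any build or push starts. The helper below is a minimal sketch: the name require_tools and the BLOCKER(missing-tool) label are assumptions that follow the document's blocker naming, not something the reference is known to define.

```shell
# Hypothetical preflight: fail early if any required tool is not on PATH.
require_tools() {
  local tool missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "BLOCKER(missing-tool):$missing" >&2
    return 1
  fi
}
```

A publish app would call it as `require_tools regctl skopeo curl` at the top, so a nix-ci image that lacks one of them fails with a named blocker instead of partway through a push.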

7. Remaining work (to finish the migration)

  • Done (this commit): flake.nix scaffolds the amd64 leg (stage-amd64, publish-amd64), dry-run by default.
  • Add the arm64 cross leg (stage-arm64, publish-arm64) and publish / publish-index apps (copy the reference's index machinery verbatim).
  • Verify the cross build resolves from the binary cache (no source rebuild).
  • Verify TLS to Gotify from the Nix image (cacert + SSL_CERT_FILE).
  • Cut over .woodpecker.yaml to nix run .#publish -- --publish; delete the buildx/remote-builder/howard-pin steps and the Dockerfile.
  • One real release: compare emmett-built vs CI-pushed digest (determinism check).
  • Add just publish / just stage front door if/when the repo gains a justfile (uniform across all archetypes).

When N>3 oci-image repos have migrated, factor the per-arch image + publish apps into the shared semver-tagged ci-archetypes flake-module (parameterized) and have repos pin it via inputs.ci-archetypes?ref=<pin> rather than copying the flake.