From e8f3e954e76b62617102ba0c3f1bbc6b0c5d8707 Mon Sep 17 00:00:00 2001 From: Oleks Date: Mon, 1 Jun 2026 23:41:44 +0300 Subject: [PATCH] ci: design nix2container migration + scaffold amd64 publish app (emmett#44) Archetype: oci-image (buildx -> in-cluster remote buildkit), the HARD case. DESIGN/PARTIAL, not a finished migration: - ci/MIGRATION.md: concrete plan to escape buildkit via nix2container/skopeo +regctl. The app is pure-stdlib Python, so both arches are buildable on emmett (amd64-native + Nix-cross-from-amd64 python3 closure) with no buildkit/qemu/docker -> no foreign-arch leg needed; Dockerfile retired on cutover. Covers per-arch build, entrypoints, .woodpecker.yaml target, escape hatch (unused here), risks, remaining work. - flake.nix: scaffolds the natively-buildable amd64 leg only (stage-amd64, publish-amd64), dry-run by default (PUBLISH=1 to push), $REGISTRY_TOKEN -> pass fallback, registry-down/empty-token blockers. Mirrors reference impl claude-plugin-registry@9850745. arm64 leg, publish-index/publish, and YAML cutover are designed but NOT wired. Verified: nix eval .#apps.x86_64-linux (-> stage-amd64, publish-amd64); no image build run (downloads closure). --- ci/MIGRATION.md | 222 ++++++++++++++++++++++++++++++++++++++++++++++++ flake.lock | 140 ++++++++++++++++++++++++++++++ flake.nix | 204 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 566 insertions(+) create mode 100644 ci/MIGRATION.md create mode 100644 flake.lock create mode 100644 flake.nix diff --git a/ci/MIGRATION.md b/ci/MIGRATION.md new file mode 100644 index 0000000..8e92709 --- /dev/null +++ b/ci/MIGRATION.md @@ -0,0 +1,222 @@ +# Migration plan: buildx → nix2container/skopeo+regctl (emmett#44) + + + + +**Status: DESIGN + PARTIAL SCAFFOLD.** This document is a concrete plan, not a +completed migration. A `flake.nix` in the repo root scaffolds the +**natively-buildable amd64 leg only** (`nix run .#publish-amd64`, dry-run by +default). 
The arm64 leg and the `.woodpecker.yaml` cutover are designed here but +**not yet wired** — see "Remaining work". + +Archetype: `oci-image` (buildx → in-cluster remote buildkit) — the HARD +archetype in the emmett#44 local-pipeline-parity standard. Tracking: oleks/cluster +milestone #57. + +--- + +## 1. Why this repo is an easy migration (feasibility) + +The current pipeline (`.woodpecker.yaml`) does: + +``` +docker buildx create --driver remote tcp://buildkit-rootless-arm64.infra.svc... +docker buildx build --platform linux/amd64,linux/arm64 --push . +``` + +i.e. it depends on **in-cluster remote buildkit** for the multi-arch build and on +a node-pinned step (`howard2404`). That is exactly the cluster-coupled, +not-reproducible-on-emmett shape emmett#44 wants gone. + +The application (`bridge.py`) is the easiest possible payload: + +- **Pure Python standard library.** `json`, `os`, `sys`, `http.server`, + `urllib`. No `pip install`, no requirements file, no C/Rust extension, no + native wheel. +- The Dockerfile is 5 lines: `FROM python:3.12-alpine`, copy one file, run it. + +Consequences for nix2container: + +- The image's only real runtime dependency is a **CPython interpreter + its Nix + closure** (glibc, openssl, zlib, ncurses, ...). `bridge.py` is a static asset + copied next to it. +- CPython is in nixpkgs and **cross-compiles cleanly with `pkgsCross`** — there + is no source build of the app to cross, only the standard interpreter, which + the binary cache is expected to serve for aarch64 (still to be verified; see "Remaining work"). So **both arches are buildable on + emmett (amd64-native + Nix-cross-from-amd64)** with **no buildkit, no qemu, no + docker daemon**. This satisfies the emmett-OK definition in the standard + ("amd64-native OR Nix-cross-from-amd64, never 'uses skopeo' as the build, and + not a foreign-arch buildkit leg"). + +This repo is therefore a clean reference for "oci-image buildx leg that fully +escapes buildkit", not one of the genuinely-hard foreign-arch cases. + +--- + +## 2. 
What the flake builds, per arch + +Mirrors the reference impl (oleks/claude-plugin-registry @ 9850745), minus the +Rust/Dioxus build (we have no compiled artifact): + +``` +inputs: fleet (nixpkgs-projects pin), nix2container, flake-utils + +mkApp targetPkgs: + # a tiny derivation that places bridge.py + a python3 symlink under /app + appRoot = runCommand "app-root-": + mkdir -p $out/app + cp ${./bridge.py} $out/app/bridge.py + ln -s ${targetPkgs.python3}/bin/python3 $out/app/python3 # closure tracked + +mkImage arch = nix2container.buildImage { + name = "git.oleks.space/oleks/alertmanager-gotify-bridge"; + tag = "${version}-${arch}"; + inherit arch; # "amd64" | "arm64" + layers = [ (buildLayer { + copyToRoot = [ (appRoot arch) cacert ]; + maxLayers = 25; + reproducible = false; # see Determinism note + }) ]; + config = { + Cmd = [ "/app/python3" "/app/bridge.py" ]; + WorkingDir = "/app"; + ExposedPorts = { "8080/tcp" = {}; }; + Env = [ "PORT=8080" "SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt" ]; + }; +}; +``` + +- **amd64**: `targetPkgs = pkgs` (native). Fully buildable + pushable on emmett. +- **arm64**: `targetPkgs = pkgs.pkgsCross.aarch64-multiplatform`. Cross from + amd64; the aarch64 `python3` closure is expected to come from the binary cache (unverified, see Risks). No qemu. + +> The Alpine base in the current Dockerfile is intentionally dropped: nix2container +> ships only the interpreter closure, so the image is smaller and has no distro +> package layer. `urllib`+TLS works because `cacert` is copied in and +> `SSL_CERT_FILE` points at it. + +--- + +## 3. 
Entrypoints (uniform front door) + +Per the corrected standard, the flake apps ARE the shared code; `.woodpecker.yaml` +and local runs invoke the same `nix run`: + +| app | does | +|-----|------| +| `stage-amd64` / `stage-arm64` | realize the image into the local Nix store, **no registry contact** (BUILD/STAGE-parity half) | +| `publish-amd64` / `publish-arm64` | stage + `copyTo` (skopeo) push that arch (PUBLISH half) | +| `publish-index` | build-free: `regctl index create` from pushed per-arch refs, then `:latest` = digest copy of `:TAG` as the LAST mutation | +| `publish` | both arches + `publish-index` | + +Conventions carried verbatim from the reference: + +- **Token**: `$REGISTRY_TOKEN` → fallback `pass infra/gitea/personal_access_token_packages_rw` + → named `BLOCKER(empty-token)`. Never echoed; apps run under `set -euo pipefail` + only (no `set -x`). +- **Dry-run by default**: every push-capable app builds + prints what it would + push; `PUBLISH=1` / `--publish` required to mutate the registry. An accidental + local run cannot push. +- **VERSION/TAG**: `TAG=${VERSION:-}`; CI derives `version` from + `CI_COMMIT_TAG` (strip leading `v`). `$VERSION` overrides for local dev. +- **Dev-tag guard**: `:latest` is only moved for a real release tag; dev tags + push per-arch + immutable index only. +- **Preflight**: `BLOCKER(registry-down)` if `https://git.oleks.space/v2/` is + unreachable (names the cluster-shared-fate failure mode). +- **Multi-arch**: the index is assembled from digest-pinned per-arch refs and fails closed + if a required arch is absent from this run. `:latest` is a digest COPY (not a + re-assembly) and is always the last mutation. Staging per-arch tags can be pruned; + gitea-oci-cleanup pins index child digests. + +--- + +## 4. 
`.woodpecker.yaml` after migration (target) + +The buildx/howard-pinned step is replaced by a single nix-ci step that runs the +same app the laptop runs: + +```yaml +when: [{ event: tag, ref: "refs/tags/v*" }] +steps: + - name: publish + image: git.oleks.space/oleks/nix-ci:latest + environment: + REGISTRY_TOKEN: { from_secret: registry_token } + commands: + - export VERSION="$(echo "$CI_COMMIT_TAG" | sed 's/^v//')" + - nix run .#publish -- --publish # PUBLISH=1 via flag; CI is the only publisher +``` + +Gone: `docker login`, `buildx create --driver remote`, the `arm64` remote +builder, the `nodeSelector: howard2404` pin, the multi-arch `--platform` build. +The `skip_clone`+manual-clone dance can also drop to a normal clone with tags +(needed for `git describe` fallback) once the version is derived in shared code. + +--- + +## 5. Foreign-arch escape hatch + +Not needed for **this** repo — arm64 is a pure Nix cross of a stock interpreter, +so there is no foreign-arch buildkit leg at all. The general escape hatch (kept +for the upstream-source oci-image repos that genuinely need it) is documented for +the archetype, not used here: + +> For images whose payload genuinely cannot be cross-compiled by Nix (an +> upstream binary only published for a foreign arch, or a build that won't +> cross), keep a `Dockerfile` + a buildx step that parameterizes +> `BUILDKIT_ADDR` (default `docker-container://local`, CI overrides to the +> in-cluster `tcp://buildkit-rootless-.infra.svc`). That leg stays +> cluster-coupled by necessity; its per-arch digest still feeds the same +> `regctl index create`/`publish-index` join point, so the multi-arch assembly +> is uniform across native, cross, and foreign-arch legs. + +Because alertmanager-gotify-bridge has neither a compiled artifact nor a +foreign-only upstream binary, the escape hatch is dead code here and is **not** +added — the Dockerfile is retired entirely on cutover. + +--- + +## 6. Risks / caveats + +1. 
**Determinism (nix2container `reproducible = false`).** Same caveat as the + reference: parity holds only when emmett and CI resolve the **identical store + path** for the python3 closure from the shared cache. With the `fleet` + nixpkgs-projects pin + `flake.lock` committed this holds; verify by comparing + the pushed digest after a release. For a pure-stdlib interpreter image the + risk is low (no project-specific compiler in the closure), but it is real. +2. **TLS / cert path.** `urllib` to Gotify over HTTPS needs `cacert` in the image + and `SSL_CERT_FILE` set — handled in `config.Env` above. Must be verified once + against the live Gotify endpoint (the Alpine image got certs from the distro; + we now ship them explicitly). +3. **Image shape change.** Switching from `python:3.12-alpine` to a Nix closure + changes the digest, size, and layer layout. Any consumer pinning a specific + base-layer digest (unlikely here) would need updating. The `CMD` path changes + from `python` (PATH) to `/app/python3` (absolute symlink). +4. **arm64 cache coverage.** The cross build is only fast if the aarch64 python3 + closure is in the binary cache; a cold cache makes the first emmett arm64 run + slow (still correct, no qemu). Aligns with the fleet-pins strategy. +5. **`publish-index` requires `regctl`/`skopeo` in `nix-ci`.** The reference + already relies on these being present; confirm the `nix-ci` image (or the + app's `runtimeInputs`) provides `regctl`, `skopeo` (via `copyTo`), `curl`. + +--- + +## 7. Remaining work (to finish the migration) + +- [x] `flake.nix` scaffolding the **amd64** leg (`stage-amd64`, `publish-amd64`) + — present, dry-run by default. (this commit) +- [ ] Add the **arm64** cross leg (`stage-arm64`, `publish-arm64`) and `publish` / + `publish-index` apps (copy the reference's index machinery verbatim). +- [ ] Verify the cross build resolves from the binary cache (no source rebuild). 
+- [ ] Verify TLS to Gotify from the Nix image (cacert + `SSL_CERT_FILE`). +- [ ] Cut over `.woodpecker.yaml` to `nix run .#publish -- --publish`; delete the + buildx/remote-builder/howard-pin steps and the `Dockerfile`. +- [ ] One real release: compare emmett-built vs CI-pushed digest (determinism + check). +- [ ] Add `just publish` / `just stage` front door if/when the repo gains a + `justfile` (uniform across all archetypes). + +When N>3 oci-image repos have migrated, factor the per-arch image + publish apps +into the shared semver-tagged `ci-archetypes` flake-module (parameterized) and +have repos pin it via `inputs.ci-archetypes?ref=` rather than copying the +flake. diff --git a/flake.lock b/flake.lock new file mode 100644 index 0000000..b941ca8 --- /dev/null +++ b/flake.lock @@ -0,0 +1,140 @@ +{ + "nodes": { + "flake-utils": { + "inputs": { + "systems": "systems" + }, + "locked": { + "lastModified": 1731533236, + "narHash": "sha256-l0KFg5HjrsfsO/JpG+r7fRrqm12kzFHyUHqHCVpMMbI=", + "owner": "numtide", + "repo": "flake-utils", + "rev": "11707dc2f618dd54ca8739b309ec4fc024de578b", + "type": "github" + }, + "original": { + "owner": "numtide", + "repo": "flake-utils", + "type": "github" + } + }, + "fleet": { + "inputs": { + "nixpkgs": "nixpkgs", + "nixpkgs-armer": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-bim": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-ci": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-emmett": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-howard": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-mermaid": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-mermaid-gpu": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-micron": [ + "fleet", + "nixpkgs" + ], + "nixpkgs-projects": [ + "fleet", + "nixpkgs" + ] + }, + "locked": { + "lastModified": 1779533061, + "narHash": "sha256-orWNYXtYURhEj3X4+xGMAhaEcKRvwXqTtJ8x2jV/M+Q=", + "ref": "refs/heads/main", + "rev": "b818e345ec4470e4b3e335bd2f864183c512116d", + "revCount": 13, + "type": "git", + "url": "https://git.oleks.space/oleks/fleet-pins" 
+ }, + "original": { + "type": "git", + "url": "https://git.oleks.space/oleks/fleet-pins" + } + }, + "nix2container": { + "inputs": { + "nixpkgs": [ + "nixpkgs" + ] + }, + "locked": { + "lastModified": 1775487831, + "narHash": "sha256-2lguQpLPQaxpQCJjXhmEEAfabwsAhkP29Z7fgLzHARA=", + "owner": "nlewo", + "repo": "nix2container", + "rev": "76be9608a7f4d6c985d28b0e7be903ae2547df3e", + "type": "github" + }, + "original": { + "owner": "nlewo", + "repo": "nix2container", + "type": "github" + } + }, + "nixpkgs": { + "locked": { + "lastModified": 1777268161, + "narHash": "sha256-bxrdOn8SCOv8tN4JbTF/TXq7kjo9ag4M+C8yzzIRYbE=", + "owner": "NixOS", + "repo": "nixpkgs", + "rev": "1c3fe55ad329cbcb28471bb30f05c9827f724c76", + "type": "github" + }, + "original": { + "owner": "NixOS", + "repo": "nixpkgs", + "rev": "1c3fe55ad329cbcb28471bb30f05c9827f724c76", + "type": "github" + } + }, + "root": { + "inputs": { + "flake-utils": "flake-utils", + "fleet": "fleet", + "nix2container": "nix2container", + "nixpkgs": [ + "fleet", + "nixpkgs-projects" + ] + } + }, + "systems": { + "locked": { + "lastModified": 1681028828, + "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=", + "owner": "nix-systems", + "repo": "default", + "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e", + "type": "github" + }, + "original": { + "owner": "nix-systems", + "repo": "default", + "type": "github" + } + } + }, + "root": "root", + "version": 7 +} diff --git a/flake.nix b/flake.nix new file mode 100644 index 0000000..7bdf9d3 --- /dev/null +++ b/flake.nix @@ -0,0 +1,204 @@ +{ + description = "alertmanager-gotify-bridge — pure-stdlib Python forwarder, containerized with Nix (nix2container). DESIGN/PARTIAL: amd64 leg only — see ci/MIGRATION.md."; + + # ────────────────────────────────────────────────────────────────────────── + # SCAFFOLD / PARTIAL MIGRATION (emmett#44, archetype oci-image buildx→nix2container). 
+ # This flake intentionally implements ONLY the natively-buildable amd64 leg + # (`nix run .#publish-amd64`, dry-run by default). The arm64 cross leg, the + # multi-arch `publish-index`/`publish` apps, and the `.woodpecker.yaml` cutover + # are DESIGNED in ci/MIGRATION.md but NOT yet wired. Do not treat this as the + # finished pipeline. Pattern mirrors the reference impl + # (oleks/claude-plugin-registry @ 9850745). + # ────────────────────────────────────────────────────────────────────────── + + inputs = { + fleet.url = "git+https://git.oleks.space/oleks/fleet-pins"; + nixpkgs.follows = "fleet/nixpkgs-projects"; + + nix2container.url = "github:nlewo/nix2container"; + nix2container.inputs.nixpkgs.follows = "nixpkgs"; + + flake-utils.url = "github:numtide/flake-utils"; + }; + + outputs = + { + nixpkgs, + nix2container, + flake-utils, + ... + }: + flake-utils.lib.eachDefaultSystem ( + system: + let + pkgs = import nixpkgs { inherit system; }; + inherit (pkgs) lib; + n2c = nix2container.packages.${system}.nix2container; + + registry = "git.oleks.space/oleks/alertmanager-gotify-bridge"; + + # TAG is derived identically for CI + local in shared code: CI exports + # VERSION (= strip-v $CI_COMMIT_TAG); local dev may override $VERSION. + # No version.nix side-channel — the app is a static asset, the flake has + # no version-baked source build, so the tag lives purely in $VERSION. + version = "0.0.0-dev"; + + # The whole payload: bridge.py + a python3 interpreter symlink under /app. + # The symlink keeps the (arch-correct) python3 Nix closure tracked while + # contributing no extra files. Parameterised over a pkg set so the SAME + # expression builds natively (amd64) and cross (arm64, future leg). 
+ appRoot = + targetPkgs: arch: + pkgs.runCommand "app-root-${arch}" { } '' + mkdir -p $out/app + cp ${./bridge.py} $out/app/bridge.py + ln -s ${targetPkgs.python3}/bin/python3 $out/app/python3 + ''; + + mkImage = + targetPkgs: arch: + n2c.buildImage { + name = registry; + tag = "${version}-${arch}"; + inherit arch; + # reproducible = false materializes the layer tar so the image streams + # verbatim from any host (remote-builder + binary-cache safe). For a + # pure-stdlib interpreter closure the determinism risk is low, but the + # standard's caveat still applies: emmett+CI must resolve the identical + # python3 store path from the shared cache (fleet pin + flake.lock). + layers = [ + (n2c.buildLayer { + copyToRoot = [ + (appRoot targetPkgs arch) + pkgs.cacert + ]; + maxLayers = 25; + reproducible = false; + }) + ]; + config = { + Cmd = [ + "/app/python3" + "/app/bridge.py" + ]; + WorkingDir = "/app"; + ExposedPorts = { + "8080/tcp" = { }; + }; + Env = [ + "PORT=8080" + # urllib → Gotify over HTTPS needs an explicit CA bundle (the old + # Alpine base provided one via the distro; the Nix image ships it). + "SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt" + ]; + }; + }; + + imageAmd64 = mkImage pkgs "amd64"; + + # ── publish/stage apps (shared by CI + local; cannot drift). ────────── + # SAFETY: dry-run by default. Set PUBLISH=1 / --publish to actually push. + passEntry = "infra/gitea/personal_access_token_packages_rw"; + + publishGate = '' + PUBLISH="''${PUBLISH:-0}" + for a in "$@"; do + case "$a" in + --publish) PUBLISH=1 ;; + --dry-run) PUBLISH=0 ;; + --help|-h) + echo "usage: [VERSION=x.y.z] [PUBLISH=1] $(basename "$0") [--publish|--dry-run|--help]" >&2 + echo " dry-run by default: builds the amd64 image and prints what would be pushed." >&2 + echo " --publish / PUBLISH=1 actually mutates the registry." >&2 + echo " NOTE: amd64 leg ONLY (scaffold) — see ci/MIGRATION.md." 
>&2 + exit 0 ;; + *) echo "error: unknown argument '$a' (try --help)" >&2; exit 2 ;; + esac + done + TAG="''${VERSION:-${version}}" + ''; + + # Token resolved lazily (only when actually publishing); never echoed, + # never under set -x (writeShellApplication uses -euo pipefail only). + resolveToken = '' + TOKEN="''${REGISTRY_TOKEN:-}" + if [ -z "$TOKEN" ] && command -v pass >/dev/null 2>&1; then + TOKEN="$(pass show ${passEntry} 2>/dev/null || true)" + fi + if [ -z "$TOKEN" ]; then + echo "BLOCKER(empty-token): set REGISTRY_TOKEN env (CI from_secret) or have 'pass ${passEntry}' available; refusing to publish without credentials." >&2 + exit 1 + fi + ''; + + registryPreflight = '' + if ! curl -fsS -o /dev/null --max-time 10 "https://git.oleks.space/v2/" 2>/dev/null; then + echo "BLOCKER(registry-down): https://git.oleks.space/v2/ is unreachable — CI shares fate with the cluster (Zot/buildkit on armer/k3s). Re-run when the registry is back; the staged image in the Nix store is unchanged." >&2 + exit 1 + fi + ''; + + stageAmd64 = '' + echo "→ staging ${registry}:$TAG-amd64 (local build, no registry contact)" + OUT="$(nix build --no-link --print-out-paths "$FLAKE#image-amd64")" + echo " staged image derivation: $OUT" + ''; + + mkStageAmd64 = pkgs.writeShellApplication { + name = "stage-amd64"; + runtimeInputs = [ pkgs.nix ]; + text = '' + FLAKE="''${FLAKE:-.}" + '' + + publishGate + + stageAmd64; + }; + + mkPublishAmd64 = pkgs.writeShellApplication { + name = "publish-amd64"; + runtimeInputs = [ + pkgs.regctl + pkgs.curl + pkgs.nix + ]; + text = '' + FLAKE="''${FLAKE:-.}" + '' + + publishGate + + stageAmd64 + + '' + echo "→ ${registry}:$TAG-amd64" + if [ "$PUBLISH" != "1" ]; then + echo " [dry-run] would push ${registry}:$TAG-amd64 (set PUBLISH=1 / --publish to push)" + echo " [dry-run] NOTE: amd64 leg only — arm64 + index not yet implemented (ci/MIGRATION.md)" + exit 0 + fi + '' + + resolveToken + + registryPreflight + + '' + ${lib.getExe imageAmd64.copyTo} 
--dest-creds "oleks:$TOKEN" "docker://${registry}:$TAG-amd64" + ''; + }; + in + { + packages = { + image-amd64 = imageAmd64; + default = imageAmd64; + }; + + apps = { + stage-amd64 = { + type = "app"; + program = lib.getExe mkStageAmd64; + meta.description = "Build the amd64 nix2container image into the local store (no registry contact). Scaffold: amd64 leg only."; + }; + publish-amd64 = { + type = "app"; + program = lib.getExe mkPublishAmd64; + meta.description = "Stage + push the amd64 image (dry-run by default; PUBLISH=1 to push). Scaffold: amd64 leg only — see ci/MIGRATION.md."; + }; + }; + } + ); +}