ci: design nix2container migration + scaffold amd64 publish app (emmett#44)

Archetype: oci-image (buildx -> in-cluster remote buildkit), the HARD case.
DESIGN/PARTIAL, not a finished migration:

- ci/MIGRATION.md: concrete plan to escape buildkit via
  nix2container/skopeo+regctl. The app is pure-stdlib Python, so both arches
  are buildable on emmett (amd64-native + Nix-cross-from-amd64 python3
  closure) with no buildkit/qemu/docker -> no foreign-arch leg needed;
  Dockerfile retired on cutover. Covers per-arch build, entrypoints,
  .woodpecker.yaml target, escape hatch (unused here), risks, remaining work.
- flake.nix: scaffolds the natively-buildable amd64 leg only (stage-amd64,
  publish-amd64), dry-run by default (PUBLISH=1 to push), $REGISTRY_TOKEN ->
  pass fallback, registry-down/empty-token blockers. Mirrors reference impl
  claude-plugin-registry@9850745.

arm64 leg, publish-index/publish, and YAML cutover are designed but NOT wired.

Verified: nix eval .#apps.x86_64-linux (-> stage-amd64, publish-amd64); no
image build run (downloads closure).
+222
@@ -0,0 +1,222 @@
# Migration plan: buildx → nix2container/skopeo+regctl (emmett#44)

<!-- markdownlint-disable MD013 MD040 MD060 -->
<!-- design doc: dense tables, command/output blocks, and long refs -->

**Status: DESIGN + PARTIAL SCAFFOLD.** This document is a concrete plan, not a
completed migration. A `flake.nix` in the repo root scaffolds the
**natively-buildable amd64 leg only** (`nix run .#publish-amd64`, dry-run by
default). The arm64 leg and the `.woodpecker.yaml` cutover are designed here but
**not yet wired**; see "Remaining work".

Archetype: `oci-image` (buildx → in-cluster remote buildkit), the HARD
archetype in the emmett#44 local-pipeline-parity standard. Tracking: oleks/cluster
milestone #57.

---

## 1. Why this repo is an easy migration (feasibility)

The current pipeline (`.woodpecker.yaml`) does:

```
docker buildx create --driver remote tcp://buildkit-rootless-arm64.infra.svc...
docker buildx build --platform linux/amd64,linux/arm64 --push .
```

That is, it depends on **in-cluster remote buildkit** for the multi-arch build
and on a node-pinned step (`howard2404`). That is exactly the cluster-coupled,
not-reproducible-on-emmett shape emmett#44 wants gone.

The application (`bridge.py`) is the easiest possible payload:

- **Pure Python standard library.** `json`, `os`, `sys`, `http.server`,
  `urllib`. No `pip install`, no requirements file, no C/Rust extension, no
  native wheel.
- The Dockerfile is 5 lines: `FROM python:3.12-alpine`, copy one file, run it.

Consequences for nix2container:

- The image's only real runtime dependency is a **CPython interpreter + its Nix
  closure** (glibc, openssl, zlib, ncurses, ...). `bridge.py` is a static asset
  copied next to it.
- CPython is in nixpkgs and **cross-compiles cleanly with `pkgsCross`**: there
  is no source build of the app to cross, only the standard interpreter, which
  the binary cache is expected to serve for aarch64 (still to be verified; see
  "Risks / caveats" item 4). So **both arches are buildable on emmett
  (amd64-native + Nix-cross-from-amd64)** with **no buildkit, no qemu, no
  docker daemon**. This satisfies the emmett-OK definition in the standard
  ("amd64-native OR Nix-cross-from-amd64, never 'uses skopeo' as the build, and
  not a foreign-arch buildkit leg").

This repo is therefore a clean reference for "oci-image buildx leg that fully
escapes buildkit", not one of the genuinely-hard foreign-arch cases.

---

## 2. What the flake builds, per arch

Mirrors the reference impl (oleks/claude-plugin-registry @ 9850745), minus the
Rust/Dioxus build (we have no compiled artifact):

```
inputs: fleet (nixpkgs-projects pin), nix2container, flake-utils

mkApp targetPkgs:
  # a tiny derivation that places bridge.py + a python3 symlink under /app
  appRoot = runCommand "app-root-<arch>":
    mkdir -p $out/app
    cp ${./bridge.py} $out/app/bridge.py
    ln -s ${targetPkgs.python3}/bin/python3 $out/app/python3   # closure tracked

mkImage arch = nix2container.buildImage {
  name = "git.oleks.space/oleks/alertmanager-gotify-bridge";
  tag = "${version}-${arch}";
  inherit arch;                        # "amd64" | "arm64"
  layers = [ (buildLayer {
    copyToRoot = [ (appRoot arch) cacert ];
    maxLayers = 25;
    reproducible = false;              # see Determinism note
  }) ];
  config = {
    Cmd = [ "/app/python3" "/app/bridge.py" ];
    WorkingDir = "/app";
    ExposedPorts = { "8080/tcp" = {}; };
    Env = [ "PORT=8080" "SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt" ];
  };
};
```

- **amd64**: `targetPkgs = pkgs` (native). Fully buildable + pushable on emmett.
- **arm64**: `targetPkgs = pkgs.pkgsCross.aarch64-multiplatform`. Cross from
  amd64; the aarch64 `python3` closure is expected to come from the binary
  cache. No qemu.

> The Alpine base in the current Dockerfile is intentionally dropped: nix2container
> ships only the interpreter closure, so the image has no distro package layer.
> Whether it is also smaller than the Alpine image should be measured, not
> assumed. `urllib` + TLS is expected to work because `cacert` is copied in and
> `SSL_CERT_FILE` points at its bundle (verification pending; see
> "Risks / caveats" item 2).

---

## 3. Entrypoints (uniform front door)

Per the corrected standard, the flake apps ARE the shared code; `.woodpecker.yaml`
and local runs invoke the same `nix run`:

| app | does |
|-----|------|
| `stage-amd64` / `stage-arm64` | realize the image into the local Nix store, **no registry contact** (BUILD/STAGE-parity half) |
| `publish-amd64` / `publish-arm64` | stage + `copyTo` (skopeo) push that arch (PUBLISH half) |
| `publish-index` | build-free: `regctl index create` from pushed per-arch refs, then `:latest` = digest copy of `:TAG` as the LAST mutation |
| `publish` | both arches + `publish-index` |
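
The ordering in the `publish-index` row can be sketched as a dry-run planner that only prints the commands it would run. This is an illustration under assumptions, not the reference implementation: the function name, its argument order, the `IS_RELEASE` flag, and the `BLOCKER(missing-arch-digest)` label are hypothetical, and the `regctl index create ... --ref` / `regctl image copy` invocations should be checked against the installed `regctl` before use.

```shell
#!/bin/sh
# Sketch only: prints the regctl commands in the order the table describes.
# The source's apps use `set -euo pipefail`; plain `set -eu` keeps this POSIX.
set -eu

REPO="git.oleks.space/oleks/alertmanager-gotify-bridge"

# plan_publish_index TAG IS_RELEASE AMD64_DIGEST ARM64_DIGEST  (hypothetical interface)
plan_publish_index() {
  tag=$1
  is_release=$2
  amd64=$3
  arm64=$4

  # Fail closed: never assemble an index when a required arch digest is absent.
  for d in "$amd64" "$arm64"; do
    case $d in
      sha256:?*) ;;
      *)
        echo "BLOCKER(missing-arch-digest): need both amd64 and arm64" >&2
        return 1
        ;;
    esac
  done

  # 1. Immutable index built from digest-pinned per-arch refs.
  echo "regctl index create $REPO:$tag --ref $REPO@$amd64 --ref $REPO@$arm64"

  # 2. :latest moves last, as a copy of :TAG, and only for a release.
  if [ "$is_release" = 1 ]; then
    echo "regctl image copy $REPO:$tag $REPO:latest"
  fi
}
```

Calling `plan_publish_index 1.2.3 1 sha256:<amd64> sha256:<arm64>` prints two lines with the `:latest` copy last; with `IS_RELEASE` set to `0` only the index line is printed.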

Conventions carried verbatim from the reference:

- **Token**: `$REGISTRY_TOKEN`, falling back to
  `pass infra/gitea/personal_access_token_packages_rw`, then to a named
  `BLOCKER(empty-token)`. The token is never echoed; apps run under
  `set -euo pipefail` only (no `set -x`).
- **Dry-run by default**: every push-capable app builds + prints what it would
  push; `PUBLISH=1` / `--publish` is required to mutate the registry. An
  accidental local run cannot push.
- **VERSION/TAG**: `TAG=${VERSION:-<flake version>}`; CI derives `version` from
  `CI_COMMIT_TAG` (strip leading `v`). `$VERSION` overrides for local dev.
- **Dev-tag guard**: `:latest` is only moved for a real release tag; dev tags
  push per-arch + immutable index only.
- **Preflight**: `BLOCKER(registry-down)` if `https://git.oleks.space/v2/` is
  unreachable (names the cluster-shared-fate failure mode).
- **Multi-arch**: the index is assembled from digest-pinned per-arch refs and
  fails closed if a required arch is absent in this run; `:latest` is a digest
  COPY (not a re-assembly) and the last mutation; staging per-arch tags can be
  pruned; gitea-oci-cleanup pins index child digests.
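
The token, dry-run, and preflight conventions above could look roughly like the following. This is a minimal sketch under assumptions, not the reference's code: the function names, the `PASS_ENTRY` variable, the message wording after each `BLOCKER(...)` label, and the ten-second timeout are all hypothetical, and the `echo "pushing ..."` line stands in for the real `copyTo` push.

```shell
#!/bin/sh
# Sketch only. The source's apps run under `set -euo pipefail`; plain `set -eu`
# is used here so the snippet also runs under a strictly POSIX sh.
set -eu

PASS_ENTRY="infra/gitea/personal_access_token_packages_rw"

# Resolve the registry token into $TOKEN without ever printing it.
resolve_token() {
  TOKEN=${REGISTRY_TOKEN:-}
  if [ -z "$TOKEN" ] && command -v pass >/dev/null 2>&1; then
    TOKEN=$(pass show "$PASS_ENTRY" 2>/dev/null || true)
  fi
  if [ -z "$TOKEN" ]; then
    echo "BLOCKER(empty-token): set REGISTRY_TOKEN or populate the pass entry" >&2
    return 1
  fi
}

# Succeed only when publishing was explicitly requested.
want_publish() {
  if [ "${PUBLISH:-0}" = 1 ]; then return 0; fi
  for arg in "$@"; do
    if [ "$arg" = "--publish" ]; then return 0; fi
  done
  return 1
}

# Fail with the named blocker when the registry endpoint cannot be reached.
# No `-f`: an HTTP 401 from /v2/ still proves the registry is up.
preflight() {
  if ! curl -sS -o /dev/null --max-time 10 "https://git.oleks.space/v2/"; then
    echo "BLOCKER(registry-down): https://git.oleks.space/v2/ unreachable" >&2
    return 1
  fi
}

# push_image REF [args...]: dry-run unless PUBLISH=1 or --publish is given.
push_image() {
  ref=$1
  shift
  if want_publish "$@"; then
    preflight
    resolve_token
    echo "pushing $ref"   # placeholder for the real copyTo/skopeo push
  else
    echo "DRY-RUN: would push $ref (set PUBLISH=1 or pass --publish to push)"
  fi
}
```

With neither `PUBLISH=1` nor `--publish`, `push_image` never reaches the preflight or the token lookup, which is what keeps an accidental local run from touching the registry.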

---

## 4. `.woodpecker.yaml` after migration (target)

The buildx/howard-pinned step is replaced by a single nix-ci step that runs the
same app the laptop runs:

```yaml
when: [{ event: tag, ref: "refs/tags/v*" }]
steps:
  - name: publish
    image: git.oleks.space/oleks/nix-ci:latest
    environment:
      REGISTRY_TOKEN: { from_secret: registry_token }
    commands:
      - export VERSION="$(echo "$CI_COMMIT_TAG" | sed 's/^v//')"
      - nix run .#publish -- --publish   # PUBLISH=1 via flag; CI is the only publisher
```

Gone: `docker login`, `buildx create --driver remote`, the `arm64` remote
builder, the `nodeSelector: howard2404` pin, the multi-arch `--platform` build.
The `skip_clone` + manual-clone workaround can also be replaced by a normal
clone with tags (needed for the `git describe` fallback) once the version is
derived in shared code.
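
Deriving the version in shared code, rather than in the YAML `export`, might look like the sketch below. The precedence (`$VERSION`, then `CI_COMMIT_TAG`, then the flake version), the `FLAKE_VERSION` placeholder value, and the rule that a "real release tag" means plain `X.Y.Z` are assumptions made for illustration; the source does not define them, and the `git describe` fallback is left out.

```shell
#!/bin/sh
# Sketch only: tag derivation plus a dev-tag guard, under the assumptions above.
set -eu

FLAKE_VERSION="0.0.0-dev"   # hypothetical stand-in for the flake's own version

# derive_tag: $VERSION wins, else CI_COMMIT_TAG minus a leading "v",
# else the flake version.
derive_tag() {
  if [ -n "${VERSION:-}" ]; then
    printf '%s\n' "$VERSION"
  elif [ -n "${CI_COMMIT_TAG:-}" ]; then
    printf '%s\n' "${CI_COMMIT_TAG#v}"
  else
    printf '%s\n' "$FLAKE_VERSION"
  fi
}

# is_release_tag TAG: true only for digits-and-dots tags shaped like X.Y.Z,
# so anything with a suffix such as "-dev" never moves :latest.
is_release_tag() {
  case $1 in
    '' | *[!0-9.]*) return 1 ;;
    [0-9]*.[0-9]*.[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller would compute `TAG=$(derive_tag)` once and pass the result of `is_release_tag "$TAG"` to whatever decides whether `:latest` is moved.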

---

## 5. Foreign-arch escape hatch

Not needed for **this** repo: arm64 is a pure Nix cross of a stock interpreter,
so there is no foreign-arch buildkit leg at all. The general escape hatch (kept
for the upstream-source oci-image repos that genuinely need it) is documented for
the archetype, not used here:

> For images whose payload genuinely cannot be cross-compiled by Nix (an
> upstream binary only published for a foreign arch, or a build that won't
> cross), keep a `Dockerfile` + a buildx step that parameterizes
> `BUILDKIT_ADDR` (default `docker-container://local`, CI overrides to the
> in-cluster `tcp://buildkit-rootless-<arch>.infra.svc`). That leg stays
> cluster-coupled by necessity; its per-arch digest still feeds the same
> `regctl index create`/`publish-index` join point, so the multi-arch assembly
> is uniform across native, cross, and foreign-arch legs.

Because alertmanager-gotify-bridge has neither a compiled artifact nor a
foreign-only upstream binary, the escape hatch would be dead code here and is
**not** added; the Dockerfile is retired entirely on cutover.

---

## 6. Risks / caveats

1. **Determinism (nix2container `reproducible = false`).** Same caveat as the
   reference: parity holds only when emmett and CI resolve the **identical store
   path** for the python3 closure from the shared cache. With the `fleet`
   nixpkgs-projects pin + `flake.lock` committed this holds; verify by comparing
   the pushed digest after a release. For a pure-stdlib interpreter image the
   risk is low (no project-specific compiler in the closure), but it is real.
2. **TLS / cert path.** `urllib` to Gotify over HTTPS needs `cacert` in the image
   and `SSL_CERT_FILE` set; this is handled in `config.Env` above. Must be
   verified once against the live Gotify endpoint (the Alpine image got certs
   from the distro; we now ship them explicitly).
3. **Image shape change.** Switching from `python:3.12-alpine` to a Nix closure
   changes the digest, size, and layer layout. Any consumer pinning a specific
   base-layer digest (unlikely here) would need updating. The `CMD` path changes
   from `python` (PATH) to `/app/python3` (absolute symlink).
4. **arm64 cache coverage.** The cross build is only fast if the aarch64 python3
   closure is in the binary cache; a cold cache makes the first emmett arm64 run
   slow (still correct, no qemu). Aligns with the fleet-pins strategy.
5. **`publish-index` requires `regctl`/`skopeo` in `nix-ci`.** The reference
   already relies on these being present; confirm the `nix-ci` image (or the
   app's `runtimeInputs`) provides `regctl`, `skopeo` (via `copyTo`), `curl`.
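
The determinism check in item 1 comes down to comparing two digests. A minimal sketch of that comparison follows; the function names and output format are hypothetical, and how the emmett-side digest is obtained is deliberately left to the caller.

```shell
#!/bin/sh
# Sketch only: compare a locally obtained digest with the one CI pushed.
set -eu

# is_digest VALUE: true for strings shaped like sha256:<64 lowercase hex chars>.
is_digest() {
  case $1 in
    sha256:*) hex=${1#sha256:} ;;
    *) return 1 ;;
  esac
  [ "${#hex}" -eq 64 ] || return 1
  case $hex in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

# compare_digests LOCAL REMOTE
# Exit 0 on match, 1 on mismatch, 2 when an argument is not a digest.
compare_digests() {
  for d in "$1" "$2"; do
    if ! is_digest "$d"; then
      echo "not a sha256 digest: $d" >&2
      return 2
    fi
  done
  if [ "$1" = "$2" ]; then
    echo "MATCH $1"
  else
    echo "MISMATCH local=$1 remote=$2"
    return 1
  fi
}
```

One way to obtain the CI-pushed side is `skopeo inspect --format '{{.Digest}}' docker://<ref>`; a mismatch would point at the two machines resolving different store paths for the python3 closure.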

---

## 7. Remaining work (to finish the migration)

- [x] `flake.nix` scaffolding the **amd64** leg (`stage-amd64`, `publish-amd64`):
      present, dry-run by default. (this commit)
- [ ] Add the **arm64** cross leg (`stage-arm64`, `publish-arm64`) and `publish` /
      `publish-index` apps (copy the reference's index machinery verbatim).
- [ ] Verify the cross build resolves from the binary cache (no source rebuild).
- [ ] Verify TLS to Gotify from the Nix image (cacert + `SSL_CERT_FILE`).
- [ ] Cut over `.woodpecker.yaml` to `nix run .#publish -- --publish`; delete the
      buildx/remote-builder/howard-pin steps and the `Dockerfile`.
- [ ] One real release: compare emmett-built vs CI-pushed digest (determinism
      check).
- [ ] Add `just publish` / `just stage` front door if/when the repo gains a
      `justfile` (uniform across all archetypes).

Once more than three oci-image repos have migrated, factor the per-arch image +
publish apps into the shared semver-tagged `ci-archetypes` flake-module
(parameterized) and have repos pin it via `inputs.ci-archetypes?ref=<pin>`
rather than copying the flake.