fix(pipeline-doctor): accept pass-var token form + exempt --password-stdin from leak scan (#199)

The audit sweep wrongly FAILED ~9 converted ci-script repos on two heuristics:
- token-contract now accepts the secure indirection pass "$PASS_ENTRY" /
  pass "$VAR", not only a hard-coded pass <literal-path>.
- leak scan flattens \-continuations + folds the pipe target onto the echo
  line, then exempts the echo "$TOKEN" | <cmd> ... --password-stdin/--pass-stdin
  stdin-feed idiom; bare echo to stdout/file and set -x still FAIL.
Adds --self-test with six inline fixtures locking in both fixes and the
three must-still-catch leaks.
This commit is contained in:
Oleks
2026-06-03 10:42:26 +03:00
parent db0bf3b9ab
commit d265a79ddb
2 changed files with 107 additions and 8 deletions
+18
View File
@@ -7,6 +7,24 @@ semantic versioning; the version is a conceptual tag (no git tag is created).
## Unreleased
- **Fix: `pipeline-doctor` token-heuristic false positives (#199).** The audit
sweep wrongly FAILED ~9 correctly-converted ci-script repos on two heuristics.
(1) The token-contract check only accepted a hard-coded `pass <literal-path>`;
it now also accepts the secure indirection `pass "$PASS_ENTRY"` / `pass "$VAR"`
(a quoted/unquoted shell var, optionally via `pass show`). (2) The leak scan
flagged the blessed `echo "$TOKEN" | docker login … --password-stdin` (and
`--pass-stdin`, and helm `registry login`) idiom — the token goes to STDIN, not
the log — because the `--password-stdin` flag often sits on a `\`-wrapped
continuation line the line-based grep never saw on the `echo` line. The scan now
flattens `\`-continuations and folds the pipe target onto the `echo` line, then
exempts the `… | <cmd> … --password-stdin`/`--pass-stdin` feed. Real leaks still
FAIL: a bare `echo "$TOKEN"` to stdout or a file, and `set -x` in a token script.
Added `--self-test`: six inline fixtures lock in both fixes and the three
must-still-catch leaks. Verified: version-radar, xonsh, common-chronicle,
ii-researcher, ironclaw, openclaw now PASS; parity-lib `--strict .`, gitea-mcp,
numpy-s390x still 9/9. (commonground-legacy / cms-plugins / csi-s3 still FAIL one
UNRELATED check — dev-tag-guard — because their woodpecker config lives in a
`.woodpecker/` directory the doctor doesn't yet read; out of scope for #199.)
- **Feature: `mkAtticClosurePublish` — the attic-closure builder (cluster #198).**
Models the archetype parity-lib was missing: build a Nix closure and push it to
the Attic binary cache (NO registry artifact). Yields `stage-<arch>` (`nix build`
+89 -8
View File
@@ -17,14 +17,17 @@
set -euo pipefail
STRICT=0
SELFTEST=0
REPO="."
for a in "$@"; do
case "$a" in
--strict) STRICT=1 ;;
--self-test) SELFTEST=1 ;;
-h | --help)
echo "Usage: pipeline-doctor [--strict] [<repo-path>]"
echo " Assert the parity contract for <repo-path> (default: .)."
echo " --strict exit non-zero on any WARN as well as any FAIL."
echo " --strict exit non-zero on any WARN as well as any FAIL."
echo " --self-test run the token-leak heuristic fixtures and exit."
exit 0
;;
-*)
@@ -34,6 +37,68 @@ for a in "$@"; do
*) REPO="$a" ;;
esac
done
# --self-test (#199): exercise the two corrected heuristics on synthetic ci
# fixtures so a regression in the token-contract / leak-scan logic is caught here
# (and by a CI step) rather than by silently re-breaking 9 real repos. Each case
# builds a throwaway repo, runs THIS script on it, and asserts PASS/FAIL.
if [ "$SELFTEST" -eq 1 ]; then
self="$(cd "$(dirname "$0")" && pwd)/$(basename "$0")"
st_fail=0
# The fixture token references are ASSEMBLED from $d at runtime so the raw
# bytes of THIS file never contain a literal `echo "$TOKEN"` or a bare
# `set -x` line — otherwise the doctor's own leak/xtrace scan would flag
# pipeline-doctor.sh when run against parity-lib itself (#199). $sx is the
# xtrace directive, likewise assembled to dodge the self-scan.
d='$'
sx="set -""x" # the literal `set -x`, split so THIS file's bytes don't carry it
mk() { # mk <dir> <body-line(s) with $d for a literal dollar>
mkdir -p "$1/ci"
{
printf '%s\n' '#!/usr/bin/env bash' 'set -euo pipefail' \
"PUBLISH=\"${d}{PUBLISH:-0}\"" 'resolve_token() {' \
" if [ -n \"${d}{REGISTRY_TOKEN:-}\" ]; then printf '%s' \"${d}REGISTRY_TOKEN\"; return; fi" \
" pass \"${d}PASS_ENTRY\"" '}' "tok=\"${d}(resolve_token)\""
printf '%b\n' "$2"
} >"$1/ci/local.sh"
}
expect() { # expect <label> <PASS|FAIL> <grep-pat> <repo>
got="$({ bash "$self" "$4" 2>&1 || true; } | grep -E "$3" | { grep -oE 'PASS|FAIL' || true; } | { head -1 || true; })"
if [ "$got" = "$2" ]; then
printf ' ok %s (%s)\n' "$1" "$2"
else
printf ' FAIL %s: expected %s, got %s\n' "$1" "$2" "${got:-<none>}"
st_fail=1
fi
}
td="$(mktemp -d)"
login='docker login "'"$d"'H" -u u --password-stdin >/dev/null'
# token contract: `pass "$VAR"` indirection is a VALID resolution form.
mk "$td/tok" "echo \"${d}tok\" | $login"
expect "token-contract pass-var accepted" PASS 'token = ' "$td/tok"
# leak: password-stdin idiom, flag on a CONTINUATION line, is NOT a leak.
mk "$td/wrap" "echo \"${d}tok\" | docker login \"${d}H\" \\\\\n\t-u u --password-stdin >/dev/null"
expect "password-stdin wrapped is not a leak" PASS 'token leak' "$td/wrap"
# leak: helm registry login --password-stdin wrapped is NOT a leak.
mk "$td/helm" "echo \"${d}REGISTRY_TOKEN\" | helm registry login \"${d}H\" \\\\\n\t--username u --password-stdin >/dev/null"
expect "helm --password-stdin wrapped is not a leak" PASS 'token leak' "$td/helm"
# REAL leak: bare echo of the token to stdout MUST still fail.
mk "$td/bare" "echo \"${d}tok\""
expect "bare echo to stdout is a leak" FAIL 'token leak' "$td/bare"
# REAL leak: echo the token to a FILE MUST still fail.
mk "$td/file" "echo \"${d}REGISTRY_TOKEN\" > /tmp/leaked"
expect "echo to a file is a leak" FAIL 'token leak' "$td/file"
# REAL leak: a standalone `set -x` in a token script MUST still fail.
mk "$td/setx" "$sx\necho \"${d}tok\" | $login"
expect "set -x in token script is a leak" FAIL 'token leak' "$td/setx"
rm -rf "$td"
if [ "$st_fail" -eq 0 ]; then
echo "pipeline-doctor self-test: all heuristic fixtures pass."
exit 0
fi
echo "pipeline-doctor self-test: FAILURES above." >&2
exit 1
fi
REPO="$(cd "$REPO" && pwd)"
FLAKE="$REPO/flake.nix"
@@ -150,7 +215,10 @@ fi
if [ "$CONSUMES_PARITY" -eq 1 ]; then
ok "token = \$REGISTRY_TOKEN with pass fallback (delegated to parity-lib)"
elif printf '%s' "$token_src" | grep -Eq 'REGISTRY_TOKEN' &&
printf '%s' "$token_src" | grep -Eq 'parity_resolve_token|pass (show )?infra/gitea/personal_access_token_packages_rw'; then
printf '%s' "$token_src" | grep -Eq 'parity_resolve_token|pass (show )?infra/gitea/personal_access_token_packages_rw|pass (show )?["'"'"']?[$]'; then
# Accept the secure INDIRECTION form too: `pass "$PASS_ENTRY"` / `pass "$VAR"`
# (a quoted/unquoted shell var, optionally via `pass show`) is a valid
# token-resolution path, not just a hard-coded `pass <literal/path>` (#199).
ok "token = \$REGISTRY_TOKEN with pass fallback"
else
bad "token contract missing: need \$REGISTRY_TOKEN + pass fallback (parity_resolve_token)"
@@ -166,13 +234,26 @@ dollar='[$]'
leak_pat="(^|[;|&(]|then |do |else )[[:space:]]*(echo|printf)[^|]*${dollar}(REGISTRY_TOKEN|TOKEN|tok|token)([^A-Za-z0-9_]|$)"
leak=0
if printf '%s' "$token_src" | grep -Eq "^[[:space:]]*set -x([[:space:]]|$)"; then leak=1; fi
# Exclude: redactions, stdin-feeds, the doctor's own $token_src var, an escaped
# '\$' (a token NAME in a message, not its value), and the sanctioned
# `printf '%s' "$TOKEN"` capture idiom (the resolver returns the token on stdout
# for a caller to capture — that is the one blessed place a token is emitted).
if printf '%s' "$token_src" |
# Flatten line-continuations (`\`-newline) AND fold the pipe target onto the echo
# line, so the blessed `echo "$TOKEN" | docker login ... --password-stdin` idiom
# is seen as ONE logical command even when the flag sits on a continuation line
# (#199). Without this, the stdin-feed exemption below never matched a wrapped
# `--password-stdin` and the secure idiom was flagged as a leak.
flat_src="$(printf '%s\n' "$token_src" |
awk '{ if (prev ~ /\\$/) { sub(/\\$/, "", prev); prev = prev " " $0; }
else { if (NR > 1) print prev; prev = $0; } }
END { if (NR > 0) print prev; }' |
awk '{ if (NR > 1 && prev ~ /\|[[:space:]]*$/) { prev = prev " " $0; }
else { if (NR > 1) print prev; prev = $0; } }
END { if (NR > 0) print prev; }')"
# Exclude: redactions, stdin-feeds (a token piped to a login via
# `--password-stdin` / `--pass-stdin` goes to STDIN, not the log), the doctor's
# own $token_src var, an escaped '\$' (a token NAME in a message, not its value),
# and the sanctioned `printf '%s' "$TOKEN"` capture idiom (the resolver returns
# the token on stdout for a caller to capture — the one blessed emit).
if printf '%s' "$flat_src" |
grep -E "$leak_pat" |
grep -vE "REDACTED|sed |stdin|token_src" |
grep -vE "REDACTED|sed |stdin|--password-stdin|--pass-stdin|token_src" |
grep -vqE 'printf .%s. "[$](REGISTRY_TOKEN|TOKEN|tok|token)"|\\[$]'; then
leak=1
fi