SSH — replace standing SSH keys with short-lived certificates

What it is

Most SSH access works by copying a user's public key into a server's authorized_keys file. That scales badly and ages dangerously: keys pile up, nobody remembers whose they are, and removing access means hunting them down across every host. An SSH certificate replaces that model: you trust one SSH certificate authority, which signs short-lived certificates saying "this user may log in as alice until 5 p.m." No per-host key copying, automatic expiry, central control.

This page covers trstctl's three SSH pieces: the CA that signs host and user certificates (F43), the agent that safely configures hosts to trust it (F44), and attestation-gated short-lived user certificates tying SSH access to verified identity (F45).

Why it exists

Standing SSH keys are one of the most common audit findings and breach vectors: orphaned keys grant access nobody tracks, and offboarding rarely removes every key. SSH certificates fix the structural problem: access expires on its own, trust is centralized in the CA, and you grant exactly the principals and time window each session needs. The hard parts are changing host trust without locking yourself out and ensuring only the right identity can get a certificate; F44 and F45 address these in turn.

How it works

The SSH certificate authority (F43)

trstctl's SSH CA signs two kinds of OpenSSH certificate: host certificates (so clients verify a server without trust-on-first-use prompts) and user certificates (so servers authorize logins without a stored key). Each certificate carries principals, a validity window, and optional critical options and extensions.
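To make the certificate contents concrete, here is a minimal Python sketch of the fields listed above and the check a server effectively performs at login. The class name SSHCertSketch and its permits method are hypothetical illustrations, not trstctl types; the option and extension names (source-address, permit-pty) are standard OpenSSH ones.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


@dataclass(frozen=True)
class SSHCertSketch:
    """Hypothetical in-memory view of an OpenSSH certificate (not a trstctl type)."""

    cert_type: str                      # "user" or "host"
    key_id: str
    serial: int
    principals: Tuple[str, ...]         # login names (user) or hostnames (host)
    valid_after: datetime
    valid_before: datetime
    critical_options: Dict[str, str] = field(default_factory=dict)
    extensions: Dict[str, str] = field(default_factory=dict)

    def permits(self, principal: str, at: datetime) -> bool:
        """True if `principal` is listed and `at` is inside the validity window."""
        in_window = self.valid_after <= at < self.valid_before
        return in_window and principal in self.principals


# "This user may log in as alice until 5 p.m."
issued = datetime(2024, 1, 1, 16, 45, tzinfo=timezone.utc)
cert = SSHCertSketch(
    cert_type="user",
    key_id="alice-laptop",
    serial=1,
    principals=("alice",),
    valid_after=issued,
    valid_before=issued + timedelta(minutes=15),   # expires at 17:00
    critical_options={"source-address": "10.0.0.0/24"},
    extensions={"permit-pty": ""},
)
```

Here cert.permits("alice", issued) holds, while a different principal or a time at or after valid_before does not.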

All signing goes through the single crypto path — one SignSSHCertificate operation taking an opaque signer handle — so the CA key lives in an HSM, held in the isolated signing service, never in the API process or in the clear. An issuance profile bounds the maximum TTL and allowed certificate types; serial numbers increment under a lock; every issuance is recorded as an immutable ssh.cert.issued event in its own bounded lane. The CA also maintains a key revocation list (KRL): revoke by serial or key ID, then distribute a snapshot to hosts, pulling back a certificate before it expires.
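The bookkeeping around signing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not trstctl's implementation: the names IssuanceProfile, SerialCounter, and KRLSketch are hypothetical, the profile is assumed to clamp an over-long TTL rather than reject it, and the KRL is modelled as plain sets instead of the OpenSSH binary format.

```python
import threading
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class IssuanceProfile:
    """Hypothetical profile: bounds the TTL and the allowed certificate types."""

    max_ttl_seconds: int
    allowed_types: FrozenSet[str]

    def clamp_ttl(self, requested_seconds: int) -> int:
        if requested_seconds <= 0:
            raise ValueError("ttl must be positive")
        # Assumption: cap at the profile maximum instead of rejecting.
        return min(requested_seconds, self.max_ttl_seconds)

    def check_type(self, cert_type: str) -> None:
        if cert_type not in self.allowed_types:
            raise PermissionError("certificate type not allowed: " + cert_type)


class SerialCounter:
    """Hands out strictly increasing serials; the lock prevents duplicates."""

    def __init__(self, start: int = 0) -> None:
        self._lock = threading.Lock()
        self._last = start

    def next(self) -> int:
        with self._lock:
            self._last += 1
            return self._last


class KRLSketch:
    """Revocation by serial or key ID, with a version bumped on every change."""

    def __init__(self) -> None:
        self.version = 0
        self._serials: Set[int] = set()
        self._key_ids: Set[str] = set()

    def revoke_serial(self, serial: int) -> None:
        self._serials.add(serial)
        self.version += 1

    def revoke_key_id(self, key_id: str) -> None:
        self._key_ids.add(key_id)
        self.version += 1

    def is_revoked(self, serial: int, key_id: str) -> bool:
        return serial in self._serials or key_id in self._key_ids

    def snapshot(self) -> Tuple[int, FrozenSet[int], FrozenSet[str]]:
        """Immutable copy suitable for distribution to hosts."""
        return (self.version, frozenset(self._serials), frozenset(self._key_ids))
```

A snapshot is deliberately a copy: later revocations change the live list but not a snapshot a host already holds, which is why distribution matters.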

The operator workflow is served two ways: OpenSSH-compatible protocol endpoints (/ssh/ca, /ssh/issue/user, /ssh/issue/host, /ssh/krl) and a guarded product API (GET /api/v1/ssh/status, POST /api/v1/ssh/certificates/revoke) used by the CLI and console, reporting the authority key, KRL version, revoked-certificate count, and configured attestors. Revocation appends an immutable ssh.cert.revoked event before publishing the updated KRL snapshot.

SSH deployment & trust configuration (F44)

For a host to accept the CA's certificates, it must trust the CA's public key: the key is written to a file that sshd_config names in its TrustedUserCAKeys directive. Editing sshd_config on a live fleet is exactly where people lock themselves out, so trstctl's agent follows a hard rule: additive-only, validated before it takes effect, and rolled back automatically on any failure.

The agent backs up both files, is idempotent (a CA line already present is a no-op), and writes changes atomically (write-temp-then-rename). It then runs a three-step gauntlet: validate (sshd -t), reload, and health-check that sshd still accepts connections. Reload success alone isn't proof of health, which is why the health check is a separate step. On any failure the agent restores both files from backup and reloads the known-good config. The reload and health commands are operator-supplied and required (--ssh-trust-reload-cmd, --ssh-trust-health-cmd), and they run as validated argv lines with shell metacharacters rejected. Removing trust is never implicit: RemoveCATrust needs an explicit confirmation flag. Every action is audited: ssh.trust.added, ssh.trust.removed, ssh.trust.rolled_back, and ssh.trust.rollback_failed when the restoration itself fails, which leaves the host in an unclear state that needs operator attention.
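The backup, atomic write, gauntlet, and rollback sequence described above can be sketched as follows. This is a simplified Python illustration, not the agent's code: the function names are hypothetical, the three steps are passed in as callables (in practice validate would run sshd -t and the other two would run the operator-supplied commands), and the returned strings only loosely mirror the audit event names.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def add_line_if_missing(text: str, line: str) -> str:
    """Additive and idempotent: return text unchanged if the line is present."""
    if line.strip() in (existing.strip() for existing in text.splitlines()):
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line.strip() + "\n"


def atomic_write(path: Path, text: str) -> None:
    """Write a temp file in the same directory, then rename it over the target.

    Note: a real agent would also preserve the original mode and ownership.
    """
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".tmp.")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, str(path))
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def apply_with_rollback(
    new_contents: Dict[Path, str],
    validate: Callable[[], bool],
    reload: Callable[[], bool],
    health: Callable[[], bool],
) -> str:
    """Returns "noop", "added", "rolled_back", or "rollback_failed"."""
    backups = {p: (p.read_text() if p.exists() else None) for p in new_contents}
    if all(backups[p] == text for p, text in new_contents.items()):
        return "noop"                      # nothing to change
    try:
        for path, text in new_contents.items():
            atomic_write(path, text)
        ok = validate() and reload() and health()
    except OSError:
        ok = False
    if ok:
        return "added"
    try:                                   # restore every file, then reload
        for path, old in backups.items():
            if old is None:
                path.unlink(missing_ok=True)
            else:
                atomic_write(path, old)
        if not reload():
            return "rollback_failed"
    except OSError:
        return "rollback_failed"
    return "rolled_back"
```

Because validate, reload, and health short-circuit in order, a config that fails sshd -t is never reloaded, and a reload that succeeds still has to pass the health check before the change is kept.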

The control plane also has a served handoff for this high-blast-radius path: POST /api/v1/ssh/trust-rollouts records the source, target hosts, CA fingerprint, reload/health commands, rollback plan, status, and an explicit confirmed=true acknowledgement; POST /api/v1/ssh/hosts/retire records retirement evidence once migration completes. The browser and CLI record/request the workflow, but host file edits happen only inside the operator-confirmed agent path.
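The reload and health commands are described above as validated argv lines with shell metacharacters rejected. A minimal Python sketch of that idea follows; the function name parse_command and the exact set of rejected characters are assumptions, since the source does not list which characters trstctl refuses.

```python
import shlex
from typing import List

# Assumption: an illustrative deny-list, not trstctl's actual rule set.
SHELL_METACHARACTERS = set(";|&$`<>(){}*?!\\\n")


def parse_command(line: str) -> List[str]:
    """Turn an operator-supplied command line into an argv list, or raise."""
    if not line.strip():
        raise ValueError("command is required")
    bad = SHELL_METACHARACTERS.intersection(line)
    if bad:
        raise ValueError("shell metacharacters rejected: " + "".join(sorted(bad)))
    # shlex.split honours quotes but never invokes a shell.
    return shlex.split(line)


reload_argv = parse_command("systemctl reload sshd")
health_argv = parse_command("ssh -o BatchMode=yes localhost true")
```

The resulting list would be executed directly, for example with subprocess.run(reload_argv) and no shell, so a value such as "systemctl reload sshd; rm -rf /" fails validation instead of running a second command.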

Attestation-gated short-lived user certificates (F45)

The most powerful pattern: issue an SSH user certificate only to a caller who proves identity first. The issuer runs an attestation check (the same chain used for workload identity), then derives principals from the verified attestation and calls the SSH CA. It requires an approver distinct from the attested subject, rejects unbound principals, supports OpenSSH source-address and force-command critical options, fails closed on attestation failure, and defaults to a 15-minute TTL (capped by the profile). It binds the attestation to the certificate in an immutable ssh.attested_cert.issued event. The result is access that is short-lived and provably tied to a specific CI job or cloud instance, with no standing keys.
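The policy checks in that paragraph can be sketched as a single gate that runs before the CA is called. This is a hedged Python illustration: Attestation, IssuanceDenied, and authorize are hypothetical names, and "unbound principal" is interpreted here as a requested principal outside the set derived from the verified attestation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

DEFAULT_TTL_SECONDS = 15 * 60   # the 15-minute default named in the text


@dataclass(frozen=True)
class Attestation:
    """Hypothetical result of the attestation chain."""

    verified: bool
    subject: str
    bound_principals: FrozenSet[str]


class IssuanceDenied(Exception):
    pass


def authorize(
    attestation: Optional[Attestation],
    approver: str,
    requested_principals: Iterable[str],
    ttl_seconds: Optional[int],
    profile_max_ttl: int,
) -> dict:
    """Return the parameters to pass to the CA, or raise IssuanceDenied."""
    # Fail closed: a missing or unverified attestation never issues.
    if attestation is None or not attestation.verified:
        raise IssuanceDenied("attestation failed")
    if not approver or approver == attestation.subject:
        raise IssuanceDenied("approver must be distinct from the attested subject")
    requested = set(requested_principals)
    if not requested:
        raise IssuanceDenied("at least one principal is required")
    unbound = requested - attestation.bound_principals
    if unbound:
        raise IssuanceDenied("unbound principals: " + ", ".join(sorted(unbound)))
    ttl = DEFAULT_TTL_SECONDS if ttl_seconds is None else ttl_seconds
    if ttl <= 0:
        raise IssuanceDenied("ttl must be positive")
    return {
        "subject": attestation.subject,
        "approver": approver,
        "principals": sorted(requested),
        "ttl_seconds": min(ttl, profile_max_ttl),   # capped by the profile
    }
```

Every rejection raises before any signing request exists, which is one simple way to get fail-closed behaviour.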

That issuer is served at POST /api/v1/ssh/attested-user-certs and by trstctl ssh issue-attested-user; the request carries an attestation method, base64 payload, SSH public key, approver, optional key ID, principals, TTL, source-address allowlist, and force-command policy. The response is the certificate plus serial, key ID, expiry, constraints, and the attestation record — the private key never crosses the API or UI.
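For callers that assemble the request in code rather than by hand, a small Python sketch of building the body is shown here. The field names are taken from the JSON example in "Use it"; the helper name build_attested_request is hypothetical, and it is an assumption that payload_base64 is the standard base64 encoding of the raw attestation token. The optional key ID is omitted because its field name is not shown in this page.

```python
import base64
import ipaddress
import json
from typing import List


def build_attested_request(
    token: bytes,
    public_key: str,
    approver: str,
    principals: List[str],
    source_addresses: List[str],
    force_command: str,
    ttl_seconds: int = 900,
) -> str:
    """Return a JSON body for the attested issuance request."""
    for cidr in source_addresses:
        ipaddress.ip_network(cidr)        # raises ValueError on a malformed CIDR
    body = {
        "method": "k8s_sat",
        "payload_base64": base64.b64encode(token).decode("ascii"),
        "public_key": public_key.strip(),  # public key only; the private key stays local
        "approver": approver,
        "principals": principals,
        "source_addresses": source_addresses,
        "force_command": force_command,
        "ttl_seconds": ttl_seconds,
    }
    return json.dumps(body, indent=2)


request_json = build_attested_request(
    token=b"example-token",
    public_key="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5... user@host\n",
    approver="ssh-approver",
    principals=["web"],
    source_addresses=["10.0.0.0/24"],
    force_command="/usr/local/bin/deploy",
)
```

The output could be written to a file and passed to trstctl ssh issue-attested-user -f, as in the CLI example in "Use it".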

Use it

Stand up the SSH CA, distribute its public key to hosts via the agent, then issue short-lived user certificates. The CA's public key goes into a host's trust config like this (what the agent writes, additively):

# /etc/ssh/sshd_config
TrustedUserCAKeys /etc/ssh/trusted_user_ca_keys

# /etc/ssh/trusted_user_ca_keys  (the CA public key in authorized_keys form)
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5... trstctl-ssh-ca

A user certificate is then issued with an attestation-bound principal and a short TTL (e.g. 15 minutes); the user connects normally and sshd validates the certificate against the trusted CA without any stored key.

# Check the authority key, KRL version, revoked-certificate count, and attestors
trstctl ssh status
# Record a confirmed trust rollout for one host
trstctl ssh trust-rollout \
  --hosts edge-1.internal \
  --ca-fingerprint SHA256:... \
  --reload-cmd 'systemctl reload sshd' \
  --health-cmd 'ssh -o BatchMode=yes localhost true' \
  --rollback-plan 'restore backup, reload sshd' \
  --status health_passed \
  --confirm
# Build the attested issuance request (the heredoc expands $K8S_SAT_B64 and the public key)
cat > ssh-attested-user.json <<EOF
{
  "method": "k8s_sat",
  "payload_base64": "$K8S_SAT_B64",
  "public_key": "$(cat ~/.ssh/id_ed25519.pub)",
  "approver": "ssh-approver",
  "principals": ["web"],
  "source_addresses": ["10.0.0.0/24"],
  "force_command": "/usr/local/bin/deploy",
  "ttl_seconds": 900
}
EOF
# Issue the certificate; revoke by serial and retire a host when needed
trstctl ssh issue-attested-user -f ssh-attested-user.json
trstctl ssh revoke --serial 42 --reason 'revoked'
trstctl ssh retire-host --host edge-1.internal --reason 'replaced'

Pitfalls & limits

Never hand-edit trust on a live host. Use the agent so the validate-reload-health-check-rollback safety net applies — a bad manual sshd_config edit can lock you out, and trstctl won't remove existing trust without explicit confirmation.
Serving status: the SSH CA is served by the running control plane (protocols.ssh.enabled, default off): cert issuance at /ssh/..., the OpenSSH binary KRL at /ssh/krl (sshd's RevokedKeys consumes it), and workflow API/CLI coverage for status, trust rollout evidence, attested user cert issue, KRL revocation, and host retirement. The CA key stays in the isolated signing service, never the API process, with every step recorded as an immutable event and tenant data isolated at the database layer. SSH host-key discovery is also served via ssh discovery sources on the outbox worker; privileged trust rewrites still need the explicit agent-safe rollout workflow — see Current limitations.
Short TTLs require renewal. That's the security benefit, but plan the renewal path for long-running sessions.
KRL distribution is push-based. Revoking a certificate means distributing the updated KRL to hosts, so budget for that propagation: until a host has the new snapshot, it still accepts the revoked certificate.
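For the renewal point above, one common approach is to request a fresh certificate partway through the current one's lifetime rather than at expiry. The Python sketch below shows the arithmetic; the function names and the default of renewing at half the lifetime are illustrative assumptions, not trstctl behaviour.

```python
from datetime import datetime, timedelta, timezone


def renew_at(valid_after: datetime, valid_before: datetime, fraction: float = 0.5) -> datetime:
    """Time at which to request a replacement certificate."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if valid_before <= valid_after:
        raise ValueError("validity window is empty")
    return valid_after + (valid_before - valid_after) * fraction


def needs_renewal(now: datetime, valid_after: datetime, valid_before: datetime,
                  fraction: float = 0.5) -> bool:
    """True once `now` has reached the renewal point."""
    return now >= renew_at(valid_after, valid_before, fraction)


issued_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires_at = issued_at + timedelta(minutes=15)
renewal_time = renew_at(issued_at, expires_at)   # 12:07:30 UTC for a 15-minute TTL
```

Renewing early leaves room for a failed attempt to be retried before the current certificate lapses.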

Reference

CA operations: IssueUserCert, IssueHostCert, AuthorityKey (for TrustedUserCAKeys / @cert-authority), KRL.RevokeSerial, KRL.Distribute.
Served API/CLI: GET /api/v1/ssh/status, POST /api/v1/ssh/trust-rollouts, POST /api/v1/ssh/attested-user-certs, POST /api/v1/ssh/certificates/revoke, POST /api/v1/ssh/hosts/retire; trstctl ssh status|trust-rollout|issue-attested-user|revoke|retire-host.
Agent config: SSHDConfigPath, TrustedUserCAKeysPath, AllowUnconfirmedRemoval (default false).
Attested issuance: AttestedUserCertIssuer.Issue (method+payload → cert).
Events: ssh.cert.issued, ssh.cert.revoked, ssh.attested_cert.issued, ssh.trust.added, ssh.trust.removed, ssh.trust.rolled_back, ssh.trust.rollback_failed.
Standard: OpenSSH certificate format (PROTOCOL.certkeys).
Design deep-dive: SSH trust-rewrite design.