Design: the semantic query layer (scoping boundary)
Status: reviewed design spike (SF.6). Specifies the mechanism SF.7 builds; this document contains no production behavior, only the design, the threat model, and the adversarial test plan SF.7 must pass.
Risk tier: catastrophic. The semantic query layer is the tenant-then-RBAC scoping boundary that the AI reasoning layer (Epoch 19b), the externally reachable MCP server, the developer self-service portal, and compliance reporting all route through. A single scoping defect here is a cross-tenant disclosure across every consumer at once — the same blast radius as the signer. It is therefore designed before it is built, like the signer (S1.3/S1.4) and the SSH trust rewrite (S13.2).
1. Purpose and position
trstctl's queryable state lives on four surfaces:
- the event / audit log (NATS JetStream; the AN-2 source of truth);
- the credential graph (F21; relationships between owners, identities, certificates, issuers, hosts);
- the cert / secret inventory (the Postgres read model projected from the log);
- the CBOM (F52; the cryptographic bill of materials from discovery).
The semantic query layer is one internal, read-only API that joins across these surfaces with tenant-then-RBAC scoping enforced by construction, so a query physically cannot return data the caller is not entitled to. Downstream consumers inherit the boundary for free and are never trusted to self-censor.
It sits on top of the existing guarantees, never beside them:
- AN-1 (RLS is the floor). Every read executes inside a tenant-scoped transaction (store.WithTenant), so PostgreSQL row-level security confines it to one tenant in the database itself. The query layer composes RBAC scope on top of that floor; it never replaces or bypasses it.
- AN-2 (reads are projections). The layer reads the projected read model and the log; it returns results consistent with a known projection offset and never fabricates state outside the event-sourced model.
- AN-7 (bulkheaded). Queries run on a dedicated bounded pool with cost and timeout guards, so a heavy or adversarial query can never starve issuance, readiness, or the signer.
2. Enforcement by construction (not post-filtering)
The defining property: out-of-scope data is unreachable, not filtered out after the fact. Three mechanisms compose to guarantee it.
2.1 Tenant floor — mandatory RLS transaction
There is exactly one execution path, and it always opens a tenant-scoped
transaction. A query cannot be issued without a resolved tenant_id; the layer has
no API that runs a query outside WithTenant. Because the RLS policy is enforced
by PostgreSQL under the non-superuser app role, even a query builder bug that omitted
a tenant predicate would still return zero out-of-tenant rows — the floor holds
independently of the layer's own correctness. This mirrors, and reuses, the AN-1 RLS
proof.
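
The single execution path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the store.WithTenant implementation: the Tx interface, the recordingTx stand-in, the certificates table, and the app.tenant_id setting name are all hypothetical. The only real PostgreSQL call it relies on is set_config(name, value, is_local), which scopes a setting to the current transaction so an RLS policy can read it back.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is a hypothetical, minimal transaction handle used only in this sketch.
type Tx interface {
	Exec(stmt string, args ...any) error
}

// ErrNoTenant is returned when the request has no resolved tenant.
var ErrNoTenant = errors.New("query: no resolved tenant")

// recordingTx stands in for a PostgreSQL transaction and records statements
// instead of executing them.
type recordingTx struct{ stmts []string }

func (t *recordingTx) Exec(stmt string, args ...any) error {
	t.stmts = append(t.stmts, stmt)
	return nil
}

// withTenant is the only way to run a read in this sketch. It refuses to run
// without a tenant, then binds the tenant to the transaction before any read.
// The setting name app.tenant_id is an assumption.
func withTenant(tx Tx, tenantID string, read func(Tx) error) error {
	if tenantID == "" {
		return ErrNoTenant // fail closed: nothing is executed
	}
	// is_local=true limits the setting to the current transaction.
	if err := tx.Exec("SELECT set_config('app.tenant_id', $1, true)", tenantID); err != nil {
		return fmt.Errorf("bind tenant: %w", err)
	}
	return read(tx)
}

func main() {
	tx := &recordingTx{}
	err := withTenant(tx, "tenant-a", func(tx Tx) error {
		// Hypothetical read; the value is a bound parameter.
		return tx.Exec("SELECT id FROM certificates WHERE owner = $1", "alice")
	})
	fmt.Println(err, len(tx.stmts))
}
```

Because the read callback only ever receives a Tx that has already been bound to a tenant, there is no code path in the sketch that reaches the database unscoped.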
2.2 RBAC scope — a mandatory, layer-injected predicate
Within the tenant, RBAC scope (subject → permitted resource scopes) is applied as a predicate the layer injects into every query plan from the authenticated principal, not a parameter the caller supplies. Callers describe what they want (a typed query spec, §2.3); the layer decides what they may see. The scope predicate is:
- derived solely from the request's authenticated Principal (the same RBAC the API uses, S3.5), resolved server-side;
- attached to every per-surface sub-read before execution; a plan with an unresolved or empty scope fails closed (returns nothing, with an error), never "all rows";
- not expressible as "no scope" by a caller — the type system has no such value.
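
One way to read the three properties above, as a minimal sketch: scope is resolved from the Principal into a value whose zero value permits nothing, and an empty scope list is an error rather than a wildcard. The names scopeSet, resolveScope, permits, staticPrincipal, and the scope strings are hypothetical; only the Principal shape is taken from the stubs in section 5.

```go
package main

import (
	"errors"
	"fmt"
)

// Scope names a resource scope a subject may read.
type Scope string

// Principal mirrors the shape given in the interface stubs.
type Principal interface {
	TenantID() string
	Scopes() []Scope
}

// ErrNoScope is returned for an unresolved or empty scope.
var ErrNoScope = errors.New("query: unresolved or empty scope")

// scopeSet has only an unexported field, so code outside the package cannot
// build one by hand. Its zero value permits nothing.
type scopeSet struct{ allowed map[Scope]struct{} }

// resolveScope derives the scope set from the authenticated principal only.
func resolveScope(p Principal) (scopeSet, error) {
	if p == nil || len(p.Scopes()) == 0 {
		return scopeSet{}, ErrNoScope // fail closed, never "all rows"
	}
	set := scopeSet{allowed: make(map[Scope]struct{})}
	for _, s := range p.Scopes() {
		set.allowed[s] = struct{}{}
	}
	return set, nil
}

// permits reports whether a resource scope is readable under this set.
func (s scopeSet) permits(resource Scope) bool {
	_, ok := s.allowed[resource]
	return ok
}

// staticPrincipal is a test double for an authenticated caller.
type staticPrincipal struct {
	tenant string
	scopes []Scope
}

func (p staticPrincipal) TenantID() string { return p.tenant }
func (p staticPrincipal) Scopes() []Scope  { return p.scopes }

func main() {
	set, err := resolveScope(staticPrincipal{tenant: "tenant-a", scopes: []Scope{"team/payments"}})
	fmt.Println(err, set.permits("team/payments"), set.permits("team/platform"))

	_, err = resolveScope(staticPrincipal{tenant: "tenant-a"})
	fmt.Println(err)
}
```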
2.3 No raw queries — a typed, parameterized query spec
Callers never submit SQL, Cypher, or free text. They submit a typed query spec: an allow-listed set of selectable surfaces, fields, filters, and join keys, each a Go value. The layer compiles the spec to parameterized statements ($1, $2 …) per surface; user-supplied values are only ever bound parameters, never concatenated into a statement. Field/relationship names are validated against a fixed allow-list, so neither a field name nor an operator can be attacker-controlled. Cross-surface joins are performed in-process over the per-surface, already-scoped result sets — there is no raw cross-tenant SQL join for a builder bug to widen.
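
The compile step for a single surface might look like the sketch below. It is an assumption-laden illustration: the Field and Op enums, the column names, the inventory table name, and compileInventory are hypothetical, and the sketch deliberately leaves out the tenant transaction and the injected scope predicate so that only the allow-list and parameter binding are visible.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Field and Op are closed enums; callers cannot supply arbitrary names.
type Field int

const (
	FieldCommonName Field = iota
	FieldIssuer
	FieldNotAfter
)

type Op int

const (
	OpEq Op = iota
	OpLt
)

// Fixed allow-lists. The column names are assumptions for this sketch.
var columns = map[Field]string{
	FieldCommonName: "common_name",
	FieldIssuer:     "issuer",
	FieldNotAfter:   "not_after",
}

var operators = map[Op]string{OpEq: "=", OpLt: "<"}

// Predicate pairs an allow-listed field and operator with a bound value.
type Predicate struct {
	Field Field
	Op    Op
	Value any
}

// ErrInvalidSpec is returned for any unknown field or operator.
var ErrInvalidSpec = errors.New("query: invalid spec")

// compileInventory turns a typed selection into a parameterized statement.
// Values only ever appear in the returned args, never in the statement text.
func compileInventory(sel []Field, where []Predicate) (string, []any, error) {
	if len(sel) == 0 {
		return "", nil, ErrInvalidSpec
	}
	cols := make([]string, 0, len(sel))
	for _, f := range sel {
		name, ok := columns[f]
		if !ok {
			return "", nil, ErrInvalidSpec // unknown field fails closed
		}
		cols = append(cols, name)
	}
	conds := make([]string, 0, len(where))
	args := make([]any, 0, len(where))
	for i, p := range where {
		name, okField := columns[p.Field]
		op, okOp := operators[p.Op]
		if !okField || !okOp {
			return "", nil, ErrInvalidSpec
		}
		conds = append(conds, fmt.Sprintf("%s %s $%d", name, op, i+1))
		args = append(args, p.Value)
	}
	stmt := "SELECT " + strings.Join(cols, ", ") + " FROM inventory"
	if len(conds) > 0 {
		stmt += " WHERE " + strings.Join(conds, " AND ")
	}
	return stmt, args, nil
}

func main() {
	stmt, args, err := compileInventory(
		[]Field{FieldCommonName, FieldNotAfter},
		[]Predicate{{Field: FieldIssuer, Op: OpEq, Value: "x'; DROP TABLE inventory; --"}},
	)
	fmt.Println(stmt, len(args), err)
}
```

The hostile value in main ends up in args, so the statement text is the same whatever the caller passes.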
3. Threat model — leak/abuse vectors and mitigations
| # | Vector | How it could leak / abuse | Mitigation (by construction) |
|---|---|---|---|
| V1 | Cross-tenant join leakage | An in-process join over two surfaces accidentally pairs rows from different tenants. | Every sub-read runs in the same WithTenant(tenant) transaction; results carry no other tenant's rows to begin with, so a join cannot reintroduce them. RLS is the floor; the join operates only on already-confined sets. A property test asserts no join output row has a foreign tenant_id. |
| V2 | RBAC-scope bypass | A caller widens its own scope (e.g. by passing a scope/tenant parameter) or a plan runs with empty scope = all rows. | Scope is injected by the layer from the authenticated principal, never accepted from the caller; the spec has no tenant/scope field. An unresolved/empty scope fails closed. Denial is at this layer, before execution. |
| V3 | Query / parameter injection | Attacker-controlled field names, operators, or values alter the statement. | No raw query input. Field/operator names are allow-listed enums; values are bound parameters only. A malformed or unknown field fails closed at compile time. |
| V4 | Projection-staleness disclosure | A read served from a lagging projection reveals state that policy already revoked, or hides a revocation. | Reads are pinned to a projection offset and report it; revocation-sensitive queries read at or after a required offset or fail closed. Results are consistent with a known point in the AN-2 log, never a torn mix. |
| V5 | Cost-exhaustion DoS | A deliberately expensive or looping query (deep graph walk, huge fan-out, cartesian blow-up) starves other subsystems. | §4 cost/timeout guard: bounded pool (AN-7), per-query row/depth caps, statement_timeout, and a wall-clock deadline; over-budget queries are killed and fail closed. |
| V6 | Result-shape inference | Error messages or row counts leak the existence of out-of-scope data. | Out-of-scope and not-found are indistinguishable to the caller (same empty/forbidden result); errors never echo out-of-scope identifiers. |
| V7 | Graph traversal escaping tenant | A relationship walk follows an edge into another tenant's subgraph. | The graph is built per tenant inside the same RLS transaction; edges cannot reference rows RLS hides, so a walk cannot cross the boundary. |
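
To make V1 concrete, the sketch below joins two per-surface result sets in process and provides the assertion the property test would apply to the output. It assumes both inputs were already read inside the same tenant transaction. The row types, joinOnCert, and foreignRows are hypothetical names, not part of the design.

```go
package main

import "fmt"

// Hypothetical per-surface row shapes; real rows would carry more fields.
type InventoryRow struct{ TenantID, CertID, CommonName string }
type GraphEdge struct{ TenantID, CertID, OwnerID string }

// Joined keeps the tenant of both inputs so the output can be audited.
type Joined struct {
	InvTenant, EdgeTenant string
	CertID, CommonName    string
	OwnerID               string
}

// joinOnCert is a plain hash join over two already-scoped result sets.
// It adds no rows of its own; it only pairs what the reads returned.
func joinOnCert(inv []InventoryRow, edges []GraphEdge) []Joined {
	byCert := make(map[string][]GraphEdge, len(edges))
	for _, e := range edges {
		byCert[e.CertID] = append(byCert[e.CertID], e)
	}
	var out []Joined
	for _, r := range inv {
		for _, e := range byCert[r.CertID] {
			out = append(out, Joined{r.TenantID, e.TenantID, r.CertID, r.CommonName, e.OwnerID})
		}
	}
	return out
}

// foreignRows counts output rows where either side belongs to another tenant.
// The V1 property is that this is always zero.
func foreignRows(rows []Joined, tenant string) int {
	n := 0
	for _, r := range rows {
		if r.InvTenant != tenant || r.EdgeTenant != tenant {
			n++
		}
	}
	return n
}

func main() {
	inv := []InventoryRow{{"tenant-a", "c1", "api.example.test"}, {"tenant-a", "c2", "db.example.test"}}
	edges := []GraphEdge{{"tenant-a", "c1", "alice"}, {"tenant-a", "c1", "bob"}}
	out := joinOnCert(inv, edges)
	fmt.Println(len(out), foreignRows(out, "tenant-a"))
}
```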
4. Cost and timeout guard model
- Bulkhead (AN-7). Queries run on a dedicated, bounded worker pool with a bounded queue; a full queue rejects fast with a structured error. The query pool is isolated from the API, issuance, and signer pools.
- Row and depth caps. Each plan carries a maximum result-row cap and, for graph traversal, a maximum depth/fan-out; exceeding either aborts the query (fail closed), it does not truncate-and-return.
- Database statement timeout. The scoped transaction sets a statement_timeout so a runaway SQL read is killed by PostgreSQL itself.
- Wall-clock deadline. Every query takes a context deadline; the in-process join and graph walk check it between steps and abort promptly.
- Determinism. Guards are configured per deployment, surfaced as metrics (trstctl_query_*), and a tripped guard is an explicit, audited error, never a silent partial result.
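
The in-process guards can be sketched together as below. This is a simplified illustration, not the SF.7 design: the pool here has slots only and no queue, so a saturated pool rejects immediately. All names (pool, walk, guardedWalk, the error values) are hypothetical. The database-side statement_timeout is not modelled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var (
	ErrPoolFull = errors.New("query: pool saturated")
	ErrDepthCap = errors.New("query: depth cap exceeded")
	ErrDeadline = errors.New("query: deadline exceeded")
)

// pool is a bulkhead with a fixed number of slots and no waiting.
type pool struct{ slots chan struct{} }

func newPool(size int) *pool { return &pool{slots: make(chan struct{}, size)} }

// run executes q if a slot is free and rejects fast otherwise.
func (p *pool) run(q func() error) error {
	select {
	case p.slots <- struct{}{}:
		defer func() { <-p.slots }()
		return q()
	default:
		return ErrPoolFull
	}
}

// walk is a breadth-first traversal that checks the deadline between levels
// and aborts, rather than truncating, when nodes lie beyond maxDepth.
func walk(ctx context.Context, edges map[string][]string, start string, maxDepth int) ([]string, error) {
	seen := map[string]bool{start: true}
	visited := []string{start}
	frontier := []string{start}
	for depth := 0; len(frontier) > 0; depth++ {
		if ctx.Err() != nil {
			return nil, ErrDeadline
		}
		var next []string
		for _, n := range frontier {
			for _, m := range edges[n] {
				if !seen[m] {
					seen[m] = true
					next = append(next, m)
				}
			}
		}
		if len(next) > 0 && depth+1 > maxDepth {
			return nil, ErrDepthCap // no partial result is returned
		}
		visited = append(visited, next...)
		frontier = next
	}
	return visited, nil
}

// guardedWalk applies the pool, the deadline, and the depth cap together.
func guardedWalk(p *pool, timeout time.Duration, edges map[string][]string, start string, maxDepth int) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	var out []string
	err := p.run(func() error {
		var werr error
		out, werr = walk(ctx, edges, start, maxDepth)
		return werr
	})
	return out, err
}

func main() {
	edges := map[string][]string{"a": {"b"}, "b": {"c"}, "c": {"a", "d"}}
	p := newPool(2)
	fmt.Println(guardedWalk(p, time.Second, edges, "a", 3))
	fmt.Println(guardedWalk(p, time.Second, edges, "a", 2))
}
```

Every tripped guard surfaces as a distinct error with a nil result, which matches the "abort, do not truncate" rule above.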
5. Interface stubs (no behavior)
Indicative shape only; SF.7 implements it. No method below has behavior in this spike.
// Package query is the tenant-then-RBAC scoping boundary over trstctl's four data
// surfaces. Every method executes inside store.WithTenant (AN-1 floor) on the
// bounded query pool (AN-7); callers submit a typed Spec, never raw SQL/Cypher.
package query

import "context"

// Principal is the authenticated caller; the layer derives RBAC scope from it.
type Principal interface {
	TenantID() string
	Scopes() []Scope // resource scopes the subject may read; empty => fail closed
}

// Surface enumerates the joinable data surfaces (allow-listed).
type Surface int

const (
	SurfaceLog Surface = iota
	SurfaceGraph
	SurfaceInventory
	SurfaceCBOM
)

// Spec is a typed, parameterized query plan. Field/filter/join names are
// allow-listed enums; values are bound parameters. There is no tenant or scope
// field; scope is injected by the engine from the Principal.
type Spec struct {
	Select   []Field
	From     []Surface
	Where    []Predicate // operator is an enum; Value is a bound parameter
	Join     []JoinKey
	Limit    int // <= the engine's hard row cap
	MaxDepth int // graph traversal bound
}

// Engine runs scoped queries. Query opens a WithTenant(principal.TenantID())
// transaction, injects the RBAC scope predicate, compiles Spec to parameterized
// per-surface reads, joins in-process, and enforces the cost/timeout guards.
type Engine interface {
	Query(ctx context.Context, p Principal, s Spec) (*Result, error)
}

// Result carries rows plus the projection offset they are consistent with (AN-2).
type Result struct {
	Rows   []Row
	Offset uint64
}
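
The Offset field is what makes the staleness bound (V4) checkable. A minimal sketch of that check follows, assuming the required offset is known to the engine at query time. How it is supplied is not specified in this spike, and requireOffset, ErrStale, and the Row definition here are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Row and Result follow the stub shapes; Row's definition is an assumption.
type Row map[string]any

type Result struct {
	Rows   []Row
	Offset uint64
}

// ErrStale is returned when the projection has not reached the required offset.
var ErrStale = errors.New("query: projection behind required offset")

// requireOffset releases a result only if it was read at or after minOffset.
// A stale or missing result fails closed: no rows are returned.
func requireOffset(res *Result, minOffset uint64) (*Result, error) {
	if res == nil || res.Offset < minOffset {
		return nil, ErrStale
	}
	return res, nil
}

func main() {
	res := &Result{Rows: []Row{{"cert_id": "c1"}}, Offset: 120}
	_, err := requireOffset(res, 100)
	fmt.Println(err)
	_, err = requireOffset(res, 150)
	fmt.Println(err)
}
```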
6. Adversarial test plan (SF.7 must pass, wired into CI)
This plan is a first-class deliverable of SF.7. The build is not done until each of these passes and the suite is part of CI.
- Cross-tenant returns nothing — by construction. Seed two tenants; every query shape a caller in tenant A can express returns zero of tenant B's rows. The defining test, mirroring the AN-1 RLS proof. Includes the variant where the query builder is deliberately fed a B-tenant id — RLS still returns nothing.
- Property-based no-leak. Generate random valid specs, random principals, and random two-tenant/RBAC fixtures; assert no generated query path yields a row whose tenant_id ≠ the caller's or whose resource scope ∉ the caller's scopes. This is the core security property and runs many iterations.
- RBAC out-of-scope denied at this layer. A principal scoped to subset X cannot read resources in scope Y, even within its own tenant; denial happens before execution.
- Injection fails closed. Specs carrying crafted field/operator names, or values intended to break out of parameter binding, are rejected at compile time; no statement is executed.
- Cost/timeout guard kills. A deliberately expensive (deep/fan-out/looping) query is aborted by the row/depth cap, the statement_timeout, or the deadline; the pool is never exhausted (a concurrent cheap query still succeeds, per AN-7).
- Malformed query fails closed. A spec referencing unknown surfaces/fields, or an internally inconsistent plan, returns a structured error and no rows.
- Projection-staleness bound. A revocation-sensitive query reads at/after the required projection offset or fails closed; results report their offset (AN-2).
- Join correctness. A single query joining at least the log, the graph, and the inventory returns a correct, fully-scoped result (the positive acceptance).
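
The shape of the property-based no-leak check can be sketched with the standard library alone. Everything here is a stand-in: scopedRead filters in memory only so the harness runs without a database, and SF.7 would point the same assertion at the real engine. leakyRead is a deliberately broken reader included to show that the property detects a tenant leak. All names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

type row struct{ tenant, scope string }

// readFunc is the system under test: given a fixture and a caller, return rows.
type readFunc func(rows []row, tenant string, scopes map[string]bool) []row

// scopedRead is an in-memory stand-in for a correctly scoped engine.
func scopedRead(rows []row, tenant string, scopes map[string]bool) []row {
	if tenant == "" || len(scopes) == 0 {
		return nil // fail closed
	}
	var out []row
	for _, r := range rows {
		if r.tenant == tenant && scopes[r.scope] {
			out = append(out, r)
		}
	}
	return out
}

// leakyRead forgets the tenant check; the property below should catch it.
func leakyRead(rows []row, tenant string, scopes map[string]bool) []row {
	var out []row
	for _, r := range rows {
		if scopes[r.scope] {
			out = append(out, r)
		}
	}
	return out
}

// checkNoLeak generates random two-tenant fixtures and callers, and fails on
// the first returned row outside the caller's tenant or scopes.
func checkNoLeak(read readFunc, seed int64, iterations int) error {
	rng := rand.New(rand.NewSource(seed))
	tenants := []string{"tenant-a", "tenant-b"}
	scopeNames := []string{"s1", "s2", "s3"}
	for i := 0; i < iterations; i++ {
		rows := make([]row, rng.Intn(50))
		for j := range rows {
			rows[j] = row{tenants[rng.Intn(len(tenants))], scopeNames[rng.Intn(len(scopeNames))]}
		}
		caller := tenants[rng.Intn(len(tenants))]
		scopes := map[string]bool{}
		for _, s := range scopeNames {
			if rng.Intn(2) == 0 {
				scopes[s] = true
			}
		}
		for _, r := range read(rows, caller, scopes) {
			if r.tenant != caller || !scopes[r.scope] {
				return fmt.Errorf("iteration %d: row %+v leaked to %s", i, r, caller)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(checkNoLeak(scopedRead, 1, 1000))
	fmt.Println(checkNoLeak(leakyRead, 1, 1000) != nil)
}
```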
7. Non-negotiables honored
- AN-1 — RLS is the floor; scoping composes on top; a cross-tenant query is impossible at the database, independent of the layer's own correctness.
- AN-2 — reads are consistent with a known projection offset; no state outside the event-sourced model.
- AN-7 — a dedicated bounded pool with cost/timeout guards; a heavy query cannot starve other subsystems.
8. Review & sign-off
This spike is reviewed against the acceptance: every enumerated leak/abuse vector (V1–V7) has a by-construction mitigation; the enforcement mechanism (mandatory RLS transaction + layer-injected RBAC predicate + typed parameterized spec + in-process scoped joins) is specified precisely enough to implement; the cost/timeout guard model is defined; and the adversarial test plan (§6) is enumerated for SF.7 to implement as a first-class deliverable. SF.7 must not begin until this design is the committed reference.