Runtime Safety Model
Safety and guardrails are runtime controls. They are separate from human annotations, async assessments, and end-user feedback signals:
annotation records human or system notes after the fact.
assessment records async quality or calibration work.
feedback_signal records end-user reactions.
safety_decision records the runtime policy outcome that controlled allow, warning, checkpoint, block, or quarantine behavior.
Safety decisions remain the source of truth. Read models and query sources are projections for analytics, not authoritative runtime state.
Selecting Safety
Requests and session defaults can select a safety profile with the safety object. The selector names a profile and can optionally override the profile's mode for that request or session:
{
  "safety": {
    "profile": "default-runtime-safety",
    "mode": "enforce"
  }
}
Use profile selectors to make runtime policy explicit at boundaries where different applications, tenants, or environments need different guardrail behavior.
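As a minimal sketch, a client could build the selector with a small helper before attaching it to a request body. The SafetySelector type and buildSafetySelector function are illustrative names, not part of the documented SDKs; only the profile and mode fields and the example values come from the text above.

```typescript
// Hypothetical shape: only "profile" and "mode" are taken from the example above.
interface SafetySelector {
  profile: string;
  mode?: string;
}

// Build a selector, omitting "mode" when no override is requested so the
// profile's own mode applies.
function buildSafetySelector(profile: string, mode?: string): SafetySelector {
  if (profile.trim() === "") {
    throw new Error("safety.profile must be a non-empty profile name");
  }
  return mode === undefined ? { profile } : { profile, mode };
}

// Attach the selector under the "safety" key of a request body.
const requestBody = {
  safety: buildSafetySelector("default-runtime-safety", "enforce"),
};
console.log(JSON.stringify(requestBody, null, 2));
```

Leaving mode out when there is no override keeps the request explicit about which profile applies without pinning a mode the caller did not choose.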
Outcomes
allow means the request continued normally.
allow_with_warning means the runtime continued and retained warning reason codes for later review.
checkpoint means the runtime paused on a normal checkpoint object and requires a reviewer decision before continuing.
block means policy denied the runtime action. Runs can stop with safety_blocked; API errors use codes such as safety.policy_denied.
quarantine means asset admission isolated an asset. Release requires reviewer identity and creates follow-up safety and review records.
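The outcomes above can be modeled as a closed set so that client code handles every case. The sketch below assumes the outcome names appear verbatim as string values; the NextStep labels and the nextStepFor function are illustrative and not part of the documented API.

```typescript
// Assumption: outcome names are used verbatim as string values.
type Disposition =
  | "allow"
  | "allow_with_warning"
  | "checkpoint"
  | "block"
  | "quarantine";

// Illustrative client-side reactions, not documented runtime states.
type NextStep =
  | "continue"
  | "continue_and_keep_warnings"
  | "await_reviewer_decision"
  | "stop"
  | "await_reviewed_release";

function nextStepFor(disposition: Disposition): NextStep {
  switch (disposition) {
    case "allow":
      return "continue";
    case "allow_with_warning":
      return "continue_and_keep_warnings";
    case "checkpoint":
      return "await_reviewer_decision";
    case "block":
      return "stop";
    case "quarantine":
      return "await_reviewed_release";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // disposition is added to the union without a matching case.
      const unreachable: never = disposition;
      throw new Error(`unknown disposition: ${String(unreachable)}`);
    }
  }
}

console.log(nextStepFor("checkpoint"));
```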
Investigating safety_blocked
Start from the exact runtime object that stopped, then follow safety attachments:
Read the run, tool execution, checkpoint, or asset that reported the stop.
List safety decisions on that parent with /safety-decisions.
Read the exact decision by id with GET /v1/safety/decisions/{id}.
Confirm profile_id, profile_version, stage, disposition, and reason_codes.
Inspect target refs to understand the object that was evaluated.
If a checkpoint was created, inspect checkpoint resolution and review annotations.
Check assessment or calibration records only as supporting context.
Do not treat async assessments as the runtime source of truth. They can explain or calibrate safety behavior, but they do not replace the safety decision that controlled the original request.
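The investigation steps above can be sketched as a few pure helpers that build the routes and check which of the listed fields are present on a decision. The SafetyDecision shape is an assumption limited to the field names mentioned above, the field types are guesses, and the parent path passed to parentDecisionsPath is supplied by the caller because the parent route prefixes are not listed here.

```typescript
// Assumed shape: field names come from the steps above; types are guesses.
interface SafetyDecision {
  id: string;
  profile_id?: string;
  profile_version?: string | number;
  stage?: string;
  disposition?: string;
  reason_codes?: string[];
}

const FIELDS_TO_CONFIRM = [
  "profile_id",
  "profile_version",
  "stage",
  "disposition",
  "reason_codes",
] as const;

// Route for an exact decision read: GET /v1/safety/decisions/{id}.
function decisionPath(id: string): string {
  return `/v1/safety/decisions/${encodeURIComponent(id)}`;
}

// Appends the /safety-decisions attachment route to a caller-supplied parent path.
function parentDecisionsPath(parentPath: string): string {
  return `${parentPath.replace(/\/+$/, "")}/safety-decisions`;
}

// Returns the names of the fields that are absent from the decision.
function missingFields(decision: SafetyDecision): string[] {
  return FIELDS_TO_CONFIRM.filter((field) => decision[field] === undefined);
}

// Hypothetical decision record, used only to exercise the helpers.
const example: SafetyDecision = { id: "dec_123", stage: "input", disposition: "block" };
console.log(decisionPath(example.id));
console.log(missingFields(example));
```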
Reviewer Workflows
Safety checkpoints use the same checkpoint APIs as other runtime pauses. Safety review requires explicit reviewer headers:
X-VAI-Reviewer-ID: user_123
X-VAI-Reviewer-Kind: human
Checkpoint resolution accepts review_note and review_labels. These fields create review annotations that operators can use during later audits.
Asset release follows the same reviewer model. A quarantine release must carry reviewer identity and can include a review note and labels.
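A client that does not use the SDK reviewer header helpers might assemble the headers and review fields itself. The sketch below is illustrative: reviewerHeaders and buildReviewFields are hypothetical names, and the rest of the resolution payload (such as the reviewer's actual decision) is left out because it is not described here.

```typescript
// Builds the two reviewer headers shown above; rejects an empty reviewer id
// because safety review requires explicit reviewer identity.
function reviewerHeaders(reviewerId: string, reviewerKind: string): Record<string, string> {
  if (reviewerId.trim() === "") {
    throw new Error("reviewer id is required for safety review");
  }
  return {
    "X-VAI-Reviewer-ID": reviewerId,
    "X-VAI-Reviewer-Kind": reviewerKind,
  };
}

// Only the review fields named in the text; other resolution fields are omitted.
interface ReviewFields {
  review_note?: string;
  review_labels?: string[];
}

// Includes a field only when the reviewer actually supplied a value.
function buildReviewFields(note?: string, labels: string[] = []): ReviewFields {
  const fields: ReviewFields = {};
  if (note !== undefined && note.trim() !== "") {
    fields.review_note = note;
  }
  if (labels.length > 0) {
    fields.review_labels = [...labels];
  }
  return fields;
}

// The note and label values are made-up examples.
console.log(reviewerHeaders("user_123", "human"));
console.log(buildReviewFields("Reviewed and confirmed benign.", ["false_positive"]));
```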
Evidence and Privacy
Safety evidence is intentionally limited. Default safety telemetry and read models do not expose raw prompts, raw model inputs, raw model outputs, raw tool payloads, or raw vendor moderation responses.
Capture policy controls the shape of retained evidence:
store_input_excerpts=none keeps refs, hashes, lengths, and classifications only.
store_input_excerpts=minimal allows short sanitized excerpts.
store_asset_refs=true keeps asset refs without copying raw asset contents.
store_raw_vendor_payloads=false is the Phase 11 default and remains the public behavior.
Use evidence_redaction_state to understand whether later redaction has affected retained evidence.
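To make the effect of these settings concrete, the sketch below maps a capture policy to the kinds of evidence it would retain. It assumes that minimal adds excerpts on top of what none keeps, which the text implies but does not state outright. The CapturePolicy type and retainedEvidenceKinds function are illustrative, and store_raw_vendor_payloads is left out because only its false default is documented.

```typescript
// Hypothetical type covering only the settings and values listed above.
interface CapturePolicy {
  store_input_excerpts: "none" | "minimal";
  store_asset_refs: boolean;
}

// Lists the evidence kinds retained under a policy.
// Assumption: "minimal" keeps everything "none" keeps, plus short sanitized excerpts.
function retainedEvidenceKinds(policy: CapturePolicy): string[] {
  const kinds = ["refs", "hashes", "lengths", "classifications"];
  if (policy.store_input_excerpts === "minimal") {
    kinds.push("short sanitized excerpts");
  }
  if (policy.store_asset_refs) {
    kinds.push("asset refs (no raw asset contents)");
  }
  return kinds;
}

console.log(retainedEvidenceKinds({ store_input_excerpts: "none", store_asset_refs: true }));
```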
Analytics
Operators can use the safety_decision_summaries query source for analytics across sessions, subjects, profiles, stages, and dispositions. It is built from safety_decisions, never from observability events.
Use analytics for trends and triage. Use exact decision reads for incident investigation.
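For triage, a typical first pass is counting decisions by disposition. The sketch below works on rows already fetched from the query source. The DecisionSummaryRow shape is an assumption based on the dimensions named above, since the actual column names of safety_decision_summaries are not listed here.

```typescript
// Assumed row shape; actual safety_decision_summaries columns may differ.
interface DecisionSummaryRow {
  profile_id: string;
  stage: string;
  disposition: string;
}

// Counts rows per disposition for a quick trend view.
function countByDisposition(rows: DecisionSummaryRow[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    counts[row.disposition] = (counts[row.disposition] ?? 0) + 1;
  }
  return counts;
}

// Made-up rows for illustration only.
const sampleRows: DecisionSummaryRow[] = [
  { profile_id: "default-runtime-safety", stage: "input", disposition: "allow" },
  { profile_id: "default-runtime-safety", stage: "input", disposition: "block" },
  { profile_id: "default-runtime-safety", stage: "output", disposition: "allow" },
];
console.log(countByDisposition(sampleRows));
```

A spike in one disposition is a cue to pull the exact decisions by id, not a finding in itself.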
Escalation Checklist
Confirm the profile id and profile version.
Inspect stage, disposition, and reason codes.
Inspect target refs and parent attachment routes.
Check checkpoint reviewer annotations when checkpoint escalation occurred.
Check asset release annotations for quarantined assets.
Review calibration or assessment jobs only as supporting evidence.
Avoid exposing raw vendor payloads or raw sensitive evidence in tickets, dashboards, or exports.
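One way to honor the last item is to copy only an allowlist of fields into a ticket, so raw evidence is never carried along by accident. The sketch below is illustrative: toTicketSummary and the TICKET_SAFE_KEYS list are hypothetical, and the raw_vendor_payload key in the usage is a made-up field used only to show that unlisted keys are dropped.

```typescript
// Hypothetical allowlist built from the checklist fields above.
const TICKET_SAFE_KEYS = [
  "id",
  "profile_id",
  "profile_version",
  "stage",
  "disposition",
  "reason_codes",
  "evidence_redaction_state",
];

// Copies only allowlisted keys; anything else, including raw evidence, is left behind.
function toTicketSummary(decision: Record<string, unknown>): Record<string, unknown> {
  const summary: Record<string, unknown> = {};
  for (const key of TICKET_SAFE_KEYS) {
    if (key in decision) {
      summary[key] = decision[key];
    }
  }
  return summary;
}

// Made-up decision record; "raw_vendor_payload" is not a documented field.
const ticket = toTicketSummary({
  id: "dec_123",
  profile_id: "default-runtime-safety",
  disposition: "block",
  reason_codes: ["example_reason"],
  raw_vendor_payload: { text: "sensitive" },
});
console.log(ticket);
```

An allowlist fails closed: a field added to decisions later stays out of tickets until someone deliberately lists it.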
SDK Entry Points
Go and TypeScript SDKs expose profile management, exact safety-decision reads, parent safety-decision attachments, capability helpers, and reviewer header helpers. See SDKs & Libraries for examples.