Stateful AI Gateway
Any Model - Any API
- Vango AI - Calls
- OpenAI - Responses
- OpenAI - Chat Completions
- Anthropic - Messages
- Gemini - Contents
const message = await client.messages.create({
  model: "gemini-3-flash-preview",
  max_tokens: 300,
  system,
  messages,
} as any);
SDKs
- TypeScript
- Python
- Go
- Rust
import VAI, { textInput } from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

const result = await client.run({
  blocks: [textInput("Summarize the incident timeline.")],
});
Connections
- REST
- SSE
- WebSocket
- Live Audio
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_VAI_API_KEY",
    base_url="https://your-gateway.example.com",
)

with client.messages.websocket(
    model="gemini-3-flash-preview",
    max_tokens=300,
    messages=[{"role": "user", "content": "Say hello."}],
    rhone={"session_id": "sess_123"},
) as stream:
    for event in stream:
        print(event)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "rhone_sk_...",
  baseURL: "https://api.rhone.dev/v1/compat/openai",
});

// First call — session auto-created
const first = await client.chat.completions.create({
  model: "Kimi-K2.5",
  messages: [{ role: "user", content: "My name is Alice." }],
});

// Second call — one line makes it stateful
const second = await client.chat.completions.create({
  model: "Kimi-K2.5",
  messages: [{ role: "user", content: "What's my name?" }],
  rhone: { session_id: first.rhone.session_id },
});
// "Your name is Alice." — no history resent.
One line to stateful.
Add a session ID to any provider call. The gateway manages context server-side — stop resending your entire message history on every request.
- Server-managed context across calls
- Every call recorded with full observability
- Branch, compact, and replay history
- Automatic provider failover and routing
- Built-in quality assessment on every run
- Works with any model through any SDK
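To make the savings concrete, here's a minimal sketch in plain TypeScript — no gateway calls, just the shape of the request payloads. The `rhone: { session_id }` field mirrors the example above; everything else is illustrative:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Stateless: every request must carry the whole transcript.
function statelessPayload(history: Msg[], next: string) {
  return {
    model: "Kimi-K2.5",
    messages: [...history, { role: "user" as const, content: next }],
  };
}

// Stateful: the gateway holds the transcript; send only the delta.
function statefulPayload(sessionId: string, next: string) {
  return {
    model: "Kimi-K2.5",
    messages: [{ role: "user" as const, content: next }],
    rhone: { session_id: sessionId },
  };
}

// After 50 back-and-forth turns, the stateless payload carries 101
// messages; the stateful payload always carries exactly one.
const history: Msg[] = Array.from({ length: 100 }, (_, i): Msg => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `turn ${i}`,
}));
console.log(statelessPayload(history, "What's my name?").messages.length); // 101
console.log(statefulPayload("sess_123", "What's my name?").messages.length); // 1
```

The delta-only payload stays constant-size no matter how long the conversation runs, which is where the bandwidth and token savings come from.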
Five primitives. Zero boilerplate.
From a single model call to a production agent — first-class, durable, observable abstractions at every level.
call
One model invocation, one response. The atomic unit of work.
resp, _ := client.Calls.Create(ctx, vai.CallParams{
    Model: "claude-sonnet-4-20250514",
    Input: []vai.Block{vai.InputBlock("Hello!")},
})
fmt.Println(resp.Output[0].Text)
session
The continuity container. Calls and runs belong to sessions. Context persists automatically.
const call = await client.calls.create({
  model: "claude-sonnet-4-20250514",
  sessionId: "sess_01abc...", // stateful by default
  input: [{ type: "input", text: "What did I just say?" }],
});
// Gateway has the full context — you only send the delta.
run
Orchestrated execution across multiple calls. Tool loops, checkpoints, and interrupt/resume.
run = client.runs.create(
    session_id="sess_01abc...",
    model="claude-sonnet-4-20250514",
    tools=[search_tool, code_exec_tool],
    input=[vai.InputBlock("Find and fix the bug in auth.go")],
)
# Gateway handles the tool loop. You get the final result.
print(run.output)
compaction
Context window management. Summarize long histories to stay within limits while preserving meaning.
// 200+ turns deep — compact to keep the conversation going
client.Sessions.Compact(ctx, "sess_01abc...", vai.CompactParams{
    Strategy: "summarize",
})
// History preserved in lineage. Context window freed.
harness
Codified orchestration. Define tool policies, stop conditions, and approval gates for production agents.
const run = await client.runs.create({
  sessionId: "sess_01abc...",
  hostedHarnessId: "harness_coder_v2",
  input: [{ type: "input", text: "Refactor the auth module" }],
  // Harness defines: tools, stop conditions, approval gates
});
Your AI system of record.
Every session, run, and tool execution is durably stored and queryable. Use structured queries to power your product, annotations to manage quality, and exports to feed your warehouse — without building your own storage layer.
- /v1/sessions/{id}/timeline: Full conversation timeline for rendering
- /v1/sessions/{id}/blocks: Active continuity head — what the model sees next
- /v1/data/sources/{source}/query: Structured queries over sessions, runs, and calls
- /v1/data/views/{id}/results: Named saved views for dashboards and review queues
- /v1/annotations: Label, review, and curate any run or session
- /v1/runs/{id}/assessments: Machine quality scores attached to every run
- /v1/data/exports: Export to Parquet, JSONL, or CSV for your warehouse
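The query endpoint accepts a small filter grammar: `{ field, op, value }` leaves combined into `and`/`or` groups, as the query example shows. A few hypothetical builder helpers — illustrative only, not part of the SDK — make composing nested filters less error-prone:

```typescript
// Hypothetical filter-builder helpers for the structured-query grammar.
// The { field, op, value } leaf shape and and-group come from the query
// example; the helper names themselves are illustrative.
type Leaf = { field: string; op: "eq" | "gt" | "lt"; value: string | number };
type Filter = Leaf | { and: Filter[] } | { or: Filter[] };

const eq = (field: string, value: string | number): Leaf => ({ field, op: "eq", value });
const gt = (field: string, value: number): Leaf => ({ field, op: "gt", value });
const and = (...filters: Filter[]): Filter => ({ and: filters });

// Reproduces the filter from the query example:
// runs on claude-sonnet-4 that cost more than $1 (1,000,000 micro-USD).
const filter = and(
  eq("primary_resolved_model", "claude-sonnet-4"),
  gt("cost_micros_usd", 1_000_000),
);
console.log(JSON.stringify(filter, null, 2));
```

Because groups nest, the same helpers compose into arbitrarily deep `and`/`or` trees that can be passed straight to the query call.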
import VAI from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

// Query runs by model and cost
for await (const row of client.data.sources.query("run_summaries", {
  filter: { and: [
    { field: "primary_resolved_model", op: "eq", value: "claude-sonnet-4" },
    { field: "cost_micros_usd", op: "gt", value: 1000000 },
  ]},
  sort: [{ field: "started_at", direction: "desc" }],
})) {
  console.log(row.id, row.primary_resolved_model);
}

// Read quality assessments for a run
const assessments = await client.runs.assessments("run_01xyz...");
console.log(assessments[0].outcome); // "pass"
console.log(assessments[0].scores.correctness); // 0.89

// Annotate for human review
await client.annotations.create({
  target: { kind: "run", runId: "run_01xyz..." },
  labels: { quality: "correct", reviewed: true },
  note: "Good trajectory, minimal tool waste.",
});
import VAI, { textInput } from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

// Assessment profiles work automatically.
// The gateway scores every qualifying run.

const run = await client.run({
  sessionId: "sess_01abc...",
  blocks: [textInput("Help me debug this")],
  routing: { model: "gpt-5.4" },
});

// Later — read the quality assessment
const bundles = await client.runs.assessments(run.run.id);
// [{
//   profile: "support_quality_v1",
//   outcome: "pass",
//   scores: { correctness: 0.89, clarity: 0.84 }
// }]

// Collect user feedback
await client.feedback.thumbsUp(run.run.sessionId, run.run.id);
Built-in quality assessment.
Every qualifying run is automatically scored. Define assessment profiles once, and the gateway grades response quality, routing decisions, and trajectory efficiency — without building an eval pipeline.
- Zero-config — org defaults assess every run automatically
- Machine scoring — deterministic checks, heuristics, and judge-model evals
- Human review — annotations and feedback signals on any object
- Routing quality — grade provider selection, fallback, and cost tradeoffs
- Experiments — compare model and route candidates offline with replay
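What an assessment profile contains isn't shown above. As a purely hypothetical sketch — every field name here is an assumption, not the gateway's documented schema — a profile could pair deterministic checks and heuristics with a judge-model rubric, matching the three machine-scoring tiers listed above:

```typescript
// Hypothetical assessment-profile definition. Field names are illustrative
// assumptions, not the gateway's documented schema; the profile id and
// score names echo the assessments example above.
const supportQualityV1 = {
  id: "support_quality_v1",
  appliesTo: { kind: "run" }, // scope: which runs qualify for scoring
  checks: [
    // Deterministic: hard pass/fail signals from the run record.
    { type: "deterministic", name: "no_tool_errors" },
    // Heuristic: cheap structural scoring of the trajectory.
    { type: "heuristic", name: "trajectory_efficiency", maxToolCalls: 12 },
    // Judge-model: an LLM grades the response against a rubric.
    {
      type: "judge",
      name: "correctness",
      model: "gpt-5.4",
      rubric: "Did the answer resolve the user's issue?",
    },
  ],
  outcome: { passIf: { field: "correctness", op: "gt", value: 0.7 } },
};
console.log(supportQualityV1.checks.length); // 3
```

The idea is that a profile is defined once at the org level and the gateway applies it to every qualifying run, producing the assessment bundles read back in the example above.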