RHONE


Stateful AI Gateway

Any Model - Any API

  • Vango AI - Calls
  • OpenAI - Responses
  • OpenAI - Chat Completions
  • Anthropic - Messages
  • Gemini - Contents
const message = await client.messages.create({
  model: "gemini-3-flash-preview",
  max_tokens: 300,
  system,
  messages,
});

SDKs

  • TypeScript
  • Python
  • Go
  • Rust
import VAI, { textInput } from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

const result = await client.run({
  blocks: [textInput("Summarize the incident timeline.")],
});

Connections

  • REST
  • SSE
  • WebSocket
  • Live Audio
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_VAI_API_KEY",
    base_url="https://your-gateway.example.com",
)

with client.messages.websocket(
    model="gemini-3-flash-preview",
    max_tokens=300,
    messages=[{"role": "user", "content": "Say hello."}],
    rhone={"session_id": "sess_123"},
) as stream:
    for event in stream:
        print(event)
stateful.ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "rhone_sk_...",
  baseURL: "https://api.rhone.dev/v1/compat/openai",
});

// First call — session auto-created
const first = await client.chat.completions.create({
  model: "Kimi-K2.5",
  messages: [{ role: "user", content: "My name is Alice." }],
});

// Second call — one line makes it stateful
const second = await client.chat.completions.create({
  model: "Kimi-K2.5",
  messages: [{ role: "user", content: "What's my name?" }],
  rhone: { session_id: first.rhone.session_id },
});
// "Your name is Alice." — no history resent.

One line to stateful.

Add a session ID to any provider call. The gateway manages context server-side — stop resending your entire message history on every request.

  • Server-managed context across calls
  • Every call recorded with full observability
  • Branch, compact, and replay history
  • Automatic provider failover and routing
  • Built-in quality assessment on every run
  • Works with any model through any SDK
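The same extension works at the raw HTTP level. A minimal sketch of the request body, assuming the compat endpoint accepts a top-level `rhone` field (the field name mirrors the SDK snippets on this page; the exact wire shape is an assumption, not documented schema):

```typescript
// Sketch: the only difference between a stateless and a stateful request
// is one extra top-level field on an otherwise standard payload.
const statelessBody = {
  model: "Kimi-K2.5",
  messages: [{ role: "user", content: "What's my name?" }],
};

const statefulBody = {
  ...statelessBody,
  // Hypothetical wire format, mirroring the SDK snippets above.
  rhone: { session_id: "sess_01abc..." },
};

// POST either body to the compat endpoint, e.g. with fetch:
// await fetch("https://api.rhone.dev/v1/compat/openai/chat/completions", {
//   method: "POST",
//   headers: {
//     "Authorization": "Bearer rhone_sk_...",
//     "Content-Type": "application/json",
//   },
//   body: JSON.stringify(statefulBody),
// });

console.log(JSON.stringify(statefulBody, null, 2));
```

Everything except the `rhone` field is a standard provider payload, which is what lets the gateway sit behind unmodified provider SDKs.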

Five primitives. Zero boilerplate.

From a single model call to a production agent — first-class, durable, observable abstractions at every level.

call

One model invocation, one response. The atomic unit of work.

resp, _ := client.Calls.Create(ctx, vai.CallParams{
    Model: "claude-sonnet-4-20250514",
    Input: []vai.Block{vai.InputBlock("Hello!")},
})
fmt.Println(resp.Output[0].Text)

session

The continuity container. Calls and runs belong to sessions. Context persists automatically.

const call = await client.calls.create({
  model: "claude-sonnet-4-20250514",
  sessionId: "sess_01abc...",  // stateful by default
  input: [{ type: "input", text: "What did I just say?" }],
});
// Gateway has the full context — you only send the delta.

run

Orchestrated execution over multiple calls. Tool loops, checkpoints, interrupt, and resume.

run = client.runs.create(
    session_id="sess_01abc...",
    model="claude-sonnet-4-20250514",
    tools=[search_tool, code_exec_tool],
    input=[vai.InputBlock("Find and fix the bug in auth.go")],
)
# Gateway handles the tool loop. You get the final result.
print(run.output)

compaction

Context window management. Summarize long histories to stay within limits while preserving meaning.

// 200+ turns deep — compact to keep the conversation going
client.Sessions.Compact(ctx, "sess_01abc...", vai.CompactParams{
    Strategy: "summarize",
})
// History preserved in lineage. Context window freed.

harness

Codified orchestration. Define tool policies, stop conditions, and approval gates for production agents.

const run = await client.runs.create({
  sessionId: "sess_01abc...",
  hostedHarnessId: "harness_coder_v2",
  input: [{ type: "input", text: "Refactor the auth module" }],
  // Harness defines: tools, stop conditions, approval gates
});

Your AI system of record.

Every session, run, and tool execution is durably stored and queryable. Use structured queries to power your product, annotations to manage quality, and exports to feed your warehouse — without building your own storage layer.

Read

  GET  /v1/sessions/{id}/timeline
       Full conversation timeline for rendering

  GET  /v1/sessions/{id}/blocks
       Active continuity head — what the model sees next

Query

  POST /v1/data/sources/{source}/query
       Structured queries over sessions, runs, and calls

  GET  /v1/data/views/{id}/results
       Named saved views for dashboards and review queues

Annotate & Assess

  POST /v1/annotations
       Label, review, and curate any run or session

  GET  /v1/runs/{id}/assessments
       Machine quality scores attached to every run

Export

  POST /v1/data/exports
       Export to Parquet, JSONL, or CSV for your warehouse
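Export jobs can be requested over plain REST. A hedged sketch of a request body for `POST /v1/data/exports`; the field names (`source`, `format`, `filter`) are illustrative assumptions borrowed from the query API's filter syntax, not documented parameters:

```typescript
// Illustrative export request: all runs from the last 7 days as Parquet.
// Field names are assumptions; check the API reference for the real schema.
const exportRequest = {
  source: "run_summaries", // same source name the query API uses
  format: "parquet",       // per the docs: parquet, jsonl, or csv
  filter: {
    field: "started_at",
    op: "gt",
    value: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString(),
  },
};

// await fetch("https://api.rhone.dev/v1/data/exports", {
//   method: "POST",
//   headers: {
//     "Authorization": "Bearer rhone_sk_...",
//     "Content-Type": "application/json",
//   },
//   body: JSON.stringify(exportRequest),
// });

console.log(exportRequest.format);
```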

data-layer.ts
import VAI from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

// Query runs by model and cost
for await (const row of client.data.sources.query("run_summaries", {
  filter: { and: [
    { field: "primary_resolved_model", op: "eq", value: "claude-sonnet-4" },
    { field: "cost_micros_usd", op: "gt", value: 1000000 },
  ]},
  sort: [{ field: "started_at", direction: "desc" }],
})) {
  console.log(row.id, row.primary_resolved_model);
}

// Read quality assessments for a run
const assessments = await client.runs.assessments("run_01xyz...");
console.log(assessments[0].outcome); // "pass"
console.log(assessments[0].scores.correctness); // 0.89

// Annotate for human review
await client.annotations.create({
  target: { kind: "run", runId: "run_01xyz..." },
  labels: { quality: "correct", reviewed: true },
  note: "Good trajectory, minimal tool waste.",
});
assessment.ts
import VAI, { textInput } from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

// Assessment profiles work automatically.
// The gateway scores every qualifying run.

const run = await client.run({
  sessionId: "sess_01abc...",
  blocks: [textInput("Help me debug this")],
  routing: { model: "gpt-5.4" },
});

// Later — read the quality assessment
const bundles = await client.runs.assessments(run.run.id);
// [{
//   profile: "support_quality_v1",
//   outcome: "pass",
//   scores: { correctness: 0.89, clarity: 0.84 }
// }]

// Collect user feedback
await client.feedback.thumbsUp(run.run.sessionId, run.run.id);

Built-in quality assessment.

Every qualifying run is automatically scored. Define assessment profiles once, and the gateway grades response quality, routing decisions, and trajectory efficiency — without building an eval pipeline.

  • Zero-config — org defaults assess every run automatically
  • Machine scoring — deterministic checks, heuristics, and judge-model evals
  • Human review — annotations and feedback signals on any object
  • Routing quality — grade provider selection, fallback, and cost tradeoffs
  • Experiments — compare model and route candidates offline with replay
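Assessment results are plain data, so aggregate quality tracking needs no extra tooling. A sketch computing a pass rate over a batch of assessment bundles; the bundle shape is taken from this page's snippets and should be treated as illustrative:

```typescript
// Bundle shape mirrors the assessment examples on this page.
type AssessmentBundle = {
  profile: string;
  outcome: "pass" | "fail";
  scores: Record<string, number>;
};

// Fraction of assessed runs whose outcome is "pass".
function passRate(bundles: AssessmentBundle[]): number {
  if (bundles.length === 0) return 0;
  const passed = bundles.filter((b) => b.outcome === "pass").length;
  return passed / bundles.length;
}

// Example: three assessed runs, two passing.
const bundles: AssessmentBundle[] = [
  { profile: "support_quality_v1", outcome: "pass", scores: { correctness: 0.89 } },
  { profile: "support_quality_v1", outcome: "fail", scores: { correctness: 0.41 } },
  { profile: "support_quality_v1", outcome: "pass", scores: { correctness: 0.93 } },
];

console.log(passRate(bundles)); // 0.6666666666666666
```

The same pattern extends to per-score averages or pass rates grouped by routing decision, using the `run_summaries` query source to fetch the batch.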

Ready to build?

Get your API key and start routing to any model in minutes.