Stateful AI Gateway
Any Model - Any API
- Vango AI - Calls
- OpenAI - Responses
- OpenAI - Chat Completions
- Anthropic - Messages
- Gemini - Contents
const message = await client.messages.create({
  model: "gemini-3-flash-preview",
  max_tokens: 300,
  system,
  messages,
} as any);
SDKs
- TypeScript
- Python
- Go
- Rust
import VAI, { textInput } from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

const result = await client.run({
  blocks: [textInput("Summarize the incident timeline.")],
});
Connections
- REST
- SSE
- WebSocket
- Live Audio
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_VAI_API_KEY",
    base_url="https://your-gateway.example.com",
)

with client.messages.websocket(
    model="gemini-3-flash-preview",
    max_tokens=300,
    messages=[{"role": "user", "content": "Say hello."}],
    rhone={"session_id": "sess_123"},
) as stream:
    for event in stream:
        print(event)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "rhone_sk_...",
  baseURL: "https://api.rhone.dev/v1/compat/openai",
});

// First call — session auto-created
const first = await client.chat.completions.create({
  model: "Kimi-K2.5",
  messages: [{ role: "user", content: "My name is Alice." }],
});

// Second call — one line makes it stateful
const second = await client.chat.completions.create({
  model: "Kimi-K2.5",
  messages: [{ role: "user", content: "What's my name?" }],
  rhone: { session_id: first.rhone.session_id },
});
// "Your name is Alice." — no history resent.
One line to stateful.
Add a session ID to any provider call. The gateway manages context server-side — stop resending your entire message history on every request.
- Server-managed context across calls
- Every call recorded with full observability
- Branch, compact, and replay history
- Automatic provider failover and routing
- Built-in quality assessment on every run
- Works with any model through any SDK
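To make the savings concrete, here's a minimal sketch in plain TypeScript — no gateway calls, just the shape of the request payloads. The `rhone: { session_id }` field mirrors the example above; everything else is illustrative:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Stateless: every request must carry the whole transcript.
function statelessPayload(history: Msg[], next: string) {
  return {
    model: "Kimi-K2.5",
    messages: [...history, { role: "user" as const, content: next }],
  };
}

// Stateful: the gateway holds the transcript; send only the delta.
function statefulPayload(sessionId: string, next: string) {
  return {
    model: "Kimi-K2.5",
    messages: [{ role: "user" as const, content: next }],
    rhone: { session_id: sessionId },
  };
}

// After 50 back-and-forth turns, the stateless payload carries 101
// messages; the stateful payload always carries exactly one.
const history: Msg[] = Array.from({ length: 100 }, (_, i): Msg => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `turn ${i}`,
}));
console.log(statelessPayload(history, "What's my name?").messages.length); // 101
console.log(statefulPayload("sess_123", "What's my name?").messages.length); // 1
```

The delta-only payload stays constant-size no matter how long the conversation runs, which is where the bandwidth and token savings come from.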
Five primitives. Zero boilerplate.
From a single model call to a production agent — first-class, durable, observable abstractions at every level.
call
One model invocation, one response. The atomic unit of work.
resp, _ := client.Calls.Create(ctx, vai.CallParams{
    Model: "claude-sonnet-4-20250514",
    Input: []vai.Block{vai.InputBlock("Hello!")},
})
fmt.Println(resp.Output[0].Text)
session
The continuity container. Calls and runs belong to sessions. Context persists automatically.
const call = await client.calls.create({
  model: "claude-sonnet-4-20250514",
  sessionId: "sess_01abc...", // stateful by default
  input: [{ type: "input", text: "What did I just say?" }],
});
// Gateway has the full context — you only send the delta.
run
Orchestrated execution across multiple calls. Tool loops, checkpoints, and interrupt/resume.
run = client.runs.create(
    session_id="sess_01abc...",
    model="claude-sonnet-4-20250514",
    tools=[search_tool, code_exec_tool],
    input=[vai.InputBlock("Find and fix the bug in auth.go")],
)
# Gateway handles the tool loop. You get the final result.
print(run.output)
compaction
Context window management. Summarize long histories to stay within limits while preserving meaning.
// 200+ turns deep — compact to keep the conversation going
client.Sessions.Compact(ctx, "sess_01abc...", vai.CompactParams{
    Strategy: "summarize",
})
// History preserved in lineage. Context window freed.
harness
Codified orchestration. Define tool policies, stop conditions, and approval gates for production agents.
const run = await client.runs.create({
  sessionId: "sess_01abc...",
  hostedHarnessId: "harness_coder_v2",
  input: [{ type: "input", text: "Refactor the auth module" }],
  // Harness defines: tools, stop conditions, approval gates
});
Your AI system of record.
Every session, run, and tool execution is durably stored and queryable. Use structured queries to power your product, annotations to manage quality, and exports to feed your warehouse — without building your own storage layer.
- /v1/sessions/{id}/timeline: Full conversation timeline for rendering
- /v1/sessions/{id}/blocks: Active continuity head — what the model sees next
- /v1/data/sources/{source}/query: Structured queries over sessions, runs, and calls
- /v1/data/views/{id}/results: Named saved views for dashboards and review queues
- /v1/annotations: Label, review, and curate any run or session
- /v1/runs/{id}/assessments: Machine quality scores attached to every run
- /v1/data/exports: Export to Parquet, JSONL, or CSV for your warehouse
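The query endpoint accepts a small filter grammar: `{ field, op, value }` leaves combined into `and`/`or` groups, as the query example shows. A few hypothetical builder helpers — illustrative only, not part of the SDK — make composing nested filters less error-prone:

```typescript
// Hypothetical filter-builder helpers for the structured-query grammar.
// The { field, op, value } leaf shape and and-group come from the query
// example; the helper names themselves are illustrative.
type Leaf = { field: string; op: "eq" | "gt" | "lt"; value: string | number };
type Filter = Leaf | { and: Filter[] } | { or: Filter[] };

const eq = (field: string, value: string | number): Leaf => ({ field, op: "eq", value });
const gt = (field: string, value: number): Leaf => ({ field, op: "gt", value });
const and = (...filters: Filter[]): Filter => ({ and: filters });

// Reproduces the filter from the query example:
// runs on claude-sonnet-4 that cost more than $1 (1,000,000 micro-USD).
const filter = and(
  eq("primary_resolved_model", "claude-sonnet-4"),
  gt("cost_micros_usd", 1_000_000),
);
console.log(JSON.stringify(filter, null, 2));
```

Because groups nest, the same helpers compose into arbitrarily deep `and`/`or` trees that can be passed straight to the query call.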
import VAI from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

// Query runs by model and cost
for await (const row of client.data.sources.query("run_summaries", {
  filter: { and: [
    { field: "primary_resolved_model", op: "eq", value: "claude-sonnet-4" },
    { field: "cost_micros_usd", op: "gt", value: 1000000 },
  ]},
  sort: [{ field: "started_at", direction: "desc" }],
})) {
  console.log(row.id, row.primary_resolved_model);
}

// Read quality assessments for a run
const assessments = await client.runs.assessments("run_01xyz...");
console.log(assessments[0].outcome); // "pass"
console.log(assessments[0].scores.correctness); // 0.89

// Annotate for human review
await client.annotations.create({
  target: { kind: "run", runId: "run_01xyz..." },
  labels: { quality: "correct", reviewed: true },
  note: "Good trajectory, minimal tool waste.",
});
import VAI, { textInput } from "vai";

const client = new VAI({ apiKey: "rhone_sk_..." });

// Assessment profiles work automatically.
// The gateway scores every qualifying run.

const run = await client.run({
  sessionId: "sess_01abc...",
  blocks: [textInput("Help me debug this")],
  routing: { model: "gpt-5.4" },
});

// Later — read the quality assessment
const bundles = await client.runs.assessments(run.run.id);
// [{
//   profile: "support_quality_v1",
//   outcome: "pass",
//   scores: { correctness: 0.89, clarity: 0.84 }
// }]

// Collect user feedback
await client.feedback.thumbsUp(run.run.sessionId, run.run.id);
Built-in quality assessment.
Every qualifying run is automatically scored. Define assessment profiles once, and the gateway grades response quality, routing decisions, and trajectory efficiency — without building an eval pipeline.
- Zero-config — org defaults assess every run automatically
- Machine scoring — deterministic checks, heuristics, and judge-model evals
- Human review — annotations and feedback signals on any object
- Routing quality — grade provider selection, fallback, and cost tradeoffs
- Experiments — compare model and route candidates offline with replay
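What an assessment profile contains isn't shown above. As a purely hypothetical sketch — every field name here is an assumption, not the gateway's documented schema — a profile could pair deterministic checks and heuristics with a judge-model rubric, matching the three machine-scoring tiers listed above:

```typescript
// Hypothetical assessment-profile definition. Field names are illustrative
// assumptions, not the gateway's documented schema; the profile id and
// score names echo the assessments example above.
const supportQualityV1 = {
  id: "support_quality_v1",
  appliesTo: { kind: "run" }, // scope: which runs qualify for scoring
  checks: [
    // Deterministic: hard pass/fail signals from the run record.
    { type: "deterministic", name: "no_tool_errors" },
    // Heuristic: cheap structural scoring of the trajectory.
    { type: "heuristic", name: "trajectory_efficiency", maxToolCalls: 12 },
    // Judge-model: an LLM grades the response against a rubric.
    {
      type: "judge",
      name: "correctness",
      model: "gpt-5.4",
      rubric: "Did the answer resolve the user's issue?",
    },
  ],
  outcome: { passIf: { field: "correctness", op: "gt", value: 0.7 } },
};
console.log(supportQualityV1.checks.length); // 3
```

The idea is that a profile is defined once at the org level and the gateway applies it to every qualifying run, producing the assessment bundles read back in the example above.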