API Overview
One API across every TTS provider. OpenAI-compatible schemas, one-line provider switching.
VoxRouter's request and response schemas mirror the OpenAI Audio API, with one addition: the model field carries a provider prefix, "{provider}/{model_id}", so a single VoxRouter key routes across every supported TTS provider. Swapping providers means changing one string: no client-code rewrites and no new credentials.
The router is OpenAI-compatible. Any client that speaks POST /v1/audio/speech works against VoxRouter with a base URL change — see the Quickstart for the OpenAI SDK example.
OpenAPI specification
The machine-readable spec lives in the repo at voxrouter/router/openapi.yaml. Feed it into Swagger UI, Postman, or any OpenAPI code generator. We also use it as the source of truth for the first-party voxrouter SDK — the published TypeScript types are generated from this file on every spec change.
# Fetch the spec directly from GitHub
curl -L https://raw.githubusercontent.com/voxrouter/voxrouter/main/voxrouter/router/openapi.yaml \
  -o voxrouter.openapi.yaml
Authentication
Every request carries a Bearer token in the Authorization header. Keys start with pk_ and are created from the console.
curl https://api.voxrouter.ai/v1/audio/speech \
  -H "Authorization: Bearer $VOXROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"elevenlabs/eleven_turbo_v2_5","voice":"EXAVITQu4vr4xnSDxMaL","input":"hi"}'
401 Unauthorized means the key is missing or invalid. 429 Too Many Requests means you've hit a per-key limit; see Rate limits.
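The same request can be assembled in code before handing it to any HTTP client. The sketch below builds the URL, headers, and JSON body as plain data; buildSpeechRequest, PreparedRequest, and BASE_URL are illustrative names chosen here, not part of the VoxRouter SDK, and the pk_ check simply mirrors the key prefix described above.

```typescript
// Hypothetical helper: assembles an authenticated speech request as plain data.
const BASE_URL = "https://api.voxrouter.ai"; // host taken from the curl example

type SpeechBody = { model: string; voice: string; input: string };

type PreparedRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

function buildSpeechRequest(apiKey: string, body: SpeechBody): PreparedRequest {
  // Keys are documented to start with pk_, so reject obviously wrong values early.
  if (apiKey.indexOf("pk_") !== 0) {
    throw new Error("expected a VoxRouter key starting with pk_");
  }
  return {
    url: BASE_URL + "/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: "Bearer " + apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

In a runtime that provides fetch, the prepared value could be sent with fetch(prepared.url, prepared); a 401 or 429 status on the result then maps to the cases described above.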
Requests
The router exposes endpoints for speech synthesis, catalog discovery, and wallet inspection. Each is documented in the API Reference sidebar; the prose below is a quick orientation.
POST /v1/audio/speech
Synthesize speech from a text input. The response body is raw audio — audio/mpeg for response_format: "mp3", audio/l16 (16-bit LE PCM @ 24 kHz) for "pcm".
// Request body (application/json)
type SpeechRequest = {
  /** Provider-prefixed model id, e.g. "elevenlabs/eleven_turbo_v2_5". */
  model: string;
  /** Text to synthesize. */
  input: string;
  /** Provider-local voice id. Use GET /v1/voices to discover. */
  voice: string;
  /** Output encoding. Defaults to "mp3". */
  response_format?: "mp3" | "pcm";
  /** Passthrough provider-specific options. */
  provider_options?: Record<string, unknown>;
};
GET /v1/voices
Return the voice catalog across every configured provider. Filter with query params:
// Query params
type VoicesQuery = {
  /** Comma-separated provider list, e.g. "elevenlabs,cartesia". */
  provider?: string;
  /** ISO language prefix (case-insensitive), e.g. "en" or "en-US". */
  language?: string;
  /** Exact gender label (case-insensitive), e.g. "female". */
  gender?: string;
};
GET /v1/providers
Return the catalog of routable providers and the models each exposes. Near-static — safe to cache for hours. For live availability use /v1/status.
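Because the catalog is near-static, a client can keep it in a small time-to-live cache instead of refetching it on every call. The following is a minimal sketch of that idea; the TtlCache class, the one-hour TTL in the usage note, and the injectable clock are assumptions made for illustration, and the loader callback stands in for whatever code performs GET /v1/providers.

```typescript
// Minimal time-to-live cache. The loader passed to get() stands in for a
// GET /v1/providers call; nothing here is part of the VoxRouter SDK.
class TtlCache<T> {
  private ttlMs: number;
  private now: () => number;
  private hasValue = false;
  private value: T | undefined = undefined;
  private expiresAt = 0;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, calling load() only when the entry is missing or stale.
  get(load: () => T): T {
    if (!this.hasValue || this.now() >= this.expiresAt) {
      this.value = load();
      this.hasValue = true;
      this.expiresAt = this.now() + this.ttlMs;
    }
    return this.value as T;
  }
}
```

With T set to a Promise of the parsed catalog and a TTL such as 60 * 60 * 1000 ms, concurrent callers share one in-flight request. Note that a rejected promise would also be cached until it expires, so a production version would likely evict on failure.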
GET /v1/status
Per-provider live health (available / degraded / unavailable) plus the reason a provider is not available (missing_api_key, circuit_open, circuit_half_open). Cheap to poll.
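One way a client might use these health values is to pick a provider from a preference list before building the model string. The sketch below assumes a response shape: the JSON layout of /v1/status is not described here, so the ProviderHealth record and the pickProvider helper are hypothetical, and only the health and reason literals come from the text above.

```typescript
// ASSUMPTION: the real /v1/status payload shape is not documented in this
// overview; this record type is illustrative only.
type Health = "available" | "degraded" | "unavailable";
type Reason = "missing_api_key" | "circuit_open" | "circuit_half_open";
type ProviderHealth = { health: Health; reason?: Reason };

// Returns the first preferred provider that is available, falling back to the
// first degraded one, or undefined when every preferred provider is unavailable.
function pickProvider(
  preferred: string[],
  statuses: Record<string, ProviderHealth>,
): string | undefined {
  const firstWith = (wanted: Health): string | undefined => {
    for (const provider of preferred) {
      const entry = statuses[provider];
      if (entry !== undefined && entry.health === wanted) return provider;
    }
    return undefined;
  };
  const available = firstWith("available");
  return available !== undefined ? available : firstWith("degraded");
}
```

Preferring an available provider over a degraded one is a policy choice for this sketch; a latency-sensitive client might instead skip degraded providers entirely.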
GET /v1/credits
Wallet snapshot for the authenticated key's account: balanceMicros (available credit) and reservedMicros (in-flight reservations). Both in USD micro-dollars (1_000_000 = $1).
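Since both fields are integer micro-dollars, converting them for display is simple arithmetic: divide by 1_000_000 for whole dollars and keep the remainder as six decimal places. A small sketch follows; formatMicros is a hypothetical helper name, and only the field names and the 1_000_000 scale come from the description above.

```typescript
const MICROS_PER_USD = 1_000_000; // 1_000_000 micro-dollars = $1

type CreditsSnapshot = { balanceMicros: number; reservedMicros: number };

// Formats an integer micro-dollar amount as dollars with six decimals,
// using integer division and remainder to avoid floating-point rounding.
function formatMicros(micros: number): string {
  const sign = micros < 0 ? "-" : "";
  const abs = Math.abs(micros);
  const dollars = Math.floor(abs / MICROS_PER_USD);
  const fraction = ("000000" + String(abs % MICROS_PER_USD)).slice(-6);
  return sign + "$" + dollars + "." + fraction;
}

// Renders both wallet fields for display.
function describeWallet(snapshot: CreditsSnapshot): string {
  return "available " + formatMicros(snapshot.balanceMicros) +
    ", reserved " + formatMicros(snapshot.reservedMicros);
}
```

For example, a balanceMicros of 12_345_678 renders as $12.345678 and a reservedMicros of 500 as $0.000500.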
GET /v1/credits/activity
Recent ledger entries for the wallet (newest first). Each row records a wallet mutation (top-up, reserve, commit, refund) with the signed microsDelta and resulting microsBalanceAfter.
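The two documented fields are enough to summarize a page of activity: summing microsDelta gives the net change, and subtracting the oldest row's delta from its microsBalanceAfter gives the balance before that page began. The sketch below assumes rows arrive newest first, as stated above; LedgerRow and summarizeActivity are illustrative names, and any other fields a row may carry are ignored.

```typescript
// Only the two fields named in the text are modeled here.
type LedgerRow = { microsDelta: number; microsBalanceAfter: number };

type ActivitySummary = { startMicros: number; netMicros: number; endMicros: number };

// Summarizes a newest-first list of ledger rows.
function summarizeActivity(rows: LedgerRow[]): ActivitySummary | undefined {
  const newest = rows[0];
  const oldest = rows[rows.length - 1];
  if (newest === undefined || oldest === undefined) return undefined; // empty page

  let netMicros = 0;
  for (const row of rows) netMicros += row.microsDelta;

  return {
    // Balance just before the oldest row was applied.
    startMicros: oldest.microsBalanceAfter - oldest.microsDelta,
    netMicros,
    // Balance after the most recent row.
    endMicros: newest.microsBalanceAfter,
  };
}
```

If every mutation between the oldest and newest row is present in the page, startMicros + netMicros should equal endMicros, which makes a cheap consistency check on the client side.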
Model strings
The model field always uses the "provider/model_id" shape. The part before the slash picks the provider; the part after is the provider-native model id (passed through unchanged).
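The split described above can be sketched in a few lines of client code. parseModel and withProvider are hypothetical helpers, and splitting at the first slash is an assumption made here so that a provider-native id containing further slashes would survive unchanged.

```typescript
type ModelRef = { provider: string; modelId: string };

// Splits "provider/model_id" at the FIRST slash (an assumption of this sketch).
function parseModel(model: string): ModelRef {
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error('expected "provider/model_id", got "' + model + '"');
  }
  return { provider: model.slice(0, slash), modelId: model.slice(slash + 1) };
}

// Builds a model string for a different provider and model id.
function withProvider(provider: string, modelId: string): string {
  return provider + "/" + modelId;
}
```

Switching providers then amounts to producing a different string for the model field, for instance withProvider("cartesia", "sonic-2"), while the rest of the request stays the same apart from the provider-local voice id.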
elevenlabs/eleven_turbo_v2_5
cartesia/sonic-2
openai/gpt-4o-mini-tts
Responses
Successful POST /v1/audio/speech returns the raw audio stream. The provider that served the request is in the X-VoxRouter-Provider response header. Successful GET /v1/voices returns a JSON object with a voices array.
// Voice catalog response
type VoicesResponse = {
  voices: Array<{
    id: string;
    provider: string;
    name: string;
    language: string;
    labels: Record<string, string>;
    preview_url?: string;
    model_compatibility: string[];
  }>;
};
Errors
Non-2xx responses return a JSON error body with a machine-readable error code and an optional human-readable details field. The first-party SDK surfaces these as VoxRouterError with .status, .code, and .details.
{
  "error": "invalid_model",
  "details": "unknown_provider: bad"
}
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_body | JSON body failed schema validation |
| 400 | invalid_model | Malformed model string or unknown provider |
| 401 | unauthorized | Missing or invalid API key |
| 402 | insufficient_credit | Wallet does not have enough credit to cover the estimated cost. Top up and retry. |
| 402 | spend_limit_exceeded | The API key tripped its per-key daily or monthly spend cap. |
| 429 | rate_limited | Per-key rate limit exceeded. Retry-After header indicates seconds to wait. |
| 429 | concurrency_limited | Too many in-flight requests for this key. Slots free on completion. |
| 500 | internal_error | Unexpected server error. |
| 502 | upstream_error | Provider returned an unrecoverable error after automatic retries. |
| 503 | provider_unavailable | Provider's circuit-breaker is open. Retry-After indicates expected reset. |
| 504 | upstream_error | Provider did not respond within the per-attempt deadline. |
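A client that does not use the first-party SDK can branch on the error code from the table above. The sketch below parses the error body and sorts codes into coarse actions; parseErrorBody, classifyError, and the action names are hypothetical, and which codes count as worth retrying is a policy assumption of this example rather than documented VoxRouter behavior.

```typescript
type ApiErrorBody = { error: string; details?: string };

type ErrorAction = "retry_later" | "fix_request" | "fix_account" | "give_up";

// Validates the documented { error, details? } shape of a non-2xx body.
function parseErrorBody(text: string): ApiErrorBody {
  const parsed = JSON.parse(text);
  if (typeof parsed !== "object" || parsed === null || typeof parsed.error !== "string") {
    throw new Error("not a VoxRouter error body");
  }
  return typeof parsed.details === "string"
    ? { error: parsed.error, details: parsed.details }
    : { error: parsed.error };
}

// Maps an error code to a coarse client-side action (policy chosen for this sketch).
function classifyError(body: ApiErrorBody): ErrorAction {
  switch (body.error) {
    case "rate_limited":
    case "concurrency_limited":
    case "provider_unavailable":
      return "retry_later"; // transient; the 429 and 503 rows mention waiting
    case "invalid_body":
    case "invalid_model":
      return "fix_request"; // the request itself must change
    case "unauthorized":
    case "insufficient_credit":
    case "spend_limit_exceeded":
      return "fix_account"; // key, wallet, or spend cap needs attention
    default:
      return "give_up"; // internal_error, upstream_error, or an unknown code
  }
}
```

Treating upstream_error as final follows the table's note that the router has already retried automatically; a caller with looser latency budgets might choose to retry it once more.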
Rate limits
Requests are rate-limited per API key. When you exceed the limit, the router returns 429 with {"error":"rate_limited"}. Retry with backoff. Concrete per-key limits are not yet published — reach out if you need a higher ceiling.
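"Retry with backoff" can be made concrete with a small delay calculator: honor a numeric Retry-After value when the response carries one, otherwise fall back to capped exponential backoff. This is a sketch only; retryDelayMs, the 500 ms base, and the 30 s cap are assumptions chosen for illustration, since concrete limits are not published.

```typescript
// Computes how long to wait before retry number `attempt` (0-based).
// retryAfter is the raw Retry-After header value, or null when absent.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs: number = 500, // assumed starting delay
  capMs: number = 30_000, // assumed upper bound
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    // Use the server's hint when it is a valid non-negative number of seconds.
    if (isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // Otherwise double the delay on every attempt, up to the cap.
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

With these defaults the fallback delays run 500 ms, 1 s, 2 s, 4 s and so on until they reach the cap. Adding random jitter to the fallback delay is a common refinement when many clients share one key.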
Streaming
POST /v1/audio/speech returns the audio body as a chunked HTTP response. In the SDK, use audio.speech.createRaw(…) to get the raw Response and read .body as a ReadableStream. With plain fetch, read response.body as a stream, or buffer the whole clip with response.blob(); see the Quickstart.
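Reading a chunked body usually means pulling from a stream reader until it reports done. The sketch below shows that loop and concatenates the chunks into one buffer; collectChunks and the ByteStream type are hypothetical, with ByteStream declared as a structural subset of ReadableStream so the example compiles without DOM typings. It is intended to accept the .body of a fetch Response, though that has not been checked against every runtime.

```typescript
// Structural subset of ReadableStream<Uint8Array> (illustrative types).
type ByteReader = { read(): Promise<{ done: boolean; value?: Uint8Array }> };
type ByteStream = { getReader(): ByteReader };

// Drains the stream, invoking onChunk as each piece arrives (e.g. to feed an
// audio player), and resolves with all bytes joined into one Uint8Array.
async function collectChunks(
  stream: ByteStream,
  onChunk?: (chunk: Uint8Array) => void,
): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;

  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    if (result.value !== undefined) {
      chunks.push(result.value);
      total += result.value.length;
      if (onChunk) onChunk(result.value);
    }
  }

  // Copy every chunk into a single contiguous buffer.
  const joined = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    joined.set(chunk, offset);
    offset += chunk.length;
  }
  return joined;
}
```

For low-latency playback the onChunk callback is the interesting part, since it fires before the full clip has arrived; the joined buffer is mainly useful when the audio is written to disk afterwards.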