Latency & performance

VoxRouter is designed to add as little latency as possible between your code and the provider. On many real-world paths, it is faster than calling the provider directly.

Alpha

Minimal overhead

Three design choices keep the routing cost low:

Edge deployment, colocated with providers. Every major TTS provider has a dominant inference region. We run where they run, so your request crosses one short hop to reach us and a second short hop to reach them.
Warm connection pools. Connections to every provider are kept open around the clock and already past authentication. The hot path never opens a fresh upstream socket.
Byte-for-byte forwarding. Your request body and the provider's audio response stream straight through. Nothing is buffered, re-encoded, or parsed beyond what's needed to pick the right provider. Throughput through VoxRouter is identical to the provider's.
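
The warm-pool idea can be sketched in a few lines. This is an illustration of the general technique, not VoxRouter's implementation: `WarmPool` and its `connect` callback are hypothetical names, and `connect` stands in for whatever opens and authenticates an upstream connection.

```python
import queue
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Keeps `size` connections open so the hot path rarely pays setup.

    Hypothetical sketch: `connect` is any callable that returns an
    already-open, already-authenticated connection object.
    """

    def __init__(self, connect: Callable[[], T], size: int) -> None:
        self._connect = connect
        self._idle: "queue.Queue[T]" = queue.Queue()
        # Pay the setup cost (handshake, auth) up front, before any request arrives.
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self) -> T:
        try:
            # Hot path: hand out a connection that is already open.
            return self._idle.get_nowait()
        except queue.Empty:
            # Pool exhausted: fall back to a fresh (cold) connection.
            return self._connect()

    def release(self, conn: T) -> None:
        # Return the connection so the next caller finds it warm.
        self._idle.put(conn)
```

A production pool would also need health checks and replacement of connections the upstream has closed; the sketch leaves those out to keep the hot path visible.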

The gateway itself adds roughly 1–2 ms of processing, and the extra network leg through our region adds a few milliseconds more. On a warm path, total added latency is under 10 ms. On a cold path, where your code would otherwise pay a fresh handshake to reach the provider, VoxRouter is typically faster than calling the provider directly.
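
The arithmetic behind the warm and cold cases can be written out as a small model. The sketch below is only an illustration: the `Path` type and every number in it are assumptions chosen to fall inside the ranges quoted on this page, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    setup_ms: float          # connection setup paid by the caller (DNS, TCP, TLS, ...)
    transit_ms: float        # network time once a connection exists
    gateway_ms: float = 0.0  # processing inside the gateway; zero when calling directly

    def total_ms(self) -> float:
        return self.setup_ms + self.transit_ms + self.gateway_ms


def added_latency_ms(direct: Path, routed: Path) -> float:
    """Positive means routing is slower than direct; negative means faster."""
    return routed.total_ms() - direct.total_ms()


# Warm path: the caller already holds an open connection either way (assumed values).
warm_direct = Path(setup_ms=0, transit_ms=40)
warm_routed = Path(setup_ms=0, transit_ms=46, gateway_ms=2)
print(added_latency_ms(warm_direct, warm_routed))   # 8.0

# Cold path: direct pays a full handshake to the provider; routed pays only
# a short-hop handshake to the gateway (assumed 20 ms).
cold_direct = Path(setup_ms=200, transit_ms=40)
cold_routed = Path(setup_ms=20, transit_ms=46, gateway_ms=2)
print(added_latency_ms(cold_direct, cold_routed))   # -172.0
```

With these assumed inputs the warm path costs 8 ms extra and the cold path saves 172 ms; substitute your own measurements to see where your workload lands.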

Faster than direct

Because our pools stay warm, VoxRouter is faster than a direct-to-provider call whenever the direct call would have paid connection setup. Three common cases:

Cold HTTP requests

A client that hasn't talked to the provider recently pays the full setup chain — DNS, TCP, TLS, and HTTP/2 negotiation — typically 100–300 ms before the request body even starts flowing. Through VoxRouter, that cost disappears: the client's hop to us is short, and our connection to the provider is already warm.

This wins wherever your code doesn't hold a persistent connection to the provider — processes that just woke up, browser tabs that just loaded, CLI tools that open a socket per command, SDKs instantiated per request.
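
To see what a cold connection costs from your own environment, you can time the setup phases yourself. The following is a minimal sketch using only the Python standard library; `time_connection_setup` is a hypothetical helper name, and it measures DNS, TCP, and TLS only (HTTP/2 negotiation is not timed separately).

```python
import socket
import ssl
import time
from typing import Dict


def time_connection_setup(host: str, port: int = 443, use_tls: bool = True,
                          timeout: float = 5.0) -> Dict[str, float]:
    """Return milliseconds spent in DNS lookup, TCP connect, and TLS handshake."""
    timings: Dict[str, float] = {}
    # Build the TLS context outside the timed region so loading CA
    # certificates does not count as handshake time.
    context = ssl.create_default_context() if use_tls else None

    start = time.perf_counter()
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    timings["dns_ms"] = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
        timings["tcp_ms"] = (time.perf_counter() - start) * 1000

        if context is not None:
            start = time.perf_counter()
            # wrap_socket completes the TLS handshake before returning.
            sock = context.wrap_socket(sock, server_hostname=host)
            timings["tls_ms"] = (time.perf_counter() - start) * 1000
    finally:
        sock.close()

    timings["total_ms"] = sum(timings.values())
    return timings
```

Call it with your provider's API hostname and compare `total_ms` with the same measurement against the gateway's hostname. Results vary from run to run with DNS caching and network conditions, so take several samples.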

New WebSocket sessions

Voice agents that open a fresh WebSocket for each session pay the full setup chain every time. The direct path is a TLS handshake, a WebSocket upgrade, then the provider's own auth and config exchange, often 150 ms before the first text byte reaches the inference engine.

VoxRouter holds pre-authenticated WebSockets to every provider. Your first frame flows immediately. For voice agents, this shaves roughly 150 ms off time-to-first-audio on every new session.

Serverless without keep-alive

Serverless runtimes rarely keep outbound connections alive across invocations. For workloads shaped like "function calls TTS once and returns audio," every invocation pays a full handshake in the direct path. Through us, the long leg to the provider stays warm on every call, so you save the difference between a cross-continent cold connection and a cross-city warm one — typically tens to low-hundreds of milliseconds per invocation.

Performance considerations

VoxRouter is not always free. Two cases where it isn't a win:

A long-lived server with its own pinned, kept-alive connection to a single provider already has the optimization. Calling through us adds a network hop, worth a handful of milliseconds.
Steady-state streaming inside an already-open WebSocket passes straight through us. We don't save you anything, and we don't cost you anything.
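
For reference, the first case (a long-lived process that pins one kept-alive connection) looks roughly like the sketch below. It uses Python's standard `http.client`; `PinnedClient` is a hypothetical name, and the host, path, and headers you pass are whatever your provider requires.

```python
import http.client
from typing import Dict, Optional


class PinnedClient:
    """Holds one kept-alive HTTP connection and reuses it for every request."""

    def __init__(self, host: str, port: int = 443, use_tls: bool = True) -> None:
        self._host, self._port, self._use_tls = host, port, use_tls
        self._conn: Optional[http.client.HTTPConnection] = None

    def _connection(self) -> http.client.HTTPConnection:
        # Create the connection object once; http.client opens the socket
        # lazily on the first request and keeps it open afterwards.
        if self._conn is None:
            cls = (http.client.HTTPSConnection if self._use_tls
                   else http.client.HTTPConnection)
            self._conn = cls(self._host, self._port, timeout=30)
        return self._conn

    def post(self, path: str, body: bytes, headers: Dict[str, str]) -> bytes:
        conn = self._connection()
        try:
            conn.request("POST", path, body=body, headers=headers)
            response = conn.getresponse()
            # The body must be read fully before the socket can be reused.
            return response.read()
        except (http.client.HTTPException, OSError):
            # Drop the broken socket so the next call reconnects.
            conn.close()
            self._conn = None
            raise
```

Only the first `post` pays connection setup; later calls reuse the same socket for as long as the server keeps it open. This object is not thread-safe, so a multi-threaded server would need one client per thread or a pool.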

For every other shape — serverless, fresh clients, new voice-agent sessions, multi-provider workloads — routing through VoxRouter is a wall-clock win, on top of giving you one API across every provider.