// lux · voice & tts

A real voice conversation.
Not push-to-talk.

Lux uses WebRTC to have a live, back-and-forth voice conversation via OpenAI's Realtime API. When she responds, her voice comes from a custom TTS pipeline with four providers in priority order — from a GPU-accelerated cloud synthesizer down to the browser. All configurable per mode.

WebRTC realtime voice · 4-provider TTS stack · Chatterbox GPU synthesizer · Coqui self-hosted TTS · ElevenLabs API · Phone companion UI · Passive listening mode · Voice-triggered actions

Lux · Voice Interface · Active

// webrtc realtime · speaking

Lux is responding…

gpt-4o-realtime-preview · voice: marin

01 · Chatterbox Turbo · RunPod GPU · active

02 · Coqui TTS · self-hosted · fallback

03 · ElevenLabs · API · fallback

04 · Browser TTS · built-in Web Speech API · fallback

// webrtc realtime

Live voice — not a recording. Not push-to-talk. A real conversation.

Lux connects to OpenAI's Realtime API via WebRTC — the same technology that powers browser-based video calls. You speak, she listens, she responds with her voice. The exchange is live and bidirectional, with voice activity detection handling turn-taking automatically.
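The connection step above comes down to an SDP offer/answer exchange. Here is a minimal sketch of that handshake, not Lux's actual code: `PeerLike`, `PostOffer`, and `negotiate` are illustrative names. In a browser, `pc` would be an `RTCPeerConnection` with the microphone track attached, and `postOffer` would POST the offer SDP to the Realtime endpoint using a short-lived key issued by your server. Both are passed in here so the flow can be read and exercised without a browser.

```typescript
// Structural stand-ins for the browser types (assumption: intended to be
// compatible with RTCPeerConnection, but defined locally so this compiles anywhere).
type SdpType = "offer" | "answer" | "pranswer" | "rollback";
interface Sdp { type: SdpType; sdp?: string }

interface PeerLike {
  createOffer(): Promise<Sdp>;
  setLocalDescription(desc: Sdp): Promise<void>;
  setRemoteDescription(desc: Sdp): Promise<void>;
}

// Hypothetical transport: sends the offer SDP to the realtime service
// and resolves with the answer SDP as plain text.
type PostOffer = (offerSdp: string) => Promise<string>;

async function negotiate(pc: PeerLike, postOffer: PostOffer): Promise<string> {
  // 1. Describe what this client wants to send and receive.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  if (!offer.sdp) throw new Error("offer has no SDP");

  // 2. Exchange the offer for the remote side's answer.
  const answerSdp = await postOffer(offer.sdp);

  // 3. Apply the answer; after this, audio can flow in both directions.
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });
  return answerSdp;
}
```

Keeping the transport behind `postOffer` also keeps the API key out of the page: the browser only ever handles whatever short-lived credential the server hands it.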

Supported Realtime voices: alloy · echo · shimmer · marin · cedar. Default: marin. A passive listening mode lets her hear the room without interrupting — she stays aware without responding unless addressed.
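One way to express the voice list and passive mode as session configuration is sketched below. How Lux implements passive listening is not stated here. This sketch assumes it maps to server-side voice activity detection with automatic responses switched off. The event shape follows the beta-style `session.update` message and should be checked against the current Realtime API reference. `resolveVoice` and `buildSessionUpdate` are hypothetical helper names.

```typescript
// Voices and default taken from the list above.
const REALTIME_VOICES = ["alloy", "echo", "shimmer", "marin", "cedar"] as const;
type RealtimeVoice = (typeof REALTIME_VOICES)[number];
const DEFAULT_VOICE: RealtimeVoice = "marin";

// Fall back to the default when the requested voice is missing or unsupported.
function resolveVoice(requested?: string): RealtimeVoice {
  const match = REALTIME_VOICES.find((v) => v === requested);
  return match ?? DEFAULT_VOICE;
}

// Build the event a client would send over the data channel.
// Assumption: passive mode keeps VAD on but does not auto-create responses.
function buildSessionUpdate(requestedVoice: string | undefined, passive: boolean) {
  return {
    type: "session.update",
    session: {
      voice: resolveVoice(requestedVoice),
      turn_detection: { type: "server_vad", create_response: !passive },
    },
  };
}
```

With this shape, switching between active and passive listening is a single flag, and an unknown voice name degrades to `marin` instead of failing the session.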

WebRTC (browser-native) · OpenAI Realtime API · Voice activity detection · Passive listening mode · Phone companion UI · Conversational voice UI · Voice-triggered game start · Session setup via voice

// tts pipeline & capabilities

A custom voice pipeline with four providers in fallback order.

Lux's text-to-speech isn't hardwired to one service. The TTS Router picks the best available provider — highest quality first, browser TTS as the last-resort fallback.
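The fallback behaviour described above can be sketched as a loop over providers in priority order. This is an illustration under assumptions, not the TTS Router's real interface: `TtsProvider`, `TtsResult`, and `speakWithFallback` are made-up names, and the audio is represented as a plain string (for example a URL) to keep the sketch small.

```typescript
type ProviderName = "chatterbox" | "coqui" | "elevenlabs" | "browser";

interface TtsProvider {
  name: ProviderName;
  // Resolves with a handle to the audio; rejects if the provider is unavailable.
  synthesize(text: string): Promise<string>;
}

interface TtsResult {
  provider: ProviderName; // the provider that actually produced the audio
  audio: string;
  errors: string[];       // failures from higher-priority providers, in order
}

// Try each provider in the order given; the first success wins.
async function speakWithFallback(text: string, chain: TtsProvider[]): Promise<TtsResult> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      const audio = await provider.synthesize(text);
      return { provider: provider.name, audio, errors };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`all TTS providers failed: ${errors.join("; ")}`);
}
```

Passing the chain as an array means a per-mode configuration only has to reorder or trim that array. The collected `errors` make it visible when the router has quietly dropped to a lower tier.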

chatterbox turbo

GPU-accelerated synthesis

First priority in the TTS chain. Chatterbox Turbo runs on a RunPod GPU instance for fast, high-quality synthesis. It supports custom voice cloning via a voice_slug reference. Of the four providers, it comes closest to a natural-sounding custom voice without giving up speed.

coqui tts

Self-hosted voice synthesis

Second in the fallback chain. Coqui runs self-hosted — no per-call API cost, no external dependency. Supports both built-in speaker IDs and custom voice references. Falls back to ElevenLabs if unavailable.
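The two ways of choosing a voice mentioned above (a built-in speaker ID or a custom voice reference) fit naturally into a tagged union. The sketch below is illustrative only: `VoiceSelection`, `toVoiceParams`, and the parameter names `speaker_id` and `voice_ref` are assumptions, not the request format Lux or Coqui actually uses.

```typescript
// Exactly one of the two selection styles is allowed per request.
type VoiceSelection =
  | { kind: "speaker"; speakerId: string }    // built-in speaker
  | { kind: "reference"; voiceRef: string };  // custom voice reference

// Translate the selection into request parameters (hypothetical names).
function toVoiceParams(voice: VoiceSelection): Record<string, string> {
  switch (voice.kind) {
    case "speaker":
      return { speaker_id: voice.speakerId };
    case "reference":
      return { voice_ref: voice.voiceRef };
  }
}
```

Because the union is closed, the compiler rejects a request that sets both a speaker and a reference, or neither.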

elevenlabs

API-based voice quality

Third in the chain. ElevenLabs provides premium voice synthesis via API when the Chatterbox and Coqui tiers aren't available. Each call is billed, but the output quality is high for a fallback.

browser tts

Always-available fallback

The Web Speech API is the final fallback. It is built into the browser, so there is no per-call cost, no server to run, and little added latency. Quality is lower than the other providers, but Lux still has a voice when every external service is unreachable.
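A last-resort speaker built on the Web Speech API might look like the sketch below. `speechSynthesis` and `SpeechSynthesisUtterance` are the standard browser globals. The wrapper name `speakWithBrowserTts` and the default language are assumptions. The globals are read through `globalThis` so the function reports `false` instead of throwing where the API does not exist.

```typescript
// Speak text with the browser's built-in synthesizer.
// Returns false when the Web Speech API is not available in this environment.
function speakWithBrowserTts(text: string, lang: string = "en-US"): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
    return false;
  }
  const utterance = new g.SpeechSynthesisUtterance(text);
  utterance.lang = lang;
  g.speechSynthesis.cancel();         // drop anything still queued from an earlier reply
  g.speechSynthesis.speak(utterance); // queue the new utterance for playback
  return true;
}
```

The boolean return lets a caller tell "spoken by the browser" apart from "no voice available at all", which is the one case the fallback chain cannot recover from.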

voice context injection

Spoken responses feel different

When voice input is active, the system prompt gets an additional injection: "The user is speaking to you via voice. Respond conversationally. Keep responses concise and flowing." Lux adapts her response style for spoken delivery.
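The injection described above amounts to a conditional append. A minimal sketch follows: the injected sentence is quoted from the text above, while `buildSystemPrompt` and the blank-line separator are assumptions about how the pieces are joined.

```typescript
// Text quoted from the description above.
const VOICE_CONTEXT =
  "The user is speaking to you via voice. Respond conversationally. " +
  "Keep responses concise and flowing.";

// Append the voice guidance only while voice input is active.
function buildSystemPrompt(basePrompt: string, voiceActive: boolean): string {
  return voiceActive ? `${basePrompt}\n\n${VOICE_CONTEXT}` : basePrompt;
}
```

Because the base prompt is left untouched when voice is off, text chats keep their usual response style.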

phone companion

A dedicated voice UI mode

A separate phone companion interface (starkx-phone-companion.js) provides a focused voice interaction mode — designed for situations where the full chat UI isn't needed and voice is the primary interface.

Want a custom voice AI built into your platform?

Explore the full StarkX suite

Other builds from Stark Dev
