// lux · voice & tts

A real voice conversation.
Not push-to-talk.

Lux uses WebRTC to hold a live, back-and-forth voice conversation through OpenAI's Realtime API. When she responds, her voice comes from a custom TTS pipeline with four providers in priority order, from a GPU-accelerated cloud synthesizer down to the browser's built-in speech engine. The pipeline is configurable per mode.

WebRTC realtime voice · 4-provider TTS stack · Chatterbox GPU synthesizer · Coqui self-hosted TTS · ElevenLabs API · Phone companion UI · Passive listening mode · Voice-triggered actions
Lux · Voice Interface · Active
gpt-4o-realtime-preview · voice: marin
01 Chatterbox Turbo · RunPod GPU · active
02 Coqui TTS · self-hosted · fallback
03 ElevenLabs · API · fallback
04 Browser TTS · built-in Web Speech API · fallback
// webrtc realtime

Live voice — not a recording. Not push-to-talk. A real conversation.

Lux connects to OpenAI's Realtime API via WebRTC — the same technology that powers browser-based video calls. You speak, she listens, she responds with her voice. The exchange is live and bidirectional, with voice activity detection handling turn-taking automatically.
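
A minimal sketch of the browser side of that connection, assuming the SDP offer/answer flow from OpenAI's WebRTC guide for the preview Realtime models and an ephemeral key minted by your own backend. connectRealtime and its option names are hypothetical; the peer connection and fetch are injectable so the handshake can be exercised outside a browser.

```javascript
// Sketch: browser-side WebRTC handshake with OpenAI's Realtime API (preview-era flow).
// connectRealtime and its option names are hypothetical; ephemeralKey is assumed to be
// minted server-side so the real API key never reaches the browser.
async function connectRealtime({
  ephemeralKey,
  model = "gpt-4o-realtime-preview",
  micStream, // MediaStream with the user's microphone
  onRemoteAudio, // receives the MediaStream carrying Lux's voice
  PeerConnection = globalThis.RTCPeerConnection,
  fetchFn = globalThis.fetch,
}) {
  const pc = new PeerConnection();

  // Hand the model's audio track to the caller as soon as it arrives.
  pc.ontrack = (event) => onRemoteAudio(event.streams[0]);

  // Send the microphone to the model; voice activity detection on the server handles turn-taking.
  for (const track of micStream.getAudioTracks()) pc.addTrack(track, micStream);

  // JSON control events (session.update, transcripts, response.create) travel on this channel.
  const events = pc.createDataChannel("oai-events");

  // Standard SDP offer/answer exchange, with the answer fetched over HTTP.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const res = await fetchFn(
    `https://api.openai.com/v1/realtime?model=${encodeURIComponent(model)}`,
    {
      method: "POST",
      body: offer.sdp,
      headers: {
        Authorization: `Bearer ${ephemeralKey}`,
        "Content-Type": "application/sdp",
      },
    },
  );
  if (!res.ok) throw new Error(`SDP exchange failed: HTTP ${res.status}`);
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });

  return { pc, events };
}
```

In a browser, micStream would come from navigator.mediaDevices.getUserMedia({ audio: true }), and onRemoteAudio could assign the stream to an autoplaying audio element's srcObject. The endpoint path has changed between API versions, so it is worth checking OpenAI's current Realtime documentation before relying on it.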

Supported Realtime voices: alloy · echo · shimmer · marin · cedar. Default: marin. A passive listening mode lets her hear the room without interrupting — she stays aware without responding unless addressed.
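
A sketch of how the voice choice and passive mode could be configured over the data channel. It assumes the beta session.update shape, where turn_detection accepts a create_response flag, and whisper-1 for input transcription. How Lux actually implements passive listening isn't documented here, so the name check in isAddressed and all helper names are hypothetical.

```javascript
// Realtime voices listed on this page.
const REALTIME_VOICES = ["alloy", "echo", "shimmer", "marin", "cedar"];

// Build a session.update event for the "oai-events" data channel.
// Field names assume the beta (preview) Realtime session shape.
function buildSessionUpdate({ voice = "marin", passive = false } = {}) {
  if (!REALTIME_VOICES.includes(voice)) {
    throw new Error(`Unsupported Realtime voice: ${voice}`);
  }
  return {
    type: "session.update",
    session: {
      voice,
      // Transcribe user speech so the client can check whether Lux was addressed.
      input_audio_transcription: { model: "whisper-1" },
      turn_detection: {
        type: "server_vad",
        // Assumed passive mode: VAD still splits turns, but no reply is generated automatically.
        create_response: !passive,
      },
    },
  };
}

// Hypothetical addressing rule: the finished transcript must name Lux as a whole word.
function isAddressed(transcript, name = "lux") {
  return new RegExp(`\\b${name}\\b`, "i").test(transcript);
}

// In passive mode, request a reply only for transcripts that address Lux.
function handleServerEvent(event, send, { passive = false } = {}) {
  if (
    passive &&
    event.type === "conversation.item.input_audio_transcription.completed" &&
    isAddressed(event.transcript)
  ) {
    send({ type: "response.create" });
  }
}
```

Keeping turn detection on the server while suppressing automatic replies means Lux still "hears" every turn in the conversation history; the client only decides when she speaks.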

WebRTC (browser-native) · OpenAI Realtime API · Voice activity detection · Passive listening mode · Phone companion UI · Conversational voice UI · Voice-triggered game start · Session setup via voice
// tts pipeline & capabilities

A custom voice pipeline with four providers in fallback order.

Lux's text-to-speech isn't hardwired to one service. The TTS Router tries providers in priority order, highest quality first, and uses the first one that is available; browser TTS is the last-resort fallback.
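
A minimal sketch of that fallback loop, assuming each provider is an async function that returns audio or throws. The provider keys, the synthesize name, and the "offline" mode in MODE_ORDERS are hypothetical; the page says the pipeline is configurable per mode but doesn't list the modes.

```javascript
// Default priority order from this page, highest quality first.
const DEFAULT_ORDER = ["chatterbox", "coqui", "elevenlabs", "browser"];

// Hypothetical per-mode overrides; real mode names and orders are not documented here.
const MODE_ORDERS = {
  default: DEFAULT_ORDER,
  offline: ["coqui", "browser"],
};

// Try each provider in the mode's order and return the first successful result.
async function synthesize(text, providers, { mode = "default", modeOrders = MODE_ORDERS } = {}) {
  const order = modeOrders[mode] ?? DEFAULT_ORDER;
  const errors = [];
  for (const name of order) {
    const provider = providers[name];
    if (!provider) continue; // provider not configured in this deployment
    try {
      const audio = await provider(text);
      return { provider: name, audio };
    } catch (err) {
      // Record the failure and fall through to the next provider.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All TTS providers failed (${errors.join("; ")})`);
}
```

Returning the provider name alongside the audio lets the UI show which tier is currently active, as in the status panel above.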

chatterbox turbo

GPU-accelerated synthesis

First priority in the TTS chain. Chatterbox Turbo runs on a RunPod GPU instance for fast, high-quality synthesis, and supports custom voice cloning via a voice_slug reference. Of the four providers, it comes closest to a natural-sounding custom voice at speed.
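
A sketch of a Chatterbox adapter for the router. The page only says "RunPod GPU instance"; this assumes a RunPod serverless endpoint called through its /runsync route, which wraps handler arguments in an input object. The input field names and the audio_base64 output are hypothetical handler choices, as is the chatterboxSynthesize name.

```javascript
// Call a Chatterbox worker behind a RunPod serverless endpoint (assumed deployment).
async function chatterboxSynthesize(text, { endpointId, apiKey, voiceSlug, fetchFn = globalThis.fetch }) {
  const res = await fetchFn(`https://api.runpod.ai/v2/${endpointId}/runsync`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    // RunPod passes "input" to the worker handler; these field names are hypothetical.
    body: JSON.stringify({ input: { text, voice_slug: voiceSlug } }),
  });
  if (!res.ok) throw new Error(`RunPod HTTP ${res.status}`);
  const job = await res.json();
  // Anything other than COMPLETED (still queued, in progress, failed) is treated
  // as a failure so the router can fall back instead of waiting.
  if (job.status !== "COMPLETED") throw new Error(`Chatterbox job ${job.status}`);
  // Hypothetical output shape: base64-encoded audio returned by the worker.
  return job.output.audio_base64;
}
```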

coqui tts

Self-hosted voice synthesis

Second in the fallback chain. Coqui runs self-hosted — no per-call API cost, no external dependency. Supports both built-in speaker IDs and custom voice references. Falls back to ElevenLabs if unavailable.
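
A sketch of the Coqui adapter, assuming the stock Coqui tts-server, whose /api/tts route accepts text and speaker_id parameters and returns WAV audio. The speaker_wav parameter for custom voice references is an assumption about a custom wrapper around a cloning model, and both function names are hypothetical.

```javascript
// Build a request URL for a self-hosted Coqui TTS server.
function coquiRequestUrl(baseUrl, text, { speakerId, speakerWav } = {}) {
  const url = new URL("/api/tts", baseUrl);
  url.searchParams.set("text", text);
  if (speakerId) {
    // Built-in speaker of a multi-speaker model.
    url.searchParams.set("speaker_id", speakerId);
  } else if (speakerWav) {
    // Custom voice reference: hypothetical parameter for a cloning-capable wrapper.
    url.searchParams.set("speaker_wav", speakerWav);
  }
  return url.toString();
}

// Fetch WAV audio; network or HTTP errors propagate so the router moves on to ElevenLabs.
async function coquiSynthesize(text, { baseUrl, fetchFn = globalThis.fetch, ...voice }) {
  const res = await fetchFn(coquiRequestUrl(baseUrl, text, voice));
  if (!res.ok) throw new Error(`Coqui HTTP ${res.status}`);
  return res.arrayBuffer();
}
```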

elevenlabs

API-based voice quality

Third in the chain. ElevenLabs provides premium voice synthesis via API when the self-hosted options aren't available. Each call costs money, but output quality is exceptional for a fallback.
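
A sketch of the ElevenLabs adapter using its text-to-speech endpoint, which takes the voice ID in the path, an xi-api-key header, and a JSON body with text and model_id. The voice ID, the default model choice, and the elevenLabsSynthesize name are placeholders, not the project's configuration.

```javascript
// Call ElevenLabs text-to-speech; the voice ID and model are placeholder choices.
async function elevenLabsSynthesize(text, {
  apiKey,
  voiceId,
  modelId = "eleven_multilingual_v2",
  fetchFn = globalThis.fetch,
}) {
  const res = await fetchFn(
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  );
  // Quota, auth, or outage errors surface here and send the router on to browser TTS.
  if (!res.ok) throw new Error(`ElevenLabs HTTP ${res.status}`);
  return res.arrayBuffer();
}
```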

browser tts

Always-available fallback

The Web Speech API is the final fallback: no network round trip, no cost, no server required. Quality is lower than the other providers, but Lux always has a voice even when every external service is unreachable.
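
A sketch of the last-resort provider: wrapping speechSynthesis.speak in a promise so the router can await it like the network providers. speakInBrowser and its options are hypothetical names; the browser globals are injectable so the logic can run outside a browser.

```javascript
// Speak through the Web Speech API, resolving when playback ends.
// synth and Utterance default to the browser globals and are injectable for testing.
function speakInBrowser(text, {
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance,
  lang = "en-US",
  rate = 1,
} = {}) {
  return new Promise((resolve, reject) => {
    if (!synth || !Utterance) {
      reject(new Error("Web Speech API not available"));
      return;
    }
    const utterance = new Utterance(text);
    utterance.lang = lang;
    utterance.rate = rate;
    utterance.onend = () => resolve();
    // SpeechSynthesisErrorEvent carries a short error code such as "interrupted".
    utterance.onerror = (event) => reject(new Error(`Speech synthesis failed: ${event.error}`));
    synth.speak(utterance);
  });
}
```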

voice context injection

Spoken responses feel different

When voice input is active, the system prompt gets an additional injection: "The user is speaking to you via voice. Respond conversationally. Keep responses concise and flowing." Lux adapts her response style for spoken delivery.
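
A minimal sketch of that injection. The instruction text is quoted from this page; appending it after a blank line, and the buildSystemPrompt name, are assumptions about where and how it is added.

```javascript
// Instruction text quoted on this page.
const VOICE_CONTEXT =
  "The user is speaking to you via voice. Respond conversationally. Keep responses concise and flowing.";

// Append the voice instruction only when the current input arrived by voice.
// Placement after a blank line is an assumption.
function buildSystemPrompt(basePrompt, { voiceInput = false } = {}) {
  return voiceInput ? `${basePrompt}\n\n${VOICE_CONTEXT}` : basePrompt;
}
```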

phone companion

A dedicated voice UI mode

A separate phone companion interface (starkx-phone-companion.js) provides a focused voice interaction mode — designed for situations where the full chat UI isn't needed and voice is the primary interface.

// lux communicates with
This system connects to → 💬 Chat · 🖼 Images · 🧠 Memories · 🎮 Arcade

Want a custom voice AI built into your platform?

We built this entire voice pipeline from scratch. We can build the right version for your product.