Streaming TTS Explained

Why your AI co-hosts sometimes respond instantly and sometimes take a few seconds. Learn how streaming TTS creates natural conversations.

Total time · 5 min
Difficulty · Beginner
HostFabric Control access

01 · Understand the two TTS methods

1 min

HostFabric uses two different audio playback systems depending on your TTS provider. Streaming TTS (ElevenLabs) starts playing as soon as the first audio arrives, while traditional TTS waits until the complete audio has been generated before playback begins.

Check your TTS provider in Settings → Voices
Look for the "streaming" indicator in debug mode
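
To make the difference concrete, here is a minimal sketch of the two playback strategies. It does not call any real provider and is not HostFabric's implementation: `generate_chunks`, `streaming_start_ms`, and `buffered_start_ms` are hypothetical names, and the chunk count and per-chunk timing are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Chunk:
    ready_at_ms: int  # simulated time at which this chunk finishes generating
    audio: bytes      # placeholder audio payload


def generate_chunks(n_chunks: int, ms_per_chunk: int) -> Iterator[Chunk]:
    """Hypothetical provider: emits one audio chunk every ms_per_chunk."""
    for i in range(n_chunks):
        yield Chunk(ready_at_ms=(i + 1) * ms_per_chunk, audio=b"\x00" * 4)


def streaming_start_ms(chunks: Iterator[Chunk]) -> int:
    """Streaming playback can begin once the first chunk is ready."""
    return next(chunks).ready_at_ms


def buffered_start_ms(chunks: Iterator[Chunk]) -> int:
    """Traditional playback begins only after the last chunk is ready."""
    last_ready = 0
    for chunk in chunks:
        last_ready = chunk.ready_at_ms
    return last_ready


if __name__ == "__main__":
    # Assumed numbers: 8 chunks, 400 ms of generation time each.
    print(streaming_start_ms(generate_chunks(8, 400)))  # 400
    print(buffered_start_ms(generate_chunks(8, 400)))   # 3200
```

With these assumed numbers, the same response becomes audible after 400 ms when streamed and after 3200 ms when buffered. The total generation time is identical; only the moment playback can start changes.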

02 · Experience streaming TTS

2 min

With ElevenLabs, responses start playing within 300-600 ms. This creates a natural conversation flow in which the AI can interrupt or respond quickly to live chat.

Switch to ElevenLabs TTS if you are not already using it
Send a short prompt and notice the fast response
Try interrupting with another prompt mid-response
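
One way to picture why a streamed response can be cut off mid-sentence: the player handles audio chunk by chunk, so it can check for a new prompt between chunks. The sketch below is an assumption about how such a loop might look, not HostFabric's actual code. `play_stream` and `response_with_interrupt` are hypothetical names, and text strings stand in for audio chunks.

```python
import threading
from typing import Iterable, Iterator, List, Optional


def play_stream(chunks: Iterable[str], interrupted: threading.Event) -> List[str]:
    """Play chunks in order, stopping as soon as an interrupt is flagged."""
    played: List[str] = []
    for chunk in chunks:
        if interrupted.is_set():
            break  # a new prompt arrived: drop the rest of this response
        played.append(chunk)
    return played


def response_with_interrupt(
    words: List[str],
    interrupted: threading.Event,
    interrupt_after: Optional[int],
) -> Iterator[str]:
    """Simulated response; flags an interrupt just before chunk interrupt_after."""
    for i, word in enumerate(words):
        if i == interrupt_after:
            interrupted.set()  # stands in for a second prompt being sent
        yield word


if __name__ == "__main__":
    flag = threading.Event()
    words = ["Hello", "chat", "welcome", "back"]
    print(play_stream(response_with_interrupt(words, flag, 2), flag))
    # ['Hello', 'chat']
```

A buffered player has no equivalent checkpoint while it waits: it holds nothing to play until the whole clip is generated, so there is no playback to cut short during that wait.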

03 · Compare with traditional TTS

1 min

Google TTS and OpenAI TTS use the traditional method, which adds a 2-4 second delay before audio starts. This works well for pre-recorded content but is slower for live interaction.

Switch to Google TTS to see the difference
Notice the longer wait before audio starts
See how the timing of responses feels more "robotic"

04 · Optimize for your streaming style

1 min

Choose ElevenLabs for live, interactive shows. Use Google/OpenAI TTS for tutorials, presentations, or when perfect lip-sync is critical.

Pick TTS based on your content type
Test both methods during rehearsal
Consider configuring a backup TTS provider
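
The selection rule above, including the backup, can be written down as a small helper. This is a sketch under stated assumptions: the provider identifiers (`"elevenlabs"`, `"google"`, `"openai"`), the content-type labels, and the functions `pick_tts` and `first_available` are all hypothetical and do not correspond to real HostFabric settings.

```python
from typing import List, Set

# Assumed content-type labels, for illustration only.
LIVE_TYPES = {"live", "interactive"}


def pick_tts(content_type: str, lip_sync_critical: bool = False) -> List[str]:
    """Return providers in preference order; later entries act as backups."""
    if content_type in LIVE_TYPES and not lip_sync_critical:
        # Live shows: prefer the streaming provider for fast responses.
        return ["elevenlabs", "google", "openai"]
    # Tutorials, presentations, or lip-sync-critical content:
    # prefer the traditional providers.
    return ["google", "openai", "elevenlabs"]


def first_available(preferences: List[str], available: Set[str]) -> str:
    """Pick the most preferred provider that is currently usable."""
    for provider in preferences:
        if provider in available:
            return provider
    raise LookupError("no configured TTS provider is available")


if __name__ == "__main__":
    print(pick_tts("live")[0])      # elevenlabs
    print(pick_tts("tutorial")[0])  # google
    # If the preferred provider is down, the backup is used instead.
    print(first_available(pick_tts("live"), {"google", "openai"}))  # google
```

Returning an ordered list rather than a single name keeps the backup choice explicit, so a rehearsal can test both the primary and the fallback path.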

Pro tip: Streaming TTS works best with ElevenLabs. If you're doing live conversations, it's worth the small premium for that natural, responsive feel.