Social Stream Ninja - Local AI TTS Guide

Social Stream Ninja supports fully local AI text-to-speech — your chat messages are converted to voice entirely on your own computer, with no data sent to external servers and no API key required.

There are two approaches:

Path 1: Built-in TTS (No Setup)

Several high-quality AI voices are built directly into Social Stream Ninja. They run in your browser using WebAssembly or ONNX — no server, no Docker, no install.

Kokoro 82M (best quality)
Piper Neural TTS
Kitten TTS
eSpeak-NG

Just add a URL parameter and go.

Path 2: Self-Hosted Server (Docker Required)

Run a local TTS server on your machine and point Social Stream Ninja at it. This gives you more voice options, voice cloning, and server-side control.

Kokoro-FastAPI
openedai-speech (Piper)
kokoro-web

This path uses Social Stream's built-in OpenAI-compatible endpoint support.

Start with Path 1. The built-in Kokoro TTS rivals cloud services in quality, runs entirely in your browser, and works with OBS out of the box. Only move to Path 2 if you need more control or different models.

Where to Click in SSN

In the extension popup, open the TTS provider selector and choose Custom / Local TTS Endpoint. This reveals the OpenAI-compatible local endpoint fields and a link back to this guide.

SSN Popup - TTS Provider

TTS Provider: Custom / Local TTS Endpoint
API Key: Optional for local endpoints. Leave blank for localhost / 127.0.0.1.
Custom / Local API Endpoint: http://127.0.0.1:8124/v1/audio/speech for the SSN bridge, or the server endpoint from the setup section below.
Voice / Model / Format: Use the voice name from the local server. Use wav when a voice-cloning server returns WAV.
Test

About screenshots: the SSN field map above shows the local endpoint fields. Third-party server UIs change by project version, so their current screenshots and UI details are linked from each project repo near the relevant setup step.

Self-Hosted Flow

SSN treats a local/self-hosted TTS server like an OpenAI-compatible speech endpoint. The core flow is:

chat text -> SSN TTS request -> local endpoint or SSN bridge -> TTS server -> audio response -> SSN playback

Request Shape

For ttsprovider=customtts, localtts, or openai, SSN sends a JSON POST to the configured endpoint:

POST /v1/audio/speech
{
  "model": "tts-1",
  "input": "Chat message text",
  "voice": "af_bella",
  "response_format": "mp3",
  "speed": 1.0
}
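To try the same request outside of SSN, a small Python sketch like the one below can help confirm that a local server answers in the expected shape. The function names and the 60-second timeout are illustrative choices, not part of SSN, and the endpoint should be whichever local server you are running.

```python
import json
import urllib.request


def build_speech_request(endpoint, text, voice="af_bella", model="tts-1",
                         response_format="mp3", speed=1.0, api_key=None):
    """Build (but do not send) a POST in the OpenAI-compatible request shape."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers usually need no key, so this stays optional
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize(endpoint, text, out_path, **kwargs):
    """Send the request and save the returned audio bytes to out_path."""
    request = build_speech_request(endpoint, text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

For example, `synthesize("http://127.0.0.1:8880/v1/audio/speech", "Hello chat", "test.mp3")` should write a playable file if a Kokoro-FastAPI server is listening on that port. This is a command-line check only; it does not exercise browser CORS.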

CORS and the Bridge

If the request comes from Chrome or an OBS browser source, the local server must allow browser CORS requests. If it does not, run the SSN local TTS bridge and point SSN at http://127.0.0.1:8124/v1/audio/speech. The bridge adds browser-safe CORS headers and can translate GPT-SoVITS or F5-TTS wrapper requests.

Supported Audio Responses

Response	SSN support	Notes
Binary audio	Yes	Best option. Return `audio/mpeg`, `audio/wav`, `audio/ogg`, `audio/aac`, or another browser-playable audio type.
JSON with audio URL	Yes	SSN checks `url`, `audio_url`, `output_url`, nested `data.url`, and the first `data[]` item.
JSON with base64 audio	Yes	SSN checks `audio`, `audio_data`, `audioContent`, `b64_json`, nested `data` fields, and data URLs.
Raw PCM	Only if wrapped	Return PCM as a WAV file or base64 WAV. A browser audio element cannot reliably play raw PCM bytes directly.

Recommended formats: use mp3 for small files and broad browser support, wav for local cloning servers and bridge testing, and opus only when the server and browser both support it.
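If a server or wrapper only produces raw PCM, wrapping it in a WAV container before returning it is usually enough. The sketch below uses Python's standard `wave` module; the 24 kHz, mono, 16-bit defaults are assumptions and must match what your TTS engine actually outputs.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM samples in a WAV container.

    sample_width is in bytes per sample (2 means 16-bit audio).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

The returned bytes can be sent with an `audio/wav` content type, or base64-encoded into a JSON response.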

Streaming Audio

SSN does not currently do progressive playback for custom/local TTS endpoints. It waits for the response blob or JSON audio payload, then plays it. Some upstream servers expose streaming endpoints, but SSN's current OpenAI-compatible path buffers before playback.

Practical result: keep chat TTS snippets short. Streaming support would need a separate playback path using streamed WAV/MP3 chunks, MediaSource, WebCodecs, or a server-side mixer.

Path 1: Built-in TTS (Zero Setup)

These engines are bundled inside Social Stream Ninja and require no installation. They run in the browser using WebAssembly (WASM) or ONNX Runtime.

Provider	Quality	CPU Use	GPU/WebGPU	URL Parameter
Kokoro TTS	⭐⭐⭐⭐⭐ Excellent	Medium	Faster with GPU	`?ttsprovider=kokoro`
Piper TTS	⭐⭐⭐⭐ Very Good	Low	CPU only	`?ttsprovider=piper`
Kitten TTS	⭐⭐⭐ Good	Very Low	CPU only	`?ttsprovider=kitten`
eSpeak-NG	⭐⭐ Robotic	Minimal	CPU only	`?ttsprovider=espeak`

How to Enable

Add &ttsprovider= and &speech= to your Social Stream dock.html URL:

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=kokoro

Kokoro TTS Options

Kokoro has 26 built-in voices. Specify one with &voicekokoro=:

English female: af_bella, af_sarah, af_nicole, af_sky
English male: am_adam, am_michael
British female: bf_emma, bf_isabella
British male: bm_george, bm_lewis

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=kokoro&voicekokoro=af_bella&kokorospeed=1.1
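If you generate these URLs from a script (for example, to keep several scene presets), a helper along these lines may be convenient. `build_dock_url` is a hypothetical name; the parameter names are the ones used in this guide. Valueless flags such as `&simpletts` are not handled here.

```python
from urllib.parse import urlencode


def build_dock_url(base, session, provider, speech="en-US", **extra):
    """Assemble a dock.html URL with TTS parameters, URL-encoding each value."""
    params = {"session": session, "speech": speech, "ttsprovider": provider}
    params.update(extra)  # e.g. voicekokoro="af_bella", kokorospeed=1.1
    return f"{base}?{urlencode(params)}"


url = build_dock_url("dock.html", "YOUR_SESSION", "kokoro",
                     voicekokoro="af_bella", kokorospeed=1.1)
print(url)
```

Values that contain reserved characters, such as an endpoint URL, are percent-encoded by `urlencode`, which standard query-string parsing decodes again.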

Piper TTS Options

Specify a voice model with &pipervoice=:

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=piper&pipervoice=en_US-hfc_female-medium

Kitten TTS Options

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=kitten&kittenvoice=expr-voice-4-f

eSpeak-NG Options

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=espeak&espeakvoice=en&espeakspeed=175

First-time load: Kokoro and Piper download their model files on first use (~50–200 MB). This happens automatically in the background. Later loads are much faster because the files are cached in the browser.

OBS capture: All built-in TTS providers play audio directly through the browser. In OBS, add your dock.html as a Browser Source and enable "Control audio via OBS" — no virtual cables needed. See the OBS section below.

Browser and Desktop App Notes

The Chrome extension, OBS browser source, and standalone Social Stream Ninja desktop app all use the same dock.html URL parameters for TTS.

Surface	Local TTS behavior	Audio capture
Chrome extension / OBS browser source	Browser fetch requires CORS from the local server, unless you use the SSN bridge.	Use OBS Browser Source with "Control audio via OBS".
Standalone desktop app	Uses the same provider settings. The app's local file windows are less CORS constrained, but the bridge is still the safest path for servers that reject browser-style requests.	Capture desktop/app audio, or route the app to a virtual audio cable.
Built-in Kokoro in desktop app	The app can use its local `ninjafy.tts` path for Kokoro instead of relying only on browser model loading.	Audio plays from the app, so use desktop/app audio capture.

Packaging note: when shipping the standalone app, make sure the current social_stream files, thirdparty/ assets, and docs/ guide are included in the app's fallback resource bundle.

Path 2: Self-Hosted TTS Server

If you want more voice options, voice cloning, or a dedicated server you can reuse across tools, you can run a local TTS server. Social Stream Ninja connects to it using its built-in OpenAI-compatible TTS endpoint support. No API key is needed for local servers.

Requirements: Docker Desktop must be installed and running. Docker is free for personal use.

Three recommended options (see each project's repo for details): Kokoro-FastAPI, openedai-speech, and kokoro-web.

Server	Model	GPU	Disk	Default Port
Kokoro-FastAPI (recommended)	Kokoro 82M	Optional	~2 GB	8880
openedai-speech (Piper, lightweight)	Piper TTS	CPU only	<1 GB	8000
kokoro-web	Kokoro 82M	Optional	~2 GB	3000

Which Package Fits?

Package	Best benefit	Tradeoff
Built-in Kokoro	Best first choice: no server, strong quality, private, works in browser and desktop app.	No voice cloning.
Kokoro-FastAPI	OpenAI-compatible server, easy Docker setup, CPU or GPU, many Kokoro voices.	No real voice cloning; voice blending and custom voice features depend on the server build.
openedai-speech	Light OpenAI-compatible endpoint; Piper is CPU-friendly and XTTS adds cloning around a 4 GB VRAM target.	The repo says it is mostly obsolete, so treat it as useful but not future-proof.
Chatterbox servers	Voice cloning, web UI options, OpenAI-compatible APIs, long-text tooling.	CUDA/GPU support is smoother than CPU for some builds; setup varies by server fork.
GPT-SoVITS	Strong cloning/control with short references and transcript support.	Not OpenAI-compatible by default; use the SSN bridge mode.
F5-TTS	Natural zero-shot cloning with prompt WAV + transcript.	Official project is not a simple OpenAI endpoint; use a wrapper or bridge mode.
Qwen3-TTS	Modern cloning and voice-design features, including smaller 0.6B/1.7B models.	Library/demo first; needs a wrapper for SSN.
MisoTTS	High-end prompted speech generation.	Not a 6 GB VRAM local target; use remote/custom hosting if needed.

How Voice Cloning Works

Voice cloning is not a separate SSN mode. It is a feature inside some local TTS servers. SSN sends the chat text to a local endpoint; the server chooses the cloned voice from a saved reference audio file, a voice profile, or bridge configuration.

Typical Flow

1. Record a clean reference clip, usually 3 to 30 seconds of one speaker with little background noise.
2. Some engines also require the exact transcript of that reference clip.
3. The local server converts the reference into a speaker prompt, embedding, or voice profile.
4. SSN sends live chat text to the endpoint using ttsprovider=customtts.
5. The server returns a playable audio file, usually WAV or MP3, and SSN plays it in the dock/browser source.
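Before handing a reference clip to a cloning server, it can be worth checking its length. The sketch below assumes the clip is an uncompressed WAV file and uses the 3 to 30 second range mentioned above as default bounds; individual engines may want something narrower.

```python
import wave


def reference_clip_seconds(path_or_file):
    """Return the duration of a WAV clip in seconds."""
    with wave.open(path_or_file, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference_clip(path_or_file, min_seconds=3.0, max_seconds=30.0):
    """Return (is_within_range, duration_in_seconds) for a reference clip."""
    duration = reference_clip_seconds(path_or_file)
    return min_seconds <= duration <= max_seconds, duration
```

For example, `check_reference_clip(r"C:\voices\speaker.wav")` returns a pair such as `(True, 8.4)` for a clip of suitable length. It checks duration only, not noise or speaker count.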

Use consented voices only. Voice cloning can sound like a real person, so only use voices you own, have permission to use, or have clearly licensed for this purpose.

For 6 GB VRAM or less, target small zero-shot cloning models and OpenAI-compatible servers first. Bigger models can still work through the same SSN endpoint if the user hosts them elsewhere.

Projects: XTTS / openedai-speech, chatterbox-tts-api, Chatterbox-TTS-Server, GPT-SoVITS, F5-TTS, Qwen3-TTS, MisoTTS

Option	Voice cloning	6 GB VRAM fit	API path for SSN
Qwen3-TTS 0.6B Base	3-second reference audio	Likely	Use an OpenAI-compatible wrapper, then `ttsprovider=customtts`
XTTS-v2 / openedai-speech	Short WAV reference voices	Yes, about 4 GB reported by openedai-speech	`/v1/audio/speech`
Chatterbox Turbo / Server	Reference-audio cloning	Likely if using Turbo / small chunks	OpenAI-compatible server builds, or the bridge
GPT-SoVITS	5-second zero-shot, 1-minute few-shot	Likely with fp16 / lightweight install	Use `scripts/local-tts-bridge.cjs --mode gptsovits`
F5-TTS	Prompt WAV + transcript	Maybe; depends on build and vocoder	Use an OpenAI-compatible wrapper, or `--mode f5` for F5-TTS server wrappers
MisoTTS 8B	Prompted audio context	No; project recommends 24 GB VRAM	Remote/custom endpoint only

Best SSN target shape: accept POST /v1/audio/speech with { model, input, voice, response_format, speed } and return a playable audio file. That covers OpenAI, Coqui/XTTS, Kokoro wrappers, Qwen wrappers, and most proxy services.
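For engines that ship as a library or demo, a thin wrapper can expose that target shape. The following is a minimal sketch using only Python's standard library. `synthesize_wav` is a placeholder that returns silence and would be replaced by a real TTS call; the default port 8890 is arbitrary, and the permissive CORS headers are an assumption that suits local-only use.

```python
import io
import json
import wave
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def synthesize_wav(text, voice, speed):
    """Placeholder engine: returns silence sized to the text. Swap in a real TTS call."""
    sample_rate = 24000
    seconds = max(0.2, 0.05 * len(text) / max(speed, 0.1))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


class SpeechHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        # Allow browser callers such as an OBS browser source or the extension.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Content-Type, Authorization")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")

    def do_OPTIONS(self):  # CORS preflight
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_POST(self):
        if self.path != "/v1/audio/speech":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
            text = str(body["input"])
            voice = body.get("voice", "default")
            speed = float(body.get("speed", 1.0))
        except (ValueError, KeyError, TypeError, AttributeError):
            self.send_error(400, "Expected a JSON object with an 'input' field")
            return
        audio = synthesize_wav(text, voice, speed)
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, *args):
        pass  # keep the console quiet


def make_server(host="127.0.0.1", port=8890):
    """Create the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), SpeechHandler)
```

Running `make_server().serve_forever()` and pointing `openaiendpoint` at `http://127.0.0.1:8890/v1/audio/speech` with `openaiformat=wav` should then play the placeholder audio. The `model` and `response_format` fields are ignored here because the sketch always returns WAV.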

Computer Requirements

These are practical starting points, not hard guarantees. Model version, quantization, text length, Docker image, and background apps can change memory use.

Option	Minimum practical computer	Good target	Notes
System TTS / eSpeak	Any modern PC	Any PC	Fast, low quality, no cloning.
Built-in Kitten	Low-end CPU, 4 GB RAM	Modern laptop CPU, 8 GB RAM	Small ONNX model, fast startup.
Built-in Piper	Modern CPU, 4-8 GB RAM	Modern CPU, 8 GB RAM	Good low-resource neural voice option.
Built-in Kokoro	Modern CPU, 8 GB RAM	WebGPU-capable GPU or fast CPU, 8-16 GB RAM	Best zero-setup quality. First load downloads model assets.
Kokoro-FastAPI	CPU Docker host, 8 GB RAM	NVIDIA GPU optional, 8-16 GB RAM	Good local server when browser model loading is not ideal.
openedai-speech Piper	CPU, 4-8 GB RAM	CPU, 8 GB RAM	Light OpenAI-compatible server.
openedai-speech XTTS	NVIDIA GPU around 4 GB VRAM, 8-16 GB RAM	6 GB+ NVIDIA GPU, 16 GB RAM	Voice cloning path; CPU is possible but slow.
Chatterbox servers	CPU can work for some builds but is slow	6 GB+ NVIDIA GPU, 16 GB RAM	Use GPU when cloning or processing long text.
GPT-SoVITS / F5-TTS / Qwen3-TTS	CPU testing only, slow	6 GB+ NVIDIA GPU for smaller/optimized models, 16 GB RAM	Wrapper choice and model size matter. Expect more setup.
MisoTTS 8B	Not recommended locally at 6 GB VRAM	24 GB VRAM or remote host	The repo recommends high-VRAM GPUs for interactive use.

Tested Server Notes

These are the self-hosted voice-cloning targets checked for SSN compatibility. The local endpoint path was tested against both dock.html and featured.html.

SSN accepts direct binary audio responses, JSON responses with base64 audio, and JSON responses with an audio URL. Current custom/local playback buffers the returned audio before playing it; progressive streaming playback is not supported yet.
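When writing a wrapper, it can help to see how a client might pull audio out of a JSON response. The sketch below mirrors the field names listed under Supported Audio Responses, but it is an illustration only and not SSN's actual code; in particular, the order in which fields are checked is an assumption.

```python
import base64

URL_KEYS = ("url", "audio_url", "output_url")
BASE64_KEYS = ("audio", "audio_data", "audioContent", "b64_json")


def _decode(value):
    """Decode a base64 string or a base64 data: URL into raw bytes."""
    if value.startswith("data:"):
        value = value.partition(",")[2]  # drop the "data:audio/...;base64," prefix
    return base64.b64decode(value)


def extract_audio(payload):
    """Return ("url", str) or ("bytes", bytes) from a JSON TTS response, else None."""
    candidates = [payload]
    data = payload.get("data")
    if isinstance(data, dict):
        candidates.append(data)          # nested data.url or data.<base64 field>
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        candidates.append(data[0])       # first data[] item
    for item in candidates:
        for key in URL_KEYS:
            value = item.get(key)
            if isinstance(value, str) and value:
                if value.startswith("data:"):
                    return "bytes", _decode(value)
                return "url", value
        for key in BASE64_KEYS:
            value = item.get(key)
            if isinstance(value, str) and value:
                return "bytes", _decode(value)
    return None
```

A wrapper that returns any one of these fields consistently is easier to debug than one that mixes several.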

Server	SSN path	Notes
openedai-speech	Direct or bridge	OpenAI-compatible `/v1/audio/speech`. Piper mode was tested with real CPU synthesis from `dock.html` and `featured.html`, direct and through the bridge. If running from source on Windows, make sure the venv `Scripts` folder is on `PATH` so `piper.exe` and `ffmpeg.exe` can be found.
chatterbox-tts-api	Direct or bridge	OpenAI-compatible `/v1/audio/speech`. Uses configured reference audio for cloning. API shape was tested direct and through the bridge.
Chatterbox-TTS-Server	Direct or bridge	OpenAI-compatible endpoint and Web UI. Tested with real CPU synthesis using `Emily.wav` from `dock.html` and `featured.html`, direct and through the bridge.
GPT-SoVITS	Bridge mode	Run SSN bridge with `--mode gptsovits`; target server is `/tts`, not OpenAI-compatible.
F5-TTS_server	Bridge mode	Run SSN bridge with `--mode f5`; target server uses `GET /synthesize_speech/`.
F5-TTS official	Needs wrapper	CLI, Gradio, and socket server first. Use an OpenAI-compatible wrapper or the F5 bridge mode against a wrapper.
Qwen3-TTS	Needs wrapper	Library and Gradio demo first. Good candidate for a small OpenAI-compatible wrapper around `generate_voice_clone`.
MisoTTS	Remote/custom only	Voice cloning is supported, but the 8B model is not a 6 GB VRAM target and has no local REST endpoint in the repo.

Kokoro-FastAPI Setup

Kokoro-FastAPI runs the Kokoro 82M model as a local server with an OpenAI-compatible API. It works on CPU (no GPU required) and has excellent voice quality.

Install with Docker

Open a terminal (Command Prompt, PowerShell, or Terminal) and run one of the following:

CPU (works on any computer):

docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.2

GPU (NVIDIA only — faster synthesis):

docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.0post4

First run: Docker will download the image (~1.5–2 GB). This only happens once. After that, the server starts in a few seconds.

Verify It's Running

Open your browser and go to http://localhost:8880/web/ — you should see a web UI where you can test voices.

Available Voices

67+ voices available. A few highlights:

af_bella, af_sarah, af_nicole, af_sky, af_heart (American female)
am_adam, am_michael (American male)
bf_emma, bf_isabella (British female)
bm_george, bm_lewis (British male)

Browse and test all voices at http://localhost:8880/web/ once the server is running.

SSN URL

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=openai&openaiendpoint=http://localhost:8880/v1/audio/speech&voiceopenai=af_bella

Keep the Server Running

To keep Kokoro-FastAPI running automatically in the background, use Docker's restart flag:

docker run -d --restart unless-stopped -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.2

It will now start automatically with Docker Desktop on every reboot.

openedai-speech Setup (Lightweight Piper)

openedai-speech is the lightest option: a CPU-only Piper TTS server under 1 GB. It is a good fit for older or less powerful computers.

Install with Docker Compose

1. Clone the repository, or create a folder with docker-compose.min.yml from the repository. Alternatively, run the command below directly.

2. Run the minimal Piper-only image:

docker run -d --restart unless-stopped \
  -p 8000:8000 \
  ghcr.io/matatonic/openedai-speech-min

Windows Source Install Note

If you run openedai-speech from a local checkout instead of Docker, add its virtual environment scripts folder to PATH before starting the server. Without this, requests can return HTTP 500 because the server cannot find piper.exe or ffmpeg.exe.

cd openedai-speech
$env:Path = "$PWD\.venv\Scripts;$env:Path"
.\.venv\Scripts\python.exe speech.py --xtts_device none -H 127.0.0.1 -P 8000

Available Voices

openedai-speech uses OpenAI-style voice names mapped to Piper voices:

alloy, echo, fable, onyx, nova, shimmer

SSN URL

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=openai&openaiendpoint=http://localhost:8000/v1/audio/speech&voiceopenai=nova

Local TTS Bridge

If a local TTS server does not allow browser CORS requests, run SSN's Node bridge locally and point SSN at the bridge instead. The bridge returns the third-party audio response unchanged. The standalone starter folder is local-tts-bridge/; see the bridge README for the code and launch options.

OpenAI-Compatible Proxy

$env:SSN_TTS_TARGET="http://127.0.0.1:8000/v1/audio/speech"
npm run local-tts-bridge

Or run it from the bridge folder:

cd local-tts-bridge
node server.cjs

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=customtts&openaiendpoint=http://127.0.0.1:8124/v1/audio/speech&voiceopenai=nova

GPT-SoVITS Proxy Mode

GPT-SoVITS uses its own /tts JSON shape, so the bridge can translate SSN's OpenAI-compatible request into the GPT-SoVITS request body.

$env:SSN_TTS_REF_AUDIO_PATH="C:\voices\speaker.wav"
$env:SSN_TTS_REF_TEXT="Reference audio transcript here."
$env:SSN_TTS_TARGET="http://127.0.0.1:9880/tts"
npm run local-tts-bridge -- --mode gptsovits

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=customtts&openaiendpoint=http://127.0.0.1:8124/v1/audio/speech&openaiformat=wav

F5-TTS Server Proxy Mode

Some F5-TTS server wrappers expose /synthesize_speech/?text=...&voice=... instead of an OpenAI-compatible endpoint. The bridge can translate SSN's request into that query format.

$env:SSN_TTS_TARGET="http://127.0.0.1:7860/synthesize_speech/"
npm run local-tts-bridge -- --mode f5

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=customtts&openaiendpoint=http://127.0.0.1:8124/v1/audio/speech&voiceopenai=default_en&openaiformat=wav
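Conceptually, the F5 translation maps the JSON body onto query parameters. The sketch below illustrates that idea in Python; it is not the bridge's actual code. The `text` and `voice` parameter names come from the endpoint shape described above, while the function name and the fallback voice are assumptions.

```python
from urllib.parse import quote, urlencode


def to_f5_query_url(target, openai_request, default_voice="default_en"):
    """Turn an OpenAI-style speech request body into an F5-TTS_server style GET URL."""
    query = urlencode(
        {
            "text": openai_request["input"],
            "voice": openai_request.get("voice") or default_voice,
        },
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{target}?{query}"


example = to_f5_query_url(
    "http://127.0.0.1:7860/synthesize_speech/",
    {"model": "tts-1", "input": "Hello chat!", "voice": "default_en"},
)
print(example)
```

Fields with no obvious counterpart in the query format, such as `model` and `speed`, are simply dropped in this sketch.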

Bridge endpoint: http://127.0.0.1:8124/v1/audio/speech. Change the port with SSN_TTS_BRIDGE_PORT=8125 if needed.

Connecting to Social Stream Ninja

All self-hosted servers above use the same connection method: Social Stream's built-in OpenAI TTS endpoint support with a custom local URL.

URL Parameters

Parameter	Value	Description
`ttsprovider`	`customtts` or `openai`	Use the OpenAI-compatible TTS path. Use `customtts` for local/self-hosted endpoints.
`openaiendpoint`	`http://localhost:8880/v1/audio/speech`	Your local server URL (change port as needed)
`speech`	`en-US`	Enables TTS for English
`voiceopenai`	`af_bella`	Voice name (depends on server)
`openaiformat`	`mp3`	Audio format: mp3, wav, opus, flac
`openaispeed`	`1.0`	Speaking speed (0.5–2.0)

Endpoint aliases: customttsendpoint and localttsendpoint also work. customttsvoice, localttsvoice, customttsmodel, localttsmodel, customttsformat, and localttsformat are accepted aliases for the OpenAI-style fields.
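A quick sanity check on these values can catch typos before you paste a URL into OBS. The sketch below validates a dictionary of parameters against the ranges in the table above. The function name is hypothetical, and the defaults of `mp3` and `1.0` for missing values are assumptions taken from the example column.

```python
ALLOWED_PROVIDERS = {"customtts", "localtts", "openai"}
ALLOWED_FORMATS = {"mp3", "wav", "opus", "flac"}


def check_tts_params(params):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if params.get("ttsprovider") not in ALLOWED_PROVIDERS:
        problems.append("ttsprovider should be customtts, localtts, or openai")
    endpoint = params.get("openaiendpoint", "")
    if not endpoint.startswith(("http://", "https://")):
        problems.append("openaiendpoint should be a full http(s) URL")
    audio_format = params.get("openaiformat", "mp3")
    if audio_format not in ALLOWED_FORMATS:
        problems.append(f"openaiformat {audio_format!r} is not mp3, wav, opus, or flac")
    try:
        speed = float(params.get("openaispeed", 1.0))
    except (TypeError, ValueError):
        problems.append("openaispeed should be a number")
    else:
        if not 0.5 <= speed <= 2.0:
            problems.append("openaispeed should be between 0.5 and 2.0")
    return problems
```

This only checks the canonical parameter names; the aliases listed above would need to be mapped onto them first.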

Full Example URLs

Kokoro-FastAPI:

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=customtts&openaiendpoint=http://localhost:8880/v1/audio/speech&voiceopenai=af_bella&openaispeed=1.1

openedai-speech:

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=customtts&openaiendpoint=http://localhost:8000/v1/audio/speech&voiceopenai=nova

kokoro-web:

dock.html?session=YOUR_SESSION&speech=en-US&ttsprovider=customtts&openaiendpoint=http://localhost:3000/api/v1/audio/speech&voiceopenai=af_bella

Additional TTS Options

These work with any TTS provider, including local servers:

Parameter	Example	Description
`simpletts`	`&simpletts`	Skip "says" — reads message only
`simpletts2`	`&simpletts2`	Skip usernames entirely
`volume`	`&volume=0.8`	Volume level (0.0–1.0)
`skipmessages`	`&skipmessages=3`	Read every 3rd message only
`ttscommand`	`&ttscommand=!say`	Only read messages starting with !say
`readevents`	`&readevents`	Also read subscriptions, donations, etc.

No API key needed. When using a local server (non-openai.com URL), Social Stream Ninja sends the request without an Authorization header. You do not need to configure a key.

Built-in Browser Options Worth Supporting

SSN already supports OS/browser speechSynthesis, built-in Kokoro, Piper, Kitten, and eSpeak. The most useful future browser-side additions would be an audio output device picker where setSinkId is available, more Piper voice choices, and a dedicated progressive streaming playback path for servers that can stream audio chunks.

Getting Audio into OBS

How you capture TTS audio in OBS depends on how you're running Social Stream Ninja.

Method 1: OBS Browser Source (Recommended)

This is the simplest method and works for all TTS providers (built-in and self-hosted server).

1. In OBS, add a new Browser Source.
2. Set the URL to your dock.html URL with TTS parameters.
3. Check "Control audio via OBS" in the browser source settings.
4. Click OK. TTS audio will now appear as an OBS audio source you can adjust or route.
5. Click the browser source once in the preview to allow browser audio autoplay.

Why this works: Built-in TTS and self-hosted server TTS both play audio through the browser's audio context (not OS speech synthesis). OBS can capture browser audio directly when "Control audio via OBS" is checked.

Method 2 — SSN Desktop App + Desktop Audio

If you're using the Social Stream Ninja standalone desktop app (not an OBS browser source):

1. TTS audio plays through your system speakers/headphones from the app.
2. In OBS, add an Audio Input Capture or Desktop Audio Capture source.
3. If you want TTS isolated from other desktop audio, use a virtual audio cable:

Windows: VB-Audio Virtual Cable (free)
Set CABLE Input as the output for the SSN app in Windows Sound settings
Capture CABLE Output in OBS with Audio Input Capture

Windows Audio Routing Links

VB-Audio Virtual Cable
Voicemeeter
Audio Router (legacy)

Windows 10 Per-App Route

1. Open Sound Settings > App volume and device preferences.
2. Find the browser or SSN app in the app list.
3. Set Output to CABLE Input (VB-Audio Virtual Cable).
4. In OBS, add Audio Input Capture and choose CABLE Output.

Windows 11 Per-App Route

1. Open Settings > System > Sound > Volume Mixer.
2. Find the browser or SSN app.
3. Set Output device to CABLE Input (VB-Audio Virtual Cable).
4. In OBS, add Audio Input Capture and choose CABLE Output.

Audio Router Software

Audio Router can route one app to a virtual cable, but it is older software. Prefer the Windows per-app route when it works.

1. Install Audio Router.
2. Route the browser or SSN app to CABLE Input.
3. In OBS, capture CABLE Output.

Voicemeeter Advanced Route

Voicemeeter is best when you need to hear TTS locally, route it to OBS, and keep it separate from music/game audio.

1. Install Voicemeeter and set it as the Windows default output.
2. Set Hardware Out to your speakers/headphones.
3. Route the virtual output into OBS as an Audio Input Capture source.

System TTS (?speech=en-US without a provider) uses OS speech synthesis, which an OBS browser source cannot capture. Use one of the providers above (kokoro, piper, etc.) instead.

Comparison Table

Option	Setup	Quality	Private	OBS (Browser Source)	GPU Needed	Cost
Built-in Kokoro	None	⭐⭐⭐⭐⭐	Yes	Yes	No (faster with)	Free
Built-in Piper	None	⭐⭐⭐⭐	Yes	Yes	No	Free
Built-in Kitten	None	⭐⭐⭐	Yes	Yes	No	Free
Built-in eSpeak	None	⭐⭐	Yes	Yes	No	Free
Kokoro-FastAPI	Docker	⭐⭐⭐⭐⭐	Yes	Yes	No (optional)	Free
openedai-speech	Docker	⭐⭐⭐⭐	Yes	Yes	No	Free
ElevenLabs	API Key	⭐⭐⭐⭐⭐	No	Yes	No	Paid tiers
System TTS	None	⭐⭐	Yes	No*	No	Free

* System TTS requires virtual audio cable routing for OBS capture.

Troubleshooting

No audio playing

Click the page first. Browsers require user interaction before playing audio. Click anywhere on the dock.html page once.
Check that &speech=en-US is in your URL.
Check the browser console (F12) for error messages.

Kokoro / Piper takes a long time on first load

The model files are being downloaded (~50–200 MB). This only happens once — subsequent loads use the cached version. Wait for the first message before testing.

Local server not responding (self-hosted setup)

Confirm Docker Desktop is running and the container is started.
Open http://localhost:8880 (or your port) in a browser to confirm the server is up.
Check that no firewall is blocking the port.
If openedai-speech from source returns HTTP 500 on Windows, add .venv\Scripts to PATH and restart it.
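A quick way to tell "server not running" apart from "server running but misconfigured" is to test whether anything is listening on the port. The helper below is a small sketch using Python's standard `socket` module; the function name and the 2-second timeout are arbitrary choices.

```python
import socket


def port_is_open(host="127.0.0.1", port=8880, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_open(port=8880)` returns False, the container is probably not running or is mapped to a different port. If it returns True but SSN still gets no audio, look at CORS or the endpoint path instead.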

CORS error in browser console (extension mode)

If you're using the Social Stream browser extension (not the standalone app), your local server must allow cross-origin requests from the extension.

Most FastAPI-based servers (Kokoro-FastAPI, openedai-speech) allow all origins by default. If you see a CORS error:

Check the server's documentation for CORS configuration.
Run npm run local-tts-bridge and point SSN to http://127.0.0.1:8124/v1/audio/speech.
The SSN standalone app is less constrained by CORS, so it may work where the extension does not.

"http://localhost" blocked in OBS browser source

OBS browser sources can have trouble reaching localhost servers. Try:

Use http://127.0.0.1:8880/v1/audio/speech instead of localhost
Or use the SSN standalone app instead of OBS browser source

Audio plays but OBS doesn't capture it

Make sure "Control audio via OBS" is checked in the browser source properties.
In OBS Audio Mixer, look for the browser source — its meter should move when TTS plays.
Ensure you're not using System TTS (?speech=en-US without &ttsprovider=) — that bypasses browser audio.

Wrong voice / voice not found

Voice names are case-sensitive and must match what the server supports. Visit http://localhost:8880/web/ (Kokoro-FastAPI) to browse and test available voices.

Docker image not found

Image tags change with new releases. If the tag in this guide no longer works, check the project's GitHub page for the latest version tag.

Kokoro-FastAPI: github.com/remsky/Kokoro-FastAPI
openedai-speech: github.com/matatonic/openedai-speech
chatterbox-tts-api: github.com/travisvn/chatterbox-tts-api
Chatterbox-TTS-Server: github.com/devnen/Chatterbox-TTS-Server
GPT-SoVITS: github.com/RVC-Boss/GPT-SoVITS
F5-TTS: github.com/SWivid/F5-TTS
F5-TTS_server: github.com/ValyrianTech/F5-TTS_server
Qwen3-TTS: github.com/QwenLM/Qwen3-TTS
MisoTTS: github.com/MisoLabsAI/MisoTTS
