Overview

FluidAudio TTS supports custom pronunciation dictionaries that override how specific words are pronounced. This is useful for domain-specific terminology, brand names, acronyms, and proper nouns.

Priority Order

  1. Per-word phonetic overrides — Inline markup like [word](/phonemes/)
  2. Custom lexicon — Your word=phonemes file entries
  3. Case-sensitive built-in lexicon
  4. Standard built-in lexicon
  5. Grapheme-to-phoneme (G2P) — eSpeak-NG fallback
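The priority order above amounts to a first-match-wins chain. A minimal sketch of that idea, assuming each source can be modeled as a closure returning an optional phoneme string; `PhonemeSource`, `resolvePhonemes`, the stand-in tables, and the placeholder strings are all hypothetical and are not part of the FluidAudio API:

```swift
// Hypothetical sketch of the lookup order; not FluidAudio's actual implementation.
typealias PhonemeSource = (String) -> String?

/// Returns the result of the first source that knows the word,
/// or the fallback's result when no source does.
func resolvePhonemes(
    for word: String,
    sources: [PhonemeSource],
    fallback: (String) -> String
) -> String {
    for source in sources {
        if let phonemes = source(word) { return phonemes }
    }
    return fallback(word)
}

// Stand-in tables. The angle-bracket strings are placeholders, not real phonemes.
let inlineOverrides: [String: String] = [:]
let customLexicon = ["NASDAQ": "nˈæzdæk"]
let caseSensitiveBuiltIn = ["NASDAQ": "<case-sensitive built-in>"]
let standardBuiltIn = ["nasdaq": "<standard built-in>", "index": "<standard built-in>"]

// Sources listed from highest to lowest priority.
let sources: [PhonemeSource] = [
    { inlineOverrides[$0] },
    { customLexicon[$0] },
    { caseSensitiveBuiltIn[$0] },
    { standardBuiltIn[$0.lowercased()] },
]

// Stand-in for the G2P step, which always produces some output.
let g2pFallback: (String) -> String = { "<G2P:\($0)>" }

let nasdaq = resolvePhonemes(for: "NASDAQ", sources: sources, fallback: g2pFallback)
let index = resolvePhonemes(for: "index", sources: sources, fallback: g2pFallback)
let unknown = resolvePhonemes(for: "Kubernetes", sources: sources, fallback: g2pFallback)
```

Here the custom lexicon entry for NASDAQ shadows both built-in entries, while a word found in no table falls through to the G2P stand-in.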

File Format

# This is a comment
kokoro=kəkˈɔɹO
NASDAQ=nˈæzdæk
UN=junˈaɪtᵻd nˈeɪʃənz

Each entry is a single word=phonemes line, and lines starting with # are comments. Phonemes are compact IPA strings. Use whitespace to separate words in multi-word expansions.
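
To make the format concrete, here is a minimal parsing sketch. It assumes that blank lines and lines starting with # are skipped and that each entry splits on the first = sign; `parseLexicon` is a hypothetical helper and is not how `TtsCustomLexicon.load(from:)` is necessarily implemented:

```swift
import Foundation

/// Hypothetical parser for `word=phonemes` lines (illustration only).
/// Skips blank lines, `#` comments, and lines without an `=`.
func parseLexicon(_ text: String) -> [String: String] {
    var entries: [String: String] = [:]
    for rawLine in text.split(whereSeparator: \.isNewline) {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        if line.isEmpty || line.hasPrefix("#") { continue }
        // Split on the first "=" only, so the phoneme side is kept intact.
        guard let eq = line.firstIndex(of: "=") else { continue }
        let word = line[..<eq].trimmingCharacters(in: .whitespaces)
        let phonemes = line[line.index(after: eq)...].trimmingCharacters(in: .whitespaces)
        if word.isEmpty || phonemes.isEmpty { continue }
        entries[word] = phonemes
    }
    return entries
}

let sample = """
# This is a comment
kokoro=kəkˈɔɹO
NASDAQ=nˈæzdæk
UN=junˈaɪtᵻd nˈeɪʃənz
"""

let entries = parseLexicon(sample)
// A multi-word expansion separates its words with whitespace.
let unWords = entries["UN", default: ""].split(separator: " ")
```

The comment line is dropped, the three entries are kept, and the UN expansion splits into two phoneme words.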

Word Matching

Three-tier strategy:
  1. Exact match: NASDAQ matches only NASDAQ
  2. Case-insensitive: nasdaq matches NASDAQ, Nasdaq
  3. Normalized: strips to letters/digits/apostrophes, lowercased
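
The three tiers can be sketched as successive lookups against a plain dictionary. This is an illustration under stated assumptions, not the library's code: `normalize` and `lookup` are hypothetical names, only the ASCII apostrophe is preserved, and a real implementation would likely precompute lowercase and normalized indexes rather than scan the table:

```swift
/// Hypothetical normalization: lowercase, then keep letters, digits, and apostrophes.
func normalize(_ word: String) -> String {
    word.lowercased().filter { $0.isLetter || $0.isNumber || $0 == "'" }
}

/// Hypothetical three-tier lookup: exact, then case-insensitive, then normalized.
func lookup(_ word: String, in lexicon: [String: String]) -> String? {
    // Tier 1: exact match on the key as written.
    if let hit = lexicon[word] { return hit }

    // Tier 2: case-insensitive match.
    // If several keys differ only by case, which one wins here is unspecified.
    let lowered = word.lowercased()
    if let hit = lexicon.first(where: { $0.key.lowercased() == lowered }) {
        return hit.value
    }

    // Tier 3: compare normalized forms, ignoring punctuation around the word.
    let normalized = normalize(word)
    if normalized.isEmpty { return nil }
    return lexicon.first(where: { normalize($0.key) == normalized })?.value
}

let lexicon = ["NASDAQ": "nˈæzdæk", "Kokoro": "kəkˈɔɹO"]

let exact = lookup("NASDAQ", in: lexicon)
let caseInsensitive = lookup("nasdaq", in: lexicon)
let punctuated = lookup("(Nasdaq),", in: lexicon)
let missing = lookup("Kubernetes", in: lexicon)
```

The first three lookups resolve to the same NASDAQ entry through tiers 1, 2, and 3 respectively, and the last returns nil so that a later stage can handle the word.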

Usage

CLI

swift run fluidaudio tts "The NASDAQ index rose today" \
  --lexicon custom.txt --output output.wav

Swift API

let lexicon = try TtsCustomLexicon.load(from: fileURL)

let manager = TtSManager(customLexicon: lexicon)
try await manager.initialize()
let audio = try await manager.synthesize(text: "Welcome to Kokoro TTS")

// Update at runtime
manager.setCustomLexicon(newLexicon)

Merging Lexicons

let combined = baseLexicon.merged(with: domainLexicon)
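
The text above does not say which side wins when both lexicons define the same word. The sketch below models lexicons as plain dictionaries and assumes that the argument's entries take precedence; that rule is an assumption for illustration, so check the behavior of `merged(with:)` before relying on it. The EBITDA placeholder value is not a real phoneme string:

```swift
// Stand-in lexicons as plain dictionaries (not the TtsCustomLexicon type).
let baseEntries = ["NASDAQ": "nˈæzdæk", "EBITDA": "<base placeholder>"]
let domainEntries = ["EBITDA": "iːbˈɪtdɑː", "NVIDIA": "ɛnvˈɪdiə"]

// Assumed conflict rule: on a duplicate key, keep the value from the second dictionary.
let combinedEntries = baseEntries.merging(domainEntries) { _, domainValue in domainValue }
```

Under that assumption the merged table holds three words, with the domain pronunciation of EBITDA replacing the base one.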

Example Lexicon

# Finance
NASDAQ=nˈæzdæk
EBITDA=iːbˈɪtdɑː

# Technology
NVIDIA=ɛnvˈɪdiə
Kubernetes=kuːbɚnˈɛtiːz

# Product Names
Kokoro=kəkˈɔɹO
FluidAudio=flˈuːɪd ˈɔːdioʊ