ASR Models
Batch Transcription
| Model | Description |
|---|---|
| Parakeet TDT v3 | 25 European languages, 0.6B params. Default ASR model. |
| Parakeet TDT v2 | English only, 0.6B params. Better English recall. |
TDT models process audio in chunks of roughly 15 seconds, with overlap between adjacent chunks. This is fast enough for dictation-style workflows.
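To make the chunking concrete, here is a minimal sketch of how overlapping chunk boundaries could be computed. The 16 kHz sample rate and the 2-second overlap are assumptions for illustration only, since the text above states only "~15s with overlap". The function name `chunk_bounds` is hypothetical and not part of any SDK.

```python
def chunk_bounds(total_samples, sample_rate=16_000, chunk_s=15.0, overlap_s=2.0):
    """Return (start, end) sample indices of overlapping chunks.

    sample_rate and overlap_s are illustrative assumptions, not documented values.
    """
    if total_samples <= 0:
        return []
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)  # hop between chunk starts
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")

    bounds = []
    start = 0
    while True:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))
        if end >= total_samples:  # last chunk reached the end of the audio
            break
        start += step
    return bounds


# 40 seconds of audio at the assumed 16 kHz rate
for start, end in chunk_bounds(40 * 16_000):
    print(start, end)
```

Under these assumptions each chunk shares its last 2 seconds with the start of the next one, so words cut at a chunk edge appear whole in at least one chunk.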
Streaming Transcription
| Model | Description |
|---|---|
| Parakeet EOU | 120M params. 160ms/320ms frames for real-time results with end-of-utterance detection. |
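As an illustration of what fixed-size streaming frames imply for a caller, the sketch below buffers incoming audio of arbitrary length and emits complete 160 ms or 320 ms frames. The 16 kHz sample rate is an assumption, and `FrameBuffer` is a hypothetical helper, not an API of the model or any SDK.

```python
class FrameBuffer:
    """Accumulate samples and emit fixed-length frames (hypothetical helper)."""

    def __init__(self, frame_ms=160, sample_rate=16_000):
        # sample_rate is an assumed value; 160 ms -> 2560 samples, 320 ms -> 5120
        self.frame_len = sample_rate * frame_ms // 1000
        self._pending = []

    def push(self, samples):
        """Add samples; return a list of any complete frames now available."""
        self._pending.extend(samples)
        frames = []
        while len(self._pending) >= self.frame_len:
            frames.append(self._pending[:self.frame_len])
            del self._pending[:self.frame_len]
        return frames


buf = FrameBuffer(frame_ms=160)
print(len(buf.push([0.0] * 4000)))  # number of complete frames so far
print(len(buf.push([0.0] * 4000)))
```

Leftover samples stay in the buffer until the next call, so the consumer always receives frames of exactly one size.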
Custom Vocabulary
| Model | Description |
|---|---|
| Parakeet CTC 110M | CTC-based keyword spotting that runs alongside the TDT model. |
| Parakeet CTC 0.6B | Larger CTC variant. |
VAD Models
| Model | Description |
|---|---|
| Silero VAD v6 | Voice activity detection on 256ms windows. |
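A rough sketch of how per-window VAD output could be turned into speech segments: each 256 ms window is assumed to yield one speech probability, and consecutive windows above a threshold are merged. The probability output, the 0.5 threshold, and the function name `speech_segments` are all assumptions for illustration, not documented behavior of the model.

```python
def speech_segments(probs, window_ms=256, threshold=0.5):
    """Merge consecutive above-threshold windows into (start_ms, end_ms) spans.

    probs: one assumed speech probability per 256 ms window.
    threshold: illustrative value, not a documented default.
    """
    segments = []
    start = None  # index of the first window in the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            segments.append((start * window_ms, i * window_ms))
            start = None
    if start is not None:  # audio ended while still in speech
        segments.append((start * window_ms, len(probs) * window_ms))
    return segments


print(speech_segments([0.1, 0.9, 0.8, 0.2, 0.7]))
```

Timestamps are kept in integer milliseconds so that window boundaries stay exact multiples of 256.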
Diarization Models
| Model | Description |
|---|---|
| Pyannote CoreML Pipeline | Segmentation + WeSpeaker embeddings. Online and offline modes. |
| Sortformer | End-to-end streaming diarization. Single neural network, 4 speaker slots. |
TTS Models
| Model | Description |
|---|---|
| Kokoro TTS | 82M params, 48 voices. Flow matching + Vocos vocoder. Requires espeak. |
| PocketTTS | 155M params. Autoregressive, no espeak dependency. |
HuggingFace Sources