Skip to main content

ASR Models

Batch Transcription

ModelDescription
Parakeet TDT v325 European languages, 0.6B params. Default ASR model.
Parakeet TDT v2English only, 0.6B params. Better English recall.
TDT models process audio in chunks (~15s with overlap). Fast enough for dictation-style workflows.

Streaming Transcription

ModelDescription
Parakeet EOU120M params. 160ms/320ms frames for real-time results with end-of-utterance detection.

Custom Vocabulary

ModelDescription
Parakeet CTC 110MCTC-based keyword spotting alongside TDT.
Parakeet CTC 0.6BLarger CTC variant.

VAD Models

ModelDescription
Silero VAD v6Voice activity detection on 256ms windows.

Diarization Models

ModelDescription
Pyannote CoreML PipelineSegmentation + WeSpeaker embeddings. Online and offline modes.
SortformerEnd-to-end streaming diarization. Single neural network, 4 speaker slots.

TTS Models

ModelDescription
Kokoro TTS82M params, 48 voices. Flow matching + Vocos vocoder. Requires espeak.
PocketTTS155M params. Autoregressive, no espeak dependency.

HuggingFace Sources