Skip to main content

When to Use

  • Transcribing recordings or files — Use batch ASR (this page). 210x real-time, 2.5% WER on English.
  • Live captions while user speaks — Use Streaming ASR with Parakeet EOU.
  • Domain-specific terms keep getting wrong — Add Custom Vocabulary boosting (91.7% F-score on earnings calls).

Models

ModelLanguagesAudio LengthUse Case
Parakeet TDT v325 European~15s chunksDefault — multilingual
Parakeet TDT v2English only~15s chunksBest English accuracy
Parakeet EOUEnglish320ms chunksReal-time streaming

Batch Transcription

Real-time factor: ~120x on M4 Pro (1 minute of audio in ~0.5 seconds).
import FluidAudio

Task {
    let models = try await AsrModels.downloadAndLoad(version: .v3) // .v2 for English-only
    let asrManager = AsrManager(config: .default)
    try await asrManager.initialize(models: models)

    let samples = try AudioConverter().resampleAudioFile(
        path: "path/to/audio.wav"
    )
    let result = try await asrManager.transcribe(samples, source: .system)
    print("Transcription: \(result.text)")
    print("Confidence: \(result.confidence)")
}

Transcribing from a File URL

let audioURL = URL(fileURLWithPath: "/path/to/audio.wav")
let result = try await asrManager.transcribe(audioURL, source: .system)
print(result.text)
Do not parse WAV/PCM bytes by hand. Always convert with AudioConverter so differing bit depths, channel layouts, metadata chunks, or compressed formats get normalized to the 16 kHz mono Float32 tensors that Parakeet expects.

Choosing a Model Version

  • v2 — English only. Tighter vocabulary, better recall on long-form English audio.
  • v3 — 25 European languages. English accuracy is still strong, but the broader vocab slightly trails v2 on rare words.
Both share the same API surface—set AsrModelVersion in code or pass --model-version in the CLI.
let models = try await AsrModels.downloadAndLoad(version: .v2)

Benchmarks

LibriSpeech test-clean (2,620 files, 5.4h audio):
ModelWERRTFx
Parakeet TDT v32.5%156x
Parakeet TDT v22.1%146x
Parakeet EOU (320ms)4.9%12x
FLEURS (14,085 files, 44.9h audio, 25 languages):
ModelAvg WERRTFx
Parakeet TDT v314.7%210x
See full benchmarks for per-language breakdown.

CLI

# Transcribe (multilingual)
swift run fluidaudio transcribe audio.wav

# English-only (better recall)
swift run fluidaudio transcribe audio.wav --model-version v2

# Multiple files in parallel
swift run fluidaudio multi-stream audio1.wav audio2.wav

# Benchmark on LibriSpeech
swift run fluidaudio asr-benchmark --subset test-clean --max-files 50