FluidAudio is a Swift SDK for fully local, low-latency audio AI on Apple devices. All inference runs on the Apple Neural Engine (ANE), keeping CPU and GPU free for your app.

At a Glance

| Capability | Model | Speed | Accuracy | Languages |
| --- | --- | --- | --- | --- |
| Transcription | Parakeet TDT 0.6B | 210x RTFx | 2.5% WER (en), 14.7% avg (25 lang) | 25 European |
| Streaming ASR | Parakeet EOU 120M | 12x RTFx | 4.9% WER (en) | English |
| Speaker Diarization | Pyannote CoreML | 122x RTFx | 15% DER (offline) | Language-agnostic |
| Streaming Diarization | Sortformer | 127x RTFx | 31.7% DER | Language-agnostic |
| Voice Activity | Silero VAD v6 | 1230x RTFx | 96% accuracy | Language-agnostic |
| Text-to-Speech | Kokoro 82M | 23x RTFx | 48 voices | English |
| Text-to-Speech | PocketTTS 155M | Streaming | ~80ms first audio | English |
All benchmarks on M4 Pro. ASR on LibriSpeech / FLEURS, diarization on VoxConverse / AMI, VAD on VOiCES / MUSAN. See full benchmarks for per-language breakdowns and device comparisons.

When to Use Which

Transcription

| Need | Use | Why |
| --- | --- | --- |
| Transcribe recordings/files | Parakeet TDT v3 | Fastest, 25 languages, 210x real-time |
| English-only, best accuracy | Parakeet TDT v2 | 2.1% WER vs 2.5% on LibriSpeech |
| Live captions as user speaks | Parakeet EOU | 160ms chunks, end-of-utterance detection |
| Domain-specific terms (names, jargon) | TDT + CTC vocabulary boosting | 99.3% precision, 85.2% recall on earnings calls |
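Below is a minimal sketch of batch transcription with the multilingual Parakeet TDT model. The type and method names (`AsrModels.downloadAndLoad`, `AsrManager`, `transcribe`) follow the SDK's documented workflow but are assumptions here; check the API reference for the exact signatures.

```swift
import FluidAudio

// Sketch: transcribe a recording with Parakeet TDT.
// Names below are assumptions; verify against the FluidAudio API reference.
func transcribeFile(samples: [Float]) async throws -> String {
    // Download (first run) and load the CoreML models, then set up the manager.
    let models = try await AsrModels.downloadAndLoad()
    let asr = AsrManager(config: .default)
    try await asr.initialize(models: models)

    // `samples` is assumed to be 16 kHz mono Float32 PCM.
    let result = try await asr.transcribe(samples)
    return result.text
}
```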

Speaker Diarization

| Need | Use | Why |
| --- | --- | --- |
| Best accuracy (post-recording) | Offline pipeline (VBx) | 15% DER, full pyannote-compatible pipeline |
| Real-time “who’s speaking now” | Streaming pipeline | 26% DER at 5s chunks, speaker tracking across chunks |
| Simple 2-4 speaker meetings | Sortformer | Single model, no clustering, 32% DER |
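The sketch below shows the shape of the offline diarization pipeline. The names (`DiarizerModels`, `DiarizerManager`, `performCompleteDiarization`) are assumptions drawn from the documented workflow, not verified signatures; consult the diarization guide for the real API.

```swift
import FluidAudio

// Sketch: offline (post-recording) diarization.
// Type and method names are assumptions; see the diarization guide.
func diarize(samples: [Float]) async throws {
    let models = try await DiarizerModels.downloadIfNeeded()
    let diarizer = DiarizerManager()   // default config, offline (VBx) pipeline
    diarizer.initialize(models: models)

    let result = try diarizer.performCompleteDiarization(samples, sampleRate: 16000)
    for segment in result.segments {
        print("Speaker \(segment.speakerId): \(segment.startTimeSeconds)s – \(segment.endTimeSeconds)s")
    }
}
```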

Voice Activity Detection

| Need | Use | Why |
| --- | --- | --- |
| Segment audio before ASR | Offline segmentation | Clean segments with min/max duration controls |
| Real-time speech detection | Streaming VAD | Per-chunk events with hysteresis |
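A streaming VAD loop might look like the sketch below. The names (`VadManager`, `processChunk`, the event fields) are hypothetical placeholders; the VAD guide documents the actual calls and chunk sizes.

```swift
import FluidAudio

// Sketch: per-chunk streaming voice-activity detection with Silero VAD.
// All FluidAudio-specific names here are hypothetical placeholders.
func detectSpeech(chunks: [[Float]]) async throws {
    let vad = try await VadManager()          // loads the Silero VAD v6 model
    for chunk in chunks {                     // e.g. short fixed-size frames at 16 kHz
        let event = try await vad.processChunk(chunk)
        if event.isSpeech {
            print("speech detected (p = \(event.probability))")
        }
    }
}
```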

Text-to-Speech

| Need | Use | Why |
| --- | --- | --- |
| Highest quality, full generation | Kokoro | 48 voices, SSML support, flow matching |
| Streaming audio (start playing fast) | PocketTTS | ~80ms to first audio, no espeak dependency |
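A minimal Kokoro synthesis sketch is shown below. The names (`TtsManager`, `synthesize(text:voice:)`) and the voice identifier are assumptions for illustration only; the TTS guide lists the real API and available voices.

```swift
import FluidAudio

// Sketch: full (non-streaming) synthesis with Kokoro.
// FluidAudio-specific names are assumptions; see the TTS guide.
func speak(_ text: String) async throws -> Data {
    let tts = try await TtsManager()                           // loads the Kokoro 82M model
    let audio = try await tts.synthesize(text: text, voice: "af_heart") // example voice id
    return audio                                               // PCM/WAV data to play or save
}
```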

Platform Support

| Platform | Package |
| --- | --- |
| Swift (iOS / macOS) | FluidAudio |
| React Native / Expo | @fluidinference/react-native-fluidaudio |
| Rust / Tauri | fluidaudio-rs |

Requirements

  • macOS 14+ / iOS 17+
  • Swift 5.10+
  • Apple Silicon recommended
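
For Swift projects, the package is added through Swift Package Manager. The `Package.swift` below is a sketch matching the requirements above; the repository URL and version are assumptions, so confirm both against the installation docs.

```swift
// swift-tools-version: 5.10
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Repository URL and version are assumptions; check the installation docs.
        .package(url: "https://github.com/FluidInference/FluidAudio.git", from: "0.1.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [.product(name: "FluidAudio", package: "FluidAudio")]
        )
    ]
)
```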

Model Conversion

All FluidAudio models are converted through möbius, our open-source model conversion framework. It handles export, numerical validation, and quantization for CoreML and other edge runtimes. See the möbius docs to convert your own models.