# Introduction

> Local audio AI for Apple devices — speech-to-text, speaker diarization, voice activity detection, and text-to-speech on the Neural Engine.

FluidAudio is a Swift SDK for fully local, low-latency audio AI on Apple devices. All inference runs on the Apple Neural Engine (ANE), which keeps the CPU and GPU free for your app.

## At a Glance

| Capability                | Model             | Speed      | Accuracy                           | Languages         |
| ------------------------- | ----------------- | ---------- | ---------------------------------- | ----------------- |
| **Transcription**         | Parakeet TDT 0.6B | 210x RTFx  | 2.5% WER (en), 14.7% avg (25 lang) | 25 European       |
| **Streaming ASR**         | Parakeet EOU 120M | 12x RTFx   | 4.9% WER (en)                      | English           |
| **Speaker Diarization**   | Pyannote CoreML   | 122x RTFx  | 15% DER (offline)                  | Language-agnostic |
| **Streaming Diarization** | Sortformer        | 127x RTFx  | 31.7% DER                          | Language-agnostic |
| **Voice Activity**        | Silero VAD v6     | 1230x RTFx | 96% accuracy                       | Language-agnostic |
| **Text-to-Speech**        | Kokoro 82M        | 23x RTFx   | 48 voices                          | English           |
| **Text-to-Speech**        | PocketTTS 155M    | Streaming  | \~80ms first audio                 | English           |

All benchmarks were run on an M4 Pro. ASR is evaluated on [LibriSpeech](https://huggingface.co/datasets/openslr/librispeech_asr) / [FLEURS](https://huggingface.co/datasets/google/fleurs), diarization on [VoxConverse](https://www.robots.ox.ac.uk/~vgg/data/voxconverse/) / [AMI](https://groups.inf.ed.ac.uk/ami/corpus/), and VAD on [VOiCES](https://iqtlabs.github.io/voices/) / [MUSAN](https://www.openslr.org/17/). See [full benchmarks](/reference/benchmarks) for per-language breakdowns and device comparisons.
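
To read the Speed column: RTFx is the inverse real-time factor, that is, audio duration divided by processing time, so higher is faster. The sketch below turns the table's headline figures into rough wall-clock estimates. The function name `estimatedProcessingSeconds` is hypothetical and not part of FluidAudio, and the results only apply to hardware comparable to the benchmark machine.

```swift
import Foundation

/// Estimated wall-clock processing time for a clip at a given RTFx.
/// RTFx = audio duration / processing time, so time = duration / RTFx.
func estimatedProcessingSeconds(audioSeconds: Double, rtfx: Double) -> Double {
    precondition(rtfx > 0, "RTFx must be positive")
    return audioSeconds / rtfx
}

// One hour of audio at the RTFx figures quoted in the table above.
let oneHour = 3600.0
let batchASR = estimatedProcessingSeconds(audioSeconds: oneHour, rtfx: 210)     // roughly 17 s
let streamingASR = estimatedProcessingSeconds(audioSeconds: oneHour, rtfx: 12)  // 300 s

print(String(format: "Parakeet TDT: %.1f s, Parakeet EOU: %.1f s", batchASR, streamingASR))
```

Actual throughput will vary with device, thermal state, and clip length, so treat these numbers as order-of-magnitude guidance.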

## When to Use Which

### Transcription

| Need                                  | Use                               | Why                                             |
| ------------------------------------- | --------------------------------- | ----------------------------------------------- |
| Transcribe recordings/files           | **Parakeet TDT v3**               | Fastest, 25 languages, 210x real-time           |
| English-only, best accuracy           | **Parakeet TDT v2**               | 2.1% WER vs 2.5% for v3 on LibriSpeech          |
| Live captions as user speaks          | **Parakeet EOU**                  | 160ms chunks, end-of-utterance detection        |
| Domain-specific terms (names, jargon) | **TDT + CTC vocabulary boosting** | 99.3% precision, 85.2% recall on earnings calls |

### Speaker Diarization

| Need                           | Use                        | Why                                                  |
| ------------------------------ | -------------------------- | ---------------------------------------------------- |
| Best accuracy (post-recording) | **Offline pipeline** (VBx) | 15% DER, full pyannote-compatible pipeline           |
| Real-time "who's speaking now" | **Streaming pipeline**     | 26% DER at 5s chunks, speaker tracking across chunks |
| Simple 2-4 speaker meetings    | **Sortformer**             | Single model, no clustering, 32% DER                 |

### Voice Activity Detection

| Need                       | Use                      | Why                                           |
| -------------------------- | ------------------------ | --------------------------------------------- |
| Segment audio before ASR   | **Offline segmentation** | Clean segments with min/max duration controls |
| Real-time speech detection | **Streaming VAD**        | Per-chunk events with hysteresis              |
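
The "per-chunk events with hysteresis" idea can be illustrated with a generic two-threshold gate: speech starts only when the probability rises above a higher threshold, and ends only when it falls below a lower one, which avoids flickering around a single cutoff. This is a minimal sketch of the general technique, not FluidAudio's API. `HysteresisGate`, `SpeechEvent`, and the 0.6 / 0.4 thresholds are hypothetical names and values chosen for illustration.

```swift
/// Events a streaming detector might emit (hypothetical type, not FluidAudio's).
enum SpeechEvent: Equatable {
    case speechStart(chunk: Int)
    case speechEnd(chunk: Int)
}

/// Two-threshold (hysteresis) gate over per-chunk speech probabilities.
struct HysteresisGate {
    let onThreshold: Double   // probability required to enter the speech state
    let offThreshold: Double  // probability below which the speech state is left
    private(set) var isSpeaking = false
    private var chunkIndex = 0

    // Default thresholds are illustrative assumptions, not tuned values.
    init(onThreshold: Double = 0.6, offThreshold: Double = 0.4) {
        precondition(offThreshold <= onThreshold, "offThreshold must not exceed onThreshold")
        self.onThreshold = onThreshold
        self.offThreshold = offThreshold
    }

    /// Feeds one chunk's speech probability and returns an event on a state change.
    mutating func process(probability: Double) -> SpeechEvent? {
        defer { chunkIndex += 1 }
        if !isSpeaking && probability >= onThreshold {
            isSpeaking = true
            return .speechStart(chunk: chunkIndex)
        }
        if isSpeaking && probability < offThreshold {
            isSpeaking = false
            return .speechEnd(chunk: chunkIndex)
        }
        return nil
    }
}

// Probabilities 0.5 and 0.45 sit between the thresholds, so the gate stays in speech.
var gate = HysteresisGate()
let probabilities = [0.1, 0.7, 0.5, 0.45, 0.3, 0.2]
let events = probabilities.compactMap { gate.process(probability: $0) }
print(events)
```

With a single threshold at 0.5, the same input would end speech at chunk 3 instead of chunk 4; the gap between the two thresholds is what absorbs short dips.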

### Text-to-Speech

| Need                                 | Use           | Why                                         |
| ------------------------------------ | ------------- | ------------------------------------------- |
| Highest quality, full generation     | **Kokoro**    | 48 voices, SSML support, flow matching      |
| Streaming audio (start playing fast) | **PocketTTS** | \~80ms to first audio, no espeak dependency |

## Platform Support

| Platform                | Package                                                                                              |
| ----------------------- | ---------------------------------------------------------------------------------------------------- |
| **Swift (iOS / macOS)** | [FluidAudio](https://github.com/FluidInference/FluidAudio)                                           |
| **React Native / Expo** | [@fluidinference/react-native-fluidaudio](https://github.com/FluidInference/react-native-fluidaudio) |
| **Rust / Tauri**        | [fluidaudio-rs](https://github.com/FluidInference/fluidaudio-rs)                                     |

## Showcase

40+ apps use FluidAudio for local speech recognition, speaker diarization, and text-to-speech.

| App                                                                            | Description                                                                                                                                                                           |
| ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **[Voice Ink](https://tryvoiceink.com/)**                                      | Local AI for instant, private transcription with near-perfect accuracy. Uses Parakeet ASR.                                                                                            |
| **[Spokenly](https://spokenly.app/)**                                          | Mac dictation app for fast, accurate voice-to-text; supports real-time dictation and file transcription. Uses Parakeet ASR and speaker diarization.                                   |
| **[Slipbox](https://slipbox.ai/)**                                             | Privacy-first meeting assistant for real-time conversation intelligence. Uses Parakeet ASR (iOS) and speaker diarization across platforms.                                            |
| **[Talat](https://talat.app)**                                                 | Privacy-focused AI meeting notes app. Featured in [TechCrunch](https://techcrunch.com/2026/03/24/talats-ai-meeting-notes-stay-on-your-machine-not-in-the-cloud/). Uses Parakeet ASR.  |
| **[Paraspeech](https://paraspeech.com)**                                       | AI-powered voice-to-text. Fully offline. No subscriptions.                                                                                                                            |
| **[OpenOats](https://github.com/yazinsai/OpenOats)**                           | Open-source meeting note-taker that transcribes conversations in real time and surfaces relevant notes from your knowledge base.                                                      |
| **[Senko](https://github.com/narcotic-sh/senko)**                              | A very fast and accurate speaker diarization pipeline. A [good example](https://github.com/narcotic-sh/senko/commit/51dbd8bde764c3c6648dbbae57d6aff66c5ca15c) for Python integration. |
| **[macos-speech-server](https://github.com/dokterbob/macos-speech-server)**    | OpenAI-compatible STT/transcription and TTS/speech API server.                                                                                                                        |
| **[Whisper Mate](https://whisper.marksdo.com)**                                | Transcribes movies and audio locally; records and transcribes in real time from speakers or system apps. Uses speaker diarization.                                                    |
| **[BoltAI](https://boltai.com/)**                                              | Write content 10x faster using Parakeet models.                                                                                                                                       |
| **[Voxeoflow](https://www.voxeoflow.app)**                                     | Mac dictation app with real-time translation. Lightning-fast transcription in over 100 languages.                                                                                     |
| **[WhisKey](https://whiskey.asktobuild.app/)**                                 | Privacy-first voice dictation keyboard for iOS and macOS. On-device transcription with 12+ languages, AI meeting summaries, and mindmap generation.                                   |
| **[Summit AI Notes](https://summitnotes.app/)**                                | Local meeting transcription and summarization with speaker identification. Supports 100+ languages.                                                                                   |
| **[Snaply](https://snaply.ai)**                                                | Free, fast, 100% local AI dictation for Mac.                                                                                                                                          |
| **[Enconvo](https://enconvo.com)**                                             | AI Agent Launcher for macOS with voice input, live captions, and text-to-speech.                                                                                                      |
| **[Speakmac](https://speakmac.app)**                                           | Mac app that lets you type anywhere on your Mac using your voice. Fully local, private dictation built on FluidAudio.                                                                 |
| **[Starling](https://github.com/Ryandonofrio3/Starling)**                      | Open-source, fully local voice-to-text transcription with auto-paste at your cursor.                                                                                                  |
| **[Altic/Fluid Voice](https://github.com/altic-dev/Fluid-oss)**                | Lightweight, fully free and open-source voice-to-text dictation for macOS.                                                                                                            |
| **[SamScribe](https://github.com/Steven-Weng/SamScribe)**                      | Open-source macOS app that captures and transcribes audio from your microphone and meeting apps in real time.                                                                         |
| **[Dictate Anywhere](https://github.com/hoomanaskari/mac-dictate-anywhere)**   | Native macOS dictation app with global Fn key activation. Dictate into any app with 25-language support.                                                                              |
| **[Hex](https://github.com/kitlangton/Hex)**                                   | macOS app that lets you press-and-hold a hotkey to record your voice, transcribe it, and paste into any application.                                                                  |
| **[Super Voice Assistant](https://github.com/ykdojo/super-voice-assistant)**   | Open-source macOS voice assistant with local transcription.                                                                                                                           |
| **[VoiceTypr](https://github.com/moinulmoin/voicetypr)**                       | Open-source voice-to-text dictation for macOS and Windows.                                                                                                                            |
| **[Ora](https://futurelab.studio/ora)**                                        | Local voice assistant for macOS with speech recognition and text-to-speech.                                                                                                           |
| **[Flowstay](https://flowstay.app)**                                           | Easy text-to-speech, local post-processing and Claude Code integration for macOS. Free forever.                                                                                       |
| **[Meeting Transcriber](https://github.com/pasrom/meeting-transcriber)**       | macOS menu bar app that auto-detects, records, and transcribes meetings with dual-track speaker diarization.                                                                          |
| **[Hitoku Draft](https://hitoku.me/draft)**                                    | A local, private, voice writing assistant on your macOS menu bar.                                                                                                                     |
| **[Audite](https://github.com/zachatrocity/audite)**                           | macOS menu-bar app that records meetings and transcribes them locally into Markdown notes for Obsidian.                                                                               |
| **[Muesli](https://github.com/pHequals7/muesli)**                              | Native macOS dictation and meeting transcription with \~0.13s latency. Automatic speaker diarization.                                                                                 |
| **[NanoVoice](https://apps.apple.com/kz/app/nanovoice/id6760539688)**          | Free iOS voice keyboard for fast, private dictation in any app.                                                                                                                       |
| **[MiniWhisper](https://github.com/andyhtran/MiniWhisper)**                    | Open-source macOS menu bar app for quick local voice-to-text with minimal setup.                                                                                                      |
| **[Volocal](https://github.com/fikrikarim/volocal)**                           | Fully local voice AI on iOS. Uses streaming Parakeet EOU ASR and streaming PocketTTS.                                                                                                 |
| **[VivaDicta](https://github.com/n0an/VivaDicta)**                             | Open-source iOS voice-to-text app with system-wide AI voice keyboard. 15+ AI providers, 40+ AI presets.                                                                               |
| **[hongbomiao.com](https://github.com/hongbo-miao/hongbomiao.com)**            | A personal R\&D lab that facilitates knowledge sharing.                                                                                                                               |
| **[mac-whisper-speedtest](https://github.com/anvanvan/mac-whisper-speedtest)** | Comparison of local ASR engines, including one of the first versions of FluidAudio's ASR models.                                                                                      |

## Requirements

* macOS 14+ / iOS 17+
* Swift 5.10+
* Apple Silicon recommended
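
As a rough illustration, the requirements above could be expressed in a Swift Package Manager manifest like the one below. This is a sketch under stated assumptions: the package name `MyApp` is hypothetical, the `FluidAudio` product name is assumed to match the repository name, and the dependency tracks the `main` branch only because no release version is given here. Check the repository for the actual product name and a tagged release before relying on it.

```swift
// swift-tools-version: 5.10
// Package.swift for a hypothetical package named "MyApp".
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [
        .macOS(.v14),  // macOS 14+
        .iOS(.v17),    // iOS 17+
    ],
    dependencies: [
        // Tracks main for illustration only; pin a released version in practice.
        .package(url: "https://github.com/FluidInference/FluidAudio.git", branch: "main"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Assumption: the library product is named "FluidAudio".
                .product(name: "FluidAudio", package: "FluidAudio"),
            ]
        ),
    ]
)
```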

## Model Conversion

All FluidAudio models are converted through [möbius](https://github.com/FluidInference/mobius), our open-source model conversion framework. It handles export, numerical validation, and quantization for CoreML and other edge runtimes. See the [möbius docs](/mobius/getting-started) to convert your own models.
