## Documentation Index
Fetch the complete documentation index at: https://docs.fluidinference.com/llms.txt
Use this file to discover all available pages before exploring further.
## When to Use
- Post-recording analysis (meetings, interviews) — Use the Offline pipeline. 15% DER, 122x real-time.
- Real-time “who’s speaking now” — Use Streaming diarization. 26% DER at 5s chunks. Use it only when you need live labels; offline is more accurate and still very fast.
- Simple 2-4 speaker conversations — Consider Sortformer. Single model, no clustering, 32% DER. Better in noisy environments but limited to 4 speakers max — does not work well with 5+ people or heavy crosstalk.
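The guidance above can be condensed into a small decision helper. This is an illustrative sketch only; the enum and function below are hypothetical and not part of the FluidAudio API.

```swift
// Hypothetical helper encoding the pipeline guidance above (not a FluidAudio API).
enum DiarizationPipeline { case offline, streaming, sortformer }

func recommendedPipeline(needsRealTime: Bool, noisy: Bool, speakerCount: Int) -> DiarizationPipeline {
    if needsRealTime { return .streaming }                  // live labels, ~26% DER
    if noisy && speakerCount <= 4 { return .sortformer }    // single model, no clustering
    return .offline                                         // best accuracy: ~15% DER, 122x real-time
}
```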
## Quick Start
```swift
import FluidAudio

// Download the Core ML models on first run; cached copies are reused afterward.
let models = try await DiarizerModels.downloadIfNeeded()
let diarizer = DiarizerManager()
diarizer.initialize(models: models)

// Convert the input file to the sample format the models expect.
let samples = try AudioConverter().resampleAudioFile(
    URL(fileURLWithPath: "meeting.wav")
)

let result = try diarizer.performCompleteDiarization(samples)
for segment in result.segments {
    print("Speaker \(segment.speakerId): \(segment.startTimeSeconds)s - \(segment.endTimeSeconds)s")
}
```
## Configuration
```swift
let config = DiarizerConfig(
    clusteringThreshold: 0.7,     // Speaker separation (0.0-1.0)
    minSpeechDuration: 1.0,       // Minimum segment duration (seconds)
    minSilenceGap: 0.5,           // Minimum silence between speakers
    minActiveFramesCount: 10.0,   // Minimum active frames
    debugMode: false
)
let diarizer = DiarizerManager(config: config)
```
## Known Speaker Recognition
Pre-load speaker profiles for identification:
```swift
// loadAudioFile(_:) stands in for your own audio-loading code.
let aliceAudio = loadAudioFile("alice_sample.wav")
let aliceEmbedding = try diarizer.extractEmbedding(aliceAudio)
let bobAudio = loadAudioFile("bob_sample.wav")
let bobEmbedding = try diarizer.extractEmbedding(bobAudio)

let alice = Speaker(id: "Alice", name: "Alice", currentEmbedding: aliceEmbedding)
let bob = Speaker(id: "Bob", name: "Bob", currentEmbedding: bobEmbedding)
diarizer.speakerManager.initializeKnownSpeakers([alice, bob])

// Matched segments are labeled "Alice" instead of "Speaker_1".
let result = try diarizer.performCompleteDiarization(audioSamples)
```
## Manual Model Loading
Stage Core ML bundles for offline deployment:
```swift
let basePath = "/opt/models/speaker-diarization-coreml"
let segmentation = URL(fileURLWithPath: basePath)
    .appendingPathComponent("pyannote_segmentation.mlmodelc")
let embedding = URL(fileURLWithPath: basePath)
    .appendingPathComponent("wespeaker_v2.mlmodelc")

let models = try await DiarizerModels.load(
    localSegmentationModel: segmentation,
    localEmbeddingModel: embedding
)
```
## Benchmarks
VoxConverse (232 clips, multi-speaker conversations):
| Pipeline | Audio Length | DER | RTFx |
|---|---|---|---|
| Offline (default) | 10s windows | 15.1% | 122x |
| Offline (max accuracy) | 10s windows | 13.9% | 65x |
| Streaming | 5s chunks | 26.2% | 223x |
| Sortformer | 30.4s chunks | 31.7% | 127x |
Device comparison (offline pipeline, default config):
| Device | RTFx |
|---|---|
| M2 MacBook Air | 150x |
| M1 iPad Pro | 120x |
| iPhone 14 Pro | 80x |
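RTFx is the real-time factor: audio duration divided by processing time. A quick arithmetic sketch of what the tables above imply:

```swift
// RTFx = audio duration / processing time.
let audioSeconds = 3600.0        // a one-hour recording
let rtfx = 122.0                 // offline pipeline, default config
let processingSeconds = audioSeconds / rtfx
// About 30 seconds of compute per hour of audio at 122x.
```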
## CLI
```bash
# Diarize a file, writing segments to JSON
swift run fluidaudio process meeting.wav --output results.json --threshold 0.6

# Run the diarization benchmark, downloading required assets automatically
swift run fluidaudio diarization-benchmark --auto-download
```