# Diarization Getting Started

> Speaker diarization — identify who spoke when in audio.

## When to Use

* **Post-recording analysis** (meetings, interviews) — Use the [Offline pipeline](/diarization/offline-pipeline). 15% DER, 122x real-time.
* **Real-time "who's speaking now"** — Use [Streaming diarization](/diarization/streaming). 26% DER at 5s chunks. Only use when you critically need real-time labels — offline is more accurate and still very fast.
* **Simple 2-4 speaker conversations** — Consider [Sortformer](/diarization/sortformer). Single model, no clustering, 32% DER. Better in noisy environments but limited to 4 speakers max — does not work well with 5+ people or heavy crosstalk.
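
The guidance above can be sketched as a small decision helper. This is only an illustration: `DiarizationPipeline` and `recommendPipeline` are hypothetical names, not part of FluidAudio, and the precedence between the rules (real-time first, then noisy small groups, otherwise offline) is an assumption about how to read the list.

```swift
// Hypothetical helper; these names are illustrative and not FluidAudio API.
enum DiarizationPipeline: String {
    case offline, streaming, sortformer
}

/// Picks a pipeline following the "When to Use" guidance.
/// Assumed precedence: real-time need > noisy audio with at most 4 speakers > offline default.
func recommendPipeline(needsRealTimeLabels: Bool,
                       expectedSpeakers: Int,
                       noisyEnvironment: Bool) -> DiarizationPipeline {
    if needsRealTimeLabels {
        return .streaming
    }
    if noisyEnvironment && expectedSpeakers <= 4 {
        return .sortformer
    }
    return .offline
}

let choice = recommendPipeline(needsRealTimeLabels: false,
                               expectedSpeakers: 6,
                               noisyEnvironment: true)
print("Recommended pipeline: \(choice.rawValue)")
```

With six expected speakers the helper falls back to the offline pipeline, since Sortformer is limited to four.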

## Quick Start

```swift
import FluidAudio

let models = try await DiarizerModels.downloadIfNeeded()
let diarizer = DiarizerManager()
diarizer.initialize(models: models)

let samples = try AudioConverter().resampleAudioFile(
    URL(fileURLWithPath: "meeting.wav")
)
let result = try diarizer.performCompleteDiarization(samples)

for segment in result.segments {
    print("Speaker \(segment.speakerId): \(segment.startTimeSeconds)s - \(segment.endTimeSeconds)s")
}
```
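
A common next step is to total the speaking time per speaker. The sketch below runs on its own, so it uses a stand-in `Segment` struct that mirrors only the three fields printed in the Quick Start; the field types (`String` and `Float`) are assumptions, and `talkTime(of:)` is a hypothetical helper rather than a library function.

```swift
// Stand-in for the library's segment type; only the fields used above are mirrored.
// Field types are assumed for illustration.
struct Segment {
    let speakerId: String
    let startTimeSeconds: Float
    let endTimeSeconds: Float
}

/// Sums the duration of every segment, grouped by speaker ID.
func talkTime(of segments: [Segment]) -> [String: Float] {
    var sums: [String: Float] = [:]
    for segment in segments {
        let duration = max(0, segment.endTimeSeconds - segment.startTimeSeconds)
        sums[segment.speakerId, default: 0] += duration
    }
    return sums
}

// Made-up segments standing in for `result.segments`.
let sampleSegments = [
    Segment(speakerId: "Speaker_1", startTimeSeconds: 0.0, endTimeSeconds: 4.5),
    Segment(speakerId: "Speaker_2", startTimeSeconds: 4.5, endTimeSeconds: 6.0),
    Segment(speakerId: "Speaker_1", startTimeSeconds: 6.0, endTimeSeconds: 8.0),
]

let totals = talkTime(of: sampleSegments)
for (speaker, seconds) in totals.sorted(by: { $0.key < $1.key }) {
    print("\(speaker): \(seconds)s")
}
```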

## Configuration

```swift
let config = DiarizerConfig(
    clusteringThreshold: 0.7,      // Speaker separation (0.0-1.0)
    minSpeechDuration: 1.0,         // Minimum segment duration (seconds)
    minSilenceGap: 0.5,             // Minimum silence between speakers (seconds)
    minActiveFramesCount: 10.0,     // Minimum active frames
    debugMode: false
)
let diarizer = DiarizerManager(config: config)
diarizer.initialize(models: models)  // models loaded as in Quick Start
```
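
To build intuition for the two duration parameters, here is a self-contained sketch of one plausible reading: same-speaker segments separated by less than `minSilenceGap` are merged, then anything shorter than `minSpeechDuration` is dropped. This page does not describe the library's actual post-processing, so treat the logic, the `Segment` stand-in, and the `postProcess` helper as assumptions.

```swift
// Stand-in segment type for illustration; not the library's type.
struct Segment {
    let speakerId: String
    let startTimeSeconds: Float
    let endTimeSeconds: Float
}

/// Assumed behavior: merge same-speaker segments whose gap is below `minSilenceGap`,
/// then drop segments shorter than `minSpeechDuration`.
func postProcess(_ segments: [Segment],
                 minSpeechDuration: Float,
                 minSilenceGap: Float) -> [Segment] {
    var merged: [Segment] = []
    for segment in segments.sorted(by: { $0.startTimeSeconds < $1.startTimeSeconds }) {
        if let last = merged.last,
           last.speakerId == segment.speakerId,
           segment.startTimeSeconds - last.endTimeSeconds < minSilenceGap {
            // Extend the previous segment instead of starting a new one.
            merged[merged.count - 1] = Segment(
                speakerId: last.speakerId,
                startTimeSeconds: last.startTimeSeconds,
                endTimeSeconds: max(last.endTimeSeconds, segment.endTimeSeconds)
            )
        } else {
            merged.append(segment)
        }
    }
    return merged.filter { $0.endTimeSeconds - $0.startTimeSeconds >= minSpeechDuration }
}

// Made-up input: two close "A" fragments, a short "B" interjection, and a later "A" turn.
let raw = [
    Segment(speakerId: "A", startTimeSeconds: 0.0, endTimeSeconds: 0.6),
    Segment(speakerId: "A", startTimeSeconds: 0.8, endTimeSeconds: 1.5),
    Segment(speakerId: "B", startTimeSeconds: 2.0, endTimeSeconds: 2.4),
    Segment(speakerId: "A", startTimeSeconds: 3.0, endTimeSeconds: 4.5),
]
let cleaned = postProcess(raw, minSpeechDuration: 1.0, minSilenceGap: 0.5)
for segment in cleaned {
    print("\(segment.speakerId): \(segment.startTimeSeconds)s - \(segment.endTimeSeconds)s")
}
```

Under these assumptions, raising `minSpeechDuration` removes short interjections, while raising `minSilenceGap` joins more fragments into longer turns.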

## Known Speaker Recognition

Pre-load speaker profiles for identification:

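```swift
let converter = AudioConverter()

// Enrollment clips, one per known speaker ("bob_sample.wav" is a placeholder filename)
let aliceAudio = try converter.resampleAudioFile(URL(fileURLWithPath: "alice_sample.wav"))
let aliceEmbedding = try diarizer.extractEmbedding(aliceAudio)
let bobAudio = try converter.resampleAudioFile(URL(fileURLWithPath: "bob_sample.wav"))
let bobEmbedding = try diarizer.extractEmbedding(bobAudio)

let alice = Speaker(id: "Alice", name: "Alice", currentEmbedding: aliceEmbedding)
let bob = Speaker(id: "Bob", name: "Bob", currentEmbedding: bobEmbedding)
diarizer.speakerManager.initializeKnownSpeakers([alice, bob])

// Segments matched to a known speaker are labeled "Alice" instead of "Speaker_1"
let result = try diarizer.performCompleteDiarization(samples)
```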
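
Conceptually, matching a segment to a known speaker means comparing embeddings. The sketch below assumes cosine distance and a fixed threshold, which is a common approach but is not confirmed by this page as the library's method. `cosineDistance` and `bestMatch` are hypothetical helpers, and the 3-dimensional vectors are toy values, not real speaker embeddings.

```swift
/// Cosine distance between two vectors: 0 means same direction, 1 means orthogonal.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count && !a.isEmpty, "Embeddings must have the same non-zero length")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? 1 - dot / denominator : 1
}

/// Returns the closest known speaker whose distance is below `threshold`, or nil if none qualifies.
func bestMatch(for embedding: [Float],
               among known: [String: [Float]],
               threshold: Float) -> String? {
    var best: (name: String, distance: Float)? = nil
    for (name, reference) in known {
        let distance = cosineDistance(embedding, reference)
        if distance < threshold && distance < (best?.distance ?? .infinity) {
            best = (name: name, distance: distance)
        }
    }
    return best?.name
}

// Toy embeddings for illustration only.
let knownSpeakers: [String: [Float]] = ["Alice": [1, 0, 0], "Bob": [0, 1, 0]]
let matched = bestMatch(for: [0.9, 0.1, 0], among: knownSpeakers, threshold: 0.7)
let unmatched = bestMatch(for: [0, 0, 1], among: knownSpeakers, threshold: 0.7)
print(matched ?? "no match", unmatched ?? "no match")
```

A vector close to Alice's reference matches her, while one orthogonal to both references matches nobody and would keep a generic label.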

## Manual Model Loading

For deployments without network access, stage the compiled Core ML bundles on disk and load them by path:

```swift
let basePath = "/opt/models/speaker-diarization-coreml"
let segmentation = URL(fileURLWithPath: basePath)
    .appendingPathComponent("pyannote_segmentation.mlmodelc")
let embedding = URL(fileURLWithPath: basePath)
    .appendingPathComponent("wespeaker_v2.mlmodelc")

let models = try await DiarizerModels.load(
    localSegmentationModel: segmentation,
    localEmbeddingModel: embedding
)
```
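
Before loading, it can help to confirm that both bundles were actually staged. The following is a minimal sketch using only Foundation; `missingBundles(in:expected:)` is a hypothetical helper, and the directory and file names are simply the ones from the example above.

```swift
import Foundation

/// Returns the names of expected model bundles that are not present in `directory`.
func missingBundles(in directory: URL, expected: [String]) -> [String] {
    return expected.filter { name in
        let path = directory.appendingPathComponent(name).path
        return !FileManager.default.fileExists(atPath: path)
    }
}

let requiredBundles = ["pyannote_segmentation.mlmodelc", "wespeaker_v2.mlmodelc"]
let modelDirectory = URL(fileURLWithPath: "/opt/models/speaker-diarization-coreml")

let missing = missingBundles(in: modelDirectory, expected: requiredBundles)
if missing.isEmpty {
    print("All model bundles present")
} else {
    print("Missing model bundles: \(missing.joined(separator: ", "))")
}
```

Failing early with the list of missing bundles tends to be easier to debug than an error raised later during model loading.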

## Benchmarks

[VoxConverse](https://www.robots.ox.ac.uk/~vgg/data/voxconverse/) (232 clips, multi-speaker conversations):

| Pipeline               | Window / Chunk Size | DER   | RTFx |
| ---------------------- | ------------------- | ----- | ---- |
| Offline (default)      | 10s windows         | 15.1% | 122x |
| Offline (max accuracy) | 10s windows         | 13.9% | 65x  |
| Streaming              | 5s chunks           | 26.2% | 223x |
| Sortformer             | 30.4s chunks        | 31.7% | 127x |
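
For readers new to the DER column: diarization error rate is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below shows that arithmetic with made-up durations; it is not the benchmark's scoring code, and `diarizationErrorRate` is a hypothetical helper.

```swift
/// Conventional DER: (missed + false alarm + confusion) / total reference speech.
/// All arguments are durations in seconds.
func diarizationErrorRate(missed: Double,
                          falseAlarm: Double,
                          confusion: Double,
                          totalSpeech: Double) -> Double {
    precondition(totalSpeech > 0, "Total reference speech must be positive")
    return (missed + falseAlarm + confusion) / totalSpeech
}

// Made-up example: 600 s of reference speech with 90 s of combined errors.
let der = diarizationErrorRate(missed: 30, falseAlarm: 24, confusion: 36, totalSpeech: 600)
print("DER: \(der * 100)%")
```

In this invented case, 90 seconds of errors over 600 seconds of speech gives a DER of 15%, roughly the level reported for the default offline pipeline.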

Device comparison (offline pipeline, default config):

| Device         | RTFx |
| -------------- | ---- |
| M2 MacBook Air | 150x |
| M1 iPad Pro    | 120x |
| iPhone 14 Pro  | 80x  |
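
RTFx is read here as audio duration divided by processing time, so higher is faster. Under that reading, the device figures above translate into wall-clock estimates as in this sketch; `processingSeconds` is a hypothetical helper, and real timings will vary with model loading and system load.

```swift
/// Estimated processing time, assuming RTFx = audio duration / processing time.
func processingSeconds(audioSeconds: Double, rtfx: Double) -> Double {
    precondition(rtfx > 0, "RTFx must be positive")
    return audioSeconds / rtfx
}

// RTFx values copied from the device comparison table.
let devices: [(name: String, rtfx: Double)] = [
    (name: "M2 MacBook Air", rtfx: 150),
    (name: "M1 iPad Pro", rtfx: 120),
    (name: "iPhone 14 Pro", rtfx: 80),
]

let oneHour = 3600.0
for device in devices {
    let seconds = processingSeconds(audioSeconds: oneHour, rtfx: device.rtfx)
    print("\(device.name): about \(seconds)s for a 1-hour recording")
}
```

A one-hour recording works out to roughly 24, 30, and 45 seconds on the three devices respectively.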

## CLI

```bash
swift run fluidaudio process meeting.wav --output results.json --threshold 0.6
swift run fluidaudio diarization-benchmark --auto-download
```
