# Offline Pipeline

> Full VBx batch diarization with pyannote-compatible pipeline.

## Overview

`OfflineDiarizerManager` provides the full pyannote Core ML pipeline (powerset segmentation + VBx clustering) for the highest-accuracy offline diarization. It is based on [pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1).

Requires macOS 14 / iOS 17 or later.

## Quick Start

```swift
import FluidAudio

let config = OfflineDiarizerConfig()
let manager = OfflineDiarizerManager(config: config)
try await manager.prepareModels()

let samples = try AudioConverter().resampleAudioFile(path: "meeting.wav")
let result = try await manager.process(audio: samples)

for segment in result.segments {
    print("\(segment.speakerId) \(segment.startTimeSeconds)s - \(segment.endTimeSeconds)s")
}
```
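
As a follow-up to the loop above, a common next step is to total the speaking time per speaker. The sketch below is self-contained and does not import FluidAudio: `Segment` is a hypothetical stand-in that mirrors only the three fields printed in the Quick Start, and the `Float` field types are an assumption.

```swift
// Hypothetical stand-in for the diarizer's segment type; only the three
// fields used in the Quick Start are mirrored, and Float is an assumption.
struct Segment {
    let speakerId: String
    let startTimeSeconds: Float
    let endTimeSeconds: Float
}

// Sum the duration of every segment, grouped by speaker ID.
func talkTime(per segments: [Segment]) -> [String: Float] {
    var totals: [String: Float] = [:]
    for segment in segments {
        // Guard against malformed segments whose end precedes their start.
        let duration = max(0, segment.endTimeSeconds - segment.startTimeSeconds)
        totals[segment.speakerId, default: 0] += duration
    }
    return totals
}

// Made-up example data, not real diarizer output.
let demoSegments = [
    Segment(speakerId: "S1", startTimeSeconds: 0.0, endTimeSeconds: 4.5),
    Segment(speakerId: "S2", startTimeSeconds: 4.5, endTimeSeconds: 6.0),
    Segment(speakerId: "S1", startTimeSeconds: 6.0, endTimeSeconds: 8.0),
]
let totals = talkTime(per: demoSegments)
for speaker in totals.keys.sorted() {
    print("\(speaker): \(totals[speaker]!)s")
}
```

With real output, the same function could be applied to `result.segments` after mapping them into whatever shape your code uses.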

### File-Based API

For large files, use memory-mapped streaming:

```swift
let url = URL(fileURLWithPath: "meeting.wav")
let result = try await manager.process(url)
```

## Pipeline Stages

1. **Segmentation** — 10s/160k sample chunks through Core ML segmentation (589 frame-level log probabilities)
2. **Binarization** — Log probabilities to soft VAD weights
3. **Weight Interpolation** — `scipy.ndimage.zoom`-compatible half-pixel mapping
4. **Embedding Extraction** — FBANK + embedding backend, L2-normalized 256-d embeddings
5. **VBx Clustering** — AHC warm start + PLDA + iterative VBx refinement
6. **Timeline Reconstruction** — Timestamps with minimum gap/duration constraints
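
To make the half-pixel mapping in stage 3 concrete, here is a minimal sketch of generic linear resampling with half-pixel centres. It is not FluidAudio's implementation: the function name `resampleHalfPixel` is hypothetical, and the boundary behaviour of `scipy.ndimage.zoom` depends on its `mode` and `grid_mode` options, which this sketch replaces with simple edge clamping.

```swift
// Resample `input` to `outputCount` values with linear interpolation.
// Half-pixel mapping: src = (dst + 0.5) * (inCount / outCount) - 0.5,
// so sample centres (not sample edges) are aligned between the two grids.
// Hypothetical helper for illustration only.
func resampleHalfPixel(_ input: [Float], to outputCount: Int) -> [Float] {
    guard !input.isEmpty, outputCount > 0 else { return [] }
    let scale = Float(input.count) / Float(outputCount)
    let lastIndex = input.count - 1
    return (0..<outputCount).map { dst -> Float in
        let src = (Float(dst) + 0.5) * scale - 0.5
        // Clamp to the valid range (edge replication at both ends).
        let clamped = min(max(src, 0), Float(lastIndex))
        let lower = Int(clamped.rounded(.down))
        let upper = min(lower + 1, lastIndex)
        let fraction = clamped - Float(lower)
        return input[lower] * (1 - fraction) + input[upper] * fraction
    }
}

// Upsampling two weights to four positions.
let upsampled = resampleHalfPixel([0, 1], to: 4)
print(upsampled)
```

For the input `[0, 1]` the four source coordinates are -0.25, 0.25, 0.75 and 1.25; the first and last are clamped to the edges, giving `[0.0, 0.25, 0.75, 1.0]`.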

## Configuration

`OfflineDiarizerConfig` groups knobs by pipeline stage:

* `segmentation` — Window length (10s), step ratio, min on/off durations
* `embedding` — Batch size, overlap handling
* `clustering` — VBx warm-start threshold, Fa/Fb priors
* `vbx` — Max iterations, convergence tolerance
* `postProcessing` — Minimum gap duration
* `export` — Optional `embeddingsPath` for JSON dump
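
To illustrate what the segmentation step ratio controls, the sketch below counts how many 10s windows cover a recording for a given ratio. It assumes 16 kHz audio (10s = 160,000 samples, matching the chunk size listed under Pipeline Stages) and that the hop between windows is the window length times the step ratio. The function `windowStarts` is hypothetical, and how the last partial window is actually padded or dropped is not covered here.

```swift
// Compute the start offset (in samples) of each sliding window.
// Assumption: hop = window length * step ratio; the final window is allowed
// to run past the end of the audio (padding is left to the caller).
func windowStarts(totalSamples: Int, windowSeconds: Double,
                  stepRatio: Double, sampleRate: Int) -> [Int] {
    guard totalSamples > 0 else { return [] }
    let window = Int(windowSeconds * Double(sampleRate))
    let step = max(1, Int((Double(window) * stepRatio).rounded()))
    var starts: [Int] = []
    var start = 0
    while start < totalSamples {
        starts.append(start)
        // Stop once the current window reaches the end of the audio.
        if start + window >= totalSamples { break }
        start += step
    }
    return starts
}

let sampleRate = 16_000                 // assumed input rate
let thirtySeconds = 30 * sampleRate     // 480,000 samples
let defaultWindows = windowStarts(totalSamples: thirtySeconds, windowSeconds: 10,
                                  stepRatio: 0.2, sampleRate: sampleRate)
let denseWindows = windowStarts(totalSamples: thirtySeconds, windowSeconds: 10,
                                stepRatio: 0.1, sampleRate: sampleRate)
print(defaultWindows.count, denseWindows.count)
```

For 30s of audio this yields 11 windows at step ratio 0.2 and 21 at step ratio 0.1, so halving the ratio roughly doubles the segmentation work.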

## Benchmarks

Results on [VoxConverse](https://www.robots.ox.ac.uk/~vgg/data/voxconverse/) (232 clips of multi-speaker conversation), with segmentation using 10s windows:

| Config                                         | Segmentation window | DER   | JER   | RTFx |
| ---------------------------------------------- | ------------------- | ----- | ----- | ---- |
| Step ratio 0.2, min duration 1.0s (default)    | 10s                 | 15.1% | 39.4% | 122x |
| Step ratio 0.1, min duration 0s (max accuracy) | 10s                 | 13.9% | 42.8% | 65x  |

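The default is \~2x faster (122x vs. 65x RTFx) at a cost of \~1.2 percentage points of DER (15.1% vs. 13.9%); its JER is lower (39.4% vs. 42.8%). Use step ratio 0.1 when the lowest DER is critical.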

For reference, pyannote community-1 runs at 1.5-2x RTFx on CPU and 20-25x RTFx on MPS. FluidAudio on the Apple Neural Engine (ANE) runs at 65-122x RTFx.
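
To put these figures in wall-clock terms, the sketch below converts RTFx into processing time, taking RTFx in its usual sense of audio duration divided by processing time. The helper `processingSeconds` is hypothetical, and actual timings will vary with hardware and audio content.

```swift
import Foundation

// RTFx = audio duration / processing time,
// so processing time = audio duration / RTFx.
func processingSeconds(audioSeconds: Double, rtfx: Double) -> Double {
    audioSeconds / rtfx
}

let oneHour = 3600.0
// RTFx values taken from the benchmark table.
let configs: [(label: String, rtfx: Double)] = [
    ("default (122x)", 122.0),
    ("max accuracy (65x)", 65.0),
]
for config in configs {
    let seconds = processingSeconds(audioSeconds: oneHour, rtfx: config.rtfx)
    print("\(config.label): \(String(format: "%.1f", seconds))s per hour of audio")
}
```

At these rates, one hour of audio takes about 29.5s with the default configuration and about 55.4s with the max-accuracy one.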

## CLI

```bash
# Process a single file
swift run fluidaudio process meeting.wav --mode offline --threshold 0.6

# Benchmark on AMI dataset
swift run fluidaudio diarization-benchmark --mode offline \
  --dataset ami-sdm --threshold 0.6 --auto-download

# With ground-truth RTTM
swift run fluidaudio process meeting.wav --mode offline \
  --rttm ground_truth.rttm
```
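
For readers unfamiliar with RTTM, the sketch below formats a segment as a line in the standard ten-field `SPEAKER` layout (file ID, channel, onset, duration, speaker name, with `<NA>` in unused fields). It is an illustration of the format only: `Segment` and `rttmLine` are hypothetical, and whether the CLI requires exactly this layout or precision should be checked against your ground-truth files.

```swift
import Foundation

// Hypothetical stand-in mirroring the segment fields shown in the Quick Start.
struct Segment {
    let speakerId: String
    let startTimeSeconds: Float
    let endTimeSeconds: Float
}

// Format one segment as an RTTM "SPEAKER" line:
// SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
func rttmLine(for segment: Segment, fileId: String) -> String {
    let onset = String(format: "%.3f", Double(segment.startTimeSeconds))
    let duration = String(format: "%.3f",
                          Double(segment.endTimeSeconds - segment.startTimeSeconds))
    return "SPEAKER \(fileId) 1 \(onset) \(duration) <NA> <NA> \(segment.speakerId) <NA> <NA>"
}

// Made-up example segment.
let example = Segment(speakerId: "S1", startTimeSeconds: 0.0, endTimeSeconds: 4.5)
let line = rttmLine(for: example, fileId: "meeting")
print(line)
```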
