## Workflow

### Step 1: Export
Each model directory contains a conversion script (e.g., `convert-parakeet.py`, `convert-coreml.py`). The script:

- Loads the original PyTorch / NeMo / ONNX model
- Traces or scripts the model with fixed input shapes
- Converts to CoreML using `coremltools`
- Saves `.mlpackage` files
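A minimal sketch of this flow, assuming coremltools 7+ and using a stand-in PyTorch module (the real scripts load the actual model weights, sometimes from NeMo or ONNX checkpoints):

```python
import torch
import torch.nn as nn
import coremltools as ct

class TinyAudioModel(nn.Module):
    """Stand-in for the real model: takes a fixed-length 16 kHz waveform."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=400, stride=160)

    def forward(self, audio):
        # (batch, samples) -> (batch, 1, samples) -> (batch, 8, frames)
        return self.conv(audio.unsqueeze(1))

model = TinyAudioModel().eval()
example = torch.zeros(1, 240_000)           # 15s at 16kHz (see the shape table below)

traced = torch.jit.trace(model, example)    # trace with a fixed input shape

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="audio", shape=example.shape)],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.iOS17,
    compute_units=ct.ComputeUnit.CPU_ONLY,  # convert on CPU only for determinism
)
mlmodel.save("TinyAudioModel.mlpackage")
```

The details differ per model, but the trace-with-fixed-shape, `ct.convert`, save-as-`.mlpackage` pattern is the same.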
#### Fixed Input Shapes
CoreML requires static shapes at export time. Each model defines its input contract:

| Model | Input Shape | Duration |
|---|---|---|
| Parakeet TDT v3 | 240,000 samples | 15s at 16kHz |
| Parakeet EOU | 5,120 samples | 320ms at 16kHz |
| Silero VAD | 576 samples | 36ms at 16kHz |
| Silero VAD (256ms) | 4,160 samples | 256ms at 16kHz |
| Kokoro (5s variant) | Variable tokens | ~5s output |
| Kokoro (15s variant) | Variable tokens | ~15s output |
### Step 2: Validate
Parity scripts run the PyTorch and CoreML models side-by-side on identical inputs, comparing outputs numerically and measuring latency. They produce:

- **Numerical diff**: max absolute error, max relative error, and a match/no-match verdict per component
- **Latency comparison**: Torch CPU vs CoreML (CPU+ANE), with speedup ratios
- **Plots**: visual comparisons saved to the `plots/` directory
- **metadata.json**: structured results for CI and reporting
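A minimal sketch of such a parity check, assuming a traced TorchScript reference and a converted `.mlpackage`; the file paths and the `audio`/`output` feature names here are placeholders, not the repo's actual names:

```python
import time
import numpy as np
import torch
import coremltools as ct

audio = np.random.randn(1, 240_000).astype(np.float32)  # identical input for both backends

torch_model = torch.jit.load("encoder.pt").eval()        # placeholder path
mlmodel = ct.models.MLModel("Encoder.mlpackage")          # placeholder path

with torch.no_grad():
    t0 = time.perf_counter()
    ref = torch_model(torch.from_numpy(audio)).numpy()
    torch_ms = (time.perf_counter() - t0) * 1000

t0 = time.perf_counter()
out = mlmodel.predict({"audio": audio})["output"]         # placeholder feature names
coreml_ms = (time.perf_counter() - t0) * 1000

max_abs = np.abs(ref - out).max()
max_rel = (np.abs(ref - out) / (np.abs(ref) + 1e-6)).max()
print(f"max_abs={max_abs:.4f}  max_rel={max_rel:.4f}  "
      f"torch={torch_ms:.1f}ms  coreml={coreml_ms:.1f}ms  speedup={torch_ms / coreml_ms:.1f}x")
```

A real benchmark would warm up and average over several runs; the repo's validation scripts additionally write `metadata.json` and plots.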
#### Example Parity Results (Parakeet TDT v3)
| Component | Max Abs Error | Match | Torch CPU | CoreML ANE | Speedup |
|---|---|---|---|---|---|
| Encoder | 0.005 | Yes | 1030ms | 25ms | 40x |
| Preprocessor | 0.484 | Yes | 2.0ms | 1.2ms | 1.7x |
| Decoder | within tolerance | Yes | 7.5ms | 4.3ms | 1.7x |
| Joint | 0.099 | Yes | 28ms | 23ms | 1.3x |
### Step 3: Quantize (Optional)
Quantization reduces model size and can improve latency on the ANE. The sweep evaluates multiple strategies and reports the trade-offs.

#### Quantization Strategies
| Strategy | Size Reduction | Quality Impact | Best For |
|---|---|---|---|
| INT8 per-channel | ~2x smaller | Minimal loss | General deployment |
| INT8 per-tensor | ~2x smaller | Significant loss on large models | Small models only |
| 6-bit palettization | ~2.5x smaller | Varies by model | Size-constrained devices |
The sweep writes `quantization_summary.json` with per-component quality scores (1.0 = identical to baseline).
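A hedged sketch of the two weight-compression paths using `coremltools.optimize.coreml` (API as of coremltools 7/8; the model paths are placeholders):

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

mlmodel = ct.models.MLModel("Encoder.mlpackage")  # placeholder path

# INT8 weight quantization (per-channel scales in recent coremltools releases)
linear_cfg = cto.OptimizationConfig(
    global_config=cto.OpLinearQuantizerConfig(mode="linear_symmetric")
)
int8_model = cto.linear_quantize_weights(mlmodel, config=linear_cfg)
int8_model.save("Encoder-int8.mlpackage")

# 6-bit palettization: weights replaced by k-means lookup tables
palette_cfg = cto.OptimizationConfig(
    global_config=cto.OpPalettizerConfig(mode="kmeans", nbits=6)
)
lut_model = cto.palettize_weights(mlmodel, config=palette_cfg)
lut_model.save("Encoder-6bit.mlpackage")
```

Each compressed variant is then compared against the baseline with the same parity scripts to produce the quality scores above.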
## Common CoreML Modifications
PyTorch models often need modifications for CoreML tracing. Common patterns:

| PyTorch Feature | CoreML Fix |
|---|---|
| `pack_padded_sequence` | Explicit LSTM states + masking |
| Dynamic shapes / loops | Fixed shapes, broadcasting |
| In-place operations | Pure functional transforms |
| Random generation | Deterministic inputs passed externally |
| Complex number ops | Real/imaginary split |
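As an illustration of the first pattern, a hypothetical trace-friendly module (not taken from the repo) that avoids `pack_padded_sequence` by taking explicit LSTM states and masking padded frames:

```python
import torch
import torch.nn as nn

class MaskedLSTM(nn.Module):
    """LSTM over padded batches without pack_padded_sequence."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x, lengths, h0, c0):
        # Run over the full padded sequence with explicit initial states.
        out, _ = self.lstm(x, (h0, c0))
        # Zero out frames past each sequence's true length.
        t = torch.arange(x.shape[1]).unsqueeze(0)         # (1, T)
        mask = (t < lengths.unsqueeze(1)).unsqueeze(-1)   # (B, T, 1)
        return out * mask

# Trace with fixed shapes so it converts cleanly to CoreML.
m = MaskedLSTM(80, 256).eval()
x, lengths = torch.randn(1, 100, 80), torch.tensor([72])
h0 = c0 = torch.zeros(1, 1, 256)
traced = torch.jit.trace(m, (x, lengths, h0, c0))
```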
## Adding a New Model
- Create the directory: `models/{class}/{name}/coreml/`
- Add `pyproject.toml` with dependencies
- Write `convert-*.py`, the export script
- Write `compare-*.py`, the validation script (optional but recommended)
- Add `README.md` documenting the conversion
- Push converted weights to Hugging Face
## Deployment Targets
- Minimum: iOS 17 / macOS 14
- Format: MLProgram (`.mlpackage` for development, `.mlmodelc` for compiled)
- Compute units: models are traced with `CPU_ONLY` for determinism; runtime compute units are set when loading (`.cpuAndNeuralEngine`, `.cpuAndGPU`, `.all`)
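In Python (coremltools), picking runtime compute units at load time looks roughly like this; Swift callers would instead set `MLModelConfiguration.computeUnits` to the cases listed above:

```python
import coremltools as ct

# Compute units are a load-time choice; they are not baked into the .mlpackage.
model = ct.models.MLModel(
    "Encoder.mlpackage",                      # placeholder path
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # or CPU_AND_GPU, ALL, CPU_ONLY
)
```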