## Why möbius
Running AI on NVIDIA GPUs is straightforward. The edge is a different story — fragmented devices, different accelerators, format incompatibilities. möbius handles the conversion, validation, and quantization so you get production-ready models with a few commands. Each conversion includes:

- Parity validation — numerical comparison between PyTorch and converted outputs
- Latency benchmarks — Torch CPU vs CoreML (ANE/GPU) on real inputs
- Quantization sweeps — size, speed, and quality trade-offs for int8, palettization, etc.
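The latency side of such a report can be sketched with a small timing helper. This is a hypothetical illustration (not möbius's actual benchmark code): warm up a callable, then report the median wall-clock latency over repeated runs, which is more robust to outliers than a single measurement.

```python
import statistics
import time


def benchmark_ms(fn, *args, warmup=3, runs=20):
    """Median wall-clock latency of fn(*args) in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm caches before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Example: compare a loop against a closed-form equivalent on the same input.
loop_ms = benchmark_ms(lambda n: sum(i * i for i in range(n)), 100_000)
closed_ms = benchmark_ms(lambda n: (n - 1) * n * (2 * n - 1) // 6, 100_000)
```

In möbius's case the two callables would be the Torch CPU forward pass and the CoreML prediction on identical real inputs.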
## Repository Structure
Models are organized by class, name, and target runtime. Each target directory is self-contained, with its own `pyproject.toml` and dependencies managed by `uv`.
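The pattern looks roughly like the following (illustrative paths, not the repository's actual tree):

```
<class>/<model>/<target>/
  pyproject.toml   # isolated dependencies for this target
  ...              # conversion scripts and assets
```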
## Converted Models

These models have been converted and published to Hugging Face:

| Class | Model | Source | CoreML |
|---|---|---|---|
| STT | Parakeet TDT v3 0.6B | NVIDIA | FluidInference |
| STT | Parakeet TDT v2 0.6B | NVIDIA | FluidInference |
| STT | Parakeet EOU 120M | NVIDIA | FluidInference |
| VAD | Silero VAD v6 | Silero | FluidInference |
| Diarization | Pyannote 3.1 | Pyannote | FluidInference |
| TTS | Kokoro 82M | Hexgrad | FluidInference |
| TTS | PocketTTS 155M | Kyutai | FluidInference |
| Embedding | CAM++ | 3D-Speaker | FluidInference |
## Quick Start
## Conversion Guidelines
- Trace with `.CpuOnly` — ensures deterministic tracing without ANE/GPU side effects
- Target iOS 17+ / macOS 14+ — minimum deployment target for all CoreML exports
- Use `uv` — each model has isolated dependencies via its own `pyproject.toml`
- Validate numerically — always compare converted outputs against PyTorch reference
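The last guideline can be sketched as a small parity check. The helper name, metrics, and tolerance below are assumptions for illustration, not möbius's actual validator:

```python
import numpy as np


def parity_report(reference, converted, atol=1e-3):
    """Compare a reference output against a converted model's output.

    `reference` / `converted` are array-likes (e.g. PyTorch and CoreML
    outputs already pulled into numpy). The tolerance is a placeholder.
    """
    ref = np.asarray(reference, dtype=np.float64).ravel()
    conv = np.asarray(converted, dtype=np.float64).ravel()
    abs_err = np.abs(ref - conv)
    cosine = float(ref @ conv / (np.linalg.norm(ref) * np.linalg.norm(conv)))
    return {
        "max_abs_err": float(abs_err.max()),
        "mean_abs_err": float(abs_err.mean()),
        "cosine_sim": cosine,
        "pass": bool(abs_err.max() <= atol),
    }
```

A report like this makes regressions visible immediately: a quantization setting that tanks cosine similarity or blows past the absolute-error tolerance fails before the model ships.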