Per-platform notes¶
Sauti targets a wide range of platforms — desktop, mobile, and standalone VR. The pipeline shape is identical on every target. What changes is the model variant, the GPU acceleration backend, and a handful of permission / file-system quirks.
This page summarises everything a designer needs to know to budget for, and to test on, each target.
Per-platform model selection¶
Verbatim from memory/voice_ai_architecture.md § 6:
| Platform | STT | LLM | Embeddings | TTS |
|---|---|---|---|---|
| PC (Windows / Linux) | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| Mac (Apple Silicon) | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| iOS / visionOS | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| Android (flagship) | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| Quest 2 / 3 | Whisper Tiny | Qwen3-1.7B Q5_K_M ¹ | MiniLM | Kokoro |
| Android (low-end) | Whisper Tiny | Qwen3-1.7B Q5_K_M ¹ | MiniLM | Kokoro |
The build pre-processor (planned, tracked as BUILD-001) reads the per-stage manifests under ai-models/*/manifest.json and copies only the platform-tagged subset into StreamingAssets/VoiceAI/. A Quest build must not ship Whisper Small (250 MB wasted) — the build will refuse to include it.
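The selection step described above can be sketched as follows. This is an illustration only, not the BUILD-001 implementation: the manifest layout (`models`, `file`, `platforms`), the platform tag names, and the file names are all hypothetical. Only the `status` values `ready` and `deferred` are taken from this page.

```python
import json


def select_models(manifest: dict, platform: str) -> list[str]:
    """Return the files tagged for `platform` whose status is 'ready'."""
    selected = []
    for entry in manifest.get("models", []):
        if entry.get("status", "ready") != "ready":
            continue  # deferred entries are never bundled
        if platform in entry.get("platforms", []):
            selected.append(entry["file"])
    return selected


# Hypothetical manifest contents, for illustration only.
STT_MANIFEST = json.loads("""
{
  "models": [
    {"file": "whisper-small.onnx", "status": "ready",
     "platforms": ["pc", "mac", "ios", "android-flagship"]},
    {"file": "whisper-tiny.onnx", "status": "ready",
     "platforms": ["quest", "android-low-end"]}
  ]
}
""")

print(select_models(STT_MANIFEST, "quest"))
```

Filtering by tag rather than by an exclusion list means a model that is not explicitly tagged for a platform can never end up in that platform's build.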
GPU acceleration matrix¶
| Platform | STT (ONNX) | Embeddings (ONNX) | LLM (GGUF / llama.cpp) | TTS (ONNX) |
|---|---|---|---|---|
| Windows | DirectML / CUDA | DirectML | Vulkan | DirectML |
| Mac / iOS | CoreML | CoreML | Metal | CoreML |
| Android | NNAPI | NNAPI | CPU (ARM NEON) | NNAPI |
| Quest | CPU | CPU | CPU | CPU |
All runtimes auto-detect and fall back to CPU silently. No manual configuration. A designer never has to think about which execution provider runs — the runtime negotiates it.
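As a rough picture of what that negotiation amounts to for the ONNX stages, here is a minimal sketch. It is not Sauti's code: the platform keys and the `pick_provider` helper are hypothetical, and the preference order is read off the matrix above. The provider strings are the names ONNX Runtime uses for these backends.

```python
CPU = "CPUExecutionProvider"

# Preferred ONNX Runtime execution providers per platform (from the matrix).
PREFERRED = {
    "windows": ["DmlExecutionProvider", "CUDAExecutionProvider"],
    "mac": ["CoreMLExecutionProvider"],
    "ios": ["CoreMLExecutionProvider"],
    "android": ["NnapiExecutionProvider"],
    "quest": [],  # CPU only
}


def pick_provider(platform: str, available: list[str]) -> str:
    """Return the first preferred provider that is available, else CPU."""
    for name in PREFERRED.get(platform, []):
        if name in available:
            return name
    return CPU


print(pick_provider("windows", ["CUDAExecutionProvider", CPU]))
print(pick_provider("android", [CPU]))
```

In a Python ONNX Runtime environment the `available` list would typically come from `onnxruntime.get_available_providers()`; the sketch takes it as a parameter so it runs without that dependency.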
Quest 3 RAM tightness¶
The Quest 3 has 8 GB of physical RAM. With the OS and a baseline Unity scene included, a Sauti build adds up roughly as:
| Component | Memory |
|---|---|
| Android OS + system services | ~1.5 GB |
| Unity baseline (rendered scene, asset cache) | ~1.5 GB |
| Qwen3-1.7B Q5_K_M (mmapped GGUF) | ~1.2 GB |
| Whisper Tiny INT8 | ~40 MB |
| MiniLM INT8 | ~22 MB |
| Kokoro INT8 + voices | ~95 MB |
| llama.cpp KV-cache + scratch | ~500 MB |
| ONNX Runtime sessions | ~200 MB |
That leaves ~3 GB of headroom for the rest of your game, which is tight if you also run a complex 3D scene with high-resolution textures.
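The headroom figure is plain arithmetic over the table. The sketch below redoes the sum so you can plug in your own numbers; the component values are the approximate ones from this page, and the Gemma line uses the ~720 MB figure quoted in the next section.

```python
TOTAL_RAM_GB = 8.0

# Approximate figures from the table above, in GB.
COMPONENTS_GB = {
    "Android OS + system services": 1.5,
    "Unity baseline": 1.5,
    "Qwen3-1.7B Q5_K_M": 1.2,
    "Whisper Tiny INT8": 0.040,
    "MiniLM INT8": 0.022,
    "Kokoro INT8 + voices": 0.095,
    "llama.cpp KV-cache + scratch": 0.5,
    "ONNX Runtime sessions": 0.2,
}

used_gb = sum(COMPONENTS_GB.values())
headroom_gb = TOTAL_RAM_GB - used_gb

# Same budget with the LLM swapped for Gemma3-1B Q4_K_M (~0.72 GB).
gemma_headroom_gb = headroom_gb + COMPONENTS_GB["Qwen3-1.7B Q5_K_M"] - 0.72

print(f"used ~{used_gb:.2f} GB, headroom ~{headroom_gb:.2f} GB")
print(f"headroom with Gemma3-1B ~{gemma_headroom_gb:.2f} GB")
```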
Why this is the v1.2 reality¶
The spec's intended Quest LLM is Gemma3-1B Q4_K_M at ~720 MB — about 500 MB lighter than Qwen3-1.7B. Gemma was deferred from v1.2 because the Gemma Terms of Use require manual click-through acceptance and a Hugging Face authentication step that doesn't fit the automated download flow.
v1.2 Quest builds ship Qwen3-1.7B and live with the tighter budget. Post-v1.2 releases can reintroduce Gemma by flipping its manifest entry from status: deferred to status: ready (see the ai-models/llm/manifest.json notes field for the steps).
Quest budget tips¶
- Keep textures aggressive. ASTC 6×6 or 8×8 for environment art. A single uncompressed 4K RGBA texture eats 64 MB before mipmaps.
- One audio voice at a time. Kokoro inference is single-threaded; queueing 3 NPCs to talk simultaneously won't speed anything up and will blow the audio buffer.
- Avoid loading the LLM on the main thread at scene start. Sauti scaffolds use lazy initialisation (see `KokoroTtsRunner.EnsureInitialised`); defer it until the player presses talk for the first time.
- Profile early. Use `adb shell dumpsys meminfo <your.package.name>` to watch the resident set. If it climbs past 4 GB, audit your scene before blaming Sauti.
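To put numbers on the texture tip, the sketch below compares an uncompressed 4096×4096 RGBA32 texture with the same texture in ASTC. It relies on the fact that every ASTC block occupies 128 bits (16 bytes) regardless of block footprint, and it counts the base mip level only. The helper names are illustrative.

```python
import math

MIB = 1024 * 1024


def rgba32_bytes(width: int, height: int) -> int:
    """Uncompressed size: 4 bytes per pixel, base mip level only."""
    return width * height * 4


def astc_bytes(width: int, height: int, block: int) -> int:
    """ASTC size for a square block footprint: 16 bytes per block."""
    blocks_x = math.ceil(width / block)
    blocks_y = math.ceil(height / block)
    return blocks_x * blocks_y * 16


print(f"RGBA32:   {rgba32_bytes(4096, 4096) / MIB:.1f} MiB")
print(f"ASTC 6x6: {astc_bytes(4096, 4096, 6) / MIB:.1f} MiB")
print(f"ASTC 8x8: {astc_bytes(4096, 4096, 8) / MIB:.1f} MiB")
```

A full mip chain adds roughly one third on top of each figure.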
Quest latency¶
The spec's end-to-end target on Quest is 3–5 seconds from "user stops speaking" to "first word played". The per-stage budgets below add up to less than that, which leaves some headroom:
| Stage | Quest 3 budget |
|---|---|
| Whisper Tiny STT | ≤500 ms |
| RAG retrieval (MiniLM + ANN) | ≤100 ms |
| Qwen3 LLM streaming (first sentence) | 1–3 s |
| Kokoro TTS (first sentence) | ≤500 ms |
| Total to first audio | 2–4 s |
If you see longer, the most likely culprit is thermal throttling — Quest 3 throttles after sustained CPU load. Watch the device temperature; force-quit and let it cool between test runs.
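When comparing a profiled run against the table, a small helper like the one below can flag which stage blew its budget. It is a sketch: the stage keys and the sample measurements are hypothetical, and each budget is the upper bound from the table.

```python
# Upper bounds from the Quest 3 budget table, in milliseconds.
BUDGET_MS = {
    "stt": 500,
    "rag": 100,
    "llm_first_sentence": 3000,
    "tts_first_sentence": 500,
}
SPEC_TARGET_MS = 5000  # upper end of the 3-5 s spec target


def over_budget(measured_ms: dict[str, float]) -> list[str]:
    """Return the stages whose measured time exceeds the table's budget."""
    return [
        stage
        for stage, limit in BUDGET_MS.items()
        if measured_ms.get(stage, 0) > limit
    ]


worst_case_ms = sum(BUDGET_MS.values())
print(f"worst case within budget: {worst_case_ms} ms")

# Hypothetical measurements from one test run.
sample = {"stt": 430, "rag": 60, "llm_first_sentence": 4200, "tts_first_sentence": 380}
print("over budget:", over_budget(sample))
```

If only the LLM stage is over budget after several minutes of play, that pattern is consistent with the thermal throttling described above.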
Microphone permissions¶
Voice input requires platform-specific permission handling.
Windows / macOS / Linux desktop¶
- macOS prompts on the first `Microphone.Start()` call. Accept the dialog. The permission is per-app; if a tester denies once, they need to re-enable it in System Settings -> Privacy & Security -> Microphone.
- Windows 10+ has a global "Allow apps to access your microphone" setting. Unity respects it; if your build doesn't capture audio, check Settings -> Privacy -> Microphone.
- Linux has no central permission gate; `Microphone.devices` returns the devices ALSA / PulseAudio can see.
iOS / visionOS¶
Two things must be in place:
- `NSMicrophoneUsageDescription` in `Info.plist`. Project Settings -> Player -> iOS -> Other Settings -> Microphone Usage Description. Write a one-sentence reason the player will see in the permission dialog (e.g. "Sauti uses your voice to power conversations with in-game characters. Audio never leaves your device.").
- Background audio capability if you want the mic to keep capturing when the app loses focus. Most games don't need this.
Android¶
- Add `<uses-permission android:name="android.permission.RECORD_AUDIO" />` to the Android manifest. Project Settings -> Player -> Android -> Other Settings -> Microphone -> tick.
- Android 6+ requires runtime permission. On Android, Unity's `Permission.RequestUserPermission(Permission.Microphone)` (in `UnityEngine.Android`) shows the prompt; `Application.RequestUserAuthorization` only covers iOS and WebGL. Call it before the first `Microphone.Start()`.
- Some OEM-customised Androids (Xiaomi, Vivo) require an additional in-settings toggle. Document this in your support page if your audience skews to non-stock builds.
Quest 2 / 3¶
The microphone is built into the headset and treated by Android as a regular mic.
- Add `RECORD_AUDIO` to the manifest as for Android above.
- The first launch triggers a system-level Permissions dialog inside the headset. The player must accept in VR.
- You cannot enumerate microphone devices on Quest in a useful way. Always pass `null` (or an empty string) as the device name, as in `Microphone.Start(null, ...)`; the system picks the headset mic.
File-system access¶
Models live under StreamingAssets/VoiceAI/ at build time. Read-path quirks:
| Platform | How to read | Note |
|---|---|---|
| Windows / macOS / Linux | Direct File.OpenRead(Application.streamingAssetsPath + ...) | Works as expected. |
| iOS / visionOS | Direct File.OpenRead(...) | StreamingAssets ships unpacked in the bundle. |
| Android | First-launch copy required. | StreamingAssets is inside a compressed .jar and cannot be mmapped. Copy each model to Application.persistentDataPath/VoiceAI/... on first run, then load from there. |
| Quest | Same as Android. | Quest is an Android variant; the same copy-on-first-launch rule applies. |
Sauti's runtime is expected to handle the Android copy-on-first-launch transparently — this is part of the planned BUILD-001 packaging work tracked in memory/todo.md.
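The copy-on-first-launch logic itself is small. The sketch below models it with plain file paths so the control flow is visible; it is not the planned Sauti implementation, and `ensure_local_copy` is a hypothetical name. On a real Android device the source lives inside the package archive and has to be read through Unity's own APIs, not opened as an ordinary path as done here.

```python
import shutil
from pathlib import Path


def ensure_local_copy(bundled: Path, persistent_dir: Path) -> Path:
    """Copy `bundled` into `persistent_dir` once; reuse the copy afterwards."""
    target = persistent_dir / bundled.name
    # Skip the copy when a complete file of the same size already exists.
    if target.exists() and target.stat().st_size == bundled.stat().st_size:
        return target
    persistent_dir.mkdir(parents=True, exist_ok=True)
    # Copy to a temporary name first so an interrupted copy is never
    # mistaken for a finished model file on the next launch.
    partial = target.with_name(target.name + ".part")
    shutil.copyfile(bundled, partial)
    partial.replace(target)
    return target
```

The size check is a cheap stand-in for a checksum. A shipped build would probably also want to re-copy when an app update changes a model.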
Disk-budget summary¶
| Platform | Total bundled models | Notes |
|---|---|---|
| PC / Mac / iOS / Android flagship | ~1.6 GiB | Whisper Small + Qwen3 + MiniLM + Kokoro |
| Quest 2 / 3 / Android low-end | ~1.4 GiB (v1.2) | Whisper Tiny instead of Small |
| Quest with Gemma3 (post-v1.2) | ~870 MiB | Whisper Tiny + Gemma3-1B + MiniLM + Kokoro |
A flagship-Android player downloading your game gets ~1.6 GiB of AI models alongside whatever your game itself ships. Plan your store-page warning accordingly.
See AI models catalogue for the full per-file breakdown.
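As a rough cross-check of the three totals, the sketch below sums per-model sizes quoted elsewhere on this page. It assumes the in-memory figures from the Quest RAM table are close to the on-disk file sizes, which is only approximately true, and it ignores the small MB / MiB difference. The catalogue remains the authoritative source.

```python
# Approximate sizes in MB, taken from figures quoted on this page.
SIZE_MB = {
    "Whisper Small": 250,
    "Whisper Tiny": 40,
    "Qwen3-1.7B Q5_K_M": 1200,
    "Gemma3-1B Q4_K_M": 720,
    "MiniLM": 22,
    "Kokoro": 95,
}

BUNDLES = {
    "flagship": ["Whisper Small", "Qwen3-1.7B Q5_K_M", "MiniLM", "Kokoro"],
    "quest (v1.2)": ["Whisper Tiny", "Qwen3-1.7B Q5_K_M", "MiniLM", "Kokoro"],
    "quest + Gemma3": ["Whisper Tiny", "Gemma3-1B Q4_K_M", "MiniLM", "Kokoro"],
}


def bundle_mb(name: str) -> int:
    """Sum the approximate sizes of every model in the named bundle."""
    return sum(SIZE_MB[model] for model in BUNDLES[name])


for name in BUNDLES:
    print(f"{name}: ~{bundle_mb(name)} MB")
```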
Network requirements¶
None. Sauti runs fully offline. The runtime never makes a network request. The only time the network is touched in the whole project is the Editor download step that fetches model files into ai-models/ on the developer machine — that step never runs on a player device.
This is a hard architectural constraint, not a recommendation. The runtime has no HTTP client, no credentials, no API endpoint configured. A privacy review can confirm this by grep-ing the runtime source for http, Uri, or WebRequest — there are no hits in Assets/Sauti/Runtime/.
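For reviewers who prefer a script to an ad-hoc grep, the check can be sketched like this. The three search terms and the directory come from the paragraph above; restricting the scan to `*.cs` files and the function name are assumptions made for this example.

```python
import re
from pathlib import Path

# Case-sensitive, like a plain grep for the three terms.
PATTERN = re.compile(r"http|Uri|WebRequest")


def find_network_references(root: Path) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every match under `root`."""
    hits = []
    for path in sorted(root.rglob("*.cs")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for file, lineno, line in find_network_references(Path("Assets/Sauti/Runtime")):
        print(f"{file}:{lineno}: {line}")
```

An empty result is the expected outcome. Any hit, including one inside a comment, is worth a manual look.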
Testing matrix (recommendation)¶
| Platform | Test on physical hardware | Worth-testing edge cases |
|---|---|---|
| Windows | Required | DirectML on a GTX-1660-class GPU; CPU fallback |
| macOS | Required | Apple Silicon (CoreML); Intel Mac if you support them |
| iOS | Required | iPhone 12 / 13 (mid-range Apple GPU); a 6-year-old device for floor |
| Android | Required | One Snapdragon 8-gen flagship, one mid-range Helio / Exynos |
| Quest 3 | Required | Sustained-load thermal throttle test (>10 min play session) |
| Quest 2 | Recommended | RAM headroom is tighter than Quest 3 — Whisper Tiny + Qwen3 may stutter |
The simulator / emulator targets (iOS Simulator, Android Emulator) cannot run Sauti meaningfully because ONNX Runtime's mobile execution providers don't surface in the simulator. Always test on hardware.
Cross-references¶
- The canonical per-platform table: `memory/voice_ai_architecture.md` § 6.
- GPU acceleration matrix: Architecture - GPU acceleration.
- AI models catalogue: Reference - Models.
- VR-specific scaffold: Experiment 06 — VR Quest NPC.
¹ v1.2 Quest LLM falls back to Qwen3-1.7B; Gemma3-1B Q4_K_M is the spec's intended Quest pick but is deferred. See the "Quest 3 RAM tightness" section.