Sauti

Native Unity voice-AI plugin. Fully offline. English. Privacy-first. Mic → Whisper → memory + RAG → Qwen3 GGUF → Kokoro → audio. One package. Zero cloud.

Sauti ("voice" in Swahili) lets a Unity game or VR experience hold a real spoken conversation with an AI character — entirely on the player's device, with no API keys, no cloud bill, and no audio ever leaving the headset.

What it does

🎤 Mic  →  Whisper ONNX  →  text  →  Memory (history + RAG + temp KV)  →  Qwen3 GGUF  →  tokens  →  Kokoro ONNX  →  🔊 Audio
            STT                          Three-layer enriched prompt           LLM                           TTS

Speech in — Whisper Small (flagship) / Tiny (Quest), ONNX INT8, English fixed.
Three-layer memory — conversation history + temporary key/value facts + RAG over a knowledge base you author.
LLM brain — Qwen3-1.7B Q5_K_M GGUF via llama.cpp, with /no_think voice mode.
Voice out — Kokoro 82M ONNX with 11 built-in voices at 24 kHz.
Drop-in for Unity 6+ — four UPM packages, one Editor menu, six runnable experiments.
Two parallel APIs (v1.3+) — pure-C# for programmers, drag-and-drop MonoBehaviour + ScriptableObject components for designers. Same runtime, choose either.
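As a rough illustration of the three-layer memory described above, the sketch below merges history, key/value facts, and retrieved knowledge into one prompt string. `PromptSketch` and its parameters are hypothetical names invented for this example, not Sauti's API, and the plain-text prompt layout is an assumption. The `/no_think` suffix is Qwen3's documented soft switch for skipping its reasoning block; how Sauti applies it internally is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only; not part of the Sauti API.
public static class PromptSketch
{
    public static string Build(
        string persona,
        IReadOnlyList<(string Role, string Text)> history,  // L1: conversation history
        IReadOnlyDictionary<string, string> facts,          // L2: temporary key/value facts
        IReadOnlyList<string> passages,                     // L3: RAG hits from the knowledge base
        string userUtterance,
        bool voiceMode = true)
    {
        var sb = new StringBuilder();
        sb.AppendLine(persona);

        if (facts.Count > 0)
        {
            sb.AppendLine("Known facts:");
            // Sort keys so the prompt is deterministic from turn to turn.
            foreach (var kv in facts.OrderBy(f => f.Key, StringComparer.Ordinal))
                sb.AppendLine($"- {kv.Key}: {kv.Value}");
        }

        if (passages.Count > 0)
        {
            sb.AppendLine("Relevant knowledge:");
            foreach (var passage in passages)
                sb.AppendLine($"- {passage}");
        }

        foreach (var (role, text) in history)
            sb.AppendLine($"{role}: {text}");

        sb.Append("user: ").Append(userUtterance);
        // Assumed placement: Qwen3 accepts "/no_think" inside a user turn.
        if (voiceMode) sb.Append(" /no_think");
        return sb.ToString();
    }
}
```

Because the output is a single string, a builder like this can be exercised in an EditMode test without loading any model.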

Two paths through these docs

For game designers

No code. Pick a JSON template, edit the placeholders, drop it on an NPC.

→ Designer guide

For Unity developers

Inject your own backends, extend the memory layers, ship your own experiment.

→ Developer guide

Architecture in one diagram

┌──────────────────────────────────────────────────────────────────┐
│                       Sauti voice-AI pipeline                     │
│                                                                   │
│  ┌──────────┐  ┌─────────────────┐  ┌─────────┐  ┌────────────┐  │
│  │ Whisper  │→ │ Three-Layer     │→ │ Qwen3   │→ │ Kokoro     │  │
│  │ STT ONNX │  │ Memory:         │  │ GGUF    │  │ TTS ONNX   │  │
│  │          │  │ • L1 history    │  │         │  │            │  │
│  │          │  │ • L2 KV facts   │  │         │  │            │  │
│  │          │  │ • L3 RAG        │  │         │  │            │  │
│  └──────────┘  └─────────────────┘  └─────────┘  └────────────┘  │
│       │                                                  │        │
│       └────────────────  String only  ──────────────────┘        │
│                                                                   │
│  ┌───────────────────────────────┐ ┌─────────────────────────┐  │
│  │ ONNX Runtime                  │ │ llama.cpp (LLMUnity)    │  │
│  │ (asus4/onnxruntime-unity)     │ │ (undreamai/LLMUnity)    │  │
│  │ STT • Embeddings • TTS        │ │ LLM only                │  │
│  └───────────────────────────────┘ └─────────────────────────┘  │
│  ── no shared memory · no shared GPU context · strings only ──   │
└──────────────────────────────────────────────────────────────────┘

Sauti runs two strictly partitioned runtimes: ONNX Runtime (STT, embeddings, TTS) and llama.cpp (LLM only). They share no memory and no GPU context; only C# strings flow across the boundary.
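One way to picture the strings-only boundary is a set of narrow interfaces in which the LLM stage accepts and returns nothing but `string`. The interface and class names below are hypothetical, chosen for this sketch rather than taken from Sauti, and the synchronous `Complete` call is a simplification of the token streaming shown in the pipeline.

```csharp
using System;

// Hypothetical interfaces for illustration; Sauti's real types may differ.
public interface ISpeechToText  { string Transcribe(float[] samples); }  // ONNX Runtime side
public interface ILanguageModel { string Complete(string prompt); }      // llama.cpp side
public interface ITextToSpeech  { float[] Synthesize(string text); }     // ONNX Runtime side

public sealed class VoiceTurn
{
    private readonly ISpeechToText _stt;
    private readonly ILanguageModel _llm;
    private readonly ITextToSpeech _tts;
    private readonly Func<string, string> _enrich; // memory layers: transcript in, prompt out

    public VoiceTurn(ISpeechToText stt, ILanguageModel llm, ITextToSpeech tts,
                     Func<string, string> enrich)
    {
        _stt = stt ?? throw new ArgumentNullException(nameof(stt));
        _llm = llm ?? throw new ArgumentNullException(nameof(llm));
        _tts = tts ?? throw new ArgumentNullException(nameof(tts));
        _enrich = enrich ?? throw new ArgumentNullException(nameof(enrich));
    }

    // One conversational turn: microphone samples in, speech samples out.
    public float[] Run(float[] micSamples)
    {
        string heard  = _stt.Transcribe(micSamples);
        string prompt = _enrich(heard);
        // Only strings cross into and out of the LLM runtime:
        // no tensors, buffers, or GPU handles are shared.
        string reply  = _llm.Complete(prompt);
        return _tts.Synthesize(reply);
    }
}
```

Keeping the contract this narrow means either runtime can be swapped or stubbed in tests without touching the other.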

Full architecture →

Quick install

git clone https://github.com/SeedeXR/sauti-unity-plugin.git
cd sauti-unity-plugin
# Then: Unity Hub → Add Project → select this folder

Unity fetches four UPM dependencies on first open. Then set two scripting-define symbols, build knowledge.db, open an experiment scene, and press Play.

Full installation walkthrough → · 5-minute quickstart →

Privacy & offline-first

No internet at runtime. Models are read from disk; nothing phones home.
No telemetry. No analytics. No third-party trackers.
No model downloads after install. Everything ships in Assets/StreamingAssets/VoiceAI/.
User audio stays on device. Whisper runs locally; transcripts never leave.
Per-session memory clears on app exit. The RAG knowledge base is read-only.

Platform support

Platform	STT	LLM	Embeddings	TTS
Windows / macOS / Linux	Whisper Small	Qwen3-1.7B Q5_K_M	MiniLM	Kokoro
iOS / Android (flagship)	Whisper Small	Qwen3-1.7B Q5_K_M	MiniLM	Kokoro
Meta Quest 2 / 3	Whisper Tiny	Qwen3-1.7B Q5_K_M*	MiniLM	Kokoro
Android (low-end)	Whisper Tiny	Qwen3-1.7B Q5_K_M*	MiniLM	Kokoro

* The v1.2 Quest path uses Qwen3-1.7B (1.26 GB; tight on Quest 3's 8 GB of RAM, but functional). Gemma3-1B was the original Quest pick and is deferred to a future release. See per-platform notes.
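The table reduces to a single rule: only the STT model changes on constrained devices. A minimal sketch of that rule follows; `TargetTier`, `ModelSet`, and `ModelSelector` are hypothetical names introduced for this example, and the plugin may select models differently in practice.

```csharp
using System;

// Hypothetical types for illustration; not part of the Sauti API.
public enum TargetTier { Desktop, MobileFlagship, Quest, AndroidLowEnd }

public sealed class ModelSet
{
    public string Stt { get; }
    public string Llm { get; }
    public string Embeddings { get; }
    public string Tts { get; }

    public ModelSet(string stt, string llm, string embeddings, string tts)
    {
        Stt = stt;
        Llm = llm;
        Embeddings = embeddings;
        Tts = tts;
    }
}

public static class ModelSelector
{
    public static ModelSet For(TargetTier tier)
    {
        // Per the table, Quest and low-end Android drop to Whisper Tiny;
        // the LLM, embedding, and TTS models are the same on every row.
        bool constrained = tier == TargetTier.Quest || tier == TargetTier.AndroidLowEnd;
        string stt = constrained ? "Whisper Tiny" : "Whisper Small";
        return new ModelSet(stt, "Qwen3-1.7B Q5_K_M", "MiniLM", "Kokoro");
    }
}
```

For example, `ModelSelector.For(TargetTier.Quest).Stt` evaluates to `"Whisper Tiny"`, while every tier returns the same `Llm` value.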

Project status

Surface	State
Compile (Unity 6.4)	0 errors, 0 warnings
EditMode tests	38 / 38 pass
Knowledge.db build	End-to-end against real MiniLM weights
Six experiment scaffolds	Code + READMEs
Six `.unity` scene files	Manual Editor GUI work
Quest hardware validation	Needs physical device

See SHIP_READINESS.md for the step-by-step go-live guide.