Developer guide: overview
Sauti is composed of small, dependency-injectable subsystems; none of them is large.
The mental model: each pipeline stage is a separate runner. The runners are wired together at the application level, not entangled inside a god-object, and you can swap any one of them out by implementing one interface.
This page is the entry point for code-first integration. If you have not yet built and run the plugin, start with Installation and Quickstart.
The seams Sauti exposes
```
+--------------------------+
|      Your game code      |
+--------------------------+
             |
             v
+-------------------------------+
| Sauti.Memory (Runtime)        |
|  - TemporaryMemory            |   Pure C# static class.
|  - SautiRag (façade)          |
|  - ISautiRagBackend <--seam--+|   Inject any backend.
|                              ||
|  -> LlmUnityRagBackend       ||   Ships by default.
+-------------------------------+
             |
             v
+-------------------------------+
| Sauti.Tts (Runtime)           |
|  - KokoroTtsRunner            |   Hand-authored ORT runner.
|  - EnglishG2P                 |   Pure-C# G2P fallback.
+-------------------------------+
             ^
             |
+-------------------------------+
| Sauti.Editor.Rag (Editor)     |
|  - KnowledgeBaseChunker       |   Pure C#.
|  - IRagEmbedder <----seam----+|   Inject any embedder.
|                              ||
|  -> MiniLmRagEmbedder        ||   Ships by default.
|  - WordPieceTokenizer        ||
|  - RagDatabaseBuilder         |   MenuItem entry point.
+-------------------------------+
```
The two explicit injection points:
- ISautiRagBackend: SautiRag wraps any backend that satisfies the interface. Default: LlmUnityRagBackend (delegates to LLMUnity's DBSearch-backed RAG MonoBehaviour). Swap to fake out in tests, or to plug in a custom on-disk vector store.
- IRagEmbedder: RagDatabaseBuilder.BuildAsync accepts any embedder. Default: MiniLmRagEmbedder (raw ONNX Runtime + WordPiece). Swap to use a smaller/faster encoder, or to wire a hosted embedding service for offline-build-only scenarios.
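As a rough illustration of the backend seam, the sketch below shows what a tiny in-memory backend could look like. The interface `IRagBackendSketch` is a hypothetical stand-in: this page only confirms that the façade exposes `LoadAsync` and `SearchAsync` and that a search yields chunks and scores, so the real `ISautiRagBackend` members may differ. The one-chunk-per-line file layout and the shared-word scoring are placeholder assumptions, not Sauti's retrieval.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for ISautiRagBackend; the real member list may differ.
public interface IRagBackendSketch
{
    Task LoadAsync(string path);
    Task<(string[] chunks, float[] scores)> SearchAsync(string query, int topK);
}

// Toy backend: one chunk per non-empty line, scored by shared-word count.
public sealed class InMemoryRagBackend : IRagBackendSketch
{
    private readonly List<string> _chunks = new List<string>();

    // Lets tests seed chunks without touching the disk.
    public void Add(string chunk) => _chunks.Add(chunk);

    public Task LoadAsync(string path)
    {
        // Assumption: the store is a plain text file, one chunk per line.
        if (File.Exists(path))
            _chunks.AddRange(File.ReadAllLines(path).Where(l => l.Trim().Length > 0));
        return Task.CompletedTask;
    }

    public Task<(string[] chunks, float[] scores)> SearchAsync(string query, int topK)
    {
        HashSet<string> queryWords = Tokenize(query);
        var ranked = _chunks
            .Select(c => (chunk: c, score: (float)Tokenize(c).Count(w => queryWords.Contains(w))))
            .OrderByDescending(r => r.score) // stable sort: ties keep insertion order
            .Take(topK)
            .ToArray();
        return Task.FromResult((ranked.Select(r => r.chunk).ToArray(),
                                ranked.Select(r => r.score).ToArray()));
    }

    // Lower-cases the text and splits it on spaces and basic punctuation.
    private static HashSet<string> Tokenize(string text) =>
        new HashSet<string>(
            text.ToLowerInvariant().Split(new[] { ' ', ',', '.', '?', '!' },
                                          StringSplitOptions.RemoveEmptyEntries));
}
```

A class of this shape would be passed to the façade in place of the default backend, provided its members are adapted to whatever `ISautiRagBackend` actually declares.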
Beyond those two, Sauti also lets you assemble your own prompt — the BuildPrompt method in experiments/05-full-voice-loop/FullVoiceLoop.cs is a reference shape, not a runtime requirement. See Extending Sauti for all three extension paths.
Public namespaces
| Namespace | Assembly | What's in it |
|---|---|---|
| Sauti.Memory | Sauti.Runtime | TemporaryMemory, ISautiRagBackend, SautiRag, LlmUnityRagBackend. The Layer 2 / Layer 3 memory surface. |
| Sauti.Tts | Sauti.Runtime | KokoroTtsRunner, EnglishG2P. The TTS pipeline. |
| Sauti.Editor.Rag | Sauti.Editor | KnowledgeBaseChunker, IRagEmbedder, MiniLmRagEmbedder, WordPieceTokenizer, RagDatabaseBuilder. The offline build pipeline. Editor-only. |
| Sauti.Experiments.* | per-experiment | The reference MonoBehaviours under experiments/. Not part of the runtime API; treat as worked examples. |
Each namespace is small (1–5 public types) and has a single concern. There is no Sauti.Everything god namespace by design.
What Sauti does not ship
To set expectations:
| Concern | Status | What you do |
|---|---|---|
| LLM inference | Delegated to undreamai/LLMUnity (wraps llama.cpp). | Install the package. Sauti's LlmUnityRagBackend plumbs it. |
| STT inference | Delegated to Macoron/whisper.unity (wraps whisper.cpp). | Install the package. Sauti experiments use it directly. |
| RAG vector store | Delegated to LLMUnity's DBSearch (usearch ANN). | Provided by the package. |
| Voice Activity Detection | Out of scope. Push-to-talk only. | If you need VAD, vendor your own (Silero VAD is a good choice). |
| Multi-language support | Out of scope for v1.x. | English only. Whisper language is fixed to "en". |
| Cloud LLMs | Out of scope. | Sauti is offline-first by design. |
| User-data persistence | Session-scoped only: TemporaryMemory and LLMAgent.chat clear on app exit. | If you need persistent memory, persist TemporaryMemory entries yourself. |
The four pipeline stages, runner by runner
STT: Whisper.WhisperManager
Lives in the whisper.unity package. Inspector-friendly MonoBehaviour facade over the package's native whisper.cpp bindings.
- Inject with gameObject.AddComponent<WhisperManager>().
- Configure ModelPath, IsModelPathInStreamingAssets, language = "en".
- Call await manager.InitModel() once at startup.
- Per turn: WhisperResult res = await manager.GetTextAsync(audioClip); then read res.Result.
See experiments/02-stt-loopback/WhisperLoopback.cs for the verified wiring.
Memory: Sauti.Memory.*
Three layers; see the dedicated Memory layers page. The relevant runtime calls are:
- Layer 1: LLMUnity.LLMAgent.chat (you manage it; Sauti's hard-cap helper trims it to 20 messages).
- Layer 2: Sauti.Memory.TemporaryMemory.Set / Clear / BuildPromptBlock (static, pure C#).
- Layer 3: Sauti.Memory.SautiRag (constructor-injected backend, LoadAsync + SearchAsync).
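The Layer 1 hard cap can be pictured as a small trim helper. The sketch below is an assumption about the shape of such a helper, not Sauti's actual code: it operates on a plain `List<T>` instead of LLMUnity's chat message type, and the names `HistoryCap` and `TrimToLast` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating a hard cap on chat history length.
public static class HistoryCap
{
    // 20 matches the cap quoted on this page.
    public const int DefaultMaxMessages = 20;

    // Drops the oldest entries in place so that at most maxMessages remain.
    // Returns the number of entries removed.
    public static int TrimToLast<T>(List<T> history, int maxMessages = DefaultMaxMessages)
    {
        if (history == null) throw new ArgumentNullException(nameof(history));
        if (maxMessages < 0) throw new ArgumentOutOfRangeException(nameof(maxMessages));

        int excess = history.Count - maxMessages;
        if (excess <= 0) return 0;

        history.RemoveRange(0, excess);
        return excess;
    }
}
```

Calling a helper like this after each completed turn keeps the prompt size bounded. A production version might also pin certain entries (for example, an opening message) instead of trimming strictly from the front.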
LLM: LLMUnity.LLM + LLMUnity.LLMAgent
Lives in the LLMUnity package. The LLM MonoBehaviour boots llama.cpp; the LLMAgent MonoBehaviour is the chat facade.
- LLM.SetModel(path) + await llm.WaitUntilReady().
- llmAgent.llm = llm; llmAgent.systemPrompt = ...; await llmAgent.Chat(prompt, OnCumulative, OnComplete, addToHistory: true);.
Important: the first Chat callback receives the cumulative assembled response, not a per-token delta. See Architecture — Streaming.
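If downstream code expects per-call deltas, they can be derived from the cumulative text by remembering how much has already been seen. The following is a minimal sketch under that assumption; `CumulativeDeltaTracker` is a hypothetical name and is not part of Sauti or LLMUnity.

```csharp
using System;

// Hypothetical helper: turns cumulative callback text into per-call deltas.
public sealed class CumulativeDeltaTracker
{
    private int _seen; // number of characters already handed out

    // Returns only the text that is new since the previous call.
    public string Next(string cumulative)
    {
        if (cumulative == null) return string.Empty;

        // If the text shrank (for example, a new turn started), start over.
        if (cumulative.Length < _seen) _seen = 0;

        string delta = cumulative.Substring(_seen);
        _seen = cumulative.Length;
        return delta;
    }

    // Call between turns so the next response starts from offset zero.
    public void Reset() => _seen = 0;
}
```

Appending the cumulative text directly to a UI label or TTS queue would duplicate earlier words on every callback, which is the failure this kind of tracker avoids.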
TTS: Sauti.Tts.KokoroTtsRunner
Hand-authored against raw Microsoft.ML.OnnxRuntime.InferenceSession. Self-contained — no LLMUnity / whisper.unity dependency.
- Construct with new KokoroTtsRunner(modelPath, tokenizerPath, voicesDirectoryPath). (Lazy init: the ONNX session and voice scan happen on first synth call.)
- float[] pcm = await runner.SynthesizeAsync(sentence, voiceId);. PCM is 24 kHz mono, in [-1, 1].
- Wrap in an AudioClip with AudioClip.SetData(pcm, 0) and play.
See experiments/01-tts-hello/KokoroHello.cs.
The canonical orchestration
If you want a worked, tested example of all four stages composed, read experiments/05-full-voice-loop/FullVoiceLoop.cs. It is ~300 lines of MonoBehaviour that wires:
- Microphone capture (UnityEngine.Microphone).
- Whisper transcription (WhisperManager.GetTextAsync).
- RAG retrieval (SautiRag.SearchAsync).
- § 4.5 prompt assembly.
- LLMUnity streaming chat (LLMAgent.Chat with cumulative-text callback).
- Sentence-boundary detection (the OnCumulative cursor pattern).
- OnSpeechReady event ready for a Kokoro hook.
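The sentence-boundary step in the list above can be sketched as a cursor that walks the cumulative text and emits each sentence once it is terminated. This is an illustrative reconstruction, not the code in FullVoiceLoop.cs: the class name `SentenceCursor` is hypothetical, and the terminator set (`.`, `!`, `?`) is a naive assumption that will split on decimals and abbreviations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a cursor over cumulative LLM output.
public sealed class SentenceCursor
{
    private static readonly char[] Terminators = { '.', '!', '?' };
    private int _cursor; // index of the first character not yet emitted

    // Returns every sentence completed since the last call.
    public List<string> Advance(string cumulative)
    {
        var sentences = new List<string>();
        if (cumulative == null || _cursor > cumulative.Length) return sentences;

        while (true)
        {
            int end = cumulative.IndexOfAny(Terminators, _cursor);
            if (end < 0) break; // no complete sentence yet; wait for more text

            string sentence = cumulative.Substring(_cursor, end - _cursor + 1).Trim();
            // Skip fragments that are punctuation only (for example, an ellipsis).
            if (sentence.Any(ch => char.IsLetterOrDigit(ch))) sentences.Add(sentence);
            _cursor = end + 1;
        }
        return sentences;
    }

    // Call once generation completes to collect any unterminated tail.
    public string Flush(string finalText)
    {
        if (finalText == null || _cursor >= finalText.Length) return string.Empty;
        string tail = finalText.Substring(_cursor).Trim();
        _cursor = finalText.Length;
        return tail;
    }
}
```

In a voice loop, each sentence returned by `Advance` could be handed to TTS immediately, so speech starts before the model has finished generating; `Flush` would run from the completion callback.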
It deliberately avoids depending on the EXP-002/03/04 MonoBehaviours — it reuses patterns, not classes — so you can copy it into your own project as a starting point.
Assembly definitions
Sauti is split into three asmdefs:
- Sauti.Runtime: runtime code. Built for every player target. Depends on the LLMUnity asmdef (gated by SAUTI_LLMUNITY_AVAILABLE) and Microsoft.ML.OnnxRuntime (transitively via the asus4 package).
- Sauti.Editor: Editor-only code. Includes the chunker, embedder, tokeniser, and the [MenuItem] builder.
- Sauti.Tests.Editor: NUnit tests (30 tests across the four test files).
SAUTI_LLMUNITY_AVAILABLE is the symbol that gates LLMUnity-dependent code in Sauti.Runtime. Define it in Project Settings -> Player -> Scripting Define Symbols (or via your asmdef's defineConstraints) once you have wired LLMUnity into the asmdef's references list.
The same gate convention exists for SAUTI_WHISPER_UNITY_AVAILABLE — define it once whisper.unity is added.
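For readers unfamiliar with scripting-define gates, the pattern looks roughly like the sketch below. Only the symbol name SAUTI_LLMUNITY_AVAILABLE comes from this page; the class `SautiFeatureGates` and its members are hypothetical and exist purely to show how code behind the symbol is included or excluded at compile time.

```csharp
using System;

// Hypothetical illustration of a scripting-define gate.
public static class SautiFeatureGates
{
#if SAUTI_LLMUNITY_AVAILABLE
    // Compiled only when the symbol is defined; LLMUnity types may be referenced here.
    public static readonly bool LlmUnityAvailable = true;
#else
    // Fallback branch: no LLMUnity types may appear here, or the build breaks.
    public static readonly bool LlmUnityAvailable = false;
#endif

    // Human-readable summary of which branch was compiled.
    public static string Describe() =>
        LlmUnityAvailable
            ? "LLMUnity-backed code is compiled in."
            : "LLMUnity is not wired; inject your own backend.";
}
```

Because the check happens at compile time, a project without LLMUnity installed still builds: the gated branch is never seen by the compiler.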
Testing
The test suite under Assets/Sauti/Tests/Editor/ covers:
- TemporaryMemoryTests.cs: 5 tests on Layer 2 semantics.
- SautiRagTests.cs: 7 tests on the façade, using FakeRagBackend to avoid pulling LLMUnity into the test assembly.
- RagDatabaseBuilderTests.cs: 10 tests on the chunker / writer round-trip.
- WordPieceTokenizerTests.cs: 8 tests on tokenisation edge cases (UNK, truncation, padding).
The pattern for swapping backends in tests:
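```csharp
// Arrange: a fake backend that returns one canned chunk with its score.
var fake = new FakeRagBackend { NextSearchResult = (new[] { "chunk-A" }, new[] { 0.9f }) };
var rag = new SautiRag(fake);

// Act: load and search through the façade, exactly as production code would.
await rag.LoadAsync(someFakePath);
(string[] chunks, float[] scores) = await rag.SearchAsync("query", 1);

// Assert: the façade passes the backend's result through.
Assert.AreEqual("chunk-A", chunks[0]);
```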
See Extending Sauti for the full unit-test pattern for custom backends.
Where to go next
- Architecture deep dive: the hybrid-runtime invariant, the three-layer memory, the asset flow, the per-platform table.
- Memory layers: Layer 1 (history), Layer 2 (temp KV), Layer 3 (RAG), and how BuildPrompt combines them.
- Extending Sauti: three extension points (backend, embedder, prompt assembler) with worked stubs.
- API reference: every public type, with line-number links to source.