AI models catalogue

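Every model file Sauti can load, by stage. Each row is taken from the per-stage manifest under ai-models/<stage>/manifest.json. Total bundled assets across the catalogue: ~1.7 GB with Qwen3 as the only LLM, ~2.4 GB once Gemma3 lands. A single build ships only its platform's subset (~1.6 GB at most; see the per-platform shipping matrix).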

Source of truth

These tables mirror ai-models/<stage>/manifest.json. When you change a manifest, update this page in the same change. The build pre-processor reads the manifest at build time to pick the subset of files relevant to the target platform.
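A minimal sketch of that subset selection, written in Python for brevity rather than taken from the Editor's own code. The manifest shape (a `files` list whose entries carry `file`, `targets` and `status`) and the `"all"` wildcard are assumptions for illustration, not the real schema; only the target names and the ready/deferred statuses come from this page.

```python
import json

# Hypothetical manifest shape for illustration; the real
# ai-models/<stage>/manifest.json may use different keys.
EXAMPLE_MANIFEST = """
{
  "stage": "stt",
  "files": [
    {"file": "whisper-small/encoder_model_quantized.onnx",
     "targets": ["windows", "macos", "linux", "ios", "android_flagship"],
     "status": "ready"},
    {"file": "whisper-tiny/encoder_model_quantized.onnx",
     "targets": ["quest", "android_lowend"],
     "status": "ready"}
  ]
}
"""


def files_for_target(manifest: dict, target: str) -> list[str]:
    """Files a build for `target` should copy; non-ready entries are skipped."""
    selected = []
    for entry in manifest["files"]:
        targets = entry.get("targets", [])
        applies = target in targets or "all" in targets  # "all" wildcard is assumed
        if applies and entry.get("status") == "ready":
            selected.append(entry["file"])
    return selected


manifest = json.loads(EXAMPLE_MANIFEST)
print(files_for_target(manifest, "quest"))
```

Skipping non-ready entries is what keeps a deferred model (such as Gemma3 below) out of every build even though its targets are listed.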

STT — Speech-to-text

Stage: stt
Manifest: ai-models/stt/manifest.json
Runtime: asus4/onnxruntime-unity via Macoron/whisper.unity
Language: English only (language = "en")

Whisper Small (flagship)

Targets: windows, macos, linux, ios, android_flagship. Lives under ai-models/stt/whisper-small/.

File	Size	SHA-256 (first 16 chars)	Format	Status
`encoder_model_quantized.onnx`	88 MB	`a43a83f3c5361cd5...`	ONNX INT8	ready
`decoder_model_merged_quantized.onnx`	149 MB	`ec07c3cbb64172c3...`	ONNX INT8	ready
`tokenizer.json`	2 MB	`27fc476bfe7f1729...`	Binary	ready
`config.json`	2 KB	`457854d452f17661...`	Binary	ready
`generation_config.json`	4 KB	`f538b28220c6a6d6...`	Binary	ready

Source: onnx-community/whisper-small — MIT licensed, license confirmed 2026-05-26.

Total Whisper Small: ~239 MB.

Whisper Tiny (Quest / low-end)

Targets: quest, android_lowend. Lives under ai-models/stt/whisper-tiny/.

File	Size	SHA-256 (first 16 chars)	Format	Status
`encoder_model_quantized.onnx`	10 MB	`2af4a414ca47aa30...`	ONNX INT8	ready
`decoder_model_merged_quantized.onnx`	29 MB	`25e807a962b63493...`	ONNX INT8	ready
`tokenizer.json`	2 MB	`27fc476bfe7f1729...`	Binary	ready
`config.json`	2 KB	`46aeea0a406afbeb...`	Binary	ready
`generation_config.json`	4 KB	`f5c67e5a4f7102f8...`	Binary	ready

Source: onnx-community/whisper-tiny — MIT licensed.

Total Whisper Tiny: ~43 MB.

The Whisper Tiny tokenizer is byte-identical to the Whisper Small tokenizer (the SHA-256 values match): both checkpoints use the same Whisper vocabulary.

LLM — Large language model

Stage: llm
Manifest: ai-models/llm/manifest.json
Runtime: undreamai/LLMUnity (wraps llama.cpp)

Qwen3-1.7B Q5_K_M (flagship)

Property	Value
File	`Qwen3-1.7B-Q5_K_M.gguf`
Display name	Qwen3 1.7B (GGUF Q5_K_M)
Format	GGUF Q5_K_M
Size	1.26 GB (1 257 880 128 bytes)
SHA-256	`b0949de5b2e06cbed6aa96517f9bd8afb334584b6f95ee83479292ff4bdd8ed3`
Source	`unsloth/Qwen3-1.7B-GGUF`
License	Apache-2.0 (confirmed 2026-05-26)
Targets	`windows`, `macos`, `linux`, `ios`, `android_flagship`
Honours `/no_think`?	Yes
Status	ready
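Qwen3 documents `/no_think` as a soft switch: appended to a user or system turn, it asks the model to answer without a reasoning trace, although the reply may still open with an empty `<think></think>` block. A hedged Python sketch of how a caller might honour the manifest's "Honours `/no_think`?" flag; `build_user_turn` and `strip_think` are hypothetical helpers, not LLMUnity APIs.

```python
import re

# Qwen3's soft switch, appended to the turn it should affect.
NO_THINK = "/no_think"
# Even with the switch, Qwen3 may emit an (empty) <think>...</think> block.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def build_user_turn(text: str, honours_no_think: bool) -> str:
    """Append the switch only for models whose manifest says they honour it."""
    return f"{text} {NO_THINK}" if honours_no_think else text


def strip_think(reply: str) -> str:
    """Remove any <think>...</think> section so only the answer is kept."""
    return THINK_BLOCK.sub("", reply).strip()


print(build_user_turn("What can you do?", honours_no_think=True))
print(strip_think("<think>\n\n</think>\n\nI can answer questions."))
```

For a model with the flag set to "No" (Gemma3 below), the text passes through unchanged, so the same call site works for both LLMs.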

Source remap

The original voice_ai_architecture.md spec pointed at Qwen/Qwen3-1.7B-GGUF, which publishes only the Q8_0 variant (1.83 GB). Sauti remaps to unsloth/Qwen3-1.7B-GGUF, which provides the Q5_K_M variant the spec calls for at 1.26 GB. See the manifest's notes field for the full rationale.

Gemma3-1B Q4_K_M (Quest / low-end) — deferred post-v1.2

Property	Value
File	`gemma3-1b-q4_k_m.gguf`
Display name	Gemma 3 1B Instruct (GGUF Q4_K_M)
Format	GGUF Q4_K_M
Size	~0.75 GB (estimated 751 619 276 bytes; confirm after download)
SHA-256	`TODO_FILL_AFTER_DOWNLOAD`
Source	`google/gemma-3-1b-it-GGUF`
License	Gemma Terms of Use (non-SPDX)
Requires explicit acceptance?	Yes
Targets	`quest`, `android_lowend`
Honours `/no_think`?	No
Status	deferred

Deferred to post-v1.2

The Gemma Terms of Use require manual acceptance through a Hugging Face login. For v1.2 the team chose a simpler release over shipping a second LLM, so Quest and Android low-end builds in v1.2 fall back to Qwen3-1.7B-Q5_K_M (1.26 GB; tight on Quest 3's 8 GB of RAM, but functional). A future release (v1.3+) re-activates this entry: accept the terms, download with a Hugging Face token, fill in `sha256` and `licenseConfirmedAt`, and flip `status` to `ready`.
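The last three re-activation steps can be sketched in Python as follows. Only the field names `sha256`, `licenseConfirmedAt` and `status` come from this page; the rest of the entry (for example a `file` key) and the `reactivate` helper itself are hypothetical, so treat this as an outline of the procedure rather than Sauti tooling.

```python
import datetime
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a large GGUF never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reactivate(entry: dict, downloaded: Path, confirmed_on: datetime.date) -> dict:
    """Return a copy of a deferred manifest entry, filled in and marked ready."""
    if entry.get("status") != "deferred":
        raise ValueError(f"expected a deferred entry, got {entry.get('status')!r}")
    updated = dict(entry)
    updated["sha256"] = sha256_of(downloaded)  # replaces TODO_FILL_AFTER_DOWNLOAD
    updated["licenseConfirmedAt"] = confirmed_on.isoformat()
    updated["status"] = "ready"
    return updated
```

Returning a copy instead of mutating the entry makes it easy to diff the old and new manifest before committing.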

Embeddings — RAG encoder

Stage: embeddings
Manifest: ai-models/embeddings/manifest.json
Runtime: asus4/onnxruntime-unity
Used by: the offline build (KnowledgeBaseChunker -> MiniLmRagEmbedder) and the runtime query path. Both must use the same encoder, because chunk vectors and query vectors are only comparable when they come from the same embedding space.
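For reference, the sentence-transformers packaging of all-MiniLM-L6-v2 turns per-token outputs into one sentence vector by mean pooling over the attention mask and then L2-normalising; retrieval then ranks chunks by cosine similarity. A minimal Python sketch of those steps, assuming MiniLmRagEmbedder follows the same recipe (not confirmed here), with toy 2-dimensional vectors standing in for the real 384-dimensional ones:

```python
import math


def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1; padding is ignored."""
    total = [0.0] * len(token_vectors[0])
    kept = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(kept, 1) for value in total]


def l2_normalise(vector: list[float]) -> list[float]:
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def cosine_unit(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


query = l2_normalise(mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]))
chunk = l2_normalise([2.0, 3.0])
print(cosine_unit(query, chunk))
```

The score is only meaningful because both vectors went through the same encoder and the same pooling; mixing encoders would compare coordinates from unrelated spaces.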

all-MiniLM-L6-v2 INT8

Property	Value
File	`model_int8.onnx`
Display name	all-MiniLM-L6-v2 (INT8)
Format	ONNX INT8
Size	22 MB (22 972 370 bytes)
SHA-256	`afdb6f1a0e45b715d0bb9b11772f032c399babd23bfc31fed1c170afc848bdb1`
Output dim	384
Source	`Xenova/all-MiniLM-L6-v2`
License	Apache-2.0 (confirmed 2026-05-26)
Targets	all platforms
Status	ready

WordPiece vocab

Property	Value
File	`vocab.txt`
Size	232 KB (231 508 bytes)
SHA-256	`07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3`
Vocab size	30 522 tokens (standard `bert-base-uncased`)
Source	`Xenova/all-MiniLM-L6-v2`
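vocab.txt holds one token per line, and a token's id is its line index. BERT-style WordPiece splits each word greedily, taking the longest vocabulary match first and marking word-internal pieces with a `##` prefix. A minimal Python sketch of that per-word step, assuming the word has already been lower-cased and split on whitespace and punctuation (the basic-tokenisation stage is omitted); the toy vocabulary is invented for the example:

```python
def load_vocab(lines: list[str]) -> dict[str, int]:
    """Map each vocab.txt token to its id (its 0-based line index)."""
    return {token.rstrip("\n"): index for index, token in enumerate(lines)}


def wordpiece(word: str, vocab: dict[str, int], unk: str = "[UNK]",
              max_chars: int = 100) -> list[str]:
    """Greedy longest-match-first WordPiece for one pre-split word."""
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:  # no prefix of the remainder is in the vocab
            return [unk]
        pieces.append(match)
        start = end
    return pieces


vocab = load_vocab(["[PAD]", "[UNK]", "un", "##aff", "##able", "play", "##ing"])
print(wordpiece("unaffable", vocab))
```

The 100-character cut-off mirrors BERT's default `max_input_chars_per_word`; with the real 30 522-token vocab the same loop produces the ids the encoder expects.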

Source remap

The original manifest pointed at optimum/all-MiniLM-L6-v2, which ships only the FP32 model.onnx. Sauti remaps to Xenova/all-MiniLM-L6-v2, which provides onnx/model_int8.onnx. Its vocab is byte-identical to the optimum copy.

TTS — Text-to-speech

Stage: tts
Manifest: ai-models/tts/manifest.json
Runtime: asus4/onnxruntime-unity via Sauti's own KokoroTtsRunner
Source: all files from onnx-community/Kokoro-82M-ONNX (Apache-2.0)

Core model + tokenizer

File	Size	SHA-256 (first 16)	Notes
`model_quantized.onnx`	88 MB	`0d55b15d4b735d61...`	Kokoro 82M INT8. Sample rate 24 kHz.
`tokenizer.json`	5 KB	`ee301fc39cf903dd...`	177-entry IPA + ASCII-punct vocab. Pad token `"$"` has id 0.

Voices (×11)

Each .bin is raw float32 of shape (-1, 1, 256), 524 288 bytes (= 131 072 floats = 512 × 1 × 256). The leading dim is "max token length"; the runner indexes by len(tokens) to pick the row.
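The row lookup can be sketched with only the Python standard library. The sketch assumes the floats are stored little-endian (what numpy writes on x86 and typical ARM hardware); `style_vector` is a hypothetical helper, not KokoroTtsRunner's API.

```python
import struct

STYLE_DIM = 256            # last axis of the (-1, 1, 256) layout
ROW_BYTES = STYLE_DIM * 4  # one float32 row = 1 024 bytes


def style_vector(voice_bytes: bytes, token_count: int) -> list[float]:
    """Return the 256-float style row selected by the token count."""
    if len(voice_bytes) % ROW_BYTES:
        raise ValueError("voice data is not a whole number of 256-float rows")
    rows = len(voice_bytes) // ROW_BYTES  # 524 288 // 1 024 == 512
    if not 0 <= token_count < rows:
        raise ValueError(f"{token_count} tokens, but the voice only has {rows} rows")
    # "<" = little-endian float32; the offset skips token_count whole rows.
    return list(struct.unpack_from(f"<{STYLE_DIM}f", voice_bytes, token_count * ROW_BYTES))
```

Usage would look like `style_vector(open("voices/af_bella.bin", "rb").read(), len(tokens))`. The bounds check matters: an utterance of 512 or more tokens has no matching row and has to be split before synthesis.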

Voice id convention: first letter = accent (a = American, b = British), second letter = gender (f = female, m = male). See Voice IDs for the full table.
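A small Python sketch of that naming convention; `describe_voice` is illustrative only. It rebuilds the accent/gender/name part of the display names below, while extra wording such as "(default blend)" for `af` stays a manifest detail.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> str:
    """Turn e.g. "bm_george" into "British Male, George"."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognised voice id: {voice_id!r}")
    label = f"{ACCENTS[prefix[0]]} {GENDERS[prefix[1]]}"
    return f"{label}, {name.capitalize()}" if name else label


print(describe_voice("bm_george"))
```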

File	Display name	SHA-256 (first 16)
`voices/af.bin`	American Female (default blend)	`a4f11d9d055a12bf...`
`voices/af_bella.bin`	American Female, Bella	`38e12d4b9b31a751...`
`voices/af_nicole.bin`	American Female, Nicole	`f27666996f2d2277...`
`voices/af_sarah.bin`	American Female, Sarah	`fe4f8b49c272dc5e...`
`voices/af_sky.bin`	American Female, Sky	`f8017c8507ec6a55...`
`voices/am_adam.bin`	American Male, Adam	`6d5255a4b4803f59...`
`voices/am_michael.bin`	American Male, Michael	`9c3be118019ddb41...`
`voices/bf_emma.bin`	British Female, Emma	`fd71ce57d2d69ccb...`
`voices/bf_isabella.bin`	British Female, Isabella	`d3c6f2737d586f01...`
`voices/bm_george.bin`	British Male, George	`68736d5397fcbc46...`
`voices/bm_lewis.bin`	British Male, Lewis	`45b693a17544cc98...`

Total Kokoro footprint: ~88 MB model + 5 KB tokenizer + 11 voices × 512 KiB (≈ 5.8 MB) = ~94 MB.

RAG — Knowledge base

Stage: rag
Manifest: none (built artefact, not a downloaded file)

Property	Value
File	`knowledge.db`
Format	Sauti binary (magic `0x01474152`, format documented at `RagDatabaseBuilder.WriteDatabase`)
Location (source-of-truth)	`ai-models/rag/knowledge.db`
Location (runtime)	`Assets/StreamingAssets/VoiceAI/rag/knowledge.db`
Built by	Sauti -> Build Knowledge Base Editor menu (`RagDatabaseBuilder.BuildFromMenu`)
Input	All `.md` / `.txt` under `knowledge-base/` except `README.md`
Status	pending — build via Editor menu once `MiniLmRagEmbedder` model is in place
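A quick sanity check on a built knowledge.db can test the magic number before anything else is parsed. This Python sketch assumes the magic is the first field of the file and is written as a little-endian uint32 (the byte order .NET's `BinaryWriter` uses); the authoritative layout is the one documented at `RagDatabaseBuilder.WriteDatabase`.

```python
import struct
from pathlib import Path

RAG_MAGIC = 0x01474152  # from the table above


def has_rag_magic(header: bytes) -> bool:
    """True if the first four bytes decode (little-endian) to the magic."""
    if len(header) < 4:
        return False
    (magic,) = struct.unpack_from("<I", header, 0)
    return magic == RAG_MAGIC


def check_knowledge_db(path: Path) -> None:
    with path.open("rb") as f:
        if not has_rag_magic(f.read(4)):
            raise ValueError(f"{path} does not look like a Sauti knowledge.db")


# Little-endian, the magic is the ASCII bytes "RAG" followed by 0x01.
print(struct.pack("<I", RAG_MAGIC))  # b'RAG\x01'
```

Under that assumption, a hex dump of a valid file starts with `52 41 47 01`.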

See Knowledge base authoring for the full build pipeline.
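The "Input" rule above can be sketched in Python to make its edges explicit. The sketch assumes the search is recursive, that `README.md` is excluded at any depth, and that suffixes match case-insensitively; the actual builder may differ on each point.

```python
from pathlib import Path

INPUT_SUFFIXES = {".md", ".txt"}


def knowledge_inputs(root: Path) -> list[Path]:
    """List the files the knowledge-base build would read, in a stable order."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in INPUT_SUFFIXES
        and path.name != "README.md"
    )
```

Sorting keeps chunk order, and therefore the generated database, reproducible across machines.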

Per-platform shipping matrix

Which files end up in a given build:

Platform	STT	LLM	Embeddings	TTS	Total bundle
Windows / Linux	Whisper Small (239 MB)	Qwen3 (1.26 GB)	MiniLM (22 MB)	Kokoro + voices (94 MB)	~1.6 GB
macOS / iOS	Whisper Small (239 MB)	Qwen3 (1.26 GB)	MiniLM (22 MB)	Kokoro + voices (94 MB)	~1.6 GB
Android (flagship)	Whisper Small (239 MB)	Qwen3 (1.26 GB)	MiniLM (22 MB)	Kokoro + voices (94 MB)	~1.6 GB
Android (low-end)	Whisper Tiny (43 MB)	Qwen3 (1.26 GB) ¹	MiniLM (22 MB)	Kokoro + voices (94 MB)	~1.4 GB
Quest 2 / 3	Whisper Tiny (43 MB)	Qwen3 (1.26 GB) ¹	MiniLM (22 MB)	Kokoro + voices (94 MB)	~1.4 GB
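The bundle totals are plain sums of the per-stage sizes on this page. The Python sketch below treats those sizes as decimal megabytes (10^6 bytes), which is an assumption for the rounded values, and shows how far GB and GiB figures diverge at this scale. The Gemma3 figure is the estimate from its deferred entry.

```python
# Per-stage sizes in decimal megabytes (10**6 bytes), as listed on this page.
SIZES_MB = {
    "whisper_small": 239,
    "whisper_tiny": 43,
    "qwen3_q5_k_m": 1258,  # 1 257 880 128 bytes
    "gemma3_q4_k_m": 752,  # estimate (751 619 276 bytes), pending download
    "minilm": 22,
    "kokoro": 94,          # model + tokenizer + 11 voices
}

BUNDLES = {
    "flagship": ["whisper_small", "qwen3_q5_k_m", "minilm", "kokoro"],
    "lowend_v1_2": ["whisper_tiny", "qwen3_q5_k_m", "minilm", "kokoro"],
    "lowend_with_gemma": ["whisper_tiny", "gemma3_q4_k_m", "minilm", "kokoro"],
}


def bundle_size(parts: list[str]) -> tuple[int, float, float]:
    """Return (MB, GB, GiB) for a list of stage keys."""
    mb = sum(SIZES_MB[p] for p in parts)
    total_bytes = mb * 10**6
    return mb, total_bytes / 10**9, total_bytes / 2**30


for name, parts in BUNDLES.items():
    mb, gb, gib = bundle_size(parts)
    print(f"{name:18} {mb:5d} MB = {gb:.2f} GB = {gib:.2f} GiB")
```

A flagship bundle is about 1.61 GB but only about 1.50 GiB, so the unit label matters when comparing against store or device limits.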

How to verify a model on disk

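shasum -a 256 ai-models/llm/Qwen3-1.7B-Q5_K_M.gguf
# Expected: b0949de5b2e06cbed6aa96517f9bd8afb334584b6f95ee83479292ff4bdd8ed3
# Or let shasum compare (prints "<path>: OK" on a match; note the two spaces):
echo "b0949de5b2e06cbed6aa96517f9bd8afb334584b6f95ee83479292ff4bdd8ed3  ai-models/llm/Qwen3-1.7B-Q5_K_M.gguf" | shasum -a 256 -c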

The Editor build pre-processor (planned, tracked as BUILD-001) will perform this verification before copying into StreamingAssets/. Mismatches abort the build with a clear error.
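Until BUILD-001 lands, the intended behaviour (check every file, abort with a clear error on any mismatch) can be approximated with a short Python script. This is a sketch of the check only, not the planned Editor pre-processor; `verify_all` is a hypothetical name, and the caller supplies the expected hashes from the manifests.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_all(expected: dict[Path, str]) -> None:
    """Check every file, then fail once, listing all missing or mismatched files."""
    problems = []
    for path, want in expected.items():
        if not path.is_file():
            problems.append(f"missing: {path}")
            continue
        got = sha256_file(path)
        if got != want.lower():
            problems.append(f"mismatch: {path}\n  expected {want}\n  actual   {got}")
    if problems:
        raise SystemExit("Model verification failed:\n" + "\n".join(problems))
```

For example: `verify_all({Path("ai-models/llm/Qwen3-1.7B-Q5_K_M.gguf"): "b0949de5b2e06cbed6aa96517f9bd8afb334584b6f95ee83479292ff4bdd8ed3"})`. Collecting every problem before exiting means one run reports all bad files instead of stopping at the first.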

Adding a new model

See Contributing — Adding a model.

¹ When Gemma3-1B Q4_K_M is re-introduced post-v1.2 (~752 MB, estimated), Quest and Android low-end builds will drop to ~870 MiB total.