Shared AI runtime crate
The embedded LLM that powers the admin SSH prompt mode has been factored out of backend-ai-ssh into its own shared crate, everlock-ai-runtime.
What changed
- A new crates/ai-runtime crate owns the heavy resource: the embedded GGUF model bytes, the llama.cpp worker thread, and the small AiTool and ImageInferenceProvider traits that backends consume.
- backend-ai-ssh is now a thin wrapper that handles the SSH-facing auth check, injects the search_docs tool and the handbook-grounded system prompt, and delegates everything else to the runtime.
- backend-image-http no longer carries the (stale) gguf-runner dependency. It looks up dyn ImageInferenceProvider in the backend registry and reuses the same already-loaded runtime for image captioning, so there is no second model load.
- everlock-core no longer exposes any AI-related trait. The ImageInferenceProvider trait lives with the resource it abstracts.
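The shape described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the trait method signatures, the AiRuntime and BackendRegistry types, the MODEL_LOADS counter, and the SearchDocs tool are all hypothetical stand-ins, since the text names the traits but not their APIs. The point is only that one loaded runtime is shared by both consumers through a registry lookup.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical signatures: the source names these traits but not their methods.
pub trait AiTool: Send + Sync {
    fn name(&self) -> &str;
    fn call(&self, input: &str) -> String;
}

pub trait ImageInferenceProvider: Send + Sync {
    fn caption(&self, image: &[u8]) -> String;
}

// Counts model loads so the sharing is observable in this sketch.
pub static MODEL_LOADS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the heavy resource owned by the runtime crate.
pub struct AiRuntime {
    model_bytes: Vec<u8>,
}

impl AiRuntime {
    pub fn load() -> Arc<Self> {
        MODEL_LOADS.fetch_add(1, Ordering::SeqCst);
        Arc::new(Self { model_bytes: vec![0u8; 16] }) // placeholder for GGUF bytes
    }

    // The caller (e.g. the SSH backend) supplies the system prompt and tools.
    pub fn prompt(&self, system: &str, user: &str, tools: &[&dyn AiTool]) -> String {
        let outputs: Vec<String> = tools
            .iter()
            .map(|t| format!("{}: {}", t.name(), t.call(user)))
            .collect();
        format!("system={system}; user={user}; tools=[{}]", outputs.join(", "))
    }
}

impl ImageInferenceProvider for AiRuntime {
    fn caption(&self, image: &[u8]) -> String {
        format!(
            "caption for {} image bytes (model: {} bytes)",
            image.len(),
            self.model_bytes.len()
        )
    }
}

// Assumed registry shape: a map keyed by the (possibly unsized) trait type.
#[derive(Default)]
pub struct BackendRegistry {
    entries: HashMap<TypeId, Box<dyn Any>>,
}

impl BackendRegistry {
    pub fn register<T: ?Sized + 'static>(&mut self, value: Arc<T>) {
        self.entries.insert(TypeId::of::<T>(), Box::new(value));
    }

    pub fn lookup<T: ?Sized + 'static>(&self) -> Option<Arc<T>> {
        self.entries
            .get(&TypeId::of::<T>())?
            .downcast_ref::<Arc<T>>()
            .cloned()
    }
}

// Hypothetical tool injected by the SSH-facing wrapper.
pub struct SearchDocs;

impl AiTool for SearchDocs {
    fn name(&self) -> &str {
        "search_docs"
    }
    fn call(&self, input: &str) -> String {
        format!("docs matching '{input}'")
    }
}

fn main() {
    let runtime = AiRuntime::load(); // the only model load
    let mut registry = BackendRegistry::default();
    let provider: Arc<dyn ImageInferenceProvider> = runtime.clone();
    registry.register(provider);

    // SSH wrapper: injects its tool and system prompt, delegates the rest.
    let reply = runtime.prompt("handbook-grounded prompt", "rotate keys", &[&SearchDocs]);
    println!("{reply}");

    // Image backend: looks up the trait object and reuses the same runtime.
    let captioner = registry
        .lookup::<dyn ImageInferenceProvider>()
        .expect("provider registered");
    println!("{}", captioner.caption(&[0u8; 3]));
    println!("model loads: {}", MODEL_LOADS.load(Ordering::SeqCst));
}
```

Keying the registry by trait type means the image backend depends only on the trait, not on the concrete runtime type, which matches the "thin consumer" role described above.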
Why this matters
mail-storage set the precedent: when more than one backend depends on the same heavy resource, the resource gets its own crate and the backends become thin consumers. The AI model already had two consumers in disguise (the admin shell and the image backend), but the model and its worker thread lived inside one of those backends. Promoting the runtime to its own crate makes that shape explicit and keeps everlock-core free of capability traits.
The qwen3 Cargo feature still selects between SmolVLM-256M (default, ~250 MB) and Qwen3.5-2B (~1.9 GB). It now lives on ai-runtime directly; the workspace and backend-ai-ssh forward to it.
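Feature forwarding of this kind is usually expressed in the Cargo manifests. The fragment below is a sketch under assumptions: the file locations and the relative dependency path are guesses, and only the feature name qwen3 and the package name everlock-ai-runtime come from the text above.

```toml
# crates/ai-runtime/Cargo.toml (assumed location): the feature is declared here.
[features]
qwen3 = []

# crates/backend-ai-ssh/Cargo.toml (assumed location): the feature is forwarded.
[dependencies]
everlock-ai-runtime = { path = "../ai-runtime" }

[features]
qwen3 = ["everlock-ai-runtime/qwen3"]
```

With a layout like this, enabling qwen3 on the consuming crate turns on the same feature in the runtime crate, so the model choice is made in one place.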