Documentation

Last updated in git: 2026-06-11

Shared AI runtime crate

The embedded LLM that powers the admin SSH prompt mode has been factored out of backend-ai-ssh into its own shared crate, everlock-ai-runtime.

What changed

  • A new crates/ai-runtime crate owns the heavy resources (the embedded GGUF model bytes and the llama.cpp worker thread) along with the small AiTool and ImageInferenceProvider traits that backends consume.
  • backend-ai-ssh is now a thin wrapper that handles the SSH-facing auth check, injects the search_docs tool and the handbook-grounded system prompt, and delegates everything else to the runtime.
  • backend-image-http no longer carries the (stale) gguf-runner dependency. It looks up dyn ImageInferenceProvider in the backend registry and reuses the same already-loaded runtime for image captioning, so there is no second model load.
  • everlock-core no longer exposes any AI-related trait. The ImageInferenceProvider trait lives with the resource it abstracts.
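
The registry lookup described above can be sketched roughly as follows. This is a minimal illustration and not the actual everlock code: the caption method on ImageInferenceProvider, the AiRuntime struct, and the BackendRegistry type with its register and lookup methods are all hypothetical stand-ins. The only point it demonstrates is that one Arc-shared runtime can be handed to a second backend as a trait object without loading the model twice.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical shape of the trait; the real signature may differ.
pub trait ImageInferenceProvider: Send + Sync {
    fn caption(&self, image: &[u8]) -> String;
}

/// Stand-in for the shared runtime (no real model or worker thread here).
pub struct AiRuntime {
    model_name: &'static str,
}

impl AiRuntime {
    pub fn new(model_name: &'static str) -> Self {
        Self { model_name }
    }
}

impl ImageInferenceProvider for AiRuntime {
    fn caption(&self, image: &[u8]) -> String {
        // A real implementation would run inference; this only echoes metadata.
        format!("[{}] caption for {} bytes", self.model_name, image.len())
    }
}

/// Hypothetical registry keyed by capability type, e.g. `dyn ImageInferenceProvider`.
#[derive(Default)]
pub struct BackendRegistry {
    entries: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl BackendRegistry {
    /// Stores a shared handle under the type `T` (which may be a trait object).
    pub fn register<T: ?Sized + Send + Sync + 'static>(&mut self, value: Arc<T>) {
        self.entries.insert(TypeId::of::<T>(), Box::new(value));
    }

    /// Returns a cloned handle to the registered instance, if any.
    pub fn lookup<T: ?Sized + Send + Sync + 'static>(&self) -> Option<Arc<T>> {
        self.entries
            .get(&TypeId::of::<T>())?
            .downcast_ref::<Arc<T>>()
            .cloned()
    }
}

/// Registers the already-loaded runtime as the image provider (no second load).
pub fn build_registry(runtime: &Arc<AiRuntime>) -> BackendRegistry {
    let mut registry = BackendRegistry::default();
    let provider: Arc<dyn ImageInferenceProvider> = runtime.clone();
    registry.register(provider);
    registry
}

/// What an image backend might do: look up the capability and call it.
pub fn caption_via_registry(registry: &BackendRegistry, image: &[u8]) -> Option<String> {
    let provider = registry.lookup::<dyn ImageInferenceProvider>()?;
    Some(provider.caption(image))
}
```

In this sketch the image backend depends only on the trait, so it never needs to know which crate owns the model; a missing registration surfaces as None rather than triggering a fallback model load.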

Why this matters

mail-storage set the precedent: when more than one backend depends on the same heavy resource, the resource gets its own crate and the backends become thin consumers. The AI runtime already had two consumers in disguise (the admin shell and the image backend), but the model and worker thread lived inside one of those backends. Promoting the runtime to its own crate makes that shape explicit and keeps everlock-core free of capability traits.

The qwen3 Cargo feature still selects between SmolVLM-256M (default, ~250 MB) and Qwen3.5-2B (~1.9 GB). The feature is now defined on ai-runtime directly; the workspace and backend-ai-ssh forward to it.
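
As a rough illustration of compile-time model selection behind the qwen3 feature, the sketch below picks model metadata with cfg!. The ModelInfo struct, the two constants, and the select_model and embedded_model functions are assumptions made for this example; the actual crate presumably gates the embedded GGUF bytes themselves, and its mechanism may differ. The sizes are the approximate figures quoted above.

```rust
/// Hypothetical metadata record; sizes are approximate figures from the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelInfo {
    pub name: &'static str,
    pub approx_size_mb: u32,
}

pub const SMOLVLM_256M: ModelInfo = ModelInfo {
    name: "SmolVLM-256M",
    approx_size_mb: 250,
};

pub const QWEN3_5_2B: ModelInfo = ModelInfo {
    name: "Qwen3.5-2B",
    approx_size_mb: 1900,
};

/// Pure selection logic, kept separate so it can be exercised without rebuilding.
pub const fn select_model(qwen3_enabled: bool) -> ModelInfo {
    if qwen3_enabled {
        QWEN3_5_2B
    } else {
        SMOLVLM_256M
    }
}

/// Resolves the model for this build: the default unless `qwen3` is enabled.
/// The allow covers standalone builds that do not declare the feature.
#[allow(unexpected_cfgs)]
pub fn embedded_model() -> ModelInfo {
    select_model(cfg!(feature = "qwen3"))
}
```

Because the choice is made at compile time, a default build would never include the larger model in the binary under this scheme; a crate that forwards the feature only needs to re-export the flag, not duplicate the selection logic.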

Where to read

updates ai architecture