# Hermes Auxiliary LLM Integration For Clawd Memory

When Clawd Memory runs through the Mnemosyne-compatible Hermes memory
provider, it can optionally route its LLM-backed memory operations, both
consolidation (sleep) **and** structured fact extraction, through Hermes'
authenticated auxiliary client. This lets a Hermes user reuse their
configured provider (including OAuth-backed providers such as
`openai-codex`) without giving Mnemosyne its own credentials.

## Why

Mnemosyne's standalone LLM path expects an OpenAI-compatible URL plus an API
key (`MNEMOSYNE_LLM_BASE_URL` / `MNEMOSYNE_LLM_API_KEY`). That path cannot reach
OAuth/session-backed providers like ChatGPT/Codex. Hermes already authenticates
those providers through `agent.auxiliary_client.call_llm(task="compression", ...)`,
so the cleanest fix is for Mnemosyne to delegate to that helper when it is
running inside Hermes, without dragging Hermes' auth into Mnemosyne core.

## Behavior

The host backend is **disabled by default** to preserve existing standalone
behavior after upgrading. To opt in:

```bash
export MNEMOSYNE_HOST_LLM_ENABLED=true
```

When the host backend is enabled and registered (registration happens
automatically when Mnemosyne is loaded as a Hermes memory provider), LLM
calls resolve in this order:

```text
0. Host backend (Hermes auxiliary client).
   - On success: return the host text.
   - On failure (errors, empty response, no extractable content):
     skip MNEMOSYNE_LLM_BASE_URL entirely. Fall to the local GGUF path,
     then return None / [].
1. Remote OpenAI-compatible API (only if MNEMOSYNE_LLM_BASE_URL is set
   AND MNEMOSYNE_HOST_LLM_ENABLED is unset/false).
2. Local llama-cpp-python / ctransformers GGUF (TinyLlama by default).
3. Return None (consolidation) or [] (extraction); the caller falls back to
   the existing non-LLM path.
```

The "skip remote on host failure" rule prevents Mnemosyne from accidentally
routing memory content to a stale `MNEMOSYNE_LLM_BASE_URL` the user forgot
to clear after switching to Hermes.

When `MNEMOSYNE_HOST_LLM_ENABLED=true` but no backend is registered (e.g., the
env var is set in a non-Hermes process), Mnemosyne treats the host as "not
attempted" and proceeds with the existing remote/local fallback chain.

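The resolution order above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mnemosyne's actual code: `resolve_completion`, `call_remote`, `call_local`, and the truthy-string parsing are hypothetical, and the environment is passed in as a plain mapping to keep the example self-contained. The sketch follows the "not attempted" rule, so the remote path is skipped only when a registered host was actually tried.

```python
from typing import Callable, Mapping, Optional

# Hypothetical type: any callable that takes a prompt and returns text or None.
Completer = Callable[[str], Optional[str]]


def _is_true(value: Optional[str]) -> bool:
    # Assumption: these strings mean "enabled"; the real parser may differ.
    return (value or "").strip().lower() in {"1", "true", "yes"}


def resolve_completion(
    prompt: str,
    env: Mapping[str, str],
    host: Optional[Completer],
    call_remote: Completer,
    call_local: Completer,
) -> Optional[str]:
    """Return completion text, or None so the caller can use its non-LLM path."""
    host_attempted = False

    # Step 0: host backend, only when opted in AND a backend is registered.
    if _is_true(env.get("MNEMOSYNE_HOST_LLM_ENABLED")) and host is not None:
        host_attempted = True
        try:
            text = host(prompt)
        except Exception:
            text = None  # errors count as a host failure
        if text:
            return text

    # Step 1: remote API, skipped entirely once the host has been attempted.
    if not host_attempted and env.get("MNEMOSYNE_LLM_BASE_URL"):
        text = call_remote(prompt)
        if text:
            return text

    # Step 2: local GGUF model. Step 3: give up with None.
    return call_local(prompt) or None
```

With a stale `MNEMOSYNE_LLM_BASE_URL` still set, a failing host in this sketch goes straight to `call_local` and never touches `call_remote`, which is the behavior the "skip remote on host failure" rule describes.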
## Configuration

```bash
# Required: opt in to the host backend.
MNEMOSYNE_HOST_LLM_ENABLED=true

# Optional: override the host default compression provider/model for
# Clawd Memory engine calls. Leave unset to inherit Hermes' auxiliary.compression
# resolution. These are NOT credentials: Hermes still owns auth, OAuth
# refresh, and transport.
MNEMOSYNE_HOST_LLM_PROVIDER=openai-codex
MNEMOSYNE_HOST_LLM_MODEL=gpt-5.1-mini

# Optional: prompt context budget when the host is the chosen path.
# Default 32000. The existing MNEMOSYNE_LLM_N_CTX (default 2048) is
# calibrated for TinyLlama and is far too small for typical Codex/GPT
# context windows; using it as the host budget produces many needlessly
# small chunks and lossy multi-chunk summaries.
MNEMOSYNE_HOST_LLM_N_CTX=32000

# Existing global gate. When false, ALL LLM-backed memory operations
# are disabled, including the host path.
MNEMOSYNE_LLM_ENABLED=true
```

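One way to read the host-specific settings is sketched below. The `HostLLMConfig` dataclass and `load_host_llm_config` helper are hypothetical names for illustration, and the accepted truthy strings are an assumption; only the variable names and the 32000 default come from the configuration block above. Empty or unset overrides map to `None` so that Hermes' own resolution applies.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HostLLMConfig:
    host_enabled: bool
    provider: Optional[str]
    model: Optional[str]
    n_ctx: int


def _flag(value: Optional[str]) -> bool:
    # Assumption: these strings mean "on"; anything else, including unset, is off.
    return (value or "").strip().lower() in {"1", "true", "yes"}


def load_host_llm_config(env: Mapping[str, str] = os.environ) -> HostLLMConfig:
    """Collect the host-backend settings from an environment-like mapping."""
    return HostLLMConfig(
        host_enabled=_flag(env.get("MNEMOSYNE_HOST_LLM_ENABLED")),
        # Empty or unset overrides become None so the host default is inherited.
        provider=env.get("MNEMOSYNE_HOST_LLM_PROVIDER") or None,
        model=env.get("MNEMOSYNE_HOST_LLM_MODEL") or None,
        n_ctx=int(env.get("MNEMOSYNE_HOST_LLM_N_CTX") or 32000),
    )
```
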
To control the default host model without Mnemosyne-specific overrides,
configure Hermes itself:

```yaml
# ~/.hermes/config.yaml
auxiliary:
  compression:
    provider: auto   # default; uses main provider/model first
    model: ""        # empty inherits Hermes behavior
    timeout: 15      # per attempt; Hermes may retry internally
```

The `timeout` value is **per-attempt**. Hermes can retry internally for
auth refresh, payment fallback, or provider fallback, so the total
wall-clock time can exceed the configured timeout on cold start.

## Codex/ChatGPT subscriptions

For OAuth-backed providers like `openai-codex`, **do not** point
`MNEMOSYNE_LLM_BASE_URL` at `https://chatgpt.com/backend-api/codex`. That
endpoint is not an OpenAI-compatible API-key endpoint; the host backend is
the right path. Configure the provider through your normal Hermes login
(`hermes login` / `hermes config`) and let Mnemosyne route through Hermes.

## Fact-extraction determinism

Fact extraction uses `temperature=0.0` so re-ingesting the same content
produces the same facts. This avoids near-duplicate writes to the facts
table when the same conversation is processed twice. Consolidation continues
to use `temperature=0.3`; paraphrasing variance is acceptable there.

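As an illustration, the two temperatures could be kept in a single per-task table so the choice is made in one place. The `TASK_TEMPERATURE` mapping and `temperature_for` helper are hypothetical names, not part of Mnemosyne's API; only the two values come from the text above.

```python
# Hypothetical per-task sampling temperatures; the values are from the text above.
TASK_TEMPERATURE = {
    "extraction": 0.0,     # deterministic: same input should yield the same facts
    "consolidation": 0.3,  # some paraphrasing variance is acceptable
}


def temperature_for(task: str) -> float:
    """Look up the temperature for a task, failing loudly on unknown names."""
    try:
        return TASK_TEMPERATURE[task]
    except KeyError:
        raise ValueError(f"unknown memory task: {task!r}") from None
```

Raising on an unknown task name, rather than silently falling back to a default, keeps a typo from quietly switching extraction to a non-zero temperature.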
## Session shutdown

Mnemosyne's `on_session_end()` hook runs sleep/consolidation in a daemon
thread with a 15-second join timeout. If consolidation cannot finish in time
(e.g., a slow host LLM call), the join returns and Hermes shutdown proceeds
unblocked; the daemon thread continues in the background and is reaped when
the process exits. A warning is logged when the timeout fires:

```text
WARNING Mnemosyne session-end sleep timed out after 15s — consolidation deferred
```

This protects Hermes from getting stuck on a slow LLM provider during
session shutdown without losing the chance for consolidation to complete on
faster paths.

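The join-with-timeout pattern described above looks roughly like the following sketch. `run_consolidation` stands in for the real sleep/consolidation call, and the `end_session` function and logger name are assumptions; only the 15-second timeout and the warning text come from this section.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger("mnemosyne.sketch")  # hypothetical logger name


def end_session(run_consolidation: Callable[[], None], timeout: float = 15.0) -> bool:
    """Run consolidation in a daemon thread; return True if it finished in time."""
    # daemon=True means the thread never blocks interpreter exit.
    worker = threading.Thread(target=run_consolidation, daemon=True)
    worker.start()
    worker.join(timeout)  # wait at most `timeout` seconds
    if worker.is_alive():
        # The thread keeps running in the background; shutdown continues.
        logger.warning(
            "Mnemosyne session-end sleep timed out after %gs — consolidation deferred",
            timeout,
        )
        return False
    return True
```

Because `Thread.join` returns normally whether or not the thread finished, the `is_alive()` check afterwards is what distinguishes a completed run from a timed-out one.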
## Standalone (non-Hermes) use

Standalone Mnemosyne is unaffected. The host backend is opt-in and never
imports Hermes at module load. The existing
`MNEMOSYNE_LLM_BASE_URL`/`MNEMOSYNE_LLM_API_KEY`/`MNEMOSYNE_LLM_MODEL` settings
and local GGUF paths continue to work exactly as before when
`MNEMOSYNE_HOST_LLM_ENABLED` is unset or false.

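Keeping a host package out of module load is commonly done with a deferred import. The sketch below shows that general technique with a hypothetical `lazy_attr` helper; it is not Mnemosyne's actual loader, and how the real adapter locates Hermes is not specified here.

```python
import importlib
from typing import Any, Optional


def lazy_attr(module_name: str, attr: str) -> Optional[Any]:
    """Import `module_name` only when called; return None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # host package not installed: the caller stays standalone
    return getattr(module, attr, None)
```

Under this approach, a call such as `lazy_attr("agent.auxiliary_client", "call_llm")` would be made only at the moment the host path is first needed, and it returns `None` in a process where that module cannot be imported.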
## For other agents

Any host that wants to expose its authenticated LLM to Mnemosyne can register
its own backend through the same tiny interface:

```python
from mnemosyne.core.llm_backends import LLMBackend, set_host_llm_backend


class MyAgentBackend:
    name = "my-agent"

    def complete(self, prompt, *, max_tokens, temperature, timeout,
                 provider=None, model=None):
        # Route through your own authenticated client and return text-or-None.
        ...


set_host_llm_backend(MyAgentBackend())
```

This mirrors the pattern Hermes uses today and avoids per-agent forks of
Mnemosyne core.