# Hermes Auxiliary LLM Integration For Clawd Memory

When Clawd Memory runs through the Mnemosyne-compatible Hermes memory provider, it can optionally route its
LLM-backed memory operations, both consolidation (sleep) and structured
fact extraction, through Hermes' authenticated auxiliary client. This lets a
Hermes user reuse their configured provider (including OAuth-backed providers
such as `openai-codex`) without giving Mnemosyne its own credentials.

## Why

Mnemosyne's standalone LLM path expects an OpenAI-compatible URL plus an API
key (`MNEMOSYNE_LLM_BASE_URL` / `MNEMOSYNE_LLM_API_KEY`). That cannot reach
OAuth/session-backed providers like ChatGPT/Codex. Hermes already authenticates
those providers through `agent.auxiliary_client.call_llm(task="compression", ...)`,
so the cleanest fix is for Mnemosyne to delegate to that helper when it is
running inside Hermes, without pulling Hermes' auth into Mnemosyne core.

## Behavior

The host backend is disabled by default to preserve existing standalone
behavior after upgrading. To opt in:

```bash
export MNEMOSYNE_HOST_LLM_ENABLED=true
```

When enabled and a host backend is registered (which happens automatically
when Mnemosyne is loaded as a Hermes memory provider), LLM calls resolve in
this order:

```text
0. Host backend (Hermes auxiliary client).
   - On success: return the host text.
   - On failure (errors, empty response, no extractable content):
     skip MNEMOSYNE_LLM_BASE_URL entirely. Fall to the local GGUF path,
     then return None / [].
1. Remote OpenAI-compatible API (only if MNEMOSYNE_LLM_BASE_URL is set
   AND MNEMOSYNE_HOST_LLM_ENABLED is unset/false).
2. Local llama-cpp-python / ctransformers GGUF (TinyLlama by default).
3. Return None (consolidation) or [] (extraction); the caller falls back
   to the existing non-LLM path.
```

The "skip remote on host failure" rule prevents Mnemosyne from accidentally
routing memory content to a stale `MNEMOSYNE_LLM_BASE_URL` the user forgot
to clear after switching to Hermes.

When `MNEMOSYNE_HOST_LLM_ENABLED=true` but no backend is registered (e.g.,
the env var is set in a non-Hermes process), Mnemosyne treats the host as
"not attempted" and proceeds with the existing remote/local fallback chain.

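The resolution order described in this section can be sketched as a small function. This is an illustrative reading of the rules above, not Mnemosyne's actual code: `complete_with_fallback`, the `host`/`remote`/`local` callables, the accepted truthy strings, and the assumption that `MNEMOSYNE_LLM_ENABLED` defaults to true are all hypothetical. It follows the "not attempted" paragraph, so the remote path is skipped only when the host was actually tried.

```python
from typing import Callable, Mapping, Optional

# Hypothetical shape: each path is a callable taking a prompt and returning text or None.
Completer = Callable[[str], Optional[str]]


def _truthy(value: Optional[str]) -> bool:
    # Assumption: which strings count as "true" is not specified in this document.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def _try(completer: Optional[Completer], prompt: str) -> Optional[str]:
    """Call one path; treat exceptions and empty responses as failure."""
    if completer is None:
        return None
    try:
        text = completer(prompt)
    except Exception:
        return None
    if not isinstance(text, str) or not text.strip():
        return None
    return text


def complete_with_fallback(
    prompt: str,
    env: Mapping[str, str],
    host: Optional[Completer] = None,
    remote: Optional[Completer] = None,
    local: Optional[Completer] = None,
) -> Optional[str]:
    # Global gate: when false, every LLM-backed path is disabled.
    if not _truthy(env.get("MNEMOSYNE_LLM_ENABLED", "true")):
        return None

    # The host counts as "attempted" only if it is enabled AND registered.
    host_attempted = _truthy(env.get("MNEMOSYNE_HOST_LLM_ENABLED")) and host is not None

    if host_attempted:
        text = _try(host, prompt)
        if text is not None:
            return text
        # Host failed: deliberately skip the remote URL, which may be stale.
    elif env.get("MNEMOSYNE_LLM_BASE_URL"):
        text = _try(remote, prompt)
        if text is not None:
            return text

    # Local GGUF path; None here means the caller uses the non-LLM path.
    return _try(local, prompt)
```

For extraction the final fallback would be `[]` rather than `None`; the sketch keeps a single return type for brevity.
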
## Configuration

```bash
# Required: opt in to the host backend.
MNEMOSYNE_HOST_LLM_ENABLED=true

# Optional: override the host default compression provider/model for
# Clawd Memory engine calls. Leave unset to inherit Hermes' auxiliary.compression
# resolution. These are NOT credentials: Hermes still owns auth, OAuth
# refresh, and transport.
MNEMOSYNE_HOST_LLM_PROVIDER=openai-codex
MNEMOSYNE_HOST_LLM_MODEL=gpt-5.1-mini

# Optional: prompt context budget when the host is the chosen path.
# Default 32000. The existing MNEMOSYNE_LLM_N_CTX (default 2048) is
# calibrated for TinyLlama and is far too small for typical Codex/GPT
# context windows; using it as the host budget produces needlessly many
# small chunks and lossy multi-chunk summaries.
MNEMOSYNE_HOST_LLM_N_CTX=32000

# Existing global gate. When false, ALL LLM-backed memory operations
# are disabled, including the host path.
MNEMOSYNE_LLM_ENABLED=true
```

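As a rough illustration of how these variables fit together, the sketch below parses them into one settings object. The variable names and the 32000 / 2048 defaults come from the block above; `HostLLMConfig`, `load_host_llm_config`, the truthy-string set, and the assumed `true` default for `MNEMOSYNE_LLM_ENABLED` are hypothetical and may differ from the real loader.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HostLLMConfig:
    llm_enabled: bool        # global gate for all LLM-backed operations
    host_enabled: bool       # opt-in for the host backend
    provider: Optional[str]  # None means "inherit Hermes' resolution"
    model: Optional[str]
    host_n_ctx: int          # prompt budget when the host path is used
    local_n_ctx: int         # prompt budget calibrated for the local model

    def context_budget(self, use_host: bool) -> int:
        """Pick the prompt budget that matches the chosen path."""
        return self.host_n_ctx if use_host else self.local_n_ctx


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    # Assumption: the accepted spellings of "true" are not documented here.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_host_llm_config(env: Mapping[str, str]) -> HostLLMConfig:
    return HostLLMConfig(
        llm_enabled=_flag(env, "MNEMOSYNE_LLM_ENABLED", True),  # assumed default
        host_enabled=_flag(env, "MNEMOSYNE_HOST_LLM_ENABLED", False),
        provider=env.get("MNEMOSYNE_HOST_LLM_PROVIDER") or None,
        model=env.get("MNEMOSYNE_HOST_LLM_MODEL") or None,
        host_n_ctx=int(env.get("MNEMOSYNE_HOST_LLM_N_CTX") or 32000),
        local_n_ctx=int(env.get("MNEMOSYNE_LLM_N_CTX") or 2048),
    )
```

Passing `os.environ` as `env` would read the live process environment; taking a mapping keeps the function easy to test.
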
To control the default host model without Mnemosyne-specific overrides,
configure Hermes itself:

```yaml
# ~/.hermes/config.yaml
auxiliary:
  compression:
    provider: auto   # default; uses main provider/model first
    model: ""        # empty inherits Hermes behavior
    timeout: 15      # per attempt; Hermes may retry internally
```

The `timeout` value is per-attempt. Hermes can retry internally for
auth refresh, payment fallback, or provider fallback, so the total
wall-clock time can exceed the configured timeout on cold start.

## Codex/ChatGPT subscriptions

For OAuth-backed providers like `openai-codex`, do not point
`MNEMOSYNE_LLM_BASE_URL` at `https://chatgpt.com/backend-api/codex`. That
endpoint is not an OpenAI-compatible API-key endpoint; the host backend is
the right path. Configure the provider through your normal Hermes login
(`hermes login` / `hermes config`) and let Mnemosyne route through Hermes.

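A deployment script could catch this misconfiguration early. The helper below is a hypothetical check, not something Mnemosyne is documented to ship: it only tests whether a base URL points at the session-backed endpoint named above.

```python
from urllib.parse import urlparse


def is_codex_session_endpoint(base_url: str) -> bool:
    """Return True if base_url targets chatgpt.com/backend-api/codex."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    return host == "chatgpt.com" and path.startswith("/backend-api/codex")
```

If the check returns true for the configured `MNEMOSYNE_LLM_BASE_URL`, the safer course is to unset that variable and enable the host backend instead.
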
## Fact-extraction determinism

Fact extraction uses `temperature=0.0` so re-ingesting the same content
produces the same facts. This avoids near-duplicate writes to the facts
table when the same conversation is processed twice. Consolidation continues
to use `temperature=0.3`; paraphrasing variance is acceptable there.

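To see why stable output matters, consider a toy facts table that deduplicates on normalized text. This is only a sketch: `TASK_TEMPERATURE`, `write_facts`, and the exact-match dedup rule are assumptions for illustration, and the real facts table may deduplicate differently.

```python
# Temperatures stated in this section, keyed by a hypothetical task name.
TASK_TEMPERATURE = {"extraction": 0.0, "consolidation": 0.3}


def write_facts(table: set, facts: list) -> int:
    """Insert facts into a toy table with exact-match dedup; return rows added."""
    added = 0
    for fact in facts:
        key = " ".join(fact.lower().split())  # normalize case and whitespace
        if key not in table:
            table.add(key)
            added += 1
    return added


table = set()
first = write_facts(table, ["User prefers dark mode."])
repeat = write_facts(table, ["User prefers dark mode."])        # identical re-ingest
paraphrase = write_facts(table, ["The user likes dark mode."])  # sampled variation
```

An identical second extraction adds nothing, while a paraphrase of the same fact slips past exact-match dedup and becomes a near-duplicate row.
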
## Session shutdown

Mnemosyne's `on_session_end()` hook runs sleep/consolidation in a daemon
thread with a 15-second join timeout. If consolidation cannot finish in time
(e.g., a slow host LLM call), the join returns and Hermes shutdown proceeds
unblocked; the daemon thread continues in the background and is reaped when
the process exits. A warning is logged when the timeout fires:

```text
WARNING Mnemosyne session-end sleep timed out after 15s — consolidation deferred
```

This protects Hermes from getting stuck on a slow LLM provider during
session shutdown without losing the chance for consolidation to complete on
faster paths.

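The daemon-thread-plus-join pattern described above looks roughly like this with the standard library. It is a minimal sketch, not the actual hook: `run_with_join_timeout`, the logger name, and the log wording are illustrative.

```python
import logging
import threading

logger = logging.getLogger("mnemosyne.sketch")  # hypothetical logger name


def run_with_join_timeout(work, timeout: float = 15.0) -> bool:
    """Run work() in a daemon thread; return True if it finished in time."""
    worker = threading.Thread(target=work, name="session-end-sleep", daemon=True)
    worker.start()
    worker.join(timeout)  # blocks for at most `timeout` seconds
    if worker.is_alive():
        # The thread keeps running; as a daemon it will not block process exit.
        logger.warning(
            "session-end sleep timed out after %gs; consolidation deferred", timeout
        )
        return False
    return True
```

Because the thread is a daemon, the interpreter does not wait for it at exit, which is what lets shutdown proceed while a slow call is still in flight.
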
## Standalone (non-Hermes) use

Standalone Mnemosyne is unaffected. The host backend is opt-in, never imports
Hermes at module load, and the existing
`MNEMOSYNE_LLM_BASE_URL`/`MNEMOSYNE_LLM_API_KEY`/`MNEMOSYNE_LLM_MODEL` and
local GGUF paths continue to work exactly as before when
`MNEMOSYNE_HOST_LLM_ENABLED` is unset or false.

## For other agents

Any host that wants to expose its authenticated LLM to Mnemosyne can register
its own backend through the same tiny interface:

```python
from mnemosyne.core.llm_backends import LLMBackend, set_host_llm_backend


class MyAgentBackend:
    name = "my-agent"

    def complete(self, prompt, *, max_tokens, temperature, timeout,
                 provider=None, model=None):
        # Route through your own authenticated client and return text-or-None.
        ...


set_host_llm_backend(MyAgentBackend())
```

This mirrors the pattern Hermes uses today and avoids per-agent forks of
Mnemosyne core.

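A fuller version of such a backend might wrap an existing client callable and map every failure to `None`, matching the text-or-None contract in the stub above. The sketch below is self-contained and does not import Mnemosyne; `CallableBackend` and the keyword arguments forwarded to the wrapped client are assumptions about what a host client accepts.

```python
from typing import Callable, Optional


class CallableBackend:
    """Adapt a `client(prompt, **kwargs) -> str` callable to the complete() shape."""

    name = "callable-backend"

    def __init__(self, client: Callable[..., object]) -> None:
        self._client = client

    def complete(self, prompt, *, max_tokens, temperature, timeout,
                 provider=None, model=None) -> Optional[str]:
        try:
            text = self._client(
                prompt,
                max_tokens=max_tokens,
                temperature=temperature,
                timeout=timeout,
                provider=provider,
                model=model,
            )
        except Exception:
            return None  # errors count as failure, never propagate
        if not isinstance(text, str) or not text.strip():
            return None  # empty or non-text responses count as failure
        return text.strip()
```

An instance of this class could then be passed to `set_host_llm_backend(...)` in the same way as `MyAgentBackend()`.
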
