Mirrored from https://github.com/yingqi-z20/Agent-libOS
# Agent libOS

An experimental Agent-native libOS runtime written in Python.

Agent libOS models an agent as a long-running, schedulable, interruptible, capability-controlled `AgentProcess`, not as a single chat request or workflow thread. The codebase is an MVP implementation of the ideas in [agent_libos_design_doc.md](agent_libos_design_doc.md).

This project is still in active development.

## Current MVP

### Runtime

- Agent process lifecycle: `spawn`, `fork`, `exec`, `wait`, `signal`, `pause`, `resume`, `exit`.
- Async process supervisor: `Runtime.arun_until_idle()` automatically keeps runnable processes moving.
- Child process tools can fork workers, spawn fresh children, wait/join, list direct children, signal direct children, and merge child memory.
- Each process gets its own default Object Memory namespace at spawn/fork time. Bare Object Memory names resolve inside that process namespace.
- Each process has its own workspace-relative working directory. Relative filesystem paths and shell subprocess cwd resolve from that process cwd; the runtime host process does not `chdir` into launched workspaces.
- Each process has a durable message queue for IPC. Messages carry `kind`, `channel`, `correlation_id`, `reply_to`, subject/body, and structured payload; receivers can read, acknowledge, or block on selective filters.
- Human queue integration is part of the runtime supervisor by default. If a primitive blocks on human approval, the process enters `WAITING_HUMAN`; the runtime processes human terminal messages, wakes the process, and resumes the pending action.
- Child waits are also resumable: `wait_child_process` puts the parent in `WAITING_EVENT`, child exit wakes the parent, and the original wait action resumes without asking the model for a new action.
- Single-step APIs remain available for tests and debugging: `run_next_process_once()` / `arun_next_process_once()` do not drain the human queue.
- Agent images configure process-visible tool tables at process creation time.
- Event bus and audit trace cover process, process messages, object memory, capabilities, tools, human requests, checkpoints, and primitive access.
- SQLite stores process/object metadata, process messages, full LLM call records, events, audit records, capabilities, human requests, tools, candidates, and checkpoints.
- LibOS primitives use an injectable Resource Provider Substrate. The default substrate is local host OS backed, but filesystem, clock/sleep, shell, and human terminal I/O providers can be replaced without changing tool schemas or capability checks.

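The block-and-resume behavior in the list above can be illustrated with a toy model. This is only a sketch: `ToyProcess`, `ToyRuntime`, and their methods are hypothetical names invented for illustration and are not the Agent libOS API. It shows a process parking a pending action while it waits on a human, then resuming that same action once an answer arrives.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    RUNNABLE = "RUNNABLE"
    WAITING_HUMAN = "WAITING_HUMAN"


@dataclass
class ToyProcess:
    """Hypothetical stand-in for an agent process."""
    pid: str
    state: State = State.RUNNABLE
    # The blocked action; it is resumed as-is once the human answers.
    pending: Optional[Callable[[bool], str]] = None
    log: list = field(default_factory=list)


class ToyRuntime:
    """Hypothetical stand-in for the supervisor's human queue handling."""

    def __init__(self) -> None:
        self.human_queue: list = []  # (process, question) pairs

    def request_approval(self, proc, question, action) -> None:
        """Park `action` and block the process until a human decides."""
        proc.pending = action
        proc.state = State.WAITING_HUMAN
        self.human_queue.append((proc, question))

    def drain_human_queue(self, decide) -> None:
        """Answer every queued request, wake the process, resume the action."""
        while self.human_queue:
            proc, question = self.human_queue.pop(0)
            approved = decide(question)
            action, proc.pending = proc.pending, None
            proc.state = State.RUNNABLE
            proc.log.append(action(approved))  # no new model call is involved


runtime = ToyRuntime()
proc = ToyProcess(pid="proc_demo")
runtime.request_approval(
    proc,
    "write notes.txt?",
    lambda ok: "wrote notes.txt" if ok else "write rejected",
)
blocked_state = proc.state
runtime.drain_human_queue(lambda question: True)
```

The point of the design is that the decision to act was already made before the block, so the wake-up path only needs the stored action and the human's answer.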
### Object Memory

- Typed Object Memory with handles, namespace-local names, namespace directories, links, views, materialized context, snapshots, and merge scaffolding.
- The default namespace is process-private: process `proc_abc` resolves bare names inside `process:proc_abc`, similar to how an OS process sees its own virtual address space by default.
- Names are unique only inside a namespace. The same local name can exist independently in two process namespaces or in an explicit shared namespace.
- Explicit namespaces are directory-like scopes created with `create_memory_namespace` and inspected with `list_memory_namespace`.
- Namespace capabilities gate listing and name resolution. Object capabilities still gate reading, writing, linking, materializing, deleting, and granting object access.
- A name is not itself a capability: resolving `namespace/name` requires namespace read authority and object read authority.
- Object payloads live in runtime memory, not SQLite. SQLite stores directory metadata and a runtime-memory marker only.
- Process-owned memory is released on process exit unless retained as the process result.
- File/Object bridge tools can move file content into and out of Object Memory without returning the concrete content to the process-visible tool result.

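The two-step check described above (namespace authority first, then object authority) can be sketched with a small in-memory model. Everything here is an assumption made for illustration: `ToyMemory`, the `namespace:` / `object:` resource strings, and `AccessDenied` are not taken from the codebase.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a required capability is missing."""


@dataclass
class ToyMemory:
    """Hypothetical model of namespace-local names plus capabilities."""
    names: dict = field(default_factory=dict)    # namespace -> {name: oid}
    objects: dict = field(default_factory=dict)  # oid -> payload
    caps: set = field(default_factory=set)       # (pid, resource, right)

    def grant(self, pid, resource, right):
        self.caps.add((pid, resource, right))

    def put(self, namespace, name, oid, payload):
        self.names.setdefault(namespace, {})[name] = oid
        self.objects[oid] = payload

    def read(self, pid, name, namespace=None):
        # Bare names resolve inside the caller's own process namespace.
        namespace = namespace or f"process:{pid}"
        if (pid, f"namespace:{namespace}", "read") not in self.caps:
            raise AccessDenied(f"no read on namespace {namespace}")
        oid = self.names[namespace][name]
        # Resolving the name is not enough; the object needs its own capability.
        if (pid, f"object:{oid}", "read") not in self.caps:
            raise AccessDenied(f"no read on object {oid}")
        return self.objects[oid]


mem = ToyMemory()
# The same local name exists independently in two process namespaces.
mem.put("process:proc_a", "notes", "oid1", {"owner": "a"})
mem.put("process:proc_b", "notes", "oid2", {"owner": "b"})
mem.grant("proc_a", "namespace:process:proc_a", "read")
mem.grant("proc_a", "object:oid1", "read")
mem.grant("proc_b", "namespace:process:proc_b", "read")  # namespace only

try:
    mem.read("proc_b", "notes")
    proc_b_denied = False
except AccessDenied:
    proc_b_denied = True
```

In this sketch `proc_b` can resolve the name but still cannot read the payload, which is the behavior the "a name is not itself a capability" rule describes.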
### Tools And Primitives

LLM-facing tools are stable wrappers over libOS primitives. They are similar to libc calls: ergonomic and model-facing, but not the security boundary.

Built-in tools currently include:

- `append_memory_object`
- `ask_human`
- `create_memory_namespace`
- `create_memory_object`
- `create_object_from_file`
- `delete_directory`
- `delete_file`
- `exec_process`
- `fork_child_process`
- `get_current_time`
- `get_working_directory`
- `human_output`
- `load_image_from_yaml`
- `list_child_processes`
- `list_memory_namespace`
- `merge_child_memory`
- `parse_pytest_log`
- `process_exit`
- `propose_jit_tool`
- `read_directory`
- `read_memory_object`
- `read_process_messages`
- `receive_process_messages`
- `read_text_file`
- `register_jit_tool`
- `request_permission`
- `run_shell_command`
- `send_process_message`
- `set_working_directory`
- `signal_child_process`
- `sleep`
- `spawn_child_process`
- `validate_jit_tool`
- `wait_child_process`
- `write_directory`
- `write_object_to_file`
- `write_text_file`
- `echo`

Important boundary rules:

- A process can call only tools in its process tool table.
- Tool call visibility is not an external-resource grant.
- Bare Object Memory names resolve in the caller's process namespace; shared memory requires an explicit namespace plus namespace/object capabilities.
- Relative filesystem paths and shell commands resolve from the caller's process working directory, which is independent for each `AgentProcess`.
- Filesystem read/write/delete checks happen in the filesystem primitive.
- Human output, human questions, and human approval checks happen in the HumanObject primitive; concrete terminal reads/writes happen only through the substrate `HumanProvider`.
- Shell execution checks happen in the shell primitive. The model-facing tool accepts argv arrays only; it never accepts shell command strings for implicit parsing.
- Image registration checks happen in the image registry primitive. `load_image_from_yaml` only reads a workspace YAML file and passes the parsed manifest to that primitive.
- `ask_human` creates a blocking HumanObject question and returns the answer only after the human queue responds.
- Clock `sleep` is async, so one sleeping process does not block other runnable processes.
- Agent-authored JIT tools are Deno/TypeScript modules. They export `run(args, libos)` and can reach libOS only through `await libos.syscall(name, args)`.
- JIT syscalls do not consult the caller's LLM-facing tool table. They are authorized by pid, primitive-level capabilities, permission policy, human approval, and audit.
- The Deno subprocess is launched with `--no-prompt` and no read/write/net/env/run/ffi host permissions. Static imports are limited to configured `jsr:` packages, with a small `@std/*` allowlist by default.
- Human approval is part of a syscall. TypeScript sees either the final syscall payload or a final syscall error; it never sees a pending/retry protocol state.
- `process.exit` and `process.exec` are ordinary syscalls from the TypeScript side. The runtime applies the resulting lifecycle change only after the JIT tool returns its normal tool result.

### Permissions And Human Queue

Permission requests are ordinary process actions mediated by the human queue:

- `request_permission` asks the human to choose a policy for a resource/right pair.
- The human can choose `always_allow`, `always_deny`, or `ask_each_time`.
- With `ask_each_time`, the relevant primitive creates a per-use human approval request when the operation is attempted.
- Per-use approval grants a one-shot capability that is consumed after one successful primitive call.
- Filesystem capabilities can target exact files such as `filesystem:workspace:README.md`, directory subtrees such as `filesystem:workspace:agent_outputs/*`, or the whole workspace.
- Shell capabilities are process-scoped policies over `shell:*`. The built-in policy levels are `always_deny`, `allowlist_auto_else_ask`, `blocklist_ask_else_auto`, and `always_allow`; `always_allow` is intentionally marked high-risk.
- Shell allow/block lists match tokenized argv, not substrings, globs, or shell-expanded strings. Allow-list rules are exact by default, bare executable names do not match path-qualified executables, and block-list checks also scan nested executable-looking argv tokens such as `bash` or `powershell`.
- Runtime helpers can grant file/directory allow lists separately for read, write, and delete operations.
- Child processes inherit no external-resource capability by default; `fork_child_process` and `spawn_child_process` can explicitly inherit selected file, directory, or resource capabilities that the parent already holds.
- `fork_child_process` attenuates a selected parent MemoryView into the child. `spawn_child_process` creates a fresh direct child with a new process namespace and a goal-only MemoryView.
- `exec_process` replaces the current process image and tool table without changing pid. It never grants the target image's required capabilities automatically; capabilities are preserved only when explicitly requested, otherwise external capabilities are shrunk.
- Image registration requires `write` on `image:<image_id>` or a wildcard such as `image:*`. The YAML loader also requires filesystem read authority for the manifest path.
- Ordinary human questions use the same queue: a process waiting on `ask_human` stays in `WAITING_HUMAN` until the terminal queue supplies an answer.
- Rejection does not crash the runtime; the process resumes and can report why it could not complete.
- Approval context includes path, resource, overwrite risk, byte count, SHA-256, target state, and a `repr()`-escaped content preview.

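As an illustration of the filesystem capability targets and one-shot approvals listed above, the following sketch matches a workspace-relative path against a target string. The helper names are hypothetical, and the spelling `filesystem:workspace:*` for "the whole workspace" is an assumption; only the exact-file and `dir/*` forms appear in this README.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

PREFIX = "filesystem:workspace:"


def target_matches(target: str, path: str) -> bool:
    """Return True if a capability target covers a workspace-relative path."""
    if not target.startswith(PREFIX):
        return False
    pattern = target[len(PREFIX):]
    rel = PurePosixPath(path)
    if rel.is_absolute() or ".." in rel.parts:
        return False  # never match paths that could escape the workspace
    if pattern == "*":  # assumed spelling for the whole workspace
        return True
    if pattern.endswith("/*"):  # directory subtree
        return PurePosixPath(pattern[:-2]) in rel.parents
    return rel == PurePosixPath(pattern)  # exact file


@dataclass
class OneShotGrant:
    """Hypothetical per-use grant that authorizes exactly one call."""
    target: str
    right: str
    used: bool = False

    def consume(self, path: str, right: str) -> bool:
        # Simplification: this sketch consumes the grant when it authorizes
        # the call, whereas the README says it is consumed after success.
        if self.used or right != self.right:
            return False
        if not target_matches(self.target, path):
            return False
        self.used = True
        return True
```

Matching on path components rather than string prefixes is what keeps a grant on `agent_outputs/*` from also covering a sibling such as `agent_outputs_evil/`.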
### LLM Execution

- OpenAI-compatible LLM client using `.env` configuration.
- OpenAI tool-call schemas generated from the current process tool table.
- The runtime executes the selected legal tool call for each quantum.
- Free-form model text is allowed, but only tool calls or fallback JSON actions have side effects.
- Malformed tool calls with missing function names are rejected; when possible the executor gives the model one repair attempt with the exact visible tool names.
- Model calls run off the event loop, and tool dispatch has async support.
- Each process LLM context is stored as a mutable Object Memory object named `llm_context:<pid>`. The runtime appends new process facts, events, capability snapshots, and object summaries to the end of this object so repeated prompt prefixes remain stable for prompt caching.

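The append-only context idea in the last bullet can be shown in a few lines. `AppendOnlyContext` is a hypothetical class written for this illustration; it only demonstrates why appending keeps earlier prompt text byte-identical, which is what prefix-based prompt caching relies on.

```python
class AppendOnlyContext:
    """Toy context log: entries are only ever appended, never rewritten."""

    def __init__(self, system_prompt: str) -> None:
        self._entries = [system_prompt]

    def append(self, entry: str) -> None:
        self._entries.append(entry)

    def render(self) -> str:
        return "\n".join(self._entries)


ctx = AppendOnlyContext("You are a coding agent.")
ctx.append("event: spawned proc_demo")
before = ctx.render()
ctx.append("capabilities: read filesystem:workspace:README.md")
after = ctx.render()
```

Because `after` starts with `before`, a provider that caches by prompt prefix can reuse the earlier portion on the next call.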
### Built-In Coding Image

`coding-agent:v0` is the practical repository-engineering image. It starts with read-only workspace authority and human-output authority, but no default write/delete authority. Its prompt tells the agent to:

- Scale the size of a change to the goal.
- Preserve plans and evidence in Object Memory.
- Fork child workers only when parallel analysis materially helps.
- Spawn fresh children when parent context should not be copied.
- Use pregranted write/delete authority when present.
- Request least-privilege permissions when authority is missing.
- Use file/Object bridge tools for large content movement.
- Parse pytest logs when available.
- Exit with a structured summary of changes, evidence, verification, residual risks, and follow-up.

### Security Properties Covered By Tests

- Object handles are capability-protected; OIDs or object names alone do not grant access.
- Object Memory namespaces are capability-protected; namespace read/write and object read/write are separate checks.
- Tool tables and external-resource capabilities are independent.
- Tools cannot bypass filesystem or human primitive checks.
- Path containment, revoked capabilities, fork attenuation, spawn-child isolation, exec non-escalation, image registration authority, tool-table denial, Deno/TypeScript JIT scope, syscall capability checks, human approval inside syscalls, deferred process lifecycle, and unsafe import/API rejection are covered by tests.
- Built-in LLM-facing tools are checked so they do not directly touch host filesystem, terminal, network, shell, database, or secrets.

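Fork attenuation and spawn-child isolation both come down to one rule: a child starts with nothing unless the parent explicitly passes on something it already holds. A minimal sketch of that rule follows. The function name is hypothetical, and raising an error for a capability the parent lacks is an assumption; the real runtime may handle that case differently.

```python
def inherit_capabilities(parent_caps, requested=()):
    """Return the capabilities a new child starts with.

    Nothing is inherited unless requested explicitly, and a request for a
    capability the parent does not hold is refused rather than granted.
    """
    parent_caps = frozenset(parent_caps)
    requested = frozenset(requested)
    missing = requested - parent_caps
    if missing:
        raise PermissionError(f"parent does not hold: {sorted(missing)}")
    return requested


parent = {
    ("filesystem:workspace:README.md", "read"),
    ("filesystem:workspace:agent_outputs/*", "write"),
}
default_child = inherit_capabilities(parent)
reader_child = inherit_capabilities(
    parent, [("filesystem:workspace:README.md", "read")]
)
```

Under this rule a child can never end up with more authority than its parent, which is the non-escalation property the tests above are described as covering.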
## Quick Start

Install dependencies:

```bash
uv sync
```

Deno is optional for the Python test suite. Install `deno` or set `agent_libos.config.DEFAULT_CONFIG.tools.deno_executable` if you want to validate or run real Deno/TypeScript JIT tools.

Run tests:

```bash
uv run python -m unittest discover -s tests -v
```

Run the deterministic local demo:

```bash
uv run agent-libos demo
```

The demo does not call a real model. It covers process spawn/fork, Object Memory, a Deno/TypeScript JIT parser when Deno is available, checkpointing, capability denial before grant, human approval, filesystem write, final report object creation, and audit trace generation. If Deno is not installed, the demo reports the JIT validation error and continues through the rest of the contract.

Use a persistent local runtime database:

```bash
uv run agent-libos --db .agent_libos.sqlite init
uv run agent-libos --db .agent_libos.sqlite demo
uv run agent-libos --db .agent_libos.sqlite audit
uv run agent-libos --db .agent_libos.sqlite processes
uv run agent-libos --db .agent_libos.sqlite tools
```

## LLM Configuration

Create a local `.env` file for real-model execution:

```bash
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENAI_LANGUAGE_MODEL=qwen3.7-max
OPENAI_API_KEY=...
```

The LLM client uses the OpenAI Python SDK. By default it uses the Responses API for OpenAI-hosted models and falls back to Chat Completions for custom OpenAI-compatible `base_url` providers. Set `OPENAI_API_MODE=responses` or `OPENAI_API_MODE=chat` to force a mode. Optional knobs include `OPENAI_TIMEOUT`, `OPENAI_MAX_RETRIES`, `OPENAI_STORE`, `OPENAI_REASONING_EFFORT`, `OPENAI_VERBOSITY`, and provider-specific `OPENAI_ENABLE_THINKING`.

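The mode selection described above can be sketched as a small pure function. This is not the client's actual code: `choose_api_mode` is a hypothetical name, and treating "no base URL, or a host of `api.openai.com`" as "OpenAI-hosted" is an assumption about how that case is detected.

```python
from urllib.parse import urlparse


def choose_api_mode(env: dict) -> str:
    """Pick 'responses' or 'chat' from environment-style settings."""
    forced = env.get("OPENAI_API_MODE", "").strip().lower()
    if forced in ("responses", "chat"):
        return forced  # an explicit setting always wins
    if forced:
        raise ValueError(f"unsupported OPENAI_API_MODE: {forced!r}")
    base_url = env.get("OPENAI_BASE_URL", "").strip()
    # Assumption: OpenAI-hosted means the default endpoint or api.openai.com.
    if not base_url or urlparse(base_url).hostname == "api.openai.com":
        return "responses"
    return "chat"


custom = {"OPENAI_BASE_URL": "https://dashscope.aliyuncs.com/compatible-mode/v1"}
mode_for_custom = choose_api_mode(custom)
mode_for_default = choose_api_mode({})
mode_forced = choose_api_mode({**custom, "OPENAI_API_MODE": "responses"})
```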
Runtime defaults that are not provider secrets live in `agent_libos.config.DEFAULT_CONFIG`. This includes scheduler quanta, process budgets, default image ids, workspace namespace, tool timeouts, filesystem/object-memory size limits, Deno JIT sandbox limits, JSR import allowlists, shell policy lists, launcher presets, and example-script defaults. Components accept an `AgentLibOSConfig` where runtime-level injection is useful; fixed protocol identifiers and model-facing tool semantics stay in their own modules.

Spawn and run a process:

```bash
uv run agent-libos --db .agent_libos.sqlite spawn --image coding-agent:v0 --goal "Write a short summary of README.md"
uv run agent-libos --db .agent_libos.sqlite run --max-quanta 10
```

`agent-libos run` uses the high-level async supervisor, so human terminal messages are processed as part of runtime execution. For manual queue processing, the lower-level command still exists:

```bash
uv run agent-libos --db .agent_libos.sqlite human
```

Every LLM action-selection call is persisted in SQLite as an `llm_calls` row. The record includes the exact prompt messages, visible tool schemas, output content, tool calls, provider ids, model/api, token usage when the provider returns it, reasoning fields when exposed by the provider, raw response JSON, and errors. Inspect them with:

```bash
uv run agent-libos --db .agent_libos.sqlite llm-calls --pid <pid>
```

Humans can also inject process messages at any time. This works while another `agent-libos run` is using the same SQLite runtime database:

```bash
uv run agent-libos --db .agent_libos.sqlite message <pid> "Please inspect the latest result"
uv run agent-libos --db .agent_libos.sqlite interrupt <pid> "Stop current work and read this first"
uv run agent-libos --db .agent_libos.sqlite message <pid> "Use this as job input" --channel human --correlation-id job-42 --run
```

For a Codex CLI-style loop in one terminal, use interactive run. Plain text sends a normal message unless a human question or approval is pending, in which case it answers that request; use `/message <text>` to force a normal process message. `/interrupt <text>` sends an interrupt; `/pid <pid>` switches the target; `/exit` exits the interactive loop.

```bash
uv run agent-libos --db .agent_libos.sqlite run --interactive --pid <pid> --max-quanta 20
```

The CLI also exposes process built-ins for manual lifecycle control:

```bash
uv run agent-libos --db .agent_libos.sqlite cd <pid> src
uv run agent-libos --db .agent_libos.sqlite exec image.yaml "Review README.md" --pid <pid> --run
uv run agent-libos --db .agent_libos.sqlite exit <pid> --payload '{"done":true}'
```

For `exec`, the first positional argument is the target image. It can be an already registered image id such as `coding-agent:v0`, or a `.yaml` / `.yml` AgentImage manifest path such as `image.yaml`. The second positional argument is the replacement goal. `--run` runs the scheduler immediately after exec; omit it or pass `--no-run` to only swap the process image and tool table.

An AgentImage YAML manifest accepted by `load_image_from_yaml` can use either a top-level image mapping or direct image fields:

```yaml
image:
  image_id: yaml-agent:v0
  name: yaml-agent
  system_prompt: |
    Use the smallest safe tool sequence.
  default_tools:
    - read_memory_object
    - human_output
  context_policy: evidence_first
  safety_profile: review
  metadata:
    role: example
```

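Accepting both manifest shapes amounts to unwrapping an optional top-level `image` key after the YAML has been parsed. The sketch below works on an already-parsed mapping so that it needs no YAML library. `normalize_manifest` and the choice of `image_id` and `name` as required fields are assumptions for illustration, not the loader's real validation rules.

```python
REQUIRED_FIELDS = ("image_id", "name")  # assumed minimum for this sketch


def normalize_manifest(data: dict) -> dict:
    """Accept either {'image': {...}} or the image fields at the top level."""
    if not isinstance(data, dict):
        raise ValueError("manifest must be a mapping")
    fields = data["image"] if isinstance(data.get("image"), dict) else data
    missing = [key for key in REQUIRED_FIELDS if key not in fields]
    if missing:
        raise ValueError(f"manifest is missing fields: {missing}")
    return dict(fields)  # copy so callers cannot mutate the parsed input


nested = {"image": {"image_id": "yaml-agent:v0", "name": "yaml-agent"}}
flat = {"image_id": "yaml-agent:v0", "name": "yaml-agent"}
from_nested = normalize_manifest(nested)
from_flat = normalize_manifest(flat)
```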
## Example Scripts

Summarize a workspace document through an Agent process:

```bash
uv run python scripts/llm_summarize_document.py README.md --auto-approve
```

Choose the permission policy explicitly for non-interactive runs:

```bash
uv run python scripts/llm_summarize_document.py README.md --permission-policy always_allow --auto-approve
uv run python scripts/llm_summarize_document.py README.md --permission-policy always_deny --auto-approve
```

Run the real-model write-file smoke test:

```bash
uv run python scripts/llm_write_goal_smoke.py
```

Launch a real coding agent against any workspace with preconfigured permissions:

```bash
uv run python scripts/run_coding_agent.py --workspace /path/to/repo --goal "Implement the requested change"
```

On Windows PowerShell, the same launcher works with Windows-style paths:

```powershell
uv run python scripts\run_coding_agent.py --workspace ..\some-repo --goal "Summarize the current project"
```

The launcher defaults to the `edit` permission preset: read+write over the workspace, but no delete authority. Use `--permission-preset read-only` for inspection-only runs, `--permission-preset full` for read+write+delete, or combine `read-only` with exact allow-list grants such as `--write-file src/main.py` and `--delete-dir build`.

The launcher also grants a shell policy by default: `--shell-policy allowlist_auto_else_ask`. Use `--shell-policy none` to grant no shell execution policy, `always_deny` to hard-disable shell calls, `blocklist_ask_else_auto` to auto-allow commands except configured risky entries, or `always_allow` only for high-risk fully trusted runs.

By default the launcher loads LLM settings from this Agent-libOS checkout's `.env` before mounting the target workspace into the Resource Provider Substrate. It does not change the launcher process cwd. Use `--env-file /path/to/.env` to override that.

Copy a workspace text file through named Object Memory without materializing the file content into the process prompt:

```bash
uv run python scripts/object_memory_file_copy_smoke.py
```

Run two async-scheduled processes that use `sleep` to alternate current-time output:

```bash
uv run python scripts/async_clock_interleave_smoke.py --iterations 3 --interval 0.2
```

Expected output order is `A, B, A, B, ...`, showing that one process sleeping does not block the other process.

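The same interleaving can be reproduced with plain `asyncio`, independent of Agent libOS. This is a sketch of the general technique only, not the smoke script itself: the `ticker` and `interleave` names are hypothetical, and starting `B` half an interval after `A` is an assumption made so the two workers alternate.

```python
import asyncio


async def ticker(label: str, iterations: int, interval: float,
                 offset: float, out: list) -> None:
    """Record `label` every `interval` seconds after an initial `offset`."""
    await asyncio.sleep(offset)
    for _ in range(iterations):
        out.append(label)
        # Awaiting the sleep hands control back to the event loop, so the
        # other ticker keeps running while this one waits.
        await asyncio.sleep(interval)


async def interleave(iterations: int = 3, interval: float = 0.05) -> list:
    out: list = []
    await asyncio.gather(
        ticker("A", iterations, interval, 0.0, out),
        ticker("B", iterations, interval, interval / 2, out),
    )
    return out


order = asyncio.run(interleave())
```

If the sleep were a blocking `time.sleep`, `A` would finish all of its iterations before `B` recorded anything.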
Ask the human which workspace file to view, then show that file's content:

```bash
uv run python scripts/ask_file_then_show.py
```

For non-interactive testing:

```bash
uv run python scripts/ask_file_then_show.py --auto-answer README.md
```

Run a traditional human/LLM terminal chat through the script-local `ChatImage`, using `ask_human` and `human_output`:

```bash
uv run python scripts/human_llm_chat.py
```

For a deterministic local smoke run without calling a model:

```bash
uv run python scripts/human_llm_chat.py --mock --auto-message hello --auto-message /exit
```

## Architecture

```text
Agent Personality / Application
  -> Skills / Tools Layer
       - LLM-facing actions
       - tool schemas
       - macro actions
       - skill metadata
  -> Agent libOS Runtime
       - AsyncProcessScheduler
       - ProcessManager
       - ObjectMemoryManager
       - ToolBroker
       - HumanObjectManager
       - Primitive managers
       - CapabilityManager
       - EventBus
       - CheckpointManager
       - AuditManager
  -> Resource Provider Substrate
       - filesystem provider
       - clock/sleep provider
       - shell provider
       - human provider
  -> Host Runtime / Provider Backend
       - local workspace filesystem
       - host clock
       - subprocess backend
       - terminal or UI human I/O backend
       - future remote, container, WASM, or service-backed providers
```

The key design boundary is between model-facing tools and libOS primitives. For example, `write_text_file` can be visible in a process tool table, but `FilesystemAdapter.write_text()` still enforces workspace containment, resource capability or permission policy, human approval if needed, events, and audit logging.

Putting a tool in a process table does not grant access to files, humans, shell, network, secrets, or other host resources.

Primitives are not themselves the host implementation. They own libOS semantics: capability checks, human approval, event emission, and audit records. Concrete host calls live behind `agent_libos.substrate` providers such as `LocalFilesystemProvider`, `LocalClockProvider`, `LocalShellProvider`, and `LocalHumanProvider`. Shell calls are intentionally argv-only at this boundary, so quoting, pipes, redirects, and command chaining must be requested explicitly through an interpreter executable, where policy matching can see the interpreter token. HumanObject similarly owns request queues, approvals, wakeups, and audit records, while the substrate `HumanProvider` owns terminal or UI read/write.

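Because shell calls arrive as argv arrays, policy matching can work token by token. The sketch below illustrates the four policy levels named in this README with hypothetical rule lists and function names; the real shell primitive's matching rules and configuration are not reproduced here.

```python
from pathlib import PurePosixPath

ALLOWLIST = {("git", "status"), ("ls",)}            # hypothetical rules
BLOCKED_EXECUTABLES = {"bash", "sh", "powershell"}  # hypothetical entries


def is_allowlisted(argv: list) -> bool:
    """Exact argv match: ('git', 'status') does not cover 'git status -s',
    and the bare name 'git' does not cover '/usr/bin/git'."""
    return tuple(argv) in ALLOWLIST


def mentions_blocked_executable(argv: list) -> bool:
    """Scan every token, so an interpreter nested in the arguments is seen."""
    return any(PurePosixPath(tok).name in BLOCKED_EXECUTABLES for tok in argv)


def decide(argv: list, policy: str) -> str:
    """Return 'auto', 'ask', or 'deny' for one argv under one policy level."""
    if not argv or not all(isinstance(tok, str) for tok in argv):
        raise TypeError("argv must be a non-empty list of strings")
    if policy == "always_deny":
        return "deny"
    if policy == "always_allow":
        return "auto"
    if policy == "allowlist_auto_else_ask":
        return "auto" if is_allowlisted(argv) else "ask"
    if policy == "blocklist_ask_else_auto":
        return "ask" if mentions_blocked_executable(argv) else "auto"
    raise ValueError(f"unknown policy: {policy}")
```

A command such as `["env", "/bin/bash", "-c", "ls | wc -l"]` keeps its interpreter visible as a separate token, which is what lets a block list catch it.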
## Runtime Execution Model

High-level execution:

```python
results = await runtime.arun_until_idle(max_quanta=10)
```

By default this does four things:

1. Runs all runnable processes asynchronously.
2. Processes pending human terminal messages when processes are waiting on human input.
3. Delivers process-message notices at the appropriate tool boundary.
4. Wakes resumed processes and continues until no runnable or human-resumable work remains, or the quantum budget is exhausted.

Process messages are explicit queue entries, not raw prompt text. A process can send messages to itself, its parent, or direct children with `send_process_message`. The receiver uses `read_process_messages` for non-blocking inspection or `receive_process_messages` to wait in `WAITING_EVENT` until a matching unread message arrives. Both read paths can filter by kind, sender, channel, correlation id, reply target, or exact message ids, and returned unread messages are acknowledged by default. Interrupt messages are checked before tool execution and preempt non-message tools until read; normal messages are noticed after a tool call and do not block the current tool.

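Selective reads with acknowledge-by-default can be modeled with a small in-memory mailbox. The sketch below is illustrative only: `Message` and `Mailbox` are hypothetical classes, the queue is not durable, and only a few of the message fields mentioned in this README are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: int
    sender: str
    kind: str = "normal"
    channel: str = "default"
    correlation_id: Optional[str] = None
    body: str = ""
    acked: bool = False


class Mailbox:
    """Hypothetical per-process queue with filtered, acknowledging reads."""

    def __init__(self) -> None:
        self._messages: list = []

    def send(self, **fields) -> Message:
        msg = Message(msg_id=len(self._messages) + 1, **fields)
        self._messages.append(msg)
        return msg

    def read(self, ack: bool = True, **filters) -> list:
        """Return unread messages whose fields equal every given filter."""
        matched = [
            m for m in self._messages
            if not m.acked
            and all(getattr(m, key) == value for key, value in filters.items())
        ]
        if ack:
            for m in matched:
                m.acked = True  # acknowledged messages are not returned again
        return matched


mailbox = Mailbox()
mailbox.send(sender="parent", channel="jobs", correlation_id="job-42", body="input")
mailbox.send(sender="human", kind="interrupt", body="stop")
first = mailbox.read(channel="jobs", correlation_id="job-42")
again = mailbox.read(channel="jobs", correlation_id="job-42")
rest = mailbox.read()
```

Filtering on `correlation_id` is what lets a process wait for the reply to one specific request while unrelated messages stay unread in the queue.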
For debugging a pending approval state, opt out explicitly:

```python
results = await runtime.arun_until_idle(max_quanta=1, process_human_queue=False)
```

Single-step APIs also remain available:

```python
result = await runtime.arun_next_process_once()
```

## Object Memory Namespace Model

Object Memory names are local to a namespace. Runtime code that omits `namespace` uses the caller process namespace:

```python
pid = runtime.process.spawn(image="base-agent:v0", goal="collect notes")
handle = runtime.memory.create_object(
    pid=pid,
    object_type="summary",
    name="notes",
    payload={"entries": []},
    immutable=False,
)
obj = runtime.memory.get_object_by_name(pid, "notes")
assert obj.namespace == runtime.memory.process_namespace(pid)
```

For shared or phase-specific memory, create an explicit namespace and pass it on object operations:

```python
runtime.memory.create_namespace(pid, "project")
runtime.memory.create_namespace(pid, "project/research")
runtime.memory.create_object(
    pid=pid,
    object_type="observation",
    namespace="project/research",
    name="notes",
    payload={"source": "README.md"},
)
listing = runtime.memory.list_namespace(pid, "project/research")
```

The namespace grants directory-style authority such as list, lookup, and create. It does not replace object capabilities; reading `project/research/notes` still requires object read capability.

## How To Write Agent libOS Tools

Tools should not directly access host resources. Use this pattern:

1. Define a Pydantic input schema and optional output schema.
2. Subclass `SyncAgentTool` for blocking local code or `BaseAgentTool` for async code.
3. Keep validation and model-facing ergonomics in the tool.
4. Call `ctx.runtime.<primitive>` for process, memory, filesystem, human, clock, or other libOS operations.
5. Let primitives enforce capability checks, containment, audit, event emission, checkpointing, and policy hooks.
6. Register the tool through `Runtime._register_builtin_tools()` or a ToolBroker-backed registry.

Do not put direct filesystem, terminal, network, shell, browser, database, or credential access inside a model-facing tool unless that code is itself the libOS primitive or a sandbox backend.

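The division of labor in the pattern above can be sketched without the real base classes. Everything in this block is a stand-in: `ToyFilesystemPrimitive` and `write_text_file_tool` are hypothetical, plain dictionary validation replaces the Pydantic schema to keep the example dependency-free, and the actual `SyncAgentTool` / `BaseAgentTool` interfaces are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFilesystemPrimitive:
    """Stand-in primitive: owns the capability check and the audit record."""
    files: dict = field(default_factory=dict)
    write_caps: set = field(default_factory=set)  # (pid, path) pairs
    audit: list = field(default_factory=list)

    def write_text(self, pid: str, path: str, content: str) -> int:
        allowed = (pid, path) in self.write_caps
        self.audit.append((pid, "write", path, "allowed" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{pid} may not write {path}")
        self.files[path] = content
        return len(content.encode("utf-8"))


def write_text_file_tool(fs: ToyFilesystemPrimitive, pid: str, args: dict) -> dict:
    """Stand-in tool: validates arguments, then defers to the primitive."""
    path, content = args.get("path"), args.get("content")
    if not isinstance(path, str) or not isinstance(content, str):
        return {"ok": False, "error": "path and content must be strings"}
    try:
        return {"ok": True, "bytes": fs.write_text(pid, path, content)}
    except PermissionError as exc:
        return {"ok": False, "error": str(exc)}


fs = ToyFilesystemPrimitive()
denied = write_text_file_tool(fs, "proc_demo", {"path": "out.txt", "content": "hi"})
fs.write_caps.add(("proc_demo", "out.txt"))
written = write_text_file_tool(fs, "proc_demo", {"path": "out.txt", "content": "hi"})
```

The tool never touches `fs.files` itself, so being able to call it does not amount to a grant: the first call is denied and audited even though the tool was callable.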
Agent-authored JIT tools use TypeScript, not Python. A process proposes source with `propose_jit_tool`, validates it with `validate_jit_tool`, and registers it with `register_jit_tool`. Registration adds the new tool only to the registering process tool table.

The TypeScript source shape is:

```ts
export async function run(args, libos) {
  const file = await libos.syscall("filesystem.read_text", { path: args.path });
  return { bytes: file.content.length };
}
```

The `libos` object intentionally exposes only `syscall(name, args)`. It does not expose Python objects, `Runtime`, or `runtime.tools`. Syscall dispatch enters `LibOSSyscallSession`, which calls primitives such as filesystem, Object Memory, human, clock, process, shell, and image registry under the caller pid.

## Module Map

```text
agent_libos/
  api/            CLI entry points and demo orchestration
  capability/     Capability grant, revoke, check, and object handles
  config/         Typed runtime, LLM, tool, memory, launcher, and script defaults
  human/          HumanObject query, approval, interrupt, and output primitives
  images/         Built-in AgentImage definitions
  llm/            Prompt, context, OpenAI-compatible client, executor, action parser
  memory/         Typed Object Memory and MemoryView implementation
  models/         Dataclass and enum models split by runtime domain
  primitives/     LibOS primitive managers for filesystem, clock, shell, git, and browser placeholders
  runtime/        Runtime composition, syscall broker, async scheduler, process manager, events, checkpoints, audit
  skills/         Skill schema, registry, verifier, linker scaffolding
  skills_tools/   Tool/action registry and bundle scaffolding
  substrate/      Resource provider interfaces for filesystem, clock, shell, human I/O, and local host-backed implementations
  storage/        SQLite persistence
  tools/          Tool base classes, ToolBroker, sandbox, and built-in tools
scripts/          Real-model smoke and demo scripts
tests/            Safety-boundary and regression tests
```

## Roadmap

Near-term priorities:

- More LLM executor conformance tests for provider edge cases and unusual tool-call formats.
- Tool result compaction and long-context paging.
- Stronger checkpoint/rollback tests.
- Audit querying by pid, capability, tool, external resource, and time range.
- More complete terminal human queue UX.
- More hardened Deno JIT sandbox profiles and policy presets for high-risk tools.

Longer-term directions:

- Persistent signed skill/tool registry.
- Distributed process scheduling.
- Rich human role and authority model.
- ExternalRef objects and snapshots for external resources.
- Multi-tenant runtime policy.
- MCP-compatible tool exposure.

## Development

Add runtime dependencies with:

```bash
uv add <package>
```

Add development dependencies with:

```bash
uv add --dev <package>
```

Commit both `pyproject.toml` and `uv.lock` after dependency changes.