Mirrored from https://github.com/yingqi-z20/Agent-libOS
An experimental Agent-native libOS runtime written in Python.
Agent libOS models an agent as a long-running, schedulable, interruptible, capability-controlled AgentProcess, not as a single chat request or workflow thread. The codebase is an MVP implementation of the ideas in agent_libos_design_doc.md.
This project is still in active development.
- spawn, fork, exec, wait, signal, pause, resume, exit.
- Runtime.arun_until_idle() automatically keeps runnable processes moving.
- chdir into launched workspaces.
- kind, channel, correlation_id, reply_to, subject/body, and structured payload; receivers can read, acknowledge, or block on selective filters.
- WAITING_HUMAN; the runtime processes human terminal messages, wakes the process, and resumes the pending action.
- wait_child_process puts the parent in WAITING_EVENT, child exit wakes the parent, and the original wait action resumes without asking the model for a new action.
- run_next_process_once() / arun_next_process_once() do not drain the human queue.
- proc_abc resolves bare names inside process:proc_abc, similar to how an OS process sees its own virtual address space by default.
- create_memory_namespace and inspected with list_memory_namespace.
- namespace/name requires namespace read authority and object read authority.
LLM-facing tools are stable wrappers over libOS primitives. They are similar to libc calls: ergonomic and model-facing, but not the security boundary.
Built-in tools currently include:
- append_memory_object
- ask_human
- create_memory_namespace
- create_memory_object
- create_object_from_file
- delete_directory
- delete_file
- exec_process
- fork_child_process
- get_current_time
- get_working_directory
- human_output
- load_image_from_yaml
- list_child_processes
- list_memory_namespace
- merge_child_memory
- parse_pytest_log
- process_exit
- propose_jit_tool
- read_directory
- read_memory_object
- read_process_messages
- receive_process_messages
- read_text_file
- register_jit_tool
- request_permission
- run_shell_command
- send_process_message
- set_working_directory
- signal_child_process
- sleep
- spawn_child_process
- validate_jit_tool
- wait_child_process
- write_directory
- write_object_to_file
- write_text_file
- echo
Important boundary rules:
- AgentProcess.
- HumanProvider.
- load_image_from_yaml only reads a workspace YAML file and passes the parsed manifest to that primitive.
- ask_human creates a blocking HumanObject question and returns the answer only after the human queue responds.
- sleep is async, so one sleeping process does not block other runnable processes.
- run(args, libos) and can reach libOS only through await libos.syscall(name, args).
- --no-prompt and no read/write/net/env/run/ffi host permissions. Static imports are limited to configured jsr: packages, with a small @std/* allowlist by default.
- process.exit and process.exec are ordinary syscalls from the TypeScript side. The runtime applies the resulting lifecycle change only after the JIT tool returns its normal tool result.
Permission requests are ordinary process actions mediated by the human queue:
- request_permission asks the human to choose a policy for a resource/right pair.
- always_allow, always_deny, or ask_each_time.
- ask_each_time, the relevant primitive creates a per-use human approval request when the operation is attempted.
- filesystem:workspace:README.md, directory subtrees such as filesystem:workspace:agent_outputs/*, or the whole workspace.
- shell:*. The built-in policy levels are always_deny, allowlist_auto_else_ask, blocklist_ask_else_auto, and always_allow; always_allow is intentionally marked high-risk.
- bash or powershell.
- fork_child_process and spawn_child_process can explicitly inherit selected file, directory, or resource capabilities that the parent already holds.
- fork_child_process attenuates a selected parent MemoryView into the child. spawn_child_process creates a fresh direct child with a new process namespace and a goal-only MemoryView.
- exec_process replaces the current process image and tool table without changing pid. It never grants the target image's required capabilities automatically; capabilities are preserved only when explicitly requested, otherwise external capabilities are shrunk.
- write on image:<image_id> or a wildcard such as image:*. The YAML loader also requires filesystem read authority for the manifest path.
- ask_human stays in WAITING_HUMAN until the terminal queue supplies an answer.
- repr()-escaped content preview.
- .env configuration.
- llm_context:<pid>. The runtime appends new process facts, events, capability snapshots, and object summaries to the end of this object so repeated prompt prefixes remain stable for prompt caching.
coding-agent:v0 is the practical repository-engineering image. It starts with read-only workspace authority and human-output authority, but no default write/delete authority.
Its prompt tells the agent to scale the size of a change to the goal, preserve plans and evidence in Object Memory, fork child workers only when parallel analysis materially helps, spawn fresh children when parent context should not be copied, use pregranted write/delete authority when present, request least-privilege permissions when authority is missing, use file/Object bridge tools for large content movement, parse pytest logs when available, and exit with a structured summary of changes, evidence, verification, residual risks, and follow-up.
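The permission policies described earlier (always_allow, always_deny, ask_each_time, applied to resource patterns such as filesystem:workspace:agent_outputs/*) can be illustrated with a small standalone sketch. This is not the project's capability code: PolicyTable, decide, the use of fnmatch for pattern matching, the "latest rule wins" ordering, and the deny-by-default fallback are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Policy names come from the README; everything else here is hypothetical.
POLICIES = {"always_allow": "allow", "always_deny": "deny", "ask_each_time": "ask"}


class PolicyTable:
    """Illustrative resource/right policy lookup, not the real CapabilityManager."""

    def __init__(self):
        self._rules = []  # list of (resource_pattern, right, policy)

    def set_policy(self, pattern, right, policy):
        if policy not in POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self._rules.append((pattern, right, policy))

    def decide(self, resource, right):
        """Return 'allow', 'deny', or 'ask'; the most recently set matching rule wins."""
        for pattern, rule_right, policy in reversed(self._rules):
            if rule_right == right and fnmatchcase(resource, pattern):
                return POLICIES[policy]
        # Assumption: with no matching policy the operation is denied until granted.
        return "deny"
```

A caller would register a policy per resource/right pair and consult decide() before attempting the operation; an "ask" result corresponds to creating a per-use human approval request.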
Install dependencies:
uv sync
Deno is optional for the Python test suite. Install deno or set agent_libos.config.DEFAULT_CONFIG.tools.deno_executable if you want to validate or run real Deno/TypeScript JIT tools.
Run tests:
uv run python -m unittest discover -s tests -v
Run the deterministic local demo:
uv run agent-libos demo
The demo does not call a real model. It covers process spawn/fork, Object Memory, a Deno/TypeScript JIT parser when Deno is available, checkpointing, capability denial before grant, human approval, filesystem write, final report object creation, and audit trace generation. If Deno is not installed, the demo reports the JIT validation error and continues through the rest of the contract.
Use a persistent local runtime database:
uv run agent-libos --db .agent_libos.sqlite init
uv run agent-libos --db .agent_libos.sqlite demo
uv run agent-libos --db .agent_libos.sqlite audit
uv run agent-libos --db .agent_libos.sqlite processes
uv run agent-libos --db .agent_libos.sqlite tools
Create a local .env file for real-model execution:
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENAI_LANGUAGE_MODEL=qwen3.7-max
OPENAI_API_KEY=...
The LLM client uses the OpenAI Python SDK. By default it uses the Responses API for OpenAI-hosted models and falls back to Chat Completions for custom OpenAI-compatible base_url providers. Set OPENAI_API_MODE=responses or OPENAI_API_MODE=chat to force a mode. Optional knobs include OPENAI_TIMEOUT, OPENAI_MAX_RETRIES, OPENAI_STORE, OPENAI_REASONING_EFFORT, OPENAI_VERBOSITY, and provider-specific OPENAI_ENABLE_THINKING.
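The mode selection rule above can be sketched as a small pure function. This is a minimal illustration, not the project's client code: the function name choose_api_mode and the hostname test used to decide what counts as "OpenAI-hosted" are assumptions.

```python
import os
from urllib.parse import urlparse


def choose_api_mode(env=None):
    """Pick 'responses' or 'chat' from environment-style settings (illustrative)."""
    env = os.environ if env is None else env
    # An explicit OPENAI_API_MODE always wins.
    forced = env.get("OPENAI_API_MODE", "").strip().lower()
    if forced in ("responses", "chat"):
        return forced
    base_url = env.get("OPENAI_BASE_URL", "").strip()
    if not base_url:
        return "responses"  # no custom base URL: treat as OpenAI-hosted
    # Assumption: "OpenAI-hosted" is detected by hostname.
    host = urlparse(base_url).hostname or ""
    return "responses" if host == "api.openai.com" else "chat"
```

With the example .env shown earlier, this sketch would select Chat Completions unless OPENAI_API_MODE=responses is set.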
Runtime defaults that are not provider secrets live in agent_libos.config.DEFAULT_CONFIG. This includes scheduler quanta, process budgets, default image ids, workspace namespace, tool timeouts, filesystem/object-memory size limits, Deno JIT sandbox limits, JSR import allowlists, shell policy lists, launcher presets, and example-script defaults. Components accept an AgentLibOSConfig where runtime-level injection is useful; fixed protocol identifiers and model-facing tool semantics stay in their own modules.
Spawn and run a process:
uv run agent-libos --db .agent_libos.sqlite spawn --image coding-agent:v0 --goal "Write a short summary of README.md"
uv run agent-libos --db .agent_libos.sqlite run --max-quanta 10
agent-libos run uses the high-level async supervisor, so human terminal messages are processed as part of runtime execution. For manual queue processing, the lower-level command still exists:
uv run agent-libos --db .agent_libos.sqlite human
Every LLM action-selection call is persisted in SQLite as an llm_calls row. The record includes the exact prompt messages, visible tool schemas, output content, tool calls, provider ids, model/api, token usage when the provider returns it, reasoning fields when exposed by the provider, raw response JSON, and errors. Inspect them with:
uv run agent-libos --db .agent_libos.sqlite llm-calls --pid <pid>
Humans can also inject process messages at any time. This works while another agent-libos run is using the same SQLite runtime database:
uv run agent-libos --db .agent_libos.sqlite message <pid> "Please inspect the latest result"
uv run agent-libos --db .agent_libos.sqlite interrupt <pid> "Stop current work and read this first"
uv run agent-libos --db .agent_libos.sqlite message <pid> "Use this as job input" --channel human --correlation-id job-42 --run
For a Codex CLI-style loop in one terminal, use interactive run. Plain text sends a normal message unless a human question or approval is pending, in which case it answers that request; use /message <text> to force a normal process message. /interrupt <text> sends an interrupt; /pid <pid> switches the target; /exit exits the interactive loop.
uv run agent-libos --db .agent_libos.sqlite run --interactive --pid <pid> --max-quanta 20
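The input rules of the interactive loop can be summarized in a short sketch. It only models how one line of terminal input might be classified; the function name parse_interactive_line, the returned tuple shape, and the "answer" label are hypothetical and not taken from the CLI source.

```python
def parse_interactive_line(line, request_pending):
    """Map one line of terminal input to an (action, argument) pair (illustrative)."""
    text = line.strip()
    if text == "/exit":
        return ("exit", "")
    for command in ("message", "interrupt", "pid"):
        prefix = "/" + command
        if text == prefix or text.startswith(prefix + " "):
            return (command, text[len(prefix):].strip())
    # Plain text answers a pending human question or approval;
    # otherwise it is sent as a normal process message.
    return ("answer" if request_pending else "message", text)
```

The request_pending flag stands in for whatever the runtime uses to know that a human question or approval is waiting.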
The CLI also exposes process built-ins for manual lifecycle control:
uv run agent-libos --db .agent_libos.sqlite cd <pid> src
uv run agent-libos --db .agent_libos.sqlite exec image.yaml "Review README.md" --pid <pid> --run
uv run agent-libos --db .agent_libos.sqlite exit <pid> --payload '{"done":true}'
For exec, the first positional argument is the target image. It can be an already registered image id such as coding-agent:v0, or a .yaml / .yml AgentImage manifest path such as image.yaml. The second positional argument is the replacement goal. --run runs the scheduler immediately after exec; omit it or pass --no-run to only swap the process image and tool table.
An AgentImage YAML manifest accepted by load_image_from_yaml can use either a top-level image mapping or direct image fields:
image:
  image_id: yaml-agent:v0
  name: yaml-agent
  system_prompt: |
    Use the smallest safe tool sequence.
  default_tools:
    - read_memory_object
    - human_output
  context_policy: evidence_first
  safety_profile: review
  metadata:
    role: example
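Accepting both manifest shapes (a top-level image mapping or direct image fields) amounts to a small normalization step on the parsed YAML. The sketch below works on an already-parsed dictionary so it needs no YAML library; normalize_manifest and the choice of required fields are assumptions, not the loader's actual validation rules.

```python
# Assumption: these two keys are the minimum an image needs; the real loader may differ.
REQUIRED_FIELDS = ("image_id", "name")


def normalize_manifest(parsed):
    """Accept {'image': {...}} or direct image fields and return the image mapping."""
    if not isinstance(parsed, dict):
        raise ValueError("manifest must be a mapping")
    # Prefer the nested form when present, otherwise treat the mapping itself as the image.
    image = parsed.get("image", parsed)
    if not isinstance(image, dict):
        raise ValueError("'image' must be a mapping")
    missing = [key for key in REQUIRED_FIELDS if key not in image]
    if missing:
        raise ValueError(f"manifest is missing fields: {missing}")
    return dict(image)
```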
Summarize a workspace document through an Agent process:
uv run python scripts/llm_summarize_document.py README.md --auto-approve
Choose the permission policy explicitly for non-interactive runs:
uv run python scripts/llm_summarize_document.py README.md --permission-policy always_allow --auto-approve
uv run python scripts/llm_summarize_document.py README.md --permission-policy always_deny --auto-approve
Run the real-model write-file smoke test:
uv run python scripts/llm_write_goal_smoke.py
Launch a real coding agent against any workspace with preconfigured permissions:
uv run python scripts/run_coding_agent.py --workspace /path/to/repo --goal "Implement the requested change"
On Windows PowerShell, the same launcher works with Windows-style paths:
uv run python scripts\run_coding_agent.py --workspace ..\some-repo --goal "Summarize the current project"
The launcher defaults to the edit permission preset: read+write over the workspace, but no delete authority. Use --permission-preset read-only for inspection-only runs, --permission-preset full for read+write+delete, or combine read-only with exact allow-list grants such as --write-file src/main.py and --delete-dir build.
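As a rough mental model, each preset expands to a set of workspace rights, and the exact allow-list flags add narrower grants on top. The sketch below is illustrative only: build_grants, the tuple representation, and the resource strings (in particular filesystem:workspace:* for the whole workspace) are assumptions rather than the launcher's real data structures.

```python
# Preset names and rights come from the README; the mapping structure is hypothetical.
PRESET_RIGHTS = {
    "read-only": frozenset({"read"}),
    "edit": frozenset({"read", "write"}),
    "full": frozenset({"read", "write", "delete"}),
}


def build_grants(preset="edit", write_files=(), delete_dirs=()):
    """Return a sorted list of (resource, right) grants for a launcher run."""
    # Assumption: the whole workspace is written as 'filesystem:workspace:*'.
    grants = {("filesystem:workspace:*", right) for right in PRESET_RIGHTS[preset]}
    # Exact file grants, in the spirit of --write-file src/main.py.
    grants |= {(f"filesystem:workspace:{path}", "write") for path in write_files}
    # Directory subtree grants, in the spirit of --delete-dir build.
    grants |= {(f"filesystem:workspace:{path}/*", "delete") for path in delete_dirs}
    return sorted(grants)
```

For example, build_grants("read-only", write_files=["src/main.py"], delete_dirs=["build"]) models an inspection run that may still write one file and delete inside one directory.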
The launcher also grants a shell policy by default: --shell-policy allowlist_auto_else_ask. Use --shell-policy none to grant no shell execution policy, always_deny to hard-disable shell calls, blocklist_ask_else_auto to auto-allow commands except configured risky entries, or always_allow only for high-risk fully trusted runs.
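The four built-in shell policy levels can be read as a decision over the executable token of an argv-style command. The following sketch assumes matching on argv[0] only and uses hypothetical names (shell_decision, the "auto"/"ask"/"deny" results); the real policy matching and the contents of the configured lists may differ.

```python
def shell_decision(policy, argv, allowlist=(), blocklist=()):
    """Return 'auto', 'ask', or 'deny' for an argv-style shell command (illustrative)."""
    if not argv:
        raise ValueError("argv must contain at least the executable")
    executable = argv[0]  # assumption: policy matching looks at the first token
    if policy == "always_deny":
        return "deny"
    if policy == "always_allow":
        return "auto"  # high-risk: every command runs without approval
    if policy == "allowlist_auto_else_ask":
        return "auto" if executable in allowlist else "ask"
    if policy == "blocklist_ask_else_auto":
        return "ask" if executable in blocklist else "auto"
    raise ValueError(f"unknown shell policy: {policy}")
```

Because the command is argv-only, a pipeline has to be wrapped in an interpreter such as bash, and in this sketch the decision is then made on the interpreter token rather than on the commands inside the pipeline.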
By default the launcher loads LLM settings from this Agent-libOS checkout's .env before mounting the target workspace into the Resource Provider Substrate. It does not change the launcher process cwd. Use --env-file /path/to/.env to override that.
Copy a workspace text file through named Object Memory without materializing the file content into the process prompt:
uv run python scripts/object_memory_file_copy_smoke.py
Run two async-scheduled processes that use sleep to alternate current-time output:
uv run python scripts/async_clock_interleave_smoke.py --iterations 3 --interval 0.2
Expected output order is A, B, A, B, ..., showing that one process sleeping does not block the other process.
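The interleaving effect can be reproduced with plain asyncio, independent of Agent libOS. This is a minimal sketch of the idea, not the smoke script itself: ticker, interleave, and the half-interval start offset for the second task are assumptions chosen to make the alternation visible.

```python
import asyncio


async def ticker(label, iterations, interval, start_delay, log):
    """Append `label` to `log` once per interval, sleeping cooperatively in between."""
    await asyncio.sleep(start_delay)
    for _ in range(iterations):
        log.append(label)
        # asyncio.sleep yields to the event loop, so the other task can run meanwhile.
        await asyncio.sleep(interval)


async def interleave(iterations=3, interval=0.1):
    """Run two tickers concurrently; B starts half an interval after A."""
    log = []
    await asyncio.gather(
        ticker("A", iterations, interval, 0.0, log),
        ticker("B", iterations, interval, interval / 2, log),
    )
    return log
```

Calling asyncio.run(interleave()) returns the recorded labels; replacing asyncio.sleep with a blocking time.sleep would make one ticker finish all its iterations before the other starts, which is the behavior the async sleep primitive avoids.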
Ask the human which workspace file to view, then show that file's content:
uv run python scripts/ask_file_then_show.py
For non-interactive testing:
uv run python scripts/ask_file_then_show.py --auto-answer README.md
Run a traditional human/LLM terminal chat through the script-local ChatImage, using ask_human and human_output:
uv run python scripts/human_llm_chat.py
For a deterministic local smoke run without calling a model:
uv run python scripts/human_llm_chat.py --mock --auto-message hello --auto-message /exit
Agent Personality / Application
  -> Skills / Tools Layer
       - LLM-facing actions
       - tool schemas
       - macro actions
       - skill metadata
  -> Agent libOS Runtime
       - AsyncProcessScheduler
       - ProcessManager
       - ObjectMemoryManager
       - ToolBroker
       - HumanObjectManager
       - Primitive managers
       - CapabilityManager
       - EventBus
       - CheckpointManager
       - AuditManager
  -> Resource Provider Substrate
       - filesystem provider
       - clock/sleep provider
       - shell provider
       - human provider
  -> Host Runtime / Provider Backend
       - local workspace filesystem
       - host clock
       - subprocess backend
       - terminal or UI human I/O backend
       - future remote, container, WASM, or service-backed providers
The key design boundary is between model-facing tools and libOS primitives. For example, write_text_file can be visible in a process tool table, but FilesystemAdapter.write_text() still enforces workspace containment, resource capability or permission policy, human approval if needed, events, and audit logging.
Putting a tool in a process table does not grant access to files, humans, shell, network, secrets, or other host resources.
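The split between a model-facing tool and the primitive that enforces authority can be shown with a toy example. Every name here (FilesystemPrimitive, CapabilityError, write_text_file_tool, the capability tuple format) is a hypothetical stand-in, not the project's FilesystemAdapter; the point is only that the wrapper holds no authority and the primitive performs the check and the audit record.

```python
class CapabilityError(PermissionError):
    """Raised when a process lacks the capability for an operation."""


class FilesystemPrimitive:
    """Toy primitive: it, not the tool, checks capabilities and records an audit trail."""

    def __init__(self):
        self.capabilities = set()  # (pid, resource, right)
        self.files = {}            # in-memory stand-in for the workspace
        self.audit = []            # (pid, right, resource, outcome)

    def grant(self, pid, resource, right):
        self.capabilities.add((pid, resource, right))

    def write_text(self, pid, path, content):
        resource = f"filesystem:workspace:{path}"
        allowed = (pid, resource, "write") in self.capabilities
        # Both allowed and denied attempts are audited.
        self.audit.append((pid, "write", resource, "allowed" if allowed else "denied"))
        if not allowed:
            raise CapabilityError(f"{pid} lacks write on {resource}")
        self.files[path] = content


def write_text_file_tool(primitive, pid, args):
    """Model-facing wrapper: visible in a tool table, but it grants nothing by itself."""
    primitive.write_text(pid, args["path"], args["content"])
    return {"ok": True, "path": args["path"]}
```

Calling the tool before any grant raises CapabilityError even though the tool is "available"; after grant() the same call succeeds.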
Primitives are not themselves the host implementation. They own libOS semantics: capability checks, human approval, event emission, and audit records. Concrete host calls live behind agent_libos.substrate providers such as LocalFilesystemProvider, LocalClockProvider, LocalShellProvider, and LocalHumanProvider. Shell calls are intentionally argv-only at this boundary, so quoting, pipes, redirects, and command chaining must be requested explicitly through an interpreter executable, where policy matching can see the interpreter token. HumanObject similarly owns request queues, approvals, wakeups, and audit records, while the substrate HumanProvider owns terminal or UI read/write.
High-level execution:
results = await runtime.arun_until_idle(max_quanta=10)
By default this does four things:
Process messages are explicit queue entries, not raw prompt text. A process can send messages to itself, its parent, or direct children with send_process_message. The receiver uses read_process_messages for non-blocking inspection or receive_process_messages to wait in WAITING_EVENT until a matching unread message arrives. Both read paths can filter by kind, sender, channel, correlation id, reply target, or exact message ids, and returned unread messages are acknowledged by default. Interrupt messages are checked before tool execution and preempt non-message tools until read; normal messages are noticed after a tool call and do not block the current tool.
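Selective reads with acknowledge-by-default can be modeled with a small in-memory mailbox. This is a sketch under stated assumptions: Message, Mailbox, and the keyword-filter interface are hypothetical and only mirror the field names mentioned above (kind, channel, correlation_id); blocking receive and interrupt preemption are not modeled.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)


@dataclass
class Message:
    sender: str
    kind: str = "normal"
    channel: str = "default"
    correlation_id: Optional[str] = None
    body: str = ""
    message_id: int = field(default_factory=lambda: next(_ids))
    acknowledged: bool = False


class Mailbox:
    """Toy per-process queue with exact-match filters (illustrative only)."""

    def __init__(self):
        self._messages = []

    def send(self, message):
        self._messages.append(message)

    def read(self, acknowledge=True, **filters):
        """Return unread messages whose fields equal every given filter value."""
        matched = [
            m for m in self._messages
            if not m.acknowledged
            and all(getattr(m, key) == value for key, value in filters.items())
        ]
        if acknowledge:
            for m in matched:
                m.acknowledged = True  # returned unread messages are acknowledged by default
        return matched
```

For instance, box.read(channel="human", correlation_id="job-42") returns only the matching unread message and marks it acknowledged, so a second identical read returns an empty list.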
For debugging a pending approval state, opt out explicitly:
results = await runtime.arun_until_idle(max_quanta=1, process_human_queue=False)
Single-step APIs also remain available:
result = await runtime.arun_next_process_once()
Object Memory names are local to a namespace. Runtime code that omits namespace uses the caller process namespace:
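# `runtime` is an existing Runtime instance.
pid = runtime.process.spawn(image="base-agent:v0", goal="collect notes")
# No namespace argument: the object is created in the caller's process namespace.
handle = runtime.memory.create_object(
    pid=pid,
    object_type="summary",
    name="notes",
    payload={"entries": []},
    immutable=False,
)
# A bare name resolves inside the caller's own process namespace.
obj = runtime.memory.get_object_by_name(pid, "notes")
assert obj.namespace == runtime.memory.process_namespace(pid)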
For shared or phase-specific memory, create an explicit namespace and pass it on object operations:
# Create explicit namespaces, then place a named object in one of them.
runtime.memory.create_namespace(pid, "project")
runtime.memory.create_namespace(pid, "project/research")
runtime.memory.create_object(
    pid=pid,
    object_type="observation",
    namespace="project/research",
    name="notes",
    payload={"source": "README.md"},
)
# List the entries visible in the explicit namespace.
listing = runtime.memory.list_namespace(pid, "project/research")
The namespace grants directory-style authority such as list, lookup, and create. It does not replace object capabilities; reading project/research/notes still requires object read capability.
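The two-level check can be expressed as a conjunction: namespace authority to look the name up, plus object capability to read the payload. The sketch below is a simplified model; can_read_named_object and the namespace:/object: resource strings are assumptions, not the runtime's capability format.

```python
def can_read_named_object(caps, pid, namespace, name):
    """True only if pid holds namespace lookup authority AND object read capability."""
    # Hypothetical resource strings; the real capability identifiers may differ.
    has_lookup = (pid, f"namespace:{namespace}", "lookup") in caps
    has_read = (pid, f"object:{namespace}/{name}", "read") in caps
    return has_lookup and has_read
```

A process that can list or look up names in project/research but holds no read capability on project/research/notes would get False here, matching the rule that namespace authority does not replace object capabilities.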
Tools should not directly access host resources. Use this pattern:
- SyncAgentTool for blocking local code or BaseAgentTool for async code.
- ctx.runtime.<primitive> for process, memory, filesystem, human, clock, or other libOS operations.
- Runtime._register_builtin_tools() or a ToolBroker-backed registry.
Do not put direct filesystem, terminal, network, shell, browser, database, or credential access inside a model-facing tool unless that code is itself the libOS primitive or a sandbox backend.
Agent-authored JIT tools use TypeScript, not Python. A process proposes source with propose_jit_tool, validates it with validate_jit_tool, and registers it with register_jit_tool. Registration adds the new tool only to the registering process tool table.
The TypeScript source shape is:
export async function run(args, libos) {
  const file = await libos.syscall("filesystem.read_text", { path: args.path });
  return { bytes: file.content.length };
}
The libos object intentionally exposes only syscall(name, args). It does not expose Python objects, Runtime, or runtime.tools. Syscall dispatch enters LibOSSyscallSession, which calls primitives such as filesystem, Object Memory, human, clock, process, shell, and image registry under the caller pid.
agent_libos/
  api/            CLI entry points and demo orchestration
  capability/     Capability grant, revoke, check, and object handles
  config/         Typed runtime, LLM, tool, memory, launcher, and script defaults
  human/          HumanObject query, approval, interrupt, and output primitives
  images/         Built-in AgentImage definitions
  llm/            Prompt, context, OpenAI-compatible client, executor, action parser
  memory/         Typed Object Memory and MemoryView implementation
  models/         Dataclass and enum models split by runtime domain
  primitives/     LibOS primitive managers for filesystem, clock, shell, git, and browser placeholders
  runtime/        Runtime composition, syscall broker, async scheduler, process manager, events, checkpoints, audit
  skills/         Skill schema, registry, verifier, linker scaffolding
  skills_tools/   Tool/action registry and bundle scaffolding
  substrate/      Resource provider interfaces for filesystem, clock, shell, human I/O, and local host-backed implementations
  storage/        SQLite persistence
  tools/          Tool base classes, ToolBroker, sandbox, and built-in tools
scripts/          Real-model smoke and demo scripts
tests/            Safety-boundary and regression tests
Near-term priorities:
Longer-term directions:
Add runtime dependencies with:
uv add <package>
Add development dependencies with:
uv add --dev <package>
Commit both pyproject.toml and uv.lock after dependency changes.