Optional Isolation over Mandatory Sandboxing
V0 Challenge:
Every tool call in V0 executed in a sandboxed Docker container by default. While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other.
Moreover, the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to the user's environment, made V0's rigid isolation model incompatible with MCP-based tooling.
Sandboxing should be opt-in, not universal.
V1 unifies agent and tool execution within a single process by default, aligning with MCP’s local-execution model.
When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity.
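To illustrate "opt-in, not universal", here is a minimal sketch that assumes a hypothetical Executor interface. InProcessExecutor, ContainerExecutor and run_tool are illustrative names, not SDK APIs. The tool logic stays identical, and isolation becomes a configuration choice. The container variant assumes that a container is already running and that the docker CLI is on PATH.

```python
import subprocess
from dataclasses import dataclass
from typing import Protocol


class Executor(Protocol):
    """Hypothetical interface: turns a shell command into an argv to execute."""

    def command_for(self, cmd: str) -> list[str]: ...


@dataclass(frozen=True)
class InProcessExecutor:
    """Default: run on the host, sharing the agent's environment."""

    def command_for(self, cmd: str) -> list[str]:
        return ["sh", "-c", cmd]


@dataclass(frozen=True)
class ContainerExecutor:
    """Opt-in isolation: route the same command into a running container."""

    container: str

    def command_for(self, cmd: str) -> list[str]:
        return ["docker", "exec", self.container, "sh", "-c", cmd]


def run_tool(executor: Executor, cmd: str) -> str:
    # The tool body does not change; only the injected executor does.
    result = subprocess.run(
        executor.command_for(cmd), capture_output=True, text=True, check=True
    )
    return result.stdout


print(run_tool(InProcessExecutor(), "echo hello"), end="")
print(ContainerExecutor("sandbox").command_for("ls"))
```

With this shape, moving a deployment into a container means passing a different executor. No tool has to change.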
Stateless by Default, One Source of Truth for State
V0 Challenge:
V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful.
Keep everything stateless, with exactly one mutable state.
All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction.
The only mutable entity is the conversation state, a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems.
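The sketch below shows immutable configuration alongside a single mutable state object. So that it runs without dependencies, it uses stdlib dataclasses(frozen=True) as a stand-in for Pydantic frozen models. LLMConfig, AgentConfig and this simplified ConversationState are illustrative only, not the SDK's actual classes.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class LLMConfig:
    model: str
    temperature: float = 0.0


@dataclass(frozen=True)
class AgentConfig:
    llm: LLMConfig
    tools: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Validate at construction time, as a Pydantic model would.
        if not self.llm.model:
            raise ValueError("llm.model must be non-empty")


@dataclass
class ConversationState:
    """Hypothetical, simplified: the one mutable object, holding the event log."""

    agent: AgentConfig
    events: list[dict] = field(default_factory=list)

    def append(self, kind: str, payload: str) -> None:
        self.events.append({"kind": kind, "payload": payload})


agent = AgentConfig(llm=LLMConfig(model="example-model"), tools=("bash",))
state = ConversationState(agent=agent)
state.append("message", "hello")

try:
    agent.tools = ("bash", "browser")  # rejected: configuration is frozen
except FrozenInstanceError:
    print("config is immutable")

print(len(state.events))
```

Every change a run makes therefore lands in one place (state.events), and that is what makes replay and persistence tractable.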
Clear Boundaries between Agent and Applications
V0 Challenge:
The same codebase powered the CLI, web interface, and integrations (e.g., GitHub and GitLab). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle.
Heavy research dependencies and benchmark integrations further bloated production builds.
Maintain strict separation of concerns.
V1 divides the system into stable, isolated layers: the SDK (agent core), tools (the tool set), workspace (sandbox), and agent server (a server that runs inside the sandbox).
Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently.
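In the sketch below, the application talks to the agent only over an HTTP API. It uses stdlib http.server and http.client. The /messages route, its JSON shape and AgentHandler are hypothetical and do not describe the real agent-server API. The point is that the client depends on a wire contract, never on agent internals.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical agent-server endpoint: POST /messages acknowledges a message."""

    def do_POST(self) -> None:
        if self.path != "/messages":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        reply = json.dumps({"status": "queued", "text": body["text"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def send_message(host: str, port: int, text: str) -> dict:
    """Application side: knows only the HTTP contract."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            "/messages",
            body=json.dumps({"text": text}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), AgentHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = send_message("127.0.0.1", server.server_address[1], "fix the failing test")
print(result)
server.shutdown()
server.server_close()
```

Research code can change what happens behind the handler without touching the application, provided the contract stays the same.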
Composable Components for Extensibility
V0 Challenge:
Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for each entry point. This rigidity limited experimentation and discouraged contributions.
Everything should be composable and safe to extend.
Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing.
Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation.
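The sketch below reconfigures an agent declaratively by swapping its toolset and overriding its prompt. It produces a new value and leaves the base definition untouched. Tool and AgentSpec are hypothetical names. The example uses stdlib dataclasses.replace; with Pydantic v2 models, model_copy(update=...) plays a similar role.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


@dataclass(frozen=True)
class AgentSpec:
    system_prompt: str
    tools: tuple[Tool, ...]


BASH = Tool("bash", "Run shell commands")
EDITOR = Tool("editor", "View and edit files")
BROWSER = Tool("browser", "Browse web pages")

base = AgentSpec(system_prompt="You are a coding agent.", tools=(BASH, EDITOR))

# Swap the toolset and override the prompt without editing `base` or any core code.
research = replace(
    base,
    system_prompt="You are a research assistant.",
    tools=(BROWSER, EDITOR),
)

print([t.name for t in base.tools])      # ['bash', 'editor']
print([t.name for t in research.tools])  # ['browser', 'editor']
```

Components are immutable values, so variants can coexist safely in the same process.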
Design Invariants (Normative)
This page describes the architectural invariants the SDK relies on. These are treated as contracts between components. Where appropriate, we express invariants in a lightweight OCL-like notation:
context X inv Name: <predicate>
with pre: / post: for pre/post-conditions.
Single Source of Truth for Runtime State
The SDK is designed so that all runtime state that affects agent execution is representable as an event log plus a small, validated state snapshot.
- Configuration objects are immutable (Pydantic frozen=True where applicable).
- The only intentionally mutable entity is ConversationState, which owns the event log, execution status, secrets registry, and persistence handles.
context AgentBase inv StatelessConfiguration: self.model_config.frozen = true
context Event inv Immutable: self.model_config.frozen = true
ConversationState is the single coordination point for execution. Other objects may maintain private runtime caches, but those caches must not be required to restore or replay a conversation.
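The sketch below illustrates the "caches must not be required" rule. Only the status and the event log are persisted, and a private cache is rebuilt empty on restore. ConversationSnapshot and _token_cache are hypothetical names, and the sketch omits the secrets registry and persistence handles that the real ConversationState owns.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ConversationSnapshot:
    """Hypothetical minimal state: execution status plus the event log."""

    status: str = "idle"
    events: list[dict] = field(default_factory=list)
    # Private runtime cache: useful while running, never persisted.
    _token_cache: dict = field(default_factory=dict, repr=False)

    def to_json(self) -> str:
        # Persist only what replay needs; the cache is deliberately excluded.
        return json.dumps({"status": self.status, "events": self.events})

    @classmethod
    def from_json(cls, raw: str) -> "ConversationSnapshot":
        data = json.loads(raw)
        return cls(status=data["status"], events=data["events"])


live = ConversationSnapshot()
live.events.append({"kind": "message", "text": "hi"})
live.status = "finished"
live._token_cache["hi"] = 1

restored = ConversationSnapshot.from_json(live.to_json())
print(restored.status, restored.events, restored._token_cache)
```

Because the restored object is built from the persisted fields alone, losing the cache costs only recomputation, never correctness.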
Workspace Boundary is the I/O Boundary
All side effects against the environment (filesystem, processes, git operations) must occur through a Workspace (local or remote), which serves as the I/O boundary.
- Tools may execute in different runtimes (a local process or inside the agent server), but conceptually they always operate against a workspace rooted at workspace.working_dir.
context BaseWorkspace inv WorkingDirIsString: self.working_dir.oclIsTypeOf(String)
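The sketch below shows a tool that performs file I/O only through a workspace object whose working_dir is a plain string, matching the invariant above. LocalWorkspace, note_tool and the path-escape check are hypothetical. The escape check is an extra guard assumed for illustration and is not stated by the SDK.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LocalWorkspace:
    """Hypothetical workspace: every file operation resolves under working_dir."""

    working_dir: str

    def _resolve(self, rel: str) -> Path:
        root = Path(self.working_dir).resolve()
        path = (root / rel).resolve()
        # Assumed extra guard: refuse paths that escape the workspace root.
        if not path.is_relative_to(root):
            raise PermissionError(f"{rel!r} escapes the workspace")
        return path

    def write_file(self, rel: str, text: str) -> None:
        path = self._resolve(rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)

    def read_file(self, rel: str) -> str:
        return self._resolve(rel).read_text()


def note_tool(ws: LocalWorkspace, text: str) -> str:
    # The tool never touches the filesystem directly; it goes through the workspace.
    ws.write_file("notes/todo.txt", text)
    return ws.read_file("notes/todo.txt")


with tempfile.TemporaryDirectory() as tmp:
    result = note_tool(LocalWorkspace(working_dir=tmp), "refactor parser")
    print(result)
```

A remote workspace could expose the same two methods over the network, and note_tool would not change.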
Event Log is the Execution Trace
The event stream is the single authoritative trace of what the agent saw and did. Natural-language invariant:
- Any agent decision that should be reproducible on replay must be representable as an LLMConvertibleEvent (for LLM context) plus associated non-LLM events (e.g., state updates, errors).
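The sketch below rebuilds the model's context purely from the event log. It keeps LLM-convertible events and skips non-LLM ones, which stay in the trace. MessageEvent, StateUpdateEvent, to_llm_message and llm_context are hypothetical stand-ins, not the SDK's event classes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MessageEvent:
    """LLM-convertible: contributes to the model's context."""

    role: str
    content: str

    def to_llm_message(self) -> dict:
        return {"role": self.role, "content": self.content}


@dataclass(frozen=True)
class StateUpdateEvent:
    """Non-LLM event: recorded in the trace but never sent to the model."""

    key: str
    value: str


Event = Union[MessageEvent, StateUpdateEvent]


def llm_context(events: list[Event]) -> list[dict]:
    # Replay: reconstruct exactly what the model saw, using only the log.
    return [e.to_llm_message() for e in events if isinstance(e, MessageEvent)]


log: list[Event] = [
    MessageEvent("user", "Run the tests"),
    StateUpdateEvent("status", "running"),
    MessageEvent("assistant", "All tests pass."),
]
print(llm_context(log))
```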
Tool Calls are Explicit, Typed, and Linkable
The SDK assumes an explicit Action -> Observation pairing.
OCL-like (conceptual):
context ActionEvent inv HasToolCallId: self.tool_call_id <> null
context ObservationEvent inv RefersToAction: self.action_id <> null
- Observations must be attributable to a specific action/tool call so that conversations can be audited, visualized, and resumed.
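The sketch below checks the two invariants over an event list. It also applies a stricter rule that we assume for illustration: an observation's action_id must point to an action seen earlier in the log. ActionEvent and ObservationEvent are simplified, hypothetical stand-ins for the SDK's classes, and check_pairing is not an SDK function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEvent:
    id: str
    tool_name: str
    tool_call_id: str


@dataclass(frozen=True)
class ObservationEvent:
    action_id: str
    content: str


def check_pairing(events: list) -> list[str]:
    """Return violations of HasToolCallId / RefersToAction (plus an assumed link check)."""
    problems: list[str] = []
    seen_actions: set[str] = set()
    for e in events:
        if isinstance(e, ActionEvent):
            if not e.tool_call_id:
                problems.append(f"action {e.id!r} has no tool_call_id")
            seen_actions.add(e.id)
        elif isinstance(e, ObservationEvent):
            if not e.action_id:
                problems.append("observation has no action_id")
            elif e.action_id not in seen_actions:
                problems.append(f"observation refers to unknown action {e.action_id!r}")
    return problems


events = [
    ActionEvent("a1", "bash", "call_1"),
    ObservationEvent("a1", "ok"),
    ObservationEvent("a9", "orphan"),
]
print(check_pairing(events))
```

An auditor or visualizer can run a check like this before trusting a log for resumption.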
Remote vs Local is an Execution Detail
The SDK makes deployment mode (local vs remote) a runtime selection behind a common interface, not two separate programming models.
- Conversation(...) returns either LocalConversation or RemoteConversation based on the provided workspace.
- User-facing code typically should not need to change when switching workspaces; you mostly swap configuration.
This does not mean every optional method behaves identically across workspace types (e.g., pause() / resume() may be a no-op locally and meaningful remotely). The core conversation API (send_message, run, events) stays consistent.
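The sketch below shows the factory pattern described above: the workspace type alone selects the implementation. Every class here, including the Conversation function, is a hypothetical and heavily simplified stand-in. It reuses the names from the text but does not reproduce the real SDK classes or signatures.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class LocalWorkspace:
    working_dir: str


@dataclass(frozen=True)
class RemoteWorkspace:
    host: str
    working_dir: str


@dataclass
class LocalConversation:
    workspace: LocalWorkspace
    messages: list[str] = field(default_factory=list)

    def send_message(self, text: str) -> None:
        self.messages.append(text)  # handled in-process

    def pause(self) -> None:
        pass  # no-op locally in this sketch


@dataclass
class RemoteConversation:
    workspace: RemoteWorkspace
    messages: list[str] = field(default_factory=list)
    paused: bool = False

    def send_message(self, text: str) -> None:
        # A real client would forward this to the agent server at workspace.host.
        self.messages.append(text)

    def pause(self) -> None:
        self.paused = True  # meaningful remotely


def Conversation(workspace: Union[LocalWorkspace, RemoteWorkspace]):
    """Hypothetical factory: dispatch on the workspace type."""
    if isinstance(workspace, RemoteWorkspace):
        return RemoteConversation(workspace)
    return LocalConversation(workspace)


for ws in (LocalWorkspace("/tmp/project"), RemoteWorkspace("http://localhost:8000", "/workspace")):
    conv = Conversation(ws)
    conv.send_message("hello")  # identical user-facing code
    print(type(conv).__name__, conv.messages)
```

Switching deployment modes then amounts to constructing a different workspace, while the calling code stays the same.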
