openhands-sdk/openhands/sdk/agent/
Core Responsibilities
The Agent system has four primary responsibilities:
- Reasoning-Action Loop - Query LLM to generate next actions based on conversation history
- Tool Orchestration - Select and execute tools, handle results and errors
- Context Management - Apply skills, manage conversation history via condensers
- Security Validation - Analyze proposed actions for safety before execution via security analyzer
Architecture
Key Components
| Component | Purpose | Design |
|---|---|---|
| Agent | Main implementation | Stateless reasoning-action loop executor |
| AgentBase | Abstract base class | Defines agent interface and initialization |
| AgentContext | Context container | Manages skills, prompts, and metadata |
| Condenser | History compression | Reduces context when token limits are approached |
| SecurityAnalyzer | Safety validation | Evaluates action risk before execution |
Reasoning-Action Loop
The agent operates through a single-step execution model where each step() call processes one reasoning cycle:
Step Execution Flow:
- Pending Actions: If actions awaiting confirmation exist, execute them and return
- Condensation: If a condenser exists:
  - Call condenser.condense() with the current event view
  - If it returns a View: use the condensed events for the LLM query (continue in the same step)
  - If it returns a Condensation: emit the event and return (it will be processed in the next step)
- LLM Query: Query the LLM with messages from the event history
  - If the context window is exceeded: emit CondensationRequest and return
- Response Parsing: Parse the LLM response into events
  - Tool calls → create ActionEvent(s)
  - Text message → create MessageEvent and return
- Confirmation Check: If actions need user approval:
  - Set conversation status to WAITING_FOR_CONFIRMATION and return
- Action Execution: Execute tools and create ObservationEvent(s)
Key Characteristics:
- Stateless: Agent holds no mutable state between steps
- Event-Driven: Reads from event history, writes new events
- Interruptible: Each step is atomic and can be paused/resumed
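
The step execution flow can be sketched as a single function over an event log. This is a minimal illustration under stated assumptions, not the SDK's implementation: Event, State, ContextWindowExceeded, and the llm, run_tool, needs_confirmation, and condense callables are all hypothetical stand-ins, and resetting the status after pending actions run is an assumption.

```python
from dataclasses import dataclass, field


class ContextWindowExceeded(Exception):
    """Hypothetical error raised by the stand-in LLM callable."""


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event: kind is "message", "action",
    # "observation", "condensation", or "condensation_request".
    kind: str
    payload: str = ""


@dataclass
class State:
    # All mutable data lives here, not on the agent (hypothetical stand-in).
    events: list = field(default_factory=list)
    pending_actions: list = field(default_factory=list)
    status: str = "RUNNING"


def execute(state, actions, run_tool):
    # Action execution: one observation event per executed action.
    for action in actions:
        state.events.append(Event("observation", run_tool(action)))


def step(state, llm, run_tool, needs_confirmation, condense=None):
    """Process one reasoning cycle; reads state.events and appends new events."""
    # Pending actions: execute them and return.
    if state.pending_actions:
        execute(state, state.pending_actions, run_tool)
        state.pending_actions = []
        state.status = "RUNNING"  # assumption: status resets once actions run
        return

    # Condensation: a list result is a condensed view used in this same step;
    # an Event result is emitted and handled on the next step.
    view = list(state.events)
    if condense is not None:
        result = condense(view)
        if isinstance(result, Event):
            state.events.append(result)
            return
        view = result

    # LLM query: on context overflow, request condensation and return.
    try:
        kind, payload = llm(view)  # ("message", text) or ("tool_calls", [names])
    except ContextWindowExceeded:
        state.events.append(Event("condensation_request"))
        return

    # Response parsing.
    if kind == "message":
        state.events.append(Event("message", payload))
        return
    actions = [Event("action", call) for call in payload]
    state.events.extend(actions)

    # Confirmation check: store as pending and wait for approval.
    if any(needs_confirmation(a) for a in actions):
        state.pending_actions = actions
        state.status = "WAITING_FOR_CONFIRMATION"
        return

    execute(state, actions, run_tool)
```

Because step() keeps nothing between calls, a caller can stop after any call and resume later by invoking it again with the same State.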
Agent Context
The agent applies AgentContext, which includes skills and prompts, to shape LLM behavior:
| Skill Type | Activation | Use Case |
|---|---|---|
| repo | Always included | Project-specific context, conventions |
| knowledge | Trigger words/patterns | Domain knowledge, special behaviors |
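
The two activation rules in the table can be illustrated with a short selection function. This is a sketch only: the Skill dataclass and active_skills function are hypothetical names, and case-insensitive substring matching is an assumed trigger mechanism, not necessarily how the SDK matches patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    # Hypothetical stand-in; the SDK's skill classes may look different.
    name: str
    kind: str             # "repo" or "knowledge"
    content: str = ""
    triggers: tuple = ()  # only meaningful for knowledge skills


def active_skills(skills, user_message):
    """Return the skills whose content would be injected for this message."""
    text = user_message.lower()
    selected = []
    for skill in skills:
        if skill.kind == "repo":
            # Repo skills are always included.
            selected.append(skill)
        elif skill.kind == "knowledge" and any(
            trigger.lower() in text for trigger in skill.triggers
        ):
            # Knowledge skills are included only when a trigger word appears.
            selected.append(skill)
    return selected
```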
Tool Execution
Tools follow a strict action-observation pattern.
Invariants (Normative)
AgentBase: Configuration is Stateless and Immutable
Natural language invariant:
- An AgentBase instance is a pure configuration object. It may cache materialized ToolDefinition instances internally, but it must remain valid to re-create those tools from its declarative spec.
context AgentBase inv Frozen: self.model_config.frozen = true
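
What a frozen configuration object means in practice can be shown with a standard-library analogue. The invariant above refers to model_config.frozen; the sketch below instead uses a frozen dataclass as a stand-in, and AgentConfig with its fields and values is a hypothetical example, not the SDK's class.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical stand-in for an AgentBase-like configuration object.
    llm_model: str
    tool_names: tuple = ()


config = AgentConfig(llm_model="example-model", tool_names=("example_tool",))

# In-place mutation of a frozen object is rejected.
try:
    config.llm_model = "other-model"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Changes are expressed by building a new object from the old one.
updated = replace(config, llm_model="other-model")
```

Since the original object never changes, the same configuration can be shared across conversations without one affecting another.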
Initialization: System Prompt Precedes Any User Message
Agent.init_state(state, on_event=...) is responsible for creating the initial system prompt event.
Natural language invariant:
- A ConversationState must not contain a user MessageEvent before it contains a SystemPromptEvent.
context ConversationState inv SystemBeforeUser: self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )
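
The ordering invariant can be checked with one pass over the event list. The sketch below follows the natural language form (a user message with no earlier system prompt is a violation); Ev and system_precedes_user are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ev:
    # Hypothetical, simplified event record.
    kind: str            # "system_prompt" or "message"
    source: str = "agent"


def system_precedes_user(events):
    """True if no user message appears before the first system prompt."""
    seen_system = False
    for event in events:
        if event.kind == "system_prompt":
            seen_system = True
        elif event.kind == "message" and event.source == "user" and not seen_system:
            return False
    return True
```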
Tool Materialization: Names Resolve to Registered ToolDefinitions
An Agent is configured with a list of tool specs (openhands.sdk.tool.spec.Tool) that reference registered ToolDefinition factories.
Natural language invariant:
- resolve_tool(Tool(name=X)) must succeed (tool name present in registry) for all tools the agent intends to use.
- Tool factories must return a sequence of ToolDefinition instances; tool sets (e.g., browser tool sets) are represented as multi-element sequences.
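
Both rules can be illustrated with a small in-memory registry. This is a sketch under assumptions: ToolSpec, ToolDef, register, and resolve are hypothetical stand-ins that mimic the shape described above, not the SDK's registry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    # Hypothetical declarative spec: just a registered name.
    name: str


@dataclass(frozen=True)
class ToolDef:
    # Hypothetical materialized tool definition.
    name: str


_REGISTRY = {}


def register(name, factory):
    """Register a factory that returns a sequence of ToolDef instances."""
    _REGISTRY[name] = factory


def resolve(spec):
    """Materialize a spec; fails if the name was never registered."""
    if spec.name not in _REGISTRY:
        raise KeyError(f"tool {spec.name!r} is not registered")
    tools = list(_REGISTRY[spec.name]())
    if not all(isinstance(tool, ToolDef) for tool in tools):
        raise TypeError("factory must return a sequence of ToolDef instances")
    return tools


# A single tool is a one-element sequence; a tool set has several elements.
register("single", lambda: [ToolDef("single")])
register("tool_set", lambda: [ToolDef("navigate"), ToolDef("click")])
```

Because materialization depends only on the spec and the registry, calling resolve again with the same spec re-creates equivalent tools.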
Multi-Tool Calls: Shared Thought Only on First ActionEvent
When an LLM returns parallel tool calls, the SDK represents this as multiple ActionEvents that share the same llm_response_id.
Natural language invariant:
- For a batch of ActionEvents with the same llm_response_id, only the first action carries thought/reasoning_content/thinking_blocks; subsequent actions must have empty thought.
See event.base._combine_action_events:
context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()
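
A checker for this invariant only needs to remember which response ids it has already seen. The sketch assumes events arrive in emission order; Action and batched_thought_only_first are hypothetical names, and thought is modeled as a tuple of strings for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical, simplified action event.
    llm_response_id: str
    tool: str
    thought: tuple = ()


def batched_thought_only_first(actions):
    """True if, within each llm_response_id batch, only the first action has a thought."""
    seen_ids = set()
    for action in actions:
        if action.llm_response_id in seen_ids:
            # A later action in the same batch must have an empty thought.
            if action.thought:
                return False
        else:
            seen_ids.add(action.llm_response_id)
    return True
```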
Confirmation Mode: Requires Both Analyzer and Policy
conversation.is_confirmation_mode_active is true iff:
- A SecurityAnalyzer is configured, and
- The confirmation policy is not NeverConfirm.
context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))
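
The "iff" condition reduces to a two-input boolean. In this sketch NeverConfirm mirrors the name used above but is redefined locally as a placeholder, and OtherPolicy plus the free function are hypothetical stand-ins rather than SDK classes.

```python
class NeverConfirm:
    """Placeholder for the policy named in the invariant."""


class OtherPolicy:
    """Hypothetical stand-in for any policy that is not NeverConfirm."""


def is_confirmation_mode_active(security_analyzer, confirmation_policy):
    # Active only when an analyzer is configured AND the policy can ask for confirmation.
    return security_analyzer is not None and not isinstance(
        confirmation_policy, NeverConfirm
    )
```

Both conditions are required: an analyzer paired with NeverConfirm, or a confirming policy with no analyzer, leaves confirmation mode inactive.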
| Mode | Behavior | Use Case |
|---|---|---|
| Direct | Execute immediately | Development, trusted environments |
| Confirmation | Store as pending, wait for user approval | High-risk actions, production |
The risk level reported by the security analyzer determines how an action is handled:
- Low Risk: Execute immediately
- Medium Risk: Log warning, execute with monitoring
- High Risk: Block execution, request user confirmation
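
The three risk levels map naturally onto a small dispatch function. The Risk enum, the handle function, and the returned labels are hypothetical; the sketch only shows the branching, and "monitoring" is reduced to a log call.

```python
import logging
from enum import Enum

logger = logging.getLogger("risk_sketch")


class Risk(Enum):
    # Hypothetical risk levels matching the list above.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def handle(risk, action_name):
    """Return what should happen to an action with the given risk level."""
    if risk is Risk.LOW:
        return "execute"
    if risk is Risk.MEDIUM:
        # Medium risk still executes, but leaves a warning in the log.
        logger.warning("medium-risk action: %s", action_name)
        return "execute"
    # High risk is blocked until the user confirms.
    return "await_confirmation"
```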
Component Relationships
How Agent Interacts
Relationship Characteristics:
- Conversation → Agent: Orchestrates step execution, provides event history
- Agent → LLM: Queries for next actions, receives tool calls or messages
- Agent → Tools: Executes actions, receives observations
- AgentContext → Agent: Injects skills and prompts into LLM queries
See Also
- Conversation Architecture - Agent orchestration and lifecycle
- Tool System - Tool definition and execution patterns
- Events - Event types and structures
- Skills - Prompt engineering and skill patterns
- LLM - Language model abstraction

