The Agent component implements the core reasoning-action loop that drives autonomous task execution. It orchestrates LLM queries, tool execution, and context management through a stateless, event-driven architecture. Source: openhands-sdk/openhands/sdk/agent/

Core Responsibilities

The Agent system has four primary responsibilities:
  1. Reasoning-Action Loop - Query the LLM for the next action(s) based on the conversation history
  2. Tool Orchestration - Select and execute tools, and handle their results and errors
  3. Context Management - Apply skills and manage conversation history via condensers
  4. Security Validation - Analyze proposed actions for safety via the security analyzer before they execute

Architecture

Key Components

Component | Purpose | Design
Agent | Main implementation | Stateless reasoning-action loop executor
AgentBase | Abstract base class | Defines the agent interface and initialization
AgentContext | Context container | Manages skills, prompts, and metadata
Condenser | History compression | Reduces context as token limits are approached
SecurityAnalyzer | Safety validation | Evaluates action risk before execution
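The AgentBase/Agent split in the table can be sketched with an abstract base class. The `init_state` and `step` method names come from this page, but their signatures here are simplified assumptions, events are plain dicts, and `EchoAgent` is a hypothetical toy subclass. The point of the sketch is that the agent holds only configuration and emits new events through an `on_event` callback instead of storing them itself.

```python
from abc import ABC, abstractmethod
from typing import Callable

Event = dict  # hypothetical stand-in for the SDK's event objects


class AgentBase(ABC):
    """Interface: configuration plus the two entry points named on this page."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt  # configuration only, no conversation state

    @abstractmethod
    def init_state(self, events: list, on_event: Callable[[Event], None]) -> None: ...

    @abstractmethod
    def step(self, events: list, on_event: Callable[[Event], None]) -> None: ...


class EchoAgent(AgentBase):
    """Hypothetical toy agent that echoes the latest user message."""

    def init_state(self, events, on_event):
        # Emit the system prompt once, before any other event is processed.
        if not any(e["kind"] == "system" for e in events):
            on_event({"kind": "system", "text": self.system_prompt})

    def step(self, events, on_event):
        # Read history, write a new event; nothing is kept on `self`.
        last_user = next(e for e in reversed(events) if e["kind"] == "user")
        on_event({"kind": "agent", "text": f"echo: {last_user['text']}"})


events = []
agent = EchoAgent("You are a helpful agent.")
agent.init_state(events, events.append)
events.append({"kind": "user", "text": "hi"})
agent.step(events, events.append)
```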

Reasoning-Action Loop

The agent uses a single-step execution model: each step() call processes exactly one reasoning cycle.
Step Execution Flow:
  1. Pending Actions: If actions awaiting confirmation exist, execute them and return
  2. Condensation: If a condenser is configured:
    • Call condenser.condense() with the current event view
    • If it returns a View: use the condensed events for the LLM query (continue in the same step)
    • If it returns a Condensation: emit it as an event and return (it is applied on the next step)
  3. LLM Query: Query the LLM with messages built from the event history
    • If the context window is exceeded: emit a CondensationRequest and return
  4. Response Parsing: Parse LLM response into events
    • Tool calls → create ActionEvent(s)
    • Text message → create MessageEvent and return
  5. Confirmation Check: If actions need user approval:
    • Set conversation status to WAITING_FOR_CONFIRMATION and return
  6. Action Execution: Execute tools and create ObservationEvent(s)
Key Characteristics:
  • Stateless: Agent holds no mutable state between steps
  • Event-Driven: Reads from event history, writes new events
  • Interruptible: Each step is atomic and can be paused/resumed
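The step flow above can be sketched as a single function over an append-only event list. This is a simplified illustration, not the SDK's `step()`: the event classes, the dict-returning `llm` callable, the `tools` dict, and the `confirm` flag are hypothetical stand-ins, and condensation (step 2) is omitted. Because every decision is derived from `state.events`, the function keeps nothing between calls, which is what makes a step resumable.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical, simplified event and state types (not the SDK's classes).
_ids = itertools.count()


@dataclass
class MessageEvent:
    source: str  # "user" or "agent"
    text: str


@dataclass
class ActionEvent:
    tool_name: str
    arguments: dict
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class ObservationEvent:
    action_id: int
    output: str


@dataclass
class ConversationState:
    events: list = field(default_factory=list)
    status: str = "RUNNING"


def _pending_actions(state):
    # An action is pending if no observation refers to it yet.
    observed = {e.action_id for e in state.events if isinstance(e, ObservationEvent)}
    return [e for e in state.events if isinstance(e, ActionEvent) and e.id not in observed]


def _execute(state, actions, tools):
    for action in actions:
        output = tools[action.tool_name](**action.arguments)
        state.events.append(ObservationEvent(action.id, output))


def step(state, llm, tools, confirm=False):
    """One reasoning cycle; everything it needs comes from state.events."""
    # 1. Pending actions (assumed already approved by the user): execute and return.
    pending = _pending_actions(state)
    if pending:
        _execute(state, pending, tools)
        return
    # 2. Condensation is omitted in this sketch.
    # 3. Query the LLM with the event history.
    response = llm(state.events)
    # 4. Parse: a text reply becomes a MessageEvent and ends the step.
    if "text" in response:
        state.events.append(MessageEvent("agent", response["text"]))
        return
    actions = [ActionEvent(name, args) for name, args in response["tool_calls"]]
    state.events.extend(actions)
    # 5. Confirmation: leave the actions pending and wait for approval.
    if confirm:
        state.status = "WAITING_FOR_CONFIRMATION"
        return
    # 6. Execute the tools and record one observation per action.
    _execute(state, actions, tools)


def scripted_llm(events):
    # Hypothetical LLM: request a tool call, then answer once it sees a result.
    if any(isinstance(e, ObservationEvent) for e in events):
        return {"text": "done"}
    return {"tool_calls": [("echo", {"msg": "hi"})]}


tools = {"echo": lambda msg: msg.upper()}
state = ConversationState([MessageEvent("user", "say hi")])
step(state, scripted_llm, tools)  # appends an ActionEvent and its ObservationEvent
step(state, scripted_llm, tools)  # appends the agent's MessageEvent
```

With `confirm=True`, the first call stops after recording the ActionEvent; the caller would reset the status once the user approves, and the next call executes the pending action under step 1.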

Agent Context

The agent applies an AgentContext, which bundles skills and prompts, to shape LLM behavior:
Skill Type | Activation | Use Case
repo | Always included | Project-specific context, conventions
knowledge | Trigger words/patterns | Domain knowledge, special behaviors
Review this guide for details on creating and applying agent context and skills.
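The activation rule in the skill table can be sketched in a few lines. The `Skill` dataclass and `select_skills` function are hypothetical, not the SDK's API, and case-insensitive substring matching is an assumed trigger semantics: repo skills are always selected, while knowledge skills are selected only when one of their triggers appears in the user message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    kind: str             # "repo" or "knowledge"
    content: str
    triggers: tuple = ()  # only consulted for knowledge skills


def select_skills(skills, user_message):
    """Return the skills to inject for this message (hypothetical helper)."""
    text = user_message.lower()
    selected = []
    for skill in skills:
        if skill.kind == "repo":
            selected.append(skill)  # repo skills are always included
        elif any(t.lower() in text for t in skill.triggers):
            selected.append(skill)  # knowledge skill whose trigger matched
    return selected


skills = [
    Skill("conventions", "repo", "Run tests with pytest."),
    Skill("docker", "knowledge", "Prefer multi-stage builds.", ("docker", "container")),
    Skill("git", "knowledge", "Never force-push to main.", ("git",)),
]
names = [s.name for s in select_skills(skills, "Fix the Docker build")]
```

Plain substring matching is the simplest choice but can over-trigger (for example "git" inside "digit"); word-boundary or pattern matching would be stricter.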

Tool Execution

Tools follow a strict action-observation pattern: the tool executes each ActionEvent, and its result is recorded as an ObservationEvent.
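A minimal sketch of the action-observation pattern, assuming a tool is defined by a typed action, a typed observation, and an executor that maps one to the other. The `ReadFileAction`, `ReadFileObservation`, and `ReadFileExecutor` names are hypothetical. Returning errors as observations rather than raising them is one way to let the LLM see a failure and react to it on the next step.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ReadFileAction:
    path: str


@dataclass(frozen=True)
class ReadFileObservation:
    content: str
    is_error: bool = False


class ReadFileExecutor:
    """Hypothetical executor: one action in, one observation out."""

    def __call__(self, action: ReadFileAction) -> ReadFileObservation:
        try:
            return ReadFileObservation(Path(action.path).read_text())
        except OSError as exc:
            # Report the failure as data instead of crashing the agent loop.
            return ReadFileObservation(f"Error: {exc}", is_error=True)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello")
ok = ReadFileExecutor()(ReadFileAction(f.name))
missing = ReadFileExecutor()(ReadFileAction("/nonexistent/file.txt"))
os.unlink(f.name)
```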

Invariants (Normative)

AgentBase: Configuration is Stateless and Immutable

Natural language invariant:
  • An AgentBase instance is a pure configuration object. It may cache materialized ToolDefinition instances internally, but it must remain valid to re-create those tools from its declarative spec.
OCL-like:
  • context AgentBase inv Frozen: self.model_config.frozen = true
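The invariant above refers to a Pydantic-style `model_config.frozen` flag. The sketch below uses a standard-library frozen dataclass as a stand-in (`AgentConfig`, `ToolSpec`, and `materialize_tools` are hypothetical) to illustrate both halves of the invariant: fields cannot be reassigned, and tool objects can always be rebuilt from the declarative spec.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str           # name of a registered tool factory (hypothetical)
    params: tuple = ()  # (key, value) pairs; a tuple keeps the spec immutable


@dataclass(frozen=True)
class AgentConfig:
    model: str
    tools: tuple = ()   # tuple of ToolSpec


def materialize_tools(config):
    # Build fresh tool objects purely from the declarative spec.
    return [{"name": s.name, **dict(s.params)} for s in config.tools]


config = AgentConfig(model="example-model", tools=(ToolSpec("terminal", (("timeout", 30),)),))

try:
    config.model = "other-model"  # rejected: the instance is frozen
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# "Changing" the configuration means building a new object.
updated = dataclasses.replace(config, model="other-model")
```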

Initialization: System Prompt Precedes Any User Message

Agent.init_state(state, on_event=...) is responsible for creating the initial system prompt event. Natural language invariant:
  • A ConversationState must not contain a user MessageEvent before it contains a SystemPromptEvent.
OCL-like (conceptual):
  • context ConversationState inv SystemBeforeUser: self.events->select(e|e.oclIsKindOf(SystemPromptEvent))->size() >= 1 implies self.events->forAll(e| e.oclIsKindOf(MessageEvent) and e.source='user' implies e.index > systemPromptIndex )
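The conceptual OCL leaves `systemPromptIndex` implicit. A direct check over an ordered event list, using hypothetical minimal event classes, scans once and fails as soon as a user message appears before any system prompt:

```python
from dataclasses import dataclass


@dataclass
class SystemPromptEvent:
    text: str


@dataclass
class MessageEvent:
    source: str  # "user" or "agent"
    text: str


def system_before_user(events) -> bool:
    """True iff no user MessageEvent precedes the first SystemPromptEvent."""
    seen_system = False
    for event in events:
        if isinstance(event, SystemPromptEvent):
            seen_system = True
        elif isinstance(event, MessageEvent) and event.source == "user" and not seen_system:
            return False
    return True
```

Following the natural-language wording, this check also rejects a log that contains user messages but no system prompt at all, a case the OCL's `implies` leaves vacuously true.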

Tool Materialization: Names Resolve to Registered ToolDefinitions

An Agent is configured with a list of tool specs (openhands.sdk.tool.spec.Tool) that reference registered ToolDefinition factories. Natural language invariant:
  • resolve_tool(Tool(name=X)) must succeed (tool name present in registry) for all tools the agent intends to use.
  • Tool factories must return a sequence of ToolDefinition instances; tool sets (e.g., browser tool sets) are represented as multi-element sequences.
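A registry sketch of these two rules. `Tool`, `ToolDefinition`, `register_tool`, and `resolve_tool` here are simplified stand-ins; the SDK's own classes and signatures may differ. Each name maps to a factory that returns a sequence, so a tool set is simply a factory returning several definitions, and resolving an unregistered name fails loudly instead of silently dropping the tool.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Tool:
    """Stand-in for a declarative tool spec: just a registry name here."""
    name: str


@dataclass(frozen=True)
class ToolDefinition:
    name: str
    description: str


_REGISTRY: dict = {}


def register_tool(name: str, factory: Callable[[], Sequence[ToolDefinition]]) -> None:
    _REGISTRY[name] = factory


def resolve_tool(spec: Tool) -> list:
    if spec.name not in _REGISTRY:
        raise KeyError(f"tool {spec.name!r} is not registered")
    definitions = _REGISTRY[spec.name]()
    if not isinstance(definitions, (list, tuple)):
        raise TypeError(f"factory for {spec.name!r} must return a sequence")
    return list(definitions)


register_tool("terminal", lambda: [ToolDefinition("terminal", "Run shell commands")])
# A tool set: one registry name, several definitions.
register_tool("browser", lambda: [
    ToolDefinition("browser_navigate", "Open a URL"),
    ToolDefinition("browser_click", "Click an element"),
])
```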

Multi-Tool Calls: Shared Thought Only on First ActionEvent

When an LLM returns parallel tool calls, the SDK represents this as multiple ActionEvents that share the same llm_response_id. Natural language invariant:
  • For a batch of ActionEvents with the same llm_response_id, only the first action carries thought / reasoning_content / thinking_blocks; subsequent actions must have empty thought.
OCL-like (as modeled in event.base._combine_action_events):
  • context ActionEvent inv BatchedThoughtOnlyFirst: (self.llm_response_id = other.llm_response_id and self <> first) implies self.thought->isEmpty()
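One way to build a batch that satisfies this invariant, and to check it, assuming a simplified `ActionEvent` with only `llm_response_id`, `tool_name`, and `thought` (the `reasoning_content` and `thinking_blocks` fields would follow the same first-only rule and are omitted). Both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEvent:
    llm_response_id: str
    tool_name: str
    thought: str = ""


def events_from_tool_calls(response_id, thought, tool_names):
    # Attach the shared thought to the first action of the batch only.
    return [
        ActionEvent(response_id, name, thought if i == 0 else "")
        for i, name in enumerate(tool_names)
    ]


def thought_only_on_first(events) -> bool:
    """True iff every non-first event of each llm_response_id has an empty thought."""
    seen = set()
    for event in events:
        if event.llm_response_id in seen and event.thought:
            return False
        seen.add(event.llm_response_id)
    return True


batch = events_from_tool_calls("resp-1", "Inspect the files.", ["read_a", "read_b", "read_c"])
```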

Confirmation Mode: Requires Both Analyzer and Policy

conversation.is_confirmation_mode_active is true iff:
  • A SecurityAnalyzer is configured, and
  • The confirmation policy is not NeverConfirm.
OCL-like (conceptual):
  • context BaseConversation inv ConfirmationModeIff: self.is_confirmation_mode_active = (self.state.security_analyzer <> null and not self.state.confirmation_policy.oclIsKindOf(NeverConfirm))
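The biconditional translates directly into a predicate. The state and policy classes below are hypothetical stand-ins: only `NeverConfirm` is named on this page, and `AlwaysConfirm` is an assumed example of "any other policy".

```python
from dataclasses import dataclass
from typing import Optional


class ConfirmationPolicy: ...
class NeverConfirm(ConfirmationPolicy): ...
class AlwaysConfirm(ConfirmationPolicy): ...  # assumed example of a non-Never policy


@dataclass
class State:
    security_analyzer: Optional[object]
    confirmation_policy: ConfirmationPolicy


def is_confirmation_mode_active(state: State) -> bool:
    # Both conditions must hold: an analyzer exists AND the policy is not NeverConfirm.
    return (state.security_analyzer is not None
            and not isinstance(state.confirmation_policy, NeverConfirm))
```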
Execution Modes:
Mode | Behavior | Use Case
Direct | Execute immediately | Development, trusted environments
Confirmation | Store as pending, wait for user approval | High-risk actions, production
Security Integration: Before execution, the security analyzer evaluates each action:
  • Low Risk: Execute immediately
  • Medium Risk: Log warning, execute with monitoring
  • High Risk: Block execution, request user confirmation
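The three risk tiers can be sketched as a small decision function. The `Risk` and `Decision` enums, the `decide` function, and the logger name are hypothetical; the sketch only encodes the mapping listed above: low risk executes, medium risk executes with a logged warning, and high risk is held for user confirmation.

```python
import enum
import logging

logger = logging.getLogger("agent.security")  # hypothetical logger name


class Risk(enum.Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Decision(enum.Enum):
    EXECUTE = "execute"
    REQUIRE_CONFIRMATION = "require_confirmation"


def decide(action_name: str, risk: Risk) -> Decision:
    if risk is Risk.HIGH:
        # Block: keep the action pending until the user approves it.
        return Decision.REQUIRE_CONFIRMATION
    if risk is Risk.MEDIUM:
        logger.warning("executing medium-risk action %s", action_name)
    return Decision.EXECUTE
```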

Component Relationships

How the Agent Interacts

Relationship Characteristics:
  • Conversation → Agent: Orchestrates step execution, provides event history
  • Agent → LLM: Queries for next actions, receives tool calls or messages
  • Agent → Tools: Executes actions, receives observations
  • AgentContext → Agent: Injects skills and prompts into LLM queries

See Also