Agents are PydanticAI's primary interface for interacting with LLMs.
In some use cases a single Agent will control an entire application or component,
but multiple agents can also interact to embody more complex workflows.
The Agent class has full API documentation, but conceptually you can think of an agent as a container for:
- System prompt(s): a set of instructions for the LLM written by the developer.
- Function tool(s): functions that the LLM may call to get information while generating a response.
- Structured result type: the structured datatype the LLM must return at the end of a run, if specified.
- Dependency type constraint: system prompt functions, tools and result validators may all use dependencies when they're run.
- LLM model: optional default LLM model associated with the agent. Can also be specified when running the agent.
- Model settings: optional default model settings to help fine tune requests. Can also be specified when running the agent.
In typing terms, agents are generic in their dependency and result types; for example, an agent that requires dependencies of type Foobar and returns results of type list[str] has type Agent[Foobar, list[str]]. In practice you shouldn't need to care about this: it just means your IDE can tell you when you have the right type, and if you choose to use static type checking it should work well with PydanticAI.
Here's a toy example of an agent that simulates a roulette wheel:
roulette_wheel.py
```python
from pydantic_ai import Agent, RunContext

roulette_agent = Agent(
    'openai:gpt-4o',
    deps_type=int,
    result_type=bool,
    system_prompt=(
        'Use the `roulette_wheel` function to see if the '
        'customer has won based on the number they provide.'
    ),
)


@roulette_agent.tool
async def roulette_wheel(ctx: RunContext[int], square: int) -> str:
    """check if the square is a winner"""
    return 'winner' if square == ctx.deps else 'loser'


# Run the agent
success_number = 18
result = roulette_agent.run_sync('Put my money on square eighteen', deps=success_number)
print(result.data)
#> True

result = roulette_agent.run_sync('I bet five is the winner', deps=success_number)
print(result.data)
#> False
```
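If you want the generic parameters described above to be explicit in code, you can parametrize Agent directly, as the streaming example later on this page does. Here's a minimal sketch using a hypothetical Foobar dependency type (not part of the example above):

```python
from dataclasses import dataclass

from pydantic_ai import Agent


@dataclass
class Foobar:
    """Hypothetical dependency type, used only for illustration."""

    api_key: str


# Parametrizing the class makes the generics explicit:
# Agent[<dependency type>, <result type>]
typed_agent = Agent[Foobar, list[str]](
    'openai:gpt-4o',
    deps_type=Foobar,
    result_type=list[str],
)
```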
Agents are designed for reuse, like FastAPI Apps
Agents are intended to be instantiated once (frequently as module globals) and reused throughout your application, similar to a small FastAPI app or an APIRouter.
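As a rough sketch of that pattern (the module layout and function name here are illustrative, not part of PydanticAI):

```python
# agents.py — the agent is instantiated once, at module level
from pydantic_ai import Agent

support_agent = Agent(
    'openai:gpt-4o',
    system_prompt='Answer support questions concisely.',
)


# elsewhere in the application, the same agent instance is reused for every request
async def answer_question(question: str) -> str:
    result = await support_agent.run(question)
    return result.data
```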
Running Agents
There are four ways to run an agent:
1. agent.run() — a coroutine which returns a RunResult containing a completed response.
2. agent.run_sync() — a plain, synchronous function which returns a RunResult containing a completed response (internally, this just calls loop.run_until_complete(self.run())).
3. agent.run_stream() — a coroutine which returns a StreamedRunResult, which contains methods to stream a response as an async iterable.
4. agent.iter() — a context manager which returns an AgentRun, an async-iterable over the nodes of the agent's underlying Graph.
Here's a simple example demonstrating the first three:
run_agent.py
```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

result_sync = agent.run_sync('What is the capital of Italy?')
print(result_sync.data)
#> Rome


async def main():
    result = await agent.run('What is the capital of France?')
    print(result.data)
    #> Paris

    async with agent.run_stream('What is the capital of the UK?') as response:
        print(await response.get_data())
        #> London
```
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
You can also pass messages from previous runs to continue a conversation or provide context, as described in Messages and Chat History.
Iterating Over an Agent's Graph
Under the hood, each Agent in PydanticAI uses pydantic-graph to manage its execution flow. pydantic-graph is a generic, type-centric library for building and running finite state machines in Python. It doesn't actually depend on PydanticAI — you can use it standalone for workflows that have nothing to do with GenAI — but PydanticAI makes use of it to orchestrate the handling of model requests and model responses in an agent's run.
In many scenarios, you don't need to worry about pydantic-graph at all; calling agent.run(...) simply traverses the underlying graph from start to finish. However, if you need deeper insight or control — for example to capture each tool invocation, or to inject your own logic at specific stages — PydanticAI exposes the lower-level iteration process via Agent.iter. This method returns an AgentRun, which you can async-iterate over, or manually drive node-by-node via the next method. Once the agent's graph returns an End, you have the final result along with a detailed history of all steps.
async for iteration
Here's an example of using async for with iter to record each node the agent executes:
agent_iter_async_for.py
```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')


async def main():
    nodes = []
    # Begin an AgentRun, which is an async-iterable over the nodes of the agent's graph
    async with agent.iter('What is the capital of France?') as agent_run:
        async for node in agent_run:
            # Each node represents a step in the agent's execution
            nodes.append(node)
    print(nodes)
    """
    [
        ModelRequestNode(
            request=ModelRequest(
                parts=[
                    UserPromptPart(
                        content='What is the capital of France?',
                        timestamp=datetime.datetime(...),
                        part_kind='user-prompt',
                    )
                ],
                kind='request',
            )
        ),
        CallToolsNode(
            model_response=ModelResponse(
                parts=[TextPart(content='Paris', part_kind='text')],
                model_name='gpt-4o',
                timestamp=datetime.datetime(...),
                kind='response',
            )
        ),
        End(data=FinalResult(data='Paris', tool_name=None, tool_call_id=None)),
    ]
    """
    print(agent_run.result.data)
    #> Paris
```
The AgentRun is an async iterator that yields each node (BaseNode or End) in the flow.
The run ends when an End node is returned.
Using .next(...) manually
You can also drive the iteration manually by passing the node you want to run next to the AgentRun.next(...) method. This lets you inspect or modify a node before it executes, skip nodes based on your own logic, and catch errors raised by next() more easily:
agent_iter_next.py
```python
from pydantic_ai import Agent
from pydantic_graph import End

agent = Agent('openai:gpt-4o')


async def main():
    async with agent.iter('What is the capital of France?') as agent_run:
        node = agent_run.next_node
        all_nodes = [node]

        # Drive the iteration manually:
        while not isinstance(node, End):
            node = await agent_run.next(node)
            all_nodes.append(node)

        print(all_nodes)
        """
        [
            UserPromptNode(
                user_prompt='What is the capital of France?',
                system_prompts=(),
                system_prompt_functions=[],
                system_prompt_dynamic_functions={},
            ),
            ModelRequestNode(
                request=ModelRequest(
                    parts=[
                        UserPromptPart(
                            content='What is the capital of France?',
                            timestamp=datetime.datetime(...),
                            part_kind='user-prompt',
                        )
                    ],
                    kind='request',
                )
            ),
            CallToolsNode(
                model_response=ModelResponse(
                    parts=[TextPart(content='Paris', part_kind='text')],
                    model_name='gpt-4o',
                    timestamp=datetime.datetime(...),
                    kind='response',
                )
            ),
            End(data=FinalResult(data='Paris', tool_name=None, tool_call_id=None)),
        ]
        """
```
Accessing usage and the final result
You can retrieve usage statistics (tokens, requests, etc.) at any time from the AgentRun object via agent_run.usage(). This method returns a Usage object containing the usage data.
Once the run finishes, agent_run.final_result becomes an AgentRunResult object containing the final output (and related metadata).
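Here's a minimal sketch combining both, following the same iteration pattern as the examples above (the final output is read via agent_run.result.data, as in those examples):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')


async def main():
    async with agent.iter('What is the capital of France?') as agent_run:
        async for node in agent_run:
            # usage() can be inspected mid-run, e.g. to enforce your own limits
            print(agent_run.usage())
    # Once the End node has been reached, the final output is available on the run
    print(agent_run.result.data)
```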
Streaming
Here is an example of streaming an agent run in combination with async for iteration:
streaming.py
```python
import asyncio
from dataclasses import dataclass
from datetime import date

from pydantic_ai import Agent
from pydantic_ai.messages import (
    FinalResultEvent,
    FunctionToolCallEvent,
    FunctionToolResultEvent,
    PartDeltaEvent,
    PartStartEvent,
    TextPartDelta,
    ToolCallPartDelta,
)
from pydantic_ai.tools import RunContext


@dataclass
class WeatherService:
    async def get_forecast(self, location: str, forecast_date: date) -> str:
        # In real code: call weather API, DB queries, etc.
        return f'The forecast in {location} on {forecast_date} is 24°C and sunny.'

    async def get_historic_weather(self, location: str, forecast_date: date) -> str:
        # In real code: call a historical weather API or DB
        return (
            f'The weather in {location} on {forecast_date} was 18°C and partly cloudy.'
        )


weather_agent = Agent[WeatherService, str](
    'openai:gpt-4o',
    deps_type=WeatherService,
    result_type=str,  # We'll produce a final answer as plain text
    system_prompt='Providing a weather forecast at the locations the user provides.',
)


@weather_agent.tool
async def weather_forecast(
    ctx: RunContext[WeatherService],
    location: str,
    forecast_date: date,
) -> str:
    if forecast_date >= date.today():
        return await ctx.deps.get_forecast(location, forecast_date)
    else:
        return await ctx.deps.get_historic_weather(location, forecast_date)


output_messages: list[str] = []


async def main():
    user_prompt = 'What will the weather be like in Paris on Tuesday?'

    # Begin a node-by-node, streaming iteration
    async with weather_agent.iter(user_prompt, deps=WeatherService()) as run:
        async for node in run:
            if Agent.is_user_prompt_node(node):
                # A user prompt node => The user has provided input
                output_messages.append(f'=== UserPromptNode: {node.user_prompt} ===')
            elif Agent.is_model_request_node(node):
                # A model request node => We can stream tokens from the model's request
                output_messages.append(
                    '=== ModelRequestNode: streaming partial request tokens ==='
                )
                async with node.stream(run.ctx) as request_stream:
                    async for event in request_stream:
                        if isinstance(event, PartStartEvent):
                            output_messages.append(
                                f'[Request] Starting part {event.index}: {event.part!r}'
                            )
                        elif isinstance(event, PartDeltaEvent):
                            if isinstance(event.delta, TextPartDelta):
                                output_messages.append(
                                    f'[Request] Part {event.index} text delta: {event.delta.content_delta!r}'
                                )
                            elif isinstance(event.delta, ToolCallPartDelta):
                                output_messages.append(
                                    f'[Request] Part {event.index} args_delta={event.delta.args_delta}'
                                )
                        elif isinstance(event, FinalResultEvent):
                            output_messages.append(
                                f'[Result] The model produced a final result (tool_name={event.tool_name})'
                            )
            elif Agent.is_call_tools_node(node):
                # A handle-response node => The model returned some data, potentially calls a tool
                output_messages.append(
                    '=== CallToolsNode: streaming partial response & tool usage ==='
                )
                async with node.stream(run.ctx) as handle_stream:
                    async for event in handle_stream:
                        if isinstance(event, FunctionToolCallEvent):
                            output_messages.append(
                                f'[Tools] The LLM calls tool={event.part.tool_name!r} with args={event.part.args} (tool_call_id={event.part.tool_call_id!r})'
                            )
                        elif isinstance(event, FunctionToolResultEvent):
                            output_messages.append(
                                f'[Tools] Tool call {event.tool_call_id!r} returned => {event.result.content}'
                            )
            elif Agent.is_end_node(node):
                assert run.result.data == node.data.data
                # Once an End node is reached, the agent run is complete
                output_messages.append(f'=== Final Agent Output: {run.result.data} ===')


if __name__ == '__main__':
    asyncio.run(main())

    print(output_messages)
    """
    [
        '=== ModelRequestNode: streaming partial request tokens ===',
        '[Request] Starting part 0: ToolCallPart(tool_name=\'weather_forecast\', args=\'{"location":"Pa\', tool_call_id=\'0001\', part_kind=\'tool-call\')',
        '[Request] Part 0 args_delta=ris","forecast_',
        '[Request] Part 0 args_delta=date":"2030-01-',
        '[Request] Part 0 args_delta=01"}',
        '=== CallToolsNode: streaming partial response & tool usage ===',
        '[Tools] The LLM calls tool=\'weather_forecast\' with args={"location":"Paris","forecast_date":"2030-01-01"} (tool_call_id=\'0001\')',
        "[Tools] Tool call '0001' returned => The forecast in Paris on 2030-01-01 is 24°C and sunny.",
        '=== ModelRequestNode: streaming partial request tokens ===',
        "[Request] Starting part 0: TextPart(content='It will be ', part_kind='text')",
        '[Result] The model produced a final result (tool_name=None)',
        "[Request] Part 0 text delta: 'warm and sunny '",
        "[Request] Part 0 text delta: 'in Paris on '",
        "[Request] Part 0 text delta: 'Tuesday.'",
        '=== CallToolsNode: streaming partial response & tool usage ===',
        '=== Final Agent Output: It will be warm and sunny in Paris on Tuesday. ===',
    ]
    """
```
Additional Configuration
Usage Limits
PydanticAI offers a UsageLimits structure to help you limit your
usage (tokens and/or requests) on model runs.
You can apply these settings by passing the usage_limits argument to the run{_sync,_stream} functions.
Consider the following example, where we limit the number of response tokens:
```python
from pydantic_ai import Agent
from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import UsageLimits

agent = Agent('anthropic:claude-3-5-sonnet-latest')

result_sync = agent.run_sync(
    'What is the capital of Italy? Answer with just the city.',
    usage_limits=UsageLimits(response_tokens_limit=10),
)
print(result_sync.data)
#> Rome
print(result_sync.usage())
"""
Usage(requests=1, request_tokens=62, response_tokens=1, total_tokens=63, details=None)
"""

try:
    result_sync = agent.run_sync(
        'What is the capital of Italy? Answer with a paragraph.',
        usage_limits=UsageLimits(response_tokens_limit=10),
    )
except UsageLimitExceeded as e:
    print(e)
    #> Exceeded the response_tokens_limit of 10 (response_tokens=32)
```
Restricting the number of requests can be useful in preventing infinite loops or excessive tool calling:
```python
from typing_extensions import TypedDict

from pydantic_ai import Agent, ModelRetry
from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import UsageLimits


class NeverResultType(TypedDict):
    """
    Never ever coerce data to this type.
    """

    never_use_this: str


agent = Agent(
    'anthropic:claude-3-5-sonnet-latest',
    retries=3,
    result_type=NeverResultType,
    system_prompt='Any time you get a response, call the `infinite_retry_tool` to produce another response.',
)


@agent.tool_plain(retries=5)
def infinite_retry_tool() -> int:
    raise ModelRetry('Please try again.')


try:
    result_sync = agent.run_sync(
        'Begin infinite retry loop!', usage_limits=UsageLimits(request_limit=3)
    )
except UsageLimitExceeded as e:
    print(e)
    #> The next request would exceed the request_limit of 3
```
Note
This is especially relevant if you've registered many tools. The request_limit can be used to prevent the model from calling them in a loop too many times.
Model (Run) Settings
PydanticAI offers a settings.ModelSettings structure to help you fine tune your requests.
This structure allows you to configure common parameters that influence the model's behavior, such as temperature, max_tokens,
timeout, and more.
There are two ways to apply these settings:
1. Passing to run{_sync,_stream} functions via the model_settings argument. This allows for fine-tuning on a per-request basis.
2. Setting during Agent initialization via the model_settings argument. These settings will be applied by default to all subsequent run calls using said agent. However, model_settings provided during a specific run call will override the agent's default settings.
For example, if you'd like to set the temperature setting to 0.0 to ensure less random behavior,
you can do the following:
```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

result_sync = agent.run_sync(
    'What is the capital of Italy?', model_settings={'temperature': 0.0}
)
print(result_sync.data)
#> Rome
```
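The example above demonstrates the first approach (per-run settings). Here's a minimal sketch of the second approach, with an agent-level default that a single run then overrides:

```python
from pydantic_ai import Agent

# Default settings applied to every run of this agent
agent = Agent('openai:gpt-4o', model_settings={'temperature': 0.0})

# Uses the agent-level default (temperature 0.0)
result = agent.run_sync('What is the capital of Italy?')

# Per-run settings override the agent's defaults for this call only
result = agent.run_sync(
    'Write a short poem about Rome.',
    model_settings={'temperature': 1.0},
)
```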
Model specific settings
If you wish to further customize model behavior, you can use a subclass of ModelSettings, like GeminiModelSettings, associated with your model of choice.
For example:
```python
from pydantic_ai import Agent, UnexpectedModelBehavior
from pydantic_ai.models.gemini import GeminiModelSettings

agent = Agent('google-gla:gemini-1.5-flash')

try:
    result = agent.run_sync(
        'Write a list of 5 very rude things that I might say to the universe after stubbing my toe in the dark:',
        model_settings=GeminiModelSettings(
            temperature=0.0,  # general model settings can also be specified
            gemini_safety_settings=[
                {
                    'category': 'HARM_CATEGORY_HARASSMENT',
                    'threshold': 'BLOCK_LOW_AND_ABOVE',
                },
                {
                    'category': 'HARM_CATEGORY_HATE_SPEECH',
                    'threshold': 'BLOCK_LOW_AND_ABOVE',
                },
            ],
        ),
    )
except UnexpectedModelBehavior as e:
    print(e)
    """
    Safety settings triggered, body:
    <safety settings details>
    """
```
Runs vs. Conversations
An agent run might represent an entire conversation — there's no limit to how many messages can be exchanged in a single run. However, a conversation might also be composed of multiple runs, especially if you need to maintain state between separate interactions or API calls.
Here's an example of a conversation comprised of multiple runs:
conversation_example.py
```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

# First run
result1 = agent.run_sync('Who was Albert Einstein?')
print(result1.data)
#> Albert Einstein was a German-born theoretical physicist.

# Second run, passing previous messages
result2 = agent.run_sync(
    'What was his most famous equation?',
    message_history=result1.new_messages(),
)
print(result2.data)
#> Albert Einstein's most famous equation is (E = mc^2).
```
(This example is complete, it can be run "as is")
Type safe by design
PydanticAI is designed to work well with static type checkers, like mypy and pyright.
Typing is (somewhat) optional
PydanticAI is designed to make type checking as useful as possible for you if you choose to use it, but you don't have to use types everywhere all the time.
That said, because PydanticAI uses Pydantic, and Pydantic uses type hints as the definition for schema and validation, some types (specifically type hints on parameters to tools, and the result_type arguments to Agent) are used at runtime.
We (the library developers) have messed up if type hints are confusing you more than helping you; if you find this is the case, please create an issue explaining what's annoying you!
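As a small illustration of that runtime use, the result_type you pass to Agent becomes a Pydantic schema that the model's final response is validated against. Here's a sketch using the built-in 'test' model (also used in the type-checking example below):

```python
from pydantic_ai import Agent

# result_type is used at runtime: Pydantic validates (and coerces) the
# model's final response against list[str]
agent = Agent('test', result_type=list[str])

result = agent.run_sync('Name some colours.')
print(result.data)  # a list of strings, or the run would have failed validation
```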
In particular, agents are generic in both the type of their dependencies and the type of results they return, so you can use the type hints to ensure you're using the right types.
Consider the following script with type mistakes:
type_mistakes.py
```python
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext


@dataclass
class User:
    name: str


agent = Agent(
    'test',
    deps_type=User,
    result_type=bool,
)


@agent.system_prompt
def add_user_name(ctx: RunContext[str]) -> str:
    return f"The user's name is {ctx.deps}."


def foobar(x: bytes) -> None:
    pass


result = agent.run_sync('Does their name start with "A"?', deps=User('Anne'))
foobar(result.data)
```
Running mypy on this will report two errors: add_user_name is typed to take RunContext[str] where RunContext[User] is expected (the agent's deps_type is User), and foobar is called with a bool (the agent's result type) where bytes is expected.
System Prompts
System prompts might seem simple at first glance since they're just strings (or sequences of strings that are concatenated), but crafting the right system prompt is key to getting the model to behave as you want.
Generally, system prompts fall into two categories:
Static system prompts: These are known when writing the code and can be defined via the system_prompt parameter of the Agent constructor.
Dynamic system prompts: These depend in some way on context that isn't known until runtime, and should be defined via functions decorated with @agent.system_prompt.
You can add both to a single agent; they're appended in the order they're defined at runtime.
Here's an example using both types of system prompts:
system_prompts.py
```python
from datetime import date

from pydantic_ai import Agent, RunContext

agent = Agent(
    'openai:gpt-4o',
    deps_type=str,
    system_prompt="Use the customer's name while replying to them.",
)


@agent.system_prompt
def add_the_users_name(ctx: RunContext[str]) -> str:
    return f"The user's name is {ctx.deps}."


@agent.system_prompt
def add_the_date() -> str:
    return f'The date is {date.today()}.'


result = agent.run_sync('What is the date?', deps='Frank')
print(result.data)
#> Hello Frank, the date today is 2032-01-02.
```
(This example is complete, it can be run "as is")
Reflection and self-correction
Validation errors from both function tool parameter validation and structured result validation can be passed back to the model with a request to retry.
You can access the current retry count from within a tool or result validator via ctx.retry.
Here's an example:
tool_retry.py
```python
from pydantic import BaseModel

from pydantic_ai import Agent, RunContext, ModelRetry

from fake_database import DatabaseConn


class ChatResult(BaseModel):
    user_id: int
    message: str


agent = Agent(
    'openai:gpt-4o',
    deps_type=DatabaseConn,
    result_type=ChatResult,
)


@agent.tool(retries=2)
def get_user_by_name(ctx: RunContext[DatabaseConn], name: str) -> int:
    """Get a user's ID from their full name."""
    print(name)
    #> John
    #> John Doe
    user_id = ctx.deps.users.get(name=name)
    if user_id is None:
        raise ModelRetry(
            f'No user found with name {name!r}, remember to provide their full name'
        )
    return user_id


result = agent.run_sync(
    'Send a message to John Doe asking for coffee next week', deps=DatabaseConn()
)
print(result.data)
"""
user_id=123 message='Hello John, would you be free for coffee sometime next week? Let me know what works for you!'
"""
```
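The tool above doesn't read the retry count itself; here's a minimal sketch (with an illustrative agent and tool, not part of the example above) of using ctx.retry inside a tool to make the retry message more informative:

```python
from pydantic_ai import Agent, ModelRetry, RunContext

agent = Agent('openai:gpt-4o')


@agent.tool(retries=2)
def lookup_code(ctx: RunContext[None], code: str) -> str:
    if not code.isdigit():
        # ctx.retry is 0 on the first attempt and increments on each retry
        raise ModelRetry(
            f'{code!r} is not numeric (attempt {ctx.retry + 1}); please send digits only.'
        )
    return f'Code {code} accepted.'
```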
Model errors
If models behave unexpectedly (e.g., the retry limit is exceeded, or their API returns 503), agent runs will raise UnexpectedModelBehavior.
In these cases, capture_run_messages can be used to access the messages exchanged during the run to help diagnose the issue.
agent_model_errors.py
```python
from pydantic_ai import Agent, ModelRetry, UnexpectedModelBehavior, capture_run_messages

agent = Agent('openai:gpt-4o')


@agent.tool_plain
def calc_volume(size: int) -> int:
    if size == 42:
        return size**3
    else:
        raise ModelRetry('Please try again.')


with capture_run_messages() as messages:
    try:
        result = agent.run_sync('Please get me the volume of a box with size 6.')
    except UnexpectedModelBehavior as e:
        print('An error occurred:', e)
        #> An error occurred: Tool exceeded max retries count of 1
        print('cause:', repr(e.__cause__))
        #> cause: ModelRetry('Please try again.')
        print('messages:', messages)
        """
        messages:
        [
            ModelRequest(
                parts=[
                    UserPromptPart(
                        content='Please get me the volume of a box with size 6.',
                        timestamp=datetime.datetime(...),
                        part_kind='user-prompt',
                    )
                ],
                kind='request',
            ),
            ModelResponse(
                parts=[
                    ToolCallPart(
                        tool_name='calc_volume',
                        args={'size': 6},
                        tool_call_id='pyd_ai_tool_call_id',
                        part_kind='tool-call',
                    )
                ],
                model_name='gpt-4o',
                timestamp=datetime.datetime(...),
                kind='response',
            ),
            ModelRequest(
                parts=[
                    RetryPromptPart(
                        content='Please try again.',
                        tool_name='calc_volume',
                        tool_call_id='pyd_ai_tool_call_id',
                        timestamp=datetime.datetime(...),
                        part_kind='retry-prompt',
                    )
                ],
                kind='request',
            ),
            ModelResponse(
                parts=[
                    ToolCallPart(
                        tool_name='calc_volume',
                        args={'size': 6},
                        tool_call_id='pyd_ai_tool_call_id',
                        part_kind='tool-call',
                    )
                ],
                model_name='gpt-4o',
                timestamp=datetime.datetime(...),
                kind='response',
            ),
        ]
        """
    else:
        print(result.data)
```
(This example is complete, it can be run "as is")
Note
If you call run, run_sync, or run_stream more than once within a single capture_run_messages context, messages will represent the messages exchanged during the first call only.
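Here's a minimal sketch of that behaviour:

```python
from pydantic_ai import Agent, capture_run_messages

agent = Agent('openai:gpt-4o')

with capture_run_messages() as messages:
    agent.run_sync('What is the capital of Italy?')
    agent.run_sync('What is the capital of France?')

# `messages` reflects only the first run_sync call above; wrap each run in its
# own capture_run_messages() context if you need the messages from every run
print(messages)
```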