Which model should I use for my agent?

Start with a strong general reasoning model so you are debugging your logic, not the model's limits. Once the agent works, profile it: route cheap, high-volume steps (classification, extraction, routing) to a smaller, faster model and reserve the frontier model for hard planning and tool selection. This "model cascade" commonly cuts cost by 40–70% with little quality loss. Always measure on your own eval set rather than trusting leaderboards.
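A model cascade can start as nothing more than a lookup from step type to model. The sketch below is illustrative only: the step kinds and the "reasoning-lite" name are assumptions made up for this example, and "reasoning-pro" simply reuses the model name from the sample agent in this guide.

```typescript
// Hypothetical step kinds and model names, for illustration only.
type StepKind = "classify" | "extract" | "route" | "plan" | "select_tool";

const CHEAP_MODEL = "reasoning-lite"; // assumed name for a smaller, faster model
const FRONTIER_MODEL = "reasoning-pro"; // the stronger model, kept for hard steps

// Exhaustive map: the compiler flags any step kind you forget to route.
const MODEL_FOR_STEP: Record<StepKind, string> = {
  classify: CHEAP_MODEL,
  extract: CHEAP_MODEL,
  route: CHEAP_MODEL,
  plan: FRONTIER_MODEL,
  select_tool: FRONTIER_MODEL,
};

function pickModel(kind: StepKind): string {
  return MODEL_FOR_STEP[kind];
}
```

Keeping the routing in one table makes it easy to move a step to the cheaper model, rerun your evals, and revert if success rate drops.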

How do I test and evaluate an AI agent?

Build an eval set of 30-100 real tasks with known-good outcomes, then score every change against it. Track task success rate, number of steps to completion, tool-call accuracy, latency, and cost per run. Use trace logs to replay failures step by step, add assertions on tool inputs, and treat any regression in success rate as a blocking bug. LLM-as-judge graders help score open-ended outputs at scale, but spot-check the judge against human labels.
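As a rough sketch of what "score every change" can look like, the code below summarizes a batch of eval runs and flags a drop in success rate. The EvalResult shape and function names are assumptions for this example, not part of any SDK.

```typescript
// One scored run of the agent against one eval task (hypothetical shape).
interface EvalResult {
  taskId: string;
  success: boolean;
  steps: number;
  costUsd: number;
}

interface EvalSummary {
  successRate: number; // fraction of tasks that succeeded, 0..1
  avgSteps: number;
  avgCostUsd: number;
}

function summarize(results: EvalResult[]): EvalSummary {
  if (results.length === 0) throw new Error("eval set is empty");
  const n = results.length;
  const passed = results.filter((r) => r.success).length;
  const totalSteps = results.reduce((sum, r) => sum + r.steps, 0);
  const totalCost = results.reduce((sum, r) => sum + r.costUsd, 0);
  return { successRate: passed / n, avgSteps: totalSteps / n, avgCostUsd: totalCost / n };
}

// Blocking check: a candidate must not score below the baseline success rate.
function isRegression(baseline: EvalSummary, candidate: EvalSummary): boolean {
  return candidate.successRate < baseline.successRate;
}
```

Run summarize on the baseline and on each candidate change, and fail the build when isRegression returns true.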

How much does it cost to run an AI agent?

Cost is driven by tokens, and agents are token-hungry because every loop iteration re-sends context. A simple two-tool agent might cost a fraction of a cent per run; a long-horizon research agent can run into dollars. Control spend with prompt caching, context trimming, a model cascade, hard step limits, and a token budget per task. Instrument cost per run early so it does not surprise you at scale.
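To see why re-sending context makes agents token-hungry, here is a back-of-the-envelope estimator. It is a sketch under simple assumptions: every step re-sends the base prompt plus everything appended so far, and the field names and prices are placeholders rather than any provider's real rates.

```typescript
interface CostModel {
  basePromptTokens: number;    // instructions + tool schemas + goal, sent on every step
  tokensAddedPerStep: number;  // model output + tool result appended to history each step
  outputTokensPerStep: number;
  usdPerMillionInput: number;  // assumed price, not a real provider rate
  usdPerMillionOutput: number; // assumed price, not a real provider rate
}

function estimateRunCostUsd(steps: number, m: CostModel): number {
  let inputTokens = 0;
  for (let step = 0; step < steps; step++) {
    // Step k re-sends the base prompt plus what the k earlier steps appended.
    inputTokens += m.basePromptTokens + step * m.tokensAddedPerStep;
  }
  const outputTokens = steps * m.outputTokensPerStep;
  return (inputTokens * m.usdPerMillionInput + outputTokens * m.usdPerMillionOutput) / 1e6;
}
```

Because the history grows every step, input tokens grow roughly with the square of the step count, so doubling the steps more than doubles the bill. That is the case for hard step limits and context trimming.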

Should I build a single agent or a multi-agent system?

Default to a single agent with a good set of tools. It is simpler to debug, cheaper, and lower latency, and it solves most problems. Reach for a multi-agent system only when the task has clearly separable specialties (e.g. research, coding, review), when context windows overflow, or when you need parallelism. Multi-agent designs add orchestration, coordination, and cost overhead, so adopt them deliberately rather than by default.


How to build AI agents

A practical, step-by-step guide to building an AI agent from scratch — pick a reasoning model, give it tools with function calling, add memory, run the agent loop, and ship it with evals and tracing. Code samples included.

Hands-on · 20 min · Code included · Updated 2026


Learning how to build AI agents is less about a single magic library and more about assembling five durable parts the right way: a reasoning model, a set of tools, some memory, a control loop, and an evaluation harness. Get those right and you can build an AI agent that reliably completes real, multi-step work instead of just answering a single prompt.

An AI agent is a program that uses a large language model (LLM) as its decision-maker. You hand it a goal; it decides which tool to call, observes the result, and keeps going until the task is done. That autonomy is what separates an agent from a chatbot — and it is the whole reason agents can book the meeting, fix the bug, or resolve the ticket rather than describe how someone else might.

This guide is deliberately practical. We will walk through the six steps end to end, show a minimal LLM agent in TypeScript, define a real tool, and cover the things that bite people in production: cost, evaluation, and tracing. If you are brand new, start with "What is agentic AI"; if you are choosing a stack, see "AI agent frameworks".

Before you start

Prerequisites

You do not need a PhD — just a little setup and a clear task in mind. Here is what makes the rest of this tutorial smooth.

An API key for a reasoning model — any provider with tool/function calling support
Basic TypeScript or Python — you should be comfortable reading async/await code
A concrete task in mind — e.g. triage a support ticket or summarize a repo
One or two real tools — an API, a database query, or a search endpoint
A handful of test cases — 5–10 example tasks with known-good outcomes
A way to read logs — tracing/observability so you can debug the loop

The build process

How to build an AI agent in 6 steps

Follow this sequence the first time. Each step is small on its own; the magic is in how they compose into an autonomous system.

1. Define the goal and scope
Write the objective in one sentence, then define success criteria, allowed inputs, and what is explicitly out of scope. A vague goal produces a wandering agent. Decide up front when the agent should stop, ask a human, or hand off.
2. Choose a reasoning model
Pick an LLM with strong reasoning and reliable tool calling as the brain. Start with a capable general model so you debug your logic, not the model's ceiling — you can introduce a cheaper model for routine steps once it works.
3. Give it tools via function calling
Tools are how an agent acts. Each tool gets a name, a plain-language description, and a typed parameter schema. The model reads those descriptions to decide which tool to call and with what arguments — so write them like docs for a teammate.
4. Add memory and context
Give the agent short-term memory (a scratchpad of the current run) and long-term memory in a vector store for facts, past runs, and user preferences. Retrieval-augmented generation (RAG) pulls just the relevant context into each prompt.
5. Wrap it in the agent loop
Run the perceive → reason → act → observe cycle: send context to the model, execute any tool calls it requests, feed results back, and repeat. Add a step limit, retries, and guardrails so a confused agent fails safe instead of looping forever.
6. Evaluate, trace, and deploy
Score the agent against your eval set, trace every decision and token, and only then deploy — behind monitoring, a step budget, and a cost cap. Treat any drop in task success rate as a release-blocking regression.
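Step 4 is the one people find most abstract, so here is a toy version of it. This is a sketch, not a vector store: it ranks stored items by keyword overlap as a stand-in for embedding similarity, and every name in it is made up for illustration.

```typescript
interface MemoryItem {
  id: string;
  text: string;
}

// Lowercase and split on anything that is not a letter or digit.
function words(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Long-term memory lookup: rank items by how many query words they contain.
// A real system would compare embeddings; keyword overlap keeps the sketch runnable.
function retrieve(query: string, store: MemoryItem[], topK: number): MemoryItem[] {
  const queryWords = words(query);
  return store
    .map((item) => {
      const itemWords = words(item.text);
      const score = queryWords.filter((w) => itemWords.indexOf(w) !== -1).length;
      return { item, score };
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((scored) => scored.item);
}

// Assemble the prompt context: goal, only the relevant memories, then the scratchpad.
function buildContext(goal: string, scratchpad: string[], store: MemoryItem[]): string {
  const recalled = retrieve(goal, store, 2).map((m) => "- " + m.text);
  return ["Goal: " + goal, "Relevant memory:", ...recalled, "Scratchpad:", ...scratchpad].join("\n");
}
```

The point is the shape rather than the scoring: the scratchpad is always included, while long-term memory contributes only the few items that match the current goal.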

Start smaller than you think

Your first version should solve one narrow task with one or two tools and no long-term memory. Get that loop green, then add capability. Most failed agent projects tried to do ten things on day one and could debug none of them.

Code

A minimal AI agent in TypeScript

Here is the smallest useful shape: a model, a list of tools, and a run call. The framework handles the loop; you supply the goal and the tools.

agent.ts (TypeScript)

```typescript
import { Agent, tool } from "@/sdk";  // the AI Agentics TypeScript SDK
import { z } from "zod";

const getWeather = tool({  // 1. declare a tool
  name: "get_weather",
  description: "Get the current weather for a city.",
  parameters: z.object({ city: z.string() }),
  // fetchWeather is your own helper (not shown here)
  run: async ({ city }) => fetchWeather(city),
});

const agent = new Agent({  // 2. wire model + tools
  model: "reasoning-pro",
  instructions: "You are a concise travel assistant.",
  tools: [getWeather],
  maxSteps: 8,
});

const result = await agent.run(  // 3. run the agent loop
  "Should I pack a jacket for Lisbon tomorrow?"
);
console.log(result.output);
```

A minimal agent definition — a model, tools, and a single run() that drives the loop until the task is done.

Notice what you did not write: no manual loop, no prompt-stuffing of tool results, no parsing of which function to call. The SDK runs the loop for you — it sends your instructions and tools to the model, executes any tool calls the model requests, feeds the outputs back, and repeats up to maxSteps. Your job is to design good tools and a clear goal. Next, let's look at the loop the SDK is running on your behalf.

The engine

Inside the agent loop

Perceive → Reason → Act → Observe

How run() actually drives the agent

When you call run(), the agent enters a loop. Each iteration is one turn of reasoning followed by at most one action, and the result of that action becomes part of the context for the next turn.

This is the ReAct pattern — reason, then act — and it is what makes agents resilient. If a tool errors or returns something unexpected, the model sees it and can re-plan, retry, or ask for help instead of crashing.

Perceive — assemble goal, history, and prior tool results into context.
Reason — the model decides: respond, or call a tool (and with what args).
Act — the runtime executes the chosen tool and captures its output.
Observe — append the result, check stop conditions, then loop again.
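If you wrote those four stages by hand, the loop might look roughly like the sketch below. The types and the runLoop name are assumptions for illustration, not the SDK's real internals, and the model is any async function you pass in, so you can test the loop with a scripted fake.

```typescript
// Hypothetical shapes for illustration; not the SDK's real types.
type ModelTurn =
  | { kind: "final"; output: string }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> };

type Model = (context: string[]) => Promise<ModelTurn>;
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

interface RunResult {
  output: string;
  steps: number;
  stoppedBy: "final" | "max_steps";
}

async function runLoop(
  goal: string,
  model: Model,
  tools: Record<string, ToolFn>,
  maxSteps: number
): Promise<RunResult> {
  const context: string[] = ["goal: " + goal]; // Perceive: start from the goal
  for (let step = 1; step <= maxSteps; step++) {
    const turn = await model(context); // Reason: respond, or call a tool
    if (turn.kind === "final") {
      return { output: turn.output, steps: step, stoppedBy: "final" };
    }
    const known = Object.prototype.hasOwnProperty.call(tools, turn.tool);
    let observation: string;
    try {
      // Act: run the chosen tool; failures become observations, not crashes
      observation = known ? await tools[turn.tool](turn.args) : "error: unknown tool " + turn.tool;
    } catch (err) {
      observation = "error: " + (err instanceof Error ? err.message : String(err));
    }
    context.push("tool " + turn.tool + " -> " + observation); // Observe: feed the result back
  }
  return { output: "", steps: maxSteps, stoppedBy: "max_steps" }; // fail safe at the limit
}
```

Two details carry most of the resilience: tool errors are appended to the context so the model can re-plan, and the step limit guarantees the loop ends even if the model never produces a final answer.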


[Figure: the agent loop. Perceive (read goal & context) → Reason (plan the next step) → Act (call a tool / API) → Observe (evaluate the result).]

Each iteration the model picks a tool, the runtime executes it, and the result feeds the next decision — until the goal is met or a limit is hit.

Code

Defining a tool the model can call

A tool is just a function plus a schema the model reads. The clearer the name, description, and parameters, the better the model decides when and how to use it.

tools/search-orders.ts (TypeScript)

```typescript
import { tool } from "@/sdk";
import { z } from "zod";

export const searchOrders = tool({
  name: "search_orders",
  description:  // write this like docs for a teammate
    "Find a customer's orders by email. Use when the"
    + " user asks about order status, refunds, or history.",
  parameters: z.object({  // typed + validated args
    email: z.string().email(),
    status: z.enum(["open", "shipped", "all"])
      .default("all"),
    limit: z.number().int().min(1).max(50).default(10),
  }),
  run: async ({ email, status, limit }) => {  // the real action
    // "all" means no status filter; db is your own data layer (not shown here)
    const filter = status === "all" ? { email } : { email, status };
    return db.orders.find(filter, { limit });
  },
});
```

The description and typed parameters are part of the prompt — they teach the model exactly when and how to call this tool.

Three things make a tool reliable. First, a specific name the model can pattern-match against. Second, a description that says when to use it, not just what it does. Third, a typed, validated schema (here with Zod) so malformed arguments are caught before they ever hit your database. Want to go deeper on tool design, retries, and safety? See AI agent tools and AI agent memory.
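To make the third point concrete without pulling in a library, here is a hand-rolled check that mirrors the search_orders schema. It is a simplified sketch: the email test is deliberately loose, the function and type names are made up for this example, and in practice the Zod schema does this work for you.

```typescript
type OrderStatus = "open" | "shipped" | "all";

interface SearchOrdersArgs {
  email: string;
  status: OrderStatus;
  limit: number;
}

type Validation =
  | { ok: true; args: SearchOrdersArgs }
  | { ok: false; error: string };

function isOrderStatus(value: unknown): value is OrderStatus {
  return value === "open" || value === "shipped" || value === "all";
}

// Check raw model-supplied arguments and apply the same defaults as the schema.
function validateSearchOrdersArgs(raw: unknown): Validation {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "args must be an object" };
  }
  const input = raw as Record<string, unknown>;

  const email = input.email;
  // Loose shape check only; a real validator is stricter about email syntax.
  if (typeof email !== "string" || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    return { ok: false, error: "email must be a valid address" };
  }

  const status = input.status === undefined ? "all" : input.status;
  if (!isOrderStatus(status)) {
    return { ok: false, error: "status must be open, shipped, or all" };
  }

  const limit = input.limit === undefined ? 10 : input.limit;
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1 || limit > 50) {
    return { ok: false, error: "limit must be an integer from 1 to 50" };
  }

  return { ok: true, args: { email, status, limit } };
}
```

Returning the error as data rather than throwing lets the runtime hand the message back to the model as an observation, so it can correct the arguments and try again.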

Lessons learned

Common pitfalls (and what to do instead)

Almost every team building agents hits the same handful of traps. Here's the do / avoid list, distilled.

Do this

Start with one task, one or two tools, no long-term memory.
Write tool descriptions that say when to use the tool.
Set a hard maxSteps and a per-task token/cost budget.
Trace every run so you can replay and debug failures.
Validate tool arguments before they touch real systems.
Build an eval set early and score every change against it.

Avoid this

Giving the agent 20 tools on day one so it can't choose.
Letting the loop run unbounded — runaway cost and latency.
Stuffing the entire history into every prompt (use trimming/RAG).
Trusting vibes instead of measuring task success rate.
Granting write access to production with no human in the loop.
Jumping to multi-agent before a single agent has failed you.
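For the "stuffing the entire history" trap, trimming can be as simple as keeping the system message and the most recent turns that fit a budget. The sketch below assumes a generic message shape and a rough four-characters-per-token estimate; swap in your provider's tokenizer for real counts.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough heuristic (about 4 characters per token), an assumption for this sketch.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep system messages, then as many of the most recent messages as fit the budget.
function trimHistory(messages: Message[], budgetTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + approxTokens(m.content), 0);
  const kept: Message[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = approxTokens(rest[i].content);
    if (used + cost > budgetTokens) break; // stop at the first message that does not fit
    kept.unshift(rest[i]); // preserve chronological order
    used += cost;
  }
  return [...system, ...kept];
}
```

Dropping the oldest turns first is the bluntest possible policy. Summarizing older turns or retrieving them on demand keeps more signal, at the cost of more moving parts.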

Guardrails are not optional

An agent can call real tools, so a confused or adversarial input can do real damage. Sandbox dangerous actions, require human approval for irreversible operations (payments, deletes, external emails), and rate-limit tool calls. Production agents should run on a SOC 2–compliant platform with full audit logs — see our security practices.
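One way to wire in two of those guardrails, human approval and rate limiting, is a small gate that every tool call passes through before it runs. This is a minimal sketch: the tool names, the Decision type, and createToolGate are all hypothetical, and the clock is passed in so the behavior is easy to test.

```typescript
type Decision = "allow" | "needs_approval" | "rate_limited";

// Hypothetical tool names; list whatever is irreversible in your system.
const IRREVERSIBLE_TOOLS = ["send_payment", "delete_record", "send_external_email"];

// Returns a check function that enforces a sliding-window rate limit
// and requires human approval for irreversible tools.
function createToolGate(maxCallsPerWindow: number, windowMs: number) {
  let callTimes: number[] = [];
  return function check(tool: string, humanApproved: boolean, nowMs: number): Decision {
    // Drop timestamps that have aged out of the window.
    callTimes = callTimes.filter((t) => nowMs - t < windowMs);
    if (callTimes.length >= maxCallsPerWindow) return "rate_limited";
    if (IRREVERSIBLE_TOOLS.indexOf(tool) !== -1 && !humanApproved) return "needs_approval";
    callTimes.push(nowMs); // only allowed calls count against the limit
    return "allow";
  };
}
```

Call the gate inside the Act stage and treat anything other than "allow" as an observation for the model, or as a pause while a human reviews the request.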

Why it's worth it

The payoff of building it right

When the loop, tools, and evals are in place, the difference between a demo and a dependable agent is dramatic.

What disciplined agent-building buys you

Task success rate: 92%
Cost cut via model cascade: 60%
Faster debugging with tracing: 75%
Steps automated end-to-end: 85%

Representative gains teams report after adding evals, tracing, and a model cascade to a working agent.

Steps to a real agent: model → deploy
Code samples: agent + tool
~20m to first run, from scratch
100% traceable: every decision logged

FAQ

Building AI agents, answered

No-code tools can get you a working prototype: visual builders let you wire a model to a handful of tools and ship a simple agent in an afternoon. But to add custom tools, control the agent loop, harden it for production, and pass evals, you will eventually drop into a TypeScript or Python SDK. A practical path is to prototype the behavior visually, then rebuild the winners in code, where you get version control, tests, and full observability.

Go deeper

Keep building: related guides

AI agent frameworks compared: Pick the right SDK for your agent
AI agent tools & function calling: Design tools the model uses well
AI agent memory: Scratchpads, vector stores, and RAG
Multi-agent systems: When and how to orchestrate teams
Agentic workflows & patterns: ReAct, planning, and reflection
SDKs & API reference: Build in TypeScript or Python


Get started

Build your first AI agent today

Spin up a working agent from a template, add your tools, and ship it with tracing and evals built in. Free to start — no credit card required.

