How to build AI agents
A practical, step-by-step guide to building an AI agent from scratch — pick a reasoning model, give it tools with function calling, add memory, run the agent loop, and ship it with evals and tracing. Code samples included.
- Hands-on
- 20 min
- Code included
- Updated 2026
Learning how to build AI agents is less about a single magic library and more about assembling five durable parts the right way: a reasoning model, a set of tools, some memory, a control loop, and an evaluation harness. Get those right and you can build an AI agent that reliably completes real, multi-step work instead of just answering a single prompt.
An AI agent is a program that uses a large language model (LLM) as its decision-maker. You hand it a goal; it decides which tool to call, observes the result, and keeps going until the task is done. That autonomy is what separates an agent from a chatbot — and it is the whole reason agents can book the meeting, fix the bug, or resolve the ticket rather than describe how someone else might.
This guide is deliberately practical. We will walk through the six steps end to end, show a minimal LLM agent in TypeScript, define a real tool, and cover the things that bite people in production: cost, evaluation, and tracing. If you are brand new, start with our primer on agentic AI; if you are choosing a stack, see our overview of AI agent frameworks.
Prerequisites
You do not need a PhD — just a little setup and a clear task in mind. Here is what makes the rest of this tutorial smooth.
- An API key for a reasoning model — any provider with tool/function calling support
- Basic TypeScript or Python — you should be comfortable reading async/await code
- A concrete task in mind — e.g. triage a support ticket or summarize a repo
- One or two real tools — an API, a database query, or a search endpoint
- A handful of test cases — 5–10 example tasks with known-good outcomes
- A way to read logs — tracing/observability so you can debug the loop
How to build an AI agent in 6 steps
Follow this sequence the first time. Each step is small on its own; the magic is in how they compose into an autonomous system.
1. Define the goal and scope
Write the objective in one sentence, then define success criteria, allowed inputs, and what is explicitly out of scope. A vague goal produces a wandering agent. Decide up front when the agent should stop, ask a human, or hand off.
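One way to pin the goal and scope down is to write them as data the runtime can check on every turn. This is a minimal, framework-agnostic sketch. `TaskSpec`, `RunState`, `nextAction`, the topic labels, and the 0.5 confidence threshold are all hypothetical choices for illustration, not part of any SDK.

```typescript
// Hypothetical shape for writing the goal and scope down as data.
interface TaskSpec {
  objective: string;          // one-sentence goal
  successCriteria: string[];  // how a reviewer knows the run succeeded
  allowedInputs: string[];    // which fields the agent may read
  outOfScope: string[];       // topics the agent must hand off
  maxSteps: number;           // hard stop for the loop
}

interface RunState {
  steps: number;
  confidence: number; // 0..1, however your app estimates it (assumption)
  topic: string;
  done: boolean;
}

type Decision = "continue" | "stop" | "ask_human" | "hand_off";

// Decide up front when the agent stops, asks a human, or hands off.
function nextAction(spec: TaskSpec, state: RunState): Decision {
  if (state.done) return "stop";
  if (spec.outOfScope.includes(state.topic)) return "hand_off";
  if (state.steps >= spec.maxSteps) return "stop";
  if (state.confidence < 0.5) return "ask_human"; // threshold is illustrative
  return "continue";
}

const triageSpec: TaskSpec = {
  objective: "Classify each support ticket and draft a first reply.",
  successCriteria: ["correct category", "reply cites the relevant order"],
  allowedInputs: ["subject", "body", "customer_email"],
  outOfScope: ["legal", "billing_dispute"],
  maxSteps: 8,
};

console.log(nextAction(triageSpec, { steps: 2, confidence: 0.9, topic: "shipping", done: false })); // → continue
console.log(nextAction(triageSpec, { steps: 2, confidence: 0.9, topic: "legal", done: false }));    // → hand_off
```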
2. Choose a reasoning model
Pick an LLM with strong reasoning and reliable tool calling as the brain. Start with a capable general model so you debug your logic, not the model's ceiling — you can introduce a cheaper model for routine steps once it works.
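When the agent works end to end, a small router can move routine steps onto a cheaper model. In this sketch the model IDs (`reasoning-pro`, `fast-mini`) and the `StepKind` categories are placeholders; substitute your provider's real model names and your own definition of a routine step.

```typescript
// Hypothetical model IDs: replace with your provider's real model names.
const CAPABLE_MODEL = "reasoning-pro";
const CHEAP_MODEL = "fast-mini";

// Which steps count as routine is an application-level assumption.
type StepKind = "plan" | "tool_selection" | "summarize" | "classify";
const ROUTINE_STEPS: ReadonlySet<StepKind> = new Set<StepKind>(["summarize", "classify"]);

interface RouterOptions {
  cheapRoutingEnabled: boolean; // leave false until the agent passes its evals
}

function pickModel(step: StepKind, opts: RouterOptions): string {
  if (opts.cheapRoutingEnabled && ROUTINE_STEPS.has(step)) {
    return CHEAP_MODEL;
  }
  // Planning and tool selection always stay on the capable model.
  return CAPABLE_MODEL;
}

console.log(pickModel("summarize", { cheapRoutingEnabled: false })); // → reasoning-pro
console.log(pickModel("summarize", { cheapRoutingEnabled: true }));  // → fast-mini
console.log(pickModel("plan", { cheapRoutingEnabled: true }));       // → reasoning-pro
```

Keeping the switch behind a flag means you can re-run your eval set with and without cheap routing and compare task success rates before turning it on.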
3. Give it tools via function calling
Tools are how an agent acts. Each tool gets a name, a plain-language description, and a typed parameter schema. The model reads those descriptions to decide which tool to call and with what arguments — so write them like docs for a teammate.
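Under the hood, function calling usually comes down to two pieces: a tool spec the model reads (name, description, JSON Schema parameters) and a dispatcher that runs the call the model sends back. The sketch below is provider-agnostic and hand-rolled. The `ToolCall` shape (a tool name plus a JSON-encoded argument string) mirrors what many APIs return, but it is an assumption here; check your provider's docs for the exact format. Only primitive argument types are checked.

```typescript
// Provider-agnostic tool spec: what the model reads when choosing a tool.
interface ToolSpec {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
}

// What the model sends back when it wants to call a tool (shape assumed).
interface ToolCall {
  name: string;
  arguments: string; // JSON-encoded arguments
}

type Handler = (args: Record<string, unknown>) => Promise<string>;

const registry = new Map<string, { spec: ToolSpec; handler: Handler }>();

registry.set("get_weather", {
  spec: {
    name: "get_weather",
    description: "Get the current weather for a city. Use when the user asks about weather or what to wear.",
    parameters: {
      type: "object",
      properties: { city: { type: "string", description: "City name, e.g. Lisbon" } },
      required: ["city"],
    },
  },
  // Stubbed handler: a real one would call a weather API.
  handler: async (args) => `Weather for ${String(args.city)}: (stub)`,
});

// Parse the arguments, check required keys and primitive types, then run the handler.
async function dispatch(call: ToolCall): Promise<string> {
  const entry = registry.get(call.name);
  if (!entry) return `Error: unknown tool "${call.name}"`;

  let parsed: unknown;
  try {
    parsed = JSON.parse(call.arguments);
  } catch {
    return "Error: arguments were not valid JSON";
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return "Error: arguments must be a JSON object";
  }
  const args = parsed as Record<string, unknown>;

  for (const key of entry.spec.parameters.required) {
    if (!(key in args)) return `Error: missing required argument "${key}"`;
  }
  for (const [key, schema] of Object.entries(entry.spec.parameters.properties)) {
    // typeof only matches JSON Schema's "string" | "number" | "boolean"
    if (key in args && typeof args[key] !== schema.type) {
      return `Error: argument "${key}" should be a ${schema.type}`;
    }
  }
  return entry.handler(args);
}

// These specs are what you send to the model alongside the prompt.
const toolSpecs = Array.from(registry.values()).map((e) => e.spec);
```

Returning errors as strings rather than throwing lets the model see what went wrong and retry with corrected arguments.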
4. Add memory and context
Give the agent short-term memory (a scratchpad of the current run) and long-term memory in a vector store for facts, past runs, and user preferences. Retrieval-augmented generation (RAG) pulls just the relevant context into each prompt.
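Here is a toy version of both memory types: an array scratchpad for the current run, and an in-memory store that ranks saved notes by cosine similarity and returns the top k. The `embed` function is a deliberately crude stand-in (a letter-frequency vector) so the example runs without an API; in practice you would call an embedding model and use a real vector database. `VectorMemory` and `buildPrompt` are illustrative names, not a library API.

```typescript
// Short-term memory: what happened during this run.
const scratchpad: string[] = [];

// Toy embedding: letter-frequency vector, standing in for a real embedding model.
function embed(text: string): number[] {
  const v = new Array<number>(26).fill(0);
  const s = text.toLowerCase();
  for (let j = 0; j < s.length; j++) {
    const i = s.charCodeAt(j) - 97; // 'a' = 0 ... 'z' = 25
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Long-term memory: notes kept across runs, searchable by similarity.
class VectorMemory {
  private items: { text: string; vector: number[] }[] = [];

  add(text: string): void {
    this.items.push({ text, vector: embed(text) });
  }

  // Return the k most similar notes: the retrieval half of RAG.
  search(query: string, k = 2): string[] {
    const q = embed(query);
    return this.items
      .map((item) => ({ text: item.text, score: cosine(q, item.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((r) => r.text);
  }
}

// Put only the relevant notes and the last few scratchpad entries into the prompt.
function buildPrompt(goal: string, memory: VectorMemory): string {
  const context = memory.search(goal, 2).map((t) => `- ${t}`).join("\n");
  const recent = scratchpad.slice(-5).join("\n");
  return `Goal: ${goal}\nRelevant notes:\n${context}\nRecent steps:\n${recent}`;
}
```

Swapping in a real embedding model and vector database keeps the same shape: embed on write, embed the query, return the top k.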
5. Wrap it in the agent loop
Run the perceive → reason → act → observe cycle: send context to the model, execute any tool calls it requests, feed results back, and repeat. Add a step limit, retries, and guardrails so a confused agent fails safe instead of looping forever.
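Retries and guardrails can be bolted onto any tool call with plain TypeScript. In this sketch `withRetries` retries an async call with exponential backoff, and `assertAllowed` enforces a tool allowlist. Both helpers, their default values, and the tool names are our own illustrative choices.

```typescript
// Retry an async operation a bounded number of times with exponential backoff.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // With the defaults: wait 200ms, then 400ms, between attempts.
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // every attempt failed: surface the last error
}

// Guardrail: only allowlisted tools may run, whatever the model asks for.
const ALLOWED_TOOLS = new Set(["search_orders", "get_weather"]);

function assertAllowed(toolName: string): void {
  if (!ALLOWED_TOOLS.has(toolName)) {
    throw new Error(`Tool "${toolName}" is not allowed for this agent`);
  }
}
```

Inside the loop, a tool call that still fails after its retries is best reported back to the model as an observation, so it can re-plan instead of crashing the run.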
6. Evaluate, trace, and deploy
Score the agent against your eval set, trace every decision and token, and only then deploy — behind monitoring, a step budget, and a cost cap. Treat any drop in task success rate as a release-blocking regression.
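An eval harness can start as a loop over test cases that records pass/fail and compares the success rate with the last release. In this sketch `runAgent` is a hypothetical stand-in for your agent, each case carries its own checker function, and `releaseAllowed` applies the rule above: any drop from the baseline blocks the release.

```typescript
interface EvalCase {
  input: string;
  check: (output: string) => boolean; // the known-good outcome, as a predicate
}

interface EvalReport {
  passed: number;
  total: number;
  successRate: number;
  failures: string[]; // inputs that failed, for debugging with traces
}

async function runEvals(
  runAgent: (input: string) => Promise<string>,
  cases: EvalCase[],
): Promise<EvalReport> {
  const failures: string[] = [];
  for (const c of cases) {
    let ok = false;
    try {
      ok = c.check(await runAgent(c.input));
    } catch {
      ok = false; // a crash counts as a failure
    }
    if (!ok) failures.push(c.input);
  }
  const passed = cases.length - failures.length;
  return {
    passed,
    total: cases.length,
    successRate: cases.length === 0 ? 0 : passed / cases.length,
    failures,
  };
}

// Release gate: any drop below the baseline success rate blocks the release.
function releaseAllowed(current: EvalReport, baselineRate: number): boolean {
  return current.successRate >= baselineRate;
}
```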
Start smaller than you think
Your first version should solve one narrow task with one or two tools and no long-term memory. Get that loop green, then add capability. Most failed agent projects tried to do ten things on day one and could debug none of them.
A minimal AI agent in TypeScript
Here is the smallest useful shape: a model, a list of tools, and a run call. The framework handles the loop; you supply the goal and the tools.
```typescript
import { Agent, tool } from "@/sdk"; // the AI Agentics TypeScript SDK
import { z } from "zod";

// fetchWeather is your own helper that calls a weather API (not shown here).
declare function fetchWeather(city: string): Promise<string>;

const getWeather = tool({ // 1. declare a tool
  name: "get_weather",
  description: "Get the current weather for a city.",
  parameters: z.object({ city: z.string() }),
  run: async ({ city }) => fetchWeather(city),
});

const agent = new Agent({ // 2. wire model + tools
  model: "reasoning-pro",
  instructions: "You are a concise travel assistant.",
  tools: [getWeather],
  maxSteps: 8,
});

const result = await agent.run( // 3. run the agent loop
  "Should I pack a jacket for Lisbon tomorrow?"
);
console.log(result.output);
```

Notice what you did not write: no manual loop, no prompt-stuffing of tool results, no parsing of which function to call. The SDK runs the loop for you: it sends your instructions and tools to the model, executes any tool calls the model requests, feeds the outputs back, and repeats up to maxSteps. Your job is to design good tools and a clear goal. Next, let's look at the loop the SDK is running on your behalf.
Inside the agent loop
How run() actually drives the agent
When you call run(), the agent enters a loop. Each iteration is one turn of reasoning followed by at most one action, and the result of that action becomes part of the context for the next turn.
This is the ReAct pattern — reason, then act — and it is what makes agents resilient. If a tool errors or returns something unexpected, the model sees it and can re-plan, retry, or ask for help instead of crashing.
- Perceive — assemble goal, history, and prior tool results into context.
- Reason — the model decides: respond, or call a tool (and with what args).
- Act — the runtime executes the chosen tool and captures its output.
- Observe — append the result, check stop conditions, then loop again.
Figure: the agent loop. Perceive (read goal and context) → Reason (plan the next step) → Act (call a tool or API) → Observe (evaluate the result), then back to Perceive.
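Stripped of any SDK, the loop fits in a few dozen lines. The `Model` interface and `Turn` type below are assumptions standing in for a real LLM client, and the scripted model exists only so the example runs offline. This is an illustration of the perceive → reason → act → observe cycle, not how any particular SDK implements `run()`.

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// What the model decides each turn: answer, or call one tool (shape assumed).
type Turn =
  | { type: "final"; output: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> };

interface Model {
  decide(history: Message[]): Promise<Turn>;
}

type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  goal: string,
  maxSteps = 8,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: goal }]; // perceive: start from the goal
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model.decide(history); // reason
    if (turn.type === "final") return turn.output; // stop condition: the model answered

    history.push({ role: "assistant", content: `call ${turn.name}(${JSON.stringify(turn.args)})` });
    const tool = tools[turn.name];
    let observation: string;
    try {
      // act: run the requested tool, or report that it does not exist
      observation = tool ? await tool(turn.args) : `Error: no tool named ${turn.name}`;
    } catch (err) {
      observation = `Error: ${String(err)}`; // the model sees the failure and can re-plan
    }
    history.push({ role: "tool", content: observation }); // observe: feed the result back
  }
  return "Stopped: step limit reached without a final answer."; // fail safe
}

// Scripted stand-in model: calls the weather tool once, then answers from the result.
const scriptedModel: Model = {
  async decide(history) {
    const last = history[history.length - 1];
    if (last.role === "tool") {
      return { type: "final", output: `Based on "${last.content}": yes, pack a jacket.` };
    }
    return { type: "tool_call", name: "get_weather", args: { city: "Lisbon" } };
  },
};

const tools: Record<string, Tool> = {
  get_weather: async (args) => `${String(args.city)}: 14°C, light rain (stub data)`,
};
```

A production runtime would typically add parallel tool calls, streaming, and tracing on top, but the skeleton stays the same: each observation goes back into the history before the next decision.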
Defining a tool the model can call
A tool is just a function plus a schema the model reads. The clearer the name, description, and parameters, the better the model decides when and how to use it.
```typescript
import { tool } from "@/sdk";
import { z } from "zod";
// db is your own data-access layer (hypothetical module, not part of the SDK).
import { db } from "./db";

export const searchOrders = tool({
  name: "search_orders",
  description: // write this like docs for a teammate
    "Find a customer's orders by email. Use when the"
    + " user asks about order status, refunds, or history.",
  parameters: z.object({ // typed + validated args
    email: z.string().email(),
    status: z.enum(["open", "shipped", "all"])
      .default("all"),
    limit: z.number().int().min(1).max(50).default(10),
  }),
  run: async ({ email, status, limit }) => { // the real action
    // "all" means no status filter, so leave it out of the query.
    const filter = status === "all" ? { email } : { email, status };
    return db.orders.find(filter, { limit });
  },
});
```

Three things make a tool reliable. First, a specific name the model can pattern-match against. Second, a description that says when to use it, not just what it does. Third, a typed, validated schema (here with Zod) so malformed arguments are caught before they ever hit your database. Want to go deeper on tool design, retries, and safety? See AI agent tools and AI agent memory.
Common pitfalls (and what to do instead)
Almost every team building agents hits the same handful of traps. Here's the do / avoid list, distilled.
Do this
- Start with one task, one or two tools, no long-term memory.
- Write tool descriptions that say when to use the tool.
- Set a hard maxSteps and a per-task token/cost budget.
- Trace every run so you can replay and debug failures.
- Validate tool arguments before they touch real systems.
- Build an eval set early and score every change against it.
Avoid this
- Giving the agent 20 tools on day one so it can't choose.
- Letting the loop run unbounded — runaway cost and latency.
- Stuffing the entire history into every prompt (use trimming/RAG).
- Trusting vibes instead of measuring task success rate.
- Granting write access to production with no human in the loop.
- Jumping to multi-agent before a single agent has failed you.
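A per-task budget can be a small object the loop charges after every model call. The prices below are made-up placeholders (check your provider's pricing page), and `TaskBudget` is our own sketch rather than an SDK feature.

```typescript
// Hypothetical prices in USD per 1,000 tokens: replace with your provider's real rates.
const PRICE_PER_1K = { input: 0.003, output: 0.015 };

class BudgetExceeded extends Error {}

class TaskBudget {
  private steps = 0;
  private tokens = 0;
  private costUsd = 0;
  private readonly maxSteps: number;
  private readonly maxTokens: number;
  private readonly maxCostUsd: number;

  constructor(maxSteps: number, maxTokens: number, maxCostUsd: number) {
    this.maxSteps = maxSteps;
    this.maxTokens = maxTokens;
    this.maxCostUsd = maxCostUsd;
  }

  // Call once per model turn with the token usage your API reports.
  // Throws as soon as any limit is crossed; the loop should catch it and end the run.
  charge(inputTokens: number, outputTokens: number): void {
    this.steps += 1;
    this.tokens += inputTokens + outputTokens;
    this.costUsd +=
      (inputTokens / 1000) * PRICE_PER_1K.input + (outputTokens / 1000) * PRICE_PER_1K.output;

    if (this.steps > this.maxSteps) throw new BudgetExceeded(`step limit ${this.maxSteps} exceeded`);
    if (this.tokens > this.maxTokens) throw new BudgetExceeded(`token budget ${this.maxTokens} exceeded`);
    if (this.costUsd > this.maxCostUsd) throw new BudgetExceeded(`cost cap $${this.maxCostUsd} exceeded`);
  }

  get spentUsd(): number {
    return this.costUsd;
  }
}
```

Catching `BudgetExceeded` in the agent loop and returning a clear "stopped" message makes a runaway run fail safe, the same way the step limit does.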
Guardrails are not optional
An agent can call real tools, so a confused or adversarial input can do real damage. Sandbox dangerous actions, require human approval for irreversible operations (payments, deletes, external emails), and rate-limit tool calls. Production agents should run on a SOC 2–compliant platform with full audit logs — see our security practices.
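Two of those guardrails are simple to sketch: an approval gate that refuses irreversible tools until a human says yes, and a sliding-window rate limiter on tool calls. `ApprovalHook` is a hypothetical hook you would wire to your own review UI or chat workflow; the tool names, window, and limit are arbitrary examples.

```typescript
// Hypothetical hook: resolves true only if a human approves the call.
type ApprovalHook = (toolName: string, args: unknown) => Promise<boolean>;

// Tools whose effects cannot be undone (names are illustrative).
const IRREVERSIBLE = new Set(["send_payment", "delete_record", "send_external_email"]);

// Sliding-window limiter: at most `limit` calls per `windowMs` milliseconds.
class RateLimiter {
  private calls: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  tryAcquire(now: number = Date.now()): boolean {
    this.calls = this.calls.filter((t) => now - t < this.windowMs); // drop expired calls
    if (this.calls.length >= this.limit) return false;
    this.calls.push(now);
    return true;
  }
}

async function guardedCall(
  toolName: string,
  args: unknown,
  run: () => Promise<string>,
  requestApproval: ApprovalHook,
  limiter: RateLimiter,
): Promise<string> {
  if (!limiter.tryAcquire()) return "Refused: tool-call rate limit reached, try again later.";
  if (IRREVERSIBLE.has(toolName) && !(await requestApproval(toolName, args))) {
    return `Refused: a human declined ${toolName}.`;
  }
  return run();
}
```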
The payoff of building it right
When the loop, tools, and evals are in place, you get what separates a dependable agent from a demo: bounded steps and cost, a measured task success rate, and a trace of every decision you can replay.
What disciplined agent-building buys you
- 6 steps to a real agent: model → deploy
- 2 code samples: agent + tool
- From scratch to a first run
- Traceable: every decision logged
Building AI agents, answered
Can I build an AI agent without writing code?
No-code tools can get you a working prototype: visual builders let you wire a model to a handful of tools and ship a simple agent in an afternoon. But to add custom tools, control the agent loop, harden it for production, and pass evals, you will eventually drop into a TypeScript or Python SDK. A practical path is to prototype the behavior visually, then rebuild the winners in code, where you get version control, tests, and full observability.