The Builder's Guide to AI Agents
Stop reading tutorials. Start shipping agents.
I've built dozens of AI agents over the past year—from simple chatbots to complex multi-agent systems that run entire business workflows. Here’s what I wish someone told me on day one. For a broader shipping playbook, pair this with How to Ship AI Products Fast and the Moltbot Bridge Guide.
No theory. No "maybe one day" speculation. Just what works, what doesn't, and how to ship something useful in 2-3 weeks.
What Are AI Agents (And Why Should You Care?)
An AI agent is software that can:
- Understand what you want
- Plan how to do it
- Use tools to get it done
- Iterate until it's right
That's it. Not magic. Not AGI. Just LLMs + tools + a loop.
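To make that concrete, here's a minimal sketch of the whole loop using the Anthropic SDK's tool-use API. The `get_weather` tool, its fake implementation, and the five-iteration cap are placeholders for illustration, not part of anything built later in this guide:

```python
import anthropic

client = anthropic.Anthropic()

# One tool, described as a JSON schema the model can decide to call
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}"  # placeholder implementation

messages = [{"role": "user", "content": "What's the weather in Lisbon?"}]

# The loop: call the model, run whatever tool it asks for, feed the result back, repeat
for _ in range(5):
    response = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024,
        tools=tools, messages=messages,
    )
    if response.stop_reason != "tool_use":
        print(response.content[0].text)  # the model is done
        break
    messages.append({"role": "assistant", "content": response.content})
    for block in response.content:
        if block.type == "tool_use":
            result = get_weather(**block.input)  # dispatch to the matching tool
            messages.append({"role": "user", "content": [{
                "type": "tool_result", "tool_use_id": block.id, "content": result,
            }]})
```

LLM, tool, loop. Everything else in this guide is layered on top of that shape.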
Why Build Them?
Because you're probably doing repetitive knowledge work that could be automated:
- Researching leads and writing personalized outreach
- Processing documents and extracting structured data
- Managing customer conversations across channels
- Generating reports from multiple data sources
- Coordinating between different services and APIs
I replaced a 4-hour weekly research task with an agent that runs in 20 minutes. That's 16 hours/month back. Multiply that across your business.
When NOT to Build Agents
Real talk: agents aren't always the answer.
❌ Don't build an agent when:
- A simple script would do (if-then-else handles it)
- The task requires 100% accuracy (agents make mistakes)
- You don't have clear success criteria
- The cost per task doesn't justify the build time
✅ Build an agent when:
- Tasks require judgment and adaptation
- The workflow has multiple steps with branching logic
- You need to handle edge cases gracefully
- Scaling humans isn't economical
Architecture Patterns
I've used three main patterns. Pick one based on how complex your workflow is.
Pattern 1: Single Agent with Tools
The workhorse. One agent, multiple tools, handles most use cases.
┌─────────────────────────────────────────┐
│                  AGENT                  │
│   ┌─────────────────────────────────┐   │
│   │        LLM (Claude/GPT)         │   │
│   └─────────────────────────────────┘   │
│                    │                    │
│      ┌─────────────┼─────────────┐      │
│      ▼             ▼             ▼      │
│   ┌──────┐      ┌──────┐      ┌──────┐  │
│   │Search│      │ Code │      │ API  │  │
│   │ Tool │      │ Exec │      │ Call │  │
│   └──────┘      └──────┘      └──────┘  │
└─────────────────────────────────────────┘
When to use: 80% of cases. Start here.
Example: A research agent that:
- Takes a company name
- Searches the web for information
- Reads their website
- Checks LinkedIn
- Writes a summary
One agent, four tools, done.
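As a sketch, the tool list for that agent might look like this. The names and schemas are illustrative (they're not the tools used in the build section below); wire them into whichever tool-calling API you use:

```python
# Hypothetical tool schemas for the research agent - one agent, four tools
RESEARCH_TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web and return the top results",
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": ["query"]},
    },
    {
        "name": "read_website",
        "description": "Fetch a URL and return its main text content",
        "input_schema": {"type": "object",
                         "properties": {"url": {"type": "string"}},
                         "required": ["url"]},
    },
    {
        "name": "check_linkedin",
        "description": "Look up a company's LinkedIn page",
        "input_schema": {"type": "object",
                         "properties": {"company": {"type": "string"}},
                         "required": ["company"]},
    },
    {
        "name": "write_summary",
        "description": "Write a structured summary from the gathered notes",
        "input_schema": {"type": "object",
                         "properties": {"notes": {"type": "string"}},
                         "required": ["notes"]},
    },
]
```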
Pattern 2: Multi-Agent Orchestration
When one agent isn't enough. Specialized agents coordinated by a router.
┌─────────────────────────────────────────────────────┐
│                    ORCHESTRATOR                     │
│  ┌───────────────────────────────────────────────┐  │
│  │               Router / Planner                │  │
│  └───────────────────────────────────────────────┘  │
│        │                 │                 │        │
│        ▼                 ▼                 ▼        │
│   ┌──────────┐      ┌──────────┐      ┌──────────┐  │
│   │ Research │      │  Writer  │      │ Reviewer │  │
│   │  Agent   │      │  Agent   │      │  Agent   │  │
│   └──────────┘      └──────────┘      └──────────┘  │
└─────────────────────────────────────────────────────┘
When to use:
- Different sub-tasks need different prompts and models
- You want agents to check each other's work
- The workflow is complex enough that one prompt can't handle it
Example: Content pipeline:
- Research Agent → gathers sources, facts, data
- Writer Agent → drafts the content
- Editor Agent → reviews, suggests improvements
- Publisher Agent → formats and posts
Each agent has a focused role. Way cleaner than one mega-prompt trying to do everything.
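A minimal sketch of that orchestration, assuming each "agent" is just an LLM call with its own focused system prompt (the prompts and the single revision pass here are illustrative, not a framework API):

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"

def run_agent(system_prompt: str, task: str) -> str:
    """Each 'agent' is an LLM call with a focused system prompt."""
    response = client.messages.create(
        model=MODEL, max_tokens=2048,
        system=system_prompt,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

def run_pipeline(topic: str) -> str:
    # Research Agent: gathers sources, facts, data
    research = run_agent("You are a research agent. Gather facts and cite sources.", topic)
    # Writer Agent: drafts the content
    draft = run_agent("You are a writer. Draft a post from these notes.", research)
    # Editor Agent: reviews and suggests improvements
    review = run_agent("You are an editor. List concrete improvements.", draft)
    # One revision pass, then hand off to formatting and publishing
    return run_agent("Revise the draft using this feedback.",
                     f"{draft}\n\nFeedback:\n{review}")
```

The point isn't the framework; it's that each step gets a small, testable prompt instead of one mega-prompt.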
Pattern 3: Autonomous Loop with Memory
For ongoing tasks that persist across sessions.
┌─────────────────────────────────────────────────────┐
│                                                     │
│    ┌──────────┐     ┌──────────┐     ┌──────────┐   │
│    │  Input   │────▶│  Agent   │────▶│  Output  │   │
│    └──────────┘     └────┬─────┘     └──────────┘   │
│                          │                          │
│                    ┌─────▼─────┐                    │
│                    │  Memory   │                    │
│                    │ (Vector + │                    │
│                    │  Files)   │                    │
│                    └───────────┘                    │
│                                                     │
│       ◀─────────── Feedback Loop ───────────▶       │
│                                                     │
└─────────────────────────────────────────────────────┘
When to use:
- Long-running assistants (like personal AI)
- Tasks that require learning from past interactions
- Workflows that span days or weeks
Example: A customer success agent (see the sketch after this list) that:
- Remembers all past conversations with each customer
- Learns their preferences over time
- Proactively reaches out based on patterns
- Gets better the more it's used
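In code, the shape of that loop is roughly recall, act, remember. This is a sketch: `run_agent` is the helper from the Pattern 2 sketch above, `memory` is an instance of the Memory class built in Step 4 below, and `handle_message` stands in for whatever your agent actually does:

```python
def handle_message(customer_id: str, message: str) -> str:
    # 1. Recall: pull relevant history before reasoning
    past = memory.recall(f"{customer_id}: {message}")
    history = "\n".join(past)

    # 2. Act: answer with past context injected into the prompt
    reply = run_agent(
        "You are a customer success agent. Use the past interactions below.",
        f"Past interactions:\n{history}\n\nNew message:\n{message}",
    )

    # 3. Remember: store this exchange so the next session can build on it
    memory.store(
        f"{customer_id} said: {message}\nWe replied: {reply}",
        metadata={"customer": customer_id},
    )
    return reply
```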
Tech Stack Options
Real options I've used. Not a feature comparison—just what works.
LLM Providers
| Provider | Best For | Cost | Notes |
|---|---|---|---|
| Claude (Anthropic) | Complex reasoning, long context, coding | Medium | My go-to for most agents. Sonnet for speed, Opus for quality. |
| GPT-4 (OpenAI) | General purpose, wide ecosystem | Medium-High | Good all-rounder, extensive function calling support |
| GPT-4o-mini | High volume, cost-sensitive | Low | Great for simple tasks, fast |
| Gemini (Google) | Long context, multimodal | Medium | 1M+ token context is insane for document processing |
| Local (Llama, Mistral) | Privacy, no API costs | Time + Hardware | Viable for simpler tasks, not there yet for complex agents |
My recommendation: Start with Claude Sonnet. It's the best balance of capability, cost, and speed (the code below uses claude-sonnet-4-20250514). Upgrade to Opus when you need more reasoning power.
Claude Code vs ChatGPT for coding
If you're building agents, Claude Code is best when you need fast, file-level changes across a repo and repeatable workflows. ChatGPT-style chat is great for brainstorming, but it is slower for multi-file edits. My rule: use Claude Code to build and refactor, use chat for ideation and review.
Frameworks
| Framework | Best For | Learning Curve |
|---|---|---|
| LangChain | Everything (but complex) | High |
| LangGraph | Stateful multi-agent | High |
| CrewAI | Multi-agent teams | Medium |
| AutoGen | Research/experimentation | Medium |
| Instructor | Structured outputs | Low |
| Raw API calls | Simple agents, full control | Low |
My recommendation:
For simple agents (Pattern 1): Raw API calls + Instructor. No framework overhead. You understand every line of code.
For multi-agent (Pattern 2-3): LangGraph if you need complex state management, CrewAI if you want faster setup.
Vector Databases (For Memory)
| Database | Best For | Self-hosted? |
|---|---|---|
| Pinecone | Production, managed | No |
| Chroma | Local dev, simple | Yes |
| Weaviate | Hybrid search | Yes |
| Qdrant | Performance | Yes |
| pgvector | Already using Postgres | Yes |
My recommendation: Start with Chroma locally. Move to Pinecone or pgvector for production.
Step-by-Step Setup Guide
If you're following the course curriculum, this aligns with Module 2: First Agent.
Let's build a useful agent in under an hour.
Project: Research Agent
An agent that takes a topic, searches the web, and writes a summary with sources.
Step 1: Environment Setup
# Create project
mkdir research-agent && cd research-agent
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install anthropic instructor httpx beautifulsoup4 rich
Step 2: Basic Agent Structure
Create agent.py:
import anthropic
import instructor
from pydantic import BaseModel
import httpx
from bs4 import BeautifulSoup

# Initialize Claude with structured outputs
client = instructor.from_anthropic(anthropic.Anthropic())

# Define our tools as Pydantic models
class SearchQuery(BaseModel):
    query: str

class WebPage(BaseModel):
    url: str
    title: str
    content: str

class ResearchSummary(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]
    summary: str

class Agent:
    def __init__(self):
        self.model = "claude-sonnet-4-20250514"
        self.max_iterations = 5

    def search_web(self, query: str) -> list[dict]:
        """Simulate web search - replace with real API"""
        # In production: use Brave Search, SerpAPI, or similar
        print(f"🔍 Searching: {query}")
        # Placeholder - implement real search
        return [
            {"title": "Example Result", "url": "https://example.com", "snippet": "..."}
        ]

    def fetch_page(self, url: str) -> str:
        """Fetch and extract text from a webpage"""
        print(f"📄 Fetching: {url}")
        try:
            resp = httpx.get(url, timeout=10, follow_redirects=True)
            soup = BeautifulSoup(resp.text, 'html.parser')
            # Remove scripts and styles
            for tag in soup(['script', 'style', 'nav', 'footer']):
                tag.decompose()
            return soup.get_text(separator='\n', strip=True)[:8000]
        except Exception as e:
            return f"Error fetching: {e}"

    def run(self, topic: str) -> ResearchSummary:
        """Main agent loop"""
        context = f"Research topic: {topic}\n\n"
        for i in range(self.max_iterations):
            print(f"\n--- Iteration {i+1} ---")
            # Ask Claude what to do next (plain text, no response_model)
            response = client.messages.create(
                model=self.model,
                max_tokens=1024,
                response_model=None,
                messages=[{
                    "role": "user",
                    "content": f"""You are a research agent. Your task is to research a topic and provide a comprehensive summary.

Current context:
{context}

Available actions:
1. SEARCH: <query> - Search the web
2. FETCH: <url> - Read a webpage
3. DONE - Finish research and provide summary

What's your next action? Respond with just the action."""
                }]
            )
            action = response.content[0].text.strip()

            if action.startswith("SEARCH:"):
                query = action.replace("SEARCH:", "").strip()
                results = self.search_web(query)
                context += f"\nSearch results for '{query}':\n{results}\n"
            elif action.startswith("FETCH:"):
                url = action.replace("FETCH:", "").strip()
                content = self.fetch_page(url)
                context += f"\nContent from {url}:\n{content[:2000]}...\n"
            elif "DONE" in action:
                # Generate the final summary as a structured object
                summary = client.messages.create(
                    model=self.model,
                    max_tokens=2048,
                    response_model=ResearchSummary,
                    messages=[{
                        "role": "user",
                        "content": f"""Based on this research, provide a structured summary:

{context}

Create a ResearchSummary with key findings, sources, and a comprehensive summary."""
                    }]
                )
                return summary

        raise Exception("Max iterations reached")

if __name__ == "__main__":
    agent = Agent()
    result = agent.run("Latest developments in AI agents 2024")
    print(f"\n{'='*50}")
    print(f"Topic: {result.topic}")
    print(f"\nKey Findings:")
    for finding in result.key_findings:
        print(f"  • {finding}")
    print(f"\nSummary:\n{result.summary}")
Step 3: Add Real Search
Replace the placeholder search with Brave Search API:
def search_web(self, query: str) -> list[dict]:
    """Search using Brave Search API"""
    # Requires `import os` at the top of agent.py and BRAVE_API_KEY in the environment
    resp = httpx.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"]},
        params={"q": query, "count": 5}
    )
    results = resp.json().get("web", {}).get("results", [])
    return [
        {"title": r["title"], "url": r["url"], "snippet": r.get("description", "")}
        for r in results
    ]
Step 4: Add Memory (Optional)
For agents that need to remember across sessions:
# pip install chromadb
import chromadb
from datetime import datetime

class Memory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.client = chromadb.PersistentClient(path="./memory_db")
        self.collection = self.client.get_or_create_collection(collection_name)

    def store(self, content: str, metadata: dict = None):
        """Store a memory"""
        self.collection.add(
            documents=[content],
            metadatas=[{**(metadata or {}), "timestamp": datetime.now().isoformat()}],
            ids=[f"mem_{datetime.now().timestamp()}"]
        )

    def recall(self, query: str, n_results: int = 5) -> list[str]:
        """Recall relevant memories"""
        results = self.collection.query(query_texts=[query], n_results=n_results)
        return results["documents"][0] if results["documents"] else []
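A minimal usage sketch, assuming the Memory class above lives in the same file (the stored text and metadata are made up for illustration):

```python
memory = Memory()

# After a run, store what the agent learned
memory.store(
    "Customer prefers email follow-ups over calls",
    metadata={"customer": "acme-corp"},
)

# Before the next run, pull relevant context back into the prompt
past = memory.recall("how does acme-corp like to be contacted")
context = "Relevant memories:\n" + "\n".join(past)
```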
Step 5: Run It
export ANTHROPIC_API_KEY="your-key"
export BRAVE_API_KEY="your-key" # Get from brave.com/search/api
python agent.py
Best Practices
What I've learned the hard way.
1. Start Small, Iterate Fast
Don't architect a 10-agent system on day one. Build the simplest thing that works, then add complexity when you hit real limitations.
Week 1: Single agent, 2-3 tools, basic prompt
Week 2: Add error handling, logging, edge cases
Week 3: Add memory, multi-step workflows, production hardening
2. Structured Outputs Are Non-Negotiable
Don't parse free-form LLM responses with regex. Use structured outputs:
# Bad - fragile
response = llm("Give me the name and email")
name = response.split("Name:")[1].split("\n")[0]  # 💀

# Good - reliable
class Contact(BaseModel):
    name: str
    email: str

contact = client.messages.create(..., response_model=Contact)  # ✅
Instructor, OpenAI function calling, or Claude's tool use. Pick one. Use it everywhere.
3. Log Everything
You will debug agent runs. Make it easy:
import json
import os
from datetime import datetime

class AgentLogger:
    def __init__(self, run_id: str):
        self.run_id = run_id
        self.log_file = f"logs/{run_id}.jsonl"
        os.makedirs("logs", exist_ok=True)  # make sure the logs directory exists

    def log(self, event_type: str, data: dict):
        with open(self.log_file, "a") as f:
            f.write(json.dumps({
                "timestamp": datetime.now().isoformat(),
                "type": event_type,
                **data
            }) + "\n")
Log: LLM calls, tool executions, decisions, errors. Everything.
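Wiring it into the agent loop might look like this. A sketch: the run-id scheme and event names are illustrative, not a fixed schema:

```python
import uuid

logger = AgentLogger(run_id=str(uuid.uuid4()))

logger.log("llm_call", {"model": "claude-sonnet-4-20250514", "iteration": 1})
logger.log("tool_execution", {"tool": "search_web", "query": "AI agents 2024"})
logger.log("error", {"tool": "fetch_page", "error": "timeout after 10s"})
```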
4. Cost Control
Agents can get expensive fast. Build in controls:
class CostTracker:
    def __init__(self, max_cost: float = 1.0):
        self.max_cost = max_cost
        self.current_cost = 0.0

    def add(self, input_tokens: int, output_tokens: int, model: str):
        # Approximate costs
        rates = {
            "claude-sonnet-4-20250514": (0.003, 0.015),  # per 1k tokens
            "claude-opus-4-20250514": (0.015, 0.075),
        }
        in_rate, out_rate = rates.get(model, (0.01, 0.03))
        cost = (input_tokens * in_rate + output_tokens * out_rate) / 1000
        self.current_cost += cost
        if self.current_cost > self.max_cost:
            raise Exception(f"Cost limit exceeded: ${self.current_cost:.2f}")
5. Human-in-the-Loop for High Stakes
For anything important, add confirmation points:
def execute_action(self, action: str, auto_approve: bool = False):
    if action.startswith("SEND_EMAIL"):
        if not auto_approve:
            print(f"Proposed action: {action}")
            if input("Approve? (y/n): ").lower() != "y":
                return "Action cancelled by user"
    # Execute action...
Start with human approval, remove it once you trust the agent.
6. Fail Gracefully
Agents will fail. Plan for it:
# Requires `import time` at module level
def safe_tool_call(self, tool: str, args: dict, retries: int = 3):
    for attempt in range(retries):
        try:
            return self.tools[tool](**args)
        except Exception as e:
            self.logger.log("tool_error", {"tool": tool, "error": str(e), "attempt": attempt})
            if attempt == retries - 1:
                return {"error": str(e), "fallback": True}
            time.sleep(2 ** attempt)  # Exponential backoff
Common Pitfalls
❌ Prompt Stuffing
Don't put everything in the system prompt. It doesn't scale.
# Bad - 5000 token system prompt with every possible instruction
system = """You are an assistant that does X, Y, Z, and also A, B, C...
Here are 47 rules to follow...
Here are 23 examples..."""
# Good - focused prompts, context injected as needed
system = "You are a research assistant. Focus on accuracy and sourcing."
# Add specific context per-message
❌ Infinite Loops
Always have escape hatches:
# Bad
while not done:
    result = agent.step()

# Good
for _ in range(max_iterations):
    result = agent.step()
    if result.done:
        break
else:
    raise TimeoutError("Max iterations reached")
❌ No Observability
If you can't see what the agent is doing, you can't debug it.
Build a simple dashboard or use tools like LangSmith, Weights & Biases, or even just log files you can grep.
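Even without a dashboard, the JSONL files written by the AgentLogger above are easy to slice. A quick sketch for summarizing a single run (the file path is hypothetical):

```python
import json
from collections import Counter

def summarize_run(log_file: str) -> None:
    """Print event counts and any errors from one agent run."""
    with open(log_file) as f:
        events = [json.loads(line) for line in f]
    print(Counter(e["type"] for e in events))
    for e in events:
        if e["type"] in ("error", "tool_error"):
            print(e["timestamp"], e.get("tool"), e.get("error"))

summarize_run("logs/my-run-id.jsonl")  # hypothetical run id
```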
❌ Over-Engineering Early
Don't build:
- Complex routing before you have 2 agents
- Vector memory before you need persistence
- Evaluation frameworks before you have users
Ship first. Abstract later.
❌ Ignoring Token Limits
Context windows aren't infinite. Plan for it:
def truncate_context(self, context: str, max_tokens: int = 100000):
    # Rough estimation: 1 token ≈ 4 characters
    max_chars = max_tokens * 4
    if len(context) > max_chars:
        # Keep most recent context
        return "...[truncated]...\n" + context[-max_chars:]
    return context
What I'm Running Right Now
Real examples from my setup:
- Personal AI (OpenClaw/Kai) - Multi-modal agent with 30+ tools. Handles email, calendar, research, coding, home automation. Runs 24/7 with heartbeat polling.
- Lead Research Agent - Takes company domains, researches them, scores fit, writes personalized outreach. Saves ~10 hours/week.
- Content Pipeline - Research → Draft → Edit → Format → Schedule. Handles blog posts, social content, newsletters.
- Client Intake Bot - Qualifies leads via chat, books meetings, syncs to CRM. Replaced a VA.
All built incrementally. Started simple. Added complexity when needed.
Where to Go From Here
- Build the research agent from this guide. Get it working.
- Swap in a real use case - what repetitive task do YOU want automated?
- Add one tool at a time - web search, then scraping, then whatever you need
- Ship it internally - use it yourself for 2 weeks
- Then think about scaling - not before
If you want a structured path to learn to build AI products with agents, the AI Product Building Course pairs well with this guide.
Stop reading. Start building.
Questions? Find me on Twitter @amirbrooks or check my other guides.
Related Stories
- How I Built 14 AI Agents — From one to a team
- Running 15 AI Agents Daily — Architecture and costs
- MCP Explained in 10 Minutes — The tool layer
FAQ
What are AI agents in product development?
They are workflow systems that plan, act, and use tools to complete tasks. In products, they automate repeatable steps and reduce manual effort.
Do agents replace manual coding?
No. Use agents for repetitive or multi-step work, and manual coding for precise changes and debugging.
Which is better: Claude Code vs ChatGPT for coding?
Claude Code is better for fast, multi-file edits and repeatable workflows. ChatGPT-style chat is better for ideation and quick explanations.
Download the AI Agent Implementation Checklist
A 50-point checklist covering architecture, tool design, safety guardrails, and deployment—everything from this guide in an actionable format.
- Step-by-step setup workflow
- Tool integration templates
- Safety and guardrail checklist
- Testing and monitoring guidelines