The Builder's Guide to AI Agents
Stop reading tutorials. Start shipping agents.
I've built dozens of AI agents over the past year—from simple chatbots to complex multi-agent systems that run entire business workflows. Here’s what I wish someone told me on day one. For a broader shipping playbook, pair this with How to Ship AI Products Fast and the Moltbot Bridge Guide.
No theory. No "maybe one day" speculation. Just what works, what doesn't, and how to ship something useful in 2-3 weeks.
What Are AI Agents (And Why Should You Care?)
An AI agent is software that can:
- Understand what you want
- Plan how to do it
- Use tools to get it done
- Iterate until it's right
That's it. Not magic. Not AGI. Just LLMs + tools + a loop.
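To make that concrete, here's a minimal sketch of the whole loop using the Anthropic SDK's tool-use API. The `get_weather` tool, its fake implementation, and the five-iteration cap are placeholders for illustration, not part of anything built later in this guide:

```python
import anthropic

client = anthropic.Anthropic()

# One tool, described as a JSON schema the model can decide to call
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}"  # placeholder implementation

messages = [{"role": "user", "content": "What's the weather in Lisbon?"}]

# The loop: call the model, run whatever tool it asks for, feed the result back, repeat
for _ in range(5):
    response = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024,
        tools=tools, messages=messages,
    )
    if response.stop_reason != "tool_use":
        print(response.content[0].text)  # the model is done
        break
    messages.append({"role": "assistant", "content": response.content})
    for block in response.content:
        if block.type == "tool_use":
            result = get_weather(**block.input)  # dispatch to the matching tool
            messages.append({"role": "user", "content": [{
                "type": "tool_result", "tool_use_id": block.id, "content": result,
            }]})
```

LLM, tool, loop. Everything else in this guide is layered on top of that shape.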
Why Build Them?
Because you're probably doing repetitive knowledge work that could be automated:
- Researching leads and writing personalized outreach
- Processing documents and extracting structured data
- Managing customer conversations across channels
- Generating reports from multiple data sources
- Coordinating between different services and APIs
I replaced a 4-hour weekly research task with an agent that runs in 20 minutes. That's 16 hours/month back. Multiply that across your business.
When NOT to Build Agents
Real talk: agents aren't always the answer.
❌ Don't build an agent when:
- A simple script would do (if-then-else handles it)
- The task requires 100% accuracy (agents make mistakes)
- You don't have clear success criteria
- The cost per task doesn't justify the build time
✅ Build an agent when:
- Tasks require judgment and adaptation
- The workflow has multiple steps with branching logic
- You need to handle edge cases gracefully
- Scaling humans isn't economical
Architecture Patterns
I've used three main patterns. Pick one based on how complex your workflow is.
Pattern 1: Single Agent with Tools
The workhorse. One agent, multiple tools, handles most use cases.
┌─────────────────────────────────────────┐
│                  AGENT                  │
│   ┌─────────────────────────────────┐   │
│   │        LLM (Claude/GPT)         │   │
│   └─────────────────────────────────┘   │
│                    │                    │
│      ┌─────────────┼─────────────┐      │
│      ▼             ▼             ▼      │
│   ┌──────┐      ┌──────┐      ┌──────┐  │
│   │Search│      │ Code │      │ API  │  │
│   │ Tool │      │ Exec │      │ Call │  │
│   └──────┘      └──────┘      └──────┘  │
└─────────────────────────────────────────┘
When to use: 80% of cases. Start here.
Example: A research agent that:
- Takes a company name
- Searches the web for information
- Reads their website
- Checks LinkedIn
- Writes a summary
One agent, four tools, done.
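As a sketch, the tool list for that agent might look like this. The names and schemas are illustrative (they're not the tools used in the build section below); wire them into whichever tool-calling API you use:

```python
# Hypothetical tool schemas for the research agent - one agent, four tools
RESEARCH_TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web and return the top results",
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": ["query"]},
    },
    {
        "name": "read_website",
        "description": "Fetch a URL and return its main text content",
        "input_schema": {"type": "object",
                         "properties": {"url": {"type": "string"}},
                         "required": ["url"]},
    },
    {
        "name": "check_linkedin",
        "description": "Look up a company's LinkedIn page",
        "input_schema": {"type": "object",
                         "properties": {"company": {"type": "string"}},
                         "required": ["company"]},
    },
    {
        "name": "write_summary",
        "description": "Write a structured summary from the gathered notes",
        "input_schema": {"type": "object",
                         "properties": {"notes": {"type": "string"}},
                         "required": ["notes"]},
    },
]
```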
Pattern 2: Multi-Agent Orchestration
When one agent isn't enough. Specialized agents coordinated by a router.
┌─────────────────────────────────────────────────────┐
│                    ORCHESTRATOR                     │
│  ┌───────────────────────────────────────────────┐  │
│  │               Router / Planner                │  │
│  └───────────────────────────────────────────────┘  │
│        │                 │                 │        │
│        ▼                 ▼                 ▼        │
│   ┌──────────┐      ┌──────────┐      ┌──────────┐  │
│   │ Research │      │  Writer  │      │ Reviewer │  │
│   │  Agent   │      │  Agent   │      │  Agent   │  │
│   └──────────┘      └──────────┘      └──────────┘  │
└─────────────────────────────────────────────────────┘
When to use:
- Different sub-tasks need different prompts and models
- You want agents to check each other's work
- The workflow is complex enough that one prompt can't handle it
Example: Content pipeline:
- Research Agent → gathers sources, facts, data
- Writer Agent → drafts the content
- Editor Agent → reviews, suggests improvements
- Publisher Agent → formats and posts
Each agent has a focused role. Way cleaner than one mega-prompt trying to do everything.
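A minimal sketch of that orchestration, assuming each "agent" is just an LLM call with its own focused system prompt (the prompts and the single revision pass here are illustrative, not a framework API):

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"

def run_agent(system_prompt: str, task: str) -> str:
    """Each 'agent' is an LLM call with a focused system prompt."""
    response = client.messages.create(
        model=MODEL, max_tokens=2048,
        system=system_prompt,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

def run_pipeline(topic: str) -> str:
    # Research Agent: gathers sources, facts, data
    research = run_agent("You are a research agent. Gather facts and cite sources.", topic)
    # Writer Agent: drafts the content
    draft = run_agent("You are a writer. Draft a post from these notes.", research)
    # Editor Agent: reviews and suggests improvements
    review = run_agent("You are an editor. List concrete improvements.", draft)
    # One revision pass, then hand off to formatting and publishing
    return run_agent("Revise the draft using this feedback.",
                     f"{draft}\n\nFeedback:\n{review}")
```

The point isn't the framework; it's that each step gets a small, testable prompt instead of one mega-prompt.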
Pattern 3: Autonomous Loop with Memory
For ongoing tasks that persist across sessions.
┌─────────────────────────────────────────────────────┐
│                                                     │
│    ┌──────────┐     ┌──────────┐     ┌──────────┐   │
│    │  Input   │────▶│  Agent   │────▶│  Output  │   │
│    └──────────┘     └────┬─────┘     └──────────┘   │
│                          │                          │
│                    ┌─────▼─────┐                    │
│                    │  Memory   │                    │
│                    │ (Vector + │                    │
│                    │  Files)   │                    │
│                    └───────────┘                    │
│                                                     │
│       ◀─────────── Feedback Loop ───────────▶       │
│                                                     │
└─────────────────────────────────────────────────────┘
When to use:
- Long-running assistants (like personal AI)
- Tasks that require learning from past interactions
- Workflows that span days or weeks
Example: A customer success agent (see the sketch after this list) that:
- Remembers all past conversations with each customer
- Learns their preferences over time
- Proactively reaches out based on patterns
- Gets better the more it's used
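In code, the shape of that loop is roughly recall, act, remember. This is a sketch: `run_agent` is the helper from the Pattern 2 sketch above, `memory` is an instance of the Memory class built in Step 4 below, and `handle_message` stands in for whatever your agent actually does:

```python
def handle_message(customer_id: str, message: str) -> str:
    # 1. Recall: pull relevant history before reasoning
    past = memory.recall(f"{customer_id}: {message}")
    history = "\n".join(past)

    # 2. Act: answer with past context injected into the prompt
    reply = run_agent(
        "You are a customer success agent. Use the past interactions below.",
        f"Past interactions:\n{history}\n\nNew message:\n{message}",
    )

    # 3. Remember: store this exchange so the next session can build on it
    memory.store(
        f"{customer_id} said: {message}\nWe replied: {reply}",
        metadata={"customer": customer_id},
    )
    return reply
```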
Tech Stack Options
Real options I've used. Not a feature comparison—just what works.
LLM Providers
| Provider | Best For | Cost | Notes |
|---|---|---|---|
| Claude (Anthropic) | Complex reasoning, long context, coding | Medium | My go-to for most agents. Sonnet for speed, Opus for quality. |
| GPT-4 (OpenAI) | General purpose, wide ecosystem | Medium-High | Good all-rounder, extensive function calling support |
| GPT-4o-mini | High volume, cost-sensitive | Low | Great for simple tasks, fast |
| Gemini (Google) | Long context, multimodal | Medium | 1M+ token context is insane for document processing |
| Local (Llama, Mistral) | Privacy, no API costs | Time + Hardware | Viable for simpler tasks, not there yet for complex agents |
My recommendation: Start with Claude Sonnet. It's the best balance of capability, cost, and speed (the code below uses claude-sonnet-4-20250514). Upgrade to Opus when you need more reasoning power.
Claude Code vs ChatGPT for coding
If you're building agents, Claude Code is best when you need fast, file-level changes across a repo and repeatable workflows. ChatGPT-style chat is great for brainstorming, but it is slower for multi-file edits. My rule: use Claude Code to build and refactor, use chat for ideation and review.
Frameworks
| Framework | Best For | Learning Curve |
|---|---|---|
| LangChain | Everything (but complex) | High |
| LangGraph | Stateful multi-agent | High |
| CrewAI | Multi-agent teams | Medium |
| AutoGen | Research/experimentation | Medium |
| Instructor | Structured outputs | Low |
| Raw API calls | Simple agents, full control | Low |
My recommendation:
For simple agents (Pattern 1): Raw API calls + Instructor. No framework overhead. You understand every line of code.
For multi-agent (Pattern 2-3): LangGraph if you need complex state management, CrewAI if you want faster setup.
Vector Databases (For Memory)
| Database | Best For | Self-hosted? |
|---|---|---|
| Pinecone | Production, managed | No |
| Chroma | Local dev, simple | Yes |
| Weaviate | Hybrid search | Yes |
| Qdrant | Performance | Yes |
| pgvector | Already using Postgres | Yes |
My recommendation: Start with Chroma locally. Move to Pinecone or pgvector for production.
Step-by-Step Setup Guide
If you're following the course curriculum, this aligns with Module 2: First Agent.
Let's build a useful agent in under an hour.
Project: Research Agent
An agent that takes a topic, searches the web, and writes a summary with sources.
Step 1: Environment Setup
# Create project
mkdir research-agent && cd research-agent
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install anthropic instructor httpx beautifulsoup4 rich
Step 2: Basic Agent Structure
Create agent.py:
import anthropic
import instructor
from pydantic import BaseModel
import httpx
from bs4 import BeautifulSoup

# Initialize Claude with structured outputs
client = instructor.from_anthropic(anthropic.Anthropic())

# Define our tools as Pydantic models
class SearchQuery(BaseModel):
    query: str

class WebPage(BaseModel):
    url: str
    title: str
    content: str

class ResearchSummary(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]
    summary: str

class Agent:
    def __init__(self):
        self.model = "claude-sonnet-4-20250514"
        self.max_iterations = 5

    def search_web(self, query: str) -> list[dict]:
        """Simulate web search - replace with real API"""
        # In production: use Brave Search, SerpAPI, or similar
        print(f"🔍 Searching: {query}")
        # Placeholder - implement real search
        return [
            {"title": "Example Result", "url": "https://example.com", "snippet": "..."}
        ]

    def fetch_page(self, url: str) -> str:
        """Fetch and extract text from a webpage"""
        print(f"📄 Fetching: {url}")
        try:
            resp = httpx.get(url, timeout=10, follow_redirects=True)
            soup = BeautifulSoup(resp.text, 'html.parser')
            # Remove scripts and styles
            for tag in soup(['script', 'style', 'nav', 'footer']):
                tag.decompose()
            return soup.get_text(separator='\n', strip=True)[:8000]
        except Exception as e:
            return f"Error fetching: {e}"

    def run(self, topic: str) -> ResearchSummary:
        """Main agent loop"""
        context = f"Research topic: {topic}\n\n"
        for i in range(self.max_iterations):
            print(f"\n--- Iteration {i+1} ---")
            # Ask Claude what to do next (plain text, no response_model)
            response = client.messages.create(
                model=self.model,
                max_tokens=1024,
                response_model=None,
                messages=[{
                    "role": "user",
                    "content": f"""You are a research agent. Your task is to research a topic and provide a comprehensive summary.

Current context:
{context}

Available actions:
1. SEARCH: <query> - Search the web
2. FETCH: <url> - Read a webpage
3. DONE - Finish research and provide summary

What's your next action? Respond with just the action."""
                }]
            )
            action = response.content[0].text.strip()

            if action.startswith("SEARCH:"):
                query = action.replace("SEARCH:", "").strip()
                results = self.search_web(query)
                context += f"\nSearch results for '{query}':\n{results}\n"
            elif action.startswith("FETCH:"):
                url = action.replace("FETCH:", "").strip()
                content = self.fetch_page(url)
                context += f"\nContent from {url}:\n{content[:2000]}...\n"
            elif "DONE" in action:
                # Generate the final summary as a structured object
                summary = client.messages.create(
                    model=self.model,
                    max_tokens=2048,
                    response_model=ResearchSummary,
                    messages=[{
                        "role": "user",
                        "content": f"""Based on this research, provide a structured summary:

{context}

Create a ResearchSummary with key findings, sources, and a comprehensive summary."""
                    }]
                )
                return summary

        raise Exception("Max iterations reached")

if __name__ == "__main__":
    agent = Agent()
    result = agent.run("Latest developments in AI agents 2024")
    print(f"\n{'='*50}")
    print(f"Topic: {result.topic}")
    print(f"\nKey Findings:")
    for finding in result.key_findings:
        print(f"  • {finding}")
    print(f"\nSummary:\n{result.summary}")
Step 3: Add Real Search
Replace the placeholder search with Brave Search API:
def search_web(self, query: str) -> list[dict]:
    """Search using Brave Search API"""
    # Requires `import os` at the top of agent.py and BRAVE_API_KEY in the environment
    resp = httpx.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"]},
        params={"q": query, "count": 5}
    )
    results = resp.json().get("web", {}).get("results", [])
    return [
        {"title": r["title"], "url": r["url"], "snippet": r.get("description", "")}
        for r in results
    ]
Step 4: Add Memory (Optional)
For agents that need to remember across sessions:
# pip install chromadb
import chromadb
from datetime import datetime

class Memory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.client = chromadb.PersistentClient(path="./memory_db")
        self.collection = self.client.get_or_create_collection(collection_name)

    def store(self, content: str, metadata: dict = None):
        """Store a memory"""
        self.collection.add(
            documents=[content],
            metadatas=[{**(metadata or {}), "timestamp": datetime.now().isoformat()}],
            ids=[f"mem_{datetime.now().timestamp()}"]
        )

    def recall(self, query: str, n_results: int = 5) -> list[str]:
        """Recall relevant memories"""
        results = self.collection.query(query_texts=[query], n_results=n_results)
        return results["documents"][0] if results["documents"] else []
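A minimal usage sketch, assuming the Memory class above lives in the same file (the stored text and metadata are made up for illustration):

```python
memory = Memory()

# After a run, store what the agent learned
memory.store(
    "Customer prefers email follow-ups over calls",
    metadata={"customer": "acme-corp"},
)

# Before the next run, pull relevant context back into the prompt
past = memory.recall("how does acme-corp like to be contacted")
context = "Relevant memories:\n" + "\n".join(past)
```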
Step 5: Run It
export ANTHROPIC_API_KEY="your-key"
export BRAVE_API_KEY="your-key" # Get from brave.com/search/api
python agent.py
Best Practices
What I've learned the hard way.
1. Start Small, Iterate Fast
Don't architect a 10-agent system on day one. Build the simplest thing that works, then add complexity when you hit real limitations.
Week 1: Single agent, 2-3 tools, basic prompt
Week 2: Add error handling, logging, edge cases
Week 3: Add memory, multi-step workflows, production hardening
2. Structured Outputs Are Non-Negotiable
Don't parse free-form LLM responses with regex. Use structured outputs:
# Bad - fragile
response = llm("Give me the name and email")
name = response.split("Name:")[1].split("\n")[0]  # 💀

# Good - reliable
class Contact(BaseModel):
    name: str
    email: str

contact = client.messages.create(..., response_model=Contact)  # ✅
Instructor, OpenAI function calling, or Claude's tool use. Pick one. Use it everywhere.
3. Log Everything
You will debug agent runs. Make it easy:
import json
import os
from datetime import datetime

class AgentLogger:
    def __init__(self, run_id: str):
        self.run_id = run_id
        self.log_file = f"logs/{run_id}.jsonl"
        os.makedirs("logs", exist_ok=True)  # make sure the logs directory exists

    def log(self, event_type: str, data: dict):
        with open(self.log_file, "a") as f:
            f.write(json.dumps({
                "timestamp": datetime.now().isoformat(),
                "type": event_type,
                **data
            }) + "\n")
Log: LLM calls, tool executions, decisions, errors. Everything.
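Wiring it into the agent loop might look like this. A sketch: the run-id scheme and event names are illustrative, not a fixed schema:

```python
import uuid

logger = AgentLogger(run_id=str(uuid.uuid4()))

logger.log("llm_call", {"model": "claude-sonnet-4-20250514", "iteration": 1})
logger.log("tool_execution", {"tool": "search_web", "query": "AI agents 2024"})
logger.log("error", {"tool": "fetch_page", "error": "timeout after 10s"})
```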
4. Cost Control
Agents can get expensive fast. Build in controls:
class CostTracker:
    def __init__(self, max_cost: float = 1.0):
        self.max_cost = max_cost
        self.current_cost = 0.0

    def add(self, input_tokens: int, output_tokens: int, model: str):
        # Approximate costs
        rates = {
            "claude-sonnet-4-20250514": (0.003, 0.015),  # per 1k tokens
            "claude-opus-4-20250514": (0.015, 0.075),
        }
        in_rate, out_rate = rates.get(model, (0.01, 0.03))
        cost = (input_tokens * in_rate + output_tokens * out_rate) / 1000
        self.current_cost += cost
        if self.current_cost > self.max_cost:
            raise Exception(f"Cost limit exceeded: ${self.current_cost:.2f}")
5. Human-in-the-Loop for High Stakes
For anything important, add confirmation points:
def execute_action(self, action: str, auto_approve: bool = False):
    if action.startswith("SEND_EMAIL"):
        if not auto_approve:
            print(f"Proposed action: {action}")
            if input("Approve? (y/n): ").lower() != "y":
                return "Action cancelled by user"
    # Execute action...
Start with human approval, remove it once you trust the agent.
6. Fail Gracefully
Agents will fail. Plan for it:
# Requires `import time` at module level
def safe_tool_call(self, tool: str, args: dict, retries: int = 3):
    for attempt in range(retries):
        try:
            return self.tools[tool](**args)
        except Exception as e:
            self.logger.log("tool_error", {"tool": tool, "error": str(e), "attempt": attempt})
            if attempt == retries - 1:
                return {"error": str(e), "fallback": True}
            time.sleep(2 ** attempt)  # Exponential backoff
Common Pitfalls
❌ Prompt Stuffing
Don't put everything in the system prompt. It doesn't scale.
# Bad - 5000 token system prompt with every possible instruction
system = """You are an assistant that does X, Y, Z, and also A, B, C...
Here are 47 rules to follow...
Here are 23 examples..."""
# Good - focused prompts, context injected as needed
system = "You are a research assistant. Focus on accuracy and sourcing."
# Add specific context per-message
❌ Infinite Loops
Always have escape hatches:
# Bad
while not done:
    result = agent.step()

# Good
for _ in range(max_iterations):
    result = agent.step()
    if result.done:
        break
else:
    raise TimeoutError("Max iterations reached")
❌ No Observability
If you can't see what the agent is doing, you can't debug it.
Build a simple dashboard or use tools like LangSmith, Weights & Biases, or even just log files you can grep.
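Even without a dashboard, the JSONL files written by the AgentLogger above are easy to slice. A quick sketch for summarizing a single run (the file path is hypothetical):

```python
import json
from collections import Counter

def summarize_run(log_file: str) -> None:
    """Print event counts and any errors from one agent run."""
    with open(log_file) as f:
        events = [json.loads(line) for line in f]
    print(Counter(e["type"] for e in events))
    for e in events:
        if e["type"] in ("error", "tool_error"):
            print(e["timestamp"], e.get("tool"), e.get("error"))

summarize_run("logs/my-run-id.jsonl")  # hypothetical run id
```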
❌ Over-Engineering Early
Don't build:
- Complex routing before you have 2 agents
- Vector memory before you need persistence
- Evaluation frameworks before you have users
Ship first. Abstract later.
❌ Ignoring Token Limits
Context windows aren't infinite. Plan for it:
def truncate_context(self, context: str, max_tokens: int = 100000):
    # Rough estimation: 1 token ≈ 4 characters
    max_chars = max_tokens * 4
    if len(context) > max_chars:
        # Keep most recent context
        return "...[truncated]...\n" + context[-max_chars:]
    return context
What I'm Running Right Now
Real examples from my setup:
- Personal AI (OpenClaw/Kai) - Multi-modal agent with 30+ tools. Handles email, calendar, research, coding, home automation. Runs 24/7 with heartbeat polling.
- Lead Research Agent - Takes company domains, researches them, scores fit, writes personalized outreach. Saves ~10 hours/week.
- Content Pipeline - Research → Draft → Edit → Format → Schedule. Handles blog posts, social content, newsletters.
- Client Intake Bot - Qualifies leads via chat, books meetings, syncs to CRM. Replaced a VA.
All built incrementally. Started simple. Added complexity when needed.
Where to Go From Here
- Build the research agent from this guide. Get it working.
- Swap in a real use case - what repetitive task do YOU want automated?
- Add one tool at a time - web search, then scraping, then whatever you need
- Ship it internally - use it yourself for 2 weeks
- Then think about scaling - not before
If you want a structured path to learn to build AI products with agents, the AI Product Building Course pairs well with this guide.
Stop reading. Start building.
Questions? Find me on Twitter @amirbrooks or check my other guides.
Related Stories
- How I Built 14 AI Agents — From one to a team
- Running 15 AI Agents Daily — Architecture and costs
- MCP Explained in 10 Minutes — The tool layer
FAQ
What are AI agents in product development?
They are workflow systems that plan, act, and use tools to complete tasks. In products, they automate repeatable steps and reduce manual effort.
Do agents replace manual coding?
No. Use agents for repetitive or multi-step work, and manual coding for precise changes and debugging.
Which is better: Claude Code vs ChatGPT for coding?
Claude Code is better for fast, multi-file edits and repeatable workflows. ChatGPT-style chat is better for ideation and quick explanations.
Download the AI Agent Implementation Checklist
A 50-point checklist covering architecture, tool design, safety guardrails, and deployment—everything from this guide in an actionable format.
- Step-by-step setup workflow
- Tool integration templates
- Safety and guardrail checklist
- Testing and monitoring guidelines