Context Engineering for AI Agents
Technical Guide · Context Engineering · 2025

“Context Engineering is the discipline of designing, structuring, and managing the information environment that an AI agent reasons within — so it acts intelligently, reliably, and efficiently.”

Prompt Engineering told the model what to do. Context Engineering shapes how it thinks by precisely controlling everything that fills its context window: memories, tools, instructions, history, and live data. It is the craft behind capable AI agents in 2025.

§ 01

The Three Pillars of Context

📥

Pillar One

What Goes In

Everything placed into the context window before the model reasons. This is your engineering surface.

  • System prompt & persona
  • Retrieved memories
  • Tool schemas & outputs
  • Conversation history
  • User input + files
  • Background knowledge
⚙️

Pillar Two

How It’s Structured

The arrangement, format, and priority of context elements. Order and structure dramatically affect model behavior.

  • XML / Markdown delimiters
  • Priority ordering (top = high weight)
  • Compression & summarization
  • Chunking strategies
  • Token budget management
  • Few-shot example placement
🔄

Pillar Three

How It Evolves

Context is dynamic. Good context engineering manages its lifecycle across turns, tasks, and time.

  • Memory write / read loops
  • Context window pruning
  • Re-injection strategies
  • State serialization
  • Long-horizon planning context
  • Forgetting & summarizing
§ 02

Anatomy of an Agent’s Context Window

Typical context window composition for a 200k-token agent context:

  • System prompt: ~18%
  • Long-term memory: ~22%
  • Tool schemas: ~10%
  • Conversation history: ~25%
  • User input + RAG: ~20%
  • Output reserve: ~5%
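Applied to the 200k-token example, those shares become concrete per-section token caps. A minimal sketch, assuming the approximate percentages above are used as hard limits; `allocate_budget` and `SHARES_PCT` are hypothetical names, not part of any framework:

```python
# Approximate shares of the window, in whole percent, taken from the composition above.
SHARES_PCT = {
    "system_prompt": 18,
    "long_term_memory": 22,
    "tool_schemas": 10,
    "conversation_history": 25,
    "user_input_rag": 20,
    "output_reserve": 5,
}


def allocate_budget(window_tokens: int, shares_pct: dict[str, int]) -> dict[str, int]:
    """Turn percentage shares into per-section token caps (hypothetical helper)."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must add up to 100%")
    # Integer arithmetic avoids floating-point rounding; any remainder is left unassigned.
    return {name: window_tokens * pct // 100 for name, pct in shares_pct.items()}


if __name__ == "__main__":
    for section, cap in allocate_budget(200_000, SHARES_PCT).items():
        print(f"{section:>22}: {cap:>7,} tokens")
```

With a 200,000-token window this gives 36,000 tokens for the system prompt, 44,000 for memory, 20,000 for tools, 50,000 for history, 40,000 for input and RAG, and 10,000 reserved for output.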
§ 03

Context Engineering Lifecycle

The full context engineering loop, from user input to agent response:

  • User input: task query, goal, files.
  • Context assembly: a context builder combines the system prompt, memory, tools, history, and RAG results, drawing on the memory layer (long-term memory) and retrieval (knowledge fetch).
  • Reasoning engine: the LLM reads the full context, plans and reasons, and decides the next action.
  • Need a tool? If yes, the tool executes and its result is injected back into the context for the next reasoning step.
  • If no, the agent returns the final output, writes or updates memory for the next turn, and prunes (trims) the context.
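The loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `fake_model` is a hypothetical stand-in for a real LLM call, `Action` is an invented structure for the model's decision, and the pruning step is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """The model's decision: call a tool or give a final answer (hypothetical structure)."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


def fake_model(context: list[str]) -> Action:
    # Stand-in for an LLM: request a tool until a tool result is in context, then answer.
    if not any(entry.startswith("[TOOL_RESULT]") for entry in context):
        return Action(tool="lookup_order", args={"user_id": "sarah-chen"})
    return Action(answer="I've found your order and can start the return.")


def run_turn(
    user_input: str,
    context: list[str],
    tools: dict[str, Callable[..., str]],
    memory: list[str],
    model: Callable[[list[str]], Action] = fake_model,
    max_steps: int = 5,
) -> str:
    context.append(f"[INPUT] {user_input}")                  # context assembly
    for _ in range(max_steps):
        action = model(context)                               # reasoning engine reads full context
        if action.tool is None:                               # Need tool? NO
            memory.append(f"[TURN] {user_input} -> {action.answer}")  # memory write
            return action.answer or ""
        result = tools[action.tool](**action.args)            # Need tool? YES: execute it
        context.append(f"[TOOL_RESULT] {action.tool}: {result}")  # inject result into context
    raise RuntimeError("no final answer within max_steps")


if __name__ == "__main__":
    ctx = ["[SYSTEM] You are a helpful support agent for ShoeStore."]
    mem: list[str] = []
    toy_tools = {"lookup_order": lambda user_id: f"latest order for {user_id}: shoes"}
    print(run_turn("I want to return my shoes.", ctx, toy_tools, mem))
```

The `max_steps` cap is a design choice worth keeping in any real loop: it stops a model that keeps requesting tools from running forever.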

Memory architecture: four types of agent memory, arranged in the figure on a scale from more volatile (lifetimes of seconds to hours) to more persistent (permanent):

  • In-context (working memory): current window contents. Temporary, fast, limited.
  • External (vector / DB memory): semantic search and retrieval. Persistent, scalable.
  • In-weights (fine-tune memory): baked into model parameters. Slow, deep, permanent.
  • In-cache (KV cache memory): prompt prefix caching. Fast, cost-efficient.
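Two of these tiers are managed directly by application code: working memory (what is placed in the window this turn) and external memory (a store searched at runtime). A minimal sketch of the pair, assuming a toy bag-of-words `embed` function in place of a real embedding model and a Python list in place of a vector database:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding" (assumption): a bag-of-words count vector.
    # A real system would call an embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ExternalMemory:
    """Persistent tier: facts stored with their vectors and searched by similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []  # stands in for a vector DB

    def write(self, fact: str) -> None:
        self.items.append((fact, embed(fact)))

    def read(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]


def build_working_memory(store: ExternalMemory, query: str, k: int = 1) -> list[str]:
    """Volatile tier: only the retrieved facts enter this turn's context window."""
    return [f"[MEMORY] {fact}" for fact in store.read(query, k)]


if __name__ == "__main__":
    store = ExternalMemory()
    store.write("preferred shoe size is us9")
    store.write("customer tier is premium")
    store.write("investor prefers concise metrics")
    print(build_working_memory(store, "what shoe size does she wear"))
```

The split matters for the token budget: the external store can grow without limit, while only the top `k` matches cost context space.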

Context pruning strategy: managing the token budget. The loop runs continuously:

  • Monitor: a token counter tracks usage as a percentage of the window.
  • Summarize: old history is compressed into a summary.
  • Archive: key facts are written to the memory DB.
  • Evict: low-priority content, such as old tool results, is dropped.
  • Rebuild: a fresh context window is assembled, ready for the next turn.
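One pass of that loop might look like the sketch below. Assumptions: `count_tokens` approximates tokens with whitespace-separated words, `summarize` is a placeholder for an LLM summarization call, and the 80% threshold and `keep_recent` value are illustrative defaults, not recommendations from this guide.

```python
def count_tokens(text: str) -> int:
    # Rough proxy: whitespace-separated words. A real system would use the model's tokenizer.
    return len(text.split())


def summarize(turns: list[str]) -> str:
    # Placeholder (hypothetical): a real system would ask an LLM to condense these turns.
    return f"[SUMMARY] {len(turns)} earlier turns condensed."


def prune_context(
    history: list[str],
    tool_results: list[str],
    archive: list[str],
    budget: int,
    keep_recent: int = 4,
    threshold: float = 0.8,
) -> list[str]:
    """Run MONITOR -> SUMMARIZE -> ARCHIVE -> EVICT -> REBUILD once; return the new window."""
    used = sum(count_tokens(t) for t in history + tool_results)   # MONITOR
    if used <= threshold * budget:
        return history + tool_results                             # within budget: keep as is
    split = max(len(history) - keep_recent, 0)
    old, recent = history[:split], history[split:]
    summary = [summarize(old)] if old else []                     # SUMMARIZE old turns
    archive.extend(old)   # ARCHIVE (raw turns here; key-fact extraction in practice)
    latest_tool = tool_results[-1:]                               # EVICT all but the newest tool result
    return summary + recent + latest_tool                         # REBUILD
```

Triggering at a threshold below 100% leaves headroom for the next user message and the output reserve.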

“The bottleneck in agent performance is rarely the model. It is almost always the quality of information you put in front of it.”

— Core principle of Context Engineering
§ 04

Context Engineering in Practice

System Design

Customer Support Agent

Task: “I want to return the shoes I bought last week. They don’t fit.”

Engineered Context Window

[SYSTEM] You are a helpful support agent for ShoeStore. Be empathetic. Policy: 30-day free returns.

[MEMORY] User: Sarah Chen. Tier: Premium. Past issues: 2 resolved. Preferred size: US9.

[TOOLS] lookup_order(id), initiate_return(order_id, reason), check_inventory(sku)

[INPUT] “I want to return the shoes I bought last week…”
  • 1. System prompt sets persona + policy constraints
  • 2. Memory injects customer profile (no need to ask who they are)
  • 3. Agent calls lookup_order → injects result into context
  • 4. Calls initiate_return with correct order ID
  • 5. Response + interaction saved back to memory store
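The context window above can be assembled programmatically, with each section wrapped in an explicit tag and the tool result appended after the user's request. A minimal sketch; `lookup_order` here is a hypothetical stub returning invented sample data, and the tag names are one reasonable convention rather than a required format.

```python
from typing import Optional


def lookup_order(user_id: str) -> dict:
    # Hypothetical stub with invented data; a real agent would call the store's order API.
    return {"order_id": "A-1001", "item": "running shoes", "days_since_purchase": 7}


def build_context(
    system: str,
    memory: str,
    tools: list[str],
    user_input: str,
    tool_results: Optional[list[str]] = None,
) -> str:
    """Wrap each section in an XML-style tag, highest-priority content first."""
    sections = [("system", system), ("memory", memory), ("tools", ", ".join(tools)), ("input", user_input)]
    # Tool results follow the request they answer, so the model sees them in order.
    sections += [("tool_result", r) for r in (tool_results or [])]
    return "\n".join(f"<{tag}>{body}</{tag}>" for tag, body in sections)


order = lookup_order("sarah-chen")
context = build_context(
    system="You are a helpful support agent for ShoeStore. Be empathetic. Policy: 30-day free returns.",
    memory="User: Sarah Chen. Tier: Premium. Past issues: 2 resolved. Preferred size: US9.",
    tools=["lookup_order(id)", "initiate_return(order_id, reason)", "check_inventory(sku)"],
    user_input="I want to return the shoes I bought last week. They don't fit.",
    tool_results=[f"lookup_order: {order}"],
)
# A deterministic guardrail the application could apply before calling initiate_return.
eligible = order["days_since_purchase"] <= 30
print(context)
```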
Engineering

Autonomous Coding Agent

Task: “Add unit tests for the payment module and fix any bugs you find.”

Engineered Context Window

[SYSTEM] Senior Python engineer. Write pytest tests. Only edit files explicitly requested.

[MEMORY] Project: FastAPI app. DB: PostgreSQL. Pattern: Repository. Prior sessions: 3 bug fixes.

[FILES] payment.py (840 tokens), models.py (420 tokens) — retrieved via RAG

[TOOLS] read_file, write_file, run_tests, search_codebase
  • 1. RAG retrieves only relevant files (not the entire codebase)
  • 2. Memory provides project conventions, so nothing needs re-explaining
  • 3. Agent reads the code, writes tests, and calls run_tests
  • 4. Test results are injected back; the agent iterates on failures
  • 5. Session summary written to long-term memory
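Step 1 amounts to a relevance-ranked selection under a token budget. A minimal sketch, assuming relevance scores have already been computed by a retriever; the scores, the `min_score` cutoff, and the extra files `auth.py` and `README.md` are invented for illustration:

```python
def select_files(
    candidates: dict[str, tuple[float, int]],
    budget_tokens: int,
    min_score: float = 0.3,
) -> list[str]:
    """Pick the most relevant files that fit the token budget.

    candidates maps path -> (relevance score, token count). The scores are assumed
    to come from a retriever, e.g. embedding similarity between file and task.
    """
    chosen: list[str] = []
    used = 0
    # Highest relevance first; skip anything irrelevant or too large for the space left.
    for path, (score, tokens) in sorted(candidates.items(), key=lambda kv: kv[1][0], reverse=True):
        if score >= min_score and used + tokens <= budget_tokens:
            chosen.append(path)
            used += tokens
    return chosen


files = {
    "payment.py": (0.92, 840),   # token counts from the example; scores invented
    "models.py": (0.71, 420),
    "auth.py": (0.18, 1_300),    # hypothetical unrelated files
    "README.md": (0.10, 2_000),
}
print(select_files(files, budget_tokens=1_500))  # ['payment.py', 'models.py']
```

The greedy pass keeps going after a file is skipped, so a smaller relevant file can still fill space a larger one could not.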
Research

Deep Research Agent

Task: “Write a competitive analysis of EV battery manufacturers.”

Engineered Context Window

[SYSTEM] Expert analyst. Cite sources. Structure: Executive summary → Details → Conclusion.

[PLAN] Step 1: identify players. Step 2: gather data. Step 3: synthesize.

[SCRATCHPAD] [running notes & intermediate findings kept here, pruned as context fills]

[SOURCES] Retrieved articles injected progressively via RAG
  • 1. Plan injected upfront, so the agent follows structured reasoning
  • 2. Scratchpad in context stores intermediate work
  • 3. Sources retrieved progressively; old ones summarized & pruned
  • 4. Context budget managed: 60% sources, 30% reasoning, 10% output
  • 5. Final report written with fully traceable citations
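Steps 3 and 4 can be combined into one routine: keep the sources within their 60% share of the window by summarizing the oldest sources first, and evict them only if summaries are not enough. A minimal sketch; `shrink` is a placeholder that truncates instead of calling an LLM, and token counts are approximated by word counts.

```python
from dataclasses import dataclass

# Share of the window per purpose, in percent, from step 4 above.
BUDGET_SPLIT_PCT = {"sources": 60, "reasoning": 30, "output": 10}


@dataclass
class Source:
    title: str
    text: str
    summarized: bool = False


def total_tokens(sources: list[Source]) -> int:
    # Word count as a rough stand-in for a real tokenizer.
    return sum(len(s.text.split()) for s in sources)


def shrink(source: Source, max_words: int = 30) -> Source:
    # Placeholder for an LLM summary (assumption): keep only the first max_words words.
    return Source(source.title, " ".join(source.text.split()[:max_words]), summarized=True)


def fit_sources(sources: list[Source], window_tokens: int) -> list[Source]:
    """Keep sources (ordered oldest -> newest) within their share of the window."""
    budget = window_tokens * BUDGET_SPLIT_PCT["sources"] // 100
    kept = list(sources)
    for i in range(len(kept)):                     # pass 1: summarize oldest sources first
        if total_tokens(kept) <= budget:
            return kept
        kept[i] = shrink(kept[i])
    while kept and total_tokens(kept) > budget:    # pass 2: evict oldest if still over budget
        kept.pop(0)
    return kept


articles = [Source(f"article {i}", "word " * 100) for i in range(5)]
fitted = fit_sources(articles, window_tokens=500)
print([s.summarized for s in fitted])  # [True, True, True, False, False]
```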
Personal AI

Long-Horizon Personal Assistant

Task: “Help me prep for my meeting with the investor tomorrow.”

Engineered Context Window

[SYSTEM] Personal assistant for Alex. Proactive. Anticipate needs. Concise.

[MEMORY] Investor: James Wong, Sequoia. Past meeting notes. Alex’s pitch deck v3. Alex’s goals: $2M seed.

[CALENDAR] Meeting: 10am, 45 mins. Location: Zoom. Retrieved from calendar tool.

[INPUT] “Help me prep for my meeting…”
  • 1. Rich memory profile means no context needs to be re-established
  • 2. Calendar tool call injects live meeting details
  • 3. Past notes retrieved, so the agent knows the investor's preferences
  • 4. Agent produces a tailored prep briefing in seconds
  • 5. Post-meeting: outcome saved to long-term memory
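The memory that makes this work is written after one session and read at the start of the next, which requires serializing it between runs. A minimal sketch using JSON; the function names, note text, and dates are invented for illustration, and a production assistant would typically use a database rather than an in-memory string.

```python
import json
from datetime import date


def remember(store: dict, person: str, note: str, when: date) -> None:
    """Memory write: append a dated note to a person's profile."""
    store.setdefault(person, []).append({"date": when.isoformat(), "note": note})


def recall(store: dict, person: str, limit: int = 3) -> str:
    """Memory read: render the newest notes as a [MEMORY] section for the next prompt."""
    notes = store.get(person, [])[-limit:]
    lines = [f"- {n['date']}: {n['note']}" for n in notes]
    return "\n".join([f"[MEMORY] {person}"] + lines)


# Session 1 ends: save the outcome (invented sample data), then serialize.
store: dict = {}
remember(store, "James Wong", "Asked for monthly retention numbers.", date(2025, 1, 10))
saved = json.dumps(store)   # in practice: written to a file or database

# Session 2 begins: deserialize and inject before the user has to re-explain anything.
restored = json.loads(saved)
print(recall(restored, "James Wong"))
```

Storing dates as ISO strings keeps the store JSON-serializable and lets `limit` surface only the most recent notes.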
§ 05

Key Techniques Reference

Each technique below is listed with its category, what it does, and when to use it.

  • RAG (Retrieval-Augmented Generation) · Context Filling
    What it does: Fetches relevant documents from a vector DB and injects them into context at query time, grounding the model in real, up-to-date knowledge.
    When to use: Large knowledge bases, dynamic data, reducing hallucinations.
  • Sliding Window · History Management
    What it does: Keeps only the N most recent conversation turns in context. Older turns are dropped or summarized to make room for new input.
    When to use: Long-running chatbots, multi-turn agents with token limits.
  • Hierarchical Summarization · Compression
    What it does: Progressively summarizes earlier parts of a conversation into increasingly dense summaries, preserving meaning while reducing tokens.
    When to use: Long research sessions, hours-long agent tasks.
  • Scratchpad / Chain-of-Thought · Reasoning Aid
    What it does: Dedicates a section of the context to intermediate reasoning steps, letting the model “think out loud” before committing to a final answer.
    When to use: Complex multi-step tasks, planning, debugging.
  • Few-Shot Examples · Behavior Shaping
    What it does: Injects 2–5 high-quality input/output pairs into context to demonstrate the desired format, tone, and reasoning style.
    When to use: Structured outputs, specialized formats, consistent tone.
  • Prompt Caching · Efficiency
    What it does: Caches a stable prefix of the context (e.g. system prompt + tools) so it can be reused across many calls, which can dramatically reduce latency and cost.
    When to use: High-volume applications, static system prompts.
  • Tool Result Injection · Agentic Loop
    What it does: After tool execution, inserts the result back into context in a structured format so the model can reason about it in the next step.
    When to use: All agentic tool-use scenarios.
  • Semantic Memory Retrieval · Long-Term Memory
    What it does: Stores past interactions as vector embeddings and, at runtime, retrieves the most semantically similar past facts and injects them into context.
    When to use: Personalized assistants, cross-session continuity.
  • Priority-Based Ordering · Structure
    What it does: Places the most critical instructions at the top of context, where they tend to carry the most weight; content buried in the middle of a long context is the most likely to be overlooked. Less important content goes later or is summarized.
    When to use: All agent systems; always apply this principle.
  • XML / Markdown Delimiters · Structure
    What it does: Uses explicit tags like <system>, <memory>, <tools> to help the model distinguish between different types of context content.
    When to use: Complex contexts with multiple distinct sections.
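Prompt caching only pays off when the cached prefix is exactly identical across calls, which is why static content (system prompt, tool schemas) should come first and per-call content last. The sketch below is a conceptual simulation, not any provider's API: real providers differ in minimum prefix length and in how cache boundaries are marked, but the ordering lesson is the same. The layout functions and timestamps are hypothetical.

```python
import hashlib

SYSTEM = "[SYSTEM] You are a helpful support agent for ShoeStore."
TOOLS = "[TOOLS] lookup_order(id), initiate_return(order_id, reason), check_inventory(sku)"


def stable_first(user_input: str, timestamp: str) -> tuple[str, str]:
    # Static content forms the prefix; per-call content goes in the suffix.
    return f"{SYSTEM}\n{TOOLS}", f"[TIME] {timestamp}\n[INPUT] {user_input}"


def dynamic_first(user_input: str, timestamp: str) -> tuple[str, str]:
    # A timestamp at the top makes every prefix different.
    return f"[TIME] {timestamp}\n{SYSTEM}\n{TOOLS}", f"[INPUT] {user_input}"


def simulate(layout, timestamps=("09:00", "09:05", "09:10")) -> tuple[int, int]:
    """Count hits and misses for a toy exact-match prefix cache."""
    seen: set[str] = set()
    hits = misses = 0
    for i, ts in enumerate(timestamps):
        prefix, _suffix = layout(f"question {i}", ts)
        key = hashlib.sha256(prefix.encode()).hexdigest()
        if key in seen:
            hits += 1
        else:
            misses += 1
            seen.add(key)
    return hits, misses


print("stable_first  (hits, misses):", simulate(stable_first))   # (2, 1)
print("dynamic_first (hits, misses):", simulate(dynamic_first))  # (0, 3)
```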
