Building Smarter AI Agents with MongoDB

Last week I attended MongoDB Dev Day here in Chicago, a two-day deep dive into modern data modeling, search, and building AI applications on MongoDB. Going in, I was curious about one thing in particular: how well does MongoDB fit into agentic workflows?

I left convinced that the document model is one of the most underrated primitives for building AI agents — especially when it comes to agent memory. This post is a recap of what I learned, with code examples, and a focused look at why MongoDB is a strong backbone for agent memory.

Why MongoDB Clicks for AI Workflows

LLMs and agents naturally produce and consume JSON-like, semi-structured data. Tool calls, intermediate reasoning, plans, observations, and memory all tend to be hierarchical and shape-shifting. Traditional relational schemas force you to flatten, normalize, and translate that data on the way in and out.

MongoDB avoids most of that impedance mismatch:

An LLM’s output ≈ a MongoDB document.
Schemas can evolve as your agent evolves.
Vector embeddings live alongside the data they describe — no separate vector DB to sync.

That last point is the unlock for agent memory, which I’ll get to. First, a quick walk through the foundations.

Day 1: Data Modeling & Search Foundations

Document Modeling: Embed vs. Reference

We built an online library app to explore the central tradeoff in document design: embedding vs. referencing.

Embedding keeps related data co-located so you avoid joins:

// Embedded: a book with its reviews inline
{
  "_id": "book_123",
  "title": "The Pragmatic Programmer",
  "authors": ["Andy Hunt", "Dave Thomas"],
  "reviews": [
    { "user": "alice", "rating": 5, "text": "Classic." },
    { "user": "bob",   "rating": 4, "text": "Still relevant." }
  ]
}

This is great… until it isn’t. Embedded arrays count toward MongoDB’s 16 MB per-document limit, so if reviews grow unbounded, or are updated independently of the book, referencing is better:

// Referenced
// books collection
{ "_id": "book_123", "title": "The Pragmatic Programmer" }

// reviews collection
{ "_id": "rev_1", "book_id": "book_123", "user": "alice", "rating": 5 }
{ "_id": "rev_2", "book_id": "book_123", "user": "bob",   "rating": 4 }

Rule of thumb: embed when data is read together and bounded in size; reference when data grows unbounded or is written independently.
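When reviews live in their own collection, reading a book together with its reviews takes a join at query time. Here is a minimal sketch of that read path, built as a plain aggregation pipeline in Python. The helper name book_with_reviews_pipeline and the max_reviews cap are illustrative assumptions, not something shown at the workshop:

```python
def book_with_reviews_pipeline(book_id: str, max_reviews: int = 20) -> list[dict]:
    """Build an aggregation pipeline that joins a book with its referenced reviews."""
    return [
        # Select the single book we care about
        {"$match": {"_id": book_id}},
        # Pull in matching documents from the reviews collection
        {"$lookup": {
            "from": "reviews",
            "localField": "_id",
            "foreignField": "book_id",
            "as": "reviews",
        }},
        # Cap the joined array so the result stays bounded
        {"$set": {"reviews": {"$slice": ["$reviews", max_reviews]}}},
    ]


pipeline = book_with_reviews_pipeline("book_123", max_reviews=2)
print(pipeline[1]["$lookup"]["from"])
```

With pymongo this list would be passed to db.books.aggregate(pipeline). The extra $lookup stage is the price of referencing: reads do a little more work so that writes to reviews never touch the book document.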

Search Fundamentals

MongoDB’s Atlas Search gives you full-text search built into the database — no separate Elasticsearch cluster required. Highlights:

Autocomplete, fuzzy search, filters, facets
Multi-language support (40+ languages)
Custom scoring, synonyms, highlighting, more-like-this

A simple fuzzy search aggregation:

db.books.aggregate([
  {
    $search: {
      index: "default",
      text: {
        query: "pragmatik programer",
        path: ["title", "authors"],
        fuzzy: { maxEdits: 2 }
      }
    }
  },
  { $limit: 10 }
]);
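To see why the misspelled query still matches, it helps to count edits. The sketch below computes plain Levenshtein distance in Python. It only illustrates what maxEdits bounds, assuming each query term is compared with indexed terms one at a time. It is not a description of how Atlas Search works internally, and the function name is my own:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


for typed, indexed in [("pragmatik", "pragmatic"), ("programer", "programmer")]:
    print(typed, indexed, edit_distance(typed, indexed))
```

Both misspelled terms are a single edit away from the indexed word, so they fall comfortably inside maxEdits: 2.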

We ended Day 1 by earning a skill badge — a nice way to validate everything hands-on.

Day 2: Building AI Applications

Vector Search + Auto Embeddings

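The standout feature of Day 2 was MongoDB’s auto embedding capability. As presented, embeddings are generated automatically when a document is inserted or updated and are kept alongside the document, so you never call an embedding model in your write path.

That means no sync pipeline between your operational DB and a separate vector store. One source of truth.

The examples below use a standard vector index over an embedding field stored in the document. An auto-embedding index is declared differently, so check the current documentation for its exact definition. A vector index definition: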

db.memories.createSearchIndex({
  name: "memory_vector_index",
  type: "vectorSearch",
  definition: {
    fields: [
      {
        type: "vector",
        path: "embedding",
        numDimensions: 1536,
        similarity: "cosine"
      },
      { type: "filter", path: "agent_id" },
      { type: "filter", path: "type" }
    ]
  }
});

And a vector search query, combining semantic similarity with metadata filters:

db.memories.aggregate([
  {
    $vectorSearch: {
      index: "memory_vector_index",
      path: "embedding",
      queryVector: queryEmbedding, // produced by your embedding model
      numCandidates: 100,
      limit: 5,
      filter: {
        agent_id: "agent_42",
        type: "long_term"
      }
    }
  },
  {
    $project: {
      content: 1,
      type: 1,
      created_at: 1,
      score: { $meta: "vectorSearchScore" }
    }
  }
]);
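The score field is easier to interpret with the math in hand. Here is a small sketch, assuming the cosine similarity configured in the index above and the normalization the Atlas Vector Search documentation describes for cosine, score = (1 + cosine) / 2. The function names are illustrative:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def normalized_score(a: list[float], b: list[float]) -> float:
    """Map cosine similarity from [-1, 1] onto [0, 1]."""
    return (1 + cosine_similarity(a, b)) / 2


print(normalized_score([1.0, 0.0], [1.0, 0.0]))   # same direction
print(normalized_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(normalized_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under that assumption a score near 1 means the memory points the same way as the query, while a score around 0.5 means the two are unrelated. That is worth knowing before you pick a relevance cutoff.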

This is the foundation for RAG — and, more interestingly, for agent memory.

The Main Event: Agent Memory in MongoDB

Here’s where MongoDB really shines. Most agent frameworks need three kinds of memory:

Short-term memory — the current conversation / scratchpad
Long-term memory — semantic recall across sessions
Structured memory — facts, preferences, entities

In a typical stack, that’s three systems (Redis, a vector DB, a relational DB) glued together. In MongoDB, it’s one collection with a flexible schema.

A Unified Memory Document

{
  "_id": "mem_9f2c",
  "agent_id": "agent_42",
  "session_id": "sess_1029",
  "type": "long_term",                  // short_term | long_term | fact
  "role": "user",                       // user | assistant | tool | system
  "content": "I prefer concise answers and code in TypeScript.",
  "metadata": {
    "source": "conversation",
    "importance": 0.82,
    "entities": ["user_preference", "language:typescript"]
  },
  "embedding": [0.0123, -0.0456, ...],  // auto-generated
  "created_at": "2025-01-15T14:32:11Z",
  "last_accessed": "2025-01-16T09:11:02Z"
}

One schema flexes to cover all three memory types. No migrations when the agent evolves.
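To make that concrete, here is a sketch of three memories that share a small core and differ only in their optional fields. The make_memory helper, the CORE_FIELDS set, and the shape of the fact sub-document are hypothetical choices for illustration, not a schema from the workshop:

```python
from datetime import datetime, timezone

MEMORY_TYPES = {"short_term", "long_term", "fact"}
CORE_FIELDS = {"agent_id", "type", "content", "created_at"}


def make_memory(agent_id: str, mem_type: str, content: str, **extra) -> dict:
    """Build a memory document: a fixed core plus whatever extra fields the type needs."""
    if mem_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {mem_type}")
    doc = {
        "agent_id": agent_id,
        "type": mem_type,
        "content": content,
        "created_at": datetime.now(timezone.utc),
    }
    doc.update(extra)  # type-specific fields sit next to the core ones
    return doc


turn = make_memory("agent_42", "short_term", "Can you show that in TypeScript?",
                   session_id="sess_1029", role="user")
preference = make_memory("agent_42", "long_term", "User prefers concise answers.",
                         metadata={"importance": 0.82})
fact = make_memory("agent_42", "fact", "Preferred language is TypeScript.",
                   fact={"subject": "user", "predicate": "prefers_language",
                         "object": "typescript"})

for doc in (turn, preference, fact):
    print(doc["type"], sorted(set(doc) - CORE_FIELDS))
```

All three documents can sit in the same collection and be filtered by agent_id and type, while each carries only the extra fields it needs.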

Writing Memories

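Here’s a simplified Python snippet for storing a memory. With auto embeddings configured, you don’t need to compute the vector yourself; with the plain vector index shown earlier, you would add an embedding field before inserting:

import os
from datetime import datetime, timezone

from pymongo import MongoClient

# Connection string read from the environment (the variable name is an example)
MONGO_URI = os.environ["MONGO_URI"]

client = MongoClient(MONGO_URI)
memories = client.agents.memories

def remember(agent_id: str, session_id: str, role: str, content: str, mem_type: str):
    """Store one memory document for an agent."""
    # Timezone-aware timestamp; datetime.utcnow() is deprecated as of Python 3.12
    now = datetime.now(timezone.utc)
    memories.insert_one({
        "agent_id": agent_id,
        "session_id": session_id,
        "type": mem_type,                   # short_term | long_term | fact
        "role": role,                       # read back later by build_prompt
        "content": content,                 # embedding auto-generated
        "created_at": now,
        "last_accessed": now,
    })

remember(
    agent_id="agent_42",
    session_id="sess_1029",
    role="user",
    content="User prefers concise answers and TypeScript code samples.",
    mem_type="long_term",
)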
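If auto embeddings are not available on your cluster, the same write path works with one extra step: compute the vector and store it under the embedding path that the index reads. Here is a sketch under that assumption. The embed argument is a hypothetical callable standing in for whatever embedding model you use, and fake_embed exists only so the example runs offline:

```python
from typing import Callable

EXPECTED_DIMENSIONS = 1536  # must match numDimensions in the vector index


def with_embedding(doc: dict, embed: Callable[[str], list[float]],
                   dimensions: int = EXPECTED_DIMENSIONS) -> dict:
    """Return a copy of doc with an `embedding` field computed from its content."""
    vector = embed(doc["content"])
    if len(vector) != dimensions:
        # A wrong-sized vector will not work with the index, so fail early
        raise ValueError(f"expected {dimensions} dimensions, got {len(vector)}")
    return {**doc, "embedding": vector}


def fake_embed(text: str) -> list[float]:
    # Placeholder only: a real model would return a meaningful vector
    return [float(len(text))] + [0.0] * (EXPECTED_DIMENSIONS - 1)


original = {"content": "User prefers TypeScript."}
doc = with_embedding(original, fake_embed)
print(len(doc["embedding"]), "embedding" in original)
```

The returned dictionary is what you would pass to insert_one. Checking the dimension count up front surfaces a model and index mismatch at write time rather than as missing search results later.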

Retrieving Memories

When the agent needs context, do a hybrid retrieval — recent short-term messages plus semantically relevant long-term memories:

def recall(agent_id: str, session_id: str, query_embedding: list[float]):
    # Recent short-term context (last 10 turns)
    short_term = list(memories.find(
        {"agent_id": agent_id, "session_id": session_id, "type": "short_term"}
    ).sort("created_at", -1).limit(10))

    # Semantically relevant long-term memories
    long_term = list(memories.aggregate([
        {
            "$vectorSearch": {
                "index": "memory_vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 100,
                "limit": 5,
                "filter": {"agent_id": agent_id, "type": "long_term"}
            }
        },
        {"$project": {"content": 1, "score": {"$meta": "vectorSearchScore"}}}
    ]))

    return {"short_term": short_term, "long_term": long_term}

Building the Prompt

Then fold both into your prompt:

def build_prompt(user_msg: str, memory: dict) -> str:
    long_term = "\n".join(f"- {m['content']}" for m in memory["long_term"])
    short_term = "\n".join(f"{m['role']}: {m['content']}" for m in reversed(memory["short_term"]))

    return f"""You are a helpful assistant.

Relevant long-term memory about the user:
{long_term}

Recent conversation:
{short_term}

User: {user_msg}
Assistant:"""

That’s a working agent memory layer in one collection, one vector index, and two queries.

Bonus: Memory Decay & Reinforcement

Because each memory is just a document, you can approximate memory dynamics with ordinary updates. Record each access so that frequently used memories can be ranked higher later:

db.memories.updateOne(
  { _id: "mem_9f2c" },
  {
    $set: { last_accessed: new Date() },
    $inc: { "metadata.access_count": 1 }
  }
);

Or prune stale, low-importance short-term memories on a schedule (documents that have no metadata.importance field will not match this filter):

db.memories.deleteMany({
  type: "short_term",
  created_at: { $lt: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) },
  "metadata.importance": { $lt: 0.3 }
});

Try doing that cleanly across three separate systems.
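One way to turn those fields into a single ranking signal is an exponentially decaying score that access counts push back up. This is a sketch, not something presented at the event: the formula, the 7-day half-life, and the retention_score name are all assumptions chosen for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def retention_score(importance: float, access_count: int,
                    last_accessed: datetime, now: datetime,
                    half_life_days: float = 7.0) -> float:
    """Combine importance, recency decay, and access reinforcement into one number."""
    age_days = max((now - last_accessed).total_seconds() / 86400, 0.0)
    decay = 0.5 ** (age_days / half_life_days)    # halves every half_life_days
    reinforcement = 1 + math.log1p(access_count)  # diminishing returns per access
    return importance * decay * reinforcement


now = datetime(2025, 1, 16, tzinfo=timezone.utc)
fresh = retention_score(0.82, 0, now, now)
week_old = retention_score(0.82, 0, now - timedelta(days=7), now)
reinforced = retention_score(0.82, 5, now - timedelta(days=7), now)
print(round(fresh, 2), round(week_old, 2), round(reinforced, 2))
```

A scheduled job could compute this from metadata.importance, metadata.access_count, and last_accessed, then delete or demote documents that fall below a threshold.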

Key Takeaways

The document model matches how agents think. JSON in, JSON out, no translation layer.
Auto embeddings collapse the stack. Your operational data and its vector representation live together, with no separate vector store to sync. Search indexes are updated asynchronously, though, so a new memory may take a moment to become searchable.
Agent memory becomes a single collection. Short-term, long-term, and structured memory all fit one flexible schema, retrieved with one find query and one $vectorSearch aggregation.
You can model real memory dynamics (decay, reinforcement, importance) with plain MongoDB updates — no custom infrastructure.

If you’re building agentic workflows, RAG systems, or anything where the data is fluid and the access patterns mix semantic + structured queries, MongoDB deserves a serious look. The fewer moving parts in your agent infrastructure, the more time you spend on the agent itself.

Thanks to the MongoDB team for a great two days. If you want to dig in further, check out the Data Modeling slides, Search Fundamentals, and the Vector Search lab.