AI Agent Memory Systems: Complete Architecture Guide

Building AI agents that remember and learn from their experiences is the difference between a glorified chatbot and a truly intelligent system. Yet memory architecture is where most agent projects fail—not from lack of LLM capability, but from fundamental misunderstandings about how AI agents should store, retrieve, and utilize information.

This guide provides a comprehensive framework for implementing robust memory systems in production AI agents. You’ll learn the four types of agent memory, when to use each, and proven implementation patterns that scale from prototype to production.

Understanding AI Agent Memory Architecture

Why Memory Matters for Production AI Agents

Memory transforms AI agents from stateless query processors into intelligent systems capable of:

Contextual Continuity: Maintaining coherent conversations across multiple sessions without repeating questions or losing user preferences.

Learning and Adaptation: Improving performance based on past interactions and successful patterns.

Complex Workflow Management: Executing multi-step tasks that span minutes, hours, or days while preserving state and progress.

Personalization: Tailoring responses and recommendations based on user history and preferences.

Error Recovery: Learning from failures to avoid repeating mistakes and improving reliability over time.

Without proper memory architecture, agents run into memory-related failure modes in production. By this guide’s estimate, those failures kill roughly 80% of AI agent projects before they reach stable deployment.

The Four Types of AI Agent Memory

Modern production AI agents require multiple memory systems working in coordination:

1. Working Memory (Short-term)
– Immediate conversation context
– Current task state and progress
– Recent tool outputs and user inputs
– Duration: Current session only

2. Persistent Memory (Long-term)
– User preferences and behavioral patterns
– Historical interaction summaries
– Learned procedures and successful workflows
– Duration: Permanent, across all sessions

3. Episodic Memory
– Grouped interactions around specific events or goals
– Context for retrieving similar past situations
– Learning from successful and failed episodes
– Duration: Permanent, organized by episodes

4. Procedural and Semantic Memory
– General knowledge about tools and their usage
– Learned rules and decision patterns
– Abstract concepts and relationships
– Duration: Permanent, continuously updated
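One way to make these four categories concrete is to tag every stored record with its memory type and let a retention policy decide when it expires. The sketch below is illustrative only: `MemoryType`, `MemoryRecord`, `RETENTION`, and the one-hour stand-in for "current session" are assumptions, not part of any library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    WORKING = "working"        # current session only
    PERSISTENT = "persistent"  # kept across all sessions
    EPISODIC = "episodic"      # kept, grouped by episode
    PROCEDURAL = "procedural"  # kept, continuously updated


# Assumed retention policy: only working memory expires (here after one hour).
RETENTION = {
    MemoryType.WORKING: timedelta(hours=1),
    MemoryType.PERSISTENT: None,
    MemoryType.EPISODIC: None,
    MemoryType.PROCEDURAL: None,
}


@dataclass
class MemoryRecord:
    content: str
    memory_type: MemoryType
    created_at: datetime = field(default_factory=datetime.now)
    episode_id: Optional[str] = None  # only meaningful for episodic records

    def is_expired(self, now: datetime) -> bool:
        # A record expires only if its type has a time-to-live and it is older than that.
        ttl = RETENTION[self.memory_type]
        return ttl is not None and now - self.created_at > ttl
```

Keeping the type on the record lets a single cleanup job handle all four kinds of memory without separate code paths.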

Memory vs. Context: Key Architectural Differences

Many developers confuse LLM context windows with true memory systems. Here’s the critical distinction:

Aspect	Context Window	True Memory
Capacity	Limited by token count (model-dependent, from a few thousand to over a million tokens)	Effectively unbounded with external storage
Persistence	Lost between sessions	Maintained indefinitely
Retrieval	None; the model attends over everything in the window	Selective, semantic search
Performance	Degrades with size	Scales with proper indexing
Cost	Expensive (all tokens processed each request)	Cost-effective (only relevant data retrieved)
Organization	Chronological only	Semantic, episodic, hierarchical

Key Insight: Context windows are working memory. True agent intelligence requires external memory systems that persist beyond individual conversations.
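The cost difference in the table can be illustrated with a toy comparison between sending the whole history and sending only the most relevant entries. This is a sketch under simplifying assumptions: `estimate_tokens` uses a crude four-characters-per-token rule, and `keyword_overlap` is a stand-in for real semantic similarity. All function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per four characters.
    return max(1, len(text) // 4)


def keyword_overlap(query: str, text: str) -> int:
    # Stand-in for semantic similarity: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))


def select_relevant(query: str, history: list, k: int = 2) -> list:
    # Return the k history entries that share the most words with the query.
    ranked = sorted(history, key=lambda h: keyword_overlap(query, h), reverse=True)
    return ranked[:k]


history = [
    "User prefers invoices sent by email",
    "User asked about the weather in Lisbon",
    "User reported a login error on the mobile app",
    "User upgraded to the annual billing plan",
]
query = "why was my billing invoice not sent"

# Context-window approach: every stored message is processed on every request.
full_cost = sum(estimate_tokens(h) for h in history)

# External-memory approach: only the retrieved entries are processed.
selected = select_relevant(query, history)
selective_cost = sum(estimate_tokens(h) for h in selected)
```

With four entries the saving is small, but the full-history cost grows with every turn while the selective cost stays bounded by `k`.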

Short-Term Memory Implementation Patterns

Context Window Management Strategies

Effective short-term memory balances completeness with efficiency:

Sliding Window Approach:

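class SlidingContextWindow:
    def __init__(self, max_tokens=4000, preservation_ratio=0.3, recent_count=10):
        self.max_tokens = max_tokens
        self.preservation_ratio = preservation_ratio
        self.recent_count = recent_count  # Messages kept verbatim at the tail
        self.messages = []  # Plain strings in this sketch

    def add_message(self, message):
        self.messages.append(message)
        if self._token_count() > self.max_tokens:
            self._compress_context()

    def _token_count(self):
        # Rough estimate (~4 characters per token); use a real tokenizer in production
        return sum(len(m) // 4 + 1 for m in self.messages)

    def _summarize_messages(self, messages):
        # Placeholder: call your LLM summarizer here
        return f"[Summary of {len(messages)} earlier messages]"

    def _compress_context(self):
        # Preserve first few messages (system prompt, initial context)
        preserve_count = int(len(self.messages) * self.preservation_ratio)

        # Keep recent messages, counted in messages (not tokens) and never
        # overlapping the preserved head
        recent_start = max(preserve_count, len(self.messages) - self.recent_count)
        middle = self.messages[preserve_count:recent_start]
        if not middle:
            return  # Nothing left to summarize

        # Summarize middle section
        middle_summary = self._summarize_messages(middle)

        self.messages = (
            self.messages[:preserve_count] + [middle_summary] + self.messages[recent_start:]
        )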

Priority-Based Retention:

class PriorityContextManager:
    def __init__(self):
        self.high_priority = []  # System instructions, user goals
        self.medium_priority = []  # Recent successful actions
        self.low_priority = []  # General conversation

    def categorize_message(self, message):
        if message.contains_system_instruction():
            self.high_priority.append(message)
        elif message.contains_successful_action():
            self.medium_priority.append(message)
        else:
            self.low_priority.append(message)

    def build_context(self, max_tokens):
        # Always include high priority, even if it exceeds the budget
        context = list(self.high_priority)
        remaining_tokens = max_tokens - sum(msg.token_count for msg in context)

        # Add medium priority (newest first) if space allows
        medium = []
        for msg in reversed(self.medium_priority):
            if msg.token_count <= remaining_tokens:
                medium.append(msg)
                remaining_tokens -= msg.token_count

        # Fill remaining space with recent low priority
        low = []
        for msg in reversed(self.low_priority):
            if msg.token_count > remaining_tokens:
                break
            low.append(msg)
            remaining_tokens -= msg.token_count

        # Both lists were collected newest first; restore chronological order
        return context + medium[::-1] + low[::-1]

Working Memory Patterns for LLM Agents

State-Driven Working Memory:
Maintain explicit state representations that persist across LLM calls:

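from datetime import datetime

class WorkingMemory:
    def __init__(self):
        self.current_task = None
        self.task_progress = {}
        self.pending_actions = []
        self.recent_outputs = {}
        self.user_context = {}

    def update_task_state(self, task_id, step, status, output=None):
        if task_id not in self.task_progress:
            self.task_progress[task_id] = {}
        self.task_progress[task_id][step] = {
            'status': status,
            'output': output,
            'timestamp': datetime.now()
        }

    def _get_recent_progress(self, max_items):
        # Flatten every task's steps and keep the most recently updated ones
        steps = [
            {'task_id': task_id, 'step': step, **info}
            for task_id, task_steps in self.task_progress.items()
            for step, info in task_steps.items()
        ]
        steps.sort(key=lambda s: s['timestamp'])
        return steps[-max_items:]

    def get_relevant_context(self, max_items=5):
        context = {
            'current_task': self.current_task,
            'recent_progress': self._get_recent_progress(max_items),
            'pending_actions': self.pending_actions[-max_items:],
            'user_preferences': self.user_context
        }
        return context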
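The snapshot returned by `get_relevant_context` still has to reach the model somehow. One option is to render it as a short text section that is prepended to each prompt. The sketch below assumes each `recent_progress` entry is a dict with `step` and `status` keys; `render_working_memory` and its `max_chars` cap are hypothetical names chosen for illustration.

```python
import json


def render_working_memory(context: dict, max_chars: int = 1000) -> str:
    """Turn a working-memory snapshot into a compact prompt section."""
    lines = ["Working memory:"]
    lines.append(f"Current task: {context.get('current_task') or 'none'}")
    for item in context.get("recent_progress", []):
        lines.append(f"- step {item['step']}: {item['status']}")
    pending = context.get("pending_actions", [])
    if pending:
        lines.append("Pending: " + ", ".join(pending))
    prefs = context.get("user_preferences", {})
    if prefs:
        # sort_keys keeps the rendering stable between calls
        lines.append("Preferences: " + json.dumps(prefs, sort_keys=True))
    text = "\n".join(lines)
    return text[:max_chars]  # hard cap so state never crowds out the conversation
```

Rendering explicit state this way keeps task progress visible to the model even after older conversation turns have been summarized or dropped.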

Buffer Architectures for Multi-Agent Systems

When multiple agents share working memory:

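import asyncio

class SharedWorkingMemory:
    def __init__(self):
        self.shared_state = {}
        self.agent_buffers = {}
        self.coordination_queue = []  # (agent_id, key, value) updates awaiting delivery
        self.lock = asyncio.Lock()

    async def agent_write(self, agent_id, key, value):
        async with self.lock:
            if agent_id not in self.agent_buffers:
                self.agent_buffers[agent_id] = {}
            self.agent_buffers[agent_id][key] = value

            # Broadcast critical updates to shared state
            if key.startswith('shared_'):
                self.shared_state[key] = value
                await self._notify_agents(agent_id, key, value)

    async def _notify_agents(self, agent_id, key, value):
        # Minimal placeholder: queue the update; a real system would push it to subscribers
        self.coordination_queue.append((agent_id, key, value))

    async def agent_read(self, agent_id, key):
        # Shared keys always come from shared state; otherwise an agent would keep
        # reading its own stale copy after another agent updates the key
        if key.startswith('shared_'):
            return self.shared_state.get(key)

        # Otherwise check the agent's local buffer
        if agent_id in self.agent_buffers and key in self.agent_buffers[agent_id]:
            return self.agent_buffers[agent_id][key]

        # Fall back to shared state
        return self.shared_state.get(key)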

Long-Term Memory Systems for Production

Persistent Memory Storage Options

Vector Database Integration:
Vector databases excel at semantic similarity search for agent memory:

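import uuid

import chromadb
from sentence_transformers import SentenceTransformer

class VectorMemoryStore:
    def __init__(self, collection_name="agent_memory"):
        # In-memory client; use chromadb.PersistentClient(path=...) to keep data across restarts
        self.client = chromadb.Client()
        # get_or_create avoids an error when the collection already exists;
        # cosine space makes `1 - distance` in retrieve_similar a cosine similarity
        self.collection = self.client.get_or_create_collection(
            collection_name, metadata={"hnsw:space": "cosine"}
        )
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')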

    def store_memory(self, content, metadata=None):
        embedding = self.encoder.encode([content])[0]
        memory_id = f"mem_{uuid.uuid4()}"

        self.collection.add(
            embeddings=[embedding.tolist()],
            documents=[content],
            # Some Chroma versions reject empty metadata dicts, so pass None instead
            metadatas=[metadata] if metadata else None,
            ids=[memory_id]
        )
        return memory_id

    def retrieve_similar(self, query, n_results=5):
        query_embedding = self.encoder.encode([query])[0]

        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=n_results
        )

        return [
            {
                'content': doc,
                'metadata': meta,
                'similarity': 1 - distance  # A cosine similarity only if the collection uses cosine distance
            }
            for doc, meta, distance in zip(
                results['documents'][0],
                results['metadatas'][0],
                results['distances'][0]
            )
        ]
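For unit tests it can help to have a dependency-free stand-in with the same `store_memory` / `retrieve_similar` interface. The sketch below is an assumption-laden toy, not a replacement for a vector database: `embed` is a bag-of-words counter rather than a sentence encoder, and `InMemoryVectorStore` is a hypothetical name.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would use a sentence encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    def __init__(self):
        self._items = []  # list of (embedding, content, metadata)

    def store_memory(self, content, metadata=None):
        self._items.append((embed(content), content, metadata or {}))
        return len(self._items) - 1  # list position doubles as the memory id

    def retrieve_similar(self, query, n_results=5):
        q = embed(query)
        scored = [
            {"content": c, "metadata": m, "similarity": cosine(q, e)}
            for e, c, m in self._items
        ]
        scored.sort(key=lambda r: r["similarity"], reverse=True)
        return scored[:n_results]
```

Because it returns the same result shape as the Chroma-backed store, code that consumes `retrieve_similar` can be tested without loading an embedding model.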

Hybrid SQL + Vector Architecture:
Combine structured data storage with semantic search:

import json
from datetime import datetime

class HybridMemoryStore:
    def __init__(self, db_connection, vector_store):
        self.db = db_connection
        self.vector_store = vector_store

    def store_interaction(self, user_id, session_id, content, metadata):
        # Store structured data in SQL
        interaction_id = self.db.execute("""
            INSERT INTO interactions (user_id, session_id, content, metadata, timestamp)
            VALUES (?, ?, ?, ?, ?)
            RETURNING id
        """, (user_id, session_id, content, json.dumps(metadata), datetime.now())).fetchone()[0]

        # Store embeddings for semantic search
        vector_id = self.vector_store.store_memory(
            content,
            metadata={
                'interaction_id': interaction_id,
                'user_id': user_id,
                'session_id': session_id,
                **metadata
            }
        )

        # Link vector and SQL records
        self.db.execute("""
            UPDATE interactions 
            SET vector_id = ? 
            WHERE id = ?
        """, (vector_id, interaction_id))

        return interaction_id

    def retrieve_user_history(self, user_id, query=None, limit=10):
        if query:
            # Semantic search; over-fetch because the user filter is applied afterwards
            vector_results = self.vector_store.retrieve_similar(query, n_results=limit*2)
            user_results = [
                r for r in vector_results
                if (r['metadata'] or {}).get('user_id') == user_id
            ][:limit]
            if not user_results:
                return []  # Also avoids building an invalid empty "IN ()" clause

            # Enrich with SQL data
            interaction_ids = [r['metadata']['interaction_id'] for r in user_results]
            sql_data = self.db.execute("""
                SELECT * FROM interactions 
                WHERE id IN ({})
                ORDER BY timestamp DESC
            """.format(','.join('?' * len(interaction_ids))), interaction_ids).fetchall()

            return self._merge_results(user_results, sql_data)
        else:
            # Simple chronological retrieval
            return self.db.execute("""
                SELECT * FROM interactions 
                WHERE user_id = ? 
                ORDER BY timestamp DESC 
                LIMIT ?
            """, (user_id, limit)).fetchall()

When to Use SQL vs. NoSQL for Agent State

Use SQL When:
– Strict data consistency requirements
– Complex relational queries needed
– ACID transactions required
– Structured data with clear schema
– Financial or audit trail requirements

Use NoSQL When:
– Flexible, evolving data schemas
– High write volumes with eventual consistency
– Need for horizontal scaling
– Document or graph data structures
– Real-time updates and low latency critical

Hybrid Approach Example:

class AgentMemoryArchitecture:
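    def __init__(self):
        # SQLDatabase, MongoDB, ChromaDB, and Redis below are placeholder wrappers
        # around your own clients, not classes from a real library

        # SQL for critical, structured data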
        self.sql_store = SQLDatabase()  # User profiles, sessions, transactions

        # NoSQL for flexible, high-volume data  
        self.nosql_store = MongoDB()    # Conversation logs, variable metadata

        # Vector store for semantic search
        self.vector_store = ChromaDB()  # Memory embeddings, similarity search

        # Redis for real-time state
        self.cache = Redis()            # Session state, temporary buffers

    def store_complete_interaction(self, interaction_data):
        # Structured data to SQL
        session_id = self.sql_store.create_session(
            user_id=interaction_data['user_id'],
            start_time=interaction_data['timestamp']
        )

        # Flexible data to NoSQL
        self.nosql_store.store_conversation(
            session_id=session_id,
            messages=interaction_data['messages'],
            metadata=interaction_data['metadata']
        )

        # Semantic embeddings to vector store
        for message in interaction_data['messages']:
            self.vector_store.store_memory(
                content=message['content'],
                metadata={'session_id': session_id, 'type': message['type']}
            )

        # Current state to cache
        self.cache.set(
            f"session:{session_id}:state",
            json.dumps(interaction_data['current_state']),
            ex=3600  # 1 hour expiry
        )

Episodic Memory for Complex Workflows

What is Episodic Memory in AI Agents

Episodic memory groups related interactions into meaningful episodes, enabling agents to:

Pattern Recognition: Identify similar situations from past episodes
Contextual Learning: Understand what worked in specific circumstances
Workflow Optimization: Improve multi-step processes based on episode outcomes
Error Prevention: Avoid repeating failed episode patterns
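Before looking at boundary detection, it helps to pin down what a single episode holds. The record below is a minimal sketch: the `Episode` class and its field names are assumptions, chosen to line up with the `messages` and `outcome` keys used by the examples in this section.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Episode:
    episode_id: str
    goal: str
    messages: list = field(default_factory=list)
    started_at: datetime = field(default_factory=datetime.now)
    ended_at: Optional[datetime] = None
    outcome: Optional[str] = None  # "success", "failure", or None while open

    def add(self, content: str) -> None:
        # Closed episodes are immutable so past outcomes stay trustworthy.
        if self.ended_at is not None:
            raise ValueError("episode is already closed")
        self.messages.append({"content": content})

    def close(self, success: bool) -> None:
        self.ended_at = datetime.now()
        self.outcome = "success" if success else "failure"
```

Recording an explicit outcome at close time is what later makes it possible to retrieve only the episodes that worked.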

Implementing Episode Boundaries and Retrieval

Episode Detection Algorithm:

import math
import re

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors"""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class EpisodicMemory:
    def __init__(self):
        self.current_episode = None
        self.episode_store = {}
        self.episode_embeddings = {}

    def detect_episode_boundary(self, new_interaction):
        """Determine if new interaction starts a new episode"""
        if self.current_episode is None:
            return self._start_new_episode(new_interaction)

        # Check for episode termination signals
        if self._is_goal_completed(new_interaction):
            self._close_current_episode(success=True)
            return self._start_new_episode(new_interaction)

        if self._is_context_shift(new_interaction):
            self._close_current_episode(success=False)
            return self._start_new_episode(new_interaction)

        # Check for episode continuation
        if self._is_related_to_current_episode(new_interaction):
            self._add_to_current_episode(new_interaction)
            return self.current_episode

        # Default: start new episode for unrelated content
        self._close_current_episode(success=False)
        return self._start_new_episode(new_interaction)

    def _is_goal_completed(self, interaction):
        """Detect completion signals in interaction"""
        completion_indicators = [
            "task completed", "finished", "done", "success",
            "achieved", "solved", "resolved", "accomplished"
        ]
        content = interaction['content'].lower()
        # Match whole words only, so "abandoned" does not trigger "done"
        # and "unsuccessful" does not trigger "success"
        return any(re.search(rf"\b{re.escape(indicator)}\b", content)
                   for indicator in completion_indicators)

    def _is_context_shift(self, interaction):
        """Detect major topic/goal changes"""
        if not self.current_episode:
            return True

        current_embedding = self._get_episode_embedding(self.current_episode)
        new_embedding = self._get_interaction_embedding(interaction)

        similarity = cosine_similarity(current_embedding, new_embedding)
        return similarity < 0.3  # Threshold for context shift

    def retrieve_similar_episodes(self, current_context, n_episodes=5):
        """Find past episodes similar to current situation"""
        context_embedding = self._get_interaction_embedding(current_context)

        similarities = []
        for episode_id, episode_data in self.episode_store.items():
            episode_embedding = self.episode_embeddings.get(episode_id)
            if episode_embedding is not None:
                similarity = cosine_similarity(context_embedding, episode_embedding)
                similarities.append((episode_id, similarity, episode_data))

        # Return top N most similar episodes
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:n_episodes]

Real-World Use Cases for Episodic Memory

Customer Support Agent:

class SupportEpisodicMemory(EpisodicMemory):
    def categorize_episode(self, episode):
        """Categorize support episodes by type and outcome"""
        categories = {
            'technical_issue': ['bug', 'error', 'not working', 'broken'],
            'billing_question': ['payment', 'charge', 'invoice', 'billing'],
            'feature_request': ['add', 'want', 'wish', 'feature', 'enhancement'],
            'account_management': ['password', 'login', 'account', 'profile']
        }

        content = ' '.join([msg['content'] for msg in episode['messages']]).lower()

        for category, keywords in categories.items():
            if any(keyword in content for keyword in keywords):
                episode['category'] = category
                break
        else:
            episode['category'] = 'other'

        # Track resolution patterns
        if episode.get('outcome') == 'success':  # Open episodes have no outcome yet
            self._learn_successful_pattern(episode)

    def suggest_response_strategy(self, current_issue):
        """Suggest response strategy based on similar past episodes"""
        similar_episodes = self.retrieve_similar_episodes(current_issue)

        successful_episodes = [ep for ep in similar_episodes if ep[2]['outcome'] == 'success']

        if successful_episodes:
            # Analyze successful resolution patterns
            strategies = [self._extract_strategy(ep[2]) for ep in successful_episodes]
            return self._rank_strategies(strategies)

        return self._get_default_strategy(current_issue)

Project Management Agent:

class ProjectEpisodicMemory(EpisodicMemory):
    def track_project_patterns(self, project_episode):
        """Learn from project episodes to improve future planning"""
        episode_data = {
            'project_type': project_episode['metadata']['type'],
            'team_size': project_episode['metadata']['team_size'],
            'duration_planned': project_episode['metadata']['planned_duration'],
            'duration_actual': project_episode['actual_duration'],
            'challenges': self._extract_challenges(project_episode),
            'success_factors': self._extract_success_factors(project_episode),
            'outcome': project_episode['outcome']
        }

        self._update_project_patterns(episode_data)

    def predict_project_risks(self, new_project_plan):
        """Predict risks based on similar past projects"""
        similar_projects = self.retrieve_similar_episodes(new_project_plan)

        risk_patterns = {}
        for _, similarity, past_project in similar_projects:
            for challenge in past_project.get('challenges', []):  # Not every episode records challenges
                risk_patterns[challenge] = risk_patterns.get(challenge, 0) + similarity

        # Return top risks weighted by similarity to past projects
        return sorted(risk_patterns.items(), key=lambda x: x[1], reverse=True)
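The similarity-weighted ranking in `predict_project_risks` is easier to follow with concrete numbers. The standalone sketch below repeats the same accumulation on made-up data; `aggregate_risks` and the sample projects are hypothetical.

```python
def aggregate_risks(similar_projects):
    """similar_projects: iterable of (episode_id, similarity, episode_data)."""
    risk_scores = {}
    for _, similarity, project in similar_projects:
        for challenge in project.get("challenges", []):
            # Each occurrence adds the similarity of the project it came from.
            risk_scores[challenge] = risk_scores.get(challenge, 0.0) + similarity
    return sorted(risk_scores.items(), key=lambda item: item[1], reverse=True)


past = [
    ("p1", 0.9, {"challenges": ["scope creep", "late vendor"]}),
    ("p2", 0.5, {"challenges": ["scope creep"]}),
    ("p3", 0.7, {"challenges": ["staff turnover"]}),
]
ranked = aggregate_risks(past)
# "scope creep" ranks first because two similar projects hit it (0.9 + 0.5).
```

A challenge that appears in several moderately similar projects can outrank one that appears once in a very similar project, which is usually the behavior you want from a risk list.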

Memory Retrieval and Search Strategies

Semantic Search for Agent Memory

Implement multi-layered search that combines semantic similarity with metadata filtering:

class AdvancedMemoryRetrieval:
    def __init__(self, vector_store, sql_store):
        self.vector_store = vector_store
        self.sql_store = sql_store

    def search_memory(self, query, filters=None, search_type='hybrid'):
        """Multi-strategy memory search"""
        if search_type == 'semantic':
            return self._semantic_search(query, filters)
        elif search_type == 'temporal':
            return self._temporal_search(query, filters)
        elif search_type == 'episodic':
            return self._episodic_search(query, filters)
        else:  # hybrid
            return self._hybrid_search(query, filters)

    def _hybrid_search(self, query, filters):
        """Combine multiple search strategies"""
        # Get semantic candidates
        semantic_results = self._semantic_search(query, filters)

        # Get temporal candidates
        temporal_results = self._temporal_search(query, filters)

        # Get episodic candidates
        episodic_results = self._episodic_search(query, filters)

        # Merge and rank results
        all_results = {}

        # Weight semantic similarity highest
        for result in semantic_results:
            all_results[result['id']] = result
            all_results[result['id']]['score'] = result['similarity'] * 0.5

        # Add temporal relevance
        for result in temporal_results:
            if result['id'] in all_results:
                all_results[result['id']]['score'] += result['recency_score'] * 0.3
            else:
                all_results[result['id']] = result
                all_results[result['id']]['score'] = result['recency_score'] * 0.3

        # Add episodic relevance
        for result in episodic_results:
            if result['id'] in all_results:
                all_results[result['id']]['score'] += result['episode_relevance'] * 0.2
            else:
                all_results[result['id']] = result
                all_results[result['id']]['score'] = result['episode_relevance'] * 0.2

        # Return top results sorted by combined score
        return sorted(all_results.values(), key=lambda x: x['score'], reverse=True)
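The hybrid search relies on a `recency_score` for each temporal result but does not show how to compute one. A common choice is exponential decay, sketched below. The 24-hour half-life and the helper names are assumptions; `combined_score` simply restates the 0.5 / 0.3 / 0.2 weighting used in `_hybrid_search`.

```python
from datetime import datetime, timedelta


def recency_score(timestamp: datetime, now: datetime, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    age_hours = max(0.0, (now - timestamp).total_seconds() / 3600)
    return 0.5 ** (age_hours / half_life_hours)


def combined_score(similarity: float, recency: float, episode_relevance: float) -> float:
    # Same weights as the hybrid search: semantic 0.5, temporal 0.3, episodic 0.2.
    return 0.5 * similarity + 0.3 * recency + 0.2 * episode_relevance
```

For example, a memory with similarity 0.8, recency 0.5, and episode relevance 1.0 scores 0.4 + 0.15 + 0.2 = 0.75. Because the weights sum to 1.0, the combined score stays in the 0 to 1 range whenever the three inputs do.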

Hierarchical Memory Organization

Structure memory in hierarchical layers for efficient retrieval:

from datetime import datetime

class HierarchicalMemory:
    def __init__(self):
        self.memory_hierarchy = {
            'immediate': {},    # Last 30 minutes (see _age_memories)
            'recent': {},       # Last hour
            'session': {},      # Current session
            'daily': {},        # Today's interactions
            'weekly': {},       # This week's patterns
            'archived': {}      # Long-term storage
        }

    def store_memory(self, memory_item, importance_score):
        """Store memory at appropriate hierarchy level"""
        timestamp = memory_item['timestamp']
        now = datetime.now()

        # Always store in immediate memory
        self.memory_hierarchy['immediate'][memory_item['id']] = memory_item

        # Promote to higher levels based on importance and recency
        if importance_score > 0.8:
            self.memory_hierarchy['session'][memory_item['id']] = memory_item

        if importance_score > 0.9:
            self.memory_hierarchy['daily'][memory_item['id']] = memory_item

        # Automatic aging and promotion
        self._age_memories()

    def retrieve_memory(self, query, max_results=10):
        """Search hierarchy from most recent to oldest"""
        results = []
        seen_ids = set()  # An item can live in several levels; return it only once

        # Search each level until we have enough results
        for level in ['immediate', 'recent', 'session', 'daily', 'weekly', 'archived']:
            remaining = max_results - len(results)
            if remaining <= 0:
                break

            for item in self._search_level(level, query, remaining):
                if item['id'] not in seen_ids:
                    seen_ids.add(item['id'])
                    results.append(item)

        return results[:max_results]

    def _age_memories(self):
        """Move memories to appropriate levels based on age"""
        now = datetime.now()

        # Move old immediate memories to recent
        for memory_id, memory in list(self.memory_hierarchy['immediate'].items()):
            age_minutes = (now - memory['timestamp']).total_seconds() / 60

            if age_minutes > 30:  # Older than 30 minutes
                if memory_id not in self.memory_hierarchy['recent']:
                    self.memory_hierarchy['recent'][memory_id] = memory
                del self.memory_hierarchy['immediate'][memory_id]

        # Continue aging process for other levels...

Optimizing Memory Lookup Performance

Caching Strategy:

from collections import OrderedDict

class MemoryCache:
    def __init__(self, cache_size=1000):
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def get_memory(self, query_hash):
        """Retrieve cached memory search results"""
        if query_hash in self.cache:
            # Move to end (most recently used)
            self.cache.move_to_end(query_hash)
            return self.cache[query_hash]
        return None

    def cache_memory(self, query_hash, results):
        """Cache memory search results"""
        if query_hash in self.cache:
            # Refreshing an existing entry must not evict another one
            self.cache.move_to_end(query_hash)
        elif len(self.cache) >= self.cache_size:
            # Remove least recently used item
            self.cache.popitem(last=False)

        self.cache[query_hash] = results

class OptimizedMemoryRetrieval:
    def __init__(self, memory_stores, cache_size=1000):
        self.memory_stores = memory_stores
        self.cache = MemoryCache(cache_size)
        self.query_optimizer = QueryOptimizer()

    async def retrieve_memory(self, query, filters=None):
        """Optimized memory retrieval with caching and query optimization"""
        # Create query hash for caching
        query_hash = self._hash_query(query, filters)

        # Check cache first ("is not None" so cached empty results still count as hits)
        cached_results = self.cache.get_memory(query_hash)
        if cached_results is not None:
            return cached_results

        # Optimize query for better performance
        optimized_query = self.query_optimizer.optimize(query, filters)

        # Parallel search across memory stores
        search_tasks = [
            store.search(optimized_query, filters) 
            for store in self.memory_stores
        ]

        search_results = await asyncio.gather(*search_tasks)

        # Merge and rank results
        merged_results = self._merge_results(search_results)

        # Cache results for future queries
        self.cache.cache_memory(query_hash, merged_results)

        return merged_results
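The retrieval code calls `_hash_query` without defining it. A minimal sketch of such a helper follows, written as a standalone function. The normalization rules (trimming and lowercasing the query) are assumptions; adjust them if case matters for your queries.

```python
import hashlib
import json
from typing import Optional


def hash_query(query: str, filters: Optional[dict] = None) -> str:
    """Build a deterministic cache key from a query and its filters."""
    payload = json.dumps(
        {"query": query.strip().lower(), "filters": filters or {}},
        sort_keys=True,  # filter order must not change the key
        default=str,     # tolerate datetimes and other non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A content hash is used here instead of Python's built-in `hash()` because string hashes are randomized per process by default, which would make keys useless for any cache shared between workers.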

Production Implementation Patterns

Stateful vs. Stateless Agent Design

Stateful Agent Architecture:

class StatefulAgent:
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.memory_manager = MemoryManager(agent_id)
        self.state = AgentState()
        self.conversation_history = []

    async def process_message(self, message):
        # Load relevant memory
        relevant_memories = await self.memory_manager.retrieve_relevant(message)

        # Update state
        self.state.update_with_message(message)

        # Generate response with full context
        response = await self.generate_response(
            message=message,
            memories=relevant_memories,
            state=self.state,
            history=self.conversation_history[-10:]  # Recent history
        )

        # Store interaction in memory
        await self.memory_manager.store_interaction(message, response)

        # Update conversation history
        self.conversation_history.append({
            'message': message,
            'response': response,
            'timestamp': datetime.now()
        })

        return response

    async def save_state(self):
        """Persist agent state for recovery"""
        await self.memory_manager.save_agent_state(self.state)

    async def load_state(self):
        """Restore agent state from storage"""
        self.state = await self.memory_manager.load_agent_state()

Stateless Agent Architecture:

class StatelessAgent:
    def __init__(self):
        self.memory_store = SharedMemoryStore()

    async def process_message(self, message, session_id):
        # Reconstruct context from memory
        session_context = await self.memory_store.get_session_context(session_id)
        relevant_memories = await self.memory_store.retrieve_relevant(
            message, session_id
        )

        # Generate response without persistent state
        response = await self.generate_response(
            message=message,
            context=session_context,
            memories=relevant_memories
        )

        # Update session in memory store
        await self.memory_store.update_session(
            session_id, message, response
        )

        return response

Memory Consistency and Synchronization

Distributed Memory Consistency:

class DistributedMemoryManager:
    def __init__(self):
        # Maps memory_id -> (item, version). The MemoryCache class shown earlier
        # stores no versions, so a plain dict is used here instead.
        self.local_cache = {}
        self.distributed_store = DistributedMemoryStore()
        self.consistency_manager = ConsistencyManager()

    async def write_memory(self, memory_item):
        """Write to memory with consistency guarantees"""
        # Write to distributed store first
        version = await self.distributed_store.write(memory_item)

        # Update local cache
        self.local_cache[memory_item.id] = (memory_item, version)

        # Notify other nodes
        await self.consistency_manager.broadcast_update(memory_item.id, version)

        return version

    async def read_memory(self, memory_id):
        """Read memory with consistency checks"""
        # Check local cache first; a miss returns None rather than failing to unpack
        cached = self.local_cache.get(memory_id)

        if cached is not None:
            cached_item, cached_version = cached
            # Validate cache consistency
            latest_version = await self.distributed_store.get_version(memory_id)
            if cached_version == latest_version:
                return cached_item

        # Fetch from distributed store
        memory_item = await self.distributed_store.read(memory_id)

        # Update local cache
        self.local_cache[memory_id] = (memory_item, memory_item.version)

        return memory_item

Scaling Memory Systems for Multi-Agent Environments

Shared Memory Architecture:

class MultiAgentMemorySystem:
    def __init__(self):
        self.agent_private_memories = {}
        self.shared_memory = SharedMemoryStore()
        self.coordination_service = CoordinationService()

    async def register_agent(self, agent_id):
        """Register new agent and initialize memory spaces"""
        self.agent_private_memories[agent_id] = PrivateMemoryStore(agent_id)

        # Subscribe to relevant shared memory updates
        await self.coordination_service.subscribe_agent(agent_id)

    async def agent_write_private(self, agent_id, memory_item):
        """Write to agent's private memory space"""
        private_store = self.agent_private_memories[agent_id]
        await private_store.write(memory_item)

    async def agent_write_shared(self, agent_id, memory_item, scope='global'):
        """Write to shared memory with specified scope"""
        # Add agent attribution
        memory_item.metadata['author_agent'] = agent_id
        memory_item.metadata['scope'] = scope

        # Write to shared store
        await self.shared_memory.write(memory_item)

        # Notify other agents based on scope
        if scope == 'global':
            await self.coordination_service.notify_all_agents(memory_item)
        elif scope == 'team':
            team_agents = await self.coordination_service.get_team_agents(agent_id)
            await self.coordination_service.notify_agents(team_agents, memory_item)

    async def agent_read_context(self, agent_id, query, include_shared=True):
        """Read relevant context from both private and shared memory"""
        results = []

        # Search private memory
        private_store = self.agent_private_memories[agent_id]
        private_results = await private_store.search(query)
        results.extend(private_results)

        # Search shared memory if requested
        if include_shared:
            shared_results = await self.shared_memory.search(
                query, 
                filters={'accessible_to': agent_id}
            )
            results.extend(shared_results)

        # Rank combined results
        return self._rank_results(results, agent_id)

Frequently Asked Questions

Q: How much memory should I allocate for different memory types in production?
A: There is no universal split. As a rough starting point for your storage budget, try 70% long-term/persistent memory, 20% episodic memory, and 10% working memory, then monitor usage patterns and adjust to your agent’s specific needs. High-interaction agents may need more working memory, while analytical agents benefit from larger episodic stores.

Q: When should I use vector databases vs. traditional databases for memory storage?
A: Use vector databases for semantic search and content similarity (conversation retrieval, experience matching). Use traditional SQL databases for structured data, relationships, and transactions (user profiles, session metadata). Many production systems use both in a hybrid architecture.

Q: How do I handle memory consistency in multi-agent systems?
A: Implement eventual consistency for non-critical shared memory, strict consistency for coordination data, and partition memory by agent scope when possible. Use version vectors or logical clocks to detect conflicts and establish ordering.
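As an illustration of conflict detection with version vectors, the sketch below compares two vectors that map a node id to that node's update counter. The function name and return labels are hypothetical; only the comparison rule itself is standard.

```python
def compare_version_vectors(a: dict, b: dict) -> str:
    """Compare two version vectors mapping node id -> update counter."""
    nodes = set(a) | set(b)
    # A missing entry means that node has made no updates yet (counter 0).
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # conflicting writes: needs a merge or tie-break rule
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"
```

Only the "concurrent" case is a real conflict. In the other cases one write has already seen the other, so the newer value can safely replace the older one.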

Q: What’s the optimal memory retrieval strategy for real-time agent responses?
A: Use a tiered approach: immediate cache for recent interactions, indexed storage for semantic search, and background preloading for predicted needs. Aim for <100ms retrieval time for working memory, <500ms for episodic queries.
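One way to enforce such latency targets is to wrap each tier's lookup in a timeout and fall back to an empty result when the budget runs out. The sketch below uses `asyncio.wait_for`; `retrieve_with_budget` and the demo coroutines are hypothetical, and the budgets are shortened for demonstration.

```python
import asyncio


async def retrieve_with_budget(fetch, budget_seconds: float, fallback=None):
    """Await fetch() but give up and return fallback once the budget is spent."""
    try:
        return await asyncio.wait_for(fetch(), timeout=budget_seconds)
    except asyncio.TimeoutError:
        return fallback if fallback is not None else []


async def _demo():
    async def fast():
        return ["cached preference"]

    async def slow():
        await asyncio.sleep(0.2)  # simulates a slow episodic query
        return ["old episode"]

    working = await retrieve_with_budget(fast, budget_seconds=0.1)
    episodic = await retrieve_with_budget(slow, budget_seconds=0.05)
    return working, episodic


working, episodic = asyncio.run(_demo())
```

The agent answers with whatever arrived in time, so a slow episodic store degrades answer quality instead of blocking the response.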

Q: How do I prevent memory leakage in long-running agent sessions?
A: Implement automatic memory aging, compress old episodes, archive inactive memories, and set hard limits on memory size per agent. Monitor memory growth patterns and implement cleanup strategies for different memory types.
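A hard size limit can be as simple as a bounded queue that hands evicted entries to an archive step. The sketch below uses `collections.deque` with `maxlen`; the `BoundedHistory` class and its `archive` callback are hypothetical names.

```python
from collections import deque


class BoundedHistory:
    """Keeps only the newest max_items entries; evicted ones go to an archive hook."""

    def __init__(self, max_items: int, archive=None):
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._items = deque(maxlen=max_items)
        self._archive = archive  # optional callable invoked with each evicted entry

    def append(self, entry) -> None:
        if len(self._items) == self._items.maxlen and self._archive is not None:
            self._archive(self._items[0])  # oldest entry is about to be dropped
        self._items.append(entry)

    def recent(self, n: int) -> list:
        return list(self._items)[-n:]
```

Replacing an ever-growing list such as `conversation_history` with a structure like this keeps per-agent memory use constant no matter how long a session runs.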

Q: Should I implement memory compression for storage efficiency?
A: Yes, but selectively. Compress archived episodes and old conversations while keeping recent and high-importance memories in full resolution. Use semantic compression (summarization) rather than just technical compression to preserve meaning.

Q: How do I backup and restore agent memory systems?
A: Implement incremental backups for active memory, full snapshots for episodic archives, and versioned storage for critical agent state. Test restoration procedures regularly and maintain separate backups for different memory types.

For a fuller treatment of implementing these memory patterns in production environments, see our guide on memory system implementation strategies for production agents. To understand how proper memory architecture prevents common production failures, review our analysis of the context window limitations that cause failures.