Agent Architecture · 17 min · 2026-01-22

Building AI Agent Memory Systems on Google Cloud: Short-Term, Long-Term, and Shared Memory

Architecture patterns for AI agent memory systems including working memory, episodic memory, semantic memory, and shared memory implementations on Google Cloud infrastructure.

Brandon Lincoln Hendricks


Autonomous AI Agent Architect

Why Memory Is the Missing Layer

Most AI agent discussions focus on reasoning and action — the model's ability to think and the tools it uses to act. But there is a third capability that separates genuinely useful agent systems from impressive demos: memory.

Without memory, every agent interaction starts from zero. The agent cannot learn from past experiences, cannot build on previous analyses, cannot maintain context across sessions, and cannot share knowledge with other agents. It is perpetually a new employee on day one.

Memory is what transforms an AI agent from a stateless reasoning engine into an intelligent operational participant that accumulates knowledge, learns from outcomes, and becomes more effective over time.

The Memory Taxonomy for AI Agents

Agent memory is not monolithic. Different operational requirements demand different memory types, each with distinct characteristics, access patterns, and implementation strategies.

Working Memory (Short-Term)

Working memory holds the information the agent is actively reasoning about during a single interaction. It includes the current conversation context, intermediate reasoning results, tool call responses, and the agent's current plan of action.

Working memory is characterized by small size (typically within the model's context window), fast access (in-memory), short duration (lives for a single session), and high relevance (everything in working memory is pertinent to the current task).

Implementation on Google Cloud: Working memory is managed within the ADK session context. It lives in the agent instance's memory and is included in Gemini API calls as part of the conversation context. For agents running on Agent Engine, session state provides the mechanism for maintaining working memory across multiple turns within a single interaction.

Design Considerations: Working memory is constrained by Gemini's context window. As interactions progress and working memory grows, older context must be managed — either summarized, pruned, or moved to longer-term storage. Effective context window management is critical for both reasoning quality and cost control.
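The summarize-or-prune pattern above can be sketched as a small in-process buffer. Everything here is hypothetical — the `WorkingMemory` class and the rough 4-characters-per-token heuristic are illustrations, not an ADK API; a real agent would use the model's tokenizer and Agent Engine session state rather than a plain Python object:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str
    text: str

@dataclass
class WorkingMemory:
    """Holds the active conversation context under a token budget."""
    max_tokens: int = 1000
    turns: list = field(default_factory=list)
    archive: list = field(default_factory=list)  # overflow destined for long-term storage

    def _tokens(self, text: str) -> int:
        # Crude stand-in for a real tokenizer: roughly 4 characters per token.
        return max(1, len(text) // 4)

    def used_tokens(self) -> int:
        return sum(self._tokens(t.text) for t in self.turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append(Turn(role, text))
        # Once over budget, move the oldest turns to the archive so they can be
        # summarized or written to episodic memory later.
        while self.used_tokens() > self.max_tokens and len(self.turns) > 1:
            self.archive.append(self.turns.pop(0))

    def context(self):
        """The turns that would be sent to the model on the next call."""
        return [(t.role, t.text) for t in self.turns]
```

In production, the archived turns would typically be summarized before being dropped, so the summary can stay in context while the raw turns move to episodic storage.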

Episodic Memory

Episodic memory stores records of past experiences — specific interactions, decisions, and their outcomes. It answers the question: what happened before in similar situations?

Episodic memory enables agents to learn from experience without retraining the underlying model. When an agent encounters a situation similar to one it has handled before, it can retrieve the relevant episode — what it decided, what actions it took, and what the outcome was — and use that experience to inform its current reasoning.

Implementation on Google Cloud: Cloud Firestore is the natural backend for episodic memory. Each episode is stored as a document containing the situation context (what signals were present), the agent's decision (what it chose to do and why), the actions taken (what tools were called with what parameters), and the outcome (what happened as a result). Episodes are indexed for similarity-based retrieval.

For efficient episodic retrieval, combine Firestore storage with Vertex AI's vector search capabilities. Encode episode contexts as embeddings and retrieve similar episodes through vector similarity search when the agent encounters a new situation. This gives the agent access to its most relevant past experiences during reasoning.
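As a sketch of the episode schema and similarity-based retrieval, the following stands in for Firestore and Vertex AI Vector Search with an in-memory list and a hand-rolled cosine similarity. The class and field names are illustrative, not an actual API, and a real system would call an embedding model rather than accept precomputed vectors:

```python
import math

class EpisodicMemory:
    """In-memory stand-in for Firestore episode documents plus vector retrieval."""

    def __init__(self):
        self.episodes = []

    def record(self, context_embedding, situation, decision, actions, outcome):
        # One document per episode: context, decision, actions, and outcome.
        self.episodes.append({
            "embedding": context_embedding,
            "situation": situation,
            "decision": decision,
            "actions": actions,
            "outcome": outcome,
        })

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def similar(self, query_embedding, k=3):
        # Return the k most similar past episodes to the current situation.
        ranked = sorted(self.episodes,
                        key=lambda e: self._cosine(e["embedding"], query_embedding),
                        reverse=True)
        return ranked[:k]
```

The retrieved episodes would then be formatted into the agent's prompt so its reasoning can reference what was decided before and how it turned out.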

Semantic Memory

Semantic memory stores factual knowledge, domain concepts, business rules, and relational information. It represents the agent's understanding of its operational domain — not what happened, but what is true about the world it operates in.

Semantic memory includes organizational knowledge (team structures, process definitions, policy documents), domain knowledge (industry terminology, regulatory requirements, technical specifications), and relational knowledge (how entities relate to each other, dependency maps, influence networks).

Implementation on Google Cloud: Semantic memory benefits from a hybrid storage approach. Structured knowledge — entities, relationships, rules — is stored in Firestore with defined schemas. Unstructured knowledge — documents, policies, technical references — is stored in Cloud Storage and indexed for retrieval using Vertex AI Search. BigQuery serves as the analytical layer for semantic memory, enabling agents to reason about patterns and aggregates across large knowledge sets.
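A minimal sketch of the hybrid lookup, with plain Python containers standing in for Firestore (structured entities) and Vertex AI Search over Cloud Storage (documents). All names are hypothetical, and the substring match is a crude stand-in for real semantic search:

```python
class SemanticMemory:
    """Facade over a structured entity store and an unstructured document index."""

    def __init__(self):
        self.entities = {}   # structured: entity id -> attribute dict (Firestore-like)
        self.documents = []  # unstructured: (title, text) pairs (search-index-like)

    def put_entity(self, entity_id, attrs):
        self.entities[entity_id] = attrs

    def add_document(self, title, text):
        self.documents.append((title, text))

    def lookup(self, query):
        # A single query fans out to both backends and merges the results.
        q = query.lower()
        hits = {"entities": [], "documents": []}
        for eid, attrs in self.entities.items():
            if q in eid.lower() or any(q in str(v).lower() for v in attrs.values()):
                hits["entities"].append(eid)
        for title, text in self.documents:
            if q in title.lower() or q in text.lower():
                hits["documents"].append(title)
        return hits
```

The point of the facade is that agents see one lookup interface regardless of which backend actually holds the knowledge.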

Procedural Memory

Procedural memory stores learned procedures — sequences of actions that the agent has determined to be effective for specific types of tasks. Rather than reasoning from scratch about how to handle a common scenario, the agent retrieves and follows a proven procedure.

Procedural memory is the agent equivalent of standard operating procedures. It captures best practices that have been validated through experience.

Implementation on Google Cloud: Procedural memory is stored in Firestore as structured procedure documents. Each procedure includes trigger conditions (when to use this procedure), steps (the sequence of actions), parameters (configurable elements of the procedure), and success criteria (how to determine if the procedure worked). Procedures are versioned and can be updated as the agent learns better approaches.
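A procedure document along these lines might look like the following sketch. The field names and the `matches` helper are illustrative, not a prescribed schema, and the tool names are invented for the example:

```python
# A versioned procedure document as it might be stored in Firestore.
procedure = {
    "id": "restart-stuck-pipeline",
    "version": 2,
    "trigger": {"signal": "pipeline_stalled", "min_stall_minutes": 15},
    "steps": [
        {"tool": "pause_pipeline", "params": {"drain": True}},
        {"tool": "clear_dead_letter_queue", "params": {}},
        {"tool": "resume_pipeline", "params": {}},
    ],
    "success_criteria": {"metric": "throughput", "recovers_within_minutes": 10},
}

def matches(proc, signal, stall_minutes):
    """Check the trigger conditions before retrieving a procedure."""
    trig = proc["trigger"]
    return signal == trig["signal"] and stall_minutes >= trig["min_stall_minutes"]
```

Versioning matters here: when the agent discovers a better step sequence, it writes a new version rather than mutating the old one, so past episodes can still reference the procedure version that was actually executed.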

Shared Memory for Multi-Agent Systems

In multi-agent systems, memory must be shared across agents to enable effective collaboration. Shared memory enables agents to build on each other's work, avoid duplicating effort, and maintain a coherent understanding of the operational landscape.

Shared Working Memory

Multiple agents working on the same task need a shared view of current state. The blackboard pattern (discussed in multi-agent orchestration) is implemented through shared working memory in Firestore. Agents read from and write to a shared state document, with Firestore transactions ensuring consistency.

Design shared working memory with clear ownership rules. Each field in the shared state should have a designated owner agent that is responsible for its accuracy. Other agents can read the field but should not write to it. This prevents conflicts and ensures data quality.
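The ownership rule can be sketched as a guard on writes. This toy `Blackboard` class stands in for a Firestore document updated inside transactions; the class and field names are hypothetical:

```python
class Blackboard:
    """Shared working memory with per-field ownership.

    In production this would be a Firestore document, with writes performed
    inside transactions to keep concurrent agents consistent.
    """

    def __init__(self, owners):
        self.owners = owners  # field name -> owning agent id
        self.state = {}

    def write(self, agent_id, field, value):
        # Only the designated owner agent may write a field.
        if self.owners.get(field) != agent_id:
            raise PermissionError(f"{agent_id} does not own field '{field}'")
        self.state[field] = value

    def read(self, field):
        # Any agent may read any field.
        return self.state.get(field)
```

Enforcing ownership in the memory layer, rather than trusting each agent's prompt, is what actually prevents two agents from clobbering each other's state.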

Shared Episodic Memory

Agents in the same operational domain benefit from sharing experiences. When one agent learns that a particular approach works well for a type of problem, that learning should be available to other agents facing similar problems.

Implement shared episodic memory as a common Firestore collection that all agents in a domain can write to and read from. Include the source agent identifier with each episode so that consuming agents can weight experiences based on the expertise and reliability of the source agent.
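Source weighting can be sketched as a multiplier on each episode's retrieval score. The reliability table and field names below are illustrative, and in practice the base relevance would come from vector similarity rather than being supplied directly:

```python
class SharedEpisodes:
    """Shared episode collection with source-agent attribution and weighting."""

    def __init__(self, agent_reliability):
        self.agent_reliability = agent_reliability  # agent id -> trust weight in [0, 1]
        self.episodes = []

    def record(self, source_agent, summary, base_relevance):
        self.episodes.append({
            "source": source_agent,
            "summary": summary,
            "relevance": base_relevance,
        })

    def retrieve(self, k=3):
        # Weight each episode's relevance by the reliability of its source agent;
        # unknown sources get a neutral default weight.
        def score(e):
            return e["relevance"] * self.agent_reliability.get(e["source"], 0.5)
        return sorted(self.episodes, key=score, reverse=True)[:k]
```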

Shared Semantic Memory

Organizational knowledge is inherently shared. All agents operating within an enterprise should have access to the same factual knowledge base. Implement a centralized semantic memory service that agents query through a standard API. This ensures knowledge consistency across agents and provides a single point of update when knowledge changes.

Memory Retrieval Strategies

Having memory is useless without effective retrieval. The agent must find the right memories at the right time.

Relevance-Based Retrieval

For episodic and semantic memory, relevance-based retrieval uses embedding similarity to find memories that are contextually relevant to the current situation. The agent's current context is encoded as an embedding and compared against stored memory embeddings using Vertex AI's vector search. The most similar memories are retrieved and included in the agent's reasoning context.

Recency-Based Retrieval

For rapidly changing domains, recent memories may be more relevant than older ones regardless of topical similarity. Implement recency weighting that boosts recently created or recently accessed memories in retrieval results.

Importance-Based Retrieval

Not all memories are equally important. Memories associated with high-impact decisions, unusual outcomes, or explicit learning events should be prioritized in retrieval. Assign importance scores to memories based on the impact of the associated decision and the rarity of the situation.

Hybrid Retrieval

Production systems typically combine retrieval strategies. A hybrid approach retrieves memories that score well on a weighted combination of relevance, recency, and importance. The weights can be tuned per agent and per domain based on what produces the best reasoning outcomes.
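The three signals can be combined in a single scoring function. The weights and the exponential-decay recency model below are illustrative defaults for the sketch, not tuned recommendations:

```python
def hybrid_score(memory, query_sim, now,
                 w_rel=0.6, w_rec=0.25, w_imp=0.15,
                 half_life_days=30.0):
    """Weighted combination of relevance, recency, and importance.

    query_sim and memory["importance"] are assumed to be in [0, 1];
    memory["created_at"] and now are Unix timestamps in seconds.
    """
    age_days = (now - memory["created_at"]) / 86400.0
    # Recency decays exponentially: a memory half_life_days old scores 0.5.
    recency = 0.5 ** (age_days / half_life_days)
    return w_rel * query_sim + w_rec * recency + w_imp * memory["importance"]
```

Because the weights are plain parameters, they can be tuned per agent and per domain, as the text suggests, by measuring which settings produce the best downstream reasoning outcomes.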

RAG Integration for Agent Memory

Retrieval-Augmented Generation (RAG) is the mechanism that connects long-term memory to the agent's reasoning process.

Agent-Optimized RAG

Standard RAG retrieves document chunks based on query similarity. Agent-optimized RAG goes further: it retrieves structured memories (episodes, procedures, knowledge entities) rather than raw text chunks; it formats retrieved memories for optimal agent reasoning, including metadata about source, recency, and confidence; it manages the context window budget by selecting the most valuable memories when space is constrained; and it tracks which retrieved memories the agent actually used in its reasoning, enabling feedback-driven retrieval improvement.
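Two of those ideas — budget-aware selection and metadata-rich formatting — can be sketched as follows. `select_memories` and `format_memory` are hypothetical helpers, and per-memory token counts and value scores are assumed to be precomputed upstream:

```python
def select_memories(memories, budget_tokens):
    """Greedy knapsack approximation: take the highest value-per-token first."""
    ranked = sorted(memories, key=lambda m: m["value"] / m["tokens"], reverse=True)
    chosen, used = [], 0
    for m in ranked:
        if used + m["tokens"] <= budget_tokens:
            chosen.append(m)
            used += m["tokens"]
    return chosen

def format_memory(m):
    """Render one memory with its metadata for injection into the prompt."""
    return (f"[{m['kind']} | source={m['source']} | "
            f"confidence={m['confidence']:.2f}]\n{m['text']}")
```

Greedy value-per-token selection is a simplification; the point is that the RAG layer, not the agent, decides which memories are worth their context-window cost.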

Implementation with Vertex AI

Vertex AI Search provides the retrieval infrastructure. Memories stored in Firestore and Cloud Storage are indexed with embeddings generated by Vertex AI's embedding models. At reasoning time, the agent's current context is encoded and used to retrieve relevant memories from the index. Retrieved memories are formatted and injected into the Gemini prompt as contextual information.

Memory Lifecycle Management

Memories accumulate over time and must be managed. Implement lifecycle policies that archive old, low-value memories, consolidate redundant memories, update outdated factual knowledge, and prune memories that have been superseded by newer experience. Memory lifecycle management prevents unbounded growth while preserving the most valuable knowledge.
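A periodic lifecycle pass might apply rules along these lines. The thresholds and field names are illustrative, and a production job would also handle consolidation and factual updates, which this sketch omits:

```python
def lifecycle_pass(memories, now, max_age_days=180, min_importance=0.2):
    """Split memories into those to keep and those to archive.

    Archives a memory if it has been superseded by newer experience, or if it
    is both old and low-value (low importance and never retrieved).
    """
    keep, archive = [], []
    for m in memories:
        age_days = (now - m["created_at"]) / 86400.0
        superseded = m.get("superseded_by") is not None
        low_value = m["importance"] < min_importance and m["access_count"] == 0
        if superseded or (age_days > max_age_days and low_value):
            archive.append(m)
        else:
            keep.append(m)
    return keep, archive
```

On Google Cloud, a job like this could run on a schedule (for example via Cloud Scheduler), moving archived documents out of the hot Firestore collections while keeping them queryable for audits.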

Practical Architecture for Production Memory

A production memory system on Google Cloud combines these components into an integrated architecture.

Storage Layer: Firestore for structured memories (episodes, procedures, entities), Cloud Storage for unstructured knowledge (documents, policies), BigQuery for analytical memory operations.

Index Layer: Vertex AI Vector Search for similarity-based retrieval, Firestore indexes for attribute-based retrieval, BigQuery for analytical queries over memory collections.

Retrieval Layer: A memory service that accepts retrieval requests from agents, executes hybrid retrieval across storage backends, formats results for agent consumption, and tracks retrieval usage for optimization.

Management Layer: Lifecycle management processes that run periodically to archive, consolidate, and prune memories. Monitoring dashboards that track memory growth, retrieval performance, and usage patterns.

Frequently Asked Questions

Why do AI agents need memory systems?

Without memory, every agent interaction starts from zero — the agent cannot learn from experience, maintain context across sessions, or build on previous work. Memory transforms agents from stateless reasoning engines into intelligent operational participants that accumulate knowledge over time. Episodic memory enables learning from past decisions and outcomes. Semantic memory provides domain knowledge and organizational context. Procedural memory stores validated approaches to common tasks. Together, these memory types enable agents to become more effective with experience, which is essential for production operational systems.

What types of memory do AI agents use?

AI agents use four primary memory types: working memory (short-term context for the current interaction), episodic memory (records of past experiences and their outcomes), semantic memory (factual knowledge, domain concepts, and organizational information), and procedural memory (learned procedures and best practices for specific task types). Each type serves a different purpose and has different storage, access, and lifecycle characteristics. Production systems implement all four types to support comprehensive agent intelligence.

How do you implement shared memory for multi-agent systems on Google Cloud?

Shared memory for multi-agent systems is implemented using Cloud Firestore as the primary storage backend. Shared working memory uses Firestore documents with transactional access to ensure consistency when multiple agents read and write concurrently. Shared episodic memory uses common Firestore collections with source agent attribution. Shared semantic memory is implemented as a centralized knowledge service backed by Firestore and Vertex AI Search. Clear ownership rules and access patterns prevent conflicts and ensure data quality across the agent system.

How does RAG integrate with AI agent memory?

RAG connects long-term memory to the agent's reasoning by retrieving relevant memories at reasoning time and injecting them into the model's context. Agent-optimized RAG goes beyond standard document retrieval by working with structured memories (episodes, procedures, knowledge entities), managing context window budgets to maximize the value of retrieved information, and tracking which memories the agent actually uses to enable retrieval improvement. On Google Cloud, Vertex AI Search provides the retrieval infrastructure, with embeddings generated by Vertex AI and memories stored across Firestore and Cloud Storage.