generative-ai/gemini/agents/always-on-memory-agent at main · GoogleCloudPlatform/generative-ai

GoogleCloudPlatform
2026.03.13
·GitHub·by 이호민
#Agent#Gemini#LLM#Memory#RAG

Key Points

  • This project introduces an "Always On Memory Agent" designed to address a common weakness of AI agents, their "amnesia". It provides a persistent, evolving memory that continuously processes, consolidates, and connects information, loosely modeled on how the human brain handles memory.
  • Unlike common approaches, it uses no vector database or embeddings. Instead, Gemini 3.1 Flash-Lite runs as a lightweight background process that ingests multimodal data, reviews unconsolidated memories, finds connections between them, and synthesizes insights.
  • The agent supports several ingestion methods (file watcher, dashboard, HTTP API), runs on a low-cost LLM so that 24/7 operation stays affordable, and provides an API and a Streamlit dashboard for querying, managing, and observing its evolving memory.

The project introduces the "Always On Memory Agent," an approach to the persistent "amnesia" problem in AI agents: it gives them a memory system that runs continuously in the background and keeps evolving. Common alternatives each have a drawback: vector databases used for RAG (Retrieval-Augmented Generation) store information passively, conversation summaries lose detail, and knowledge graphs are expensive to build. This system instead runs 24/7 as a lightweight background process that actively consolidates and connects information, without relying on vector databases or embeddings. Its design is inspired by the way the human brain consolidates memories during sleep.

The design centers on an orchestrator that manages three specialized agents. All three use Google's Gemini 3.1 Flash-Lite, chosen because it is cheap and fast enough to run continuously in the background while still being capable enough for the work. Persistent memory is stored in an SQLite database.
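
The source states only that memory persists in SQLite. As a sketch of how the three agents could share one store, here is a hypothetical schema (table and column names are assumptions, not the project's): raw memories carry the IngestAgent's fields plus a `consolidated` flag, and a link table records which memories each connection or insight was derived from, which is what would make source citations possible at query time.

```python
import sqlite3

# Hypothetical schema: the project only says memory lives in SQLite;
# every table and column name below is an illustrative assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    source       TEXT NOT NULL,          -- file path, 'dashboard', or 'api'
    summary      TEXT NOT NULL,
    entities     TEXT NOT NULL,          -- JSON array of strings
    topics       TEXT NOT NULL,          -- JSON array of strings
    importance   REAL NOT NULL CHECK (importance BETWEEN 0 AND 1),
    consolidated INTEGER NOT NULL DEFAULT 0,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS insights (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    kind       TEXT NOT NULL CHECK (kind IN ('connection', 'insight')),
    text       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Which memories each connection or insight came from (for citations).
CREATE TABLE IF NOT EXISTS insight_sources (
    insight_id INTEGER NOT NULL REFERENCES insights(id),
    memory_id  INTEGER NOT NULL REFERENCES memories(id),
    PRIMARY KEY (insight_id, memory_id)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the memory store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn
```

Keeping the `consolidated` flag on each row lets a consolidation pass select its work with a single `WHERE consolidated = 0` query.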

  1. Ingestion: The IngestAgent continuously feeds information into the memory store. It supports 27 file types, including text, image, audio, video, and PDF, and uses Gemini's multimodal capabilities to extract structured information from each. For every input, it produces:
    • A concise summary of the content.
    • Identified entities within the content.
    • Relevant topics.
    • An importance score (e.g., 0.8).
Ingestion is triggered in one of three ways: a file watcher monitoring a designated inbox directory, manual upload through the Streamlit dashboard, or an HTTP API endpoint (POST /ingest) that accepts text content.

  2. Consolidation: The ConsolidateAgent runs on a timer (every 30 minutes by default), mimicking the way the human brain consolidates memories during sleep. Its primary functions are:
    • Reviewing Unconsolidated Memories: It identifies and processes memories that have not yet been consolidated.
    • Finding Connections: It analyzes relationships between memories, identifying semantic links and interdependencies. For example, it might connect "AI agent reliability" with "better memory architectures."
    • Generating Cross-Cutting Insights: Beyond pairwise connections, it synthesizes higher-level insights that emerge from the memory as a whole. For instance, it might derive "The bottleneck for next-gen AI tools is the transition from static RAG to dynamic memory systems" from a set of related memories.
    • Compressing Related Information: It can abstract or summarize redundant or closely related information so the memory store stays compact and well connected. In practice, the LLM reads several memories together and writes a new, more refined or abstract representation, which is stored as an "insight" or "connection."
  3. Query: The QueryAgent answers natural-language questions. For each query, it reads the entire memory store, including both the raw ingested memories and the connections and insights produced by the ConsolidateAgent. It then synthesizes an answer with source citations pointing back to the original memories, so users can trace and verify each claim. Queries are served by an HTTP API endpoint (GET /query?q=...).
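
The IngestAgent's output for each input is a small structured record (summary, entities, topics, importance). A minimal sketch, assuming the model is prompted to reply with a JSON object holding those four fields (the key names and the 0 to 1 importance range are assumptions), shows how such a reply could be validated before it is written to the store:

```python
import json
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    """Structured fields the IngestAgent is described as producing."""
    summary: str
    entities: list[str]
    topics: list[str]
    importance: float  # assumed scale: 0.0 (trivial) to 1.0 (critical)


def parse_ingest_output(raw: str) -> MemoryRecord:
    """Validate a JSON reply from the model and convert it to a MemoryRecord.

    Assumes a prompt asking for {"summary", "entities", "topics", "importance"};
    the project's actual prompt and output format may differ.
    """
    data = json.loads(raw)
    importance = float(data["importance"])
    if not 0.0 <= importance <= 1.0:
        raise ValueError(f"importance out of range: {importance}")
    return MemoryRecord(
        summary=str(data["summary"]).strip(),
        entities=[str(e) for e in data.get("entities", [])],
        topics=[str(t) for t in data.get("topics", [])],
        importance=importance,
    )
```

Validating the reply before storing it keeps a malformed model response from corrupting the memory store; a rejected input can simply be retried.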
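
The file-watcher trigger can be approximated with polling. The class below is a hypothetical, stdlib-only stand-in (the name `InboxPoller` and the polling approach are assumptions; the project may rely on OS file-system events instead). It reports files in the inbox directory that are new or have changed since the previous check.

```python
from pathlib import Path


class InboxPoller:
    """Poll an inbox directory and report files that are new or modified.

    Hypothetical stand-in for the project's file watcher.
    """

    def __init__(self, inbox: Path):
        self.inbox = Path(inbox)
        self.inbox.mkdir(parents=True, exist_ok=True)
        self._seen: dict[Path, float] = {}  # path -> last seen modification time

    def poll(self) -> list[Path]:
        """Return files added or modified since the previous call, sorted by path."""
        changed = []
        for entry in sorted(self.inbox.iterdir()):
            if not entry.is_file():
                continue  # ignore subdirectories
            mtime = entry.stat().st_mtime
            if self._seen.get(entry) != mtime:
                self._seen[entry] = mtime
                changed.append(entry)
        return changed
```

Calling `poll()` every few seconds and passing each returned path to the ingestion step approximates the always-on behavior described above; because modification times are tracked, an edited file is ingested again.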
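
One consolidation pass can be sketched as: pick unconsolidated memories, ask the model for connections and insights, then mark those memories as done. In this hypothetical sketch the Gemini call is injected as a plain function (`synthesize`) so the loop can run offline; the names, the batch size of 20, and the in-memory data classes are assumptions, not the project's code.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memory:
    id: int
    summary: str
    consolidated: bool = False


@dataclass
class Insight:
    text: str
    source_ids: list[int] = field(default_factory=list)  # memories it came from


# Stand-in for the LLM call (Gemini 3.1 Flash-Lite in the project).
Synthesizer = Callable[[list[Memory]], list[Insight]]


def consolidate_once(memories: list[Memory], synthesize: Synthesizer,
                     batch: int = 20) -> list[Insight]:
    """Review up to `batch` unconsolidated memories, derive insights, mark them done."""
    pending = [m for m in memories if not m.consolidated][:batch]
    if not pending:
        return []
    insights = synthesize(pending)  # if this raises, nothing is marked consolidated
    for m in pending:
        m.consolidated = True
    return insights


def run_every(interval_s: float, task: Callable[[], None], stop: threading.Event) -> None:
    """Call task() every interval_s seconds until stop is set."""
    while not stop.wait(interval_s):  # wait() returns True only once stop is set
        task()
```

Marking memories consolidated only after `synthesize` returns means a failed model call leaves them pending for the next pass. Something like `run_every(30 * 60, lambda: consolidate_once(memories, synthesize), stop)` would match the default 30-minute cadence.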
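
Because the QueryAgent reads the whole store rather than retrieving a top-k subset, its prompt is essentially the full memory with citable labels. The sketch below assumes a hypothetical `[m<id>]` / `[i<id>]` tag convention for memories and insights; it builds that prompt and then checks which citations in the model's answer refer to items that actually exist.

```python
import re


def build_query_prompt(question: str, memories: dict[int, str],
                       insights: dict[int, str]) -> str:
    """Put the whole memory store into one prompt, tagging each item with a citable ID."""
    lines = ["Answer using only the memory below. Cite sources as [m<id>] or [i<id>].", ""]
    lines += [f"[m{mid}] {text}" for mid, text in sorted(memories.items())]
    lines += [f"[i{iid}] {text}" for iid, text in sorted(insights.items())]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


CITATION = re.compile(r"\[(m|i)(\d+)\]")  # matches tags like [m12] or [i3]


def extract_citations(answer: str, memories: dict[int, str],
                      insights: dict[int, str]) -> list[str]:
    """Return cited tags in order of first appearance, dropping tags not in the store."""
    valid = {"m": memories, "i": insights}
    cited: list[str] = []
    for kind, num in CITATION.findall(answer):
        tag = f"{kind}{num}"
        if int(num) in valid[kind] and tag not in cited:
            cited.append(tag)
    return cited
```

Dropping unknown tags guards against the model citing a source that is not in the store. Sending the entire store with every question presumably works only while the memory fits in the model's context window, which is one trade-off of skipping embeddings.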

The system uses Google ADK (Agent Development Kit) for agent orchestration and exposes a RESTful HTTP API for programmatic access, along with an optional Streamlit dashboard for visual management: browsing memories, deleting them, and triggering consolidation manually. Gemini 3.1 Flash-Lite was chosen to keep continuous, always-on operation affordable and low-latency.
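
The two endpoints described in this post, POST /ingest and GET /query?q=..., can be sketched with Python's standard library. This is a hypothetical stand-in, not the project's server: the JSON field name `text`, the in-memory list, and the substring-matching `answer` function are assumptions that take the place of the real agents and SQLite store.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory store standing in for SQLite and the agents.
MEMORIES: list[str] = []


def answer(question: str) -> dict:
    """Placeholder for the QueryAgent: IDs of memories containing the question text."""
    hits = [i for i, m in enumerate(MEMORIES) if question.lower() in m.lower()]
    return {"question": question, "sources": hits}


class MemoryAPI(BaseHTTPRequestHandler):
    def _send(self, status: int, body: dict) -> None:
        """Serialize body as JSON and write a complete response."""
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self) -> None:
        # POST /ingest with a JSON body such as {"text": "..."}
        if urlparse(self.path).path != "/ingest":
            return self._send(404, {"error": "not found"})
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            return self._send(400, {"error": "invalid JSON"})
        text = payload.get("text", "") if isinstance(payload, dict) else ""
        if not text:
            return self._send(400, {"error": "missing 'text'"})
        MEMORIES.append(text)
        self._send(201, {"id": len(MEMORIES) - 1})

    def do_GET(self) -> None:
        # GET /query?q=...
        url = urlparse(self.path)
        if url.path != "/query":
            return self._send(404, {"error": "not found"})
        question = parse_qs(url.query).get("q", [""])[0]
        self._send(200, answer(question))

    def log_message(self, *args) -> None:
        """Silence the default per-request logging to stderr."""
```

Running `ThreadingHTTPServer(("127.0.0.1", 8000), MemoryAPI).serve_forever()` would start it locally; port 8000 is an arbitrary choice.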