Titans + MIRAS: Helping AI have long-term memory
Key Points
- Google introduces Titans, an architecture featuring a novel deep neural network long-term memory, and MIRAS, a unifying theoretical framework, designed to allow AI models to process massive contexts and adapt in real time.
- Titans updates its core memory by selectively incorporating "surprising" or novel information based on an internal gradient signal, ensuring efficient retention of critical data while actively running.
- The MIRAS framework reinterprets sequence models as associative memory, enabling the exploration of robust, non-Euclidean objectives, collectively leading to models with superior accuracy, lower perplexity, and long-context recall exceeding 2 million tokens.
The Titans architecture and MIRAS framework are introduced to enable AI models to process massive contexts and update core memory during active operation, addressing the limitations of traditional Transformers (whose attention cost grows quadratically with sequence length) and RNNs/SSMs (which compress history into a fixed-size state) in scaling to extreme sequence lengths.
Titans Architecture:
Titans introduces a novel neural long-term memory module, replacing fixed-size memory representations found in conventional RNNs with a deep neural network, specifically a Multi-Layer Perceptron (MLP). This MLP-based memory provides significantly enhanced expressive power to summarize large volumes of information without losing crucial context. The model actively learns to recognize and retain important relationships and conceptual themes across the entire input sequence.
A core mechanism in Titans is the "surprise metric," which drives selective memory updates. The metric quantifies how much a new input diverges from what the model's current memory state predicts. Technically, surprise is the magnitude of the internal error signal, interpreted as a gradient: if a new input deviates sharply from the memory's expectation, the gradient is large, signaling high surprise. This indicates that the information is novel, important, or anomalous, and should be prioritized for storage in the long-term memory module. Conversely, a small gradient (low surprise) means the input is already well predicted and need not be explicitly memorized in the permanent state. Titans thus updates its long-term memory only with expectation-breaking information, keeping memorization efficient.
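A minimal sketch of this gating idea, using a toy two-layer MLP memory in NumPy (the dimensions, ReLU nonlinearity, threshold, and learning rate are all illustrative assumptions, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MLP memory mapping keys to values (dimensions are illustrative).
d_k, d_h, d_v = 8, 16, 8
W1 = rng.normal(0.0, 0.1, (d_h, d_k))
W2 = rng.normal(0.0, 0.1, (d_v, d_h))

def loss_surprise_grads(k, v):
    """Forward pass, squared-error loss, and its gradients.

    The gradient magnitude serves as the 'surprise' signal: it is large
    when the stored associations predict the new (key, value) pair poorly.
    """
    h = np.maximum(W1 @ k, 0.0)            # hidden activations (ReLU)
    err = W2 @ h - v                       # prediction residual
    loss = 0.5 * float(err @ err)
    gW2 = np.outer(err, h)                 # dloss/dW2
    dh = (W2.T @ err) * (h > 0)            # backprop through the ReLU
    gW1 = np.outer(dh, k)                  # dloss/dW1
    surprise = np.sqrt((gW1**2).sum() + (gW2**2).sum())
    return loss, surprise, gW1, gW2

# Selective update: memorize only when the gradient signal is large.
threshold, lr = 0.05, 0.1                  # illustrative values
k, v = rng.normal(size=d_k), rng.normal(size=d_v)
loss_before, surprise, gW1, gW2 = loss_surprise_grads(k, v)
if surprise > threshold:                   # novel/anomalous -> store it
    W1 -= lr * gW1
    W2 -= lr * gW2
```

A fresh random pair is poorly predicted, so its surprise exceeds the threshold and the gradient step writes it into the memory weights; replaying the same pair afterward yields a lower loss.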
Titans refines this selective update with two elements:
- Momentum: The update combines the instantaneous "momentary surprise" from the current input with "past surprise" carried over from the recent context flow. This ensures that tokens following a surprising event are still captured, even if they are not surprising on their own.
- Forgetting (Weight Decay): An adaptive weight decay mechanism acts as a retention gate to manage finite memory capacity. This regularization discards information that is no longer relevant, analogous to a forgetting process.
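The two refinements above can be sketched as a single update rule, with a momentum buffer blending past and momentary surprise and a decay factor acting as the retention gate (the symbols and coefficient values here are illustrative assumptions, not the paper's):

```python
import numpy as np

def memory_update(M, S, grad, eta=0.9, theta=0.1, alpha=0.01):
    """One Titans-style memory update (illustrative coefficients).

    S blends 'past surprise' (the running buffer) with 'momentary
    surprise' (the current gradient), so tokens that follow a surprising
    event still get memorized even when their own gradient is small.
    The (1 - alpha) factor is the forgetting/weight-decay retention gate.
    """
    S = eta * S - theta * grad        # past surprise + momentary surprise
    M = (1.0 - alpha) * M + S         # decay old memory, then write
    return M, S
```

With a zero gradient, M decays geometrically toward zero, which is the forgetting behavior; with eta > 0, a single large gradient keeps influencing several subsequent updates.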
MIRAS Framework:
MIRAS provides a unified theoretical framework for sequence modeling, positing that all effective sequence models are fundamentally complex associative memory modules. It reinterprets diverse architectures as different methods for efficiently combining new information with existing memories while retaining essential concepts. MIRAS defines a sequence model through four interdependent design choices:
- Memory Architecture: The structural representation of stored information (e.g., vector, matrix, or deep MLP as in Titans).
- Attentional Bias: The internal learning objective that the model optimizes to prioritize information for storage or recall.
- Retention Gate: A memory regularizer that balances the integration of new learning against the retention of past knowledge. Forgetting mechanisms are reinterpreted as specific forms of regularization.
- Memory Algorithm: The optimization algorithm (e.g., gradient-based optimizers) used to update the memory state.
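As a rough sketch, the four design axes can be expressed as pluggable components of a single update step (all four concrete choices below are simple stand-ins for illustration, not the paper's exact instantiations):

```python
import numpy as np

def mse_bias_grad(pred, v):
    """Attentional bias: gradient of the standard 0.5*||pred - v||^2."""
    return pred - v

def decay_retention(M, alpha=0.01):
    """Retention gate: weight decay as a simple memory regularizer."""
    return (1.0 - alpha) * M

def miras_step(M, k, v, bias_grad=mse_bias_grad,
               retention=decay_retention, lr=0.1):
    """One MIRAS-style update with the four design choices made explicit.

    Memory architecture: a matrix M storing key -> value associations.
    Attentional bias:    which error signal to minimize (bias_grad).
    Retention gate:      how past knowledge is regularized (retention).
    Memory algorithm:    plain gradient descent (the lr step).
    """
    M = retention(M)                        # forget a little
    err = bias_grad(M @ k, v)               # how wrong is the recall?
    return M - lr * np.outer(err, k)        # gradient step on the bias
```

Swapping `bias_grad` for a robust (e.g. Huber-style) gradient, or `retention` for a non-Euclidean regularizer, yields different points in the MIRAS design space without changing the update loop.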
Crucially, MIRAS transcends the common reliance on Mean Squared Error (MSE) or dot-product similarity for bias and retention in existing models. It offers a generative framework to explore a richer design space, enabling the development of novel architectures with non-Euclidean objectives and regularization.
Examples of MIRAS variants that leverage this extended design space include:
- YAAD: Employs a Huber loss for its attentional bias, making it less sensitive to outliers or major errors in the input data by using a gentler penalty function than MSE, thus increasing robustness to messy data. The Huber loss is defined as:

  $$L_\delta(e) = \begin{cases} \tfrac{1}{2}e^2 & \text{if } |e| \le \delta, \\ \delta\left(|e| - \tfrac{1}{2}\delta\right) & \text{otherwise,} \end{cases}$$

  where $e$ is the error and $\delta$ is a threshold.
- MONETA: Investigates the use of more complex and strict mathematical penalties, such as generalized norms, for both attentional bias and retention. This explores whether these disciplined rules can lead to more powerful and stable long-term memory systems.
- MEMORA: Achieves memory stability by enforcing its memory to behave as a strict probability map. This constraint ensures that memory state updates are controlled and balanced, guaranteeing a stable integration process for new information.
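To make the YAAD bullet concrete, here is the piecewise Huber penalty next to the squared error it replaces (the choice delta = 1.0 is arbitrary, for illustration):

```python
import numpy as np

def huber(e, delta=1.0):
    """Huber loss: quadratic for |e| <= delta, linear beyond it."""
    a = np.abs(e)
    return np.where(a <= delta, 0.5 * e**2, delta * (a - 0.5 * delta))

errors = np.array([0.5, 10.0])       # a small error and an outlier
mse = 0.5 * errors**2                # -> [0.125, 50.0]
hub = huber(errors)                  # -> [0.125, 9.5]
```

On the small error the two penalties agree, but the outlier's penalty grows linearly (9.5) rather than quadratically (50.0), which is exactly the robustness property described above.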
Experiments and Results:
Titans and MIRAS variants (YAAD, MONETA, MEMORA) were rigorously compared against state-of-the-art architectures like Transformer++, Mamba-2, and Gated DeltaNet across language modeling (C4, WikiText), zero-shot reasoning (HellaSwag, PIQA), genomic modeling, and time-series forecasting.
Key findings include:
- Higher Accuracy and Lower Perplexity: The proposed models consistently demonstrated improved accuracy and lower perplexity (a measure of model surprise) compared to baselines.
- Crucial Memory Depth: Ablation studies confirmed that the depth of the neural memory module is crucial: deeper memory MLPs achieve lower perplexity and scale better as sequence length increases.
- Outperformance in Long Contexts: Titans significantly outperformed baselines, including larger models like GPT-4, on the BABILong benchmark, a task requiring reasoning over extremely long documents. It demonstrated the capability to scale effectively to context window sizes exceeding 2 million tokens.
- Efficiency: Despite enhanced capabilities, these models maintain efficient, parallelizable training and fast linear inference speeds.
In conclusion, Titans and MIRAS represent a significant advancement in sequence modeling by introducing deep neural networks as dynamic memory modules that learn and update online. MIRAS provides a powerful theoretical unification, linking online optimization, associative memory, and architectural design, moving beyond standard Euclidean paradigms to enable a new generation of efficient and expressive long-context AI models.