Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

2026.01.14
by 이호민
#LLM #Sparsity #ConditionalMemory #MoE #N-gram

Key Points

  1. This paper introduces "conditional memory" as a novel sparsity axis for large language models, complementary to MoE, addressing inefficient knowledge retrieval by separating static knowledge storage from dynamic computation.
  2. It proposes Engram, an O(1) lookup module that leverages hashed N-grams and context-aware gating to integrate this conditional memory into Transformer backbones.
  3. Experiments demonstrate Engram's superior performance over iso-parameter and iso-FLOPs MoE baselines across various benchmarks, identify a U-shaped scaling law for optimal sparsity allocation, and highlight its infrastructure efficiency due to deterministic prefetching.

The paper "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" proposes a novel axis of sparsity, termed "conditional memory," to address the inefficiency of knowledge retrieval in Transformer models. This approach is presented as complementary to conditional computation methods like Mixture-of-Experts (MoE). The authors argue that traditional Transformers lack intrinsic primitives for knowledge retrieval, leading to inefficient computational search for information. To rectify this, they introduce the Engram module, which modernizes classical N-gram embeddings to enable O(1) lookup.

The core methodology revolves around the Engram architecture, which structurally separates static pattern storage from dynamic computation within the Transformer backbone. For an input sequence $X = (x_1, \ldots, x_T)$ and hidden states $H^{(\ell)} \in \mathbb{R}^{T \times d}$ at layer $\ell$, Engram processes each position $t$ through two stages: retrieval and fusion.

1. Sparse Retrieval via Hashed N-grams:

  • Tokenizer Compression: To mitigate the semantic redundancy common in subword tokenizers, a vocabulary compression layer is applied. A mapping function $P: V \to V'$ converts token IDs $x_t$ into normalized canonical IDs $x_t' = P(x_t)$ (e.g., via NFKC normalization and lowercasing). This reduces the effective vocabulary size and increases semantic density. The normalized IDs are then used to form suffix N-grams $g_{t,n} = (x'_{t-n+1}, \ldots, x'_t)$.
  • Multi-Head Hashing: Since parameterizing the entire combinatorial space of N-grams is impractical, a hashing-based approach is adopted. To alleviate collisions, $K$ distinct hash heads are employed for each N-gram order $n$. Each head $k$ maps the compressed context to an index within an embedding table $E_{n,k}$ (of size $M_{n,k}$): $z_{t,n,k} \triangleq \varphi_{n,k}(g_{t,n})$. The function $\varphi_{n,k}$ is realized as a lightweight multiplicative-XOR hash. The final memory vector $e_t \in \mathbb{R}^{d_{\text{mem}}}$ is formed by concatenating all retrieved embeddings: $e_t \triangleq \big\Vert_{n=2}^{N} \big\Vert_{k=1}^{K} e_{t,n,k}$.
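The retrieval stage above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the table sizes, embedding widths, N-gram orders, and the exact mixing step of the multiplicative-XOR hash are all assumptions.

```python
import numpy as np

# Hypothetical sizes; the paper does not pin these exact values.
TABLE_SIZE = 1 << 20   # slots per embedding table M_{n,k}
EMB_DIM    = 64        # per-head embedding width
N_MAX, K   = 3, 2      # N-gram orders 2..N_MAX, K hash heads per order

rng = np.random.default_rng(0)
# One embedding table E_{n,k} per (order, head) pair.
tables = {(n, k): rng.standard_normal((TABLE_SIZE, EMB_DIM)).astype(np.float32)
          for n in range(2, N_MAX + 1) for k in range(K)}
# Per-head odd multipliers for the multiplicative-XOR hash (assumed form).
mults = {(n, k): int(rng.integers(1, 1 << 61)) | 1
         for n in range(2, N_MAX + 1) for k in range(K)}

def hash_ngram(gram, n, k):
    """Lightweight multiplicative-XOR hash phi_{n,k} onto a table index."""
    h = 0
    for tok in gram:
        h = ((h * mults[(n, k)]) ^ tok) & ((1 << 64) - 1)
    return h % TABLE_SIZE

def engram_lookup(token_ids):
    """Return the concatenated memory vector e_t for each position t."""
    T = len(token_ids)
    out = np.zeros((T, (N_MAX - 1) * K * EMB_DIM), dtype=np.float32)
    for t in range(T):
        parts = []
        for n in range(2, N_MAX + 1):
            gram = tuple(token_ids[max(0, t - n + 1): t + 1])  # suffix N-gram
            for k in range(K):
                parts.append(tables[(n, k)][hash_ngram(gram, n, k)])
        out[t] = np.concatenate(parts)
    return out

e = engram_lookup([17, 4, 4, 923])  # compressed token IDs x'_t
print(e.shape)  # (4, 256): T x ((N_MAX-1) * K * EMB_DIM)
```

Because the indices depend only on the token IDs, the lookup is O(1) per position and fully deterministic, which is what later enables prefetching.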

2. Context-aware Gating (Fusion):
The retrieved embedding $e_t$ provides context-independent prior information but can be noisy due to hash collisions or ambiguity. To refine it, a context-aware gating mechanism is used, in which the current hidden state $h_t$ serves as a dynamic Query while $e_t$ is the source of the Key and Value projections: $k_t = W_K e_t$ and $v_t = W_V e_t$, where $W_K, W_V$ are learnable projection matrices. A gate $\alpha_t \in (0, 1)$ is computed as:

$$\alpha_t = \sigma\left( \frac{\text{RMSNorm}(h_t)^\top \text{RMSNorm}(k_t)}{\sqrt{d}} \right)$$
This gate modulates the retrieved value vector, defining the gated output $\tilde{v}_t = \alpha_t \cdot v_t$. This design lets the gate suppress noisy retrieved memory when it conflicts with the current context. To expand the receptive field and add non-linearity, a short depthwise causal convolution with kernel size $w = 4$, dilation $\delta$ (up to the maximum N-gram order), and SiLU activation is applied. The final output $Y$ is:

$$Y = \text{SiLU}\left(\text{Conv1D}(\text{RMSNorm}(\tilde{V}))\right) + \tilde{V}$$

The Engram module is integrated into the backbone via a residual connection, $H^{(\ell)} \leftarrow H^{(\ell)} + Y$, placed selectively at specific layers.
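The gate computation can be sketched as follows (the depthwise causal convolution is omitted for brevity). The dimensions and weight scales are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    """RMS normalization along the last axis."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d, d_mem, T = 8, 16, 5               # hidden width, memory width, seq length
rng = np.random.default_rng(1)
W_K = rng.standard_normal((d, d_mem)) * 0.1  # learnable in practice
W_V = rng.standard_normal((d, d_mem)) * 0.1

H = rng.standard_normal((T, d))      # hidden states h_t
E = rng.standard_normal((T, d_mem))  # retrieved memory vectors e_t

K_proj = E @ W_K.T                   # k_t = W_K e_t
V_proj = E @ W_V.T                   # v_t = W_V e_t
# Gate: sigmoid of the normalized dot product, scaled by 1/sqrt(d).
alpha = sigmoid(np.sum(rmsnorm(H) * rmsnorm(K_proj), axis=-1) / np.sqrt(d))
V_tilde = alpha[:, None] * V_proj    # gated output \tilde{v}_t = alpha_t * v_t

print(alpha.min() > 0 and alpha.max() < 1)  # True: gate stays in (0, 1)
```

When $h_t$ and $k_t$ point in opposite directions, the dot product is negative and $\alpha_t$ falls below 0.5, attenuating a retrieval that conflicts with the context.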

Integration with Multi-branch Architecture:
For multi-branch architectures such as Manifold-Constrained Hyper-Connections (mHC) with $M$ branches, a parameter-sharing strategy is employed. A single sparse embedding table and value projection $W_V$ are shared across all $M$ branches, while $M$ distinct Key projection matrices $\{W_K^{(m)}\}_{m=1}^M$ enable branch-specific gating. The gating signal for the $m$-th branch's hidden state $h_t^{(m)}$ is:

$$\alpha_t^{(m)} = \sigma\left( \frac{\text{RMSNorm}(h_t^{(m)})^\top \text{RMSNorm}(W_K^{(m)} e_t)}{\sqrt{d}} \right)$$

These independent gates then modulate the shared value vector $W_V e_t$: $u_t^{(m)} = \alpha_t^{(m)} \cdot (W_V e_t)$.
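The sharing scheme can be sketched for a single position: one retrieved vector and one value projection serve all branches, while each branch owns its Key projection. Branch count and widths below are arbitrary:

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

M, d, d_mem = 3, 8, 16                           # branches, widths (assumed)
rng = np.random.default_rng(2)
W_Ks = rng.standard_normal((M, d, d_mem)) * 0.1  # one W_K^{(m)} per branch
W_V  = rng.standard_normal((d, d_mem)) * 0.1     # shared value projection

e_t = rng.standard_normal(d_mem)                 # one shared retrieved vector
h_t = rng.standard_normal((M, d))                # per-branch hidden states
v_shared = W_V @ e_t                             # shared value W_V e_t

# u_t^{(m)} = alpha_t^{(m)} * (W_V e_t), with branch-specific gates.
u = np.stack([
    sigmoid(rmsnorm(h_t[m]) @ rmsnorm(W_Ks[m] @ e_t) / np.sqrt(d)) * v_shared
    for m in range(M)
])
print(u.shape)  # (3, 8): M branch-specific gatings of one shared value
```

Only the small Key projections are duplicated per branch; the large embedding table is stored once.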

System Efficiency and Sparsity Allocation:
Engram's deterministic retrieval mechanism allows parameter storage to be decoupled from computational resources. Unlike MoE routing, Engram's retrieval indices depend solely on the input token sequence, enabling asynchronous prefetching during inference: because an Engram module sits at a specific layer, the computation of the preceding layers masks the prefetch latency, preventing GPU stalls. During training, embedding tables are sharded across multiple GPUs using model parallelism and All-to-All communication. Leveraging the Zipfian distribution of N-grams, a multi-level cache hierarchy caches frequent embeddings in faster tiers (GPU HBM, host DRAM) and stores rare patterns on slower but larger media (NVMe SSD).
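The tiered-cache idea can be illustrated with a toy two-level cache. The LRU policy and the in-memory "backing store" below are stand-ins for the HBM/DRAM/NVMe hierarchy, not details from the paper:

```python
from collections import OrderedDict

class TieredEmbeddingCache:
    """Toy fast-tier cache over a slow backing store (stand-in for keeping
    hot N-gram embeddings in HBM/DRAM while cold ones live on NVMe)."""

    def __init__(self, backing, hot_slots):
        self.backing = backing      # slow tier, e.g. a memory-mapped table
        self.hot = OrderedDict()    # fast tier with LRU eviction (assumed)
        self.hot_slots = hot_slots
        self.misses = 0

    def get(self, idx):
        if idx in self.hot:         # frequent N-grams hit the fast tier
            self.hot.move_to_end(idx)
            return self.hot[idx]
        self.misses += 1
        vec = self.backing[idx]     # slow-path fetch
        self.hot[idx] = vec
        if len(self.hot) > self.hot_slots:
            self.hot.popitem(last=False)  # evict least recently used
        return vec

backing = {i: [float(i)] * 4 for i in range(1000)}   # toy backing store
cache = TieredEmbeddingCache(backing, hot_slots=8)
# Zipf-like access pattern: index 0 dominates, so it is served from the
# fast tier after the first fetch.
for idx in [0, 1, 0, 2, 0, 3, 0, 0]:
    cache.get(idx)
print(cache.misses)  # 4: one slow-path fetch per distinct index
```

Under a Zipfian access distribution, a small fast tier absorbs most lookups, which is why offloading even very large tables adds little overhead.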

The paper defines a "Sparsity Allocation problem": how to optimally distribute a fixed total parameter budget between MoE experts and Engram embeddings. Experiments reveal a consistent U-shaped relationship between validation loss and the allocation ratio $\rho$, the proportion of inactive parameters allocated to MoE experts. Optimal performance is achieved when 20-25% of the total sparse parameter budget is reallocated to Engram, demonstrating the structural complementarity of the two modules. In the infinite-memory regime, Engram shows predictable scaling: validation loss consistently improves, following a linear relationship in log-space as memory slots increase, indicating continued benefits from larger memory without additional computation.
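The reported log-linear scaling trend can be illustrated with synthetic numbers (the slot counts and coefficients below are made up for illustration, not the paper's data):

```python
import numpy as np

# Idealized, noise-free "loss vs. memory slots" curve: linear in log10(slots).
slots = np.array([1e6, 4e6, 1.6e7, 6.4e7, 2.56e8])
loss = 2.80 - 0.05 * np.log10(slots)   # hypothetical coefficients

# Fit loss = a + b * log10(slots); a log-linear trend yields b < 0, meaning
# every 10x increase in memory buys a fixed reduction in validation loss.
b, a = np.polyfit(np.log10(slots), loss, 1)
print(round(b, 3))  # -0.05
```

The practical implication is that memory slots can be scaled independently of FLOPs: the loss keeps improving at a predictable rate as the table grows.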

Large Scale Pre-training and Experimental Results:
Four models (Dense-4B, MoE-27B, Engram-27B, Engram-40B) were pre-trained on 262 billion tokens, all with identical activated parameters, confirming Engram's efficacy. Engram-27B, which matches MoE-27B's exact total parameter count (by reducing MoE experts from 72 to 55 and reallocating parameters to a 5.7B Engram memory, $\rho = 74.3\%$), significantly outperformed MoE-27B across a wide range of benchmarks, including knowledge-intensive tasks (MMLU +3.4), general reasoning (BBH +5.0), and code/math domains (HumanEval +3.0). Engram-40B, which only expanded the Engram memory to 18.5B parameters while keeping the same backbone and computational budget, improved performance further, demonstrating Engram's robust scalability.

Mechanism Analysis and Infrastructure Efficiency:
Analyses using LogitLens and CKA suggest that Engram relieves the early layers of the backbone from reconstructing static knowledge, thereby increasing the effective depth available for complex reasoning. By offloading local dependencies to lookup, Engram lets the attention mechanism focus on global context, leading to superior performance in long-context scenarios (e.g., Multi-Query NIAH: 84.2 → 97.0). Infrastructure-wise, Engram's deterministic addressing facilitates runtime prefetching from host memory, incurring negligible overhead (<3%) even when offloading 100B-parameter tables, effectively bypassing GPU memory constraints and enabling aggressive parameter scaling.

The paper concludes by positing that conditional memory will be an indispensable modeling primitive for future sparse models.