Paper

GitHub - EverMind-AI/MSA

EverMind-AI

2026.03.19

·GitHub·by 네루

#Attention#LLM#Memory#RAG#Scalability

Key Points

1Memory Sparse Attention (MSA) is an end-to-end trainable, scalable sparse latent-memory framework designed to overcome LLM context length limitations, enabling efficient processing of up to 100M tokens.
2MSA achieves this through innovations like scalable sparse attention with document-wise RoPE for extrapolation, KV cache compression via a Memory Parallel inference engine, and Memory Interleave for multi-round reasoning.
3Evaluated on long-context QA and Needle-in-a-Haystack benchmarks, MSA consistently outperforms RAG and other leading long-context models, exhibiting remarkable stability with less than 9% degradation from 16K to 100M tokens.

O(L)

Paper

EverMind-AI

2026.03.19

·GitHub·by 네루

#Attention#LLM#Memory#RAG#Scalability

1Memory Sparse Attention (MSA) is an end-to-end trainable, scalable sparse latent-memory framework designed to overcome LLM context length limitations, enabling efficient processing of up to 100M tokens.
2MSA achieves this through innovations like scalable sparse attention with document-wise RoPE for extrapolation, KV cache compression via a Memory Parallel inference engine, and Memory Interleave for multi-round reasoning.
3Evaluated on long-context QA and Needle-in-a-Haystack benchmarks, MSA consistently outperforms RAG and other leading long-context models, exhibiting remarkable stability with less than 9% degradation from 16K to 100M tokens.

O(L)