Paper

End-to-End Test-Time Training for Long Context

Marcel Rød

2026.01.31

· arXiv · by 네루

#LLM #Continual Learning #Test-Time Training #Transformer #Long Context

Key Points

1. This paper reframes long-context language modeling as a continual learning problem and combines a standard Transformer with Test-Time Training (TTT) to overcome the limitations of existing architectures.
2. The proposed TTT-E2E keeps the model learning at test time through next-token prediction, and uses meta-learning at training time to learn an initialization optimized for TTT, which makes the approach End-to-End.
3. As a result, TTT-E2E scales with context length much like a full attention Transformer while keeping inference latency constant regardless of context length, like an RNN, making it 2.7x faster at 128K context.

O(T^2)
