
mHC: Manifold-Constrained Hyper-Connections
Key Points
- Hyper-Connections (HC) enhance model performance by expanding the residual stream, but their unconstrained nature compromises identity mapping, leading to training instability and significant memory access overhead.
- Manifold-Constrained Hyper-Connections (mHC) address this by projecting HC's residual mapping onto the manifold of doubly stochastic matrices, restoring the identity mapping property and ensuring stable signal propagation through norm preservation.
- In addition, mHC incorporates rigorous infrastructure optimizations such as kernel fusion and selective recomputation, enabling efficient and stable large-scale training with tangible performance improvements and superior scalability.
Manifold-Constrained Hyper-Connections (mHC) addresses the numerical instability and high memory access overhead encountered in Hyper-Connections (HC) by restoring the identity mapping property inherent to residual connections.
The standard residual connection is formulated as $\mathbf{x}_{l+1} = \mathbf{x}_l + \mathcal{F}_l(\mathbf{x}_l)$, which offers an identity mapping over multiple layers, ensuring stable signal propagation. HC extends this paradigm by expanding the residual stream width from $C$ to $n \times C$ and introducing learnable mappings: $\mathbf{x}_{l+1} = H^{\text{res}}_l \mathbf{x}_l + (H^{\text{post}}_l)^{\top} \mathcal{F}_l(H^{\text{pre}}_l \mathbf{x}_l)$, where $\mathbf{x}_l \in \mathbb{R}^{n \times C}$, $H^{\text{res}}_l \in \mathbb{R}^{n \times n}$, and $H^{\text{pre}}_l, H^{\text{post}}_l \in \mathbb{R}^{1 \times n}$. While this enhances topological complexity and performance, HC's unconstrained nature compromises the identity mapping property. Recursively, the signal propagation from layer $l_1$ to layer $l_2$ is governed by the composite mapping $\prod_{l=l_1}^{l_2-1} H^{\text{res}}_l$. Since $H^{\text{res}}_l$ is unconstrained, this composite mapping can lead to unbounded signal amplification or attenuation, causing training instability and exploding/vanishing gradients, particularly in large-scale models. HC also introduces significant memory access overhead proportional to the expansion rate $n$, due to the widened residual stream and storage of intermediate activations.
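To make the instability concrete, here is a minimal NumPy sketch (values are illustrative, not the paper's setup) of how even a mild, consistent per-layer gain in an unconstrained residual mapping compounds exponentially with depth:

```python
import numpy as np

n, depth = 4, 64  # expansion rate n and number of layers (both illustrative)

# With unconstrained residual mappings, signal propagation across layers
# is governed by the product of the per-layer matrices H_res. Even a
# mild, consistent per-layer gain compounds exponentially with depth.
H_res = 1.05 * np.eye(n)                      # each layer amplifies by 5%
composite = np.linalg.matrix_power(H_res, depth)

gain = np.linalg.norm(composite, 2)           # spectral norm of the composite
print(gain)                                   # 1.05 ** 64, roughly 22.7
```

A per-layer gain of 0.95 would instead attenuate the signal by a comparable factor, which is the vanishing-gradient side of the same problem.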
mHC proposes to project the residual connection space of HC onto a specific manifold. The core methodology involves constraining the residual mapping to be a doubly stochastic matrix. Formally, $H^{\text{res}}_l \in \mathcal{M}_{\text{DS}}$, where $\mathcal{M}_{\text{DS}} = \{ M \in \mathbb{R}^{n \times n} \mid M_{ij} \ge 0,\; M\mathbf{1} = \mathbf{1},\; M^{\top}\mathbf{1} = \mathbf{1} \}$. This manifold constraint ensures:
- Norm Preservation: The spectral norm of a doubly stochastic matrix is bounded by 1 ($\|H^{\text{res}}_l\|_2 \le 1$), mitigating gradient explosion.
- Compositional Closure: The product of doubly stochastic matrices is also doubly stochastic, ensuring that the composite mapping remains stable across arbitrary depths.
- Feature Fusion: By the Birkhoff–von Neumann theorem, doubly stochastic matrices are convex combinations of permutation matrices, promoting robust information mixing across residual streams.
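These properties are easy to verify numerically. The NumPy sketch below (illustrative values) builds doubly stochastic matrices as convex mixes of permutation matrices and checks closure under composition and the spectral-norm bound:

```python
import numpy as np

n = 4
I = np.eye(n)
P = np.roll(I, 1, axis=0)        # a cyclic permutation matrix

# Convex combinations of permutation matrices are doubly stochastic
# (Birkhoff-von Neumann); A and B stand in for two layers' H_res.
A = 0.7 * I + 0.3 * P
B = 0.5 * I + 0.5 * P

C = A @ B                        # composite mapping over two layers

print(C.sum(axis=0), C.sum(axis=1))   # closure: all row/column sums are 1
print(np.linalg.norm(C, 2))           # norm preservation: spectral norm <= 1
```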
For parameterization and manifold projection, given the flattened input $\mathbf{x}_l' = \text{vec}(\mathbf{x}_l) \in \mathbb{R}^{nC}$, intermediate dynamic mappings are first computed:
$\begin{cases}
\tilde{H}^{\text{pre}}_l = \alpha^{\text{pre}}_l \cdot (\mathbf{x}_l' \phi^{\text{pre}}_l) + b^{\text{pre}}_l \\
\tilde{H}^{\text{post}}_l = \alpha^{\text{post}}_l \cdot (\mathbf{x}_l' \phi^{\text{post}}_l) + b^{\text{post}}_l \\
\tilde{H}^{\text{res}}_l = \alpha^{\text{res}}_l \cdot \text{mat}(\mathbf{x}_l' \phi^{\text{res}}_l) + b^{\text{res}}_l
\end{cases}$
where $\phi^{\text{pre}}_l, \phi^{\text{post}}_l, \phi^{\text{res}}_l$ are linear projections, $\alpha^{\text{pre}}_l, \alpha^{\text{post}}_l, \alpha^{\text{res}}_l$ are learnable scaling factors, and $b^{\text{pre}}_l, b^{\text{post}}_l, b^{\text{res}}_l$ are learnable biases. The final constrained mappings are then obtained via:
$\begin{cases}
H^{\text{pre}}_l = \sigma(\tilde{H}^{\text{pre}}_l) \\
H^{\text{post}}_l = 2\sigma(\tilde{H}^{\text{post}}_l) \\
H^{\text{res}}_l = \text{Sinkhorn-Knopp}(\tilde{H}^{\text{res}}_l)
\end{cases}$
Here $\sigma$ denotes the element-wise sigmoid. The Sinkhorn-Knopp operator first applies an element-wise exponentiation $M^{(0)} = \exp(\tilde{H}^{\text{res}}_l)$, followed by an iterative normalization process $M^{(t)} = \mathcal{T}_c(\mathcal{T}_r(M^{(t-1)}))$, where $\mathcal{T}_r$ and $\mathcal{T}_c$ denote row and column normalization, respectively, ensuring the resulting matrix is doubly stochastic.
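A minimal NumPy sketch of this projection (the function name and iteration count are illustrative; the paper implements the iteration as a fused kernel):

```python
import numpy as np

def sinkhorn_knopp(H_tilde, num_iters=20):
    """Map an unconstrained n x n matrix to an (approximately) doubly
    stochastic one via exponentiation + alternating normalization."""
    M = np.exp(H_tilde)                       # element-wise exp: positive entries
    for _ in range(num_iters):
        M = M / M.sum(axis=1, keepdims=True)  # T_r: row normalization
        M = M / M.sum(axis=0, keepdims=True)  # T_c: column normalization
    return M

rng = np.random.default_rng(0)
H_res = sinkhorn_knopp(rng.standard_normal((4, 4)))
print(H_res.sum(axis=1))  # row sums converge toward 1
print(H_res.sum(axis=0))  # column sums are exactly 1 after the last step
```

Because the last step is a column normalization, column sums are exact and row sums converge geometrically with the number of iterations.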
To ensure efficiency, mHC incorporates rigorous infrastructure optimizations:
- Kernel Fusion: RMSNorm is reordered to follow the matrix multiplication for efficiency, and mixed-precision strategies are employed. Multiple operations that share memory access are fused into unified compute kernels to reduce memory-bandwidth bottlenecks; for instance, two successive scans are fused into one kernel, and the backward pass is similarly consolidated. Lightweight operations on small coefficients are fused to reduce kernel launch overhead, and the Sinkhorn-Knopp iteration is implemented within a single kernel, with a custom backward kernel that recomputes intermediate results on-chip.
- Recomputing: Intermediate activations of mHC kernels are discarded after the forward pass and recomputed on-the-fly during the backward pass. This significantly reduces GPU memory footprint by avoiding storage of heavy layer function activations, requiring storage only of the input for a block of consecutive layers.
- Overlapping Communication: Communication is carefully overlapped within the DualPipe schedule to mitigate the $n$-fold increase in communication cost under pipeline parallelism.
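The recomputing strategy above can be sketched in plain NumPy (a toy ReLU(Wx) layer with manual gradients, purely illustrative; the actual system applies this to fused mHC kernels): store only the block input, then rebuild the intermediate activations during the backward pass.

```python
import numpy as np

def layer(x, W):
    # Toy stand-in for a heavy layer function: ReLU(W @ x).
    return np.maximum(W @ x, 0.0)

def block_forward(x0, Ws):
    # Forward through a block of consecutive layers, discarding all
    # intermediate activations: only the block input x0 is retained.
    x = x0
    for W in Ws:
        x = layer(x, W)
    return x

def block_backward(x0, Ws, grad_out):
    # Recompute the intermediate activations on the fly, then
    # backpropagate through the block as usual.
    acts = [x0]
    for W in Ws:
        acts.append(layer(acts[-1], W))
    g = grad_out
    grads_W = []
    for W, x_in, x_out in zip(reversed(Ws), reversed(acts[:-1]), reversed(acts[1:])):
        g = g * (x_out > 0)          # ReLU derivative
        grads_W.append(np.outer(g, x_in))
        g = W.T @ g                  # gradient w.r.t. the layer input
    return g, grads_W[::-1]

rng = np.random.default_rng(1)
Ws = [0.5 * rng.standard_normal((5, 5)) for _ in range(3)]
x0 = rng.standard_normal(5)
out = block_forward(x0, Ws)
grad_x0, grads_W = block_backward(x0, Ws, np.ones_like(out))
```

Only `x0` must be kept between the forward and backward passes; the `acts` list exists transiently inside `block_backward`, trading extra FLOPs for a smaller memory footprint.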
Empirical experiments demonstrate that mHC offers exceptional stability and scalability, preserving HC's performance advantages with only a 6.7% additional training-time overhead at an expansion rate of $n = 4$.