GitHub - maderix/ANE: Training neural networks on Apple Neural Engine via reverse-engineered private APIs

maderix
2026.03.03
· GitHub · by 이호민
#ANE #Apple Neural Engine #Machine Learning #Performance Benchmarking #Reverse Engineering

Key Points

  • This research project reverse-engineers Apple Neural Engine (ANE) private APIs to enable direct neural network training on Apple Silicon, demonstrating that the ANE is capable of computation beyond inference.
  • By using the undocumented `_ANEClient` and `_ANECompiler` APIs and generating custom MIL programs, the project executes full backpropagation for a transformer layer, with some operations handled by the ANE and others by the CPU.
  • The project is a proof of concept, achieving 11.2% ANE utilization (1.78 TFLOPS sustained) on an M4 chip for a single transformer layer, while acknowledging current limitations such as low peak utilization and partial CPU fallback.

This research project, titled "ANE Training – Backpropagation on Apple Neural Engine," demonstrates the feasibility of training neural networks directly on Apple's Neural Engine (ANE) via reverse-engineered private APIs. It deliberately bypasses Apple's inference-only restriction for the ANE, operating without CoreML, Metal, or the GPU and relying solely on ANE compute. The primary goal is to show that the ANE, a 15.8 TFLOPS (M4) inference accelerator, is hardware-capable of training, and that software support has been the historical barrier.

The core methodology involves reverse-engineering Apple's private, undocumented APIs, specifically `_ANEClient` and `_ANECompiler`, along with the Model Intermediate Language (MIL) format. This enables execution of custom compute graphs, including backpropagation, directly on the ANE hardware.

The training loop implements a single transformer layer (dimension = 768, sequence length = 512). It achieves 9.3 ms/step with 11.2% ANE utilization (1.78 TFLOPS sustained) on an M4 chip. Each training step issues 6 ANE kernel dispatches covering both the forward and backward passes.
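
The reported figures can be sanity-checked with simple arithmetic (numbers taken from the article; nothing here is project code):

```python
# Cross-check the reported throughput and utilization numbers.
peak_tflops = 15.8        # M4 ANE peak (FP16), per the article
sustained_tflops = 1.78   # reported sustained throughput
step_seconds = 9.3e-3     # reported time per training step

utilization = sustained_tflops / peak_tflops
flop_per_step = sustained_tflops * 1e12 * step_seconds  # work done per step

print(f"utilization: {utilization:.1%}")          # matches the ~11% claim
print(f"FLOP per step: {flop_per_step/1e9:.1f} GFLOP")
```

The utilization claim (1.78 / 15.8 ≈ 11.3%) is consistent with the reported 11.2% figure.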

The architecture decomposes the transformer training into the following ANE kernels:

  1. kFwdAttn: Performs RMSNorm, QKV projection, Scaled Dot-Product Attention (SDPA), and output projection.
  2. kFwdFFN: Executes RMSNorm and the SwiGLU Feed-Forward Network (FFN).
  3. kFFNBwd: Computes the FFN backward pass (W2^T + SiLU backward + W1^T + W3^T).
  4. kSdpaBwd1: Handles Wo^T and the first part of the SDPA backward pass (computing ∂V, ∂probs, ∂dp).
  5. kSdpaBwd2: Completes the SDPA backward pass, including the softmax gradient, ∂Q, and ∂K.
  6. kQKVb: Performs the QKV backward pass, calculating Wq^T + Wk^T + Wv^T → ∂x.

The CPU manages RMSNorm backward computations, residual connections, loss computation, weight gradient accumulation (via cblas_sgemm), and Adam optimizer updates.
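
As a rough illustration of the CPU side (a NumPy sketch under assumed shapes, not the project's actual Objective-C code), the weight-gradient GEMM that `cblas_sgemm` performs and a standard Adam update look like this:

```python
import numpy as np

# Toy shapes (hypothetical, for illustration only).
rng = np.random.default_rng(0)
S, D = 8, 4                                            # sequence length, model dim
x  = rng.standard_normal((S, D)).astype(np.float32)    # layer input activation
dy = rng.standard_normal((S, D)).astype(np.float32)    # upstream gradient from the ANE
W  = rng.standard_normal((D, D)).astype(np.float32)    # weight matrix

# Weight gradient: the role played by cblas_sgemm on the CPU (dW = dy^T @ x).
dW = dy.T @ x

# One standard Adam step on the accumulated gradient.
m = np.zeros_like(W)
v = np.zeros_like(W)
lr, b1, b2, eps, t = 1e-3, 0.9, 0.999, 1e-8, 1
m = b1 * m + (1 - b1) * dW
v = b2 * v + (1 - b2) * dW**2
m_hat = m / (1 - b1**t)
v_hat = v / (1 - b2**t)
W -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

After such an update, the new weights are re-embedded into the ANE program for the next batch (see the recompilation step below).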

Key optimizations implemented to improve performance include:

  • Channel-first CPU layout: Matches the ANE IOSurface [1, C, 1, S] format, eliminating transpose overhead.
  • vDSP vectorized RMSNorm: Accelerates the RMSNorm computation significantly.
  • GCD async cblas overlap: dW gradient sgemm calls run on a serial dispatch queue, overlapping with ANE evaluations.
  • Deferred cblas wait: The wait for cblas completion is pushed into the next step's forward pass to maximize overlap.
  • ANE RMSNorm fusion: RMSNorm operations are folded into the forward kernels as MIL operations (reduce_sum + pow + mul).
  • Wo^T fusion: The output-projection backward operation is merged into the SDPA backward kernel, reducing the kernel count from 7 to 6.
  • Forward taps: Intermediate activations (Q, K, V, attention scores, hidden states) are exposed as concatenated outputs to avoid CPU recomputation.
  • exec() restart: A checkpoint-and-restart workaround for an approximately 119-compiles-per-process ANE limit.
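
For reference, the RMSNorm that both the vDSP path and the fused MIL ops (reduce_sum + pow + mul) compute can be written as a minimal NumPy sketch (illustrative, not the project's code):

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    """RMSNorm over the last axis: x / rms(x) * gain.
    Mirrors the reduce_sum -> pow(-0.5) -> mul decomposition used in the MIL fusion."""
    ms = np.mean(x * x, axis=-1, keepdims=True)  # reduce_sum(x^2) / n
    return x * (ms + eps) ** -0.5 * gain         # pow, then elementwise mul

# Shapes matching the article's layer: sequence 512, dimension 768.
x = np.random.default_rng(1).standard_normal((512, 768)).astype(np.float32)
y = rmsnorm(x, np.ones(768, dtype=np.float32))
```

With unit gain, each output row has an RMS of approximately 1, which is easy to check when validating a vectorized implementation against a reference.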

The system's operation involves:

  1. MIL Generation: Objective-C code dynamically constructs the MIL program text at runtime. This specifies low-level operations such as convolutions (for linear layers), matrix multiplications (matmul) for attention, softmax, and element-wise operations.
  2. In-Memory Compilation: The `_ANEInMemoryModelDescriptor` API compiles the generated MIL text and associated weight blobs directly into executable ANE programs in memory, avoiding the need for on-disk .mlmodelc files.
  3. IOSurface I/O: Input and output tensors are exchanged with the ANE via IOSurface shared memory, adhering to a [1, channels, 1, spatial] format using FP16 precision.
  4. Weight Embedding: Weights are baked into the ANE programs as BLOBFILE constants. When weights are updated (e.g., after an optimizer step), the ANE programs are recompiled for the next batch.
  5. Gradient Flow: Intermediate results from the forward pass ("forward taps") are exposed as additional outputs of the ANE kernels and consumed by subsequent backward kernels. Input gradients (∂x) are computed on the ANE, while weight gradients (∂W) are offloaded to the CPU and computed using Apple's Accelerate framework (cblas).
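
The [1, channels, 1, spatial] FP16 layout in step 3 can be sketched as a pure array transform (a NumPy illustration; the real path writes into IOSurface shared memory rather than a NumPy buffer):

```python
import numpy as np

# A (sequence, channels) FP32 activation, as the CPU side might hold it.
S, C = 512, 768
act = np.random.default_rng(2).standard_normal((S, C)).astype(np.float32)

# Channel-first packing: transpose (S, C) -> (C, S), then shape to [1, C, 1, S]
# and convert to FP16, matching the ANE's IOSurface tensor format.
packed = act.T.reshape(1, C, 1, S).astype(np.float16)

# Round-trip back to the CPU layout (loses only FP16 precision).
unpacked = packed.reshape(C, S).T.astype(np.float32)
```

Keeping the CPU-side buffers in this channel-first layout from the start is what the "channel-first CPU layout" optimization above refers to: the transpose disappears entirely.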

Current limitations include the ANE hardware ignoring the attn_mask in SDPA operations, which forces causal attention to be decomposed: Q@K^T on the ANE, masking on the CPU, then softmax and scores@V back on the ANE. The ~119-compile limit is worked around by exec() restarts with checkpointing. The project currently trains only a single transformer layer and uses synthetic data for benchmarking, with support for real tokenized data under development.
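
The masking workaround amounts to the following four-stage pipeline (a NumPy sketch; on the real system stages 1, 3, and 4 run on the ANE and only the masking runs on the CPU):

```python
import numpy as np

rng = np.random.default_rng(3)
S, d = 6, 4  # toy sequence length and head dimension
Q, K, V = (rng.standard_normal((S, d)).astype(np.float32) for _ in range(3))

# Stage 1 (ANE): raw scaled attention scores, Q @ K^T.
scores = Q @ K.T / np.sqrt(d)

# Stage 2 (CPU): apply the causal mask manually, since the ANE ignores attn_mask.
mask = np.triu(np.ones((S, S), dtype=bool), k=1)  # future positions
scores[mask] = -np.inf

# Stage 3 (ANE): numerically stable softmax over the key axis.
probs = np.exp(scores - scores.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)

# Stage 4 (ANE): weighted sum of values, scores @ V.
out = probs @ V
```

The extra ANE round-trips for the CPU masking step are part of why peak utilization remains low on attention-heavy workloads.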

This project is research-oriented, using private and undocumented Apple APIs under fair use for educational and interoperability purposes. It is explicitly not a production framework or a replacement for existing ML libraries.