GitHub - deepseek-ai/profile-data: Analyze computation-communication overlap in V3/R1.
Key Points
- DeepSeek-AI publicly shares profiling data from its training and inference framework, captured with the PyTorch Profiler, to demonstrate its communication-computation overlap strategies and low-level implementation details.
- The training profile illustrates the overlapping strategy for DualPipe forward and backward chunks, each containing four MoE layers, under an EP64/TP1 configuration; PP communication is excluded.
- The inference profiles for prefilling and decoding use two micro-batches to overlap computation with all-to-all communication; prefilling balances the attention load across the micro-batches, while decoding frees GPU SMs during all-to-all operations.
The DeepSeek-AI team publicly shares profiling data, captured with the PyTorch Profiler, from their training and inference framework, aiming to provide insight into their communication-computation overlap strategies and low-level implementation details. The traces can be visualized by opening chrome://tracing (or edge://tracing) in a browser. One simplifying assumption applies to all profiles: a perfectly balanced Mixture-of-Experts (MoE) routing strategy is simulated.
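Because the traces are plain Chrome-trace-format JSON, they can also be inspected programmatically rather than only in the browser. A minimal sketch of that idea (the event names and durations below are invented for illustration, not taken from the released profiles):

```python
import json

# Minimal Chrome-trace-format sample (names and timings invented; real
# PyTorch Profiler traces contain many more event fields and categories).
raw = '''{"traceEvents": [
  {"name": "attention",  "ph": "X", "ts": 0,   "dur": 120, "pid": 0, "tid": 0},
  {"name": "all_to_all", "ph": "X", "ts": 40,  "dur": 60,  "pid": 0, "tid": 1},
  {"name": "moe_mlp",    "ph": "X", "ts": 120, "dur": 200, "pid": 0, "tid": 0}
]}'''
trace = json.loads(raw)

def total_durations(trace_obj):
    """Sum the durations of 'X' (complete) events per name, in microseconds."""
    totals = {}
    for ev in trace_obj["traceEvents"]:
        if ev.get("ph") == "X":
            totals[ev["name"]] = totals.get(ev["name"], 0) + ev["dur"]
    return totals

print(total_durations(trace))
# {'attention': 120, 'all_to_all': 60, 'moe_mlp': 200}
```

For a downloaded trace file, replace the inline string with `json.load(open(path))`; the same per-name summation then gives a quick picture of how much time is spent in compute versus communication events.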
The Training Profile demonstrates the overlapping strategy for an individual pair of forward and backward chunks within the DualPipe architecture, where each chunk comprises four MoE layers. The parallel configuration matches the DeepSeek-V3 pretraining settings: Expert Parallelism (EP) of 64, Tensor Parallelism (TP) of 1, and a sequence length of 4K. For simplicity, Pipeline Parallelism (PP) communication is excluded from this profile.
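The payoff of hiding one chunk's all-to-all under the paired chunk's computation can be pictured with a toy cost model; the per-layer timings and the simple max(compute, comm) overlap model below are invented simplifications, not measurements from the profile:

```python
def chunk_pair_time(compute_ms, comm_ms, layers, overlap):
    """Idealized per-device time for one forward+backward chunk pair.

    compute_ms: combined fwd+bwd compute per MoE layer (invented)
    comm_ms:    all-to-all (dispatch + combine) per MoE layer (invented)
    overlap:    if True, one chunk's communication runs under the other
                chunk's computation, so each layer costs max(compute, comm);
                if False, every layer pays compute then communication.
    """
    per_layer = max(compute_ms, comm_ms) if overlap else compute_ms + comm_ms
    return layers * per_layer

# With the four MoE layers per chunk mentioned above (timings invented):
print(chunk_pair_time(3.0, 2.5, 4, overlap=False))  # 22.0
print(chunk_pair_time(3.0, 2.5, 4, overlap=True))   # 12.0
```

In this toy model, overlap reduces the chunk-pair time from compute + comm to whichever of the two is larger, which is the qualitative effect the DualPipe trace is meant to show.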
The Inference Prefilling Profile reflects DeepSeek V3/R1's actual online deployment settings: EP32 and TP1, a prompt length of 4K, and a batch size of 16K tokens per GPU. Computation and all-to-all communication are overlapped by interleaving two micro-batches, with the attention computation load balanced between them, which may require splitting a single prompt across the two micro-batches.
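One simple way to picture the balancing step is a greedy longest-first partition of prompts into two micro-batches under an assumed attention cost that grows quadratically with prompt length; both the cost model and the prompt lengths are invented, and this sketch omits the prompt-splitting step the profile describes:

```python
def balance_two_microbatches(prompt_lens):
    """Greedy longest-first assignment of prompts to two micro-batches,
    balancing an assumed quadratic attention cost (cost model invented).
    Returns (prompts_a, cost_a), (prompts_b, cost_b)."""
    batch_a, batch_b = [], []
    cost_a = cost_b = 0
    for n in sorted(prompt_lens, reverse=True):
        if cost_a <= cost_b:
            batch_a.append(n)
            cost_a += n * n
        else:
            batch_b.append(n)
            cost_b += n * n
    return (batch_a, cost_a), (batch_b, cost_b)

# Invented prompt lengths: one long prompt dominates the attention cost.
(a, ca), (b, cb) = balance_two_microbatches([4096, 2048, 2048, 1024, 512, 512])
print(a, b)  # [4096] [2048, 2048, 1024, 512, 512]
```

The example also illustrates why splitting a single prompt can be necessary: one 4K prompt alone can outweigh several shorter ones, so token-level splitting is the only way to equalize the two micro-batches exactly.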
The Inference Decoding Profile likewise matches the actual online deployment configuration: EP128 and TP1, a prompt length of 4K, and a batch size of 128 requests per GPU. As in prefilling, two micro-batches are used to overlap computation with all-to-all communication. The critical difference is that during decoding, all-to-all communication does not occupy GPU Streaming Multiprocessors (SMs): once the Remote Direct Memory Access (RDMA) messages are issued, all SMs are freed, and the system waits for the all-to-all to complete only after the computation has finished. Further details of the all-to-all implementation can be found in DeepEP.
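The issue-then-wait pattern described above can be mimicked on the host, with a background thread standing in for the SM-free RDMA transfer; everything here (the timings, the thread-pool stand-in, the function names) is illustrative and is not DeepEP's implementation:

```python
import concurrent.futures
import time

def rdma_all_to_all(duration_s):
    # Stand-in for an RDMA all-to-all: it runs off the main thread,
    # just as the real transfer runs without occupying GPU SMs.
    time.sleep(duration_s)

def decode_step(compute_s, comm_s):
    """Toy version of the decoding schedule: issue the all-to-all,
    immediately return to computation, and only wait for the
    communication once compute has finished."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        comm = pool.submit(rdma_all_to_all, comm_s)  # issue; "SMs" freed
        time.sleep(comm_s and compute_s)             # compute proceeds
        comm.result()                                # wait after compute

start = time.perf_counter()
decode_step(0.05, 0.04)
elapsed = time.perf_counter() - start
# Overlapped: ~max(50, 40) ms rather than the serial 90 ms.
print(f"step took ~{elapsed * 1000:.0f} ms")
```

Because the communication finishes under the computation, the step costs roughly max(compute, comm) instead of their sum, without ever borrowing SMs from the compute kernels.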