GitHub - deepseek-ai/EPLB: Expert Parallelism Load Balancer

deepseek-ai
2025.03.08
#LLM #MoE #LoadBalancing #DeepSeek #AI

Key Points

  1. The Expert Parallelism Load Balancer (EPLB) balances GPU loads in expert parallelism by employing a redundant experts strategy that duplicates heavy-loaded experts and heuristically packs them onto GPUs.
  2. It offers two policies: Hierarchical Load Balancing for prefilling, which balances expert groups across nodes and then replicates within each node, and Global Load Balancing for decoding, which replicates experts globally.
  3. Following DeepSeek-V3, EPLB also attempts to place experts of the same group on the same node to minimize inter-node data traffic, producing an expert replication and placement plan based on estimated expert loads.

The Expert Parallelism Load Balancer (EPLB) is an open-source algorithm designed to achieve load balancing across GPUs in systems utilizing expert parallelism (EP), as described in the DeepSeek-V3 paper. The fundamental problem it addresses is the variable load of different experts, which can lead to imbalanced GPU utilization if not managed effectively.

The core methodology of EPLB involves three primary strategies:

  1. Redundant Experts Strategy: Heavy-loaded experts are duplicated or "redundantly replicated" to distribute their computational burden across multiple computational units.
  2. Heuristic Packing: The replicated experts are then heuristically packed onto available GPUs to ensure an even distribution of workload.
  3. Group-Limited Expert Routing Consideration: For further optimization, EPLB attempts to place experts belonging to the same group on the same physical node whenever possible. This strategy is critical for minimizing inter-node data traffic, especially when using group-limited expert routing mechanisms.

The algorithm, implemented in eplb.py, computes an expert replication and placement plan based on estimated expert loads. Note that the exact method for predicting expert loads (e.g., a moving average of historical statistics) is outside the scope of EPLB itself.
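To make the redundant-experts and heuristic-packing ideas concrete, here is a minimal, illustrative sketch (not the actual eplb.py implementation): the heaviest logical experts are split into replicas until the replica budget is spent, and the resulting replicas are packed onto GPUs largest-first, each going to the currently least-loaded GPU. All function and variable names are hypothetical.

```python
import heapq

def replicate_heaviest(loads, num_replicas):
    """Split the heaviest logical experts until num_replicas physical experts exist.

    Each physical replica of expert i is assumed to carry loads[i] / replica_count.
    """
    counts = [1] * len(loads)
    # Max-heap keyed on per-replica load (negated for heapq's min-heap).
    heap = [(-loads[i], i) for i in range(len(loads))]
    heapq.heapify(heap)
    for _ in range(num_replicas - len(loads)):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (-loads[i] / counts[i], i))
    return counts

def pack_to_gpus(loads, counts, num_gpus):
    """Greedy packing: place each replica on the currently least-loaded GPU."""
    replicas = []  # (per-replica load, logical expert id)
    for i, c in enumerate(counts):
        replicas += [(loads[i] / c, i)] * c
    replicas.sort(reverse=True)  # largest first (classic LPT heuristic)
    gpu_load = [0.0] * num_gpus
    placement = [[] for _ in range(num_gpus)]
    for load, i in replicas:
        g = min(range(num_gpus), key=gpu_load.__getitem__)
        gpu_load[g] += load
        placement[g].append(i)
    return placement, gpu_load

# Toy example: 4 logical experts, 6 physical replicas, 2 GPUs.
counts = replicate_heaviest([90, 10, 40, 60], num_replicas=6)
placement, gpu_load = pack_to_gpus([90, 10, 40, 60], counts, num_gpus=2)
```

With these loads, experts 0 and 60 get a second replica, and the greedy pass ends with the two GPUs carrying nearly equal load; the real algorithm additionally respects group and node constraints.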

EPLB provides two distinct load balancing policies, chosen based on the deployment scenario:

  1. Hierarchical Load Balancing:
    • Applicability: This policy is activated when the total number of server nodes (num_nodes) evenly divides the total number of expert groups (num_groups). It is specifically designed to leverage the benefits of group-limited expert routing.
    • Methodology: The process is hierarchical and unfolds in three stages:
      • Stage 1: Expert Group to Node Packing: Expert groups are first evenly distributed and packed across the available server nodes. This step ensures an initial balancing of load at the node level, preventing any single node from becoming a bottleneck due to an uneven distribution of expert groups.
      • Stage 2: Intra-Node Expert Replication: Within each individual node, experts are replicated based on their estimated loads. This means that if an expert within a group is identified as heavy-loaded, it will be duplicated multiple times within that specific node's allocated resources.
      • Stage 3: Intra-Node Replicated Expert to GPU Packing: Finally, the replicated experts (generated in Stage 2) are packed onto the individual GPUs residing within that specific node. This ensures fine-grained load balancing across the GPUs associated with that node.
    • Use Case: This policy is typically suitable for the prefilling stage of large language models, where a smaller expert-parallel size is often employed, and leveraging expert group locality is beneficial.
  2. Global Load Balancing:
    • Applicability: This policy is used in all other scenarios where the conditions for hierarchical load balancing are not met (i.e., when num_nodes does not evenly divide num_groups).
    • Methodology: Unlike the hierarchical approach, this policy replicates experts globally, without regard to expert groups or node boundaries, and packs the specified total number of expert replicas (num_replicas) directly onto the individual GPUs across all nodes. This simplifies placement by treating all GPUs as a single pool.
    • Use Case: This policy is generally preferred for the decoding stage, which typically involves a larger expert-parallel size and may benefit from a more flexible, global distribution of experts.
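The policy choice and the first hierarchical stage can be sketched as follows. This is a hedged illustration under the assumptions stated in the text (hierarchical only when num_nodes evenly divides num_groups; groups spread evenly, heaviest-first, across nodes); the function names are hypothetical and not eplb.py's.

```python
def choose_policy(num_groups: int, num_nodes: int) -> str:
    # Hierarchical balancing applies only when groups divide evenly
    # across nodes; otherwise fall back to global balancing.
    return "hierarchical" if num_groups % num_nodes == 0 else "global"

def pack_groups_to_nodes(group_loads, num_nodes):
    """Stage 1 sketch: assign each expert group (heaviest first) to the
    least-loaded node, keeping the group count per node equal."""
    per_node = len(group_loads) // num_nodes
    order = sorted(range(len(group_loads)), key=lambda g: -group_loads[g])
    node_load = [0.0] * num_nodes
    node_groups = [[] for _ in range(num_nodes)]
    for g in order:
        # Only nodes with a free group slot are eligible.
        candidates = [n for n in range(num_nodes) if len(node_groups[n]) < per_node]
        n = min(candidates, key=lambda c: node_load[c])
        node_groups[n].append(g)
        node_load[n] += group_loads[g]
    return node_groups

# 4 expert groups over 2 nodes: hierarchical policy applies.
policy = choose_policy(num_groups=4, num_nodes=2)
nodes = pack_groups_to_nodes([30.0, 10.0, 50.0, 20.0], num_nodes=2)
```

Stages 2 and 3 would then replicate heavy experts within each node and pack the replicas onto that node's GPUs, analogous to the global packing sketch above but scoped per node.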

The main interface for the load balancer is the function eplb.rebalance_experts, which takes weight (estimated expert loads), num_replicas (total desired expert replicas), num_groups, num_nodes, and num_gpus as inputs. It returns phy2log, log2phy, and logcnt, representing the computed expert replication and placement plan.
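The relationship between the three outputs can be illustrated with a toy mapping. The values below are made up for illustration, not produced by eplb; the interpretation (phy2log maps physical expert slots to logical expert ids, log2phy the reverse, logcnt the replica counts) follows the description above.

```python
# phy2log[i] = logical expert id served by physical expert slot i.
phy2log = [0, 1, 2, 2, 3, 0]   # 6 physical slots, 4 logical experts

# logcnt[e] = number of physical replicas of logical expert e.
logcnt = [phy2log.count(e) for e in range(max(phy2log) + 1)]

# log2phy[e] = physical slots hosting logical expert e.
log2phy = {e: [i for i, l in enumerate(phy2log) if l == e]
           for e in range(len(logcnt))}
```

Here experts 0 and 2 each get two replicas (logcnt = [2, 1, 2, 1]), so their load is split across two slots, while experts 1 and 3 keep a single slot each.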