ELDR is a new expert-locality-aware decode router for prefill-decode (PD) disaggregated MoE serving. Whereas traditional routers focus solely on load balancing, ELDR uses the expert activations observed during prefill to route requests according to expert locality. This targets the latency variance that arises within decode workers when distinct experts require their weights to be loaded.
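
To make the idea of locality-aware routing concrete, here is a minimal sketch. It is not ELDR's actual algorithm, which this summary does not describe. It assumes each decode worker tracks the set of experts its running batch already activates, and that a request's prefill activations are available as a set of expert IDs. The names `DecodeWorker`, `route`, and `load_weight`, and the cost formula itself, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DecodeWorker:
    """Hypothetical decode worker; the fields are illustrative assumptions."""
    name: str
    # Experts already activated by the worker's running batch (assumed known).
    active_experts: set[int] = field(default_factory=set)
    num_requests: int = 0


def route(prefill_experts: set[int],
          workers: list[DecodeWorker],
          load_weight: float = 0.5) -> DecodeWorker:
    """Pick the worker with the lowest assumed cost.

    Cost = number of experts the worker would newly have to load for this
    request, plus a load penalty proportional to its current request count.
    """
    def cost(w: DecodeWorker) -> float:
        new_experts = len(prefill_experts - w.active_experts)
        return new_experts + load_weight * w.num_requests

    best = min(workers, key=cost)
    # Record the placement so later decisions see the updated state.
    best.active_experts |= prefill_experts
    best.num_requests += 1
    return best


# Usage: w1 is less loaded, but w0 already holds most of the needed experts.
workers = [
    DecodeWorker("w0", {0, 1, 2, 3}, num_requests=2),
    DecodeWorker("w1", {4, 5, 6, 7}, num_requests=1),
]
chosen = route({1, 2, 3, 8}, workers)
print(chosen.name)
```

In this example a purely load-balancing router would pick `w1`, while the locality-aware cost prefers `w0` because only one additional expert would need to be loaded there. The `load_weight` parameter is an assumed knob for trading locality against load.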
