Netflix is committed to providing a seamless streaming experience to millions of users simultaneously. To achieve this, it introduced prioritized load shedding, implemented first at the API gateway level and later extended to individual services, focusing on the video streaming control plane and data plane.
PlayAPI, a critical backend service on the video streaming control plane, handles device-initiated manifest and license requests necessary to start playback. Netflix categorizes these requests into two types based on their criticality: user-initiated requests, which are required for playback to begin, and prefetch requests, which devices issue speculatively before the user presses play.
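As a rough illustration of that split, the sketch below classifies incoming requests into the two buckets. The `RequestType` enum, the `RequestClassifier` helper, and the `x-request-intent` header are hypothetical stand-ins for however devices actually signal intent, not Netflix's real API.

```java
import java.util.Map;

// Hypothetical sketch: classifying incoming PlayAPI-style requests by criticality.
enum RequestType {
    USER_INITIATED,  // user pressed play; required for playback to start
    PREFETCH         // speculative request issued by the device ahead of time
}

final class RequestClassifier {
    /** Classify a request based on an assumed client-supplied hint header. */
    static RequestType classify(Map<String, String> headers) {
        // Assume devices tag speculative traffic; default to the critical path otherwise.
        return "prefetch".equalsIgnoreCase(headers.getOrDefault("x-request-intent", ""))
                ? RequestType.PREFETCH
                : RequestType.USER_INITIATED;
    }
}
```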
To handle large traffic spikes, high backend latency, or an under-scaled backend service, Netflix implemented a concurrency limiter within PlayAPI that prioritizes user-initiated requests over prefetch requests without physically sharding the two request handlers. This mechanism uses the partitioning functionality of the open-source Netflix/concurrency-limits Java library.
In steady state, there is no throttling, and the prioritization has no effect on the handling of prefetch requests. The prioritization mechanism only kicks in when a server is at its concurrency limit and needs to reject requests.
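A minimal sketch of this idea follows, reusing the `RequestType` enum from the earlier sketch and an illustrative rule that prefetch traffic may only occupy a configurable share of the concurrency limit. The real implementation relies on the partitioning support in Netflix/concurrency-limits rather than a hand-rolled limiter like this one.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, illustrative priority-aware concurrency limiter (not the actual
// Netflix/concurrency-limits API). The limit and prefetch share are arbitrary.
final class PrioritizedConcurrencyLimiter {
    private final int limit;         // total in-flight requests the server will accept
    private final int prefetchCap;   // portion of the limit prefetch traffic may occupy
    private final AtomicInteger inFlight = new AtomicInteger();

    PrioritizedConcurrencyLimiter(int limit, double prefetchShare) {
        this.limit = limit;
        this.prefetchCap = (int) (limit * prefetchShare);
    }

    /** Try to admit a request; returns false if it should be shed. */
    boolean tryAcquire(RequestType type) {
        while (true) {
            int current = inFlight.get();
            int cap = (type == RequestType.PREFETCH) ? prefetchCap : limit;
            if (current >= cap) {
                return false;        // at the limit for this request type: shed it
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;         // below the limit: admitted without throttling
            }
        }
    }

    /** Must be called when a previously admitted request completes. */
    void release() {
        inFlight.decrementAndGet();
    }
}
```

Below the limit nothing is rejected, matching the steady-state behavior described above; only as in-flight requests approach the limit do prefetch requests start being shed while user-initiated requests continue to be admitted.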
To validate the load-shedding implementation, Netflix used Failure Injection Testing (FIT) to inject 2 seconds of latency into prefetch calls, simulating a scenario where a prefetch cluster for a downstream service is experiencing high latency.
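FIT itself is an internal Netflix platform, but the effect of such a test can be pictured as wrapping the downstream prefetch call in an artificial delay. Everything in the sketch below (the class name and the `Supplier`-based client) is illustrative, not part of FIT.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Illustrative stand-in for latency injection: delay the downstream prefetch
// call so the load-shedding path can be exercised under test.
final class LatencyInjectingClient<T> {
    private final Supplier<T> delegate;        // the real downstream call
    private final Duration injectedLatency;    // e.g. Duration.ofSeconds(2)

    LatencyInjectingClient(Supplier<T> delegate, Duration injectedLatency) {
        this.delegate = delegate;
        this.injectedLatency = injectedLatency;
    }

    T call() throws InterruptedException {
        Thread.sleep(injectedLatency.toMillis()); // simulate a slow downstream dependency
        return delegate.get();
    }
}
```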
After an infrastructure outage that impacted streaming for many users was fixed, Netflix experienced a 12x spike in prefetch requests per second from Android devices, presumably due to a backlog of queued requests.
Based on the success of this approach, Netflix created an internal library that enables services to perform prioritized load shedding based on pluggable utilization measures and multiple priority levels.
Most services at Netflix autoscale on CPU utilization, so it is a natural measure of system load to tie into the prioritized load shedding framework. Once a request is mapped to a priority bucket, services can determine when to shed traffic from a particular bucket based on CPU utilization.
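The sketch below shows one way a pluggable utilization measure and per-bucket thresholds could fit together. The `Priority` names, the thresholds, and the `UtilizationLoadShedder` class are assumptions for illustration; the internal library is not public.

```java
import java.util.function.DoubleSupplier;

// Sketch of utilization-based shedding per priority bucket; names and
// thresholds are illustrative only.
enum Priority { CRITICAL, DEGRADED, BEST_EFFORT, BULK }

final class UtilizationLoadShedder {
    private final DoubleSupplier utilization;  // pluggable measure, e.g. CPU utilization in [0, 1]

    UtilizationLoadShedder(DoubleSupplier utilization) {
        this.utilization = utilization;
    }

    /** Lower-priority buckets are shed at progressively lower utilization thresholds. */
    boolean shouldShed(Priority priority) {
        double u = utilization.getAsDouble();
        switch (priority) {
            case BULK:        return u > 0.60;
            case BEST_EFFORT: return u > 0.75;
            case DEGRADED:    return u > 0.85;
            case CRITICAL:    return u > 0.95;  // shed only as a last resort
            default:          return false;
        }
    }
}
```

A CPU-bound service would construct the shedder with a supplier that reports process CPU utilization and consult `shouldShed` for each request's priority bucket before doing the work.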
Some services at Netflix are IO-bound by backing services or datastores that can apply back pressure via increased latency when overloaded. For these services, Netflix reuses the prioritized load shedding techniques, introducing new utilization measures to feed into the shedding logic.
These utilization measures provide early warning signs that a service is generating too much load on a backend, allowing it to shed low-priority work before overwhelming the backend.
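One way to picture such a measure is as a latency-derived utilization value in [0, 1] that plugs into the same shedder shown above. The smoothing factor and target latency below are assumptions, not published values.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.DoubleSupplier;

// Illustrative latency-based utilization measure for IO-bound services:
// smoothed downstream latency relative to a target, clamped to [0, 1].
final class LatencyUtilization implements DoubleSupplier {
    private final double targetLatencyMs;
    private final AtomicLong smoothedLatencyMicros = new AtomicLong();

    LatencyUtilization(double targetLatencyMs) {
        this.targetLatencyMs = targetLatencyMs;
    }

    /** Record an observed downstream call latency (exponentially smoothed). */
    void record(double latencyMs) {
        long observed = (long) (latencyMs * 1000);
        smoothedLatencyMicros.updateAndGet(prev ->
                prev == 0 ? observed : (long) (0.9 * prev + 0.1 * observed));
    }

    /** 0.0 means idle; 1.0 means the backend is at or beyond its target latency. */
    @Override
    public double getAsDouble() {
        double latencyMs = smoothedLatencyMicros.get() / 1000.0;
        return Math.min(1.0, latencyMs / targetLatencyMs);
    }
}
```

Wiring something like `new UtilizationLoadShedder(new LatencyUtilization(50.0))` into the request path would shed low-priority work as the backend's observed latency approaches the 50 ms target, before the backend itself is overwhelmed.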
Netflix also identified two anti-patterns to avoid when implementing load shedding.
The implementation of service-level prioritized load shedding has proven to be a significant step forward in maintaining high availability and excellent user experience for Netflix customers, even during unexpected system stress.
Netflix continues to innovate and enhance its resilience strategies to ensure smooth streaming experiences for its users, no matter the challenges faced.