We're Encore, and we build tools to help developers create distributed systems and event-driven applications. In this post, you're going on an interactive journey through common queueing strategies. It's been meticulously crafted with love and attention by Sam Rose.
Queues are everywhere. We queue at bars, in restaurants, and at the bank. When you loaded this web page, the request to fetch it interacted with dozens of different queues on its way from your machine to the server this page is hosted on. Queues are fundamental.
In this post, we're going to explore queueing in the context of HTTP requests. We'll start simple and gradually introduce more complex queues, covering FIFO queues, LIFO queues, priority queues, and active queue management along the way.
Let's start at the beginning: one client and one server. For this to work, I'm going to need your help. You're going to play a central role in this post by being the user that sends requests.
When you send a request, the server takes some time to process it. If another request arrives while the server is busy, there's nowhere for it to go, so it gets dropped.
Dropped requests are a problem. If these are HTTP requests, dropping them would mean showing an error page to the user. This could cause the user to leave your site, which could be a lost signup or a lost sale. We want to avoid dropping requests if we can.
One of the ways we could tackle this problem is to introduce a queue. The next example puts a queue between you and the server. New requests will be temporarily held in the queue until the server is ready to process them.
Have 2 requests waiting in the queue at the same time.
Now when a request arrives while the server is busy, it waits in the queue instead of being dropped.
Queues help us smooth the flow of requests to the server. We can't control the rate at which requests arrive, but for the cost of some memory we can hold on to them until the server is ready. Having users wait a bit longer than usual is often better than showing them an error.
The queue we used in the previous example is what's called a first-in first-out, or FIFO, queue. As the name suggests, requests are processed in the order they were added.
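To make this concrete, here's a minimal sketch of what a bounded FIFO queue could look like in TypeScript. This isn't how the examples on this page are implemented; the `QueuedRequest` shape and the capacity of 3 are just assumptions for illustration.

```typescript
// A request waiting to be served. The shape here is an assumption for
// illustration, not how the examples on this page are implemented.
interface QueuedRequest {
  id: number;
  arrivedAt: number; // timestamp in milliseconds
}

class FifoQueue {
  private items: QueuedRequest[] = [];

  constructor(private capacity: number) {}

  // Add to the back of the queue. If the queue is full, the request is
  // dropped -- the "tail drop" behaviour discussed later in the post.
  enqueue(req: QueuedRequest): boolean {
    if (this.items.length >= this.capacity) {
      return false; // dropped
    }
    this.items.push(req);
    return true;
  }

  // Serve the request that has been waiting the longest (the front).
  dequeue(): QueuedRequest | undefined {
    return this.items.shift();
  }
}

const queue = new FifoQueue(3); // same capacity as the example above
```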
Send a burst of requests and watch them get processed in the order they arrived.
While queues do help absorb bursts of requests, they aren't a cure-all. If requests keep arriving faster than the server can process them, the queue fills up and requests get dropped all the same.
You might be thinking: "Why not just make the queue longer? If the queue is longer, it can hold more requests, and there is less risk of dropping them."
Let's give that a try! In this next example, the queue size has been increased from 3 to 5. The catch is that if a user has to wait more than 5 seconds, they will give up. We'll represent this by changing how the request looks once it has timed out.
Fill up all 5 slots in the queue.
Observe a request timing out.
Things start off okay, but you will see that when you fill up the queue, requests start to time out before the server can get to them.
If you send requests faster than they can be processed, you can end up only serving timed out requests. All of the server's time is spent serving requests that users have given up on. Queue length is an important parameter to get right, and getting it right depends on your application. If in doubt, it's better to be too short than too long.
"Wait, why don't we just drop requests that have timed out? Why are we serving them at all?"
That's a good question! There's nothing stopping you from devising a system to check if requests are still valid before serving them, but this is not something HTTP servers often do by default.
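If you did want to devise that check yourself, a sketch might look something like the following, building on the `FifoQueue` from earlier. The 5 second budget mirrors the example above, and `serve` is a hypothetical stand-in for whatever actually handles the request.

```typescript
const TIMEOUT_MS = 5_000; // matches the 5 second patience in the example above

declare function serve(req: QueuedRequest): void; // placeholder for the real handler

// Before doing any work, check whether the user has probably given up already.
function shouldServe(req: QueuedRequest, now: number = Date.now()): boolean {
  return now - req.arrivedAt < TIMEOUT_MS;
}

function processNext(queue: FifoQueue): void {
  let req = queue.dequeue();
  // Skip over requests that have already timed out rather than spending
  // server time on users who are no longer waiting.
  while (req !== undefined && !shouldServe(req)) {
    req = queue.dequeue();
  }
  if (req !== undefined) {
    serve(req);
  }
}
```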
One thing we can do to prevent the problem of only serving timed out requests is to use a last-in first-out, or LIFO queue. Sometimes called a "stack", in a LIFO queue the most recently added request is processed first. This is the opposite of FIFO, where the least recently added request is processed first.
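Reusing the earlier sketch, the only thing a LIFO queue changes is which end of the queue it serves from. Again, this is just an illustration, not how the examples here are built.

```typescript
class LifoQueue {
  private items: QueuedRequest[] = [];

  constructor(private capacity: number) {}

  // Adding works exactly as before: tail drop when the queue is full.
  enqueue(req: QueuedRequest): boolean {
    if (this.items.length >= this.capacity) {
      return false; // dropped
    }
    this.items.push(req);
    return true;
  }

  // Serve the most recently added request (the back) instead of the oldest.
  dequeue(): QueuedRequest | undefined {
    return this.items.pop();
  }
}
```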
Send a burst of requests and watch which ones get served first.
Processing the most recent request first means that when the server is struggling, it's the freshest requests that get served, while older requests, which have likely already timed out, get stuck at the back of the queue.
For web requests, this failure mode is often preferable. It doesn't matter if the request has been stuck for 20 seconds or 20 minutes, the user is gone. May as well serve new requests ASAP and let the old ones clear out when things calm down.
Both FIFO and LIFO queues treat all requests the same. The only thing that differs is which end of the queue requests get added to. It's not uncommon, though, for some requests to be more important than others. For example, you probably want to make sure a checkout request on your store is treated with the highest priority. To help with this, let's look at "priority queues."
The way priority queues work is that when a new request arrives, it is added to the queue in a position determined by its priority. Requests with a higher priority get added closer to the front. Lower priority requests get added behind higher priority ones.
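One simple way to sketch this is to keep the queue sorted as requests arrive, scanning from the front and inserting each new request behind anything of equal or higher priority. A linear scan like this is fine for short queues; real implementations often use a heap instead. The `PriorityRequest` shape below is, again, just an assumption for illustration.

```typescript
interface PriorityRequest extends QueuedRequest {
  priority: number; // higher number = more important
}

class PriorityQueue {
  private items: PriorityRequest[] = [];

  constructor(private capacity: number) {}

  enqueue(req: PriorityRequest): boolean {
    if (this.items.length >= this.capacity) {
      return false; // full: even a priority request is dropped (tail drop)
    }
    // Walk from the front past every request with equal or higher priority,
    // then insert. Equal priorities keep their first-in first-out order.
    let i = 0;
    while (i < this.items.length && this.items[i].priority >= req.priority) {
      i++;
    }
    this.items.splice(i, 0, req);
    return true;
  }

  // Serving still takes from the front, which now holds the highest priority.
  dequeue(): PriorityRequest | undefined {
    return this.items.shift();
  }
}
```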
In the following example, we introduce the idea of priority requests, which are treated as more important than the regular requests you've been sending so far.
Send a priority request and watch it jump ahead of the low priority requests in the queue.
Send a priority request while the queue is full.
Notice that when the queue is full, a newly arrived priority request gets dropped just like any other request. The priority queue can only reorder the requests it has room to hold.
What we'd like to be able to do is push low priority requests out of the queue to make room when a priority request arrives. This is where "active queue management" (AQM) comes in.
Up until now, we've only dropped requests that arrive when the queue is full. This is called "tail drop," because we're dropping the tail end of the queue. This is a simple strategy, but it has some problems. One of the problems is that we drop priority requests.
To make it less likely we'll drop priority requests, we're going to drop low priority requests before the queue gets full. The way we're going to do it is proportional to how full the queue is. If the queue is empty, we never drop. If the queue is 50% full, there's a 50% chance we'll drop. If the queue is 90% full, there's a 90% chance we'll drop.
Remember, this rule only applies to low priority requests. Priority requests are never dropped unless the queue is full.
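As a rough sketch, the drop decision could look like this, with just the two priority levels used in the examples above. Real implementations of this idea typically use configurable thresholds and a smoothed average queue length rather than the instantaneous one.

```typescript
// Decide whether to accept a request into the queue. Low priority requests
// are dropped early with a probability equal to how full the queue is;
// priority requests are only dropped when the queue is completely full.
function shouldAccept(
  queueLength: number,
  capacity: number,
  isPriority: boolean
): boolean {
  if (queueLength >= capacity) {
    return false; // full: everything is dropped, priority or not
  }
  if (isPriority) {
    return true;
  }
  const fullness = queueLength / capacity; // 0.0 (empty) to 1.0 (full)
  return Math.random() >= fullness; // e.g. 50% full => 50% chance of dropping
}
```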
See a low priority request get dropped before the queue is full.
It may take a few goes to see this in action, but you should notice that as the queue fills up, low priority requests become more and more likely to be dropped, leaving room for priority requests when they arrive.
This is called "random early detection" (RED), and it's trading off dropping more low priority requests in return for dropping fewer priority requests. Best of all, it's cheap and simple to implement. Because of this, RED is a commonly used AQM algorithm in the wild. Technically speaking, because we're using different probabilities for low priority and high priority, what we're using here is called "weighted random early detection" (WRED).
We've spoken about lots of different types of queue. Let's see how they compare to each other.
Below we see all four queues we've explored throughout this post: FIFO, LIFO, priority, and priority with WRED.
Add 15 requests to each of the queues.
Add 10 priority requests to each of the queues.
Now that you've generated a good amount of data, let's dive into the numbers! All of the data below is taken from the requests you just sent above. You may also notice the graphs updating as requests get processed. Feel free to send more requests to see how it changes the data.
Let's start with arguably the most important metric: how long do requests spend inside of each queue?
Reminder that the "50th percentile" is the time at or below which 50% of requests fall. So if the 50th percentile is 1s, that means 50% of requests are processed in 1s or less.
At least to me, the standout here is the LIFO queue: because it always serves the freshest request first, its lower percentiles tend to be noticeably better than the FIFO queue's, at the cost of a long tail for the unlucky requests stuck at the back.
Below are a set of bar graphs showing how many timed out requests each queue ended up serving.
You should notice that the FIFO queue serves far more timed out requests than the LIFO queue.
It isn't a sure thing for the LIFO queue to avoid serving timed out requests entirely, but it should be doing noticeably better than the FIFO queue.
Lastly, let's look at how many requests each queue dropped.
It's a little harder to predict this one because it depends more on how you completed the goals, but you should see that the queue using WRED dropped more requests overall than the others.
You should also see that it dropped fewer priority requests, which is exactly the trade-off WRED is designed to make.
I hope you enjoyed this journey through queueing! Let's recap what we've learned:

- Queues smooth out bursts of requests, trading a little memory and some extra waiting for fewer dropped requests.
- Queue length matters: make it too long and you end up serving requests that users have already given up on.
- FIFO queues serve the oldest request first, while LIFO queues serve the newest, which keeps fresh requests fast when things back up.
- Priority queues let important requests jump ahead, and active queue management like RED/WRED drops low priority requests early to protect the important ones.
For HTTP requests, it's common to see FIFO queues used in practice. I would argue, with this post to back me up, that LIFO queues are a better choice for most workloads. If you need to prioritise some requests over others, a priority queue with AQM is a good choice, though seen less often in the wild.
Queueing is both a wide and deep topic. We've looked at only one use case here, but queues are used everywhere. I hope this post has ignited a desire to learn more about this topic, and if it has, I can recommend the following reads:
Sam Rose has been programming professionally for over 10 years, with a focus on the backend and SRE domains. He has worked at a wide range of companies, from large ones like Google to smaller ones like Nebula.
If you enjoyed this post, Sam has a collection of similarly visual and interactive posts on his personal site and here on the Encore Blog: Retries - an interactive study of common retry methods.
To keep up to date with his work, you can follow him on Twitter, and if you want to support what he does, he also has a Patreon.