We're Encore, and we build tools to help developers create distributed systems and event-driven applications. In this post, you're going on an interactive journey through common queueing strategies. It's been meticulously crafted with love and attention by Sam Rose.
Queues are everywhere. We queue at bars, in restaurants, and at the bank. When you loaded this web page, the request to fetch it interacted with dozens of different queues on its way from your machine to the server this page is hosted on. Queues are fundamental.
In this post, we're going to explore queueing in the context of HTTP requests. We'll start simple and gradually introduce more complex queues, covering FIFO queues, LIFO queues, priority queues, and active queue management along the way.
Let's start at the beginning: one client and one server. For this to work, I'm going to need your help. You're going to play a central role in this post by being the user that sends requests.
When you send a request, the server takes some time to process it. If another request arrives while the server is busy, there's nowhere for it to go, so it gets dropped.
Dropped requests are a problem. If these are HTTP requests, dropping them would mean showing an error page to the user. This could cause the user to leave your site, which could be a lost signup or a lost sale. We want to avoid dropping requests if we can.
One of the ways we could tackle this problem is to introduce a queue. The next example puts a queue between you and the server. New requests will be temporarily held in the queue until the server is ready to process them.
Have 2 requests waiting in the queue at the same time.
Now when a request arrives while the server is busy, it waits in the queue instead of being dropped.
Queues help us smooth the flow of requests to the server. We can't control the rate at which requests arrive, but for the cost of some memory we can hold on to them until the server is ready. Having users wait a bit longer than usual is often better than showing them an error.
The queue we used in the previous example is what's called a first-in first-out, or FIFO, queue. As the name suggests, requests are processed in the order they were added.
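To make this concrete, here's a minimal sketch of what a bounded FIFO queue could look like in TypeScript. This isn't how the examples on this page are implemented; the `QueuedRequest` shape and the capacity of 3 are just assumptions for illustration.

```typescript
// A request waiting to be served. The shape here is an assumption for
// illustration, not how the examples on this page are implemented.
interface QueuedRequest {
  id: number;
  arrivedAt: number; // timestamp in milliseconds
}

class FifoQueue {
  private items: QueuedRequest[] = [];

  constructor(private capacity: number) {}

  // Add to the back of the queue. If the queue is full, the request is
  // dropped -- the "tail drop" behaviour discussed later in the post.
  enqueue(req: QueuedRequest): boolean {
    if (this.items.length >= this.capacity) {
      return false; // dropped
    }
    this.items.push(req);
    return true;
  }

  // Serve the request that has been waiting the longest (the front).
  dequeue(): QueuedRequest | undefined {
    return this.items.shift();
  }
}

const queue = new FifoQueue(3); // same capacity as the example above
```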
Send a burst of requests and watch them get processed in the order they arrived.
While queues do help absorb bursts of requests, they aren't a cure-all. If requests keep arriving faster than the server can process them, the queue fills up and requests get dropped all the same.
You might be thinking: "Why not just make the queue longer? If the queue is longer, it can hold more requests, and there is less risk of dropping them."
Let's give that a try! In this next example, the queue size has been increased from 3 to 5. The catch is that if a user has to wait more than 5 seconds, they will give up. We'll represent this by changing how the request looks once it has timed out.
Fill up all 5 slots in the queue.
Observe a request timing out.
Things start off okay, but you will see that when you fill up the queue, requests start to time out before the server can get to them.
If you send requests faster than they can be processed, you can end up only serving timed out requests. All of the server's time is spent serving requests that users have given up on. Queue length is an important parameter to get right, and getting it right depends on your application. If in doubt, it's better to be too short than too long.
"Wait, why don't we just drop requests that have timed out? Why are we serving them at all?"
That's a good question! There's nothing stopping you from devising a system to check if requests are still valid before serving them, but this is not something HTTP servers often do by default.
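If you did want to devise that check yourself, a sketch might look something like the following, building on the `FifoQueue` from earlier. The 5 second budget mirrors the example above, and `serve` is a hypothetical stand-in for whatever actually handles the request.

```typescript
const TIMEOUT_MS = 5_000; // matches the 5 second patience in the example above

declare function serve(req: QueuedRequest): void; // placeholder for the real handler

// Before doing any work, check whether the user has probably given up already.
function shouldServe(req: QueuedRequest, now: number = Date.now()): boolean {
  return now - req.arrivedAt < TIMEOUT_MS;
}

function processNext(queue: FifoQueue): void {
  let req = queue.dequeue();
  // Skip over requests that have already timed out rather than spending
  // server time on users who are no longer waiting.
  while (req !== undefined && !shouldServe(req)) {
    req = queue.dequeue();
  }
  if (req !== undefined) {
    serve(req);
  }
}
```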
One thing we can do to prevent the problem of only serving timed out requests is to use a last-in first-out, or LIFO queue. Sometimes called a "stack", in a LIFO queue the most recently added request is processed first. This is the opposite of FIFO, where the least recently added request is processed first.
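Reusing the earlier sketch, the only thing a LIFO queue changes is which end of the queue it serves from. Again, this is just an illustration, not how the examples here are built.

```typescript
class LifoQueue {
  private items: QueuedRequest[] = [];

  constructor(private capacity: number) {}

  // Adding works exactly as before: tail drop when the queue is full.
  enqueue(req: QueuedRequest): boolean {
    if (this.items.length >= this.capacity) {
      return false; // dropped
    }
    this.items.push(req);
    return true;
  }

  // Serve the most recently added request (the back) instead of the oldest.
  dequeue(): QueuedRequest | undefined {
    return this.items.pop();
  }
}
```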
Send a burst of requests and watch which ones get served first.
Processing the most recent request first means that when the server is struggling, it's the freshest requests that get served, while older requests, which have likely already timed out, get stuck at the back of the queue.
For web requests, this failure mode is often preferable. It doesn't matter if the request has been stuck for 20 seconds or 20 minutes, the user is gone. May as well serve new requests ASAP and let the old ones clear out when things calm down.
Both FIFO and LIFO queues treat all requests the same. The only thing that differs is which end of the queue requests get added to. It's not uncommon, though, for some requests to be more important than others. For example, you probably want to make sure a checkout request on your store is treated with the highest priority. To help with this, let's look at "priority queues."
The way priority queues work is that when a new request arrives, it is added to the queue in a position determined by its priority. Requests with a higher priority get added closer to the front. Lower priority requests get added behind higher priority ones.
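One simple way to sketch this is to keep the queue sorted as requests arrive, scanning from the front and inserting each new request behind anything of equal or higher priority. A linear scan like this is fine for short queues; real implementations often use a heap instead. The `PriorityRequest` shape below is, again, just an assumption for illustration.

```typescript
interface PriorityRequest extends QueuedRequest {
  priority: number; // higher number = more important
}

class PriorityQueue {
  private items: PriorityRequest[] = [];

  constructor(private capacity: number) {}

  enqueue(req: PriorityRequest): boolean {
    if (this.items.length >= this.capacity) {
      return false; // full: even a priority request is dropped (tail drop)
    }
    // Walk from the front past every request with equal or higher priority,
    // then insert. Equal priorities keep their first-in first-out order.
    let i = 0;
    while (i < this.items.length && this.items[i].priority >= req.priority) {
      i++;
    }
    this.items.splice(i, 0, req);
    return true;
  }

  // Serving still takes from the front, which now holds the highest priority.
  dequeue(): PriorityRequest | undefined {
    return this.items.shift();
  }
}
```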
In the following example, we introduce the idea of priority requests, which are treated as more important than the regular requests you've been sending so far.
Send a priority request and watch it jump ahead of the low priority requests in the queue.
Send a priority request while the queue is full.
Notice that when the queue is full, a newly arrived priority request gets dropped just like any other request. The priority queue can only reorder the requests it has room to hold.
What we'd like to be able to do is push low priority requests out of the queue to make room when a priority request arrives. This is where "active queue management" (AQM) comes in.
Up until now, we've only dropped requests that arrive when the queue is full. This is called "tail drop," because we're dropping the tail end of the queue. This is a simple strategy, but it has some problems. One of the problems is that we drop priority requests.
To make it less likely we'll drop priority requests, we're going to drop low priority requests before the queue gets full. The way we're going to do it is proportional to how full the queue is. If the queue is empty, we never drop. If the queue is 50% full, there's a 50% chance we'll drop. If the queue is 90% full, there's a 90% chance we'll drop.
Remember, this rule only applies to low priority requests. Priority requests are never dropped unless the queue is full.
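As a rough sketch, the drop decision could look like this, with just the two priority levels used in the examples above. Real implementations of this idea typically use configurable thresholds and a smoothed average queue length rather than the instantaneous one.

```typescript
// Decide whether to accept a request into the queue. Low priority requests
// are dropped early with a probability equal to how full the queue is;
// priority requests are only dropped when the queue is completely full.
function shouldAccept(
  queueLength: number,
  capacity: number,
  isPriority: boolean
): boolean {
  if (queueLength >= capacity) {
    return false; // full: everything is dropped, priority or not
  }
  if (isPriority) {
    return true;
  }
  const fullness = queueLength / capacity; // 0.0 (empty) to 1.0 (full)
  return Math.random() >= fullness; // e.g. 50% full => 50% chance of dropping
}
```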
See a low priority request get dropped before the queue is full.
It may take a few goes to see this in action, but you should notice that as the queue fills up, low priority requests become more and more likely to be dropped, leaving room for priority requests when they arrive.
This is called "random early detection" (RED), and it's trading off dropping more low priority requests in return for dropping fewer priority requests. Best of all, it's cheap and simple to implement. Because of this, RED is a commonly used AQM algorithm in the wild. Technically speaking, because we're using different probabilities for low priority and high priority, what we're using here is called "weighted random early detection" (WRED).
We've spoken about lots of different types of queue. Let's see how they compare to each other.
Below we see all four queues we've explored throughout this post: FIFO, LIFO, priority, and priority with WRED.
Add 15 requests to each of the queues.
Add 10 priority requests to each of the queues.
Now that you've generated a good amount of data, let's dive into the numbers! All of the data below is taken from the requests you just sent above. You may also notice the graphs updating as requests get processed. Feel free to send more requests to see how it changes the data.
Let's start with arguably the most important metric: how long do requests spend inside of each queue?
Reminder that the "50th percentile" is the time at or below which 50% of requests fall. So if the 50th percentile is 1s, that means 50% of requests are processed in 1s or less.
At least to me, the standout here is the LIFO queue: because it always serves the freshest request first, its lower percentiles tend to be noticeably better than the FIFO queue's, at the cost of a long tail for the unlucky requests stuck at the back.
Below are a set of bar graphs showing how many timed out requests each queue ended up serving.
You should notice that the FIFO queue serves far more timed out requests than the LIFO queue.
It isn't a sure thing for the LIFO queue to avoid serving timed out requests entirely, but it should be doing noticeably better than the FIFO queue.
Lastly, let's look at how many requests each queue dropped.
It's a little harder to predict this one because it depends more on how you completed the goals, but you should see that the queue using WRED dropped more requests overall than the others.
You should also see that it dropped fewer priority requests, which is exactly the trade-off WRED is designed to make.
I hope you enjoyed this journey through queueing! Let's recap what we've learned:

- Queues smooth out bursts of requests, trading a little memory and some extra waiting for fewer dropped requests.
- Queue length matters: make it too long and you end up serving requests that users have already given up on.
- FIFO queues serve the oldest request first, while LIFO queues serve the newest, which keeps fresh requests fast when things back up.
- Priority queues let important requests jump ahead, and active queue management like RED/WRED drops low priority requests early to protect the important ones.
For HTTP requests, it's common to see FIFO queues used in practice. I would argue, with this post to back me up, that LIFO queues are a better choice for most workloads. If you need to prioritise some requests over others, a priority queue with AQM is a good choice, though seen less often in the wild.
Queueing is both a wide and deep topic. We've looked at only one use case here, but queues are used everywhere. I hope this post has ignited a desire to learn more about this topic, and if it has, I can recommend the following reads:
Sam Rose has been programming professionally for over 10 years, with a focus on the backend and SRE domains. He has worked at a wide range of companies, from large ones like Google to smaller ones like Nebula.
If you enjoyed this post, Sam has a collection of similarly visual and interactive posts on his personal site and here on the Encore Blog: Retries - an interactive study of common retry methods.
To keep up to date with his work, you can follow him on Twitter, and if you want to support what he does, he also has a Patreon.