A Little Explanation of Little's Law

Written on 2026-06-06

I recently read Concurrency in Go by Katherine Cox-Buday. In the “Queuing” section, there was a discussion of how we can use Little’s Law to predict our pipeline’s throughput, given sufficient sampling.

After finishing that part, I honestly wondered why I had not come across this simple idea before. As I understand it, it can be applied in almost any situation that involves a queue: not just message queues, but physical queues as well. So I thought I’d write an intuitive explanation to help it stick and to share the idea.

Some Intuition for Little’s Law

Suppose that you are the owner of a coffee shop. Your coffee shop is located in a busy area, so your place is usually active. Unfortunately (or fortunately, depending on how you look at it), at certain hours of the day, the demand for coffee increases so much that you can no longer serve customers as fast as they join the line. As the line gets longer and longer, some people who would otherwise join decide not to get in the queue. So, you lose potential customers that you could otherwise have served.

Here is a simple animation that demonstrates this intuition: customers arrive, wait in line, get served, or leave when the line is already full.

Look at how many customers you’re losing just because you can’t keep up with the customer arrival rate! Being a smart coffee shop owner, you pause for a moment and think, maybe during rush hours, I can increase my throughput to match the rate at which people are getting into the line.

After thinking a bit, you realize that there are essentially three important parameters that you should be looking at:

The longest the line can get while new customers are still willing to join, so that you don’t lose any customers. Let’s call this $L$.
The amount of time it takes for a customer to get their coffee after joining the queue. In other words, how long each customer waits until their order is completed. Let’s call this $W$.
The rate at which customers join the line, that is, how many customers arrive per minute. Let’s call this $\lambda$.

Now, there is a relation between these parameters. If you don’t want to lose any customers, $L$ simply needs to be greater than or equal to $W\lambda$. Otherwise, we have more customers than we can handle; the queue grows, and some customers decide not to join.
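Here is a small sanity check of this relation, as a minimal Python sketch. The function names and the numbers (a tolerated line of 8 people, 3 arrivals per minute, 4 minutes in the shop) are made up for illustration.

```python
def line_needed(arrivals_per_min, minutes_in_system):
    """Average number of customers in the system: W * lambda."""
    return arrivals_per_min * minutes_in_system


def loses_customers(max_line, arrivals_per_min, minutes_in_system):
    """True when the line people tolerate (L) is shorter than W * lambda."""
    return max_line < line_needed(arrivals_per_min, minutes_in_system)


# Hypothetical rush hour: 3 customers/min, each spends 4 minutes in the shop.
print(line_needed(3, 4))         # 12 customers in the system on average
print(loses_customers(8, 3, 4))  # True: people tolerate 8, but 12 are needed
print(loses_customers(8, 3, 2))  # False: halving W brings W * lambda down to 6
```

The last line hints at the only lever the shop really has: with the arrival rate fixed, lowering $W$ is what brings $W\lambda$ back under $L$.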

Since we cannot control the customer arrival rate, the best we can do is reduce the time customers spend in the system, $W$ . For this, we can just try to add another coffee machine and hire another employee.

With the same arrival rate, adding a second service line gives customers another path through the system. The individual orders do not become faster, but the average customer spends less time waiting for service to begin.

Since we’ve reduced the average waiting time for each customer simply by adding a new line, we’re now in a much better position than before and can avoid losing customers unnecessarily.
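To see the effect in numbers, here is a minimal sketch of a first-come-first-served line with one or two identical servers. The arrival pattern (one customer per minute) and the 1.5-minute service time are assumptions chosen for illustration, not measurements.

```python
import heapq


def mean_wait(arrivals, service_time, servers):
    """Average time customers wait before service starts (FIFO, identical servers)."""
    free_at = [0.0] * servers          # time at which each server becomes free
    heapq.heapify(free_at)
    total_wait = 0.0
    for t in arrivals:                 # arrivals must be sorted in time
        earliest = heapq.heappop(free_at)
        start = max(t, earliest)       # wait only if every server is still busy
        total_wait += start - t
        heapq.heappush(free_at, start + service_time)
    return total_wait / len(arrivals)


arrivals = [float(i) for i in range(10)]     # one customer per minute
print(mean_wait(arrivals, 1.5, servers=1))   # 2.25 minutes of waiting on average
print(mean_wait(arrivals, 1.5, servers=2))   # 0.0, although each order still takes 1.5
```

Note that the service time is the same in both runs; only the waiting part of $W$ shrinks.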

We just examined Little’s Law through an example. I think it’s now a good time to talk a bit more about the generalized version of the law itself.

Little’s Law

Now, if we are to generalize this relation in a more abstract way, we arrive at the equation $L = \lambda W$ , where:

$L$ is the average number of items in the system,
$\lambda$ is the average arrival rate of items, and
$W$ is the average time an item spends in the system.

Here is a small simulator that shows how changing $L$ , $\lambda$ , or $W$ affects the flow of items through the system.
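Separately from the interactive simulator, the relation can also be checked on a recorded trace of arrival and departure times. The sketch below is a minimal Python illustration (not the code behind the simulator); the function name and the sample times are made up. It assumes the system starts and ends empty within the observation window, in which case $L = \lambda W$ holds exactly for the measured averages.

```python
def littles_law_check(arrivals, departures):
    """Return (L, lam, W) measured over [0, T], where T is the last departure.

    arrivals[i] and departures[i] are the times item i enters and leaves.
    """
    n = len(arrivals)
    horizon = max(departures)
    # W: average time an item spends in the system.
    w = sum(d - a for a, d in zip(arrivals, departures)) / n
    # lam: average arrival rate over the observation window.
    lam = n / horizon
    # L: time-average number of items in the system, found by sweeping
    # over arrival (+1) and departure (-1) events in time order.
    events = sorted([(a, 1) for a in arrivals] + [(d, -1) for d in departures])
    area, in_system, last_t = 0.0, 0, 0.0
    for t, delta in events:
        area += in_system * (t - last_t)
        in_system += delta
        last_t = t
    return area / horizon, lam, w


L, lam, W = littles_law_check([0, 1, 2, 6], [3, 4, 5, 8])
print(L, lam * W)   # 1.375 1.375
```

Both sides are computed independently here: $L$ from counting who is in the system over time, and $\lambda W$ from per-item durations.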

Now, you might object that, in the first example, the number of customers willing to join ($L$) did not have to equal $W\lambda$; it only had to be at least as large. The difference is that here we have defined $L$, $\lambda$, and $W$ as averages measured on the same system, in such a way that the formula cannot logically be broken. Katherine Cox-Buday, in Concurrency in Go, briefly notes an important point:

The equation only applies to so-called stable systems. In a pipeline, a stable system is one in which the rate that work enters the pipeline, or ingress, is equal to the rate in which it exits the system, or egress. If the rate of ingress exceeds the rate of egress, your system is unstable and has entered a death-spiral. If the rate of ingress is less than the rate of egress, you still have an unstable system, but all that’s happening is that your resources aren’t being utilized completely.

Now, we can still use this “law” as a target-setting tool even when our systems are unstable: simply treat our systems as if they were stable, and determine the values needed to make them work properly.

Suppose we have an API receiving 50 requests per second. Say we want to keep the average response time under 200 ms so that users don’t get frustrated. We can estimate the number of concurrent requests that should be in the system and adjust the number of threads accordingly: $L = \lambda W = 50 \times 0.2 = 10$. So, we need around 10 concurrent workers processing requests!
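The same arithmetic as a minimal Python sketch. The function name is hypothetical, and rounding up to whole workers is an assumption on my part, since a fractional worker is not much use.

```python
import math


def required_concurrency(requests_per_sec, target_latency_sec):
    """Little's Law, L = lambda * W, rounded up to whole workers."""
    in_flight = requests_per_sec * target_latency_sec
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(in_flight, 9))


print(required_concurrency(50, 0.2))    # 10
print(required_concurrency(50, 0.25))   # 13 (12.5 rounded up)
```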

The nice thing is that we can also go the other way around if needed. Simply fix the number of concurrent requests the system can handle and the arrival rate of requests. Then, just calculate how fast we need to process each request to avoid bottlenecks using Little’s Law.
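Going the other way is just the same formula rearranged, $W = L / \lambda$. A minimal sketch, with a hypothetical function name and a made-up limit of 8 concurrent requests:

```python
def max_time_per_request(concurrent_slots, requests_per_sec):
    """Little's Law rearranged: W = L / lambda, in seconds."""
    return concurrent_slots / requests_per_sec


# With 8 concurrent slots and 50 requests/s, each request may take
# at most this long on average before work starts piling up.
print(max_time_per_request(8, 50))   # 0.16
```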

The Real World

Of course, as this helpful criticism points out, real systems are usually not as simple as what we have described so far.

First of all, typical server load is usually not as uniform as the animations we showed earlier. Whatever causes one user to send a request may also cause many other users to send requests at the same time. This is similar to the coffee shop example from earlier, where we could not keep up with demand because many customers arrived during the same busy periods.

Most of the time, request arrivals will look more like a Poisson process, with irregular gaps and occasional bursts, than like evenly spaced requests. And that’s not the only discrepancy between our simulation and real-world systems. There is also jitter caused by network latency and other real-world effects. Our own processing speed can vary as well, especially under load. And there are probably many other factors that we might overlook.

So, instead of requests arriving at fixed intervals, we might actually see something more like this:
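As a rough illustration (not the figure from this post), irregular arrivals like these can be generated by drawing exponentially distributed gaps between requests, which is one standard way to model a Poisson process. The function names, the rate of 50 requests per second, and the 60-second window are assumptions for the example.

```python
import random


def poisson_arrivals(rate_per_sec, duration_sec, seed=0):
    """Arrival times of a Poisson process: exponential gaps with mean 1/rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_sec)
        if t >= duration_sec:
            return times
        times.append(t)


def per_second_counts(times, duration_sec):
    """Number of arrivals that fall into each one-second bucket."""
    counts = [0] * duration_sec
    for t in times:
        counts[int(t)] += 1
    return counts


counts = per_second_counts(poisson_arrivals(50, 60), 60)
print(sum(counts) / len(counts))   # close to 50 on average
print(min(counts), max(counts))    # but individual seconds vary noticeably
```

Real traffic is often burstier than this, since a Poisson process still assumes that requests arrive independently of each other.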

The point I am trying to make is that, while we hope for the best, we should still be prepared for the worst. In practice, this means it may be wiser to apply Little’s Law using not the average request arrival rate of your system, but the arrival rate during its busiest periods (using the P95 or P99 values as if they were the averages).
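A minimal sketch of that idea: take per-second request counts, pick a high percentile as the arrival rate, and plug it into $L = \lambda W$. The sample counts are invented, the function names are hypothetical, and the nearest-rank percentile is just one of several common definitions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def workers_for_peak(per_second_counts, target_latency_sec, p=95):
    """Apply L = lambda * W with the p-th percentile rate in place of the average."""
    peak_rate = percentile(per_second_counts, p)
    return math.ceil(round(peak_rate * target_latency_sec, 9))


# Hypothetical per-second request counts: mostly around 50, with two bursts.
samples = [48, 52, 50, 47, 55, 49, 51, 53, 50, 46,
           50, 49, 52, 48, 51, 50, 54, 49, 90, 120]
average = sum(samples) / len(samples)
print(average)                               # 55.7
print(math.ceil(average * 0.2))              # 12 workers if sized by the average
print(workers_for_peak(samples, 0.2, p=95))  # 18 workers if sized by P95
```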

This also means that we should treat the results we get from Little’s Law as lower bounds that must be satisfied, rather than as a sufficient success criterion. In that sense, Little’s Law is more like a litmus test: falling below the targets you derive from it is a bad sign, but meeting them does not necessarily mean everything will be fine.

To Conclude

To be honest, I’ve never seen developers actually use this to estimate target goals when designing systems. Maybe that’s because we don’t run into tight constraints very often; our computers and programs are fast enough that we don’t have to think about these limits at all. With that said, I think this is still worth keeping in mind, as it is a very simple yet powerful idea. Even if you usually get away without doing this kind of math, it’s reassuring to see how much headroom your system has, and to know exactly what to target just in case things start to slow down.

As a final note, it was the simulations in Alperen Keleş’s What Is Random Generation? blog post that inspired me to try something similar and begin incorporating interactive visuals into my own posts. So, I would like to thank him for that inspiration as well.