Deriving the Poisson Distribution

The Poisson distribution can be derived from the binomial distribution in two steps:

  1. Substitute \( \frac{\mu}{n} \) for \( p \)
  2. Let n increase without bound

Step one is possible because the mean of a binomial distribution is \( \mu = np\). So another way of expressing \( p \), the probability of success on a single trial, is \( \frac{\mu}{n} \). This substitution has some intuition behind it. Recall that a binomial distribution gives the probability of a number of successes (x) in a fixed number of trials (n) for some probability (p). So if an event has a probability of 0.2 and we observe 10 independent trials, the expected number of occurrences is 10(0.2) = 2. On average we would see the event happen twice in 10 trials, which means we can state the probability as a rate, \( \frac{2}{10}\): success occurs, on average, 2 times per 10 trials.
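That arithmetic is easy to see by simulation. Here's a quick Python sketch (the trial count and probability are the ones from the example above; the simulation itself is my own illustration, not part of the original derivation):

```python
import random

random.seed(1)
n, p = 10, 0.2      # 10 independent trials, probability 0.2 of success
reps = 100_000      # repeat the 10-trial experiment many times

# Count successes in each repetition, then average the counts
avg = sum(sum(random.random() < p for _ in range(n)) for _ in range(reps)) / reps
print(avg)  # hovers near n * p = 2
```

The average count of successes per 10 trials settles near 2, which is exactly the rate \( \frac{\mu}{n} = \frac{2}{10} \) read the other way around.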

The next step leads to the Poisson. We let the number of trials increase without bound, which means the probability shrinks. Since n is in the denominator of \( \frac{\mu}{n} \), this gives a small chance of success in a huge number of trials. So whereas the binomial deals with a fixed number of trials, the Poisson accounts for an unlimited number of trials in time or space. Taking the limit as n tends to infinity gives us the Poisson. Here’s how.

Recall the binomial probability mass function:
$$ f(x) = \frac{n!}{x!(n-x)!}p^{x}(1-p)^{n-x}$$

Substitute \( \frac{\mu}{n} \) for p:
$$ f(x) = \frac{n!}{x!(n-x)!}(\frac{\mu}{n})^{x}(1-\frac{\mu}{n})^{n-x}$$

Now the fun begins. Let n grow very large:
$$ \lim_{n \to \infty}\frac{n!}{x!(n-x)!}(\frac{\mu}{n})^{x}(1-\frac{\mu}{n})^{n-x}$$

But wait! Don’t let n blow up just yet. Let’s do some rearranging first to help us take the limit:
$$ \lim_{n \to \infty}\frac{n!}{(n-x)!n^{x}}\frac{\mu^{x}}{x!}(1-\frac{\mu}{n})^{n}(1-\frac{\mu}{n})^{-x}$$

And let’s also simplify \( \frac{n!}{(n-x)!n^{x}}\):

$$ \frac{n!}{(n-x)!n^{x}} = \frac{n(n-1)(n-2)\dots(n-x+1)}{n^{x}}$$

$$ =1(1-\frac{1}{n})\dots(1-\frac{x-2}{n})(1-\frac{x-1}{n})$$

Before we flip the switch on the limit-to-infinity machine, let’s assess what we got:
$$ \lim_{n \to \infty}1(1-\frac{1}{n})\dots(1-\frac{x-2}{n})(1-\frac{x-1}{n})\frac{\mu^{x}}{x!}(1-\frac{\mu}{n})^{n}(1-\frac{\mu}{n})^{-x}$$

Notice this is basically four things multiplied together:

  1. \( 1(1-\frac{1}{n})\dots(1-\frac{x-2}{n})(1-\frac{x-1}{n})\)
  2. \( \frac{\mu^{x}}{x!}\)
  3. \( (1-\frac{\mu}{n})^{n}\)
  4. \( (1-\frac{\mu}{n})^{-x}\)

We can make life easier by taking the limit of each of those four pieces individually:

  1. \( \lim_{n \to \infty}1(1-\frac{1}{n})\dots(1-\frac{x-2}{n})(1-\frac{x-1}{n})=1\)
  2. \( \lim_{n \to \infty}\frac{\mu^{x}}{x!}=\frac{\mu^{x}}{x!}\) (no n to blow up!)
  3. \( \lim_{n \to \infty}(1-\frac{\mu}{n})^{n}=e^{-\mu}\)
  4. \( \lim_{n \to \infty}(1-\frac{\mu}{n})^{-x}=1\)

1, 2, and 4 are easy. Number 3, though, requires a long-forgotten formula you probably learned in calculus just long enough to take an exam:

$$ \lim_{n \to \infty}(1+\frac{b}{n})^{n}=e^{b}$$

Set \( b = -\mu \) and you get the result.
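If you don't quite trust that limit, it's easy to check numerically, along with the other pieces, by plugging in a huge n. A Python sketch (the values of \( \mu\), x, and n here are arbitrary illustrations I chose, not anything from the derivation):

```python
import math

mu, x = 5, 7       # arbitrary illustration values
n = 10**7          # a very large number of trials

# Piece 1: 1(1 - 1/n)...(1 - (x-1)/n)  ->  1
piece1 = math.prod(1 - k / n for k in range(x))
# Piece 3: (1 - mu/n)^n  ->  e^(-mu)
piece3 = (1 - mu / n) ** n
# Piece 4: (1 - mu/n)^(-x)  ->  1
piece4 = (1 - mu / n) ** (-x)

print(piece1, piece4)          # both very close to 1
print(piece3, math.exp(-mu))   # piece3 very close to e^(-mu)
```

Piece 2, \( \frac{\mu^{x}}{x!}\), has no n in it, so there's nothing to check.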

Now put it all back together and you have the probability mass function of the Poisson distribution:

$$ f(x) = \frac{\mu^{x}e^{-\mu}}{x!}$$

Often you see \( \mu\) as \( \lambda\) in this formula.

This new formula approximates the binomial probability mass function when the number of trials is large and the probability is small. That's basically what we did, right? We substituted an expression for p involving n and let n grow large, which meant the probability got small as n got large. Before modern computers, this made the Poisson very handy for approximating binomial probabilities, since the binomial coefficient can be difficult to compute for large n.
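You can watch this convergence happen: hold \( \mu\) fixed, set \( p = \frac{\mu}{n} \), and let n grow. A Python sketch (the rate and count values are arbitrary illustrations of my own choosing):

```python
import math

def binom_pmf(x, n, p):
    # Binomial pmf: C(n, x) * p^x * (1-p)^(n-x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x, mu):
    # Poisson pmf: mu^x * e^(-mu) / x!
    return mu**x * math.exp(-mu) / math.factorial(x)

mu, x = 2, 3  # arbitrary rate and count for illustration
for n in (10, 100, 1000, 100_000):
    print(n, binom_pmf(x, n, mu / n))   # approaches the Poisson value
print("Poisson:", pois_pmf(x, mu))
```

As n climbs, the binomial probabilities settle onto the Poisson probability.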

For example, say we have 100 independent trials with probability of success 0.05. What is the probability of observing 7 successes?

Using the binomial:
$$ f(7) = \frac{100!}{7!(100-7)!}0.05^{7}(1-0.05)^{100-7} = 0.106026$$

Using the Poisson as an approximation (where \( \mu = np = 100(0.05) = 5\)):
$$ f(7) = \frac{5^{7}e^{-5}}{7!} = 0.104445$$

Very close indeed!

By the way, how did I calculate those probabilities? I used Excel functions:

Binomial: =BINOMDIST(7,100,0.05,0)
Poisson: =POISSON(7,5,0)

The numbers in the parentheses should be self-explanatory. The final 0 means “false, not cumulative”. If you set it to 1, you get the probability of 7 or fewer successes in 100 trials. You can also use a TI-83 Plus. Just press 2nd – DISTR and choose binompdf or poissonpdf. If you want cumulative probabilities, select binomcdf or poissoncdf. The parameters are in a different order from Excel: binompdf(100, 0.05, 7) and poissonpdf(5, 7).
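Those cumulative versions aren't magic, by the way; a cumulative probability is just a sum of pmf values. A Python sketch of P(X ≤ 7) for our running example (my own illustration, not one of the functions above):

```python
import math

def binom_pmf(x, n, p):
    # Binomial pmf: C(n, x) * p^x * (1-p)^(n-x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Cumulative probability P(X <= 7) for Binomial(n=100, p=0.05):
# sum the pmf from 0 through 7
cdf = sum(binom_pmf(k, 100, 0.05) for k in range(8))
print(cdf)  # probability of 7 or fewer successes in 100 trials
```

This is what BINOMDIST with a final argument of 1 (or binomcdf on the TI-83) computes for you.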


Of course “real statisticians” would use R:

> dbinom(7,100,0.05)
[1] 0.1060255

> dpois(7,5)
[1] 0.1044449
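And if Python is more your speed, the same two numbers fall out of the standard library with no statistics package at all (a sketch; scipy.stats would also work):

```python
import math

n, p, x = 100, 0.05, 7
mu = n * p  # 5

# Binomial pmf: C(n, x) * p^x * (1-p)^(n-x)
binom = math.comb(n, x) * p**x * (1 - p)**(n - x)
# Poisson pmf: mu^x * e^(-mu) / x!
pois = mu**x * math.exp(-mu) / math.factorial(x)

print(binom)  # ≈ 0.1060255, matching BINOMDIST and dbinom
print(pois)   # ≈ 0.1044449, matching POISSON and dpois
```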

I would be remiss not to mention that the Poisson distribution is not the same as the binomial. The binomial describes the distribution of events (or “successes”) over a fixed number of trials for a fixed probability of success on each trial. The Poisson describes the distribution of events in space or time for a given rate of occurrence. The binomial is specified by n (number of trials) and p (probability of success on each trial). The Poisson is specified by \( \mu\), the rate at which events occur per unit of space or time.

And finally, and no doubt most important, the way to pronounce Poisson is \pwa-sawn\. Say it correctly and you earn instant credibility at cocktail parties.

2 thoughts on “Deriving the Poisson Distribution”

  1. Jun Shen

    This is very impressive. My question is how we know Poisson distribution can be derived in this way in the first place as Poisson and binomial distributions are two different processes? Thanks.

1. Clay Ford (post author)

      Well, the reason I know it is because I read about it in the book Principles of Statistics by M.G. Bulmer. This blog post was basically me working through his derivation to make sure I understood and filling in some of the steps. In his words: “The Poisson distribution is the limiting form of the binomial distribution where there is a large number of trials but only a small probability of success at each of them.”

