Deriving the exponential distribution – statistics you can probably trust

The exponential distribution looks harmless enough:

$$ f(x) = \frac{1}{\theta}e^{-x/\theta}$$

It looks like someone just took the exponential function and multiplied it by $ \frac{1}{\theta}$, and then for kicks decided to do the same thing in the exponent except with a negative sign. If we integrate this over all $ x>0 $ (with $ \theta>0$) we get 1, which shows it’s a valid probability density function. So is this just a curiosity someone dreamed up in an ivory tower? No, it actually turns out to be closely related to the Poisson distribution.
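If you’d like to check the claim that the density integrates to 1 without doing the calculus, here is a minimal numerical sketch in Python. It uses a plain trapezoidal rule (my choice here, not the only way to do it) and an arbitrary illustrative value $ \theta=20$. It also cuts the infinite upper limit off at $ 50\theta$, where the remaining tail area $ e^{-50}$ is negligible. The function names are just for illustration.

```python
import math

def exp_pdf(x: float, theta: float) -> float:
    """Exponential density f(x) = (1/theta) * exp(-x/theta), for x >= 0."""
    return math.exp(-x / theta) / theta

def trapezoid(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

theta = 20.0  # illustrative value; any theta > 0 works
# Stop at 50 * theta instead of infinity: the tail area beyond it is e^-50.
area = trapezoid(lambda x: exp_pdf(x, theta), 0.0, 50 * theta)
print(f"area under f = {area:.6f}")
```

The printed area should be 1 to six decimal places.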

Recall that the Poisson distribution describes the probabilities associated with a Poisson process, in which the numbers of events occurring in non-overlapping intervals (of time, or of space on some object) are independent. For example, maybe 911 phone calls for a particular city arrive at a rate of 3 per hour. The number of calls between 7 pm and 8 pm is independent of the number between 8 pm and 9 pm, and the expected number of calls in each hour is 3. The Poisson distribution allows us to find, say, the probability the city’s 911 number receives more than 5 calls in the next hour, or the probability it receives no calls in the next 2 hours. It deals with discrete counts.
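Here is a small Python sketch of those two 911 calculations. It assumes the usual Poisson-process property that the mean count scales with the length of the interval, so the mean is 3 for one hour and 6 for two hours. `poisson_pmf` is a hypothetical helper written straight from the Poisson formula, not a library call.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson count with mean mu."""
    return mu**k * math.exp(-mu) / math.factorial(k)

rate_per_hour = 3.0

# More than 5 calls in the next hour: 1 - P(X <= 5), with mean 3 * 1 = 3
p_more_than_5 = 1 - sum(poisson_pmf(k, rate_per_hour * 1) for k in range(6))

# No calls in the next 2 hours: P(X = 0), with mean 3 * 2 = 6
p_none_in_2h = poisson_pmf(0, rate_per_hour * 2)

print(f"P(more than 5 calls in 1 hour) = {p_more_than_5:.4f}")
print(f"P(no calls in 2 hours)         = {p_none_in_2h:.4f}")
```

These come out to roughly 0.084 and 0.0025.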

Now what if we turn it around and ask instead how long until the next call comes in? Now we’re dealing with time, which is continuous as opposed to discrete; we’re limited only by the precision of our watch. Let’s be more specific and investigate the time until the first change in a Poisson process. Before diving into math, we can develop some intuition for the answer. If events in a process occur at a rate of 3 per hour, we would probably expect to wait about 20 minutes for the first event, since three per hour implies, on average, once every 20 minutes. But what is the probability the first event occurs within 20 minutes? What about within 5 minutes? How about after 30 minutes? (Notice I’m saying within and after instead of at. When finding probabilities for continuous random variables we deal with intervals instead of specific points, because the probability that a continuous random variable takes any one specific value is always 0.)
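To build intuition before any derivation, we can simulate the waiting time directly. This sketch is an approximation, not the derivation itself. It chops time into short slices of 0.1 minute and lets each slice independently contain an event with probability rate × slice length. With short slices this mimics a Poisson process with 3 events per hour. It then estimates the mean wait and the three probabilities asked about above. The seed, slice length, number of trials, and the helper name `simulate_first_wait` are arbitrary, hypothetical choices.

```python
import random

def simulate_first_wait(rate_per_min: float, dt: float, rng: random.Random) -> float:
    """Minutes until the first event, when each time slice of length dt
    independently contains an event with probability rate_per_min * dt."""
    p = rate_per_min * dt
    slices = 0
    while True:
        slices += 1
        if rng.random() < p:
            return slices * dt

rng = random.Random(2024)   # fixed seed so runs are repeatable
rate = 3 / 60               # 3 events per hour = 0.05 per minute
n_trials = 5000
waits = [simulate_first_wait(rate, dt=0.1, rng=rng) for _ in range(n_trials)]

mean_wait = sum(waits) / n_trials
p_within_20 = sum(w <= 20 for w in waits) / n_trials
p_within_5 = sum(w <= 5 for w in waits) / n_trials
p_after_30 = sum(w > 30 for w in waits) / n_trials

print(f"mean wait          ~ {mean_wait:.1f} minutes")
print(f"P(W <= 20 minutes) ~ {p_within_20:.3f}")
print(f"P(W <= 5 minutes)  ~ {p_within_5:.3f}")
print(f"P(W > 30 minutes)  ~ {p_after_30:.3f}")
```

The estimates should land near a 20-minute mean wait and probabilities of roughly 0.63, 0.22, and 0.22. The exact digits depend on the seed.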

Let’s create a random variable called W, which stands for the wait time until the first event. The probability the wait time is less than or equal to some particular time w is $ P(W \le w)$. Let’s say w = 5 minutes, so we have $ P(W \le 5)$. Since the complement of $ W \le 5$ is $ W>5$, we can write this probability as 1 minus the probability of its complement:

$$ P(W \le 5) = 1 - P(W>5)$$

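Now $ P(W>5)$ means no events occurred in the first 5 minutes. That is, nothing happened in the interval [0, 5]. What is the probability that nothing happened in that interval? Well, now we’re dealing with events again instead of time, and for that we can use the Poisson. If events occur at a mean rate of $ \lambda$ per minute, the number of events X in [0, 5] has a Poisson distribution with mean $ 5\lambda$:

Probability of no events in interval [0, 5] = $ P(X=0) = \frac{(5\lambda)^{0}e^{-5\lambda}}{0!}=e^{-5\lambda}$

So we have $ P(W \le 5) = 1 - P(W>5) = 1 - e^{-5\lambda}$

Nothing here depends on the choice of 5 minutes. For any wait time $ w>0$, the number of events in [0, w] is Poisson with mean $ \lambda w$, so the same argument gives

$$ F(w) = P(W \le w) = 1 - e^{-\lambda w}$$

That’s the cumulative distribution function. If we take the derivative of the cumulative distribution function with respect to w, we get the probability density function:

$$ F'(w)=f(w)=\lambda e^{-\lambda w}$$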
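As a sanity check on the derivation, this Python sketch builds the CDF from the Poisson probability of zero events, using mean $ \lambda w$ for a wait of length w. It then compares a central finite-difference derivative of that CDF with $ \lambda e^{-\lambda w}$. The rate $ \lambda = 1/20$ per minute (3 per hour) and the step size `h` are assumed values chosen only for illustration.

```python
import math

lam = 1 / 20  # assumed rate: 3 events per hour = 1/20 per minute

def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson count with mean mu."""
    return mu**k * math.exp(-mu) / math.factorial(k)

def cdf(w: float) -> float:
    """F(w) = P(W <= w) = 1 - P(no events in [0, w]); the count has mean lam * w."""
    return 1 - poisson_pmf(0, lam * w)

def pdf(w: float) -> float:
    """The derived density f(w) = lam * exp(-lam * w)."""
    return lam * math.exp(-lam * w)

h = 1e-5
checks = []
for w in (1.0, 5.0, 20.0, 60.0):
    slope = (cdf(w + h) - cdf(w - h)) / (2 * h)  # central difference ~ F'(w)
    checks.append((w, slope, pdf(w)))
    print(f"w = {w:4.0f}   F(w) = {cdf(w):.4f}   F'(w) ~ {slope:.6f}   f(w) = {pdf(w):.6f}")
```

The numerical slope and the formula agree to many decimal places at every w tried. $ F(20) = 1 - e^{-1} \approx 0.632$ is the chance of waiting at most one mean waiting time.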

And there we have the exponential distribution! Usually we let $ \lambda = \frac{1}{\theta}$. Making that substitution, and writing x in place of w, gives us what I showed in the beginning:

$$ f(x) = \frac{1}{\theta}e^{-x/\theta}$$

Why do we do that? It gives the distribution a parameter that represents the mean waiting time until the first change. Recall my previous example: if events in a process occur at a mean rate of 3 per hour, or 3 per 60 minutes, we expect to wait 20 minutes for the first event to occur. In symbols, if $ \lambda$ is the mean number of events per unit of time, then $ \theta=\frac{1}{\lambda}$ is the mean waiting time for the first event. So if $ \lambda=3$ is the mean number of events per hour, then the mean waiting time for the first event is $ \theta=\frac{1}{3}$ of an hour. Notice that $ \lambda=3=\frac{1}{1/3}=\frac{1}{\theta}$.
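To see $ \theta$ acting as the mean waiting time, this sketch draws exponential waiting times with Python’s `random.expovariate` and compares their average with $ \theta = 1/3$ hour = 20 minutes. Note that `expovariate` takes the rate $ \lambda$ (1 divided by the desired mean), not $ \theta$ itself. The seed and sample size are arbitrary.

```python
import random

rate_per_hour = 3.0
theta_hours = 1 / rate_per_hour          # mean wait, in hours
theta_minutes = 60 * theta_hours         # = 20 minutes

# random.expovariate expects the rate lambda (1 / desired mean), not theta.
rng = random.Random(7)
samples = [rng.expovariate(rate_per_hour) for _ in range(100_000)]
mean_minutes = 60 * sum(samples) / len(samples)

print(f"theta = {theta_minutes:.1f} minutes")
print(f"average simulated wait = {mean_minutes:.2f} minutes")
```

Passing $ \theta$ instead of $ \lambda$ to `expovariate` is an easy mistake. It would produce waits averaging 3 hours here, not 20 minutes.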

Tying everything together, suppose we have a Poisson process where events occur at, say, 12 per hour (or 1 every 5 minutes). Measuring time in minutes, the rate is $ \lambda=\frac{1}{5}$ per minute, so the expected number of events in the next 5 minutes is $ \lambda t = \frac{1}{5} \times 5 = 1$. The probability that exactly 1 event occurs during the next 5 minutes is found using the Poisson distribution with mean 1:

$$ P(X=1) = \frac{1^{1}e^{-1}}{1!}=0.368$$

But the probability that we wait less than some time for the first event, say 5 minutes, is found using the exponential distribution (with $ \theta = \frac{1}{1/5} = 5$):

$$ P(W<5)=\int_{0}^{5}\frac{1}{5}e^{-x/5}dx=1-e^{-5/5}=1-e^{-1}=0.632$$

Now it may seem we have a contradiction here. We have a 63% chance of witnessing the first event within 5 minutes, but only a 37% chance of witnessing one event in the next 5 minutes. While the two statements seem identical, they’re actually assessing two very different things. The Poisson probability is the chance we observe exactly one event in the next 5 minutes. Not 2 events, not 0, not 3, etc. Just 1. That’s a fairly restrictive question: we’re talking about one outcome out of many. The exponential probability, on the other hand, is the chance we wait less than 5 minutes to see the first event. This is inclusive of all times before 5 minutes, such as 2 minutes, 3 minutes, 4 minutes and 15 seconds, etc., and it says nothing about how many more events follow the first one. In fact, it equals the Poisson probability of at least one event in the next 5 minutes, $ 1 - P(X=0) = 1 - e^{-1}$. So we have a better than even chance of witnessing the first event within 5 minutes, but only a 37% chance that all we witness in that 5-minute span is exactly one event. The latter probability is similar in spirit to the chance of getting exactly 5 heads when you toss a fair coin 10 times. Five heads is indeed the most likely outcome, but it only has about a 25% chance of happening. Not impossible, but not exactly what I would call probable. Again it has to do with considering only 1 outcome out of many. The Poisson probability in our question above considered one outcome, while the exponential probability considered the infinity of outcomes between 0 and 5 minutes.
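The numbers in this comparison can be reproduced with a few lines of Python. This sketch takes the rate as $ \frac{1}{5}$ per minute, so the Poisson mean for a 5-minute window is $ \lambda t = 1$. It also computes the Poisson probability of at least one event and the coin-toss probability for comparison. Variable names are illustrative.

```python
import math

rate_per_min = 1 / 5      # 12 events per hour
t = 5.0                   # length of the window, in minutes
mu = rate_per_min * t     # Poisson mean for the window (1.0)
theta = 1 / rate_per_min  # exponential mean wait (5 minutes)

p_exactly_one = mu * math.exp(-mu) / math.factorial(1)   # Poisson P(X = 1)
p_at_least_one = 1 - math.exp(-mu)                       # Poisson 1 - P(X = 0)
p_first_within = 1 - math.exp(-t / theta)                # exponential P(W < 5)
p_five_heads = math.comb(10, 5) / 2**10                  # 5 heads in 10 fair tosses

print(f"P(X = 1)         = {p_exactly_one:.3f}")
print(f"P(X >= 1)        = {p_at_least_one:.3f}")
print(f"P(W < 5)         = {p_first_within:.3f}")
print(f"P(5 of 10 heads) = {p_five_heads:.3f}")
```

Running it should show the exponential probability and the Poisson probability of at least one event agreeing at about 0.632. The probability of exactly one event is about 0.368, and five heads in ten tosses is about 0.246.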

Clay Ford

8 comments

Pingback: Deriving the gamma distribution – Statistics you can Probably Trust
Tuan says:

November 24, 2015 at 1:11 am

Hi, I really like your explanation. However, wouldn’t the $\lambda$ for computing the probability of exactly one event in the next 5 minutes be equal to 1, instead of 1/5? Then the $\lambda$ in the Poisson and the $\lambda$ in the exponential are not the same thing. The $\lambda$ in the Poisson is the expected number of events occurring in a 5-min interval, whereas the $\lambda$ in the exponential is the Poisson exposure, the number of events occurring in a unit time interval.

1. Serhii says:
  
  January 9, 2019 at 6:55 am
  
  Actually I agree, the probability is 0,35919, not 0,164. Your five-minute incoming rate should be equal to 1 (one per five minutes), and you’re looking for the probability over exactly a five-minute period.
  
Matthew Theisen says:

July 12, 2016 at 12:52 pm

Informative! But it seems a little sloppy at points. For example, when you do the differentiation step, you end up with -lambda*exp(-lambda). The negative sign shouldn’t be there, and it’s not really clear what you’re differentiating with respect to. If it’s lambda, the lambda factor out front shouldn’t be there. Then in the last step the x variable pops out of nowhere. It would be clearer if you started with (t*lambda) as the Poisson parameter, where t is the time waited and lambda is the expected number of events per unit time.

1. Clay Ford says:
  
  July 12, 2016 at 3:09 pm
  
  Sloppy indeed! Thanks for the heads up and your feedback. I have removed the negative sign. I was differentiating with respect to w. I guess I changed the w to x in the last step to match the pdf I presented at the beginning of the post.
  
Pingback: Some of my favorite Quora answers – Matthew Theisen's Data Blog
Anish says:

June 28, 2017 at 2:07 pm

This is the absolute clearest explanation of the Exponential distribution derivation I’ve found on the entire internet. Lol.

Kun says:

April 2, 2019 at 10:19 am

Say x means time (or number of intervals).
Within 1 interval the probability that 0 events happen is e^-Λ (e to the negative lambda),
so within x intervals the probability that 0 events happen is e^(-Λx),
so the cumulative probability that the first event happens within x intervals is 1-e^(-Λx).
Then taking the derivative of that we get f(x) = Λe^(-Λx)
