The forgetful exponential distribution – statistics you can probably trust

The exponential distribution has the quirky property of having no memory. Before we wade into the math and see why, let’s consider a situation where there is memory: drawing cards. Let’s say you have a well-shuffled deck of 52 cards and you draw a single card. What’s the probability of drawing an ace? Since there are 4 aces in a deck of 52 cards, the probability is $ \frac{4}{52}$. We draw our card and it’s not an ace. We set the card aside, away from the deck, and draw again. Now our probability of drawing an ace is $ \frac{4}{51}$. We have a slightly better chance on the 2nd draw. The condition that we have already selected a card that wasn’t an ace changes the probability we draw an ace. This doesn’t happen with the exponential distribution.
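The card example can be checked by counting. The sketch below lists every ordered pair of draws from a 52-card deck without replacement and computes both probabilities exactly with `fractions.Fraction`. The deck encoding is an illustrative assumption: ranks 0 to 12, with rank 0 standing in for the ace.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding: 13 ranks x 4 suits; rank 0 plays the role of the ace.
DECK = [(rank, suit) for rank in range(13) for suit in range(4)]

def is_ace(card):
    """True if the card has the rank we use for aces."""
    return card[0] == 0

# All ordered draws of two distinct cards (drawing without replacement).
pairs = list(permutations(DECK, 2))

# P(first card is an ace), by counting outcomes.
p_first_ace = Fraction(sum(is_ace(a) for a, _ in pairs), len(pairs))

# P(second card is an ace | first card is not an ace), by counting
# only the outcomes where the first card is not an ace.
first_not_ace = [(a, b) for a, b in pairs if not is_ace(a)]
p_second_ace_given = Fraction(
    sum(is_ace(b) for _, b in first_not_ace), len(first_not_ace)
)

print(p_first_ace, p_second_ace_given)
```

`Fraction` reduces $ \frac{4}{52}$ to $ \frac{1}{13}$, so the first value prints as `1/13` and the second as `4/51`. With only 2,652 ordered pairs, exact counting is cheap and avoids simulation noise.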

Let’s say we have a state-of-the-art widget (version 2.0) whose lifespan can be described with an exponential distribution. Further, let’s say the mean lifespan is 60 months, or 5 years. Thanks to the “no memory” property, the probability that the widget lasts another 7 years is the same whether it is brand new or already 5 years old. In math words:

$$ P(X > 7 + 5 | X>5) = P(X>7)$$

That means if I bought a widget that was 5 years old, it has the same probability of lasting another 7 years as a brand-new widget has of lasting 7 years. Not realistic, but certainly interesting. Showing why this is the case is actually pretty straightforward.
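Here is a quick numerical sketch of the widget claim in Python. It uses the exponential survival function $ P(X > x) = e^{-x/\theta}$ from the derivation, with $ \theta = 5$ years. The names `THETA` and `survival` are only illustrative.

```python
import math

THETA = 5.0  # mean lifespan in years (60 months), from the widget example

def survival(x, theta=THETA):
    """P(X > x) for an exponential distribution with mean theta."""
    return math.exp(-x / theta)

# Brand-new widget: probability of lasting more than 7 years.
p_new = survival(7)

# 5-year-old widget: P(X > 12 | X > 5) = P(X > 12) / P(X > 5).
p_used = survival(7 + 5) / survival(5)

print(f"new: {p_new:.4f}, used: {p_used:.4f}")
```

Both probabilities come out to about 0.2466, which is $ e^{-7/5}$.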

We want to show that for the exponential distribution, $ P(X > y + x | X > x) = P(X > y)$ for all $ x, y \ge 0$.

Recall that the cumulative distribution function (cdf) of an exponential distribution with mean $ \theta$ is $ P(X \le x)=F(x) = 1 -e^{-x/\theta}$ for $ x \ge 0$. That’s the probability of an event occurring at or before time x. The complement of the cdf is the probability of an event occurring after time x:

$$ P(X > x) = 1 - P(X \le x) = 1 - (1 - e^{-x/ \theta} ) = e^{-x/ \theta}$$
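A Monte Carlo check of this formula draws many exponential lifespans and compares the fraction exceeding x with $ e^{-x/\theta}$. This sketch uses Python's `random.expovariate`, which takes the rate $ 1/\theta$ rather than the mean. The sample size, seed, and $ \theta = 5$ are arbitrary choices.

```python
import math
import random

random.seed(1)
THETA = 5.0       # mean of the exponential distribution
N = 200_000       # number of simulated lifespans

# random.expovariate takes the rate lambda = 1/theta, not the mean.
samples = [random.expovariate(1 / THETA) for _ in range(N)]

def empirical_survival(x):
    """Fraction of simulated lifespans that exceed x."""
    return sum(s > x for s in samples) / N

# Print x, the simulated P(X > x), and the exact e^(-x/theta) side by side.
for x in (1, 5, 10):
    print(x, round(empirical_survival(x), 4), round(math.exp(-x / THETA), 4))
```

The simulated and exact columns should agree closely. Any remaining gap is sampling noise that shrinks as `N` grows.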

Also recall the definition of conditional probability: $ P(A |B) = \frac{P(A \cap B)}{P(B)}$

Let’s plug into the equality we want to prove and see what happens:

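$$ P(X > y + x | X > x) = \frac{P(\{X>y + x\} \cap \{X>x\})}{P(X > x)} = \frac{P(X>y + x)}{P(X > x)}$$

The intersection drops out because, with $ y \ge 0$, the event $ X > y + x$ already implies $ X > x$.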

$$ =\frac{e^{-(x+y)/\theta}}{e^{-x/\theta}} = \frac{e^{-x/\theta}e^{-y/\theta}}{e^{-x/\theta}} = e^{-y/\theta} = P(X>y)$$

There you go. Not too bad.
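The same result can be checked by simulation instead of algebra. Among simulated lifespans that already exceed x, the fraction that go on past x + y should match the unconditional fraction exceeding y. This sketch reuses the widget numbers ($ \theta = 5$, x = 5, y = 7) and Python's `random.expovariate`, which is parameterized by the rate $ 1/\theta$. The seed and sample size are arbitrary.

```python
import math
import random

random.seed(2)
THETA = 5.0
N = 500_000
x, y = 5.0, 7.0   # already survived x years; ask about y more years

samples = [random.expovariate(1 / THETA) for _ in range(N)]

# Condition on X > x: keep only lifespans that already exceeded x,
# then see what fraction of them also exceed x + y.
survivors = [s for s in samples if s > x]
p_conditional = sum(s > x + y for s in survivors) / len(survivors)

# Unconditional P(X > y), using all samples.
p_unconditional = sum(s > y for s in samples) / N

print(round(p_conditional, 3), round(p_unconditional, 3), round(math.exp(-y / THETA), 3))
```

All three numbers should be close to $ e^{-7/5} \approx 0.247$, differing only by simulation noise.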

We can actually go the other direction as well. That is, we can show that if $ P(X > y + x | X > x) = P(X > y)$ is true for a continuous random variable X, then X has an exponential distribution. Here’s how:

$ P(X > y + x | X > x) = P(X > y)$ (given)

$ \frac{P(\{X > y + x\} \cap \{X > x\})}{P(X > x)} = P(X > y)$ (using the definition of conditional probability)

$ \frac{P(X > y + x)}{P(X > x)} = P(X > y)$ (if X > y + x, then X > x)

$ \frac{1-F(y + x)}{1-F(x)}=1-F(y)$ (substitute the cdf expressions)

Now introduce a shorthand for the survival function, say $ h(x) = 1 - F(x)$:

$$ \frac{h(y + x)}{h(x)}=h(y)$$

Multiplying both sides by $ h(x)$ gives us $ h(y + x)=h(y)h(x)$.
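The product rule $ h(y + x)=h(y)h(x)$ is easy to probe numerically. The sketch below compares the exponential survival function with another valid survival function, $ h(x) = \frac{1}{1 + x}$. That second function is a counterexample chosen here for illustration and does not come from the original post. The test points are arbitrary.

```python
import math

def h_exponential(x, theta=5.0):
    """Exponential survival function e^(-x/theta)."""
    return math.exp(-x / theta)

def h_other(x):
    """A non-exponential survival function, used only as a counterexample."""
    return 1 / (1 + x)

def satisfies_product_rule(h, points=(0.5, 1.0, 2.0, 7.0)):
    """Check h(x + y) == h(x) * h(y) (up to float tolerance) for all point pairs."""
    return all(math.isclose(h(x + y), h(x) * h(y)) for x in points for y in points)

print(satisfies_product_rule(h_exponential))  # True
print(satisfies_product_rule(h_other))        # False
```

For example, `h_other(2)` is 1/3 while `h_other(1) * h_other(1)` is 1/4, so the second function fails the check.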

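Now for that equality to hold, the function h(x) has to have an exponential form, where the variable is in the exponent, like this: $ a^{x}$. Recall that $ a^{x}a^{y}=a^{x+y}$, so if $ h(x) = a^{x}$, our equality above works. The reverse also holds. Since X is continuous, h is continuous, and the only continuous solutions of $ h(y + x)=h(y)h(x)$ other than $ h(x) = 0$ are of the form $ a^{x}$ with $ a > 0$. So we let $ h(x)=a^{x}$, which allows us to draw the following conclusion: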

$$ 1-F(x) = h(x) = a^{x} = e^{\ln a^{x}} = e^{x \ln a}$$

Now let $ b = \ln a$. We get $ 1-F(x) = e^{bx}$. Solving for F(x) we get $ F(x) = 1 - e^{bx}$. Since $ F(x) \to 1$ as $ x \to \infty$, $ e^{bx}$ must go to 0, so b must be negative. Writing $ b = -\frac{1}{\theta}$ with $ \theta > 0$, we have the cumulative distribution function for an exponential distribution: $ F(x) = 1 - e^{-x/\theta}$.
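Going from the base a back to $ \theta$ is a one-liner. Since $ 1 - F(x) = a^{x}$ must equal $ e^{-x/\theta}$, we need $ a = e^{-1/\theta}$, so $ \theta = -1/\ln a$. Note that $ a = h(1)$ is simply the probability of surviving past one time unit. The sketch below assumes the widget's $ \theta = 5$ years to produce a, then recovers $ \theta$. The function names are hypothetical.

```python
import math

def base_from_theta(theta):
    """a = h(1) = P(X > 1) for an exponential distribution with mean theta."""
    return math.exp(-1 / theta)

def theta_from_base(a):
    """Invert a = e^(-1/theta); only valid for 0 < a < 1."""
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    return -1 / math.log(a)

a = base_from_theta(5.0)   # one-year survival probability of the widget
print(round(a, 4), theta_from_base(a))
```

Here a rounds to 0.8187, and the recovered $ \theta$ is 5 up to floating-point rounding. Also, $ a^{7}$ equals $ e^{-7/5}$, the 7-year survival probability computed earlier.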

That’s the memoryless property for you. Or maybe it’s called the forgetfulness property. I can’t remember.

Clay Ford
