The exponential distribution has the quirky property of having no memory. Before we wade into the math and see why, let’s consider a situation where there *is* memory: drawing cards. Let’s say you have a well-shuffled deck of 52 cards and you draw a single card. What’s the probability of drawing an ace? Since there are 4 aces in a deck of 52 cards, the probability is \( \frac{4}{52}\). We draw our card and it’s not an ace. We set the card aside, away from the deck, and draw again. Now our probability of drawing an ace is \( \frac{4}{51}\). We have a slightly better chance on the 2nd draw. The *condition* that we have already selected a card that wasn’t an ace *changes the probability* we draw an ace. This doesn’t happen with the exponential distribution.

Let’s say we have a state-of-the-art widget (version 2.0) whose lifespan can be described with an exponential distribution. Further, let’s say the mean lifespan is 60 months, or 5 years. Thanks to the “no memory” property, the probability of the widget lasting another 7 years is the same whether it is new or 5 years old. In math words:

$$ P(X > 7 + 5 | X>5) = P(X>7)$$

That means if I bought a widget that was 5 years old, it has the same probability of lasting another 7 years as a brand new widget has for lasting 7 years. Not realistic but certainly interesting. Showing why this is the case is actually pretty straightforward.
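To make the widget claim concrete, here is a small numeric check. It is only a sketch: it assumes the survival function \( P(X > x) = e^{-x/\theta}\) that is derived further down, with \( \theta = 5\) years taken from the example, and the helper name `survival` is made up for illustration.

```python
import math

theta = 5.0  # mean lifespan in years (60 months), from the widget example

def survival(x, theta):
    """P(X > x) for an exponential distribution with mean theta."""
    return math.exp(-x / theta)

# Probability that a brand new widget lasts 7 years.
p_new = survival(7, theta)

# Probability that a 5-year-old widget lasts another 7 years:
# P(X > 7 + 5 | X > 5) = P(X > 12) / P(X > 5).
p_used = survival(7 + 5, theta) / survival(5, theta)

print(p_new, p_used)
```

Both numbers come out to about 0.247, so under these assumptions the used widget and the new one have the same chance of lasting 7 more years.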

We want to show that for the exponential distribution, \( P(X > y + x | X > x) = P(X > y)\).

Recall the cumulative distribution function of an exponential distribution with mean \( \theta\) is \( P(X \le x)=F(x) = 1 -e^{-x/\theta}\) for \( x \ge 0\). That’s the probability of an event occurring *before* a certain time *x*. The complement of the cumulative distribution function is the probability of an event occurring *after* a certain time:

$$ P(X > x) = 1 - P(X \le x) = 1 - (1 - e^{-x/ \theta} ) = e^{-x/ \theta}$$

Also recall the definition of conditional probability: \( P(A |B) = \frac{P(A \cap B)}{P(B)}\)

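Let’s plug into the equality we want to prove and see what happens. For \( y \ge 0\), the event \( X > y + x\) already implies \( X > x\), so the intersection in the numerator is just \( X > y + x\):

$$ P(X > y + x | X > x) = \frac{P(X>y + x \cap X>x)}{P(X > x)} = \frac{P(X>y + x)}{P(X > x)}$$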

$$ =\frac{e^{-(x+y)/\theta}}{e^{-x/\theta}} = \frac{e^{-x/\theta}e^{-y/\theta}}{e^{-x/\theta}} = e^{-y/\theta} = P(X>y)$$

There you go. Not too bad.
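If you would rather see it than prove it, a quick simulation gives the same answer. This is a rough sketch, not a proof: it assumes \( \theta = 5\), \( x = 5\) and \( y = 7\) from the widget example, and the seed and sample size are arbitrary choices. Note that `random.expovariate` takes the rate \( 1/\theta\), not the mean.

```python
import random

random.seed(0)   # arbitrary seed, only so the run is repeatable
theta = 5.0      # mean lifespan in years
n = 200_000      # arbitrary sample size

# random.expovariate expects the rate, which is 1 / mean.
samples = [random.expovariate(1 / theta) for _ in range(n)]

x, y = 5.0, 7.0

# Keep only the widgets that survived past x years...
survived_x = [s for s in samples if s > x]
# ...and see what fraction of those also survive past x + y years.
cond = sum(s > x + y for s in survived_x) / len(survived_x)

# Fraction of all widgets that survive past y years.
uncond = sum(s > y for s in samples) / n

print(cond, uncond)
```

The two fractions should agree to within sampling noise, and both should sit close to \( e^{-7/5}\).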

We can actually go the other direction as well. That is, we can show that if \( P(X > y + x | X > x) = P(X > y)\) holds for all \( x, y \ge 0\) for a continuous random variable X, then X has an exponential distribution. Here’s how:

\( P(X > y + x | X > x) = P(X > y)\) (*given*)

\( \frac{P(X > y + x \cap X > x)}{P(X > x)} = P(X > y)\) (*using the definition of conditional probability*)

\( \frac{P(X > y + x)}{P(X > x)} = P(X > y)\) (*if X > y + x, then X > x*)

\( \frac{1-F(y + x)}{1-F(x)}=1-F(y)\) (*substitute the cdf expressions, since P(X > x) = 1 - F(x)*)

Now substitute in generic function notation, say \( h(x) = 1 - F(x)\):

$$ \frac{h(y + x)}{h(x)}=h(y)$$

Rearranging terms gives us \( h(y + x)=h(y)h(x)\)

Now for that equality to hold, the function *h(x)* has to have an exponential form, where the variable is in the exponent, like this: \( a^{x}\). Recall that \( a^{x}a^{y}=a^{x+y}\). If \( h(x) = a^{x}\), then our equality above works. (Because \( h(x) = 1 - F(x)\) is a non-increasing function, exponentials are in fact the only non-trivial solutions.) So we let \( h(x)=a^{x}\). That allows us to make the following conclusion:

$$ 1-F(x) = h(x) = a^{x} = e^{\ln a^{x}} = e^{x \ln a}$$

Now let \( b = \ln a\). We get \( 1-F(x) = e^{bx}\). Solving for F(x) we get \( F(x) = 1 - e^{bx}\). Since \( F(\infty) = 1\), we need \( e^{bx} \to 0\) as x grows, so *b* must be negative. Now we just write \( b = -\frac{1}{\theta}\) with \( \theta > 0\) and we have the cumulative distribution function for an exponential distribution: \( F(x) = 1 - e^{-x/\theta}\).
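Here is a small numeric illustration of that last step, under a made-up assumption: a hypothetical base \( a = 0.8\) (any value between 0 and 1 would do). It checks that \( h(x) = a^{x}\) satisfies \( h(y + x) = h(y)h(x)\), and that the resulting \( F(x) = 1 - h(x)\) matches an exponential cdf whose mean is \( \theta = -1/\ln a\).

```python
import math

a = 0.8              # hypothetical base, chosen for illustration; needs 0 < a < 1
b = math.log(a)      # b = ln a, which is negative because a < 1
theta = -1 / b       # the mean that makes e^{bx} equal to e^{-x/theta}

def h(x):
    """Candidate survival function h(x) = a^x."""
    return a ** x

x, y = 2.0, 3.5      # arbitrary test points

# The functional equation h(y + x) = h(y) * h(x).
print(math.isclose(h(x + y), h(x) * h(y)))

# F(x) = 1 - h(x) agrees with the exponential cdf 1 - e^{-x/theta}.
print(math.isclose(1 - h(x), 1 - math.exp(-x / theta)))
```

Both checks print `True`, and with this choice of *a* the mean works out to roughly 4.48.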

That’s the memoryless property for you. Or maybe it’s called the forgetfulness property. I can’t remember.
