Say you have a fair die. (Most of them are these days.) You roll it once and observe a 3. You roll it again and observe a 1. You roll it yet again and get a 4. In three throws you’ve observed 3 of the possible sides. What if you keep going? How many rolls would it take to see all six sides *at least once*. Obviously the minimum number is 6. But the probability of that happening is about 0.015. (). What is the most likely number of rolls? Or put in the language of a statistics textbook, what’s the expected number of rolls?

To solve this problem we need to use the **Geometric Distribution**. If a series of trials are independent of one another, and each trial has the same probability of success, then the number of trials needed to observe the first success has a geometric distribution. This is the scenario of our roll of the die. Each roll is independent of one another and the probability of success remains the same for each roll, whatever we define “success” to be. The mean of a geometric distribution is . That means the expected number of times we need to roll a dice to observe, say, a four is 6.

Back to our problem. We’re certain to get at least one of the faces on the first roll. So that probability is 1. On the second roll we have a probability of of getting a different face than what we got on the first roll. It’s not certain. There is a probability of of getting the same result as the first roll. Once we have observed two sides, the probability of observing a third side becomes . On so on. Notice that as we observe previously unseen sides the definition of success changes, therefore the probability changes. So to determine the expected number of rolls to see all sides of a die at least once, we need to find the expected number of rolls for each definition of “success” and sum them. The steps below describe the process. (The numbers represent the *order* of the observed sides, not the values on the dice.)

- We’re guaranteed to see one of the sides on the first roll. Expected number of rolls: 1
- We have probability of seeing a different side than what was previously observed. Expected number of rolls:
- We have probability of seeing a different side than what was previously observed in steps 1 and 2. Expected number of rolls:
- We have probability of seeing a different side than what was previously observed in steps 1 -3. Expected number of rolls:
- We have probability of seeing a different side than what was previously observed in steps 1-4. Expected number of rolls:
- We have probability of seeing a different side than what was previously observed in steps 1-5. Expected number of rolls:

Adding all the expected number of rolls for each definition of success we get 14.7. So we expect to roll a die about 15 times on averageÂ before we observe all sides at least once.

We can use this same line of thinking to answer such “real-life” questions as how many boxes of cereal would you have to buy to collect the latest series of cheap plastic toys, assuming the toys are randomly added to the cereal in a uniform manner. Of course if you’re desperate to collect the precious toys and have a modicum of patience, the answer is probably 0. You’d just wait to buy the whole set on eBay.

Marcus GABRIELSo, how does one determine the standard deviation about this 14.7 mean? Simply generating a large, random data set gives a sigma of around 6, but I cannot see how to predict it.

Marcus GABRIELMy first guess was to look at the Geometric Distribution. If the probability of success on each trial is p, then E(X) = 1/p, var(X) = (1 − p)/p^2 where var(X + Y) = var(X) + var(Y) (Hmm, is that correct). In pseudo code (Haskell code), the standard deviation is then

sqrt (sum (fmap (\x -> let p = 1/x in (1-p)/p^2) [1..d]))

which gives a wrong answer compared to an empirical computation.

I give up. I am missing an understanding somewhere.

Clay FordPost authorI think it’s basically the same as with the means. The variance of the sum is the sum of the variances.

The variance of the geometric dist’n is \(\frac{(1-p)}{p^2}\). Plug in the probabilities and sum. We can do this pretty quickly in R:

`geo_var <- function(p)(1-p)/p^2`

`sum(sapply((5:1)/6, geo_var))`

The result is 38.99. That's the variance. The standard deviation is the square root of that, which returns 6.244197. That seems to correspond with your simulation.

Marcus GABRIELThank you for the assist. I was not confident enough in my reasoning so I just could not see the bug in my code. I was not missing an understanding but a bug. For the record, the correct Haskell code that one can copy/paste into ghci is

d = 6

sqrt (sum (fmap (\x -> let p = x/6 in (1-p)/p^2) [1..d]))

which yields 6.244197… as expected.

Marcus GABRIELHow embarrassing: the x/6 should be x/d.

Clay FordPost authorNot embarrassing at all! I made at least 3 mistakes when I posted my comment and kept having to fix them, which is nice to be able to do when you’re the admin of the blog! :)

Marcus GABRIEL:)