# How many rolls of a die to see all sides at least once?

Say you have a fair die. (Most of them are these days.) You roll it once and observe a 3. You roll it again and observe a 1. You roll it yet again and get a 4. In three throws you’ve observed 3 of the possible sides. What if you keep going? How many rolls would it take to see all six sides at least once? Obviously the minimum number is 6. But the probability of that happening is only about 0.015 ($$1 \times \frac{5}{6}\times \frac{4}{6}\times \frac{3}{6}\times \frac{2}{6}\times \frac{1}{6}$$). So how many rolls does it take on average? Or, put in the language of a statistics textbook, what’s the expected number of rolls?
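As a quick sanity check on that 0.015 figure, here is a short Python snippet (an illustration I've added, not part of the original post) computing the product, which equals $$6!/6^{6}$$:

```python
from math import factorial

# Probability that six rolls of a fair die show all six faces:
# 1 * 5/6 * 4/6 * 3/6 * 2/6 * 1/6 = 6!/6^6
p_all_distinct = factorial(6) / 6**6
print(round(p_all_distinct, 4))  # 0.0154
```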

To solve this problem we need to use the Geometric Distribution. If a series of trials is independent, and each trial has the same probability of success, then the number of trials needed to observe the first success has a geometric distribution. This is exactly the scenario of our die rolls. Each roll is independent of the others, and the probability of success stays the same from roll to roll, however we define “success.” The mean of a geometric distribution is $$\frac{1}{p}$$. That means the expected number of times we need to roll a die to observe, say, a four is $$1/\frac{1}{6}=6$$.
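We can see the $$\frac{1}{p}$$ result empirically with a small simulation. This is a sketch I've added for illustration (the seed and trial count are arbitrary choices):

```python
import random

random.seed(1)

def rolls_until(target, sides=6):
    """Count rolls of a fair die until `target` appears for the first time."""
    count = 0
    while True:
        count += 1
        if random.randint(1, sides) == target:
            return count

trials = 100_000
avg = sum(rolls_until(4) for _ in range(trials)) / trials
print(round(avg, 2))  # close to the theoretical mean 1/p = 6
```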

Back to our problem. We’re certain to get at least one of the faces on the first roll. So that probability is 1. On the second roll we have a probability of $$\frac{5}{6}$$ of getting a different face than what we got on the first roll. It’s not certain. There is a probability of $$\frac{1}{6}$$ of getting the same result as the first roll. Once we have observed two sides, the probability of observing a third side becomes $$\frac{4}{6}$$. And so on. Notice that as we observe previously unseen sides the definition of success changes, and therefore the probability changes. So to determine the expected number of rolls to see all sides of a die at least once, we need to find the expected number of rolls for each definition of “success” and sum them. The steps below describe the process. (The numbers represent the order of the observed sides, not the values on the die.)

1. We’re guaranteed to see one of the sides on the first roll. Expected number of rolls: 1
2. We have $$\frac{5}{6}$$ probability of seeing a different side than what was previously observed. Expected number of rolls: $$1/\frac{5}{6}=1.2$$
3. We have $$\frac{4}{6}$$ probability of seeing a different side than what was previously observed in steps 1 and 2. Expected number of rolls: $$1/\frac{4}{6}=1.5$$
4. We have $$\frac{3}{6}$$ probability of seeing a different side than what was previously observed in steps 1-3. Expected number of rolls: $$1/\frac{3}{6}=2$$
5. We have $$\frac{2}{6}$$ probability of seeing a different side than what was previously observed in steps 1-4. Expected number of rolls: $$1/\frac{2}{6}=3$$
6. We have $$\frac{1}{6}$$ probability of seeing a different side than what was previously observed in steps 1-5. Expected number of rolls: $$1/\frac{1}{6}=6$$

Adding up the expected number of rolls for each definition of success gives $$1 + 1.2 + 1.5 + 2 + 3 + 6 = 14.7$$. So we expect to roll a die about 15 times on average before we observe all sides at least once.
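The six-step argument above can be condensed and simulated in a few lines of Python. This is an added sketch, not from the original post; the exact answer is the sum $$\sum_{k=1}^{6}\frac{6}{k}$$, which is just the six expected values from the list:

```python
import random
from fractions import Fraction

random.seed(1)

# Exact expected number of rolls: 6/6 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1
expected = sum(Fraction(6, k) for k in range(1, 7))
print(float(expected))  # 14.7

def rolls_to_see_all(sides=6):
    """Roll a fair die until every side has appeared at least once."""
    seen = set()
    count = 0
    while len(seen) < sides:
        seen.add(random.randint(1, sides))
        count += 1
    return count

trials = 100_000
avg = sum(rolls_to_see_all() for _ in range(trials)) / trials
print(round(avg, 1))  # close to the exact answer 14.7
```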

We can use this same line of thinking to answer such “real-life” questions as how many boxes of cereal would you have to buy to collect the latest series of cheap plastic toys, assuming the toys are randomly added to the cereal in a uniform manner. Of course if you’re desperate to collect the precious toys and have a modicum of patience, the answer is probably 0. You’d just wait to buy the whole set on eBay.

# The moment-generating function

The moment-generating function (or mgf) is probably better understood if you simply focus on its name rather than its formula. It’s a function that generates moments. Of course it helps to know what “moments” are. A moment is a term from mechanics for the product of a distance and its weight. If you have several distances and associated weights, you can calculate all the products and sum them and get what you call the “first moment about the origin.” If that’s a little too abstract, just know that the calculation of the “first moment about the origin” is the same calculation one uses to find the mean of a random variable. Thus the mean of a random variable is the same as the first moment about the origin. Therefore we can use the mgf to generate the first moment about the origin, and thus find the mean of a random variable. But the moment-generating function doesn’t stop there. It also generates second moments, third moments, etc., provided the mgf exists (a detail I’m not going to address in this post). Since the variance of a random variable can be calculated using the first and second moments about the origin, we can use the mgf to find variance as well as the mean.

So that’s the big idea: the moment-generating function provides another (sometimes easier) way to find the mean and variance of a random variable. That’s why it’s taught in upper-level statistics (even though many instructors dive into this topic without providing any motivation for it). Recall the usual formulas for finding the mean and variance: $$\mu = \sum_{x\in S}xf(x)$$ and $$\sigma^{2} = \sum_{x\in S}(x-\mu)^{2}f(x)$$. These can lead to some awfully long and hard math, especially for finding the variance. Even the variance “shortcut” $$\sigma^{2} =E[X^{2}]-\mu^{2}$$ can get bogged down in finding $$E[X^{2}] = \sum_{x\in S}x^{2}f(x)$$. This is where the moment-generating function can be of service. Let’s introduce it (for discrete random variables):

mgf: $$M(t) = E(e^{tX}) = \sum_{x\in S}e^{tx}f(x)$$

I’ll admit it doesn’t look very friendly. And it’s certainly not obvious that it would help you find the mean and variance of a distribution. Why it works is a topic for another post, a post I’ll almost certainly never write. But seriously, why the mgf does what it does is explained rather well in one of my favorite Dover books, Principles of Statistics by M.G. Bulmer. A highly recommended book at only \$15. But say you believe me and take it on faith that the mgf can help us find the mean and variance. How does it work? Well, here are the steps you usually follow:

1. find the mgf
2. find the first derivative of the mgf
3. find the second derivative of the mgf
4. evaluate the first derivative of the mgf at $$t=0$$. That gives you the mean.
5. evaluate the second derivative of the mgf at $$t=0$$. That gives you $$E[X^{2}]$$
6. subtract the mean squared from $$E[X^{2}]$$ (i.e., use the formula $$\sigma^{2} =E[X^{2}]-\mu^{2}$$)

Let’s do a simple example. Say we have a probability mass function (pmf) $$f(x) = (4 - x)/6$$ with $$x = 1, 2, 3$$. (You can verify it’s a pmf by calculating $$f(1), f(2)$$ and $$f(3)$$, and adding the results. They sum to 1.) Use the moment-generating function to find the mean and variance.

Following the steps above we first find the mgf:

$$M(t) = E(e^{tX}) =e^{1t}((4-1)/6)+e^{2t}((4-2)/6)+e^{3t}((4-3)/6)$$

$$M(t) = \frac{3}{6}e^{t}+\frac{2}{6}e^{2t}+\frac{1}{6}e^{3t}$$

Next find the first and second derivatives:

$$M^{\prime}(t) = \frac{3}{6}e^{t}+\frac{4}{6}e^{2t}+\frac{3}{6}e^{3t}$$

$$M^{\prime\prime}(t) = \frac{3}{6}e^{t}+\frac{8}{6}e^{2t}+\frac{9}{6}e^{3t}$$

Now evaluate $$M^{\prime}(0)$$ to find the mean:

$$\mu = M^{\prime}(0) = \frac{3}{6}e^{0}+\frac{4}{6}e^{0}+\frac{3}{6}e^{0} = \frac{3}{6}+\frac{4}{6}+\frac{3}{6} = \frac{10}{6} \approx 1.667$$

Finally evaluate $$M^{\prime\prime}(0) = E[X^{2}]$$ and use it to find the variance:

$$M^{\prime\prime}(0) = \frac{3}{6}e^{0}+\frac{8}{6}e^{0}+\frac{9}{6}e^{0}=\frac{3}{6}+\frac{8}{6}+\frac{9}{6} = \frac{20}{6}$$

$$\sigma^{2} = M^{\prime\prime}(0) - \mu^{2} = \frac{20}{6} - \left(\frac{10}{6}\right)^{2} = \frac{5}{9} \approx 0.556$$
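To double-check the arithmetic, here is a Python sketch I've added that builds $$M(t)$$ for this pmf and approximates $$M^{\prime}(0)$$ and $$M^{\prime\prime}(0)$$ with finite differences (so no symbolic math library is needed; the step size $$h$$ is an arbitrary small choice):

```python
from math import exp

def f(x):
    """The pmf from the example: f(x) = (4 - x)/6 for x = 1, 2, 3."""
    return (4 - x) / 6

def M(t):
    """Moment-generating function M(t) = E[e^{tX}]."""
    return sum(exp(t * x) * f(x) for x in (1, 2, 3))

h = 1e-5
M1 = (M(h) - M(-h)) / (2 * h)          # numerical M'(0)  = E[X]
M2 = (M(h) - 2 * M(0) + M(-h)) / h**2  # numerical M''(0) = E[X^2]

print(round(M1, 4))          # ≈ 1.6667 (= 10/6)
print(round(M2, 4))          # ≈ 3.3333 (= 20/6)
print(round(M2 - M1**2, 4))  # ≈ 0.5556 (= 5/9)
```

The numerical values match the hand computation: mean 10/6 and variance 5/9.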

Of course you wouldn’t normally employ the mgf for a simple distribution like this, but I thought it was a nice way to illustrate the concepts of the moment-generating function. By the way, I took this example from the textbook Probability and Statistical Inference (7th ed) by Hogg and Tanis (p. 101). Another highly recommended book.