Power is the probability of rejecting a null hypothesis when the alternative is true. Say your null hypothesis is $H_0: \mu = \mu_0$ and that your alternative hypothesis is $H_a: \mu > \mu_0$. Further, say the true state of the world is such that $\mu = \mu_1$ for some $\mu_1 > \mu_0$. The probability that we reject the null given that state of the world, that is, that we make the correct decision, is called power.

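Let’s say we carry out such a test but we don’t know the true state of the world (we never do in real life). We sample 20 items and decide to reject the null if the sample mean is 60 or higher, that is, if $\bar{x} \ge 60$. We’ll assume we’re dealing with a Normal distribution that has an unknown mean $\mu$ and a standard deviation of $\sigma = 5$. What is the power of such a test? Well, it depends on what $\mu$ really is. For example, if $\mu = 61$, then the power of the test is

$$P(\bar{X} \ge 60 \mid \mu = 61) = P\left(Z \ge \frac{60 - 61}{5/\sqrt{20}}\right) = P(Z \ge -0.894) \approx 0.81$$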

This says we have about an 81% chance of getting a sample mean of 60 (or higher) if the true mean is 61. That’s pretty good power. Traditionally statisticians like to have at least 80% power when doing experiments. Of course the catch is you have to know the standard deviation. Is that even possible? Not really. What most people do is make the best estimate possible and err on the side of being too conservative. If they think the standard deviation is about 2.5, they’ll round it up to 3 to be safe. Now as your standard deviation increases, your power decreases. So being conservative means you have to increase your sample size to get the power back up to a desirable level. Going back to our example, let’s say our standard deviation is 6 instead of 5:

$$P(\bar{X} \ge 60 \mid \mu = 61) = P\left(Z \ge \frac{60 - 61}{6/\sqrt{20}}\right) = P(Z \ge -0.745) \approx 0.77$$

Notice how the power dropped to 77%. To bring it back to around 80% I can increase my sample size. In this example, going from 20 to 27 does it:

$$P(\bar{X} \ge 60 \mid \mu = 61) = P\left(Z \ge \frac{60 - 61}{6/\sqrt{27}}\right) = P(Z \ge -0.866) \approx 0.81$$
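The three calculations so far can be checked with a few lines of Python. This is a minimal sketch under the example’s assumptions (Normal data, reject when the sample mean is at least 60); the function name `power` and its parameters are illustrative labels, not part of the original worked example.

```python
from math import sqrt
from statistics import NormalDist

def power(true_mean, sd, n, cutoff=60.0):
    """P(sample mean >= cutoff) when the data are Normal(true_mean, sd).

    Assumes the test rejects the null whenever the sample mean is at
    least `cutoff` (60 in the running example).
    """
    # The mean of n Normal draws is Normal with standard error sd / sqrt(n).
    sampling_dist = NormalDist(mu=true_mean, sigma=sd / sqrt(n))
    return 1.0 - sampling_dist.cdf(cutoff)

print(round(power(61, 5, 20), 3))  # sd = 5, n = 20
print(round(power(61, 6, 20), 3))  # sd = 6, n = 20
print(round(power(61, 6, 27), 3))  # sd = 6, n = 27
```

The printed values should land near 81%, 77% and 81%, in line with the figures quoted in the text.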

I could also hypothesize a different true mean in order to increase power. Previously I assumed a true mean of 61. If I leave my sample size at 20 and assume the larger standard deviation of 6, I can obtain 80% power by hypothesizing a mean of 61.15:

$$P(\bar{X} \ge 60 \mid \mu = 61.15) = P\left(Z \ge \frac{60 - 61.15}{6/\sqrt{20}}\right) = P(Z \ge -0.857) \approx 0.80$$
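Going the other way, from a target power to the true mean that delivers it, just inverts the same formula: the mean has to sit $z_p \cdot \sigma/\sqrt{n}$ above the cut-off, where $z_p$ is the standard Normal quantile for the target power $p$. A rough sketch, with `mean_for_power` as a hypothetical helper name and the cut-off of 60 taken from the example:

```python
from math import sqrt
from statistics import NormalDist

def mean_for_power(target_power, sd, n, cutoff=60.0):
    """True mean at which P(sample mean >= cutoff) equals target_power."""
    # Standard Normal quantile for the target power (about 0.84 for 80%).
    z = NormalDist().inv_cdf(target_power)
    return cutoff + z * sd / sqrt(n)

print(round(mean_for_power(0.80, 6, 20), 2))
```

Under these assumptions the break-even mean comes out at roughly 61.13, so the 61.15 used above gives a shade over 80% power.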

These formulas I’ve been using are power functions and they cry out for a spreadsheet. In one column enter various hypothesized means and then run a power function down an adjacent column to see how the power changes. You can do the same with sample size.

Here’s one trying different means:
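The spreadsheet itself isn’t shown here, but the same column of powers can be sketched in Python. This is only an illustration: the grid of means (60 to 62.5 in steps of 0.25), the standard deviation of 6 and the sample size of 20 are assumptions carried over from the example, and `power` is a hypothetical helper rather than anything from the original spreadsheet.

```python
from math import sqrt
from statistics import NormalDist

def power(true_mean, sd, n, cutoff=60.0):
    """P(sample mean >= cutoff) for Normal(true_mean, sd) data."""
    return 1.0 - NormalDist(true_mean, sd / sqrt(n)).cdf(cutoff)

# One "column" of hypothesized means, one adjacent "column" of powers.
means = [60.0 + 0.25 * k for k in range(11)]  # 60.00, 60.25, ..., 62.50
table = [(m, power(m, sd=6, n=20)) for m in means]

for m, p in table:
    print(f"{m:6.2f}  {p:.3f}")
```

When the true mean sits exactly at the cut-off the power is 50%, and it climbs steadily as the mean moves above 60.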

Here’s another trying different sample sizes:
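The sample-size version works the same way. Again a sketch rather than the original spreadsheet: it holds the true mean at 61 and the standard deviation at 6, as in the example, and the range of sample sizes (20 to 30) is an arbitrary choice for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power(true_mean, sd, n, cutoff=60.0):
    """P(sample mean >= cutoff) for Normal(true_mean, sd) data."""
    return 1.0 - NormalDist(true_mean, sd / sqrt(n)).cdf(cutoff)

# One row per candidate sample size, with the power it would give.
table = [(n, power(61, sd=6, n=n)) for n in range(20, 31)]

for n, p in table:
    print(f"{n:3d}  {p:.3f}")
```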

So we see that the further the true mean is above our cut-off point, or the bigger our sample, the higher our power. This is a very useful and practical exercise for planning an experiment. Using some conservative assumptions, we can ballpark a good sample size: one that’s neither too small (i.e., under-powered) nor too big. If our sample size is too small, we’ll have a low probability of rejecting a false null hypothesis. If our sample size is too big, we spend unnecessary time, money and effort on our experiment.
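For this particular setup the ballpark can also be computed directly, since requiring $P(\bar{X} \ge c) \ge p$ is the same as requiring $n \ge \left(z_p \sigma / (\mu - c)\right)^2$. The sketch below assumes a true mean above the cut-off and a target power above 50%; `required_n` is a hypothetical name, and the formula only covers the simple one-sided Normal case used in this post.

```python
from math import ceil
from statistics import NormalDist

def required_n(true_mean, sd, target_power, cutoff=60.0):
    """Smallest n for which P(sample mean >= cutoff) >= target_power."""
    if true_mean <= cutoff or not 0.5 < target_power < 1.0:
        raise ValueError("needs true_mean > cutoff and 0.5 < target_power < 1")
    z = NormalDist().inv_cdf(target_power)  # standard Normal quantile
    return ceil((z * sd / (true_mean - cutoff)) ** 2)

print(required_n(61, 6, 0.80))
print(required_n(61, 5, 0.80))
```

With a true mean of 61 and a standard deviation of 6 this gives 26, one fewer than the 27 used earlier, which simply leaves a little margin above 80%.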