Pointwise confidence interval estimates of survivor functions

The most commonly used descriptive method for survival data is the Kaplan-Meier estimator (also known as the product-limit estimator). It estimates the probability of “surviving”, or working, past a certain point. For example, if the Kaplan-Meier estimate at 32 weeks is 0.54, that means the estimated probability of surviving 32 weeks or longer is 0.54. There are many web sites and books that will tell you all about the Kaplan-Meier estimator.

Now once you have an estimate, it’s a good idea to form a confidence interval to give you some idea of how precise your estimate is. The common approach in statistics is to take the standard error of the estimate, multiply it by 1.96, and then add the product to, and subtract it from, the estimate to form a 95% confidence interval. The problem with that approach here is that our estimate is a probability, which ranges in value from 0 to 1. It’s very possible that adding or subtracting the margin of error will result in an interval that falls below 0 or exceeds 1. One solution is the log-log transformation.

The log-log transformation does just what its name says, including the dash, which represents a minus sign. You take the log of the estimate, make it negative, and then take the log again. The result of this transformation is a value that can range from \( -\infty\) to \( +\infty\), just like a normally distributed variable (normality is implied by our use of 1.96 as the number of standard errors in our margin of error calculation). In addition, this ensures that when we transform back to our original scale, the limits fall between 0 and 1.

Here’s an example: say we have a Kaplan-Meier estimate of 0.88 at 13 days. That means we estimate an 88% chance of surviving beyond 13 days. Furthermore, let’s say our estimate has a standard error of 0.0650. (These values are often default output in statistical programs that perform survival analysis.) Without a transformation we get the following confidence interval:

$$ 0.88 \pm 1.96(0.0650) = (0.753,1.007)$$

We exceed one in the upper limit, so this won’t do. Let’s do the transformation. First we transform the Kaplan-Meier estimate:

$$ \ln(-\ln(0.88))=-2.057$$

Next we transform the standard error estimate. (Just trust me on this one):

$$ \sqrt{\frac{0.0650^{2}}{[0.88\ln(0.88)]^{2}}}=0.5777$$
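
For readers who would rather not take this on trust, the formula can be motivated by the delta method (a first-order Taylor approximation). This is only a sketch. The symbols \( \hat{S}\) (the Kaplan-Meier estimate), \( g\) (the transformation) and \( \widehat{SE}\) (an estimated standard error) are notation introduced here for illustration:

$$ g(\hat{S}) = \ln(-\ln \hat{S}), \qquad g'(\hat{S}) = \frac{1}{-\ln \hat{S}} \cdot \left(-\frac{1}{\hat{S}}\right) = \frac{1}{\hat{S}\ln \hat{S}}$$

$$ \widehat{SE}[g(\hat{S})] \approx |g'(\hat{S})| \, \widehat{SE}(\hat{S}) = \sqrt{\frac{\widehat{SE}(\hat{S})^{2}}{[\hat{S}\ln \hat{S}]^{2}}}$$

Plugging in \( \hat{S} = 0.88\) and \( \widehat{SE}(\hat{S}) = 0.0650\) gives the expression above.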

Now we can calculate our confidence interval:

$$ -2.057 \pm 1.96(0.5777) = (-3.1894, -0.9247)$$

That’s great, but we have to transform it back to the original scale for it to have any meaning. To go back we take the exponential, take the exponential again, and then take the reciprocal:

  1. \( \exp(\ln(-\ln(0.88))) = -\ln(0.88) = \ln(0.88^{-1}) = \ln \frac{1}{0.88}\)
  2. \( \exp(\ln \frac{1}{0.88}) = \frac{1}{0.88}\)
  3. \( (\frac{1}{0.88})^{-1} = 0.88\)

If we do that with our confidence limits we obtain:

$$ (\exp(\exp(-3.1894)))^{-1}=0.9596$$
$$ (\exp(\exp(-0.9247)))^{-1} = 0.6726$$

Notice that the upper limit before transforming back (-0.9247) becomes the lower limit after transformation (0.6726), and vice versa. Also notice the confidence interval is not symmetric about our estimate of 0.88.
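
The whole calculation can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code any particular statistics package runs. The function names `naive_ci` and `loglog_ci` are made up for this example.

```python
import math

def naive_ci(s_hat, se, z=1.96):
    """Untransformed interval: estimate plus or minus z standard errors."""
    return s_hat - z * se, s_hat + z * se

def loglog_ci(s_hat, se, z=1.96):
    """Pointwise interval for a survival probability via the log-log transformation."""
    if not 0.0 < s_hat < 1.0:
        raise ValueError("s_hat must be strictly between 0 and 1")
    # Transform the estimate: log(-log(S)).
    theta = math.log(-math.log(s_hat))
    # Transform the standard error: SE / |S * log(S)|.
    se_theta = se / abs(s_hat * math.log(s_hat))
    lo_t = theta - z * se_theta
    hi_t = theta + z * se_theta
    # Back-transform with exp(-exp(x)), which equals 1 / exp(exp(x)).
    # The function is decreasing, so the limits swap places.
    lower = math.exp(-math.exp(hi_t))
    upper = math.exp(-math.exp(lo_t))
    return lower, upper

print(naive_ci(0.88, 0.0650))   # upper limit exceeds 1
print(loglog_ci(0.88, 0.0650))  # both limits stay between 0 and 1
```

The printed log-log limits should agree with the hand calculation to about three decimal places. Small differences are expected because the hand calculation rounds the intermediate values.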

Fortunately we don’t have to do this sort of thing by hand. Most statistical programs can produce log-log confidence intervals, although whether they are the default varies from program to program, so it’s worth checking the documentation. There are other transformations you can use, such as the logit and arcsine, but there is little practical difference between them (say Hosmer, Lemeshow, and May in Applied Survival Analysis, p. 32).

Having said all that, let me emphasize that we calculated the 95% confidence interval for a single time point, 13 days. Survival data sets will have many observed events, and thus many 95% confidence intervals. The probability that a collection of 95% confidence intervals all contain the true parameters they’re trying to estimate is much less than 95%. For example, if just ten 95% confidence intervals were independent, the probability that all ten contain their true parameters would be \( 0.95^{10} \approx 0.60\), or about 60%. So why do this at all? Mainly because we want to create confidence intervals for a few selected points, usually the 25th percentile, the median and the 75th percentile. We use the log-log pointwise confidence intervals to create a confidence interval for the corresponding survival time. If our events were measured in days, then our median confidence interval might be (32,65) days.

The way we construct these intervals is easy enough. Say we want a confidence interval for the 25th percentile. Look down the lower limit column of all your pointwise confidence intervals and take the smallest event time at which the probability of “dying” earlier (one minus the limit) is greater than 0.25, that is, the first time the lower limit drops below 0.75. Do the same in the upper limit column. The two event times become the lower and upper limits of your 25th percentile confidence interval. Again, most stats programs do this for you, but it’s good to know what the program is doing behind the scenes.
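
Here is a rough Python sketch of that lookup. The event times and confidence limits in the table are invented purely for illustration, and the helper names `first_time_below` and `percentile_ci` are hypothetical. It assumes the event times are sorted in increasing order, and it returns `None` for a limit that never drops below the threshold.

```python
def first_time_below(times, limits, threshold):
    """Return the smallest event time whose limit is below threshold, else None."""
    for t, limit in zip(times, limits):
        if limit < threshold:
            return t
    return None

def percentile_ci(times, lower, upper, p):
    """Confidence interval for the p-th percentile of survival time.

    "Dying earlier is greater than p" is the same as the survival
    limit being less than 1 - p.
    """
    threshold = 1.0 - p
    return (first_time_below(times, lower, threshold),
            first_time_below(times, upper, threshold))

# Made-up pointwise limits (NOT real data), one row per event time in days.
times = [4, 9, 15, 22, 31, 47]
lower = [0.82, 0.74, 0.66, 0.55, 0.41, 0.30]
upper = [0.99, 0.97, 0.95, 0.90, 0.80, 0.70]

print(percentile_ci(times, lower, upper, 0.25))  # (9, 47)
```

With these invented numbers the lower limit first drops below 0.75 at day 9 and the upper limit at day 47, so the 25th percentile interval is (9, 47) days.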