# Profile likelihood ratio confidence intervals

When you fit a generalized linear model (GLM) in R and call confint on the model object, you get confidence intervals for the model coefficients. But you also get an interesting message:

Waiting for profiling to be done...

What's that all about? What exactly is being profiled? Put simply, it's telling you that it's calculating a profile likelihood ratio confidence interval.

The typical way to calculate a 95% confidence interval is to multiply the standard error of an estimate by some normal quantile such as 1.96 and add/subtract that product to/from the estimate to get an interval. In the context of GLMs, we sometimes call that a Wald confidence interval.

Another way to determine an upper and lower bound of plausible values for a model coefficient is to find the minimum and maximum value of the set of all coefficients that satisfy the following:

$-2\log\left(\frac{L(\beta_{0}, \beta_{1}|y_{1},…,y_{n})}{L(\hat{\beta_{0}}, \hat{\beta_{1}}|y_{1},…,y_{n})}\right) < \chi_{1,1-\alpha}^{2}$

Inside the parentheses is a ratio of likelihoods. In the denominator is the likelihood of the model we fit. In the numerator is the likelihood of the same model but with different coefficients. (More on that in a moment.) We take the log of the ratio and multiply by -2. This gives us a likelihood ratio test (LRT) statistic. This statistic is typically used to test whether a coefficient is equal to some value, such as 0, with the null likelihood in the numerator (model without coefficient, that is, equal to 0) and the alternative or estimated likelihood in the denominator (model with coefficient). If the LRT statistic is less than $$\chi_{1,0.95}^{2} \approx 3.84$$, we fail to reject the null. The coefficient is statisically not much different from 0. That means the likelihood ratio is close to 1. The likelihood of the model without the coefficient is almost as high the model with it. On the other hand, if the ratio is small, that means the likelihood of the model without the coefficient is much smaller than the likelihood of the model with the coefficient. This leads to a larger LRT statistic since it's being log transformed, which leads to a value larger than 3.84 and thus rejection of the null.

Now in the formula above, we are seeking all such coefficients in the numerator that would make it a true statement. You might say we're “profiling” many different null values and their respective LRT test statistics. Do they fit the profile of a plausible coefficient value in our model? The smallest value we can get without violating the condition becomes our lower bound, and likewise with the largest value. When we're done we'll have a range of plausible values for our model coefficient that gives us some indication of the uncertainly of our estimate.

Let's load some data and fit a binomial GLM to illustrate these concepts. The following R code comes from the help page for confint.glm. This is an example from the classic Modern Applied Statistics with S. ldose is a dosing level and sex is self-explanatory. SF is number of successes and failures, where success is number of dead worms. We're interested in learning about the effects of dosing level and sex on number of worms killed. Presumably this worm is a pest of some sort.

# example from Venables and Ripley (2002, pp. 190-2.)
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
budworm.lg <- glm(SF ~ sex + ldose, family = binomial)
summary(budworm.lg)

##
## Call:
## glm(formula = SF ~ sex + ldose, family = binomial)
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -1.10540  -0.65343  -0.02225   0.48471   1.42944
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -3.4732     0.4685  -7.413 1.23e-13 ***
## sexM          1.1007     0.3558   3.093  0.00198 **
## ldose         1.0642     0.1311   8.119 4.70e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 124.8756  on 11  degrees of freedom
## Residual deviance:   6.7571  on  9  degrees of freedom
## AIC: 42.867
##
## Number of Fisher Scoring iterations: 4


The coefficient for ldose looks significant. Let's determine a confidence interval for the coefficient using the confint function. We call confint on our model object, budworm.lg and use the parm argument to specify that we only want to do it for ldose:

confint(budworm.lg, parm = "ldose")

## Waiting for profiling to be done...

##     2.5 %    97.5 %
## 0.8228708 1.3390581


We get our “waiting” message though there really was no wait. If we fit a larger model and request multiple confidence intervals, then there might actually be a waiting period of a few seconds. The lower bound is about 0.8 and the upper bound about 1.32. We might say every increase in dosing level increase the log odds of killing worms by at least 0.8. We could also exponentiate to get a CI for an odds ratio estimate:

exp(confint(budworm.lg, parm = "ldose"))

## Waiting for profiling to be done...

##    2.5 %   97.5 %
## 2.277027 3.815448


The odds of “success” (killing worms) is at least 2.3 times higher at one dosing level versus the next lower dosing level.

To better understand the profile likelihood ratio confidence interval, let's do it “manually”. Recall the denominator in the formula above was the likelihood of our fitted model. We can extract that with the logLik function:

den <- logLik(budworm.lg)
den

## 'log Lik.' -18.43373 (df=3)


The numerator was the likelihood of a model with a different coefficient. Here's the likelihood of a model with a coefficient of 1.05:

num <- logLik(glm(SF ~ sex + offset(1.05*ldose), family = binomial))
num

## 'log Lik.' -18.43965 (df=2)


Notice we used the offset function. That allows us to fix the coefficient to 1.05 and not have it estimated.

Since we already extracted the log likelihoods, we need to subtract them. Remember this rule from algebra?

$\log\frac{M}{N} = \log M – \log N$

So we subtract the denominator from the numerator, multiply by -2, and check if it's less than 3.84, which we calculate with qchisq(p = 0.95, df = 1)

-2*(num - den)

## 'log Lik.' 0.01184421 (df=2)

-2*(num - den) < qchisq(p = 0.95, df = 1)

## [1] TRUE


It is. 1.05 seems like a plausible value for the ldose coefficient. That makes sense since the estimated value was 1.0642. Let's try it with a larger value, like 1.5:

num <- logLik(glm(SF ~ sex + offset(1.5*ldose), family = binomial))
-2*(num - den) < qchisq(p = 0.95, df = 1)

## [1] FALSE


FALSE. 1.5 seems too big to be a plausible value for the ldose coefficient.

Now that we have the general idea, we can program a while loop to check different values until we exceed our threshold of 3.84.

cf <- budworm.lg$coefficients[3] # fitted coefficient 1.0642 cut <- qchisq(p = 0.95, df = 1) # about 3.84 e <- 0.001 # increment to add to coefficient LR <- 0 # to kick start our while loop while(LR < cut){ cf <- cf + e num <- logLik(glm(SF ~ sex + offset(cf*ldose), family = binomial)) LR <- -2*(num - den) } (upper <- cf)  ## ldose ## 1.339214  To begin we save the original coefficient to cf, store the cutoff value to cut, define our increment of 0.001 as e, and set LR to an initial value of 0. In the loop we increment our coefficient estimate which is used in the offset function in the estimation step. There we extract the log likelihood and then calculate LR. If LR is less than cut (3.84), the loop starts again with a new coefficient that is 0.001 higher. We see that our upper bound of 1.339214 is very close to what we got above using confint (1.3390581). If we set e to smaller values we'll get closer. We can find the LR profile lower bound in a similar way. Instead of adding the increment we subtract it: cf <- budworm.lg$coefficients[3]  # reset cf
LR <- 0                           # reset LR
while(LR < cut){
cf <- cf - e
num <- logLik(glm(SF ~ sex + offset(cf*ldose), family = binomial))
LR <- -2*(num - den)
}
(lower <- cf)

##    ldose
## 0.822214


The result, 0.822214, is very close to the lower bound we got from confint (0.8228708).

This is a very basic implementation of calculating a likelihood ratio confidence interval. It is only meant to give a general sense of what's happening when you see that message Waiting for profiling to be done.... I hope you found it helpful. To see how R does it, enter getAnywhere(profile.glm) in the console and inspect the code. It's not for the faint of heart.

I have to mention the book Analysis of Categorical Data with R, from which I gained a better understanding of the material in this post. The authors have kindly shared their R code at the following web site if you want to have a look: http://www.chrisbilder.com/categorical/

To see how they “manually” calculate likelihood ratio confidence intervals, go to the following R script and see the section “Examples of how to find profile likelihood ratio intervals without confint()”: http://www.chrisbilder.com/categorical/Chapter2/Placekick.R

# Poisson regression – Ch 6 of Gelman and Hill

Chapter 6 of Gelman and Hill's Data Analysis Using Regression and Multilevel/Hierarchical Models presents an interesting example of Poisson regression using data on police stops in New York. Put simply, they seek to model the count of “stop and frisks” as a function of ethnicity and precinct with number of arrests in the previous year included as an offset. While the example is fun and informative, it's not terribly easy to replicate. The R code is provided to run the regressions but the data is a little hard to locate. I was eventually able to find it and get everything to mostly work, so I thought I would blog about it in case anyone else was trying to do the same.

About the data: it turns out you can find it here in the “police” folder. In the folder is a dat file called “frisk_with_noise”. Here's one way to read it in:

url <- "http://www.stat.columbia.edu/~gelman/arm/examples/police/frisk_with_noise.dat"

##   stops  pop past.arrests precinct eth crime
## 1    75 1720          191        1   1     1
## 2    36 1720           57        1   1     2
## 3    74 1720          599        1   1     3
## 4    17 1720          133        1   1     4

dim(dat)

## [1] 900   6


But wait. You'll notice the data set has 900 rows but the regression output in the book references n = 225. To replicate the example in the book, I'm pretty sure we need to aggregate over precinct and eth, like so:

stops <- aggregate(cbind(stops, past.arrests) ~ eth + precinct, data=dat, sum)


Using cbind() on stops and past.arrests allows us to sum both stops and past.arrests over all combinations of eth and precinct. Now we're ready to run the code provided in the book, which can also be found here. Here's one of the models they fit:

library(arm) # for the display() function
fit.2 <- glm (stops ~ factor(eth), data=stops, family=poisson, offset=log(past.arrests))
display(fit.2)

## glm(formula = stops ~ factor(eth), family = poisson, data = stops,
##     offset = log(past.arrests))
##              coef.est coef.se
## (Intercept)  -0.59     0.00
## factor(eth)2  0.07     0.01
## factor(eth)3 -0.16     0.01
## ---
##   n = 225, k = 3
##   residual deviance = 45437.4, null deviance = 46120.3 (difference = 682.9)


That works, but our output doesn't match the book's output! What's going on? Actually the explanation is in the title of the data file: “frisk_with_noise”. Also notice how I skipped the first 6 lines of the file when reading it in to R. Here's the first line of what I skipped:

stop and frisk data (with noise added to protect confidentiality)


So noise was apparently added after publication of the book to protect confidentiality. I guess we're not supposed to see which precincts are making the most stops of certain ethnicities? Oh well, at least we can run the code now.

We can see fitted values using the fitted() function. Here are the first 10 fitted values compared to their observed values:

cbind(observed=stops$stops[1:10], fitted=fitted(fit.2)[1:10])  ## observed fitted ## 1 202 544.2816 ## 2 102 175.7562 ## 3 81 180.0316 ## 4 132 418.2082 ## 5 144 331.8515 ## 6 71 203.6578 ## 7 752 1215.1920 ## 8 441 373.5563 ## 9 410 584.9845 ## 10 385 261.5884  Doesn't look too good to the eye, at least not the first 10. How are those fitted values calculated? Like this: $y = exp(\alpha + \beta x + log(t))$ where t is the offset, in this case, past arrrests. So to calculate the fitted value for precinct 1, ethnicity = 1 (black), and 980 arrests in the previous year, we do the following: exp(coef(fit.2)[1] + log(980))  ## (Intercept) ## 544.2816  # or equivalently 980*exp(coef(fit.2)[1])  ## (Intercept) ## 544.2816  # compare to R's fitted value fitted(fit.2)[1] == exp(coef(fit.2)[1] + log(980))  ## 1 ## TRUE  Plotting standardized residuals versus fitted values can reveal overdispersion. That's what Gelman and Hill do in figure 6.1. Below I reproduce it. First I fit a new model that includes precinct as a predictor. The figure on the left plots raw residuals versus fitted values. This is of limited use since we expect variance to increase with larger fitted values in a Poisson distribution. In other words we wouldn't expect to see a constant scatter about 0 as we would in OLS regression. The figure on the right, however, uses standardized residuals which have a mean of 0 and standard deviation of 1. If the Poisson model is true, we would expect to see most residuals falling within 2 standard deviations from 0. Many residuals falling beyond this range reveal overdispersion. And indeed that's what we see here. fit.3 <- glm (stops ~ factor(eth) + factor(precinct), data=stops, family=poisson, offset=log(past.arrests)) par(mfrow=c(1,2)) pv <- fitted(fit.3) r <- (stops$stops - fitted(fit.3))
plot(pv, r, pch=20, ylab="raw residuals", xlab="predicted value")
abline(h=0)
sr <- (stops\$stops - fitted(fit.3))/sqrt(fitted(fit.3))
plot(pv, sr, pch=20, ylab="standardized residuals", xlab="predicted value")
abline(h=c(-2,0,2),lty=c(2,1,2))


par(mfrow=c(1,1))


To correct for overdisperson in R you can repeat the code for fit.3 above with family = quasipoisson. This corrects the standard errors of the regression coefficients by making them larger.