Category Archives: Hypothesis Testing

Profile likelihood ratio confidence intervals

When you fit a generalized linear model (GLM) in R and call confint on the model object, you get confidence intervals for the model coefficients. But you also get an interesting message:

Waiting for profiling to be done...

What's that all about? What exactly is being profiled? Put simply, it's telling you that it's calculating a profile likelihood ratio confidence interval.

The typical way to calculate a 95% confidence interval is to multiply the standard error of an estimate by some normal quantile such as 1.96 and add/subtract that product to/from the estimate to get an interval. In the context of GLMs, we sometimes call that a Wald confidence interval.

Another way to determine an upper and lower bound of plausible values for a model coefficient is to find the minimum and maximum value of the set of all coefficients that satisfy the following:

\[-2\log\left(\frac{L(\beta_{0}, \beta_{1}|y_{1},…,y_{n})}{L(\hat{\beta_{0}}, \hat{\beta_{1}}|y_{1},…,y_{n})}\right) < \chi_{1,1-\alpha}^{2}\]

Inside the parentheses is a ratio of likelihoods. In the denominator is the likelihood of the model we fit. In the numerator is the likelihood of the same model but with different coefficients. (More on that in a moment.) We take the log of the ratio and multiply by -2. This gives us a likelihood ratio test (LRT) statistic. This statistic is typically used to test whether a coefficient is equal to some value, such as 0, with the null likelihood in the numerator (model without the coefficient, that is, the coefficient set to 0) and the alternative or estimated likelihood in the denominator (model with the coefficient). If the LRT statistic is less than \(\chi_{1,0.95}^{2} \approx 3.84\), we fail to reject the null: the coefficient is not statistically different from 0. That means the likelihood ratio is close to 1, and the likelihood of the model without the coefficient is almost as high as the likelihood of the model with it. On the other hand, if the ratio is small, the likelihood of the model without the coefficient is much smaller than the likelihood of the model with it. Because the ratio is log transformed and multiplied by -2, a small ratio produces a large LRT statistic, one bigger than 3.84, and thus rejection of the null.

Now in the formula above, we are seeking all coefficients in the numerator that make the inequality a true statement. You might say we're “profiling” many different null values and their respective LRT statistics. Do they fit the profile of a plausible coefficient value in our model? The smallest value we can use without violating the condition becomes our lower bound, and likewise with the largest value. When we're done we'll have a range of plausible values for our model coefficient that gives us some indication of the uncertainty of our estimate.

Let's load some data and fit a binomial GLM to illustrate these concepts. The following R code comes from the help page for confint.glm. This is an example from the classic Modern Applied Statistics with S. ldose is a dosing level and sex is self-explanatory. SF is a two-column matrix of successes and failures, where a success is a dead worm. We're interested in learning about the effects of dosing level and sex on the number of worms killed. Presumably this worm is a pest of some sort.

# example from Venables and Ripley (2002, pp. 190-2.)
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
SF <- cbind(numdead, numalive = 20-numdead)
budworm.lg <- glm(SF ~ sex + ldose, family = binomial)
summary(budworm.lg)
## 
## Call:
## glm(formula = SF ~ sex + ldose, family = binomial)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.10540  -0.65343  -0.02225   0.48471   1.42944  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.4732     0.4685  -7.413 1.23e-13 ***
## sexM          1.1007     0.3558   3.093  0.00198 ** 
## ldose         1.0642     0.1311   8.119 4.70e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 124.8756  on 11  degrees of freedom
## Residual deviance:   6.7571  on  9  degrees of freedom
## AIC: 42.867
## 
## Number of Fisher Scoring iterations: 4
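
As a quick aside, we can see the LRT described above in action by fitting a reduced model without ldose (call it budworm.reduced; this check is mine, not part of the original example) and comparing it to the full model. The anova method for GLMs with test = "Chisq" carries out the comparison, and subtracting log likelihoods reproduces the same statistic:

# reduced model without ldose
budworm.reduced <- glm(SF ~ sex, family = binomial)

# likelihood ratio test for dropping ldose
anova(budworm.reduced, budworm.lg, test = "Chisq")

# the same statistic by hand: -2 times the log of the likelihood ratio
-2 * (logLik(budworm.reduced) - logLik(budworm.lg))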

The coefficient for ldose looks significant. Let's determine a confidence interval for the coefficient using the confint function. We call confint on our model object, budworm.lg, and use the parm argument to specify that we only want to do it for ldose:

confint(budworm.lg, parm = "ldose")
## Waiting for profiling to be done...
##     2.5 %    97.5 % 
## 0.8228708 1.3390581

We get our “waiting” message though there really was no wait. If we fit a larger model and request multiple confidence intervals, then there might actually be a waiting period of a few seconds. The lower bound is about 0.82 and the upper bound about 1.34. We might say every increase in dosing level increases the log odds of killing worms by at least 0.82. We could also exponentiate to get a CI for an odds ratio estimate:

exp(confint(budworm.lg, parm = "ldose"))
## Waiting for profiling to be done...
##    2.5 %   97.5 % 
## 2.277027 3.815448

The odds of “success” (killing worms) are at least 2.3 times higher at a given dosing level than at the next lower dosing level.
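
For comparison, here is the Wald interval described at the top of the post. This is just a quick sketch of my own: confint.default constructs the interval as the estimate plus or minus a normal quantile times the standard error, and the last line does the same arithmetic by hand.

# Wald (normal-approximation) confidence interval
confint.default(budworm.lg, parm = "ldose")

# the same interval by hand: estimate +/- 1.96 * standard error
est <- coef(budworm.lg)["ldose"]
se <- sqrt(diag(vcov(budworm.lg)))["ldose"]
est + c(-1, 1) * qnorm(0.975) * se

The Wald and profile likelihood intervals are similar for this model; the two approaches can disagree more noticeably with small samples or extreme estimates.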

To better understand the profile likelihood ratio confidence interval, let's do it “manually”. Recall the denominator in the formula above was the likelihood of our fitted model. We can extract its log likelihood with the logLik function:

den <- logLik(budworm.lg)
den
## 'log Lik.' -18.43373 (df=3)

The numerator was the likelihood of the same model with a different coefficient. Here's the log likelihood of a model with the ldose coefficient fixed at 1.05:

num <- logLik(glm(SF ~ sex + offset(1.05*ldose), family = binomial))
num
## 'log Lik.' -18.43965 (df=2)

Notice we used the offset function. That allows us to fix the coefficient to 1.05 and not have it estimated.
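
To confirm the coefficient really is fixed rather than estimated, a quick check (my addition, not in the original example) is to look at the coefficients of the offset model. Only the intercept and sexM are estimated; ldose is absent because the offset handles it:

coef(glm(SF ~ sex + offset(1.05 * ldose), family = binomial))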

Since we already extracted the log likelihoods, we need to subtract them. Remember this rule from algebra?

\[\log\frac{M}{N} = \log M - \log N\]

So we subtract the denominator from the numerator, multiply by -2, and check whether the result is less than 3.84, which we calculate with qchisq(p = 0.95, df = 1):

-2*(num - den)
## 'log Lik.' 0.01184421 (df=2)
-2*(num - den) < qchisq(p = 0.95, df = 1)
## [1] TRUE

It is. 1.05 seems like a plausible value for the ldose coefficient. That makes sense since the estimated value was 1.0642. Let's try it with a larger value, like 1.5:

num <- logLik(glm(SF ~ sex + offset(1.5*ldose), family = binomial))
-2*(num - den) < qchisq(p = 0.95, df = 1)
## [1] FALSE

FALSE. 1.5 seems too big to be a plausible value for the ldose coefficient.

Now that we have the general idea, we can program a while loop to check different values until we exceed our threshold of 3.84.

cf <- budworm.lg$coefficients[3]  # fitted coefficient 1.0642
cut <- qchisq(p = 0.95, df = 1)   # about 3.84
e <- 0.001                        # increment to add to coefficient
LR <- 0                           # to kick start our while loop 
while(LR < cut){
  cf <- cf + e
  num <- logLik(glm(SF ~ sex + offset(cf*ldose), family = binomial))
  LR <- -2*(num - den)
}
(upper <- cf)
##    ldose 
## 1.339214

To begin we save the original coefficient to cf, store the cutoff value in cut, define our increment of 0.001 as e, and set LR to an initial value of 0. In the loop we increment our coefficient estimate, which is used in the offset function in the estimation step. There we extract the log likelihood and then calculate LR. If LR is less than cut (3.84), the loop starts again with a new coefficient that is 0.001 higher. We see that our upper bound of 1.339214 is very close to what we got above using confint (1.3390581). If we set e to a smaller value we'll get even closer.

We can find the LR profile lower bound in a similar way. Instead of adding the increment we subtract it:

cf <- budworm.lg$coefficients[3]  # reset cf
LR <- 0                           # reset LR 
while(LR < cut){
  cf <- cf - e
  num <- logLik(glm(SF ~ sex + offset(cf*ldose), family = binomial))
  LR <- -2*(num - den)
}
(lower <- cf)
##    ldose 
## 0.822214

The result, 0.822214, is very close to the lower bound we got from confint (0.8228708).
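
The while loop stops at the first value that crosses the threshold, so its precision depends on the step size e. As an alternative, here is a sketch of my own (not how confint does it) that treats each bound as a root-finding problem: the helper function lrt computes the LRT statistic for a candidate coefficient, and uniroot finds where it equals the cutoff on each side of the estimate.

# LRT statistic as a function of a candidate ldose coefficient
lrt <- function(b) {
  num <- logLik(glm(SF ~ sex + offset(b * ldose), family = binomial))
  as.numeric(-2 * (num - den))
}

# find where the LRT statistic crosses the chi-square cutoff on each side
uniroot(function(b) lrt(b) - cut, interval = c(coef(budworm.lg)["ldose"], 2))$root  # upper bound
uniroot(function(b) lrt(b) - cut, interval = c(0, coef(budworm.lg)["ldose"]))$root  # lower bound

This should land on essentially the same bounds without having to tune a step size.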

This is a very basic implementation of calculating a likelihood ratio confidence interval. It is only meant to give a general sense of what's happening when you see that message Waiting for profiling to be done.... I hope you found it helpful. To see how R does it, enter getAnywhere(profile.glm) in the console and inspect the code. It's not for the faint of heart.

I have to mention the book Analysis of Categorical Data with R, from which I gained a better understanding of the material in this post. The authors have kindly shared their R code at the following web site if you want to have a look: http://www.chrisbilder.com/categorical/

To see how they “manually” calculate likelihood ratio confidence intervals, go to the following R script and see the section “Examples of how to find profile likelihood ratio intervals without confint()”: http://www.chrisbilder.com/categorical/Chapter2/Placekick.R

Testing Composite Hypotheses about Fixed Effects (Ch 4 of ALDA)

I’m still on my ALDA kick here, this time posting about section 4.7 of Chapter 4. In my last post I talked about deviance-based hypothesis tests in the context of model building. Recall you have to be aware of which method of estimation you used when you deploy the deviance statistic. Namely, if you used Restricted Maximum Likelihood (RML) to estimate model parameters, you can only use the deviance statistic to test hypotheses about variance components. This is an important point, as many programs default to RML, such as the lme4 package in R and SAS PROC MIXED. But the deviance statistic is not the only tool for testing composite hypotheses.

That leads us to section 4.7 and the Wald statistic. The Wald statistic allows you to “test composite hypotheses about multiple effects regardless of the method of estimation used. This means that if you use restricted methods of estimation, which prevent you from using deviance-based tests to compare models with different fixed effects, you still have a means of testing composite hypotheses about sets of fixed effects.” (p. 122) Sounds good to me!

The authors give two examples, one of which I want to review in this post. As usual they don’t show you how to do the test using statistical software. Unfortunately, neither does the UCLA stats consulting page for ALDA. So I had to figure it out on my own.

Let’s revisit the example study motivating the work in this chapter. 82 adolescents were surveyed on alcohol use. Some of the variables collected included:

  • alcuse, a rating-scale measure of alcohol use (the response)
  • age_14, age of participant centered about 14 (14 = 0, 15 = 1, 16 = 2)
  • coa, an indicator whether or not participant is a child of an alcoholic (1 = yes, 0 = no)
  • cpeer, a rating-scale measure of alcohol use among peers centered on its sample mean of 1.018
  • id, an arbitrary identifier used to group measurements by person

These variables are part of Model F, the model of interest in section 4.7, which aims to explain the variability in alcuse. (Models A through E precede Model F in the earlier model-building portion of the chapter.) Here’s Model F in its multilevel form:

level 1
\[Y_{ij} = \pi_{0i} + \pi_{1i} \, age14_{ij} + \epsilon_{ij}\]

level 2
\[\pi_{0i} = \gamma_{00} + \gamma_{01} \, coa + \gamma_{02} \, cpeer + \zeta_{0i}\]
\[\pi_{1i} = \gamma_{10} + \gamma_{12} \, cpeer + \zeta_{1i}\]

So this model posits that individuals have a linear trajectory over time (level 1), and that the parameters of that linear trajectory differ between individuals based on coa and cpeer (level 2).

We can combine the two levels into one scary-looking composite representation of the model:
\[Y_{ij} = \gamma_{00} + \gamma_{01} \, coa + \gamma_{02} \, cpeer + \gamma_{10} \, age14 + \gamma_{12} \, cpeer \, age14 + \zeta_{0i} + \zeta_{1i} \, age14 + \epsilon_{ij}\]

Then we can estimate the parameters of that model in R with the following code:

alcohol1 <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", 
                       header=T, sep=",")
attach(alcohol1)
library(lme4)
model.f1 <- lmer(alcuse ~ coa + cpeer*age_14 + (age_14 | id), alcohol1, REML = FALSE)
summary(model.f1)

And now we are ready to test composite hypotheses about this model. The first example in the book asks whether the average child of non-alcoholic parents, with an average value of cpeer, drinks no alcohol at age 14 (intercept = 0) and remains abstinent over time (slope = 0). So we set coa and cpeer both equal to 0 in our composite model and we're left with this:

\[Y_{ij} = \gamma_{00} + \gamma_{10} \, age14 + \zeta_{0i} + \zeta_{1i} \, age14 + \epsilon_{ij}\]

Thus our question is essentially asking if the slope and intercept in this model are 0. Or to state it formally, our composite null hypothesis is as follows:

\[H_{0}: \gamma_{00} = 0 \ \text{and} \ \gamma_{10} = 0\]

Now to carry out this test, we need to express this hypothesis as a general linear hypothesis in matrix notation. First let's restate the hypothesis using our fixed effects:

\[1\gamma_{00} + 0\gamma_{01} + 0\gamma_{02} + 0\gamma_{10} + 0\gamma_{12} = 0\]
\[0\gamma_{00} + 0\gamma_{01} + 0\gamma_{02} + 1\gamma_{10} + 0\gamma_{12} = 0\]

We have weighted the coefficients so that only the two we're interested in remain. Now we create a matrix of the weights. This is easier and faster to show in R than trying to use LaTeX, which I'm not even sure I can pull off with the WordPress plugin I'm using.

C <- matrix(c(1,0,0,0,0,0,0,0,1,0), nrow=2, byrow=TRUE)
C
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    0    0    1    0

Now using the matrix we just created we can conduct a linear hypothesis test using the linearHypothesis function available in the car package, like so:

library(car)
linearHypothesis(model.f1, C)

This returns a Wald statistic of 51.03 on 2 degrees of freedom, almost matching the book which reports 51.01. The p-value is practically 0, which means we reject this composite hypothesis.

Now it's nice to know there's an R function that will calculate the Wald statistic, but what is it? How can we find out? The following code reveals the source of linearHypothesis:

print(getAnywhere(linearHypothesis.lme))

In it we see the following calculation:

SSH <- as.vector(t(L %*% b - rhs) %*% solve(L %*% V %*% t(L)) %*% 
        (L %*% b - rhs))

That's it. So we have some matrix multiplication happening. In this calculation L is our hypothesis matrix, b is our fixed effects, rhs means "right hand side" (which in our example is 0) and V is the variance-covariance matrix of the model parameters.

If we wanted to calculate the Wald statistic by hand, we could do the following:

# extract the fixed effect estimates
b <- matrix(fixef(model.f1), nrow = 5)

# create the "right-hand side"
q <- matrix(c(0,0),nrow=2)

# extract the variance-covariance matrix of the fixed effects
V <- as.matrix(vcov(model.f1))

# calculate the Wald statistic
W <- t(C%*%b - q) %*% solve(C%*% V %*%t(C)) %*% (C%*%b - q)

To calculate the p-value, use the pchisq function:

pchisq(as.numeric(W),2,lower.tail=FALSE)
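
If you find yourself doing this often, the pieces above can be bundled into a small helper function. The one below, wald_test, is just a sketch of my own (not from the book or the car package); it takes a fitted lmer model, a hypothesis matrix, and a right-hand side, and returns the Wald statistic, its degrees of freedom, and the p-value:

# Wald test of the general linear hypothesis C %*% fixef(model) == rhs
wald_test <- function(model, C, rhs = rep(0, nrow(C))) {
  b <- fixef(model)                  # fixed effect estimates
  V <- as.matrix(vcov(model))        # their variance-covariance matrix
  W <- as.numeric(t(C %*% b - rhs) %*% solve(C %*% V %*% t(C)) %*% (C %*% b - rhs))
  c(W = W, df = nrow(C), p.value = pchisq(W, df = nrow(C), lower.tail = FALSE))
}

wald_test(model.f1, C)

Running it on model.f1 with the C matrix above should reproduce the Wald statistic of about 51 and the near-zero p-value we calculated by hand.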