Comparing Multilevel Models using Deviance Statistics (Ch 4 of ALDA)

The tour of Applied Longitudinal Data Analysis (ALDA) by Singer and Willett continues today with section 4.6, Comparing Models using Deviance Statistics. In the section prior to this they walk through building a model by way of examining hypothesis tests for fixed effects and variance components. While the former will be familiar to those who’ve done classical linear regression, the latter is likely something new. And perhaps not advised. As I mentioned in my previous post (last paragraph), they state in section 3.6.2 that “statisticians disagree as to the nature, form, and effectiveness of these tests.” They also “suggest you examine them only with extreme caution.” Therefore I decided not to blog about that particular tactic and instead focus on “a superior method for testing hypotheses about variance components.” (Also their words.) Of course I refer to the title of this post: Deviance Statistics.

As I’ve done in my other ALDA posts, I’m going to forgo exposition and get down to business. This post is meant to serve as a reference guide for my future self and maybe yours as well.

  • The Deviance Statistic is used to test the hypothesis that additional model predictors do not improve the fit of the model. The null hypothesis is that the coefficients of the additional predictors are 0.
  • To use the Deviance Statistic, one model must be nested in the other. That is, the smaller model can be derived from the bigger model by setting certain coefficients in the bigger model equal to 0.
  • Deviance = -2 * (Log Likelihood (LL) of model)
  • Deviance Statistic = -2 * (LL of model nested in bigger model – LL of bigger model)
  • Smaller Deviance is better. If adding more predictors to a model reduces deviance, that may be a good thing. The hypothesis test using the Deviance Statistic helps us determine whether or not the reduction in deviance is significant. A large p-value tells us no, it is not significant and that our model is not improved by the additional predictors. A small p-value tells us to reject the null and keep the extra predictors.
  • The distribution of the deviance statistic is chi-square with DF equal to the number of extra parameters in the bigger model.
  • Deviance obtained under Restricted Maximum Likelihood (REML) should only be used if the two models compared have the same fixed effects and differ only in their random effects. If this is not the case, the deviance obtained using Full ML should be used instead.

Example

The example in Chapter 4 of ALDA involves alcohol use by adolescents. 82 were surveyed over time (3 waves). Some of the data collected include:

alcuse, a continuous measure of alcohol use based on a rating scale (the response)
age_14, age of participant centered about 14 (14 = 0, 15 = 1, 16 = 2)
coa, an indicator whether or not participant is a child of an alcoholic (1 = yes, 0 = no)
id, an arbitrary level to group persons

The model building process is reproduced in R on the UCLA stats consulting site. They use the nlme package. I will use the lme4 package below to demonstrate the use of the deviance statistic.

# read in and attach the data
alcohol1 <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", header=T, sep=",")
attach(alcohol1)
library(lme4)

We're going to fit a model that only has age_14 as a predictor. Then we're going to build a model that has age_14 and coa as predictors. Notice the first model is "nested" in the second. In other words we can get the first model from the second model by setting the coa coefficients to 0.

FIRST MODEL
alcuse = \gamma_{00} + \gamma_{10}*age14 + \zeta + \zeta*age14 + \epsilon

SECOND MODEL
alcuse = \gamma_{00} + \gamma_{10}*age14 + \gamma_{01}*coa + \gamma_{11}*age14*coa + \zeta + \zeta_{1i}*age14 + \epsilon

Is the second model better than the first? The null hypothesis is no, it is not better.

H_{0}: \gamma_{01} = \gamma_{11} = 0

The second model has two additional fixed effects and no change in the random effects. Therefore to carry out this test, both models need to be fitted using Full Maximum Likelihood. (Note the argument "REML = FALSE" in the calls to lmer() below.)

# FIRST MODEL
model.b1 <- lmer(alcuse ~ age_14 + (age_14 | id), alcohol1, REML = FALSE)
summary(model.b1)

# SECOND MODEL
model.c1 <- lmer(alcuse ~ age_14*coa + (age_14 | id), alcohol1, REML = FALSE)
summary(model.c1)

Now we're ready to carry out the test. We can access the deviance of each model from the summary object, like so:

summary(model.b1)@AICtab$deviance
[1] 636.6111
summary(model.c1)@AICtab$deviance
[1] 621.2026

Notice the deviance of the bigger model is smaller than the deviance of the nested model. Is the reduction in deviance significant? To carry out the test we take the deviance of the smaller nested model and subtract from it the deviance of the bigger model. The difference is then compared to a chi-square distribution for significance. In this case, we'll compare the difference to a chi-square distribution with 2 degrees of freedom since the bigger model has two extra coefficients.

dev <- summary(model.b1)@AICtab$deviance - summary(model.c1)@AICtab$deviance
dev
[1] 15.40846
1 - pchisq(dev,2)
[1] 0.0004509163

Now that's a small p-value. That's the probability we would see a difference this large (or larger) in deviance if the two coefficients really were 0. We reject the null hypothesis and conclude our model is improved by including the two coefficients associated with the coa predictor. If we were planning to do several such tests, we could write a function to make the process go a little faster.

# function to calculate deviance statistic and return p-value
# a = nested model object, b = bigger model object, df = degrees of freedom
dev <- function(a,b,df){
return(1 - pchisq(
	         summary(a)@AICtab$deviance - 
      	         summary(b)@AICtab$deviance, 
		 df))
}

dev(model.b1,model.c1,2)
[1] 0.0004509163

3 thoughts on “Comparing Multilevel Models using Deviance Statistics (Ch 4 of ALDA)

  1. Ravindi Jayasundara

    Can there be higher deviance resulted for a bigger model compared to that of the nested one?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.