Tag Archives: ALDA

Modeling Discontinuous Change (Ch 6 of ALDA)

Chapter 6 of ALDA introduces strategies for fitting models in which individual change is discontinuous. This means the linear trajectory has a shift in the elevation and/or slope. To fit a model with discontinuous change we need to “include one (or more) time-varying predictor(s) that specify whether and, if so, when each person experiences the hypothesized shift.” (p. 191)

Some of the forms discontinuous change can take include:

an immediate shift in elevation, but no shift in slope
an immediate shift in slope, but no shift in elevation
immediate shifts in both elevation and slope
shifts in elevation (or slope) that differ in magnitude by time
multiple shifts in elevation (or slope) during multiple epochs of time

The example in the book replicates work done by Murnane, et al. (1999). In their paper they analyzed wage data for high school dropouts and investigated whether (log) wage trajectories remained smooth functions of work experience. Their idea was that obtaining a GED might command a higher wage and thus cause a discontinuity in the linear model fit to the data. The authors partially replicate this work by fitting a taxonomy of multilevel models.

Here’s the data (courtesy of UCLA IDRE). The variables of interest are:

id – person ID
lnw – natural log of wages (the response)
ged – indicator (1 = attained GED; 0 otherwise)
exper – years in labor force to nearest day
postexp – years in labor force from day of GED attainment
hgc.9 – highest grade completed, centered on grade 9
black – indicator (1 = black; 0 otherwise)
ue.7 – unemployment rate, centered on 7%

The initial model is rather elaborate due to earlier work in the book. Using the book’s notation we state the level-1 and level-2 models as follows:

Level-1
$Y_{ij} = \pi_{0i} + \pi_{1i}EXPER_{ij} + \pi_{2i}(UE_{ij} - 7) + \epsilon_{ij}$
Level-2
$\pi_{0i} = \gamma_{00} + \gamma_{01}(HGC_{i}-9) + \xi_{0i}$
$\pi_{1i} = \gamma_{10} + \gamma_{12}BLACK_{i} + \xi_{1i}$
$\pi_{2i} = \gamma_{20}$
Where,
$\epsilon_{ij} \sim N(0,\sigma_{\epsilon}^{2})$
and
$\begin{bmatrix} \xi_{0i}\\ \xi_{1i} \end{bmatrix} \sim N \begin{pmatrix} \begin{bmatrix} 0\\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_{0}^{2} & \sigma_{01}\\ \sigma_{10} & \sigma_{1}^{2} \end{bmatrix} \end{pmatrix}$

What a mess. Let’s break this down. The level-1 model is the individual growth model. It posits an individual’s wages can be explained by his years in the labor force (EXPER) and the unemployment rate (UE). The level-2 model says

the intercept in the level-1 model varies by highest grade completed (HGC)
the EXPER coefficient in the level-1 model varies based on whether or not your race is black (BLACK)
the UE coefficient is fixed for all individuals

If we collapse the two levels into one model we get

$Y_{ij} = \gamma_{00} + \gamma_{01}(HGC_{i}-9) + \gamma_{10}EXPER_{ij} + \gamma_{12}EXPER_{ij} \times BLACK_{i} + \gamma_{20}(UE_{ij}-7) + \xi_{0i} + \xi_{1i}EXPER_{ij} + \epsilon_{ij}$

That’s not exactly fun to look at either, but the last few terms reveal the random effects. The $\xi_{0i}$ is the random effect for the intercept, $\xi_{1i}$ is the random effect for the EXPER slope parameter and $\epsilon_{ij}$ is the residual error.
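In case the collapsing step isn't obvious, here is the intermediate line, with each level-2 equation substituted (in brackets) for its level-1 parameter. This is just my own bookkeeping using the symbols defined above, not a display taken from the book:

$Y_{ij} = [\gamma_{00} + \gamma_{01}(HGC_{i}-9) + \xi_{0i}] + [\gamma_{10} + \gamma_{12}BLACK_{i} + \xi_{1i}]EXPER_{ij} + [\gamma_{20}](UE_{ij}-7) + \epsilon_{ij}$

Multiplying out the second bracket and moving the random pieces to the end gives the collapsed model.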

We can fit this model in R as follows:

wages <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/wages_pp.txt", 
          header=T, sep=",")
library(lme4)
model.a <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + (exper|id), 
          wages, REML=FALSE)

The key part is the stuff in the parentheses. It says EXPER (and the intercept, by default) are the random effects, and that they're grouped by ID (i.e., the individuals). This means that each individual has his own intercept and EXPER coefficient in the fitted model. Let's look at the model's fixed effects and the random effects for individual 1.

The model's fixed effects:

fixef(model.a)
(Intercept)       exper       hgc.9        ue.7 exper:black 
 1.74898840  0.04405394  0.04001100 -0.01195050 -0.01818322

Random effects for first individual (ID = 31) in data:

ranef(model.a)$id[1,]
   (Intercept)      exper
31  -0.1833142 0.02014632

To see the final model for this individual we add his random effects to the fixed effects:

ranef(model.a)$id[1,] + fixef(model.a)[c("(Intercept)", "exper")]
   (Intercept)      exper
31    1.565674 0.06420026

Or we can just do this:

coef(model.a)$id[1,]
   (Intercept)      exper    hgc.9       ue.7 exper:black
31    1.565674 0.06420026 0.040011 -0.0119505 -0.01818322

Notice how the intercept and EXPER coefficient are different for the individual versus the fixed effects. Now we're usually less interested in the specific random effects (in this case there is a pair of them for each of the 888 individuals!) and more interested in their variances (or variance components). The variance component for the intercept is 0.051. The variance component for EXPER is 0.001. Those are pretty small but not negligible.
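One way to pull variance components out of a fitted lmer object is VarCorr(). Here's a minimal sketch that uses the sleepstudy data bundled with lme4 as a stand-in for the wages data, so it runs without a download. The object names fit and vc are just mine.

```r
library(lme4)

# sleepstudy ships with lme4; it stands in for the wages data here
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)

# one row per variance, covariance and residual term
vc <- as.data.frame(VarCorr(fit))
vc[, c("grp", "var1", "var2", "vcov")]
```

The vcov column holds the variances (and the intercept-slope covariance). The same call on model.a should give the variance components quoted above.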

Having said all that, the goal of this exercise is to build the best model with discontinuities, which is largely done by deviance statistics. So let's work through the book's example and remember that everything I explained above is the baseline model. All subsequent models will build upon it.

The first model up adds a discontinuity in the intercept by including a fixed and a random effect for GED:

model.b <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + ged + 
                (exper + ged|id), wages, REML=FALSE)
anova(model.a,model.b)

To get a feel how GED affects the model, look at the records for individual 53:

wages[wages$id == 53,1:4]
   id   lnw exper ged
19 53 1.763 0.781   0
20 53 1.538 0.943   0
21 53 3.242 0.957   1
22 53 1.596 1.037   1
23 53 1.566 1.057   1
24 53 1.882 1.110   1
25 53 1.890 1.185   1
26 53 1.660 1.777   1

Notice how GED flips from 0 to 1 over time. Model B allows an individual's wage trajectory to shift in "elevation" at the point GED changes to 1. Hence the discontinuity. Should we allow this? Calling anova(model.a,model.b) helps us decide. In the output you'll see the p-value is less than 0.001. The null here is no difference between the models, i.e., the new explanatory variable in model B (GED) has no effect. So we reject the null and determine that an individual's wage trajectory may indeed display a discontinuity in elevation upon receipt of a GED.

Our next model is model B without GED random effects:

model.c <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + 
                ged + (exper|id), wages, REML=FALSE)
anova(model.c,model.b)

This is our baseline model with an additional fixed effect for GED. Should we include random effects for GED? Again we test the null that there is no difference between model B and C by calling anova(model.c,model.b). Since model C is nested in model B and the p-value returned is about 0.005, we reject the null and decide to keep the GED random components.

The next two models explore a discontinuity in the slope of the wages trajectory but not the elevation. We do this by removing the GED fixed and random effects and replacing them with POSTEXP fixed and random effects. Recall that POSTEXP is years in labor force from day of GED attainment. To see how it works, look at the records for individual 4384:

wages[wages$id == 4384,3:5]
     exper ged postexp
2206 0.096   0   0.000
2207 1.039   0   0.000
2208 1.726   1   0.000
2209 3.128   1   1.402
2210 4.282   1   2.556
2211 5.724   1   3.998
2212 6.024   1   4.298

The record where GED flips to 1 is the day he obtains his GED. The next record clocks the elapsed time in the workforce since obtaining his GED: 1.402 years. POSTEXP records this explicitly. But that's what EXPER records as well: 3.128 - 1.726 = 1.402. So these two variables record the passage of time in lockstep. It's just that EXPER records from the "beginning" and POSTEXP records from the day of GED attainment. Allowing this additional variable into the model allows the slope of the wages trajectory to suddenly change when a GED is obtained. OK, enough talking. Let's see if we need it.

model.d <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + postexp + 
                (exper + postexp|id), wages, REML=FALSE)
anova(model.d,model.a)

The p-value returned from the anova function is about 0.01. This says model D is an improvement over model A and that the trajectory slope of wages indeed changes upon receipt of GED.
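The POSTEXP bookkeeping described for individual 4384 can be reproduced from EXPER and GED alone. This is only a sketch for a single person, with the values typed in from the records shown earlier. The names ged_start and postexp_check are made up for illustration.

```r
# records for individual 4384, copied from the output above
exper <- c(0.096, 1.039, 1.726, 3.128, 4.282, 5.724, 6.024)
ged   <- c(0, 0, 1, 1, 1, 1, 1)

# experience on the first record where GED = 1
ged_start <- exper[which(ged == 1)[1]]

# zero before the GED, elapsed experience since the GED afterwards
postexp_check <- ifelse(ged == 1, exper - ged_start, 0)
round(postexp_check, 3)
```

The rounded values match the postexp column in the data, which is the "lockstep" relationship between the two clocks.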

The next model is model D without random effects for POSTEXP. So whereas previously we allowed the change in slope to vary across individuals (random), now we're saying the change in slope is uniform (fixed) for all individuals. In the R code this means removing POSTEXP from the random effects portion but keeping it as a fixed effect.

model.e <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + postexp + 
               (exper|id), wages, REML=FALSE)
anova(model.d,model.e)

This anova test finds no significant difference between the models (p-value = 0.34). This means we may not need to allow for POSTEXP random effects.

But before we go with that, let's fit a model that allows for discontinuity in both the slope and elevation of the individuals' wages trajectory. In other words, let's throw both GED and POSTEXP in the model as both fixed and random effects. And then let's compare the model with previous models to determine whether or not to keep each predictor.

model.f <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + 
                postexp + ged + (postexp + ged + exper|id), 
                wages, REML=FALSE)

# compare models f and b to evaluate POSTEXP effect (NULL: no POSTEXP effect)
anova(model.b,model.f)

# compare models f and d to evaluate GED effect  (NULL: no GED effect)
anova(model.d,model.f)

In both anova tests, the p-values are very small, thus we reject the null in each case and retain the predictors. So it appears that, in the presence of the GED predictor, we do actually want to retain random effects for POSTEXP. But we're not done yet. Let's fit two more models each without the POSTEXP and GED random effects, respectively, and compare them to model F:

# MODEL G - Model F without POSTEXP variance component
model.g <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + 
                postexp + ged + (ged + exper|id), wages, REML=FALSE)

# MODEL H - Model F without GED variance component
model.h <- lmer(lnw ~ exper + hgc.9 + exper:black + ue.7 + 
                postexp + ged + (postexp + exper|id), wages, REML=FALSE)
deviance(model.h)

# compare models g and h to model f to see if we should 
# keep POSTEXP and GED variance components
# NULL: do not need variance components
anova(model.g,model.f)
anova(model.h,model.f)

In both anova tests we get p-values less than 0.01 and reject the null. We should indeed keep the random effects. So our final model allows for both discontinuities in elevation and slope in the individuals' trajectories. Here's a super rough way we can visualize this in R:

# visual aid of discontinuity in slope and elevation
# made up model coefficients
b0 <- 3
b1 <- 5 
b2 <- 2
b3 <- 4
# variables
ind <- c(rep(0,10),rep(1,11)) # indicator of event
time <- c(1:10,10,11:20) # time
time2 <- c(rep(0,11),1:10)  # additional time tracking after event occurs
# create response
y <- b0 + b1*ind + b2*time + b3*time2
# plot response versus time
plot(time, y, type="l")

This gives us the following line plot:

Notice the shift in elevation at 10 and then the change in slope after the shift in elevation.

Treating Time More Flexibly (Ch 5 of ALDA)

Through 4 chapters of Applied Longitudinal Data Analysis (ALDA), the data sets have had the following constraints:

Balanced – all subjects have the same number of measurements.
Time structured – all subjects measured at the same time.
Time-invariant predictors – predictors that do not change over time, such as gender or treatment group.

In chapter 5 these constraints are relaxed. We work with unbalanced datasets with variably-spaced measurements and time-varying predictors. As usual, the UCLA stats consulting site replicates the chapter’s examples in 18 different stats programs. I won’t redo their work, but I will give you my boiled-down-most-important-points that I took away from this chapter. I’ll also show a couple of examples using the lmer() function from the lme4 package.

Section 5.1 Variably Spaced Measurement Occasions
Analyzing data sets with variably spaced measurement occasions is no different than analyzing data sets with identical occasions across individuals (time structured).

Example with unstructured data set (variably spaced measurements)
Data: reading scores recorded at three different times (i.e., 3 waves of data)
Fit two unconditional growth models

reading <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/reading_pp.txt", 
                      header=T, sep=",")
mat2 <- reading[ ,3:4]-6.5
dimnames(mat2)[[2]] <- c("agegrp.c", "age.c")  
reading <- cbind(reading, mat2)

library(lme4)
# forcing structure on data
lmer.agegrp <- lmer(piat ~ agegrp.c + (agegrp.c | id), reading, REML = FALSE)
summary(lmer.agegrp)
# using unstructured data
lmer.age <- lmer(piat ~ age.c + (age.c | id), reading, REML = FALSE)
summary(lmer.age)

The first model treats the data as structured. Instead of using child’s precise age, we are using their age classification group (6.5, 8.5, 10.5). The second model uses the child’s precise age. Notice the second model’s lower deviance: 1803 versus 1820. “Treating the unstructured data as though it is time-structured introduces error in the analysis – error that we can reduce by using the child’s age at testing as the temporal predictor.” (p. 145)

Lesson: never force an unstructured data set to be structured.

Section 5.2 Varying Numbers of Measurement Occasions
Section 5.1 concerned varying spacing of measurements. This section concerns varying numbers of measurements, also known as unbalanced data. Multilevel modeling allows analysis of data sets with varying numbers of waves of data.

All subjects can contribute to a multilevel model regardless of how many waves of data they contribute. No special procedures are needed to fit a multilevel model to unbalanced data, provided it’s not too unbalanced (i.e., too many people with too few waves with respect to the complexity of your specified model).

Potential Problems with unbalanced data

The iterative estimation algorithms may not converge. This affects variance components, not fixed effects. “Estimation of variance components requires that enough people have sufficient data to allow quantification of within-person residual variation.” (p. 152)
Exceeding boundary constraints, such as negative variance components. Your output may have an estimate of 0 to indicate this. Simplifying your model by removing random effects is usually the fix.
Nonconvergence. This can result from poorly specified models and insufficient data. Can also result from the outcome variable’s scale (too small, make larger) or the temporal predictor’s variable scale (too brief, make longer)

Section 5.3 Time-Varying Predictors

A time-varying predictor is a variable whose values may differ over time. Examples: hours worked per week, money earned per year, employment status. No special strategies are needed to include a time-varying predictor in a multilevel model.

Examples with time-varying predictor
Data: depression scores (cesd) for unemployed; status of employment (unemp; 0 or 1) changes over time

unemployment <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/unemployment_pp.txt", 
                           header=T, sep=",")

# time-varying predictor is unemp
lmer.unb <- lmer(cesd ~ months + unemp + (months | id), 
                 unemployment, REML = FALSE)
summary(lmer.unb)

# allow effect of time-varying predictor (unemp) to vary over time
lmer.unc <- lmer(cesd ~ months + unemp*months + (months | id), 
                 unemployment, REML = FALSE)
summary(lmer.unc)

# constant slope for unemp=0, changing slope for unemp=1
lmer.und <- lmer(cesd ~ unemp + unemp:months + (unemp + unemp:months | id), 
                 unemployment, REML = FALSE)
summary(lmer.und)

Section 5.4 Recentering the Effect of Time

Recentering time can produce interpretive advantages such as an intercept that represents initial status. Time can also be recentered in such a way to produce an intercept that represents final status. This is useful when final status is of special concern. Changes in recentering produce different intercept parameters but leave slope and deviance statistics unchanged. It can also lead to an intercept being significant when it previously was not (and vice versa).
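A quick way to convince yourself of the intercept/slope claim is a toy example. The sketch below uses made-up data and plain lm() rather than lmer(), so it only illustrates the algebra of recentering, not a multilevel fit. All the names (time_final, fit_initial, fit_final) are mine.

```r
set.seed(1)

# made-up data: 20 people, 5 waves each, true intercept 10 and slope 2
time <- rep(0:4, times = 20)
y <- 10 + 2 * time + rnorm(length(time))

# time centered on the first wave: intercept = initial status
fit_initial <- lm(y ~ time)

# time centered on the last wave: intercept = final status
time_final <- time - 4
fit_final <- lm(y ~ time_final)

coef(fit_initial)
coef(fit_final)
logLik(fit_initial)
logLik(fit_final)
```

The slope and log-likelihood are identical across the two fits. Only the intercept moves, by four times the slope.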

Testing Composite Hypotheses about Fixed Effects (Ch 4 of ALDA)

Comparing Multilevel Models using Deviance Statistics (Ch 4 of ALDA)

The tour of Applied Longitudinal Data Analysis (ALDA) by Singer and Willett continues today with section 4.6, Comparing Models using Deviance Statistics. In the section prior to this they walk through building a model by way of examining hypothesis tests for fixed effects and variance components. While the former will be familiar to those who’ve done classical linear regression, the latter is likely something new. And perhaps not advised. As I mentioned in my previous post (last paragraph), they state in section 3.6.2 that “statisticians disagree as to the nature, form, and effectiveness of these tests.” They also “suggest you examine them only with extreme caution.” Therefore I decided not to blog about that particular tactic and instead focus on “a superior method for testing hypotheses about variance components.” (Also their words.) Of course I refer to the title of this post: Deviance Statistics.

As I’ve done in my other ALDA posts, I’m going to forgo exposition and get down to business. This post is meant to serve as a reference guide for my future self and maybe yours as well.

The Deviance Statistic is used to test the hypothesis that additional model predictors do not improve the fit of the model. The null hypothesis is that the coefficients of the additional predictors are 0.
To use the Deviance Statistic, one model must be nested in the other. That is, the smaller model can be derived from the bigger model by setting certain coefficients in the bigger model equal to 0.
Deviance = -2 * (Log Likelihood (LL) of model)
Deviance Statistic = -2 * (LL of model nested in bigger model – LL of bigger model)
Smaller Deviance is better. If adding more predictors to a model reduces deviance, that may be a good thing. The hypothesis test using the Deviance Statistic helps us determine whether or not the reduction in deviance is significant. A large p-value tells us no, it is not significant and that our model is not improved by the additional predictors. A small p-value tells us to reject the null and keep the extra predictors.
The distribution of the deviance statistic is chi-square with DF equal to the number of extra parameters in the bigger model.
Deviance obtained under Restricted Maximum Likelihood (REML) should only be used if the two models compared have the same fixed effects and differ only in their random effects. If this is not the case, the deviance obtained using Full ML should be used instead.

Example

The example in Chapter 4 of ALDA involves alcohol use by adolescents. 82 adolescents were surveyed over time (3 waves). Some of the data collected include:

alcuse, a continuous measure of alcohol use based on a rating scale (the response)
age_14, age of participant centered about 14 (14 = 0, 15 = 1, 16 = 2)
coa, an indicator whether or not participant is a child of an alcoholic (1 = yes, 0 = no)
id, an arbitrary level to group persons

The model building process is reproduced in R on the UCLA stats consulting site. They use the nlme package. I will use the lme4 package below to demonstrate the use of the deviance statistic.

# read in and attach the data
alcohol1 <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", header=T, sep=",")
attach(alcohol1)
library(lme4)

We're going to fit a model that only has age_14 as a predictor. Then we're going to build a model that has age_14 and coa as predictors. Notice the first model is "nested" in the second. In other words we can get the first model from the second model by setting the coa coefficients to 0.

FIRST MODEL
$alcuse = \gamma_{00} + \gamma_{10}*age14 + \zeta_{0i} + \zeta_{1i}*age14 + \epsilon$

SECOND MODEL
$alcuse = \gamma_{00} + \gamma_{10}*age14 + \gamma_{01}*coa + \gamma_{11}*age14*coa + \zeta_{0i} + \zeta_{1i}*age14 + \epsilon$

Is the second model better than the first? The null hypothesis is no, it is not better.

$H_{0}: \gamma_{01} = \gamma_{11} = 0$

The second model has two additional fixed effects and no change in the random effects. Therefore to carry out this test, both models need to be fitted using Full Maximum Likelihood. (Note the argument "REML = FALSE" in the calls to lmer() below.)

# FIRST MODEL
model.b1 <- lmer(alcuse ~ age_14 + (age_14 | id), alcohol1, REML = FALSE)
summary(model.b1)

# SECOND MODEL
model.c1 <- lmer(alcuse ~ age_14*coa + (age_14 | id), alcohol1, REML = FALSE)
summary(model.c1)

Now we're ready to carry out the test. We can access the deviance of each model from the summary object, like so:

summary(model.b1)@AICtab$deviance
[1] 636.6111
summary(model.c1)@AICtab$deviance
[1] 621.2026

Notice the deviance of the bigger model is smaller than the deviance of the nested model. Is the reduction in deviance significant? To carry out the test we take the deviance of the smaller nested model and subtract from it the deviance of the bigger model. The difference is then compared to a chi-square distribution for significance. In this case, we'll compare the difference to a chi-square distribution with 2 degrees of freedom since the bigger model has two extra coefficients.

dev <- summary(model.b1)@AICtab$deviance - summary(model.c1)@AICtab$deviance
dev
[1] 15.40846
1 - pchisq(dev,2)
[1] 0.0004509163

Now that's a small p-value. That's the probability we would see a difference this large (or larger) in deviance if the two coefficients really were 0. We reject the null hypothesis and conclude our model is improved by including the two coefficients associated with the coa predictor. If we were planning to do several such tests, we could write a function to make the process go a little faster.

# function to calculate deviance statistic and return p-value
# a = nested model object, b = bigger model object, df = degrees of freedom
dev <- function(a,b,df){
return(1 - pchisq(
	         summary(a)@AICtab$deviance - 
      	         summary(b)@AICtab$deviance, 
		 df))
}

dev(model.b1,model.c1,2)
[1] 0.0004509163
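A caveat for anyone running this later: as far as I know, in lme4 1.0 and newer summary() no longer returns an S4 object, so the @AICtab slot used above isn't available, and deviance(model) is the usual way to get the deviance of an ML fit. Here's a sketch of a helper that sidesteps the issue by taking the two deviances as plain numbers. The name dev_test is made up.

```r
# deviance-statistic test from two deviances
# dev_nested = deviance of the smaller model
# dev_full   = deviance of the bigger model
# df         = number of extra parameters in the bigger model
dev_test <- function(dev_nested, dev_full, df) {
  stat <- dev_nested - dev_full
  p <- pchisq(stat, df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# deviances reported above for model.b1 and model.c1
res <- dev_test(636.6111, 621.2026, 2)
res
```

With fitted models in hand, the call would presumably be dev_test(deviance(model.b1), deviance(model.c1), 2).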

Unconditional Multilevel Models for Change (Ch 4 of ALDA)

In Chapter 4 (section 4.4) of Applied Longitudinal Data Analysis (ALDA), Singer and Willett recommend fitting two simple unconditional models before you begin multilevel model building in earnest. These two models “allow you to establish: (1) whether there is systematic variation in your outcome that is worth exploring; and (2) where that variation resides (within or between people).” (p. 92) This is a great idea. Why start building models if there is no variation to explain? In this post I want to summarize these two models for reference purposes.

Model 1: The Unconditional Means Model

The keyword here is “means”. This model has one fixed effect that estimates the grand mean of the response across all occasions and individuals.
The main reason to fit this model is to examine the random effects (i.e., the within-person and between-person variance components). This tells us the amount of variation that exists at the within-person level and the between-person level.
Model specification: $ Y_{ij} = \gamma_{00} + \zeta_{0i} + \epsilon_{ij} $
- $ \gamma_{00} $ = grand mean (fixed effect)
- $ \zeta_{0i} $ = the amount person i’s mean deviates from the population mean (between-person)
- $ \epsilon_{ij} $ = the amount the response on occasion j deviates from person i’s mean (within-person)
- $ \epsilon_{ij} \sim N(0,\sigma_{\epsilon}^{2}) $
- $ \zeta_{0i} \sim N(0, \sigma_{0}^{2}) $
Use the intraclass correlation coefficient to describe the proportion of the total outcome variation that lies “between” people: $ \rho = \sigma_{0}^{2} / (\sigma_{0}^{2} + \sigma_{\epsilon}^{2}) $
In the unconditional means model the intraclass correlation coefficient is also the “error autocorrelation coefficient”, which estimates the average correlation between any pair of composite residuals: $ \zeta_{0i} + \epsilon_{ij} $

Sample R code for fitting the unconditional means model (where “id” = person-level grouping indicator):

library(nlme)
lme(response ~ 1, data=dataset, random= ~ 1 | id)

Or this:

library(lme4)
lmer(response ~ 1 + (1 | id), dataset)

To replicate the Unconditional Means Model example in ALDA, the UCLA stats page suggests the following:

alcohol1 <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", 
                       header=T, sep=",")
library(nlme)
model.a <- lme(alcuse~ 1, alcohol1, random= ~1 |id)
summary(model.a)

This works OK, but returns slightly different results because it fits the model using REML (Restricted Maximum Likelihood) instead of ML (Maximum Likelihood). It also does not return the estimated between-person variance $ \sigma_{0}^{2} $. We can "fix" the first issue by including the argument method="ML". There doesn't appear to be anything we can do about the second. However, the lmer() function allows us to replicate the example and obtain the same results presented in the book, as follows (notice we have to specify ML implicitly with the argument REML = FALSE):

model.a1 <- lmer(alcuse ~ 1 + (1 | id), alcohol1, REML = FALSE)
summary(model.a1)

The output provides the values discussed in the book in the "Random effects" section under the variance column:

> summary(model.a1)
Linear mixed model fit by maximum likelihood 
Formula: alcuse ~ 1 + (1 | id) 
   Data: alcohol1 
   AIC   BIC logLik deviance REMLdev
 676.2 686.7 -335.1    670.2     673
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.56386  0.75091 
 Residual             0.56175  0.74950 
Number of obs: 246, groups: id, 82

Fixed effects:
            Estimate Std. Error t value
(Intercept)   0.9220     0.0957   9.633

The "Random effect" id has variance = 0.564. That's the between-person variance. The "Random effect" Residual has variance = 0.562. That's the within-person variance. We can access these values using "summary(model.a1)@REmat" and calculate the intraclass correlation coefficient like so:

icc_n <- as.numeric(summary(model.a1)@REmat[1,3])
icc_d <- as.numeric(summary(model.a1)@REmat[1,3]) + 
         as.numeric(summary(model.a1)@REmat[2,3])
icc_n / icc_d
[1] 0.5009373
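The same number can be checked by hand from the printed output, which is handy if the @REmat slot isn't available in your version of lme4. This sketch just types in the two variances from the "Random effects" table above. The variable names are mine.

```r
# variances copied from the summary output above
var_between <- 0.56386  # id (Intercept)
var_within  <- 0.56175  # Residual

# intraclass correlation: share of total variance that lies between people
icc <- var_between / (var_between + var_within)
icc
```

So about half of the total variation in alcohol use is attributable to differences between people.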

Model 2: The Unconditional Growth Model

This model partitions and quantifies variance across people and time.
The fixed effects estimate the starting point and slope of the population average change trajectory.
Model specification: $ Y_{ij} = \gamma_{00} + \gamma_{10}*time_{ij} + \zeta_{0i} + \zeta_{1i}*time_{ij} + \epsilon_{ij} $
- $ \gamma_{00} $ = average intercept (fixed effect)
- $ \gamma_{10} $ = average slope (fixed effect)
- $ \zeta_{0i} $ = the amount person i's intercept deviates from the population intercept
- $ \zeta_{1i} $ = the amount person i's slope deviates from the population slope
- $ \epsilon_{ij} $ = the amount the response on occasion j deviates from person i's true change trajectory
- $ \epsilon_{ij} \sim N(0,\sigma_{\epsilon}^{2}) $
- $ \zeta_{0i} \sim N(0, \sigma_{0}^{2}) $
- $ \zeta_{1i} \sim N(0, \sigma_{1}^{2}) $
- $ \zeta_{0i}$ and $ \zeta_{1i} $ have covariance $ \sigma_{01} $
The residual variance $ \sigma_{\epsilon}^{2} $ summarizes the average scatter of an individual's observed outcome values around his/her own true change trajectory. Compare this to the same value in the unconditional means model to see if within-person variation is systematically associated with linear time.
The level-2 variance components, $ \sigma_{0}^{2} $ and $ \sigma_{1}^{2} $ quantify the unpredicted variability in the intercept and slope of individuals. That is, they assess the scatter of a person's intercept and slope about the population average change trajectory. DO NOT compare to the same values in the unconditional means model since they have a different interpretation.
The level-2 covariance $ \sigma_{01} $ quantifies the population covariance between the true initial status (intercept) and true change (slope). Interpretation is easier if we re-express the covariance as a correlation coefficient: $ \hat{\rho}_{01} = \hat{\sigma}_{01} / \sqrt{\hat{\sigma}_{0}^{2}\hat{\sigma}_{1}^{2}} $
Sample R code for fitting the unconditional growth model (where "id" = person-level grouping indicator):
```
lme(response ~ time , data=dataset, random= ~ time | id)
```
Or this:
```
lmer(response ~ time + (time | id), dataset)
```

To replicate the Unconditional Growth Model example in ALDA, the UCLA stats page suggests the following:

alcohol1 <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", 
                       header=T, sep=",")
library(nlme)
model.b <- lme(alcuse ~ age_14 , data=alcohol1, random= ~ age_14 | id, method="ML")
summary(model.b)

However I think the following is better as it gives the same values in the book:

model.b1 <- lmer(alcuse ~ age_14 + (age_14 | id), alcohol1, REML = FALSE)
summary(model.b1)

For instance it provides variance values instead of standard deviation values. It doesn't really matter in the long run, but it makes it easier to quickly check your work against the book. Here's the output:

> summary(model.b1)
Linear mixed model fit by maximum likelihood 
Formula: alcuse ~ age_14 + (age_14 | id) 
   Data: alcohol1 
   AIC   BIC logLik deviance REMLdev
 648.6 669.6 -318.3    636.6   643.2
Random effects:
 Groups   Name        Variance Std.Dev. Corr   
 id       (Intercept) 0.62436  0.79017         
          age_14      0.15120  0.38885  -0.223 
 Residual             0.33729  0.58077         
Number of obs: 246, groups: id, 82

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.65130    0.10508   6.198
age_14       0.27065    0.06246   4.334

Correlation of Fixed Effects:
       (Intr)
age_14 -0.441

Again the main section to review is the "Random effects". The Residual variance (within-person) has decreased to 0.337 from 0.562. We can conclude that $ (0.562 - 0.337)/0.562 = 0.40 $ (i.e., 40%) of the within-person variation in the response is systematically associated with linear time. We also see the negative correlation (-0.223) between the true initial status (intercept) and true change (slope). However, the book notes this correlation is not statistically significant. As you can see this is not something the output of the lmer object reports. The book mentions in chapter 3 (p. 73) that statisticians disagree about the effectiveness of such significance tests on variance components, and I can only assume the authors of the lme4 package question their use. Finally, we notice the level-2 variance components: 0.624 and 0.151. These provide a benchmark for predictors' effects as the authors proceed to build models.
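The arithmetic in that paragraph is easy to reproduce from the two printed outputs. The sketch below types in the residual variances from both models, then goes one step further and converts the reported correlation back into the covariance $ \sigma_{01} $ using the correlation formula given earlier. The variable names are made up.

```r
# residual (within-person) variances from the two summaries
resid_means  <- 0.56175  # unconditional means model
resid_growth <- 0.33729  # unconditional growth model

# proportion of within-person variance associated with linear time
pseudo_r2 <- (resid_means - resid_growth) / resid_means
pseudo_r2

# level-2 standard deviations and correlation from the growth model output
sd_intercept <- 0.79017
sd_slope     <- 0.38885
rho_01       <- -0.223

# covariance implied by the correlation: sigma_01 = rho * sd_0 * sd_1
cov_01 <- rho_01 * sd_intercept * sd_slope
cov_01
```

The first value rounds to 0.40 and the implied covariance is a little under -0.07.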

The Multilevel Model for Change (Ch 3 of ALDA) – revisited

The Multilevel Model for Change (Ch 3 of ALDA)

ALDA stands for the book, Applied Longitudinal Data Analysis, by Singer and Willett. I’ve had this lying around for a while and decided to give it another go. If you know a little about regression and want to get started with data analysis using multilevel models, this is probably a good place to start. It’s light on the math and heavy on application. Well, maybe not that heavy. They don’t actually provide anything in the way of computer code when it comes time to fit a model. They just do it and produce a table of results. Fortunately, our friends at the UCLA Statistical Consulting Group have generously provided programs to reproduce the examples in the book using R, SAS, and Stata, among others. So that’s where I turned when I wanted to replicate the example running through chapter 3.

The example involves 103 African-American infants born into low-income families. When the children were 6 months old, about half were randomly assigned to an early intervention program designed to enhance cognitive functioning. The other half received no intervention and acted as a control. Cognitive performance was measured at ages 12, 18 and 24 months by way of a test. The researchers wanted to examine the effects of program participation on the test results. There’s actually a lot more to this study. Here’s the abstract. But the authors scaled it back and took a portion of the data to create a friendly introductory example to multilevel modeling. And I have to say, they succeeded. It is indeed an easy-to-follow example. The chapter can be read in one sitting and gives you a sense of what multilevel modeling is all about. In this case we want to model individual change in cognitive ability over time with a straight line,

$y = \beta_{0} + \beta_{1}(time)$

but then also model the intercept and slope of that line with respect to program:

$\beta_{0} = \gamma_{00} + \gamma_{01}(program)$
$\beta_{1} = \gamma_{10} + \gamma_{11}(program)$

So we have two levels of modeling going on. Hence, “multilevel” modeling.
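Substituting the second pair of equations into the first shows how the two levels collapse into a single regression. Like the equations above, this sketch leaves out the residual terms:

$y = (\gamma_{00} + \gamma_{01}(program)) + (\gamma_{10} + \gamma_{11}(program))(time)$
$y = \gamma_{00} + \gamma_{01}(program) + \gamma_{10}(time) + \gamma_{11}(program \times time)$

So $\gamma_{11}$ is the coefficient on a program-by-time interaction. In R's formula notation, time*program expands to exactly these terms: an intercept, time, program, and time:program.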

Naturally I wanted to replicate the Chapter 3 results in R. That’s why I’m reading the book, so I can learn how to do this stuff. So I hit up the UCLA site to get the necessary R code. And there at the top of the page, before I can even get started, is a roadblock:

Please note that the “early_int” data file (which is used in Chapter 3) is not included among the data files. This was done at the request of the researchers who contributed this data file to ensure the privacy of the participants in the study. Although the web page shows how to obtain the results with this data file, we regret that visitors do not have access to this file to be able to replicate the results for themselves.

Seriously? C’mon. Did the authors not think about this when writing the book? Couldn’t they come up with a different example that can be replicated? Very frustrating. So I did what any honorable statistician would do and ~~gave up, turned off the computer and went outside to enjoy nature~~ Googled the name of the dataset used in the R code on the UCLA web site (earlyint_pp.txt). The very first result takes you to this Harvard web page, where the data in question is available in SAS and Stata formats. Now the authors of ALDA are both Harvard professors. And here is the data – that I’m not supposed to have access to – available on a Harvard web page for an old statistics class taught by one of the authors of the book. I guess the researchers changed their mind? Anyway, I now had the data and could try to replicate the results. And you’ll be happy to know the data contained no identifying information. It had four variables: (1) an ID number for the infant, (2) the test score, (3) age at test, and (4) an indicator for program participation.

I downloaded the Stata dataset because I can read that into R using the foreign package:

library(foreign)
early.int <- read.dta("earlyint_pp.dta")

With the data loaded, I was ready to start replicating the results. This required me to copy and paste the R code from the UCLA web site. I don’t usually go for copying-and-pasting, but in this case I was OK with it. It was mostly exploratory analysis. And it worked like a charm. That is until I got to the part where you actually fit a multilevel model to the data. Running the following R code produced an error:

library(nlme)
lme(cog ~ time*program, data=early.int, random = ~ time | id, method="ML")

Error in lme.formula(cog ~ time * program, data = early.int, random = ~time |  : 
  nlminb problem, convergence error code = 1
  message = iteration limit reached without convergence (10)

Hmm. Seemed to work for the UCLA person. What to do? Long story short, I needed to specify a different optimizer. The default optimizer for lme is “nlminb”. I needed to specify “optim”, like so:

model1 <- lme(cog ~ time*program, data=early.int, random = ~ time | id, 
              method="ML", control=list(opt = "optim"))
summary(model1)

That worked! Or at least it gave me some output. Upon closer inspection the random effects were slightly different, but everything else looked about the same. Why does “optim” work and not “nlminb”? I have no idea. Obviously they go about optimization in different ways. But I’m not too worried at the moment since I’m just getting my feet wet with multilevel modeling.
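The error message says nlminb hit its iteration limit, so besides switching optimizers, another thing to try is giving nlminb more iterations through lmeControl(). I have not verified that this rescues the early_int fit, so treat the following as a sketch. Since that data file isn't bundled with R, the example uses the Orthodont data that ships with nlme as a stand-in, and the limit of 200 is an arbitrary choice.

```r
library(nlme)

# Stand-in data: Orthodont ships with nlme (the early_int file does not).

# Option 1: switch the optimizer from the default "nlminb" to "optim".
ctrl_optim <- lmeControl(opt = "optim")

# Option 2: keep "nlminb" but raise the iteration limits
# (both default to 50 in lmeControl).
ctrl_more <- lmeControl(maxIter = 200, msMaxIter = 200)

fit_optim <- lme(distance ~ age, data = Orthodont,
                 random = ~ age | Subject,
                 method = "ML", control = ctrl_optim)

fit_more <- lme(distance ~ age, data = Orthodont,
                random = ~ age | Subject,
                method = "ML", control = ctrl_more)

# Compare the estimated variance components from the two fits.
VarCorr(fit_optim)
VarCorr(fit_more)
```

For the model in this post, either control object would be passed as the control argument of the lme(cog ~ time*program, ...) call. Small differences in the random effects between optimizers are not surprising, since each stops according to its own convergence criteria.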

In researching the error above I discovered I could also fit the multilevel model using the following code:

library(lme4)
mod1 <- lmer(cog ~ time*program + (time | id), early.int)
summary(mod1)

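The output from this is also slightly different from the book as well as the output from the lme call above using “optim”. The coefficients of the fixed effects are all the same, but the random effects are slightly different. One likely contributor: lmer fits by REML unless told otherwise, while the lme call above used method="ML", and the two methods generally give different variance component estimates.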
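Here is a minimal sketch of fitting an lmer() model by ML rather than the default REML, which puts it on the same footing as an lme() call with method="ML". Again the early_int data isn't bundled with R, so this uses the sleepstudy data that ships with lme4 as a stand-in.

```r
library(lme4)

# Stand-in data: sleepstudy ships with lme4.

# Default fit: REML = TRUE.
fit_reml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Maximum likelihood fit: REML = FALSE.
fit_ml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
               REML = FALSE)

# Fixed effects from each fit.
fixef(fit_reml)
fixef(fit_ml)

# Variance components from each fit.
VarCorr(fit_reml)
VarCorr(fit_ml)
```

For the model in this post, the analogous call would be lmer(cog ~ time*program + (time | id), early.int, REML = FALSE). I would expect that to bring the random effects closer to the lme results, though I have not confirmed it matches the book exactly.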

So what does all this mean? How do you interpret the output? What is multilevel modeling? Well, that wasn’t quite the point of this post. I wanted to document for posterity how to replicate the results of ALDA Chapter 3 using R despite the message to the contrary on the UCLA site. Besides, if you’ve read this far then you're probably also working through ALDA and don't need me to repeat what's in the book. However, if you insist, I point you to this solid PowerPoint presentation that boils Chapter 3 of ALDA to its essence. It tells you everything you need to know about modeling this example and interpreting the results.
