The Multilevel Model for Change (Ch 3 of ALDA)

ALDA stands for the book, Applied Longitudinal Data Analysis, by Singer and Willett. I’ve had this lying around for a while and decided to give it another go. If you know a little about regression and want to get started with data analysis using multilevel models, this is probably a good place to start. It’s light on the math and heavy on application. Well, maybe not that heavy. They don’t actually provide anything in the way of computer code when it comes time to fit a model. They just do it and produce a table of results. Fortunately, our friends at the UCLA Statistical Consulting Group have generously provided programs to reproduce the examples in the book using R, SAS, and Stata, among others. So that’s where I turned when I wanted to replicate the example running through chapter 3.

The example involves 103 African-American infants born into low-income families. When the children were 6 months old, about half were randomly assigned to an early intervention program designed to enhance cognitive functioning. The other half received no intervention and acted as a control. Cognitive performance was measured at ages 12, 18 and 24 months by way of a test. The researches wanted to examine the effects of program participation on the test results. There’s actually a lot more to this study. Here’s the abstract. But the authors scaled it back and took a portion of the data to create a friendly introductory example to multilevel modeling. And I have to say, they succeeded. It is indeed an easy-to-follow example. The chapter can be read in one sitting and gives you a sense of what multilevel modeling is all about. In this case we want to model individual change in cognitive ability over time with a straight line,

y = \beta_{0} + \beta_{1}(time)

but then also model the intercept and slope of that line with respect to program:

\beta_{0} = \gamma_{00} + \gamma_{01}(program)
\beta_{1} = \gamma_{10} + \gamma_{11}(program)

So we have two levels of modeling going on. Hence, “multilevel” modeling.

Naturally I wanted to replicate the Chapter 3 results in R. That’s why I’m reading the book, so I can learn how to do this stuff. So I hit up the UCLA site to get the necessary R code. And there at the top the page, before I can even get started, is a roadblock:

Please note that the “early_int” data file (which is used in Chapter 3) is not included among the data files. This was done at the request of the researchers who contributed this data file to ensure the privacy of the participants in the study. Although the web page shows how to obtain the results with this data file, we regret that visitors do not have access to this file to be able to replicate the results for themselves.

Seriously? C’mon. Did the authors not think about this when writing the book? Couldn’t they come up with a different example that can be replicated? Very frustrating. So I did what any honorable statistician would do and gave up, turned off the computer and went outside to enjoy nature Googled the name of the dataset used in the R code on the UCLA web site (earlyint_pp.txt). The very first result takes you to this Harvard web page, where the data in question is available in SAS and Stata formats. Now the authors of ALDA are both Harvard professors. And here is the data – that I’m not supposed to have access to – available on a Harvard web page for an old statistics class taught by one of the authors of the book. I guess the researchers changed their mind? Anyway, I now had the data and could try to replicate the results. And you’ll be happy to know the data contained no identifying information. It had four variables: (1) an ID number for the infant, (2) the test score, (3) age at test, and (4) an indicator for program participation.

I downloaded the Stata dataset because I can read that into R using the foreign package:

library(foreign) <- read.dta("earlyint_pp.dta")

With the data loaded, I was ready to start replicating the results. This required me to copy and paste the R code from the UCLA web site. I don’t usually go for copying-and-pasting, but in this case I was OK with it. It was mostly exploratory analysis. And it worked like a charm. That is until I got to the part where you actually fit a multilevel model to the data. Running the following R code produced an error:

lme(cog ~ time*program,, random = ~ time | id, , method="ML")

Error in lme.formula(cog ~ time * program, data =, random = ~time |  : 
  nlminb problem, convergence error code = 1
  message = iteration limit reached without convergence (10)

Hmm. Seemed to work for the UCLA person. What to do? Long story short, I needed to specify a different optimizer. The default optimizer for lme is “nlminb”. I needed to specify “optim”, like so:

model1 <- lme(cog ~ time*program,, random = ~ time | id, 
              method="ML", control=list(opt = "optim"))

That worked! Or at least it gave me some output. Upon closer inspection the random effects were slightly different, but everything else looked about the same. Why does “optim” work and not “nlminb”? I have no idea. Obviously they go about optimization in different ways. But I’m not too worried at the moment since I’m just getting my feet wet with multilevel modeling.

In researching the error above I discovered I could also fit the multilevel model using the following code:

mod1 <- lmer(cog ~ time*program + (time | id),

The output from this is also slightly different from the book as well as the output from the lme call above using “optim”. The coefficients of the fixed effects are all the same, but the random effects are slightly different.

So what does all this mean? How do you interpret the output? What is multilevel modeling? Well, that wasn’t quite the point of this post. I wanted to document for posterity how to replicate the results of ALDA Chapter 3 using R despite the message to the contrary on the UCLA site. Besides, if you’ve read this far then you're probably also working through ALDA and don't need me to repeat what's in the book. However, if you insist, I point you to this solid PowerPoint presentation that boils Chapter 3 of ALDA to its essence. It tells you everything you need to know about modeling this example and interpreting the results.

10 thoughts on “The Multilevel Model for Change (Ch 3 of ALDA)

  1. Pingback: The Multilevel Model for Change (Ch 3 of ALDA) – revisited | Statistics you can Probably Trust

  2. John Muschelli

    Thanks for the data link, nice post.

    In case anyone comes looking around here,
    lmer from lme4 uses REML = TRUE as default, you are fitting with ML, so you may different results for both. Moreover, you may get slight differences in the SE of the beta coefficients:

    > model1
    > model1.lme summary(model1)
    Linear mixed-effects model fit by REML
    AIC BIC logLik
    2374.784 2404.547 -1179.392

    Random effects:
    Formula: ~time | id
    Structure: General positive-definite, Log-Cholesky parametrization
    StdDev Corr
    (Intercept) 11.295632 (Intr)
    time 3.650461 -0.91
    Residual 8.646211

    Fixed effects: cog ~ time * program
    Value Std.Error DF t-value p-value
    (Intercept) 107.84074 2.054203 204 52.49760 0.0000
    time -21.13333 1.902278 204 -11.10949 0.0000
    program 6.85466 2.737461 101 2.50402 0.0139
    time:program 5.27126 2.535004 204 2.07939 0.0388
    (Intr) time progrm
    time -0.638
    program -0.750 0.479
    time:program 0.479 -0.750 -0.638

    Standardized Within-Group Residuals:
    Min Q1 Med Q3 Max
    -2.30359362 -0.55634165 0.02384933 0.56399047 2.32448601

    Number of Observations: 309
    Number of Groups: 103
    > summary(model1.lme)
    Linear mixed model fit by REML [‘lmerMod’]
    Formula: cog ~ time * program + (time | id)

    REML criterion at convergence: 2358.7

    Scaled residuals:
    Min 1Q Median 3Q Max
    -2.32496 -0.56664 0.02557 0.56998 2.31593

    Random effects:
    Groups Name Variance Std.Dev. Corr
    id (Intercept) 126.76 11.259
    time 10.32 3.213 -1.00
    Residual 75.49 8.689
    Number of obs: 309, groups: id, 103

    Fixed effects:
    Estimate Std. Error t value
    (Intercept) 107.841 2.053 52.53
    time -21.133 1.893 -11.16
    program 6.855 2.736 2.51
    time:program 5.271 2.523 2.09

    Correlation of Fixed Effects:
    (Intr) time progrm
    time -0.638
    program -0.750 0.479
    time:progrm 0.479 -0.750 -0.638

  3. Lynette Bikos

    Although a few years later, I am working through ALDA. And like you, am completely bummed that the data for Chapter 3 is not available. Unfortunately, the website you linked is no longer available. Is there any chance you would share the datset? Thanks.

    1. Rommel Bunuan

      Do you mind sharing the data with me Lynette? I had a hard time reading it into R.

      1. Clay Ford Post author

        The data on the GitHub site appears to be in Stata format (DTA). You can read that into R using the haven package. For example, to read it in and create a data frame called “dat”:

        dat <- read_dta("earlyint_pp.dta")

        Just make sure you have the DTA file in your working directory. If that confuses you, then you can try using RStudio's Import dialog. Click the Import Dataset button and select "From Stata".

  4. Amber Ward

    Hi, this is great. Have you done anything on Ch2? I am not able to apply the code from here ( to my own dataset, as the r code isn’t explained at all. I am trying to create plots of fitted trajectories by level of potential predictor variables (e.g., males vs female). The code provided is here:

    # fitting the linear model by id, males only
    tolm <- tolerance.pp[male == 1 , ]
    fitmlist <- by(tolm, tolm$id, function(bydata) fitted.values(lm(tolerance ~ time, data=bydata)))
    fitm <- unlist(fitmlist)

    #appending the average for the whole group
    lm.m <- fitted( lm(tolerance ~ time, data=tolm) )
    names(lm.m) <- NULL
    fit.m2 <- c(fitm, lm.m[1:5])
    age.m <- c(tolm$age, seq(11,15))
    id.m <- c(tolm$id, rep(111, 5))

    #plotting the linear fit by id, males
    #id.m=111 denotes the average value for males
    interaction.plot(age.m, id.m, fit.m2, ylim=c(0, 4), xlab="AGE", ylab="TOLERANCE", lwd=1)

    I'm getting lost at the numbers – 111, 11, 15 etc – there's no indication of what these mean/do in this code (and thus what I should be inputting to run this code on my own dataset). Any help would be appreciated!

    1. Clay Ford Post author

      Hi Amber.

      Yeah, the UCLA code is rather complex, though explicit, which can be good when you’re learning this stuff. There are easier ways to do recreate Fig 2.7, which I share below. First, I’ll try to explain what each line is doing.

      # create a data frame of just the males
      tolm <- tolerance.pp[male == 1 , ]  
      # fit a linear model to each person (ie, by id) and get the fitted values  
      fitmlist <- by(tolm, tolm$id, function(bydata) 
                                    fitted.values(lm(tolerance ~ time, data=bydata)))   
      # the result of the previous line of code is a list object. The unlist()
      # function turns it into a vector.
      fitm <- unlist(fitmlist)     
      # now fit a linear model to the entire data set of males and get the fitted values
      lm.m <- fitted( lm(tolerance ~ time, data=tolm) )
      # strip out the "names" from the lm.m object. To see the names, enter lm.m in
      # the R console before running the next line.
      names(lm.m) <- NULL
      # combine the fitm vector with the first 5 values of the lm.m vector. Why only
      # the first 5? Because the first 5 are all the unique values. The just repeat
      # after that. The fit.m2 vector now contains fitted values for each male as well
      # as the overall average.
      fit.m2 <- c(fitm, lm.m[1:5])
      # combine the ages of the males with an extra vector of ages for the "average"
      # male. seq(11,15) returns 11, 12, 13, 14, 15
      age.m <- c(tolm$age, seq(11,15))
      # combine the male id numbers with an arbitrary id, 111. The makes it appear in
      # the middle of the legend, generated in the next line of code.
      id.m <- c(tolm$id, rep(111, 5))
      # Now make the plot. The first argument is the x-axis (age), the second argument
      # is the grouping variable for the separate lines (id), the third argument is
      # the y-axis. This isn't really what the interaction.plot() function was
      # intended to do, but it works.
      interaction.plot(age.m, id.m, fit.m2, ylim=c(0, 4), 
                       xlab="AGE", ylab="TOLERANCE", lwd=1)

      An easier and modern way to make the plot is as follows:

      # first load the ggplot2 and dplyr packages
      # get means by age for males and females
      mean_tol1 <- tolerance.pp %>% 
        group_by(male, age) %>% 
        summarise(tolerance = mean(tolerance))
      # create plot, grouped by id; let ggplot handle all the fitting of linear
      # models. The first geom_smooth() plots the individuals, the second plots the
      # mean. The facet_wrap() function breaks the plots into male versus female.
      ggplot(tolerance.pp, aes(x = age, y = tolerance, group = id)) +
        geom_smooth(color = "black", size = 0.5, se = F, method = "lm") +
        geom_smooth(aes(x = age, y = tolerance, group = 1), 
                    data = mean_tol1, 
                    size = 2, se = F, method = "lm") +
        facet_wrap(~ male) +

      Hope this helps!


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.