Tag Archives: partial residuals

Partial Residuals and the termplot function

Earlier this year I taught an Intro to R Graphics workshop. While preparing for it I came across the termplot function. I had no idea it existed. Of course now that I know about it, I come across it fairly regularly. (That's the Baader-Meinhof Phenomenon for you.)

?termplot will tell you what it does: “Plots regression terms against their predictors, optionally with standard errors and partial residuals added.” If you read on, you'll see “Nothing sensible happens for interaction terms, and they may cause errors.” Duly noted.

Let's try this out and see what it does.

First we'll fit a model using the mtcars dataset that comes with R. Below mpg = Miles/(US) gallon, drat = Rear axle ratio, qsec = ¼ mile time, disp = Displacement (cu.in.), and wt = Weight (lb/1000).

lm.cars <- lm(mpg ~ wt + disp + drat + qsec, data=mtcars)
## Call:
## lm(formula = mpg ~ wt + disp + drat + qsec, data = mtcars)
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.143 -1.933 -0.149  0.919  5.541 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.03223    9.58829    0.94  0.35454    
## wt          -4.86768    1.20872   -4.03  0.00041 ***
## disp         0.00522    0.01105    0.47  0.64021    
## drat         1.87086    1.32462    1.41  0.16926    
## qsec         1.05248    0.34778    3.03  0.00539 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 2.6 on 27 degrees of freedom
## Multiple R-squared:  0.838,  Adjusted R-squared:  0.814 
## F-statistic:   35 on 4 and 27 DF,  p-value: 2.55e-10

plot of chunk unnamed-chunk-1

Compare the coefficients in the summary output to the slopes of the lines in the graphs. See how the coefficients match the slopes of the lines? That's because they are the slopes of the lines. For example, wt has a coefficient of about -5. Likewise the termplot for that coefficient has a negative slope that appears to be about -5. Using these plots we can see at a glance the respective contributions of the predictors. wt and qsec seem to contribute to the model due to the steepness of their slopes, but disp and drat not so much. This matches the summary output that shows wt and qsec as being significant.

Now termplot also has an argument called partial.resid that will add partial residuals if set to TRUE. Let's try that out, with color set to “purple” since the default gray can be hard to see (at least to my eyes):

termplot(lm.cars, partial.resid=TRUE, col.res = "purple")

plot of chunk unnamed-chunk-2

How are these points generated? The x-axis is clear enough. That's just the predictor values. But the y-axis says “Partial for [variable name]”, i.e. partial residuals. What does that mean? Put as simply as possible, partial residuals are the sum of the residuals and predictor terms. Hopefully you know what residuals are. Those are just the difference between the observed and fitted response values. “Predictor terms” is a little trickier.

To understand them, it helps to know that

\[y = \bar{y} + b_{1}(x_{1} – \bar{x}_{1}) + b_{2}(x_{2} – \bar{x}_{2}) + b_{3}(x_{3} – \bar{x}_{3}) + b_{4}(x_{4} – \bar{x}_{4}) + e\]

is another way of writing

\[y = b_{0} + b_{1}x_{1} + b_{2}x_{2}+ b_{3}x_{3}+ b_{4}x_{4} + e\]

So the first equation above has \(\bar{y}\) as the intercept and centered predictors. If we center the predictors and fit a model, we'll see that's precisely what we get:

# center the variables
mtcars <- transform(mtcars, wtC = wt-mean(wt), dispC = disp-mean(disp),
                    dratC=drat-mean(drat), qsecC=qsec-mean(qsec))
# now fit model using centered predictors
lm.terms.cars <- lm(mpg ~ wtC + dispC + dratC + qsecC, data=mtcars)
## (Intercept) 
##       20.09
## [1] 20.09

Now if we take the centered predictor values and multiply them by their respective coefficients, we get what are referred to in R as terms. For example, the terms for the first record in the data set are as follows:

# take all coefficients but the intercept...
##       wtC     dispC     dratC     qsecC 
## -4.867682  0.005222  1.870855  1.052478
# ...and first record of data...
##               wtC  dispC  dratC  qsecC
## Mazda RX4 -0.5972 -70.72 0.3034 -1.389
# ...and multiply:
##       Mazda RX4
## wtC      2.9072
## dispC   -0.3693
## dratC    0.5677
## qsecC   -1.4616

So for the first record, the term due to wt is 2.91, the term due to disp is -0.37, and so on. These are the predictor terms I mentioned earlier. These are what we add to the residuals to make partial residuals.

You'll be happy to know we don't have to center predictors and fit a new model to extract terms. R can do that for us on the original model object when you specify type="terms" with the predict function, like so:

# notice we're calling this on the original model object, lm.cars
pterms <- predict(lm.cars, type="terms")
pterms[1,] # compare to above; the same
##      wt    disp    drat    qsec 
##  2.9072 -0.3693  0.5677 -1.4616

To calculate the partial residuals ourselves and make our own partial residual plots, we can do this:

# add residuals to each column of pterms (i.e., the terms for each predictor)
partial.residuals <- apply(pterms,2,function(x)x+resid(lm.cars))
# create our own termplot of partial residuals
# the model in lm.cars includes the response in first column, so we index with i + 1
for(i in 1:4){
  plot(x=lm.cars$model[,(i+1)],y=partial.residuals[,i], col="purple", ylim=c(-10,10))
  abline(lm(partial.residuals[,i] ~ lm.cars$model[,(i+1)]), col="red")

plot of chunk unnamed-chunk-6

So that's what termplot does for us: it takes the terms for each predictor, adds the residuals to the terms to create partial residuals, and then plots partial residuals versus their respective predictor, (if you specify partial.resid=TRUE). And of course it plots a fitted line, the result of regressing the predictor's partial residuals on itself. Recall ealier I mentioned the slope of the line in the termplot graphs is the coefficient in the summary output. We can indeed verify that:

# coefficient for wt
##     wt 
## -4.868
coef(lm(partial.residuals[,1] ~ mtcars$wt))[2]
## mtcars$wt 
##    -4.868

Ultimately what this whole partial residual/termplot thing is doing is splitting the response value into different parts: an overall mean, a term that is due to wt, a term that is due to disp, a term that is due to drat, and a term that is due to qsec. And you have the residual. So when we create the partial residual for wt, what we're doing is adding the wt term and the residual. This sum accounts for the part of the response not explained by the other terms.

Again let's look at the overall mean, the terms for the first observation and its residual:

# overall mean of response
## [1] 20.09
# terms of first observation
##      wt    disp    drat    qsec 
##  2.9072 -0.3693  0.5677 -1.4616
# the residual of the first observation
## Mazda RX4 
##   -0.7346

If we add those up, we get the observed value

# add overall mean, terms and residual for first observation
mean(mtcars$mpg) + sum(pterms[1,]) + resid(lm.cars)[1]
## Mazda RX4 
##        21
# same as observed in data
## [1] 21

So the wt term value of 2.9072 represents the part of the response that is not explained by the other terms.

Finally, we can add another argument to termplot, smooth=panel.smooth that will draw a smooth lowess line through the points. This can help us assess departures from linearity.

termplot(lm.cars, partial.resid=TRUE, col.res = "purple", smooth=panel.smooth)

plot of chunk unnamed-chunk-10

Notice how the dashed line bends around the straight line for wt. Perhaps wt requires a quadratic term? Let's try that using the poly function, which creates orthogonal polynomials.

lm.cars2 <- lm(mpg ~ poly(wt,2) + disp + drat + qsec, data=mtcars)
termplot(lm.cars2, partial.resid=TRUE, col.res = "purple", smooth=panel.smooth)

plot of chunk unnamed-chunk-11

We now see a better fit with the regressed line matching closely to the smooth line.