Predicted means for a PO model

I was reading section 7.6 of Harrell’s Biostatistics for Biomedical Research (BBR) and came across this example that shows how a proportional odds (PO) model can be used in place of a Kruskal-Wallis test.

Here’s the code presented to generate some data:

set.seed(1)
group <- rep(c('A','B','C','D'), 100)
y <- rnorm(400, 100, 15) + 10*(group == 'B') + 20*(group=='C') + 30*(group=='D')

Notice these are four samples from normal distributions with the same variances but different means, so no special test is needed; ordinary parametric methods would work fine here. But Harrell uses these data to demonstrate how well the PO model performs even when it isn’t needed.

Here’s the analysis using the orm() function from the rms package.

library(rms)
dd <- datadist(group, y); options(datadist='dd')
f <- orm(y ~ group)
f    # use LR chi-square test as replacement for Kruskal-Wallis
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = y ~ group)
## 
##                           Model Likelihood               Discrimination    Rank Discrim.    
##                                 Ratio Test                      Indexes          Indexes    
## Obs              400    LR chi2     193.31    R2                  0.383    rho     0.633    
## ESS              400    d.f.             3    R2(3,400)           0.379    Dxy     0.426    
## Distinct Y       400    Pr(> chi2) <0.0001    R2(3,400)           0.379                     
## Median Y    115.4143    Score chi2  193.21    |Pr(Y>=median)-0.5| 0.256                     
## max |deriv|    2e-11    Pr(> chi2) <0.0001                                                  
## 
##         Coef   S.E.   Wald Z Pr(>|Z|)
## group=B 1.4221 0.2579  5.51  <0.0001 
## group=C 2.6624 0.2762  9.64  <0.0001 
## group=D 3.6606 0.2925 12.52  <0.0001

There’s a lot of output, but the LR chi-square test is the replacement for Kruskal-Wallis. LR chi2 = 193.31, p < 0.0001.
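For comparison, the traditional test can be run with base R’s kruskal.test(). This isn’t shown above, but it’s a quick way to see the test the LR chi-square is standing in for.

# Kruskal-Wallis rank sum test on the same data (base R)
kruskal.test(y ~ group)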

Now here comes the part that was new to me. The PO model can be used to estimate means from the model. Here’s how it’s done with rms.

M <- Mean(f)
Predict(f, group, fun=M)
##   group      yhat     lower    upper
## 1     A  99.32328  96.46657 102.1800
## 2     B 111.21326 108.27169 114.1548
## 3     C 121.63880 118.78670 124.4909
## 4     D 129.70290 127.07203 132.3338
## 
## Response variable (y):  
## 
## Limits are 0.95 confidence limits

These compare favorably to the sample means, which we can get with the summarize() function from the Hmisc package (loaded along with rms).

summarize(y, group, smean.cl.normal)
##   group         y     Lower    Upper
## 1     A  98.72953  95.81508 101.6440
## 2     B 111.69464 108.61130 114.7780
## 3     C 121.80841 118.93036 124.6865
## 4     D 130.05275 127.40318 132.7023

I thought a PO model returned probabilities. How does this work? Harrell explains it in a footnote:

The predicted mean for a set of covariate settings is obtained by using all the intercepts and betas to get exceedance probabilities for Y >= y, taking successive differences in those probabilities to get cell probabilities that Y = y, then multiplying cell probabilities by the value attached to them, and summing. This is the formula for the mean for a discrete distribution.
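In symbols (my notation, not Harrell’s): if \(y_{(1)} < y_{(2)} < \dots < y_{(k)}\) are the distinct ordered values of the response, then

\[
P(Y = y_{(j)} \mid X) = P(Y \ge y_{(j)} \mid X) - P(Y \ge y_{(j+1)} \mid X),
\qquad
\hat{E}(Y \mid X) = \sum_{j=1}^{k} y_{(j)} \, P(Y = y_{(j)} \mid X),
\]

with the conventions \(P(Y \ge y_{(1)} \mid X) = 1\) and \(P(Y \ge y_{(k+1)} \mid X) = 0\).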

The rest of this post shows how to implement this calculation by hand.

The first thing we need to do is get the linear predictors. These are just the three model coefficients, with a 0 prepended to account for the reference level (group A). But notice the indexing: there are 399 intercepts! That’s right, the orm() function works with continuous responses. Since we have 400 distinct values of y, we get 399 intercepts.

lp <- c(0, coef(f)[400:402])
lp
##           group=B  group=C  group=D 
## 0.000000 1.422081 2.662447 3.660646
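As a quick check (this line isn’t in the original example), the coefficient vector really does hold the 399 intercepts followed by the 3 group coefficients:

length(coef(f))  # 402 = 399 intercepts + 3 group coefficients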

Next we get the intercepts.

intercepts <- coef(f)[1:399]
head(intercepts)
## y>=63.9535568 y>=65.6631403  y>=65.721467 y>=66.0266597 y>=71.2846086 
##      4.887868      4.187155      3.774066      3.478702      3.247817 
## y>=71.4256683 
##      3.057693

Now add the intercepts to all the linear predictors. This creates a 4 x 399 matrix with one row per group and one column per intercept. Each column contains the log odds of each group exceeding the y value associated with that intercept.

xb <- sapply(intercepts, "+", lp)
xb[,1:4]
##         y>=63.9535568 y>=65.6631403 y>=65.721467 y>=66.0266597
##              4.887868      4.187155     3.774066      3.478702
## group=B      6.309950      5.609236     5.196147      4.900783
## group=C      7.550316      6.849603     6.436514      6.141149
## group=D      8.548514      7.847801     7.434712      7.139347

After that, we convert to exceedance probabilities (\(Y \ge y\)) using the inverse logit. We see in the first column that each group has very high probability of exceeding 63.9535568.

P <- matrix(plogis(xb), ncol = 399)
colnames(P) <- names(intercepts)
P[,1:4]
##      y>=63.9535568 y>=65.6631403 y>=65.721467 y>=66.0266597
## [1,]     0.9925189     0.9850378    0.9775567     0.9700757
## [2,]     0.9981852     0.9963495    0.9944926     0.9926142
## [3,]     0.9994743     0.9989412    0.9984006     0.9978522
## [4,]     0.9998062     0.9996095    0.9994099     0.9992074

To get cell probabilities (\(Y = y\)) we need to take successive differences in exceedance probabilities, which we can do with simple subtraction. Prepending a column of 1s supplies the probability ceiling of 1 (every observation is at least the smallest y value), and appending a column of 0s supplies the probability floor of 0 (no observation exceeds the largest y value). The first column shows the group probabilities for y = 61.11508, the smallest observed y value. The second column shows the group probabilities for y = 63.95356. And so on.

P <- cbind(1, P) - cbind(P, 0)
P[,1:4]
##                   y>=63.9535568 y>=65.6631403 y>=65.721467
## [1,] 0.0074810843  0.0074810843  0.0074810843 0.0074810843
## [2,] 0.0018148255  0.0018356660  0.0018568676 0.0018784386
## [3,] 0.0005256675  0.0005330871  0.0005406649 0.0005484054
## [4,] 0.0001937954  0.0001966622  0.0001995931 0.0002025900

Finally we multiply cell probabilities by the value attached to them and sum. This produces four means, one for each group.

m <- P %*% sort(y)
m
##           [,1]
## [1,]  99.32328
## [2,] 111.21326
## [3,] 121.63880
## [4,] 129.70290
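These match the means returned by Predict() earlier. If you want to confirm that programmatically, one way (assuming f, M, and m from above are still in the workspace) is to compare against the yhat column of the Predict() output, which should come back TRUE (or very nearly so):

# compare hand-calculated means to those from Predict()
all.equal(as.numeric(m), Predict(f, group, fun = M)$yhat)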

We can also do this with the MASS function polr() and car function Anova(). The catch is we need to define y as an ordered factor first.

library(MASS) # for polr()
library(car) # for Anova()
y_ord <- ordered(y)
f2 <- polr(y_ord ~ group)
car::Anova(f2)
## Analysis of Deviance Table (Type II tests)
## 
## Response: y_ord
##       LR Chisq Df Pr(>Chisq)    
## group   193.31  3  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To get predicted means we follow the same steps as above. The big difference is that we need to reverse the sign on the intercepts, which are stored in the “zeta” element of the polr object. That’s because polr() models \(P(Y \le y)\) rather than \(P(Y \ge y)\). This time let’s wrap the steps in a function.

mf <- function(model){
  # linear predictors, with a 0 prepended for the reference level
  lp <- c(0, coef(model))
  # polr() models P(Y <= y), so flip the sign of the intercepts
  intercepts <- model$zeta
  xb <- sapply(-intercepts, "+", lp)
  # exceedance probabilities, one column per intercept
  P <- matrix(plogis(xb), ncol = length(intercepts))
  # successive differences give the cell probabilities
  P <- cbind(1, P) - cbind(P, 0)
  # multiply cell probabilities by the sorted y values (from the workspace) and sum
  P %*% sort(y)
}

mf(f2)
##           [,1]
## [1,]  99.32265
## [2,] 111.21298
## [3,] 121.63863
## [4,] 129.70252
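These are essentially the same means as before; the small differences in the later decimal places are presumably just numerical, since polr() and orm() use different optimization routines.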
