I was reading section 7.6 of Harrell’s BBR (Biostatistics for Biomedical Research) and came across an example that shows how a proportional odds (PO) model can be used in place of a Kruskal-Wallis test.
Here’s the code presented to generate some data:
set.seed(1)
group <- rep(c('A','B','C','D'), 100)
y <- rnorm(400, 100, 15) + 10*(group == 'B') + 20*(group=='C') + 30*(group=='D')
Notice these are four samples from normal distributions with the same variance but different means. For data like these a plain one-way ANOVA would do; no nonparametric test is needed. But Harrell uses these data to demonstrate how well the PO model performs even when it isn’t needed.
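For reference, the Kruskal-Wallis test itself is one line with kruskal.test() from base R’s stats package; I omit its output here.
kruskal.test(y ~ group) # the traditional nonparametric test the PO model replaces (output not shown)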
Here’s the analysis using the orm() function from the rms package.
library(rms)
dd <- datadist(group, y); options(datadist='dd')
f <- orm(y ~ group)
f # use LR chi-square test as replacement for Kruskal-Wallis
## Logistic (Proportional Odds) Ordinal Regression Model
##
## orm(formula = y ~ group)
##
##                        Model Likelihood               Discrimination    Rank Discrim.
##                              Ratio Test                      Indexes          Indexes
## Obs             400    LR chi2     193.31    R2                  0.383    rho     0.633
## ESS             400    d.f.             3    R2(3,400)           0.379    Dxy     0.426
## Distinct Y      400    Pr(> chi2) <0.0001    R2(3,400)           0.379
## Median Y   115.4143    Score chi2  193.21    |Pr(Y>=median)-0.5| 0.256
## max |deriv|   2e-11    Pr(> chi2) <0.0001
##
##          Coef   S.E.   Wald Z Pr(>|Z|)
## group=B  1.4221 0.2579  5.51  <0.0001
## group=C  2.6624 0.2762  9.64  <0.0001
## group=D  3.6606 0.2925 12.52  <0.0001
There’s a lot of output, but the LR chi-square test is the replacement for Kruskal-Wallis. LR chi2 = 193.31, p < 0.0001.
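If you just want those numbers by themselves, they’re stored in the fit’s stats element; the element names below are what I’d expect, but names(f$stats) will confirm them.
f$stats[c("Model L.R.", "d.f.", "P")] # LR chi-square, degrees of freedom, p-value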
Now here comes the part that was new to me. The PO model can be used to estimate means from the model. Here’s how it’s done with rms.
M <- Mean(f)
Predict(f, group, fun=M)
##   group      yhat     lower    upper
## 1     A  99.32328  96.46657 102.1800
## 2     B 111.21326 108.27169 114.1548
## 3     C 121.63880 118.78670 124.4909
## 4     D 129.70290 127.07203 132.3338
##
## Response variable (y):
##
## Limits are 0.95 confidence limits
These compare favorably to the sample means.
summarize(y, group, smean.cl.normal) # summarize() and smean.cl.normal() come with Hmisc, loaded by rms
##   group         y     Lower    Upper
## 1     A  98.72953  95.81508 101.6440
## 2     B 111.69464 108.61130 114.7780
## 3     C 121.80841 118.93036 124.6865
## 4     D 130.05275 127.40318 132.7023
I thought a PO model returned probabilities. How does this work? Harrell explains it in a footnote:
The predicted mean for a set of covariate settings is obtained by using all the intercepts and betas to get exceedance probabilities for Y >= y, taking successive differences in those probabilities to get cell probabilities that Y = y, then multiplying cell probabilities by the value attached to them, and summing. This is the formula for the mean for a discrete distribution.
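In symbols, if the distinct ordered values of the response are \(y_1 < y_2 < \dots < y_k\), the model-based mean for covariate settings \(X\) is

\[
E(Y \mid X) = \sum_{j=1}^{k} y_j \left[ P(Y \ge y_j \mid X) - P(Y \ge y_{j+1} \mid X) \right],
\]

with the conventions \(P(Y \ge y_1 \mid X) = 1\) and \(P(Y \ge y_{k+1} \mid X) = 0\).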
The rest of this post shows how to implement this calculation ourselves.
The first thing we need to do is get the linear predictors. With no other covariates in the model, these are just the three non-intercept coefficients, with a 0 prepended to account for the reference level, group A. But notice the indexing: the betas are elements 400 through 402 of the coefficient vector because there are 399 intercepts! That’s right, the orm() function is designed to work with continuous responses. Since we have 400 distinct values of y, we get 399 intercepts.
lp <- c(0, coef(f)[400:402])
lp
##          group=B  group=C  group=D
## 0.000000 1.422081 2.662447 3.660646
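Rather than hard-coding 400:402, a sturdier way to split the coefficient vector is to count the intercepts with the rms helper num.intercepts():
k <- num.intercepts(f) # 399
lp <- c(0, coef(f)[(k + 1):length(coef(f))]) # same lp as above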
Next we get the intercepts.
intercepts <- coef(f)[1:399]
head(intercepts)
## y>=63.9535568 y>=65.6631403  y>=65.721467 y>=66.0266597 y>=71.2846086
##      4.887868      4.187155      3.774066      3.478702      3.247817
## y>=71.4256683
##      3.057693
Now add each intercept to the vector of linear predictors. This creates a 4 x 399 matrix with one column per intercept. Each column contains the log odds of each group exceeding the y value associated with that intercept.
xb <- sapply(intercepts, "+", lp)
xb[,1:4]
##         y>=63.9535568 y>=65.6631403 y>=65.721467 y>=66.0266597
##              4.887868      4.187155     3.774066      3.478702
## group=B      6.309950      5.609236     5.196147      4.900783
## group=C      7.550316      6.849603     6.436514      6.141149
## group=D      8.548514      7.847801     7.434712      7.139347
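As an aside, the same matrix can be built in one call with base R’s outer(), which adds every element of lp to every intercept:
xb2 <- outer(lp, intercepts, "+") # also 4 x 399
all.equal(xb, xb2, check.attributes = FALSE) # should be TRUE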
After that, we convert to exceedance probabilities (\(Y \ge y\)) using the inverse logit. We see in the first column that each group has a very high probability of exceeding 63.9535568.
P <- matrix(plogis(xb), ncol = 399)
colnames(P) <- names(intercepts)
P[,1:4]
##      y>=63.9535568 y>=65.6631403 y>=65.721467 y>=66.0266597
## [1,]     0.9925189     0.9850378    0.9775567     0.9700757
## [2,]     0.9981852     0.9963495    0.9944926     0.9926142
## [3,]     0.9994743     0.9989412    0.9984006     0.9978522
## [4,]     0.9998062     0.9996095    0.9994099     0.9992074
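The plogis() function is the logistic distribution function, so this step is just the usual inverse logit applied to each intercept-plus-linear-predictor:

\[
P(Y \ge y \mid X) = \operatorname{plogis}(\alpha_y + X\beta) = \frac{1}{1 + e^{-(\alpha_y + X\beta)}}.
\]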
To get cell probabilities (\(Y = y\)) we need to take successive differences of the exceedance probabilities, which we can do with simple subtraction. Prepending a column of 1s supplies the probability ceiling, since every observation is at least as large as the smallest value; appending a column of 0s supplies the probability floor, since no observation exceeds the largest value. The first column of the result holds the group probabilities for y = 61.11508, the smallest observed y value. The second column holds the group probabilities for y = 63.95356. And so on.
P <- cbind(1, P) - cbind(P, 0)
P[,1:4]
##                   y>=63.9535568 y>=65.6631403  y>=65.721467
## [1,] 0.0074810843  0.0074810843  0.0074810843  0.0074810843
## [2,] 0.0018148255  0.0018356660  0.0018568676  0.0018784386
## [3,] 0.0005256675  0.0005330871  0.0005406649  0.0005484054
## [4,] 0.0001937954  0.0001966622  0.0001995931  0.0002025900
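Each row of P is now a complete discrete probability distribution over the 400 observed values, so as a sanity check the rows should sum to 1 (the successive differences telescope to 1 - 0):
rowSums(P) # each row sums to exactly 1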
Finally we multiply the cell probabilities by the values attached to them and sum. The columns of P correspond to the distinct y values in increasing order, which is why we multiply by sort(y). Matrix multiplication carries out the multiply-and-sum in one step, producing four means, one for each group.
m <- P %*% sort(y)
m
##           [,1]
## [1,]  99.32328
## [2,] 111.21326
## [3,] 121.63880
## [4,] 129.70290
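These match the Predict() output above. As one more check, we can set the model-based means next to the ordinary sample means, which should agree with the summarize() output shown earlier:
cbind(PO = as.vector(m), sample = tapply(y, group, mean)) # model-based vs sample means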
We can also do this with the MASS function polr() and the car function Anova(). The catch is that we need to define y as an ordered factor first.
library(MASS) # for polr()
library(car) # for Anova()
y_ord <- ordered(y)
f2 <- polr(y_ord ~ group)
car::Anova(f2)
## Analysis of Deviance Table (Type II tests)
##
## Response: y_ord
## LR Chisq Df Pr(>Chisq)
## group 193.31 3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To get predicted means we follow the same steps as above. The big difference is that we need to reverse the sign on the intercepts, which are stored in the “zeta” element of the polr object. This is because the polr() function models \(P(Y \le y)\) rather than \(P(Y \ge y)\).
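In polr()’s parameterization, \(\operatorname{logit} P(Y \le y_j \mid X) = \zeta_j - X\beta\), so the exceedance probabilities we need come out as

\[
P(Y \ge y_{j+1} \mid X) = 1 - \operatorname{plogis}(\zeta_j - X\beta) = \operatorname{plogis}(X\beta - \zeta_j),
\]

which is why the intercept signs get flipped before being added to the linear predictor. This time let’s wrap the steps in a function.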
mf <- function(model){
  lp <- c(0, coef(model))                            # betas, with 0 for the reference level
  intercepts <- model$zeta                           # polr stores intercepts in "zeta"
  xb <- sapply(-intercepts, "+", lp)                 # flip signs since polr models P(Y <= y)
  P <- matrix(plogis(xb), ncol = length(intercepts)) # exceedance probabilities P(Y >= y)
  P <- cbind(1, P) - cbind(P, 0)                     # successive differences: P(Y = y)
  P %*% sort(y)                                      # note: uses the global y
}
mf(f2)
##           [,1]
## [1,]  99.32265
## [2,] 111.21298
## [3,] 121.63863
## [4,] 129.70252
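The means agree with the orm() results to about three decimal places but not exactly. I assume the small discrepancies are numerical, since polr() and orm() use different optimization routines and convergence tolerances.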