{"id":1026,"date":"2025-11-26T09:23:59","date_gmt":"2025-11-26T14:23:59","guid":{"rendered":"https:\/\/www.clayford.net\/statistics\/?p=1026"},"modified":"2025-11-26T09:23:59","modified_gmt":"2025-11-26T14:23:59","slug":"predicted-means-for-a-po-model","status":"publish","type":"post","link":"https:\/\/www.clayford.net\/statistics\/predicted-means-for-a-po-model\/","title":{"rendered":"Predicted means for a PO model"},"content":{"rendered":"<p>I was reading <a href=\"https:\/\/hbiostat.org\/bbr\/nonpar#generalization-of-the-wilcoxonkruskal-wallis-test\">section 7.6<\/a> of Harrell\u2019s BBR and came across this example that shows how a proportional odds (PO) model can be used in place of a Kruskal-Wallis test.<\/p>\n<p>Here\u2019s the code presented to generate some data:<\/p>\n<pre class=\"r\"><code>set.seed(1)\r\ngroup &lt;- rep(c(&#39;A&#39;,&#39;B&#39;,&#39;C&#39;,&#39;D&#39;), 100)\r\ny &lt;- rnorm(400, 100, 15) + 10*(group == &#39;B&#39;) + 20*(group==&#39;C&#39;) + 30*(group==&#39;D&#39;)<\/code><\/pre>\n<p>Notice these are four samples from normal distributions with the same variances but different means. No special test is needed. But Harrell uses this data to demonstrate how well the PO model performs even when it isn\u2019t needed.<\/p>\n<p>Here\u2019s the analysis using the <code>orm()<\/code> function from the rms package.<\/p>\n<pre class=\"r\"><code>library(rms)\r\ndd &lt;- datadist(group, y); options(datadist=&#39;dd&#39;)\r\nf &lt;- orm(y ~ group)\r\nf    # use LR chi-square test as replacement for Kruskal-Wallis<\/code><\/pre>\n<pre><code>## Logistic (Proportional Odds) Ordinal Regression Model\r\n## \r\n## orm(formula = y ~ group)\r\n## \r\n##                           Model Likelihood               Discrimination    Rank Discrim.    
\r\n##                                 Ratio Test                      Indexes          Indexes    \r\n## Obs              400    LR chi2     193.31    R2                  0.383    rho     0.633    \r\n## ESS              400    d.f.             3    R2(3,400)           0.379    Dxy     0.426    \r\n## Distinct Y       400    Pr(&gt; chi2) &lt;0.0001    R2(3,400)           0.379                     \r\n## Median Y    115.4143    Score chi2  193.21    |Pr(Y&gt;=median)-0.5| 0.256                     \r\n## max |deriv|    2e-11    Pr(&gt; chi2) &lt;0.0001                                                  \r\n## \r\n##         Coef   S.E.   Wald Z Pr(&gt;|Z|)\r\n## group=B 1.4221 0.2579  5.51  &lt;0.0001 \r\n## group=C 2.6624 0.2762  9.64  &lt;0.0001 \r\n## group=D 3.6606 0.2925 12.52  &lt;0.0001<\/code><\/pre>\n<p>There\u2019s a lot of output, but the LR chi-square test is the replacement for Kruskal-Wallis. LR chi2 = 193.31, p &lt; 0.0001.<\/p>\n<p>Now here comes the part that was new to me. The PO model can be used to estimate <em>means<\/em> from the model. Here\u2019s how it\u2019s done with rms.<\/p>\n<pre class=\"r\"><code>M &lt;- Mean(f)\r\nPredict(f, group, fun=M)<\/code><\/pre>\n<pre><code>##   group      yhat     lower    upper\r\n## 1     A  99.32328  96.46657 102.1800\r\n## 2     B 111.21326 108.27169 114.1548\r\n## 3     C 121.63880 118.78670 124.4909\r\n## 4     D 129.70290 127.07203 132.3338\r\n## \r\n## Response variable (y):  \r\n## \r\n## Limits are 0.95 confidence limits<\/code><\/pre>\n<p>These compare favorably to the sample means.<\/p>\n<pre class=\"r\"><code>summarize(y, group, smean.cl.normal)<\/code><\/pre>\n<pre><code>##   group         y     Lower    Upper\r\n## 1     A  98.72953  95.81508 101.6440\r\n## 2     B 111.69464 108.61130 114.7780\r\n## 3     C 121.80841 118.93036 124.6865\r\n## 4     D 130.05275 127.40318 132.7023<\/code><\/pre>\n<p>I thought a PO model returned probabilities. How does this work? 
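The key is the formula for the mean of a discrete distribution, <span class=\"math inline\">\\(E[Y] = \\sum_i y_i \\, P(Y = y_i)\\)<\/span>, with the cell probabilities coming from the fitted model.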
Harrell explains it in a footnote:<\/p>\n<blockquote>\n<p>The predicted mean for a set of covariate settings is obtained by using all the intercepts and betas to get exceedance probabilities for Y &gt;= y, taking successive differences in those probabilities to get cell probabilities that Y = y, then multiplying cell probabilities by the value attached to them, and summing. This is the formula for the mean for a discrete distribution.<\/p>\n<\/blockquote>\n<p>This blog post shows how to implement this.<\/p>\n<p>The first thing we need to do is get the linear predictors. These are just the three model coefficients. We also need to prepend a 0 to account for the reference level, group A. But notice the indexing. There are 399 intercepts! That\u2019s right, the <code>orm()<\/code> function works with continuous responses. Since we have 400 distinct values of y, we have 399 intercepts.<\/p>\n<pre class=\"r\"><code>lp &lt;- c(0, coef(f)[400:402])\r\nlp<\/code><\/pre>\n<pre><code>##           group=B  group=C  group=D \r\n## 0.000000 1.422081 2.662447 3.660646<\/code><\/pre>\n<p>Next we get the intercepts.<\/p>\n<pre class=\"r\"><code>intercepts &lt;- coef(f)[1:399]\r\nhead(intercepts)<\/code><\/pre>\n<pre><code>## y&gt;=63.9535568 y&gt;=65.6631403  y&gt;=65.721467 y&gt;=66.0266597 y&gt;=71.2846086 \r\n##      4.887868      4.187155      3.774066      3.478702      3.247817 \r\n## y&gt;=71.4256683 \r\n##      3.057693<\/code><\/pre>\n<p>Now add intercepts to all linear predictors. This creates a 4 x 399 matrix with one column per intercept. 
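<\/p>\n<p>As an aside, adding each intercept to the whole vector of linear predictors is an outer sum, so <code>outer()<\/code> builds the same matrix as the <code>sapply()<\/code> call below. A quick check on toy vectors (illustrative numbers, not the model objects):<\/p>\n<pre class=\"r\"><code>lp0  &lt;- c(0, 1.4, 2.7)        # toy linear predictors\r\nint0 &lt;- c(4.9, 4.2, 3.8, 3.5) # toy intercepts\r\nxb1 &lt;- sapply(int0, &quot;+&quot;, lp0) # one column per intercept\r\nxb2 &lt;- outer(lp0, int0, &quot;+&quot;)  # equivalent outer sum\r\nall.equal(xb1, xb2) # TRUE<\/code><\/pre>\n<p>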
Each column contains log odds of each group exceeding the given intercept value.<\/p>\n<pre class=\"r\"><code>xb &lt;- sapply(intercepts, &quot;+&quot;, lp)\r\nxb[,1:4]<\/code><\/pre>\n<pre><code>##         y&gt;=63.9535568 y&gt;=65.6631403 y&gt;=65.721467 y&gt;=66.0266597\r\n##              4.887868      4.187155     3.774066      3.478702\r\n## group=B      6.309950      5.609236     5.196147      4.900783\r\n## group=C      7.550316      6.849603     6.436514      6.141149\r\n## group=D      8.548514      7.847801     7.434712      7.139347<\/code><\/pre>\n<p>After that, we convert to <em>exceedance probabilities<\/em> (<span class=\"math inline\">\\(Y \\ge y\\)<\/span>) using the inverse logit. We see in the first column that each group has very high probability of exceeding 63.9535568.<\/p>\n<pre class=\"r\"><code>P &lt;- matrix(plogis(xb), ncol = 399)\r\ncolnames(P) &lt;- names(intercepts)\r\nP[,1:4]<\/code><\/pre>\n<pre><code>##      y&gt;=63.9535568 y&gt;=65.6631403 y&gt;=65.721467 y&gt;=66.0266597\r\n## [1,]     0.9925189     0.9850378    0.9775567     0.9700757\r\n## [2,]     0.9981852     0.9963495    0.9944926     0.9926142\r\n## [3,]     0.9994743     0.9989412    0.9984006     0.9978522\r\n## [4,]     0.9998062     0.9996095    0.9994099     0.9992074<\/code><\/pre>\n<p>To get <em>cell probabilities<\/em> (<span class=\"math inline\">\\(Y = y\\)<\/span>) we need to take successive differences in exceedance probabilities. We can do this with simple subtraction. Adding a column of 1s adds the probability ceiling of 1. Adding a column of 0s adds the probability floor of 0. The first column shows group probabilities for y = 61.11508, the \u201cbaseline\u201d y value. The second column shows the group probabilities for y = 63.95356. 
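(In symbols, <span class=\"math inline\">\\(P(Y = y_i) = P(Y \\ge y_i) - P(Y \\ge y_{i+1})\\)<\/span>, where the smallest value has exceedance probability 1 and the exceedance probability beyond the largest value is 0.)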
And so on.<\/p>\n<pre class=\"r\"><code>P &lt;- cbind(1, P) - cbind(P, 0)\r\nP[,1:4]<\/code><\/pre>\n<pre><code>##                   y&gt;=63.9535568 y&gt;=65.6631403 y&gt;=65.721467\r\n## [1,] 0.0074810843  0.0074810843  0.0074810843 0.0074810843\r\n## [2,] 0.0018148255  0.0018356660  0.0018568676 0.0018784386\r\n## [3,] 0.0005256675  0.0005330871  0.0005406649 0.0005484054\r\n## [4,] 0.0001937954  0.0001966622  0.0001995931 0.0002025900<\/code><\/pre>\n<p>Finally we multiply cell probabilities by the value attached to them and sum. This produces four means, one for each group.<\/p>\n<pre class=\"r\"><code>m &lt;- P %*% sort(y)\r\nm<\/code><\/pre>\n<pre><code>##           [,1]\r\n## [1,]  99.32328\r\n## [2,] 111.21326\r\n## [3,] 121.63880\r\n## [4,] 129.70290<\/code><\/pre>\n<p>We can also do this with the MASS function <code>polr()<\/code> and car function <code>Anova()<\/code>. The catch is we need to define y as an ordered factor first.<\/p>\n<pre class=\"r\"><code>library(MASS) # for polr()\r\nlibrary(car) # for Anova()\r\ny_ord &lt;- ordered(y)\r\nf2 &lt;- polr(y_ord ~ group)\r\ncar::Anova(f2)<\/code><\/pre>\n<pre><code>## Analysis of Deviance Table (Type II tests)\r\n## \r\n## Response: y_ord\r\n##       LR Chisq Df Pr(&gt;Chisq)    \r\n## group   193.31  3  &lt; 2.2e-16 ***\r\n## ---\r\n## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1<\/code><\/pre>\n<p>To get predicted means we follow the same steps above. The big difference is we need to reverse the sign on the intercepts, which are stored in the \u201czeta\u201d element of the polr object. This is because the <code>polr()<\/code> function models <span class=\"math inline\">\\(P(Y \\le y)\\)<\/span>. 
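<\/p>\n<p>Concretely, <code>polr()<\/code> fits <span class=\"math inline\">\\(\\textrm{logit}\\, P(Y \\le y_j) = \\zeta_j - X\\beta\\)<\/span>, so the exceedance probability is <span class=\"math inline\">\\(P(Y &gt; y_j) = 1 - \\textrm{plogis}(\\zeta_j - X\\beta) = \\textrm{plogis}(-\\zeta_j + X\\beta)\\)<\/span>, matching the sign flip in the code below. A quick check of the underlying identity with an arbitrary value:<\/p>\n<pre class=\"r\"><code>z &lt;- 1.5\r\nall.equal(1 - plogis(z), plogis(-z)) # TRUE<\/code><\/pre>\n<p>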
This time let\u2019s make a function for this.<\/p>\n<pre class=\"r\"><code>mf &lt;- function(model){\r\n  lp &lt;- c(0, coef(model))  # prepend 0 for the reference level\r\n  intercepts &lt;- model$zeta\r\n  xb &lt;- sapply(-intercepts, &quot;+&quot;, lp)\r\n  P &lt;- matrix(plogis(xb), ncol = length(intercepts)) # not f2$zeta: use the model passed in\r\n  P &lt;- cbind(1, P) - cbind(P, 0)\r\n  m &lt;- P %*% sort(y)  # y is taken from the global environment\r\n  m  \r\n}\r\n\r\nmf(f2)<\/code><\/pre>\n<pre><code>##           [,1]\r\n## [1,]  99.32265\r\n## [2,] 111.21298\r\n## [3,] 121.63863\r\n## [4,] 129.70252<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I was reading section 7.6 of Harrell\u2019s BBR and came across this example that shows how a proportional odds (PO)&#8230; <a class=\"read-more\" href=\"https:\/\/www.clayford.net\/statistics\/predicted-means-for-a-po-model\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[94,93],"class_list":["post-1026","post","type-post","status-publish","format-standard","hentry","category-regression","tag-po-model","tag-rms"],"_links":{"self":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/1026","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/comments?post=1026"}],"version-history":[{"count":1,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/1026\/revisions"}],"predecessor-version":[{"id":1027,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/1026\/revisions\/1027"}],"wp:attachment":[{"href":"https:\/\/www.clayford.net\/statistics
\/wp-json\/wp\/v2\/media?parent=1026"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/categories?post=1026"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/tags?post=1026"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}