{"id":180,"date":"2013-05-19T11:31:00","date_gmt":"2013-05-19T15:31:00","guid":{"rendered":"http:\/\/www.clayford.net\/statistics\/?p=180"},"modified":"2024-11-17T11:14:17","modified_gmt":"2024-11-17T16:14:17","slug":"a-logistic-regression-checklist","status":"publish","type":"post","link":"https:\/\/www.clayford.net\/statistics\/a-logistic-regression-checklist\/","title":{"rendered":"A Logistic Regression Checklist"},"content":{"rendered":"<p>I recently read <a href=\"http:\/\/www.amazon.com\/gp\/product\/0312430000\/ref=as_li_ss_tl?ie=UTF8&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0312430000&#038;linkCode=as2&#038;tag=curiousanduseful\">The Checklist Manifesto<\/a> by Atul Gawande and was fascinated by how relatively simple checklists can improve performance and results in such complex endeavors as surgery or flying a commercial airplane. I decided I wanted to make a checklist of my own for logistic regression. It ended up not being a checklist on how to do it per se, but rather a list of important facts to remember. Here&#8217;s what I came up with.<\/p>\n<ul>\n<li>Logistic regression models the probability that <em>y<\/em> = 1, \\( P(y_{i} = 1) = logit^{-1}(X_{i}\\beta) \\) where \\( logit^{-1}(x) = \\frac{e^{x}}{1+e^{x}} \\)<\/li>\n<li>Logistic regression predictions are probabilistic. The model predicts a probability that <em>y<\/em> = 1; it does not make a point prediction. <\/li>\n<li>The function \\( logit^{-1}(x) = \\frac{e^{x}}{1+e^{x}} \\) transforms continuous values to the range (0,1).<\/li>\n<li>Dividing a regression coefficient by 4 will give an upper bound of the predictive difference corresponding to a unit difference in <em>x<\/em>. For example, if \\(\\beta = 0.33\\), then \\(0.33\/4 = 0.08\\). 
This means a unit increase in <em>x<\/em> corresponds to no more than an 8% positive difference in the probability that <em>y<\/em> = 1.<\/li>\n<li>The odds of success (i.e., <em>y<\/em> = 1) increase multiplicatively by \\( e^{\\beta} \\) for every one-unit increase in <em>x<\/em>. That is, exponentiated logistic regression coefficients can be interpreted as odds ratios. For example, let&#8217;s say we have a regression coefficient of 0.497. Exponentiating gives \\( e^{0.497} = 1.64 \\). That means the odds of success increase by 64% for each one-unit increase in <em>x<\/em>. Recall that odds = \\( \\frac{p}{1-p} \\). If our predicted probability at <em>x<\/em> is 0.674, then the odds of success are \\(\\frac{0.674}{0.326} = 2.07 \\). Therefore at <em>x<\/em> + 1, the odds will increase by 64% from 2.07 to \\(2.07(1.64) = 3.40\\). Notice that \\( 1.64 = \\frac{3.40}{2.07}\\), which is an odds ratio. The ratio of odds at <em>x + 1<\/em> to odds at <em>x<\/em> will always be \\( e^{\\beta}\\), where \\( \\beta\\) is a logistic regression coefficient.<\/li>\n<li>Plots of raw residuals from logistic regression are generally not useful. Instead, it&#8217;s preferable to plot binned residuals &#8220;by dividing the data into categories (bins) based on their fitted values, and then plotting the average residual versus the average fitted value for each bin.&#8221; (Gelman &#038; Hill, p. 97). Example R code for doing this can be found <a href=\"http:\/\/www.stat.columbia.edu\/~gelman\/arm\/examples\/arsenic\/arsenic_chap5.R\">here<\/a>. <\/li>\n<li>The <em>error rate<\/em> is the proportion of cases for which your model predicts the wrong outcome: it predicts <em>y<\/em> = 1 when the case is actually <em>y<\/em> = 0, or <em>y<\/em> = 0 when the case is actually <em>y<\/em> = 1. We predict <em>y<\/em> = 1 when the predicted probability exceeds 0.5. Otherwise we predict <em>y<\/em> = 0. It&#8217;s not good if your error rate equals the null rate. The null rate is usually the proportion of 0&#8217;s in your data. 
In other words, if you guessed all cases in your data are <em>y<\/em> = 1, then the null rate is the percentage you guessed wrong. Say your data has 58% of <em>y<\/em> = 1 and 42% of <em>y<\/em> = 0; then the null rate is 42%. Further, let&#8217;s say you do some logistic regression on this data and your model has an error rate of 36%. That is, 36% of the time it predicts the wrong outcome. This means your model does only 6 percentage points better than simply guessing that all cases are <em>y<\/em> = 1.<\/li>\n<li>Deviance is a measure of error. Lower deviance is better. When an informative predictor is added to a model, we expect deviance to decrease by more than 1. If not, then we&#8217;ve likely added a non-informative predictor to the model that just adds noise.<\/li>\n<li>If a predictor <em>x<\/em> is completely aligned with the outcome so that <em>y<\/em> = 1 when <em>x<\/em> is above some threshold and <em>y<\/em> = 0 when <em>x<\/em> is below that threshold, then the coefficient estimate will explode to some gigantic value. This means the parameter cannot be estimated. This is an identifiability problem called <em>separation<\/em>.<\/li>\n<\/ul>\n<p>Most of this comes from Chapter 5 of <a href=\"http:\/\/www.amazon.com\/gp\/product\/052168689X\/ref=as_li_ss_tl?ie=UTF8&#038;camp=1789&#038;creative=390957&#038;creativeASIN=052168689X&#038;linkCode=as2&#038;tag=curiousanduseful\">Data Analysis Using Regression and Multilevel\/Hierarchical Models<\/a> by Gelman and Hill. 
I also pulled a little from chapter 5 of <a href=\"http:\/\/www.amazon.com\/gp\/product\/0471226181\/ref=as_li_ss_tl?ie=UTF8&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0471226181&#038;linkCode=as2&#038;tag=curiousanduseful\">An Introduction to Categorical Data Analysis<\/a> by Agresti.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently read The Checklist Manifesto by Atul Gawande and was fascinated by how relatively simple checklists can improve performance&#8230; <a class=\"read-more\" href=\"https:\/\/www.clayford.net\/statistics\/a-logistic-regression-checklist\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,13],"tags":[33],"class_list":["post-180","post","type-post","status-publish","format-standard","hentry","category-regression","category-using-r","tag-logistic-regression"],"_links":{"self":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/comments?post=180"}],"version-history":[{"count":8,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/180\/revisions"}],"predecessor-version":[{"id":990,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/180\/revisions\/990"}],"wp:attachment":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/media?parent=180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/categories?post=180"},{
"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/tags?post=180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}