{"id":908,"date":"2023-10-24T07:27:23","date_gmt":"2023-10-24T11:27:23","guid":{"rendered":"https:\/\/www.clayford.net\/statistics\/?p=908"},"modified":"2023-11-11T08:31:22","modified_gmt":"2023-11-11T13:31:22","slug":"some-r-fundamentals","status":"publish","type":"post","link":"https:\/\/www.clayford.net\/statistics\/some-r-fundamentals\/","title":{"rendered":"Some R Fundamentals"},"content":{"rendered":"<p>I recently came across the book <a href=\"https:\/\/www.bioconductor.org\/help\/publications\/books\/r-programming-for-bioinformatics\/\" rel=\"noopener\" target=\"_blank\">R Programming for Bioinformatics<\/a> at my local library and decided to check it out. I don\u2019t do bioinformatics and the book is a little old (published in 2009), but I figured I would browse through it anyway. Chapter 2 is titled R Language Fundamentals. As I was flipping through it I found several little nuggets of information that I had either forgotten about over the years or never knew in the first place. I decided to document them here.<\/p>\n<p><strong>Variable names<\/strong><\/p>\n<p>Variable names cannot begin with a digit or underscore, and if they begin with a period they cannot be followed by a number. But we can bend these rules by quoting the names with backticks.<\/p>\n<pre class=\"r\"><code>`_evil` &lt;- &quot;probably not wise&quot;\r\n`_evil`<\/code><\/pre>\n<pre><code>## [1] &quot;probably not wise&quot;<\/code><\/pre>\n<pre class=\"r\"><code>`.666_number of the beast` &lt;- sqrt(666^2)\r\n`.666_number of the beast`<\/code><\/pre>\n<pre><code>## [1] 666<\/code><\/pre>\n<pre class=\"r\"><code>rm(`_evil`, `.666_number of the beast`)<\/code><\/pre>\n<p><strong>Attributes<\/strong><\/p>\n<p>Attributes can be attached to any R object except NULL. They can be useful for storing metadata among many other things. For example, add a source for a dataset.<\/p>\n<pre class=\"r\"><code>d &lt;- VADeaths\r\nattr(d, &quot;source&quot;) &lt;- &quot;Molyneaux, L., Gilliam, S. 
K., and Florant, L. C.(1947) Differences in Virginia death rates by color, sex, age, and rural or urban residence. American Sociological Review, 12, 525\u2013535.&quot;<\/code><\/pre>\n<p>To see the source:<\/p>\n<pre class=\"r\"><code>attr(d, &quot;source&quot;)<\/code><\/pre>\n<pre><code>## [1] &quot;Molyneaux, L., Gilliam, S. K., and Florant, L. C.(1947) Differences in Virginia death rates by color, sex, age, and rural or urban residence. American Sociological Review, 12, 525\u2013535.&quot;<\/code><\/pre>\n<p>To see all attributes of an object:<\/p>\n<pre class=\"r\"><code>attributes(d)<\/code><\/pre>\n<pre><code>## $dim\r\n## [1] 5 4\r\n## \r\n## $dimnames\r\n## $dimnames[[1]]\r\n## [1] &quot;50-54&quot; &quot;55-59&quot; &quot;60-64&quot; &quot;65-69&quot; &quot;70-74&quot;\r\n## \r\n## $dimnames[[2]]\r\n## [1] &quot;Rural Male&quot;   &quot;Rural Female&quot; &quot;Urban Male&quot;   &quot;Urban Female&quot;\r\n## \r\n## \r\n## $source\r\n## [1] &quot;Molyneaux, L., Gilliam, S. K., and Florant, L. C.(1947) Differences in Virginia death rates by color, sex, age, and rural or urban residence. American Sociological Review, 12, 525\u2013535.&quot;<\/code><\/pre>\n<p>To remove an attribute:<\/p>\n<pre class=\"r\"><code>attr(d, &quot;source&quot;) &lt;- NULL<\/code><\/pre>\n<p>Not all attributes are displayed when called on an object. 
For example, after fitting a linear model, it appears there are only two attributes.<\/p>\n<pre class=\"r\"><code>m &lt;- lm(dist ~ speed, data = cars)\r\nattributes(m)<\/code><\/pre>\n<pre><code>## $names\r\n##  [1] &quot;coefficients&quot;  &quot;residuals&quot;     &quot;effects&quot;       &quot;rank&quot;         \r\n##  [5] &quot;fitted.values&quot; &quot;assign&quot;        &quot;qr&quot;            &quot;df.residual&quot;  \r\n##  [9] &quot;xlevels&quot;       &quot;call&quot;          &quot;terms&quot;         &quot;model&quot;        \r\n## \r\n## $class\r\n## [1] &quot;lm&quot;<\/code><\/pre>\n<p>However, elements of the model object also have attributes. For example, the terms element has 10 attributes.<\/p>\n<pre class=\"r\"><code>out &lt;- attributes(m$terms)\r\nlength(out)<\/code><\/pre>\n<pre><code>## [1] 10<\/code><\/pre>\n<pre class=\"r\"><code>names(out)<\/code><\/pre>\n<pre><code>##  [1] &quot;variables&quot;    &quot;factors&quot;      &quot;term.labels&quot;  &quot;order&quot;        &quot;intercept&quot;   \r\n##  [6] &quot;response&quot;     &quot;class&quot;        &quot;.Environment&quot; &quot;predvars&quot;     &quot;dataClasses&quot;<\/code><\/pre>\n<pre class=\"r\"><code>attr(m$terms, &quot;factors&quot;)<\/code><\/pre>\n<pre><code>##       speed\r\n## dist      0\r\n## speed     1<\/code><\/pre>\n<p><strong>The colon operator<\/strong><\/p>\n<p>I often forget the colon operator can work with decimal values.<\/p>\n<pre class=\"r\"><code>2.5:10.5<\/code><\/pre>\n<pre><code>## [1]  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5<\/code><\/pre>\n<p>And can go backwards:<\/p>\n<pre class=\"r\"><code>10.2:1.2<\/code><\/pre>\n<pre><code>##  [1] 10.2  9.2  8.2  7.2  6.2  5.2  4.2  3.2  2.2  1.2<\/code><\/pre>\n<p><strong>zero length vectors<\/strong><\/p>\n<p>The sum of a zero-length vector is 0, but the product of a zero-length vector is 1.<\/p>\n<pre class=\"r\"><code>x &lt;- numeric()\r\nlength(x)<\/code><\/pre>\n<pre><code>## [1] 
0<\/code><\/pre>\n<pre class=\"r\"><code>sum(x)<\/code><\/pre>\n<pre><code>## [1] 0<\/code><\/pre>\n<pre class=\"r\"><code>prod(x)<\/code><\/pre>\n<pre><code>## [1] 1<\/code><\/pre>\n<p>This ensures expected behavior when working with sums and products:<\/p>\n<pre class=\"r\"><code># 12 + 0\r\nsum(12, x)<\/code><\/pre>\n<pre><code>## [1] 12<\/code><\/pre>\n<pre class=\"r\"><code># 12 * 1\r\nprod(12, x)<\/code><\/pre>\n<pre><code>## [1] 12<\/code><\/pre>\n<p><strong>.Machine<\/strong><\/p>\n<p>The <code>.Machine<\/code> variable holds information about the numerical characteristics of your machine. For example, the largest integer my machine can represent:<\/p>\n<pre class=\"r\"><code>.Machine$integer.max<\/code><\/pre>\n<pre><code>## [1] 2147483647<\/code><\/pre>\n<p>If I add 1 to that, the result is numeric, not an integer.<\/p>\n<pre class=\"r\"><code>x &lt;- .Machine$integer.max\r\nx2 &lt;- x + 1\r\nis.integer(x2)<\/code><\/pre>\n<pre><code>## [1] FALSE<\/code><\/pre>\n<p>If I add 1L (an explicit integer) to that, the result is a warning and an NA. My machine cannot represent that integer.<\/p>\n<pre class=\"r\"><code>x2 &lt;- x + 1L<\/code><\/pre>\n<pre><code>## Warning in x + 1L: NAs produced by integer overflow<\/code><\/pre>\n<pre class=\"r\"><code>x2<\/code><\/pre>\n<pre><code>## [1] NA<\/code><\/pre>\n<p><strong>Recoding factors<\/strong><\/p>\n<p>There are several convenience functions in other packages for recoding variables such as <code>recode<\/code> in the {car} package, <code>case_when<\/code> in {dplyr}, and a bunch of functions in the {forcats} package. But it\u2019s good to remember how to use base R to recode factors. 
Create a list with the recoding definitions and assign it to the levels of the factor.<\/p>\n<pre class=\"r\"><code>g &lt;- sample(letters[1:5], 30, replace = TRUE)\r\ng &lt;- factor(g)\r\ng<\/code><\/pre>\n<pre><code>##  [1] e c d c d c b a e c a d e e d b e c b c b c b d d c b a d e\r\n## Levels: a b c d e<\/code><\/pre>\n<p>Put \u201ca\u201d and \u201cb\u201d into one group, \u201cc\u201d and \u201cd\u201d into another group, and keep \u201ce\u201d in its own group.<\/p>\n<pre class=\"r\"><code>lst &lt;- list(&quot;A&quot; = c(&quot;a&quot;, &quot;b&quot;),\r\n            &quot;B&quot; = c(&quot;c&quot;, &quot;d&quot;),\r\n            &quot;C&quot; = &quot;e&quot;)\r\nlevels(g) &lt;- lst\r\ng<\/code><\/pre>\n<pre><code>##  [1] C B B B B B A A C B A B C C B A C B A B A B A B B B A A B C\r\n## Levels: A B C<\/code><\/pre>\n<p>If we like we can add an attribute to store the definition.<\/p>\n<pre class=\"r\"><code>attr(g, &quot;recoding&quot;) &lt;- c(&quot;A = {ab}, B = {cd}, C = {e}&quot;)\r\ng<\/code><\/pre>\n<pre><code>##  [1] C B B B B B A A C B A B C C B A C B A B A B A B B B A A B C\r\n## attr(,&quot;recoding&quot;)\r\n## [1] A = {ab}, B = {cd}, C = {e}\r\n## Levels: A B C<\/code><\/pre>\n<p><strong>lists can have dimensions<\/strong><\/p>\n<p>Something more interesting than practical is that lists can have dimensions.<\/p>\n<pre class=\"r\"><code>M &lt;- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))\r\nXsq &lt;- chisq.test(M) # produces 9 element list\r\nXsq &lt;- unclass(Xsq) # remove htest class\r\ndim(Xsq) &lt;- c(3,3)\r\nXsq<\/code><\/pre>\n<pre><code>##      [,1]         [,2]                         [,3]     \r\n## [1,] 30.07015     &quot;Pearson&#39;s Chi-squared test&quot; numeric,6\r\n## [2,] 2            &quot;M&quot;                          table,6  \r\n## [3,] 2.953589e-07 table,6                      table,6<\/code><\/pre>\n<pre class=\"r\"><code>Xsq[1,3]<\/code><\/pre>\n<pre><code>## [[1]]\r\n##          A        B        C\r\n## A 
703.6714 319.6453 533.6834\r\n## B 542.3286 246.3547 411.3166<\/code><\/pre>\n<p><strong>Environments<\/strong><\/p>\n<p>We are not restricted to creating objects in the Global Environment. We can create our own environments using the <code>new.env()<\/code> function and then create objects in that environment. We can use the dollar sign operator or the <code>assign()<\/code> function.<\/p>\n<pre class=\"r\"><code>e1 &lt;- new.env()\r\ne1$mod &lt;- lm(dist ~ speed, data = cars)\r\ne1$cumTotal &lt;- function(x)tail(cumsum(x), n = 1)\r\nassign(&quot;vals&quot;, c(20, 23, 34, 19), envir = e1)\r\nls(e1)<\/code><\/pre>\n<pre><code>## [1] &quot;cumTotal&quot; &quot;mod&quot;      &quot;vals&quot;<\/code><\/pre>\n<pre class=\"r\"><code>ls() # list objects in Global Environment<\/code><\/pre>\n<pre><code>##  [1] &quot;d&quot;   &quot;e1&quot;  &quot;g&quot;   &quot;lst&quot; &quot;m&quot;   &quot;M&quot;   &quot;out&quot; &quot;x&quot;   &quot;x2&quot;  &quot;Xsq&quot;<\/code><\/pre>\n<p>We can access objects in our environment using the dollar sign operator or the <code>get()<\/code> and <code>mget()<\/code> functions.<\/p>\n<pre class=\"r\"><code>e1$cumTotal(c(2,4,6))<\/code><\/pre>\n<pre><code>## [1] 12<\/code><\/pre>\n<pre class=\"r\"><code>get(&quot;vals&quot;, envir = e1)<\/code><\/pre>\n<pre><code>## [1] 20 23 34 19<\/code><\/pre>\n<pre class=\"r\"><code>mget(c(&quot;mod&quot;, &quot;vals&quot;), envir = e1) # get more than one object<\/code><\/pre>\n<pre><code>## $mod\r\n## \r\n## Call:\r\n## lm(formula = dist ~ speed, data = cars)\r\n## \r\n## Coefficients:\r\n## (Intercept)        speed  \r\n##     -17.579        3.932  \r\n## \r\n## \r\n## $vals\r\n## [1] 20 23 34 19<\/code><\/pre>\n<p>We can save the environment and reload it in a future session.<\/p>\n<pre class=\"r\"><code>save(e1, file = &quot;e1.Rdata&quot;)\r\nrm(e1)\r\nload(file = &quot;e1.Rdata&quot;)<\/code><\/pre>\n<p>We can also change the environment associated with an object that was created in the 
Global Environment.<\/p>\n<pre class=\"r\"><code>f &lt;- function(x)(vals + 1000) # vals object defined in e1 environment\r\nenvironment(f) &lt;- e1\r\nf<\/code><\/pre>\n<pre><code>## function(x)(vals + 1000)\r\n## &lt;environment: 0x000001b5ec98d4e0&gt;<\/code><\/pre>\n<pre class=\"r\"><code>f()<\/code><\/pre>\n<pre><code>## [1] 1020 1023 1034 1019<\/code><\/pre>\n<p>Notice that if we remove the environment using <code>rm()<\/code>, the function still remains in that environment and we still have access to its objects.<\/p>\n<pre class=\"r\"><code>rm(e1)\r\nf<\/code><\/pre>\n<pre><code>## function(x)(vals + 1000)\r\n## &lt;environment: 0x000001b5ec98d4e0&gt;<\/code><\/pre>\n<pre class=\"r\"><code>f()<\/code><\/pre>\n<pre><code>## [1] 1020 1023 1034 1019<\/code><\/pre>\n<p><code>rm(e1)<\/code> simply removes the <em>binding<\/em> between the symbol \u201ce1\u201d and the structure that contains the objects. Since the environment can still be reached as the environment of <code>f()<\/code>, it remains available.<\/p>\n<p><strong>Brackets and Dollar Signs<\/strong><\/p>\n<p>I found this sentence enlightening: \u201cOne way of describing the behavior of the single bracket operator is that the type of the return value matches the type of the value it is applied to.\u201d (p.\u00a028) I prefer this to <a href=\"https:\/\/adv-r.hadley.nz\/subsetting.html#section\">metaphors involving trains<\/a>.<\/p>\n<pre class=\"r\"><code>lst &lt;- list(a1 = 1:5, b = c(&quot;d&quot;, &quot;g&quot;), c = 99)\r\nlst[&quot;a1&quot;] # returns a list<\/code><\/pre>\n<pre><code>## $a1\r\n## [1] 1 2 3 4 5<\/code><\/pre>\n<p><code>[[<\/code> and <code>$<\/code> extract single values.<\/p>\n<pre class=\"r\"><code>lst[[&quot;a1&quot;]]<\/code><\/pre>\n<pre><code>## [1] 1 2 3 4 5<\/code><\/pre>\n<pre class=\"r\"><code>lst$a1<\/code><\/pre>\n<pre><code>## [1] 1 2 3 4 5<\/code><\/pre>\n<p>The <code>$<\/code> operator supports partial matching.<\/p>\n<pre class=\"r\"><code>lst$a<\/code><\/pre>\n<pre><code>## [1] 
1 2 3 4 5<\/code><\/pre>\n<p>The <code>[<\/code> and <code>[[<\/code> operators support expressions, but not partial matching.<\/p>\n<pre class=\"r\"><code>ans &lt;- &quot;c&quot;\r\nlst[ans]<\/code><\/pre>\n<pre><code>## $c\r\n## [1] 99<\/code><\/pre>\n<pre class=\"r\"><code>lst[[ans]]<\/code><\/pre>\n<pre><code>## [1] 99<\/code><\/pre>\n<p>If names are duplicated in named vectors, then only the value corresponding to the first one is returned when subsetting with brackets.<\/p>\n<pre class=\"r\"><code>x &lt;- c(&quot;a&quot; = 1, &quot;a&quot; = 2)\r\nx[&quot;a&quot;]<\/code><\/pre>\n<pre><code>## a \r\n## 1<\/code><\/pre>\n<p>The <code>%in%<\/code> operator can be useful to get all elements with the same name.<\/p>\n<pre class=\"r\"><code>x[names(x) %in% &quot;a&quot;]<\/code><\/pre>\n<pre><code>## a a \r\n## 1 2<\/code><\/pre>\n<p><strong>Matrix indexing<\/strong><\/p>\n<p>I don\u2019t work with arrays that often, but when I do I often forget that I can index them with a matrix. Below I extract the value in row 1, column 4, from each of the 3 layers of the iris3 array.<\/p>\n<pre class=\"r\"><code>m &lt;- matrix(c(1,4,1,\r\n              1,4,2,\r\n              1,4,3), \r\n            ncol = 3, byrow = TRUE)\r\niris3[m]<\/code><\/pre>\n<pre><code>## [1] 0.2 1.4 2.5<\/code><\/pre>\n<p>Of course we can get the same result (in this case) using subsetting indices.<\/p>\n<pre class=\"r\"><code>iris3[1,4,]<\/code><\/pre>\n<pre><code>##     Setosa Versicolor  Virginica \r\n##        0.2        1.4        2.5<\/code><\/pre>\n<p><strong>Negative subscripts<\/strong><\/p>\n<p>Negative subscripts can appear on the <em>left side<\/em> of assignment.<\/p>\n<pre class=\"r\"><code>x &lt;- 1:10\r\nx[-(2:4)] &lt;- 99\r\nx<\/code><\/pre>\n<pre><code>##  [1] 99  2  3  4 99 99 99 99 99 99<\/code><\/pre>\n<p><strong>Subsetting without dimensions<\/strong><\/p>\n<p>Use empty brackets to select all elements and not change any attributes.<\/p>\n<pre class=\"r\"><code>x &lt;- 
matrix(10:1, ncol = 2)\r\nx<\/code><\/pre>\n<pre><code>##      [,1] [,2]\r\n## [1,]   10    5\r\n## [2,]    9    4\r\n## [3,]    8    3\r\n## [4,]    7    2\r\n## [5,]    6    1<\/code><\/pre>\n<pre class=\"r\"><code>x[] &lt;- sort(x)\r\nx<\/code><\/pre>\n<pre><code>##      [,1] [,2]\r\n## [1,]    1    6\r\n## [2,]    2    7\r\n## [3,]    3    8\r\n## [4,]    4    9\r\n## [5,]    5   10<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I recently came across the book R Programming for Bioinformatics at my local library and decided to check it out&#8230;. <a class=\"read-more\" href=\"https:\/\/www.clayford.net\/statistics\/some-r-fundamentals\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[],"class_list":["post-908","post","type-post","status-publish","format-standard","hentry","category-using-r"],"_links":{"self":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/908","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/comments?post=908"}],"version-history":[{"count":3,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/908\/revisions"}],"predecessor-version":[{"id":912,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/posts\/908\/revisions\/912"}],"wp:attachment":[{"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/media?parent=908"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/categor
ies?post=908"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.clayford.net\/statistics\/wp-json\/wp\/v2\/tags?post=908"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}