Monthly Archives: October 2023

Some R Fundamentals

I recently came across the book R Programming for Bioinformatics at my local library and decided to check it out. I don’t do bioinformatics and the book is a little old (published in 2009), but I figured I would browse through it anyway. Chapter 2 is titled R Language Fundamentals. As I was flipping through it I found several little nuggets of information that I had either forgotten about over the years or never knew in the first place. I decided to document them here.

Variable names

Variable names cannot begin with a digit or an underscore, and a name that begins with a period cannot have a digit as its second character. But we can bend these rules by quoting the names with backticks.

`_evil` <- "probably not wise"
`_evil`
## [1] "probably not wise"
`.666_number of the beast` <- sqrt(666^2)
`.666_number of the beast`
## [1] 666
rm(`_evil`, `.666_number of the beast`)
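
When names come from outside R (column headers in a file, say), it can be easier to see how R would repair them than to reach for backticks. The following is a minimal sketch using base R's make.names(); the example strings are assumptions chosen for illustration and are not from the book.

```r
# Candidate names; only the first is syntactically valid as written
candidates <- c("ok_name", "_evil", ".666_number of the beast")

# make.names() converts each string to a syntactically valid name
fixed <- make.names(candidates)

# TRUE where the original string was already a valid name
# (this comparison would also flag reserved words, none of which appear here)
already_valid <- candidates == fixed
already_valid
## [1]  TRUE FALSE FALSE
```

Printing fixed shows the repaired versions, which can then be used without quoting.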

Attributes

Attributes can be attached to any R object except NULL. They can be useful for storing metadata, among many other things. For example, we can record the source of a dataset.

d <- VADeaths
attr(d, "source") <- "Molyneaux, L., Gilliam, S. K., and Florant, L. C.(1947) Differences in Virginia death rates by color, sex, age, and rural or urban residence. American Sociological Review, 12, 525–535."

To see the source:

attr(d, "source")
## [1] "Molyneaux, L., Gilliam, S. K., and Florant, L. C.(1947) Differences in Virginia death rates by color, sex, age, and rural or urban residence. American Sociological Review, 12, 525–535."

To see all attributes of an object:

attributes(d)
## $dim
## [1] 5 4
## 
## $dimnames
## $dimnames[[1]]
## [1] "50-54" "55-59" "60-64" "65-69" "70-74"
## 
## $dimnames[[2]]
## [1] "Rural Male"   "Rural Female" "Urban Male"   "Urban Female"
## 
## 
## $source
## [1] "Molyneaux, L., Gilliam, S. K., and Florant, L. C.(1947) Differences in Virginia death rates by color, sex, age, and rural or urban residence. American Sociological Review, 12, 525–535."

To remove an attribute:

attr(d, "source") <- NULL
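
One caveat worth keeping in mind is that custom attributes are easy to lose. The sketch below uses a shortened, made-up source string and shows that ordinary subsetting keeps dim and dimnames but drops the extra attribute.

```r
d2 <- VADeaths
attr(d2, "source") <- "American Sociological Review, 12, 525-535"  # shortened for the example

# Subsetting keeps dim and dimnames but drops the custom attribute
sub <- d2[1:2, ]
names(attributes(sub))
## [1] "dim"      "dimnames"
is.null(attr(sub, "source"))
## [1] TRUE
```

If the metadata matters, it has to be reattached after subsetting.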

attributes() returns only the attributes of the object itself, not those of its components. For example, after fitting a linear model, it appears there are only two attributes.

m <- lm(dist ~ speed, data = cars)
attributes(m)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"

However, elements of the model object also have attributes. For example, the terms element has 10 attributes.

out <- attributes(m$terms)
length(out)
## [1] 10
names(out)
##  [1] "variables"    "factors"      "term.labels"  "order"        "intercept"   
##  [6] "response"     "class"        ".Environment" "predvars"     "dataClasses"
attr(m$terms, "factors")
##       speed
## dist      0
## speed     1

The colon operator

I often forget the colon operator can work with decimal values.

2.5:10.5
## [1]  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5

And can go backwards:

10.2:1.2
##  [1] 10.2  9.2  8.2  7.2  6.2  5.2  4.2  3.2  2.2  1.2
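
Two related points, sketched with values of my own choosing: the colon operator always steps by 1 from the starting value, so the upper bound is a limit rather than a guaranteed final value, and seq() is the tool when a different step is needed.

```r
# The upper bound is a limit, not a guaranteed final value
up <- 2.5:10
up
## [1] 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5

# seq() makes the step explicit and allows steps other than 1
by2 <- seq(2.5, 10.5, by = 2)
by2
## [1]  2.5  4.5  6.5  8.5 10.5
```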

Zero-length vectors

The sum of a zero-length vector is 0, but the product of a zero-length vector is 1.

x <- numeric()
length(x)
## [1] 0
sum(x)
## [1] 0
prod(x)
## [1] 1

This ensures expected behavior when working with sums and products:

# 12 + 0
sum(12, x)
## [1] 12
# 12 * 1
prod(12, x)
## [1] 12
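
The same identity-element logic shows up elsewhere in base R. A short sketch, where the sales vector is hypothetical data made up for the example:

```r
# all() and any() also return their identity elements on empty input
all_empty <- all(logical())
all_empty
## [1] TRUE
any_empty <- any(logical())
any_empty
## [1] FALSE

# A filter that matches nothing still gives a usable total
sales <- c(120, 80, 45)                 # hypothetical values
big_total <- sum(sales[sales > 1000])
big_total
## [1] 0
```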

.Machine

The .Machine variable holds information about the numerical characteristics of your machine. For example, the largest integer my machine can represent:

.Machine$integer.max
## [1] 2147483647

If I add 1 to that, the result is a double, not an integer.

x <- .Machine$integer.max
x2 <- x + 1
is.integer(x2)
## [1] FALSE

If I add 1L (an explicit integer) to that, the result is a warning and an NA. R stores integers in 32 bits, so it cannot represent that value as an integer.

x2 <- x + 1L
## Warning in x + 1L: NAs produced by integer overflow
x2
## [1] NA
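
A sketch of two related things: converting to double before the arithmetic sidesteps the overflow, and .Machine also records the precision of doubles. The variable names are my own, and the last two results assume standard IEEE 754 double arithmetic.

```r
big <- .Machine$integer.max

# Converting to double first avoids the integer overflow
big_plus <- as.numeric(big) + 1
big_plus
## [1] 2147483648

# double.eps is the smallest positive x for which 1 + x != 1
eps_visible <- 1 + .Machine$double.eps == 1
eps_visible
## [1] FALSE
half_eps_lost <- 1 + .Machine$double.eps / 2 == 1
half_eps_lost
## [1] TRUE
```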

Recoding factors

There are several convenience functions in other packages for recoding variables, such as recode in the {car} package, case_when in {dplyr}, and a bunch of functions in the {forcats} package. But it’s good to remember how to use base R to recode factors. Create a list with the recoding definitions and assign it to the levels of the factor.

g <- sample(letters[1:5], 30, replace = TRUE)
g <- factor(g)
g
##  [1] e c d c d c b a e c a d e e d b e c b c b c b d d c b a d e
## Levels: a b c d e

Put “a” and “b” into one group, “c” and “d” into another group, and keep “e” in its own group.

lst <- list("A" = c("a", "b"),
            "B" = c("c", "d"),
            "C" = "e")
levels(g) <- lst
g
##  [1] C B B B B B A A C B A B C C B A C B A B A B A B B B A A B C
## Levels: A B C
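
Because the example above uses sample() without a seed, its output changes from run to run. Here is a reproducible sketch of the same technique on a small fixed vector (the names g2 and old are my own), with a cross-tabulation as a sanity check.

```r
g2 <- factor(c("a", "b", "c", "d", "e", "a"))
old <- g2                       # keep a copy of the original coding

levels(g2) <- list(A = c("a", "b"),
                   B = c("c", "d"),
                   C = "e")
as.character(g2)
## [1] "A" "A" "B" "B" "C" "A"

# Cross-tabulate old against new to confirm each level landed where intended
check <- table(old, g2)
```

Each row of check should have counts in exactly one column.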

If we like we can add an attribute to store the definition.

attr(g, "recoding") <- c("A = {ab}, B = {cd}, C = {e}")
g
##  [1] C B B B B B A A C B A B C C B A C B A B A B A B B B A A B C
## attr(,"recoding")
## [1] A = {ab}, B = {cd}, C = {e}
## Levels: A B C

Lists can have dimensions

This one is more interesting than useful: lists can have dimensions.

M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
Xsq <- chisq.test(M) # produces 9 element list
Xsq <- unclass(Xsq) # remove htest class
dim(Xsq) <- c(3,3)
Xsq
##      [,1]         [,2]                         [,3]     
## [1,] 30.07015     "Pearson's Chi-squared test" numeric,6
## [2,] 2            "M"                          table,6  
## [3,] 2.953589e-07 table,6                      table,6
Xsq[1,3]
## [[1]]
##          A        B        C
## A 703.6714 319.6453 533.6834
## B 542.3286 246.3547 411.3166

Environments

We are not restricted to creating objects in the Global Environment. We can create our own environments using the new.env() function and then create objects in that environment. We can use the dollar sign operator or the assign() function.

e1 <- new.env()
e1$mod <- lm(dist ~ speed, data = cars)
e1$cumTotal <- function(x)tail(cumsum(x), n = 1)
assign("vals", c(20, 23, 34, 19), envir = e1)
ls(e1)
## [1] "cumTotal" "mod"      "vals"
ls() # list objects in Global Environment
##  [1] "d"   "e1"  "g"   "lst" "m"   "M"   "out" "x"   "x2"  "Xsq"

We can access objects in our environment using the dollar sign operator or the get() and mget() functions.

e1$cumTotal(c(2,4,6))
## [1] 12
get("vals", envir = e1)
## [1] 20 23 34 19
mget(c("mod", "vals"), envir = e1) # get more than one object
## $mod
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932  
## 
## 
## $vals
## [1] 20 23 34 19
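
One property that sets environments apart from lists is that they are modified in place rather than copied. A minimal sketch, where acct and deposit() are hypothetical names invented for the example:

```r
acct <- new.env()
acct$balance <- 100

# Hypothetical helper: updates the environment it is given
deposit <- function(env, amount) {
  env$balance <- env$balance + amount
  invisible(env)
}

deposit(acct, 50)
acct$balance            # the change is visible outside the function
## [1] 150

# exists() checks for a name; inherits = FALSE looks only in this environment
has_balance <- exists("balance", envir = acct, inherits = FALSE)
has_balance
## [1] TRUE
```

Had acct been a list, the function would have changed only its own copy.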

We can save the environment and reload it in a future session.

save(e1, file = "e1.Rdata")
rm(e1)
load(file = "e1.Rdata")
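
load() restores the object under its original name, which can silently overwrite something in the current session. An alternative sketch uses saveRDS() and readRDS(), which let us pick the name on reload; the temporary file and the names e2 and restored are assumptions for the example.

```r
e2 <- new.env()
assign("vals", c(20, 23, 34, 19), envir = e2)

path <- tempfile(fileext = ".rds")   # temporary file, removed below
saveRDS(e2, path)
restored <- readRDS(path)            # we choose the name on reload
unlink(path)

get("vals", envir = restored)
## [1] 20 23 34 19
```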

We can also change the environment associated with an object that was created in the Global Environment.

f <- function(x)(vals + 1000) # vals object defined in e1 environment
environment(f) <- e1
f
## function(x)(vals + 1000)
## <environment: 0x000001b5ec98d4e0>
f()
## [1] 1020 1023 1034 1019

Notice that if we remove e1 using rm(), the function keeps its reference to that environment, and we still have access to its objects.

rm(e1)
f
## function(x)(vals + 1000)
## <environment: 0x000001b5ec98d4e0>
f()
## [1] 1020 1023 1034 1019

rm(e1) simply removes the binding between the symbol “e1” and the structure that contains the objects. Since the environment can be reached as the environment of f(), it remains available.
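
The same mechanism is what makes closures work. In this sketch (make_counter is a hypothetical name), the environment created by the call to make_counter() is never bound to a symbol, yet it stays alive because the returned function refers to it.

```r
make_counter <- function() {
  count <- 0                 # lives in the environment of this call
  function() {
    count <<- count + 1      # modify count in the enclosing environment
    count
  }
}

counter <- make_counter()
first <- counter()
first
## [1] 1
second <- counter()
second
## [1] 2
```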

Brackets and Dollar Signs

I found this sentence enlightening: “One way of describing the behavior of the single bracket operator is that the type of the return value matches the type of the value it is applied to.” (p. 28) I prefer this to metaphors involving trains.

lst <- list(a1 = 1:5, b = c("d", "g"), c = 99)
lst["a1"] # returns a list
## $a1
## [1] 1 2 3 4 5

[[ and $ extract a single element.

lst[["a1"]]
## [1] 1 2 3 4 5
lst$a1
## [1] 1 2 3 4 5

The $ operator supports partial matching.

lst$a
## [1] 1 2 3 4 5

The [ and [[ operators accept expressions as indices, but by default they do not do partial matching ([[ will if called with exact = FALSE).

ans <- "c"
lst[ans]
## $c
## [1] 99
lst[[ans]]
## [1] 99
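
For completeness, [[ takes an exact argument that controls partial matching on lists. A short sketch, using a copy of the list under the name lst2:

```r
lst2 <- list(a1 = 1:5, b = c("d", "g"), c = 99)

strict <- lst2[["a"]]                 # no element is named exactly "a"
strict
## NULL
loose <- lst2[["a", exact = FALSE]]   # partial matching switched on
loose
## [1] 1 2 3 4 5
```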

If names are duplicated in named vectors, then only the value corresponding to the first one is returned when subsetting with brackets.

x <- c("a" = 1, "a" = 2)
x["a"]
## a 
## 1

The %in% operator can be useful to get all elements with the same name.

x[names(x) %in% "a"]
## a a 
## 1 2

Matrix indexing

I don’t work with arrays that often, but when I do I often forget that I can index them with a matrix. Below I extract the value in row 1, column 4, from each of the 3 layers of the iris3 array.

m <- matrix(c(1,4,1,
              1,4,2,
              1,4,3), 
            ncol = 3, byrow = TRUE)
iris3[m]
## [1] 0.2 1.4 2.5

Of course we can get the same result (in this case) using subsetting indices.

iris3[1,4,]
##     Setosa Versicolor  Virginica 
##        0.2        1.4        2.5
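
Matrix indexing earns its keep when the cells we want do not line up in a row, column, or layer. A sketch with a made-up 3 x 4 matrix, where each row of idx is one (row, column) pair:

```r
mat <- matrix(1:12, nrow = 3)

# Each row of idx is one (row, column) pair
idx <- cbind(c(1, 2, 3), c(4, 1, 2))
picked <- mat[idx]
picked
## [1] 10  2  6

# The same index matrix works on the left side of an assignment
mat[idx] <- 0L
mat
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7    0
## [2,]    0    5    8   11
## [3,]    3    0    9   12
```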

Negative subscripts

Negative subscripts can appear on the left side of an assignment.

x <- 1:10
x[-(2:4)] <- 99
x
##  [1] 99  2  3  4 99 99 99 99 99 99
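
Two more things about negative subscripts, sketched with a vector of my own: they are handy for dropping elements by position, and they cannot be mixed with positive subscripts.

```r
y <- 1:10

# Drop the last element without knowing the length in advance
head_part <- y[-length(y)]
head_part
## [1] 1 2 3 4 5 6 7 8 9

# Mixing positive and negative subscripts is an error
mixed <- try(y[c(-1, 2)], silent = TRUE)
is_error <- inherits(mixed, "try-error")
is_error
## [1] TRUE
```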

Subsetting without dimensions

Use empty single brackets to select all elements without changing any attributes.

x <- matrix(10:1, ncol = 2)
x
##      [,1] [,2]
## [1,]   10    5
## [2,]    9    4
## [3,]    8    3
## [4,]    7    2
## [5,]    6    1
x[] <- sort(x)
x
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
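
The difference between assigning to x[] and assigning to x is easiest to see side by side. A sketch with names of my own choosing (m1, m2, df), including the common idiom of transforming every column of a data frame while keeping it a data frame:

```r
m1 <- matrix(10:1, ncol = 2)
m2 <- matrix(10:1, ncol = 2)

m1[] <- 0L    # fills every cell, keeps the matrix shape
m2   <- 0L    # rebinds the name to a single value

dim(m1)
## [1] 5 2
dim(m2)
## NULL

# Transform every column but keep the data frame structure
df <- data.frame(a = 1:2, b = 3:4)
df[] <- lapply(df, as.character)
class(df)
## [1] "data.frame"
```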