A Note on Random Walks

A random walk is a time series constructed as follows:

\[Y_t = Y_{t-1} + e_t \]

where the \(e_t\) are independent and identically distributed random variables with mean 0 and a fixed variance, \(\sigma_{e}^2\). The initial condition is \(Y_1 = e_1\). Here’s one way to construct a single random walk in R with \(\sigma_{e}^2 = 4\) (that is, a standard deviation of 2) for \(t\) ranging from 1 to 100.

y <- numeric(100)
y[1] <- rnorm(n = 1, mean = 0, sd = 2)
for(i in 2:100){
  y[i] <- y[i - 1] + rnorm(n = 1, mean = 0, sd = 2)
}
plot(y, type = "o")

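Since \(Y_t = e_1 + e_2 + \cdots + e_t\), a more efficient way to simulate this in R is to draw all the errors at once and take their cumulative sum with the cumsum() function.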

e <- rnorm(n = 100, mean = 0, sd = 2)
y <- cumsum(e)
plot(y, type = "o")
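
As a quick sanity check, here is a minimal sketch showing that the loop and cumsum() build the same walk when both start from the same random seed. The seed value and the names y_loop, y_vec and same_walk are my own choices for illustration, and the check assumes R's default random number generator.

```r
# Build the walk with the explicit loop.
set.seed(1)  # arbitrary seed, used only so both versions draw the same errors
y_loop <- numeric(100)
y_loop[1] <- rnorm(n = 1, mean = 0, sd = 2)
for(i in 2:100){
  y_loop[i] <- y_loop[i - 1] + rnorm(n = 1, mean = 0, sd = 2)
}

# Build the walk again with cumsum(), after resetting the seed.
set.seed(1)
e <- rnorm(n = 100, mean = 0, sd = 2)
y_vec <- cumsum(e)

# Compare the two walks up to floating-point tolerance.
same_walk <- isTRUE(all.equal(y_loop, y_vec))
same_walk
```

The loop makes the recursion explicit, while cumsum() relies on the fact that each \(Y_t\) is just the running total of the errors.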

Mean

The mean at any time \(t\) in a random walk is 0. To illustrate, let’s generate 100,000 random walks.

rw <- function(t = 100){
  e <- rnorm(n = t, mean = 0, sd = 2)
  y <- cumsum(e)
  y
}

rwalks <- replicate(n = 1e5, rw())
# each column is a random walk
dim(rwalks)
## [1]    100 100000

Each column is a random walk and each row is a time point. To estimate the mean at time 10, we take the mean of the 10th row. Here I calculate the mean and a 95% confidence interval. Notice the mean is close to 0 and the interval contains 0.

list(mean = mean(rwalks[10,]), ci = t.test(rwalks[10,])$conf.int)
## $mean
## [1] -0.02851296
## 
## $ci
## [1] -0.06770377  0.01067786
## attr(,"conf.level")
## [1] 0.95

The mean at time 85 is also about 0.

list(mean = mean(rwalks[85,]), ci = t.test(rwalks[85,])$conf.int)
## $mean
## [1] 0.0426798
## 
## $ci
## [1] -0.07166066  0.15702027
## attr(,"conf.level")
## [1] 0.95
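
The simulated means agree with a short derivation. Writing the walk as a sum of its errors and using the fact that each \(e_i\) has mean 0:

\[Y_t = e_1 + e_2 + \cdots + e_t\]

\[E(Y_t) = E(e_1) + E(e_2) + \cdots + E(e_t) = 0\]

This holds for every \(t\), which is why the sample means at times 10 and 85 are both near 0. The wider confidence interval at time 85 reflects the larger variance at later time points.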

Variance

The variance at time t increases linearly with time:

\[\text{Var}(Y_t) = t\sigma_{e}^2\]

From our random walks above, the variance at time = 10 should be about 10 * 4 = 40.

var(rwalks[10,])
## [1] 39.98176

Likewise, the variance at time = 85 should be about 85 * 4 = 340.

var(rwalks[85,])
## [1] 340.3245
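
The same check can be run at every time point at once. Because the errors are independent, their variances add, so the variance at time t should be t * 4. The sketch below is self-contained and assumes a smaller batch of 10,000 walks to keep it quick; the seed and the names sims, var_hat, var_theory and max_rel_err are hypothetical choices of mine.

```r
set.seed(2)      # arbitrary seed for reproducibility
n_walks <- 1e4   # assumption: fewer walks than the 1e5 used above
n_time <- 100

# Each column of sims is one random walk with error sd = 2.
sims <- replicate(n = n_walks, cumsum(rnorm(n = n_time, mean = 0, sd = 2)))

# Sample variance across walks at each time point (one value per row).
var_hat <- apply(sims, 1, var)

# Theoretical variance: t * sigma^2 with sigma^2 = 4.
var_theory <- (1:n_time) * 4

# Largest relative gap between simulated and theoretical variance.
max_rel_err <- max(abs(var_hat - var_theory) / var_theory)

# Side-by-side comparison at a few selected time points.
check <- data.frame(t = c(1, 10, 50, 85, 100))
check$simulated <- var_hat[check$t]
check$theory <- var_theory[check$t]
check
```

With this many walks the simulated and theoretical columns should track each other closely, though the exact figures depend on the seed.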

Covariance

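The values of a random walk at different time points are not independent, but the errors \(e_t\) are. As a result, \(Y_t\) and \(Y_s\) covary only through the errors they share. For any two time points, say t and s, such that \(1 \le t \le s\), the shared errors are \(e_1, \dots, e_t\), so the covariance is also \(t\sigma_{e}^2\).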

The covariance between time points 10 and 11 should be about 10 * 4 = 40.

cov(rwalks[10,], rwalks[11,])
## [1] 39.92179

Likewise, the covariance between time points 10 and 85 should also be about 10 * 4 = 40.

cov(rwalks[10,], rwalks[85,])
## [1] 40.19847
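
To see where the covariance result comes from, here is a sketch of the derivation for \(1 \le t \le s\). Expand both time points as sums of errors:

\[\text{Cov}(Y_t, Y_s) = \text{Cov}\left(\sum_{i=1}^{t} e_i, \sum_{j=1}^{s} e_j\right) = \sum_{i=1}^{t}\sum_{j=1}^{s}\text{Cov}(e_i, e_j)\]

Since the errors are independent, \(\text{Cov}(e_i, e_j)\) is \(\sigma_{e}^2\) when \(i = j\) and 0 otherwise. With \(t \le s\) there are exactly \(t\) matching pairs, which leaves:

\[\text{Cov}(Y_t, Y_s) = t\sigma_{e}^2\]

The later time point s drops out entirely, which is why the covariance of time 10 with time 11 and with time 85 are both about 40.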

Autocorrelation

The autocorrelation between two time points, t and s, such that \(1 \le t \le s\) is as follows:

\[\rho_{t, s} = \frac{\text{Cov}(Y_t,Y_s)}{\sqrt{\text{Cov}(Y_t,Y_t)\,\text{Cov}(Y_s,Y_s)}}\]

\[\rho_{t, s} = \frac{t\sigma_{e}^2}{\sqrt{t\sigma_{e}^2 s\sigma_{e}^2}} \]

Which simplifies to:

\[\rho_{t, s} = \frac{t\sigma_{e}^2}{\sqrt{t\sigma_{e}^2 s\sigma_{e}^2}} \frac{\sqrt{t\sigma_{e}^2}}{\sqrt{t\sigma_{e}^2}} = \frac{t\sigma_{e}^2\sqrt{ t\sigma_{e}^2}}{t\sigma_{e}^2\sqrt{s\sigma_{e}^2}} = \sqrt{\frac{t}{s}}\]

Therefore the autocorrelation between time points 10 and 11 should theoretically be \(\sqrt{10/11} \approx 0.953\).

cor(rwalks[10,], rwalks[11,])
## [1] 0.9532757

Likewise, the autocorrelation between time points 10 and 85 should theoretically be \(\sqrt{10/85} \approx 0.343\).

cor(rwalks[10,], rwalks[85,])
## [1] 0.3446132

The values of Y at neighboring time points become more and more strongly and positively correlated as time goes by, since \(\sqrt{t/(t+1)}\) approaches 1 as t grows:

cor(rwalks[1,], rwalks[2,])
## [1] 0.7072922
cor(rwalks[20,], rwalks[21,])
## [1] 0.9758825
cor(rwalks[50,], rwalks[51,])
## [1] 0.9902071

We can visualize this using a simple for loop:

corrs <- numeric(99)
for(i in 1:99){
  corrs[i] <- cor(rwalks[i,], rwalks[i + 1,])
}
plot(corrs, type = "l")
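
The simulated lag-one correlations can also be compared directly with the theoretical values \(\sqrt{t/(t+1)}\). This is a self-contained sketch that assumes 10,000 freshly simulated walks; the seed and the names sims, corr_hat, corr_theory and max_abs_diff are hypothetical choices of mine.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Each column of sims is one random walk of length 100 with error sd = 2.
sims <- replicate(n = 1e4, cumsum(rnorm(n = 100, mean = 0, sd = 2)))

t_idx <- 1:99

# Simulated correlation between time t and time t + 1.
corr_hat <- sapply(t_idx, function(i) cor(sims[i, ], sims[i + 1, ]))

# Theoretical correlation between time t and time t + 1.
corr_theory <- sqrt(t_idx / (t_idx + 1))

# Largest absolute gap between simulation and theory.
max_abs_diff <- max(abs(corr_hat - corr_theory))

# First few time points, simulated next to theoretical.
round(head(cbind(t = t_idx, simulated = corr_hat, theory = corr_theory)), 3)
```

The first theoretical value is \(\sqrt{1/2} \approx 0.707\), in line with the correlation between time points 1 and 2 shown earlier.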

The values of Y at distant time points are less and less correlated:

cor(rwalks[1,], rwalks[2,])
## [1] 0.7072922
cor(rwalks[1,], rwalks[12,])
## [1] 0.2909879
cor(rwalks[1,], rwalks[30,])
## [1] 0.1827475

Again, pretty easy to visualize:

corrs <- numeric(99)
for(i in 1:99){
  corrs[i] <- cor(rwalks[1,], rwalks[i + 1,])
}
plot(corrs, type = "l")
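
Both patterns can be checked together by computing the correlation between every pair of time points and comparing it with \(\sqrt{t/s}\) for \(t \le s\). The sketch below is self-contained and again assumes 10,000 simulated walks; the seed and the names sims, corr_hat, corr_theory and max_abs_diff are hypothetical choices of mine.

```r
set.seed(4)  # arbitrary seed for reproducibility

# Each column of sims is one random walk of length 100 with error sd = 2.
sims <- replicate(n = 1e4, cumsum(rnorm(n = 100, mean = 0, sd = 2)))

# cor() correlates columns, and time points are rows here, so transpose first.
corr_hat <- cor(t(sims))  # 100 x 100 matrix of simulated correlations

# Theoretical correlation: sqrt(smaller time / larger time).
time_idx <- 1:100
corr_theory <- outer(time_idx, time_idx,
                     function(a, b) sqrt(pmin(a, b) / pmax(a, b)))

# Largest absolute gap over all pairs of time points.
max_abs_diff <- max(abs(corr_hat - corr_theory))

# Time 1 against times 2, 12 and 30: simulated, then theoretical.
round(corr_hat[1, c(2, 12, 30)], 3)
round(corr_theory[1, c(2, 12, 30)], 3)
```

Using pmin() and pmax() keeps the theoretical matrix symmetric, so the formula does not need the \(t \le s\) ordering to be enforced by hand.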

Since variance increases over time and the correlation between values nearby in time is close to 1, we should expect long excursions from the mean level of 0. That’s precisely what we see in these nine random walks.

op <- par(mfrow = c(3,3), mar = c(2,2,1,2) + 0.1)
for(i in 1:9) plot(rwalks[,i], type = "l", ylab = "")

par(op)
