Day 1 Part 3: Random Walks

Author

Dr Christian Engels

Last updated

November 16, 2024

Load Libraries

Load necessary libraries for data manipulation, finance data, and time-series analysis. These libraries will enable us to generate, visualize, and analyze random walks, which are fundamental to understanding stochastic processes in time series econometrics.

library(tidyverse)
library(tidyfinance)
library(tsibble)
library(fable)
library(feasts)
library(scales)
library(fixest)

Illustrating a Random Walk

Generate and visualize a single random walk. A random walk is a simple but important stochastic process used in time series analysis to model variables whose changes are entirely unpredictable. Here, we simulate a single random walk to understand its behavior.

Mathematically, a random walk can be represented as:

y_t = y_{t-1} + e_t

where e_t is white noise, typically assumed to be normally distributed with mean zero and constant variance. The difference between consecutive values, y_t - y_{t-1} = e_t, is white noise, which implies that the changes in a random walk are completely random and unpredictable.

steps <- 100
e <- rnorm(steps)
y <- e
coef <- 1
drift = 0

plot(e, type="l")

for (i in 2:steps) {
  y[i] <- drift + coef*y[i-1] + e[i]
}

plot(y, type="l")

Multiple Random Walks

Generate and visualize multiple random walks. Generating multiple random walks allows us to explore the variability between different realizations of the process, illustrating the concept of stochasticity and highlighting the random nature of these paths.

n_random_walks <- 30

create_random_walk <- 
  function(random_walk_no, steps = 100, n_random_walks = 30) {
    
    innovations <- 
      tibble(
        random_walk_no, 
        step = 1:steps,
        e = rnorm(steps)
      ) 
    
    random_walk <- 
      innovations %>% 
        mutate(random_walk = cumsum(e))
    
    return(random_walk)
  }

random_walks <- 
  1:n_random_walks %>%  
    map(~create_random_walk(.)) %>% 
    list_rbind()

random_walks %>% 
  mutate(random_walk_no = factor(random_walk_no, 1:n_random_walks)) %>% 
  ggplot(aes(x = step, y = random_walk, color = random_walk_no)) +
  geom_line()

Analysis of a Single Random Walk

Analyze a single random walk with a lag variable. Here, we analyze the properties of a single random walk by estimating a regression with a lagged variable. This helps us understand if past values of the walk can predict future values, which is key to determining whether the series is truly a random walk.

random_walk_1 <- 
  random_walks %>% 
  filter(random_walk_no == 1) %>% 
  mutate(random_walk_lag = lag(random_walk)) %>% 
  remove_missing() %>% 
  print()
Warning: Removed 1 row containing missing values or values outside the scale
range.
# A tibble: 99 × 5
   random_walk_no  step       e random_walk random_walk_lag
            <int> <int>   <dbl>       <dbl>           <dbl>
 1              1     2 -0.395      -1.06           -0.667 
 2              1     3  0.206      -0.855          -1.06  
 3              1     4  1.15        0.299          -0.855 
 4              1     5 -0.0244      0.275           0.299 
 5              1     6 -1.07       -0.794           0.275 
 6              1     7  0.806       0.0124         -0.794 
 7              1     8 -0.516      -0.504           0.0124
 8              1     9  0.414      -0.0899         -0.504 
 9              1    10  0.128       0.0379         -0.0899
10              1    11  0.297       0.335           0.0379
# ℹ 89 more rows

Next, we estimate the following regression equation:

y_t = \beta y_{t-1} + \epsilon_t

where y_t is the value of the random walk at time t, y_{t-1} is the lagged value at time t-1, \beta is the estimated coefficient, and \epsilon_t is the residual (error term). In this case, there is no intercept term, which is why the regression formula includes -1 to remove the intercept.

est <- feols(random_walk ~ random_walk_lag - 1, data = random_walk_1)
etable(est)
                              est
Dependent Var.:       random_walk
                                 
random_walk_lag 1.013*** (0.0228)
_______________ _________________
S.E. type                     IID
Observations                   99
R2                        0.88212
Adj. R2                   0.88212
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the estimate of \beta is less than one due to the finite sample bias associated with unit root processes. This bias, often called the “small sample bias,” arises because of the non-stationary nature of a unit root process, causing the estimated \beta to be slightly lower than its true value of 1 in most realizations.

Monte Carlo Simulation of Random Walks

Simulate a large number of random walks and estimate coefficients. Monte Carlo simulations allow us to repeatedly generate random walks and estimate their coefficients to understand the distribution of these coefficients. This is important for understanding the statistical properties of random walks and verifying theoretical expectations.

random_walks <- 
  1:10^3 %>%  
  map(~create_random_walk(.)) %>% 
  list_rbind() %>% 
  group_by(random_walk_no) %>% 
  mutate(random_walk_lag = lag(random_walk)) %>% 
  remove_missing() %>% 
  nest() %>% 
  mutate(
    est = map(data, ~feols(random_walk ~ random_walk_lag - 1, data = .)),
    coef = map(est, ~.$coefficients)
  ) %>% 
  unnest(cols = coef) %>% 
  ungroup()
Warning: Removed 1000 rows containing missing values or values outside the
scale range.
random_walks
# A tibble: 1,000 × 4
   random_walk_no data              est       coef
            <int> <list>            <list>   <dbl>
 1              1 <tibble [99 × 4]> <fixest> 0.930
 2              2 <tibble [99 × 4]> <fixest> 1.01 
 3              3 <tibble [99 × 4]> <fixest> 0.895
 4              4 <tibble [99 × 4]> <fixest> 0.971
 5              5 <tibble [99 × 4]> <fixest> 0.985
 6              6 <tibble [99 × 4]> <fixest> 0.995
 7              7 <tibble [99 × 4]> <fixest> 0.990
 8              8 <tibble [99 × 4]> <fixest> 0.979
 9              9 <tibble [99 × 4]> <fixest> 0.973
10             10 <tibble [99 × 4]> <fixest> 1.01 
# ℹ 990 more rows

Summary of Coefficients

Summarize and visualize the distribution of estimated coefficients. Summarizing the coefficients from the Monte Carlo simulations allows us to visualize the typical behavior of the estimated relationships in random walks, such as their average and variability. This can help us draw conclusions about the nature of randomness and persistence in these series.

coef_mean <- random_walks %>% summarise(coef_mean = mean(coef)) %>% pull(coef_mean)

random_walks %>% 
  ggplot(aes(coef)) + 
  geom_density() + 
  geom_vline(xintercept = coef_mean, linetype = "dashed", color = "red") +
  geom_vline(xintercept = 1, linetype = "twodash", color = "navy") +
  scale_x_continuous(
    breaks = c(coef_mean, 1),
    labels = c("coef_mean" = "mean estimate", "1" = "1 (true)")
  )

Exercises

Adapt the following code for the process

y_t = \alpha + \beta y_{t-1} + \epsilon_{t}

and plot the process for each of these cases. Pick your own values for \alpha and \beta that satisfy these cases.

  1. Set \alpha>0 and \beta=1
  2. Set \alpha<0 and \beta=1
  3. Set \alpha<0 and -1 < \beta < 1
  4. Set \alpha=0 and \beta>1
n_steps <- 100
epsilon <- rnorm(n_steps)
y <- rep(0, n_steps)
beta <- 1
alpha <- 0

for (i in 2:steps) {
  y[i] <- alpha + beta*y[i-1] + epsilon[i]
}

plot(y, type="l")

Process 3. is “explosive” and 4. is “convergent”. From the plots you have created, note down observations why this is the case.