Day 1 Part 3: Random Walks

Load Libraries

Load necessary libraries for data manipulation, finance data, and time-series analysis. These libraries will enable us to generate, visualize, and analyze random walks, which are fundamental to understanding stochastic processes in time series econometrics.

library(tidyverse)
library(tidyfinance)
library(tsibble)
library(fable)
library(feasts)
library(scales)
library(fixest)
Illustrating a Random Walk
Generate and visualize a single random walk. A random walk is a simple but important stochastic process used in time series analysis to model variables whose changes are entirely unpredictable. Here, we simulate a single random walk to understand its behavior.
Mathematically, a random walk can be represented as:
y_t = y_{t-1} + e_t
where e_t is white noise, typically assumed to be normally distributed with mean zero and constant variance. The difference between consecutive values, y_t - y_{t-1} = e_t, is white noise, which implies that the changes in a random walk are completely random and unpredictable.
steps <- 100
e <- rnorm(steps)   # white-noise innovations
y <- e              # y[1] = e[1]; later entries are overwritten in the loop
coef <- 1           # beta = 1, the unit root case
drift <- 0          # no drift

plot(e, type = "l")

for (i in 2:steps) {
  y[i] <- drift + coef * y[i-1] + e[i]
}

plot(y, type = "l")
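Since y_t - y_{t-1} = e_t, differencing the simulated path should recover the innovations exactly, and the differenced series should look like white noise. A quick check, assuming the objects y, e, and steps from the chunk above are still in memory:

all.equal(diff(y), e[2:steps])   # TRUE: the first difference equals the innovations

acf(diff(y))                     # no autocorrelation beyond lag 0, as expected for white noise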
Multiple Random Walks
Generate and visualize multiple random walks. Generating multiple random walks allows us to explore the variability between different realizations of the process, illustrating the concept of stochasticity and highlighting the random nature of these paths.
n_random_walks <- 30

# Simulate one random walk of a given length and return it as a tibble
create_random_walk <- function(random_walk_no, steps = 100, n_random_walks = 30) {
  innovations <-
    tibble(
      random_walk_no,
      step = 1:steps,
      e = rnorm(steps)
    )

  random_walk <-
    innovations %>%
    mutate(random_walk = cumsum(e))

  return(random_walk)
}

random_walks <-
  1:n_random_walks %>%
  map(~create_random_walk(.)) %>%
  list_rbind()

random_walks %>%
  mutate(random_walk_no = factor(random_walk_no, 1:n_random_walks)) %>%
  ggplot(aes(x = step, y = random_walk, color = random_walk_no)) +
  geom_line()
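The fan shape of the plot reflects a key property of random walks: Var(y_t) = t \sigma^2, so the cross-sectional spread of the paths grows with \sqrt{t}. A rough check on the simulated paths (a sketch using the random_walks tibble above; with only 30 walks the estimate is noisy):

random_walks %>%
  group_by(step) %>%
  summarise(sd_across_walks = sd(random_walk)) %>%
  ggplot(aes(x = step, y = sd_across_walks)) +
  geom_line() +
  geom_function(fun = sqrt, linetype = "dashed", color = "red")   # theoretical sqrt(t), since sigma = 1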
Analysis of a Single Random Walk
Analyze a single random walk with a lag variable. Here, we analyze the properties of a single random walk by estimating a regression with a lagged variable. This helps us understand if past values of the walk can predict future values, which is key to determining whether the series is truly a random walk.
random_walk_1 <-
  random_walks %>%
  filter(random_walk_no == 1) %>%
  mutate(random_walk_lag = lag(random_walk)) %>%
  remove_missing() %>%
  print()
Warning: Removed 1 row containing missing values or values outside the scale
range.
# A tibble: 99 × 5
   random_walk_no  step       e random_walk random_walk_lag
            <int> <int>   <dbl>       <dbl>           <dbl>
 1              1     2 -0.395      -1.06           -0.667
 2              1     3  0.206      -0.855          -1.06
 3              1     4  1.15        0.299          -0.855
 4              1     5 -0.0244      0.275           0.299
 5              1     6 -1.07       -0.794           0.275
 6              1     7  0.806       0.0124         -0.794
 7              1     8 -0.516      -0.504           0.0124
 8              1     9  0.414      -0.0899         -0.504
 9              1    10  0.128       0.0379         -0.0899
10              1    11  0.297       0.335           0.0379
# ℹ 89 more rows
Next, we estimate the following regression equation:
y_t = \beta y_{t-1} + \epsilon_t
where y_t is the value of the random walk at time t, y_{t-1} is the lagged value at time t-1, \beta is the estimated coefficient, and \epsilon_t is the residual (error term). In this case, there is no intercept term, which is why the regression formula includes -1 to remove the intercept.
est <- feols(random_walk ~ random_walk_lag - 1, data = random_walk_1)
etable(est)

                              est
Dependent Var.:       random_walk

random_walk_lag 1.013*** (0.0228)
_______________ _________________
S.E. type                     IID
Observations                   99
R2                        0.88212
Adj. R2                   0.88212
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that in this particular realization the estimate of \beta happens to lie slightly above one; in most realizations, however, the estimate falls below one due to the finite sample bias associated with unit root processes. This bias, often called the “small sample bias,” arises because of the non-stationary nature of a unit root process, and it causes the estimated \beta to be slightly lower than its true value of 1 on average, as the Monte Carlo simulation below demonstrates.
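As a complementary check, a formal stationarity test can be applied to the simulated series. Since feasts is already loaded, one option is the KPSS test, whose null hypothesis is stationarity, so for a random walk we would expect a small p-value. A minimal sketch, assuming the random_walk_1 tibble from above (the conversion to a tsibble indexed by step is only needed for the features() interface):

random_walk_1 %>%
  as_tsibble(index = step) %>%
  features(random_walk, unitroot_kpss)   # a small kpss_pvalue points to non-stationarity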
Monte Carlo Simulation of Random Walks
Simulate a large number of random walks and estimate coefficients. Monte Carlo simulations allow us to repeatedly generate random walks and estimate their coefficients to understand the distribution of these coefficients. This is important for understanding the statistical properties of random walks and verifying theoretical expectations.
random_walks <-
  1:10^3 %>%
  map(~create_random_walk(.)) %>%
  list_rbind() %>%
  group_by(random_walk_no) %>%
  mutate(random_walk_lag = lag(random_walk)) %>%
  remove_missing() %>%
  nest() %>%
  mutate(
    est = map(data, ~feols(random_walk ~ random_walk_lag - 1, data = .)),
    coef = map(est, ~.$coefficients)
  ) %>%
  unnest(cols = coef) %>%
  ungroup()
Warning: Removed 1000 rows containing missing values or values outside the
scale range.
random_walks
# A tibble: 1,000 × 4
   random_walk_no data              est       coef
            <int> <list>            <list>   <dbl>
 1              1 <tibble [99 × 4]> <fixest> 0.930
 2              2 <tibble [99 × 4]> <fixest> 1.01
 3              3 <tibble [99 × 4]> <fixest> 0.895
 4              4 <tibble [99 × 4]> <fixest> 0.971
 5              5 <tibble [99 × 4]> <fixest> 0.985
 6              6 <tibble [99 × 4]> <fixest> 0.995
 7              7 <tibble [99 × 4]> <fixest> 0.990
 8              8 <tibble [99 × 4]> <fixest> 0.979
 9              9 <tibble [99 × 4]> <fixest> 0.973
10             10 <tibble [99 × 4]> <fixest> 1.01
# ℹ 990 more rows
Summary of Coefficients
Summarize and visualize the distribution of estimated coefficients. Summarizing the coefficients from the Monte Carlo simulations allows us to visualize the typical behavior of the estimated relationships in random walks, such as their average and variability. This can help us draw conclusions about the nature of randomness and persistence in these series.
coef_mean <- random_walks %>% summarise(coef_mean = mean(coef)) %>% pull(coef_mean)

random_walks %>%
  ggplot(aes(coef)) +
  geom_density() +
  geom_vline(xintercept = coef_mean, linetype = "dashed", color = "red") +
  geom_vline(xintercept = 1, linetype = "twodash", color = "navy") +
  scale_x_continuous(
    breaks = c(coef_mean, 1),
    labels = c("mean estimate", "1 (true)")
  )
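To complement the plot, the bias can be quantified directly, for example via the share of estimated coefficients that fall below the true value of one:

random_walks %>%
  summarise(
    coef_mean       = mean(coef),
    coef_sd         = sd(coef),
    share_below_one = mean(coef < 1)   # fraction of estimates below the true beta of 1
  )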
Exercises
Adapt the following code for the process
y_t = \alpha + \beta y_{t-1} + \epsilon_{t}
and plot the process for each of these cases. Pick your own values for \alpha and \beta that satisfy these cases.
- Set \alpha>0 and \beta=1
- Set \alpha<0 and \beta=1
- Set \alpha<0 and -1 < \beta < 1
- Set \alpha=0 and \beta>1
n_steps <- 100
epsilon <- rnorm(n_steps)
y <- rep(0, n_steps)
beta <- 1
alpha <- 0

for (i in 2:n_steps) {
  y[i] <- alpha + beta * y[i-1] + epsilon[i]
}

plot(y, type = "l")
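To avoid copying the template four times, you can wrap it in a small function. The sketch below is one way to do this; simulate_process is an illustrative name and the parameter values in the example call are placeholders, so pick your own:

simulate_process <- function(alpha, beta, n_steps = 100) {
  epsilon <- rnorm(n_steps)
  y <- rep(0, n_steps)
  for (i in 2:n_steps) {
    y[i] <- alpha + beta * y[i-1] + epsilon[i]
  }
  y
}

# Example: case 1, alpha > 0 and beta = 1 (a random walk with drift)
plot(simulate_process(alpha = 0.5, beta = 1), type = "l")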
Process 3. is “convergent” and 4. is “explosive”. From the plots you have created, note down observations on why this is the case.