Day 1 Part 1: Analysis with Tidyverse

Zero to Hero Bootcamp - Time Series Econometrics in R

Author

Dr Christian Engels

Last updated

November 16, 2024

Load Libraries

Load the packages used in this session: tidyverse for data manipulation and plotting, tidyfinance for downloading financial and economic data, tsibble for storing data as tidy time series objects, and fable for time series modelling and forecasting.

library(tidyverse)
library(tidyfinance)
library(tsibble)
library(fable)

Download Data

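Download and print two series from FRED (Federal Reserve Economic Data), the economic time series repository maintained by the Federal Reserve Bank of St. Louis: GDP (nominal gross domestic product in billions of dollars, quarterly) and CPIAUCNS (Consumer Price Index for All Urban Consumers, monthly, not seasonally adjusted). The result is in long format, with one row per date and series, and serves as the basis for exploring long-run economic trends.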

fred <- download_data("fred", series = c("GDP", "CPIAUCNS"))
No `start_date` or `end_date` provided. Returning the full data set.
fred
# A tibble: 1,653 × 3
   date       value series
   <date>     <dbl> <chr> 
 1 1947-01-01  243. GDP   
 2 1947-04-01  246. GDP   
 3 1947-07-01  250. GDP   
 4 1947-10-01  260. GDP   
 5 1948-01-01  266. GDP   
 6 1948-04-01  273. GDP   
 7 1948-07-01  279. GDP   
 8 1948-10-01  280. GDP   
 9 1949-01-01  275. GDP   
10 1949-04-01  271. GDP   
# ℹ 1,643 more rows

Summary of Data

Use glimpse() for a compact overview of the FRED data. It lists every column with its type and first few values, which is a quick way to verify data types and the overall structure of the dataset.

fred %>% glimpse()
Rows: 1,653
Columns: 3
$ date   <date> 1947-01-01, 1947-04-01, 1947-07-01, 1947-10-01, 1948-01-01, 19…
$ value  <dbl> 243.164, 245.968, 249.585, 259.745, 265.742, 272.567, 279.196, …
$ series <chr> "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", …
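Because glimpse() only shows the first few values, it will not reliably reveal missing data. A minimal sketch of an explicit check, counting observations and missing values per series, is shown below. It runs on a hypothetical toy tibble (fred_toy) that mimics the columns of fred using values printed above, with one NA inserted deliberately for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy stand-in for `fred`: same columns (date, value, series),
# first few values printed above, plus one deliberately inserted NA
fred_toy <- tibble(
  date   = as.Date(c("1947-01-01", "1947-04-01", "1947-07-01",
                     "1913-01-01", "1913-02-01", "1913-03-01")),
  value  = c(243.164, 245.968, NA, 9.8, 9.8, 9.8),
  series = c("GDP", "GDP", "GDP", "CPIAUCNS", "CPIAUCNS", "CPIAUCNS")
)

# Count rows and missing values in `value` for each series
na_check <- fred_toy %>%
  group_by(series) %>%
  summarise(
    n_obs = n(),
    n_missing = sum(is.na(value)),
    .groups = "drop"
  )

na_check
```

On the real data, the same pipeline applied to fred gives one row per series.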

Filter for CPIAUCNS Series

Filter the FRED data to keep only the CPIAUCNS series. The CPI measures the price level of a basket of consumer goods and services; its rate of change is the inflation rate, a key indicator of price stability.

fred %>% filter(series == "CPIAUCNS")
# A tibble: 1,342 × 3
   date       value series  
   <date>     <dbl> <chr>   
 1 1913-01-01   9.8 CPIAUCNS
 2 1913-02-01   9.8 CPIAUCNS
 3 1913-03-01   9.8 CPIAUCNS
 4 1913-04-01   9.8 CPIAUCNS
 5 1913-05-01   9.7 CPIAUCNS
 6 1913-06-01   9.8 CPIAUCNS
 7 1913-07-01   9.9 CPIAUCNS
 8 1913-08-01   9.9 CPIAUCNS
 9 1913-09-01  10   CPIAUCNS
10 1913-10-01  10   CPIAUCNS
# ℹ 1,332 more rows
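filter() accepts several conditions at once; conditions separated by commas must all hold. The sketch below restricts the CPI rows to a date window. It uses a hypothetical toy tibble built from values printed above, and the window boundaries are arbitrary choices for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy stand-in for `fred`, built from values printed above
fred_toy <- tibble(
  date   = as.Date(c("1947-01-01", "1947-04-01",
                     "1913-01-01", "1913-05-01", "1913-09-01", "1913-10-01")),
  value  = c(243.164, 245.968, 9.8, 9.7, 10, 10),
  series = c("GDP", "GDP", rep("CPIAUCNS", 4))
)

# Comma-separated conditions are combined with AND:
# keep CPI rows dated from May to September 1913 (both ends inclusive)
cpi_window <- fred_toy %>%
  filter(
    series == "CPIAUCNS",
    date >= as.Date("1913-05-01"),
    date <= as.Date("1913-09-01")
  )

cpi_window
```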

Summary Statistics for CPIAUCNS

Calculate the earliest date, the latest date, and the mean value of the CPIAUCNS series. The first two statistics show the temporal coverage of the data; the mean gives the average index level over the whole observed period.

fred %>% 
  filter(series == "CPIAUCNS") %>% 
  summarise(
    min(date), 
    max(date), 
    mean(value)
  )
# A tibble: 1 × 3
  `min(date)` `max(date)` `mean(value)`
  <date>      <date>              <dbl>
1 1913-01-01  2024-10-01           88.9

Repeat the summary, but name the output columns explicitly (date_min, date_max, value_mean). Named columns are easier to read and to refer to in later code than auto-generated names such as `min(date)`, which is useful for documentation and reporting.

fred %>% 
  filter(series == "CPIAUCNS") %>% 
  summarise(
    date_min = min(date),
    date_max = max(date),
    value_mean = mean(value)
  )
# A tibble: 1 × 3
  date_min   date_max   value_mean
  <date>     <date>          <dbl>
1 1913-01-01 2024-10-01       88.9
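When several functions are applied to the same column, across() can generate the names automatically: with a named list of functions, its default naming pattern "{.col}_{.fn}" produces date_min and date_max. A minimal sketch, assuming a hypothetical toy tibble of five monthly CPI values from the output printed earlier:

```r
library(dplyr)
library(tibble)

# Hypothetical toy stand-in for the CPIAUCNS rows (May-Sep 1913 values printed above)
cpi_toy <- tibble(
  date  = as.Date(c("1913-05-01", "1913-06-01", "1913-07-01",
                    "1913-08-01", "1913-09-01")),
  value = c(9.7, 9.8, 9.9, 9.9, 10)
)

# across() applies min and max to `date`; names follow "{.col}_{.fn}"
cpi_summary <- cpi_toy %>%
  summarise(
    across(date, list(min = min, max = max)),
    value_mean = mean(value)
  )

cpi_summary
```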

Summary for All Series

Group by series to compute the minimum date, maximum date, and mean value for each series separately. This gives a quick comparison of temporal coverage: CPIAUCNS starts in 1913, GDP only in 1947. The means themselves are not directly comparable, because GDP is measured in billions of dollars and CPIAUCNS is an index.

fred %>% 
  group_by(series) %>% 
  summarise(
    date_min = min(date),
    date_max = max(date),
    value_mean = mean(value)
  )
# A tibble: 2 × 4
  series   date_min   date_max   value_mean
  <chr>    <date>     <date>          <dbl>
1 CPIAUCNS 1913-01-01 2024-10-01       88.9
2 GDP      1947-01-01 2024-07-01     7380. 
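The two series also differ in frequency: GDP is quarterly and CPIAUCNS is monthly. A minimal sketch that makes this visible by counting observations per calendar year within each series is shown below. It uses a hypothetical toy tibble holding only dates (two years of quarterly and two years of monthly dates), since the values play no role in the count.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical toy stand-in for `fred`: only `date` and `series`,
# two years of quarterly GDP dates and two years of monthly CPI dates
fred_toy <- bind_rows(
  tibble(date = seq(as.Date("1947-01-01"), by = "quarter", length.out = 8),
         series = "GDP"),
  tibble(date = seq(as.Date("1913-01-01"), by = "month", length.out = 24),
         series = "CPIAUCNS")
)

# Observations per calendar year reveal each series' sampling frequency
freq_check <- fred_toy %>%
  group_by(series) %>%
  summarise(
    n_obs = n(),
    n_years = n_distinct(year(date)),
    obs_per_year = n_obs / n_years
  )

freq_check
```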

Yearly Mean Value

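Calculate the yearly mean of each series by grouping on the series and on the calendar year extracted with year(). Yearly averaging smooths out short-term (monthly or quarterly) fluctuations, reveals longer-term trends, and puts both series on the same frequency. The message printed below only reports that the result is still grouped by series. Note that the last year in the data, 2024, is incomplete, so its mean is based on fewer observations than the other years.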

fred %>% 
  group_by(series, year = year(date)) %>% 
  summarise(value_mean = mean(value))
`summarise()` has grouped output by 'series'. You can override using the
`.groups` argument.
# A tibble: 190 × 3
# Groups:   series [2]
   series    year value_mean
   <chr>    <dbl>      <dbl>
 1 CPIAUCNS  1913       9.88
 2 CPIAUCNS  1914      10.0 
 3 CPIAUCNS  1915      10.1 
 4 CPIAUCNS  1916      10.9 
 5 CPIAUCNS  1917      12.8 
 6 CPIAUCNS  1918      15.0 
 7 CPIAUCNS  1919      17.3 
 8 CPIAUCNS  1920      20.0 
 9 CPIAUCNS  1921      17.8 
10 CPIAUCNS  1922      16.8 
# ℹ 180 more rows
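The grouping message can be avoided by stating explicitly what should happen to the groups. The sketch below shows two options on a hypothetical toy tibble of the first six GDP values printed earlier: .groups = "drop" in summarise(), and per-operation grouping with the .by argument, which requires dplyr 1.1.0 or later. With four quarters, the 1947 mean reproduces the 249.6155 shown for 1947 later in this section.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical toy stand-in for `fred` (first six GDP values printed above)
fred_toy <- tibble(
  date   = as.Date(c("1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01",
                     "1948-01-01", "1948-04-01")),
  value  = c(243.164, 245.968, 249.585, 259.745, 265.742, 272.567),
  series = "GDP"
)

# Option 1: request an ungrouped result explicitly (no message)
yearly_drop <- fred_toy %>%
  group_by(series, year = year(date)) %>%
  summarise(value_mean = mean(value), .groups = "drop")

# Option 2 (dplyr >= 1.1.0): temporary grouping via .by;
# the grouping column must already exist, hence the mutate()
yearly_by <- fred_toy %>%
  mutate(year = year(date)) %>%
  summarise(value_mean = mean(value), .by = c(series, year))

yearly_drop
```

The toy 1948 mean covers only two quarters, the same partial-year caveat that applies to 2024 in the real data.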

Pivot Yearly Mean Data

Pivot the yearly means to wide format, with one column per series, so that GDP and CPI for the same year sit side by side. This makes it easier to compare the two indicators and analyze their relationship. Years before 1947 have no GDP data and therefore show NA.

fred %>% 
  group_by(series, year = year(date)) %>% 
  summarise(value_mean = mean(value)) %>% 
  pivot_wider(
    id_cols = year,
    names_from = series,
    values_from = value_mean
  )
`summarise()` has grouped output by 'series'. You can override using the
`.groups` argument.
# A tibble: 112 × 3
    year CPIAUCNS   GDP
   <dbl>    <dbl> <dbl>
 1  1913     9.88    NA
 2  1914    10.0     NA
 3  1915    10.1     NA
 4  1916    10.9     NA
 5  1917    12.8     NA
 6  1918    15.0     NA
 7  1919    17.3     NA
 8  1920    20.0     NA
 9  1921    17.8     NA
10  1922    16.8     NA
# ℹ 102 more rows
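pivot_longer() performs the reverse operation, turning the series columns back into rows. A minimal sketch, assuming a hypothetical toy wide table with the same columns as above and (rounded) yearly means printed in this section; values_drop_na = TRUE skips the NA cells, such as GDP before 1947.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy wide table (year, CPIAUCNS, GDP) with rounded printed values
wide_toy <- tibble(
  year     = c(1913, 1914, 1947, 1948),
  CPIAUCNS = c(9.88, 10.0, 22.3, 24.0),
  GDP      = c(NA, NA, 250, 274)
)

# One row per year-series pair; NA cells are dropped
long_toy <- wide_toy %>%
  pivot_longer(
    cols = -year,
    names_to = "series",
    values_to = "value_mean",
    values_drop_na = TRUE
  )

long_toy
```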

Summarize and Reshape Yearly Data

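Combine the previous steps: compute yearly averages for each series, pivot so that each series becomes a column, store the result as fred_yearly, and remove rows with missing values. This keeps only the years in which both series are observed (1947 onward). Here remove_missing() is a helper from ggplot2 (loaded with tidyverse); it drops every row that contains an NA and reports the number of removed rows in the warning shown below. A complete dataset avoids issues during later statistical modelling.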

fred_yearly <- 
  fred %>% 
  group_by(series, year = year(date)) %>% 
  summarise(value_mean = mean(value)) %>% 
  pivot_wider(
    id_cols = year,
    names_from = series,
    values_from = value_mean
  ) %>% 
  remove_missing()
`summarise()` has grouped output by 'series'. You can override using the
`.groups` argument.
Warning: Removed 34 rows containing missing values or values outside the scale
range.
fred_yearly
# A tibble: 78 × 3
    year CPIAUCNS   GDP
   <dbl>    <dbl> <dbl>
 1  1947     22.3  250.
 2  1948     24.0  274.
 3  1949     23.8  272.
 4  1950     24.1  300.
 5  1951     26.0  347.
 6  1952     26.6  367.
 7  1953     26.8  389.
 8  1954     26.8  391.
 9  1955     26.8  425.
10  1956     27.2  449.
# ℹ 68 more rows
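In tidyr, the idiomatic way to drop incomplete rows is drop_na(): without column arguments it removes every row containing an NA, which matches the remove_missing() step above, but it does not print a warning. It can also be restricted to selected columns. A minimal sketch on a hypothetical toy wide table with rounded yearly means printed earlier:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy wide table (year, CPIAUCNS, GDP) with rounded printed values
wide_toy <- tibble(
  year     = c(1913, 1914, 1947, 1948),
  CPIAUCNS = c(9.88, 10.0, 22.3, 24.0),
  GDP      = c(NA, NA, 250, 274)
)

# Drop rows with an NA in any column: only years with both series remain
complete_toy <- wide_toy %>% drop_na()

# Only require CPIAUCNS to be present
cpi_only <- wide_toy %>% drop_na(CPIAUCNS)

complete_toy
```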

Inspect GDP Data

Select the year and GDP columns and store them as GDP, focusing on this key economic indicator in preparation for trend analysis and forecasting. Because glimpse() prints its overview and then returns its input invisibly, it can sit at the end of the pipe without changing what is assigned.

GDP <- 
  fred_yearly %>% 
  select(year, GDP) %>% 
  glimpse()
Rows: 78
Columns: 2
$ year <dbl> 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957,…
$ GDP  <dbl> 249.6155, 274.4678, 272.4753, 299.8272, 346.9133, 367.3408, 389.2…

Plot GDP Over Time

Convert the GDP data to a tsibble (a tidy time series object) with year as the time index, and plot it with autoplot(). Visualizing GDP over time helps identify trends, cycles, or structural breaks, an important preliminary step before formal econometric modelling.

GDP %>% 
  as_tsibble(index = year) %>% 
  autoplot(.vars = GDP)
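Before modelling a tsibble, it is worth checking that the index is regular and has no missing periods. The sketch below does this with the tsibble functions is_regular(), has_gaps(), and fill_gaps() on a hypothetical toy version of GDP (yearly means for 1947 to 1951 printed above). It then removes 1949 on purpose to show how a gap is detected and made explicit as an NA row.

```r
library(dplyr)
library(tibble)
library(tsibble)

# Hypothetical toy version of `GDP` (yearly means printed above, 1947-1951)
gdp_toy <- tibble(
  year = c(1947, 1948, 1949, 1950, 1951),
  GDP  = c(249.6155, 274.4678, 272.4753, 299.8272, 346.9133)
)

gdp_ts <- as_tsibble(gdp_toy, index = year)

# A complete yearly series: regular interval, no gaps
is_regular(gdp_ts)
has_gaps(gdp_ts)

# Deliberately drop 1949 to create a gap, then re-insert it as an explicit NA
gdp_gappy <- filter(gdp_ts, year != 1949)
has_gaps(gdp_gappy)
filled <- fill_gaps(gdp_gappy)

filled
```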

Logarithmic GDP Analysis

Convert the GDP data to a tsibble, add a column with the natural logarithm of GDP, and plot log GDP over time to analyze growth. Taking logs turns exponential growth into a linear trend: the slope of log GDP approximates the growth rate, and year-to-year differences in log GDP approximate percentage changes. This also suits econometric models that assume linear relationships.

GDP %>% 
  as_tsibble(index = year) %>% 
  mutate(log_GDP = log(GDP)) %>% 
  autoplot(.vars = log_GDP)
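The link between log differences and growth rates can be checked numerically: log(GDP_t) - log(GDP_t-1) = log(1 + g_t), which is close to the growth rate g_t when g_t is small and always slightly below it. A minimal sketch on a hypothetical toy version of GDP (yearly means printed above, 1947 to 1951) computes both quantities with dplyr's lag():

```r
library(dplyr)
library(tibble)

# Hypothetical toy version of `GDP` (yearly means printed above, 1947-1951)
gdp_toy <- tibble(
  year = c(1947, 1948, 1949, 1950, 1951),
  GDP  = c(249.6155, 274.4678, 272.4753, 299.8272, 346.9133)
)

growth <- gdp_toy %>%
  mutate(
    log_GDP = log(GDP),
    # Year-over-year change in log GDP: approximate growth rate
    log_growth = log_GDP - lag(log_GDP),
    # Exact percentage growth (as a fraction) for comparison
    simple_growth = GDP / lag(GDP) - 1
  )

growth
```

The first row is NA because 1947 has no predecessor; the gap between the two measures widens as growth rates get larger.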

Exercises

In this exercise, you will download data from FRED and inspect it.

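  1. Visit FRED (Federal Reserve Economic Data), the database of the Federal Reserve Bank of St. Louis.
  2. Identify a time series that interests you and download it using the download_data function (e.g., MSPUS, the median sales price of houses sold in the United States).
  3. Plot the time series and aggregate it to yearly frequency using at least three of the useful summary functions in dplyr (e.g., mean, median, min, max, sd, n).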

Throughout, use the print, glimpse and View functions to keep track of your data.