Day 1 Part 1: Analysis with Tidyverse
Zero to Hero Bootcamp - Time Series Econometrics in R
Load Libraries
Load necessary libraries for data manipulation, finance data, and time-series analysis. These libraries are essential for handling, transforming, and analyzing time series data, providing tools for visualization, statistical analysis, and econometric modeling.
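Below is a minimal sketch of the library calls. It assumes the tidyverse, tidyfinance (which provides download_data()), tsibble and feasts packages, inferred from the functions used later in this notebook. Treat the exact package list as an assumption, not the original setup. The fpp3 meta-package is another way to get tsibble, feasts and fabletools in one install.

```r
# Assumed package set for this notebook (a sketch; adjust to your setup)
library(tidyverse)   # dplyr, tidyr, ggplot2, lubridate (core since tidyverse 2.0.0), ...
library(tidyfinance) # download_data() for FRED and other sources
library(tsibble)     # as_tsibble(): tidy time-series tibbles
library(feasts)      # attaches fabletools, whose autoplot() method plots tsibbles
```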
Download Data
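Download and print data from FRED for two series. GDP is U.S. gross domestic product, quarterly, in billions of dollars at a seasonally adjusted annual rate. CPIAUCNS is the Consumer Price Index for All Urban Consumers, monthly and not seasonally adjusted. Federal Reserve Economic Data (FRED) is maintained by the Federal Reserve Bank of St. Louis and offers a large collection of economic time series for exploratory analysis and for modeling economic trends. The result is a long (tidy) tibble with one row per date and series.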
fred <- download_data("fred", series = c("GDP", "CPIAUCNS"))
fred
No `start_date` or `end_date` provided. Returning the full data set.
# A tibble: 1,653 × 3
date value series
<date> <dbl> <chr>
1 1947-01-01 243. GDP
2 1947-04-01 246. GDP
3 1947-07-01 250. GDP
4 1947-10-01 260. GDP
5 1948-01-01 266. GDP
6 1948-04-01 273. GDP
7 1948-07-01 279. GDP
8 1948-10-01 280. GDP
9 1949-01-01 275. GDP
10 1949-04-01 271. GDP
# ℹ 1,643 more rows
Print and View Data
Open the data in the interactive data viewer for a detailed look. In RStudio, View() opens a spreadsheet-style tab where you can scroll through the rows, sort them and filter them. This makes it easier to understand the structure and content of the dataset before planning the next analysis steps.
fred %>% View()
Summary of Data
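Display a compact overview of the FRED data with glimpse(). It prints one line per column, showing the column type and its first few values. This is a quick way to check data types and the overall structure. Because only the first values are shown, glimpse() will not reveal missing values further down a column.
fred %>% glimpse()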
Rows: 1,653
Columns: 3
$ date <date> 1947-01-01, 1947-04-01, 1947-07-01, 1947-10-01, 1948-01-01, 19…
$ value <dbl> 243.164, 245.968, 249.585, 259.745, 265.742, 272.567, 279.196, …
$ series <chr> "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", …
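glimpse() does not show gaps deeper in the data, so two explicit checks can help. The sketch below counts rows per series and counts missing values per column. It runs on a hypothetical toy tibble `toy` that has the same columns as `fred`. The values are made up, and one is deliberately missing.

```r
library(dplyr)

# Hypothetical toy data shaped like `fred`; values are made up, one is NA
toy <- tibble(
  date   = as.Date(c("2024-01-01", "2024-02-01", "2024-01-01", "2024-04-01")),
  value  = c(1, NA, 3, 4),
  series = c("CPIAUCNS", "CPIAUCNS", "GDP", "GDP")
)

# Number of rows per series
obs_per_series <- toy %>% count(series)

# Number of missing values in every column
na_per_column <- toy %>% summarise(across(everything(), ~ sum(is.na(.x))))

obs_per_series
na_per_column
```

On the real data, the totals printed above (1,653 rows, of which 1,342 are CPIAUCNS) imply that fred %>% count(series) should report 311 GDP rows.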
Filter for CPIAUCNS Series
Filter the FRED data to keep only the CPIAUCNS series. The Consumer Price Index measures the price level. Its changes over time give the inflation rate, an important indicator of price stability.
fred %>% filter(series == "CPIAUCNS")
# A tibble: 1,342 × 3
date value series
<date> <dbl> <chr>
1 1913-01-01 9.8 CPIAUCNS
2 1913-02-01 9.8 CPIAUCNS
3 1913-03-01 9.8 CPIAUCNS
4 1913-04-01 9.8 CPIAUCNS
5 1913-05-01 9.7 CPIAUCNS
6 1913-06-01 9.8 CPIAUCNS
7 1913-07-01 9.9 CPIAUCNS
8 1913-08-01 9.9 CPIAUCNS
9 1913-09-01 10 CPIAUCNS
10 1913-10-01 10 CPIAUCNS
# ℹ 1,332 more rows
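filter() accepts several conditions at once. Comma-separated conditions are combined with AND, and %in% matches against a set of values. The sketch below uses a hypothetical toy tibble with made-up values. The cutoff date is an arbitrary choice made for illustration.

```r
library(dplyr)

# Hypothetical toy data shaped like `fred`; values are made up
toy <- tibble(
  date   = as.Date(c("1999-12-01", "2000-01-01", "2000-02-01", "2000-01-01")),
  value  = c(1, 2, 3, 10),
  series = c("CPIAUCNS", "CPIAUCNS", "CPIAUCNS", "GDP")
)

# Both conditions must hold: CPI rows dated on or after 1 January 2000
cpi_recent <- toy %>%
  filter(series == "CPIAUCNS", date >= as.Date("2000-01-01"))

# %in% keeps rows whose series matches any entry of the vector
selected <- toy %>% filter(series %in% c("GDP"))

cpi_recent
selected
```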
Summary Statistics for CPIAUCNS
Calculate the earliest date, latest date, and mean value for the CPIAUCNS series. These summary statistics provide insights into the temporal coverage of the data and the average level of consumer prices over the observed period.
fred %>%
  filter(series == "CPIAUCNS") %>%
  summarise(min(date), max(date), mean(value))
# A tibble: 1 × 3
`min(date)` `max(date)` `mean(value)`
<date> <date> <dbl>
1 1913-01-01 2024-10-01 88.9
Repeat the summary with named arguments. The output columns then get readable labels (date_min, date_max, value_mean) instead of the raw expressions, which makes the results easier to interpret and to reuse in documentation and reports.
fred %>%
  filter(series == "CPIAUCNS") %>%
  summarise(
    date_min = min(date),
    date_max = max(date),
    value_mean = mean(value)
  )
# A tibble: 1 × 3
date_min date_max value_mean
<date> <date> <dbl>
1 1913-01-01 2024-10-01 88.9
Summary for All Series
Summarize minimum and maximum dates, and mean value for each series in the FRED data. Summarizing each series individually allows us to compare the temporal coverage and mean values, providing a comparative snapshot of the key metrics for GDP and CPI.
fred %>%
  group_by(series) %>%
  summarise(
    date_min = min(date),
    date_max = max(date),
    value_mean = mean(value)
  )
# A tibble: 2 × 4
series date_min date_max value_mean
<chr> <date> <date> <dbl>
1 CPIAUCNS 1913-01-01 2024-10-01 88.9
2 GDP 1947-01-01 2024-07-01 7380.
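The two series also differ in frequency, so the date range alone does not tell you how many observations each mean is based on. The sketch below adds n() and the median gap in days between consecutive dates. That gap is about 30 for monthly data and about 91 for quarterly data. It runs on hypothetical toy data with made-up values.

```r
library(dplyr)

# Hypothetical toy data: 6 monthly and 4 quarterly observations (made-up values)
toy <- bind_rows(
  tibble(date = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
         value = 1:6, series = "CPIAUCNS"),
  tibble(date = seq(as.Date("2020-01-01"), by = "quarter", length.out = 4),
         value = 1:4, series = "GDP")
)

coverage <- toy %>%
  group_by(series) %>%
  summarise(
    date_min = min(date),
    date_max = max(date),
    n_obs = n(),                                      # observations per series
    gap_days = median(as.numeric(diff(sort(date))))   # typical spacing in days
  )

coverage
```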
Yearly Mean Value
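Calculate the yearly mean value for each series in the FRED data. GDP is quarterly and CPIAUCNS is monthly, so averaging within each calendar year puts both series on a common annual frequency. It also smooths out short-term fluctuations and makes longer-term trends easier to see, which is important for econometric modeling. The year() function from lubridate extracts the calendar year from each date.
fred %>%
  group_by(series, year = year(date)) %>%
  summarise(value_mean = mean(value))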
`summarise()` has grouped output by 'series'. You can override using the
`.groups` argument.
# A tibble: 190 × 3
# Groups: series [2]
series year value_mean
<chr> <dbl> <dbl>
1 CPIAUCNS 1913 9.88
2 CPIAUCNS 1914 10.0
3 CPIAUCNS 1915 10.1
4 CPIAUCNS 1916 10.9
5 CPIAUCNS 1917 12.8
6 CPIAUCNS 1918 15.0
7 CPIAUCNS 1919 17.3
8 CPIAUCNS 1920 20.0
9 CPIAUCNS 1921 17.8
10 CPIAUCNS 1922 16.8
# ℹ 180 more rows
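The message about grouped output appears because summarise() drops only the last grouping variable. Setting .groups = "drop" returns an ungrouped tibble and silences the message. Counting the observations behind each mean also flags partial years. The sketch below runs on hypothetical toy data (made-up values) that ends partway through 2024.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: 18 monthly and 6 quarterly observations (made-up values)
toy <- bind_rows(
  tibble(date = seq(as.Date("2023-01-01"), by = "month", length.out = 18),
         value = 1:18, series = "CPIAUCNS"),
  tibble(date = seq(as.Date("2023-01-01"), by = "quarter", length.out = 6),
         value = c(10, 20, 30, 40, 50, 60), series = "GDP")
)

yearly <- toy %>%
  group_by(series, year = year(date)) %>%
  summarise(
    value_mean = mean(value),
    n_obs = n(),          # observations averaged into each yearly mean
    .groups = "drop"      # return an ungrouped tibble, no message
  )

yearly
```

In the real data, the summaries above show CPIAUCNS ending in October 2024 and GDP in July 2024. The 2024 means therefore average only 10 and 3 observations, respectively.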
Pivot Yearly Mean Data
Pivot the yearly mean data to show each series as a column for easier comparison. Pivoting the data allows us to easily compare trends across different economic indicators, making it more efficient to analyze relationships between GDP and CPI.
fred %>%
  group_by(series, year = year(date)) %>%
  summarise(value_mean = mean(value)) %>%
  pivot_wider(
    id_cols = year,
    names_from = series,
    values_from = value_mean
  )
`summarise()` has grouped output by 'series'. You can override using the
`.groups` argument.
# A tibble: 112 × 3
year CPIAUCNS GDP
<dbl> <dbl> <dbl>
1 1913 9.88 NA
2 1914 10.0 NA
3 1915 10.1 NA
4 1916 10.9 NA
5 1917 12.8 NA
6 1918 15.0 NA
7 1919 17.3 NA
8 1920 20.0 NA
9 1921 17.8 NA
10 1922 16.8 NA
# ℹ 102 more rows
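pivot_wider() has an inverse, pivot_longer(). It is handy when a plotting or modeling function expects the long format again. The sketch below shows the round trip on hypothetical yearly means with made-up numbers. values_drop_na = TRUE discards the NA cells that pivot_wider() created for series-year combinations with no data.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly means in long format (made-up numbers)
yearly_long <- tibble(
  series = c("CPIAUCNS", "CPIAUCNS", "GDP"),
  year = c(1946, 1947, 1947),
  value_mean = c(1, 2, 3)
)

# Long to wide: GDP has no 1946 value, so that cell becomes NA
wide <- yearly_long %>%
  pivot_wider(id_cols = year, names_from = series, values_from = value_mean)

# Wide back to long, dropping the NA cell created above
long_again <- wide %>%
  pivot_longer(cols = -year, names_to = "series", values_to = "value_mean",
               values_drop_na = TRUE)

wide
long_again
```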
Summarize and Reshape Yearly Data
Summarize and reshape the data to show yearly averages for each series, pivot so that each series becomes a column, and remove rows with missing values. The missing values come from the years 1913 to 1946, when CPI is available but GDP is not. Dropping them leaves a balanced annual dataset starting in 1947 and avoids problems in later statistical modeling. remove_missing() (from ggplot2) drops the incomplete rows and warns about how many it removed. The result is stored as fred_yearly.
fred_yearly <- fred %>%
  group_by(series, year = year(date)) %>%
  summarise(value_mean = mean(value)) %>%
  pivot_wider(
    id_cols = year,
    names_from = series,
    values_from = value_mean
  ) %>%
  remove_missing()
fred_yearly
`summarise()` has grouped output by 'series'. You can override using the
`.groups` argument.
Warning: Removed 34 rows containing missing values or values outside the scale range.
# A tibble: 78 × 3
year CPIAUCNS GDP
<dbl> <dbl> <dbl>
1 1947 22.3 250.
2 1948 24.0 274.
3 1949 23.8 272.
4 1950 24.1 300.
5 1951 26.0 347.
6 1952 26.6 367.
7 1953 26.8 389.
8 1954 26.8 391.
9 1955 26.8 425.
10 1956 27.2 449.
# ℹ 68 more rows
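remove_missing() is a ggplot2 helper and warns about the rows it drops. In data-wrangling code, tidyr's drop_na() is the more usual tool, and the sketch below swaps it in as an alternative. It runs on a hypothetical wide tibble with made-up values, in which GDP is missing in the early years to mirror the pre-1947 gap in the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide yearly data (made-up values); GDP missing before 1947
wide <- tibble(
  year = c(1945, 1946, 1947, 1948),
  CPIAUCNS = c(1, 2, 3, 4),
  GDP = c(NA, NA, 30, 40)
)

# drop_na() keeps rows with no NA in the listed columns (all columns if none listed)
complete_years <- wide %>% drop_na()

# Number of rows removed, reported explicitly instead of via a warning
n_dropped <- nrow(wide) - nrow(complete_years)

complete_years
n_dropped
```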
Inspect GDP Data
Select and inspect the GDP data, organizing it by year. Inspecting GDP specifically allows us to focus on this key economic indicator, preparing it for further analysis such as trend examination and forecasting.
GDP <- fred_yearly %>%
  select(year, GDP) %>%
  glimpse()
Rows: 78
Columns: 2
$ year <dbl> 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957,…
$ GDP <dbl> 249.6155, 274.4678, 272.4753, 299.8272, 346.9133, 367.3408, 389.2…
Plot GDP Over Time
Convert the GDP data to a tsibble (a tidy time-series tibble) with year as its time index, then plot it with autoplot(). Visualizing GDP over time helps reveal potential trends, cycles, or structural breaks, an important preliminary step before formal econometric modeling.
GDP %>%
  as_tsibble(index = year) %>%
  autoplot(.vars = GDP)
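The same line chart can be drawn directly with ggplot2, without converting to a tsibble first. This gives full control over labels and layers. The sketch below uses a hypothetical toy data frame with the same columns as GDP (year, GDP) and made-up values.

```r
library(ggplot2)

# Hypothetical yearly data shaped like `GDP` (made-up values)
gdp_toy <- data.frame(year = 2000:2004, GDP = c(100, 104, 109, 113, 118))

p <- ggplot(gdp_toy, aes(x = year, y = GDP)) +
  geom_line() +
  labs(x = "Year", y = "GDP", title = "Yearly average GDP")

p  # printing the object draws the plot
```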
Logarithmic GDP Analysis
Convert GDP data to a time-series format, add a column with the log of GDP, and plot the logarithmic GDP over time to analyze growth trends. Taking the logarithm of GDP helps linearize exponential growth patterns, making it easier to interpret percentage changes and apply econometric models that assume linear relationships.
GDP %>%
  as_tsibble(index = year) %>%
  mutate(log_GDP = log(GDP)) %>%
  autoplot(.vars = log_GDP)
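On the log scale, growth rates become differences: log(y_t) - log(y_{t-1}) = log(1 + g_t), which is approximately g_t when the growth rate g_t is small. The sketch below checks this with dplyr's lag() on a hypothetical series that grows by exactly 5% per year (made-up values).

```r
library(dplyr)

# Hypothetical GDP path growing 5% per year (made-up values)
gdp_toy <- tibble(year = 2000:2003, GDP = 100 * 1.05^(0:3))

growth <- gdp_toy %>%
  mutate(
    log_GDP = log(GDP),
    dlog_GDP = log_GDP - lag(log_GDP),         # log difference, NA in first year
    pct_change = (GDP - lag(GDP)) / lag(GDP)   # exact growth rate
  )

growth
```

Here dlog_GDP equals log(1.05), about 0.0488, against an exact growth rate of 0.05. The gap widens as growth rates get larger.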
Exercise
In this exercise, you will download data from FRED and inspect it.
- Visit FRED Economic Data (Federal Reserve Bank of St. Louis).
- Identify a time series that interests you and download it with download_data() as above (e.g., MSPUS).
- Plot the time series and aggregate it to yearly frequency using at least three of the useful summarise functions in dplyr.
Throughout, use the print, glimpse, and View functions to keep track of your data.
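A skeleton for the aggregation step follows. It is a sketch on a hypothetical quarterly series with made-up values, standing in for whatever series you download. It uses mean(), min(), max(), first(), last() and n() inside summarise(). first() and last() depend on row order, so the data are sorted by date first.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for a downloaded series (made-up quarterly values)
series_toy <- tibble(
  date = seq(as.Date("2022-01-01"), by = "quarter", length.out = 8),
  value = c(10, 12, 11, 13, 14, 13, 15, 16),
  series = "TOY"
)

yearly_summary <- series_toy %>%
  arrange(date) %>%                      # first()/last() rely on this order
  group_by(year = year(date)) %>%
  summarise(
    value_mean  = mean(value),
    value_min   = min(value),
    value_max   = max(value),
    value_first = first(value),          # first observation of the year
    value_last  = last(value),           # last observation of the year
    n_obs       = n(),
    .groups = "drop"
  )

yearly_summary
```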