ahead forecasting (v0.10.0): fast time series model calibration and Python plots
Want to share your content on python-bloggers? click here.
ahead
is an R, Python and Julia package for univariate and multivariate time series forecasting with uncertainty quantification (including predictive simulation). Matlab is next. The aim is always to make these implementations of ahead
as similar as possible, but that’s a looot of work as you might guess. Do not hesitate to submit a pull request to these repositories if you want to help and contribute. You can also signal bugs (constructively).
The main changes in ahead
’s v0.10.0
are:
- Fast calibration of
ridge2f
using a Leave-One-Out Cross-Validation (LOOCV) trick (R for now) - Univariate forecasting with
ridge2f
with external regressors - Plotting functionality in Python version
Read on for more details. You can also use this notebook if you want to reproduce the experiments.
1. Fast calibration of ridge2f
:
The fast calibration of ridge2f
uses a remarkable result available for linear models’ LOOCV:
\[LOOCV error = \frac{1}{n} \sum_{i=1}^n\left(y_i-\hat{f}_{-i}\left(\mathbf{z}_i\right)\right)^2 = \frac{1}{n} \sum_{i=1}^n\left(\frac{y_i-\hat{f}\left(\mathbf{z}_i\right)}{1-\mathbf{S}_{i i}}\right)^2\]
Where \(\hat{f}_{-i}\left(\mathbf{z}_i\right)\) is a model statistical learning model $f$ fitted without the \(i\)-th observation, \(\mathbf{z}_i\) is the \(i\)-th observation, and \(\mathbf{S}_{i i}\) is the $i$-th diagonal element of the hat matrix (a.k.a smoother, a matrix so that \(\hat{y} = \mathbf{S} y\)). LOOCV is valid for time series data because contrary to other cross-validation techniques, it does not break the sequential structure of the inputs. Keep in mind, however, that time series cross-vaidation will give different results.
The LOOCV result can be extended further and approximated by Generalized Cross-Validation (GCV), still with a closed-form formula available. GCV is used in ahead::ridge2f
. Indeed, even though ridge2
is not – strictly speaking – linear, it possesses the structure of a ridge regression model with 2 regularization parameters; a regularization parameter for the original explanative variables, and a regularization parameter for new, engineered features. These engineered features transform the linear model into a non-linear one.
In the following R example, it’s worth mentioning that only the 2 regularization parameters are calibrated. Other hyperparameters such as the number of time series lags or the number of nodes in the hidden layer are set to their default values. Feel free try and share other experiments including them.
%load_ext rpy2.ipython
%%R options(repos = c( techtonique = 'https://techtonique.r-universe.dev', CRAN = 'https://cloud.r-project.org')) install.packages("ahead") install.packages("ggplot2") install.packages("forecast") install.packages("dfoptim")
%%R library(ahead) library(dfoptim)
# Convert to R DataFrame %R -i df
%R df_ts <- as.ts(df)
%%R objective_function <- function(xx) { ahead::loocvridge2f(df_ts, h = 20, type_pi="blockbootstrap", lambda_1=10^xx[1], lambda_2=10^xx[2], show_progress = FALSE, )$loocv } start <- proc.time()[3] (opt <- dfoptim::nmkb(fn=objective_function, lower=c(-10,-10), upper=c(10,10), par=c(0.1, 0.1))) print(proc.time()[3]-start)
elapsed 6.657
%%R start <- proc.time()[3] res <- ahead::ridge2f(df_ts, h = 20, type_pi="blockbootstrap", lambda_1=10^opt$par[1], # 'optimal' parameters lambda_2=10^opt$par[2]) # 'optimal' parameters print(proc.time()[3]-start)
|======================================================================| 100% elapsed 0.805
%%R par(mfrow=c(2, 2)) plot(res, "realgovt", type = "sims", main="50 predictive simulations \n of realgovt") plot(res, "realgovt", type = "dist", main="predictive distribution \n of realgovt") plot(res, "realgdp", type = "sims", main="50 predictive simulations \n of realgdp") plot(res, "realgdp", type = "dist", main="predictive distribution \n of realgovt")
2. Univariate forecasting with ridge2f
with external regressors:
ridge2f
was initially designed for multivariate time series forecasting. Nothing prevents it from being used for univariate forecasting. This functionality is now available in ahead
(R for now). In addition, as in the multivariate case, it is possible to use external regressors in the forecasting process.
%%R library(ggplot2) library(forecast) x <- fdeaths xreg <- ahead::createtrendseason(x) (z <- ahead::ridge2f(x, xreg = xreg, h=20)) autoplot(z)
3. Plotting functionality in Python version:
Install and import packages
!pip install ahead
import numpy as np import pandas as pd import statsmodels.api as sm from time import time from ahead import Ridge2Regressor # may take some time to run, ONLY the 1st time it's run from statsmodels.tsa.base.datetools import dates_from_str from time import time
Import data
# some example data mdata = sm.datasets.macrodata.load_pandas().data # prepare the dates index dates = mdata[['year', 'quarter']].astype(int).astype(str) quarterly = dates["year"] + "Q" + dates["quarter"] quarterly = dates_from_str(quarterly) mdata = mdata[['realgovt', 'tbilrate', 'cpi', 'realgdp']] mdata.index = pd.DatetimeIndex(quarterly) df = np.log(mdata).diff().dropna() display(df)
realgovt | tbilrate | cpi | realgdp | |
---|---|---|---|---|
1959-06-30 | 0.023664 | 0.088193 | 0.005849 | 0.024942 |
1959-09-30 | 0.020481 | 0.215321 | 0.006838 | -0.001193 |
1959-12-31 | -0.014781 | 0.125317 | 0.000681 | 0.003495 |
1960-03-31 | -0.046197 | -0.212805 | 0.005772 | 0.022190 |
1960-06-30 | -0.003900 | -0.266946 | 0.000338 | -0.004685 |
… | … | … | … | … |
2008-09-30 | 0.031005 | -0.396881 | -0.007904 | -0.006781 |
2008-12-31 | 0.015732 | -2.277267 | -0.021979 | -0.013805 |
2009-03-31 | -0.010967 | 0.606136 | 0.002340 | -0.016612 |
2009-06-30 | 0.026975 | -0.200671 | 0.008419 | -0.001851 |
2009-09-30 | 0.019888 | -0.405465 | 0.008894 | 0.006862 |
202 rows × 4 columns
regr = Ridge2Regressor(h = 20, date_formatting = "original", type_pi="blockbootstrap", # type of simulations for predictive inference B=50, # number of simulations seed=1) start = time() regr.forecast(df) print(f"Elapsed: {time() - start} s")
|======================================================================| 100% Elapsed: 0.49063634872436523 s
regr2 = Ridge2Regressor(h = 20, date_formatting = "original", type_pi="gaussian", # Gaussian prediction intervals seed=1) start = time() regr2.forecast(df) print(f"Elapsed: {time() - start} s")
Elapsed: 0.038446664810180664 s
regr.plot("realgovt")
regr.plot("realgdp")
regr2.plot("realgovt")
regr2.plot("realgdp")
Want to share your content on python-bloggers? click here.