AutoML in nnetsauce (randomized and quasi-randomized nnetworks) Pt.2: multivariate time series forecasting
Last week, I talked about an AutoML method for regression and classification implemented in the Python package nnetsauce. This week, my post is about the same AutoML method, applied this time to multivariate time series (MTS) forecasting.
In the examples below, keep in mind that the VAR (Vector Autoregression) and VECM (Vector Error Correction Model) forecasting models aren't thoroughly trained, and nnetsauce.MTS isn't really tuned either: this is just a demo. Finally, a probabilistic error metric (instead of the Root Mean Squared Error, RMSE) would be better suited for models that capture forecasting uncertainty; a minimal sketch of such a metric is given below.
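As an illustration of what such a metric could look like (this sketch is not part of the original demo; y_true, lower, and upper are placeholder arrays standing for held-out values and prediction-interval bounds), here are the Winkler/interval score and the empirical coverage rate of 95% prediction intervals:

import numpy as np

def interval_score(y_true, lower, upper, alpha=0.05):
    # Winkler/interval score for (1 - alpha) prediction intervals:
    # interval width plus a 2/alpha penalty for observations outside the bounds
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    below = (lower - y_true) * (y_true < lower)
    above = (y_true - upper) * (y_true > upper)
    return np.mean((upper - lower) + (2.0 / alpha) * (below + above))

def coverage_rate(y_true, lower, upper):
    # fraction of held-out observations falling inside the prediction intervals
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return np.mean((y_true >= lower) & (y_true <= upper))

Lower is better for the interval score; for well-calibrated 95% intervals, the coverage rate should be close to 0.95.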
Contents
- 1 – Install
- 2 – MTS
  - 2 – 1 nnetsauce.MTS
  - 2 – 2 statsmodels VAR
  - 2 – 3 statsmodels VECM
1 – Install
!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict
import nnetsauce as ns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import VAR
from statsmodels.tsa.base.datetools import dates_from_str
from statsmodels.tsa.vector_ar.vecm import VECM, select_order
2 – MTS
Macro data
# some example data
mdata = sm.datasets.macrodata.load_pandas().data

# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)

mdata = mdata[['realgovt', 'tbilrate']]
mdata.index = pd.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()

display(data)
df = data
df.index = df.index.rename('date')

# 80/20 train/test split
idx_train = int(df.shape[0]*0.8)
idx_end = df.shape[0]
df_train = df.iloc[0:idx_train,]
df_test = df.iloc[idx_train:idx_end,]

regr_mts = ns.LazyMTS(verbose=1, ignore_warnings=True, custom_metric=None,
                      lags=1, n_hidden_features=3, n_clusters=0,
                      random_state=1)
models, predictions = regr_mts.fit(df_train, df_test)
model_dictionary = regr_mts.provide_models(df_train, df_test)
display(models)
| Model | RMSE | MAE | MPL | Time Taken |
|---|---|---|---|---|
| LassoCV | 0.22 | 0.12 | 0.06 | 0.20 |
| ElasticNetCV | 0.22 | 0.12 | 0.06 | 0.19 |
| LassoLarsCV | 0.22 | 0.12 | 0.06 | 0.08 |
| LarsCV | 0.22 | 0.12 | 0.06 | 0.08 |
| DummyRegressor | 0.22 | 0.12 | 0.06 | 0.06 |
| ElasticNet | 0.22 | 0.12 | 0.06 | 0.07 |
| LassoLars | 0.22 | 0.12 | 0.06 | 0.06 |
| Lasso | 0.22 | 0.12 | 0.06 | 0.07 |
| ExtraTreeRegressor | 0.22 | 0.14 | 0.07 | 0.12 |
| KNeighborsRegressor | 0.22 | 0.12 | 0.06 | 0.09 |
| SVR | 0.22 | 0.12 | 0.06 | 0.13 |
| HistGradientBoostingRegressor | 0.23 | 0.13 | 0.06 | 0.79 |
| NuSVR | 0.23 | 0.13 | 0.06 | 0.20 |
| ExtraTreesRegressor | 0.24 | 0.13 | 0.07 | 0.87 |
| GradientBoostingRegressor | 0.24 | 0.13 | 0.07 | 0.25 |
| RandomForestRegressor | 0.26 | 0.16 | 0.08 | 2.06 |
| AdaBoostRegressor | 0.28 | 0.19 | 0.10 | 0.45 |
| DecisionTreeRegressor | 0.28 | 0.18 | 0.09 | 0.06 |
| BaggingRegressor | 0.28 | 0.19 | 0.10 | 0.20 |
| GaussianProcessRegressor | 8.26 | 5.90 | 2.95 | 0.17 |
| BayesianRidge | 11774168792.68 | 3129885640.50 | 1564942820.25 | 0.08 |
| TweedieRegressor | 1066305878860.67 | 263521546472.00 | 131760773236.00 | 0.12 |
| LassoLarsIC | 10841414830181.57 | 2665022282527.50 | 1332511141263.75 | 0.08 |
| PassiveAggressiveRegressor | 200205325611502239744.00 | 40689888595970097152.00 | 20344944297985048576.00 | 0.17 |
| SGDRegressor | 1383750703550277812748288.00 | 269310062772019343130624.00 | 134655031386009671565312.00 | 0.13 |
| LinearSVR | 6205416599219790202011648.00 | 1189414936788171753521152.00 | 594707468394085876760576.00 | 0.06 |
| OrthogonalMatchingPursuitCV | 18588484112627753604349952.00 | 3542235944300533382119424.00 | 1771117972150266691059712.00 | 0.23 |
| OrthogonalMatchingPursuit | 18588484112627753604349952.00 | 3542235944300533382119424.00 | 1771117972150266691059712.00 | 0.20 |
| HuberRegressor | 50554040814422644093913571262464.00 | 9061839427591544042390898606080.00 | 4530919713795772021195449303040.00 | 0.09 |
| RidgeCV | 1788858960353426286932811384356864.00 | 317940467527547291488891451736064.00 | 158970233763773645744445725868032.00 | 0.23 |
| RANSACRegressor | 352805899757804849079011831705501696.00 | 61914238966205227684888230708117504.00 | 30957119483102613842444115354058752.00 | 1.44 |
| LinearRegression | 13408548756595947978849418193194188800.00 | 2316276205868561893698967459810246656.00 | 1158138102934280946849483729905123328.00 | 0.06 |
| TransformedTargetRegressor | 13408548756595947978849418193194188800.00 | 2316276205868561893698967459810246656.00 | 1158138102934280946849483729905123328.00 | 0.11 |
| Lars | 13408548756596845228481163425784791040.00 | 2316276205868715960905471081985343488.00 | 1158138102934357980452735540992671744.00 | 0.08 |
| Ridge | 27935786184657480745080678989281886208.00 | 4824713257018197525713060327109689344.00 | 2412356628509098762856530163554844672.00 | 0.12 |
| KernelRidge | 27935786184685139645570846501298503680.00 | 4824713257022931107816326787730767872.00 | 2412356628511465553908163393865383936.00 | 0.09 |
| MLPRegressor | 64247413650209509837810706524366567768365621314… | 10088348458681313437051396009759695398571807517… | 50441742293406567185256980048798476992859037587… | 0.42 |
model_dictionary['LassoCV']
MTS(n_clusters=0, n_hidden_features=3, obj=LassoCV(random_state=1), seed='mean')
2 – 1 – nnetsauce.MTS
regr = ns.MTS(obj=LassoCV(random_state=1),
              lags=1,
              n_hidden_features=3,
              n_clusters=0,
              replications=250,
              kernel="gaussian",
              verbose=1)
regr.fit(df_train)
Adjusting LassoCV to multivariate time series...
100%|██████████| 2/2 [00:00<00:00, 6.22it/s]
Simulate residuals using gaussian kernel...
Best parameters for gaussian kernel: {'bandwidth': 0.04037017258596558}
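The log above indicates that the in-sample residuals are simulated from a Gaussian kernel density estimate whose bandwidth is selected automatically. As a rough sketch of that idea (an illustration of the mechanism, not necessarily nnetsauce's actual internals), bandwidth selection and residual simulation could be done with scikit-learn's KernelDensity and a cross-validated grid search:

import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

# hypothetical in-sample residuals, shape (n_obs, n_series)
residuals = np.random.randn(100, 2)

# pick the bandwidth maximizing the cross-validated log-likelihood
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.logspace(-3, 1, 25)},
                    cv=5)
grid.fit(residuals)
print(grid.best_params_)  # e.g. {'bandwidth': 0.04...}

# draw simulated residuals from the fitted density
simulated = grid.best_estimator_.sample(n_samples=250, random_state=1)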
MTS(kernel='gaussian', n_clusters=0, n_hidden_features=3, obj=LassoCV(random_state=1), replications=250, verbose=1)
res = regr.predict(h=df_test.shape[0], level=95)
100%|██████████| 250/250 [00:00<00:00, 3686.16it/s]
100%|██████████| 250/250 [00:00<00:00, 6971.82it/s]
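With level=95 and replications set, res should contain the mean forecast along with the bounds of the 95% prediction intervals. The attribute names used below (res.lower, res.upper) are an assumption; inspect the returned object (for instance with print(res)) to confirm them. The intervals can then be compared to the held-out data:

import numpy as np

lower = np.asarray(res.lower)   # assumed: lower bound of the 95% intervals
upper = np.asarray(res.upper)   # assumed: upper bound of the 95% intervals
actual = df_test.values

coverage = np.mean((actual >= lower) & (actual <= upper))
print(f"empirical coverage of the 95% intervals: {coverage:.2f}")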
regr.plot("realgovt") regr.plot("tbilrate")
2 – 2 – statsmodels VAR
model = VAR(df_train)
results = model.fit(maxlags=5, ic='aic')
lag_order = results.k_ar
VAR_preds = results.forecast(df_train.values[-lag_order:], df_test.shape[0])
results.plot_forecast(steps = df_test.shape[0]);
2 – 3 – statsmodels VECM
model = VECM(df_train, k_ar_diff=2, coint_rank=2)
vecm_res = model.fit()
vecm_res.gamma.round(4)
vecm_res.summary()
# point forecasts only
vecm_res.predict(steps=df_test.shape[0])
# point forecasts plus lower/upper bounds of the 95% prediction intervals (alpha=0.05)
forecast, lower, upper = vecm_res.predict(df_test.shape[0], 0.05)
vecm_res.plot_forecast(steps = df_test.shape[0])
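Since vecm_res.predict was called with alpha=0.05, lower and upper above are the bounds of 95% prediction intervals, so the probabilistic scoring sketched in the introduction also applies to the VECM. A minimal, self-contained version:

import numpy as np

# Winkler/interval score of the VECM 95% prediction intervals on the test set
alpha = 0.05
actual = df_test.values
below = (lower - actual) * (actual < lower)
above = (actual - upper) * (actual > upper)
vecm_score = np.mean((upper - lower) + (2.0 / alpha) * (below + above))
print(f"VECM interval score: {vecm_score:.4f}")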
Out-of-sample errors
display([("nnetsauce.MTS+"+models.index[i], models["RMSE"].iloc[i]) for i in range(3)]) display(('VAR', mean_squared_error(df_test.values, VAR_preds, squared=False))) display(('VECM', mean_squared_error(df_test.values, forecast, squared=False)))
[('nnetsauce.MTS+LassoCV', 0.22102547609924011),
 ('nnetsauce.MTS+ElasticNetCV', 0.22103106562991648),
 ('nnetsauce.MTS+LassoLarsCV', 0.22103468506703655)]
('VAR', 0.22128770514262763)
('VECM', 0.22170093788693065)