Last week, I talked about an AutoML method for regression and classification implemented in the Python package nnetsauce. This week, my post is about the same AutoML method, applied this time to multivariate time series (MTS) forecasting.
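The "lazy" AutoML idea comes down to fitting many candidate regressors with their default settings and ranking them on held-out data. Here is a minimal sketch of that loop on synthetic tabular data. The four scikit-learn regressors are a hand-picked assumption; nnetsauce's LazyMTS tries many more, each wrapped in nnetsauce.MTS, and its actual code differs.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LassoCV, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Synthetic data with a linear signal (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

# Hypothetical candidate list; each model keeps its default settings.
candidates = {
    "DummyRegressor": DummyRegressor(),
    "LassoCV": LassoCV(random_state=1),
    "Ridge": Ridge(),
    "KNeighborsRegressor": KNeighborsRegressor(),
}

results = []
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    mae = mean_absolute_error(y_test, pred)
    results.append((name, rmse, mae))

# Rank models: lowest test RMSE first.
results.sort(key=lambda r: r[1])
for name, rmse, mae in results:
    print(f"{name:22s} RMSE={rmse:.3f} MAE={mae:.3f}")
```

The mean-predicting DummyRegressor acts as a baseline: any model ranked below it is doing worse than predicting a constant.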
In the examples below, keep in mind that the VAR (Vector Autoregression) and VECM (Vector Error Correction Model) forecasting models aren't thoroughly trained. nnetsauce.MTS isn't really tuned either; this is just a demo. Finally, a probabilistic error metric (instead of the Root Mean Squared Error, RMSE) would be better suited for evaluating models that capture forecasting uncertainty.
Contents
- 1 – Install
- 2 – MTS
  - 2 – 1 nnetsauce.MTS
  - 2 – 2 statsmodels VAR
  - 2 – 3 statsmodels VECM
1 – Install
```
!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict
```

```python
import nnetsauce as ns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import VAR
from statsmodels.tsa.base.datetools import dates_from_str
from statsmodels.tsa.vector_ar.vecm import VECM, select_order
```
2 – MTS
Macro data
```python
# some example data
mdata = sm.datasets.macrodata.load_pandas().data

# prepare the dates index ("1959Q1", "1959Q2", ...)
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)

# keep two series: 'realgovt' and 'tbilrate'
mdata = mdata[['realgovt', 'tbilrate']]
mdata.index = pd.DatetimeIndex(quarterly)

# quarter-over-quarter log differences
data = np.log(mdata).diff().dropna()
display(data)  # display() is available in Jupyter/IPython
```
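The last step turns each series into quarter-over-quarter log growth rates. Forecasts made on that scale can be mapped back to levels by cumulating the log differences and exponentiating. A minimal sketch on a made-up series (the numbers are hypothetical, not taken from the macro data):

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly levels standing in for one of the series above.
levels = pd.Series([100.0, 102.0, 101.0, 105.0])

# Same transform as above: log differences, first (NaN) row dropped.
growth = np.log(levels).diff().dropna()

# Inverse transform: cumulate log growth, exponentiate, and
# rescale by the first observed level.
rebuilt = levels.iloc[0] * np.exp(growth.cumsum())
print(rebuilt)  # approximately 102.0, 101.0, 105.0
```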
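```python
df = data
# Index.rename returns a new Index, so the result must be assigned back
df.index = df.index.rename('date')

# chronological 80/20 train/test split (no shuffling for time series)
idx_train = int(df.shape[0] * 0.8)
idx_end = df.shape[0]
df_train = df.iloc[0:idx_train]
df_test = df.iloc[idx_train:idx_end]

# fit one nnetsauce.MTS per scikit-learn regressor and score each on df_test
regr_mts = ns.LazyMTS(verbose=1, ignore_warnings=True, custom_metric=None,
                      lags=1, n_hidden_features=3, n_clusters=0,
                      random_state=1)
models, predictions = regr_mts.fit(df_train, df_test)
model_dictionary = regr_mts.provide_models(df_train, df_test)
```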
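Roughly speaking, nnetsauce.MTS regresses each series on lagged values of all the series, augmented with hidden features produced by a randomized layer (`n_hidden_features=3` above). The sketch below illustrates that idea with NumPy and scikit-learn. The Gaussian random weights, the ReLU activation, the random stand-in data, and the omission of input scaling and clustering (`n_clusters=0` above) are simplifying assumptions; this is not nnetsauce's implementation.

```python
import numpy as np
from sklearn.linear_model import LassoCV


def make_lagged(Y, lags):
    """Return (features, targets): each feature row holds the previous
    `lags` observations of every series; targets are the current values."""
    n = Y.shape[0]
    features = np.hstack([Y[lags - k - 1:n - k - 1] for k in range(lags)])
    return features, Y[lags:]


rng = np.random.default_rng(1)
Y = rng.normal(size=(100, 2))        # stand-in for two differenced series
X, targets = make_lagged(Y, lags=1)  # X: (99, 2), targets: (99, 2)

# Hypothetical hidden layer: fixed random weights followed by a ReLU.
n_hidden_features = 3
W = rng.normal(size=(X.shape[1], n_hidden_features))
H = np.maximum(X @ W, 0.0)
Z = np.hstack([X, H])                # lagged values + hidden features

# One base regressor per series (one per column of `targets`).
models = [LassoCV(random_state=1).fit(Z, targets[:, j])
          for j in range(targets.shape[1])]
```

With `lags=1`, each row of `Z` combines last quarter's value of both series with three nonlinear functions of them, which is what lets a linear base model such as LassoCV pick up some nonlinearity.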
```python
display(models)
```
| Model | RMSE | MAE | MPL | Time Taken |
|---|---|---|---|---|
| LassoCV | 0.22 | 0.12 | 0.06 | 0.20 |
| ElasticNetCV | 0.22 | 0.12 | 0.06 | 0.19 |
| LassoLarsCV | 0.22 | 0.12 | 0.06 | 0.08 |
| LarsCV | 0.22 | 0.12 | 0.06 | 0.08 |
| DummyRegressor | 0.22 | 0.12 | 0.06 | 0.06 |
| ElasticNet | 0.22 | 0.12 | 0.06 | 0.07 |
| LassoLars | 0.22 | 0.12 | 0.06 | 0.06 |
| Lasso | 0.22 | 0.12 | 0.06 | 0.07 |
| ExtraTreeRegressor | 0.22 | 0.14 | 0.07 | 0.12 |
| KNeighborsRegressor | 0.22 | 0.12 | 0.06 | 0.09 |
| SVR | 0.22 | 0.12 | 0.06 | 0.13 |
| HistGradientBoostingRegressor | 0.23 | 0.13 | 0.06 | 0.79 |
| NuSVR | 0.23 | 0.13 | 0.06 | 0.20 |
| ExtraTreesRegressor | 0.24 | 0.13 | 0.07 | 0.87 |
| GradientBoostingRegressor | 0.24 | 0.13 | 0.07 | 0.25 |
| RandomForestRegressor | 0.26 | 0.16 | 0.08 | 2.06 |
| AdaBoostRegressor | 0.28 | 0.19 | 0.10 | 0.45 |
| DecisionTreeRegressor | 0.28 | 0.18 | 0.09 | 0.06 |
| BaggingRegressor | 0.28 | 0.19 | 0.10 | 0.20 |
| GaussianProcessRegressor | 8.26 | 5.90 | 2.95 | 0.17 |
| BayesianRidge | 11774168792.68 | 3129885640.50 | 1564942820.25 | 0.08 |
| TweedieRegressor | 1066305878860.67 | 263521546472.00 | 131760773236.00 | 0.12 |
| LassoLarsIC | 10841414830181.57 | 2665022282527.50 | 1332511141263.75 | 0.08 |
| PassiveAggressiveRegressor | 200205325611502239744.00 | 40689888595970097152.00 | 20344944297985048576.00 | 0.17 |
| SGDRegressor | 1383750703550277812748288.00 | 269310062772019343130624.00 | 134655031386009671565312.00 | 0.13 |
| LinearSVR | 6205416599219790202011648.00 | 1189414936788171753521152.00 | 594707468394085876760576.00 | 0.06 |
| OrthogonalMatchingPursuitCV | 18588484112627753604349952.00 | 3542235944300533382119424.00 | 1771117972150266691059712.00 | 0.23 |
| OrthogonalMatchingPursuit | 18588484112627753604349952.00 | 3542235944300533382119424.00 | 1771117972150266691059712.00 | 0.20 |
| HuberRegressor | 50554040814422644093913571262464.00 | 9061839427591544042390898606080.00 | 4530919713795772021195449303040.00 | 0.09 |
| RidgeCV | 1788858960353426286932811384356864.00 | 317940467527547291488891451736064.00 | 158970233763773645744445725868032.00 | 0.23 |
| RANSACRegressor | 352805899757804849079011831705501696.00 | 61914238966205227684888230708117504.00 | 30957119483102613842444115354058752.00 | 1.44 |
| LinearRegression | 13408548756595947978849418193194188800.00 | 2316276205868561893698967459810246656.00 | 1158138102934280946849483729905123328.00 | 0.06 |
| TransformedTargetRegressor | 13408548756595947978849418193194188800.00 | 2316276205868561893698967459810246656.00 | 1158138102934280946849483729905123328.00 | 0.11 |
| Lars | 13408548756596845228481163425784791040.00 | 2316276205868715960905471081985343488.00 | 1158138102934357980452735540992671744.00 | 0.08 |
| Ridge | 27935786184657480745080678989281886208.00 | 4824713257018197525713060327109689344.00 | 2412356628509098762856530163554844672.00 | 0.12 |
| KernelRidge | 27935786184685139645570846501298503680.00 | 4824713257022931107816326787730767872.00 | 2412356628511465553908163393865383936.00 | 0.09 |
| MLPRegressor | 64247413650209509837810706524366567768365621314… | 10088348458681313437051396009759695398571807517… | 50441742293406567185256980048798476992859037587… | 0.42 |
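The fitted model behind any row of the table can be retrieved from `model_dictionary` by name:

```python
model_dictionary['LassoCV']
```

```
MTS(n_clusters=0, n_hidden_features=3, obj=LassoCV(random_state=1), seed='mean')
```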
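With `lags=1`, a fitted model only maps one quarter to the next. One common way to forecast several quarters ahead is recursive: predict one step, feed the prediction back in as the next input, and repeat. The sketch below shows that scheme with plain LassoCV models on random stand-in data. The helper `recursive_forecast` is hypothetical, and whether nnetsauce.MTS forecasts exactly this way is an assumption, not something shown above.

```python
import numpy as np
from sklearn.linear_model import LassoCV


def recursive_forecast(models, last_obs, h):
    """Hypothetical helper: forecast h steps ahead with one-lag models
    (one model per series), feeding each prediction back as input."""
    current = np.asarray(last_obs, dtype=float)
    path = []
    for _ in range(h):
        x = current.reshape(1, -1)
        current = np.array([m.predict(x)[0] for m in models])
        path.append(current)
    return np.vstack(path)


rng = np.random.default_rng(2)
Y = rng.normal(size=(120, 2))  # stand-in for two differenced series
X, targets = Y[:-1], Y[1:]     # lags = 1: regress y_t on y_{t-1}
models = [LassoCV(random_state=1).fit(X, targets[:, j]) for j in range(2)]

forecasts = recursive_forecast(models, Y[-1], h=5)
print(forecasts.shape)  # (5, 2): 5 steps ahead, 2 series
```

Because each step reuses earlier predictions, errors can compound over the horizon, which is one reason longer test windows tend to be harder.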