Day 15: Backtest II
On Day 14 we showed how the trading model we built was snooping and provided one way to correct it. Essentially, we ensured that the time at which we actually have the target variable data aligns with when the trading signals are produced. We then fed the value from the next time step into the model to generate a forecast. If the forecast was positive, we’d go long the SPY ETF; if negative, we’d stay out of the market or go short, depending on the strategy. Results were decidedly worse than the snooped model. But, compared to buy-and-hold, they were not poop-the-bed horrible, though still underperforming. To refresh our memories, we plot the cumulative graph again below.
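To make the alignment point concrete, here is a minimal sketch on a synthetic price series. The random-walk prices and the helper name `label_availability` are illustrative assumptions, not part of the original code. It shows that a 12-week forward return stamped at week t can only be observed at week t + 12, which is why the signal has to be pushed forward by the same amount.

```python
import numpy as np
import pandas as pd

FORWARD = 12  # forward-return horizon in weeks, as in the 12 - 12 model


def label_availability(prices: pd.Series, forward: int = FORWARD) -> pd.DataFrame:
    """Hypothetical helper: pair each forward return with the row where it becomes known."""
    out = pd.DataFrame({'price': prices})
    # Forward log return stamped at row t uses the price at row t + forward
    out['ret_for'] = np.log(out['price'].shift(-forward) / out['price'])
    # Row position at which that forward return is actually observable
    out['known_at'] = np.arange(len(out)) + forward
    return out


# Synthetic weekly prices (assumption: a simple random walk, for illustration only)
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, 40))))

df = label_availability(prices)
# The last FORWARD rows have no label yet, so they cannot be used for training
print(df.tail(FORWARD + 1))
```

Any model trained on `ret_for` up to row t is therefore only usable from row `known_at` onward, which appears to be what the padding by `mod_look_forward` in the full listing accounts for.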
The strategy underperforms buy-and-hold by about 25 percentage points. However, its Sharpe Ratio is about 600bps higher at 36%, which is nice but nothing to write home about. We’ll forgo a broader analysis like the one we presented on Day 3. Some readers may be wondering why the heck you would use the time stamp directly after the last training step when it’s clearly 11 weeks old. Glad you asked. It does indeed seem stale at best, silly at worst. We wanted to show it for completeness of comparison. A likely better input is the most recent time stamp. That is, the model is trained on lookback returns whose forward returns are indeed 12 weeks ahead, as opposed to those that have mostly already occurred. When we finally get to that 12th week to train the model, we can turn around and use the lookback data from the most recently completed week as the input to generate a prediction.
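The difference between the two inputs comes down to row arithmetic. The sketch below is a simplified illustration; the helper name `input_rows` is hypothetical, and the offsets are read off the slicing used in the full listing (`train_pd = 5`, `mod_look_forward = 12`).

```python
MOD_LOOK_FORWARD = 12  # forward-return horizon in weeks


def input_rows(train_end: int, look_forward: int = MOD_LOOK_FORWARD) -> dict:
    """Hypothetical helper: which row feeds the model for a training window ending at train_end."""
    return {
        # Day 14: the row right after the training window (stale by the time we can trade)
        'day14_input': train_end + 1,
        # Day 15: the most recently completed week once the last training label is known
        'day15_input': train_end + look_forward,
        # Row at which the last training label (a 12-week forward return) becomes observable
        'label_known_at': train_end + look_forward,
    }


# First training window covers rows 0..4, so train_end = 4
rows = input_rows(train_end=4)
print(rows)
print("Staleness of Day 14 input (weeks):", rows['day15_input'] - rows['day14_input'])
```

For the first window this gives a Day 14 input at row 5 and a Day 15 input at row 16, the 11-week gap mentioned above.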
Let’s do that now and graph the result below.
Certainly more of the result we were looking for! Here the long-only strategy outperforms buy-and-hold by 10 percentage points. Long-short is even better. Critically, long-only’s Sharpe Ratio is over 20 percentage points higher; long-short’s is about 600bps better. This definitely warrants further investigation and comparison to our benchmarks, which we’ll delve into tomorrow. Stay tuned!
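The listing does not include the Sharpe Ratio calculation, so here is one way it could be sketched for weekly log returns. The function name `annualized_sharpe`, the zero risk-free rate, the 52-week annualization, and the use of the sample standard deviation are all assumptions, and the figures quoted above may have been computed differently.

```python
import numpy as np
import pandas as pd


def annualized_sharpe(weekly_ret: pd.Series, periods_per_year: int = 52) -> float:
    """Hypothetical helper: annualized Sharpe Ratio, assuming a zero risk-free rate."""
    r = weekly_ret.dropna()  # drop warm-up rows that have no signal yet
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year))


# Synthetic weekly returns (assumption: illustration only, not SPY data)
rng = np.random.default_rng(1)
fake_ret = pd.Series(rng.normal(0.002, 0.02, 520))
print(round(annualized_sharpe(fake_ret), 2))
```

With the dataframes from the listing, the same function could be applied to `df_trade_15['strat_ret']` and `df_trade_15['ret']` to compare the strategy against buy-and-hold.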
Code below.
# Built using Python 3.10.19 and a virtual environment

# Load libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import statsmodels.api as sm
import matplotlib.pyplot as plt
import yfinance as yf

plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (14,8)

# Function to get data
def get_spy_weekly_data() -> pd.DataFrame:
    df = yf.download('SPY', start='2000-01-01', end='2024-10-01')
    # Note: this assumes yf.download returns six single-level columns in this order;
    # check the output of your installed yfinance version before renaming
    df.columns = ['open', 'high', 'low', 'close', 'adj close', 'volume']
    df.index.name = 'date'

    # Create training set and downsample to weekly ending Friday
    df_train = df.loc[:'2019-01-01', 'adj close'].copy()
    df_w = pd.DataFrame(df_train.resample('W-FRI').last())
    df_w.columns = ['price']
    return df_w

# Get data
df_w = get_spy_weekly_data()

# Create momentum dictionary
periods = [3, 6, 9, 12]
momo_dict = {}
for back in periods:
    for forward in periods:
        df_out = df_w.copy()
        df_out['ret_back'] = np.log(df_out['price']/df_out['price'].shift(back))
        df_out['ret_for'] = np.log(df_out['price'].shift(-forward)/df_out['price'])
        df_out = df_out.dropna()
        mod = sm.OLS(df_out['ret_for'], sm.add_constant(df_out['ret_back'])).fit()
        momo_dict[f"{back} - {forward}"] = {'data': df_out, 'params': mod.params, 'pvalues': mod.pvalues}

# Prepare model
model_name = '12 - 12'
mod_look_forward = 12
train_pd = 5
test_pd = 1
tot_pd = train_pd + test_pd

# Create trading dataframes for Day 14 and Day 15
df_trade_14 = momo_dict[model_name]['data'].copy()
df_trade_15 = momo_dict[model_name]['data'].copy()

# Run Day 14 model with train/forecast steps
trade_pred_14 = []
for i in range(tot_pd, len(df_trade_14)+1, test_pd):
    train_df = df_trade_14.iloc[i-tot_pd:i-test_pd, 1:]
    # Input is the row directly after the training window
    test_df = df_trade_14.iloc[i-test_pd:i, 1:]

    # Ensure 'ret_back' is 2D by selecting it as a DataFrame, not a Series
    X_train = sm.add_constant(train_df[['ret_back']])
    if test_df.shape[0] > 1:
        X_test = sm.add_constant(test_df[['ret_back']])
    else:
        X_test = sm.add_constant(test_df[['ret_back']], has_constant='add')

    # Fit the model
    mod_run = sm.OLS(train_df['ret_for'], X_train).fit()

    # Predict using the test data
    mod_pred = mod_run.predict(X_test).values
    trade_pred_14.extend(mod_pred)

# Add predictions to dataframe
# Snooped predictions. Pad = train_pd
# df_trade['pred'] = np.concatenate((np.repeat(np.nan,train_pd), np.array(trade_pred)))

# Non-snooped. Pad = mod_look_forward + train_pd
df_trade_14['pred'] = np.concatenate((np.repeat(np.nan, mod_look_forward + train_pd - 1), np.array(trade_pred_14[:-(mod_look_forward - 1)])))

# Generate returns
df_trade_14['ret'] = np.log(df_trade_14['price']/df_trade_14['price'].shift(1))

# Generate signals
# NaN never compares equal to anything, so test with isna() rather than == np.nan;
# this keeps the warm-up rows as NaN instead of assigning them a position
df_trade_14['signal'] = np.where(df_trade_14['pred'].isna(), np.nan, np.where(df_trade_14['pred'] > 0, 1, 0))
df_trade_14['signal_sh'] = np.where(df_trade_14['pred'].isna(), np.nan, np.where(df_trade_14['pred'] >= 0, 1, -1))

# Generate strategy returns
df_trade_14['strat_ret'] = df_trade_14['signal'].shift(1) * df_trade_14['ret']
df_trade_14['strat_ret_sh'] = df_trade_14['signal_sh'].shift(1) * df_trade_14['ret']

# Plot cumulative performance plot for long-only and long-short for Day 14 model
fig, (ax1, ax2) = plt.subplots(2,1)

top = df_trade_14[['strat_ret', 'ret']].cumsum()
bottom = df_trade_14[['strat_ret_sh', 'ret']].cumsum()

ax1.plot(top.index, top.values*100)
ax1.set_xlabel("")
ax1.set_ylabel("Return (%)")
ax1.legend(['Strategy', 'Buy-and-Hold'], loc="upper left")
ax1.set_title("Cumulative returns: long-only")

ax2.plot(bottom.index, bottom.values*100)
ax2.set_xlabel("")
ax2.set_ylabel("Return (%)")
ax2.legend(['Strategy', 'Buy-and-Hold'], loc="upper left")
ax2.set_title("Cumulative returns: long-short")

plt.show()

# Run model with train/forecast steps with revised forecast using Day 15 dataframe
trade_pred_15 = []
for i in range(tot_pd, len(df_trade_15)+1, test_pd):
    train_df = df_trade_15.iloc[i-tot_pd:i-test_pd, 1:]
    # Input is the most recently completed week, mod_look_forward rows after the training window ends
    test_df = df_trade_15.iloc[i-test_pd+mod_look_forward-1:i-test_pd+mod_look_forward, 1:]

    # Ensure 'ret_back' is 2D by selecting it as a DataFrame, not a Series
    X_train = sm.add_constant(train_df[['ret_back']])
    if test_df.shape[0] > 1:
        X_test = sm.add_constant(test_df[['ret_back']])
    else:
        X_test = sm.add_constant(test_df[['ret_back']], has_constant='add')

    # Fit the model
    mod_run = sm.OLS(train_df['ret_for'], X_train).fit()

    # Predict using the test data
    mod_pred = mod_run.predict(X_test).values
    trade_pred_15.extend(mod_pred)

# Add predictions to dataframe
# Same as in Day 14 but test_df is moved forward in for loop
df_trade_15['pred'] = np.concatenate((np.repeat(np.nan, mod_look_forward + train_pd - 1), np.array(trade_pred_15)))

# Generate returns
df_trade_15['ret'] = np.log(df_trade_15['price']/df_trade_15['price'].shift(1))

# Generate signals (isna() check for the same reason as in the Day 14 block)
df_trade_15['signal'] = np.where(df_trade_15['pred'].isna(), np.nan, np.where(df_trade_15['pred'] > 0, 1, 0))
df_trade_15['signal_sh'] = np.where(df_trade_15['pred'].isna(), np.nan, np.where(df_trade_15['pred'] >= 0, 1, -1))

# Generate strategy returns
df_trade_15['strat_ret'] = df_trade_15['signal'].shift(1) * df_trade_15['ret']
df_trade_15['strat_ret_sh'] = df_trade_15['signal_sh'].shift(1) * df_trade_15['ret']

# Plot cumulative performance plot for long-only and long-short
fig, (ax1, ax2) = plt.subplots(2,1)

top = df_trade_15[['strat_ret', 'ret']].cumsum()
bottom = df_trade_15[['strat_ret_sh', 'ret']].cumsum()

ax1.plot(top.index, top.values*100)
ax1.set_xlabel("")
ax1.set_ylabel("Return (%)")
ax1.legend(['Strategy', 'Buy-and-Hold'], loc="upper left")
ax1.set_title("Cumulative returns: long-only")

ax2.plot(bottom.index, bottom.values*100)
ax2.set_xlabel("")
ax2.set_ylabel("Return (%)")
ax2.legend(['Strategy', 'Buy-and-Hold'], loc="upper left")
ax2.set_title("Cumulative returns: long-short")

plt.show()