Day 17: Drawdowns
On Day 16, we showed the adjusted 12-by-12 strategy with full performance metrics against buy-and-hold, the 60-40 SPY-IEF ETF portfolio, and the 200-day SMA strategy. In all cases, it tended to perform better than the benchmarks. However, against the 200-day SMA that outperformance came primarily at the end of the period. This raises the question of what to make of the performance differences between the 12-by-12 and 200-day SMA strategies.
The 12-by-12 strategy derived most of its outperformance from the last three years of the backtest. For a good portion of the period, it actually underperformed the 200-day SMA. Additionally, the difference in risk-adjusted returns between the two strategies, as judged by the Sharpe Ratio, was de minimis. On that account, we'll need to drill down into drawdowns to see whether they favor one strategy over the other.
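To make the Sharpe comparison concrete, here is a minimal sketch of how one might compute an annualized Sharpe Ratio from weekly log returns. The helper and the commented usage with df_trade and df_200_bench (the frames built in the code at the end of the post) are our own illustration, assuming no risk-free adjustment:

import numpy as np
import pandas as pd

def annualized_sharpe(weekly_rets: pd.Series, periods_per_year: int = 52) -> float:
    # Scale the mean/volatility of weekly log returns to annual terms
    r = weekly_rets.dropna()
    return r.mean() / r.std() * np.sqrt(periods_per_year)

# Hypothetical usage with the series built in the code below:
# annualized_sharpe(df_trade['strat_ret']), annualized_sharpe(df_200_bench['strat_ret'])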
When we look at the magnitude of the top 5 drawdowns, we see that the 12-by-12 strategy's worst drawdown certainly exceeds the 200-day SMA's, by almost 14 percentage points. However, for the remaining periods it is below the 200-day. Across the top 5 periods, the average difference in maximum drawdowns is less than one percentage point.
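Magnitude here means the distance of the cumulative return curve from its running peak. A short sketch of pulling out the five deepest drawdowns, using our own helper name (the full code at the end of the post does the equivalent by hand):

import pandas as pd

def top_drawdowns(cum_rets: pd.Series, n: int = 5) -> pd.Series:
    # Drawdown = cumulative return minus its running peak
    peak = cum_rets.cummax()
    dd = cum_rets - peak
    # A new underwater spell starts each time the curve touches a fresh peak
    spell = (dd == 0).cumsum()
    troughs = dd.groupby(spell).min()
    # Keep only spells that actually went underwater
    return troughs[troughs < 0].nsmallest(n)

# Hypothetical usage: top_drawdowns(df_trade['strat_ret'].cumsum())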
Next, let's look at drawdown duration. Again, the 12-by-12 massively exceeds the 200-day for the top spot, while the 200-day beats the 12-by-12 for the remainder. Here the 12-by-12 is decidedly worse off: its average drawdown duration across the top 5 periods is about 132 days longer than the 200-day's.
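Duration can be measured analogously. The sketch below mirrors the logic of the get_drawdown_periods function in the code at the end of the post: it takes the calendar gaps between successive touches of the running peak, and, like the original, ignores any drawdown still unrecovered at the end of the sample:

import pandas as pd

def top_drawdown_durations(cum_rets: pd.Series, n: int = 5) -> pd.Series:
    # Dates where the curve sits at its running peak (drawdown == 0)
    dd = cum_rets - cum_rets.cummax()
    peak_dates = dd[dd == 0].index
    # Gaps between successive peak touches bound each underwater spell
    gaps = peak_dates.to_series().diff().dropna()
    return gaps.dt.days.nlargest(n)

# Hypothetical usage: top_drawdown_durations(df_trade['strat_ret'].cumsum())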
Let's take stock. While the 12-by-12 modestly outperforms on a cumulative basis, its risk-adjusted return is not significantly different from the 200-day's.¹ The 12-by-12 has a single drawdown that exceeds the 200-day's in magnitude, but the differences are not meaningful on average. Finally, the 12-by-12 does suffer one drawdown whose duration far exceeds the 200-day's, both individually and in the average. Should we favor the 12-by-12 or stick with the 200-day?
Before we answer that question, it is important to note that the main periods of underperformance and outperformance for the 12-by-12 come at the beginning and end of the analysis timeframe: namely, the aftermath of the Tech Bubble (underperformance) and the halcyon years prior to Covid-19 (outperformance). Recall that we stopped our analysis prior to 2019 so that, once we'd found a strategy that seemed attractive, we'd have around five years of data to test out-of-sample. So should we weight this barbell performance equally with the intermediate period that includes the Global Financial Crisis?
The barbell results certainly speak to regime switching and may reveal that one strategy performs better in certain regimes than the other. The decision to employ one or the other could now turn on risk preferences and explainability, as opposed to performance results. No one wants to endure over 1,000 days of underperformance. But who is to say there will be another Tech Bubble or pre-Covid period? What if we just looked at the 2005-2015 period? Guess what? The 12-by-12 outperforms by 30 percentage points (see the sketch below)! But isn't that cherry-picking too?
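That 30-point figure comes from simply truncating the return series. Because log returns sum, the cumulative gap over a sub-period is just a difference of sums; the helper and usage names below are our own illustration:

import pandas as pd

def sub_period_gap(strat: pd.Series, bench: pd.Series, start: str, end: str) -> float:
    # Cumulative log-return gap between two strategies over a sub-period
    return strat.loc[start:end].sum() - bench.loc[start:end].sum()

# Hypothetical usage with the series built in the code below:
# sub_period_gap(df_trade['strat_ret'], df_200_bench['strat_ret'], '2005', '2015')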
While these questions are relevant, they're hard to answer without a crystal ball. That said, we can return to the recommendations of Marcos López de Prado and simulate future returns. We'll look at that in detail in our next post. Code below.
# Built using Python 3.10.19 and a virtual environment

# Load libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import statsmodels.api as sm
import matplotlib.pyplot as plt
import yfinance as yf
from matplotlib.ticker import FuncFormatter

plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (14,8)

# Function to get data
def get_spy_weekly_data() -> pd.DataFrame:
    df = yf.download('SPY', start='2000-01-01', end='2024-10-01')
    df.columns = ['open', 'high', 'low', 'close', 'adj close', 'volume']
    df.index.name = 'date'

    # Create training set and downsample to weekly ending Friday
    df_train = df.loc[:'2019-01-01', 'adj close'].copy()
    df_w = pd.DataFrame(df_train.resample('W-FRI').last())
    df_w.columns = ['price']

    return df_w

# Get data
df_w = get_spy_weekly_data()

# Create momentum dictionary
periods = [3, 6, 9, 12]
momo_dict = {}
for back in periods:
    for forward in periods:
        df_out = df_w.copy()
        df_out['ret_back'] = np.log(df_out['price']/df_out['price'].shift(back))
        df_out['ret_for'] = np.log(df_out['price'].shift(-forward)/df_out['price'])
        df_out = df_out.dropna()

        mod = sm.OLS(df_out['ret_for'], sm.add_constant(df_out['ret_back'])).fit()
        momo_dict[f"{back} - {forward}"] = {'data': df_out,
                                            'params': mod.params,
                                            'pvalues': mod.pvalues}

# Prepare model
model_name = '12 - 12'
mod_look_forward = 12
train_pd = 5
test_pd = 1
tot_pd = train_pd + test_pd

# Create trading dataframe
df_trade = momo_dict[model_name]['data'].copy()

# Run model with train/forecast steps
trade_pred = []
for i in range(tot_pd, len(df_trade)+1, test_pd):
    train_df = df_trade.iloc[i-tot_pd:i-test_pd, 1:]
    test_df = df_trade.iloc[i-test_pd+mod_look_forward-1:i-test_pd+mod_look_forward, 1:]

    # Ensure 'ret_back' is 2D by selecting it as a DataFrame, not a Series
    X_train = sm.add_constant(train_df[['ret_back']])
    if test_df.shape[0] > 1:
        X_test = sm.add_constant(test_df[['ret_back']])
    else:
        X_test = sm.add_constant(test_df[['ret_back']], has_constant='add')

    # Fit the model
    mod_run = sm.OLS(train_df['ret_for'], X_train).fit()

    # Predict using the test data
    mod_pred = mod_run.predict(X_test).values
    trade_pred.extend(mod_pred)

# Test for length
assert len(trade_pred) + mod_look_forward + train_pd - 1 == len(df_trade)

# Add predictions to dataframe
df_trade['pred'] = np.concatenate((np.repeat(np.nan, mod_look_forward + train_pd - 1),
                                   np.array(trade_pred)))

# Generate returns
df_trade['ret'] = np.log(df_trade['price']/df_trade['price'].shift(1))

# Generate signals
# Note: use isna() to flag the warm-up rows; comparing to np.nan with == is always False
df_trade['signal'] = np.where(df_trade['pred'].isna(), np.nan,
                              np.where(df_trade['pred'] > 0, 1, 0))
df_trade['signal_sh'] = np.where(df_trade['pred'].isna(), np.nan,
                                 np.where(df_trade['pred'] >= 0, 1, -1))

# Generate strategy returns
df_trade['strat_ret'] = df_trade['signal'].shift(1) * df_trade['ret']
df_trade['strat_ret_sh'] = df_trade['signal_sh'].shift(1) * df_trade['ret']

# Create 200-day strategy
df_200 = df_w.copy()
df_200.columns = ['price']
df_200 = df_200.resample('W-FRI').last()

# 40 weeks = 200 days
df_200['sma_200'] = df_200['price'].rolling(40).mean()
df_200['ret'] = np.log(df_200['price']/df_200['price'].shift(1))
df_200['signal'] = np.where(df_200['price'] > df_200['sma_200'], 1, 0)
df_200['strat_ret'] = df_200['signal'].shift(1)*df_200['ret']

df_200_bench = df_200.loc[df_trade['strat_ret'].index[0]:df_trade['strat_ret'].index[-1]]

# Function from tearsheet for drawdowns
def get_drawdown_periods(cumulative_returns):
    peak = cumulative_returns.cummax()
    drawdown = cumulative_returns - peak
    end_of_dd = drawdown[drawdown == 0].index

    dd_periods = []
    start = cumulative_returns.index[0]
    for end in end_of_dd:
        if start < end:
            period = (start, end)
            dd_periods.append(period)
        start = end

    return drawdown, dd_periods

# Get drawdowns and periods
dd_12_12, pd_12_12 = get_drawdown_periods(df_trade['strat_ret'].cumsum())
dd_200, pd_200 = get_drawdown_periods(df_200_bench['strat_ret'].cumsum())

# Sort drawdown periods
top_dd_pds_12 = sorted(pd_12_12, key=lambda x: (x[1] - x[0]).days, reverse=True)[:5]
top_dd_pds_200 = sorted(pd_200, key=lambda x: (x[1] - x[0]).days, reverse=True)[:5]

top_5_12 = [(end - start).days for start, end in top_dd_pds_12]
top_5_200 = [(end - start).days for start, end in top_dd_pds_200]

# Get drawdown magnitude
peak_12 = df_trade['strat_ret'].cumsum().cummax()
drawdown_12 = df_trade['strat_ret'].cumsum() - peak_12
peak_200 = df_200_bench['strat_ret'].cumsum().cummax()
drawdown_200 = df_200_bench['strat_ret'].cumsum() - peak_200

drawdown_12[drawdown_12 != 0.0].drop_duplicates().nsmallest(5)
drawdown_200[drawdown_200 != 0.0].drop_duplicates().nsmallest(5)

top_five_12_min = [drawdown_12[start:end].min()*100 for start, end in top_dd_pds_12]
top_five_200_min = [drawdown_200[start:end].min()*100 for start, end in top_dd_pds_200]

# Plot top 5 drawdowns by magnitude
x = np.arange(len(top_five_12_min))
width = 0.35  # Width of the bars

fig, ax = plt.subplots()

# Plotting bars side by side
bar1 = ax.bar(x - width/2, top_five_12_min, width, label='Strategy')
bar2 = ax.bar(x + width/2, top_five_200_min, width, label='Benchmark')

# Adding labels, title, and legend
ax.set_xlabel('')
ax.set_ylabel('Percent (%)')
ax.set_title('Top 5 drawdowns by strategy')
ax.set_xticks(x)
ax.set_xticklabels("")
ax.legend()
ax.invert_yaxis()

plt.show()

# Plot top 5 drawdown period comparisons
x = np.arange(len(top_5_12))
width = 0.35  # Width of the bars

fig, ax = plt.subplots()

# Plotting bars side by side
bar1 = ax.bar(x - width/2, top_5_12, width, label='Strategy')
bar2 = ax.bar(x + width/2, top_5_200, width, label='Benchmark')

# Adding labels, title, and legend
ax.set_xlabel('')
ax.set_ylabel('Days')
ax.set_title('Duration of drawdown by strategy')
ax.set_xticks(x)
ax.set_xticklabels("")
ax.legend()
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, p: f'{int(x):,}'))

plt.show()
¹ We're not using the term "significant" in its precise statistical sense. There are tests for the significance of the Sharpe Ratio, but we did not employ them. One simple option is sketched below.
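For the curious, a hedged sketch of such a test: a paired bootstrap of the difference in annualized Sharpe Ratios. This is our own illustration, not something from the post; the i.i.d. resampling ignores autocorrelation, so a block bootstrap would be more defensible for strategy returns:

import numpy as np
import pandas as pd

def bootstrap_sharpe_diff_pvalue(r1: pd.Series, r2: pd.Series,
                                 n_boot: int = 10_000, ann: int = 52,
                                 seed: int = 42) -> float:
    # Paired bootstrap: resample weeks jointly to preserve cross-correlation
    rng = np.random.default_rng(seed)
    rets = pd.concat([r1, r2], axis=1).dropna().values
    t = len(rets)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        sample = rets[rng.integers(0, t, t)]
        s1 = sample[:, 0].mean() / sample[:, 0].std() * np.sqrt(ann)
        s2 = sample[:, 1].mean() / sample[:, 1].std() * np.sqrt(ann)
        diffs[b] = s1 - s2
    # Crude two-sided p-value: how often the resampled difference crosses zero
    return 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())

# Hypothetical usage:
# bootstrap_sharpe_diff_pvalue(df_trade['strat_ret'], df_200_bench['strat_ret'])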