Statistical Arbitrage: Bayesian Pairs Trading System

StatArb Pairs: Optimized Mean-Reversion Trading System

Pham The Anh

[Figure: Pair Trading Strategy: Mechanism and Key Components (Strategy Steps)]

Project Overview

This Jupyter notebook implements a pairs trading strategy using statistical arbitrage techniques and Bayesian optimization. The system is designed to identify and exploit mean-reversion opportunities in financial markets.

The project code and Jupyter Notebook are available on:

* GitHub: https://github.com/theanh97/Statistical-Arbitrage-Bayesian-Optimized-Kappa-Half-life-Pairs-Trading-Engine/
* Kaggle: https://www.kaggle.com/code/phamtheanh97/statarb-pairs-

Feel free to explore the code, run the notebook, and provide feedback.

The project is structured into the following key components:

1. Data Collection and Preprocessing
   * Download historical stock data for 50 major companies using yfinance
   * Clean and align data for analysis

2. Pair Selection
   * Implement cointegration tests
   * Perform correlation analysis
   * Conduct Augmented Dickey-Fuller (ADF) tests for stationarity
   * Rank and select the best pairs for trading

3. Pairs Trading Strategy Development
   * Calculate key parameters: Kappa and Half-Life
   * Implement the PairTradingStrategy class using Backtrader
   * Define entry and exit logic
   * Implement risk management techniques

4. Backtesting Engine
   * Develop a custom BacktestEngine class
   * Simulate trades on historical data
   * Calculate comprehensive performance metrics

5. Bayesian Optimization
   * Define the parameter space for optimization
   * Create an objective function based on Sharpe Ratio
   * Implement and run the optimization process to find optimal strategy parameters

6. Results Analysis
   * Visualize the equity curve
   * Generate and analyze trade tables
   * Calculate and interpret key performance metrics

7. Future Improvements
   * Suggest potential enhancements to the strategy
   * Discuss areas for further research and development

8. Conclusion
   * Recap the project's achievements
   * Discuss the potential of statistical arbitrage in modern markets
   * Offer final thoughts on the future of quantitative trading strategies

This project demonstrates the application of statistical methods and machine learning optimization techniques in developing a pairs trading system. By following this structured approach, we aim to create a strategy that can identify and exploit market inefficiencies in a systematic and data-driven manner.

Key technologies and libraries used include Python, Jupyter Notebook, yfinance, pandas, numpy, statsmodels, Backtrader, scikit-optimize (skopt), and matplotlib.

1. Data Collection and Preprocessing

The foundation of any robust trading strategy lies in high-quality, well-prepared data. In this section, we discuss our approach to collecting and preprocessing the historical stock data necessary for our pairs trading strategy.

A. Selecting the stock universe

For this project, we've chosen a diverse set of 50 stocks from various sectors of the market. This selection provides a broad base for identifying potential trading pairs. Our stock universe includes major companies such as:

* Technology: AAPL, MSFT, GOOGL, AMZN, META
* Finance: JPM, V, BAC
* Healthcare: JNJ, UNH, PFE
* Consumer goods: PG, KO, NKE
* Energy: XOM, CVX
* And many others...

This diverse selection allows us to explore potential pairs both within and across different sectors.

B. Downloading historical data using yfinance

To obtain historical stock data, we use the yfinance library, which provides a convenient interface to download data from Yahoo Finance.
Here's how we implement the data download process:

    import os
    from datetime import datetime

    import yfinance as yf

    # List of 50 stock tickers (the list is cut off in the source)
    List_tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'BRK-A']  # ...

    # Set start and end times
    start_time = '2020-01-01'
    end_time = '2023-01-01'

    # Create a folder to save data if it doesn't exist
    folder_name = 'stock_data'
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)

    # Function to download and save data for a stock
    def download_and_save_stock_data(ticker):
        try:
            # Download data
            data = yf.download(ticker, start=start_time, end=end_time)
            # Create file name
            file_name = os.path.join(folder_name, f"{ticker}.csv")
            # Save data to CSV file
            data.to_csv(file_name)
            print(f"Downloaded and saved data for {ticker}")
        except Exception as e:
            print(f"Error downloading data for {ticker}: {str(e)}")

    # Download and save data for all stocks
    for ticker in List_tickers:
        download_and_save_stock_data(ticker)

    print("Completed downloading data for all stocks.")

This script downloads daily data for each stock from January 1, 2020, to January 1, 2023, providing us with three years of historical data to work with.

Collecting the data carefully gives us a solid foundation for the subsequent steps in our pairs trading strategy development. Data quality is crucial for the success of a statistical arbitrage approach.

3. Pair Selection

In pairs trading, selecting the right pairs of stocks is crucial for the strategy's success. Our approach combines several statistical methods to identify pairs with the highest potential for profitable mean-reversion trading.

A. Introduction to cointegration and its importance

Cointegration is a fundamental concept in our pair selection process. Two time series are considered cointegrated if they share a long-term equilibrium relationship, even though they may deviate from this equilibrium in the short term.
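The definition above can be written compactly. The notation below (x_t, y_t, \beta, s_t) is assumed here for illustration and does not appear in the notebook:

```latex
% Assumed notation: x_t and y_t are the two price series, \beta is a hedge ratio,
% I(1) is a series that becomes stationary after one differencing, I(0) is stationary.
x_t \sim I(1), \qquad y_t \sim I(1), \qquad
s_t = y_t - \beta\, x_t \sim I(0)
```

In words: each series wanders on its own, but some linear combination s_t of the two, the spread, is stationary.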
For pairs trading, cointegration suggests that the spread between two stocks tends to revert to a mean over time, providing opportunities for trading.

B. Implementing cointegration tests

We use the Engle-Granger two-step method to test for cointegration. This is implemented in our analyze_pair function:

    from statsmodels.tsa.stattools import coint

    def analyze_pair(stock1, stock2):
        # ... other code ...
        # Check Cointegration
        score, pvalue, _ = coint(stock1['Close'], stock2['Close'])
        # ... other code ...

The coint function from statsmodels performs the cointegration test, returning a p-value that we use to assess the strength of the cointegration relationship.

C. Correlation analysis and its role

While cointegration captures long-term relationships, correlation helps us understand the short-term co-movement of stock prices. We calculate the correlation coefficient between potential pairs:

    def analyze_pair(stock1, stock2):
        # ... other code ...
        # Calculate Correlation
        correlation = stock1['Close'].corr(stock2['Close'])
        # ... other code ...

Highly correlated pairs might indicate a stronger relationship, but we also consider that lower correlations could provide more trading opportunities.

D. Augmented Dickey-Fuller (ADF) test for stationarity

The ADF test helps us determine if the spread between two stocks is stationary, which is crucial for mean-reversion strategies:

    from statsmodels.tsa.stattools import adfuller

    def analyze_pair(stock1, stock2):
        # ... other code ...
        # Augmented Dickey-Fuller Test on log spread
        adf_result = adfuller(log_spread)
        # ... other code ...

A stationary spread suggests that the difference between the two stocks tends to revert to a mean over time, which is ideal for our strategy.

E. Ranking and selecting the best pairs

We analyze all possible pairs from our stock universe using these methods:

    from itertools import combinations

    import pandas as pd

    def analyze_all_pairs(tickers, data_folder='stock_data'):
        results = []
        for ticker1, ticker2 in combinations(tickers, 2):
            try:
                stock1 = load_stock_data(ticker1, data_folder)
                stock2 = load_stock_data(ticker2, data_folder)

                # Ensure both stocks have the same date range
                common_dates = stock1.index.intersection(stock2.index)
                stock1 = stock1.loc[common_dates]
                stock2 = stock2.loc[common_dates]

                result = analyze_pair(stock1, stock2)
                results.append({
                    'pair': f'{ticker1}-{ticker2}',
                    'cointegration_pvalue': result['cointegration_pvalue'],
                    'correlation': result['correlation'],
                    'adf_pvalue': result['adf_pvalue'],
                    'mean_zscore': result['z_score'].mean(),
                    'std_zscore': result['z_score'].std()
                })
            except Exception as e:
                print(f"Error analyzing pair {ticker1}-{ticker2}: {str(e)}")
        return pd.DataFrame(results)

We then filter the pairs based on cointegration and stationarity criteria:

    # The adf_threshold default is cut off in the source; 0.05 is assumed here.
    def filter_suitable_pairs(df_results, cointegration_threshold=0.05, adf_threshold=0.05):
        return df_results[
            (df_results['cointegration_pvalue'] < cointegration_threshold) &
            (df_results['adf_pvalue'] < adf_threshold)
        ].sort_values('correlation', ascending=False)

Finally, we select the best pair:

    # The threshold defaults are cut off in the source; 0.05 is assumed for both.
    def find_best_pair(tickers, data_folder='stock_data',
                       cointegration_threshold=0.05, adf_threshold=0.05):
        all_pairs = analyze_all_pairs(tickers, data_folder)
        suitable_pairs = filter_suitable_pairs(all_pairs, cointegration_threshold, adf_threshold)
        if suitable_pairs.empty:
            return None, pd.DataFrame()
        best_pair = suitable_pairs.iloc[0]
        # ... additional code to calculate metrics for the best pair ...

Among the pairs that pass both tests at the chosen significance level, the filter ranks by correlation and the first row is taken as the best pair. This gives us the pairs with the strongest statistical evidence of mean-reverting behavior, which is what our statistical arbitrage strategy needs in the subsequent steps of the project.

4. Pairs Trading Strategy Development

After selecting suitable pairs, the next crucial step is to develop our pairs trading strategy.
This section focuses on the implementation of the PairTradingStrategy class, which forms the core of our trading logic.

A. Calculating key parameters: Kappa and Half-Life

Before implementing the strategy, we calculate two important parameters: Kappa and Half-Life. These parameters help us understand the mean-reversion characteristics of our selected pairs.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    def calculate_spread(stock1, stock2):
        return np.log(stock1['Close']) - np.log(stock2['Close'])

    def calculate_half_life(spread):
        spread_lag = spread.shift(1)
        spread_lag.iloc[0] = spread_lag.iloc[1]
        spread_ret = spread - spread_lag
        spread_ret.iloc[0] = spread_ret.iloc[1]
        spread_lag2 = sm.add_constant(spread_lag)
        model = sm.OLS(spread_ret, spread_lag2)
        res = model.fit()
        # params.iloc[1] is the slope on the lagged spread (positional access)
        halflife = -np.log(2) / res.params.iloc[1]
        return halflife

    def calculate_kappa_half_life(spread):
        # Calculate Kappa
        spread_lag = spread.shift(1)
        delta_spread = spread - spread_lag
        reg = np.polyfit(spread_lag.dropna(), delta_spread.dropna(), deg=1)
        kappa = -reg[0]
        # Calculate Half-Life
        half_life = calculate_half_life(spread)
        return kappa, half_life

These functions allow us to calculate the Kappa (mean reversion rate) and Half-Life (time for the spread to revert halfway to its mean) for our selected pairs. Both come from regressing the change in the spread on the lagged spread: Kappa is the negated slope, and the half-life is ln(2) divided by that same quantity.

B. Designing the PairTradingStrategy class

The PairTradingStrategy class is implemented using the Backtrader framework. Here's the core structure of our strategy:

    import backtrader as bt
    import numpy as np
    from collections import deque

    class PairTradingStrategy(bt.Strategy):
        params = (
            ('lookback', 20),
            ('entry_threshold', 1.5),      # Entry threshold in terms of standard deviations
            ('stoploss_factor', 2.0),      # Stop-loss threshold in terms of standard deviations
            ('holding_time_factor', 1.5),  # Factor to determine max holding time based on half-life
            ('half_life', 14),             # Estimated half-life of mean reversion in days
            ('stock1', None),
            ('stock2', None),
        )

        def __init__(self):
            # Initialize strategy components
            self.stock1 = self.getdatabyname(self.params.stock1)
            self.stock2 = self.getdatabyname(self.params.stock2)
            self.spread = []
            self.mean = None
            self.std = None
            self.entry_price = None
            self.entry_date = None
            self.max_holding_time = int(self.params.holding_time_factor * self.params.half_life)

            # Initialize performance tracking
            self.trades = []
            self.equity_curve = [self.broker.getvalue()]
            self.returns = deque(maxlen=252)
            self.max_drawdown = 0
            self.peak = self.broker.getvalue()

        def next(self):
            # Strategy logic implementation
            # ... (details in the following sections)
            pass

        def log_trade(self, action, days_held=None, reason=None):
            # Log trade details
            # ... (implementation details)
            pass

        def stop(self):
            # Calculate final performance metrics
            # ... (implementation details)
            pass

C. Entry and exit logic

The core trading logic is implemented in the next() method:

    def next(self):
        # Calculate the spread
        spread = np.log(self.stock1.close[0]) - np.log(self.stock2.close[0])
        self.spread.append(spread)

        # Wait until we have enough data
        if len(self.spread) <= self.params.lookback:
            return

        # Calculate mean and standard deviation
        self.mean = np.mean(self.spread[-self.params.lookback:])
        self.std = np.std(self.spread[-self.params.lookback:])

        # Calculate trading thresholds
        buy_threshold = self.mean - self.params.entry_threshold * self.std
        sell_threshold = self.mean + self.params.entry_threshold * self.std

        # Trading logic
        if not self.position:
            if spread < buy_threshold:
                self.buy(data=self.stock1)
                self.sell(data=self.stock2)
                self.entry_price = spread
                self.entry_date = len(self)
                self.log_trade('ENTRY LONG')
            elif spread > sell_threshold:
                self.sell(data=self.stock1)
                self.buy(data=self.stock2)
                self.entry_price = spread
                self.entry_date = len(self)
                self.log_trade('ENTRY SHORT')
        else:
            # Exit logic
            # ... (implementation details for exit conditions)
            pass

D. Risk management techniques

Our strategy incorporates several risk management techniques:

1. Stop-loss: We exit the trade if the spread moves against us beyond a certain threshold.
2. Maximum holding time: We limit the duration of each trade based on the calculated half-life.
3. Position sizing: While not explicitly shown in this code snippet, proper position sizing is crucial and should be implemented based on the account size and risk tolerance.

    # Inside the next() method, for existing positions
    days_since_entry = len(self) - self.entry_date

    # Stop-loss check
    if abs(spread - self.entry_price) > self.params.stoploss_factor * self.std:
        self.close(data=self.stock1)
        self.close(data=self.stock2)
        self.log_trade('STOP-LOSS', days_since_entry, reason="Stop-Loss Hit")
    # Max holding time check
    elif days_since_entry >= self.max_holding_time:
        self.close(data=self.stock1)
        self.close(data=self.stock2)
        self.log_trade('EXIT', days_since_entry, reason="Max Holding Time")

Note that days_since_entry is computed before either check, since both branches log it.

With these components, we have a pairs trading strategy that acts on mean-reversion opportunities while limiting risk per trade. This strategy forms the core of our statistical arbitrage system, setting the stage for backtesting and optimization in the subsequent steps of our project.

5. Backtesting Engine

After developing our pairs trading strategy, the next crucial step is to test its performance on historical data. This is where our backtesting engine comes into play. Let's explore how we implement and use this engine to evaluate our strategy.

A. Introduction to Backtrader framework

Our backtesting engine is built using the Backtrader framework, which provides a flexible and powerful platform for testing trading strategies.
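Conceptually, an event-driven framework of this kind feeds the strategy one bar at a time and calls its next() method on each bar. The sketch below imitates that calling pattern in plain Python. ToyStrategy and run_bars are hypothetical names, and this is not Backtrader's API, only an illustration of the warm-up-then-signal flow that the strategy's next() method follows:

```python
class ToyStrategy:
    """Hypothetical stand-in for a strategy; not a Backtrader class."""

    def __init__(self, lookback):
        self.lookback = lookback
        self.history = []   # values seen so far, one per bar
        self.signals = []   # 0 = warming up, 1 = below rolling mean, -1 = otherwise

    def next(self, value):
        # Called once per bar, like a strategy's next() method
        self.history.append(value)
        # Same warm-up rule as the article: wait until more than `lookback` bars exist
        if len(self.history) <= self.lookback:
            self.signals.append(0)
            return
        window = self.history[-self.lookback:]
        mean = sum(window) / len(window)
        self.signals.append(1 if value < mean else -1)


def run_bars(strategy, values):
    """Hypothetical driver: replay the series bar by bar."""
    for value in values:
        strategy.next(value)
    return strategy.signals


signals = run_bars(ToyStrategy(lookback=3), [10, 11, 12, 9, 13])
print(signals)
```

The point of the pattern is that the strategy only ever sees data up to the current bar, which is what keeps a backtest free of look-ahead.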
B. Implementing the custom BacktestEngine class

The notebook provides a custom BacktestEngine class that wraps around Backtrader's functionality. Here's the implementation:

    import backtrader as bt
    import pandas as pd
    import matplotlib.pyplot as plt
    from tabulate import tabulate

    class BacktestEngine:
        # The commission default is cut off in the source; 0.001 is assumed here.
        def __init__(self, strategy_class, df1, df2, initial_capital=100000, commission=0.001):
            self.cerebro = bt.Cerebro()
            self.strategy_class = strategy_class
            self.df1 = df1
            self.df2 = df2
            self.initial_capital = initial_capital
            self.commission = commission
            self.results = None

        def add_data(self):
            data1 = bt.feeds.PandasData(dataname=self.df1, name="stock1")
            data2 = bt.feeds.PandasData(dataname=self.df2, name="stock2")
            self.cerebro.adddata(data1)
            self.cerebro.adddata(data2)

        def set_strategy(self, **kwargs):
            self.cerebro.addstrategy(self.strategy_class, **kwargs)

        def set_broker(self, initial_capital=None, commission=None):
            if initial_capital is None:
                initial_capital = self.initial_capital
            if commission is None:
                commission = self.commission
            # Assumed completion: the source snippet ends before the broker is configured
            self.cerebro.broker.setcash(initial_capital)
            self.cerebro.broker.setcommission(commission=commission)

        def run(self):
            self.add_data()
            self.set_broker()
            self.results = self.cerebro.run()

This class encapsulates the process of setting up the backtesting environment, adding data, setting the strategy and broker parameters, and running the backtest.

C. Simulating trades on historical data

The run method of our BacktestEngine class is responsible for simulating trades on historical data. It uses the Backtrader engine to run the strategy on the provided data:

    def run(self):
        self.add_data()
        self.set_broker()
        self.results = self.cerebro.run()

This method adds the data, sets up the broker, and then runs the backtest, storing the results for later analysis.

D. Calculating performance metrics

After running the backtest, we calculate various performance metrics to evaluate our strategy.
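Two of the metrics reported in this article, the annualized Sharpe ratio and the maximum drawdown, can be computed directly from an equity curve. The following is a minimal sketch in plain Python, assuming daily bars, 252 trading days per year, a zero risk-free rate, and a non-empty curve. The function names are hypothetical and not taken from the notebook:

```python
import math


def simple_returns(equity):
    """Period-over-period returns of an equity curve."""
    return [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]


def sharpe_ratio(equity, periods_per_year=252):
    """Annualized Sharpe ratio (risk-free rate assumed to be zero)."""
    rets = simple_returns(equity)
    if not rets:
        return 0.0
    mean = sum(rets) / len(rets)
    # Population standard deviation, matching numpy's default for np.std
    std = math.sqrt(sum((r - mean) ** 2 for r in rets) / len(rets))
    return math.sqrt(periods_per_year) * mean / std if std > 0 else 0.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


curve = [100.0, 110.0, 99.0, 120.0]
print(max_drawdown(curve), sharpe_ratio(curve))
```

In the example curve the only decline is from 110 to 99, so the maximum drawdown is 11 / 110, or 10%.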
The BacktestEngine class includes methods for retrieving and printing these metrics:

    def get_metrics(self):
        if not self.results:
            return {"Error": "No backtest results available. Please run the backtest."}
        strat = self.results[0]
        if not hasattr(strat, 'metrics'):
            return {"Error": "No metrics found in the strategy."}
        return strat.metrics

    def print_metrics(self):
        metrics = self.get_metrics()
        if isinstance(metrics, dict) and "Error" in metrics:
            print(metrics["Error"])
            return
        table_data = [["Metric", "Value"]]
        for key, value in metrics.items():
            if isinstance(value, float):
                table_data.append([key, f"{value:.2f}"])
            else:
                table_data.append([key, value])
        print("\nBacktest Results:")
        print(tabulate(table_data, headers="firstrow", tablefmt="grid"))

These methods retrieve and print the performance metrics calculated during the backtest. Additionally, the engine provides a method for visualizing the results:

    def plot_equity_curve(self):
        if not self.results:
            print("No results available. Make sure to run the backtest first.")
            return
        strat = self.results[0]
        if not hasattr(strat, 'equity_curve'):
            print("No equity curve data found in the strategy.")
            return

        # Convert equity curve to pandas Series
        equity_curve = pd.Series(strat.equity_curve,
                                 index=self.df1.index[:len(strat.equity_curve)])

        # Calculate drawdown
        drawdown = (equity_curve.cummax() - equity_curve) / equity_curve.cummax()

        # Create figure with two subplots
        # (the height ratios are cut off in the source; [2, 1] is assumed here)
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10),
                                       gridspec_kw={'height_ratios': [2, 1]})
        fig.suptitle('Equity Curve and Drawdown', fontsize=16)

        # Plot equity curve
        ax1.plot(equity_curve.index, equity_curve.values, label='Equity Curve')
        ax1.set_title('Equity Curve')
        ax1.set_ylabel('Portfolio Value')
        ax1.legend()
        ax1.grid(True)

        # Plot drawdown
        ax2.fill_between(drawdown.index, drawdown.values, 0, alpha=0.3,
                         color='red', label='Drawdown')
        ax2.set_title('Drawdown')
        ax2.set_ylabel('Drawdown')
        ax2.set_xlabel('Date')
        ax2.legend()
        ax2.grid(True)

        # Format x-axis
        plt.gcf().autofmt_xdate()

        # Add strategy performance metrics as text
        metrics = self.get_metrics()
        metrics_text = (f"Total Return: {metrics['Total Return (%)']:.2f}%\n"
                        f"Sharpe Ratio: {metrics['Sharpe Ratio']:.2f}\n"
                        f"Max Drawdown: {metrics['Max Drawdown (%)']:.2f}%\n"
                        f"Win Rate: {metrics['Win Rate (%)']:.2f}%")
        fig.text(0.02, 0.02, metrics_text, fontsize=10, va='bottom')

        # Adjust layout
        plt.tight_layout(rect=[0, 0.03, 1, 0.95])

        # Show plot
        plt.show()

This method visualizes the strategy's performance as two stacked panels, the equity curve and the drawdown, annotated with the headline metrics.

With this backtesting engine, we can evaluate our pairs trading strategy on historical data, calculate key performance metrics, and visualize the results. This lets us assess the strategy's effectiveness before moving on to optimization and real-world implementation.

6. Bayesian Optimization

After implementing our backtesting engine, the next step is to optimize our strategy's parameters. We use Bayesian optimization for this purpose, which is an efficient method for finding the optimal parameters of our trading strategy.

A. The need for parameter optimization

Our pairs trading strategy involves several parameters that can significantly affect its performance. These include the lookback period, entry threshold, stop-loss factor, and others. Optimizing these parameters can potentially improve the strategy's performance.

B. Introduction to Bayesian optimization

Bayesian optimization is a technique for optimizing black-box functions. It is particularly useful when the function is expensive to evaluate, as is the case with our backtesting process. It uses probabilistic models to guide the search for optimal parameters.

C. Defining the parameter space

In our implementation, we define the parameter space for our optimization process.
Here's how we set it up:

    from skopt import gp_minimize
    from skopt.space import Real, Integer
    from skopt.utils import use_named_args

    # The signature is cut off in the source after "initial_capital=1e";
    # the initial_capital and commission defaults below are assumptions.
    def bayesian_optimization(df1, df2, param_ranges, n_calls=50,
                              initial_capital=100000, commission=0.001):
        # The upper bounds and dimension names are cut off in the source;
        # .max() and name=... are assumed (use_named_args requires named dimensions).
        space = [
            Integer(param_ranges['lookback'][0].min(),
                    param_ranges['lookback'][0].max(), name='lookback'),
            Real(param_ranges['entry_threshold'][0].min(),
                 param_ranges['entry_threshold'][0].max(), name='entry_threshold'),
            Real(param_ranges['stoploss_factor'][0].min(),
                 param_ranges['stoploss_factor'][0].max(), name='stoploss_factor'),
            Real(param_ranges['holding_time_factor'][0].min(),
                 param_ranges['holding_time_factor'][0].max(), name='holding_time_factor'),
        ]
        fixed_half_life = param_ranges['half_life'][0]
        # ... rest of the function

This code defines the search space for our parameters, including lookback period, entry threshold, stop-loss factor, and holding time factor. The half-life is not searched over; it is held fixed at the value estimated for the pair.

D. Creating the objective function

The objective function is what we're trying to optimize. In our case, it's based on the Sharpe Ratio of our strategy:

    def objective(params, df1, df2, initial_capital=100000, commission=0.001):
        lookback, entry_threshold, stoploss_factor, holding_time_factor, half_life = params

        engine = BacktestEngine(
            strategy_class=PairTradingStrategy,
            df1=df1,
            df2=df2,
            initial_capital=initial_capital
        )
        engine.set_strategy(
            lookback=int(lookback),
            entry_threshold=entry_threshold,
            stoploss_factor=stoploss_factor,
            holding_time_factor=holding_time_factor,
            half_life=half_life,
            stock1="stock1",
            stock2="stock2"
        )
        engine.set_broker(commission=commission, initial_capital=initial_capital)
        engine.run()

        metrics = engine.get_metrics()
        # We want to maximize Sharpe Ratio, so we return its negative
        return -metrics.get('Sharpe Ratio', 0)

This function runs a backtest with given parameters and returns the negative Sharpe Ratio (we negate it because the optimization algorithm minimizes the objective function, but we want to maximize the Sharpe Ratio).

E. Running the optimization process

Finally, we run the Bayesian optimization process. The following code continues inside bayesian_optimization:

        @use_named_args(space)
        def objective_wrapper(**params):
            params_list = list(params.values()) + [fixed_half_life]
            return objective(params_list, df1, df2, initial_capital, commission)

        def callback(res):
            n = len(res.x_iters)
            print(f"Optimization progress: {n / n_calls * 100:.2f}%")

        # The call is cut off in the source after random_state=42;
        # passing the callback defined above is the assumed completion.
        result = gp_minimize(objective_wrapper, space, n_calls=n_calls,
                             random_state=42, callback=callback)

        best_params = {
            'lookback': int(result.x[0]),
            'entry_threshold': result.x[1],
            'stoploss_factor': result.x[2],
            'holding_time_factor': result.x[3],
            'half_life': fixed_half_life
        }
        return best_params, -result.fun  # Return best parameters and best Sharpe Ratio

This code runs the optimization process, printing progress updates, and returns the best parameters found along with the corresponding Sharpe Ratio.

With Bayesian optimization, we can search the parameter space efficiently for the configuration that yields the best backtested performance for our pairs trading strategy. This optimized strategy can then be further tested and potentially deployed in live trading scenarios.

7. Results Analysis

After implementing our pairs trading strategy, backtesting it, and optimizing its parameters, the final crucial step is to analyze the results. This analysis helps us understand the performance of our strategy and make informed decisions about its potential real-world application.

A. Visualizing the equity curve

One of the key visualizations provided by our BacktestEngine is the equity curve. This gives us a clear picture of how our portfolio value changes over time. The plot_equity_curve method of our BacktestEngine class, listed in the Backtesting Engine section, generates this visualization. It draws the equity curve and the drawdown over time as two stacked panels and annotates the figure with the total return, Sharpe Ratio, maximum drawdown, and win rate.

B. Analyzing the trade table

To get a detailed view of individual trades, we can generate a trade table. The print_trade_table method in our BacktestEngine class provides this functionality:

    def print_trade_table(self, num_trades=None):
        if not self.results:
            print("No results available. Make sure to run the backtest first.")
            return
        strat = self.results[0]
        if not hasattr(strat, 'trades'):
            print("No trade information found in the strategy.")
            return

        trades = strat.trades
        if num_trades is not None:
            trades = trades[:num_trades]

        def fmt(value, suffix=""):
            # Two decimals for numbers; missing values are shown as 'N/A'
            # (applying ':.2f' to the 'N/A' default would raise a ValueError)
            return f"{value:.2f}{suffix}" if isinstance(value, (int, float)) else "N/A"

        trade_data = []
        for trade in trades:
            trade_data.append([
                trade.get('entry_date', 'N/A'),
                trade.get('exit_date', 'N/A'),
                trade.get('days_held', 'N/A'),
                fmt(trade.get('pnl')),
                fmt(trade.get('pnl_pct'), "%"),
                fmt(trade.get('entry_price1')),
                fmt(trade.get('entry_price2')),
                fmt(trade.get('exit_price1')),
                fmt(trade.get('exit_price2')),
                trade.get('exit_type', 'N/A'),
                trade.get('exit_reason', 'N/A')
            ])

        # The last two headers are cut off in the source; they are named after the data columns.
        headers = ["Entry Date", "Exit Date", "Days Held", "PnL", "PnL %",
                   "Entry Price 1", "Entry Price 2", "Exit Price 1", "Exit Price 2",
                   "Exit Type", "Exit Reason"]
        print("\nTrade History:")
        print(tabulate(trade_data, headers=headers, tablefmt="grid"))

This method provides a detailed view of each trade, including entry and exit dates, profit/loss, and reasons for exit.

C. Key performance metrics discussion

Our BacktestEngine calculates and stores various performance metrics. We can access these metrics using the get_metrics method shown in the Backtesting Engine section, which returns the metrics dictionary stored on the strategy.

The specific metrics calculated are defined in the stop method of our PairTradingStrategy:

    def stop(self):
        # Calculate final metrics
        self.total_trades = len(self.trades)
        self.winning_trades = sum(1 for trade in self.trades if trade['pnl'] > 0)
        self.losing_trades = sum(1 for trade in self.trades if trade['pnl'] <= 0)
        self.win_rate = self.winning_trades / self.total_trades if self.total_trades > 0 else 0
        self.total_return = (self.equity_curve[-1] - self.equity_curve[0]) / self.equity_curve[0]
        self.mean_return = np.mean(self.returns) if self.returns else 0
        self.std_return = np.std(self.returns) if self.returns else 0
        self.sharpe_ratio = (np.sqrt(252) * self.mean_return / self.std_return
                             if self.std_return != 0 else 0)

        # Store all metrics in a dictionary for easy access
        self.metrics = {
            'Initial Capital': self.equity_curve[0],
            'Final Portfolio Value': self.equity_curve[-1],
            'Total Return (%)': self.total_return * 100,
            'Sharpe Ratio': self.sharpe_ratio,
            'Max Drawdown (%)': self.max_drawdown * 100,
            'Total Trades': self.total_trades,
            'Winning Trades': self.winning_trades,
            'Losing Trades': self.losing_trades,
            'Win Rate (%)': self.win_rate * 100,
            'Mean Daily Return (%)': self.mean_return * 100,
            'Std Dev of Daily Return (%)': self.std_return * 100
        }

These metrics cover profitability, risk-adjusted returns, and trading statistics. The Sharpe Ratio is annualized from daily returns with a factor of sqrt(252) and uses no risk-free rate.

D. Comparing strategy performance to benchmarks

While not explicitly implemented in the provided code, it's important to compare our strategy's performance to relevant benchmarks, such as the overall market performance or a simple buy-and-hold strategy on the individual stocks. This comparison helps us understand whether our strategy adds value beyond what could be achieved with simpler approaches.

By analyzing these results, we can gain insight into the effectiveness of our pairs trading strategy, its strengths and weaknesses, and potential areas for further improvement.

    **************************************************
    Finding the best pair for pair trading...
    [*********************100%***********************]  1 of 1 completed
    [*********************100%***********************]  1 of 1 completed
    Completed pa
    /var/folders/3h/k_7p78896dn6jjq56w5lrq6mo0eegn/T/ipykernel_1710/1552435527.py:21
      halflife = -np.log(2) / res.params[1]
    **************************************************
    Stock Pair for Pair Trading:
    Stock 1: MSFT
    Stock 2: ACN
    Kappa: 0.06
    Half-Life: 12.25

    Best parameters found:
    lookback: 30
    entry_threshold: 2.0093080852402427
    stoploss_factor: 3.0000000000000013
    holding_time_factor: 1.9523141951662566
    half_life: 12.249172391275549
    Best Sharpe Ratio: 2.0846017856569983
    **************************************************
    Final Backtest Results:

    Backtest Results:
    | Metric                      | Value   |
    |-----------------------------|---------|
    | Initial Capital             | 10000   |
    | Final Portfolio Value       | 10216.8 |
    | Total Return (%)            | 2.17    |
    | Sharpe Ratio                | 2.08    |
    | Max Drawdown (%)            | 0.28    |
    | Total Trades                | 20      |
    | Winning Trades              | 20      |
    | Losing Trades               | 0       |
    | Win Rate (%)                | 100.00  |
    | Mean Daily Return (%)       | 0.00    |
    | Std Dev of Daily Return (%) | 0.03    |

[Figure: Equity Curve and Drawdown. Top panel: equity curve, with portfolio value rising from about 10,000 to about 10,200. Bottom panel: drawdown. Annotations: Sharpe Ratio: 2.08, Max Drawdown: 0.28%, Win Rate: 100.00%.]

8. Future Improvements

While our current implementation of the StatArb Pairs trading strategy provides a solid foundation, there are several areas where we could potentially enhance and expand the system. The enhancements below are inferred from the current implementation and from common practices in quantitative trading.

A. Expanding the stock universe

Our current implementation uses a predefined list of 50 stock tickers:

    List_tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'BRK-A', 'V']  # ... (list cut off in the source)

A potential improvement would be to expand this universe to include a larger set of stocks or even other asset classes.
This could involve:

1. Automating stock selection based on criteria such as market cap, sector, or liquidity.
2. Implementing a dynamic stock universe that updates periodically based on market conditions.
3. Extending the strategy to other asset classes such as ETFs, futures, or forex pairs.

B. Incorporating fundamental data

Our current strategy relies primarily on price data. Incorporating fundamental data could improve the pair selection process and overall strategy performance. This might include:

1. Using financial ratios to identify similar companies for potential pairs.
2. Incorporating earnings data or other financial metrics into the trading signals.
3. Considering macroeconomic indicators that might affect the relationship between pairs.

C. Exploring machine learning for pair selection

While our current pair selection process uses statistical methods such as cointegration and correlation, machine learning techniques could enhance this process. Some possibilities include:

1. Using clustering algorithms to identify groups of similar stocks.
2. Implementing a classification model to predict which pairs are likely to perform well.
3. Applying reinforcement learning techniques to dynamically adjust the strategy parameters.

D. Enhancing risk management

Although our current implementation includes some risk management techniques, this area could be further improved:

    # Current risk management in the strategy
    if abs(spread - self.entry_price) > self.params.stoploss_factor * self.std:
        self.close(data=self.stock1)
        self.close(data=self.stock2)
        self.log_trade('STOP-LOSS', days_since_entry, reason="Stop-Loss Hit")

Potential enhancements could include:

1. Implementing more sophisticated stop-loss mechanisms.
2. Adding position sizing based on volatility or other risk metrics.
3. Incorporating a risk parity approach for portfolio construction when trading multiple pairs.

E. Improving execution modeling

Our current backtesting assumes perfect execution, which is not realistic in live trading. Improvements in this area could include:

1. Modeling slippage and transaction costs more accurately.
2. Implementing limit orders instead of market orders.
3. Considering liquidity constraints and their impact on trade execution.

F. Expanding performance analysis

While our current analysis provides good insights, we could enhance it further:

    # Current performance metrics
    self.metrics = {
        'Initial Capital': self.equity_curve[0],
        'Final Portfolio Value': self.equity_curve[-1],
        'Total Return (%)': self.total_return * 100,
        'Sharpe Ratio': self.sharpe_ratio,
        'Max Drawdown (%)': self.max_drawdown * 100,
        'Total Trades': self.total_trades,
        'Winning Trades': self.winning_trades,
        'Losing Trades': self.losing_trades,
        'Win Rate (%)': self.win_rate * 100,
        'Mean Daily Return (%)': self.mean_return * 100,
        'Std Dev of Daily Return (%)': self.std_return * 100
    }

Potential improvements include:

1. Implementing more advanced risk-adjusted return metrics (e.g., Sortino ratio, Calmar ratio).
2. Conducting more detailed drawdown analysis.
3. Performing attribution analysis to understand sources of returns.

By implementing these improvements, we could enhance the robustness, performance, and applicability of our StatArb Pairs trading strategy.

9. Conclusion

As we reach the end of our exploration of the StatArb Pairs trading system, it's important to reflect on what we've accomplished and the potential impact of this project.

A. Recap of the project's achievements

Throughout this project, we've developed a comprehensive pairs trading strategy using statistical arbitrage techniques and Bayesian optimization. Our key achievements include:

1. Data Collection and Preprocessing: We implemented a robust system to download and preprocess historical stock data for a diverse set of 50 major companies:

    List_tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'BRK-A', 'V', ...

2. Pair Selection: We developed a pair selection process using statistical methods such as cointegration tests, correlation analysis, and the Augmented Dickey-Fuller test.
3. Strategy Development: We created a PairTradingStrategy class that implements the core logic of our trading strategy, including entry and exit rules and risk management techniques.
4. Backtesting Engine: We built a custom BacktestEngine class that allows us to test our strategy on historical data and calculate comprehensive performance metrics.
5. Bayesian Optimization: We implemented a Bayesian optimization process to fine-tune our strategy parameters, potentially improving its performance.
6. Results Analysis: We developed tools to visualize and analyze our strategy's performance, including equity curves, drawdown charts, and ...

B. The potential of statistical arbitrage in modern markets

Our project demonstrates the continued relevance and potential of statistical arbitrage strategies in today's markets. By combining statistical techniques with machine learning optimization, we've shown how it's possible to identify and exploit subtle market inefficiencies.

The pairs trading approach we've developed offers several advantages:

1. Market neutrality, which can provide returns uncorrelated with overall market movements.
2. Risk management through the balanced long-short structure of trades.
3. Adaptability to various market conditions through parameter optimization.

C. Final thoughts on the future of quantitative trading strategies

As we look to the future, it's clear that quantitative trading strategies like the one we've developed will continue to play a crucial role in financial markets.
However, as markets become increasingly efficient and competitive, the importance of continuous innovation and refinement cannot be overstated. Some key areas for future development include:

1. Incorporating alternative data sources to gain unique insights.
2. Leveraging advanced machine learning techniques for improved pattern recognition and prediction.
3. Expanding to other asset classes and global markets.
4. Enhancing execution algorithms to minimize market impact and transaction costs.

In conclusion, our StatArb Pairs project serves as a solid foundation for pairs trading strategy development. It demonstrates the power of combining statistical analysis, algorithmic trading, and machine learning optimization. As the financial landscape continues to evolve, strategies like this, continually refined and adapted, will be crucial tools for investors seeking to navigate the complexities of modern markets.
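
Sketch for "D. Enhancing risk management" (position sizing based on volatility). One way to read that suggestion is to size each trade so that hitting the stop-loss costs roughly a fixed fraction of equity. The code below is a minimal sketch under stated assumptions, not part of the notebook: the function name, `risk_fraction`, and the use of the sample standard deviation of recent spread values in place of `self.std` are all hypothetical. Only `stoploss_factor` corresponds to a parameter of the strategy.

```python
import statistics


def volatility_position_size(equity, risk_fraction, spread_values, stoploss_factor):
    """Spread units to trade so that a stop-out loses about risk_fraction of equity.

    Hypothetical helper: spread_values is a recent window of spread observations
    (at least two points), and stoploss_factor mirrors the multiplier used in
    the strategy's stop-loss rule.
    """
    spread_std = statistics.stdev(spread_values)
    if spread_std == 0:
        # A flat spread gives no volatility estimate, so take no position.
        return 0.0
    # Loss per spread unit if the spread moves stoploss_factor
    # standard deviations against the position.
    loss_per_unit = stoploss_factor * spread_std
    return (equity * risk_fraction) / loss_per_unit
```

With this rule, a spread that is twice as volatile receives half the position, so the loss at the stop stays roughly constant across pairs. For example, with equity of 10000, a 1% risk budget, a stop-loss factor of 3.0, and a spread standard deviation of about 1.58, the size comes out to about 21 spread units.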
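
Sketch for "F. Expanding performance analysis" (Sortino and Calmar ratios). The following shows how these two metrics could be computed from daily returns and an equity curve. It is an illustrative sketch rather than the BacktestEngine's code: the function names are hypothetical, and the 252-day annualization factor and zero target return are assumptions.

```python
import math

TRADING_DAYS = 252  # assumed annualization factor for daily data


def sortino_ratio(daily_returns, target=0.0):
    """Annualized Sortino ratio: mean excess return over downside deviation."""
    n = len(daily_returns)
    mean_excess = sum(r - target for r in daily_returns) / n
    # Downside deviation: root mean square of shortfalls below the target,
    # averaged over all n days (days above target contribute zero).
    downside_sq = sum(min(r - target, 0.0) ** 2 for r in daily_returns) / n
    downside_dev = math.sqrt(downside_sq)
    if downside_dev == 0:
        return float("inf") if mean_excess > 0 else 0.0
    return mean_excess / downside_dev * math.sqrt(TRADING_DAYS)


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar_ratio(equity_curve):
    """Annualized compound return divided by maximum drawdown.

    Needs at least two points in equity_curve (one return period).
    """
    n_days = len(equity_curve) - 1
    total_growth = equity_curve[-1] / equity_curve[0]
    annual_return = total_growth ** (TRADING_DAYS / n_days) - 1
    mdd = max_drawdown(equity_curve)
    return float("inf") if mdd == 0 else annual_return / mdd
```

Unlike the Sharpe ratio, the Sortino ratio penalizes only returns below the target, which matters for a strategy whose return distribution is asymmetric. The Calmar ratio relates return to the worst observed drawdown, so it complements the Max Drawdown (%) figure already reported.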
