StatArb Pairs: Optimized Mean-
Reversion Trading System
Pham The Anh
[Figure: Pair Trading Strategy: Mechanism and Key Components (Strategy Steps)]
Project Overview
This Jupyter notebook implements a sophisticated pairs trading strategy
using statistical arbitrage techniques and Bayesian optimization. The
system is designed to identify and exploit mean-reversion opportunities in
financial markets. The project is structured into the following key
components:
The project code and Jupyter Notebook are available on:
* GitHub: https://github.com/theanh97/Statistical-Arbitrage-Bayesian-Optimized-Kappa-Half-life-Pairs-Trading-Engine/
* Kaggle: https://www.kaggle.com/code/phamtheanh97/statarb-pairs-
Feel free to explore the code, run the notebook, and provide feedback.
Contributions and
1. Data Collection and Preprocessing
* Download historical stock data for 50 major companies using yfinance
* Clean and align data for analysis
2. Pair Selection
* Implement cointegration tests
* Perform correlation analysis
* Conduct Augmented Dickey-Fuller (ADF) tests for stationarity
* Rank and select the best pairs for trading
3. Pairs Trading Strategy Development
* Calculate key parameters: Kappa and Half-Life
* Implement the PairTradingStrategy class using Backtrader
* Define entry and exit logic
* Implement risk management techniques
4. Backtesting Engine
* Develop a custom BacktestEngine class
* Simulate trades on historical data
* Calculate comprehensive performance metrics
5. Bayesian Optimization
* Define the parameter space for optimization
* Create an objective function based on Sharpe Ratio
* Implement and run the optimization process to find optimal strategy
parameters
6. Results Analysis
* Visualize the equity curve
* Generate and analyze trade tables
* Calculate and interpret key performance metrics
7. Future Improvements
* Suggest potential enhancements to the strategy
* Discuss areas for further research and development
8. Conclusion
* Recap the project’s achievements
* Discuss the potential of statistical arbitrage in modern markets
* Offer final thoughts on the future of quantitative trading strategies
This project demonstrates the application of advanced statistical methods
and machine learning optimization techniques in developing a robust pairs
trading system. By following this structured approach, we aim to create a
strategy that can identify and exploit market inefficiencies in a systematic
and data-driven manner.
Key technologies and libraries used include Python, Jupyter Notebook,
yfinance, pandas, numpy, statsmodels, Backtrader, scikit-optimize (skopt),
and matplotlib.
1. Data Collection and Preprocessing
The foundation of any robust trading strategy lies in high-quality, well-
prepared data. In this section, we'll discuss our approach to collecting and
preprocessing the historical stock data necessary for our pairs trading
strategy.
A. Selecting the stock universe
For this project, we've chosen a diverse set of 50 stocks from various sectors
of the market. This selection provides a broad base for identifying potential
trading pairs. Our stock universe includes major companies such as:
* Technology: AAPL, MSFT, GOOGL, AMZN, META
* Finance: JPM, V, BAC
* Healthcare: JNJ, UNH, PFE
* Consumer goods: PG, KO, NKE
* Energy: XOM, CVX
* And many others...
This diverse selection allows us to explore potential pairs both within and
across different sectors.
B. Downloading historical data using yfinance
To obtain historical stock data, we utilize the yfinance library, which
provides a convenient interface to download data from Yahoo Finance.
Here's how we implement the data download process:
import yfinance as yf
import os
from datetime import datetime

# List of 50 stock tickers
List_tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'BRK-A',

# Set start and end times
start_time = '2020-01-01'
end_time = '2023-01-01'

# Create a folder to save data if it doesn't exist
folder_name = 'stock_data'
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

# Function to download and save data for a stock
def download_and_save_stock_data(ticker):
    try:
        # Download data
        data = yf.download(ticker, start=start_time, end=end_time)
        # Create file name
        file_name = os.path.join(folder_name, f"{ticker}.csv")
        # Save data to CSV file
        data.to_csv(file_name)
        print(f"Downloaded and saved data for {ticker}")
    except Exception as e:
        print(f"Error downloading data for {ticker}: {str(e)}")

# Download and save data for all stocks
for ticker in List_tickers:
    download_and_save_stock_data(ticker)
print("Completed downloading data for all stocks.")
This script downloads daily data for each stock from January 1, 2020, to
January 1, 2023, providing us with three years of historical data to work with.
By carefully collecting our data, we ensure a solid foundation for the
subsequent steps in our pairs trading strategy development. This attention to
data quality is crucial for the success of our statistical arbitrage approach.
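The cleaning and alignment step is listed in the overview but not shown. A minimal sketch of the idea, assuming each stock's closes are held as a plain date-to-price mapping; the function name `align_on_common_dates` and the sample values are hypothetical and not taken from the notebook:

```python
from typing import Dict, List, Optional, Tuple


def align_on_common_dates(
    closes_a: Dict[str, Optional[float]], closes_b: Dict[str, Optional[float]]
) -> Tuple[List[str], List[float], List[float]]:
    """Keep only dates present in both series, dropping missing prices."""
    common = sorted(
        d for d in closes_a.keys() & closes_b.keys()
        if closes_a[d] is not None and closes_b[d] is not None
    )
    return common, [closes_a[d] for d in common], [closes_b[d] for d in common]


# Hypothetical sample data: stock B has no quote on 2020-01-03,
# and stock A has no quote on 2020-01-07.
a = {"2020-01-02": 100.0, "2020-01-03": 101.0, "2020-01-06": 102.5}
b = {"2020-01-02": 50.0, "2020-01-06": 51.0, "2020-01-07": 51.5}
dates, series_a, series_b = align_on_common_dates(a, b)
```

ISO-formatted date strings sort chronologically, so a plain `sorted` is enough here. With pandas DataFrames the same effect comes from intersecting the two indexes, which is what the pair analysis code later in the article does.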
3. Pair Selection
In the realm of pairs trading, selecting the right pairs of stocks is crucial for
the strategy’s success. Our approach combines several statistical methods to
identify pairs with the highest potential for profitable mean-reversion
trading.
A. Introduction to cointegration and its importance
Cointegration is a fundamental concept in our pair selection process. Two
time series are considered cointegrated if they share a long-term
equilibrium relationship, even though they may deviate from this
equilibrium in the short term. For pairs trading, cointegration suggests that
the spread between two stocks tends to revert to a mean over time, providing
opportunities for trading.
B. Implementing cointegration tests
We use the Engle-Granger two-step method to test for cointegration. This is
implemented in our analyze_pair function:
from statsmodels.tsa.stattools import coint

def analyze_pair(stock1, stock2):
    # ... other code ...
    # Check Cointegration
    score, pvalue, _ = coint(stock1['Close'], stock2['Close'])
    # ... other code ...
The coint function from statsmodels performs the cointegration test,
returning a p-value that we use to assess the strength of the cointegration
relationship.
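To make the two-step idea concrete, here is a sketch of the first Engle-Granger step only: regress one price series on the other and keep the residuals. The helper names and the synthetic data are assumptions for illustration. The second step, a unit-root test on those residuals with the appropriate critical values, is what `coint` handles internally.

```python
import random


def ols_fit(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def engle_granger_residuals(x, y):
    """Step 1: regress y on x and return the slope and the residual series."""
    alpha, beta = ols_fit(x, y)
    residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    return beta, residuals


# Synthetic example: x is a random walk, y tracks 2 * x plus stationary noise.
rng = random.Random(0)
x = [100.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
y = [5.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]

beta, resid = engle_granger_residuals(x, y)
```

If the two series are cointegrated, the residuals should look stationary. If they are not, the residuals wander like a random walk and the regression slope is not a meaningful hedge ratio.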
C. Correlation analysis and its role
While cointegration captures long-term relationships, correlation helps us
understand the short-term co-movement of stock prices. We calculate the
correlation coefficient between potential pairs:
def analyze_pair(stock1, stock2):
    # ... other code
    # Calculate Correlation
    correlation = stock1['Close'].corr(stock2['Close'])
    # ... other code
Highly correlated pairs might indicate a stronger relationship, but we also
consider that lower correlations could provide more trading opportunities.
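One caveat worth illustrating: correlation computed on price levels, as in the snippet above, can differ sharply from correlation computed on daily returns. The sketch below uses made-up prices and hypothetical helper names to show two stocks that both trend upward yet move in opposite directions from day to day:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simple_returns(prices):
    """Period-over-period fractional price changes."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


# Hypothetical closes: both drift upward, but the daily moves alternate.
stock_a = [100, 102, 101, 104, 103, 106]
stock_b = [50, 50.5, 52, 51.5, 53.5, 53]

level_corr = pearson(stock_a, stock_b)
return_corr = pearson(simple_returns(stock_a), simple_returns(stock_b))
```

Here the level correlation is positive while the return correlation is strongly negative, so it is worth being explicit about which of the two a ranking step relies on.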
D. Augmented Dickey-Fuller (ADF) test for stationarity
The ADF test helps us determine if the spread between two stocks is
stationary, which is crucial for mean-reversion strategies:
from statsmodels.tsa.stattools import adfuller
def analyze_pair(stock1, stock2):
    # ... other code
    # Augmented Dickey-Fuller Test on log spread
    adf_result = adfuller(log_spread)
    # ... other code
A stationary spread suggests that the difference between the two stocks
tends to revert to a mean over time, which is ideal for our strategy.
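The regression behind the test can be sketched in a few lines. This simplified version has a constant and no lagged difference terms, so it is the plain Dickey-Fuller regression rather than the augmented one that `adfuller` runs; the function name and the simulated series are assumptions for illustration:

```python
import math
import random


def dickey_fuller_tstat(series):
    """t-statistic of gamma in: delta_s[t] = c + gamma * s[t-1] + e[t]."""
    lagged = series[:-1]
    delta = [b - a for a, b in zip(series, series[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(delta) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, delta))
    gamma = sxy / sxx
    c = mean_y - gamma * mean_x
    rss = sum((y - c - gamma * x) ** 2 for x, y in zip(lagged, delta))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


rng = random.Random(42)

# Mean-reverting AR(1) spread: s[t] = 0.5 * s[t-1] + noise
mean_reverting = [0.0]
for _ in range(500):
    mean_reverting.append(0.5 * mean_reverting[-1] + rng.gauss(0, 1))

# Random walk: s[t] = s[t-1] + noise (not stationary)
random_walk = [0.0]
for _ in range(500):
    random_walk.append(random_walk[-1] + rng.gauss(0, 1))

t_mr = dickey_fuller_tstat(mean_reverting)
t_rw = dickey_fuller_tstat(random_walk)
```

A strongly negative statistic is evidence of mean reversion. It has to be compared against Dickey-Fuller critical values (roughly -2.86 at the 5% level for large samples with a constant) rather than the usual t-table, which is why the p-value from `adfuller` is the number used for filtering.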
E. Ranking and selecting the best pairs
We analyze all possible pairs from our stock universe using these methods:
from itertools import combinations
import pandas as pd

def analyze_all_pairs(tickers, data_folder='stock_data'):
    results = []
    for ticker1, ticker2 in combinations(tickers, 2):
        try:
            stock1 = load_stock_data(ticker1, data_folder)
            stock2 = load_stock_data(ticker2, data_folder)
            # Ensure both stocks have the same date range
            common_dates = stock1.index.intersection(stock2.index)
            stock1 = stock1.loc[common_dates]
            stock2 = stock2.loc[common_dates]
            result = analyze_pair(stock1, stock2)
            results.append({
                'pair': f'{ticker1}-{ticker2}',
                'cointegration_pvalue': result['cointegration_pvalue'],
                'correlation': result['correlation'],
                'adf_pvalue': result['adf_pvalue'],
                'mean_zscore': result['z_score'].mean(),
                'std_zscore': result['z_score'].std()
            })
        except Exception as e:
            print(f"Error analyzing pair {ticker1}-{ticker2}: {str(e)}")
    return pd.DataFrame(results)
We then filter the pairs based on cointegration and stationarity criteria:
def filter_suitable_pairs(df_results, cointegration_threshold=0.05, adf_threshol
    return df_results[
        (df_results['cointegration_pvalue'] < cointegration_threshold) &
        (df_results['adf_pvalue'] < adf_threshold)
    ].sort_values('correlation', ascending=False)
Finally, we select the best pair:
def find_best_pair(tickers, data_folder='stock_data', cointegration_threshol
    all_pairs = analyze_all_pairs(tickers, data_folder)
    suitable_pairs = filter_suitable_pairs(all_pairs, cointegration_threshold, a
    if suitable_pairs.empty:
        return None, pd.DataFrame()
    best_pair = suitable_pairs.iloc[0]
    # ... additional code to calculate metrics for the best pair ...
This process allows us to identify the pairs that show the strongest statistical
evidence of mean-reverting behavior, providing a solid foundation for our
trading strategy.
By applying these rigorous statistical tests, we can identify pairs of stocks
that are most likely to exhibit the mean-reverting behavior necessary for our
statistical arbitrage strategy. This careful pair selection process forms the
foundation for potentially profitable trading opportunities in the subsequent
steps of our project.
4. Pairs Trading Strategy Development
After selecting suitable pairs, the next crucial step is to develop our pairs
trading strategy. This section focuses on the implementation of the
PairTradingStrategy class, which forms the core of our trading logic.
A. Calculating key parameters: Kappa and Half-Life
Before implementing the strategy, we calculate two important parameters:
Kappa and Half-Life. These parameters help us understand the mean-reversion
characteristics of our selected pairs.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

def calculate_spread(stock1, stock2):
    return np.log(stock1['Close']) - np.log(stock2['Close'])

def calculate_half_life(spread):
    spread_lag = spread.shift(1)
    spread_lag.iloc[0] = spread_lag.iloc[1]
    spread_ret = spread - spread_lag
    spread_ret.iloc[0] = spread_ret.iloc[1]
    spread_lag2 = sm.add_constant(spread_lag)
    model = sm.OLS(spread_ret, spread_lag2)
    res = model.fit()
    # Positional access via .iloc avoids the pandas warning on res.params[1]
    halflife = -np.log(2) / res.params.iloc[1]
    return halflife

def calculate_kappa_half_life(spread):
    # Calculate Kappa
    spread_lag = spread.shift(1)
    delta_spread = spread - spread_lag
    reg = np.polyfit(spread_lag.dropna(), delta_spread.dropna(), deg=1)
    kappa = -reg[0]
    # Calculate Half-Life
    half_life = calculate_half_life(spread)
    return kappa, half_life
These functions allow us to calculate the Kappa (mean reversion rate) and
Half-Life (time for the spread to revert halfway to its mean) for our selected
pairs.
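The two quantities are tied together: if a deviation from the mean decays at rate kappa per period, it halves after ln(2) / kappa periods. The sketch below simulates a spread with a known reversion rate and recovers it using the same regression of the daily change on the lagged level. The function names and simulation settings are assumptions for illustration, not code from the notebook:

```python
import math
import random


def estimate_kappa(spread):
    """Negated slope of delta_s[t] on s[t-1]: the per-period reversion rate."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread, spread[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(delta) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, delta))
    return -sxy / sxx


def half_life_from_kappa(kappa):
    """Periods needed for an expected deviation to shrink by half."""
    return math.log(2) / kappa


# Simulate a spread that closes 5% of its gap to the mean (zero) each day.
rng = random.Random(7)
true_kappa = 0.05
spread = [0.0]
for _ in range(5000):
    spread.append(spread[-1] - true_kappa * spread[-1] + rng.gauss(0, 0.01))

kappa_hat = estimate_kappa(spread)
half_life_hat = half_life_from_kappa(kappa_hat)
```

As a sanity check on scale, `half_life_from_kappa(0.0566)` is about 12.25, so a half-life of roughly twelve trading days corresponds to a kappa of about 0.06 per day.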
B. Designing the PairTradingStrategy class
The PairTradingStrategy class is implemented using the Backtrader
framework. Here's the core structure of our strategy:
import backtrader as bt
import numpy as np
from collections import deque

class PairTradingStrategy(bt.Strategy):
    params = (
        ('lookback', 20),
        ('entry_threshold', 1.5),  # Entry threshold in terms of standard deviat
        ('stoploss_factor', 2.0),  # Stop-Loss threshold in terms of standard de
        ('holding_time_factor', 1.5),  # Factor to determine max holding time ba
        ('half_life', 14),  # Estimated half-life of mean reversion in days
        ('stock1', None),
        ('stock2', None),
    )

    def __init__(self):
        # Initialize strategy components
        self.stock1 = self.getdatabyname(self.params.stock1)
        self.stock2 = self.getdatabyname(self.params.stock2)
        self.spread = []
        self.mean = None
        self.std = None
        self.entry_price = None
        self.entry_date = None
        self.max_holding_time = int(self.params.holding_time_factor * self.param
        # Initialize performance tracking
        self.trades = []
        self.equity_curve = [self.broker.getvalue()]
        self.returns = deque(maxlen=252)
        self.max_drawdown = 0
        self.peak = self.broker.getvalue()

    def next(self):
        # Strategy logic implementation
        # ... (details in the following sections)
        pass

    def log_trade(self, action, days_held=None, reason=None):
        # Log trade details
        # ... (implementation details)
        pass

    def stop(self):
        # Calculate final performance metrics
        # ... (implementation details)
        pass
C. Entry and exit logic
The core trading logic is implemented in the next() method:
def next(self):
    # Calculate the spread
    spread = np.log(self.stock1.close[0]) - np.log(self.stock2.close[0])
    self.spread.append(spread)
    # Wait until we have enough data
    if len(self.spread) <= self.params.lookback:
        return
    # Calculate mean and standard deviation
    self.mean = np.mean(self.spread[-self.params.lookback:])
    self.std = np.std(self.spread[-self.params.lookback:])
    # Calculate trading thresholds
    buy_threshold = self.mean - self.params.entry_threshold * self.std
    sell_threshold = self.mean + self.params.entry_threshold * self.std
    # Trading logic
    if not self.position:
        if spread < buy_threshold:
            # Spread is unusually low: long stock1, short stock2
            self.buy(data=self.stock1)
            self.sell(data=self.stock2)
            self.entry_price = spread
            self.entry_date = len(self)
            self.log_trade('ENTRY LONG')
        elif spread > sell_threshold:
            # Spread is unusually high: short stock1, long stock2
            self.sell(data=self.stock1)
            self.buy(data=self.stock2)
            self.entry_price = spread
            self.entry_date = len(self)
            self.log_trade('ENTRY SHORT')
    else:
        # Exit logic
        # ... (implementation details for exit conditions)
        pass
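The entry rule can also be exercised outside Backtrader, which makes it easy to check by hand. A minimal sketch of the same threshold comparison follows; the function name `entry_signal` and the sample spread values are hypothetical, and the defaults copy the strategy's `lookback` and `entry_threshold` parameters:

```python
import statistics


def entry_signal(spread_history, lookback=20, entry_threshold=1.5):
    """Return 'LONG', 'SHORT', or None for the latest spread value.

    Compares the newest spread with the rolling mean plus or minus
    entry_threshold times the rolling standard deviation. The window
    includes the newest value, as in the strategy code.
    """
    if len(spread_history) <= lookback:
        return None  # not enough data yet
    window = spread_history[-lookback:]
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)  # population std, like np.std's default
    latest = spread_history[-1]
    if latest < mean - entry_threshold * std:
        return "LONG"   # spread unusually low: buy stock1, sell stock2
    if latest > mean + entry_threshold * std:
        return "SHORT"  # spread unusually high: sell stock1, buy stock2
    return None


# Hypothetical spread: oscillating around zero, then a sharp drop on the last bar
history = [0.01, -0.01] * 15 + [-0.05]
signal = entry_signal(history)
```

Keeping the rule in a pure function like this is a design choice that makes unit testing simple; the strategy class then only has to translate the returned label into orders.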
D. Risk management techniques
Our strategy incorporates several risk management techniques:
1. Stop-loss: We exit the trade if the spread moves against us beyond a
certain threshold.
2. Maximum holding time: We limit the duration of each trade based on the
calculated half-life.
3. Position sizing: While not explicitly shown in this code snippet, proper
position sizing is crucial and should be implemented based on the
account size and risk tolerance.
# Inside the next() method, for existing positions
days_since_entry = len(self) - self.entry_date
# Stop-loss check
if abs(spread - self.entry_price) > self.params.stoploss_factor * self.std:
    self.close(data=self.stock1)
    self.close(data=self.stock2)
    self.log_trade('STOP-LOSS', days_since_entry, reason="Stop-Loss Hit")
# Max holding time check
elif days_since_entry >= self.max_holding_time:
    self.close(data=self.stock1)
    self.close(data=self.stock2)
    self.log_trade('EXIT', days_since_entry, reason="Max Holding Time")
By implementing these components, we create a robust pairs trading
strategy that capitalizes on mean-reversion opportunities while managing
risk effectively. This strategy forms the core of our statistical arbitrage
system, setting the stage for backtesting and optimization in the subsequent
steps of our project.
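Position sizing is mentioned above but not implemented in the snippets. One simple approach, sketched here under assumptions, is to commit the same dollar amount to each leg so the pair starts roughly dollar-neutral. The function name, the `allocation` parameter, and the prices are hypothetical:

```python
import math


def dollar_neutral_sizes(capital, price1, price2, allocation=0.1):
    """Whole-share quantities giving each leg roughly the same dollar value.

    `allocation` is the fraction of capital committed per leg. It is an
    assumed risk setting, not a value taken from the article.
    """
    if not 0 < allocation <= 1:
        raise ValueError("allocation must be in (0, 1]")
    dollars_per_leg = capital * allocation
    shares1 = math.floor(dollars_per_leg / price1)
    shares2 = math.floor(dollars_per_leg / price2)
    return shares1, shares2


# Hypothetical prices: 10% of a 10,000 account on each leg
size1, size2 = dollar_neutral_sizes(10_000, price1=250.0, price2=300.0)
```

In Backtrader the resulting quantities could be passed through the `size` argument of the order calls, for example `self.buy(data=self.stock1, size=size1)`.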
5. Backtesting Engine
After developing our pairs trading strategy, the next crucial step is to test its
performance on historical data. This is where our backtesting engine comes
into play. Let’s explore how we implement and use this engine to evaluate
our strategy.
A. Introduction to Backtrader framework
Our backtesting engine is built using the Backtrader framework, which
provides a flexible and powerful platform for testing trading strategies.
The notebook does not discuss Backtrader on its own, but every class shown
in this section builds on it.
B. Implementing the custom BacktestEngine class
The notebook provides a custom BacktestEngine class that wraps around
Backtrader's functionality. Here's the implementation:
import backtrader as bt
import pandas as pd
import matplotlib.pyplot as plt
from tabulate import tabulate

class BacktestEngine:
    def __init__(self, strategy_class, df1, df2, initial_capital=100000, commiss
        self.cerebro = bt.Cerebro()
        self.strategy_class = strategy_class
        self.df1 = df1
        self.df2 = df2
        self.initial_capital = initial_capital
        self.commission = commission
        self.results = None

    def add_data(self):
        data1 = bt.feeds.PandasData(dataname=self.df1, name="stock1")
        data2 = bt.feeds.PandasData(dataname=self.df2, name="stock2")
        self.cerebro.adddata(data1)
        self.cerebro.adddata(data2)

    def set_strategy(self, **kwargs):
        self.cerebro.addstrategy(self.strategy_class, **kwargs)

    def set_broker(self, initial_capital=None, commission=None):
        if initial_capital is None:
            initial_capital = self.initial_capital
        if commission is None:
            commission = self.commission
        # Apply the settings to the Cerebro broker
        self.cerebro.broker.setcash(initial_capital)
        self.cerebro.broker.setcommission(commission=commission)

    def run(self):
        self.add_data()
        self.set_broker()
        self.results = self.cerebro.run()
This class encapsulates the process of setting up the backtesting
environment, adding data, setting the strategy and broker parameters, and
running the backtest.
C. Simulating trades on historical data
The run method of our BacktestEngine class is responsible for simulating
trades on historical data. It uses the Backtrader engine to run the strategy on
the provided data:
def run(self):
    self.add_data()
    self.set_broker()
    self.results = self.cerebro.run()
This method adds the data, sets up the broker, and then runs the backtest,
storing the results for later analysis.
D. Calculating performance metrics
After running the backtest, we calculate various performance metrics to
evaluate our strategy. The BacktestEngine class includes methods for this
purpose:
def get_metrics(self):
    if not self.results:
        return {"Error": "No backtest results available. Please run the backtest
    strat = self.results[0]
    if not hasattr(strat, 'metrics'):
        return {"Error": "No metrics found in the strategy. Make sure to impleme
    return strat.metrics

def print_metrics(self):
    metrics = self.get_metrics()
    if isinstance(metrics, dict) and "Error" in metrics:
        print(metrics["Error"])
        return
    table_data = [["Metric", "Value"]]
    for key, value in metrics.items():
        if isinstance(value, float):
            table_data.append([key, f"{value:.2f}"])
        else:
            table_data.append([key, value])
    print("\nBacktest Results:")
    print(tabulate(table_data, headers="firstrow", tablefmt="grid"))
These methods retrieve and print the performance metrics calculated during
the backtest.
Additionally, the engine provides methods for visualizing the results:
def plot_equity_curve(self):
    if not self.results:
        print("No results available. Make sure to run the backtest first.")
        return
    strat = self.results[0]
    if not hasattr(strat, 'equity_curve'):
        print("No equity curve data found in the strategy.")
        return
    # Convert equity curve to pandas Series
    equity_curve = pd.Series(strat.equity_curve, index=self.df1.index[:len(strat
    # Calculate drawdown
    drawdown = (equity_curve.cummax() - equity_curve) / equity_curve.cummax()
    # Create figure with two subplots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10), gridspec_kw={'height_
    fig.suptitle('Equity Curve and Drawdown', fontsize=16)
    # Plot equity curve
    ax1.plot(equity_curve.index, equity_curve.values, label='Equity Curve', colo
    ax1.set_title('Equity Curve')
    ax1.set_ylabel('Portfolio Value')
    ax1.legend()
    ax1.grid(True)
    # Plot drawdown
    ax2.fill_between(drawdown.index, drawdown.values, 0, alpha=0.3, color='red',
    ax2.set_title('Drawdown')
    ax2.set_ylabel('Drawdown')
    ax2.set_xlabel('Date')
    ax2.legend()
    ax2.grid(True)
    # Format x-axis
    plt.gcf().autofmt_xdate()
    # Add strategy performance metrics as text
    metrics = self.get_metrics()
    metrics_text = (f"Total Return: {metrics['Total Return (%)']:.2f}%\n"
                    f"Sharpe Ratio: {metrics['Sharpe Ratio']:.2f}\n"
                    f"Max Drawdown: {metrics['Max Drawdown (%)']:.2f}%\n"
                    f"Win Rate: {metrics['Win Rate (%)']:.2f}%")
    fig.text(0.02, 0.02, metrics_text, fontsize=10, va='bottom')
    # Adjust layout
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    # Show plot
    plt.show()
This method creates a comprehensive visualization of the strategy’s
performance, including the equity curve and drawdown.
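The drawdown formula used in the plot, (running peak - current value) / running peak, can be checked on a handful of numbers. A small sketch with made-up portfolio values; the function name is hypothetical:

```python
def drawdown_series(equity_curve):
    """Fractional drop from the running peak at each point of the curve."""
    drawdowns = []
    peak = equity_curve[0]
    for value in equity_curve:
        peak = max(peak, value)
        drawdowns.append((peak - value) / peak)
    return drawdowns


# Hypothetical portfolio values
equity = [10_000, 10_050, 10_020, 10_100, 10_000, 10_150]
dd = drawdown_series(equity)
max_drawdown_pct = max(dd) * 100
```

The largest drop here is from the peak of 10,100 down to 10,000, a drawdown of just under 1%. The pandas expression in `plot_equity_curve` computes the same series with `cummax()`.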
By implementing this backtesting engine, we can thoroughly evaluate our
pairs trading strategy on historical data, calculate key performance metrics,
and visualize the results. This forms a crucial step in our project, allowing us
to assess the strategy’s effectiveness before moving on to optimization and
real-world implementation.
6. Bayesian Optimization
After implementing our backtesting engine, the next step is to optimize our
strategy’s parameters. We use Bayesian optimization for this purpose, which
is an efficient method for finding the optimal parameters of our trading
strategy.
A. The need for parameter optimization
Our pairs trading strategy involves several parameters that can significantly
affect its performance. These include the lookback period, entry threshold,
stop-loss factor, and others. Optimizing these parameters can potentially
improve the strategy’s performance.
B. Introduction to Bayesian optimization
Bayesian optimization is a powerful technique for optimizing black-box
functions, particularly useful when the function is expensive to evaluate —
as is the case with our backtesting process. It uses probabilistic models to
guide the search for optimal parameters.
C. Defining the parameter space
In our implementation, we define the parameter space for our optimization
process. Here’s how we set it up:
from skopt import gp_minimize
from skopt.space import Real, Integer
from skopt.utils import use_named_args

def bayesian_optimization(df1, df2, param_ranges, n_calls=50, initial_capital=1e
    space = [
        Integer(param_ranges['lookback'][0].min(), param_ranges['lookback'][0].
        Real(param_ranges['entry_threshold'][0].min(), param_ranges['entry_thres
        Real(param_ranges['stoploss_factor'][0].min(), param_ranges['stoploss_fa
        Real(param_ranges['holding_time_factor'][0].min(), param_ranges['holding
    ]
    fixed_half_life = param_ranges['half_life'][0]
    # ... rest of the function
This code defines the search space for our parameters, including lookback
period, entry threshold, stop-loss factor, and holding time factor.
D. Creating the objective function
The objective function is what we're trying to optimize. In our case, it’s based
on the Sharpe Ratio of our strategy:
def objective(params, df1, df2, initial_capital=100000, commission=0.001):
    lookback, entry_threshold, stoploss_factor, holding_time_factor, half_life
    engine = BacktestEngine(
        strategy_class=PairTradingStrategy,
        df1=df1,
        df2=df2,
        initial_capital=initial_capital
    )
    engine.set_strategy(
        lookback=int(lookback),
        entry_threshold=entry_threshold,
        stoploss_factor=stoploss_factor,
        holding_time_factor=holding_time_factor,
        half_life=half_life,
        stock1="stock1",
        stock2="stock2"
    )
    engine.set_broker(commission=commission, initial_capital=initial_capital)
    engine.run()
    metrics = engine.get_metrics()
    # We want to maximize Sharpe Ratio, so we return its negative
    return -metrics.get('Sharpe Ratio', 0)
This function runs a backtest with given parameters and returns the negative
Sharpe Ratio (we negate it because the optimization algorithm minimizes the
objective function, but we want to maximize the Sharpe Ratio).
E. Running the optimization process
Finally, we run the Bayesian optimization process:
    @use_named_args(space)
    def objective_wrapper(**params):
        params_list = list(params.values()) + [fixed_half_life]
        return objective(params_list, df1, df2, initial_capital, commission)

    def callback(res):
        n = len(res.x_iters)
        print(f"Optimization progress: {n / n_calls * 100:.2f}%")

    result = gp_minimize(objective_wrapper, space, n_calls=n_calls, random_state=42,
    best_params = {
        'lookback': int(result.x[0]),
        'entry_threshold': result.x[1],
        'stoploss_factor': result.x[2],
        'holding_time_factor': result.x[3],
        'half_life': fixed_half_life
    }
    return best_params, -result.fun  # Return best parameters and best Sharpe Ratio
This code runs the optimization process, printing progress updates, and
returns the best parameters found along with the corresponding Sharpe
Ratio.
By implementing Bayesian optimization, we can efficiently search the
parameter space to find the configuration that yields the best performance
for our pairs trading strategy. This optimized strategy can then be further
tested and potentially deployed in live trading scenarios.
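The sign convention is easy to get wrong, so here is a toy illustration. It deliberately uses plain random search instead of Bayesian optimization, and a made-up `toy_backtest` function instead of a real backtest, so that it runs with the standard library alone. Only the structure carries over: compute the Sharpe Ratio, negate it, and hand the result to a minimizer.

```python
import math
import random
import statistics


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe Ratio with a zero risk-free rate."""
    std = statistics.pstdev(daily_returns)
    if std == 0:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.fmean(daily_returns) / std


def toy_backtest(entry_threshold):
    """Stand-in for a real backtest: the edge peaks at a threshold of 2.0."""
    rng = random.Random(0)  # fixed noise so every call sees the same "market"
    edge = 0.001 - 0.0005 * (entry_threshold - 2.0) ** 2
    return [edge + rng.gauss(0, 0.01) for _ in range(252)]


def objective(entry_threshold):
    # Minimizers look for the smallest value, so return the negative Sharpe
    return -sharpe_ratio(toy_backtest(entry_threshold))


# Random search over the threshold range (a stand-in for gp_minimize)
search_rng = random.Random(1)
candidates = [search_rng.uniform(1.0, 3.0) for _ in range(50)]
best_threshold = min(candidates, key=objective)
best_sharpe = -objective(best_threshold)
```

Flipping the sign back at the end, as in `-result.fun`, recovers the best Sharpe Ratio. A Bayesian optimizer differs in how it picks the next candidate: it fits a surrogate model to past evaluations rather than sampling blindly, which matters when each evaluation is a full backtest.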
7. Results Analysis
After implementing our pairs trading strategy, backtesting it, and optimizing
its parameters, the final crucial step is to analyze the results. This analysis
helps us understand the performance of our strategy and make informed
decisions about its potential real-world application.
A. Visualizing the equity curve
One of the key visualizations provided by our BacktestEngine is the equity
curve. This gives us a clear picture of how our portfolio value changes over
time. The plot_equity_curve method in our BacktestEngine class generates
this visualization:
def plot_equity_curve(self):
    if not self.results:
        print("No results available. Make sure to run the backtest first.")
        return
    strat = self.results[0]
    if not hasattr(strat, 'equity_curve'):
        print("No equity curve data found in the strategy.")
        return
    # Convert equity curve to pandas Series
    equity_curve = pd.Series(strat.equity_curve, index=self.df1.index[:len(strat
    # Calculate drawdown
    drawdown = (equity_curve.cummax() - equity_curve) / equity_curve.cummax()
    # Create figure with two subplots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10), gridspec_kw={'height_
    fig.suptitle('Equity Curve and Drawdown', fontsize=16)
    # Plot equity curve
    ax1.plot(equity_curve.index, equity_curve.values, label='Equity Curve', colo
    ax1.set_title('Equity Curve')
    ax1.set_ylabel('Portfolio Value')
    ax1.legend()
    ax1.grid(True)
    # Plot drawdown
    ax2.fill_between(drawdown.index, drawdown.values, 0, alpha=0.3, color='red',
    ax2.set_title('Drawdown')
    ax2.set_ylabel('Drawdown')
    ax2.set_xlabel('Date')
    ax2.legend()
    ax2.grid(True)
    # Format x-axis
    plt.gcf().autofmt_xdate()
    # Add strategy performance metrics as text
    metrics = self.get_metrics()
    metrics_text = (f"Total Return: {metrics['Total Return (%)']:.2f}%\n"
                    f"Sharpe Ratio: {metrics['Sharpe Ratio']:.2f}\n"
                    f"Max Drawdown: {metrics['Max Drawdown (%)']:.2f}%\n"
                    f"Win Rate: {metrics['Win Rate (%)']:.2f}%")
    fig.text(0.02, 0.02, metrics_text, fontsize=10, va='bottom')
    # Adjust layout
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    # Show plot
    plt.show()
This method creates a comprehensive visualization that includes both the
equity curve and the drawdown over time, providing a clear picture of the
strategy’s performance.
B. Analyzing the trade table
To get a detailed view of individual trades, we can generate a trade table. The
print_trade_table method in our BacktestEngine class provides this
functionality:
def print_trade_table(self, num_trades=None):
    if not self.results:
        print("No results available. Make sure to run the backtest first.")
        return
    strat = self.results[0]
    if not hasattr(strat, 'trades'):
        print("No trade information found in the strategy. Make sure to log trad
        return
    trades = strat.trades
    if num_trades is not None:
        trades = trades[:num_trades]
    trade_data = []
    for trade in trades:
        trade_data.append([
            trade.get('entry_date', 'N/A'),
            trade.get('exit_date', 'N/A'),
            trade.get('days_held', 'N/A'),
            f"{trade.get('pnl', 'N/A'):.2f}",
            f"{trade.get('pnl_pct', 'N/A'):.2f}%",
            f"{trade.get('entry_price1', 'N/A'):.2f}",
            f"{trade.get('entry_price2', 'N/A'):.2f}",
            f"{trade.get('exit_price1', 'N/A'):.2f}",
            f"{trade.get('exit_price2', 'N/A'):.2f}",
            trade.get('exit_type', 'N/A'),
            trade.get('exit_reason', 'N/A')
        ])
    headers = ["Entry Date", "Exit Date", "Days Held", "PnL", "PnL %",
               "Entry Price 1", "Entry Price 2", "Exit Price 1", "Exit Price 2",
    print("\nTrade History:")
    print(tabulate(trade_data, headers=headers, tablefmt="grid"))
This method provides a detailed view of each trade, including entry and exit
dates, profit/loss, and reasons for exit.
C. Key performance metrics discussion
Our BacktestEngine calculates and stores various performance metrics. We
can access these metrics using the get_metrics method:
def get_metrics(self):
    if not self.results:
        return {"Error": "No backtest results available. Please run the backtest
    strat = self.results[0]
    if not hasattr(strat, 'metrics'):
        return {"Error": "No metrics found in the strategy. Make sure to impleme
    return strat.metrics
The specific metrics calculated are defined in the stop method of our
PairTradingStrategy:
def stop(self):
    # Calculate final metrics
    self.total_trades = len(self.trades)
    self.winning_trades = sum(1 for trade in self.trades if trade['pnl'] > 0)
    self.losing_trades = sum(1 for trade in self.trades if trade['pnl'] <= 0)
    self.win_rate = self.winning_trades / self.total_trades if self.total_trades
    self.total_return = (self.equity_curve[-1] - self.equity_curve[0]) / self.eq
    self.mean_return = np.mean(self.returns) if self.returns else 0
    self.std_return = np.std(self.returns) if self.returns else 0
    self.sharpe_ratio = np.sqrt(252) * self.mean_return / self.std_return if sel
    # Store all metrics in a dictionary for easy access
    self.metrics = {
        'Initial Capital': self.equity_curve[0],
        'Final Portfolio Value': self.equity_curve[-1],
        'Total Return (%)': self.total_return * 100,
        'Sharpe Ratio': self.sharpe_ratio,
        'Max Drawdown (%)': self.max_drawdown * 100,
        'Total Trades': self.total_trades,
        'Winning Trades': self.winning_trades,
        'Losing Trades': self.losing_trades,
        'Win Rate (%)': self.win_rate * 100,
        'Mean Daily Return (%)': self.mean_return * 100,
        'Std Dev of Daily Return (%)': self.std_return * 100
    }
These metrics provide a comprehensive view of the strategy’s performance,
including profitability, risk-adjusted returns, and trading statistics.
D. Comparing strategy performance to benchmarks
While not explicitly implemented in the provided code, it’s important to
compare our strategy’s performance to relevant benchmarks, such as the
overall market performance or a simple buy-and-hold strategy on the
individual stocks. This comparison helps us understand if our strategy is
truly adding value beyond what could be achieved with simpler, less
sophisticated approaches.
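A benchmark comparison of this kind takes only a few lines. The sketch below compares the strategy's total return with an equal-dollar buy-and-hold of the two stocks over the same period. The function names and all numbers are hypothetical, and other benchmarks such as a market index would be just as valid:

```python
def total_return(values):
    """Fractional return from the first to the last value."""
    return values[-1] / values[0] - 1.0


def buy_and_hold_benchmark(prices1, prices2):
    """Return of holding both stocks in equal dollar amounts over the period."""
    return 0.5 * total_return(prices1) + 0.5 * total_return(prices2)


# Hypothetical inputs: strategy equity curve and the two stocks' closes
equity_curve = [10_000.0, 10_080.0, 10_150.0, 10_200.0]
closes1 = [200.0, 210.0, 190.0, 220.0]
closes2 = [100.0, 95.0, 105.0, 90.0]

strategy_ret = total_return(equity_curve)
benchmark_ret = buy_and_hold_benchmark(closes1, closes2)
excess_ret = strategy_ret - benchmark_ret
```

Total return alone says nothing about risk, so in practice the same comparison would usually be repeated for the Sharpe Ratio and maximum drawdown.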
By thoroughly analyzing these results, we can gain valuable insights into the
effectiveness of our pairs trading strategy, its strengths and weaknesses, and
potential areas for further improvement.
Finding the best pair for pair trading...
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
Completed pa
/var/folders/3h/k_7p78896dn6j jq56w5lrq6mo0eegn/T/ipykernel_1710/1552435527.py:21
  halflife = -np.log(2) / res.params[1]

Stock Pair for Pair Trading:
Stock 1: MSFT
Stock 2: ACN
Kappa: 0.06
Half-Life: 12.25

Best parameters found:
lookback: 30
entry_threshold: 2.0093080852402427
stoploss_factor: 3.0000000000000013
holding_time_factor: 1.9523141951662566
half_life: 12.249172391275549
Best Sharpe Ratio: 2.0846017856569983

Final Backtest Results:
Backtest Results:
+-----------------------------+---------+
| Metric                      | Value   |
+=============================+=========+
| Initial Capital             | 10000   |
| Final Portfolio Value       | 10216.8 |
| Total Return (%)            | 2.17    |
| Sharpe Ratio                | 2.08    |
| Max Drawdown (%)            | 0.28    |
| Total Trades                | 20      |
| Winning Trades              | 20      |
| Losing Trades               | 0       |
| Win Rate (%)                | 100.00  |
| Mean Daily Return (%)       | 0.00    |
| Std Dev of Daily Return (%) | 0.03    |
+-----------------------------+---------+
[Figure: Equity Curve and Drawdown. Top panel: Equity Curve (y-axis: Portfolio Value, about 10000 to 10200). Bottom panel: Drawdown (x-axis: Date). Text annotation: Sharpe Ratio: 2.08, Max Drawdown: 0.28%, Win Rate: 100.00%]
8. Future Improvements
While our current implementation of the StatArb Pairs trading strategy
provides a solid foundation, there are several areas where we could
potentially enhance and expand the system. Although the notebook doesn't
explicitly outline future improvements, we can infer some potential
enhancements based on the current implementation and common practices
in quantitative trading.
A. Expanding the stock universe
Our current implementation uses a predefined list of 50 stock tickers:
List_tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'BRK-A', 'V', '
A potential improvement would be to expand this universe to include a
larger set of stocks or even other asset classes. This could involve:
1. Automating the process of stock selection based on certain criteria (e.g.,
market cap, sector, liquidity).
2. Implementing a dynamic stock universe that updates periodically based
on market conditions.
3. Extending the strategy to other asset classes like ETFs, futures, or forex
pairs.
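The first item above, rule-based stock selection, can be sketched as a simple filter over per-stock metadata. This is a hypothetical illustration: the `StockInfo` record, the `select_universe` function, the thresholds, and the placeholder tickers are all invented for the example, and in practice the metadata would come from a data provider.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StockInfo:
    ticker: str
    sector: str
    market_cap: float         # in USD
    avg_dollar_volume: float  # average daily traded value, in USD

def select_universe(stocks, min_market_cap, min_dollar_volume, sectors=None):
    """Return sorted tickers that pass size, liquidity and optional sector filters."""
    selected = []
    for stock in stocks:
        if stock.market_cap < min_market_cap:
            continue  # too small
        if stock.avg_dollar_volume < min_dollar_volume:
            continue  # too illiquid to trade both legs reliably
        if sectors is not None and stock.sector not in sectors:
            continue  # outside the sectors of interest
        selected.append(stock.ticker)
    return sorted(selected)

# Placeholder records with made-up numbers, for illustration only.
candidates = [
    StockInfo("AAA", "Technology", 500e9, 2.0e9),
    StockInfo("BBB", "Technology", 5e9, 0.5e9),     # fails the market-cap filter
    StockInfo("CCC", "Technology", 200e9, 1.0e9),
    StockInfo("DDD", "Energy", 300e9, 1.5e9),       # fails the sector filter
    StockInfo("EEE", "Technology", 150e9, 0.01e9),  # fails the liquidity filter
]
universe = select_universe(
    candidates,
    min_market_cap=100e9,
    min_dollar_volume=0.1e9,
    sectors={"Technology"},
)
print(universe)
```

Re-running such a filter on a schedule is one way to get the periodically updated universe described in the second item.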
B. Incorporating fundamental data
Our current strategy relies primarily on price data. Incorporating
fundamental data could potentially improve the pair selection process and
overall strategy performance. This might include:
1. Using financial ratios to identify similar companies for potential pairs.
2. Incorporating earnings data or other financial metrics into the trading
signals.
3. Considering macroeconomic indicators that might affect the relationship
between pairs.
C. Exploring machine learning for pair selection
While our current pair selection process uses statistical methods like
cointegration and correlation, machine learning techniques could
potentially enhance this process. Some possibilities include:
1. Using clustering algorithms to identify groups of similar stocks.
2. Implementing a classification model to predict which pairs are likely to
perform well.
3. Applying reinforcement learning techniques to dynamically adjust the
strategy parameters.
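As a rough illustration of the first item, the sketch below groups stocks whose return correlation exceeds a threshold and treats each group as a pool of candidate pairs. It deliberately swaps in a simple correlation-threshold grouping (connected components found with union-find) rather than a full clustering algorithm such as k-means, so that it needs only NumPy. The function name `correlation_groups`, the 0.8 threshold, and the synthetic series are assumptions made for the example.

```python
import numpy as np

def correlation_groups(returns, tickers, threshold=0.8):
    """Group tickers linked by pairwise return correlation >= threshold.

    `returns` is a 2-D array with one column per ticker. Two tickers end up
    in the same group if a chain of sufficiently correlated pairs connects them.
    """
    corr = np.corrcoef(np.asarray(returns, dtype=float), rowvar=False)
    parent = list(range(len(tickers)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tickers)):
        for j in range(i + 1, len(tickers)):
            if corr[i, j] >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, ticker in enumerate(tickers):
        groups.setdefault(find(i), []).append(ticker)
    return sorted(sorted(members) for members in groups.values())

# Synthetic series: S1/S2 follow one wave, S3/S4 follow an uncorrelated one.
t = np.linspace(0.0, 4.0 * np.pi, 200, endpoint=False)
wave_a, wave_b = np.sin(t), np.cos(t)
returns = np.column_stack([
    wave_a,
    2.0 * wave_a + 0.01 * wave_b,
    wave_b,
    3.0 * wave_b - 0.01 * wave_a,
])
groups = correlation_groups(returns, ["S1", "S2", "S3", "S4"])
print(groups)
```

Cointegration tests would still be run within each group. The grouping only narrows the set of pairs to test.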
D. Enhancing risk management
Although our current implementation includes some risk management
techniques, this area could be further improved:
# Current risk management in the strategy
# Close both legs once the spread has moved more than
# stoploss_factor standard deviations away from its entry level.
if abs(spread - self.entry_price) > self.params.stoploss_factor * self.std:
    self.close(data=self.stock1)
    self.close(data=self.stock2)
    self.log_trade('STOP-LOSS', days_since_entry, reason="Stop-Loss Hit")
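To make the rule above easier to reason about and test in isolation, the same comparison can be written as a small pure function. The sketch below is not taken from the notebook. `stop_loss_hit` mirrors the condition shown, while `adverse_stop_loss_hit` is a hypothetical direction-aware variant, and all argument names are assumptions.

```python
def stop_loss_hit(spread, entry_spread, spread_std, stoploss_factor):
    """Mirror of the rule above: trigger on a large move in either direction."""
    return abs(spread - entry_spread) > stoploss_factor * spread_std

def adverse_stop_loss_hit(spread, entry_spread, spread_std, stoploss_factor, long_spread):
    """Hypothetical variant: trigger only when the move goes against the position.

    A long-spread position loses when the spread falls. A short-spread
    position loses when it rises.
    """
    move = spread - entry_spread
    adverse_move = -move if long_spread else move
    return adverse_move > stoploss_factor * spread_std

# Example values: entry at 1.0, spread standard deviation 0.5, factor 3.0,
# so the threshold distance is 1.5.
print(stop_loss_hit(2.6, 1.0, 0.5, 3.0))                             # moved 1.6
print(stop_loss_hit(2.4, 1.0, 0.5, 3.0))                             # moved 1.4
print(adverse_stop_loss_hit(2.6, 1.0, 0.5, 3.0, long_spread=True))   # favorable move
print(adverse_stop_loss_hit(2.6, 1.0, 0.5, 3.0, long_spread=False))  # adverse move
```

Because the original condition uses an absolute value, it fires on large moves in either direction, including favorable ones. Whether that is intended depends on how the exit rules interact, which is why the second function is offered only as a variant.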
Potential enhancements could include:
1. Implementing more sophisticated stop-loss mechanisms.
2. Adding position sizing based on volatility or other risk metrics.
3. Incorporating a risk parity approach for portfolio construction when
trading multiple pairs.
E. Improving execution modeling
Our current backtesting assumes perfect execution, which is not realistic in
live trading. Improvements in this area could include:
1. Modeling slippage and transaction costs more accurately.
2. Implementing limit orders instead of market orders.
3. Considering liquidity constraints and their impact on trade execution.
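A first step toward the cost modeling in item 1 is a fixed basis-point charge per fill. This matters here because the reported 2.17% total return over 29 trades averages only about 0.075% of capital per trade, so a few basis points per fill could absorb a meaningful share of it. The sketch below is a simplified assumption-laden model, not the notebook's code: the function names, the 5 bps slippage, and the 1 bps commission are illustrative placeholders.

```python
def fill_price(mid_price, side, slippage_bps=5.0):
    """Price paid on a buy or received on a sell after adverse slippage."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1.0 if side == "buy" else -1.0
    return mid_price * (1.0 + sign * slippage_bps / 10_000)

def round_trip_cost(notional_per_leg, slippage_bps=5.0, commission_bps=1.0, legs=2):
    """Total cost of opening and later closing every leg of a pair trade."""
    cost_per_fill = notional_per_leg * (slippage_bps + commission_bps) / 10_000
    return cost_per_fill * legs * 2  # each leg is filled twice: entry and exit

# Illustrative numbers only: 5000 of notional in each of the two legs.
cost = round_trip_cost(5000.0)
print(f"buy fill:  {fill_price(100.0, 'buy'):.4f}")
print(f"sell fill: {fill_price(100.0, 'sell'):.4f}")
print(f"round-trip cost: {cost:.2f}")
```

A fixed charge ignores order size relative to available liquidity. Item 3 would need a cost that grows with the fraction of daily volume traded.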
F. Expanding performance analysis
While our current analysis provides good insights, we could enhance it
further:
# Current performance metrics
self.metrics = {
    'Initial Capital': self.equity_curve[0],
    'Final Portfolio Value': self.equity_curve[-1],
    'Total Return (%)': self.total_return * 100,
    'Sharpe Ratio': self.sharpe_ratio,
    'Max Drawdown (%)': self.max_drawdown * 100,
    'Total Trades': self.total_trades,
    'Winning Trades': self.winning_trades,
    'Losing Trades': self.losing_trades,
    'Win Rate (%)': self.win_rate * 100,
    'Mean Daily Return (%)': self.mean_return * 100,
    'Std Dev of Daily Return (%)': self.std_return * 100
}
Potential improvements include:
1. Implementing more advanced risk-adjusted return metrics (e.g., Sortino
ratio, Calmar ratio).
2. Conducting more detailed drawdown analysis.
3. Performing attribution analysis to understand sources of returns.
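The Sortino and Calmar ratios from item 1 can be computed from the same daily returns and equity curve that feed the existing metrics. The sketch below is one possible formulation under stated assumptions: daily data annualized with 252 trading days, a zero target return, and function names chosen for illustration rather than taken from `BacktestEngine`.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor for daily data

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))

def sortino_ratio(daily_returns, target=0.0):
    """Annualized excess return over target, divided by downside deviation."""
    returns = np.asarray(daily_returns, dtype=float)
    downside = np.minimum(returns - target, 0.0)  # only shortfalls count
    downside_dev = np.sqrt(np.mean(downside ** 2))
    if downside_dev == 0:
        return float("inf")  # no returns fell below the target
    return float((returns.mean() - target) / downside_dev * np.sqrt(TRADING_DAYS))

def calmar_ratio(equity):
    """Annualized compound return divided by maximum drawdown.

    Expects at least two equity points, one per trading day.
    """
    equity = np.asarray(equity, dtype=float)
    years = (len(equity) - 1) / TRADING_DAYS
    annual_return = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    drawdown = max_drawdown(equity)
    return float("inf") if drawdown == 0 else float(annual_return / drawdown)

# Made-up inputs for illustration: one year of equity with an early 10% dip.
example_equity = np.array([100.0, 90.0] + [110.0] * 251)
example_returns = [0.01, -0.02, 0.03, -0.01]
print(f"max drawdown: {max_drawdown(example_equity):.2%}")
print(f"Sortino: {sortino_ratio(example_returns):.2f}")
print(f"Calmar: {calmar_ratio(example_equity):.2f}")
```

Unlike the Sharpe ratio, the Sortino ratio penalizes only returns below the target, which is informative for a strategy like this one whose reported win rate is 100% and whose losses, if any, are unrealized intraperiod dips.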
By implementing these improvements, we could potentially enhance the
robustness, performance, and applicability of our StatArb Pairs trading
strategy.
9. Conclusion
As we reach the end of our exploration of the StatArb Pairs trading system,
it’s important to reflect on what we’ve accomplished and the potential impact
of this project.
A. Recap of the project’s achievements
Throughout this project, we've successfully developed a comprehensive
pairs trading strategy using statistical arbitrage techniques and Bayesian
optimization. Our key achievements include:
1. Data Collection and Preprocessing: We implemented a robust system to
download and preprocess historical stock data for a diverse set of 50
major companies:
List_tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'BRK-A', 'V',
2. Pair Selection: We developed a sophisticated pair selection process using
statistical methods such as cointegration tests, correlation analysis, and
the Augmented Dickey-Fuller test.
3. Strategy Development: We created a PairTradingStrategy class that
implements the core logic of our trading strategy, including entry and
exit rules, and risk management techniques.
4. Backtesting Engine: We built a custom BacktestEngine class that allows
us to rigorously test our strategy on historical data and calculate
comprehensive performance metrics.
5. Bayesian Optimization: We implemented a Bayesian optimization
process to fine-tune our strategy parameters, potentially improving its
performance.
6. Results Analysis: We developed tools to visualize and analyze our
strategy’s performance, including equity curves, drawdown charts, and
trade tables.
B. The potential of statistical arbitrage in modern markets
Our project illustrates the continued relevance of statistical arbitrage
strategies in today’s markets. By combining statistical pair selection
(cointegration, correlation, and ADF tests) with Bayesian parameter
optimization, the backtest shows how small, mean-reverting price
dislocations between related stocks can be identified and traded.
The pairs trading approach we've developed offers several advantages:
1. Market neutrality, which can provide returns uncorrelated with overall
market movements.
2. Risk management through the balanced long-short structure of trades.
3. Adaptability to various market conditions through parameter
optimization.
C. Final thoughts on the future of quantitative trading strategies
As we look to the future, it’s clear that quantitative trading strategies like the
one we've developed will continue to play a crucial role in financial markets.
However, as markets become increasingly efficient and competitive, the
importance of continuous innovation and refinement cannot be overstated.
Some key areas for future development include:
1. Incorporating alternative data sources to gain unique insights.
2. Leveraging advanced machine learning techniques for improved pattern
recognition and prediction.
3. Expanding to other asset classes and global markets.
4. Enhancing execution algorithms to minimize market impact and
transaction costs.
In conclusion, our StatArb Pairs project serves as a solid foundation for pairs
trading strategy development. It demonstrates the power of combining
statistical analysis, algorithmic trading, and machine learning optimization.
As the financial landscape continues to evolve, strategies like this,
continually refined and adapted, will be crucial tools for investors seeking
to navigate the complexities of modern markets.