Deep Limit Order Book Trading - Half-A-Second, Please!
Jie Yin
Table of Contents
2 Data description
Limit order book
Chinese A-share market
5 Conclusions
Introduction to trading
Trading types:
Long-term vs. Short-term
Interday trading vs. Intraday trading
Manual trading vs. Algorithmic trading
A study in 2019 showed that around 92% of trading in the Forex market was performed by trading
algorithms rather than humans.
Low-frequency financial data vs. High-frequency financial data
LF: daily open, close, high and low.
HF: limit order book, message book.
High-frequency observations over a single day in a liquid market can equal the amount of daily data
collected in 30 years.
We consider intraday algorithmic high-frequency trading based on limit order book data.
Modeling LOB before trading
Our objective:
To develop a novel deep LOB trading system that takes advantage of tick-time intervals with deep
learning and GPU techniques, predicts direct trading signals and executes orders to gain profits under
real trading circumstances.
Limit order book (LOB)
In order-driven markets, traders place, execute, and cancel limit orders or market orders on the
ask side or the bid side.
The LOB provides a granular view of market data, listing all quotes at each price level.
Figure 1: A limit order book at time t and t + 1. The first level of the ask side is changed because of the cancellation of limit orders
or the arrival of market orders.
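The change between t and t + 1 described in Figure 1 can be sketched in a few lines of Python. This is only an illustration: the class `LOBSnapshot`, its method names, and the prices and volumes are hypothetical and not taken from the data. The flattened layout (ask price, ask volume, bid price, bid volume per level) is likewise an assumption; with ten levels per side it would give 40 values per snapshot.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LOBSnapshot:
    """One LOB snapshot; levels are ordered from best to worst on each side."""
    asks: List[Tuple[float, int]]  # (price, volume), ascending price
    bids: List[Tuple[float, int]]  # (price, volume), descending price

    def best_ask(self) -> float:
        return self.asks[0][0]

    def best_bid(self) -> float:
        return self.bids[0][0]

    def mid_price(self) -> float:
        # Average of the best ask and the best bid
        return (self.best_ask() + self.best_bid()) / 2.0

    def spread(self) -> float:
        return self.best_ask() - self.best_bid()

    def to_features(self) -> List[float]:
        # Assumed layout: [ask_p1, ask_v1, bid_p1, bid_v1, ask_p2, ...]
        row: List[float] = []
        for (ap, av), (bp, bv) in zip(self.asks, self.bids):
            row.extend([ap, float(av), bp, float(bv)])
        return row


# Hypothetical two-level book at time t
book_t = LOBSnapshot(asks=[(10.02, 300), (10.03, 500)],
                     bids=[(10.00, 400), (9.99, 200)])

# At t + 1 the first ask level has been cancelled or consumed by market
# orders, so the former second level becomes the best ask.
book_t1 = LOBSnapshot(asks=[(10.03, 500), (10.04, 100)], bids=book_t.bids)

print(book_t.mid_price(), book_t.spread())
print(book_t1.mid_price(), book_t1.spread())
```

In this toy book the spread widens from t to t + 1 because only the ask side moved.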
Data summary
Characteristics of data:
Massive ⇒ GPUs
Shape: (T*S*4800, 40), where T is the number of historical days and S is the number of stocks.
Size: ≈ 5GB of LOB data for around 4000 stocks in the Chinese A-share markets per day.
Multivariate ⇒ Overfitting caution
Noisy ⇒ Model design to extract useful information
Sequential (time series) ⇒ Rolling window method
Labeled artificially ⇒ A multi-classification problem.
Three types of labels (trading signals): long, short, and none.
Trading action long: buy before selling. # upward trend
Trading action short: sell before buying. # downward trend
Imbalanced ⇒ Loss function
Profitable trading signals are few.
Transaction fees are nontrivial.
Spread costs are incurred by taking market orders, which ensure execution.
Each trading signal has a time limit.
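One common reading of the rolling window method for sequential data is a train/test split that slides forward through the trading days. The sketch below assumes that reading; the function name `rolling_windows`, the day labels, and the window lengths are hypothetical and are not the settings used in the study.

```python
from typing import Iterator, List, Tuple


def rolling_windows(days: List[str], train_len: int,
                    test_len: int) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_days, test_days) pairs that slide forward by test_len days."""
    start = 0
    while start + train_len + test_len <= len(days):
        train = days[start:start + train_len]
        test = days[start + train_len:start + train_len + test_len]
        yield train, test
        start += test_len  # the next test block starts where this one ended


# Hypothetical calendar of ten trading days
days = [f"day{i:02d}" for i in range(1, 11)]
splits = list(rolling_windows(days, train_len=6, test_len=2))

for train, test in splits:
    print(train[0], "...", train[-1], "->", test)
```

Each test block uses only data that comes after its training block, which keeps the evaluation out of sample.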
Real data
Datasets
Simulation datasets: Three hypothetical market sentiments. (Uptrend, downtrend, and flat.)
Benchmark dataset: A small set of 20 stocks from the Chinese stock market, covering four
consecutive months, provided by Huang et al. (2021).
Proprietary CS-100 dataset: LOB data covering 100 stocks, selected from the entire Chinese stock
pool of around 4,000 stocks by ranking the number of signals combined with the in-sample
performance. This proprietary dataset is provided by the Fintech company TradeMaster¹.
          Number of stocks (SZ/SH)   Ratio of long/short signals   Mean (std) of mid-prices
Group 1   25 (14/11)                 0.26/0.25                     70.76 (69.21)
Group 2   25 (13/12)                 0.22/0.21                     53.44 (95.03)
Group 3   25 (20/5)                  0.18/0.17                     47.47 (103.82)
Group 4   25 (21/4)                  0.12/0.13                     11.25 (12.36)
Table 2: Summary statistics. “SZ/SH” shows the number of stocks from the Shenzhen Stock Exchange and Shanghai Stock Exchange.
¹ https://www.trademastertech.com
Learning from data
CNN ⇒ Cat!
CNN ⇒ Not cat!
DCNN ⇒ Long signal!
System - Training, predicting, and trading
Figure 2: Framework of our deep LOB trading system for selected stocks.
More details in training
2. Imbalanced classification.
× Categorical cross-entropy (CE) loss function.
✓ Focal loss (FL) function. (Lin et al., 2017)
FL: Uses hyperparameters to adjust the class weights, so that more signals are detected and profits
improve.
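The difference between the two losses can be sketched for a single sample. The focal loss of Lin et al. (2017) scales the cross-entropy term by (1 - p_t)^γ, optionally with a class weight α, so well-classified (usually majority-class) samples contribute less. The class ordering (0 = none, 1 = long, 2 = short), the probabilities, and the values of γ and α below are hypothetical and chosen only for illustration.

```python
import math
from typing import Optional, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Categorical cross-entropy for one sample: -log(p_t)."""
    return -math.log(probs[label])


def focal_loss(probs: Sequence[float], label: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = probs[label]
    weight = 1.0 if alpha is None else alpha[label]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions; assumed class order: 0 = none, 1 = long, 2 = short
easy_none = [0.90, 0.05, 0.05]   # confident and correct "none" sample
hard_long = [0.60, 0.30, 0.10]   # true "long" sample that the model misses

print(cross_entropy(easy_none, 0), focal_loss(easy_none, 0))
print(cross_entropy(hard_long, 1), focal_loss(hard_long, 1))

# Hypothetical class weights that favor the rare signal classes
alpha = [0.2, 0.4, 0.4]
print(focal_loss(hard_long, 1, gamma=2.0, alpha=alpha))
```

With γ = 2 the easy sample keeps about 1% of its cross-entropy loss while the hard sample keeps 49%, which shifts the training effort toward the rare, harder signals. Setting γ = 0 and no class weights recovers the plain cross-entropy.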
Risk control in trading
Risk measure
Based on the estimated in-sample Value at Risk (VaR) of signal return, we cut the loss when the
current return is less than $\widehat{\mathrm{VaR}}_{\alpha}$ during the out-of-sample trading. It also helps to measure the
level of risk exposure in sample.
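A minimal sketch of this stop-loss rule, assuming the in-sample VaR is estimated as the empirical α-quantile of past signal returns (the estimator actually used is not stated here). The function names, the sample returns, and α = 0.1 are hypothetical.

```python
import math
from typing import List


def empirical_var(returns: List[float], alpha: float) -> float:
    """Empirical alpha-quantile of in-sample signal returns (assumed estimator)."""
    ordered = sorted(returns)
    # Index of the smallest value whose empirical CDF is at least alpha
    idx = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[idx]


def should_cut_loss(current_return: float, var_alpha: float) -> bool:
    """Close the position when the running return falls below the VaR threshold."""
    return current_return < var_alpha


# Hypothetical in-sample returns per signal
in_sample = [-0.004, -0.002, -0.001, 0.0, 0.001,
             0.001, 0.002, 0.003, 0.004, 0.006]
var_10 = empirical_var(in_sample, alpha=0.1)

print(var_10)
print(should_cut_loss(-0.005, var_10))  # below the threshold
print(should_cut_loss(-0.001, var_10))  # still within tolerance
```

Because the threshold comes from in-sample returns only, it can be fixed before out-of-sample trading starts.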
Optimization
Instead of investing the same amount of money all the time, we decide the investment based on
the current signal strength and in-sample performance.
1. Use information such as the signal probabilities estimated by the DCNN model
($\hat{p}_t^{\text{long}}$, $\hat{p}_t^{\text{short}}$) and the estimated returns for successful/failed actions.
Generally, $\hat{p}_t^{\text{long}}$ ↑, $\hat{p}_t^{\text{short}}$ ↓, current signal strength of long ↑, the “bet size” ↑.
2. In-sample estimated $\widehat{\mathrm{VaR}}_{\alpha}^{\text{long}}$ ↑, the “bet size” ↑.
(*The explicit solution to optimal investment is omitted here.)
Hardware
Training with very large LOB datasets is compute-intensive, and a GPU platform is necessary
to accelerate deep learning. Training on CPUs is not practical.
When training with a much larger stock group, the model can learn more universal features and
behave more stably. However, this places higher requirements on hardware.
Apply parallel computing with multiple GPUs to speed up the process.
The training time also depends on the GPU model, such as NVIDIA K80, P100, V100, or A100
Tensor Core GPUs.
Real-time tick-by-tick trading
The calculation takes less than 0.03 seconds. Additional order-related operations, such as order
submission and execution in exchanges, take 0.2-0.3 seconds.
A gap close to or longer than 0.5 seconds is adequate to implement our system.
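The timing argument is simple arithmetic and can be checked directly. The sketch below uses only the figures quoted above (0.03 seconds of calculation, 0.2-0.3 seconds of order-related operations); the constant and function names are hypothetical.

```python
from typing import Tuple

COMPUTE_S = 0.03          # upper bound on model calculation time, in seconds
ORDER_OPS_S = (0.2, 0.3)  # range for order submission and execution, in seconds


def total_latency_range(compute_s: float,
                        order_ops_s: Tuple[float, float]) -> Tuple[float, float]:
    """Best-case and worst-case end-to-end latency in seconds."""
    return compute_s + order_ops_s[0], compute_s + order_ops_s[1]


def fits_in_gap(gap_s: float, compute_s: float = COMPUTE_S,
                order_ops_s: Tuple[float, float] = ORDER_OPS_S) -> bool:
    """True when even the worst-case latency fits inside the available gap."""
    return compute_s + order_ops_s[1] <= gap_s


low, high = total_latency_range(COMPUTE_S, ORDER_OPS_S)
print(round(low, 2), round(high, 2))
print(fits_in_gap(0.5))
print(fits_in_gap(0.3))
```

The worst case is about 0.33 seconds, which leaves a margin inside a 0.5-second gap but would not fit in a 0.3-second one.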
We are confident in implementing this system and obtaining considerable returns from eligible
markets, like the Chinese A-share market.
Profits from the proprietary CS-100 dataset
The testing period is four weeks, evaluated using the rolling window method.
The transaction fee (10 basis points) is deducted.
Accumulated daily average profit (left), profit per signal per day (right), and average profit per signal
(Table 5) are all very promising, especially for stocks in group 1. The amount of invested equity
is at most 1 unit for every signal.
If we trade only one signal from the first stock group every day continuously, the annual return can
reach approximately 25%.
[Figure: accumulated daily average profit (left) and profit per signal per day (right) for the settings non-op-CE, non-op, and op.]
              Group 1                                        Group 2
              non-op-CE   non-op    op        adjusted op    non-op-CE   non-op    op        adjusted op
Avg Profit    6.00        6.43      8.74      9.64           0.91        1.92      4.69      5.34
Std           0.0046      0.0045    0.0042    0.0043         0.0040      0.0041    0.0038    0.0040
Avg Qty       90.75       93.52     93.52     84.83          75.86       73.94     73.94     64.97
Std           49.10       51.68     51.68     44.04          74.31       68.88     68.88     54.48
p-value       0.15        -         -                        0.0008      -         -
              -           0.0       -                        -           0.0       -

              Group 3                                        Group 4
              non-op-CE   non-op    op        adjusted op    non-op-CE   non-op    op        adjusted op
Avg Profit    1.97        2.70      4.98      5.27           3.02        4.76      5.43      5.45
Std           0.0053      0.0054    0.0050    0.0052         0.0064      0.0060    0.0056    0.0056
Avg Qty       47.87       49.75     49.75     47.03          22.53       31.80     31.80     31.73
Std           39.05       38.64     38.64     35.90          12.13       15.01     15.01     14.99
p-value       0.13        -         -                        0.02        -         -
              -           2.08e-171 -                        -           5.08e-15  -

Table 5: Statistical descriptions of the average profit per signal. “Avg Qty” is the average number of signals per stock per day, with its standard
deviation (“Std”). The p-value is derived from the t-test on the average profit between two settings.
Conclusions
References
Huang, C., Ge, W., Chou, H., & Du, X. (2021). Benchmark dataset for short-term market prediction of limit order book in China markets. The Journal of Financial Data Science.
Kercheval, A. N. & Zhang, Y. (2015). Modelling high-frequency limit order book dynamics with support vector machines. Quantitative Finance, 15(8), 1315–1329.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2980–2988).
Ntakaris, A., Magris, M., Kanniainen, J., Gabbouj, M., & Iosifidis, A. (2018). Benchmark dataset for mid-price forecasting of limit order book data with machine learning methods. Journal of Forecasting, 37(8), 852–866.
Sirignano, J. & Cont, R. (2019). Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance, 19(9), 1449–1459.
Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., & Iosifidis, A. (2017). Forecasting stock prices from the limit order book using convolutional neural networks. In 2017 IEEE 19th Conference on Business Informatics (CBI), volume 1 (pp. 7–12). IEEE.
Zhang, Z., Zohren, S., & Roberts, S. (2019). DeepLOB: Deep convolutional neural networks for limit order books. IEEE Transactions on Signal Processing, 67(11), 3001–3012.
Deep LOB Trading: Half a second please!
Thank you!