
Proceedings of the URECA@NTU 2023-24

Generative Adversarial Networks for Time-Series Simulations
under Continuous Conditions

Horstann Ho Rui Yao
College of Computing and Data Science

Assoc Prof. Patrick Pun Chi Seng
School of Physical and Mathematical Sciences

Abstract – Computer-generated time-series simulations are heavily used in banks and hedge funds for risk management, pricing, volatility trading, hedging and more. Traditionally, these simulations use Monte-Carlo algorithms, which usually fail to capture fat tails or asymmetric distributions. Moreover, many of these simulations require binding model-based assumptions about how market prices move, which often may not be reflected in real life.

To address this shortcoming, we propose the use of generative deep learning frameworks, in particular Generative Adversarial Networks (GANs), given their strong capability in simulating synthetic data (images). However, there are 2 main challenges:

• Recent research on time-series GANs rarely focuses on conditional generation, which could render such models unusable in an industry setting where the incorporation of context is crucial.
• Conditional GANs have traditionally only used categorical conditions, while continuous conditions are much less straightforward to sample and train on. However, many financial indicators/metrics are continuous in nature.

To this end, we propose the CC-TTS-GAN (Continuous Conditional Transformer-Based Time-Series GAN). The CC-TTS-GAN can incorporate some prior condition (such as implied volatility or other market indicators), assumed to be continuous in nature, to inform its own simulations in a model-free approach. We then compare our model's simulations to traditional benchmark models such as Geometric Brownian Motion (GBM), Constant Elasticity of Variance (CEV) and Heston, based on evaluation metrics such as Jensen-Shannon (JS) divergence, FID score, RMSE and Dynamic Time Warping (DTW) distance.

Source code: https://github.com/Horstann/CC-TTS-GANs/tree/predictive

Keywords – Machine Learning, Deep Learning, Generative Adversarial Networks, Transformer, Attention, Time-Series, Monte-Carlo, Simulations, Brownian Motion, Heston, Quantitative Finance, Generative AI

1 INTRODUCTION

Today, machine learning techniques have proliferated throughout the banking and finance industry in applications such as investment insights, alpha generation, risk management, derivative pricing and more. This is an exciting area with ongoing research by institutions and asset managers who strive to capture these nascent opportunities to stay ahead in the financial markets.

This paper aims to explore innovations within the sub-area of Monte-Carlo simulations for time-series data, using a novel variation of Generative Adversarial Networks (GANs). Monte-Carlo simulations are extremely useful in industry, as they can be used to interpret the distributions and other statistical properties of returns or price paths. These are critical in modelling sensitivity, pricing complex exotic options, forecasting, as well as analysing quantiles for methods such as Value-at-Risk (VaR) and expected shortfall.

GAN models have been noted for their remarkable ability to generate realistic and high-quality images. The usual GAN architecture involves a generator model that aims to generate images indistinguishable from real images, and a discriminator (or critic) model that aims to distinguish between real and generated images. Both models are usually constructed using deep neural networks. Such an adversarial dynamic allows us to train both the generator and discriminator to become experts in their respective objectives in an unsupervised setting. In the past, GAN applications have focused on image and text generation. Only recently have more publications shown how this framework could instead be used to generate time-series simulations.

However, we note that many existing GANs for time-series simulations are unconditional, which prevents the incorporation of conditions (prior to simulation) that are often crucial in capturing the context of the financial markets. These conditions could be important input variables such as market indicators (e.g. RSI, market sentiment) and option chain data (e.g. volatility, option prices, put-call ratios) that may heavily influence the distribution of time-series simulations.
Besides, traditional image GANs trained for conditional generation often only use categorical conditions (e.g. "cat" and "dog"). This poses a problem if our conditions are continuous in nature, such as the aforementioned financial variables.

Hence, through this paper we hope to contribute valuable insights within this area of research by developing a GAN model that is capable of modelling the distribution of time-series simulations given some continuous condition(s) – a continuous, conditional GAN for financial time series. We believe this capability will lead to more accurate forward-looking distributions of time-series data, from which users can extract more meaningful and valuable insights in an industry setting.

1.1 RELATED WORK

There have been many experiments in leveraging the powerful capabilities of the traditional GAN framework, each offering different benefits and insights with respect to this paper's goal.

(Li et al., 2022) showcased how GANs can be well adapted to time-series data and produce realistic simulations when coupled with the transformer mechanism. However, generations from this model were unconditional, so users could not incorporate any information prior to simulation. Moreover, experiments were mainly performed on non-financial time series such as random sinusoidal waves, human heartbeat signals and other human activity recognition data. Much of these datasets were stationary and non-trending, unlike financial price paths.

On the other hand, (Ding et al., 2023) were the first to introduce a GAN with a new vicinal loss function that enabled the incorporation of continuous conditions to generate images, such as objects at specified angles and human faces at specified ages. This was made possible by sampling training examples within the vicinity of a target condition. These results were promising and opened up further possibilities for incorporating conditions in time-series generation. However, in this setup, we must assume that our conditions are relatively uniformly distributed and that we have a sufficient number of samples. This can be true for object angles and human ages, but not so for financial market variables.

Another interesting work by (Vuletić et al., 2023) proposed GANs focused on generating financial time series, but under a semi-supervised setting. The generator model was trained on a supervised, economics-based loss function with terms that measure PnL, MSE and Sharpe ratio.

Our paper attempts to draw upon ideas from these various pieces to construct a GAN model capable of achieving our aforementioned objective. We hope to show that such a model has the potential to add value beyond what existing models are capable of, and can be extended across different areas of research or applications within quantitative finance.

2 DATA COLLECTION

In our paper, we experiment with simulations of S&P500 ETF end-of-day price paths, as data on the ETF and its options is the most freely available. It is also a good starting point before we extend our model to other ETFs, single stocks, or even other asset classes. We were able to retrieve S&P500 prices from Jan 2005 to present day easily using the Yahoo Finance API.

As for conditions, we considered variables such as options implied volatility, the options put-call ratio and the RSI (relative strength index) of S&P500 prices. Implied volatility data was extracted from the Bloomberg LIVE (Listed Implied Volatility Engine), which provides the daily implied volatility of listed options for all global asset classes back until Jan 2005, given days to expiry and moneyness. We extracted the implied volatility of S&P500 at-the-money options with days to expiry of 30 and 60, using the BQL command below in MS Excel:

=@BQL("SPX Index","dropna(IMPLIED_VOLATILITY(PCT_MONEYNESS=100, EXPIRY=30D))","dates=range(2005-01-01, 2024-05-10)")

Similarly, we were also able to extract S&P500 put-call ratios using the BDH command below:

=@BDH("SPX Index", "PUT_CALL_VOLUME_RATIO_CUR_DAY", "20050101", "20240510")

The end-of-day RSI was computed directly from the S&P500 price paths.

Lastly, we also require end-of-day market prices of S&P500 options to calibrate some of the Monte-Carlo benchmark models, namely CEV and Heston. A brilliant source of free historical end-of-day options chain data going back until Jan 2005 can be found on OptionsDx at https://www.optionsdx.com/. Once retrieved, for each day we filtered for options with expiry within 30 to 90 days, then computed the mean moneyness level (strike price / underlying price) and average market price for all put and call options. With mean strike price, underlying price, mean days to expiry and option market price, we are able to calibrate parameters for our Monte-Carlo benchmarks.
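To make the retrieval step concrete, below is a minimal Python sketch of how the price series and the RSI condition could be assembled. The yfinance package, the "^GSPC" ticker and the 14-day RSI window are assumptions made for this illustration and are not prescribed by the pipeline above.

import pandas as pd
import yfinance as yf  # assumed data source; any end-of-day price feed works

# Daily S&P500 closes from Jan 2005 (ticker symbol is an assumption)
spx = yf.download("^GSPC", start="2005-01-01")["Close"]

# 14-day Wilder-style RSI computed directly from the price path
delta = spx.diff()
gain = delta.clip(lower=0).ewm(alpha=1/14, adjust=False).mean()
loss = (-delta.clip(upper=0)).ewm(alpha=1/14, adjust=False).mean()
rsi = 100 - 100 / (1 + gain / loss)

conditions = pd.DataFrame({"close": spx, "rsi": rsi}).dropna()

The implied volatility and put-call ratio columns retrieved from Bloomberg would then be joined onto this frame by date.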
3 METHODOLOGY

The objective of our experiment is to produce forward-looking distributions of daily S&P500 closing price paths within 2-month windows.

Our time series from Jan 2005 to present day has a train-test split of 90-10. The training period is from Jan 2005 to Apr 2022 and the testing period is from Apr 2022 to May 2024. The training size of 90% allows for a larger sample of training examples to draw from, and also includes as much of the volatile Covid-19 pandemic period in training as possible.

The model is trained on the whole training set for multiple epochs, and then evaluated on the whole test set at once. As we do not use a rolling model that is retrained at each timestep, this actually renders the problem more challenging.

3.1 PREPROCESSING AND FEATURE ENGINEERING

Before constructing the model, we first perform a few necessary data preprocessing steps to ensure our data can be fitted by the model.

Firstly, we take the log-returns of the S&P500 price paths. Log-returns are more stationary in nature and thus easier to model, and this could also allow the model to better capture volatility clusters. Most importantly, as we will cover later, the GAN model performs layer normalisation at each network layer, so its output will almost always be stationary. After applying a Dickey-Fuller test on the log-returns, we get a p-value of 0.00, which indicates that the log-returns are indeed stationary.

Besides that, we also normalise our conditions using a rolling z-score. This puts all our conditions on the same scale, which is essential when sampling vicinal examples with respect to a target condition using L2 distance, as we will see later. The rolling window we set for each variable is usually around 1 year or less, and is determined by the p-values of Granger-causality tests applied across 2-month windows, where the treatment is the standardised values and the outcome is the log-returns.

Variables | 30d ATM implied vol | 60d ATM implied vol | RSI
Z-score rolling window size | 252 | 189 | 189
Causality on log-returns, min p-value | 0.0029 | 0.0015 | 0.0000
Causality on log-returns, mean p-value | 0.1035 | 0.0563 | 0.0044
Causality on close prices, min p-value | 0.0736 | 0.0197 | 0.0004
Causality on close prices, mean p-value | 0.3517 | 0.3921 | 0.0489

Table 1: P-Value Statistics from Granger-Causality Tests

Table 1 summarises the Granger-causality test results of our standardised variables when applied on log-returns and close prices. We see that the p-values for log-returns are substantially lower than for close prices, which supports our point that log-returns are easier to model. As we observe relatively low mean p-values for these 3 variables, they appear to have some predictive potential over log-returns. Hence, we proceed to use them in our condition embeddings. Note that we excluded the put-call ratio from our condition variables, as it was too noisy and unlikely to be a meaningful signal over an output window of 2 months.

3.2 MODEL ARCHITECTURE

The CC-TTS-GAN's model architecture is composed of the usual generator and discriminator. The generator takes in a noise vector and a condition embedding to generate a 2-month log-returns series. This is converted back to a price path with an initial value of 1.0, which in turn is fed alongside the same condition embedding into the discriminator, which produces a score reflecting the "realness" of the generator's output.

There are 2 aspects of the architecture that are essential to creating a conditional transformer-based time-series GAN.

Aspect 1: Transformer-Based Time-Series Architecture

We first establish an unconditional GAN model that leverages the transformer architecture. Transformers and their attention mechanisms have grown in popularity as they have proven to outperform the neural network architectures previously common in GANs, such as RNNs, LSTMs and CNNs. This is attributed to their ability to handle long sequences without suffering from gradient vanishing, which is extremely useful in time-series analysis.

The time series is divided and processed in patches, similar to how the Vision Transformer processes images. Both the generator and discriminator start with positional encoding, followed by 3 self-attention layers. The generator also has 5 attention heads in each transformer layer to allow for more complex computations and more diverse outputs.
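As a rough illustration of this patch-based transformer generator, below is a minimal PyTorch sketch. It follows the stated choices (3 self-attention layers, 5 attention heads, patch-wise processing, output converted from log-returns to a price path starting at 1.0), but the noise size, patch length and embedding width are assumed values, and the condition input of Aspect 2 is omitted; it is not the exact implementation.

import torch
import torch.nn as nn

class TTSGenerator(nn.Module):
    # Assumed sizes: 42-step output split into 6 patches of 7 steps, 80-dim embeddings.
    def __init__(self, noise_dim=128, seq_len=42, patch_len=7,
                 embed_dim=80, n_heads=5, n_layers=3):
        super().__init__()
        self.num_patches = seq_len // patch_len
        self.embed_dim = embed_dim
        # Map the noise vector to one embedding per patch
        self.input_proj = nn.Linear(noise_dim, self.num_patches * embed_dim)
        # Random (learnable) positional embeddings, as described in the text
        self.pos_embed = nn.Parameter(torch.randn(1, self.num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Project each patch embedding back to patch_len log-returns
        self.out_proj = nn.Linear(embed_dim, patch_len)

    def forward(self, z):
        x = self.input_proj(z).view(-1, self.num_patches, self.embed_dim)
        x = x + self.pos_embed
        x = self.encoder(x)
        log_returns = self.out_proj(x).flatten(1)                 # (batch, seq_len)
        return torch.exp(torch.cumsum(log_returns, dim=1))        # price path relative to 1.0

# Usage: paths = TTSGenerator()(torch.randn(16, 128)) gives 16 simulated 2-month paths.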
Note that our positional embeddings are randomly generated vectors produced at the start of training. Figure 1 illustrates the architecture's main components.

Figure 1: Transformer-Based Time-Series GAN Architecture (Li et al., 2022)

Aspect 2: Conditional Generation

Building on top of the architecture in Aspect 1, we want to allow for the incorporation of prior conditions. Simply concatenating the condition embeddings to our inputs is the simplest and crudest approach. However, feeding them into every layer of the generator allows the conditions to more strongly influence the generated output and also produces more diverse results. Doing so requires a 2D conditional layer normalisation after each self-attention layer.

As for the discriminator, we simply perform an add-norm operation to incorporate our condition embeddings into the final discriminator layer.

Figure 2: Workflow of Conditions Input (Ding et al., 2023)

3.3 TRAINING METHODOLOGY

Here we explain how we sample training examples given a target condition, an important step in training the model to achieve robust and accurate conditional generation. We then go over how we use these real and fake examples to update the weights of the discriminator and generator.

We also introduce a supervised generator loss component on top of the unsupervised GAN dynamic, which measures the quality of an entire distribution of generated price paths given a single condition embedding. This is beneficial because the discriminator only measures the quality of each individual generated price path, and not the entire distribution as a whole.

Note that Kaggle GPU resources (GPU T4 x2) were utilised to speed up the training process.

3.3.1 VICINAL SAMPLING GIVEN TARGET CONDITION

In each training epoch, we iterate through each training example's target conditions. Each target condition has a small amount of Gaussian noise added to it. The rule of thumb we use for the standard deviation of this Gaussian noise is 1.06 × standard_deviation_of_condition_values × train_size^(-1/5), as proposed by (Ding et al., 2023). This is important in ensuring that our input conditions during training aren't just fixed to those available in the training set, and it makes outputs smoother and more robust, especially for input conditions that have not been seen by the generator before.

Once done, we use L2 distance to find the top 10 training examples whose conditions are most similar to this new noise-added condition embedding. These are the real price paths shown to the discriminator. The generator, on the other hand, is also fed this new condition embedding and tasked to generate 30 examples, 10 of which are randomly selected and fed into the discriminator.
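A minimal NumPy sketch of this vicinal sampling step is shown below, using the rule of thumb above. The function and variable names are illustrative, not those of the actual codebase.

import numpy as np

def sample_vicinal_batch(conditions, price_paths, target_idx, k=10):
    # conditions: (n, d) rolling z-scored condition embeddings
    # price_paths: (n, 42) real 2-month price paths aligned with the conditions
    n = len(conditions)
    # Rule-of-thumb noise scale from (Ding et al., 2023)
    sigma = 1.06 * conditions.std(axis=0) * n ** (-1 / 5)
    noisy_target = conditions[target_idx] + np.random.normal(0.0, sigma)
    # Top-k real examples by L2 distance to the noise-added condition
    dists = np.linalg.norm(conditions - noisy_target, axis=1)
    nearest = np.argsort(dists)[:k]
    return noisy_target, price_paths[nearest]

The noisy target condition would then also be fed to the generator to produce the 30 fake examples described above.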
This way, we ensure the discriminator sees an equal number of real and fake examples, while having a localised selection of 10 real examples within the immediate vicinity of the target condition. At the same time, we have a large enough sample of 30 fake examples, on which we can apply our supervised loss function for further evaluation.

We use a batch size of 4 and 40 epochs, as this was found to be most optimal. To clarify, each item in the batch refers to 1 unique condition. As such, a batch size of 4 translates to 4 unique conditions, which in turn means 4×10 = 40 real examples and 4×30 = 120 fake examples per batch.

3.3.2 UPDATE RULES FOR GENERATOR & DISCRIMINATOR

Once we have collected the real and fake examples, they can be used to train and update the weights of the discriminator and generator models in a two-step process.

Step 1: Update the Discriminator

The first step involves updating only the discriminator, which in this case works like a critic. Instead of acting like a classifier that assigns binary or probit scores between 0 and 1 for fake and real, it scores each example with a value. This value can be positive or negative; what matters is that higher values reflect a higher degree of "realness". This is essentially the Wasserstein loss function (Arjovsky et al., 2017), as below:

disc_loss = mean(fake_scores) – mean(real_scores) + gradient_penalty

Note that we also apply a gradient penalty that forces the norm of the gradients to be close to 1. This ensures the 1-Lipschitz continuity of the discriminator gradients, which helps the discriminator train better when using the Wasserstein loss (Arjovsky et al., 2017).

For each condition data point, the loss function above is applied on 10 real examples and 10 fake examples (randomly selected from the 30 fake examples).

Step 2: Update the Generator

The second step involves updating only the generator. Through empirical experiments, we find that using the usual Wasserstein generator loss of just -mean(fake_scores) could sometimes lead to a sub-optimal equilibrium, where the generator finds a pattern of fake examples with poor quality but absurdly high fake_scores, and the discriminator fails to produce natural real_scores that are higher than the fake_scores. This leads to an extremely low loss for the generator, but the discriminator gets stuck in a state with a very large loss.

As such, we use an OLS version of the generator loss function, which forces the generator to produce fake samples with scores as close to those of real samples as possible. This prevents the aforementioned problem, while ensuring that the generated examples are trained to be indistinguishable from the real examples.

Besides that, we extend the loss function described above to include other loss terms, forming the supervised component of the generator loss. These loss terms measure the OLS loss between the first 3 moments and the covariance matrices of the real and fake price paths, and also between the first 3 moments of real_scores and fake_scores. Again, these loss terms evaluate the whole distribution of price paths for each unique condition embedding.

This modified semi-supervised loss function is as below, where a1, b1, b1,x, c1, a2, b2 and c2 are hyperparameters:

output_OLS =
  a1 ∙ mean( (mean(fake_paths) – mean(real_paths))² )
+ b1 ∙ mean( (standard_deviation(fake_paths) – standard_deviation(real_paths))² )
+ b1,x ∙ mean( (off_diagonal_elements(covariance(fake_paths)) – off_diagonal_elements(covariance(real_paths)))² )
+ c1 ∙ mean( (standardised_skew(fake_paths) – standardised_skew(real_paths))² )

scores_OLS =
  a2 ∙ (mean(fake_scores) – mean(real_scores))²
+ b2 ∙ (standard_deviation(fake_scores) – standard_deviation(real_scores))²
+ c2 ∙ (standardised_skew(fake_scores) – standardised_skew(real_scores))²

gen_loss = output_OLS + scores_OLS
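For concreteness, the sketch below spells out these supervised loss terms in NumPy. The helper names mirror the pseudo-formulas above; the hyperparameter values are placeholders, and in the actual model these terms would be computed with a differentiable tensor library so that gradients can flow back to the generator.

import numpy as np
from scipy.stats import skew

def moment_ols(fake, real, a=1.0, b=1.0, b_x=1.0, c=1.0):
    # fake, real: (n_paths, seq_len) price paths for one condition embedding
    mean_term = np.mean((fake.mean(axis=0) - real.mean(axis=0)) ** 2)
    std_term = np.mean((fake.std(axis=0) - real.std(axis=0)) ** 2)
    # Off-diagonal covariance terms across time points
    cov_diff = np.cov(fake, rowvar=False) - np.cov(real, rowvar=False)
    off_diag = cov_diff[~np.eye(cov_diff.shape[0], dtype=bool)]
    cov_term = np.mean(off_diag ** 2)
    skew_term = np.mean((skew(fake, axis=0) - skew(real, axis=0)) ** 2)
    return a * mean_term + b * std_term + b_x * cov_term + c * skew_term

def score_ols(fake_scores, real_scores, a=1.0, b=1.0, c=1.0):
    # fake_scores, real_scores: 1-D arrays of critic scores
    return (a * (fake_scores.mean() - real_scores.mean()) ** 2
            + b * (fake_scores.std() - real_scores.std()) ** 2
            + c * (skew(fake_scores) - skew(real_scores)) ** 2)

# For one condition embedding:
# gen_loss = moment_ols(fake_paths, real_paths) + score_ols(fake_scores, real_scores)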
For each unique condition embedding, the loss function above is applied on 10 real examples and all 30 fake examples.

4 EVALUATION

Now that we have defined the model architecture and successfully trained the model, we can evaluate the model's outputs and compare them with benchmarks. We will go over the Monte-Carlo benchmarks and the evaluation metrics that we use.

4.1 MONTE-CARLO BENCHMARKS

The 3 benchmarks we use are popular models commonly used in traditional Monte-Carlo simulations. They are described below. Note that we implement all these benchmark models with Empirical Martingale Correction and antithetic variates.

4.1.1 GEOMETRIC BROWNIAN MOTION (GBM)

This is the standard and most commonly used dynamic for modelling underlying asset prices, where the drift μ and volatility σ are constant within each simulation and scale proportionately with the asset price S. The GBM's analytical solution (which can be derived via Ito's Lemma) is as below, where W_t is a standard Brownian motion:

S_t = S_0 ∙ exp( (μ – σ²/2)·t + σ·W_t )

To obtain values for μ and σ at each time point, we assume a risk-neutral setting. Keeping our 2-month window in mind, μ is set to the 3-month market yield on US treasury securities, which we extracted from Yahoo Finance via the symbol "DGS3MO". σ is set to the implied volatility of S&P500 at-the-money options with 60 days to expiry, which was previously extracted from Bloomberg LIVE.

4.1.2 CONSTANT ELASTICITY OF VARIANCE (CEV)

The CEV model is a local volatility model with dynamics similar to GBM, with the exception of the γ variable (Hsu et al., 2008). γ is the elasticity of variance and is also constant for each simulation. γ is a useful feature because when γ<1, it produces a phenomenon where volatility increases as the price falls, which is common in equities as the stock's leverage ratio increases.

As the CEV model's closed-form analytical solution is complex and difficult to compute, we calibrated the model's parameters μ, σ and γ using the S&P500 option market prices previously retrieved from OptionsDx. Calibration was made possible using Python's scipy.optimize.minimize() function, where we minimise the squared loss between model prices and market prices. Details of the function can be found at https://docs.scipy.org/doc/scipy-1.13.1/reference/generated/scipy.optimize.minimize.html.

4.1.3 HESTON MODEL

The Heston model is a stochastic volatility model, whereby the variance parameter ν follows its own stochastic process with long-run average variance θ, rate of reversion κ, and volatility of volatility ξ (Karlsson, 2021).

As the Heston model has no analytical solution, we utilise Python's QuantLib library, which contains classes and methods to calibrate all Heston parameters given market option prices (De La Rosa, 2024).

4.2 VISUALISATIONS

We developed some helpful visualisations to show how different the distribution of the CC-TTS-GAN's simulations is from the benchmark models'. Here we only use the testing period from Apr 2022 to May 2024, so all synthetic simulations by our CC-TTS-GAN here are out-of-sample. For each timepoint within this testing period, we run n simulations for 2 months ahead of that timepoint, and compare them with the actual price path realised after 2 months.

4.2.1 TIME-SERIES SIMULATIONS CHART

Below are some time-series charts of the CC-TTS-GAN's and benchmarks' simulations relative to the real price paths. For readability, we only show 1 set of simulations for every 20 days, and we cut each simulation down from 2 months to 1 month. Note that our benchmark model simulations are cut off after Jan 2024, as OptionsDx only provides options chain data up until Dec 2023. Nonetheless, we are still able to gain valuable insights from visual comparisons.
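For reference, below is a minimal sketch of how the risk-neutral GBM benchmark of Section 4.1.1 can be simulated with antithetic variates. The parameter values are placeholders, and the Empirical Martingale Correction step is omitted.

import numpy as np

def simulate_gbm(s0, mu, sigma, n_paths=50, n_steps=42, dt=1/252):
    # Antithetic variates: use each Gaussian draw and its negation
    z = np.random.standard_normal((n_paths // 2, n_steps))
    z = np.vstack([z, -z])
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    # Analytical GBM solution applied step by step
    return s0 * np.exp(np.cumsum(log_increments, axis=1))

# Example: 2-month horizon, treasury yield as drift, 60-day ATM implied vol
paths = simulate_gbm(s0=1.0, mu=0.05, sigma=0.18)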
Figure 3: Simulation Plot (CC-TTS-GAN vs Real)

Figure 4: Simulation Plot (GBM vs Real)

Figure 5: Simulation Plot (CEV vs Real)

Figure 6: Simulation Plot (Heston vs Real)

As can be observed, the CC-TTS-GAN simulations tend to exhibit less volatility than the benchmark models. This could be seen as an improvement, as the benchmark models (usually calibrated on implied volatility or market prices) tend to over-estimate the volatility of simulated time series. The CC-TTS-GAN does not seem to exhibit such weaknesses and can provide more precise distributions. Moreover, the CC-TTS-GAN is also able to provide relatively accurate directional convictions of predicted price movements, while the CEV and Heston models barely deviate from risk-neutrality even when their drift parameter is calibrated on market prices.

4.2.2 SCATTER PLOTS AFTER DIMENSIONALITY REDUCTION

To provide another perspective on the model results, we reduce the dimensions of our 2-month-long simulations from 42 (2 months ≈ 42 trading days) to 2, using principal component analysis (PCA) and t-distributed Stochastic Neighbour Embedding (t-SNE).

The resulting scatter plots are shown below. Red points refer to real price paths, blue points refer to synthetic simulations by the CC-TTS-GAN, and green points refer to the GBM benchmark simulations.

Figure 7: PCA Scatter Plot

Figure 8: t-SNE Scatter Plot

Of course, reducing 42 dimensions to 2 leads to a lot of information loss, and we cannot definitively judge which generated distribution is closest to the real distribution. Moreover, displaying all 5 groups of data (real + synthetic + 3 benchmarks) would quickly render the scatter plot unreadable, which is why we omitted the CEV and Heston data points in the above 2 figures. As such, we introduce evaluation metrics, which provide more objective measures of performance for all our simulations.

4.3 DISTANCE METRICS

4.3.1 JENSEN-SHANNON (JS) DIVERGENCE

JS-divergence (Menéndez et al., 1997) is based on the Kullback-Leibler (KL) divergence. Both are metrics that measure the distance between 2 distributions. However, unlike KL-divergence, JS-divergence is symmetric. This means DJS(P||Q) = DJS(Q||P), but DKL(P||Q) ≠ DKL(Q||P), where P and Q are 2 probability distributions. This makes JS-divergence a more robust metric. Their formulas are given below:

DKL(P||Q) = Σx P(x) · log( P(x) / Q(x) )

DJS(P||Q) = ½ DKL(P||M) + ½ DKL(Q||M), where M = ½ (P + Q)
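As a small numerical illustration of these formulas (the toy distributions below are arbitrary):

import numpy as np

def kl(p, q):
    # Discrete KL-divergence; assumes p, q are strictly positive and sum to 1
    return np.sum(p * np.log(p / q))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])
print(js(p, q), js(q, p))   # identical, since JS-divergence is symmetric
print(kl(p, q), kl(q, p))   # generally different

In our setting, P and Q are the KDE-estimated densities of the real and generated price paths, as described below.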
As per the formulas, when P(x) and Q(x) are identical distributions, DJS(P||Q) = DJS(Q||P) = DKL(P||Q) = DKL(Q||P) = 0. The more different the distributions are, the larger the JS-divergences and KL-divergences get.

To estimate the respective probability distributions P(x) and Q(x), where x is a price path, we leverage Gaussian Kernel Density Estimation (KDE) via Python's scipy.stats.gaussian_kde() class, which is documented at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

However, when the price path x has many dimensions (42, in our case, for a 2-month simulation), the estimation of P(x) and Q(x) becomes numerically unstable and may fail. As such, we use PCA and t-SNE once again to reduce x's dimensions from 42 to 10 before running scipy.stats.gaussian_kde(), which makes the estimation process easier.

4.3.2 FRECHET INCEPTION DISTANCE (FID)

FID is another distance metric, introduced by (Heusel et al., 2017), traditionally used to measure the difference between simulated images produced by GANs by looking at their means and covariance matrices, as below:

FID = ||μ_real – μ_fake||² + Tr( Σ_real + Σ_fake – 2·(Σ_real·Σ_fake)^(1/2) )

We intend to apply this metric to our simulated and real price paths.

4.3.3 ROOT MEAN SQUARE ERROR (RMSE) OF MEAN

RMSE simply refers to the root of the mean squared loss between 2 sequences. In our evaluation, we use the RMSE of mean, which is the RMSE between the real price path and the mean of the generated price paths (averaged at each time point).

4.3.4 DYNAMIC TIME WARPING (DTW) DISTANCE OF MEAN

Instead of measuring Euclidean distances like RMSE, DTW is a dynamic programming algorithm that measures differences between temporal sequences by aligning them in a way that minimises the distance between corresponding points, even if the sequences vary in speed or timing, as per the image below.

Figure 9: Euclidean Distance vs DTW Distance

This makes DTW distance suitable for time-series data like price paths, where we do not expect exact alignment between 2 series, but a general pattern.

In our evaluation, we use the DTW distance of mean, which is the DTW distance between the real price path and the mean of the generated price paths (averaged at each time point). (Filho, 2023) shared several examples of how we can implement DTW in Python with minimal latency at https://forecastegy.com/posts/dynamic-time-warping-dtw-libraries-python-examples/.

4.4 RESULTS

After training the model, we ran 20 trials of evaluation. Each trial contains 50 simulations for every available timepoint in the testing period, the results of which were compared with the benchmarks. These results were then aggregated across the 20 trials and summarised below (lower scores are better).

Metric | GBM | CEV | Heston | CC-TTS-GAN
JS-Divergence (PCA) | 2.2182 | 2.3317 | 1.9198 | 1.5211
JS-Divergence (t-SNE) | 0.2189 | 0.2323 | 0.1409 | 0.1368
FID | 0.0047 | 0.0028 | 0.0020 | 0.0035
RMSE of Mean | 0.0596 | 0.0596 | 0.0596 | 0.0607
DTW of Mean | 0.2027 | 0.2027 | 0.2027 | 0.1930

Table 2: Summary Table of Evaluation Results

As per Table 2, the CC-TTS-GAN beats all benchmarks in 3 out of the 5 distance metrics, and remains comparable with the benchmarks for the other 2 metrics. Values in bold in the original table are highest within that metric.

For each distance metric and benchmark, we also conducted the hypothesis test below:

• H0: CC-TTS-GAN's score = benchmark's score
• H1: CC-TTS-GAN's score > benchmark's score
Table 2 cells in grey are those where the hypothesis test rejects the null hypothesis H0.

Besides that, we also make a few other remarks about the results in Table 2:

• As expected, Heston tends to achieve lower distance metrics than GBM and CEV, as it accounts for the stochastic component of volatility and thus reflects more accurate modelling of price dynamics.
• The RMSE and DTW distance scores for the 3 benchmarks may appear identical, but they actually differ at the order of magnitude of 10^-16. These extremely close scores are expected and can be explained by Figures 4, 5 and 6, where we noted that the benchmark simulations barely deviate from risk-neutrality, implying that their drift coefficients (and hence their means) are approximately at the risk-free rate. This is unlike the scores for the CC-TTS-GAN simulations, which do not adhere to risk-neutrality and reflect more realistic market dynamics.

4.5 RUNTIME COMPARISONS

Lastly, we also compare the runtimes of each benchmark's and the CC-TTS-GAN's simulations. Using Python's tqdm library, we recorded the number of iterations per second, where each iteration refers to a completed set of 100 simulated price paths at one timepoint. We selected 100 timepoints, so the results are averaged across 100 iterations.

This runtime includes any calibration process for the benchmark models. However, we exclude the model training time of the CC-TTS-GAN, as modern high-quality GAN models always require long training times for high-quality generations. Below is a summary of what the runtime includes for each model's simulation process:

• GBM: Risk-neutral; no calibration. Simply substitute treasury yields and implied volatility data for each timepoint, then simulate.
• CEV: Calibrated with market prices via manual simulations for each timepoint, then simulate (again).
• Heston: Calibrated with market prices via QuantLib for each timepoint, then simulate.
• CC-TTS-GAN: Input noise vector and condition embeddings into the trained model, then forward propagate to retrieve simulations.

Below are the results we get (more iterations per second is better).

Metric | GBM | CEV | Heston | CC-TTS-GAN
No. of Iterations per Second | 76.38 | 2.54 | 5.27 | 18.83

Table 3: Summary of Runtime

Each model's simulation process differs from the others, hence we see a wide range of values. GBM is the fastest as it does not go through any calibration, while CEV is the slowest as we performed manual simulations for each calibration. The CC-TTS-GAN achieves a decent runtime, ranking second, although GBM still far outperforms it without calibration. This shows that once trained, the CC-TTS-GAN can provide time savings as it skips the calibration process, which is a benefit over models that have no closed-form solution and require complex calibration.

5 CONCLUSION

Overall, our evaluation results showcase how the CC-TTS-GAN framework could produce accurate simulations conditioned on some given continuous market variables, while being on par with benchmark models. The CC-TTS-GAN's capability for continuous conditional generation makes it a potentially value-adding tool for modern industry users.

The points below summarise the main definitive benefits of the CC-TTS-GAN over the benchmark models:

• Model-free setting: No assumptions on risk-neutrality and price dynamics, which seem to tie the benchmark models down.
• Skips the calibration process: Once trained, it allows for much faster simulations than complex models.
• Flexibility of input data: Complex models require very specific options chain data, which are often expensive or difficult to source. The CC-TTS-GAN can accept any input data as conditions, as long as they have some predictive power over price paths.
• Does not overestimate volatilities: The implied volatilities exhibited by benchmark models are empirically always inflated.
• Reflects the future drift of the price path more accurately: Benchmark models are heavily restrained by risk-neutrality, so their drift is mostly set near the risk-free rate.
However, we also note some weaknesses of the CC-TTS-GAN relative to the benchmark models:

• Needs training time: This comes with all deep learning models, though it can be sped up with GPU resources; e.g. using Kaggle's GPU T4 x2 shortened our training times to between 30 min and 1 hr.
• Needs lots of training data: Training any GAN requires vast amounts of training examples, which is particularly problematic in finance, where the further back our price data goes, the more different the price dynamics are due to inherent non-stationarity.
• Needs hyperparameter tuning: The hyperparameters in our complex supervised loss function (as per Section 3.3.2) would need to be retuned each time we wish to apply our model to a different asset.
• Needs feature engineering: This comes with the flexibility of data – the user would need to find and engineer relevant features to be used as conditions. In addition, due to the way data points are sampled in training (as per Section 3.3.1), the user needs to ensure that the condition embeddings are evenly distributed, though this can easily be done with scikit-learn's preprocessing.QuantileTransformer() in Python.

5.1 FUTURE WORK

There are still many aspects of exploration and improvement for the CC-TTS-GAN. One aspect is to implement and evaluate the model across other equities and ETFs, so that we can further validate its performance and robustness.

Besides, we believe the CC-TTS-GAN can be vastly improved if we make further modifications to its architecture. We could consider using a discriminator/critic model that evaluates a whole distribution of time series at once, which would remove the need for the complex supervised loss function in Section 3.3.2 and avoid any tedious hyperparameter tuning. However, evaluating distributions as a whole would mean we need a lot more data, which brings us to our next point.

The modified model could be applied to higher-frequency prices and signals, where there are more training examples in a shorter timeframe. Higher-frequency prices may also exhibit greater stationarity due to short-run mean reversions, volatility clusters and a lower likelihood of consistent trending, all of which are likely to allow us to sample a larger number of training examples whose dynamics are more similar to one another, improving model performance.

ACKNOWLEDGMENT

I would like to thank Assoc Prof. Patrick Pun Chi Seng, who has been the supervisor of this research project, for his time and kind guidance throughout the research and implementation process. It has been a fulfilling and insightful collaboration.

This research project was supported by Nanyang Technological University under the URECA Undergraduate Research Programme.

REFERENCES

[1] Li, X., Metsis, V., Wang, H., & Ngu, A. H. H. (2022, June 26). TTS-GAN: A transformer-based time-series generative adversarial network. arXiv. https://arxiv.org/abs/2202.02691
[2] Ding, X., Wang, Y., Xu, Z., Welch, W. J., & Wang, Z. J. (2023, October 22). CCGAN: Continuous conditional generative adversarial networks for image generation. OpenReview. https://openreview.net/forum?id=PrzjugOsDeE
[3] Vuletić, M., Cucuringu, M., & Prenzel, F. (2023). FIN-GAN: Forecasting and classifying financial time series via generative adversarial networks. Social Science Research Network. https://doi.org/10.2139/ssrn.4328302
[4] OptionsDx. (2022, January 3). Free historical options trading data. https://www.optionsdx.com/
[5] Arjovsky, M., Chintala, S., & Bottou, L. (2017, January 26). Wasserstein GAN. arXiv. https://arxiv.org/abs/1701.07875
[6] Hsu, Y., Lin, T., & Lee, C. (2008). Constant elasticity of variance (CEV) option pricing model: Integration and detailed derivation. Mathematics and Computers in Simulation, 79(1), 60–71. https://doi.org/10.1016/j.matcom.2007.09.012
[7] Karlsson, P. (2021). The Heston model – stochastic volatility and approximation [Thesis]. https://lup.lub.lu.se/student-papers/record/1436827/file/1646914.pdf
[8] De La Rosa, A. (2024, April 22). Heston model calibration using QuantLib in Python. Medium. https://medium.com/@aaron_delarosa/heston-model-calibration-using-quantlib-in-python-0089516430ef
[9] Menéndez, M., Pardo, J., Pardo, L., & Pardo, M. (1997). The Jensen-Shannon divergence. Journal of the Franklin Institute, 334(2), 307–318. https://doi.org/10.1016/s0016-0032(96)00063-4
[10] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017, June 26). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. arXiv. https://arxiv.org/abs/1706.08500
[11] donshell. (2023, April 10). [D] A better way to compute the Fréchet Inception Distance (FID). Reddit.
https://www.reddit.com/r/MachineLearning/comments/12hv2u6/d_a_better_way_to_compute_the_fr%C3%A9chet_inception/
[12] Filho, M. (2023, April 13). 5 Dynamic Time Warping (DTW) libraries in Python with examples. Forecastegy. https://forecastegy.com/posts/dynamic-time-warping-dtw-libraries-python-examples/
