Generative Adversarial Networks For Time-Series Simulations Under Continuous Conditions
often only use categorical conditions (e.g. “cat” and “dog”). This poses a problem if our conditions are continuous in nature, such as our aforementioned financial variables.

Hence, through this paper we hope to contribute valuable insights within this area of research by developing a GAN model that is capable of modelling the distribution of time-series simulations given some continuous condition(s) – a continuous, conditional GAN for financial time series. We believe this capability will lead to more accurate forward-looking distributions of time-series data, from which users can extract more meaningful and valuable insights in an industry setting.

1.1 RELATED WORK

There have been many experiments in leveraging the powerful capabilities of the traditional GAN framework, each offering different benefits and insights with respect to this paper’s goal.

(Li et al., 2022) showcased how GANs can be very well adapted to time-series data and produce realistic simulations when coupled with the transformer mechanism. However, generations from this model were unconditional, so users could not incorporate any information prior to simulation. Moreover, experiments were mainly performed on non-financial time series such as random sinusoidal waves, human heartbeat signals and other human activity recognition data. Many of these datasets were stationary and non-trending, unlike financial price paths.

On the other hand, (Ding et al., 2023) was the first to introduce a GAN with a new vicinal loss function that enabled the incorporation of continuous conditions to generate images, such as objects at specified angles and human faces at specified ages. This was made possible by sampling training examples within the vicinity of a target condition. These results were promising and opened up further possibilities for incorporating conditions in time-series generation. However, this setup assumes that the conditions are relatively uniformly distributed and that a sufficient number of samples is available. This can be true for object angles and human ages, but not so for financial market variables.

Another interesting work by (Milena et al., 2023) proposed implementing GANs focused on generating financial time series, but under a semi-supervised setting. The generator model was trained on a supervised economics-based loss function with terms that measure PnL, MSE and Sharpe ratio.

Our paper attempts to draw upon ideas from these various pieces to construct a GAN model capable of achieving our aforementioned objective. We hope to show that such a model has the potential to add value beyond what existing models are capable of, and can be extended across different areas of research or applications within quantitative finance.

2 DATA COLLECTION

In our paper, we experiment with simulations of S&P500 ETF end-of-day price paths, as data for the ETF and its options is the most freely available. It is also a good starting point before we extend our model to other ETFs, single stocks, or even other asset classes. We were able to retrieve S&P500 prices from Jan 2005 to present day easily using the Yahoo Finance API.

As for conditions, we considered variables such as options implied volatility, the options put-call ratio and the RSI (relative strength index) of S&P500 prices. Implied volatility data was extracted from the Bloomberg LIVE (Listed Implied Volatility Engine), which provides the daily implied volatility of listed options for all global asset classes back until Jan 2005, given days to expiry and moneyness. We extracted the implied volatility of S&P500 at-the-money options with days to expiry of 30 and 60, using the BQL command below in MS Excel:

=@BQL("SPX Index","dropna(IMPLIED_VOLATILITY(PCT_MONEYNESS=100, EXPIRY=30D))","dates=range(2005-01-01, 2024-05-10)")

Similarly, we were also able to extract S&P500 put-call ratios using the BDH command below:

=@BDH("SPX Index", "PUT_CALL_VOLUME_RATIO_CUR_DAY", "20050101", "20240510")

The end-of-day RSI was computed directly from the S&P500 price paths.

Lastly, we also require end-of-day market prices of S&P500 options to calibrate some of the Monte-Carlo benchmark models, namely CEV and Heston. A good source of free historical end-of-day options chain data going back to Jan 2005 is OptionsDx at https://www.optionsdx.com/. Once retrieved, for each day we filtered for options with expiry between 30 and 90 days, then computed the mean moneyness level (strike price / underlying price) and the average market price across all put and call options. With the mean strike price, underlying price, mean days to expiry and option market price, we are able to calibrate the parameters of our Monte-Carlo benchmarks.
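As a concrete illustration of this step, below is a minimal Python sketch of the price retrieval and RSI computation. The yfinance package, the ^GSPC index ticker and the standard 14-day Wilder RSI window are illustrative assumptions, as the exact API wrapper and RSI lookback are not specified above:

import pandas as pd
import yfinance as yf

# Daily S&P500 closing prices from Yahoo Finance (assumed ^GSPC ticker).
prices = yf.download("^GSPC", start="2005-01-01", end="2024-05-10")["Close"]

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Wilder-style RSI computed from end-of-day closing prices."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / window, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / window, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)

end_of_day_rsi = rsi(prices)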
3 METHODOLOGY

The objective of our experiment is to produce forward-looking distributions of daily S&P500 closing price paths within 2-month windows.

Our time series from Jan 2005 to present day has a train-test split of 90-10. The training period is from Jan 2005 to Apr 2022 and the testing period is from Apr 2022 to May 2024. The training size of 90% allows for a larger sample size to draw training examples from, and also includes as much of the volatile Covid-19 pandemic period in training as possible.

The model is trained on the whole training set for multiple epochs, and then evaluated on the whole test set at once. As we do not use a rolling model that is retrained at each timestep, this renders the problem more challenging.

3.1 PREPROCESSING AND FEATURE ENGINEERING

Before constructing the model, we first perform a few necessary data preprocessing steps to ensure our data can be fitted by the model.

Firstly, we take the log-returns of the S&P500 price paths. Log-returns are more stationary in nature and thus easier to model, and this also allows the model to better capture volatility clusters. Most importantly, as we will cover later, the GAN model performs layer normalisation at each network layer, so its output will almost always be stationary. Applying a Dickey-Fuller test to the log-returns gives a p-value of 0.00, which indicates that the log-returns are indeed stationary.

Besides that, we also normalise our conditions using a rolling z-score. This puts all our conditions on the same scale, which is essential when sampling vicinal examples with respect to a target condition using L2 distance, as we will see later. The rolling window we set for each variable is usually around 1 year or less, and is determined by the p-values of Granger-causality tests applied across 2-month windows, where the treatment is the standardised values and the outcome is the log-returns.

Variables                                  RSI     30d ATM      60d ATM
                                                   implied vol  implied vol
Z-score rolling window size                252     189          189
Causality on log-returns    Min p-value    0.0029  0.0015       0.0000
                            Mean p-value   0.1035  0.0563       0.0044
Causality on close prices   Min p-value    0.0736  0.0197       0.0004
                            Mean p-value   0.3517  0.3921       0.0489

Table 1: P-Value Statistics from Granger-Causality Tests

Table 1 summarises the Granger-causality test results of our standardised variables when applied to log-returns and close prices. We see that the p-values for log-returns are substantially lower than for close prices, which supports our point that log-returns are easier to model. As we observe relatively low mean p-values for these 3 variables, they have some predictive potential over log-returns. Hence, we proceed to use them in our condition embeddings. Note that we excluded the put-call ratio from our condition variables, as it was too noisy and unlikely to be a meaningful signal for an output window of 2 months.

3.2 MODEL ARCHITECTURE

The CC-TTS-GAN’s model architecture is composed of the usual generator and discriminator. The generator takes in a noise vector and a condition embedding to generate a 2-month log-returns series. This is converted back to a price path with an initial value of 1.0, which in turn is fed alongside the same condition embedding into the discriminator, which produces a score that reflects the “realness” of the generator’s output.

There are 2 aspects of the architecture that are essential to creating a conditional transformer-based time-series GAN.

Aspect 1: Transformer-Based Time-Series Architecture

We first establish an unconditional GAN model that leverages the transformer architecture. Transformers and their attention mechanisms have grown in popularity as they have proven to outperform previous neural network architectures commonly used in GANs, such as RNNs, LSTMs and CNNs. This is attributed to their ability to handle long sequences without suffering from vanishing gradients, which is extremely useful in time-series analysis.

The time series is divided and processed in patches, similar to how the Vision Transformer processes images. Both the generator and discriminator start with positional encoding, followed by 3 self-attention layers. The generator also has 5 attention heads in each transformer layer to allow for more complex computations and more diverse outputs. Note that our positional
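To make this patch-based design concrete, below is a minimal PyTorch sketch of the transformer input pipeline described above. The sizes (a 42-step series split into 6 patches of 7 steps, embedding width 40 so it divides evenly across the 5 attention heads) are illustrative assumptions, not values taken from the paper:

import torch
import torch.nn as nn

SEQ_LEN, PATCH_LEN, EMBED_DIM = 42, 7, 40   # illustrative sizes
N_PATCHES = SEQ_LEN // PATCH_LEN

class PatchTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        # ViT-style patch embedding: project each slice of the series to a token.
        self.patch_proj = nn.Linear(PATCH_LEN, EMBED_DIM)
        # Learnable positional encoding added to every token.
        self.pos_embed = nn.Parameter(torch.zeros(1, N_PATCHES, EMBED_DIM))
        layer = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=5, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)  # 3 self-attention layers

    def forward(self, x):
        # x: (batch, SEQ_LEN) log-return series -> (batch, N_PATCHES, PATCH_LEN)
        patches = x.view(x.size(0), N_PATCHES, PATCH_LEN)
        tokens = self.patch_proj(patches) + self.pos_embed
        return self.encoder(tokens)

out = PatchTransformer()(torch.randn(4, SEQ_LEN))   # -> (4, 6, 40)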
This way, we can ensure the discriminator sees an equal number of real and fake examples, while having a localised selection of 10 real examples within the very immediate vicinity of the target condition. At the same time, we have a large enough sample of 30 fake examples, on which we can use our supervised loss function for further evaluation.

We use a batch size of 4 and 40 epochs, as this was found to be optimal. To clarify, each item in the batch refers to 1 unique condition. As such, a batch size of 4 translates to 4 unique conditions, which in turn means 4 × 10 = 40 real examples and 4 × 30 = 120 fake examples per batch.

3.3.2 UPDATE RULES FOR GENERATOR & DISCRIMINATOR

Once we have collected a few real and fake examples, these can be used to train and update the weights of the discriminator and generator models in a two-step process.

Step 1: Update the Discriminator

The first step involves updating only the discriminator, which in this case works like a critic. Instead of acting like a classifier that assigns binary or probit scores between 0 and 1 for fake and real, it scores each example with a value. This value can be positive or negative; what matters is that higher values reflect a higher degree of “realness”. This is essentially the Wasserstein loss function (Martin et al., 2017), as below:

disc_loss = mean(fake_scores) – mean(real_scores) + gradient penalty

Note that we also apply a gradient penalty that forces the norm of the discriminator’s gradients to be close to 1. This enforces the 1-Lipschitz continuity of the discriminator, which helps it train better when using the Wasserstein loss (Martin et al., 2017).

For each condition data point, the loss function above is applied on 10 real examples and 10 fake examples (randomly selected from the 30 fake examples).

Step 2: Update the Generator

The second step involves updating only the generator. Through empirical experiments, we find that using the usual Wasserstein generator loss of just –mean(fake_scores) can sometimes lead to a sub-optimal equilibrium, where the generator finds a pattern of fake examples with poor quality but absurdly high fake_scores, and the discriminator fails to produce natural real_scores that are higher than the fake_scores. This leads to an extremely low loss for the generator, but the discriminator gets stuck in a state with a very large loss.

As such, we use an OLS version of the generator loss function that forces the generator to produce fake samples with scores as close to those of real samples as possible. This prevents the aforementioned problem, while ensuring that the generated examples are trained to be indistinguishable from the real examples.

Besides that, we extend the loss function described above to include other loss terms, forming the supervised component of the generator loss. These loss terms measure the OLS loss between the first 3 moments and the covariance matrices of the real and fake price paths, and also between the first 3 moments of real_scores and fake_scores. Again, these loss terms evaluate the whole distribution of price paths for each unique condition embedding.

This modified semi-supervised loss function is as below, where a1, b1, b1,x, c1, a2, b2 and c2 are hyperparameters:

output_OLS =
  a1 ∙ mean((mean(fake_paths) – mean(real_paths))²)
  + b1 ∙ mean((standard_deviation(fake_paths) – standard_deviation(real_paths))²)
  + b1,x ∙ mean((off_diagonal_elements(covariance(fake_paths)) – off_diagonal_elements(covariance(real_paths)))²)
  + c1 ∙ mean((standardised_skew(fake_paths) – standardised_skew(real_paths))²)

scores_OLS =
  a2 ∙ (mean(fake_scores) – mean(real_scores))²
  + b2 ∙ (standard_deviation(fake_scores) – standard_deviation(real_scores))²
  + c2 ∙ (standardised_skew(fake_scores) – standardised_skew(real_scores))²

gen_loss = output_OLS + scores_OLS
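Below is a hedged PyTorch sketch of this two-step update. It assumes paths shaped (n_samples, seq_len) and scores shaped (n_samples,); the critic callable, the interpolation scheme in the gradient penalty and the unit hyperparameter values are illustrative assumptions rather than the paper’s exact implementation:

import torch

def standardised_skew(x, dim=0):
    # Third standardised moment along the sample dimension.
    z = (x - x.mean(dim, keepdim=True)) / (x.std(dim, keepdim=True) + 1e-8)
    return (z ** 3).mean(dim)

def off_diag_cov(paths):
    # paths: (n_samples, seq_len); covariance across time points, diagonal removed.
    cov = torch.cov(paths.T)
    return cov - torch.diag(torch.diag(cov))

def gradient_penalty(critic, real, fake, cond):
    # Penalise critic gradients away from unit norm on interpolated samples.
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp, cond).sum(), interp, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

def disc_loss(real_scores, fake_scores, gp):
    # Wasserstein critic loss with gradient penalty, as in Step 1.
    return fake_scores.mean() - real_scores.mean() + gp

def gen_loss(fake_paths, real_paths, fake_scores, real_scores,
             a1=1.0, b1=1.0, b1x=1.0, c1=1.0, a2=1.0, b2=1.0, c2=1.0):
    output_ols = (
        a1 * ((fake_paths.mean(0) - real_paths.mean(0)) ** 2).mean()
        + b1 * ((fake_paths.std(0) - real_paths.std(0)) ** 2).mean()
        + b1x * ((off_diag_cov(fake_paths) - off_diag_cov(real_paths)) ** 2).mean()
        + c1 * ((standardised_skew(fake_paths) - standardised_skew(real_paths)) ** 2).mean()
    )
    scores_ols = (
        a2 * (fake_scores.mean() - real_scores.mean()) ** 2
        + b2 * (fake_scores.std() - real_scores.std()) ** 2
        + c2 * (standardised_skew(fake_scores) - standardised_skew(real_scores)) ** 2
    )
    return output_ols + scores_ols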
For each unique condition embedding, the loss function above is applied on 10 real examples and all 30 fake examples.

4 EVALUATION

Now that we have defined the model architecture and successfully trained the model, we can evaluate the model’s outputs and compare them with benchmarks. We will go over the Monte-Carlo benchmarks and evaluation metrics that we use.

4.1 MONTE-CARLO BENCHMARKS

The 3 benchmarks we use are models commonly used in traditional Monte-Carlo simulations. They are described below. Note that we implement all these benchmark models with Empirical Martingale Correction and antithetic variates.

4.1.1 GEOMETRIC BROWNIAN MOTION (GBM)

This is the standard and most commonly used dynamic for modelling underlying asset prices, where the drift μ and volatility σ are constant within each simulation and scale proportionately with the asset price S. The GBM’s analytical solution (which can be derived via Ito’s Lemma) is as below:

S_t = S_0 ∙ exp((μ – σ²/2)t + σW_t)

To obtain values for μ and σ at each time point, we assume a risk-neutral setting. Keeping our 2-month window in mind, μ is set to the 3-month market yield on US treasury securities, which we extracted from Yahoo Finance via the symbol “DGS3MO”. σ is set to the implied volatility of S&P500 at-the-money options with 60 days to expiry, which was previously extracted from the Bloomberg LIVE.

4.1.2 CONSTANT ELASTICITY OF VARIANCE (CEV)

The CEV model is a local volatility model that has similar dynamics to GBM, with the exception of the γ variable (Hsu et al., 2008). γ is the elasticity of variance and is also constant within each simulation. γ is a useful feature because when γ < 1, it produces a phenomenon where volatility increases as the price falls, which is common in equities as the stock’s leverage ratio increases.

As the CEV model’s closed-form analytical solution is complex and difficult to compute, we calibrated this model’s parameters μ, σ and γ using the S&P500 option market prices previously retrieved from OptionsDx. Calibration was made possible using Python’s scipy.optimize.minimize() function, where we minimise the squared loss between model prices and market prices. Details of said function can be found at https://docs.scipy.org/doc/scipy-1.13.1/reference/generated/scipy.optimize.minimize.html.

4.1.3 HESTON MODEL

The Heston model is a stochastic volatility model, whereby the variance parameter ν follows its own stochastic process with long-run average variance θ, rate of reversion κ, and volatility of volatility (vol of vol) ξ (Karlsson et al., 2021).

As the Heston model has no analytical solution, we utilise Python’s QuantLib library, which contains classes and methods to calibrate all Heston parameters given market option prices (De La Rosa, A., 2024).

4.2 VISUALISATIONS

We developed some helpful visualisations to show how the distribution of the simulations from the CC-TTS-GAN differs from the benchmark models’. Here we only use the testing period from Apr 2022 to May 2024, so all synthetic simulations by our CC-TTS-GAN are out-of-sample. For each timepoint within this testing period, we run n simulations for 2 months ahead of that timepoint, and compare them with the actual price path realised after 2 months.

4.2.1 TIME-SERIES SIMULATIONS CHART

Below are some time-series charts of the CC-TTS-GAN’s and benchmarks’ simulations relative to the real price paths. For readability, we only show 1 set of simulations per 20 days, and we cut each simulation down from 2 months to 1 month. Note that our benchmark model simulations are cut off after Jan 2024, as OptionsDx only provides options chain data up until Dec 2023. Nonetheless, we are still able to gain valuable insights from visual comparisons.
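Referring back to the GBM benchmark in 4.1.1, below is a minimal numpy sketch of a simulation using the exact solution above, with antithetic variates and a simple rescaling form of the Empirical Martingale Correction. The parameter values and the exact correction scheme are illustrative assumptions, not the paper’s implementation:

import numpy as np

def simulate_gbm(s0, mu, sigma, n_steps=42, n_paths=50, dt=1/252, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths // 2, n_steps))
    z = np.vstack([z, -z])                              # antithetic variates
    # Exact GBM solution applied via cumulative log-increments.
    increments = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(increments, axis=1))
    # Simple empirical martingale correction: rescale each time slice so the
    # cross-sectional mean matches the risk-neutral expectation s0 * exp(mu * t).
    t = dt * np.arange(1, n_steps + 1)
    paths *= s0 * np.exp(mu * t) / paths.mean(axis=0)
    return paths

# e.g. 50 two-month (42 trading day) paths starting at a normalised price of 1.0
paths = simulate_gbm(s0=1.0, mu=0.05, sigma=0.15)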
As per the formulas, when P(x) and Q(x) are identical distributions, DJS(P‖Q) = DJS(Q‖P) = DKL(P‖Q) = DKL(Q‖P) = 0. The more different the distributions are, the larger the JS-divergences and KL-divergences get.

To estimate the respective probability distributions P(x) and Q(x), where x is a price path, we leverage Gaussian Kernel Density Estimation (KDE) via Python’s scipy.stats.gaussian_kde() class, which can be found at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

However, when price path x has many dimensions (42, in our case, for a 2-month simulation), the estimation of P(x) and Q(x) becomes numerically unstable and may fail. As such, we use PCA and t-SNE once again to reduce x’s dimensions from 42 to 10 before running scipy.stats.gaussian_kde(), which makes the estimation process easier.

4.3.2 FRECHET INCEPTION DISTANCE (FID)

FID is another distance metric, introduced by (Heusel et al., 2017), traditionally used to measure the difference between simulated images produced by GANs by looking at their means and covariance matrices, as below:

FID = ‖μ_P – μ_Q‖² + Tr(Σ_P + Σ_Q – 2(Σ_P Σ_Q)^(1/2))

We apply this metric to our simulated and real price paths.

4.3.3 ROOT MEAN SQUARE ERROR (RMSE) OF MEAN

RMSE simply refers to the root of the mean squared loss between 2 sequences. In our evaluation, we use the RMSE of mean, which is the RMSE between the real price path and the mean of the generated price paths (averaged at each time point).

4.3.4 DYNAMIC TIME WARPING (DTW) DISTANCE OF MEAN

Instead of measuring Euclidean distances like RMSE, DTW is a dynamic programming algorithm that measures differences between temporal sequences by aligning them in a way that minimises the distance between corresponding points, even if the sequences vary in speed or timing, as per Figure 9.

Figure 9: Euclidean Distance vs DTW Distance

This makes DTW distance suitable for time-series data like price paths, where we do not expect exact alignment between 2 series, but a general pattern.

In our evaluation, we use the DTW distance of mean, which is the DTW distance between the real price path and the mean of the generated price paths (averaged at each time point). (Filho, M., 2023) shared several examples of how to implement DTW in Python with minimal latency at https://forecastegy.com/posts/dynamic-time-warping-dtw-libraries-python-examples/.

4.4 RESULTS

After training the model, we ran 20 trials of evaluation. Each trial contains 50 simulations for every available timepoint in the testing period, and the results were compared with the benchmarks. These results were then aggregated across the 20 trials and summarised below (lower scores are better).

                         GBM      CEV      Heston   CC-TTS-GAN
JS-Divergence (PCA)      2.2182   2.3317   1.9198   1.5211*
JS-Divergence (t-SNE)    0.2189   0.2323   0.1409   0.1368*
FID                      0.0047   0.0028   0.0020*  0.0035
RMSE of Mean             0.0596*  0.0596*  0.0596*  0.0607
DTW of Mean              0.2027   0.2027   0.2027   0.1930*

Table 2: Summary Table of Evaluation Results

As per Table 2, the CC-TTS-GAN beats all benchmarks on 3 of the 5 distance metrics, and remains comparable with the benchmarks on the other 2. Values marked with * are the best (lowest) within that metric.

For each distance metric and benchmark, we also conducted the hypothesis test below:

• H0: CC-TTS-GAN’s score = benchmark’s score
• H1: CC-TTS-GAN’s score > benchmark’s score

Table 2 cells in grey are where the hypothesis test rejects the null hypothesis H0.

Below are the results we get (more iterations are better).
https://www.reddit.com/r/MachineLearning/comments/12hv2u6/d_a_better_way_to_compute_the_fr%C3%A9chet_inception/

[12] Filho, M. (2023, April 13). 5 Dynamic Time Warping (DTW) libraries in Python with examples. Forecastegy. https://forecastegy.com/posts/dynamic-time-warping-dtw-libraries-python-examples/