Seminar Report

Hang Yue
Introduction:
In today's world, dealing with complex datasets is a major challenge in many fields, such as finance. For example, think of financial analysts trying to understand stock market data filled with variables like stock prices and trading volumes, or scientists studying genetics, who work with datasets covering thousands of genes and their behaviors.

To make sense of all this data, we have tools such as dimensionality reduction algorithms and matrix visualizations. These tools help us simplify the data without losing important details.

Dimensionality reduction algorithms like t-SNE and PCA take large amounts of complex data and compress it into far fewer dimensions while keeping the most important structure. Matrix visualizations, in turn, show how different pieces of data are connected to one another, making the data easier to understand.
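
As a quick illustration of what these algorithms do in practice, here is a minimal sketch using scikit-learn on synthetic data; the dataset size and parameters are illustrative assumptions, not taken from any study discussed in this report:

    # Minimal sketch: reduce a synthetic high-dimensional dataset to 2D
    # with PCA (linear) and t-SNE (non-linear). Purely illustrative.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))          # 500 samples, 50 features

    X_pca = PCA(n_components=2).fit_transform(X)
    X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

    print(X_pca.shape, X_tsne.shape)        # both (500, 2)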

In this report, we're going to explore why these tools are so important and how they're used.
We'll look at real-life examples and see how they help experts in different fields make sense of
their data.

Literature Review:
Stock selection is a critical component of managing investments, directly impacting investment
returns (Markowitz, 1952; Ren et al., 2017). Over time, various strategies for selecting stocks
have emerged, including multi-factor models, momentum and contrarian approaches, volatility
strategies, and behavior bias strategies (Carvalho et al., 2010; Cooper et al., 2004; Huang et al.,
2011).

One notable strategy involves utilizing cluster analysis for stock selection, where similar stocks
are grouped together to inform decision-making. Research has demonstrated its effectiveness
in different markets, such as Thailand, where it has outperformed traditional methods
(Peachavanish, 2016). Researchers have applied cluster analysis to diverse sets of factors,
spanning from fundamental to more intricate relationships among stocks (Da Costa et al., 2005;
Brida & Risso, 2010; Tabak et al., 2010).

However, dealing with high-dimensional datasets poses a significant challenge due to the curse
of dimensionality (Ding et al., 2002; Tajunisha & Saravanan, 2010). While conventional methods
like principal component analysis (PCA) have been utilized to address this issue, they come with
limitations (Jolliffe & Cadima, 2016). To tackle these limitations, researchers have explored non-
parametric methods such as stacked autoencoder (SAE) and stacked restricted Boltzmann
machine (SRBM) based on neural networks (Cai et al., 2012; Hinton & Salakhutdinov, 2006;
Hinton et al., 2006).
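
For concreteness, the following is a minimal sketch of a stacked autoencoder (SAE) for non-linear dimensionality reduction, written in PyTorch; the layer sizes, data, and training loop are illustrative assumptions, not the architectures used in the cited studies:

    # Sketch of a stacked autoencoder: the network is trained to
    # reconstruct its input, and the encoder's output is then used
    # as the reduced representation.
    import torch
    import torch.nn as nn

    class StackedAutoencoder(nn.Module):
        def __init__(self, n_features, code_dim=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(),
                nn.Linear(64, 32), nn.ReLU(),
                nn.Linear(32, code_dim),        # low-dimensional code
            )
            self.decoder = nn.Sequential(
                nn.Linear(code_dim, 32), nn.ReLU(),
                nn.Linear(32, 64), nn.ReLU(),
                nn.Linear(64, n_features),      # reconstruction
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    X = torch.randn(1000, 100)                  # placeholder data
    model = StackedAutoencoder(n_features=100)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(50):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), X)
        loss.backward()
        opt.step()
    codes = model.encoder(X).detach()           # (1000, 8) reduced features
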
Understanding the impact of dimensionality reduction on stock selection across different
market conditions is crucial. Market dynamics, including volatility and index fluctuations,
significantly influence outcomes. Studies suggest that dimensionality reduction tends to be
more effective in trending markets compared to sideways markets, with results varying based
on the specific market context (Fulga et al., 2009).

Given these insights, there is interest in developing strategies that integrate dimensionality
reduction with stock selection. Rotation strategies, which alternate between dimensionality
reduction and cluster analysis, hold promise for enhancing portfolio performance (Baser &
Saini, 2015). By comprehending the intricate relationship between dimensionality reduction,
noise trading, and market conditions, investors can devise more robust stock-selection
strategies tailored to diverse market scenarios. Wang and colleagues study how to make high-speed trading smarter by reducing the dimensionality of the vast amount of data it generates. They use techniques like PCA, SAE, and SRBM to extract the most informative features from this data, which helps trading algorithms work more efficiently.

On the other hand, our paper looks at how to find patterns in financial data over time. We use the same techniques (PCA, SAE, and SRBM) to make sense of complex datasets showing how financial instruments change over time, which helps us group similar financial products based on how they behave over different periods.

The study explores how reducing the complexity of stock market data through techniques like
PCA, SAE, and SRBM affects the selection of stocks in different market situations. It finds that
while these techniques don't help much when the market is stable, they significantly improve
stock selection during market trends, depending on whether the market is going up or down.
The study suggests a new strategy that switches between using these techniques and
traditional methods, which performs better than just sticking to one approach. The researchers
split the data into two sets, trained the techniques on one, and tested them on stocks from two
major indices. Overall, the study gives practical advice for investors and adds to our
understanding of how to pick stocks effectively.

Comparison and Analysis:


Similarities
The methods and approaches used in the paper are quite similar to those discussed in the literature review. Both discuss ways to simplify complex high-dimensional data using techniques such as PCA, MDS, t-SNE, and UMAP, which reduce the data's complexity while preserving its important aspects.

Both the paper and the literature review emphasize the importance of visualization in
understanding this type of data. They highlight techniques like scatterplots and matrix
visualizations as effective ways to make sense of reduced-dimensional data.

Differences
However, the paper goes beyond just discussing these methods. It introduces a new tool called
Compadre and demonstrates its use through a case study using real data from IEEE VIS papers.
So, while the literature review gives a broad overview, the paper takes it further by creating a
practical tool and applying it to real-world data.

The Application of t-SNE and PCA in Literature Review


The literature review paper extensively explores the utilization of t-SNE (t-Distributed Stochastic
Neighbor Embedding) and PCA (Principal Component Analysis) in analyzing stock market data. It
likely delves into how these methods are employed to gain insights into the complex dynamics
of financial markets. For instance, t-SNE is known for its ability to visualize high-dimensional
data in lower-dimensional space, allowing researchers to uncover hidden patterns and
structures in stock market datasets. On the other hand, PCA serves as a powerful tool for
dimensionality reduction, enabling analysts to identify the most influential factors driving
variations in stock returns. By comparing and integrating the strengths of both techniques,
researchers can gain a comprehensive understanding of the underlying relationships within the
stock market. Additionally, the literature review may provide empirical studies or case
examples to demonstrate the practical applications of t-SNE and PCA in financial analysis. These
examples offer valuable insights into how these methods can be effectively utilized to support
investment decision-making processes, risk management strategies, and portfolio optimization
techniques in real-world scenarios.
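
To make the PCA side of this concrete, here is a hedged sketch of how an analyst might apply PCA to a matrix of daily stock returns; the returns below are simulated, and the single-factor structure is an assumption made purely for illustration:

    # Sketch: PCA on a (simulated) returns matrix. The first component
    # should capture the shared market factor driving all stocks.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    market = rng.normal(0, 0.01, size=(250, 1))               # common factor
    returns = market + rng.normal(0, 0.005, size=(250, 30))   # 250 days x 30 stocks

    pca = PCA(n_components=5).fit(returns)
    print(pca.explained_variance_ratio_)   # first component dominates
    factors = pca.transform(returns)       # daily scores on each component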

The Application of t-SNE and PCA In the Paper Presented


The researchers analyzed the well-known SWISSROLL dataset, where points lie on a complex, non-linear manifold resembling a 3D Swiss roll cake. By comparing the 2D projections generated by different DR methods like t-SNE and PCA, they can see how well each method captures the underlying structure of the data. In the case of PCA, which is a linear DR method, many errors are observed where points are projected closer together than they should be, resulting in blue cells in the matrix. t-SNE, being a non-linear DR method, instead shows red cells indicating larger distances in some areas, suggesting missed neighbors in the projection.

Figure 1: The application of t-SNE and PCA
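
This kind of comparison is straightforward to reproduce. The sketch below generates the Swiss roll with scikit-learn and projects it with both methods; the sample size and plotting choices are illustrative assumptions rather than the paper's settings:

    # Sketch: project the Swiss roll with PCA (flattens it) and
    # t-SNE (unrolls it), then plot the two projections side by side.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X, color = make_swiss_roll(n_samples=1000, random_state=0)

    X_pca = PCA(n_components=2).fit_transform(X)
    X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=color, s=5)
    axes[0].set_title("PCA")
    axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=color, s=5)
    axes[1].set_title("t-SNE")
    plt.show()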

Compadre In the Paper Presented


A comparison matrix like the one Compadre produces helps researchers see how techniques like t-SNE and PCA distort data when reducing its dimensions. For instance, when comparing projections from t-SNE and PCA on the SWISSROLL dataset, the matrix shows where each method makes mistakes: blue cells mean points are placed too close together, exposing errors in PCA, while red cells suggest missed connections in t-SNE.

Compadre also lets researchers place t-SNE and PCA projections side by side. Analyzing the matrices of these projections reveals differences in how they organize the data, and by quantifying those differences in the matrix, researchers can see where each method best preserves the data's structure.

Figure 2: t-SNE comparison to PCA
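
Compadre's exact computation is not reproduced here, but the underlying idea can be approximated by contrasting normalized pairwise-distance matrices of the two projections. In the rough sketch below, negative cells mark pairs that t-SNE places relatively closer than PCA, and positive cells the reverse:

    # Rough approximation of a projection-comparison matrix: the cell
    # (i, j) records how strongly the two projections disagree about
    # the distance between points i and j.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.spatial.distance import pdist, squareform
    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X, _ = make_swiss_roll(n_samples=300, random_state=0)
    X_pca = PCA(n_components=2).fit_transform(X)
    X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

    def normalized_distances(P):
        # scale pairwise distances to [0, 1] so the spaces are comparable
        D = squareform(pdist(P))
        return D / D.max()

    diff = normalized_distances(X_tsne) - normalized_distances(X_pca)

    plt.imshow(diff, cmap="coolwarm", vmin=-1, vmax=1)
    plt.colorbar(label="t-SNE distance minus PCA distance")
    plt.show()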

Moreover, visualization matrices help compare features between different parts of the data, even when those parts have different dimensionality. For example, using Compadre to compare full and half images of digits from the MNIST dataset, researchers can spot where digits are similar or different. This helps in understanding how each part of the image affects its position in the visualization.

Figure 3: Sub-Sub Comparison Analysis

If Using Visualization Matrix (Compadre) In Stock Analysis


Using visualization matrices to compare the performance of t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) on stock market data is an intriguing possibility. Both t-SNE and PCA are widely used to simplify and extract meaningful insights from complex datasets in stock market analysis. However, determining which method better captures the underlying structure of the data can be challenging without proper visualization tools.

Visualization matrices offer a tangible way to illustrate the disparities between t-SNE and PCA
embeddings of stock market data. Each cell in the matrix signifies the similarity or dissimilarity
between two stocks, as represented in the t-SNE and PCA spaces. By scrutinizing these
matrices, analysts can discern how effectively each method preserves the relationships among
stocks, shedding light on their relative strengths and weaknesses.
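
As a hypothetical sketch of this idea (the returns are simulated and the parameters are illustrative assumptions), each stock can be treated as a point described by its return history, embedded with both methods, and then compared through pairwise distance matrices:

    # Hypothetical sketch: build per-stock dissimilarity matrices in
    # PCA space and in t-SNE space; cell (i, j) is the distance between
    # stock i and stock j in each embedded space.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(2)
    returns = rng.normal(0, 0.01, size=(40, 250))   # 40 stocks x 250 daily returns

    emb_pca = PCA(n_components=2).fit_transform(returns)
    emb_tsne = TSNE(n_components=2, perplexity=10,
                    random_state=0).fit_transform(returns)

    M_pca = squareform(pdist(emb_pca))      # stock-by-stock distances (PCA)
    M_tsne = squareform(pdist(emb_tsne))    # stock-by-stock distances (t-SNE)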

By delving into visualization matrices, researchers can uncover nuanced patterns and
discrepancies in how stocks are clustered by t-SNE and PCA. This exploration enables a deeper
understanding of the performance of each approach and provides valuable insights for
decision-making in stock market analysis. Ultimately, leveraging visualization matrices as a tool
for comparing t-SNE and PCA empowers analysts to make more informed investment decisions
and effectively manage risks in the dynamic stock market landscape.

Discussion:
Limitation:
Limited Validation: While the Compadre tool is demonstrated through a case study, the paper
lacks extensive validation across diverse datasets. Further validation studies using different
types of high-dimensional data would strengthen the generalizability of the findings. For
instance, using stock market data sets for exploration could be beneficial. These datasets
include daily stock prices, trading volumes, financial ratios, and more, constituting high-
dimensional data due to the time-series nature and complex interrelations inherent in stock
market data.

Challenge:
Financial data often comprises numerous features and high dimensions, posing challenges in
computation and visualization. Additionally, financial datasets commonly contain noise and
missing values, which can affect the accuracy and robustness of Compadre's analysis of data
similarity. Therefore, effective noise handling and missing value imputation during data
preprocessing are essential to ensure the reliability of the analysis results.
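
A minimal sketch of this preprocessing step on a simulated price table follows; the interpolation and smoothing choices here are illustrative, not prescribed by the paper:

    # Sketch: fill missing values and damp noise in a price table
    # before any similarity analysis. Data is simulated.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    prices = pd.DataFrame(100 + rng.normal(0, 1, size=(250, 5)).cumsum(axis=0),
                          columns=[f"stock_{i}" for i in range(5)])
    prices.iloc[10:13, 2] = np.nan                      # inject missing values

    clean = prices.interpolate(limit_direction="both")  # fill gaps along time
    smoothed = clean.rolling(window=5, min_periods=1).mean()  # simple smoothing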

The next step, then, is to study how to ensure that the Compadre tool can effectively handle large-scale, high-dimensional datasets while maintaining the clarity and interpretability of the visualization results. Additionally, because the interpretability of financial data is crucial for decision-makers, the Compadre tool needs to provide interactive functionality and visual explanations, enabling users to gain a deeper understanding of the relationships and patterns between data points and supporting more effective decision-making and analysis.

Conclusion:
In summary, this report underscores the importance of clear data visualization, especially for dealing with complex datasets like those in stock market analysis. Techniques like t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) help us understand intricate data structures, aiding decision-making and comprehension.

The introduction of visualization matrices through tools like Compadre shows promise for
comparing different dimensionality reduction methods. However, further validation studies
across diverse datasets are needed to ensure these findings apply broadly. Challenges like
handling large datasets and noisy financial data highlight the need for robust visualization tools
that maintain clarity.

Looking forward, future research should focus on refining visualization techniques to address
these challenges. Exploring interactive and immersive visualization methods could further
enhance data understanding. By advancing data visualization, researchers can empower
decision-makers across domains to glean insights and make informed choices.

References:
[1] Markowitz, H. (1952). Portfolio selection. The Journal of Finance.
[2] Ren, R., et al. (2017). A review of factor models: Some new perspectives. The Review of
Financial Studies.
[3] Carvalho, C. M., et al. (2010). Dynamic stock selection strategies: A structured factor model
framework. In The ninth valencia international meeting.
[4] Cooper, M. J., et al. (2004). Market states and momentum. The Journal of Finance.
[5] Da Costa Jr, N., et al. (2005). Stock selection based on cluster analysis. Economics Bulletin.
[6] Ding, C., et al. (2002). Adaptive dimension reduction for clustering high-dimensional data. In
IEEE international conference on data mining.
[7] Brida, J. G., et al. (2010). Hierarchical structure of the German stock market. Expert Systems
with Applications.
[8] Huang, J., et al. (2011). Asset pricing in the frequency domain: Theory and empirical
evidence. Journal of Financial Economics.
[9] Fulga, C., et al. (2009). Dimensionality reduction in market data. Journal of Business
Research.
[10] Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with
neural networks. Science.
[11] Tajunisha, N., & Saravanan, V. (2010). A review on dimensionality reduction techniques.
International Journal of Data Mining & Knowledge Management Process.
[12] Peachavanish, R. (2016). Cluster-based stock selection strategies: Evidence from Thailand.
Journal of Multinational Financial Management.
[13] Tabak, B. M., et al. (2010). Topological properties of stock market networks: The case of
Brazil. Physica A: Statistical Mechanics and its Applications.
[14] Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent
developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and
Engineering Sciences.
