6406 seminar report
Hang Yue
Introduction:
In today's world, making sense of complex datasets is a major challenge in many fields, such as
finance. For example, think about financial analysts trying to understand stock market data filled
with stock prices and trading volumes. Or consider scientists studying genetics,
where huge datasets contain thousands of genes and their activity.
To make sense of all this data, we have tools like dimensionality reduction algorithms and
matrix visualizations. These tools help us simplify the data without losing important details.
Dimensionality reduction algorithms, like t-SNE and PCA, take large volumes of
complex data and reduce them to a handful of dimensions while keeping the most important
structure. Matrix visualizations, in turn, are pictures that show how different pieces of data are
connected, making the data easier for us to understand.
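As a concrete illustration, the two techniques named above can be run with scikit-learn. The dataset here is purely synthetic random data, standing in for real measurements, and the parameter choices (two components, perplexity 30) are just common defaults:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in for a complex dataset: 100 samples, 50 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))

# PCA: linear projection onto the directions of greatest variance
pca_2d = PCA(n_components=2).fit_transform(X)

# t-SNE: nonlinear embedding that preserves local neighborhood structure
tsne_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(pca_2d.shape, tsne_2d.shape)  # each sample is now a 2-D point
```

Both methods map every 50-dimensional sample to a 2-D point that can be drawn in a scatterplot; the difference lies in what structure each method tries to keep.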
In this report, we're going to explore why these tools are so important and how they're used.
We'll look at real-life examples and see how they help experts in different fields make sense of
their data.
Literature Review:
Stock selection is a critical component of managing investments, directly impacting investment
returns (Markowitz, 1952; Ren et al., 2017). Over time, various strategies for selecting stocks
have emerged, including multi-factor models, momentum and contrarian approaches, volatility
strategies, and behavior bias strategies (Carvalho et al., 2010; Cooper et al., 2004; Huang et al.,
2011).
One notable strategy involves utilizing cluster analysis for stock selection, where similar stocks
are grouped together to inform decision-making. Research has demonstrated its effectiveness
in different markets, such as Thailand, where it has outperformed traditional methods
(Peachavanish, 2016). Researchers have applied cluster analysis to diverse sets of factors,
spanning from fundamental to more intricate relationships among stocks (Da Costa et al., 2005;
Brida & Risso, 2010; Tabak et al., 2010).
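A minimal sketch of cluster-based stock selection, grouping stocks with k-means on a few fundamental factors. The factor values below are invented for illustration (two value-like and growth-like groups), not real market data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical factors for 8 stocks (rows): P/E, P/B, 6-month momentum
factors = np.array([
    [12.0, 1.1, 0.05],
    [11.5, 1.0, 0.04],
    [35.0, 6.0, 0.20],
    [33.0, 5.5, 0.18],
    [12.5, 1.2, 0.06],
    [34.0, 5.8, 0.22],
    [13.0, 1.3, 0.03],
    [36.0, 6.2, 0.19],
])

# Standardize so no single factor dominates the distance metric
X = StandardScaler().fit_transform(factors)

# Group stocks into two clusters of similar characteristics
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

Stocks landing in the same cluster share a similar factor profile, which is the grouping that informs the selection decision.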
However, dealing with high-dimensional datasets poses a significant challenge due to the curse
of dimensionality (Ding et al., 2002; Tajunisha & Saravanan, 2010). While conventional methods
like principal component analysis (PCA) have been utilized to address this issue, they come with
limitations (Jolliffe & Cadima, 2016). To tackle these limitations, researchers have explored non-
parametric methods such as stacked autoencoder (SAE) and stacked restricted Boltzmann
machine (SRBM) based on neural networks (Cai et al., 2012; Hinton & Salakhutdinov, 2006;
Hinton et al., 2006).
Understanding the impact of dimensionality reduction on stock selection across different
market conditions is crucial. Market dynamics, including volatility and index fluctuations,
significantly influence outcomes. Studies suggest that dimensionality reduction tends to be
more effective in trending markets compared to sideways markets, with results varying based
on the specific market context (Fulga et al., 2009).
Given these insights, there is interest in developing strategies that integrate dimensionality
reduction with stock selection. Rotation strategies, which alternate between dimensionality
reduction and cluster analysis, hold promise for enhancing portfolio performance (Baser &
Saini, 2015). By comprehending the intricate relationship between dimensionality reduction,
noise trading, and market conditions, investors can devise more robust stock-selection
strategies tailored to diverse market scenarios. Wang and colleagues study how to make high-
speed trading smarter by reducing the vast amount of data it generates. They use techniques
like PCA, SAE, and SRBM to extract the most informative parts of this data, which helps
trading algorithms work more efficiently.
On the other hand, our paper looks at how to find patterns in financial data over time. We use
the same techniques—PCA, SAE, and SRBM—to make sense of complex data sets showing how
financial instruments change over time. This helps us group together similar financial products
based on how they behave over different periods.
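Grouping instruments by how their series co-move over time can be sketched with a correlation-based distance and hierarchical clustering. The return series below are synthetic, built as two behavioral groups plus noise:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Synthetic daily-return series for 6 instruments: two behavioral groups
rng = np.random.default_rng(2)
base_a = rng.normal(0, 0.01, 250)
base_b = rng.normal(0, 0.01, 250)
returns = np.vstack([base_a + rng.normal(0, 0.002, 250) for _ in range(3)]
                    + [base_b + rng.normal(0, 0.002, 250) for _ in range(3)])

# Correlation-based distance: instruments that move together are "close"
corr = np.corrcoef(returns)
dist = np.sqrt(2 * (1 - corr))
np.fill_diagonal(dist, 0.0)

# Hierarchical clustering groups instruments with similar behavior over time
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The two synthetic groups come out as two clusters; with real data the same pipeline groups products whose returns track each other across the observation window.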
The study explores how reducing the complexity of stock market data through techniques like
PCA, SAE, and SRBM affects the selection of stocks in different market situations. It finds that
while these techniques don't help much when the market is stable, they significantly improve
stock selection during market trends, depending on whether the market is going up or down.
The study suggests a new strategy that switches between using these techniques and
traditional methods, which performs better than just sticking to one approach. The researchers
split the data into two sets, trained the techniques on one, and tested them on stocks from two
major indices. Overall, the study gives practical advice for investors and adds to our
understanding of how to pick stocks effectively.
Both the paper and the literature review emphasize the importance of visualization in
understanding this type of data. They highlight techniques like scatterplots and matrix
visualizations as effective ways to make sense of reduced-dimensional data.
Differences:
However, the paper goes beyond just discussing these methods. It introduces a new tool called
Compadre and demonstrates its use through a case study using real data from IEEE VIS papers.
So, while the literature review gives a broad overview, the paper takes it further by creating a
practical tool and applying it to real-world data.
Compadre also lets researchers directly compare t-SNE and PCA projections. Analyzing the
matrices of these projections reveals differences in how they organize data. By using the matrix
to measure these differences, researchers can see where each method works best in keeping
the data's structure intact.
Moreover, visualization matrices help compare features between different parts of the data,
even if they have different dimensions. For example, using Compadre to compare full and half
images of digits from the MNIST dataset, researchers can spot where digits are similar or
different. This helps understand how each part of the image affects its position in the
visualization.
Visualization matrices offer a tangible way to illustrate the disparities between t-SNE and PCA
embeddings of stock market data. Each cell in the matrix signifies the similarity or dissimilarity
between two stocks, as represented in the t-SNE and PCA spaces. By scrutinizing these
matrices, analysts can discern how effectively each method preserves the relationships among
stocks, shedding light on their relative strengths and weaknesses.
By delving into visualization matrices, researchers can uncover nuanced patterns and
discrepancies in how stocks are clustered by t-SNE and PCA. This exploration enables a deeper
understanding of the performance of each approach and provides valuable insights for
decision-making in stock market analysis. Ultimately, leveraging visualization matrices as a tool
for comparing t-SNE and PCA empowers analysts to make more informed investment decisions
and effectively manage risks in the dynamic stock market landscape.
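One way to quantify the comparison described above is to build pairwise-distance matrices in the original space and in each embedding, then rank-correlate them. The Spearman correlation here is a simple stand-in for the cell-wise comparison a tool like Compadre performs visually; the data is synthetic:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in for per-stock feature vectors: 60 stocks, 20 features
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))

emb_pca = PCA(n_components=2).fit_transform(X)
emb_tsne = TSNE(n_components=2, perplexity=15, random_state=1).fit_transform(X)

# Condensed pairwise distances: entry for (i, j) says how far stocks i and j
# lie apart in the original space and in each embedding
d_orig = pdist(X)
d_pca = pdist(emb_pca)
d_tsne = pdist(emb_tsne)

# Rank correlation with the original distances measures how well each
# projection preserves the data's overall structure
rho_pca, _ = spearmanr(d_orig, d_pca)
rho_tsne, _ = spearmanr(d_orig, d_tsne)
print(f"PCA: {rho_pca:.2f}  t-SNE: {rho_tsne:.2f}")
```

A higher correlation means the embedding's distance matrix agrees more with the original one, i.e. the method keeps the data's structure more intact.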
Discussion:
Limitations:
Limited Validation: While the Compadre tool is demonstrated through a case study, the paper
lacks extensive validation across diverse datasets. Further validation studies using different
types of high-dimensional data would strengthen the generalizability of the findings. For
instance, using stock market data sets for exploration could be beneficial. These datasets
include daily stock prices, trading volumes, financial ratios, and more, constituting high-
dimensional data due to the time-series nature and complex interrelations inherent in stock
market data.
Challenge:
Financial data often comprises numerous features and high dimensions, posing challenges in
computation and visualization. Additionally, financial datasets commonly contain noise and
missing values, which can affect the accuracy and robustness of Compadre's analysis of data
similarity. Therefore, effective noise handling and missing value imputation during data
preprocessing are essential to ensure the reliability of the analysis results.
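A minimal preprocessing sketch with pandas, assuming forward-fill for short gaps and a crude absolute-return threshold (0.5, chosen arbitrarily here) to flag spikes; the prices are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices with gaps (NaN) and an outlier spike
prices = pd.DataFrame({
    "AAA": [10.0, 10.1, np.nan, 10.3, 10.2],
    "BBB": [20.0, np.nan, 20.4, 95.0, 20.6],  # 95.0 is a likely data error
})

# Fill short gaps by carrying the last observed price forward
filled = prices.ffill()

# Flag implausible daily returns, blank those prices, and refill; note the
# recovery return after the spike is also flagged, so the last pre-spike
# value gets carried forward through the spike's neighborhood
returns = filled.pct_change()
cleaned = filled.mask(returns.abs() > 0.5).ffill()
print(cleaned)
```

Real pipelines use more careful rules (rolling z-scores, corporate-action checks), but even this sketch shows why preprocessing must come before any similarity analysis: the raw 95.0 would dominate every distance involving BBB.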
The next step of this study should therefore focus on ensuring that the Compadre tool can
effectively handle large-scale, high-dimensional datasets while maintaining the clarity and
interpretability of the visualization results. Additionally, because the interpretability of financial
data is crucial for decision-makers, the Compadre tool should provide interactive functionality
and visual explanations, enabling users to gain a deeper understanding of the relationships and
patterns between data points and to carry out decision-making and analysis more effectively.
Conclusion:
In summary, the content underscores the importance of clear data visualization, especially for
dealing with complex datasets like those in stock market analysis. Techniques like t-Distributed
Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) help us
understand intricate data structures, aiding decision-making and comprehension.
The introduction of visualization matrices through tools like Compadre shows promise for
comparing different dimensionality reduction methods. However, further validation studies
across diverse datasets are needed to ensure these findings apply broadly. Challenges like
handling large datasets and noisy financial data highlight the need for robust visualization tools
that maintain clarity.
Looking forward, future research should focus on refining visualization techniques to address
these challenges. Exploring interactive and immersive visualization methods could further
enhance data understanding. By advancing data visualization, researchers can empower
decision-makers across domains to glean insights and make informed choices.
References:
[1] Markowitz, H. (1952). Portfolio selection. The Journal of Finance.
[2] Ren, R., et al. (2017). A review of factor models: Some new perspectives. The Review of
Financial Studies.
[3] Carvalho, C. M., et al. (2010). Dynamic stock selection strategies: A structured factor model
framework. In The ninth valencia international meeting.
[4] Cooper, M. J., et al. (2004). Market states and momentum. The Journal of Finance.
[5] Da Costa Jr, N., et al. (2005). Stock selection based on cluster analysis. Economics Bulletin.
[6] Ding, C., et al. (2002). Adaptive dimension reduction for clustering high-dimensional data. In
IEEE international conference on data mining.
[7] Brida, J. G., et al. (2010). Hierarchical structure of the German stock market. Expert Systems
with Applications.
[8] Huang, J., et al. (2011). Asset pricing in the frequency domain: Theory and empirical
evidence. Journal of Financial Economics.
[9] Fulga, C., et al. (2009). Dimensionality reduction in market data. Journal of Business
Research.
[10] Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with
neural networks. Science.
[11] Tajunisha, N., & Saravanan, V. (2010). A review on dimensionality reduction techniques.
International Journal of Data Mining & Knowledge Management Process.
[12] Peachavanish, R. (2016). Cluster-based stock selection strategies: Evidence from Thailand.
Journal of Multinational Financial Management.
[13] Tabak, B. M., et al. (2010). Topological properties of stock market networks: The case of
Brazil. Physica A: Statistical Mechanics and its Applications.
[14] Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent
developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and
Engineering Sciences.