X X Faculty X Department: University

X UNIVERSITY
X Faculty
X Department
PROJECT PROPOSAL:
COVID-19 news impact on stock prices:
sentiment analysis, event studies, structural breaks analysis
FIRST NAME LAST NAME
COURSE NAME
PROFESSOR
PROFESSOR NAME
CITY
YEAR
Table of Contents
Abstract
Introduction
Literature Review
  Theoretical Context on Relationship Between News and Markets
  Sentiment Analysis and Feature Processing Techniques
  Event Studies Techniques
  Machine Learning Algorithms in Stock Movement Prediction
Methodology
  Data Collection, Description and Preprocessing
  Procedures and Design
Anticipated Results
Conclusion
References
Abstract
This paper examines the predictability of stock movements using the sentiment of news coverage of the COVID-19 pandemic. Much research has been done on the empirical relationship between news sentiment and stock movements using text mining; however, not enough attention has been paid to immediate market sentiment during crises. In this paper, several state-of-the-art machine learning techniques will be applied to determine whether sentiment measures of news coverage can improve the predictive power of investors' models. The expected result of the research is that signal classification algorithms based on sentiment analysis will achieve higher accuracy than random guessing (50%) on a binary (positive/negative) signal. Such a result would identify areas where the efficient market hypothesis does not hold for the American stock market.
Introduction
Background.
Financial news plays a key role in investors' decisions on financial markets. By extracting information from news articles about specific companies or markets, investors can predict future movements of stock valuations, potentially leading to financial gains. The methodology of news processing is broad; however, no single method of interpreting the information in the news is clearly superior. The most common distinction among these methods is the division between fundamental analysis and technical analysis. Fundamental methods require extensive analysis of qualitative information about companies and markets, while technical methods focus exclusively on historical quantitative data of stock valuations.
There is, however, a subset of methods that lies between these two major categories and involves the quantification of news, which is originally qualitative rather than quantitative. This is done through sentiment analysis: natural language processing methods that classify a set of texts as positive or negative on a certain scale. Feature engineering via sentiment analysis can provide insights into the state of the stock market at given points in time, hinting at investors' expectations and decisions. Previous research in behavioral finance has suggested that investors tend to make emotion-driven decisions, depending on the extent of their optimism towards certain markets and companies (Bollen et al., 2011) [1].
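To make the idea of placing a text on a positive-negative scale concrete, the snippet below sketches the simplest dictionary-based score: count matches against a positive and a negative word list and return (pos - neg) / (pos + neg), a net tone between -1 and 1. The two tiny word lists are hypothetical placeholders; a real study would substitute an established financial lexicon (for example, the Loughran-McDonald dictionary), and the tokenization here is deliberately naive.

```python
import re

# Hypothetical toy word lists; a real study would use an established financial lexicon.
POSITIVE = {"gain", "gains", "growth", "recovery", "surge", "beat", "approved"}
NEGATIVE = {"halt", "loss", "losses", "lockdown", "decline", "cut", "layoffs"}

def net_tone(text: str) -> float:
    """Dictionary-based sentiment: (pos - neg) / (pos + neg) in [-1, 1]; 0.0 if no matches."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

print(net_tone("New York City-area airports halt air traffic as coronavirus causes staffing issues"))  # -1.0
```

Normalizing by the number of matched words keeps scores comparable between short headlines and long descriptions, at the cost of ignoring how many sentiment words an article contains.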
Problem Statement.
In this paper, a specific subset of news will be studied: the news coverage of the COVID-19 pandemic in 2020. The goal of the paper is to determine whether the market was inefficient in how it incorporated the news coverage of COVID-19 into the stock valuations of American corporations. More specifically, this paper aims to construct a sentiment-based algorithm that would determine whether there has been a consistent lag between the release of news regarding certain companies and markets and the moment the market fully reflected the information from that news.
Research Question.
To what extent has the Efficient Market Hypothesis (EMH) held throughout the COVID-19 pandemic on the American stock market, i.e., is it possible to obtain excess returns on American stocks using sentiment-based analysis of the news coverage of the COVID-19 pandemic?
Aims and Objectives.
The key objectives of this research are the following:
• To identify the major text analysis methods and techniques used in financial markets research.
• To collect an extensive database of news coverage of the COVID-19 pandemic (news articles that detail the spread of the virus around the world and can be linked to the specific companies, regions, and periods they discuss or affect).
• To apply relevant sentiment metrics to the collected news articles, creating a timeline of the overall news coverage sentiment towards the spread of the pandemic.
• To compile a machine learning-based dynamic portfolio based on signals from the sentiment analysis, and to compare its performance to that of a baseline portfolio created using time series and structural breaks analysis.
Professional Significance.
Since the COVID-19 pandemic constitutes a market shock, an analysis of news sentiment may prove more relevant than an analysis based solely on technical indicators, which often overlook the risks of crisis shocks. Thus, this research has the potential to determine whether news coverage plays a significant role in investors' decision-making. So far, there has been little discussion of the impact of news on financial markets during a crisis. Hence, the research gap this paper will attempt to fill is specifically the effectiveness of sentiment analysis for stock markets during a crisis shock.
Delimitations of the Study.
As is the case for most machine learning problems, the best-performing algorithm greatly depends on the specifications of the problem, and if any of the specifications change, the performance of certain models can differ vastly. In this paper, only American stocks included in the S&P 500 index will be considered, meaning that the conclusions and implications might not fully apply to other stock markets, for instance, the Russian stock market. This choice of stocks is justified by the overall stability of the American stock market: compared to other regional stock exchanges, it carries less systemic risk, which could otherwise overshadow the actual impact of the news coverage. At the moment, the collected news database consists of Bloomberg and Reuters articles from March 21st, 2020 to December 31st, 2020. Since the pandemic had begun several months before that, the database should ideally be supplemented with articles from the beginning of 2020 for completeness.
Literature Review
Theoretical context on relationship between news and markets.
Several key models of the impact of information on financial markets have been introduced, the most substantial of which is the Efficient Market Hypothesis (EMH), first developed in (Fama, 1965) [4]. According to the EMH, asset prices on stock markets reflect all available information, meaning that stock-related news and events are incorporated into prices as soon as they become available. The hypothesis was further empirically confirmed, to different extents, in (Malkiel, Fama, 1970) [9], which discusses the three forms of the EMH (weak, semi-strong, and strong), each implying a different information set being reflected in stock prices. Depending on the form of the EMH, a conclusion is drawn on whether an investor can achieve excess returns by using specific information to make decisions. In (Fama, 1965) [4], the Random Walk Hypothesis (RWH) is also examined: a model which suggests that successive stock price changes are independent of each other and identically distributed, meaning that it is not possible to predict future movements of a stock using past trends. The EMH and RWH concepts are interconnected, as the RWH requires immediate informational efficiency.
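The independence property behind the RWH suggests a simple first diagnostic: under the hypothesis, the lag-1 autocorrelation of returns should be close to zero. The sketch below computes it in plain Python on a simulated price path; the simulation parameters (seed, volatility, length) are arbitrary illustrative choices, and a formal test such as a Ljung-Box or variance ratio test would be needed for actual conclusions.

```python
import math
import random

def log_returns(prices: list[float]) -> list[float]:
    """Log-returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def lag1_autocorrelation(series: list[float]) -> float:
    """Sample lag-1 autocorrelation: lagged cross-products over total variation."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return num / denom

# Simulated geometric random walk with i.i.d. normal log-returns (illustrative only).
rng = random.Random(42)
prices = [100.0]
for _ in range(5000):
    prices.append(prices[-1] * math.exp(rng.gauss(0.0, 0.01)))

returns = log_returns(prices)
rho = lag1_autocorrelation(returns)
band = 2 / math.sqrt(len(returns))  # rough 95% band for white noise
print(f"lag-1 autocorrelation = {rho:.4f} (rough 95% band: +/-{band:.4f})")
```

Applied to real hourly returns, a value well outside the band would be evidence against the RWH for that stock over the sample period.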
More recently, however, literature has emerged that offers an alternative theory of market efficiency. (Lo, 2005) [8] introduces the concept of the Adaptive Market Hypothesis (AMH), a substantially different interpretation of the relationship between news and stock movements that analyzes market efficiency from the standpoint of behavioral finance. Lo stated, “Prices reflect as much information as dictated by the combination of environmental conditions and the number and nature of "species" in the economy” (Lo, 2005, p. 19), suggesting that market inefficiency is often present, for instance, during bubbles and crises. Indeed, empirical research suggests that the adaptive market theory describes stock behavior more accurately than the efficient market theory (Urquhart, Hudson, 2013) [13]. That study analyzes the US, UK, and Japanese stock markets and concludes that none of them has been consistently efficient, which suggests that timely analysis of market information can potentially serve as an instrument to capture excess returns on stock markets.
Sentiment analysis and feature processing techniques.
Researchers have previously attempted to benchmark different approaches to feature processing based on textual information in the context of financial markets analysis. (Hagenau et al., 2013) [5] presents a summary of 11 papers that build machine learning algorithms on features generated through text mining and sentiment analysis. These 11 papers differ in the stock datasets used, the text mining feature types (bag-of-words, n-grams, word combinations), the inclusion of market feedback, and the specific machine learning models. The best-performing paper reports an accuracy of 65.1% in identifying signals, using a support vector machine with word-combination features. However, such results are not consistent across the other papers, many of which reach inconclusive or mutually incompatible results. Overall, there is still little consensus among researchers on whether sentiment analysis can consistently be used to predict stock market returns (Schoen et al., 2013) [12].
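The feature types compared in these reviews differ mainly in which token sequences are counted. As a rough illustration, the sketch below turns a headline into bag-of-words (unigram) and bigram counts using only the standard library; the tokenizer and the `ngram_features` helper are hypothetical simplifications, not the feature extraction used in any of the cited papers.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive lower-case tokenizer; real pipelines also handle stop words, stemming, etc.
    return re.findall(r"[a-z0-9$']+", text.lower())

def ngram_features(text: str, n_max: int = 2) -> Counter:
    """Count all n-grams of length 1..n_max (n = 1 is the bag-of-words representation)."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features

feats = ngram_features("Airline CEOs promise to eliminate dividends if Congress passes $29B coronavirus bill")
print(feats["eliminate dividends"], feats["coronavirus"])  # 1 1
```

Bigrams such as "eliminate dividends" keep some of the word-order context that a pure bag-of-words discards, which is the motivation behind the n-gram and word-combination features in the reviewed studies.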
Another systematic review of text mining research is presented in (Nassirtoussi et al., 2014) [10], which examines 24 papers, updating the set of papers reviewed in (Hagenau et al., 2013) [5]. This review attempts to introduce a well-rounded theoretical framework for feature engineering based on text mining. The reviewed papers differ in the types of mined texts (financial news, corporate filings, financial disclosures, tweets), the text sources (various platforms, e.g., Bloomberg, Yahoo! Finance, Twitter), and other text-related specifications. (Kalyani et al., 2016) [6] is a more recent example of research that performs sentiment analysis to derive signals from a bag-of-words analysis of stock-related news. That paper presents a clear step-by-step approach, which may be replicated in this research. It should be noted, though, that the majority of papers in both reviews use daily historical stock data.
In this research, intraday stock data will be used to capture the immediate investor reaction to hard-to-predict events that occurred throughout the pandemic, such as border closures and lockdown restrictions. This premise is supported by previous research; for instance, (Ding et al., 2014) [2] suggests that the impact of news coverage is mostly short-term, providing empirical evidence using Google stock data.
Event studies techniques.
A more traditional approach to predicting stock movements from news is the event study, which requires time series analysis of historical stock data, as well as the placement of structural breaks at the time points of the most impactful events in the reviewed period. Pesakovic and Ndekugri (2017) [11] provide an example of an event study involving three multinational companies, in which the impactful events included American presidential elections. The most obvious limitation of the event study method is that the noteworthy events must be handpicked rather than collected automatically without subjective preference. Thus, in this research, the event study methodology will likely be applied to just a handful of companies as a baseline measure, whereas a more extensive sentiment analysis will be applied to all stocks under consideration.
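The core computation of an event study can be sketched as follows, assuming the common market-model specification (not necessarily the one used by Pesakovic and Ndekugri): estimate a stock's alpha and beta against the market over a pre-event estimation window by ordinary least squares, treat the difference between actual and predicted returns in the event window as abnormal returns, and sum them into a cumulative abnormal return (CAR). The return series below are made-up numbers for illustration.

```python
def ols_alpha_beta(stock: list[float], market: list[float]) -> tuple[float, float]:
    """Fit stock_t = alpha + beta * market_t by ordinary least squares."""
    n = len(stock)
    mean_s, mean_m = sum(stock) / n, sum(market) / n
    cov = sum((m - mean_m) * (s - mean_s) for s, m in zip(stock, market))
    var = sum((m - mean_m) ** 2 for m in market)
    beta = cov / var
    return mean_s - beta * mean_m, beta

def cumulative_abnormal_return(stock: list[float], market: list[float], event_start: int) -> float:
    """Market-model CAR: fit on returns before `event_start`, sum abnormal returns from it on."""
    alpha, beta = ols_alpha_beta(stock[:event_start], market[:event_start])
    # Abnormal return = actual return minus the return predicted by the market model.
    abnormal = [s - (alpha + beta * m)
                for s, m in zip(stock[event_start:], market[event_start:])]
    return sum(abnormal)

# Illustrative returns: 5-period estimation window, then a 3-period event window.
market = [0.010, -0.005, 0.002, 0.007, -0.003, 0.004, -0.010, 0.001]
stock = [0.015, -0.008, 0.003, 0.011, -0.005, -0.020, -0.030, 0.002]
print(f"CAR over event window: {cumulative_abnormal_return(stock, market, event_start=5):+.4f}")
```

In practice the estimation window would span far more observations than this toy example, and the CAR would be tested for significance against the residual variance from the estimation window.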
Much market-specific research has been done on the time series analysis of stock data using structural breaks caused by major economic events. (Ewing, Malik, 2016) [3] provides an example of such a study, conducted to determine the indirect relationship between oil prices and the US stock market. The researchers conduct a structural breaks analysis of the stock market in which breaks are detected through an analysis of oil price volatility. This approach makes it possible to capture volatility that is not directly explained by the quantitative data alone.
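To illustrate what detecting a structural break means computationally, the sketch below locates a single break in the mean of a series by trying every candidate split and keeping the one that minimizes the total sum of squared residuals, the least-squares idea underlying single-break tests. It is a deliberately simplified stand-in: it handles one break only, performs no significance test, and is not the volatility-based procedure of Ewing and Malik (2016); multiple-break methods such as Bai-Perron would be needed in practice. The `abs_returns` series is invented for illustration.

```python
def sse(segment: list[float]) -> float:
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)

def single_mean_break(series: list[float], min_size: int = 2) -> int:
    """Return the index k minimizing SSE(series[:k]) + SSE(series[k:]).

    The break is placed before position k; each segment keeps at least min_size points.
    """
    if len(series) < 2 * min_size:
        raise ValueError("series too short for the requested minimum segment size")
    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates, key=lambda k: sse(series[:k]) + sse(series[k:]))

# Illustrative volatility proxy: a calm regime followed by a crisis regime.
abs_returns = [0.004, 0.006, 0.005, 0.003, 0.005, 0.021, 0.025, 0.019, 0.023]
print(single_mean_break(abs_returns))  # 5
```

Using absolute returns as the input series makes the detected break a shift in volatility rather than in average return, which is closer in spirit to the volatility-based breaks discussed above.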
Machine learning algorithms in stock movement prediction.
The reviews mentioned above also discuss a variety of machine learning models used for market signal classification. (Nassirtoussi et al., 2014) [10] categorized the reviewed papers by the following models:
• Support Vector Machine (SVM).
• Regression algorithms (including Support Vector Regression (SVR)).
• Naïve Bayes classifier.
• Algorithms based on decision rules and trees.
• Combinatory algorithms.
• Multi-algorithm experiments.
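Of the model families listed above, the Naïve Bayes classifier is simple enough to sketch without external libraries. The minimal multinomial version below, with Laplace (add-one) smoothing, learns per-class word frequencies from labeled headlines and predicts the class with the highest log-posterior. The four training headlines and their labels are invented for illustration; in this research the labels would presumably be derived from subsequent price movements.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_posterior(self, text: str, label: str) -> float:
        # log P(class) + sum of log P(word | class), up to a constant shared by all classes.
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))

train_texts = [
    "airports halt traffic as virus spreads",
    "lockdown extended as cases rise",
    "vaccine trial results lift shares",
    "retailer shares rise on strong sales",
]
train_labels = ["negative", "negative", "positive", "positive"]
model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("shares lift after strong vaccine results"))  # positive
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.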
A more recent study, (Kraus, Feuerriegel, 2017) [7], considers, in addition to the aforementioned methods, several machine learning techniques not covered by (Nassirtoussi et al., 2014) [10]. This paper mainly concentrates on decision tree-based classification algorithms, such as Random Forest and Gradient Boosting, as well as the deep learning architectures RNN (Recurrent Neural Network) and its extension, LSTM (Long Short-Term Memory network). In their work, the deep learning architectures yield higher accuracy than the more traditional approaches, with the best-performing models reaching an accuracy of 60.1% in classifying abnormal returns, although the work was done using financial disclosures rather than news articles.
Methodology
Data Collection, Description and Preprocessing.
As discussed in the literature review, in market research that is not tied to a specific period or major event, different sources of information can be used for different analysis purposes (news, financial disclosures, tweets, etc.). Since this research aims to capture investor decision-making during a rapidly evolving crisis, daily news articles are better suited to its purpose than companies' financial disclosures. Previous research involving sentiment analysis of financial news commonly relies on Bloomberg [14] and Reuters [15] as the key news platforms covering the vast majority of relevant news. The initial news database, which currently consists of roughly 187,000 news articles, is structured as follows:
ID | Timestamp | Source | Title | Description
1 | 2020-03-21 18:28:18 | Bloomberg | New York City-area airports halt air traffic as coronavirus causes staffing issues | The FAA said air traffic was halted at New York City-area airports after coronavirus causes staffing...
2 | 2020-03-21 17:59:00 | Reuters | Airline CEOs promise to eliminate dividends if Congress passes $29B coronavirus bill | CEOs from America’s largest publicly traded airlines sent an urgent letter…
… | … | … | … | …
As can be seen in the initial database, the articles are not structured by affected companies or markets, and some articles might not discuss pandemic-related issues at all. Thus, the following preprocessing steps will need to be undertaken:
• The database has to be filtered by a list of keywords indicating that an article is related to the coronavirus pandemic (e.g., “coronavirus”, “pandemic”, “lockdown”, etc.). If a given article does not include at least one of these keywords, it is omitted from the database.
• The database has to be restructured so that each entry refers to a specific company whose stock is included in the S&P 500 index. If a given article mentions several companies, the article is included several times in the database, once per company.
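A minimal sketch of these two preprocessing steps follows. It assumes each article is a dictionary with `title` and `description` fields, as in the table above, and uses a hypothetical keyword set (`COVID_KEYWORDS`) and a hypothetical, hand-written map from company names to tickers (`COMPANY_ALIASES`). A real implementation would need the full S&P 500 constituent list and more careful entity matching than plain substring search, which would, for example, also match "apple" inside "pineapple".

```python
import re

# Hypothetical keyword list and alias-to-ticker map, for illustration only.
COVID_KEYWORDS = {"coronavirus", "covid-19", "pandemic", "lockdown"}
COMPANY_ALIASES = {"apple": "AAPL", "delta air lines": "DAL", "boeing": "BA"}

def is_covid_related(article: dict) -> bool:
    """Keep an article only if its title or description contains at least one keyword."""
    text = (article["title"] + " " + article["description"]).lower()
    words = set(re.findall(r"[a-z0-9-]+", text))
    return not COVID_KEYWORDS.isdisjoint(words)

def explode_by_company(articles: list[dict]) -> list[dict]:
    """Emit one row per (article, mentioned company) pair, dropping non-COVID articles."""
    rows = []
    for article in articles:
        if not is_covid_related(article):
            continue
        text = (article["title"] + " " + article["description"]).lower()
        tickers = sorted({t for alias, t in COMPANY_ALIASES.items() if alias in text})
        for ticker in tickers:
            rows.append({**article, "ticker": ticker})
    return rows

articles = [
    {"id": 1, "title": "Apple shuts stores amid pandemic", "description": "Boeing also pauses production."},
    {"id": 2, "title": "Apple unveils new phone", "description": "No virus mention."},
]
rows = explode_by_company(articles)
print([(r["id"], r["ticker"]) for r in rows])  # [(1, 'AAPL'), (1, 'BA')]
```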
As for the intraday historical stock data, Marketstack [16] was used to download hourly data for the stocks in the S&P 500 index. The stock database is structured as follows:
Ticker | Company | Timestamp | Price (USD)
AAPL | Apple Inc. | 2020-03-23 12:00:00 | 57.02
AAPL | Apple Inc. | 2020-03-23 13:00:00 | 57.12
… | … | … | …
Currently, both the news database and the stock database include data from March 21st, 2020 to December 31st, 2020.
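Because the study relies on intraday data, each article has to be matched to the price bars around its publication time. One possible approach, sketched below, is to take the first bar at or after publication as the entry price and the bar `horizon` bars later as the exit price. This assumes both timestamps share the same time zone and that the bars are sorted; `forward_return` and its parameters are hypothetical names, and with overnight gaps `horizon` counts bars rather than clock hours.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional

def forward_return(bar_times: list[datetime], bar_prices: list[float],
                   published: datetime, horizon: int = 1) -> Optional[float]:
    """Simple return from the first bar at/after `published` to the bar `horizon` bars later.

    Assumes `bar_times` is sorted and in the same time zone as `published`.
    Returns None when there are not enough bars after the article.
    """
    i = bisect_left(bar_times, published)
    if i + horizon >= len(bar_times):
        return None
    return bar_prices[i + horizon] / bar_prices[i] - 1.0

# The two AAPL bars from the table above; the article timestamp is hypothetical.
times = [datetime(2020, 3, 23, 12), datetime(2020, 3, 23, 13)]
prices = [57.02, 57.12]
print(forward_return(times, prices, datetime(2020, 3, 23, 11, 45)))
```

Varying `horizon` is one way to probe how long the market takes to reflect a given piece of news, which is the lag this research aims to measure.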
Procedures and Design.
The pipeline of the experiment will be structured as follows:
• News Transformation. For each article, a sentiment measure has to be calculated. This will be done using a dictionary-based approach, as well as the previously mentioned LSTM architecture.
• Feature Engineering. Stock data will be used to compute numerical technical indicators, which will expand the feature set used by the classification algorithms.
• Signal Classification. The predictive models discussed in the literature review will be applied to the generated features. At each time point, the market signal for a specific stock will be classified as either positive or negative using machine learning algorithms.
• Portfolio Creation and Comparison. Based on the classified signals, a stock portfolio will be built for each machine learning algorithm. The performance of these portfolios will then be compared, using the Sharpe ratio, to that of the baseline portfolio built from structural breaks analysis. Several sentiment thresholds will be tested when creating portfolios, meaning that in some portfolios only articles with strongly positive or negative sentiment will trigger actions, while in other portfolios even mild sentiment will be considered significant.
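The threshold and evaluation steps can be sketched roughly as follows. The snippet assumes a per-period sentiment score and next-period return for each stock, goes long when the score is at or above `threshold` and short when it is at or below `-threshold` (a long/short rule chosen here purely for illustration), averages the open positions into an equal-weighted portfolio return, and reports an annualized Sharpe ratio. The zero default risk-free rate and the `periods_per_year` annualization factor are assumptions, not choices stated in this proposal, and the scores and returns in the example are invented.

```python
import math
import statistics

def threshold_signal(score: float, threshold: float) -> int:
    """+1 (long) for strong positive, -1 (short) for strong negative sentiment, else 0."""
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0

def portfolio_returns(scores: list[list[float]], next_returns: list[list[float]],
                      threshold: float) -> list[float]:
    """Equal-weighted return of all non-zero positions in each period (0.0 if none)."""
    result = []
    for period_scores, period_returns in zip(scores, next_returns):
        legs = [threshold_signal(s, threshold) * r
                for s, r in zip(period_scores, period_returns)
                if threshold_signal(s, threshold) != 0]
        result.append(sum(legs) / len(legs) if legs else 0.0)
    return result

def sharpe_ratio(returns: list[float], periods_per_year: float, risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio of per-period returns (risk_free is a per-period rate)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

# Invented scores and next-period returns for two stocks over three periods.
scores = [[0.8, -0.1], [-0.6, 0.3], [0.2, 0.9]]
rets = [[0.010, 0.002], [-0.004, 0.001], [0.003, 0.006]]
for th in (0.25, 0.75):
    print(th, portfolio_returns(scores, rets, th))
```

Raising the threshold trades fewer, higher-conviction positions for more idle periods, which is exactly the trade-off the threshold comparison is meant to quantify.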
Anticipated Results
The main anticipated result of this research project is either the confirmation or the rejection of market efficiency in the American stock market during the COVID-19 pandemic. This result will mainly depend on the performance of the machine learning-based stock portfolios: if these portfolios yield statistically significantly higher returns than the baseline portfolios, or than the S&P 500 market index itself, it will be concluded that the US stock market is not informationally efficient. Such a result would support the Adaptive Market Hypothesis, suggesting that news coverage is not immediately reflected in stock prices. Moreover, the optimal sentiment threshold for including or omitting a stock in the portfolio will be determined based on portfolio performance.
Another anticipated result of the project is that the accuracy of the signal classification models will be significantly higher than 50% (the accuracy of random guessing for a binary target). Even if the portfolio returns do not exceed the S&P 500 returns, an accurate sentiment-based classification algorithm would be considered a success in the context of this research.
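Whether an observed accuracy is "significantly higher than 50%" can be checked with a one-sided exact binomial test: under the null hypothesis of random guessing, the number of correct predictions among n independent signals follows Binomial(n, 0.5). The sketch below computes the p-value with the standard library. It assumes balanced classes (otherwise the majority-class rate, not 50%, is the relevant baseline) and independent predictions, which overlapping intraday signals may violate, so it should be read as a first approximation; the 560-of-1,000 figure is hypothetical.

```python
import math

def binomial_p_value(correct: int, total: int, p_null: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= correct) for X ~ Binomial(total, p_null).

    Terms are summed in log space so that large `total` values do not overflow.
    """
    if not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total")
    if not 0.0 < p_null < 1.0:
        raise ValueError("p_null must lie strictly between 0 and 1")
    log_terms = [
        math.lgamma(total + 1) - math.lgamma(k + 1) - math.lgamma(total - k + 1)
        + k * math.log(p_null) + (total - k) * math.log1p(-p_null)
        for k in range(correct, total + 1)
    ]
    peak = max(log_terms)
    # Log-sum-exp: factor out the largest term before exponentiating.
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)

# Hypothetical example: 560 correct signals out of 1,000.
print(f"p-value = {binomial_p_value(560, 1000):.2e}")
```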
Conclusion
In this research, the market efficiency of the US stock market will be tested in the context of the COVID-19 pandemic in 2020. Much research has previously been done on the empirical relationship between news sentiment and stock movements using text mining; however, not enough attention has been paid to immediate market sentiment during crises.
Following the review of previous work, sentiment analysis will be conducted on news articles covering the development of the pandemic, creating a timeline of the overall market sentiment. The sentiment measures, along with other features derived from technical analysis, will then be used in machine learning algorithms to classify market signals, on which stock portfolios will be based. The overall expectation is that the portfolios based on sentiment analysis features will outperform the baseline portfolios based on structural breaks analysis, although it is not clear which machine learning models will prove the most successful. This result would help identify the more accurate model of the relationship between news and the stock market (Efficient Market Hypothesis or Adaptive Market Hypothesis), thereby confirming or rejecting market efficiency during the crisis.
This research could potentially be expanded in various ways; for instance, news articles could be further classified by category (industry, country, market) to generate additional features. Furthermore, the stock markets of other countries could be analyzed similarly, as the results are not guaranteed to transfer across all stock markets.
References
1. Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8.
2. Ding, X., Zhang, Y., Liu, T., & Duan, J. (2014, October). Using structured events to predict stock price movement: An empirical investigation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1415-1425).
3. Ewing, B. T., & Malik, F. (2016). Volatility spillovers between oil prices and the stock market under structural breaks. Global Finance Journal, 29, 12-23.
4. Fama, E. F. (1965). The behavior of stock-market prices. The Journal of Business, 38(1), 34-105.
5. Hagenau, M., Liebmann, M., & Neumann, D. (2013). Automated news reading: Stock price prediction based on financial news using context-capturing features. Decision Support Systems, 55(3), 685-697.
6. Kalyani, J., Bharathi, P., & Jyothi, P. (2016). Stock trend prediction using news sentiment analysis.
7. Kraus, M., & Feuerriegel, S. (2017). Decision support from financial disclosures with deep neural networks and transfer learning. Decision Support Systems, 104, 38-48.
8. Lo, A. W. (2005). Reconciling efficient markets with behavioral finance: The adaptive markets hypothesis. Journal of Investment Consulting, 7(2), 21-44.
9. Malkiel, B. G., & Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417.
10. Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining for market prediction: A systematic review. Expert Systems with Applications, 41(16), 7653-7670.
11. Pesakovic, G., & Ndekugri, A. (2017). Using event studies to evaluate stock market return performance. Global Journal of Management and Business Research.
12. Schoen, H., Gayo-Avello, D., Metaxas, P. T., Mustafaraj, E., Strohmaier, M., & Gloor, P. (2013). The power of prediction with social media. Internet Research, 23(5), 528-543.
13. Urquhart, A., & Hudson, R. (2013). Efficient or adaptive markets? Evidence from major stock markets using very long run historic data. International Review of Financial Analysis, 28, 130-142.
14. Bloomberg. Bloomberg.com. URL: https://www.bloomberg.com/.
15. Reuters. Reuters.com. URL: https://www.reuters.com/.
16. Marketstack. Marketstack.com. URL: https://www.marketstack.com/.