
X UNIVERSITY

X Faculty

X Department

PROJECT PROPOSAL:

COVID-19 news impact on stock prices:

sentiment analysis, event studies, structural breaks analysis

FIRST NAME LAST NAME

COURSE NAME

PROFESSOR

PROFESSOR NAME

CITY

YEAR

Table of Contents

Abstract
Introduction
Literature Review
    Theoretical Context on Relationship Between News and Markets
    Sentiment Analysis Techniques
    Event Studies Techniques
    Machine Learning Algorithms
Methodology
    Data Collection and Description
    Procedures and Design
Anticipated Results
Conclusion
References

Abstract

This paper examines whether stock movements can be predicted from the sentiment of news
coverage of the COVID-19 pandemic. Much research has used text mining to study the empirical
relationship between news and stock movements. However, not enough attention has been paid
to immediate market sentiment during crises. In this paper, several state-of-the-art machine
learning techniques are implemented to determine whether sentiment measures of news coverage
can improve the predictive power of investors' models. The expected result is that signal
classification algorithms based on sentiment analysis achieve higher accuracy than the 50%
chance level for a binary outcome. Such a result would identify areas where the efficient
market hypothesis does not hold for the American stock market.

Introduction

Background.

Financial news plays a key role in the decisions investors make on financial markets. By
deriving information from news articles about particular companies or markets, investors can
make predictions about future movements of stock valuations, leading to potential financial
gains. The range of news-processing methods is broad; however, no single method of
interpreting news is clearly superior. The most common distinction is between fundamental
analysis and technical analysis. Fundamental methods require extensive analysis of
qualitative information about companies and markets, while technical methods focus
exclusively on historical quantitative data on stock valuations.

There is, however, a subset of methods that lies between these two major categories and
involves the quantification of news, which tends to be qualitative rather than quantitative.
This is done through sentiment analysis: natural language processing methods that classify
texts as positive or negative on a certain scale. Feature engineering via sentiment analysis
can give insight into the state of the stock market at a given moment, hinting at investors'
expectations and decisions. Previous research in behavioral finance has suggested that
investors tend to make emotion-driven decisions, depending on the extent of their optimism
towards certain markets and companies (Bollen et al., 2011) [1].

Problem Statement.

This paper studies a specific subset of news: the coverage of the COVID-19 pandemic in 2020.
The goal is to determine whether there was any departure from market efficiency in how news
coverage of COVID-19 affected the stock valuations of American corporations. More
specifically, this paper aims to construct a sentiment-based algorithm that would determine
whether there was a consistent lag between the release of news about particular companies
and markets and the time at which the market fully reflected that information.

Research Question.

To what extent did the Efficient Market Hypothesis (EMH) hold on the American stock market
throughout the COVID-19 pandemic? That is, is it possible to obtain excess returns on
American stocks using sentiment-based analysis of the news coverage of the COVID-19
pandemic?

Aims and Objectives.

The key objectives of this research are the following:

• To identify the major text analysis methods and techniques used in financial markets
  research.
• To collect an extensive database of news coverage of the COVID-19 pandemic (news articles
  that detail the spread of the virus around the world and can be traced to the specific
  companies, regions, and periods that they discuss or affect).
• To apply relevant sentiment metrics to the collected news articles, creating a timeline of
  the overall sentiment of news coverage towards the spread of the pandemic.
• To build a machine learning-based dynamic portfolio from the signals of the sentiment
  analysis, and to compare the performance of that portfolio with the performance of a
  baseline portfolio created using time series and structural breaks analysis.

Professional Significance.

Because the COVID-19 pandemic is a market shock, an analysis of news sentiment may prove
more relevant than an analysis based solely on technical indicators, which often overlook
the risks of crisis shocks. This research therefore has the potential to determine whether
news coverage plays a significant role in investors' decision-making. So far, there has been
little discussion of the impact of news on financial markets during a crisis. Hence, the
research gap this paper attempts to cover is the effectiveness of sentiment analysis on
stock markets during a crisis shock.

Delimitations of the Study.

As is the case for most machine learning problems, the best-performing algorithm depends
greatly on the specification of the problem, and changing any part of the specification can
vastly change the performance of certain models. In this paper, only American stocks
included in the S&P 500 index are considered, meaning that the conclusions and implications
might not fully apply to other stock markets, for instance, the Russian stock market. This
choice of stocks is explained by the overall stability of the American stock market:
compared to other regional stock exchanges, there is less systemic risk that could
overshadow the actual impact of the news coverage. At the moment, the collected news
database consists of Bloomberg and Reuters articles from March 21, 2020 to December 31,
2020. Since the pandemic began several months before that, the database would ideally be
supplemented with articles from the beginning of 2020 for completeness.

Literature Review

Theoretical context on relationship between news and markets.

Several key models of the impact of information on financial markets have been introduced,
the most substantial of which is the Efficient Market Hypothesis (EMH), first developed by
Fama (1965) [4]. According to the EMH, asset prices on stock markets reflect all available
information, meaning that stock-related news and events affect the dynamics of stock prices.
The hypothesis was empirically confirmed to different extents by Malkiel and Fama (1970)
[9], who discuss the three forms of the EMH (weak, semi-strong, and strong), each implying
that a different information set is reflected in stock prices. The form of the EMH
determines whether an investor can achieve excess returns by using specific information to
make decisions. Fama (1965) [4] also introduces the Random Walk Hypothesis (RWH), a model
which suggests that successive stock price changes are independent of each other and
identically distributed, meaning that future movements of a stock cannot be predicted from
past trends. The EMH and RWH are interconnected, as the RWH requires immediate informational
efficiency.

More recently, however, literature has emerged that offers a contradictory theory of market
efficiency. Lo (2005) [8] introduces the Adaptive Market Hypothesis (AMH), a substantially
different interpretation of the relationship between news and stock movements that analyzes
market efficiency from the standpoint of behavioral finance. Lo stated, “Prices reflect as
much information as dictated by the combination of environmental conditions and the number
and nature of "species" in the economy” (Lo, 2005, p. 19), suggesting that market
inefficiency is often present, for instance, during bubbles and crises. Indeed, empirical
research suggests that the AMH describes stock behavior more accurately than the EMH
(Urquhart & Hudson, 2013) [13]. That study analyzes the US, UK, and Japanese stock markets
and concludes that none of them has been consistently efficient, which suggests that timely
analysis of market information could serve as an instrument for capturing excess returns on
stock markets.

Sentiment analysis and feature processing techniques.

Researchers have previously attempted to benchmark different approaches to feature
processing based on textual information in the context of financial market analysis.
Hagenau et al. (2013) [5] summarize 11 papers that build machine learning algorithms on
features generated from text mining and sentiment analysis. The 11 papers differ in the
stock datasets used, the text mining feature types (bag-of-words, n-grams, word
combinations), the inclusion of market feedback, and the specific machine learning models.
The best-performing paper correctly identifies 65.1% of signals, using a support vector
machine and word combinations. However, such results are not consistent across the other
papers, many of which reach inconclusive or incompatible results. Overall, there is still
little consensus among researchers on whether sentiment analysis can consistently be used to
predict stock market returns (Schoen et al., 2013) [12].
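
To make the feature types named above concrete, the following minimal Python sketch counts
unigrams (bag-of-words) and bigrams (n-grams with n = 2) in a headline. It is not taken from
any of the reviewed papers; the function names (tokenize, ngrams, text_features) and the
example headline are assumptions made purely for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams over a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def text_features(text, max_n=2):
    """Count unigrams (bag-of-words) and higher-order n-grams up to max_n."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical headline, used only to show the shape of the output.
headline = "Airline stocks fall as lockdown fears rise"
features = text_features(headline)
print(features["lockdown"], features["lockdown fears"])
```

Each distinct word or word pair becomes one feature column, so the feature space grows
quickly with max_n; this is one reason the reviewed papers differ in which feature types
they keep.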

Another systematic review of text mining research is presented by Nassirtoussi et al. (2014)
[10], who examine 24 papers, updating the set reviewed by Hagenau et al. (2013) [5]. This
review attempts to introduce a well-rounded theoretical framework for feature engineering
based on text mining. The reviewed papers differ in the types of mined texts (financial
news, corporate filings, financial disclosures, tweets), text sources (various platforms,
e.g. Bloomberg, Yahoo! Finance, Twitter), and other text-related specifications. Kalyani et
al. (2016) [6] is a more recent example of research that performs sentiment analysis to
derive signals from bag-of-words analysis of stock-related news. That paper presents a tidy
step-by-step approach, which may be replicated in this research. It should be noted, though,
that the majority of papers in both reviews use daily historical stock data. In this
research, intraday stock data will be used to capture the immediate investor reaction to the
hard-to-predict events that occurred throughout the pandemic, such as border closures and
lockdown restrictions. This premise is supported by previous research; for instance, Ding et
al. (2014) [2] suggest that the impact of news coverage is mostly short-term, providing
empirical evidence from Google stock.

Event studies techniques.

A more traditional approach to predicting stock movements from news is the event study,
which requires time series analysis of historical stock data, with structural breaks placed
at the times of the most impactful events in the reviewed period. Pesakovic and Ndekugri
(2017) [11] provide an example of an event study involving three multinational companies, in
which the impactful events included American presidential elections. The most obvious
limitation of the event study method is that the noteworthy events have to be handpicked,
rather than collected automatically without subjective preference. Thus, in this research
the event study methodology will likely be applied to just a handful of companies as a
baseline measure, whereas a more extensive sentiment analysis will be applied to all stocks
under consideration.

Much market-specific research has applied time series analysis to stock data, using
structural breaks caused by major economic events. Ewing and Malik (2016) [3] provide an
example of an event study that examines the indirect relationship between oil prices and the
US stock market. The authors conduct a structural breaks analysis of the stock market in
which breaks are detected through analysis of oil price volatility. That approach makes it
possible to capture volatility that is not directly explained by the quantitative data
alone.
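
As a simple illustration of testing for a structural break at a handpicked event date, the
sketch below computes a Chow-type F statistic for a shift in the mean of a return series. It
is a deliberately simplified stand-in, not the volatility-based procedure of the cited
study: the break index is assumed to be known in advance, the model on each side of the
break is a constant, and the function names are hypothetical.

```python
def residual_sum_of_squares(values):
    """Sum of squared deviations from the sample mean (intercept-only model)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def chow_f_statistic(returns, break_index):
    """Chow F statistic for a mean shift at a known break index.

    Each segment is modelled by a constant (k = 1 parameter), so the
    statistic compares one pooled mean against two separate means.
    """
    before, after = returns[:break_index], returns[break_index:]
    if len(before) < 2 or len(after) < 2:
        raise ValueError("each segment needs at least two observations")
    k = 1
    rss_pooled = residual_sum_of_squares(returns)
    rss_split = residual_sum_of_squares(before) + residual_sum_of_squares(after)
    if rss_split == 0:
        raise ValueError("no within-segment variation; statistic is undefined")
    dof = len(returns) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)


# Made-up returns with an obvious level shift after the third observation.
example = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9]
print(chow_f_statistic(example, break_index=3))
```

Under the null hypothesis of no break (and independent, normally distributed errors), the
statistic follows an F distribution with k and n - 2k degrees of freedom, so a large value
is evidence of a break at the chosen date.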

Machine learning algorithms in stock movement prediction.

The previously mentioned reviews of existing papers also discuss a variety of machine
learning models used for market signal classification. Nassirtoussi et al. (2014) [10]
categorize the papers by the following models used in previous research:

• Support Vector Machine (SVM).
• Regression algorithms (including Support Vector Regression (SVR)).
• Naïve Bayes classifier.
• Algorithms based on decision rules and trees.
• Combinatory algorithms.
• Multi-algorithm experiments.

A more recent work by Kraus and Feuerriegel (2017) [7] considers, in addition to the
aforementioned methods, several machine learning techniques that have gained prominence
since the publication of Nassirtoussi et al. (2014) [10]. This paper mainly concentrates on
decision tree-based classification algorithms, such as Random Forest and Gradient Boosting,
as well as the deep learning architectures RNN (Recurrent Neural Network) and its extension,
LSTM (Long Short-Term Memory network). In their work, the deep learning architectures yield
higher accuracy than the more traditional approaches, with the best-performing models
correctly classifying 60.1% of abnormal returns, although the work used financial
disclosures rather than news articles.

Methodology

Data Collection, Description and Preprocessing.

As discussed in the literature review, market research that is not tied to a specific period
or major event can use different sources of information for different analysis purposes
(news, financial disclosures, tweets, etc.). Since this research aims to capture investor
decision-making during a rapidly evolving crisis, daily news articles serve this purpose
better than companies' financial disclosures. Previous research involving sentiment analysis
of financial news agrees that Bloomberg [14] and Reuters [15] are the key news platforms,
covering the vast majority of relevant news. The initial news database, which currently
consists of roughly 187 thousand news articles, is structured in the following way:

ID | Timestamp           | Source    | Title                                                                                 | Description
1  | 2020-03-21 18:28:18 | Bloomberg | New York City-area airports halt air traffic as coronavirus causes staffing issues   | The FAA said air traffic was halted at New York City-area airports after coronavirus causes staffing...
2  | 2020-03-21 17:59:00 | Reuters   | Airline CEOs promise to eliminate dividends if Congress passes $29B coronavirus bill | CEOs from America’s largest publicly traded airlines sent an urgent letter…
…  | …                   | …         | …                                                                                     | …

As the initial database shows, the articles are not structured by affected companies or
markets, and some articles might not discuss pandemic-related issues at all. Thus, the
following preprocessing steps need to be undertaken:

• The database has to be filtered by a corpus of keywords indicating that an article is
  related to the coronavirus pandemic (e.g. “coronavirus”, “pandemic”, “lockdown”). An
  article that does not include at least one word from the corpus has to be omitted from the
  database.
• The database has to be restructured so that each entry refers to a specific company whose
  stock is included in the S&P 500 index. If an article mentions several companies, it needs
  to be included several times in the database, once per company.
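
The two preprocessing steps above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions rather than the final implementation: the keyword set
extends the examples given above with the term “covid”, the alias table mapping tickers to
company names is hypothetical, and articles are assumed to be dictionaries with "title" and
"description" fields.

```python
import re

# Illustrative keyword corpus; "covid" is an assumed addition to the examples above.
PANDEMIC_KEYWORDS = {"coronavirus", "covid", "pandemic", "lockdown"}

# Hypothetical alias table: ticker -> names used to spot the company in text.
COMPANY_ALIASES = {
    "AAPL": ["apple"],
    "DAL": ["delta air lines", "delta"],
}


def contains_term(text, term):
    """True if `term` occurs in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def article_text(article):
    """Lowercased title and description, the only fields searched here."""
    return f"{article['title']} {article['description']}".lower()


def is_pandemic_related(article, keywords=PANDEMIC_KEYWORDS):
    """Step 1: keep an article only if it mentions at least one keyword."""
    text = article_text(article)
    return any(contains_term(text, keyword) for keyword in keywords)


def explode_by_company(article, aliases=COMPANY_ALIASES):
    """Step 2: emit one copy of the article per company it mentions."""
    text = article_text(article)
    return [
        {**article, "ticker": ticker}
        for ticker, names in aliases.items()
        if any(contains_term(text, name) for name in names)
    ]


def preprocess(articles):
    """Apply the keyword filter, then restructure to one row per company."""
    rows = []
    for article in articles:
        if is_pandemic_related(article):
            rows.extend(explode_by_company(article))
    return rows
```

Note that in this sketch a pandemic-related article that names no listed company produces
no rows at all, and that plain name matching is crude (for example, “delta” may refer to
something other than the airline); a fuller implementation would need a more careful entity
matching step.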

As for the intraday historical stock data, Marketstack [16] was used to download hourly data
for the stocks in the S&P 500 index. The stock database is structured in the following way:

Ticker | Company    | Timestamp           | Price
AAPL   | Apple Inc. | 2020-03-23 12:00:00 | 57.02
AAPL   | Apple Inc. | 2020-03-23 13:00:00 | 57.12
…      | …          | …                   | …

At the moment, both the news database and the stock database include data from March 21,
2020 to December 31, 2020.
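
Because articles carry second-level timestamps while prices are hourly, the two databases
have to be aligned before any signal can be labelled. The sketch below shows one possible
alignment rule, which is an assumption rather than a settled design choice: an article is
matched to the first hourly bar at or after its publication time, and the return over the
following bars is measured from there. The function name forward_return and the horizon
parameter are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime


def forward_return(bars, published, horizon=1):
    """Return over `horizon` bars, starting at the first bar at or after `published`.

    bars: list of (timestamp, price) tuples for one ticker, sorted by timestamp.
    Returns None when there are not enough bars after the article.
    """
    timestamps = [ts for ts, _ in bars]
    start = bisect_left(timestamps, published)
    end = start + horizon
    if end >= len(bars):
        return None
    return bars[end][1] / bars[start][1] - 1.0


# The two AAPL rows shown in the stock table above.
aapl = [
    (datetime(2020, 3, 23, 12, 0, 0), 57.02),
    (datetime(2020, 3, 23, 13, 0, 0), 57.12),
]
# An article published over the weekend maps to the first bar on Monday.
print(forward_return(aapl, datetime(2020, 3, 21, 18, 28, 18)))
```

Starting from the first bar after publication avoids using prices that predate the article,
at the cost of ignoring any reaction that occurs within the publication hour itself.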

Procedures and Design.

The pipeline of the experiment will be structured in the following way:

• News transformation. For each article, a measure of sentiment has to be calculated. This
  will be done using a dictionary-based approach, as well as the previously mentioned LSTM
  architecture.
• Feature Engineering. Stock data will be used to produce numerical technical indicators,
  which will expand the feature set used by the classification algorithms.
• Signal Classification. The predictive models discussed in the literature review will be
  applied to the generated features. At each time point, the market signal for a specific
  stock will be classified as either positive or negative by the machine learning
  algorithms.
• Portfolio Creation and Comparison. Based on the classified signals, a stock portfolio
  will be built for each machine learning algorithm. The performance of these portfolios
  will then be compared, using the Sharpe ratio, to that of a baseline portfolio built on
  structural breaks analysis. Several sentiment thresholds will be tested when creating
  portfolios: in some portfolios only articles with highly positive or negative sentiment
  will lead to actions, while in others even mild sentiment will be considered significant.
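
Three elements of the pipeline above (dictionary-based sentiment scoring, sentiment
thresholds, and the Sharpe ratio) can be sketched in a few lines of Python. The word lists
below are tiny placeholders rather than an established sentiment dictionary, the scoring
formula is one common choice among several, and all function names are hypothetical.

```python
import math
import re

# Tiny placeholder lexicon; a real study would use an established dictionary.
POSITIVE = {"gain", "recovery", "growth", "surge"}
NEGATIVE = {"loss", "halt", "lockdown", "crisis"}


def sentiment_score(text):
    """Dictionary-based polarity in [-1, 1]: (pos - neg) / (pos + neg)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def signal(score, threshold):
    """Map a score to +1 (buy), -1 (sell) or 0 (no action).

    `threshold` is assumed to be strictly positive; a higher value means that
    only strongly worded articles lead to an action.
    """
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std deviation.

    Needs at least two returns that are not all identical.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance)


score = sentiment_score("Airports halt traffic amid lockdown")
print(score, signal(score, threshold=0.5))
```

Running the same set of articles through signal with several threshold values yields the
family of portfolios described above, and sharpe_ratio applied to each portfolio's return
series gives the figure by which they are compared with the baseline.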

Anticipated Results

The main anticipated result of this research project is either the confirmation or the
rejection of market efficiency in the American stock market during the COVID-19 pandemic.
This result will mainly depend on the performance of the machine learning-based stock
portfolios: if these portfolios yield statistically significantly higher returns than the
baseline portfolios, or than the S&P 500 market index itself, the conclusion will be that
the US stock market is not informationally efficient. Such a result would support the
Adaptive Market Hypothesis, suggesting that news coverage is not immediately reflected in
stock prices. Moreover, the optimal sentiment threshold for including a stock in the
portfolio or omitting it will be determined from the performance of the portfolios.

Another anticipated result of the project is that the accuracy of the signal classification
models turns out to be significantly higher than 50% (the chance level for a binary target).
Even if the portfolio returns are not higher than the S&P 500 returns, an accurate signal
classification algorithm would be considered a success in the context of this research.
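
Whether an observed accuracy is significantly higher than 50% can be checked with a
one-sided binomial test. The sketch below is one way to do this and is an assumption about
the evaluation, not a committed part of the design; it treats the classified signals as
independent trials, which may not hold for overlapping intraday observations. The function
name and the example counts are hypothetical.

```python
from math import exp, lgamma, log


def binomial_p_value(correct, total, chance=0.5):
    """One-sided p-value P(X >= correct) for X ~ Binomial(total, chance).

    Terms are computed in log space so that large sample sizes do not
    overflow; `chance` must lie strictly between 0 and 1.
    """
    log_p, log_q = log(chance), log(1.0 - chance)
    p_value = 0.0
    for k in range(correct, total + 1):
        log_pmf = (
            lgamma(total + 1) - lgamma(k + 1) - lgamma(total - k + 1)
            + k * log_p + (total - k) * log_q
        )
        p_value += exp(log_pmf)
    return p_value


# Hypothetical counts: 540 correctly classified signals out of 1000.
print(binomial_p_value(540, 1000))
```

A small p-value means that an accuracy at least this high would be unlikely if the
classifier were guessing, which is the sense in which "significantly higher than 50%" is
used above.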

Conclusion

In this research, the market efficiency of the US stock market is tested in the context of
the COVID-19 pandemic in 2020. Much research has used text mining to study the empirical
relationship between news and stock movements; however, not enough attention has been paid
to immediate market sentiment during crises.

After the review of previous work, sentiment analysis of news articles covering the
development of the pandemic is conducted, creating a timeline of overall market sentiment.
The sentiment measures, along with other features derived from technical analysis, are then
used in machine learning algorithms to classify market signals, on which stock portfolios
are based. The overall expectation is that the portfolios based on sentiment analysis
features will outperform the baseline portfolios based on structural breaks analysis,
although it is not clear which machine learning models will prove the most successful. This
result would make it possible to identify which model of the relationship between news and
the stock market (the Efficient Market Hypothesis or the Adaptive Market Hypothesis) is more
accurate, thereby confirming or rejecting market efficiency during the crisis.

This research could be expanded in various ways. For instance, news articles could be
further classified by category (industry, country, market) to generate extra features.
Furthermore, the stock markets of other countries could be analyzed similarly, as results
are not guaranteed to transfer across stock markets.

References

1. Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal
   of Computational Science, 2(1), 1-8.
2. Ding, X., Zhang, Y., Liu, T., & Duan, J. (2014, October). Using structured events to
   predict stock price movement: An empirical investigation. In Proceedings of the 2014
   Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1415-1425).
3. Ewing, B. T., & Malik, F. (2016). Volatility spillovers between oil prices and the stock
   market under structural breaks. Global Finance Journal, 29, 12-23.
4. Fama, E. F. (1965). The behavior of stock-market prices. The Journal of Business, 38(1),
   34-105.
5. Hagenau, M., Liebmann, M., & Neumann, D. (2013). Automated news reading: Stock price
   prediction based on financial news using context-capturing features. Decision Support
   Systems, 55(3), 685-697.
6. Kalyani, J., Bharathi, P., & Jyothi, P. (2016). Stock trend prediction using news
   sentiment analysis.
7. Kraus, M., & Feuerriegel, S. (2017). Decision support from financial disclosures with
   deep neural networks and transfer learning. Decision Support Systems, 104, 38-48.
8. Lo, A. W. (2005). Reconciling efficient markets with behavioral finance: The adaptive
   markets hypothesis. Journal of Investment Consulting, 7(2), 21-44.
9. Malkiel, B. G., & Fama, E. F. (1970). Efficient capital markets: A review of theory and
   empirical work. The Journal of Finance, 25(2), 383-417.
10. Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining
    for market prediction: A systematic review. Expert Systems with Applications, 41(16),
    7653-7670.
11. Pesakovic, G., & Ndekugri, A. (2017). Using event studies to evaluate stock market
    return performance. Global Journal of Management and Business Research.
12. Schoen, H., Gayo-Avello, D., Metaxas, P. T., Mustafaraj, E., Strohmaier, M., & Gloor, P.
    (2013). The power of prediction with social media. Internet Research, 23(5), 528-543.
13. Urquhart, A., & Hudson, R. (2013). Efficient or adaptive markets? Evidence from major
    stock markets using very long run historic data. International Review of Financial
    Analysis, 28, 130-142.
14. Bloomberg. Bloomberg.com. URL: https://www.bloomberg.com/.
15. Reuters. Reuters.com. URL: https://www.reuters.com/.
16. Marketstack. Marketstack.com. URL: https://www.marketstack.com/.
