Sentiment Analysis of Tweets and Financial News Headlines: Wyatt Steen


Machine Learning – Sentiment Analysis

Project Overview

Goal:

- Retrieve recent tweets and financial news headlines that mention a keyword (in this case, a stock ticker).

- Predict the sentiment of each word in the headline or tweet with machine learning.

- Calculate the sentiment of each headline or tweet by averaging the sentiment scores of the words in the string.
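The averaging step can be sketched in a few lines of Python (the project itself was implemented in MATLAB; the word scores below are made-up illustrative values, not model outputs):

```python
# Score a headline/tweet by averaging per-word sentiment scores.
# In the project, each word's score comes from a trained classifier;
# here a small hand-made lookup table stands in for it.
word_scores = {"surges": 0.75, "record": 0.5, "profit": 0.75, "misses": -0.5}

def headline_sentiment(headline):
    """Average the sentiment score of each word; unknown words score 0."""
    scores = [word_scores.get(w, 0.0) for w in headline.lower().split()]
    return sum(scores) / len(scores) if scores else 0.0

print(headline_sentiment("AAPL surges on record profit"))  # → 0.4
```

Unknown words default to a neutral 0.0 so that common words without a sentiment label dilute, rather than break, the average.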

Research for a reason.



Datasets
Lexicons:
- Labeled English Words (Chen, Y., & Skiena, S. (2014). Building Sentiment Lexicons for All Major Languages. In ACL (2) (pp. 383-389).)
- Labeled German Words (Chen & Skiena, 2014)
- Labeled Japanese Words (Chen & Skiena, 2014)
- Labeled Korean Words (Chen & Skiena, 2014)

Word Embeddings:
- English (fastText English 16 Billion Token Word Embedding Support Package, MATLAB®)
- German (E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages)
- Japanese (Grave et al., Learning Word Vectors for 157 Languages)
- Korean (Grave et al., Learning Word Vectors for 157 Languages)

Labeled Tweets:
- Sentiment140 Dataset with 1.6 million tweets (Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p. 12.)

Feature Extraction
- Convert each word in the processed text into a word vector.
- Word vectors are numeric vectors that represent a word's meaning. They allow words to be mapped by their relationships to other words, and they make mathematical operations on words possible.

Example:

King – Man + Woman = Queen


or
King is to Man as Queen is to Woman

Where each word above is represented as a 1x300 numerical vector.

- This was done using pretrained word embedding models. Each model used has a vocabulary of one million or more words together with their vector representations (its text corpus). fastText-style pretrained models can also produce vectors for words outside this vocabulary by building them from the word's character n-grams, so unseen words land near related words in the vector space.
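The King – Man + Woman analogy can be made concrete with a toy embedding. Real models use ~300-dimensional vectors; the 3-dimensional vectors below are hand-crafted for illustration so that the arithmetic works out exactly:

```python
# Toy illustration of vector arithmetic on word embeddings.
# Dimensions are hand-chosen as (concept, royalty, femaleness);
# real embeddings learn such structure from a large corpus.
import math

vectors = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, 1.0, 1.0],
    "man":    [1.0, 0.0, 0.0],
    "woman":  [1.0, 0.0, 1.0],
    "prince": [0.9, 1.0, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # → queen
```

The input words are excluded from the candidate set, as is standard when evaluating analogies, since the nearest neighbor of the result vector is often one of the query words themselves.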

Preprocessing
- Tokenize the text
- Erase punctuation in the text
- Remove stop words (I, am, the, was, etc.)
- Convert text to lowercase
- Remove HTML tags and URLs.
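The preprocessing steps above can be sketched with only the Python standard library (the project used MATLAB's text-processing functions; the stop-word list here is a small illustrative subset):

```python
# Minimal sketch of the listed preprocessing steps:
# strip HTML/URLs, lowercase, erase punctuation, tokenize, drop stop words.
import re
import string

STOP_WORDS = {"i", "am", "the", "was", "a", "an", "is", "are", "on", "of"}

def preprocess(text):
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = text.lower()                          # convert to lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # erase punctuation
    tokens = text.split()                        # tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]  # remove stop words

print(preprocess("I am <b>bullish</b> on $AAPL! https://example.com"))
# → ['bullish', 'aapl']
```

Order matters: URLs must be removed before punctuation, otherwise `https://example.com` degrades into stray tokens like `httpsexamplecom`.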




Models Investigated
- 150 machine learning models were investigated.
- Each was of type Ensemble, K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, or Decision Tree, with different parameters.
- The MATLAB® function fitcauto() was used to train and evaluate each of these models on the provided data, using Bayesian optimization to tune the models and output their performance metrics.
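fitcauto() automates a select-the-best-model loop: fit many candidate models and keep the one with the best validation score. A heavily stripped-down, stdlib-only Python sketch of that idea is below; it compares just two toy classifiers by hold-out accuracy and omits the Bayesian hyperparameter optimization that fitcauto() performs:

```python
# Toy model-selection loop: train each candidate, score it on held-out
# data, keep the best. Features are 1-d (e.g. an averaged word score).
from collections import Counter

def majority_model(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_model(train):
    """1-nearest-neighbor on 1-d features."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# Made-up data: feature = averaged word sentiment, label = overall sentiment.
train = [(-0.8, "neg"), (-0.5, "neg"), (0.4, "pos"), (0.9, "pos")]
test  = [(-0.6, "neg"), (0.7, "pos"), (0.1, "pos")]

candidates = {"majority": majority_model, "1-NN": nearest_neighbor_model}
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # → 1-NN 1.0
```

This is not fitcauto()'s algorithm, only the shape of the search it automates; fitcauto() additionally proposes model types and hyperparameters via Bayesian optimization rather than exhaustively scoring a fixed list.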




Model Performance Comparison


- Table comparing the evaluation metrics of the top five machine learning models.




Data Websites
Tweets - https://www.kaggle.com/kazanova/sentiment140
Foreign Lexicons - https://www.kaggle.com/rtatman/sentiment-lexicons-for-81-languages
English Lexicons - https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
Pretrained Word Embeddings (Foreign) - https://fasttext.cc/docs/en/crawl-vectors.html
Pretrained Word Embeddings (English) - https://www.mathworks.com/matlabcentral/fileexchange/66229-text-analytics-toolbox-model-for-fasttext-english-16-billion-token-word-embedding

Thank You!

