Sentiment Analysis of Tweets and Financial News Headlines: Wyatt Steen
Project Overview
Goal:
- Retrieve recent tweets and financial news headlines that mention a keyword (in this case a stock ticker).
- Predict the sentiment of each word in the headline or tweet with machine learning.
- Calculate the sentiment of each headline or tweet by averaging the sentiment scores of the words it contains.
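The averaging step in the last goal can be sketched as follows. This is a minimal illustration, not the project's code: the `WORD_SCORES` table, the score range of -1 to 1, and the choice to skip unscored words are all assumptions made for the example. In the project, per-word scores would come from the trained word-level model.

```python
from statistics import mean

# Hypothetical per-word scores in [-1, 1], hand-written for illustration only.
WORD_SCORES = {"strong": 0.8, "growth": 0.6, "loss": -0.7, "weak": -0.6}


def text_sentiment(tokens, word_scores, default=None):
    """Average the scores of the tokens in one headline or tweet.

    Tokens without a score are skipped unless `default` is given, in which
    case they contribute `default`. Returns 0.0 if nothing can be scored.
    """
    scores = []
    for tok in tokens:
        if tok in word_scores:
            scores.append(word_scores[tok])
        elif default is not None:
            scores.append(default)
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    print(text_sentiment(["strong", "growth", "ahead"], WORD_SCORES))
    print(text_sentiment(["weak", "quarter", "loss"], WORD_SCORES))
```

Skipping unscored words keeps unknown tokens from pulling the average toward zero; passing `default=0.0` gives the opposite behavior if that is preferred.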
Datasets
Lexicons:
Word Embeddings:
Feature Extraction
- Convert each word in the processed text into word vectors.
- Word vectors are numeric vectors that represent the meaning of a word. Words can then be mapped according to their relationships with other words, and mathematical operations can be carried out on them.
Example:
- The conversion was done using pretrained word embedding models. Each word embedding model used has a vocabulary of one million or more words, each stored with its vector representation. For words outside that vocabulary, these pretrained models predict a vector representation from the word's relationships with the word vectors already in the model.
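To make "mathematical operations with words" concrete, here is a small sketch of vector arithmetic followed by a nearest-neighbor lookup using cosine similarity. The three-dimensional vectors in `VECTORS` are invented for illustration and are not taken from any pretrained model; real embeddings typically have hundreds of dimensions. The word analogy used is a commonly cited one and is not necessarily the example shown on the original poster.

```python
import math

# Toy vectors made up for this example (not from a pretrained model).
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "stock": [0.5, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(target, vectors, exclude=()):
    """Return the word whose vector is most similar to `target`."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


# king - man + woman, computed component by component
target = [k - m + w for k, m, w in
          zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])]
result = nearest(target, VECTORS, exclude={"king", "man", "woman"})

if __name__ == "__main__":
    print(result)
```

The input words are excluded from the lookup because the result of the arithmetic usually stays closest to one of its own operands.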
Preprocessing
- Tokenize the text
- Erase punctuation in the text
- Remove stop words (I, am, the, was, etc.)
- Convert text to lowercase
- Remove HTML tags and URLs.
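The preprocessing steps above could be sketched roughly as follows, using only the Python standard library. The stop-word list, the regular expressions, and the whitespace tokenizer are assumptions chosen for brevity, not the project's actual implementation. HTML tags and URLs are removed first in this sketch, since stripping punctuation beforehand would break them into fragments that are harder to detect.

```python
import re
import string

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"i", "am", "the", "was", "a", "an", "is", "are", "and", "of", "to", "in"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # simple URL pattern
TAG_RE = re.compile(r"<[^>]+>")                 # simple HTML tag pattern
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Clean one tweet or headline and return a list of tokens."""
    text = TAG_RE.sub(" ", text)          # remove HTML tags
    text = URL_RE.sub(" ", text)          # remove URLs
    text = text.lower()                   # convert to lowercase
    text = text.translate(PUNCT_TABLE)    # erase punctuation
    tokens = text.split()                 # tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]  # drop stop words


if __name__ == "__main__":
    print(preprocess("<b>AAPL</b> was UP today! See https://example.com/x"))
```

Note that erasing all punctuation also strips the `$` from cashtags such as `$AAPL`, so a ticker keyword would need to be matched in its bare form after this step.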
Models Investigated
- 150 machine learning models were investigated.
different parameters.
metrics.
Data Websites
Tweets - https://www.kaggle.com/kazanova/sentiment140
Foreign Lexicons - https://www.kaggle.com/rtatman/sentiment-lexicons-for-81-languages
English Lexicons - https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
Pretrained Word Embeddings (Foreign) - https://fasttext.cc/docs/en/crawl-vectors.html
Pretrained Word Embeddings (English) - https://www.mathworks.com/matlabcentral/fileexchange/66229-text-analytics-toolbox-model-for-fasttext-english-16-billion-token-word-embedding
Thank You!