
CHRIST (Deemed to be University)

YESHWANTHPUR CAMPUS,

BENGALURU, KARNATAKA

IV M.Sc. DATA SCIENCE

SentimentWatch: Tracking Stock Sentiments

Submitted By

Bhoomika Hingorani (2348426),

Anushka Biswas (2348410),

Jayden Dsouza (2348428),

Keerthi TN (2348434),

Sakshi Purswani (2348453)

Submitted to

Dr. Rashmi Siddalingappa


SOFTWARE REQUIREMENT SPECIFICATION
REQUIREMENT ANALYSIS:

Objective:

The main objective of this project is to develop a sentiment analysis and stock search engine application. The
application will analyze financial news and social media sentiment to predict stock market movements,
providing real-time data to investors and analysts.

Problem Definition:

In the fast-paced world of stock trading, timely and accurate information is crucial. Financial news and social
media are rich sources of information that influence investor sentiment and, consequently, stock prices.
However, the sheer volume of data makes it challenging to manually process and analyze these sources for
actionable insights. This project aims to automate sentiment analysis to assist investors in making informed
decisions based on real-time sentiment data.

1. Why:

The stock market is highly volatile and influenced by numerous factors, including public sentiment.
Automating sentiment analysis can provide a significant advantage by offering timely insights and
predictions, which manual analysis cannot achieve due to the sheer volume and speed of data
generation.

2. What:

The project will create a web-based application that collects financial news and social media data,
processes it using sentiment analysis techniques, and integrates this data with real-time and historical
stock market information. The application will provide users with a search engine to query sentiment
scores and stock data.

3. Beneficiaries:

● Investors: Gain insights into market sentiment and make informed trading decisions.

● Financial Analysts: Enhance analysis with sentiment data.

● Trading Firms: Improve trading strategies and algorithms.

● Academic Researchers: Utilize the tool for market behavior and sentiment analysis research.
Project Study:

Existing System:

Currently, several platforms and tools provide stock market data, and some offer sentiment analysis of
financial news or social media. However, these systems often lack real-time integration and a unified
interface for both sentiment analysis and stock market data.

Limitations of Existing System:

● Lack of Real-Time Data: Many systems do not offer real-time sentiment analysis.
● Separate Interfaces: Users often need to use different platforms for sentiment analysis and stock data.
● Limited Accuracy: Some existing tools do not leverage advanced machine learning techniques for
sentiment analysis.

Proposed System:

The proposed system will:

● Collect and preprocess real-time data from financial news sources and social media.
● Perform sentiment analysis using advanced NLP techniques.
● Integrate sentiment data with historical and real-time stock market data.
● Provide a user-friendly web interface for searching and visualizing data.

Benefits of the Proposed System:

● Real-Time Analysis: Offers up-to-date sentiment analysis.


● Unified Interface: Combines sentiment analysis and stock market data in one platform.
● Advanced Techniques: Utilizes state-of-the-art NLP models for improved accuracy.

LITERATURE SURVEY:

1. Year of Implementation: 2023
Author/Company: Nitish Garg, Mayukh, Shruti, Riya, Himank
[https://www.slideshare.net/slideshow/stock-price-prediction-using-sentiment-analysis/264932403]
Techniques/Algorithm:
● Long Short-Term Memory:
○ Can learn and predict long sequences.
○ Stores information over a long period.
● Natural Language Processing:
○ Translates text from one language to another, responds to voice commands, and quickly summarizes large
amounts of data in real time.
Gap or Drawback:
● Need to enhance user reliability and experience by improving the GUI.
● Need to include other variables that influence stock market forecasting.

2. Year of Implementation: 2023
Author/Company: Ward de Lange
[https://essay.utwente.nl/96019/1/de%20Lange_BA_EEMCS.pdf]
Techniques/Algorithm:
● Convolutional Neural Network:
○ Used for stock prediction.
○ Can analyze images, so stock price graphs can be used as input.
● Support Vector Machine:
○ Used for classification of non-linear data.
○ Maps every data point into a vector space and uses that representation to classify the points into groups.
● Long Short-Term Memory:
○ Memory cells in LSTM are designed to remember strong influences from the past, which is crucial for
predicting stock market movements where some information has long-term effects while other information
is short-lived.
○ The ability to use textual data as input, along with its memory cell structure, enables the capture of
complex patterns and dependencies in stock market data, enhancing prediction accuracy.
● Random Forest:
○ Employs multiple decision trees with randomized features for each decision node.
○ Bagging generates a separate sample of the training data, drawn with replacement, for each tree.
○ Each tree is then given the same input, and the outputs of all the trees are combined to generate the
final output.
Gap or Drawback:
● Understanding the impact of combining multiple algorithms, which may outperform traditional ones.

3. Year of Implementation: 2023
Author/Company: Kalyani Joshi, Bharathi H. N., Jyothi M. Rao
[https://arxiv.org/pdf/1607.01958]
Techniques/Algorithm:
● Three machine learning models (Support Vector Machine, Random Forest, and Naive Bayes) were built to
classify the polarity of financial news articles as positive or negative.
● Sentiment Detection Algorithm:
○ Use the Bag of Words technique for text mining.
○ Build the polarity dictionary using lists of positive words and negative words.
○ Match the article's words against both word lists, count how many words appear in each dictionary, and
calculate the score of that document.
Gap or Drawback:
● For companies where financial news is scarce, Twitter data will be used for similar analysis.

4. Year of Implementation: 2023
Author/Company: I. A. Amaunam, J. Iworiso, I. O. Olawale
[https://pubs.sciepub.com/ijdeaor/4/1/1/]
Techniques/Algorithm:
● Support Vector Machine:
○ Classifies text data into positive, negative, or neutral sentiment categories.
○ Predicts stock market movements by conducting sentiment analysis of financial news articles.
○ Uses a mix of textual and numerical features to boost the accuracy of the model's forecast.
● Recurrent Neural Networks (RNNs):
○ Well suited for sentiment analysis tasks due to their ability to process the sequential nature of
textual data.
○ By analyzing input sequences element by element, RNNs can grasp the context of words and phrases,
associating them with positive or negative sentiment.
● BERT:
○ Generates highly accurate representations of textual data.
○ Employs self-attention mechanisms to process input sequences simultaneously rather than sequentially,
as in RNNs or LSTMs.
○ Can be fine-tuned to classify text by its underlying sentiment, such as positive, negative, or neutral.
Gap or Drawback:
● Complexity in data integration.
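The dictionary-based sentiment detection summarized in the third survey entry can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the short positive and negative word lists are hypothetical placeholders for a full finance lexicon, the score formula (positive count - negative count) / (positive count + negative count) is an assumption, and the neutral label is added to match the three classes used in this project.

```python
import re
from collections import Counter

# Hypothetical polarity dictionary; a real system would load a full
# finance-specific lexicon instead of these placeholder words.
POSITIVE_WORDS = {"gain", "growth", "profit", "beat", "surge", "upgrade"}
NEGATIVE_WORDS = {"loss", "decline", "miss", "fall", "downgrade", "lawsuit"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each alphabetic token (bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def polarity_score(text: str) -> float:
    """Score in [-1, 1]: (positive - negative) / (positive + negative).

    Returns 0.0 when the text contains no dictionary words.
    """
    bag = bag_of_words(text)
    pos = sum(count for word, count in bag.items() if word in POSITIVE_WORDS)
    neg = sum(count for word, count in bag.items() if word in NEGATIVE_WORDS)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def polarity_label(text: str) -> str:
    """Map the numeric score to a positive / negative / neutral label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity_score("Quarterly profit and revenue growth beat estimates"))  # → 1.0
print(polarity_label("Shares fall after earnings miss and analyst downgrade"))  # → negative
```

Normalizing by the number of matched words keeps long and short articles on the same scale.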

Specialization Concepts (to be implemented):

● NLP: Natural Language Processing techniques for text preprocessing and sentiment analysis.
● Machine Learning: Algorithms for predicting stock market trends based on sentiment data.
● Web Scraping: Collecting data from financial news websites and social media.
● Real-Time Processing: Handling and analyzing data in real time to provide timely insights.

REQUIREMENTS SPECIFICATION:

Functional Requirements:

1. Data Collection:

Task: Collect data from financial news websites, social media platforms, and stock market databases.
Input: URLs of financial news websites, social media handles or hashtags, stock market APIs.
Process:
● Web scraping and API calls to collect the latest financial news articles and social media posts.
● Storing collected data in a database.
Output: A database containing structured data from news articles and social media posts.
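A minimal sketch of the collect-and-store step, using only the Python standard library so it runs offline: `html.parser.HTMLParser` stands in for Beautiful Soup or Scrapy, an in-memory SQLite database stands in for PostgreSQL, and the `<h2 class="headline">` markup, the `articles` table, and the sample page are all hypothetical. In the real pipeline the HTML would first be downloaded from each configured news URL.

```python
import sqlite3
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2 class="headline"> element.

    The tag and class name are hypothetical; each real news site needs
    its own selectors.
    """

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "headline") in attrs:
            self._in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline:
            self.headlines[-1] += data


def store_headlines(conn, source_url, html):
    """Parse headlines from an HTML page and insert them into the database."""
    parser = HeadlineParser()
    parser.feed(html)
    rows = [(source_url, h.strip()) for h in parser.headlines if h.strip()]
    conn.executemany("INSERT INTO articles (source_url, headline) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# In-memory SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, source_url TEXT, headline TEXT)")

sample_html = """
<html><body>
  <h2 class="headline">Tech stocks rally on strong earnings</h2>
  <h2 class="headline">Oil prices slip as supply rises</h2>
  <h2 class="sidebar">Subscribe now</h2>
</body></html>
"""
print(store_headlines(conn, "https://example.com/markets", sample_html))  # → 2
```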

2. Data Preprocessing:

Task: Clean and preprocess the collected data for sentiment analysis.
Input: Raw data from the data collection phase.
Process:
● Tokenization, stop-word removal, lemmatization, and normalization of text data.
● Filtering out irrelevant data and handling missing values.
Output: Cleaned and preprocessed text data ready for sentiment analysis.
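The preprocessing steps above can be sketched as one function per record plus a corpus-level filter for missing values and duplicates. The stop-word list and abbreviation map are small hypothetical examples; the project would take stop-words from NLTK or spaCy and apply their lemmatizers, which this standard-library sketch omits.

```python
import re

# Hypothetical stop-word list and abbreviation map for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "is", "are", "after"}
ABBREVIATIONS = {"q1": "first quarter", "eps": "earnings per share", "yoy": "year over year"}


def preprocess(text):
    """Normalize, tokenize, and remove stop-words from one text record.

    Returns None for missing or blank input so callers can filter it out.
    """
    if not text or not text.strip():
        return None
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop special characters
    tokens = []
    for token in text.split():
        # Expand known abbreviations into their full words.
        tokens.extend(ABBREVIATIONS.get(token, token).split())
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess_corpus(records):
    """Preprocess many records, skipping missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        tokens = preprocess(record)
        if tokens is None:
            continue
        key = tuple(tokens)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(tokens)
    return cleaned


print(preprocess("$AAPL Q1 EPS beats estimates! https://t.co/xyz"))
# → ['aapl', 'first', 'quarter', 'earnings', 'per', 'share', 'beats', 'estimates']
```

Deduplicating after normalization means that "Stocks rise" and "stocks RISE!" count as the same post.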

3. Sentiment Analysis:

Task: Perform sentiment analysis on the preprocessed data.


Input: Preprocessed text data.
Process:
● Use machine learning models like BERT or GPT to analyze sentiment.
● Classify the sentiment as positive, negative, or neutral.
Output: Sentiment scores for each piece of text data.
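One way to wire a BERT-style classifier into this step is the Hugging Face Transformers `pipeline` API, which returns a `label` and a confidence `score` for each input. In the sketch below, the model name `ProsusAI/finbert` (a BERT model further trained on financial text) is an assumption rather than a project decision, and the signed-score convention (positive = +score, negative = -score, neutral = 0) is a hypothetical choice. The classifier is injectable, so the mapping logic can be tested without downloading a model.

```python
def score_texts(texts, classifier=None):
    """Return a (label, signed_score) pair for each input text.

    `classifier` can be any callable that, like a Transformers pipeline,
    maps a list of strings to a list of {"label": ..., "score": ...} dicts.
    """
    if classifier is None:
        # Assumed model choice; requires `pip install transformers torch`.
        from transformers import pipeline
        classifier = pipeline("text-classification", model="ProsusAI/finbert")
    results = []
    for output in classifier(list(texts)):
        label = output["label"].lower()
        # Hypothetical convention: positive -> +score, negative -> -score, neutral -> 0.
        if label == "positive":
            signed = output["score"]
        elif label == "negative":
            signed = -output["score"]
        else:
            signed = 0.0
        results.append((label, signed))
    return results
```

With `transformers` installed, `score_texts(["Apple beats earnings estimates"])` downloads the model on first use and returns one `(label, signed_score)` pair per headline.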

4. Predictive Analytics:

Task: Predict stock price movements based on sentiment analysis and other features.
Input: Sentiment scores, historical stock prices, trading volumes, and macroeconomic indicators.
Process:
● Feature engineering to create input features for the predictive model.
● Training machine learning models (e.g., LSTM, Random Forest) to predict stock price
movements.
Output: Predicted stock price movements.
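Before a model such as a Random Forest can be trained, sentiment and prices have to be turned into feature rows. A minimal sketch, assuming one closing price and one mean sentiment score per trading day: each row uses only information available up to day t, so the label (whether the next day's close is higher) does not leak into the features. The lag count and the binary up/down target are illustrative assumptions; trading volume and macroeconomic indicators would be appended to the same feature list.

```python
def build_features(closes, sentiment, lags=2):
    """Build (features, label) rows for next-day direction prediction.

    closes:    daily closing prices, oldest first
    sentiment: mean sentiment score per day, aligned with `closes`
    Features for day t: the last `lags` daily returns ending at day t and
    the last `lags` sentiment scores up to day t.
    Label: 1 if the close rises from day t to day t + 1, else 0.
    """
    if len(closes) != len(sentiment):
        raise ValueError("closes and sentiment must have the same length")
    # returns[i] is the return from day i to day i + 1
    returns = [(closes[i + 1] - closes[i]) / closes[i] for i in range(len(closes) - 1)]
    rows = []
    for t in range(lags, len(closes) - 1):
        past_returns = returns[t - lags:t]              # ends at day t
        past_sentiment = sentiment[t - lags + 1:t + 1]  # includes day t
        label = 1 if closes[t + 1] > closes[t] else 0
        rows.append((past_returns + past_sentiment, label))
    return rows
```

The resulting `X = [f for f, _ in rows]` and `y = [label for _, label in rows]` can be passed to scikit-learn's `RandomForestClassifier().fit(X, y)`; an LSTM would instead consume the same lagged values as sequences.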

5. Real-Time Alerts:

Task: Provide real-time alerts based on significant sentiment changes.


Input: Real-time sentiment analysis results.
Process:
● Monitor sentiment changes continuously.
● Set thresholds for triggering alerts.
Output: Real-time alerts to users about significant sentiment changes.
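A threshold-based alert can be sketched by comparing two consecutive rolling windows of sentiment scores. The window size, threshold, and message format below are hypothetical defaults that would be tuned on historical data.

```python
from collections import deque


class SentimentAlerter:
    """Alert when rolling mean sentiment shifts by at least `threshold`.

    Compares the mean of the latest `window` scores with the mean of the
    `window` scores before them. Defaults are hypothetical.
    """

    def __init__(self, window=5, threshold=0.4):
        self.window = window
        self.threshold = threshold
        # Keep only the two most recent windows of scores.
        self.scores = deque(maxlen=2 * window)

    def update(self, score):
        """Add one new sentiment score; return an alert message or None."""
        self.scores.append(score)
        if len(self.scores) < 2 * self.window:
            return None  # not enough history yet
        values = list(self.scores)
        previous = sum(values[: self.window]) / self.window
        current = sum(values[self.window :]) / self.window
        change = current - previous
        if abs(change) >= self.threshold:
            direction = "up" if change > 0 else "down"
            return f"Sentiment moved {direction} by {abs(change):.2f}"
        return None
```

Each new score from the sentiment stream would be passed to `update`; any non-`None` return value would be forwarded to subscribed users.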

6. Interactive Dashboard:

Task: Develop an interactive dashboard to visualize the data and predictions.


Input: Sentiment analysis results, predicted stock prices, user preferences.
Process:
● Design and implement interactive visualizations (e.g., sentiment heatmaps, stock price charts).
● Integrate the dashboard with the backend to fetch real-time data.
Output: A user-friendly dashboard providing insights and visualizations.

7. User Management:

Task: Manage user accounts and preferences.


Input: User registration details, login credentials, user preferences.
Process:
● Implement user authentication and authorization.
● Store and manage user preferences.
Output: Secure user accounts and personalized dashboard settings.
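For the authentication step, passwords should be stored only as salted, deliberately slow hashes. A minimal standard-library sketch using PBKDF2-HMAC-SHA256 follows; the iteration count is an assumed value to tune for the server. If Django is chosen as the backend framework, its built-in authentication system already hashes passwords with PBKDF2 by default, so this would only be needed outside Django.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed PBKDF2 work factor; tune for the deployment server


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store; the plain password is never stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_hash)
```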

8. Testing and Validation:

Task: Test the system to ensure accuracy and reliability.


Input: System components and user feedback.
Process:
● Unit testing, integration testing, and user acceptance testing.
● Collect and analyze user feedback for improvements.
Output: A validated and reliable system.

Non-Functional Requirements:

● Scalability: The system must handle large data volumes and user load.
● Performance: Real-time processing with minimal latency.
● Security: Secure data handling and access control.
● Reusability: Modular components for easy updates and integration.
● Modifiability: Flexible architecture for future changes.

System Requirement:

● Software:
○ OS: Windows, macOS, Linux.
○ Languages: Python, JavaScript.
○ Frameworks: Streamlit, React.js, Node.js, Django, TensorFlow, PyTorch.
○ Databases: PostgreSQL, MongoDB.
○ Tools: Git, Docker, Jupyter Notebook.
● Hardware:
○ Server: Multi-core CPU, 32 GB RAM, 1 TB SSD, high-speed internet.
○ Client: Multi-core CPU, 8 GB RAM, 256 GB SSD, Full HD display.

SYSTEM MODELS:

Abstract Description: The system consists of modules for data collection, preprocessing, sentiment analysis,
data integration, user interface, and notifications. These modules interact to provide real-time sentiment
analysis and stock data to users.

Block Diagrams:

1. Overall System:

2. Module Descriptions:

● Data Collection: Gathers data from financial news and social media using web scraping and
APIs.
● Data Preprocessing: Cleans and prepares text data for analysis.
● Sentiment Analysis: Uses NLP models to analyze and score sentiment.
● Data Integration: Aligns and combines sentiment scores with stock market data.
● User Interface: Provides a search engine and visualizations for users.
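The Data Integration module's alignment step can be sketched as averaging timestamped sentiment scores per calendar day and joining them to daily closing prices by date. This is a simplified assumption: a production version would also convert time zones and map posts published after market close to the next trading day.

```python
from collections import defaultdict
from datetime import datetime


def daily_sentiment(scored_posts):
    """Average timestamped sentiment scores into one value per calendar day.

    scored_posts: iterable of (ISO-8601 timestamp string, score) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, score in scored_posts:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        buckets[day].append(score)
    return {day: sum(s) / len(s) for day, s in buckets.items()}


def integrate(prices, scored_posts):
    """Join daily closing prices with daily mean sentiment on the date key.

    prices: dict mapping 'YYYY-MM-DD' to closing price.
    Days without posts keep sentiment None instead of being dropped.
    """
    sentiment = daily_sentiment(scored_posts)
    return [
        {"date": day, "close": prices[day], "sentiment": sentiment.get(day)}
        for day in sorted(prices)
    ]
```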

ROLES AND RESPONSIBILITIES:

Project Manager and DevOps Engineer:

● Coordinate project activities and timelines.


● Manage communication and stakeholder interactions.
● Oversee budget and resource allocation.
● Set up and maintain CI/CD pipelines.
● Manage deployments and scalability.
● Monitor system performance.

Data Scientist:

● Develop and validate sentiment analysis models.


● Conduct exploratory data analysis.
● Ensure the quality and accuracy of the data.
● Perform feature engineering for predictive models.

Frontend Developer and UX/UI Designer:

● Design and develop the user interface.


● Implement user authentication and management.
● Ensure responsive and user-friendly design.
● Conduct user research for feedback.
● Create mockups and prototypes.

Backend Developer and Full Stack Developer:

● Develop server-side logic and APIs.


● Integrate data pipelines with the backend.
● Ensure data security and database management.
● Work on both frontend and backend tasks.
● Integrate sentiment models with the UI.
● Maintain system architecture.

Quality Assurance (QA) Engineer and Data Engineer:

● Create and execute test plans.


● Identify and document bugs.
● Verify fixes and updates.
● Set up and manage data infrastructure.
● Collect and preprocess data through web scraping and APIs.
● Ensure efficient data pipelines.
● Support data preprocessing efforts.
PLANNING:

Timeline:

Week 1: Project Planning and Data Collection


Milestone: Finalize project plan and gather all necessary data sources.

Activities:

● Define project objectives and deliverables.

● Identify and collect historical stock price data.


● Identify and collect sentiment data from news articles, social media posts, and financial
reports.
● Set up project management tools and schedule regular team meetings.

Week 2: Data Preprocessing

Milestone: Complete data cleaning and preprocessing.

Activities:

● Clean and preprocess stock price data (handling missing values, outliers, etc.).
● Clean and preprocess sentiment data (tokenization, stop-word removal, normalization).
● Perform exploratory data analysis (EDA) to understand data distributions and correlations.
● Prepare data for sentiment analysis.
Week 3: Sentiment Analysis Implementation

Milestone: Implement and validate sentiment analysis models.

Activities:

● Choose appropriate sentiment analysis techniques (lexicon-based, machine learning-based, or
deep learning-based).
● Train sentiment analysis models on preprocessed data.
● Validate models using cross-validation and other techniques.
● Fine-tune models based on validation results.
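Cross-validation for the sentiment classifiers can be sketched as a k-fold index generator; scikit-learn's `KFold` provides the same behavior. The fold count and seed are arbitrary. Shuffled folds suit independent text samples, but for the stock price models they would let future data leak into training, so an ordered scheme such as scikit-learn's `TimeSeriesSplit` is the safer choice there.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_indices, validation_indices) for k-fold cross-validation.

    Samples are shuffled once with a fixed seed so folds are reproducible;
    each sample appears in exactly one validation fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```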

Week 4: Correlation Analysis

Milestone: Analyze the correlation between sentiment scores and stock price movements.

Activities:

● Compute sentiment scores for historical data.


● Analyze the relationship between sentiment scores and stock prices using statistical methods.
● Identify significant patterns and trends.
● Document findings and prepare initial insights.
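A common starting point for the statistical analysis is the Pearson correlation between daily sentiment and returns, optionally with a lag so that sentiment on day t is compared with the return on day t + 1. A dependency-free sketch follows; `statistics.correlation` (Python 3.10+) or `scipy.stats.pearsonr`, which also reports a p-value, could replace the hand-written function.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with the return on day t + lag."""
    if lag == 0:
        return pearson(sentiment, returns)
    return pearson(sentiment[:-lag], returns[lag:])
```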

Week 5: Search Engine Development

Milestone: Develop a functional prototype of the search engine.

Activities:

● Design the search engine architecture and choose appropriate technologies.


● Implement backend components (data processing, database integration, API development).
● Implement basic frontend components (search interface, results display).
● Integrate sentiment analysis results into the search engine.
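At its core, the search engine can be an inverted index that maps terms (including ticker symbols) to scored documents and returns the matching items together with their mean sentiment. The class below is a hypothetical in-memory sketch of that idea; a deployed version would more likely rely on PostgreSQL full-text search or a dedicated engine such as Elasticsearch.

```python
import re
from collections import defaultdict


class SentimentSearchIndex:
    """A tiny inverted index mapping terms to scored documents (hypothetical)."""

    def __init__(self):
        self.documents = []            # (text, ticker, sentiment score)
        self.index = defaultdict(set)  # term -> set of document ids

    @staticmethod
    def _terms(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, text, ticker, score):
        doc_id = len(self.documents)
        self.documents.append((text, ticker, score))
        # Index every word in the text plus the ticker symbol itself.
        for term in self._terms(text) | {ticker.lower()}:
            self.index[term].add(doc_id)

    def search(self, query):
        """Return documents containing every query term, plus their mean sentiment."""
        terms = self._terms(query)
        if not terms:
            return [], None
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        hits = [self.documents[i] for i in sorted(ids)]
        mean = sum(h[2] for h in hits) / len(hits) if hits else None
        return hits, mean
```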

Week 6: User Interface and Visualization

Milestone: Develop and refine the user interface and visualizations.

Activities:

● Design intuitive and visually appealing user interfaces.


● Implement interactive visualizations for sentiment trends and stock analysis.
● Conduct user testing and gather feedback.
● Refine the user interface and visualizations based on feedback.

Week 7: Testing, Deployment, and Documentation

Milestone: Complete testing, deploy the project, and finalize documentation.


Activities:

● Perform comprehensive testing (unit testing, integration testing, user acceptance testing).
● Fix bugs and optimize performance.
● Deploy the search engine to a web server or cloud platform.
● Prepare detailed project documentation (methodology, code documentation, user guides).
● Present the project and its outcomes.

Budget:

Estimated Costs for Development, Deployment, and Maintenance:

● Development:

Free Resources: Open-source libraries (NLTK, spaCy, Pandas), free APIs (Alpha Vantage, IEX
Cloud), development tools (VS Code, GitHub).

Necessary Costs: Paid APIs for extended data access (if needed), additional storage or computational
power (minimal budget allocation).

● Deployment:

Free Resources: Heroku free tier, GitHub Pages for documentation.

Necessary Costs: AWS for scalable deployment, and domain name registration (minimal budget
allocation).

● Maintenance:

Free Resources: Monitoring tools (Prometheus, Grafana free tiers), bug tracking (GitHub Issues).

Necessary Costs: Minimal budget for unexpected maintenance or upgrades.

Strategies:

Agile Development, Continuous Integration, Regular Updates:

Agile Development: The team will follow agile methodologies, with iterative development cycles, regular
stand-up meetings, and sprint reviews to ensure continuous progress and flexibility.

Continuous Integration: Implement CI/CD pipelines using GitHub Actions to automate testing and
deployment, ensuring code quality and rapid iteration.

Regular Updates: Schedule regular updates and maintenance windows to address user feedback, fix bugs, and
add new features.
DATASET DESCRIPTION:

Scope:

● Determine the availability and reliability of data sources.


● Assess the compatibility of selected tools with the project requirements.
● Evaluate the clarity and completeness of the collected data.

Data Sources:

● Financial News Websites: Major news outlets like Bloomberg, Reuters, and financial sections of
newspapers.
● Social Media Platforms: Twitter, Reddit (specifically subreddits related to stock trading), and
financial blogs.
● Stock Market Databases: APIs from sources like Alpha Vantage, IEX Cloud, and Yahoo Finance.
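As an example of the stock market APIs, Alpha Vantage's `TIME_SERIES_DAILY` function returns JSON in which daily bars sit under the key "Time Series (Daily)" and the closing price under "4. close". The sketch below builds the request URL and parses such a response; the API key is a placeholder, and the network request itself is left out so the parser can be tested offline.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def daily_series_url(symbol, api_key):
    """Build the request URL for Alpha Vantage's TIME_SERIES_DAILY function."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_daily_closes(payload):
    """Map 'YYYY-MM-DD' -> closing price from a TIME_SERIES_DAILY JSON response."""
    data = json.loads(payload)
    if "Time Series (Daily)" not in data:
        # Error and rate-limit responses carry a message instead of price data.
        raise ValueError(f"unexpected response keys: {list(data)}")
    series = data["Time Series (Daily)"]
    return {day: float(bar["4. close"]) for day, bar in sorted(series.items())}
```

In practice the payload would come from `urllib.request.urlopen(daily_series_url("IBM", api_key)).read()`.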

Tools Requirement:

● Web Scraping: Beautiful Soup, Scrapy


● API Access: Tweepy (for Twitter), Alpha Vantage, IEX Cloud
● Data Storage: PostgreSQL (for structured data), MongoDB (for unstructured data)
● Data Preprocessing: NLTK, spaCy, Pandas
● Sentiment Analysis: Hugging Face Transformers, TensorFlow, PyTorch
● Predictive Analytics: Scikit-learn, XGBoost, LightGBM
● Visualization: D3.js, Plotly
● Frontend Development: React, Angular
● Deployment: AWS, Heroku

Data Clarity and Understanding - Relevance and Accuracy:

The data will be collected from reputable financial news websites, social media platforms, and stock market
databases to ensure relevance and accuracy. This helps ensure that the data accurately reflects current market
conditions and sentiment.

Quality and Completeness:

The dataset will include all necessary information, such as stock prices, news headlines, social media posts,
and sentiment scores. Any irrelevant or duplicate entries will be removed to maintain high data quality.

Standardization and Preprocessing:

Text data will be standardized and preprocessed, including converting to lowercase, removing special
characters, and handling abbreviations. Tokenization, stop-word removal, and lemmatization will be applied
to ensure consistency in sentiment analysis.
Sentiment Assignment:

Random sentiment labels (Positive, Negative, Neutral) will be assigned to each generated data entry. This
will help in testing the sentiment analysis models under different conditions.
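Random label assignment for the generated entries can be made reproducible with a seeded generator, as in this minimal sketch; the entry fields are hypothetical.

```python
import random

LABELS = ("Positive", "Negative", "Neutral")


def assign_random_labels(entries, seed=0):
    """Attach a uniformly random sentiment label to each generated entry.

    A fixed seed keeps the simulated dataset reproducible across runs;
    the input dicts are copied, not modified.
    """
    rng = random.Random(seed)
    return [{**entry, "sentiment": rng.choice(LABELS)} for entry in entries]
```

Because these labels are random, a model trained on them should perform near chance (about one in three for three roughly balanced classes); they are useful for exercising the pipeline, not for measuring real accuracy.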

Annotation and Validation:

Each data entry will be assigned sentiment labels (Positive, Negative, Neutral) based on predefined criteria.
Multiple annotators will cross-validate these labels to ensure consistency and reliability.
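The document does not name an agreement measure; one common choice for checking consistency between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch follows (scikit-learn's `cohen_kappa_score` computes the same statistic, and Fleiss' kappa generalizes it to more than two annotators).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa agreement between two annotators' label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```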

Exploratory Data Analysis (EDA) and Visualization:

Exploratory data analysis will be conducted to understand the distribution, trends, and patterns in the data.
Visualization tools, such as histograms and word clouds, will be used to present key insights clearly and
effectively.
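The inputs for a label histogram and a word cloud can be computed with `collections.Counter`, as in this minimal sketch; the resulting frequencies could be plotted with Plotly or passed to the `wordcloud` package's `WordCloud.generate_from_frequencies`.

```python
import re
from collections import Counter


def label_distribution(labels):
    """Share of each sentiment label, for a histogram or bar chart."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


def top_words(texts, n=10, stop_words=frozenset()):
    """Most frequent words across texts, the usual input to a word cloud."""
    words = Counter()
    for text in texts:
        words.update(w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words)
    return words.most_common(n)
```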

PILOT DATA:

Objective: To evaluate the feasibility of collecting, processing, and analyzing data for the project.

Data simulated for reference:

(Refer to attached CSV file)
