
INTRODUCTION TO THE STOCK RECOMMENDATION SYSTEM

The main motivation for this research is that making a steady profit from the stock market is
difficult, especially given how nonlinearly stock prices move. Most users are unsure which stock
to buy or when to enter or exit a trade. r/WallStreetBets amply demonstrated the impact that
social media can have on stock prices, at times outpacing stock regulators. We believe that many
people underestimate the intelligence of these self-taught or social media traders. Most people
know that stock prices are hard to anticipate and that business is often seasonal, with holidays,
quarterly earnings reports, and fourth-quarter sales all affecting stock prices. The subject is
especially interesting because of how the subreddit r/WallStreetBets has affected the stock
markets and raised significant concerns about opportunity and fairness in our financial system.
Recommendation is harder still because user preferences vary widely and there are many options.
Our goal is to create a financial recommendation system that advises users whether to buy, sell,
or hold a stock based on the messages and information we have gathered. This will make it
easier for investors to choose a strong company to invest in.
DATA COLLECTION
We accessed the Reddit API through the PRAW library, which packages the information needed to
make clean, well-formed requests to the API. We gathered about 3000 Reddit posts from the
r/WallStreetBets forum. With PRAW it is simple to select a subreddit and fetch its "top" or
"hot" posts: top posts have the most upvotes, and hot posts are those quickly gaining
popularity. There are also "best" posts, which have the most upvotes and the fewest downvotes.
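The collection step described above might look roughly like the following. This is a minimal sketch, not the project's actual script: the helper names `make_subreddit` and `fetch_posts`, the choice of fields to keep, and the credential parameters are assumptions made for illustration.

```python
def make_subreddit(client_id, client_secret, user_agent, name="wallstreetbets"):
    """Build a PRAW subreddit handle (credentials come from a Reddit app you register)."""
    import praw  # imported lazily so the rest of the sketch runs without PRAW installed

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    return reddit.subreddit(name)


def fetch_posts(subreddit, listing="top", limit=3000):
    """Collect posts from the 'top' or 'hot' listing as plain dictionaries."""
    if listing not in ("top", "hot"):
        raise ValueError("listing must be 'top' or 'hot'")
    fetch = getattr(subreddit, listing)  # subreddit.top or subreddit.hot
    return [
        {
            "id": post.id,
            "title": post.title,
            "text": post.selftext,  # body of a text post; empty for link posts
            "score": post.score,
        }
        for post in fetch(limit=limit)
    ]
```

A call such as `fetch_posts(make_subreddit(my_id, my_secret, "stock-recommender"), "hot", 500)` would return a list that can be written to disk before the ticker-extraction step.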
We used the regex package in Python to extract stock tickers from the Reddit data, looking for
any capitalised three-character sequence. After scanning every message, we compiled a list of
candidate stocks. Our second data source is the Yahoo Finance API. Yahoo Finance is a media
outlet that offers financial news, stock quotes, press releases, and financial reports. We use
its API to verify the tickers we have extracted: when we find a candidate such as GME, we call
the Yahoo Finance API; if we receive a response, we treat the extraction as successful, and if
we receive an error, we treat it as unsuccessful. The API also exposes many financial details
about the company, including stock prices, corporate information, short positions in the stock,
and other details.
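The extract-then-verify loop described above can be sketched as follows. The function names are hypothetical, and the `yfinance_lookup` helper assumes the third-party yfinance package as one possible way to reach Yahoo Finance data; the source does not say which client was actually used.

```python
import re

# Any run of exactly three capital letters bounded by non-word characters.
TICKER_PATTERN = re.compile(r"\b[A-Z]{3}\b")


def extract_candidates(text):
    """Return the unique three-letter uppercase sequences found in a message."""
    return sorted(set(TICKER_PATTERN.findall(text)))


def validate(candidates, lookup):
    """Keep only candidates for which the lookup answers; an error means 'not a stock'."""
    valid = []
    for symbol in candidates:
        try:
            if lookup(symbol):
                valid.append(symbol)
        except Exception:
            continue  # treat any lookup error as a failed extraction
    return valid


def yfinance_lookup(symbol):
    """Assumed lookup: a symbol counts as real if recent price history is non-empty."""
    import yfinance as yf  # third-party package, imported lazily

    return not yf.Ticker(symbol).history(period="5d").empty
```

Passing the lookup in as a parameter keeps the regex step testable offline. The pattern also matches ordinary words written in capitals, such as "THE", which is why the verification call matters.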
KEY IDEA
The basic idea of this system is to use the stocks we retrieved to establish baselines for
comparison. When a user inputs a stock, we respond with one of three kinds of suggestion (buy,
sell, or hold) based on similarity scores and groupings of the stock data. As one measure to
help assess a stock and make a recommendation, we use clustering to organize our data into
groups. The top 5 stocks with comparable pricing and performance are then returned using
similarity scoring
techniques such as cosine similarity. Finally, to ensure that our groups of suggestions are
stable and accurate, we verify them using a NetworkX graph.
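As an illustration of the similarity step, the sketch below ranks stocks by cosine similarity to a query stock and returns the closest matches. It assumes each stock is already represented as a numeric feature vector. The tickers and numbers in the example are made up, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_similar(query, features, n=5):
    """Return the n (ticker, score) pairs most similar to the query ticker."""
    scores = [
        (ticker, cosine_similarity(features[query], vector))
        for ticker, vector in features.items()
        if ticker != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up vectors, e.g. (scaled average price, scaled revenue); not real data.
example_features = {
    "AAA": [0.9, 0.2],
    "BBB": [0.8, 0.3],
    "CCC": [0.1, 0.9],
}
closest = top_similar("AAA", example_features, n=2)
```

Cosine similarity compares direction rather than magnitude, so features measured on very different scales are usually standardized before scoring.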
CLUSTERING
We wanted to analyze how our equities clustered in order to provide solid recommendations.
To ensure that our suggestions were sound and reliable, we needed to base them on several
metrics. The financial results data, including reported revenue and the stock price's 12-month
average, are for 2020. We applied three distinct clustering methods to the dataset. We chose
the best epsilon value using an elbow curve, a technique that can also be used to determine the
ideal number of clusters for k-means. We found that the data had three clusters, plus outliers
that required separate analysis.
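The source does not name the three clustering methods, but an epsilon parameter suggests a density-based method such as DBSCAN. Under that assumption, one common way to draw the elbow curve is to sort each point's distance to its k-th nearest neighbour and look for the sharpest jump. The sketch below uses a crude largest-jump heuristic in place of reading the elbow off a plot, and all names are hypothetical.

```python
import math


def k_distance_curve(points, k=2):
    """Each point's distance to its k-th nearest neighbour, sorted ascending."""
    curve = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(dists[k - 1])
    return sorted(curve)


def elbow_index(curve):
    """Index just before the largest jump in the curve (a rough elbow estimate)."""
    jumps = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]
    return jumps.index(max(jumps))


def suggest_epsilon(points, k=2):
    """Candidate epsilon: the k-distance at the elbow of the sorted curve."""
    curve = k_distance_curve(points, k)
    return curve[elbow_index(curve)]
```

Points whose k-distance lies beyond the elbow are the ones a density-based method would tend to label as outliers, which matches the observation that some stocks needed separate analysis.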
CONCLUSION
The working model of the stock recommendation engine is complete, and it offers content-based
suggestions to individuals who are keen on stock investment. Because the data comes from
r/WallStreetBets, where users sometimes coordinate trades that defy market expectations, even
a layperson can follow the situation, invest in stocks, and potentially profit. Our
content-based recommendation approach closely tracks the corresponding rises and falls in the
stock market. However, the existing model has to be upgraded before it can make real-time
suggestions. To achieve this, it must be able to process real-time data from Reddit posts.
The present recommendation system also lacks resilience
because there is no user data and it is impossible to
EDA
Data scientists use exploratory data analysis (EDA), which frequently relies on data
visualization techniques, to examine data sets and summarize their key properties. It is a
technique for analyzing data to discover insights or important features, and it has two
subtypes: graphical analysis and non-graphical analysis. EDA should be part of any data
science or machine learning process. It is not itself a learning model but a step that
precedes modelling. By identifying trends, recognizing anomalies, verifying assumptions, and
testing hypotheses, EDA helps data scientists obtain the answers they need from their data
sources.
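A small non-graphical EDA pass might look like the sketch below. It summarizes one numeric column and flags anomalies with the common 1.5 x IQR rule. The function name and the choice of rule are assumptions for illustration, not something the source specifies.

```python
import statistics


def summarize(values):
    """Basic summary statistics plus values outside the 1.5 x IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }
```

Running this on each feature, such as reported revenue or the 12-month average price, gives a quick view of the spread and of any extreme values before clustering.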

18CS58
