
Business Insight Report - Investing in the Stock Market in 2023

Name

Institutional Affiliation

Date

Introduction

This report presents insights to support informed stock market investment in 2023. The analysis
applies several text-analysis frameworks: TF-IDF, n-grams (bigrams in particular), and
correlograms. Investing in the stock market has long attracted individuals because of its
potential for high returns. With financial markets evolving rapidly and new investment options
emerging, investors in 2023 need a thorough understanding of the dynamics and trends underlying
the stock market in order to make informed, prudent decisions. The primary objective of this
report is to provide key perspectives through a close examination of investment-related
literature, and in doing so to highlight the concepts and factors that individuals should take
into account when considering an investment in the stock market.

Methodology

To gain insights from the texts, we employed three frameworks: TF-IDF analysis, n-gram (bigram)
extraction, and correlograms (Yang et al., 2020). The analysis followed these steps:

1. Clean text by removing punctuation, lowercasing, and removing stop words.

2. Calculate TF-IDF scores for words in text.

3. Create n-grams and bigrams from the text.

4. Create correlograms to visualize word relationships.



2.1 TF-IDF Analysis

TF-IDF analysis identified the most important terms in the investment-related literature. A
term's TF-IDF score is high when the term occurs often in one document but in few of the other
documents, so the score indicates how distinctive the term is for that document. The figure
below shows the ten words with the highest TF-IDF scores for each document.

[Figure: Top TF-IDF]

The TF-IDF analysis showed that terms such as "isa," "stocks," "tax," "risk," "capital,"
"estates," "rewards," "market," "investor," "profit," and "date" carry substantial weight in
the investment literature. These terms point to the fundamental aspects and concerns
associated with investing in the stock market.
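
To make the scoring concrete, the following is a minimal base R sketch of a TF-IDF calculation on a toy corpus. It assumes raw word counts as the term frequency and the natural logarithm of (number of documents / number of documents containing the word) as the inverse document frequency. The three one-line documents and all variable names are hypothetical and are not the report's data.

```r
# Toy corpus: three hypothetical one-line "documents" (not the report's data).
docs <- c(
  doc1 = "stocks carry risk and stocks offer rewards",
  doc2 = "an isa shelters tax on stocks",
  doc3 = "real estate needs capital and carries risk"
)

# Tokenize: lowercase, then split on anything that is not a letter.
tokens <- lapply(docs, function(d) {
  words <- strsplit(tolower(d), "[^a-z]+")[[1]]
  words[words != ""]
})

# Term frequency: raw count of each word within each document.
tf <- lapply(tokens, table)

# Document frequency: number of documents that contain each word.
vocab <- unique(unlist(tokens))
doc_freq <- sapply(vocab, function(w) {
  sum(sapply(tokens, function(doc_words) w %in% doc_words))
})

# Inverse document frequency (natural log), then TF-IDF per document.
idf <- log(length(docs) / doc_freq)
tf_idf <- lapply(tf, function(counts) {
  scores <- as.numeric(counts) * idf[names(counts)]
  sort(scores, decreasing = TRUE)
})

print(round(tf_idf$doc1, 3))
```

In this sketch, "stocks" occurs twice in the first document but also appears in the second, so its score is lower than that of a word such as "rewards" that occurs once but in only one document. A word found in every document would score zero.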

2.2 N-grams and Bigrams Extraction

N-gram and bigram extraction identified phrases and word combinations that recur throughout
the investment literature (Yang et al., 2020). The chart of the most frequent n-grams and
bigrams highlights phrases such as "value stocks," "growth stocks," "small-cap stocks," and
"real estate," which suggests that these topics are central to investment discussions.
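
As an illustration of how bigrams are formed and counted, here is a short base R sketch. The sentence is hypothetical and serves only to show the mechanics: each word is paired with the word that follows it, and the resulting pairs are tallied.

```r
# Hypothetical sentence, used only for illustration.
text <- "Value stocks and growth stocks differ, but value stocks often pay dividends."

# Lowercase and split into words on runs of non-letter characters.
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]

# Pair each word with its successor to form bigrams.
bigrams <- paste(head(words, -1), tail(words, -1))

# Count the bigrams and list the most frequent first.
bigram_counts <- sort(table(bigrams), decreasing = TRUE)
print(head(bigram_counts, 3))
```

A sentence of n words yields n - 1 bigrams. Here "value stocks" is the only pair that occurs more than once.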

2.3 Sentiment Analysis


Sentiment analysis helps to gauge the views and attitudes expressed toward a given product or
service. It assesses the polarity (i.e., positive, negative, or neutral) of textual data such
as customer reviews and social media posts. This information is valuable for assessing
customer satisfaction, detecting areas for improvement, and shaping marketing strategies. In
this report, the same technique is applied to the investment texts to gauge their overall tone
toward the stock market.
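
The idea of a lexicon-based sentiment score can be sketched in a few lines of base R. The mini-lexicon below is hypothetical and merely imitates the style of a scored word list such as AFINN. Its words and scores, and the two sample snippets, are made up for illustration.

```r
# Hypothetical mini-lexicon (integer score per word); values are made up.
lexicon <- c(gain = 2, profit = 2, growth = 1, risk = -1, loss = -2, crash = -3)

# Hypothetical snippets standing in for documents.
docs <- c(
  doc1 = "Profit and growth outweigh the risk",
  doc2 = "A crash means loss and more risk"
)

# Score a document by summing the lexicon scores of the words it contains.
# Words that are not in the lexicon index to NA and are ignored.
score_document <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(lexicon[words], na.rm = TRUE)
}

sentiment_scores <- sapply(docs, score_document)
print(sentiment_scores)
```

A positive total indicates that positive words outweigh negative ones in a document, and a negative total indicates the reverse. A score of this kind depends on document length and on the coverage of the lexicon, so comparisons across documents should be read with that in mind.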

Analysis and Findings

3.1 TF-IDF Analysis Findings

According to the TF-IDF analysis, the terms "investing," "stocks," and "investment" attained
the highest importance scores. Taken together with the other high-scoring terms, this suggests
that risk analysis, education, market analysis, and profitability are crucial in the stock
market. Investors need to weigh these factors when making educated investment decisions, and
should research, analyze trends, and stay up to date to minimize risk. The prominence of the
term "risk" is particularly notable: it indicates that investors are cautious and vigilant
about the potential hazards of investing (Yang et al., 2020).

3.2 Zipf's Law and Bigrams Analysis

According to Zipf's law, in a sufficiently large corpus the frequency of a word is inversely
proportional to its rank in the frequency table. Examining the word frequencies in our dataset
allows us to check whether Zipf's law holds and to identify the words that occur most often,
which indicates their relevance in the investment texts. Examining bigrams, that is,
sequential pairs of words, adds information about word associations and reveals significant
collocations. As with the TF-IDF results, these findings suggest that risk analysis,
education, market analysis, and profitability are crucial in the stock market, and that
investors should research, analyze trends, and stay up to date to minimize risk
(Alkaraan, 2020).
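
Stated as a formula, Zipf's law takes the form below. The symbols are introduced here for illustration and do not appear in the original analysis: f(r) is the frequency of the word with rank r, C is a corpus-dependent constant, and the exponent alpha is close to 1 in the classical statement of the law.

```latex
f(r) \approx \frac{C}{r^{\alpha}}
\qquad\Longleftrightarrow\qquad
\log f(r) \approx \log C - \alpha \log r ,
\qquad \alpha \approx 1
```

On a plot with logarithmic axes for both rank and frequency, such as the one produced in the appendix, a corpus that follows Zipf's law therefore appears as an approximately straight line with slope close to -1.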



3.3 Sentiment Analysis Findings



Sentiment analysis evaluates the overall sentiment conveyed in the texts. By examining the
sentiment score assigned to each document, we derived a view of the prevailing sentiment
toward the stock market. Comparing the scores across documents revealed clear contrasts. As
illustrated, document 3 has a considerably higher sentiment score than the others, which
indicates a more positive outlook on the stock market in that document. Positive sentiment
reflects optimism and confidence among investors, whereas negative sentiment reflects
pessimism and concern. By monitoring the prevailing sentiment, market participants can gauge
the overall mood of the market and make more informed, strategic investment decisions.

Findings and Insights

Business Insights

TF-IDF analysis, n-gram and bigram extraction, and correlograms yield significant business
insights for individuals considering investment in the stock market in 2023 (Yang et al.,
2020). After analyzing historical stock market performance, assessing asset classes, and
conducting correlation analysis, we recommend that a stock market investment approach for 2023
take the following into account:

1. Focus on Risk and Rewards: The high TF-IDF scores for the terms "risk" and "rewards"
   indicate that investors must evaluate and mitigate risk thoroughly when weighing potential
   gains in the stock market. A comprehensive assessment of both risk and potential return
   should precede any investment decision.

2. Real Estate and Capital: The notable TF-IDF scores for the words "estate" and "capital"
   suggest that real estate and capital allocation matter to stock market investment strategy.
   Investors may wish to explore opportunities related to real estate investment trusts
   (REITs), real estate development companies, or industries that require substantial capital
   investment (Yang et al., 2020).

3. Investor Awareness: The term "investor" has a TF-IDF score above 10, which signals a strong
   emphasis on investor-related factors. This underscores the importance of giving investors
   relevant information and building their awareness and knowledge so that they can make
   well-informed investment decisions.

4. Market Analysis: The term "market" has a TF-IDF score above 10, which indicates that a
   thorough analysis of the market, including its trends, dynamics, and behavior, is essential
   to sound investment decisions. Investors should closely monitor market conditions, industry
   trends, and relevant economic indicators to identify potential opportunities and risks.



5. Individual Savings Account (ISA) and Tax: The term "ISA" has a TF-IDF score above 10 and
   "tax" has a score of 7.5, which implies that tax-efficient investment strategies,
   particularly those using Individual Savings Accounts, are relevant to stock market
   investing.

6. Profit and Date: A TF-IDF score above 10 for the term "profit" indicates that the texts
   emphasize financial gain. Investors should therefore evaluate the potential earnings of an
   investment thoroughly and take into account factors that can reduce profitability, such as
   the financial health of the firm, the level of market demand, and the competitive
   landscape. Likewise, a TF-IDF score above 10 for the term "date" points to the importance
   of keeping up with current market data, news, and events that can influence investment
   decisions.

7. Asset Management and Allowances: The terms "assets" and "allowances" both have TF-IDF
   scores above 5, which highlights the importance of sound asset management and careful use
   of available allowances. Investors should refine their asset allocation, diversify their
   portfolios, and use available tax allowances and exemptions to improve their returns.



Conclusion

In conclusion, investing in the stock market in 2023 requires a thoughtful and well-informed
approach. By analyzing investment-related texts with TF-IDF analysis, n-gram and bigram
extraction, and correlograms, we have gained insights to guide individuals in their investment
decisions. These insights underscore the importance of evaluating risk thoroughly, educating
investors, analyzing the market carefully, and assessing profitability when dealing with
stocks. Investors should weigh these factors when devising their strategies and deciding which
actions to take. Thorough research, close attention to market trends, and staying current with
relevant information are essential to improving potential gains and reducing risk in stock
investments.

References

Alkaraan, F. (2020). Strategic investment decision-making practices in large manufacturing
companies. Meditari Accountancy Research, 28(4), 633–653.
https://doi.org/10.1108/medar-05-2019-0484

Yang, C., Yu, M., Huang, Q., Li, Z., Sun, M., Liu, K., Jiang, Y., Hu, F., & Yu, M. (2020).
Introduction to GIS programming and fundamentals with Python and ArcGIS. CRC Press.


Appendices

Text 1:

How I Loaded the Required Packages:

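# Install the required packages once (uncomment on first use), then load them
# install.packages("tidyverse")
library(tidyverse)

# Read the text documents (placeholder paths)
document1 <- readLines("path/to/document1.txt")
document2 <- readLines("path/to/document2.txt")
# Add more documents if needed

# Create a data frame with one row per document.
# readLines() returns one element per line, so collapse each file to a single string.
text <- tibble(
  Document = c("Document 1", "Document 2"),
  Text = c(paste(document1, collapse = " "), paste(document2, collapse = " "))
)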

library(tidyverse)

library(tm)

library(topicmodels)

library(tidytext)

library(stringr)

library(ggplot2)

# Load the documents into R.

file1 <- "/Users/Desktop/Interactive Investor 1.txt"



file2 <- "/Users/Desktop/Neighborhood Finance Guy 1.txt"

file3 <- "/Users/Desktop/mint 1.txt"

docs1 <- readLines(file1)

docs2 <- readLines(file2)

docs3 <- readLines(file3)

TADocs <- list(docs1, docs2, docs3)

# Create a data frame of tokens: one row per word per document
df <- tibble(
  text = sapply(TADocs, function(x) paste(x, collapse = " ")),
  doc_id = seq_along(TADocs)
) %>%
  mutate(text = str_replace_all(text, "\\W+", " ")) %>%  # replace punctuation with spaces
  unnest_tokens(word, text) %>%                          # lowercase and split into words
  anti_join(stop_words, by = "word") %>%                 # stop_words is a data frame, so use a join
  select(doc_id, word)

# Print the dataframe

print(df)

# Calculate term frequency (TF): how often each word occurs in each document
tf <- df %>%
  count(doc_id, word) %>%
  rename(tf = n)

# Calculate document frequency (DF): the number of documents containing each word.
# Stored under a new name so that the token data frame df is not overwritten.
doc_freq <- df %>%
  group_by(word) %>%
  summarise(n_docs = n_distinct(doc_id))

# Calculate inverse document frequency (IDF) and TF-IDF
tf_idf <- tf %>%
  inner_join(doc_freq, by = "word") %>%
  mutate(idf = log(n_distinct(doc_id) / n_docs),
         tf_idf = tf * idf) %>%
  arrange(desc(tf_idf))

# Select the top 10 words with the highest TF-IDF scores for each document

top_tfidf <- tf_idf %>%

group_by(doc_id) %>%

top_n(10, tf_idf) %>%

ungroup()

# Visualize the top 10 words with the highest TF-IDF scores for each document

top_tfidf %>%

ggplot(aes(x = reorder_within(word, tf_idf, doc_id), y = tf_idf, fill = factor(doc_id))) +

geom_col(show.legend = FALSE) +

facet_wrap(~doc_id, scales = "free_y") +



coord_flip() +

scale_x_reordered() +

labs(x = "Word", y = "TF-IDF", fill = "Document")

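# Zipf's law plot: rank words by term frequency within each document
zipf <- tf_idf %>%
  group_by(doc_id) %>%
  arrange(desc(tf), .by_group = TRUE) %>%   # rank must follow frequency, not TF-IDF order
  mutate(rank = row_number(),
         term_frequency = tf / sum(tf)) %>%
  ungroup()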

zipf %>%

ggplot(aes(x = rank, y = term_frequency, color = factor(doc_id))) +

geom_line() +

scale_x_log10() +

scale_y_log10() +

labs(x = "Rank (log scale)", y = "Term Frequency (log scale)", color = "Document")

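# Bigram analysis: pair each word with the next word in the same document.
# df already has stop words removed, so pairs are adjacent remaining words.
bigrams <- df %>%
  mutate(doc_id = factor(doc_id)) %>%
  group_by(doc_id) %>%
  mutate(word_next = lead(word)) %>%   # lead() within a group never crosses documents
  ungroup() %>%
  filter(!is.na(word_next)) %>%
  unite(bigram, word, word_next, sep = " ") %>%
  select(doc_id, bigram)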

# Remove stop words from bigrams (compare against the word column of stop_words)
bigrams <- bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word, !word2 %in% stop_words$word) %>%
  unite(bigram, word1, word2, sep = " ")

# Count the frequency of each bigram

bigram_counts <- bigrams %>%

count(doc_id, bigram, sort = TRUE)

# Visualize the most frequent bigrams

bigram_counts %>%

group_by(doc_id) %>%

top_n(10, n) %>%

ungroup() %>%

ggplot(aes(x = reorder_within(bigram, n, doc_id), y = n, fill = doc_id)) +

geom_col(show.legend = FALSE) +

facet_wrap(~doc_id, scales = "free_y") +

coord_flip() +

scale_x_reordered() +

labs(x = "Bigram", y = "Frequency")



# Sentiment analysis with the AFINN lexicon
# (get_sentiments("afinn") needs the textdata package and downloads the lexicon on first use)
sentiments <- df %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(doc_id) %>%
  summarise(sentiment_score = sum(value))

# Visualize sentiment scores

sentiments %>%

ggplot(aes(x = doc_id, y = sentiment_score, fill = factor(doc_id))) +

geom_col(show.legend = FALSE) +

labs(x = "Document", y = "Sentiment Score", fill = "Document")

