2023 International Conference on Advances in Computation, Communication and Information Technology (ICAICCIT)
An Efficient Model to Detect the Presence of Hinglish Text in YouTube Data
Ayeena Bhalla, Research Scholar, MRIIRS, ayeena.sgtbimit@gmail.com
Anupama Chadha, Professor, MRIIRS, anupma.sca@mriu.edu
Abstract: Sentiment analysis refers to a form of text organization that is used to classify the expressed mindset or feelings into categories such as negative, positive, favorable, unfavorable, thumbs up, thumbs down, etc. The majority of the content available on various social media websites is in English. With the advent of technology and Internet access in a heavily populated country like India, people tend to share their views in the language they are most comfortable with. This gives rise to users sharing their opinions in code-mixed languages such as Hindi mixed with English. Opinions in the form of tweets or comments are available all over social media. These views and comments posted by users can be analyzed for various purposes. This paper proposes a model that finds the percentage of Hinglish text present in text retrieved from various social media platforms. The user can keep or discard the text for analysis depending on the percentage of Hinglish text present. The accuracy attained is 83%; this can increase further when more words are added to the dataset used to classify the comments.

Keywords: Sentiment Analysis, social media, opinions, Hinglish text, Detection of Hinglish text

I. INTRODUCTION
Sentiment analysis has become one of the interesting topics of research in the field of Artificial Intelligence. People express their views on social platforms like Facebook, Twitter, YouTube, Instagram etc., and a huge amount of data is generated every minute of the day. The views or sentiments can be in any language; the need is to extract the relevant information from them. Businesses can use this ever-growing volume of data for decision making as well as to improve their policies in the future. People also give feedback about the quality of a product, which can be extracted and analyzed to improve its quality. So, the text available on social media platforms, if analyzed correctly, can prove useful in many ways.

There are two main ways to categorize textual information: facts and opinions. Facts are objective in nature whereas opinions are usually subjective. Today, the Web is a place where everyone can post or share reviews about various issues and products. Social media affects the user's point of view and hence their decisions. It has now become a crucial part of digital marketing. Sentiment analysis is quite similar to opinion mining, with the help of which one can interpret the user's view by finding the polarity of the text. Polarity indicates whether the text carries positive, negative or neutral emotions related to a product or issue. Customers nowadays read the reviews of a product on various social media platforms before buying it. Customer reviews, if constantly monitored, can help in evaluating customer loyalty, keeping track of customer sentiment and analyzing the impact of various marketing activities related to a product.

Sentiment analysis can be done using two approaches: the machine learning approach and the lexicon-based approach. The lexicon-based approach is a dictionary-based approach to sentiment analysis. The data obtained after cleaning is divided into words and compared with the words in the dictionaries. A polarity is assigned to each word and the overall polarity is calculated. This classifies a text as positive, negative or neutral with respect to a product or issue. In the machine learning approach, data is preprocessed with the help of various preprocessing techniques and then fed to different classification algorithms. Using these algorithms the data is classified into polarity classes, i.e. Positive, Negative and Neutral.

Everything and anything is available on the Internet, and every household has access to the Internet and its services. In India a variety of languages are spoken and written, of which the most commonly used are Hindi and English. A lot of work has been done on sentiment analysis of individual languages, but much less on code-mixed languages. Code-mixed script can be a combination of any two languages, like Hindi-English (Hinglish), Spanish-English (Spanglish), or any other indigenous Indian language mixed with English. In this research work the focus is on sentiments written in Hinglish. The objectives of this research are as follows: to identify the percentage of Hinglish text present in the text to be analysed, to create a model to find the most frequently used words in Hinglish texts, to create a domain-specific lexicon (Cyberbullying) to calculate the polarity of given Hinglish words, to classify the sentiments using various machine learning algorithms, and to compare the results obtained using the lexicon-based approach and machine learning algorithms.

As technology has advanced, Internet services have become available and accessible to the majority of the population, and people tend to spend more time on social media platforms consuming the content of their choice. They also express their point of view on that content in the language most convenient to them. Earlier, people used a single or Unicode language as a mode of sharing their opinions regarding some
979-8-3503-4438-7/23/$31.00 ©2023 IEEE

issue, person, product, service or even a company. Nowadays, however, people express their opinions in code-mixed language such as Hindi mixed with English, Hindi with Bengali and other regional languages. The motive of this paper is to understand and review the earlier work done in the area of sentiment analysis of code-mixed text, primarily focusing on Hinglish text (a code mix of Hindi and English). This paper proposes a model that finds the percentage of Hinglish text present in text retrieved from various social media platforms. The user can keep or discard the text for analysis depending on the percentage of Hinglish text present.

II. LITERATURE REVIEW
In the paper (Sharma, Srinivas and Chandra, 2015) various methods of text normalization are presented and the polarity of the statement is calculated. The proposed model performs the following tasks: word-level identification of language in code-mixed English and Hindi text, translation of Romanized English language words, and analysis of the sentiment of the text. Overall, 85% accuracy is achieved with a precision value of 0.80. The constraints of this approach, as noted by the authors, are that manual evaluation of the model was done by a limited number of participants and that no standard dataset is available for code-mixed script.

In the paper (Singh, 2021), the sentiment of code-mixed data is analyzed using machine learning algorithms. Initially, data cleaning is done by removing tags and re-tweets, normalizing different spellings, removing customized stop words and converting emoticons into words. Data transformation is done next using count vectorization and machine learning algorithms. It was concluded that the proposed Ensemble Voting Classifier performed best. The Ensemble Voting Classifier uses Logistic Regression, Random Forest and SVM.

The (Thakur, Sahu and Omer, 2020) paper discusses various techniques to identify sentiments of Hinglish text. Preprocessing of data is done using various methods such as stop word removal, lemmatization, stemming, etc. Out of the various classifiers used, the Naïve Bayes algorithm performs significantly better than the other approaches.

The authors (Khandelwal, Swami, Syed and Shrivastava, 2018) analyzed a corpus of 3453 Hinglish tweets for humour detection. The tweets are tagged as either humorous (H) or non-humorous (N). The features used in the classification system are N-grams, bag-of-words, common words and hashtags. Analysis is done using machine learning classification models. The best accuracy (69.3%) is yielded by SVM with a radial function kernel.

The authors (Singh, Choudhary and Shrivastava, 2023) demonstrated the replacement of a word with another of the same meaning in code-mixed social media text. This approach makes use of the property that a word and its variations have a similar impact in a large noisy text. It is demonstrated how word variations belonging to the same cluster of the semantic space help in finding a suitable word in the substitution phase.

In the paper (Baroi, Singh, Das and Singh, 2020) the authors suggested a model for the sentiment analysis of Hinglish text. Data retrieved from Twitter is pre-processed by removing noise, punctuation and stop words, followed by stemming and label encoding. NITS-Hinglish-Sentimix is an ensemble model that is built on detailed preprocessing and extensive model training.

Kumar and Vadlamani (2016) described document-level sentiment classification on 300 news articles in Hinglish. They used five feature selection methods: Correlation, Information Gain, t-statistics, Chi-square and Gain Ratio. For classification, various machine learning algorithms are used, such as CART, RBFNN, LR, SVM, J48, Random Forest, Naïve Bayes and JRip. The best combination for classifying sentiments is TF-IDF, Gain Ratio (GR) and RBFNN. The constraint of this approach is that a small dataset is used to obtain the results. Also, the results are not compared with a lexicon-based approach.

In the (Kaur, Mangat and Krail, 2017) paper, a comparison is made between the dictionary-based and machine learning approaches to sentiment analysis. A dataset of 300 Hinglish movie reviews is created. After evaluating performance using the metrics Precision, Recall and F-measure, it was found that the dictionary-based approach gave better results. The experiments were done on a small dataset, and the Hinglish dictionary was created with respect to the size of the data.

In the (Sasidhar, Premjit and Soman, 2019) paper, a deep learning approach is proposed for identifying emotions in Hinglish text. The proposed model first converts sentences into vectors. Among various deep learning methods, the CNN-BiLSTM model gave the best results. Hybridization of the model could have been done, and the same algorithm could be evaluated on a larger corpus.

The authors (Mishra, Veenugopalan and Gupt, 2016) adapted a content-specific lexicon. The accuracy of sentiment classification achieved was 77% using the CSPLE model for the movie domain and 88% for the hotel domain using the CSPL (Content Specific Polarity Lexicon) model. The polarity lexicon coverage increased with the inclusion of synonyms. A unigram model was used, but with a limited dataset; antonyms could also be added to the lexicon that was created.
The authors (Sharma, Srinivas and Chandra, 2015) created their own corpus of 500 feedback items, which were collected manually. The data is collected from sites such as Facebook and YouTube. In the proposed work, the language is first identified in the code-mixed sentence, transliteration is done after spelling correction, and ambiguous words are then removed. A lexicon-based approach is used to analyze the sentiments, with a precision value of 0.80. Machine learning algorithms could also be used here to compare the results.

In this paper the authors (Shah and Kaushik, 2019) reviewed the work in sentiment analysis for indigenous languages. The majority of the work uses various deep learning and machine learning methods, and limited work uses the lexicon-based approach. It was observed that SVM and LR are the best among machine learning algorithms and that CNN performed best among deep learning algorithms. The majority of the work is focused on the movie or political domain, with not much on other areas such as education, technology, sociology etc.

In this paper the authors (Utsav, Dhaiwat, Vajpeyi, Mina and Srivastava, 2020) tried to classify Hindi-English code-mixed tweets based on the user's viewpoint. An annotated dataset consisting of 4219 Hinglish tweets on the scrapping of Article 370 is used. XGBoost is used to attain higher accuracy on both the baseline and the new dataset.

The authors (Bimantara, Larasati, Risondang, Naf’anZidny and Nugraha, 2019) created a model that can classify comments based on whether or not they are aimed at cyberbullying. Naïve Bayes is used for classification. The TF-IDF method is used to preprocess the data and the K-Fold Cross Validation method is used for testing. The experiment is performed in two parts, namely with stemming and without stemming. The experiments performed with stemming gave an accuracy of 83.53%.

This paper (Patra, Das and Das, 2017) describes the details of the competition ICON held in the year 2017. The objective of the competition was to identify sentiments using HI-EN (Hindi-English) and BN-EN (Bengali-English) code-mixed datasets. Here, SVM has been used for sentiment classification.

The paper (Rani and Kumar, 2017) proposed a sentiment analysis model that performs temporal sentiment and emotion analysis on data collected from multilingual student feedback related to teacher performance and course satisfaction. This model classifies sentiments broadly into two categories: positive and negative. It also classifies emotions into the eight categories suggested by Robert Plutchik, i.e. anger, anticipation etc. A comparison of the proposed system's results was made with direct assessments of class performance. The system can help to boost teaching and learning processes in universities by analyzing the sentiments of students, hence helping administrators and teachers understand the areas which need improvement.

In this paper (Babu and Kanaga, 2022), the authors discussed methods to analyze social media data, which includes texts and emoticons, by utilizing various AI techniques. Multi-class classification along with deep learning algorithms shows a higher precision value when applied to sentiment analysis. In multi-class classification the authors utilized deep learning algorithms, and the results obtained showed precise classification of texts, emoticons and emojis.

In the (K. Shalini, Ganesh, Kumar and Soman, 2018) paper the authors created a Kannada-English code-mixed corpus, which was not available previously, by crawling Facebook comments. In their experiments, the authors also used the code-mixed corpus offered by SAIL-2017, which includes English along with the regional languages Bengali and Hindi. The analysis included using Facebook's FastText (an open source NLP library from Facebook), doc2vec followed by an SVM classifier, Bidirectional Long Short Term Memory and Convolutional Neural Networks.

Altrabsheh, Cocea and Fallahkhair (2014) discussed the application of sentiment analysis in the real-time feedback system of educational institutes. They worked towards finding the top model for automated analysis by looking at four aspects: preprocessing, attributes, machine learning techniques and the use of the neutral class. It was concluded that the finest result for the four aspects mentioned is SVM, with an accuracy of 95 percent.

The paper (Altrabsheh, Gaber and Cocea, 2013) discussed how sentiment analysis of data collected from feedback on academic parameters can help improve teaching. Naive Bayes and SVM techniques gave better results compared to other algorithms.

The authors (Al sari, Alkhaldi, Alsaffar, 2022) performed sentiment analysis on the views of passengers on Saudi cruises. The experiments were conducted on 1200 samples collected from social media posts. The data was categorized as Positive or Negative. Machine learning algorithms were used, and the Random Forest algorithm gave the highest accuracy. The results also exhibit 80% positive sentiments and 20% negative sentiments.

The study done by (Păvăloaia, Teodor, Fotache and Danileţ, 2019) analyzed customer reactions regarding two types of posts (photographs and videos). Six social media websites were used to collect the data. This study was focused to
understand the role of social media in the promotion of beverages. The customer preferences were analyzed using sentiment analysis techniques applied to big sets of data.

The authors (Srinivasan and Subalalitha, 2023) used a sampling technique combined with the Levenshtein distance metric to study sentiments in class-imbalanced code-mixed data. The paper discusses the implementation of various machine learning algorithms. Levenshtein distance is used as the preprocessing technique for Tamil-English and Hindi-English code-mixed data.

In the paper (Wadhawan and Aggarwal, 2021), experiments were conducted using deep learning methods for determining emotions in Hinglish code-mixed tweets, with the help of bilingual word embeddings derived from FastText and Word2Vec approaches in addition to transformer-based models. Numerous deep learning models, accompanied by transformers, were used in the experiments. The BERT model performed the best.

In the (Atoum, 2023) research, the author presented an approach to identify cyberbullying on Twitter, based on sentiment analysis with the help of machine learning techniques, i.e. Naïve Bayes and Support Vector Machine. After collection of tweets, various pre-processing techniques such as stop word removal, normalization, n-grams etc. are applied to refine the data. The results indicated that the SVM classifiers performed considerably better than the Naïve Bayes classifiers, with an accuracy of 92.02% compared with 81.1% for Naïve Bayes.

In the (Sherly and Jeetha, 2021) paper, the authors proposed detecting cyberbullying using the HRecRCNN (Hybrid Recurrent Residual Convolutional Neural Network) technique instead of the conventionally available approaches. After various preprocessing techniques are applied, such as removal of punctuation, URLs, emoticons etc., the available data is further processed using the Modified Fruit Fly Algorithm (MFFA) to choose the optimal features. The study shows that the HRecRCNN technique outperforms the current algorithms in terms of better accuracy, precision and lower time complexity.

Nahar, Unankard, Li and Pang (2012) explained an effective sentiment analysis technique to identify cyberbullying messages with the help of PLSA (Probabilistic Latent Semantic Analysis) for feature selection. The feature selection improves the accuracy of the classifier. They also applied the HITS (Hyperlink-Induced Topic Search) algorithm to calculate scores and rank the most influential persons (i.e. the predators or victims). The depicted graph-based model can be further used to answer various queries about a user in terms of bullying. LibSVM (A Library for Support Vector Machines) was also used for classification into bully or non-bully classes using a linear kernel, and 10-fold cross validation was performed.

III. PROPOSED METHODOLOGY
The methodology to find the percentage of Hinglish text in the given text is elaborated in Figure 1.

Retrieve comments from the YouTube link; display and save all the comments, including their replies (if any), in a separate file.
Apply various preprocessing techniques to the comments to remove emoticons and irrelevant text.
Create a training set of comments and assign a class to each: Hinglish or Non-Hinglish.
Create a corpus by combining the available Hinglish corpora; the corpus categorizes words as Hinglish or Non-Hinglish.
Keep or discard the text for further analysis depending on the threshold value entered by the user.

Fig. 1 Proposed Methodology

IV. EXPERIMENTAL APPROACH
The step-by-step approach includes the following steps.

A. Step-1 Data Collection
The data is collected in the form of comments from various YouTube links. The comments posted may or may not have subsequent replies. The comments along with their replies are saved in a separate file as shown in Figure 2.
Fig. 2 Retrieval of Comments from YouTube Links
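Step-1 stores each comment together with its replies in a separate file. A minimal sketch of that storage step is given here, assuming the comments have already been retrieved and are held as a list of dictionaries with a `text` field and an optional `replies` list. This input layout, the column names and the function names `flatten_threads` and `save_rows` are illustrative assumptions, not the format used in the paper.

```python
import csv
from typing import Dict, Iterable, List


def flatten_threads(threads: Iterable[Dict]) -> List[Dict]:
    """Turn comment threads into flat rows, one row per comment or reply.

    Assumed (hypothetical) input layout: {"text": str, "replies": [str, ...]}.
    """
    rows = []
    for thread_id, thread in enumerate(threads):
        rows.append({"thread_id": thread_id, "kind": "comment", "text": thread["text"]})
        # A comment may or may not have replies, so a missing key means "no replies".
        for reply in thread.get("replies", []):
            rows.append({"thread_id": thread_id, "kind": "reply", "text": reply})
    return rows


def save_rows(rows: List[Dict], path: str) -> None:
    """Write the flattened rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["thread_id", "kind", "text"])
        writer.writeheader()
        writer.writerows(rows)
```

Keeping a `thread_id` column lets each reply be traced back to the comment it answers once everything sits in one file.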
B. Step-2 Data Pre-Processing
In this step the data is pre-processed to remove emoticons, white spaces, special symbols, special characters and NaN values. Figure 3 shows the pre-processed data.
Fig. 3 The Pre-Processed Data
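The cleaning described in Step-2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it treats every character that is not an ASCII letter, digit or whitespace as an emoticon or special symbol (reasonable only for Romanized Hinglish, since it would also drop Devanagari script), and the function names are hypothetical.

```python
import math
import re
from typing import Iterable, List

# Assumption: anything other than ASCII letters, digits and whitespace
# counts as an emoticon, special symbol or special character.
_NON_TEXT = re.compile(r"[^A-Za-z0-9\s]")
_WHITESPACE = re.compile(r"\s+")


def is_missing(value) -> bool:
    """Return True for None or a float NaN (a missing comment)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_comment(text: str) -> str:
    """Strip emoticons and symbols, then collapse runs of whitespace."""
    text = _NON_TEXT.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def preprocess(comments: Iterable) -> List[str]:
    """Drop missing or empty comments and return the cleaned ones."""
    cleaned = []
    for comment in comments:
        if is_missing(comment):
            continue
        text = clean_comment(str(comment))
        if text:  # skip comments that are empty after cleaning
            cleaned.append(text)
    return cleaned
```

For example, `preprocess(["Bahut  accha video!!! 😀", float("nan"), "   "])` keeps only the first entry, cleaned to `"Bahut accha video"`.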
C. Step-3 Classification of Hinglish and Non-Hinglish Comments
In this step, a training dataset of comments is created and the comments are classified as Hinglish or Non-Hinglish statements. Class 0 indicates that the comment is in a language other than Hinglish; Class 1 indicates that the comment is completely in Hinglish. Figure 4 shows the classification of comments.
Fig. 4 Classification of Comments

D. Step-4 Corpus Creation
A corpus is created by combining the following available corpora of Hinglish text: CMUHinglishDoG, HinglishNorm, hinglish-corpus, Hinglish-TOP-Dataset. This corpus helps to classify text. Figure 5 shows a glimpse of the corpus used.
Fig. 5 Hinglish Corpus

E. Step-5 Setting the Threshold Value
As explained in the methodology of this model in Section III, the user can set the minimum threshold value (in percentage) used to classify a comment as Hinglish or Non-Hinglish. The percentage of Hinglish words in a comment is calculated as in Equation 1.

$$\text{Percentage of Hinglish words} = \frac{\text{Number of Hinglish words in the comment}}{\text{Total number of words in the comment}} \times 100$$

Equation 1 Calculation of Threshold value

F. Step-6 Evaluation
Next, the manually tagged (actual) class is compared with the predicted class to determine the level of accuracy. Figure 6 shows the calculated Accuracy, Recall and F1-score values.
Fig. 6 Precision, Recall and F-score values.
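The percentage calculation of Step-5 and the evaluation of Step-6 can be sketched together as below. This is a minimal illustration, assuming the combined corpus is available as a plain set of lowercase Hinglish words and that comments are split on whitespace. The names `hinglish_percentage`, `classify` and `evaluate` are hypothetical, and the sample word set in the usage note is invented for demonstration only.

```python
from typing import Dict, Sequence, Set


def hinglish_percentage(comment: str, hinglish_words: Set[str]) -> float:
    """Share of words in the comment that appear in the Hinglish corpus, in percent."""
    words = comment.lower().split()
    if not words:
        return 0.0  # an empty comment contains no Hinglish words
    hits = sum(1 for word in words if word in hinglish_words)
    return 100.0 * hits / len(words)


def classify(comment: str, hinglish_words: Set[str], threshold: float = 5.0) -> int:
    """Return 1 (Hinglish) when the percentage reaches the user threshold, else 0."""
    return 1 if hinglish_percentage(comment, hinglish_words) >= threshold else 0


def evaluate(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the Hinglish class (label 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With the invented word set `{"bahut", "accha", "hai"}`, the comment "Yeh video bahut accha hai" has three Hinglish words out of five, so the percentage is 60 and the comment passes a 5% threshold. The default of 5.0 mirrors the threshold value reported in the experiments.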

Fig. 7 Confusion Matrix

V. RESULTS AND DISCUSSIONS
In the experiments conducted, the threshold value is taken as 5%. This means that if Hinglish words make up 5% or more of the total words in a comment, that comment is taken further for sentiment analysis.

Visualization through a confusion matrix shows how accurately the model detects the Hinglish text present. Figure 7 shows whether the model classifies the text as Hinglish or English. The model identifies Hinglish text in the available text with an accuracy of 83%; this can be increased further when more words are added to the corpus. This work will be extended to sentiment analysis of Hinglish text for cyberbullying identification.

REFERENCES
[1] Kaur, H., Mangat, V., Krail, N., 2017. "Dictionary based sentiment analysis of Hinglish text and comparison with Machine learning Algorithms", International Journal of Metadata, Semantics and Ontologies, Vol. 12, No. 2/3, 2017, pp 90-102. DOI: https://doi.org/10.1504/IJMSO.2017.090759
[2] Shah Rajesh, S., Kaushik, A., 2019. "Sentiment Analysis on Indian Indigenous Languages: A review on multilingual opinion mining", arxiv.org (2019). DOI: https://doi.org/10.48550/arXiv.1911.12848
[3] Sharma, S., Srinivas, PYKL., Chandra, R., 2015. "Sentiment analysis of Code-Mix Script", International Conference of Computing and Network Communications (2015), pp 1468-1473. DOI: 10.1109/CoCoNet.2015.7411238
[4] Kumar, R., Vadlamani, R., 2016. "Sentiment Classification of Hinglish Text", 3rd Int. conference of Recent Advances in Information Technology (2016). DOI: 10.1109/RAIT.2016.7507974
[5] Sharma S., Srinivas P., Chandra R., 2015. "Text Normalization of Code Mix and Sentiment Analysis". International Conference on Advances in Computing, Communication and Informatics (2015). DOI: 10.1109/ICACCI.2015.7275819
[6] Baroi, Jyoti, S., Singh, N., Das, R., Singh, Doren, T., 2020. "Sentiment Analysis for code mixed Social media text using an Ensemble model", SemEval-2020, pp. 1298–1303. DOI: 10.18653/v1/2020.semeval-1.175
[8] Thakur V., Sahu R., Omer S., 2020. "Current state of Hinglish Text Sentiment Analysis", Proceedings of the International Conference on Innovative Computing & Communications (ICICC) 2020. DOI: http://dx.doi.org/10.2139/ssrn.3614442
[10] Sasidhar T., Premjit B., Soman K., 2019. "Emotion Detection in Hinglish Code mixed Social media text", 3rd Int. Conf. on Computing and Network Communication, 2019, pp 1346-1352. DOI: https://doi.org/10.1016/j.procs.2020.04.144
[11] Mishra D., Veenugopalan N., Gupt, D., 2016. "Content Specific lexicon for Hindi reviews", 6th Int. Conf. on Advances in Computing & Communication, 2016, pp 554-563. DOI: https://doi.org/10.1016/j.procs.2016.07.283
[12] Choudhary, N., Singh, R., Bindlish I., and Shrivastava, M., 2018. "Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages", 19th International Conference on Computational Linguistics and Intelligent Text Processing, March 2018. DOI: https://doi.org/10.48550/arXiv.1804.00806
[13] Utsav J., Dhaiwat K., Vajpeyi R., Mina M., Srivastava V., 2020. "Stance Detection in Hindi-English Code-Mixed Data", Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, pp 359–360. DOI: https://doi.org/10.1145/3371158.3371226
[14] BimantaraAdisoka, A., Larasati, A., Risondang, E., Naf’anZidny, M., Nugraha, N., 2019. "Sentiment Analysis of Cyberbullying on Instagram User Comments". Journal of Data Science and Its Applications, pp 88-98. DOI: 10.21108/jdsa.2019.2.20
[15] Patra Gopal B., Das D., Das A., 2017. "Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL Code-Mixed Shared Task" @ICON-2017. DOI: https://doi.org/10.48550/arXiv.1803.06745
[16] Babu, N.V., Kanaga, E.G.M., 2022. "Sentiment Analysis in Social Media Data for Depression Detection Using Artificial Intelligence: A Review." SN COMPUT. SCI. 3, 74 (2022). DOI: https://doi.org/10.1007/s42979-021-00958-1
[17] Khandelwal A., Swami S., Syed S. Akhtar, and Manish Shrivastava, 2018. "Humor Detection in English-Hindi Code-Mixed Social Media Content: Corpus and Baseline System." In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA), pp 1203-1207.
[18] Singh, R., Choudhary, N., Shrivastava, M., 2023. "Automatic Normalization of Word Variations in Code-Mixed Social Media Text." Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham, pp 371-381. DOI: https://doi.org/10.1007/978-3-031-23793-5_30
[19] N. Altrabsheh, M. Cocea and S. Fallahkhair, 2014. "Sentiment Analysis: Towards a Tool for Analysing Real-Time Students Feedback," 2014 IEEE 26th International Conference on Tools with Artificial Intelligence, Limassol, Cyprus, 2014, pp. 419-423. DOI: 10.1109/ICTAI.2014.70.
[20] Al sari, B., Alkhaldi, R., Alsaffar, D., 2022. "Sentiment analysis for cruises in Saudi Arabia on social media platforms using machine learning algorithms." J Big Data 9, 21 (2022). DOI: https://doi.org/10.1186/s40537-022-00568-5
[21] Păvăloaia, V.-D.; Teodor, E.-M.; Fotache, D.; Danileţ, M., 2019. "Opinion Mining on Social Media Data: Sentiment Analysis of User Preferences". Sustainability 2019, 11, 4459. DOI: https://doi.org/10.3390/su11164459
[22] Srinivasan, R., Subalalitha, C.N., 2023. "Sentimental analysis from imbalanced code-mixed data using machine learning approaches". Distrib Parallel Databases 41, pp 37–52 (2023). DOI: https://doi.org/10.1007/s10619-021-07331-4
[23] Wadhawan A., Aggarwal A., 2021. "Towards Emotion Recognition in Hindi-English Code-Mixed Data: A Transformer Based Approach." In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp 195–202.
[24] Kaur G., Kaushik A., Sharma S., 2019. "Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach." Big Data and Cognitive Computing. 2019; 3(3):37. DOI: https://doi.org/10.3390/bdcc3030037
[25] Singh, P., Lefever, E., 2020. "Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings." In Proceedings of the 4th Workshop on Computational Approaches to Code Switching, pp 45–51.
[26] Ghosh, S., Priyankar, A., Ekbal A., Bhattacharyya P., 2023. "Multitasking of sentiment detection and emotion recognition in code-mixed Hinglish data", Knowledge-Based Systems, Volume 260, 2023. DOI: https://doi.org/10.1016/j.knosys.2022.110182.
[27] S. Das and T. Singh, 2023. "Sentiment Recognition of Hinglish Code Mixed Data using Deep Learning Models based Approach," 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 2023, pp. 265-269. DOI: 10.1109/Confluence56041.2023.10048879.
[28] Altrabsheh, N., Gaber, M. M., & Cocea, M., 2013. "SA-E: sentiment analysis for education." In International conference on intelligent decision technologies, Vol. 255, pp. 353-362.
[29] K. Shalini, H. B. Ganesh, M. A. Kumar and K. P. Soman, 2018. "Sentiment Analysis for Code-Mixed Indian Social Media Text With Distributed Representation," 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 2018, pp. 1126-1131. DOI: 10.1109/ICACCI.2018.8554835.
[30] Babu, N.V., Kanaga, E.G.M., 2022. "Sentiment Analysis in Social Media Data for Depression Detection Using Artificial Intelligence: A Review." SN COMPUT. SCI. 3, 74 (2022). https://doi.org/10.1007/s42979-021-00958-1
[31] Rani S., and Kumar, P., 2017. "A Sentiment Analysis System to Improve Teaching and Learning," in Computer, vol. 50, no. 5, pp. 36-43, May 2017. DOI: 10.1109/MC.2017.133.
[32] Mehta, P., and Pandya, S., 2020. "A review on sentiment analysis methodologies, practices and applications." International Journal of Scientific and Technology Research, 9(2), pp 601-609.
[33] Nahar, V., Unankard, S., Li, X., Pang, C. (2012). "Sentiment Analysis for Effective Detection of Cyber Bullying." In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_75.
[34] Sherly T., Jeetha B., 2021. "Sentiment Analysis and Deep Learning Based Cyber Bullying Detection in Twitter Dataset", International Journal of Recent Technology and Engineering (IJRTE), Volume-10 Issue-4, pp 15-25.
[35] Atoum, J. O. (2023, March). "Detecting Cyberbullying from Tweets Through Machine Learning Techniques with Sentiment Analysis." In Future of Information and Communication Conference (pp. 25-38). Cham: Springer Nature Switzerland.
[36] CMUHinglishDoG: https://huggingface.co/datasets/cmu_hinglish_dog
[37] HinglishNorm: https://aclanthology.org/2020.coling-industry.13/
[38] Hinglish TOP Dataset: https://research.google/resources/datasets/hinglish-top/
