Download as pdf or txt
Download as pdf or txt
You are on page 1of 50

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“JNANA SANGAMA”, BELGAUM – 590018

REPORT
On
“ONLINE PUBLIC SHAMMING ON TWITTER: DETECTION,
ANALYSIS, AND MITIGATION”

Submitted in partial fulfillment of the


Requirements for the award of the degree of

BACHELOR OF ENGINEERING
In
INFORMATION SCIENCE & ENGINEERING
By
POOJA S K 4MO17IS021
SHAMBHAVI H 4MO17IS024
ARSHAD PASHA 4MO17IS006

UNDER THE GUIDANCE OF HEAD OF THE DEPARTMENT


SHALINI M.S Mr. JAYARAM C.V
B.E, MTECH B.E., M.TECH.
Asst. Professor Asst. Professor & Head of the
Dept. of Computer science and Dept. of Information science and
Engineering, Mysore College of Engineering, Mysore College of
Engineering and Management, Engineering and Management,
Mysuru-570028 Mysuru -570028

DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING


MYSORE COLLEGE OF ENGINEERING AND MANAGEMENT,
MYSURU– 570028, KARNATAKA STATE, INDIA
2020-2021
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING

CERTIFICATE

Certified that the project work entitled “ONLINE PUBLIC SHAMMING ON TWITEER:
ANALYSIS, DETECTION AND MITIGATION” is a bonafide work carried out by Pooja S
K(4MO17IS021), Shambhavi H(4MO17IS024), and Arsahad pasha(4MO17IS006),(in
partial fulfillment of VIII semester course of Bachelor of Engineering in Information Science
and Engineering of the Visveshvaraya Technological University, Belgaum during the year
2020-21.It is certified that all corrections/suggestions indicated have been incorporated in the
report. The project report has been approved as it satisfies the academic requirements with
respect to the Project Work prescribed for Bachelor of Engineering Degree.

Signature of the Guide Signature of the HOD

Shalini M S Mr. Jayram C V


Assistant Professor Assistant Professor &
Dept of CSE Head of the department ISE

Signature of the Principal

MyCEM, Mysuru

Name of the Examiner Signature


with date

1.

2.
ACKNOWLEDGEMENT

It is the profound feeling of gratitude I would like to express my sincere thanks


to our Institution MYSORE COLLEGE OF ENGINERRING AND MANAGEMENT
for providing Excellent infrastructure for the successful completion of this project.

My hearty thanks are also due towards our JAYRAM C V(HOD) whose timely
support and suggestion went a long way in the completion of the project.

I would like to thank my guide. Shalini M S Asst prof for his support and guidance
in completion of this project.

I wish to express a whole hearted thanks to our principal Dr Prabhus w a m y M S ,


MyCEM for providing excellent infrastructure to pursue the work in the college.

I would like to thank all the Teaching and Non-Teaching staff who is directly
or indirectly responsible for carrying out this project successfully.

I extend a very hearty thanks to my parents and Friends for all the moral
support they provide during the preparation for the project.

POOJA S.K (4MO17ISO21)

SHAMBHAVI H (4MO17IS024)

ARSHAD PASHA(4MO17IS006)
ABSTRACT
Public shaming in online social networks and related online public forums like Twitter
has been increasing in recent years. These events are known to have devastating impact on the
victim’s social, political and financial life. Notwithstanding its known ill effects, little has been
done in popular online social media to remedy this, often by the excuseof large volume and
diversity of such comments and therefore unfeasible number of human moderators required to
achieve the task. In this paper, we automate the task of publicshaming detection in Twitter
from the perspective of victims and explore primarily two aspects, namely, events and
shammers. Shaming tweets are categorized into six types- abusive, comparison, passing
judgment, religious/ethnic, sarcasm/joke and whataboutery and each tweet is classified into one
of these types or as non-shaming. It is observed that out of allthe participating users who post
comments in a particular shaming event, majority of themare likely to shame the victim.
Interestingly, it is also the shamers whose follower counts increase faster than that of the non-
shamers in Twitter. Finally, based on categorization and classification of shaming tweets, an
web application called BlockShame has been designed and deployed for on-the-fly
muting/blocking of shamers attacking a victim on the Twitter.
INDEX
Sl.no Chapter Names Page.no
1 Introduction 1

1.1 Overview of the project 2


1.2 Natural language processing 4
1.3 Sentiment analysis 6
1.4 Types 7
1.5 Artificial intelligence 8
2 Literature Survey 9

3 System Analysis 12
3.1 Aim of the Project 12
3.2 Existing System 12
3.3 Proposed System 13
4 Requirements 14
4.1 Hardware Requirements 14
4.2 Software Requirements 14
5 System Design 15
5.1 Problem Statement 15
5.2 Fundamental Design Concepts 15
5.3 System Architecture 16
5.4 Data Flow Diagram 17
5.5 Use Case Diagram 17
5.6 Sequence Diagram 19
6 Implementation 20
6.1 Programming Language Selection 20
6.2 Platform Selection 20
6.3 Code Conventions 21
6.4 Class Declarations 23
6.5 Difficulties Encountered and Strategies used to tackle 24
6.6 Software Environment 24
7 Software Testing 33
7.1 Test Environment 33
7.2 Unit Testing of Main modules 33
7.3 Integration Testing of Modules 35
7.4 System Testing 36

List of Figures
Sl.no Chapter Names Page.no
1 Introduction 1
2 Literature Survey 9
3 System Analysis 12
4 Requirements 14
4.1 Hardware Requirements 14
4.2 Software Requirements 14
5 System Design 15
Fig 5.3: System Architecture 16
Fig 5.4: Data flow Diagram 17
Fig 5.5: Use Case Diagram 17
Fig 5.5.1: Use Case Diagram for NLP 18
Fig 5.5.2: Use Case Diagram for Weka 18
Fig 5.6: Sequence Diagram 19
6 Implementation 20
Fig 6.6: Java Interpreter 25
Fig 6.6.1: Java Compiler 26
Fig 6.6.2: Java Flatform 27
Fig 6.6.3: Java IDE 29
7 Software Testing 33
Table 7.2.1 Unit Test Case 1 for Registration Page 34
Table 7.2.2 Unit Test Case 2 for Message Post 34
Table 7.2.3 Unit Test Case 3 for Sentiment Analysis 34
Table 7.2.4 Unit Test Case 4 for Data Prediction 35
Table 7.3.1 Integration Test Case 1 for Sender Registration and Post 35
Messages
Table 7.3.2 Integration Test Case 2 for sentiment analysis and 35
prediction
Table 7.4: System Test Case 36
Online public shamming on twitter

CHAPTER 1

INTRODUCTION
ONLINE SOCIAL networks (OSNs) are frequently flooded with scathing remarks
against individuals or organizations on their perceived wrongdoing. When some of these
remarks pertain to objective fact about the event, a sizable proportion attempts to malign the
subject by passing quick judgments based on false or partially true facts. Limited scope of fact
check ability coupled with the virulent nature of OSNs often translates into ignominy or
financial loss or both for the victim.

Negative discourse in the form of hate speech, bullying, profanity, flaming, trolling,
etc., in OSNs is well studied in the literature. On the other hand, public shaming, which is
condemnation of someone who is in violation of accepted social norms to arouse feeling of
guilt in him or her, has not attracted much attention from a computational perspective.
Nevertheless, these events are constantly being on the rise for some years. Public shaming
events have far reaching impact on virtually every aspect of victim‘s life. Such events have
certain distinctive characteristics that set them apart from other similar phenomena- (a) a
definite single target or victim (b) an action committed by the victim perceived to be wrong
(c) a cascade of condemnation from the society. In public shaming, a shamer is seldom
repetitive as opposed to bullying. Hate speech and profanity are sometimes part of a shaming
event but there are nuanced forms of shaming such as sarcasm and jokes, comparison of the
victim with some other persons, etc., which may not contain censored content explicitly.

The enormous volume of comments which is often used to shame an almost unknown
victim speaks of the viral nature of such events. For example, when Justine Sacco, a public
relations person for American Internet Company tweeted ―Going to Africa. Hope I don‘t get
AIDS. Just kidding. I‘m white!‖, she had just 170 followers. Soon, a barrage of criticisms
started pouring in, and the incident became one of the most talked about topics on Twitter, and
the Internet in general, within hours. She lost her job even before her plane landed in South
Africa.

What is common for a diverse set of shaming events we have studied is that the victims
are subjected to punishments disproportionate to the level of crime they have apparently
committed. In Table 1, we have listed the victim, year in which the event took place, action
that triggered public shaming along with the triggering medium, and its

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 1


Online public shamming on twitter

immediate consequences for each studied event. ‗Trigger‘ is the action or words spoken by the
‗Victim‘ which initiated public shaming. ‗Medium of triggering‘ is the firstcommunication
media through which general public became aware of the ‗Trigger‘. The consequences for the
victim, during or shortly after the event, are listed in ‗Immediate consequences‘. Henceforth,
the two letter abbreviations of the victim‘s name will be used to refer to the respective shaming
event.

1.1 Overview of project

The enormous volume of comments which is often used to shame an almost unknown
victim speaks of the viral nature of such events. For example, when Justine Sacco, a public
relations person for American Internet Company tweeted ―Going to Africa. Hope I don‘t get
AIDS. Just kidding. I‘m white!‖ she had just 170 followers. Soon, a barrage of criticisms started
pouring in, and the incident became one of the most talked about topics on Twitter and the
Internet, in general, within hours. She lost her job even before her plane landed in South Africa.
Jon Ronson‘s ―So You‘ve Been Publicly Shamed‖ [1] presents an account of several online
public shaming victims. What is common for a diverse set of shaming events we have studied
is that the victims are subjected to punishments disproportionate to the level of crime they have
apparently committed. In Table I, we have listed the victim, year in which the event took place,
action that triggered public shaming along with the triggering medium, and its immediate
consequences for each studied event. ―Trigger‖ is the action or words spoken by the
―Victim‖ which initiated public shaming. ―Medium of triggering‖ is the first communication
media through which general public became aware of the ―Trigger.‖ The consequences for
the victim, during or shortly after the event, are listed in ―Immediate consequences.‖
Henceforth, the two-letter abbreviations of the victim‘s name will be used to refer to the
respective shaming event.

Firstly, labeling data is labor intensive and time consuming. Secondly, cyberbullying
is hard to describe and judge from a third view due to its intrinsic ambiguities. Thirdly, due to
protection of Internet users and privacy issues, only a small portion of messages are left on
the Internet, and most bullying posts are deleted. As a result, the trained classifier may not
generalize well on testing messages that contain nonactivated but discriminative features. The
goal of this present study is to develop methods that can learn robust and discriminative
representations to tackle the above problems in cyberbullying detection. Some approaches have
been proposed to tackle these problems by incorporating expert knowledge into feature

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 2


Online public shamming on twitter

learning. Yin et.al proposed to combine BoW features, sentiment features and contextual
features to train a support vector machine for online harassment detection. Dinakar et.al utilized
label specific features to extend the general features, where the label specific features are
learned by Linear Discriminative Analysis. In addition, common sense knowledge was also
applied. Nahar et.al presented a weighted TF-IDF scheme via scaling bullying-like features by
a factor of two. Besides content-based information, Maral et.al proposed to apply users‘
information, such as gender and history messages, and context information as extra features.
But a major limitation of these approaches is that the learned feature space still relies on the
BoW assumption and may not be robust. In addition, the performance of these approaches rely
on the quality of hand-crafted features, which require extensive domain knowledge. In this
paper, we investigate one deep learning method named stacked denoising autoencoder (SDA).
SDA stacks several denoising autoencoder and concatenates the outputof each layer as the
learned representation.
Each denoising autoencoder in SDA is trained to recover the input data from a corrupted
Version of it. The input is corrupted by randomly setting some of the input to zero, which is
Called dropout noise. This denoising process helps the auto encoders to learn robust
representation. In addition, each auto encoder layer is intended to learn an increasingly abstract
representation of the input.
In this paper, we develop a new text representation model based on a variant of SDA:
marginalized stacked denoising auto encoders (mS-DA), which adopts linear instead of
nonlinear projection to accelerate training and marginalizes infinite noise distribution in order
to learn more robust representations. We utilize semantic information to expand mSDA and
develop Semantic-enhanced Marginalized Stacked Denoising Autoencoders (smSDA). The
semantic information consists of bullying words. An automatic extraction of bullying words
based on word embeddings is proposed so that the involved human labor can be reduced.
During training of smSDA, we attempt to reconstruct bullying features from other
normal words by discovering the latent structure, i.e. correlation,between bullying and normal
words. The intuition behind this idea is that some bullying messages donot contain bullying
words. The correlation information discovered by smSDA helps to reconstruct bullying
features from normal words, and this in turn facilitates detection of bullying messages without
containing bullying words. For example, there is a strong correlation between bullying word
fuck and normal word off since they often occur together. If bullying messages do not contain
such obvious bullying features, such as fuck is often misspelled as fck, the correlation may help
to reconstruct the bullying features from normal ones so that the

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 3


Online public shamming on twitter

bullying message can be detected. It should be noted that introducing dropout noise has the
effects of enlarging the size of the dataset, including training data size, which helps alleviate
the data sparsity problem. In addition, L1 regularization of the projection matrix is added to the
objective function of each autoencoder layer in our model to enforce the sparstiy of projection
matrix, and this in turn facilitates the discovery of the most relevant terms for reconstructing
bullying terms. The main contributions of our work can be summarized as follows:
Our proposed Semantic-enhanced Marginalized Stacked Denoising Autoencoder is able
to learn robust features from BoW representation in an efficient and effective way. These robust
features are learned by reconstructing original input from corrupted(i.e.,missing) ones. The new
feature space can improve the performance of cyberbullying detection even with a small
labeled training corpus.
* Semantic information is incorporated into the reconstruction process via the designing
of semantic dropout noises and imposing sparsity constraints on mapping matrix. In our
framework,high-quality semantic information, i.e., bullying words, can be extracted
automatically through word embeddings. Finally, these specialized modifications make
the new feature space more discriminative and this in turn facilitates bullying detection.
* Comprehensive experiments on real-data sets have verified the performance of our
proposed model

1.2 Natural language processing

Natural language processing is a field of computer science, artificial intelligence, and


computational linguistics concerned with the interactions between computers and human
(natural) languages. As such, NLP is related to the area of human–computer interaction. Many
challenges in NLP involve: natural language understanding, enabling computers to derive
meaning from human or natural language input; and others involve natural language
generation.

The field of study that focuses on the interactions between human language and
computers is called Natural Language Processing, or NLP for short. It sits at the intersection
of computer science, artificial intelligence, and computational linguistics (Wikipedia).

―Natural Language Processing is a field that covers computer understanding and ma-
nipulation of human language, and it‘s ripe with possibilities for newsgathering,‖ Anthony

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 4


Online public shamming on twitter

Pesce said in Natural Language Processing in the kitchen. ―You usually hear about it in the
context of analyzing large pools of legislation or other document sets, attempting to discover
patterns or root out corruption.

What Can Developers Use NLP Algorithms for?

NLP algorithms are typically based on machine learning algorithms. Instead of hand-
coding large sets of rules, NLP can rely on machine learning to automatically learn these rules
by analysing a set of examples (i.e. a large corpus, like a book, down to a collection of
sentences), and making a statical inference. In general, the more data analyzed, the more
accurate the model will be.

• Summarize blocks of text using Summarizer to extract the most important and
central ideas while ignoring irrelevant information.
• Create a chat bot using ParseyMcParseface, a language parsing deep learning model
made by Google that uses Point-of-Speech tagging.
• Automatically generate keyword tags from content using AutoTag, which leverages
LDA, a technique that discovers topics contained within a body of text.
• Identify the type of entity extracted, such as it being a person, place, or organization
using Named Entity Recognition.
• Use Sentiment Analysis to identify the sentiment of a string of text, from very
negative to neutral to very positive.
• Reduce words to their root, or stem, using Porter Stemmer, or break up text into
tokens using Tokenizer.

1.2.1 Open Source NLP Libraries

These libraries provide the algorithmic building blocks of NLP in real-world applications.
Algorithmia provides a free API endpoint for many of these algorithms, without ever having
to setup or provision servers and infrastructure.

• Apache OpenNLP: a machine learning toolkit that provides tokenizers, sentence


segmentation, part-of-speech tagging, named entity extraction, chunking, parsing,
coreference resolution, and more.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 5


Online public shamming on twitter

• Natural Language Toolkit (NLTK): a Python library that provides modules for
processing text, classifying, tokenizing, stemming, tagging, parsing, and more.
• Standford NLP : A suite of NLP tools that provide part-of-speech tagging, the named
entity recognizer, coreference resolution system, sentiment analysis, and more.
• MALLET: A Java package that provides Latent Dirichlet Allocation, document
classification, clustering, topic modeling, information extraction, and more.

1.2.2 A Few NLP Examples

• Use Summarizer to automatically summarize a block of text, exacting topic sentences,


and ignoring the rest.
• Generate keyword topic tags from a document using LDA (Latent Dirichlet
Allocation), which determines the most relevant words from a document. This
algorithm is at the heart of the Auto-Tag and Auto-Tag URLmicroservices.
• Sentiment Analysis, based on StanfordNLP, can be used to identify the feeling,
opinion, or belief of a statement, from very negative, to neutral, to very positive. Often,
developers with use an algorithm to identify the sentiment of a term in a sentence, or
use sentiment analysis to analyze social media.

1.3 Sentiment analysis

Sentiment analysis (also known as opinion mining) refers to the use of natural
language processing, text analysis and computational linguistics to identify and extract
subjective information in source materials. Sentiment analysis is widely applied to reviews and
social media for a variety of applications, ranging from marketing to customer service.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a


writer with respect to some topic or the overall contextual polarity of a document. The attitude
may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say,
the emotional state of the author when writing), or the intended emotional communication (that
is to say, the emotional effect the author wishes to have on the reader).

A basic task in sentiment analysis is classifying the polarity of a given text at the
document, sentence, or feature/aspect level—whether the expressed opinion in a document, a
sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 6


Online public shamming on twitter

polarity" sentiment classification looks, for instance, at emotional states such as "angry",
"sad", and "happy".

1.4 Types

Early work in that area includes Turney and Pang who applied different methods for
detecting the polarity of product reviews and movie reviews respectively. This work is at the
document level. One can also classify a document's polarity on a multi-way scale, which was
attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of
classifying a movie review as either positive or negative to predict star ratings on either a 3 or
a 4 star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting
ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-
star scale). Even though in most statistical classification methods, the neutral class is ignored
under the assumption that neutral texts lie near the boundary of the binary classifier, several
researchers suggest that, as in every polarity problem, three categories must be identified.

Moreover, it can be proven that specific classifiers such as the Max Entropy and the
SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of
the classification. There are in principle two ways for operating with a neutral class. Either,the
algorithm proceeds by first identifying the neutral language, filtering it out and then assessing
the rest in terms of positive and negative sentiments, or it builds a three way classification in
one step. This second approach often involves estimating a probability distribution over all
categories (e.g. Naive Bayes classifiers as implemented by Python's NLTK kit).

Whether and how to use a neutral class depends on the nature of the data: if the data is
clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral
language out and focus on the polarity between positive and negative sentiments. If, in
contrast, the data is mostly neutral with small deviations towards positive and negative affect,
this strategy would make it harder to clearly distinguish between the two poles.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 7


Online public shamming on twitter

1.5 Artificial intelligence

Artificial intelligence (AI) is intelligence exhibited by machines. In computer science,


the field of AI research defines itself as the study of "intelligent agents": any device that
perceives its environment and takes actions that maximize its chance of success at some goal.
Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive"
functions that humans associate with other human minds, such as "learning" and "problem
solving". As machines become increasingly capable, mental facilities once thought to require
intelligence are removed from the definition. For example, optical character recognition is no
longer perceived as an exemplar of "artificial intelligence", having become a routine
technology. Capabilities currently classified as AI include successfully understanding human
speech, competing at a high level in strategic game systems (such as Chess and Go), self-
driving cars, intelligent routing in Content Delivery Networks, and interpreting complex data.

AI research is divided into subfields that focus on specific problems or on specific


approaches or on the use of a particular tool or towards satisfying particular applications. The
central problems (or goals) of AI research include reasoning, knowledge, planning, learning,
natural language processing (communication), perception and the ability to move and
manipulate objects. General intelligence is among the field's long-term goals. Approaches
include statistical methods, computational intelligence, and traditional symbolic AI. Many
tools are used in AI, including versions of search and mathematical optimization, logic,
methods based on probability and economics. The AI field draws upon computer science,
mathematics, psychology, linguistics, philosophy, neuroscience and artificial psychology.

The field was founded on the claim that human intelligence "can be so precisely
described that a machine can be made to simulate it". This raises philosophical arguments about
the nature of the mind and the ethics of creating artificial beings endowed with human- like
intelligence, issues which have been explored by myth, fiction and philosophy since antiquity.
Some people also consider AI a danger to humanity if it progresses unabatedly. Attempts to
create artificial intelligence have experienced many setbacks, including theALPAC report of
1966, the abandonment of perceptrons in 1970, the Lighthill Report of 1973, the second AI
winter 1987–1993 and the collapse of the Lisp machine market in 1987.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 8


Online public shamming on twitter

CHAPTER 2

LITERATURE SURVEY

Karthik Dinakar, Roi Reichart, Henry Lieberman ―Modeling the Detection of


Textual Cyberbullying‖ MIT Media Lab, computer Science & Artificial Intelligence
Laboratory Massachusetts Institute of Technology Cambridge, MA 02139 USA.
The scourge of cyber bullying has assumed alarming proportions with an ever-
increasing number of adolescents admitting to having dealt with it either as a victim or as a
by stander. Anonymity and the lack of meaningful supervision in the electronic medium are
two factors that have exacter- bated this social menace. Comments or posts involving sensitive
topics that are personal to an individual are more likely to be internalized by a victim, often
resulting in tragic outcomes. We decompose the overall detection problem in- todetection of
sensitive topics, lending itself into text classification sub-problems. We experiment with a
corpus of 4500 YouTube comments, applying a range of binary and multiclass classifiers. We
find that binary classifiers for in- dividable labels outperform multiclass classifiers. Our find-
ings show that the detection of textual cyber bullying can be tackled by building individual
topic-sensitive classifiers.

Andreas M. Kaplan*, Michael Haenlein “Users of the world, unite! The challenges
and opportunities of Social Media‖ ESCP Europe, 79 Avenue de la Republique, F-75011
Paris, France
The concept of Social Media is top of the agenda for many business executives today.
Decision makers, as well as consultants, try to identify ways in which firms can make profitable
use of applications such as Wikipedia, YouTube, Facebook, Second Life, and Twitter. Yet
despite this interest, there seems to be very limited understanding of what the term ‗‗Social
media ‘‘sexactly means; this article intends to provide some clarification. We begin by
describing the concept of Social Media, and discuss how it differs from related concepts such
as Web 2.0 and User Generated Content. Based on this definition, we then provide a
classification of Social Media which groups applications currently subsumed under the
generalized term into more specific categories by characteristic: collaborative projects, blogs,
content communities, social networking sites, virtual game worlds, and virtual social worlds.
Finally, we present 10 pieces of advice for companies which decide to utilize Social Media.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 9


Online public shamming on twitter

Qianjia Huang, Vivek K. Singh, Pradeep K. Atrey “Cyber Bullying Detection


Using Social and Textual Analysis‖ Department of Applied Computer Science The
University of Winnipeg, Winnipeg, MB, Canada, The Media Lab Massachusetts
Institute of Technology Cambridge, MA, USA, Department of Computer Science
University at Albany – SUNY Albany, NY, USA.
Cyber Bullying, which often has a deeply negative impact on the victim, has grown as
a serious issue among adolescents. To understand the phenomenon of cyber bullying, experts
in social science have focused on personality, social relationships and psychological factors
involving both the bully and the victim. Recently computer science researchers have also come
up with automated methods to identify cyber bullying messages by identifying bullying-related
keywords in cyber conversations. However, the accuracy of these textual feature based methods
remains limited. In this work, we investigate whether analysing social network features can
improve the accuracy of cyber bullying detection. By analysing the social network structure
between users and deriving
features such as number of friends, network embeddedness, and relationship centrality,
we find that the detection of cyber bullying can be significantly improved by integrating the
textual features with social network features

CHAD EBESUTANI, MATTHEW FIERSTEIN, ANDRES G. VIANA, JOHN


YOUNG,” The Role Of Loneliness In The Relationship Between Anxiety And Depression In
Clinical And School-Based Youth‖ Duksung Women’s University, University of California
at Los Angeles,University of Mississippi Medical Center, University of Mississippi,
MANUEL SPRUNG International Psychology and Psychotherapy Center, Vienna.

Identifying mechanisms that explain the relationship between anxiety and depression
are needed. The Tripartite Model is one model that has been proposed to help explain the
association between these two problems, positing a shared component called negative affect.
The objective of the present study was to examine the role of loneliness in relation to anxiety
and depression. A total of 10,891 school-based youth (Grades 2–12) and 254 clinical children
and adolescents receiving residential treatment (Grades 2–12) completed measures of
loneliness, anxiety, depression, and negative affect. The relationships among loneliness,
anxiety, depression, and negative affect were examined, including whether loneliness was a
significant intervening variable. Various meditational tests converged showing that loneliness
was a significant mediator in the relationship between anxiety and depression. This effect was
found across children (Grades 2–6) and adolescent (Grades7–12) school-based youth. In the

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 10


Online public shamming on twitter

clinical sample, loneliness was found to be a significant mediator between anxiety and
depression, even after introducing negative affect based on the Tripartite Model. Results
supported loneliness as a significant risk factor in youths‘ lives that may result from anxiety
and place youth at risk for subsequent depression. Implications related to intervention and
prevention in school settings are also discussed.

Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, Amy Bellmore,” Learning from
Bullying Traces in Social Media‖ Department of Computer Sciences,University of
Wisconsin-Madison,Madison, WI 53706, USA,Department of Educational
Psychology,University of Wisconsin-Madison,Madison, WI 53706, USA.

We introduce the social study of bullying to the NLP community. Bullying, in both
physical and cyber worlds (the latter known as cyberbullying), has been recognized as a serious
national health issue among adolescents. However, previous social studies of bullying are
handicapped by data scarcity, while the few computational studies narrowly restrict themselves
to cyberbullying which accounts for only a small fraction of all bullying episodes. Our main
contribution is to present evidence that social media, with appropriate natural language
processing techniques, can be a valuable and abundant data source for the study of bullying in
both worlds. We identify several key problems in using such data sources and formulate them
as NLP tasks, including text classification, role labeling, sentiment analysis, and topic
modeling. Since this is an introductory paper, we present baseline results on these tasks using
off-the-shelf NLP solutions, and encourage the NLP community to contribute better models in
the future.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 11


Online public shamming on twitter

Chapter 3

SYSTEM ANALYSIS
System analysis is the technique for finding the best answer for the issue. Framework
study is the procedure by which find out about the realistic issues, characterize articles and
prerequisites and assesses the arrangements. It is the state of mind about the affiliation and the
issue it includes, an arrangement of advances that aids in tackling these issues. Attainability
study assumes an essential part in framework examination which gives the objective for
configuration and improvement.

3.1 Aim of the Project


The proposed solution to the issue concerned with fake news includes the use of a tool
that can identify and remove fake sites from the results provided to a user by a search engine
or a social media news feed. The tool can be downloaded by the user and, subsequently, be
appended to the browser or application used to receive news feeds. Once operational, the tool
will use various techniques including those related to the syntactic features of a link to
determine whether the same should be included as part of the search results.

3.2 Existing System:


➢ Based on the content (and not the specific term used), we divide the related work into
five categories- profanity, hate speech, cyberbullying, trolling and personal attacks.
➢ The effectiveness of list based profanity detection for Yahoo! Buzz comments.
Relatively low F1 score (harmonic mean of precision and recall) of this approach is
attributed to distortion of profane words with special characters (e.g., @ss) or spelling
mistakes and low coverage of list words.
➢ The first caveat was partly overcome by considering words as abusive whose edit
distance from a known abusive word equals the number of ―punctuation marks‖
present in the word.
➢ To identify hate speech targeting Jews from a data set consisting of Yahoo! comments
and known anti-Semitic web page contents.
➢ Collected tweets for two weeks and trained a classifier on typed dependency and hateful
terms as features.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 12


Online public shamming on twitter

3.2.1 Drawbacks of Existing System

✓ Lack of efficient training data-set


✓ Social media are often very short and contain a lot of informal language and
misspellings cannot properly process.
✓ Major limitation of these approaches is that the learned feature space still relies on the
BoW assumption and may not be robust

3.3 Proposed System:


Our goal is to automatically classify tweets in the aforementioned six categories. The
main functional units involving automated classification of shaming tweets are shown. Both
labelled training set and test set of tweets for each of the categories go through the pre-
processing and feature extraction steps. The training set is used to train six support vector
machine (SVM) classifiers. The precision scores of the trained SVMs are next evaluated on the
test set. Based on these scores, the classifiers are arranged hierarchically. A new tweet, after
pre-processing and feature extraction, is fed to the trained classifiers and is labelled withthe
class of the first classifier that detects it to be positive. A tweet is deemed non-shame if allthe
classifiers label it as negative.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 13


Online public shamming on twitter

Chapter 4

Requirements

4.1 Hardware Requirements


Processor Intel Core i5 or AMD FX 8 core series with clock speed of 2.4
GHz or above
RAM 2GB or above
Hard disk 120 GB or above
Input device Keyboard or mouse or compatible pointing devices
Display XGA (1024*768 pixels) or higher resolution monitor with 32 bit
color settings
Miscellaneous USB Interface, Power adapter, etc
Table 4.1: Hardware Requirements

4.2 Software Requirements


Operating System Windows
Programming Language – Core Java, Advanced Java, J2EE, MVC Framework
Backend
Programming language - Bootstrap Framework, HTML, CSS, JavaScript, Ajax, JQuery
Frontend
Development environment Eclipse Oxygen IDE
Application Server Apache Tomcat v7.0
Database MySQL
Table 4.2: Software Requirements

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 14


Online public shamming on twitter

CHAPTER 5

SYSTEM DESIGN
Outline is a creative procedure; a fine plan is the way to valuable framework. The
framework "Outline" is characterized as "The procedure of applying an assortment of methods
and standards for the guideline of characterizing a procedure or a framework in adequate
component to allow its physical acknowledgment". Diverse configuration elements are taken
after to extend the framework. The outline design depicts the elements of the framework, the
segments or components of the framework and their hope to end-clients.

5.1 Problem Statement

The project is concerned with identifying a solution that could be used to detect and filter out
sites containing fake news for purposes of helping users to avoid being lured by clickbaits. It
is imperative that such solutions are identified as they will prove to be useful to both readers
and tech companies involved in the issue.

5.2 Fundamental Design Concepts


An arrangement of principal plan ideas has developed over the point of reference
three decades. Despite the fact that the level of significance in every idea has blended
throughout the years, each has stood the test of time. Every one furnishes the product creator
with an establishment from which more confused outline strategies can be connected. The basic
outline ideas give the required system for "hitting the nail on the head". The basic outline ideas,
for example, build, refinement, particularity, programming plan, control chain of importance,
basic apportioning, information structure, programming methodology and data whipping are
connected in this anticipate to taking care of business according to the configuration.

5.3 System Architecture


Framework engineering is the hypothetical plan that characterizes the structure and
conduct of a framework. A design clarification is a legitimate depiction of a framework, sorted
out in a mode that backings examination about the auxiliary properties of the framework. It
characterizes the framework device or building squares and gives an arrangement from which
items can be obtained, and frameworks built up, that will work

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 15


Online public shamming on twitter

commonly to actualize the for the most part framework. The System design is uncovered
underneath.

Fig 5.3: System Architecture

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 16


Online public shamming on twitter

5.4 Data Flow Diagram

Fig 5.4: Data flow Diagram

5.5 Use Case Diagram

Fig 5.5: Use Case Diagram

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 17


Online public shamming on twitter

In this module, users can send a friend request to all the people and only those who
accept his/her friend request, the user can start chatting with him/her. In the same manner
other people can also send a friend request and it is up to the user to accept the friend request
or delete the friend request. If the friend request sent by the user is deleted then the user is not
able to chat with that person. Once the friend request is accepted among users then the users
can share images, post messages and can start chatting with their friends.

5.5.1 NLP gathers all the text messages from the database

Fig 5.5.1: Use Case Diagram for NLP

Fig 5.5.2: Use Case Diagram for Weka

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 18


Online public shamming on twitter

Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset or called from your own Java code.
Weka contains tools for data pre-processing, classification, regression, clustering, association
rules, and visualization. It is also well-suited for developing new machine learning schemes.
Training dataset is used to train the machine, then for the test dataset that is the messages
entered by the user the data mining rules are applied and then finally the result is predicted
using J48 algorithm

5.6 Sequence Diagram

Fig 5.6: Sequence Diagram

In this module there is a datastore, NLP, Weka tool. First all the text messages entered
by the users are gathered from a database, and then the sentiment analysis is performed using
NLP, J48 algorithm is applied to predict the data and then data mining rules are applied for the
predicted data and then finally the result is predicted.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 19


Online public shamming on twitter

Chapter 6
Implementation
Implementation phase plays a vital role in implementing the design requirements into the
reality. Final result can be found in the implementation phase. Therefore, successful new
system can be achieved in this key stage. Careful planning and controlling are required in this
implementation phase. Major decisions have to be taken care, before implementing any project.
They are as follows:
• Selecting the right programming language for the development of the project
• Selecting the right platform, that means windows operating system or linux operating
system or any other operating system
• Coding conventions have to be followed properly

6.1 Programming Language Selection


Java language is selected for coding this system. There are so many reasons to choose
java as the coding language. First of all, it is an ObjectOriented language. It has many features,
so that anyone can use them to develop a robust project or system. Mainly, java is a platform
independent language. Java has rich set of packages like Thread handling, Exceptionhandling,
Garbage collections, Utility classes like generating random numbers and so on. Java also
supports web development, in that java server pages plays an important role to runat the server
side. Servlets can also be used.

6.2 Platform Selection


This system is designed and built using windows operating system family. This system
runs well under windows XP as well as windows 7 operating system. But windows 7 is most
preferable, because of its attractive interfaces and advanced features. Windows 7 provides ease
of use and it is user-friendly.

6.3 Code Conventions


Coding conventions or standards have to be followed for better understanding and for
better maintainance. Coding standards define a format for each piece of code written in the
java language.
6.3.1 Naming Conventions

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 20


Online public shamming on twitter

Readability of programs can be increased by using proper naming conventions to each


class, variables, methods etc. Naming conventions for the packages, classes, interfaces,
methods, variables, constants are described as follows:

• Classes: Class names can be a single word or it can consist of more than one word. If
more than one word is used, then the first letter of each word is capitalized. The class
should be a noun. Class names should be meaningful and descriptive. As far as
possible, class names should be avoided using abbreviations. Any word length can be
defined for the class names.
Example: Class EncryptionLayer

• Interfaces: Interface name can be adjective and first letter of the interface name must
start with the uppercase letter.
Example: ActionListener, ActionEvent

• Packages: Package names should start with lowercase


Example: java.lang.system.class

• Methods: Method name should be a verb and starts with a lowercase. If the method
names consist of more than one word, then starting letter of each word should be
capitalized.
Example: getNextNode(), checkPerformance()

• Variables: Variable names should be meaningful and short. Purpose of the variable
names are indicated by using proper wordings.
Example: string sourceNode

• Constants: All letters in the constants should be in the uppercase and separated by
underscores
Example: MAX_PRIORITY, MIN_VALUE

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 21


Online public shamming on twitter

6.3.2 File Organizations


When the codebase becomes larger and larger, organizing the code becomes a matter.
It is a very important aspect, where multiple programmers may work in the project. Therefore,
here are some of the recommendations with respect to organizing the files:
• A file contains several sections
• Each section is separated by a blank line. A comment line is an optional one to
recognize each section
• There must be only one main program and it should be short and simple
• Source file should contain name of the class, version information, and copyright
information
• Import package statement should be declared
• Comment about the class or interface should be used
• Class or Interface statement should be defined
• Public class variables, Protected class variables, Private class variables are then
declared
• Later constructors or methods are implemented
• Related files should be kept in different folders, if there are many number of source
code files

6.3.3 Properties Declaration


Class demo {
private String username = ―‖;
public void setusername(String username){
this.username = username;
}
public String getusername(){
return username;}
}
Here the private variable username is accessed by using set and get properties.

6.4 Class Declarations


Following rules are used in declaring java classes:

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 22


Online public shamming on twitter

• Class declaration starts with class keyword and any user name can be given for the
name of the class. It may contain superclass and class may also be declared as final,
abstract or it may be public
• There should not be any space between name of the class and parenthesis. Argument
list is followed within the parenthesis
• Open flower brace ‗{‗ is present in the line of class declaration itself. Closing flower
brace ‗}‘ cab be written in the new line. Opening and closing braces ‗{‗ and ‗}‘ appear
in the same line, if there is a null statement
• In the gap between these two braces, constructors, field declarations, behaviour of
classes (implementation) and objects can be defined

Example: public void class display() {


// declarations, constructor, methods
}
Comments
Class demo {
Void demo(){
// The statement System.out.println used to display string ―Tested Result‖
System.out.println(―Tested Result‖);

}
Void add(){
/*
int z =x+y;
System.out.println(―Sum = ‖+z);
*/

}
}

Two types of comments are used in java as single line comment is // and block level
comment is /* */.

6.5 Difficulties Encountered and Strategies used to tackle


Many difficulties were faced during the implementation of two factor data security
mechanism. The following section briefs the difficulties and solutions employed to resolve
them.
• First difficulty was, how to provide security device for each individual user to

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 23


authenticate
Online themselves as authenticated
public shamming on twitteruser.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 24


Online public shamming on twitter

Solution: KDC module was implemented to provide security device authentication to


each individual user.

• Second difficulty, double encryption was a major challenge


Solution: Identity Based Encryption Mechanism was used to overcome the above
challenge.
6.5.1 Integration of Components

Once sender uploads file by using identity of the receiver, the upload function calls
amazon S3_client to check login credential. S3_client uploads file by calling put Object
Request.

Once Receiver initiates file to download, the filed download function fetches file from
amazon bucket by calling get Object Request.
6.5.2 Synchronization between services

SHA encryption and decryption functions were used to encrypt sender files using 64bit
secret key with security device identity and successfully decrypt file with the help of receiver
security device identity and secret key.

6.6 Software Environment

Java Technology Java technology is both a programming language and a platform.

6.6.1 The Java Programming Language


The Java programming language is a high-level language that can be characterized by
all of the following buzzwords:
▪ Simple
▪ Architecture neutral
▪ Object oriented
▪ Portable
▪ Distributed
▪ High performance
▪ Interpreted
▪ Multithreaded
▪ Robust

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 25


Online public shamming on twitter

▪ Dynamic
▪ Secure
With most programming languages, you either compile or interpret a program so that
you can run it on your computer. The Java programming language is unusual in that a program
is both compiled and interpreted. With the compiler, first you translate a program into an
intermediate language called Java byte codes —the platform-independent codesinterpreted by
the interpreter on the Java platform. The interpreter parses and runs each Java byte code
instruction on the computer. Compilation happens just once; interpretation occurs each time
the program is executed. The following figure illustrates how this works.

Fig: 6.6 Java Interpreter

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine
(Java VM). Every Java interpreter, whether it‘s a development tool or a Web browser that can
run applets, is an implementation of the Java VM. Java byte codes helpmake ―write once,
run anywhere‖ possible. You can compile your program into byte codes on any platform that
has a Java compiler. The byte codes can then be run on any implementation of the Java VM.
That means that as long as a computer has a Java VM, the same program written in the Java
programming language can run on Windows 2000, a Solaris workstatioor on an iMac.

Fig:6.6.1.1 java comiler

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 26


Online public shamming on twitter

6.6.2 The Java Platform


A platform is the hardware or software environment in which a program runs. We‘ve
already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and
MacOS. Most platforms can be described as a combination of the operating system and
hardware. The Java platform differs from most other platforms in that it‘s a software-only
platform that runs on top of other hardware-based platforms.

The Java platform has two components:


The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)
You‘ve already been introduced to the Java VM. It‘s the base for the Java platform and
is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide
many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is
grouped into libraries of related classes and interfaces; these libraries are known as packages.
The next section, What Can Java Technology Do? Highlights what functionality some of the
packages in the Java API provide.

The following figure depicts a program that‘s running on the Java platform. As the
figure shows, the Java API and the virtual machine insulate the program from the hardware.

Fig 6.6.2: Java Flatform

Native code is code that after you compile it, the compiled code runs on a specific
hardware platform. As a platform-independent environment, the Java platform can be a bit
slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time
byte code compilers can bring performance close to that of native code without threatening
portability.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 27


Online public shamming on twitter

What Can Java Technology Do?


The most common types of programs written in the Java programming language are
applets and applications. If you‘ve surfed the Web, you‘re probably already familiar with
applets. An applet is a program that adheres to certain conventions that allow it to run within
a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining
applets for the Web. The general-purpose, high-level Java programming language is also a
powerful software platform. Using the generous API, you can write many types of programs.
An application is a standalone program that runs directly on the Java platform. A special
kind of application known as a server serves and supports clients on a network. Examples of
servers are Web servers, proxy servers, mail servers, and print servers. Another specialized
program is a servlet. A servlet can almost be thought of as an applet that runs on the server
side. Java Servlets are a popular choice for building interactive web applications, replacing the
use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of
applications. Instead of working in browsers, though, servlets run within Java Web servers,
configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with packages of
software components that provides a wide range of functionality. Every full
implementation of the Java platform gives you the following features:
• The essentials: Objects, strings, threads, numbers, input and output, data
structures, system properties, date and time, and so on.
• Applets: The set of conventions used by applets.
• Networking: URLs, TCP (Transmission Control Protocol), UDP (User Data
gram Protocol) sockets, and IP (Internet Protocol) addresses.
• Internationalization: Help for writing programs that can be localized for users
worldwide. Programs can automatically adapt to specific locales and be
displayed in the appropriate language.
• Security: Both low level and high level, including electronic signatures, public
and private key management, access control, and certificates.
• Software components: Known as JavaBeansTM, can plug into existing
component architectures.
• Object serialization: Allows lightweight persistence and communication via
Remote Method Invocation (RMI).

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 28


Online public shamming on twitter

• Java Database Connectivity (JDBCTM): Provides uniform access to a wide


range of relational databases.
The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration,
telephony, speech, animation, and more. The following figure depicts what is included in the
Java 2 SDK.

Fig 6.6.3: Java IDE

How Will Java Technology Change My Life?

We can‘t promise you fame, fortune, or even a job if you learn the Java programming
language. Still, it is likely to make your programs better and requires less effort than other
languages. We believe that Java technology will help you do the following:

• Get started quickly: Although the Java programming language is a powerful


object-oriented language, it‘s easy to learn, especially for programmers already
familiar with C or C++.
• Write less code: Comparisons of program metrics (class counts, method counts,
and so on) suggest that a program written in the Java programming language
can be four times smaller than the same program in C++.
• Write better code: The Java programming language encourages good coding
practices, and its garbage collection helps you avoid memory leaks. Its object
orientation, its JavaBeans component architecture, and its wide-ranging, easily
extendible API let you reuse other people‘s tested code and introduce fewer
bugs.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 29


Online public shamming on twitter

• Develop programs more quickly: Your development time may be as much as


twice as fast versus writing the same program in C++. Why? You write fewer
lines of code and it is a simpler programming language than C++.
• Avoid platform dependencies with 100% Pure Java: You can keep your
program portable by avoiding the use of libraries written in other languages.
The 100% Pure JavaTM Product Certification Program has a repository historical
process manuals, white papers, brochures, and similar materialsonline.
• Write once, run anywhere: Because 100% Pure Java programs are compiled
into machine-independent byte codes, they run consistently on any Java
platform.
• Distribute software more easily: You can upgrade applets easily from a central
server. Applets take advantage of the feature of allowing new classesto be
loaded ―on the fly,‖ without recompiling the entire program.
6.6.3 ODBC
Microsoft Open Database Connectivity (ODBC) is a standard programming interface
for application developers and database systems providers. Before ODBC became a de facto
standard for Windows programs to interface with database systems, programmers had to use
proprietary languages for each database they wanted to connect to. Now, ODBC has made the
choice of the database system almost irrelevant from a coding perspective, which is as it should
be. Application developers have much more important things to worry about than the syntax
that is needed to port their program from one database to another when business needs suddenly
change.
Through the ODBC Administrator in Control Panel, you can specify the particular
database that is associated with a data source that an ODBC application program is written to
use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a
particular database. For example, the data source named Sales Figures might be a SQL Server
database, whereas the Accounts Payable data source could refer to an Access database. The
physical database referred to by a data source can reside anywhere on the LAN.
The ODBC system files are not installed on your system by Windows 95. Rather, they
are installed when you setup a separate database application, such as SQL Server Client or
Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called
ODBCINST.DLL.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 30


Online public shamming on twitter

From a programming perspective, the beauty of ODBC is that the application can be
written to use the same set of function calls to interface with any data source, regardless of
the database vendor. The source code of the application doesn‘t change whether it talks to
Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers
available for several dozen popular database systems. Even Excel spreadsheets and plain text
files can be turned into data sources. The operating system uses the Registry information
written by ODBC Administrator to determine which low-level ODBC drivers are needed to
talk to the data source (such as the interface to Oracle or SQL Server). The loading of the
ODBC drivers is transparent to the ODBC application program. In a client/server environment,
the ODBC API even handles many of the network issues for the application programmer.

The advantages of this scheme are so numerous that you are probably thinking there
must be some catch. The only disadvantage of ODBC is that it isn‘t as efficient as talking
directly to the native database interface. ODBC has had many detractors make the charge that
it is too slow. Microsoft has always claimed that the critical factor in performance is the quality
of the driver software that is used. In our humble opinion, this is true. The availability of good
ODBC drivers has improved a great deal recently. And anyway, the criticism about
performance is somewhat analogous to those who said that compilers would never match the
speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the
opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers
get faster every year.
6.6.4 JDBC
In an effort to set an independent database standard API for Java; Sun Microsystems
developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access
mechanism that provides a consistent interface to a variety of RDBMSs. This consistent
interface is achieved through the use of ―plug-in‖ database connectivity modules, or drivers.
If a database vendor wishes to have JDBC support, he or she must provide the driver for each
platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC‘s framework on ODBC. As you
discovered earlier in this chapter, ODBC has widespread support on a variety of platforms.
Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market muchfaster than
developing a completely new connectivity solution.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 31


Online public shamming on twitter

JDBC was announced in March of 1996. It was released for a 90 day public review that
ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released
soon after.
The remainder of this section will cover enough information about JDBC for you to
know what it is about and how to use it effectively. This is by no means a complete overview
of JDBC. That would fill an entire book.
6.6.5 JDBC Goals
Few software packages are designed without goals in mind. JDBC is one that, because of
its many goals, drove the development of the API. These goals, in conjunction with early
reviewer feedback, have finalized the JDBC class library into a solid framework for building
database applications in Java.
The goals that were set for JDBC are important. They will give you some insight as to why
certain classes and functionalities behave the way they do. The eight design goals for JDBC
are as follows:
SQL Level API
The designers felt that their main goal was to define a SQL interface for Java. Although
not the lowest database interface level possible, it is at a low enough level for higher-level
tools and APIs to be created. Conversely, it is at a high enough level for application
programmers to use it confidently. Attaining this goal allows for future tool vendors to
―generate‖ JDBC code and to hide many of JDBC‘s complexities from the end user.

1. SQL Conformance
SQL syntax varies as you move from database vendor to database vendor. In an effort
to support a wide variety of vendors, JDBC will allow any query statement to be passed
through it to the underlying database driver. This allows the connectivity module to handle
non-standard functionality in a manner that is suitable for its users.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 32


Online public shamming on twitter

Chapter 7
Software Testing
In software development life cycle, software testing is one of the significant phases. In
software testing phase, verification and validation of the project under development are
performed with respect to requirements mentioned [28]. In this project, Unit testing of main
modules, Integration testing of different modules and complete System testing functionality
along with the GUI testing is performed.

7.1 Test Environment


This project was tested on the following platforms

Software Platform: Software used for testing is given below


Operating System : Windows XP or Windows 7
Tool / IDE : Eclipse Juno 4.2 with Tomcat Server
Visual Interface : Java Server Pages, Servlets
Database : MySQL 5.0

Hardware Platform: Hardware used for testing is given below


Processor : Intel Pentium 4 or more
Memory : 1GB RAM or more
Harddisk : 40GB or more with 5400 rpm

7.2 Unit Testing of Main modules


Individual modules or functions are tested under Unit testing. This method of testing
is also called as White Box Testing. Independently, different modules are tested here and
their functionality is checked. Each of these modules of Unit testing is explained below.

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 33


Online public shamming on twitter

The Table 7.2.1 shows Unit Test Case 1(UTC-1) to verify whether user interface accepts
registration details of sender and receiver.
Sl No of Test Case UTC-1
Name of Test Case Test case to verify Registration Page
Feature being Tested Registration details
Description User should provide details like Name,
Email Id, Mobile Number
Sample Input Email Id, Mobile Number, Username
Password
Expected Output Registration Successful message should be
displayed
Actual Output As expected
Remarks Passed
Table 7.2.1 Unit Test Case 1 for Registration Page

The Table 7.2.2 shows Unit Test Case 2(UTC-2)


Sl No of Test Case UTC-2
Name of Test Case Test Message Post
Feature being Tested Messages have been posted
Description Store in database
Sample Input text
Expected Output Message posted
Actual Output Device is ready to use
Remarks Passed
Table 7.2.2 Unit Test Case 2 for Message Post

The Table 7.2.3 shows Unit Test Case 3(UTC-3)


Sl No of Test Case UTC-3
Name of Test Case Check Posted Message
Feature being Tested Sentiment analysis
Description Posted message sentiment should be
tested
Sample Input messages
Expected Output Predict sentiment
Actual Output Messages
Remarks Passed
Table 7.2.3 Unit Test Case 3 for Sentiment Analysis

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 34


Online public shamming on twitter

The Table 7.2.4 shows Unit Test Case 4(UTC-4)


Sl No of Test Case UTC-4
Name of Test Case Data Prediction using AI
Feature being Tested Result prediction
Description Posted messages should be tested to
predict bullying words
Sample Input Posted Messages
Expected Output Filter Words
Table 7.2.4 Unit Test Case 4 for Data Prediction

7.3 Integration Testing of Modules


Two or more modules are combined together to check for Integration testing.
Integration Testing purpose is to check whether the integrated modules are performing as
expected. Integrated modules testing is given below.

Table 7.3.1 Integration Test Case 1 for Sender Registration and Post Messages
Sl No of Test Case ITC-1
Name of Test Case Test case to register Sender and post
messages
Feature being Tested Registration and post message
Description Sender should register first to post
messages to other user
Sample Input Text messages
Expected Output Message successfully sent
Table 7.3.1 Integration Test Case 1 for Sender Registration and Post Messages

Table 7.3.2 shows Integration Test Case 2 for sentiment analysis and prediction
Sl No of Test Case ITC-2
Name of Test Case Test case to analyse sentiment and
predict bullying words
Feature being Tested Filter bullying words
Description After sentiment analysis we should apply
ai to predict bullying words
Sample Input Posted Messages
Expected Output Bullying words Predicted
Actual Output As expected
Remarks Passed
Table 7.3.2 Integration Test Case 2 for sentiment analysis and prediction

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 35


Online public shamming on twitter

7.4 System Testing


Sl No of Test Case ITC-1
Name of Test Case Test case to analyse overall performance
of the application
Feature being Tested System performance
Description Should check sentiment predicted
properly and bad word filtered
Sample Input Posted Messages
Expected Output Bullying words Predicted
Actual Output As expected
Remarks Passed
Table 7.4: System Test Case

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 36


Online public shamming on twitter

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 37


Online public shamming on twitter

DEPT OF ISE MYCEM,MYSURU 2020-21 Page 38


Summary
This chapter presents software testing methods and different types of test cases, used to
test the different modules of the project. Various unit test cases are described for each ofthe
modules. Integration testing is performed, when unit test cases or unit modules are taken
together. Finally, system test case is conducted to verify the functioning of overall system.
Conclusion
In this work, we proposed a potential solution for countering the menace of online
public shaming in Twitter by categorizing shaming comments in six types, choosing
appropriate features and designing a set of classifiers to detect it. Instead of treating tweets as
standalone utterances, we studied them to be part of certain shaming events. In doing so, we
observe that seemingly dissimilar events share a lot of interesting properties, such as, a Twitter
user’s propensity to participate in shaming, retweet probabilities of the shaming typesand how
these events unfold in time.

With the growth of online social networks and proportional rise in public shaming
events, voices against callousness on part of the site owners are growing stronger.
Categorization of shaming comments as presented in this work has the potential for a user to
choose to allow certain types of shaming comments (e.g., comments which are sarcastic in
nature) giving her an opportunity for rebuttal, and block others (e.g., comments which attack
her ethnicity) according to individual choices. Freedom to choose what type of utterances one
would not like to see in his/her feed beforehand is way better than flagging a deluge of
comments on the event of shaming. This also liberates moderators from the moral dilemma of
deciding a threshold that separates acceptable online behaviour from unacceptable ones, thus
relieving themselves to a certain extent from the responsibility of fixing what is best for another
person.
REFERENCES

[1] J. Ronson, So You’ve Been Publicly Shamed. London, U.K.: Picador, 2015.

[2] E. Spertus, “Smokey: Automatic recognition of hostile messages,” in Proc. AAAI/IAAI,


1997, pp. 1058–1065.

[3] S. Sood, J. Antin, and E. Churchill, “Profanity use in online communities,” in Proc.
SIGCHI Conf. Hum. Factors Comput. Syst., 2012, pp. 1481–1490.

[4] S. Rojas-Galeano, “On obstructing obscenity obfuscation,” ACM Trans. Web, vol. 11, no.
2, p. 12, 2017.

[5] E. Wulczyn, N. Thain, and L. Dixon, “Ex machina: Personal attacks seen at scale,” in Proc.
26th Int. Conf. World Wide Web, 2017, pp. 1391–1399.

[6] A. Schmidt and M. Wiegand, “A survey on hate speech detection using natural language
processing,” in Proc. 5th Int. Workshop Natural Lang. Process. Social Media Assoc. Comput.
Linguistics, Valencia, Spain, 2017, pp. 1–10.

[7] Hate-Speech. Oxford Dictionaries. Accessed: Aug. 30, 2017. [Online]. Available:
https://en.oxforddictionaries.com/definition/hate_speech

[8] W. Warner and J. Hirschberg, “Detecting hate speech on the world wide Web,” in Proc.
2nd Workshop Lang. Social Media, 2012, pp. 19–26.

[9] I. Kwok and Y. Wang, “Locate the hate: Detecting tweets against blacks,” in Proc. AAAI,
2013, pp. 1621–1622.

[10] P. Burnap and M. L. Williams, “Cyber hate speech on Twitter: An application of machine
classification and statistical modeling for policy and decision making,” Policy Internet, vol. 7,
no. 2, pp. 223–242, 2015.

[11] Lee-Rigby. Lee Rigby Murder: Map and Timeline. Accessed: Dec. 7, 2017. [Online].
Available: https://http://www.bbc.com/news/uk25298580

[12] Z. Waseem and D. Hovy, “Hateful symbols or hateful people?Predictive features for
hate speech detection on Twitter,” in Proc. SRW HLTNAACL, 2016, pp. 88–93.

[13] P.Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep learning for hate speech detection
in tweets,” in Proc. 26th Int. Conf. World Wide Web Companion, 2017, pp. 759– 760.

[14] D. Olweus, S. Limber, and S. Mihalic, Blueprints for Violence Prevention, Book Nine:
Bullying Prevention Program. Boulder, CO, USA: Center for the Study and Prevention of
Violence, 1999.
[15] P. K. Smith, H. Cowie, R. F. Olafsson, and A. P. D. Liefooghe, “Definitions of bullying:
A comparison of terms used, and age and gender differences, in a fourteen–country
international comparison,” Child Develop., vol. 73, no. 4, pp. 1119–1133, 2002.

[16] R. S. Griffin and A. M. Gross, “Childhood bullying: Current empirical findings and future
directions for research,” Aggression Violent Behav., vol. 9, no. 4, pp. 379–400, 2004.
[17] H. Vandebosch and K. Van Cleemput, “Defining cyberbullying: A qualitative research
into the perceptions of youngsters,” CyberPsychol. Behav., vol. 11, no. 4, pp. 499–503, 2008.

[18] H. Vandebosch and K. Van Cleemput, “Cyberbullying among youngsters: Profiles of


bullies and victims,” New Media Soc., vol. 11, no. 8, pp. 1349–1371, 2009.

[19] K. Dinakar, B. Jones, C. Havasi, H. Lieberman, and R. Picard, “Common sense


reasoning for detection, prevention, and mitigation of cyberbullying,” ACM Trans. Interact.
Intell. Syst., vol. 2, no. 3, p. 18, 2012.

[20] P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu, “Open mind common
sense: Knowledge acquisition from the general public,” in Proc. OTM Confederated Int. Conf.
Move Meaningful Internet Syst. Berlin, Germany: Springer, 2002, pp. 1223–1237.

[21] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv, and S. Mishra. (2015).


“Detection of cyberbullying incidents on the instagram social network.” [Online]. Available:
https://arxiv.org/abs/ 1503.03909

[22] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec, “Antisocial behavior in online


discussion communities,” in Proc. ICWSM, 2015, pp. 61–70.

[23] J. Cheng, C. Danescu-Niculescu-Mizil, J. Leskovec, and M. Bernstein, “Anyone can


become a troll,” Amer. Sci., vol. 105, no. 3, p. 152, 2017.

[24] P. Tsantarliotis, E. Pitoura, and P. Tsaparas, “Defining and predicting troll vulnerability
in online social media,” Social Netw. Anal. Mining, vol. 7, no. 1, p. 26, 2017.

[25] S. O. Sood, E. F. Churchill, and J. Antin, “Automatic identification of personal insults on


social news sites,” J. Assoc. Inf. Sci. Technol., vol. 63, no. 2, pp. 270–285, 2012.

[26] A. Chakraborty, J. Messias, F. Benevenuto, S. Ghosh, N. Ganguly, and K. Gummadi,


“Who makes trends? Understanding demographic biases in crowdsourced recommendations,”
in Proc. 11th Int. AAAI Conf.Web Social Media, 2017, pp. 22–31.

[27]Face++CognitiveServices.Accessed:Feb.20,2018.[Online].Available:
https://www.faceplusplus.com/

[28] A. Rajadesingan, R. Zafarani, and H. Liu, “Sarcasm detection on Twitter: A behavioral


modeling approach,” in Proc. 8th ACM Int. Conf. Web Search Data Mining, 2015, pp. 97–
106.
[29] R. Basak, N. Ganguly, S. Sural, and S. K. Ghosh, “Look before you shame: A study on
shaming activities on Twitter,” in Proc. 25th Int. Conf. Companion World Wide Web, 2016,
pp. 11–12.

[30] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, “The


Stanford CoreNLP natural language processing toolkit,” in Proc. Assoc. Comput. Linguistics
(ACL) Syst. Demonstrations, 2014, pp. 55–60. [Online]. Available:
http://www.aclweb.org/anthology/P/P14/P14-5010

[31] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proc. 10th ACM
SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2004, pp. 168–177.

[32] M. P. Marcus and M. A. Marcinkiewicz, and B. Santorini, “Building a large annotated


corpus of English: The Penn treebank,” Comput. Linguistics, vol. 19, no. 2, pp. 313–330, 1993.

[33] D. Bamman and N. A. Smith, “Contextualized sarcasm detection on Twitter,” in Proc.


ICWSM, 2015, pp. 574–577.

[34] O. Owoputi, B. O’Connor, C. Dyer, K. Gimpel, and N. Schneider, “Part-of-speech


tagging for Twitter: Word clusters and other advances,” Ph.D.dissertation, School Comput.
Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA, 2012.

[35] Brown-Clusters. Twitter Word Clusters. Accessed: Jul. 2, 2017. [Online]. Available:
http://www.cs.cmu.edu/~ark/TweetNLP/

[36] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp.
273–297, 1995.

[37] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM
Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27:1–27:27, 2011.

[38] E. Colleoni, A. Rozza, and A. Arvidsson, “Echo chamber or public sphere? Predicting
political orientation and measuring political homophily in Twitter using big data,” J. Commun.,
vol. 64, no. 2, pp. 317–332, 2014.

You might also like