REPORT
On
“ONLINE PUBLIC SHAMING ON TWITTER: DETECTION,
ANALYSIS, AND MITIGATION”
BACHELOR OF ENGINEERING
In
INFORMATION SCIENCE & ENGINEERING
By
POOJA S K 4MO17IS021
SHAMBHAVI H 4MO17IS024
ARSHAD PASHA 4MO17IS006
CERTIFICATE
Certified that the project work entitled “ONLINE PUBLIC SHAMING ON TWITTER:
DETECTION, ANALYSIS, AND MITIGATION” is a bonafide work carried out by Pooja S
K (4MO17IS021), Shambhavi H (4MO17IS024), and Arshad Pasha (4MO17IS006), in
partial fulfillment of the VIII semester course of Bachelor of Engineering in Information Science
and Engineering of the Visvesvaraya Technological University, Belgaum, during the year
2020-21. It is certified that all corrections/suggestions indicated have been incorporated in the
report. The project report has been approved as it satisfies the academic requirements with
respect to the Project Work prescribed for the Bachelor of Engineering Degree.
MyCEM, Mysuru
1.
2.
ACKNOWLEDGEMENT
Our hearty thanks are due to Jayram C V (HOD), whose timely
support and suggestions went a long way toward the completion of the project.
We would like to thank our guide, Shalini M S, Assistant Professor, for the support and guidance
provided in the completion of this project.
We would like to thank all the teaching and non-teaching staff who are directly
or indirectly responsible for carrying out this project successfully.
We extend very hearty thanks to our parents and friends for all the moral
support they provided during the preparation of the project.
SHAMBHAVI H (4MO17IS024)
ARSHAD PASHA (4MO17IS006)
ABSTRACT
Public shaming in online social networks and related online public forums like Twitter
has been increasing in recent years. These events are known to have a devastating impact on the
victim's social, political, and financial life. Notwithstanding its known ill effects, little has been
done in popular online social media to remedy this, often with the excuse of the large volume and
diversity of such comments, and the consequently infeasible number of human moderators required
to achieve the task. In this paper, we automate the task of public shaming detection in Twitter
from the perspective of victims and explore primarily two aspects, namely, events and
shamers. Shaming tweets are categorized into six types: abusive, comparison, passing
judgment, religious/ethnic, sarcasm/joke, and whataboutery, and each tweet is classified into one
of these types or as non-shaming. It is observed that, out of all the participating users who post
comments in a particular shaming event, the majority of them are likely to shame the victim.
Interestingly, it is also the shamers whose follower counts increase faster than those of the non-
shamers in Twitter. Finally, based on the categorization and classification of shaming tweets, a
web application called BlockShame has been designed and deployed for on-the-fly
muting/blocking of shamers attacking a victim on Twitter.
INDEX
Sl.no Chapter Names Page.no
1 Introduction 1
2 Literature Survey 9
3 System Analysis 12
3.1 Aim of the Project 12
3.2 Existing System 12
3.3 Proposed System 13
4 Requirements 14
4.1 Hardware Requirements 14
4.2 Software Requirements 14
5 System Design 15
5.1 Problem Statement 15
5.2 Fundamental Design Concepts 15
5.3 System Architecture 16
5.4 Data Flow Diagram 17
5.5 Use Case Diagram 17
5.6 Sequence Diagram 19
6 Implementation 20
6.1 Programming Language Selection 20
6.2 Platform Selection 20
6.3 Code Conventions 21
6.4 Class Declarations 23
6.5 Difficulties Encountered and Strategies Used to Tackle Them 24
6.6 Software Environment 24
7 Software Testing 33
7.1 Test Environment 33
7.2 Unit Testing of Main modules 33
7.3 Integration Testing of Modules 35
7.4 System Testing 36
List of Figures and Tables
Sl.no Chapter Names Page.no
5 System Design 15
Fig 5.3: System Architecture 16
Fig 5.4: Data flow Diagram 17
Fig 5.5: Use Case Diagram 17
Fig 5.5.1: Use Case Diagram for NLP 18
Fig 5.5.2: Use Case Diagram for Weka 18
Fig 5.6: Sequence Diagram 19
6 Implementation 20
Fig 6.6: Java Interpreter 25
Fig 6.6.1: Java Compiler 26
Fig 6.6.2: Java Platform 27
Fig 6.6.3: Java IDE 29
7 Software Testing 33
Table 7.2.1 Unit Test Case 1 for Registration Page 34
Table 7.2.2 Unit Test Case 2 for Message Post 34
Table 7.2.3 Unit Test Case 3 for Sentiment Analysis 34
Table 7.2.4 Unit Test Case 4 for Data Prediction 35
Table 7.3.1 Integration Test Case 1 for Sender Registration and Post Messages 35
Table 7.3.2 Integration Test Case 2 for Sentiment Analysis and Prediction 35
Table 7.4: System Test Case 36
Online Public Shaming on Twitter
CHAPTER 1
INTRODUCTION
ONLINE SOCIAL networks (OSNs) are frequently flooded with scathing remarks
against individuals or organizations over their perceived wrongdoing. While some of these
remarks pertain to objective facts about the event, a sizable proportion attempts to malign the
subject by passing quick judgments based on false or partially true claims. The limited scope
for fact-checking, coupled with the virulent nature of OSNs, often translates into ignominy,
financial loss, or both for the victim.
Negative discourse in the form of hate speech, bullying, profanity, flaming, trolling,
etc., in OSNs is well studied in the literature. On the other hand, public shaming, which is the
condemnation of someone who is in violation of accepted social norms so as to arouse a feeling of
guilt in him or her, has not attracted much attention from a computational perspective.
Nevertheless, these events have been on the rise for some years. Public shaming
events have a far-reaching impact on virtually every aspect of a victim's life. Such events have
certain distinctive characteristics that set them apart from other similar phenomena: (a) a
definite single target or victim; (b) an action committed by the victim perceived to be wrong;
and (c) a cascade of condemnation from society. In public shaming, a shamer is seldom
repetitive, as opposed to bullying. Hate speech and profanity are sometimes part of a shaming
event, but there are nuanced forms of shaming, such as sarcasm and jokes or comparison of the
victim with other persons, which may not contain explicitly offensive content.
The enormous volume of comments that is often used to shame an almost unknown
victim speaks of the viral nature of such events. For example, when Justine Sacco, a public
relations person for an American Internet company, tweeted "Going to Africa. Hope I don't get
AIDS. Just kidding. I'm white!", she had just 170 followers. Soon, a barrage of criticism
started pouring in, and the incident became one of the most talked-about topics on Twitter, and
the Internet in general, within hours. She lost her job even before her plane landed in South
Africa.
Jon Ronson's "So You've Been Publicly Shamed" [1] presents an account of several online
public shaming victims. What is common to the diverse set of shaming events we have studied
is that the victims are subjected to punishments disproportionate to the level of the crime they have
apparently committed. In Table I, we have listed the victim, the year in which the event took place,
the action that triggered public shaming along with the triggering medium, and the immediate
consequences for each studied event. "Trigger" is the action or words of the
"Victim" which initiated public shaming. "Medium of triggering" is the first communication
medium through which the general public became aware of the "Trigger." The consequences for
the victim, during or shortly after the event, are listed under "Immediate consequences."
Henceforth, the two-letter abbreviation of the victim's name will be used to refer to the
respective shaming event.
Firstly, labeling data is labor intensive and time consuming. Secondly, cyberbullying
is hard to describe and judge from a third-party view due to its intrinsic ambiguities. Thirdly, due to
the protection of Internet users and privacy concerns, only a small portion of messages is left on
the Internet, and most bullying posts are deleted. As a result, the trained classifier may not
generalize well to test messages that contain non-activated but discriminative features. The
goal of the present study is to develop methods that can learn robust and discriminative
representations to tackle the above problems in cyberbullying detection. Some approaches have
been proposed to tackle these problems by incorporating expert knowledge into feature
learning. Yin et al. proposed combining BoW features, sentiment features, and contextual
features to train a support vector machine for online harassment detection. Dinakar et al. utilized
label-specific features to extend the general features, where the label-specific features are
learned by Linear Discriminant Analysis. In addition, common-sense knowledge was also
applied. Nahar et al. presented a weighted TF-IDF scheme that scales bullying-like features by
a factor of two. Besides content-based information, Maral et al. proposed applying users'
information, such as gender and message history, and context information as extra features.
A major limitation of these approaches, however, is that the learned feature space still relies on the
BoW assumption and may not be robust. In addition, the performance of these approaches relies
on the quality of hand-crafted features, which require extensive domain knowledge. In this
paper, we investigate one deep learning method, the stacked denoising autoencoder (SDA).
SDA stacks several denoising autoencoders and concatenates the output of each layer as the
learned representation.
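The weighted TF-IDF scheme attributed above to Nahar et al. can be sketched as follows. This is a minimal illustration with a toy corpus and an assumed three-word bullying lexicon, not the authors' actual implementation:

```java
import java.util.List;
import java.util.Set;

public class WeightedTfIdf {
    // Illustrative bullying lexicon; a real system would use a curated list.
    static final Set<String> BULLYING_WORDS = Set.of("stupid", "loser", "ugly");

    // TF-IDF for one term in one document, scaled by a factor of two for
    // bullying-like terms -- the weighting described in the text.
    static double weightedTfIdf(String term, List<String> doc, List<List<String>> corpus) {
        long tf = doc.stream().filter(w -> w.equals(term)).count();
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        double idf = Math.log((double) corpus.size() / (1 + df));
        double score = tf * idf;
        return BULLYING_WORDS.contains(term) ? 2.0 * score : score;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("you", "are", "stupid"),
                List.of("have", "a", "nice", "day"),
                List.of("you", "are", "kind"));
        List<String> doc = corpus.get(0);
        // The bullying-like term receives double the plain TF-IDF weight.
        System.out.printf("stupid: %.3f%n", weightedTfIdf("stupid", doc, corpus));
        System.out.printf("you:    %.3f%n", weightedTfIdf("you", doc, corpus));
    }
}
```

The doubling makes bullying-indicative dimensions dominate the feature vector, which is the whole point of the scheme.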
Each denoising autoencoder in SDA is trained to recover the input data from a corrupted
version of it. The input is corrupted by randomly setting some of its entries to zero, which is
called dropout noise. This denoising process helps the autoencoders learn robust
representations. In addition, each autoencoder layer is intended to learn an increasingly abstract
representation of the input.
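The dropout corruption step described above can be sketched in a few lines; the vector values and names here are illustrative:

```java
import java.util.Arrays;
import java.util.Random;

public class DropoutNoise {
    // Corrupt a BoW feature vector by randomly zeroing each entry with
    // probability p -- the "dropout noise" a denoising autoencoder is
    // trained to undo by reconstructing the original vector.
    static double[] corrupt(double[] x, double p, Random rng) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = rng.nextDouble() < p ? 0.0 : x[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] bow = {1, 0, 2, 1, 3};
        double[] noisy = corrupt(bow, 0.5, new Random(42));
        System.out.println(Arrays.toString(noisy));
    }
}
```

Training on many such corrupted copies is what "enlarges" the effective dataset, as noted later in the text.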
In this paper, we develop a new text representation model based on a variant of SDA: the
marginalized stacked denoising autoencoder (mSDA), which adopts a linear instead of a
nonlinear projection to accelerate training and marginalizes over the noise distribution in order
to learn more robust representations. We utilize semantic information to expand mSDA and
develop the Semantic-enhanced Marginalized Stacked Denoising Autoencoder (smSDA). The
semantic information consists of bullying words. An automatic extraction of bullying words
based on word embeddings is proposed so that the human labor involved can be reduced.
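At the core of such embedding-based lexicon expansion is ranking candidate words by cosine similarity to known bullying seed words. A minimal sketch follows, with made-up 3-dimensional vectors standing in for trained word2vec/GloVe embeddings:

```java
public class EmbeddingSimilarity {
    // Cosine similarity between two word-embedding vectors: candidates
    // most similar to a bullying seed word are added to the lexicon.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Toy 3-d embeddings; real embeddings are learned from a large corpus.
        double[] seed  = {0.9, 0.1, 0.2};  // known bullying seed word
        double[] cand1 = {0.8, 0.2, 0.1};  // candidate close to the seed
        double[] cand2 = {0.1, 0.9, 0.7};  // unrelated candidate
        System.out.printf("cand1: %.3f, cand2: %.3f%n",
                cosine(seed, cand1), cosine(seed, cand2));
    }
}
```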
During training of smSDA, we attempt to reconstruct bullying features from other,
normal words by discovering the latent structure, i.e., correlation, between bullying and normal
words. The intuition behind this idea is that some bullying messages do not contain bullying
words. The correlation information discovered by smSDA helps to reconstruct bullying
features from normal words, and this in turn facilitates the detection of bullying messages that do
not contain bullying words. For example, there is a strong correlation between the bullying word
"fuck" and the normal word "off" since they often occur together. If bullying messages do not
contain such obvious bullying features, e.g., when "fuck" is misspelled as "fck", the correlation
may help to reconstruct the bullying features from the normal ones so that the
bullying message can still be detected. It should be noted that introducing dropout noise has the
effect of enlarging the size of the training dataset, which helps alleviate
the data sparsity problem. In addition, L1 regularization of the projection matrix is added to the
objective function of each autoencoder layer in our model to enforce the sparsity of the projection
matrix, and this in turn facilitates the discovery of the terms most relevant for reconstructing
bullying terms. The main contributions of our work can be summarized as follows:
* Our proposed Semantic-enhanced Marginalized Stacked Denoising Autoencoder is able
to learn robust features from the BoW representation in an efficient and effective way. These robust
features are learned by reconstructing the original input from corrupted (i.e., partially missing)
versions of it. The new feature space can improve the performance of cyberbullying detection
even with a small labeled training corpus.
* Semantic information is incorporated into the reconstruction process via the design
of semantic dropout noise and the imposition of sparsity constraints on the mapping matrix. In our
framework, high-quality semantic information, i.e., bullying words, can be extracted
automatically through word embeddings. These specialized modifications make
the new feature space more discriminative, and this in turn facilitates bullying detection.
* Comprehensive experiments on real-world datasets have verified the performance of our
proposed model.
The field of study that focuses on the interactions between human language and
computers is called Natural Language Processing, or NLP for short. It sits at the intersection
of computer science, artificial intelligence, and computational linguistics (Wikipedia).
"Natural Language Processing is a field that covers computer understanding and
manipulation of human language, and it's ripe with possibilities for newsgathering," Anthony
Pesce said in "Natural Language Processing in the Kitchen." "You usually hear about it in the
context of analyzing large pools of legislation or other document sets, attempting to discover
patterns or root out corruption."
NLP algorithms are typically based on machine learning algorithms. Instead of
hand-coding large sets of rules, NLP can rely on machine learning to automatically learn these rules
by analyzing a set of examples (i.e., a large corpus, from a book down to a collection of
sentences) and making a statistical inference. In general, the more data analyzed, the more
accurate the model will be.
• Summarize blocks of text using Summarizer to extract the most important and
central ideas while ignoring irrelevant information.
• Create a chatbot using ParseyMcParseface, a language-parsing deep learning model
made by Google that uses Part-of-Speech tagging.
• Automatically generate keyword tags from content using AutoTag, which leverages
LDA, a technique that discovers topics contained within a body of text.
• Identify the type of entity extracted, such as it being a person, place, or organization
using Named Entity Recognition.
• Use Sentiment Analysis to identify the sentiment of a string of text, from very
negative to neutral to very positive.
• Reduce words to their root, or stem, using Porter Stemmer, or break up text into
tokens using Tokenizer.
These libraries provide the algorithmic building blocks of NLP in real-world applications.
Algorithmia provides a free API endpoint for many of these algorithms, without ever having
to setup or provision servers and infrastructure.
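As a toy illustration of the tokenization and stemming building blocks listed above, the following sketch splits text into lowercase tokens and strips a few common suffixes. It is deliberately crude; the real Porter stemmer applies staged rewrite rules:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SimpleNlp {
    // Split text into lowercase word tokens, discarding punctuation.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Very crude stemmer: strips a few common suffixes when the
    // remainder is long enough to plausibly be a root.
    static String stem(String word) {
        for (String suffix : new String[]{"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        tokenize("Shaming events are trending").stream()
                .map(SimpleNlp::stem)
                .forEach(System.out::println); // sham, event, are, trend
    }
}
```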
• Natural Language Toolkit (NLTK): a Python library that provides modules for
processing text, classifying, tokenizing, stemming, tagging, parsing, and more.
• Stanford NLP: a suite of NLP tools that provide part-of-speech tagging, named
entity recognition, a coreference resolution system, sentiment analysis, and more.
• MALLET: A Java package that provides Latent Dirichlet Allocation, document
classification, clustering, topic modeling, information extraction, and more.
Sentiment analysis (also known as opinion mining) refers to the use of natural
language processing, text analysis and computational linguistics to identify and extract
subjective information in source materials. Sentiment analysis is widely applied to reviews and
social media for a variety of applications, ranging from marketing to customer service.
A basic task in sentiment analysis is classifying the polarity of a given text at the
document, sentence, or feature/aspect level—whether the expressed opinion in a document, a
sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond
polarity" sentiment classification looks, for instance, at emotional states such as "angry",
"sad", and "happy".
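A minimal sketch of the basic polarity task just described, using tiny hand-picked lexicons; real systems use trained classifiers or large sentiment lexicons such as SentiWordNet:

```java
import java.util.Set;

public class PolarityClassifier {
    // Toy sentiment lexicons; illustrative only.
    static final Set<String> POSITIVE = Set.of("good", "great", "happy", "love");
    static final Set<String> NEGATIVE = Set.of("bad", "awful", "sad", "hate");

    // Classify a text as positive, negative, or neutral by counting
    // lexicon hits -- document/sentence-level polarity at its simplest.
    static String classify(String text) {
        int score = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        return score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("I love this great movie"));  // positive
        System.out.println(classify("What an awful, sad day"));   // negative
        System.out.println(classify("The train leaves at noon")); // neutral
    }
}
```

Note the explicit neutral class here; the trade-offs of including one are discussed below.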
1.4 Types
Early work in that area includes Turney and Pang who applied different methods for
detecting the polarity of product reviews and movie reviews respectively. This work is at the
document level. One can also classify a document's polarity on a multi-way scale, which was
attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of
classifying a movie review as either positive or negative to predict star ratings on either a 3 or
a 4 star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting
ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-
star scale). Even though in most statistical classification methods, the neutral class is ignored
under the assumption that neutral texts lie near the boundary of the binary classifier, several
researchers suggest that, as in every polarity problem, three categories must be identified.
Moreover, it can be shown that specific classifiers such as maximum entropy and
SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of
the classification. There are in principle two ways of operating with a neutral class. Either the
algorithm proceeds by first identifying the neutral language, filtering it out, and then assessing
the rest in terms of positive and negative sentiment, or it builds a three-way classification in
one step. The second approach often involves estimating a probability distribution over all
categories (e.g., Naive Bayes classifiers as implemented in Python's NLTK).
Whether and how to use a neutral class depends on the nature of the data: if the data is
clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral
language out and focus on the polarity between positive and negative sentiments. If, in
contrast, the data is mostly neutral with small deviations towards positive and negative affect,
this strategy would make it harder to clearly distinguish between the two poles.
The field of artificial intelligence was founded on the claim that human intelligence "can be so
precisely described that a machine can be made to simulate it". This raises philosophical arguments
about the nature of the mind and the ethics of creating artificial beings endowed with human-like
intelligence, issues which have been explored by myth, fiction, and philosophy since antiquity.
Some people also consider AI a danger to humanity if it progresses unabated. Attempts to
create artificial intelligence have experienced many setbacks, including the ALPAC report of
1966, the abandonment of perceptrons in 1970, the Lighthill Report of 1973, the second AI
winter of 1987–1993, and the collapse of the Lisp machine market in 1987.
CHAPTER 2
LITERATURE SURVEY
Andreas M. Kaplan, Michael Haenlein, "Users of the world, unite! The challenges
and opportunities of Social Media," ESCP Europe, 79 Avenue de la République, F-75011
Paris, France
The concept of Social Media is top of the agenda for many business executives today.
Decision makers, as well as consultants, try to identify ways in which firms can make profitable
use of applications such as Wikipedia, YouTube, Facebook, Second Life, and Twitter. Yet
despite this interest, there seems to be very limited understanding of what the term "Social
Media" exactly means; this article intends to provide some clarification. We begin by
describing the concept of Social Media, and discuss how it differs from related concepts such
as Web 2.0 and User Generated Content. Based on this definition, we then provide a
classification of Social Media which groups applications currently subsumed under the
generalized term into more specific categories by characteristic: collaborative projects, blogs,
content communities, social networking sites, virtual game worlds, and virtual social worlds.
Finally, we present 10 pieces of advice for companies which decide to utilize Social Media.
Mechanisms that explain the relationship between anxiety and depression
need to be identified. The Tripartite Model is one model that has been proposed to help explain the
association between these two problems, positing a shared component called negative affect.
The objective of the present study was to examine the role of loneliness in relation to anxiety
and depression. A total of 10,891 school-based youth (Grades 2–12) and 254 clinical children
and adolescents receiving residential treatment (Grades 2–12) completed measures of
loneliness, anxiety, depression, and negative affect. The relationships among loneliness,
anxiety, depression, and negative affect were examined, including whether loneliness was a
significant intervening variable. Various mediational tests converged, showing that loneliness
was a significant mediator in the relationship between anxiety and depression. This effect was
found across child (Grades 2–6) and adolescent (Grades 7–12) school-based youth. In the
clinical sample, loneliness was found to be a significant mediator between anxiety and
depression, even after introducing negative affect based on the Tripartite Model. Results
supported loneliness as a significant risk factor in youths' lives that may result from anxiety
and place youth at risk for subsequent depression. Implications related to intervention and
prevention in school settings are also discussed.
Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, Amy Bellmore, "Learning from
Bullying Traces in Social Media," Department of Computer Sciences and Department of
Educational Psychology, University of Wisconsin-Madison, Madison, WI 53706, USA.
We introduce the social study of bullying to the NLP community. Bullying, in both
physical and cyber worlds (the latter known as cyberbullying), has been recognized as a serious
national health issue among adolescents. However, previous social studies of bullying are
handicapped by data scarcity, while the few computational studies narrowly restrict themselves
to cyberbullying which accounts for only a small fraction of all bullying episodes. Our main
contribution is to present evidence that social media, with appropriate natural language
processing techniques, can be a valuable and abundant data source for the study of bullying in
both worlds. We identify several key problems in using such data sources and formulate them
as NLP tasks, including text classification, role labeling, sentiment analysis, and topic
modeling. Since this is an introductory paper, we present baseline results on these tasks using
off-the-shelf NLP solutions, and encourage the NLP community to contribute better models in
the future.
Chapter 3
SYSTEM ANALYSIS
System analysis is the technique of finding the best solution to a problem. It is the
process by which we learn about the practical issues, define the objects and
requirements, and evaluate the candidate solutions. It is a way of thinking about the organization
and the problems it involves, and a set of steps that helps in tackling those problems. A feasibility
study plays an essential part in system analysis, as it provides the target for design
and development.
Chapter 4
Requirements
CHAPTER 5
SYSTEM DESIGN
Design is a creative process; a good design is the key to an effective system. System
design is defined as "the process of applying a variety of techniques
and principles for the purpose of defining a process or a system in sufficient
detail to permit its physical realization." Various design practices are followed
in developing the system. The design specifies the functions of the system, the
components or elements of the system, and how they appear to end users.
The project is concerned with identifying a solution that could be used to detect and filter out
sites containing fake news, in order to help users avoid being lured by clickbait. It
is imperative that such solutions be identified, as they will prove useful to both readers
and the tech companies involved in the issue. The modules described below are combined
to implement the overall framework. The system design is shown below.
In this module, users can send friend requests to other people, and the user can start
chatting only with those who accept his/her friend request. In the same manner,
other people can also send friend requests, and it is up to the user to accept
or delete each request. If a friend request sent by the user is deleted, then the user is not
able to chat with that person. Once a friend request is accepted, the users
can share images, post messages, and start chatting with their friends.
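The friend-request flow described above can be modeled as a small state machine. The class and method names below are illustrative, not the project's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class FriendRequests {
    enum Status { PENDING, ACCEPTED, DELETED }

    // Maps "sender->receiver" to the current request status.
    private final Map<String, Status> requests = new HashMap<>();

    void send(String from, String to)   { requests.put(from + "->" + to, Status.PENDING); }
    void accept(String from, String to) { requests.replace(from + "->" + to, Status.ACCEPTED); }
    void delete(String from, String to) { requests.replace(from + "->" + to, Status.DELETED); }

    // Chatting (and sharing images/posts) is allowed only after acceptance,
    // in either direction.
    boolean canChat(String a, String b) {
        return requests.get(a + "->" + b) == Status.ACCEPTED
                || requests.get(b + "->" + a) == Status.ACCEPTED;
    }

    public static void main(String[] args) {
        FriendRequests fr = new FriendRequests();
        fr.send("alice", "bob");
        System.out.println(fr.canChat("alice", "bob")); // false: still pending
        fr.accept("alice", "bob");
        System.out.println(fr.canChat("bob", "alice")); // true: accepted
    }
}
```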
5.5.1 NLP gathers all the text messages from the database
Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset or called from your own Java code.
Weka contains tools for data pre-processing, classification, regression, clustering, association
rules, and visualization. It is also well-suited for developing new machine learning schemes.
The training dataset is used to train the machine; then, for the test dataset, i.e., the messages
entered by the user, the data mining rules are applied, and the result is finally predicted
using the J48 algorithm.
This module consists of a data store, NLP, and the Weka tool. First, all the text messages entered
by the users are gathered from the database; then sentiment analysis is performed using
NLP, the J48 algorithm is applied to predict the data, data mining rules are applied to the
predicted data, and the final result is produced.
Chapter 6
Implementation
The implementation phase plays a vital role in turning the design requirements into
reality. The final result emerges from the implementation phase; therefore, a successful new
system depends on this key stage. Careful planning and control are required during
implementation. Major decisions have to be taken before implementing any project.
They are as follows:
• Selecting the right programming language for the development of the project
• Selecting the right platform, that means windows operating system or linux operating
system or any other operating system
• Coding conventions have to be followed properly
• Classes: A class name can be a single word or consist of more than one word. If
more than one word is used, then the first letter of each word is capitalized. The class
name should be a noun, and class names should be meaningful and descriptive. As far as
possible, abbreviations should be avoided in class names. Any word length can be
used for class names.
Example: class EncryptionLayer
• Interfaces: Interface name can be adjective and first letter of the interface name must
start with the uppercase letter.
Example: ActionListener, ActionEvent
• Methods: Method name should be a verb and starts with a lowercase. If the method
names consist of more than one word, then starting letter of each word should be
capitalized.
Example: getNextNode(), checkPerformance()
• Variables: Variable names should be meaningful and short. The purpose of a variable
is indicated by proper wording.
Example: String sourceNode
• Constants: All letters in the constants should be in the uppercase and separated by
underscores
Example: MAX_PRIORITY, MIN_VALUE
• A class declaration starts with the class keyword, and any valid name can be given as the
name of the class. It may name a superclass, and the class may also be declared final,
abstract, or public
• There should not be any space between the name of a method and the parenthesis; the
argument list follows within the parentheses
• The opening curly brace '{' appears on the line of the class declaration itself. The closing
curly brace '}' is written on a new line. The opening and closing braces '{' and '}' appear
on the same line if there is a null statement
• Between these two braces, constructors, field declarations, and the behaviour of
classes (implementation) and objects are defined
Example:
void add() {
    /*
    int z = x + y;
    System.out.println("Sum = " + z);
    */
}
Two types of comments are used in Java: single-line comments (//) and block
comments (/* */).
Once the sender uploads a file using the identity of the receiver, the upload function calls the
Amazon S3 client to check the login credentials. The S3 client uploads the file by issuing a
PutObjectRequest.
Once the receiver initiates a file download, the download function fetches the file from
the Amazon bucket by issuing a GetObjectRequest.
6.5.2 Synchronization between services
SHA-based encryption and decryption functions were used to encrypt the sender's files using a
64-bit secret key together with the security device identity, and to successfully decrypt the files
with the help of the receiver's security device identity and the secret key.
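Strictly speaking, SHA is a family of hash (digest) algorithms rather than an encryption cipher. A minimal sketch of computing a SHA-256 digest with the standard java.security.MessageDigest API is shown below; the project's key handling and device-identity details are not reproduced here:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShaDigest {
    // Compute the SHA-256 digest of a string and return it as a hex string.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every conforming Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("hello"));
    }
}
```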
The Java platform on which the system is built is also designed to be:
▪ Dynamic
▪ Secure
With most programming languages, you either compile or interpret a program so that
you can run it on your computer. The Java programming language is unusual in that a program
is both compiled and interpreted. With the compiler, you first translate a program into an
intermediate language called Java bytecodes, the platform-independent codes interpreted by
the interpreter on the Java platform. The interpreter parses and runs each Java bytecode
instruction on the computer. Compilation happens just once; interpretation occurs each time
the program is executed. The following figure illustrates how this works.
You can think of Java bytecodes as the machine code instructions for the Java Virtual Machine
(Java VM). Every Java interpreter, whether it's a development tool or a Web browser that can
run applets, is an implementation of the Java VM. Java bytecodes help make "write once,
run anywhere" possible. You can compile your program into bytecodes on any platform that
has a Java compiler. The bytecodes can then be run on any implementation of the Java VM.
That means that as long as a computer has a Java VM, the same program written in the Java
programming language can run on Windows 2000, a Solaris workstation, or an iMac.
The Java API is a large collection of ready-made software components that provide
many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is
grouped into libraries of related classes and interfaces; these libraries are known as packages.
The next section, "What Can Java Technology Do?", highlights the functionality that some of the
packages in the Java API provide.
The following figure depicts a program that's running on the Java platform. As the
figure shows, the Java API and the virtual machine insulate the program from the hardware.
Native code is code that, once compiled, runs on a specific hardware platform. As a
platform-independent environment, the Java platform can be a bit slower than native code.
However, smart compilers, well-tuned interpreters, and just-in-time bytecode compilers can
bring performance close to that of native code without sacrificing portability.
However, the Java programming language is not just for writing cute, entertaining
applets for the Web. The general-purpose, high-level Java programming language is also a
powerful software platform. Using the generous API, you can write many types of programs.
An application is a standalone program that runs directly on the Java platform. A special
kind of application known as a server serves and supports clients on a network. Examples of
servers are Web servers, proxy servers, mail servers, and print servers. Another specialized
program is a servlet. A servlet can almost be thought of as an applet that runs on the server
side. Java Servlets are a popular choice for building interactive web applications, replacing the
use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of
applications. Instead of working in browsers, though, servlets run within Java Web servers,
configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with packages of
software components that provide a wide range of functionality. Every full
implementation of the Java platform gives you the following features:
• The essentials: Objects, strings, threads, numbers, input and output, data
structures, system properties, date and time, and so on.
• Applets: The set of conventions used by applets.
• Networking: URLs, TCP (Transmission Control Protocol) and UDP (User
Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
• Internationalization: Help for writing programs that can be localized for users
worldwide. Programs can automatically adapt to specific locales and be
displayed in the appropriate language.
• Security: Both low level and high level, including electronic signatures, public
and private key management, access control, and certificates.
• Software components: Known as JavaBeans, these can plug into existing
component architectures.
• Object serialization: Allows lightweight persistence and communication via
Remote Method Invocation (RMI).
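A couple of the packages named above can be exercised in a few lines; the URL and values below are purely illustrative.

```java
import java.net.URL;
import java.time.LocalDate;

public class ApiTour {
    public static void main(String[] args) throws Exception {
        // Networking: parse a URL into its parts without opening a connection.
        URL url = new URL("https://example.com:8080/docs/index.html");
        System.out.println(url.getHost() + " " + url.getPort() + " " + url.getPath());

        // The essentials: strings, numbers, date and time.
        System.out.println(Integer.parseInt("42") + 1);
        System.out.println(LocalDate.of(2021, 6, 1).plusDays(30));
    }
}
```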
We can't promise you fame, fortune, or even a job if you learn the Java programming
language. Still, it is likely to make your programs better and to require less effort than other
languages.
From a programming perspective, the beauty of ODBC is that the application can be
written to use the same set of function calls to interface with any data source, regardless of
the database vendor. The source code of the application doesn't change whether it talks to
Oracle or SQL Server; we mention these two only as examples. There are ODBC drivers
available for several dozen popular database systems. Even Excel spreadsheets and plain-text
files can be turned into data sources. The operating system uses the Registry information
written by ODBC Administrator to determine which low-level ODBC drivers are needed to
talk to the data source (such as the interface to Oracle or SQL Server). The loading of the
ODBC drivers is transparent to the ODBC application program. In a client/server environment,
the ODBC API even handles many of the network issues for the application programmer.
The advantages of this scheme are so numerous that you are probably thinking there
must be some catch. The only disadvantage of ODBC is that it isn't as efficient as talking
directly to the native database interface. ODBC has had many detractors who charge that
it is too slow. Microsoft has always claimed that the critical factor in performance is the quality
of the driver software that is used. In our humble opinion, this is true. The availability of good
ODBC drivers has improved a great deal recently. And anyway, the criticism about
performance is somewhat analogous to those who said that compilers would never match the
speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the
opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers
get faster every year.
6.6.4 JDBC
In an effort to set an independent database-standard API for Java, Sun Microsystems
developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database-access
mechanism that provides a consistent interface to a variety of RDBMSs. This consistent
interface is achieved through the use of "plug-in" database connectivity modules, or drivers.
If a database vendor wishes to have JDBC support, he or she must provide the driver for each
platform that the database and Java run on.
To gain wider acceptance of JDBC, Sun based JDBC's framework on ODBC. As you
discovered earlier in this chapter, ODBC has widespread support on a variety of platforms.
Basing JDBC on ODBC allowed vendors to bring JDBC drivers to market much faster than
developing a completely new connectivity solution.
JDBC was announced in March of 1996. It was released for a 90-day public review that
ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released
soon after.
The remainder of this section will cover enough information about JDBC for you to
know what it is about and how to use it effectively. This is by no means a complete overview
of JDBC. That would fill an entire book.
6.6.5 JDBC Goals
Few software packages are designed without goals in mind. JDBC is no exception: its
many goals drove the development of the API. These goals, in conjunction with early
reviewer feedback, shaped the JDBC class library into a solid framework for building
database applications in Java.
The goals that were set for JDBC are important. They will give you some insight as to why
certain classes and functionalities behave the way they do. The eight design goals for JDBC
are as follows:
SQL Level API
The designers felt that their main goal was to define a SQL interface for Java. Although
not the lowest database interface level possible, it is at a low enough level for higher-level
tools and APIs to be created. Conversely, it is at a high enough level for application
programmers to use it confidently. Attaining this goal allows future tool vendors to
"generate" JDBC code and to hide many of JDBC's complexities from the end user.
1. SQL Conformance
SQL syntax varies as you move from database vendor to database vendor. In an effort
to support a wide variety of vendors, JDBC will allow any query statement to be passed
through it to the underlying database driver. This allows the connectivity module to handle
non-standard functionality in a manner that is suitable for its users.
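The pass-through style described above can be sketched with the standard java.sql classes. The JDBC URL and the users table below are illustrative; without a vendor driver on the classpath, DriverManager cannot locate a suitable driver for the URL and the catch branch runs instead.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JdbcSketch {
    public static void main(String[] args) {
        // Hypothetical URL; substitute the URL for your vendor's driver.
        String url = "jdbc:h2:mem:demo";
        // The query text is handed to the driver verbatim, so vendor-specific
        // SQL passes straight through to the underlying database.
        String sql = "SELECT name FROM users WHERE id = 1";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        } catch (SQLException e) {
            // Raised here when no JDBC driver for the URL is on the classpath.
            System.out.println("No suitable driver: " + e.getMessage());
        }
    }
}
```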
Chapter 7
Software Testing
In the software development life cycle, software testing is one of the significant phases. In
the testing phase, verification and validation of the project under development are
performed with respect to the stated requirements [28]. In this project, unit testing of the
main modules, integration testing of the combined modules, and complete system testing,
along with GUI testing, were performed.
Table 7.2.1 shows Unit Test Case 1 (UTC-1), which verifies whether the user interface accepts
the registration details of the sender and receiver.
Sl No of Test Case: UTC-1
Name of Test Case: Test case to verify the Registration Page
Feature being Tested: Registration details
Description: The user should provide details such as Name, Email Id, and Mobile Number
Sample Input: Email Id, Mobile Number, Username, Password
Expected Output: A "Registration Successful" message should be displayed
Actual Output: As expected
Remarks: Passed
Table 7.2.1 Unit Test Case 1 for Registration Page
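UTC-1 can be expressed as a small automated check. The validation helper below is hypothetical (the project's actual registration logic is not shown here); it simply illustrates asserting the expected output against a sample input.

```java
public class RegistrationValidator {
    // Hypothetical validation for the registration page's required fields.
    static boolean isValid(String name, String email, String mobile) {
        return name != null && !name.isEmpty()
                && email != null && email.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")
                && mobile != null && mobile.matches("\\d{10}");
    }

    public static void main(String[] args) {
        // UTC-1: well-formed registration details should be accepted.
        System.out.println(isValid("Pooja", "pooja@example.com", "9876543210")
                ? "Registration Successful" : "Registration Failed");
        // A malformed email address should be rejected.
        System.out.println(isValid("Pooja", "not-an-email", "9876543210")
                ? "Registration Successful" : "Registration Failed");
    }
}
```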
Table 7.3.1 shows Integration Test Case 1 (ITC-1) for sender registration and posting messages.
Sl No of Test Case: ITC-1
Name of Test Case: Test case to register the sender and post messages
Feature being Tested: Registration and message posting
Description: The sender must register first in order to post messages to other users
Sample Input: Text messages
Expected Output: Message successfully sent
Table 7.3.1 Integration Test Case 1 for Sender Registration and Post Messages
Table 7.3.2 shows Integration Test Case 2 (ITC-2) for sentiment analysis and prediction.
Sl No of Test Case: ITC-2
Name of Test Case: Test case to analyse sentiment and predict bullying words
Feature being Tested: Filtering of bullying words
Description: After sentiment analysis, AI is applied to predict bullying words
Sample Input: Posted messages
Expected Output: Bullying words predicted
Actual Output: As expected
Remarks: Passed
Table 7.3.2 Integration Test Case 2 for sentiment analysis and prediction
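The prediction step exercised by ITC-2 can be sketched at its simplest as a lexicon lookup over the tokens of a posted message. The word list below is purely illustrative; the project's actual predictor is learned from sentiment analysis, not a fixed list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BullyingFilter {
    // Illustrative lexicon; a real system would use a trained classifier.
    static final Set<String> LEXICON =
            new HashSet<>(Arrays.asList("idiot", "loser", "stupid"));

    // Return the bullying words found in a posted message.
    static List<String> predict(String message) {
        List<String> hits = new ArrayList<>();
        for (String token : message.toLowerCase().split("\\W+")) {
            if (LEXICON.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(predict("You are such a LOSER and an idiot"));
    }
}
```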
With the growth of online social networks and the proportional rise in public shaming
events, voices against callousness on the part of site owners are growing stronger. The
categorization of shaming comments presented in this work makes it possible for users to
choose to allow certain types of shaming comments (e.g., comments that are sarcastic in
nature), giving them an opportunity for rebuttal, and to block others (e.g., comments that
attack their ethnicity) according to individual preference. The freedom to choose beforehand
which types of utterances one would not like to see in one's feed is far better than flagging a
deluge of comments after a shaming event. This also liberates moderators from the moral
dilemma of deciding a threshold that separates acceptable online behaviour from
unacceptable behaviour, relieving them to some extent of the responsibility of deciding what
is best for another person.
REFERENCES
[1] J. Ronson, So You’ve Been Publicly Shamed. London, U.K.: Picador, 2015.
[3] S. Sood, J. Antin, and E. Churchill, “Profanity use in online communities,” in Proc.
SIGCHI Conf. Hum. Factors Comput. Syst., 2012, pp. 1481–1490.
[4] S. Rojas-Galeano, “On obstructing obscenity obfuscation,” ACM Trans. Web, vol. 11, no.
2, p. 12, 2017.
[5] E. Wulczyn, N. Thain, and L. Dixon, “Ex machina: Personal attacks seen at scale,” in Proc.
26th Int. Conf. World Wide Web, 2017, pp. 1391–1399.
[6] A. Schmidt and M. Wiegand, “A survey on hate speech detection using natural language
processing,” in Proc. 5th Int. Workshop Natural Lang. Process. Social Media Assoc. Comput.
Linguistics, Valencia, Spain, 2017, pp. 1–10.
[7] Hate-Speech. Oxford Dictionaries. Accessed: Aug. 30, 2017. [Online]. Available:
https://en.oxforddictionaries.com/definition/hate_speech
[8] W. Warner and J. Hirschberg, “Detecting hate speech on the world wide Web,” in Proc.
2nd Workshop Lang. Social Media, 2012, pp. 19–26.
[9] I. Kwok and Y. Wang, “Locate the hate: Detecting tweets against blacks,” in Proc. AAAI,
2013, pp. 1621–1622.
[10] P. Burnap and M. L. Williams, “Cyber hate speech on Twitter: An application of machine
classification and statistical modeling for policy and decision making,” Policy Internet, vol. 7,
no. 2, pp. 223–242, 2015.
[11] Lee-Rigby. Lee Rigby Murder: Map and Timeline. Accessed: Dec. 7, 2017. [Online].
Available: http://www.bbc.com/news/uk25298580
[12] Z. Waseem and D. Hovy, “Hateful symbols or hateful people? Predictive features for
hate speech detection on Twitter,” in Proc. SRW HLT-NAACL, 2016, pp. 88–93.
[13] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep learning for hate speech detection
in tweets,” in Proc. 26th Int. Conf. World Wide Web Companion, 2017, pp. 759–760.
[14] D. Olweus, S. Limber, and S. Mihalic, Blueprints for Violence Prevention, Book Nine:
Bullying Prevention Program. Boulder, CO, USA: Center for the Study and Prevention of
Violence, 1999.
[15] P. K. Smith, H. Cowie, R. F. Olafsson, and A. P. D. Liefooghe, “Definitions of bullying:
A comparison of terms used, and age and gender differences, in a fourteen–country
international comparison,” Child Develop., vol. 73, no. 4, pp. 1119–1133, 2002.
[16] R. S. Griffin and A. M. Gross, “Childhood bullying: Current empirical findings and future
directions for research,” Aggression Violent Behav., vol. 9, no. 4, pp. 379–400, 2004.
[17] H. Vandebosch and K. Van Cleemput, “Defining cyberbullying: A qualitative research
into the perceptions of youngsters,” CyberPsychol. Behav., vol. 11, no. 4, pp. 499–503, 2008.
[20] P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu, “Open mind common
sense: Knowledge acquisition from the general public,” in Proc. OTM Confederated Int. Conf.
Move Meaningful Internet Syst. Berlin, Germany: Springer, 2002, pp. 1223–1237.
[24] P. Tsantarliotis, E. Pitoura, and P. Tsaparas, “Defining and predicting troll vulnerability
in online social media,” Social Netw. Anal. Mining, vol. 7, no. 1, p. 26, 2017.
[27] Face++ Cognitive Services. Accessed: Feb. 20, 2018. [Online]. Available:
https://www.faceplusplus.com/
[31] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proc. 10th ACM
SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2004, pp. 168–177.
[35] Brown-Clusters. Twitter Word Clusters. Accessed: Jul. 2, 2017. [Online]. Available:
http://www.cs.cmu.edu/~ark/TweetNLP/
[36] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp.
273–297, 1995.
[37] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM
Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27:1–27:27, 2011.
[38] E. Colleoni, A. Rozza, and A. Arvidsson, “Echo chamber or public sphere? Predicting
political orientation and measuring political homophily in Twitter using big data,” J. Commun.,
vol. 64, no. 2, pp. 317–332, 2014.