Social Media Sentiment Analysis: A New Empirical Tool for Assessing Public Opinion on Crime?
Jeremy Prichard, Paul Watters, Tony Krone, Caroline Spiranovic & Helen Cockburn
To cite this article: Jeremy Prichard, Paul Watters, Tony Krone, Caroline Spiranovic
& Helen Cockburn (2015) Social Media Sentiment Analysis: A New Empirical Tool for
Assessing Public Opinion on Crime?, Current Issues in Criminal Justice, 27:2, 217-236, DOI:
10.1080/10345329.2015.12036042
Abstract
‘Big data’ presents many interesting opportunities and challenges. This article focuses on
the potential use of social media sentiment analysis (‘SMSA’) as a legitimate tool for criminological
research to better understand public perceptions of crime problems and public attitudes to
responses to crime. While a degree of scepticism should always apply to the use of
unsubstantiated sources on the internet, SMSA is likely to be a rich source of valuable
information. Observational SMSA research presents low-level risks in terms of human
research ethics principally because the information derived is unlikely to lead to the
identification of research subjects. It is arguable, but less certain, that material posted
publicly online does not attract a reasonable expectation of privacy for the author. However,
the strength of this argument may depend on the particular circumstances in which the
material to be analysed was posted.
Introduction
Sentiment analysis (or ‘opinion mining’) is the use of information technology to automatically
evaluate opinions expressed across multiple texts. The internet is a rich source of opinions —
in posts or comments on news and other websites, as well as many different social media
* Senior Lecturer, Law School, University of Tasmania, Private Bag 89, Hobart Tas 7001, Australia. Email:
jeremy.prichard@utas.edu.au.
† Professor in Information Technology, School of Engineering and Advanced Technology, Massey University,
Private Bag 11 222, Palmerston North 4442, New Zealand. Email: paul.watters.massey@gmail.com.
‡ Associate Professor, School of Law and Justice, Building 11, University of Canberra ACT 2601, Australia.
Email: tony.krone@canberra.edu.au.
§ Research Fellow, Law School, University of Tasmania, Private Bag 89, Hobart Tas 7001, Australia. Email:
caroline.spiranovic@utas.edu.au.
** Lecturer, Law School, University of Tasmania, Private Bag 89, Hobart Tas 7001, Australia. Email:
helen.cockburn@utas.edu.au.
218 CURRENT ISSUES IN CRIMINAL JUSTICE VOLUME 27 NUMBER 2
platforms such as Facebook, Twitter, blogs and message boards. On just one day in early June
2015 it was estimated that there were more than three billion internet users collectively using
almost a billion sites, sending more than 200 billion emails, making nearly four million blog
posts and more than 750 million Tweets, and maintaining almost 1.5 billion active Facebook
accounts (Real Time Statistics Project 2015). These raw figures are incredible, but they are
likely to be inflated by what is effectively ‘junk’, such as spam.
Opinion mining is one way of manipulating part of the staggering amount of information
or ‘big data’ that modern information and communications technology generates (Moorthy et
al 2015). Opinion mining across online news media and social media is referred to as social
media sentiment analysis (‘SMSA’). A basic form uses natural language processing
techniques to extract binary sentiments on particular issues. SMSA may also involve more
nuanced techniques, such as clustering, to analyse related opinions or constructs that do not
fall neatly into binary categories (Layton et al 2013a).
This article discusses the use of SMSA to observe and record public commentary on the
internet that has not been solicited by the researcher (Veltri 2013). In terms of privacy and
human research ethics concerns, this is arguably the least intrusive research application of
SMSA.
There are other applications of SMSA in academic research beyond the scope of this
article. Each raises differing questions about privacy and ethics (Freeman Cook and Hoas
2013), including the effect of the active role taken by the researcher interacting with the
subjects of the research (Hesse-Biber and Griffin 2013). The studies observed can be categorised
into three types:
1. researcher use of a virtual space to engage others and elicit comments (Allen 2014);
2. researcher–participant interaction facilitated through social media (Curtis 2014);
3. researchers monitoring participants in clinical research studies (Glickman et al 2012).
In the technical literature, much attention has been given to refining SMSA to collate
macro-level, real-time indicators of public opinion. Feldman (2013) estimated that
information technology (‘IT’) researchers published over 7000 articles on SMSA. This effort
is, at least in part, driven by the demand for SMSA from governments (Gray and Gordo 2014),
the corporate sector (Zhang and Vos 2014) and in politics (Groshek and Al-Rawi 2013;
Hawthorne et al 2013).
SMSA techniques are established in fields as diverse as health (Christensen et al 2014),
product safety (Isah et al 2014; Shan et al 2014), crisis management (Johansson et al 2012),
economic development (Schroeder 2014) and education (Granitz and Koernig 2011). It is
clear that the vast array of platforms for the expression of opinion presents distinct
opportunities and challenges for research. For example, Hesse-Biber and Griffin (2013)
reviewed different research studies targeting particular interests, including an investigation of
social capital in online gaming communities, a study of hyperlinking (for network analysis)
on ‘living wage’ activist sites, and online social support groups on a parenting site.
Deliberative processes
As surmised by Indermaur and colleagues (2012), scholarly literature has identified a number
of prerequisites of informed opinion including information, responsibility taking and
deliberation (see, for example, Price and Neijens 1998). Information refers to the fact that
respondents require a certain level of knowledge and must be provided with relevant
contextual information in order to arrive at an informed opinion. Responsibility taking refers
to respondents feeling some personal investment or responsibility for their answers.
Deliberation requires an in-depth consideration of the available information and choices,
and of the pros and cons of those choices, before reaching a decision. The process of
deliberation has been described as a social process whereby individuals discuss their views
with others and must consider the alternative views of others (Yankelovich 2010). Adopting
this strict definition of ‘informed opinion’ would mean that even well-designed and well-
worded representative surveys cannot tap into informed opinions because respondents are not
able to deliberate with others when answering.
Focus groups
Due to these and other weaknesses of representative surveys, some criminologists prefer to
use focus groups to gauge public opinion on crime and justice issues (Gelb 2006:16). Focus
groups usually involve small groups of respondents brought together to discuss one or more
particular issues, and a facilitator who ensures that discussions stay on topic and necessary issues are
covered. The samples generated from focus group studies tend not to be representative of the
population as a whole as the numbers participating are generally small and self-selection
biases may determine who is willing to participate in this more time-intensive method.
Focus groups also tend to generate qualitative, as opposed to quantitative, data. However,
it has been argued that this approach provides richer data than media polls or representative
surveys, as participants can explain and qualify their views in more detail and are encouraged
to think about the issues more deeply by discussing them with others (Gelb 2006; Stobbs et
al 2014). In this sense, focus groups may better tap into informed opinions at least with respect
to the deliberation component. The extent to which respondents are informed and encouraged
to take responsibility largely depends on the design of the study, including the information
and instructions provided to respondents, and respondents’ understanding of the implications
of the study for criminal justice policy.
Mixed methods
Due to the strengths and weaknesses of these approaches, many researchers gauging public
opinion towards crime and justice issues advocate the use of mixed-methods approaches
involving both representative surveys and focus groups. The rich data obtained from focus
groups is said to complement and supplement the information obtained from representative
surveys. Mixed-methods have also been used in juror studies (see, for example, Warner and
Davis 2012) investigating attitudes to sentencing using both surveys and semi-structured
interviews to provide a richly textured understanding of the attitudes of ordinary people
presented with legally admissible material relevant to sentencing of individual offenders
(Warner and Davis 2012; Gwin 2010).
Deliberative polls
Deliberative polls combine the key features of representative surveys and focus groups and
capitalise on the strengths of these methods. Deliberative polls essentially involve the use of
sentiments into a simple, quantitative statement, such as ‘90% of respondents agree that sex
offenders deserve life in jail’.
A recent Australian online newspaper article proposed increases to sentences for child sex
offences. In this example, a journalist wrote a short news story covering a proposal to change
the law, and 46 users responded with their own opinions. The responses range in length from
one or two words (‘good’ or ‘great idea’), to 293 words. Other responses include both natural
language, as well as links to tweets and images. The opinions range from ‘kill everyone before
they commit crime’ and ‘physical castration’ through to crime prevention and rehabilitation.
Many responses contain spelling or grammatical errors. To reduce this complex set of data to
one or more statements expressing sentiment, accompanied by a frequency analysis, a
significant amount of natural language processing and information retrieval is required.
Some approaches to opinion mining attempt to circumvent the information retrieval
problem by forcing users to provide quantitative ratings against qualitative descriptors. For
example, Amazon.com allows users to rank products from one to five stars and to leave a
comment or write a review. Similarly, TripAdvisor provides an equivalent five-point scale for
hotel reviews. Yet these kinds of scales do not represent the range of opinion, emotion or
attitudes that might be revealed from a computational analysis of text; indeed, sometimes the
quantitative ratings are not consistent with the qualitative reviews, or with external standards.
A user may rate an externally rated three-star hotel with five stars, since the experience met
his or her expectations, but this does not mean that the hotel is actually ‘5-star’ (Layton et al
2013c). To some extent, this reflects the subjective nature of sentiments, rather than more
fact-based schemes; for example, to achieve an extra star rating, a hotel may simply have to
install a pool, rather than meet the subjectively identified needs of its patrons.
In describing the development of sentiment analysis, Pang and Lee (2008) note the range
of data sources first able to be mined, beginning with e-commerce sites, review sites and
blogs. With Web 2.0, this extended to social media including tweets, Facebook and LinkedIn.
Common constraints apply to the computational processing required to identify and extract
sentiment from these newer sources.
An additional problem is that short message services like Twitter provide very little textual
material to process. Returning to the child sex offender story, a single comment like ‘Good’
is ambiguous, since the subject must be inferred from the story. Is it ‘good’ that proposed
sentences are longer or was there some other aspect of the story or comments made that was
‘good’? A reader may be able to infer a sequence within the discussion forum threads, but it
is not always the case that users will reply in the most ‘logical’ place, and an automated
technique for analysing opinion may struggle without a clearly defined context. These types
of ambiguity continue to make SMSA a challenge. For example, Bartlett and Norrie (2015)
describe a study of public attitudes towards immigration which was initially based on
automated ‘natural language processing’ analysis of Twitter feeds. The authors found it
necessary to include manual analysis to determine the direction of sentiments (whether
positive, negative or neutral).
Most approaches to SMSA need three components to operate: a model for representing
text to perform computations on it; an algorithm for identifying and measuring sentiment; and
a reporting system.
Representational models
The most common approach to natural language processing is to use a vector representation,
or a ‘bag of words’ approach, which is described in detail by Perone (2011). In a bag of words,
each document, such as a comment on a news story, is coded with the frequency of term
occurrence, where each unique term is coded as a dictionary entry. Coding as a dictionary
entry means that you create a data dictionary of unique terms in all of the documents, starting
at 1, and enumerating every unique term. So ‘crime’ is term 1, ‘to’ is term 2, and so on. The
order of terms is not considered by most algorithms. Thus, if we take two or more documents
(from our newspaper opinion example above), such as:
Opinion 1: ‘crime to come to the attention of the police’ and
Opinion 2: ‘get tough on crime’
we can construct a dictionary thus:
{
    'crime': 1,
    'to': 2,
    'come': 3,
    'the': 4,
    'attention': 5,
    'of': 6,
    'police': 7,
    'get': 8,
    'tough': 9,
    'on': 10,
}
which has 10 distinct terms. We then create a vector space representation of the terms in each
document:
Opinion 1: [1, 2, 1, 2, 1, 1, 1, 0, 0, 0]
Opinion 2: [1, 0, 0, 0, 0, 0, 0, 1, 1, 1]
Reading the first vector, which corresponds to the first document, from left to right, it
means that there is one instance of the term ‘crime’, two of the term ‘to’, one of the term
‘come’, two of the term ‘the’, and so on. The term ‘crime’ appears in each document, so the
frequency count shown here is ‘1’ for each vector. For the terms ‘to’ and ‘the’, the frequency
count for the first document is ‘2’, but since the terms do not appear in the second vector, the
frequency count is ‘0’. This is an example only and this sort of analysis is obviously unlikely
to be meaningful with a small number of documents.
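The dictionary-and-vector construction described above can be sketched in a few lines of Python (a minimal illustration using the two sample opinions; libraries such as scikit-learn provide production-grade equivalents):

```python
def bag_of_words(documents):
    """Build a term dictionary (terms numbered from 1) and a term-frequency
    vector for each document, as in the worked example."""
    dictionary = {}
    for doc in documents:
        for term in doc.lower().split():
            if term not in dictionary:
                dictionary[term] = len(dictionary) + 1  # next unused term number
    vectors = []
    for doc in documents:
        vector = [0] * len(dictionary)
        for term in doc.lower().split():
            vector[dictionary[term] - 1] += 1  # count occurrences of each term
        vectors.append(vector)
    return dictionary, vectors

opinions = [
    'crime to come to the attention of the police',
    'get tough on crime',
]
dictionary, vectors = bag_of_words(opinions)
# vectors[0] -> [1, 2, 1, 2, 1, 1, 1, 0, 0, 0]
# vectors[1] -> [1, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

Note that, as in the worked example, the order of terms within each document is discarded; only frequencies survive.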
While the frequency count is critical to determining the relevance of a certain term to a
particular document, it can also be offset by weighting each term against its frequency in
natural language at large. Schemes such as Term Frequency-Inverse Document Frequency
(‘TF-IDF’) operate on this principle, and can be used to remove high-frequency words such
as ‘to’ and ‘the’ by placing them on a stoplist, since such words do not help in computationally
extracting meaning from documents (Wu et al 2008). Standard natural language processing technologies
can be applied to improve the quality of the vectors: verbs can be stemmed to ensure that they
are not counted as separate features, and misspelled words could be identified and counted
within the frequencies for the correctly spelled word.
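The TF-IDF weighting idea can be sketched as follows (a minimal illustration over the two example vectors; real systems would use a library implementation and a much larger corpus):

```python
import math

def tf_idf(vectors):
    """Weight raw term frequencies by inverse document frequency, so that
    terms appearing in most documents are downweighted."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # document frequency: the number of documents containing each term
    df = [sum(1 for v in vectors if v[t] > 0) for t in range(n_terms)]
    return [[v[t] * math.log(n_docs / df[t]) for t in range(n_terms)]
            for v in vectors]

weighted = tf_idf([
    [1, 2, 1, 2, 1, 1, 1, 0, 0, 0],   # Opinion 1
    [1, 0, 0, 0, 0, 0, 0, 1, 1, 1],   # Opinion 2
])
# 'crime' (term 1) appears in both documents, so its weight drops to zero
```

At this toy scale ‘to’ and ‘the’ still score highly, because each appears in only one of the two documents; the intended downweighting of common words only emerges over realistically large corpora, where such words occur almost everywhere.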
Algorithms
Once feature vectors of this kind have been developed, they can act as input for various
learning algorithms that could be used to measure sentiment. This can be achieved using a
similar approach to spam classification for electronic mail, for example, where more terms
associated with spam will be associated with the ‘spam’ set of terms than the non-spam ‘ham’
set. In the simplest case of sentiment analysis — such as a proposition to increase jail terms
for sex offenders — it should be possible to separate documents into two separate groups
(for/against) using a binary classifier, such as Bayes’ algorithm. If sufficiently large
representative samples are obtained for each class, this kind of probabilistic classifier can
produce highly accurate results. It may also be possible to improve the classification results
by using a form of semi-supervised learning, such that a human judge can provide feedback
on the judgments made by a supervised algorithm (Goldberg and Zhu 2006).
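The binary-classification step can be sketched with a multinomial naive Bayes classifier built directly from term counts (the hand-labelled training comments below are invented for illustration; a real study would need much larger labelled samples):

```python
from collections import Counter
import math

def train_nb(labelled_docs):
    """Collect per-class term counts, per-class document counts and the
    shared vocabulary from (text, label) training pairs."""
    counts = {}              # class -> Counter of term frequencies
    class_docs = Counter()   # class -> number of training documents
    vocab = set()
    for text, label in labelled_docs:
        terms = text.lower().split()
        counts.setdefault(label, Counter()).update(terms)
        class_docs[label] += 1
        vocab.update(terms)
    return counts, class_docs, vocab

def classify(text, counts, class_docs, vocab):
    """Return the class with the highest log posterior, using add-one
    (Laplace) smoothing over the vocabulary."""
    n_docs = sum(class_docs.values())
    best_label, best_score = None, float('-inf')
    for label, term_counts in counts.items():
        score = math.log(class_docs[label] / n_docs)  # class prior
        total = sum(term_counts.values())
        for term in text.lower().split():
            if term in vocab:
                score += math.log((term_counts[term] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# hypothetical hand-labelled comments on a sentencing proposal
training = [
    ('longer sentences good idea', 'for'),
    ('lock them up for life', 'for'),
    ('rehabilitation works better than jail', 'against'),
    ('prevention not punishment', 'against'),
]
model = train_nb(training)
print(classify('good idea lock them up', *model))  # prints 'for'
```

Semi-supervised refinement of the kind described by Goldberg and Zhu (2006) would sit on top of such a classifier, feeding human corrections back into the training set.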
To automatically identify which groups are associated with each proposition, it is necessary
to match keywords that are typically ‘for’ a proposition to cases, and those typically ‘against’.
This could be achieved by using data gathered from human judges (Pang et al 2002), or by
using a set of hypernyms extracted from a semantic database like WordNet (Baccianella et al
2010). For an exploration of concepts relevant to determining meaning in social media text,
see Lomborg (2015).
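The keyword-matching step can be sketched with hand-built seed lists of the kind human judges might supply (the seed terms below are invented for illustration; in practice the lists could be expanded with synonyms and hypernyms from a resource such as WordNet):

```python
# invented seed lists for illustration only; human judges or a semantic
# database would supply and expand real ones
FOR_SEEDS = {'good', 'agree', 'support', 'tougher', 'castration', 'jail'}
AGAINST_SEEDS = {'rehabilitation', 'prevention', 'oppose', 'unfair', 'excessive'}

def seed_match(comment):
    """Assign a comment to 'for', 'against' or 'unknown' by counting
    overlaps with each seed list."""
    terms = set(comment.lower().split())
    for_hits = len(terms & FOR_SEEDS)
    against_hits = len(terms & AGAINST_SEEDS)
    if for_hits > against_hits:
        return 'for'
    if against_hits > for_hits:
        return 'against'
    return 'unknown'

print(seed_match('tougher sentences are a good idea'))  # prints 'for'
```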
The easiest propositions to test for sentiment are those that are polarising and likely to fall
into two separate camps. As is apparent from the two sample vectors above, there is not a lot
of overlap. If this pattern was repeated at large scales, with many respondents, separating out
the terms associated with each argument (good/bad, for/against etc) should be relatively easy.
One aspect of sentiment analysis that makes it more complicated than email filtering is that
the identification of multiple classes may not be known a priori. It is not the case that posters
in the online article referred to above only had two opinions; the issues raised were
multifaceted and complex, so multiclass classification may be necessary.
Reporting
In the simple example above, the data was drawn from posts on a single news article. To
investigate sentiments more broadly, it may be necessary to integrate raw data sampled from
a range of sources, which is technically relatively easy to achieve. Any data that can eventually
be represented as a case, using the bag of words model, can be analysed for sentiment. Many
social media applications provide Application Programming Interfaces (‘APIs’) that make it
easy to search for, identify and download relevant data. A range of data interchange formats
is widely in use, including the eXtensible Markup Language (‘XML’), and the so-called
‘semantic web’ technologies for representing and reasoning about web data (including the
Resource Description Framework). Each API will have its own formats and available
services; Google, for example, has a set of APIs that allows data to be searched for and
integrated across web, mail and geographic data sets. However, there may be proprietary
barriers to accessing data in bulk and many services limit the rate at which data can be
downloaded, so that competitors cannot simply create a ‘carbon copy’ of all of the company’s
data; Twitter, for example, limits search rates to between 15 and 180 requests per 15-minute window
(Twitter 2015). When services place time or capacity limits on data downloads, this can
significantly lengthen the data acquisition phase of the study — a data retrieval task that might
take ten minutes ordinarily may take 24 hours if delays are introduced. Depending on the
study design, it may be helpful to pool all data together into a single dataset, or at least retain
the source, so that comparisons could be made between different providers (Facebook,
Twitter) or modalities (news commentary, social media).
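Rate limits of this kind can be handled with a simple pacing wrapper. The sketch below is generic Python: the `fetch` callable stands in for whatever provider API a study actually uses, and the 15-requests-per-window default mirrors the Twitter limit mentioned above but is purely illustrative. Each result is stored alongside its source query, so later comparisons between providers or modalities remain possible.

```python
import time

def rate_limited(fetch, queries, max_requests=15, window=900.0):
    """Call fetch(query) for each query, sleeping whenever the provider's
    per-window request cap would otherwise be exceeded."""
    results = []
    window_start = time.monotonic()
    sent = 0
    for query in queries:
        if sent >= max_requests:
            elapsed = time.monotonic() - window_start
            if elapsed < window:
                time.sleep(window - elapsed)  # wait out the rest of the window
            window_start, sent = time.monotonic(), 0
        results.append((query, fetch(query)))  # keep the source with each result
        sent += 1
    return results

# demo with a stand-in fetch function; a real study would call a provider API here
sample = rate_limited(lambda q: f'results for {q}', ['crime', 'sentencing'])
print(sample[0])  # ('crime', 'results for crime')
```

The enforced sleeps are exactly what stretches a ten-minute retrieval task into many hours, as noted above.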
Researchers intending to use sentiment analysis are faced with a range of practical
considerations. Sample sizes required depend entirely on the classification algorithms being
used and the application at hand. An example is the sentiment analysis of H1N1 tweets to
predict the spread of the virus. In this case, a maximum of 600 tweets per day over nine days
was sufficient to achieve a high level of predictability (Chew and Eysenbach 2010). The cost
of implementing a system for undertaking sentiment analysis will depend on: whether
commercial or open source software is used; whether API access to data sources is free; the
scale of the data to be extracted; whether custom APIs or screen-scraping software need to be
developed; and the not-insignificant hardware costs for storing and processing data. The
expertise required to implement these systems includes natural language engineering skills,
data integration knowledge, and experience with various machine learning algorithms.
Researchers with these IT skill sets would exist at many universities in countries like Australia
and New Zealand. However, for their skills to effectively address criminal justice-related
research questions, clearly they would need to collaborate with criminologists.
As a note of caution, the accuracy of even some of the best techniques is far from perfect.
For example, Agarwal et al (2011) used a completely automated model of SMSA. They
undertook binary opinion mining of a large Twitter corpus, and found that accuracy ranged
from 71.35 to 75.39 per cent across various SMSA algorithms, including unigram,
tree kernel, senti-features and combinations of these. Standard deviations for test accuracy
ranged between 0.65 and 1.95. Given that chance-level accuracy would be 50 per cent, it
seems that current iterations of fully automated SMSA involve an unacceptable risk of
error. Where statistical analyses are concerned, this could translate into Type 1 and Type 2 errors
(the erroneous rejection or acceptance of hypotheses). Consequently, those interested
in investigating SMSA for criminological research are — at least for the foreseeable future
— likely to want to include the sorts of human judgment and supervision employed by
Goldberg and Zhu (2006). Perhaps these results suggest that a certain level of automation may
be desirable, and may reduce the human effort required by about 50 per cent, but, ultimately,
human assessment is required for greatest reliability.
findings may be difficult for journal editors and peer reviewers to assess. For a discussion of
the sorts of disciplinary challenges that big data (like SMSA) has presented for empirical sociology,
see Savage and Burrows (2007).
Notwithstanding these complexities and challenges, this article suggests that SMSA is a
promising method for gauging public opinion either alone or in combination with traditional
empirical approaches. Certain strengths of SMSA ought to be considered from the empirical
perspective. First, after establishing new collaborations and implementing and refining SMSA
methods, research teams would have a tool that could be used efficiently and frequently. This
would be ideal, for example, to use cross-sectional repeated measures to track public opinion
on a particular topic over time. Second, although this article has highlighted how error can
operate within SMSA, the traditional methods are themselves not protected from human error.
For instance, a researcher’s handwritten interview notes may capture some of the sentiment
expressed by a participant, but miss other points conveyed. Additional errors may be made
when typing the notes into an electronic format, coding the qualitative data, or cleaning the
data in preparation for analysis (McCrady et al 2010). Third, and perhaps most strikingly,
SMSA sample sizes can be very large indeed, as discussed above — many hundreds of
thousands of people. Fourth, unlike traditional methods of studying public opinion, SMSA
does not recruit participants. It only analyses what individuals express in public settings
online. This means that SMSA limits some of the selection effects capable of biasing results
in traditional methods. For example, for practical reasons, recruitment for traditional studies
may be limited to certain geographical areas. Alternatively, participation in a study may be
inconvenient for a class of people because of work, leisure or family commitments — despite
the fact that they fall within a study’s target population.
Finally, participation in empirical research can itself influence participants’ behaviour in
different ways — a phenomenon that is sometimes called the ‘observer effect’. Among other
things, participant responses can be affected by their desire to be seen in a positive light by
the researcher, particularly in face-to-face interviews (Krumpal 2013). This suggests that
another potential value of SMSA data is that it removes researchers from the environment
under analysis. If individual concerns about ‘social desirability’ (Krumpal
2013:2026) affect behaviour in empirical interviews, then social desirability probably also
influences online behaviour. However, arguably social desirability loses potency when
internet users feel anonymous. The perception of anonymity is considered a powerful factor
in criminal decision-making (Clarke 2008), including serious online crimes (Wortley and
Smallbone 2012) and engaging in other forms of deviant behaviour (Demetriou and Silke
2003). The implication for criminologists is that SMSA may have particular advantages in
capturing honest but extreme views on contentious criminal justice issues that would not be
expressed in other forums.
• some techniques may identify individuals by gathering data, such as names, images,
dates of birth or addresses. Individuals who post under pseudonyms may
inadvertently reveal information about themselves. There have been numerous cases
of individuals posting opinions on social media whose employment has been
terminated for failing to adhere to their employer’s social media policies (Berkelaar
2014; Jacobson and Tufts 2013; Moussa 2015; O’Connor and Schmidt 2015; Van
Iddekinge 2013; West and Bowman 2014);
• sometimes opinions given in restricted circumstances may inadvertently be leaked.
Tagging a friend in Facebook posts, for example, may make these opinions available
to friends of friends. It is not clear that users always understand the implications of
opinion leakage;
• open source intelligence algorithms also make it possible to match, with 90 per cent
accuracy, text being composed by the same individual using different aliases or
pseudonyms (Layton et al 2013b). However, when this step is taken alone, the
identity of the person using those aliases is not revealed.
Privacy laws in Australia such as the Privacy Act 1988 (Cth) currently have a narrow scope,
being ‘concerned with the security of personal information held by certain entities, rather than
with privacy more generally’ (ALRC 2014:46). The Australian Law Reform Commission
(‘ALRC’) recommended a new tort of invasion of privacy with two limbs: intrusion into a
reasonable expectation of privacy; and misuse of private information with a test that ‘the
invasion of privacy must be committed intentionally or recklessly, must be found to be
serious, and must not be justified by broader public interest considerations, such as freedom
of speech’ (ALRC 2014:78). Importantly, the ALRC noted that the terms under which a
person posts material to the internet are usually determined by the End User Licence Agreement
set by the website administrator and agreed to as a condition of use. A comprehensive review
of these agreements showed widely varying practices that are unlikely to be fully appreciated
by users (MacGibbon and Phair 2013).
Most internet users included in a SMSA study are at very low risk of being identified by
algorithms designed for the limited purpose of analysing public opinion. Importantly, SMSA
can be designed to explicitly exclude identifying information from the data collection, or the
risk of inadvertent identification can be reduced by cloaking the results when reported.
Australia’s National Statement on Ethical Conduct in Human Research (NHMRC 2007),
updated in May 2015, does not contain specific provisions regarding social media. However,
SMSA clearly falls under its broad definition of ‘human research’ because it involves
analysing ‘data’ or ‘other materials’ generated by individuals (NHMRC 2007:7). In a sense,
SMSA also involves human ‘observation’, albeit in an online environment and not usually in
real time. Like similar research-ethics documents that operate in other countries, the National
Statement (NHMRC 2007) recognises cornerstone ethical principles for human research.
These principles are not intended to be applied in a formulaic way. Rather, they are used to
balance the ethical strengths and weaknesses of potential research.
One such principle is respect for individuals’ autonomy. Autonomy is most obviously
respected by the fact that researchers usually seek individuals’ voluntary and fully informed
consent before including them in a study. In addition, participants’ autonomy is respected
through taking steps to safeguard participants’ confidentiality and to protect their personal
information (Beauchamp and Childress 2001). The other ethical principles are non-
maleficence, beneficence and distributive justice. Respectively these principles require that
research:
Conclusion
This article deals with the observation and recording of ‘public’ commentary for the purposes
of criminological research. The use of SMSA to distil opinions from publicly posted writings
is unlikely to identify persons and, in any event, is based on material where there is unlikely
to be a reasonable expectation of privacy. In our view, to answer the question we posed in the
title to this article, SMSA is a potentially useful new empirical tool for assessing public
opinion on crime.
SMSA can be designed so that, from the data gathered, all or most of the participants are
non-identifiable — meaning that the data do not contain individual identifiers. Steps can be
taken to further mitigate the low risk of identifying participants. As noted, human judges
improve the accuracy of SMSA data (Goldberg and Zhu 2006). They could also be employed
to test the efficacy of SMSA identity safeguards in preparatory phases. Once a study
commences, human judges could play a central role in monitoring the SMSA project’s HREC
compliance. Adverse or unexpected outcomes would need to be reported to the relevant
HREC. In some cases, it may be possible to rectify the SMSA algorithm to address the
safeguard problem. Since SMSA is a form of big data, researchers are not likely to be
interested in reporting specific sections of text, although if some text is worth quoting it can
be suitably cloaked to minimise identification. If researchers are committed to following
protocols about reporting qualitative data, they could further reduce risks of harm to
participants (for example, by ensuring individuals are not linked with views that may
embarrass them or cause them to be discriminated against).
Other forms of research using big data or SMSA techniques may be more problematic and
would have to be considered individually on their merits. The ‘mosaic theory’, which suggests
that expectations of privacy may be engaged for the aggregation of disparate personalised
data, may serve as a useful guide for considering the implications of other uses of SMSA
(Gray et al 2013).
Finally, a note of caution is required. As with anything on the internet, common sense and
experience tells us to be sceptical and critical. The potential for misinformation, distortion,
trolling and manipulation of social media is ever present and we should consider carefully the
wider context in which all comments appear on the internet.
Legislation
Privacy Act 1988 (Cth)
References
Agarwal A, Xie B, Vovsha I, Rambow O and Passonneau R (2011) ‘Sentiment Analysis of Twitter Data’
in Proceedings of the Workshop on Languages in Social Media, Association for Computational
Linguistics, 30–8
Allen C (2014) ‘Anti-Social Networking: Findings from a Pilot Study on Opposing Dudley Mosque
Using Facebook Groups as Both Site and Method for Research’, Sage Open 4(1)
<http://sgo.sagepub.com/content/4/1/2158244014522074>
Australian Law Reform Commission (‘ALRC’) (2014) Serious Invasions of Privacy in the Digital Era,
Report 123
Baccianella S, Esuli A and Sebastiani F (2010) ‘SentiWordNet 3.0: An Enhanced Lexical Resource for
Sentiment Analysis and Opinion Mining’ in Proceedings of the Seventh International Conference on
Language Resources and Evaluation (LREC 10), 2200–4
Bartlett J and Norrie R (2015) Immigration on Twitter: Understanding Public Attitudes Online, Demos
Beauchamp TL and Childress JF (2001) Principles of Biomedical Ethics (5th ed), Oxford University Press
Berkelaar BL (2014) ‘Cybervetting, Online Information and Personnel Selection: New Transparency
Expectations and the Emergence of a Digital Social Contract’, Management Communication Quarterly
28(4), 479–506
Boase J (2013) ‘Implications of Software-Based Mobile Media for Social Research’, Mobile Media &
Communication 1(1), 57–62
Brody BA (1998) The Ethics of Biomedical Research: An International Perspective, Oxford University Press
Byun CC and Hollander EJ (2015) ‘Explaining the Intensity of the Arab Spring’, Digest of Middle East
Studies 24(1), 26–46
Chew C and Eysenbach G (2010) ‘Pandemics in the Age of Twitter: Content Analysis of Tweets during
the 2009 H1N1 Outbreak’, PLoS ONE 5(11), 1–13
Christensen H, Batterham PJ and O’Dea B (2014) ‘E-health Interventions for Suicide Prevention’,
International Journal of Environmental Research and Public Health 11(8), 8193–212
Clarke R (2008) ‘Situational Crime Prevention’ in R Wortley and L Mazerolle (eds), Environmental
Criminology and Crime Analysis, Devon Willan Publishing, 178–95
Clement A (2014) ‘Canada’s Bad Dream’, World Policy Journal 31(3), 20–4
Creech B (2014) ‘Disciplines of Truth: The “Arab Spring”, American Journalistic Practice, and the
Production of Public Knowledge’, Journalism 2014, 1–17
Curtis BL (2014) ‘Social Networking and Online Recruiting for HIV Research: Ethical Challenges’,
Journal of Empirical Research on Human Research Ethics 9(1), 58–70
Demetriou C and Silke A (2003) ‘A Criminological Internet “Sting” — Experimental Evidence of Illegal
and Deviant Visits to a Website Trap’, British Journal of Criminology 43(1), 213–22
Doob A and Roberts J (1983) Sentencing: An Analysis of the Public’s View of Sentencing, Department
of Justice Canada
Feldman R (2013) ‘Techniques and Applications for Sentiment Analysis’, Communications of the ACM
56(4), 82–9
Freeman Cook A and Hoas H (2013) ‘The Truth about the Truth: What Matters when Privacy and
Anonymity Can no Longer be Promised to Those who Participate in Clinical Trial Research?’, Research
Ethics 9(3), 97–108
Gelb K (2006) Myths and Misconceptions: Public Opinion Versus Public Judgment about Sentencing,
Melbourne Sentencing Advisory Council of Victoria
Glickman SW, Galhenage S, McNair L, Barber Z, Patel K, Schulman KA and McHutchison JG (2012)
‘The Potential Influence of Internet-Based Social Networking on the Conduct of Clinical Research
Studies’, Journal of Empirical Research on Human Research Ethics 7(1), 71–80
Goldberg AB and Zhu X (2006) ‘Seeing Stars When There Aren’t Many Stars: Graph-Based Semi-
Supervised Learning for Sentiment Categorization’ in Proceedings of the First Workshop on Graph
Based Methods for Natural Language Processing, 45–52
Grace E (2014) ‘Learning to Use the Internet and Online Social Media: What is the Effectiveness of
Home-based Intervention for Youth with Complex Communication Needs?’, Child Language Teaching
and Therapy 30(2), 141–57
Granitz N and Koernig SK (2011) ‘Web 2.0 and Marketing Education: Explanations and Experiential
Applications’, Journal of Marketing Education 33(1), 57–72
Gray CH and Gordo ÁJ (2014) ‘Social Media in Conflict: Comparing Military and Social-Movement
Technocultures’, Cultural Politics 10(3), 251–61
Green DA (2006) ‘Public Opinion Versus Public Judgment about Crime’, British Journal of
Criminology 46, 131–54
Groshek J and Al-Rawi A (2012) ‘Public Sentiment and Critical Framing in Social Media Content
During the US Presidential Campaign’, Social Science Computer Review 31(5), 563–76
Gwin J (2010) ‘Juror Sentiment on Just Punishment: Do the Federal Sentencing Guidelines Reflect
Community Values?’, Harvard Law & Policy Review 4, 173–200
Hall W, Prichard J, Kirkbride P, Bruno R, Thai PK, Gartner C, Lai FY, Ort C and Mueller JF (2012)
‘An Analysis of Ethical Issues in Using Wastewater Analysis to Monitor Illicit Drug Use’, Addiction
107, 1767–73
Hartz-Karp J, Anderson P, Gasti J and Felicetti A (2010) ‘The Australian Citizens Parliament: Forging
Shared Identity through Public Deliberation’, Journal of Public Affairs 10(4), 353–71
Hawthorne J, Houston JB and McKinney MS (2013) ‘Live-Tweeting a Presidential Primary Debate:
Exploring New Political Conversations’, Social Science Computer Review 31(5), 552–62
Hesse-Biber S and Griffin AJ (2013) ‘Internet-Mediated Technologies and Mixed Methods Research:
Problems and Prospects’, Journal of Mixed Methods Research 7(1), 43–61
Indermaur D and Roberts L (2009) ‘Confidence in the Criminal Justice System’, Trends and Issues in
Crime and Criminal Justice 387, 1–6
Indermaur D, Roberts L, Spiranovic C, Mackenzie G and Gelb K (2012) ‘A Matter of Judgment:
The Effect of Information and Deliberation on Public Attitudes to Punishments’, Punishment & Society
14(2), 147–65
Isah H, Trundle P and Neagu D (2014) ‘Social Media Analysis for Product Safety Using Text Mining
and Sentiment Analysis’, 14th UK Workshop on Computational Intelligence, Bradford, 8–10 September
2014, 1–7
Jacobson WS and Tufts SH (2013) ‘To Post or Not to Post: Employee Rights and Social Media’, Review
of Public Personnel Administration 33(1), 84–107
Johansson F, Brynielsson J and Quijano MN (2012) ‘Estimating Citizen Alertness in Crises using Social
Media Monitoring and Analysis’, Paper presented at the European Intelligence and Security Informatics
Conference, Odense, 22–24 August 2012, 189–96
Krumpal I (2013) ‘Determinants of Social Desirability Bias in Sensitive Surveys: A Literature Review’,
Quality & Quantity 47(4), 2025–47
Layton R, Perez C, Birregah B, Watters P and Lemercier M (2013b) ‘Indirect Information Linkage for
OSINT through Authorship Analysis of Aliases’, Trends and Applications in Knowledge Discovery and
Data Mining, Berlin Heidelberg Springer, 36–46
Layton R, Watters P and Dazeley R (2010) ‘Authorship Attribution for Twitter in 140 Characters or
Less’, Paper presented at the Cybercrime and Trustworthy Computing Workshop, Ballarat, 19–20 July
2010, 1–8
Layton R, Watters P and Dazeley R (2013) ‘Automated Unsupervised Authorship Analysis Using
Evidence Accumulation Clustering’, Natural Language Engineering 19(1), 95–120
Layton R, Watters P and Ureche O (2013) ‘Identifying Faked Hotel Reviews Using Authorship
Analysis’, Paper presented at the Cybercrime and Trustworthy Computing Workshop, Ballarat, 19–20
July 2010, 1–6
Lewis P (2011) Reading the Riots: Investigating England’s Summer of Disorder, The London School of
Economics and Political Science and The Guardian
Lomborg S (2015) ‘“Meaning” in Social Media’, Social Media + Society 1(1)
McAfee A and Brynjolfsson E (2012) ‘Big Data: The Management Revolution’, Harvard Business
Review 90, 60–6
McCrady BS, Ladd B, Vermont L and Steele J (2010) ‘Interviews’ in J Miller, J Strang and P Miller
(eds), Addiction Research Methods, Wiley-Blackwell, 109–25
McCue C (2015) Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis,
Butterworth-Heinemann
MacGibbon A and Phair N (2013) 2013 Australian Online Privacy Index, Canberra Centre for Internet
Safety
McQuade S (2009) ‘Cybercrime’ in M Tonry (ed) The Oxford Handbook of Crime and Public Policy,
475–98
Marx GT (2013) ‘The Public as Partner? Technology Can Make Us Auxiliaries as Well as Vigilantes’,
Security & Privacy 11(5), 56–61
Mayfield T (2015) ‘#Keephopealive: How to Send Indonesia a Message’, The Drum, 22 January 2015
<http://www.abc.net.au/news/2015-01-23/mayfield-indonesia-death-penalty/6043022>
Meade A (2015) ‘Triple J Defends Poll Which Backed Death Penalty for Bali Nine Pair’, The Guardian
(online), 6 February 2015 <http://www.theguardian.com/world/2015/feb/06/triple-j-defends-poll-
which-backed-death-penalty-for-bali-nine-pair>
Meraz S (2009) ‘Is There an Elite Hold? Traditional Media to Social Media Agenda Setting Influence
in Blog Networks’, Journal of Computer-Mediated Communication 14(3), 682–707
Moore GE (1998) ‘Cramming More Components onto Integrated Circuits’, Proceedings of the IEEE
86(1), 82–5
Moorthy J, Lahiri R, Biswas N, Sanyal D, Ranjan J, Nanath K and Ghosh P (2015) ‘Big Data:
Prospects and Challenges’, Vikalpa 40(1), 74–96
Moussa M (2015) ‘Monitoring Employee Behavior through the Use of Technology and Issues of
Employee Privacy in America’, Sage Open 5(2)
National Health and Medical Research Council (‘NHMRC’) (2007) National Statement on Ethical
Conduct in Human Research <https://www.nhmrc.gov.au/guidelines-publications/e72>
O’Connor KW and Schmidt GB (2015) ‘“Facebook Fired”: Legal Standards for Social Media-based
Terminations of K–12 Public School Teachers’, Sage Open 5(1)
Pang B and Lee L (2008) ‘Opinion Mining and Sentiment Analysis’, Foundations and Trends in
Information Retrieval 2(1–2), 1–135
Pang B, Lee L and Vaithyanathan S (2002) ‘Thumbs Up? Sentiment Classification Using Machine
Learning Techniques’, Proceedings of the ACL-02 Conference on Empirical Methods in Natural
Language Processing 10, 79–86
Papathanassopoulos S (2015) ‘Privacy 2.0’, Social Media + Society 1(1)
Park SJ (2011) ‘Networked Politics on Cyworld: The Text and Sentiment of Korean Political Profiles’,
Social Science Computer Review 29(3), 288–99
Perone CS (2011) ‘Machine Learning: Text Feature Extraction (tf-idf) — Part I’ on Pyevolve
(18 September 2011) <http://blog.christianperone.com/?p=1589>
Pickett J, Mancini C and Mears D (2013) ‘Vulnerable Victims, Monstrous Offenders, and
Unmanageable Risk: Explaining Public Opinion on the Social Control of Sex Crime’, Criminology
51(3), 729–59
Potts L and Harrison A (2013) ‘Interfaces as Rhetorical Constructions: Reddit and 4chan during the
Boston Marathon Bombings’ in Proceedings of the 31st ACM International Conference on Design of
Communication, 143–50
Prensky M (2001) ‘Digital Natives, Digital Immigrants Part 1’, On the Horizon 9(5), 1–6
Price V and Neijens P (1998) ‘Deliberative Polls: Towards Improved Measures of “Informed” Public
Opinion’, International Journal of Public Opinion Research 10, 145–76
Qin J (2015) ‘Hero on Twitter, Traitor on News: How Social Media and Legacy News Frame Snowden’,
The International Journal of Press/Politics 20(2), 166–84
Wortley R (2012) ‘Situational Prevention of Child Abuse in the New Technologies’ in E Quayle and
K Ribisl (eds), Understanding and Preventing Online Sexual Exploitation of Children, Routledge,
188–204
Wu HC, Luk RWP, Wong KF and Kwok KL (2008) ‘Interpreting tf-idf Term Weights as Making
Relevance Decisions’, ACM Transactions on Information Systems 26(3), 13
Yankelovich D (2010) ‘How to Achieve Sounder Public Judgment’ in D Yankelovich and W Freidman
(eds), Toward Wiser Public Judgment, Vanderbilt University Press, 11–32
Zhang B and Vos M (2014) ‘Social Media Monitoring: Aims, Methods, and Challenges for International
Companies’, Corporate Communications 19(4), 371–83