Professional Documents
Culture Documents
A Computational Approach To Politeness With Application To Social Factors
A Computational Approach To Politeness With Application To Social Factors
Figure 1: Distribution of politeness scores. Posi- Table 2: The percentage of requests for which all
tive scores indicate requests perceived as polite. five annotators agree on binary politeness. The
4th quartile contains the requests with the top 25%
politeness scores in the data. (For reference, ran-
dard deviation of 0.7 for both domains; positive domized scoring yields agreement percentages of
values correspond to polite requests (i.e., requests <20% for all quartiles.)
with normalized annotations towards the “very po-
lite” extreme) and negative values to impolite re-
quests. A summary of all our request data is shown ary between somewhat polite and somewhat im-
in Table 1. polite requests can be blurry. To test this intuition,
we break the set of annotated requests into four
Inter-annotator agreement To evaluate the re- groups, each corresponding to a politeness score
liability of the annotations we measure the inter- quartile. For each quartile, we compute the per-
annotator agreement by computing, for each batch centage of requests for which all five annotators
of 13 documents that were annotated by the same made the same binary politeness judgment. As
set of 5 users, the mean pairwise correlation of the shown in Table 2, full agreement is much more
respective scores. For reference, we compute the common in the 1st (bottom) and 4th (top) quar-
same quantities after randomizing the scores by tiles than in the middle quartiles. This suggests
sampling from the observed distribution of polite- that the politeness scores assigned to requests that
ness scores. As shown in Figure 2, the labels are are only somewhat polite or somewhat impolite
coherent and significantly different from the ran- are less reliable and less tied to an intuitive notion
domized procedure (p < 0.0001 according to a of binary politeness. This discrepancy motivates
Wilcoxon signed rank test).5 our choice of classes in the prediction experiments
(Section 4) and our use of the top politeness quar-
Binary perception Although we did not im-
tile (the 25% most polite requests) as a reference
pose a discrete categorization of politeness, we
in our subsequent discussion.
acknowledge an implicit binary perception of the
phenomenon: whenever an annotator moved a
3 Politeness strategies
slider in one direction or the other, she made a
binary politeness judgment. However, the bound- As we mentioned earlier, requests impose on the
5
addressee, potentially placing her in social peril if
The commonly used Cohen/Fleiss Kappa agreement
measures are not suitable for this type of annotation, in which she is unwilling or unable to comply. Requests
labels are continuous rather than categorical. therefore naturally give rise to the negative po-
liteness strategies of Brown and Levinson (1987), ally associated with requests. First-person plural
which are attempts to mitigate these social threats. forms like we and our (line 15) are also ways of
These strategies are prominent in Table 3, which being indirect, as they create the sense that the
describes the core politeness markers we analyzed burden of the request is shared between speaker
in our corpus of Wikipedia requests. We do not and addressee (We really should . . . ). Though in-
include the Stack Exchange data in this analysis, directness is not invariably interpreted as polite-
reserving it as a “test community” for our predic- ness marking (Blum-Kulka, 2003), it is nonethe-
tion task (Section 4). less a reliable marker of it, as our scores indicate.
Requests exhibiting politeness markers are au- What’s more, direct variants (imperatives, state-
tomatically extracted using regular expression ments about the addressee’s obligations) are less
matching on the dependency parse obtained by the polite (lines 10–11).
Stanford Dependency Parser (de Marneffe et al., Indirect strategies also combine with hedges
2006), together with specialized lexicons. For ex- (line 19) conveying that the addressee is unlikely
ample, for the hedges marker (Table 3, line 19), to accept the burden (Would you by any chance
we match all requests containing a nominal subject . . . ?, Would it be at all possible . . . ?). These too
dependency edge pointing out from a hedge verb serve to provide the addressee with a face-saving
from the hedge list created by Hyland (2005). For way to deny the request. We even see subtle effects
each politeness strategy, Table 3 shows the aver- of modality at work here: the irrealis, counterfac-
age politeness score of the respective requests (as tual forms would and could are more polite than
described in Section 2; positive numbers indicate their ability (dispositional) or future-oriented vari-
polite requests), and their top politeness quartile ants can and will; compare lines 12 and 13. This
membership (i.e., what percentage fall within the parallels the contrast between factuality markers
top quartile of politeness scores). As discussed at (impolite; line 20) and hedging (polite; line 19).
the end of Section 2, the top politeness quartile Many of these features are correlated with each
gives a more robust and more intuitive measure of other, in keeping with the insight of Brown and
politeness. For reference, a random sample of re- Levinson (1987) that politeness markers are of-
quests will have a 0 politeness score and a 25% top ten combined to create a cumulative effect of in-
quartile membership; in both cases, larger num- creased politeness. Our corpora also highlight in-
bers indicate higher politeness. teractions that are unexpected (or at least unac-
Gratitude and deference (lines 1–2) are ways counted for) on existing theories of politeness. For
for the speaker to incur a social cost, helping to example, sentence-medial please is polite (line 7),
balance out the burden the request places on the presumably because of its freedom to combine
addressee. Adopting Kaplan (1999)’s metaphor, with other negative politeness strategies (Could
these are the coin of the realm when it comes to you please . . . ). In contrast, sentence-initial please
paying the addressee respect. Thus, they are indi- is impolite (line 8), because it typically signals a
cators of positive politeness. more direct strategy (Please do this), which can
make the politeness marker itself seem insincere.
Terms from the sentiment lexicon (Liu et al.,
We see similar interactions between pronominal
2005) are also tools for positive politeness, either
forms and syntactic structure: sentence-initial you
by emphasizing a positive relationship with the ad-
is impolite (You need to . . . ), whereas sentence-
dressee (line 4), or being impolite by using nega-
medial you is often part of the indirect strategies
tive sentiment that damages this positive relation-
we discussed above (Would/Could you . . . ).
ship (line 5). Greetings (line 3) are another way to
build a positive relationship with the addressee.
4 Predicting politeness
The remainder of the cues in Table 3 are neg-
ative politeness strategies, serving the purpose of We now show how our linguistic analysis can be
minimizing, at least in appearance, the imposition used in a machine learning model for automati-
on the addressee. Apologizing (line 6) deflects the cally classifying requests according to politeness.
social threat of the request by attuning to the impo- A classifier can help verify the predictive power,
sition itself. Being indirect (line 9) is another way robustness, and domain-independent generality of
to minimize social threat. This strategy allows the the linguistic strategies of Section 3. Also, by pro-
speaker to avoid words and phrases convention- viding automatic politeness judgments for large
Strategy Politeness In top quartile Example
Table 3: Positive (1-5) and negative (6–20) politeness strategies and their relation to human perception of
politeness. For each strategy we show the average (human annotated) politeness scores for the requests
exhibiting that strategy (compare with 0 for a random sample of requests; a positive number indicates
the strategy is perceived as being polite), as well as the percentage of requests exhibiting the respective
strategy that fall in the top quartile of politeness scores (compare with 25% for a random sample of
requests). Throughout the paper: for politeness scores, statistical significance is calculated by comparing
the set of requests exhibiting the strategy with the rest using a Mann-Whitney-Wilcoxon U test; for top
quartile membership a binomial test is used.
amounts of new data on a scale unfeasible for hu- We consider two classes of requests: polite
man annotation, it can also enable a detailed anal- and impolite, defined as the top and, respectively,
ysis of the relation between politeness and social bottom quartile of requests when sorted by their
factors (Section 5). politeness score (based on the binary notion of
politeness discussed in Section 2). The classes
Task setup To evaluate the robustness and are therefore balanced, with each class consisting
domain-independence of the analysis from Sec- of 1,089 requests for the Wikipedia domain and
tion 3, we run our prediction experiments on two 1,651 requests for the Stack Exchange domain.
very different domains. We treat Wikipedia as a We compare two classifiers — a bag of words
“development domain” since we used it for de- classifier (BOW) and a linguistically informed
veloping and identifying features and for training classifier (Ling.) — and use human labelers as a
our models. Stack Exchange is our “test domain” reference point. The BOW classifier is an SVM
since it was not used for identifying features. We using a unigram feature representation.6 We con-
take the model (features and weights) trained on sider this to be a strong baseline for this new
Wikipedia and use them to classify requests from
6
Stack Exchange. Unigrams appearing less than 10 times are excluded.
classification task, especially considering the large In-domain Cross-domain
amount of training data available. The linguisti- Train Wiki SE Wiki SE
cally informed classifier (Ling.) is an SVM using Test Wiki SE SE Wiki
the linguistic features listed in Table 3 in addition BOW 79.84% 74.47% 64.23% 72.17%
to the unigram features. Finally, to obtain a ref- Ling. 83.79% 78.19% 67.53% 75.43%
erence point for the prediction task we also collect
three new politeness annotations for each of the re- Human 86.72% 80.89% 80.89% 86.72%
quests in our dataset using the same methodology Table 4: Accuracies of our two classifiers for
described in Section 2. We then calculate human Wikipedia (Wiki) and Stack Exchange (SE), for
performance on the task (Human) as the percent- in-domain and cross-domain settings. Human per-
age of requests for which the average score from formance is included as a reference point. The ran-
the additional annotations matches the binary po- dom baseline performance is 50%.
liteness class of the original annotations (e.g., a
positive score corresponds to the polite class).
supporting the use of computational methods to
Classification results We evaluate the classi- pursue questions about social variables.
fiers both in an in-domain setting, with a standard
leave-one-out cross validation procedure, and in a 5.1 Relation to social outcome
cross-domain setting, where we train on one do- Earlier, we characterized politeness markings as
main and test on the other (Table 4). For both our currency used to pay respect. Such language is
development and our test domains, and in both the therefore costly in a social sense, and, relatedly,
in-domain and cross-domain settings, the linguis- tends to incur costs in terms of communicative ef-
tically informed features give 3-4% absolute im- ficiency (Van Rooy, 2003). Are these costs worth
provement over the bag of words model. While paying? We now address this question by studying
the in-domain results are within 3% of human per- politeness in the context of the electoral system of
formance, the greater room for improvement in the the Wikipedia community of editors.
cross-domain setting motivates further research on Among Wikipedia editors, status is a salient so-
linguistic cues of politeness. cial variable (Anderson et al., 2012). Administra-
The experiments in this section confirm that tors (admins) are editors who have been granted
our theory-inspired features are indeed effective in certain rights, including the ability to block other
practice, and generalize well to new domains. In editors and to protect or delete articles.7 Ad-
the next section we exploit this insight to automat- mins have a higher status than common editors
ically annotate a much larger set of requests (about (non-admins), and this distinction seems to be
400,000) with politeness labels, enabling us to re- widely acknowledged by the community (Burke
late politeness to several social variables and out- and Kraut, 2008b; Leskovec et al., 2010; Danescu-
comes. For new requests, we use class probabil- Niculescu-Mizil et al., 2012). Aspiring editors
ity estimates obtained by fitting a logistic regres- become admins through public elections,8 so we
sion model to the output of the SVM (Witten and know when the status change from non-admin to
Frank, 2005) as predicted politeness scores (with admins occurred and can study users’ language
values between 0 and 1; henceforth politeness, by use in relation to that time.
abuse of language). To see whether politeness correlates with even-
tual high status, we compare, in Table 5, the po-
5 Relation to social factors liteness levels of requests made by users who will
We now apply our framework to studying the rela- eventually succeed in becoming administrators
tionship between politeness and social variables, (Eventual status: Admins) with requests made by
focussing on social power dynamics. Encour- users who are not admins (Non-admins).9 We ob-
aged by the close-to-human performance of our serve that admins-to-be are significantly more po-
in-domain classifiers, we use them to assign po- 7
http://en.wikipedia.org/wiki/
liteness labels to our full dataset and then compare Wikipedia:Administrators
8
these labels to independent measures of power and http://en.wikipedia.org/wiki/
Wikipedia:Requests_for_adminship
status in our data. The results closely match those 9
We consider only requests made up to one month before
obtained with human-labeled data alone, thereby the election, to avoid confusion with pre-election behavior.
Eventual status Politeness Top quart. Successful candidates
0.46 Failed candidates
Admins 0.46** 30%***
pus to test our hypotheses and to apply precise Ide, 1989; Blum-Kulka and Kasper, 1990; Blum-
controls to our experiments (such as restricting Kulka, 2003; Watts, 2003; Byon, 2006). The start-
our analysis to question-askers in the reputation ing point for most research is the theory of Brown
experiment). In order to validate this methodol- and Levinson (1987). Aspects of this theory
ogy, we turned again to human annotation: we have been explored from game-theoretic perspec-
collected additional politeness annotation for the tives (Van Rooy, 2003) and implemented in lan-
types of requests involved in the newly designed guage generation systems for interactive narratives
experiments. When we re-ran our experiments on (Walker et al., 1997), cooking instructions, (Gupta
human-labeled data alone we obtained the same et al., 2007), translation (Faruqui and Pado, 2012),
qualitative results, with statistical significance al- spoken dialog (Wang et al., 2012), and subjectivity
ways lower than 0.01.12 analysis (Abdul-Mageed and Diab, 2012), among
others.
Prediction-based interactions The human val-
In recent years, politeness has been studied in
idation of classifier-based results suggests that
online settings. Researchers have identified vari-
our prediction framework can be used to explore
ation in politeness marking across different con-
differences in politeness levels across factors of
texts and media types (Herring, 1994; Brennan
interest, such as communities, geographical re-
and Ohaeri, 1999; Duthler, 2006) and between
gions and gender, even where gathering suffi-
different social groups (Burke and Kraut, 2008a).
cient human-annotated data is infeasible. We
The present paper pursues similar goals using or-
mention just a few such preliminary results here:
ders of magnitude more data, which facilitates a
(i) Wikipedians from the U.S. Midwest are most
fuller survey of different politeness strategies.
polite (when compared to other census-defined
regions), (ii) female Wikipedians are generally Politeness marking is one aspect of the broader
more polite (consistent with prior studies in which issue of how language relates to power and status,
women are more polite in a variety of domains; which has been studied in the context of workplace
(Herring, 1994)), and (iii) programming language discourse (Bramsen et al., ; Diehl et al., 2007;
communities on Stack Exchange vary significantly Peterson et al., 2011; Prabhakaran et al., 2012;
by politeness (Table 8; full disclosure: our analy- Gilbert, 2012; McCallum et al., 2007) and so-
ses were conducted in Python). cial networking (Scholand et al., 2010). However,
this research focusses on domain-specific textual
6 Related work cues, whereas the present work seeks to lever-
age domain-independent politeness cues, build-
Politeness has been a central concern of modern ing on the literature on how politeness affects
pragmatic theory since its inception (Grice, 1975; worksplace social dynamics and power structures
Lakoff, 1973; Lakoff, 1977; Leech, 1983; Brown (Gyasi Obeng, 1997; Chilton, 1990; Andersson
and Levinson, 1978), because it is a source of and Pearson, 1999; Rogers and Lee-Wong, 2003;
pragmatic enrichment, social meaning, and cul- Holmes and Stubbe, 2005). Burke and Kraut
tural variation (Harada, 1976; Matsumoto, 1988; (2008b) study the question of how and why spe-
12
cific individuals rise to administrative positions
However, due to the limited size of the human-labeled
data, we could not control for the role of the user in the Stack on Wikipedia, and Danescu-Niculescu-Mizil et al.
Exchange reputation experiment. (2012) show that power differences on Wikipedia
are revealed through aspects of linguistic accom- Philip Bramsen, Martha Escobar-Molana, Ami Patel,
modation. The present paper complements this and Rafael Alonso. Extracting social power rela-
tionships from natural language. In Proceedings of
work by revealing the role of politeness in social
ACL, pages 773–782.
outcomes and power relations.
Susan E Brennan and Justina O Ohaeri. 1999. Why
7 Conclusion do electronic conversations seem less polite? the
costs and benefits of hedging. SIGSOFT Softw. Eng.
We construct and release a large collection of Notes, 24(2):227–235.
politeness-annotated requests and use it to evalu-
Penelope Brown and Stephen C. Levinson. 1978.
ate key aspects of politeness theory. We build a Universals in language use: Politeness phenomena.
politeness classifier that achieves near-human per- In Esther N. Goody, editor, Questions and Polite-
formance and use it to explore the relation between ness: Strategies in Social Interaction, pages 56–311,
politeness and social factors such as power, status, Cambridge. Cambridge University Press.
gender, and community membership. We hope the Penelope Brown and Stephen C Levinson. 1987. Po-
publicly available collection of annotated requests liteness: some universals in language usage. Cam-
enables further study of politeness and its relation bridge University Press.
to social factors, as this paper has only begun to Moira Burke and Robert Kraut. 2008a. Mind your
explore this area. Ps and Qs: the impact of politeness and rudeness
in online communities. In Proceedings of CSCW,
Acknowledgments pages 281–284.
We thank Jean Wu for running the AMT an- Moira Burke and Robert Kraut. 2008b. Taking up the
notation task, and all the participating turkers. mop: identifying future wikipedia administrators. In
CHI ’08 extended abstracts on Human factors in
We thank Diana Minculescu and the anonymous
computing systems, pages 3441–3446.
reviewers for their helpful comments. This
work was supported in part by NSF IIS-1016909, Andrew Sangpil Byon. 2006. The role of linguistic in-
CNS-1010921, IIS-1149837, IIS-1159679, ARO directness and honorifics in achieving linguistic po-
liteness in Korean requests. Journal of Politeness
MURI, DARPA SMISC, Okawa Foundation, Do- Research, 2(2):247–276.
como, Boeing, Allyes, Volkswagen, Intel, Alfred
P. Sloan Fellowship, the Microsoft Faculty Fel- Paul Chilton. 1990. Politeness, politics, and diplo-
lowship, the Gordon and Dailey Pattee Faculty macy. Discourse and Society, 1(2):201–224.
Fellowship, and the Center for Advanced Study in Herbert H. Clark and Dale H. Schunk. 1980. Polite
the Behavioral Sciences at Stanford. responses to polite requests. Cognition, 8(1):111–
143.
Shoshana Blum-Kulka. 2003. Indirectness and po- Manaal Faruqui and Sebastian Pado. 2012. Towards a
liteness in requests: Same or different? Journal of model of formal and informal address in english. In
Pragmatics, 11(2):131–146. Proceedings of EACL, pages 623–633.
Elen P. Francik and Herbert H. Clark. 1985. How to Bing Liu, Minqing Hu, and Junsheng Cheng. 2005.
make requests that overcome obstacles to compli- Opinion Observer: analyzing and comparing opin-
ance. Journal of Memory and Language, 24:560– ions on the Web. In Proceedings of WWW, pages
568. 342–351.
Eric Gilbert. 2012. Phrases that signal workplace hier- Yoshiko Matsumoto. 1988. Reexamination of the uni-
archy. In Proceedings of CSCW, pages 1037–1046. versality of face: Politeness phenomena in Japanese.
Journal of Pragmatics, 12(4):403–426.
H. Paul Grice. 1975. Logic and conversation. In Pe-
ter Cole and Jerry Morgan, editors, Syntax and Se- Andrew McCallum, Xuerui Wang, and Andr’es
mantics, volume 3: Speech Acts, pages 43–58. Aca- Corrada-Emmanuel. 2007. Topic and role discovery
demic Press, New York. in social networks with experiments on Enron and
academic email. Journal of Artificial Intelligence
S Gupta, M Walker, and D Romano. 2007. How rude Research, 30(1):249–272.
are you?: Evaluating politeness and affect in inter-
Robert Munro, Steven Bethard, Victor Kuperman,
action. Affective Computing and Intelligent Interac-
Vicky Tzuyin Lai, Robin Melnick, Christopher
tion, pages 203–217.
Potts, Tyler Schnoebelen, and Harry Tily. 2010.
Samuel Gyasi Obeng. 1997. Language and politics: Crowdsourcing and language studies: the new gen-
Indirectness in political discourse. Discourse and eration of linguistic data. In Proceedings of the
Society, 8(1):49–83. NAACL HLT 2010 Workshop on Creating Speech
and Language Data with Amazon’s Mechanical
S. I. Harada. 1976. Honorifics. In Masayoshi Turk, pages 122–130.
Shibatani, editor, Syntax and Semantics, volume Kelly Peterson, Matt Hohensee, and Fei Xia. 2011.
5: Japanese Generative Grammar, pages 499–561. Email formality in the workplace: A case study on
Academic Press, New York. the enron corpus. In Proceedings of the ACL Work-
shop on Language in Social Media, pages 86–95.
Susan Herring. 1994. Politeness in computer cul-
ture: Why women thank and men flame. In Cul- Vinodkumar Prabhakaran, Owen Rambow, and Mona
tural performances: Proceedings of the third Berke- Diab. 2012. Predicting Overt Display of Power in
ley women and language conference, volume 278, Written Dialogs. In Proceedings of NAACL-HLT,
page 94. pages 518–522.
Janet Holmes and Maria Stubbe. 2005. Power and Po- Priscilla S. Rogers and Song Mei Lee-Wong. 2003.
liteness in the Workplace: A Sociolinguistic Analysis Reconceptualizing politeness to accommodate dy-
of Talk at Work. Longman, London. namic tensions in subordinate-to-superior reporting.
Journal of Business and Technical Communication,
Ken Hyland. 2005. Metadiscourse: Exploring Interac- 17(4):379–412.
tion in Writing. Continuum, London and New York.
Andrew J. Scholand, Yla R. Tausczik, and James W.
Sachiko Ide. 1989. Formal forms and discernment: Pennebaker. 2010. Social language network analy-
Two neglected aspects of universals of linguistic po- sis. In Proceedings of CSCW, pages 23–26.
liteness. Multilingua, 8(2–3):223–248.
Robert Van Rooy. 2003. Being polite is a handicap:
David Kaplan. 1999. What is meaning? Explorations Towards a game theoretical analysis of polite lin-
in the theory of Meaning as Use. Brief version — guistic behavior. In Proceedings of TARK, pages
draft 1. Ms., UCLA. 45–58.
Robin Lakoff. 1973. The logic of politeness; or, mid- Marilyn A Walker, Janet E Cahn, and Stephen J Whit-
ing your P’s and Q’s. In Proceedings of the 9th taker. 1997. Improvising linguistic style: social and
Meeting of the Chicago Linguistic Society, pages affective bases for agent personality. In Proceedings
292–305. of AGENTS, pages 96–105.
William Yang Wang, Samantha Finkelstein, Amy
Robin Lakoff. 1977. What you can do with words: Ogan, Alan W. Black, and Justine Cassell. 2012.
Politeness, pragmatics and performatives. In Pro- ”love ya, jerkface”: Using sparse log-linear mod-
ceedings of the Texas Conference on Performatives, els to build positive and impolite relationships with
Presuppositions and Implicatures, pages 79–106. teens. In Proceedings of SIGDIAL, pages 20–29.
Geoffrey N. Leech. 1983. Principles of Pragmatics. Richard J. Watts. 2003. Politeness. Cambridge Uni-
Longman, London and New York. versity Press, Cambridge.
Jure Leskovec, Daniel Huttenlocher, and Jon Klein- Ian H Witten and Eibe Frank. 2005. Data Mining:
berg. 2010. Governance in Social Media: A case Practical machine learning tools and techniques.
study of the Wikipedia promotion process. In Pro- Morgan Kaufmann.
ceedings of ICWSM, pages 98–105.