Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

This article has been accepted for publication in IEEE Intelligent Systems but has not yet been

fully edited.
1
Some content may change prior to final publication.

Revealing Psycholinguistic Dimensions of


Communities in Social Networks
Tushar Maheshwari, Aishwarya N. Reganti, Upendra Kumar, Tanmoy Chakraborty1 , Amitava Das
{tushar.m14,aishwarya.r14,upendra.k14,amitava.das}@iiits.in, 1 tanmoy@iiitd.ac.in
Indian Institute Of Information Technology Chittoor, Sri City, A.P. , India
1 Indraprastha Institute of Information Technology Delhi (IIIT-D), India

Abstract: A community in social network is composed of individuals with similar behavior. Although there
has been a plethora of work on understanding network topologies (edge density, clustering coefficient, etc.)
within a community, the semantic interpretation of a community has hardly been studied. The present paper
aims at understanding Personalities and Values of individuals in social communities. To this end, we collect
datasets from various social media platforms (including Facebook, Twitter), which contain Values and Ethics
of users. Then we design a three-fold experimental setup. First, we propose automatic models to determine
Personality and Values (Values and Ethics) of individuals by analyzing the kind of language used and their
conduct in social media. Secondly, various experiments are performed to understand the blend of characteristics
of individuals within a social network community. Finally, we claim that the detected Personality and Values
of individuals can be used further as additional node attributes to detect better community structure. To our
knowledge, this is the first computational analysis to understand and predict psycholinguistic dimensions of
individuals in social networks.
Index Terms—Personality, Values, social media, societal sentiment, community

1 I NTRODUCTION

D ETECTING and analyzing dense groups or commu-


nities from social networks has attracted immense
attention over the last decade. Several heuristics and al-
deavors to achieve them; Benevolence (BE): Those who
tend towards being benevolent are very philanthropic, they
seek to help others and provide general welfare; Confor-
gorithms were proposed to detect communities based on mity (CO): This category of people obey clear rules and
the topological structure of the network [1]. However, the structures; Hedonism (HE): Hedonists are those who simply
semantic interpretation of a community, i.e., the behavior of enjoy themselves; Power (PO): The ability to control others
individuals within a community has hardly been studied. is important to people who possess this value and power
This paper presents the first computational psycholinguistic will be actively sought by dominating others and control
study to understand the behavior of individuals forming over resources; Security (SE): Those who seek security
communities in social networks. We use two psycholinguis- value, health and safety to a greater extent than other people
tic models – Personality and Values models, to identify the (perhaps because of childhood woes); Self-direction (SD):
behavior of individuals. The Personality model is used to Individuals who are self-directed, enjoy being independent
understand the characteristics or blend of characteristics and are outside the control of others; Stimulation (ST):
at individual level, whereas the Values model is used to It is closely related to hedonism, nevertheless the goals
analyze inter-personal dynamics of societal sentiment. are slightly different. In this case, pleasure is acquired
The Big 5 Personality traits [2], aka the five factor model specifically from excitement and thrill; Tradition (TR): A
(FFM), is a widely used Personality model. The five factors traditionalist respects practices of the past, doing things
are: Openness (O): A personality trait possessed by in- blindly because they are customary; Universalism (UN):
dividuals who are imaginative, insightful and have wide Individuals who are universal, seek social justice and tol-
interests; Conscientiousness (C): Refers to those who are erance for all.
organized, thorough, and planned; Extroversion (E): Refers Shwartz, along with the identification ten basic Values,
to Personality of those who are talkative, energetic, and also explains how these Values are related to each other and
assertive; Agreeableness (A): Individuals with this Per- influence one another, since individuals possessing any of
sonality trait are sympathetic, kind, and affectionate; and the Values may also possess values in accord with another
Neuroticism (N): Individuals who are mostly tense, moody, (e.g., Conformity and Tradition) or contrary with at least
and anxious. Big 5 model is also represented using the one other Value (e.g., Benevolence and Achievement). Such
acronym OCEAN. coinciding nature psychological classes makes the compu-
To define societal sentiment, we use the well-established tational classification problem much more challenging than
“Schwartz Theory of Basic Human Values” [3], which de- the typical sentiment analysis problem.
fines ten basic and distinct personal Values – Achievement In this paper, we raise a fundamental question in order
(AC): Achievers sett goals and then make all possible en- to understand the behavioural characteristics of individuals

Digital Object Identifier 10.1109/MIS.2018.111144400 1541-1672/$26.00 2018 IEEE


This article has been accepted for publication in IEEE Intelligent Systems but has not yet been fully edited.
2
Some content may change prior to final publication.
in social communities – what are the Personality and Values consider social dimensions of individuals such as occupa-
classes of individuals forming communities in social networks, tion, gender, age, locality etc. as node attributes for the
and how do these differ across multiple communities? To this algorithms. We are the first to explore the psycholinguistic
end, three corpora have been utilised (one collected by us behavior of individuals ( For further details refer to our
and two publicly available): (i) a Twitter corpus, where paper [11]) and augment them as additional features into
users are marked with their corresponding Values, (ii) a the community detection algorithm to show how it helps
Facebook corpus containing user Personality markings [4], in detecting more accurate communities from the networks
and (iii) another large-scale Twitter network with ground- (see Section 6).
truth communities. Three kinds of experiments are designed
over these corpora. First, systems are developed to auto-
3 C ORPUS ACQUISITION
matically determine Personality and Values of individuals
by analysing the kind of language they use and behaviour At the very beginning of our research work, we ask our-
in social media (Section 4). Second, a detailed analysis is selves an elementary question- a self-interrogation - If social
conducted to understand the dynamics of societal sentiment media is a good representative of the original society that exists
within a community (Section 5). Finally, we claim that ?, in order to understand whether Personality and Values of
the detected Personality and Values of individuals can be human beings get reflected in their social media behaviour.
used further as additional node attributes to detect better To the best of our knowledge, there is no prior work
community structure (Section 6). on understanding Values from social media behaviors. We
were motivated by a similar work [12]. In this paper, our
endeavour is to understand the behavioural characteristics
2 R ELATED W ORK of individuals in social communities: What are the Personality
The Big 5 Personality model [2] and Schwartz’ Values model and Values classes of individuals forming communities in social
[3] can be considered as a person level sentiment model networks, and how do these differ across multiple communities?
and a societal sentiment model, respectively. Going one To this end, three corpora have been utilised – one collected
step further, we here attempt to understand the societal by us and two publicly available: (i) A Twitter corpus, where
sentiment of groups of individuals forming communities in users are marked with their corresponding Values, (ii) A
social networks [5]. Facebook corpus containing user Personality markings [4],
Intensive analysis of human beliefs which also includes and (iii) Another large-scale Twitter network with ground-
their values, ethics, the kind of attitudes they possess has truth communities.
been a cardinal research path in both Psychology and Soci- Two kinds of experiments were designed using these
ology for many years now. corpora. First, systems were developed to automatically de-
According to the Schwartz 10-Values model, ten basic termine Personality and Values of individuals by analysing
Values denote numerous consequences and results of an the kind of language they use and conduct in social media.
individual’s role in a society [6]. The Values have also been Second, a detailed analysis was conducted to understand
able to successfully provide a strong and valid explanation the dynamics of societal sentiment - involving Personality
of consumer conduct/behaviour and how these values af- and Values of individuals within a community.
fect it [7].
Over the recent years, there have been few initiatives 3.1 Twitter Values Corpus
to automatically identify various Personality traits of indi-
viduals from the use of language and their behaviour in The standard method for any psychological data col-
social media; one such initiative was the Workshop and lection is through self-evaluative psychometric tests.
Shared Task on Computational Personality Recognition held Self-evaluations were obtained using a fifty-item long
in 20131 , and then again in 20142 (see Section 3.2 for more male/female version of the Portrait Values Questionnaire
details). However, to the best of our knowledge, no auto- (PVQ). We crowd-sourced the data using the Amazon Me-
matic predictive model for Schwartz’ Values has been built or chanical Turk (AMT) service. A 50 item PVQ questionnaire
tested before. In recent times, proliferation of social media was given to people and we requested them to - (i) attempt
and the ever-growing collection of publicly accessible web the PVQ questions sincerely, and (ii) provide us with their
data continually provides new and exciting opportunities to Twitter ids so that their tweets could be scraped with due
study how people are thinking, behaving, and feeling, and permission. However, we faced a lot a challenges while
is the main motivation of our present research. collecting the data. For example, many users had set their
Community boundary detection is a fundamental prob- accounts to protected, which hindered our data collection
lem in network analysis, which has been extensively studied through the Twitter API. Also, we discarded quite some
during last one decade (see the detailed reviews in [8]). Most users as they had posted fewer than 100 tweets (according
popular algorithms consider only network information for to the statistics, around 44% of the Twitter users created
community detection. However, recently there has been an account, but never sent a tweet), making them insignif-
a claim that augmenting node and edge attributes along icant for statistical analysis. Finally, we manage to collect
with the network topology increases the performance of the information of 368 unique users. The highest, lowest and
algorithms [9], [10]. However, all these approaches mostly average number of tweets per user was 15K, 100 and 1,608
respectively.
1. http://mypersonality.org/wiki/doku.php?id=wcpr13 In addition to identifying the basic Personality traits and
2. https://sites.google.com/site/wcprst/home/wcpr14 Schwartz Values types, both the models also explain how

Digital Object Identifier 10.1109/MIS.2018.111144400 1541-1672/$26.00 2018 IEEE


This article has been accepted for publication in IEEE Intelligent Systems but has not yet been fully edited.
3
Some content may change prior to final publication.
dataset by crawling the tweets of each user, required for
our Personality and Values models. The original dataset
had 18,021 users and 5,038 communities. We discarded all
the communities with size less than five and considered
1, 562 remaining ground-truth communities. We further dis-
carded all the non-existent accounts and those users who
had posted fewer than 100 tweets. We downloaded total
6,768 users’ tweet data. The highest (resp. lowest) number
of tweets for a user was 3,641 (resp. 100) with an average
number tweets per user being 2,406.

4 C OMPUTATIONAL P ERSONALITY AND VALUES


M ODELS
Here, we discuss various features and methods used for
the automatic Personality and Values identification. We
experimented with several machine Learning algorithms –
Support Vector Machine (SVM), Multinomial Naive Bayes
Fig. 1. Schwartz Values fuzziness for Twitter Values corpus using
Circos Visualisation. In this figure, for example the fuzzy membership (mNM), Simple Logistic Regression (LR), and Random For-
of the ACY oriented people is represented by outgoing red bands. est (RF). Among them, SVM (with linear kernel) turned out
The width of each outgoing band from ACY represents the degree of to be the best performing method. A binary SVM classifier
membership of ACY with other Values classes. There are 10 incoming
is trained, each for a particular Personality and Values type
bands of 10 different colours towards ACY, indicating the membership
of each class in ACY. In each class there is a self-arc representing separately.
membership of each class with itself (i.e., 100%). Psycholinguistic Lexica: We use four different Psy-
cholinguistic lexicon based features: Linguistic Inquiry
these classes are mutually connected influence on another as
Word Count (LIWC)4 , Harvard General Inquirer5 , MRC
well, since the pursuit of any of the Personality/Values may
psycholinguistic database6 , and Sensicon7 . LIWC is a dili-
result in accord or converse with another class. For example,
gently developed and popular hand-crafted lexicon. It con-
anyone having Open personality can have Agreeable nature
tains 69 different categorical words, specifically designed
as well; similarly someone with Power orientation can also
for psycholinguistic experiments. The Harvard General In-
have Achievement orientation. To understand this notion,
quirer is yet another psycholinguistic lexicon that contains
the fuzzy membership statistics from the Twitter Values cor-
182 categories. The words in the lexicon are also tagged
pus is presented through a Circos visualization in Figure 1.
with two large valence categories, namely, negative and
Some interesting observations from Figure 1 are as follows
positive among other psycholinguistic categories, such as
– Power (POY) oriented people have highest overlap with
words of pain, pleasure, virtue and vice, words symbolising
Achievement (ACY) oriented people, tradition oriented peo-
understatement, overstatement, etc. Following our earlier
ple have highest overlap with people high in conformity,
research [11], we also considered 14 different features from
and Hedonic (HEY) people have significant overlap with
the MRC Psycholinguistic lexicon. These features were rat-
people seeking Stimulation (STY). Such overlapping nature
ings of Familiarity, Concreteness, Kucera-Francis frequency,
of psychological classes makes the computational classifi-
number of phoneme and syllables, Brown verbal frequency,
cation problem much more challenging than the classical
imagability and age of acquisition8 . Sensicon is a sensorial
sentiment analysis problem.
lexicon, containing words with sense association scores for
3.2 Facebook Personality Corpus
5 human senses which are Smell, Sight, Taste, Hearing, and
We used the corpus that was released during “Workshop Touch. Sensicon provides a numerical membership which
and Shared Task on Computational Personality Recogni- indicates the extent to which a sense, among 5 senses that
tion” (WCPR) in 2013, they published a Facebook corpus humans possess is used to discern a certain concept. We
which contains 250 users data includes 10,000 Facebook use this lexicon into both the models. We perform feature
status updates and their social network properties, marked ablation using Pearson correlation of lexicon features vs
with their Personality attributes. Among 8 teams partic- Personality and Values types. Finally, classifiers are trained
ipated in this shared task, the best performing team [4] using only those features which positively contribute to a
achieved F-Score (an average over all the personality traits) particular Personality/Values type.
of 0.73. In Section 4.1, we show that our proposed system Non-linguistic Features: Features such as social network
beats their system with an average 0.80 F-Score. structures can aid find the underlying Personality and Val-
ues of an individual. Facebook network properties including
3.3 Twitter Community Dataset
4. http://www.liwc.net/
We use the Twitter network, released by SNAP3 (nodes: 5. http://www.wjh.harvard.edu/ inquirer/
81,306, edges: 1,768,149). Users are distinguished by their 6. http://www.psych.rl.ac.uk/
Twitter id’s. This dataset has been widely used in com- 7. https://hlt-nlp.fbk.eu/technologies/sensicon
munity detection studies [10]. We further enriched the 8. To get these MRC features, we use the following API: http://ota.
oucs.ox.ac.uk/headers/1054.xml.
3. http://snap.stanford.edu/

Digital Object Identifier 10.1109/MIS.2018.111144400 1541-1672/$26.00 2018 IEEE


This article has been accepted for publication in IEEE Intelligent Systems but has not yet been fully edited.
4
Some content may change prior to final publication.
TABLE 1. Speech-act class distributions in corpus along with the (a) performance and (b) error-rate of the speech-act classifier.

Speech-act SNO Wh YN SO AD YA T AP RA A O
Avg.
(Class distributions in %) (33.37) (11.45) (15.45) (5.16) (6.88) (15.08) (0.41) (3.26) (0.71) (0.07) (14.59)
(a) F-Score 0.45 0.88 0.88 0.72 0.45 0.60 0.72 0.60 0.12 0.77 0.12 0.69
(b) Error-Rate (in % ) 4.1 1.1 1.6 2.1 3 1.6 1.2 3.7 3.9 2.3 3.6 2.65

TABLE 2. Performance of (a) Personality and (b) Values models. TABLE 3. Performance (Pearson correlation) of the Personality regres-
sion model.
Personality AC BE CO HE PO
F-Score 0.81 0.81 0.75 0.74 0.82 Values O C E A N Avg.
(a)
Personality SE SD ST TR UN PC 0.37 0.37 0.39 0.36 0.41 0.38
F-Score 0.80 0.78 0.73 0.80 0.89
TABLE 4. Performance (Pearson correlation) of the Values regression
Values O C E A N
(b) model.
F-Score 0.83 0.78 0.80 0.79 0.75
Values AC BE CO HE PO SE SD ST TR UN Avg.
PC 0.32 0.21 0.21 0.25 0.28 0.32 0.32 0.27 0.35 0.34 0.29
betweenness, transitivity, centrality, network size and den-
sity were calculated. These values have been provided as a
part of the released Facebook Personality corpus. systems were trained on Facebook and Quora corpus, then
Some features distinctive of the Twitter Values Corpus applied on Twitter; the Personality classifier was trained on
were- total number of tweets per user, number of likes per a Facebook corpus, and applied on Twitter.
tweet, average time difference between two tweets, number For the Personality analysis there is no publicly available
of favourites and re-tweets. The centrality scores on the Twitter dataset; therefore it is difficult to assess transfer-loss
networks of the individuals’ followers and friends were also for the Personality analysis. For Values model our classifier
used. is trained and tested on Twitter corpus.
Speech-act Features: The communication patterns of In order to examine the transfer loss for the Speech
human beings – visually, verbally or via texual conversa- Act classifier on Twitter corpus, we have further annotated
tion, might be indicative enough of their Personality and a small corpus consisting of 1,100 tweets - 100 for each
Values traits. By following the hypothesis of [13], speech-act class. We ran our previous speech act classifier – which
features were added to our feature list for the classification was trained on Facebook and Quora data on the newly
of Personalities Values. However, for this experiment we annotated corpus. We report the error rate in Table 1(b). We
restrict speech-act classes into 11 major categories: Yes-No observe that the average error-rate is as low as 2.65 % which
Question (YN), Action Directive (AD), Appreciation (AP), concludes that the Speech Act Classifier can be used for the
Wh Question (Wh), Statement Non-Opinion (SNO), Re- classification of tweets.
sponse Acknowledgement (RA), Yes Answers (YA), Thank- 4.1 Evaluation of Personality and Values Models
ing (T), Apology (A), Statement Opinion (SO) and others
Identifying the Personality and Values models can be ap-
(O). We scraped about 7K utterances from Facebook and
proached both as a multi-class problem (preferred by com-
Quora pages, they were also annotated by us manually.
putational people) and as a multi-valued regression problem
Using this corpus, we build a Support Vector Machine
(preferred by psychologists). Here the former view is han-
based speech-act classifier. The features used were : top
dled by three traditional types of classifiers, namely Support
20% most frequent word bigrams (Bag of Words), exis-
Vector Machines (SVM, Linear Kernel), Multinomial Naı̈ve
tence of question marks, existence of “wh” words, POS
Bayes (MNB), and Random Forests (RF), for which each Per-
tags distributions, occurrence of “thanks/thanking” words
sonality/Value type is subdivided into Yes class (positively
and sentiment lexicons such as NRC sentiment lexicon9 ,
orientated), and No class (negative oriented), resulting 10
SentiWordNet10 and WordNet Affect11 . With 10-fold cross
classes (5 × 2) for the Personality classifier and 20 classes
validation, an average F-score of 0.69 was obtained. The
(10 × 2) for Values classifier. The multi-valued regression
corpus distribution, category wise and the performance of
view is handled by Simple Logistic Regression (LR), using
the classifier are elucidated in Table 1.
real Personality and Values scores (scaled to the interval
Classifying social media conversational texts into spe-
[0, 1]). We observe that our SVM-based Personality model
cific speech-act classes is still an under-research problem,
achieves an average F-Score of 0.80, outperforming the best
which is a separate research problem by its own right. We
competing system [4] (F-Score of 0.73) as reported in Figure
do realize that the developed speech-act classifier has not
2. Our Values model obtains an average F-score of 0.82.
reached the de-facto status, but still the user specific speech-
Class-wise performance of both the models is reported in
act distributions (in %) have been used as features for the
Table 2.
psycholinguistic classifiers (Adding 11 new features) and
In regression setup, the average correlation coefficient
We find a marked performance improvement of 6.12% (F-
for Personality and Values models achieve average cor-
Score) on the Twitter Values corpus.
relation (Pearson) of 0.38 and 0.29 respectively as shown
Transfer learning: Another question might be raised
in Tables 3 and 4. Respective correlation scores may be
here – is there any loss in transfer learning as many of the
considered as moderate.
classifiers are applied across domains - e.g., the dialog act
9. http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon. 5 U NDERSTANDING THE S OCIETAL S ENTIMENT
htm
10. http://sentiwordnet.isti.cnr.it/ Once the automatic Personality and Values models have
11. http://wndomains.fbk.eu/wnaffect.html been produced, they can be applied to the SNAP Twitter

Digital Object Identifier 10.1109/MIS.2018.111144400 1541-1672/$26.00 2018 IEEE


This article has been accepted for publication in IEEE Intelligent Systems but has not yet been fully edited.
5
Some content may change prior to final publication.

Fig. 3. Heatmap representing cross-relational entropy scores among


different (a) Personality vs Personality and (b) Values vs Values.

There are several interesting observations in these fig-


ures. For example, in Figure 3(b), one can observe that the
Fig. 2. Performance of Personality and Values Models with different Security people tend to feel uncomfortable with achieve-
feature sets added incrementally. ment oriented group, as Security (SE) oriented people al-
community dataset (see Section 3.3). To understand whether ways want to be safe, and not very eager to break societal
people within the same social network community con- rules. Achievement (AC) oriented people, on the other hand,
ceive homogeneous characteristics w.r.t. their Personality are always very eager to achieve their personal ambitions,
and Values, we measure randomness for each dimension and are ready to take any risks to achieve their goals
separately using Shannon’s Entropy. It is a measure of the [14]. Another interesting observation is that Traditional (TR)
uncertainty. Higher entropy scores indicate more diversity people always find difficulties to stay inside other types
(low similarity). The analysis of entropy scores in each of communities. The reverse is also true – other people
community would in turn reveal the semantic interpretation also find difficulties to join TR oriented groups (row TR),
of societal sentiment. resulting in very low entropy scores for the whole row.
As stated before, Personality and Values categories are Similar trends can be seen for power (PO) groups, and for
interconnected to each other and influence one another, the conscientiousness (C) vs extroversion (E) personalities
since the pursuance of any of the Personality/Values types in Figure 3(a) [14]. On the other hand, self-direction (SD)
results, either in accord with another value (e.g., conformity people find it very hard to fit into a conformity (CO) group
and tradition) or an obverse with an one other value (e.g., as SD (Self-Direction) oriented people want to lead life on
benevolence and achievement). We recall that both the their own rules. A few scaling factors are also interesting.
psychology models — Personality and Values models — For example, in a group of agreeable personalities (row 4
support fuzzy membership. Therefore, the goal is to un- in Figure 3(a)), open people (col. 1) find themselves quite
derstand if certain communities with lesser entropy scores comfortable, whereas the reverse is not true.
for at least one or more Personalities/Values dimensions Comparison with Random Users: One might further
could be treated as homogeneous. Of course, one can argue ask whether a population drawn randomly possesses the
that power oriented people can make friends or form a similar psycho-sociological properties as that of the nat-
community with hedonic or universal people, and similarly urally formed population (such as in the SNAP dataset).
that extroverts can handle neurotic friends. We certainly Reports of psycho-sociological analysis always depends on
believe that such cross categorical relationships exist in real how the target population is chosen. To understand this
life; but here our aim is to understand whether homogeneity notion empirically, we perform a random sample test. For
is perceived to a large extent within these societal commu- the experiment we chose 5,000 Twitter users and created a
nities. random set. This experiment was repeated 20 times. The
For both Personality and Values models, we show the psycho-sociological traits of those random samples with
cross-entropy using heat-maps in Figures 3(a) and 3(b) users present in the SNAP dataset is reported in Figure 4
respectively. In these figures, red shades represent higher (for the sake of brevity we only report curves corresponding
entropy score, blue shades represent lower entropy scores. to 5 random samples along with the SNAP dataset). We ob-
To understand the natural preference of people belonging serve that the obtained Personality and Values distributions
to same community could be understood from these tables. in each set largely differ from each other. Moreover, the
For example, the first row in Figure 3(b) and Figure 3(a) difference is statistically significant according to the t-test
represent all the communities having less entropy scores for with 95% confidence interval.
achievement and openness respectively. Less entropy scores
indicate higher similarity.
The obtained entropy scores differ greatly across com-
6 C OMMUNITY D ETECTION USING P ERSONALITY
munities. Therefore, the entropy scores are normalised as
follows to keep the range between 0 and 1: xscaled = AND VALUES
(x−xmin ) So far, we have analyzed Values and Personality of indi-
xmax −xmin . We further normalize the entropy scores based
on the community size. Finally, communities below a ex- viduals belonging to different network communities. We
perimentally chosen threshold (median of xscaled value) are hypothesize that these two features can be used as node-
considered for analysis. centric attributes to detect better community structure in the

Digital Object Identifier 10.1109/MIS.2018.111144400 1541-1672/$26.00 2018 IEEE


This article has been accepted for publication in IEEE Intelligent Systems but has not yet been fully edited.
6
Some content may change prior to final publication.
7 C ONCLUSION & F UTURE D IRECTIONS
In this paper, we have attempted to establish a correlation
between the Personality and Values of individuals belong-
ing to the same community in social networks. In particular,
the contributions of our paper are fivefold: (i) development
of Schwartz’ Values Twitter corpus, which we will release
soon for research purpose; (ii) design of automatic systems
to categorize users based on their Values; (iii) development
of advanced models to identify users’ personality traits; (iv)
understand the semantic interpretation of social communi-
ties, which to the best of our knowledge is addressed for the
first time in this paper, and (v) augmentation of Personality
and Values of individuals as node attributes along with the
network information in order to detect better community
Fig. 4. Consistency check of the psycho-sociological models by com-
structure. However, the terse nature of social media text is a
paring 5 random population along with SNAP. The value of both problem for any system development. In future, we would
Personality and Values traits are quite different for different population. like to study the demographic psycholinguistic variations
across social network communities. Moreover, we are keen
network. Here we use CESNA [10] and T-CME [15] – two to apply the psycholinguistic models for other applications
state-of-the-art algorithms which consider both network and such as Internet advertising, surveillance over social media
node attributes for community detection. for counter-terrorism, etc.
Both the algorithms allows us to control the weights of
R EFERENCES
network and node features. We consider the Twitter net-
work and three sets of features – the network information, [1] T. Chakraborty, A. Dalmia, A. Mukherjee, and N. Ganguly,
“Metrics for community analysis: A survey,” arXiv preprint
the Personality and Values features. We run them with arXiv:1604.03512, 2016.
Values and Personality feature sets separately along with [2] L. R. Goldberg, “An alternative “description of personality”: the
network information. We use real values obtained from big-five factor structure,” Journal of personality and social psychology,
vol. 59, no. 6, p. 1216, 1990.
the regression models directly into the feature vector. We
[3] S. H. Schwartz, “Universals in the content and structure of val-
report the accuracy of CESNA and T-CME for individual ues: theoretical advances and empirical tests in 20 countries,” in
feature settings separately in Table 5. Finally, we combine Advances in Experimental Social Psychology (M. Zanna, ed.), vol. 25,
all these features together with equal weights given to node pp. 1–65, New York: Academic Press, 1992.
[4] B. Verhoeven, W. Daelemans, and T. De Smedt, “Ensemble meth-
attributes and network information and report the accuracy ods for personality recognition,” in Proceedings of WCPR13, in
in Table 5. conjunction with ICWSM-13, 2013.
Community Evaluation Metrics. Since we know the [5] T. Maheshwari, A. N. Reganti, T. Chakraborty, and A. Das, “Socio-
ground-truth community structure of the Twitter network ethnic ingredients of social network communities,” in CSCW,
pp. 235–238, 2017.
as discussed in Section 3.3, we compare the detected com- [6] A. Argandoña, “Fostering values in organizations,” Journal of
munity structure with the ground-truth using the following Business Ethics, vol. 45, no. 1–2, pp. 15–28, 2003.
standard metrics – Normalized Mutual Information (NMI), [7] L. R. Kahle, S. E. Beatty, and P. Homer, “Alternative measurement
approaches to consumer values: The list of values (lov) and values
Adjusted Rand Index (ARI), Purity (PU). The value of the and life style (vals),” Journal of consumer research, pp. 405–409, 1986.
first three metrics ranges between (−1) and 1. The more [8] S. Fortunato, “Community detection in graphs,” Physics Reports,
the value of the metrics, the better the detected community vol. 486, no. 3-5, pp. 75 – 174, 2010.
structure resembles with the ground-truth. [9] S. Zhu, K. Yu, Y. Chi, and Y. Gong, “Combining content and link
for classification using matrix factorization,” in SIGIR, (New York,
Results. Table 5 presents the accuracy of both the algo- USA), pp. 487–494, ACM, 2007.
rithms for different feature sets. Overall, we observe that [10] J. Yang, J. J. McAuley, and J. Leskovec, “Community detection in
considering the node attributes along with the network networks with node attributes,” in ICDM, pp. 1151–1156, 2013.
[11] T. Maheshwari, A. N. Reganti, S. Jamatia, A. Gupta, U. Kumar,
features always improves the performance of the algorithms B. Gambäck, and A. Das, “A societal sentiment analysis: Predicting
as opposed to considering only the network information. the values and ethics of individuals by analysing social media
Considering all the features, CESNA (T-CME) achieves 7% content,” in EACL, 2017.
(8.6%), 11.41% (11.11%) and 9.23% (7.35%) higher NMI, [12] M. D. Back, J. M. Stopfer, S. Vazire, S. Gaddis, S. Schmukle,
B. Egloff, and S. D. Gosling, “Facebook profiles reflect actual
ARI and PU respectively compared to the case when only personality, not self-idealization,” Psychological Science, vol. 21,
network features are used. This result corroborates with our pp. 372–374, 2010.
hypothesis that addition of node attributes indeed enhances [13] D. S. Appling, E. J. Briscoe, H. Hayes, and R. L. Mappus, “Towards
automated personality identification using speech acts,” in Seventh
the performance of the community detection algorithms. International AAAI Conference on Weblogs and Social Media, 2013.
[14] T. Maheshwari, A. N. Reganti, U. Kumar, T. Chakraborty, and
TABLE 5. The accuracy of T-CME and CESNA (within parenthesis) A. Das, “Semantic interpretation of social network communi-
with different feature sets. ties,” in AAAI, February 4-9, 2017, San Francisco, California, USA.,
Sl. # Used features NMI ARI PU pp. 4967–4968, 2017.
(a) Network information 0.57 (0.58) 0.61 (0.63) 0.65 (0.68) [15] L. M. Smith, L. Zhu, K. Lerman, and A. G. Percus, “Partitioning
(b) (a) + Value feature 0.57 (0.61) 0.61 (0.62) 0.66 (0.68) networks with node attributes by compressing information flow,”
(c) (b) + Personality feature 0.59 (0.60) 0.64 (0.65) 0.69 (0.70)
ACM TKDD, vol. 11, no. 2, pp. 15:1–15:26, 2016.
(d) All 0.61 (0.63) 0.68 (0.70) 0.71 (0.73)

Tushar Maheshwari is an undergraduate student at Department of


Computer Science and Engineering of the Indian Institute of Information
Digital Object Identifier Technology, Sri 1541-1672/$26.00
10.1109/MIS.2018.111144400 City, India. 2018 IEEE
This article has been accepted for publication in IEEE Intelligent Systems but has not yet been fully edited.
7
Some content may change prior to final publication.
Aishwarya N. Reganti is an undergraduate student at Departmen of
Computer Science and Engineering of the Indian Institute of Information
Technology, Sri City, India.

Upendra Kumar is an undergraduate student at Department of Com-


puter Science and Engineering of the Indian Institute of Information
Technology, Sri City, India.

Tanmoy Chakraborty is an Assistant Professor at Department of Com-


puter Science and Engineering, Indraprastha Institute of Information
Technology Delhi, India. His broad research interests are Social Network
Analysis, Data Mining and NLP.

Amitava Das is an Assistant Professor at Department of Computer


Science and Engineering of the Indian Institute of Information Tech-
nology, Sri City, India. His research interests broadly span three areas
and more specifically their intersection: Natural Language Processing,
Social Media Analysis, and Computational Creativity.

Digital Object Identifier 10.1109/MIS.2018.111144400 1541-1672/$26.00 2018 IEEE

You might also like