Personality Recognition Applying Machine Learning
Hugo A. Castellanos
Universidad Nacional de Colombia
Bogotá, Colombia
hacastellanosm@unal.edu.co
Figure 2: Variation of γ parameter in SVR versus the resulting error in cross validation with the best C and parameters. [Plot not reproduced; the axis ticks run from 2^-6 to 2^5.]

Table 6: Results over test data with SVR using metric averages as input

Personality trait         MSE     PC
Emotional stability       10.24   0.03
Extroversion              9.01    0.01
Openness to experience    7.34    0.3
Agreeableness             9.36    0.01
Conscientiousness         9.99    -0.25
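
Table 6 reports two quantities per trait, MSE and PC. As a minimal sketch of how such values are conventionally computed (this is not the evaluation code used in the task, which is not shown here), the following Python uses only the standard library. The function names and the sample scores are hypothetical and serve only as an illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def pearson(y_true, y_pred):
    """Pearson product-moment correlation coefficient (PC)."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        # A constant sequence has no variance, so PC is undefined.
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical trait scores, invented for illustration only.
true_scores = [50.0, 42.0, 61.0, 47.0, 55.0]
predicted_scores = [48.0, 45.0, 58.0, 50.0, 52.0]

print("MSE :", mse(true_scores, predicted_scores))
print("RMSE:", rmse(true_scores, predicted_scores))
print("PC  :", pearson(true_scores, predicted_scores))
```

The two metrics answer different questions: MSE and RMSE measure how far the predictions are from the true scores, while PC measures whether the predictions rise and fall together with them. A method can therefore rank well on one metric and poorly on the other.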
with other participants' results, and showing better results than the baseline in submissions 2 and 3. Conscientiousness followed, with the best error for the SVR and the Extra Tree Regressor (ETR).

Measured by RMSE, the worst-predicted trait in all methods was Emotional Stability/Neuroticism. Based on the results of other participants (footnote 1), this was a general outcome [11]. A deeper study of this particular trait is required to improve the results.

When measured with the Pearson product-moment correlation (PC), the results differ considerably among runs. Submissions 2 and 3 nevertheless showed much better results than the baseline, since they indicate a stronger correlation than the one shown by the baseline. The SVR with averages shows a notable correlation for openness (0.3) and for conscientiousness (-0.25). In the ETR run, openness had the highest value, 0.29. With the SVR over metrics, openness also had the highest value, 0.28. This trait was the most consistent across all the methods used.

It is interesting that PC shows correlations with openness and conscientiousness. This is a good result because it indicates that the metrics used have a certain relationship with the mentioned personality traits. Compared with the baseline RMSE, the proposed method performed slightly better, but the difference is not significant, which shows that more work is required to obtain a good predictor of personality. Therefore, it is necessary to include more source code metrics in this study. This could reveal that certain metrics are related to specific personality traits.

6. CONCLUSIONS AND FUTURE WORK

The source code metrics extracted and used as input to the machine learning methods were enough to get a close prediction of several personality traits. Other approaches can be consulted in [?], which shows other results and approximations for the PR-SOCO task.

As the PC denotes a certain correlation, in this case particularly with openness, the metrics considered in this work are likely related to that trait. However, as there are several other metrics with different purposes, such as quality and readability, using more of those metrics could improve the prediction. Other metrics not considered in this study may have stronger relationships with the personality traits. This work could be extended by exploring other metrics and their relationship with each personality trait.

7. REFERENCES

[1] S. Argamon, S. Dhawle, M. Koppel, and J. W. Pennebaker. Lexical predictors of personality type. Proceedings of joint annual meeting of the interface and The Classification Society of North America, pages 1–16, 2005.
[2] A. Caliskan-Islam, R. Harang, A. Liu, A. Narayanan, C. Voss, F. Yamaguchi, and R. Greenstadt. De-anonymizing Programmers via Code Stylometry. USENIX Security Symposium, pages 255–270, 2015.
[3] B. Dit, M. Revelle, M. Gethers, and D. Poshyvanyk. Feature location in source code: A taxonomy and survey. Journal of Software: Evolution and Process, 25(1):53–95, 2013.
[4] M. H. Halstead. Elements of Software Science (Operating and Programming Systems Series). Elsevier Science Inc., New York, NY, USA, 1977.
[5] D. I. Holmes and F. J. Tweedie. Forensic Stylometry: A Review of the CUSUM Controversy. Revue Informatique et Statistique dans les Sciences Humaines, pages 19–47, 1995.
[6] R. R. Joshi and R. V. Argiddi. Author Identification: An Approach Based on Style Feature Metrics of Software Source Codes. 4(4):564–568, 2013.
[7] A. Kuhn, S. Ducasse, and T. Gîrba. Semantic clustering: Identifying topics in source code. Information and Software Technology, 49(3):230–243, 2007.
[8] R. Malhotra. Empirical Research in Software Engineering: Concepts, Analysis, and Applications. CRC Press, 2015.
[9] T. J. McCabe. A complexity measure. IEEE Transactions on Software Engineering, (4):308–320, 1976.
[10] T. Parr. The Definitive ANTLR 4 Reference. Pragmatic Bookshelf, 2nd edition, 2013.
[11] F. Rangel, F. González, F. Restrepo, M. Montes, and P. Rosso. PAN at FIRE: Overview of the PR-SOCO track on personality recognition in source code. In Working notes of FIRE 2016 - Forum for Information Retrieval Evaluation, Kolkata, India, December 7-10, 2016, CEUR Workshop Proceedings. CEUR-WS.org, 2016.
[12] V. Y. Shen, S. D. Conte, and H. E. Dunsmore. Software science revisited: A critical analysis of the theory and its empirical support. IEEE Transactions on Software Engineering, (2):155–165, 1983.
[13] J. H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.

Footnote 1: http://www.autoritas.es/prsoco/evaluation/