LN and ML-based Model Architecture For Recruiting IT Professionals
2298/CSIS123456789X
1. Introduction
Personnel selection is the process of obtaining the quantity and quality of employees needed by a business. It involves a large number of activities (planning, recruitment, selection and incorporation of new employees).
[16] indicates that one of the disadvantages of the recruitment process is the operating cost of applying appropriate selection techniques. Choosing the candidate who meets the requirements of the position offered is a complicated task because it requires the Human Resources area to invest large resources in activities such as profile review, filtering and personal interviews.
Human resources management and its problems are being addressed by Artificial Intelligence (AI) and its branches. For example, in the literature review of [6], the authors show that AI offers a diverse set of suggestions on how specific AI techniques could be applied to specific Human Resources tasks.
An example of this is the proposal of [4], which addresses the problem of candidate classification with the help of Machine Learning. For this purpose, the authors evaluated supervised learning algorithms (linear regression, M5 model tree, REP decision tree and support vector machine) in combination with a semantic skill matching mechanism to achieve automated electronic recruitment.
Another ML-based proposal is [3], which presents a microservices-based framework to recommend the best job offers for a candidate.
On the other hand, [19] proposes a system with a hybrid approach (NLP and regular expressions) that seeks to solve the problems of resume categorization and resume-job offer matching.
Finally, [11] presents a bidirectional recommender system for candidates in job search. The authors' proposal implements a microservices-based, scalable and stateless architecture to drive automation through recommendation using Machine Learning and statistical methods.
Using an electronic recruitment (e-recruitment) strategy that also implements "intelligent" mechanisms or AI techniques offers great advantages when evaluating hundreds of profiles, since these techniques deliver faster (depending on the technique and the processing resources) and more accurate results for the profile being sought.
Based on the review of recruitment research, the objective of the present research is to design an architecture based on Natural Language Processing and Machine Learning to address the problem of recruiting IT professionals.
The rest of the paper is organized as follows. Section 2 reviews research on the recruitment problem. Section 3 details the architecture design, and Section 4 presents and discusses the results.
We analyzed a total of 20 studies and divided them into three categories according to the techniques applied: Machine Learning, Natural Language Processing and Semantic Correspondence.
[4] proposes a system for candidate selection through the analysis of the candidate's LinkedIn profile and blog. For this purpose, the authors evaluated supervised learning algorithms (linear regression, M5 model tree, REP decision tree and support vector machine) and combined them with a semantic skill matching mechanism.
Supported by the strengths of semantic knowledge (concept similarity) and of Machine Learning methods, [3] propose a scalable and stateless architecture for an automated Human Capital Management system, with which they seek to recommend jobs to a candidate and, conversely, candidates to a company.
A recommendation system that uses a Gradient Boosting Decision Tree (GBDT) and a hybrid convolutional neural network model to compute a correlation between a job seeker and a job offer, with the goal of improving the quality of human resource recommendation, is proposed by [17].
[20] proposes a convolutional neural network model with the objective of solving the person-job matching problem. The authors' proposal is a neural network that learns the joint representation of person-job fit from historical job applications.
[11] proposes an architecture for automation through recommendation using machine learning and statistical methods. The authors' proposal extends the research of [3], aiming at better system robustness and recommendation quality by implementing features such as candidate career interests, scoring functions for academic information and professional experience, and string matching.
[15] presents an automated Machine Learning-based model for CV recommendation. A CV first goes through preprocessing for cleaning and for feature extraction with the TF-IDF approach, and the classification model then assigns it to a category.
In the recruitment process, recruiters do not focus exclusively on a person's technical skills to determine their suitability for an offered position; they also take into account characteristics such as education, personality and experience.
Table 1. Characteristics to consider for personnel selection
[17] and [20] dive into Machine Learning subtasks and propose solutions based on Convolutional Neural Networks. The first proposes a recommendation system with a GBDT model and a hybrid convolutional neural network model for regularization and recommendation. The second relies exclusively on convolutional neural networks and applies cosine similarity to calculate the similarity between job offers and a candidate's CV.
A job recommendation system based on user profiles is proposed by [8], which also seeks to predict career advancement from the user's work history.
A content-based recommendation algorithm that extends and updates the Minkowski distance is proposed by [1], with the objective of matching people and jobs. The authors' proposal quantifies the suitability of a job seeker/candidate by analyzing structured representations of the job and of the candidate's profile, created through content analysis of their unstructured forms.
[7] proposes a résumé matching system called ResuMatcher, which determines how well a job suits a candidate by calculating the similarity between the models generated from the résumé and from the job description.
A career path recommendation system that relies on text mining and collaborative filtering techniques, and that also recommends skills based on related job offers generated from the skills in the user's profile, is proposed by [13].
[12] proposes a candidate recommendation system called Smart Applicant Ranker, which uses ontologies to compare CV models (consisting of education, work experience and skills) with job requirement models, and finds the best candidates based on the similarity of the generated ontological models.
A bidirectional semantic correspondence system is proposed by [2] to measure the degree of semantic similarity between the skills and qualifications of a job seeker and those of an offered vacancy. In addition, the authors apply machine learning techniques for bidirectional matching of job vacancies and occupational standards, in order to improve the content of job vacancies and job seeker profiles based on social network analysis and occupational standards.
[18] propose the use of weighted tree algorithms to calculate the similarity between job advertisements and the keywords or criteria used by job seekers.
[14] propose an ontology-based system that recommends the most relevant jobs, built from the basic information collected from the user and the list of jobs the user has marked as favorite or viewed.
In the proposals of [8], [1], [12], [2], [18] and [14], the authors propose solutions that require the information to be analyzed to have a certain structure. On the other hand, the proposals of [7] and [13] apply unstructured analysis, taking into account that the information contained in a CV does not follow a single style or format.
Table 2. Format of the information to be processed

Author(s)   Structured   Not structured
[8]         X
[1]         X
[12]        X
[2]         X
[18]        X
[14]        X
[7]                      X
[13]                     X
[8], [7] and [2] present proposals that approach the selection problem from the perspective of similarity between a candidate's CV/profile and the vacancy/position offered. In contrast, [18] addresses the problem through the similarity between the content of a job offer and the search keywords used by a user.
Although [8], [7] and [2] follow the same similarity approach, each presents some peculiarity. [8] applies content-based recommendation over the candidate's work history. [7] relies on the qualifications, skills and work experience described in the candidate's CV and those required in the job offer, and generates recommendations based on the similarity between them. Finally, [2] takes into account the similarity of qualifications and skills and also considers the candidate's connections, since their testimony strengthens the evaluation of whether a candidate is suitable for a vacancy.
Table 3. Data source
[?]   Indeed                            1000
[?]   Universidad Estatal de San José   1000
[?]   -                                 -
[?]   Not specified                     175
[?]   Not specified                     100
[?]   -                                 -
An online recruitment system that exploits multiple semantic resources and uses statistical measures of concept relatedness is proposed by [10]. Moreover, it relies on NLP to identify and extract possible concept lists from job postings and candidate CVs.
[9] propose a solution focused on job matching for older workers. In this solution, keywords are extracted from the description entered in the system's search engine by tokenizing sentences and filtering words based on morphological analysis. The top 10 keywords are then used to search for related job offer documents.
To solve the resume-job offer matching problem of job portals, [19] propose a hybrid approach that incorporates resume categorization to reduce the dataset to be analyzed; that is, instead of evaluating all resumes, the analysis is applied only to resumes that fall within the category described in the job offer.
To address the problem of CV retrieval based on the description of a job offer, [5] propose the use of the average word embedding (AWE) model together with Principal Component Analysis (PCA) to solve the dimensionality problem that AWE can present.
Table 4. Weighting techniques applied in proposals using NLP
Author(s)   Weighting technique/approach
[10]        TF-IDF
[9]         BM25
[19]        TF-IDF
[5]         AWE
The proposals of [10], [9], [19] and [5] apply different techniques to information retrieval, as shown in Table 4. [10] applied the TF-IDF weighting scheme to eliminate concepts that do not carry significant value. [9] made use of the Solr/Lucene scores of the BM25 algorithm, which scores documents based on term frequency and document-length normalization. [19] relied on the TF-IDF technique and then filtered and refined the concept list by removing the concepts with low weights. On the other hand, [5] indicate that classical information retrieval models such as Bag of Words (BOW) and BM25 have certain weaknesses and require complementary techniques such as latent semantic indexing (LSI); therefore, they rely on average word embedding (AWE) models.
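The TF-IDF concept filtering described above for [10] and [19] can be sketched in Python as follows. This is a minimal illustration, not the cited authors' implementation: it assumes the common tf · ln(N/df) weighting variant, and the function names, the sample concept lists and the 0.05 threshold are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def filter_concepts(doc_weights, threshold):
    """Keep (sorted) the concepts whose TF-IDF weight reaches the threshold."""
    return sorted(term for term, w in doc_weights.items() if w >= threshold)


# Hypothetical concept lists extracted from three job postings
docs = [
    ["java", "spring", "developer", "developer"],
    ["python", "django", "developer"],
    ["java", "android", "developer"],
]
weights = tfidf_weights(docs)
print(filter_concepts(weights[0], threshold=0.05))  # ['java', 'spring']
```

Because "developer" occurs in every posting, its inverse document frequency is ln(3/3) = 0, so it is filtered out, while the more distinctive concepts survive.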
3. Architecture design

Figure 1. Model Architecture
Figure 1 shows the architecture of our model and its components:
• Data form
• Pre-processing module
• Categorization module
• Matching module
3.1 Data form
The data form is the core of the system and the component that receives the information the model needs to work. Through it, the actors (the recruiter and the candidate) initiate the behavior of the model: they provide the data that pass through each of the components of the model and ultimately produce either a ranking of candidates for the job offer entered or a ranking of job offers for the CV entered.
3.2 Pre-processing module
In this component, the text entered in the skills section goes through a cleaning process in which we detect and remove the punctuation marks and symbols that do not provide context-relevant meaning or that would prevent an IT skill from being detected.

Figure 2. Skills corpus cleaning
Figure 2 presents the proposed flowchart for data cleaning. Since our skill detection process relies on an IT dictionary, it is necessary to ensure that an IT skill (contained in the skills section of each form) does not contain characters that would cause it to be missed during the process. Therefore, the first step is to convert the text of the skills section into a list of characters. We then parse each element of the generated list and remove signs and symbols. Finally, we rejoin the list of characters to obtain a clean corpus to process.
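A minimal Python sketch of the cleaning flow in Figure 2 follows. The set of symbols that are kept (`#`, `+` and `.`, so that skills such as c#, c++ or vue.js survive), the lowercasing, and the choice to replace a removed symbol with a space (so that "java,php" does not collapse into "javaphp") are assumptions; the paper does not specify them.

```python
# Hypothetical whitelist: symbols that belong to skill names such as c#, c++ or vue.js.
# The paper does not list which symbols it keeps; this set is an assumption.
KEEP_SYMBOLS = set("#+.")


def clean_skills_text(text):
    """Sketch of the Figure 2 flow: text -> list of characters -> filter -> rejoin."""
    chars = list(text.lower())  # step 1: convert the skills text into a list of characters
    # step 2: replace every sign/symbol that is not whitelisted with a space
    filtered = [c if c.isalnum() or c.isspace() or c in KEEP_SYMBOLS else " " for c in chars]
    cleaned = "".join(filtered)  # step 3: rejoin the characters into a corpus
    # Collapse repeated whitespace and drop sentence-final dots ("boot." -> "boot")
    tokens = [token.rstrip(".") for token in cleaned.split()]
    return " ".join(token for token in tokens if token)


print(clean_skills_text("HTML5, CSS3; Vue.js & C#!"))   # html5 css3 vue.js c#
print(clean_skills_text("Java, C++ and Spring Boot."))  # java c++ and spring boot
```

Only trailing dots are stripped, so a leading dot as in ".net" is preserved.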
An important element in this module is Word2vec, a neural network composed of an input layer, a hidden layer and an output layer that learns word vectors from which the semantic relationship between words in a given context can be calculated. We take advantage of this tool and train it with IT skills.
This model helps us fulfill the objective of this module, which is to obtain a subset of skills with a strong semantic relationship and thus reduce the number of queries made later in the categorization module. This rests on the premise that a set of strongly related skills will map to a correspondingly small and closely related set of IT occupations.
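The paper does not state the exact rule used to select the strongly related subset. As a hedged sketch, the Python code below assumes each skill already has a vector (for example, taken from a gensim `Word2Vec` model trained on IT skill text, through `model.wv[skill]`) and keeps the skills whose mean cosine similarity to the other detected skills reaches a hypothetical threshold. The toy 3-dimensional vectors stand in for trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def strongly_related_skills(skills, vectors, threshold=0.5):
    """Keep skills whose mean cosine similarity to the other known skills >= threshold."""
    known = [s for s in skills if s in vectors]  # out-of-vocabulary skills are dropped
    kept = []
    for s in known:
        sims = [cosine(vectors[s], vectors[t]) for t in known if t != s]
        if sims and sum(sims) / len(sims) >= threshold:
            kept.append(s)
    return kept


# Hypothetical embeddings standing in for trained Word2vec vectors
toy_vectors = {
    "html5": [0.9, 0.1, 0.0],
    "css3": [0.8, 0.2, 0.0],
    "javascript": [0.7, 0.3, 0.1],
    "cobol": [0.0, 0.1, 0.9],
}
detected = ["html5", "css3", "javascript", "cobol", "rxjava"]
print(strongly_related_skills(detected, toy_vectors))  # ['html5', 'css3', 'javascript']
```

Skills missing from the embedding vocabulary (here "rxjava") are dropped, and an outlier such as "cobol" falls below the threshold.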
3.3 Categorization module
With this module, we obtain the IT occupations related to each of the skills detected in the previous module. These occupations are used to categorize the document being processed (job offer or CV) and also reduce the volume of data handled in the next module.
Table 5. IT dictionary excerpt

IT Skill    IT Professions
Expressjs   backend, js developer
Extjs       frontend, js developer
Firebase    backend, mobile developer
Flask       python developer, backend, web developer

Table 5 shows a small excerpt of how the IT dictionary is composed.
An IT skill is not exclusive to one profession; consequently, when querying our IT skill dictionary, several IT skills may share one or more IT professions/occupations.
Taking this into account, each time a query to the dictionary returns a profession, we increase that profession's frequency value. At the end of the query process, we calculate the average frequency and categorize the document under evaluation (job offer or CV) with the professions whose frequency is greater than or equal to the average.
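The frequency-based categorization can be sketched in Python using the Table 5 excerpt as the dictionary. Interpreting "average frequency" as the mean count over the distinct professions returned is an assumption, as are the function and variable names.

```python
from collections import Counter

# Excerpt of the IT dictionary from Table 5 (skill -> professions)
IT_DICTIONARY = {
    "expressjs": ["backend", "js developer"],
    "extjs": ["frontend", "js developer"],
    "firebase": ["backend", "mobile developer"],
    "flask": ["python developer", "backend", "web developer"],
}


def categorize(skills, dictionary=IT_DICTIONARY):
    """Return the professions whose frequency is >= the average frequency."""
    freq = Counter()
    for skill in skills:
        # Each query adds 1 to every profession linked to the skill
        freq.update(dictionary.get(skill.lower(), []))
    if not freq:
        return []
    average = sum(freq.values()) / len(freq)
    return [prof for prof, count in freq.items() if count >= average]


print(categorize(["Expressjs", "Extjs", "Flask"]))  # ['backend', 'js developer']
```

For these three skills, "backend" and "js developer" are each returned twice, above the average of 7 / 5 = 1.4, so they become the document's categories.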
3.4 Matching module
In this module, when a job offer is being processed, we take the list of professional categories obtained and, for each category, extract the CVs of the same category from the database. When a profile or CV is being processed, the documents extracted from the database are job offers.
With the set of documents obtained, a data table is built. Its column headers are the IT skills detected in the retrieved set and in the item being processed; each row represents a profile or CV, and each row–column intersection holds a value determined by the following conditions:
• 0 is assigned if the CV does not possess the IT skill described in the column.
• 1 is assigned if the CV possesses the IT skill described in the column.
• 2 is assigned if the CV possesses the IT skill described in the column and it matches one of the requirements of the job offer.
When a profile or CV is being processed, the criteria are the same, except that each row represents a job offer.
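The construction of the 0/1/2 table for a job offer can be sketched in Python as follows. The function name, the lowercasing of skills and the sorted column order are illustrative assumptions; the CV case is symmetric, with job offers as rows.

```python
def build_skill_table(offer_required, retrieved_cvs):
    """Build the 0/1/2 table for a job offer.

    offer_required: skills required by the job offer being processed.
    retrieved_cvs: {cv_id: skills} for the CVs extracted from the same category.
    Returns (columns, rows), where rows maps cv_id -> list of 0/1/2 values.
    """
    required = {s.lower() for s in offer_required}
    cvs = {cv_id: {s.lower() for s in skills} for cv_id, skills in retrieved_cvs.items()}
    # Columns: every skill seen in the offer or in any retrieved CV (sorted for a stable layout)
    columns = sorted(required.union(*cvs.values()))
    rows = {}
    for cv_id, skills in cvs.items():
        row = []
        for skill in columns:
            if skill not in skills:
                row.append(0)  # the CV lacks the skill
            elif skill in required:
                row.append(2)  # the CV has the skill and the offer requires it
            else:
                row.append(1)  # the CV has the skill, but the offer does not require it
        rows[cv_id] = row
    return columns, rows


columns, rows = build_skill_table(
    ["Java", "Spring", "React"],
    {"cv_1": ["Java", "Spring", "Oracle"], "cv_2": ["HTML5", "CSS3", "React"]},
)
print(columns)  # ['css3', 'html5', 'java', 'oracle', 'react', 'spring']
print(rows)     # {'cv_1': [0, 0, 2, 1, 0, 2], 'cv_2': [1, 1, 0, 0, 2, 0]}
```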
This data table is the input for clustering. The unsupervised Mean-shift algorithm analyzes this set and assigns a group (cluster) number to each element. Unlike algorithms such as k-means, Mean-shift does not require the number of clusters to be specified in advance; it iterates over the elements of the set and determines the number of clusters itself. Once the process finishes, we know the cluster to which each element belongs. The elements that fall in the same cluster as the document being processed (job offer or CV) form the output of the clustering component.
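The clustering step can be illustrated with a minimal flat-kernel Mean-shift written in plain Python; in practice, a library implementation such as scikit-learn's `sklearn.cluster.MeanShift` (with its `bandwidth` parameter and `labels_` attribute) would typically be used. The bandwidth of 2.1, the mode-merging radius of half the bandwidth, and the sample rows (including a row for the job offer itself) are assumptions made for illustration.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_shift_labels(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel Mean-shift: one cluster label per point, no cluster count required."""
    modes = []
    for p in points:
        m = [float(x) for x in p]
        for _ in range(max_iter):
            # Points inside the flat window around the current estimate
            window = [q for q in points if _dist(m, q) <= bandwidth]
            shifted = [sum(col) / len(window) for col in zip(*window)]
            converged = _dist(shifted, m) < tol
            m = shifted
            if converged:
                break
        modes.append(m)
    # Points whose modes lie close together (hypothetical radius: bandwidth / 2) share a cluster
    centres, labels = [], []
    for m in modes:
        for idx, c in enumerate(centres):
            if _dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels


# Hypothetical 0/1/2 rows over the columns [java, spring, php]; row 0 is the job offer itself
names = ["offer", "cv_a", "cv_b", "cv_c", "cv_d"]
rows = [[2, 2, 0], [2, 2, 0], [2, 0, 0], [0, 0, 1], [0, 2, 1]]
labels = mean_shift_labels(rows, bandwidth=2.1)
same_cluster = [n for n, lab in zip(names[1:], labels[1:]) if lab == labels[0]]
print(labels)        # [0, 0, 0, 1, 1]
print(same_cluster)  # ['cv_a', 'cv_b']
```

No number of clusters is passed in: the two groups emerge from the bandwidth alone, and cv_a and cv_b, which share the offer's cluster, would be passed on to the ranking step.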
3.5 Model output
Our final objective is to obtain a ranking of candidates; therefore, we order the CVs obtained during clustering by the percentage of skills that each CV fulfills with respect to those specified in the job offer. Formally, let $CV_i$, with $i \in \mathbb{N}$, contain the list of skills $HCV$, and let the job offer specify the required skills ($RS$) and the desirable skills ($DS$). The percentage of required skills ($\%RS$) is the number of elements of $RS$ contained in $HCV$ divided by the total number of elements of $HCV$, and $\%DS$ is computed in the same way with $DS$:

$$\%RS = \frac{n(RS \cap HCV)}{n(HCV)} \times 100, \qquad \%DS = \frac{n(DS \cap HCV)}{n(HCV)} \times 100$$
As an example, consider a CV and a job offer with the following RS and DS. The percentages of RS and DS are calculated as follows:
HCV = [Java, Spring, JSF, Oracle, Android, Flutter, Spring Boot]
• n(HCV) = 7
• RS = [Java, Android, React, Flutter]; %RS = 3/7 ≈ 42.9%
• DS = [Spring, Spring Boot]; %DS = 2/7 ≈ 28.6%
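The %RS and %DS percentages and the resulting ranking can be sketched in Python. Using %DS only to break ties between CVs with equal %RS is an assumption, since the text does not say how the two percentages are combined; cv_b is a hypothetical CV added to show the ordering.

```python
def skill_percentage(cv_skills, offer_skills):
    """n(S ∩ HCV) / n(HCV) * 100, where HCV are the CV's skills and S is RS or DS."""
    hcv = {s.lower() for s in cv_skills}
    if not hcv:
        return 0.0
    wanted = {s.lower() for s in offer_skills}
    return 100 * len(hcv & wanted) / len(hcv)


def rank_cvs(cvs, required, desirable):
    """Order CV ids by %RS, breaking ties with %DS (the tie-break is an assumption)."""
    return sorted(
        cvs,
        key=lambda cv_id: (skill_percentage(cvs[cv_id], required),
                           skill_percentage(cvs[cv_id], desirable)),
        reverse=True,
    )


hcv = ["Java", "Spring", "JSF", "Oracle", "Android", "Flutter", "Spring Boot"]
rs = ["Java", "Android", "React", "Flutter"]
ds = ["Spring", "Spring Boot"]
print(round(skill_percentage(hcv, rs), 1))  # 42.9
print(round(skill_percentage(hcv, ds), 1))  # 28.6

# Hypothetical second CV to illustrate the ranking
cvs = {"cv_a": hcv, "cv_b": ["Java", "React", "Android", "Flutter"]}
print(rank_cvs(cvs, rs, ds))  # ['cv_b', 'cv_a']
```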
4. Results and Discussion
In this section, for the evaluation and discussion of results, we used 200 job offers and 50 profiles or CVs. In addition, we rely on an IT dictionary consisting of 225 skills and the occupations associated with each of them.
As described in Section 3, our model processes documents through three components: pre-processing, categorization and clustering. In this section, we show the results of processing a document (job offer or CV) with each of these components.
4.1 Pre-processing results
4.1.1 Case 1: CV
When a CV is registered through the web system form, the section containing IT knowledge or skills is processed to detect the skills with the highest semantic similarity:
Table 6. CV: Pre-processing results

cv_0001
  IT skills detected: 9 = ['html', 'css', 'javascript', 'java', 'php', 'laravel', 'vuejs', 'rxjava', 'spring']
  Most similar IT skills: 8 = ['html5', 'css3', 'javascript', 'php', 'vue.js', 'java', 'spring', 'laravel']
cv_0002
  IT skills detected: 13 = ['html', 'css', 'javascript', 'typescript', 'java', 'php', 'python', 'angular', 'nodejs', 'azure', 'react', 'js', 'nestjs']
  Most similar IT skills: 9 = ['html5', 'css3', 'javascript', 'php', 'typescript', 'angular', 'python', 'react', 'nodejs']
… 'aws', 'azure']   … 'azure']
Oferta_2
  IT skills detected: 13 = ['php', 'javascript', 'typescript', 'c#', 'xamarin', 'python', 'symfony', 'django', 'html', 'css', 'aws', 'dynamo', 'angular']
  Most similar IT skills: 10 = ['php', 'python', 'symfony', 'css3', 'javascript', 'html5', 'typescript', 'angular', 'c#', 'xamarin']
cv_0049   ['java developer', 'backend']
cv_0050   ['java developer', 'frontend']
As mentioned in the previous paragraph, the skills obtained as output from pre-processing are looked up in the IT dictionary, which yields the data shown in Table 8.
4.2.2 Case 2: job offer
For a job offer, the same process as in Case 1 is applied, but the skills looked up in the IT dictionary are those detected in the mandatory skills section, since these best describe the required profile.
Table 9. Job offer: categorization results

Offer      Assigned categories
Oferta_1   ['frontend', 'web developer', 'js developer', 'php developer']
Oferta_2   ['php developer', 'backend', 'web developer', 'frontend', '.net developer']
An architecture based on Natural Language Processing and Machine Learning is proposed to address the problem of recruiting IT personnel.
As shown in the cited references, in addition to skills or knowledge, other qualities are assessed to determine which person best meets the requirements of a job offer. Among these is the work history, from which the years of experience and the positions held, among other data, can be obtained. As future work, we intend to build on this architecture to design a generalized architecture for recruitment.
References
1. Almalis, N. D., Tsihrintzis, G. A., Karagiannis, N., & Strati, A. D. (2016). FoDRA - A new content-based job recommendation algorithm for job seeking and recruiting. IISA 2015 - 6th International Conference on Information, Intelligence, Systems and Applications.
2. Chala, S. A., Ansari, F., Fathi, M., & Tijdens, K. (2018). Semantic matching of job seeker to vacancy: a bidirectional approach. International Journal of Manpower, 39(8), 1047–1063.
3. Chaudhary, A., Jobanputra, M., Shah, S., Gandhi, R., Chaudhary, S., & Goswami, R. (2018). Automated human capital management system. 12th Annual IEEE International Systems Conference, SysCon 2018 - Proceedings, 1–8.
4. Faliagka, E., Iliadis, L., Karydis, I., Rigou, M., Sioutas, S., Tsakalidis, A., & Tzimas, G. (2014). On-line consistent ranking on e-recruitment: Seeking the truth behind a well-formed CV. Artificial Intelligence Review, 42(3), 515–528.
5. Fernández-Reyes, F. C., & Shinde, S. (2019). CV Retrieval System based on job description matching using hybrid word embeddings. Computer Speech and Language, 56, 73–79.
6. Figueroa-García, J. C., Kalenatic, D., & López-Bello, C. A. (2015). Artificial Intelligent Techniques in Human Resource Management. Intelligent Systems Reference Library, 87, 623–643.
7. Guo, S., Alamudun, F., & Hammond, T. (2016). RésuMatcher: A personalized résumé-job matching system. Expert Systems with Applications, 60, 169–182.
8. Heap, B., Krzywicki, A., Wobcke, W., Bain, M., & Compton, P. (2014). Combining career progression and profile matching in a job recommender system. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8862, 396–408.
9. Kaoru, S., Kenichi, S., Masatomo, K., & Atsuhi, H. (2017). Towards extracting recruiters' tacit knowledge based on interactions with a job matching system. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10298, 557–568.
10. Kmail, A. B., Maree, M., Belkhatir, M., & Alhashmi, S. M. (2016). An automatic online recruitment system based on exploiting multiple semantic resources and concept-relatedness measures. Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, 2016-Janua, 620–627.
11. Mehta, M., Derasari, R., Patel, S., Kakadiya, A., Gandhi, R., Chaudhary, S., & Goswami, R. (2019). A service-oriented human capital management recommendation platform. SysCon 2019 - 13th Annual IEEE International Systems Conference, Proceedings, 1–8.
12. Mohamed, A., Bagawathinathan, W., Iqbal, U., Shamrath, S., & Jayakody, A. (2018). Smart Talents Recruiter - Resume Ranking and Recommendation System. 2018 IEEE 9th International Conference on Information and Automation for Sustainability, ICIAfS 2018, 1–5.
13. Patel, B., Kakuste, V., & Eirinaki, M. (2017). CaPaR: A career path recommendation framework. Proceedings - 3rd IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2017, 23–30.
14. Rimitha, S. R., Abburu, V., Kiranmai, A., Marimuthu, C., & Chandrasekaran, K. (2019). Improving Job Recommendation Using Ontological Modeling and User Profiles. 2019 15th International Conference on Information Processing: Internet of Things, ICINPRO 2019 - Proceedings.
15. Roy, P. K., Chowdhary, S. S., & Bhatia, R. (2020). A Machine Learning approach for automation of Resume Recommendation system. Procedia Computer Science, 167(2019), 2318–2327.
16. Vallejo Chávez, L. M. (2016). Gestión del talento humano ESPOCH 2016.
17. Wang, H., Liang, G., & Zhang, X. (2018). Feature Regularization and Deep Learning for Human Resource Recommendation. IEEE Access, 6, 39415–39421.
18. Wierfi, A. D., Utami, E., & Sunyoto, A. (2019). The application of extended weighted tree similarity algorithm for similarity searching. 2019 International Conference on Information and Communications Technology, ICOIACT 2019, 428–433.
19. Zaroor, A., Maree, M., & Sabha, M. (2018). A Hybrid Approach to Conceptual Classification and Ranking of Resumes and Their Corresponding Job Posts. International Conference on Intelligent Decision Technologies, 2, 13–21.
20. Zhu, C., Zhu, H., Xiong, H., Ma, C., Xie, F., Ding, P., & Li, P. (2018). Person-Job Fit. ACM Transactions on Management Information Systems, 9(3), 1–17.