Generative Job Recommendations with Large Language Model
Zhi Zheng 1,2,†, Zhaopeng Qiu 1,†, Xiao Hu 1, Likang Wu 1,2, Hengshu Zhu 1,∗, Hui Xiong 3,4,∗
1 Career Science Lab, BOSS Zhipin
2 University of Science and Technology of China
3 The Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology
4 The Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
zhengzhi97@mail.ustc.edu.cn, zhpengqiu@gmail.com, zhuhengshu@gmail.com, xionghui@ust.hk

arXiv:2307.02157v1 [cs.IR] 5 Jul 2023

Abstract—The rapid development of online recruitment services has encouraged the utilization of recommender systems to streamline the job seeking process. Predominantly, current job recommendations deploy either collaborative filtering or person-job matching strategies. However, these models tend to operate as "black-box" systems and lack the capacity to offer explainable guidance to job seekers. Moreover, conventional matching-based recommendation methods are limited to retrieving and ranking existing jobs in the database, restricting their potential as comprehensive career AI advisors. To this end, here we present GIRL (GeneratIve job Recommendation based on Large language models), a novel approach inspired by recent advancements in the field of Large Language Models (LLMs). We initially employ a Supervised Fine-Tuning (SFT) strategy to instruct the LLM-based generator in crafting suitable Job Descriptions (JDs) based on the Curriculum Vitae (CV) of a job seeker. Moreover, we propose to train a model which can evaluate the matching degree between CVs and JDs as a reward model, and we use a Proximal Policy Optimization (PPO)-based Reinforcement Learning (RL) method to further fine-tune the generator. This aligns the generator with recruiter feedback, tailoring the output to better meet employer preferences. In particular, GIRL serves as a job seeker-centric generative model, providing job suggestions without the need of a candidate set. This capability also enhances the performance of existing job recommendation models by supplementing job seeking features with generated content. With extensive experiments on a large-scale real-world dataset, we demonstrate the substantial effectiveness of our approach. We believe that GIRL introduces a paradigm-shifting approach to job recommendation systems, fostering a more personalized and comprehensive job-seeking experience.

I. INTRODUCTION

Recent years have witnessed the rapid development of online recruitment. According to the report from The Insight Partners1, the global online recruitment market size is expected to grow from $29.29 billion in 2021 to $47.31 billion by 2028. For these platforms, the Recommendation Systems (RS), which can provide valuable assistance to job seekers by recommending suitable jobs, serve as their core components. Along this line, considerable research efforts have been made in building RS for online recruitment [1]–[3]. Indeed, existing studies mainly follow the collaborative filtering [1] or person-job matching paradigms [2], [3]. However, in practical applications, these methods encounter the following challenges. First, these methods primarily rely on end-to-end neural network models, which usually output the matching score directly given the information of a specific job seeker and a job. Nevertheless, these models suffer from poor explainability due to the black-box neural network computations, which reduces user trust, especially in scenarios like job seeking that have a significant impact on individuals. Second, most of the existing models are discriminative models, which are limited to retrieving and ranking existing jobs in the database, restricting their potential as comprehensive career AI advisors, i.e., generating a novel Job Description (JD) for a job seeker personally based on the Curriculum Vitae (CV). Last but not least, the existence of a considerable semantic gap between CVs and JDs has resulted in the underwhelming performance of traditional methods.

To address the aforementioned challenges, in this paper, inspired by the recent progress in the field of Large Language Models (LLMs), we propose a novel user-centered GeneratIve job Recommendation paradigm based on LLMs, called GIRL. As shown in Figure 1, different from traditional discriminative job recommendation methods, which aim to predict a matching score given a specific job seeker and job, GIRL aims to directly generate a personalized JD for a specific job seeker based on the remarkable generation ability of LLMs. We propose two ways to leverage the generated job descriptions. Firstly, the job descriptions generated by the LLM represent the job that the model deems most suitable for the job seeker. Therefore, these descriptions can provide job seekers with references for their job seeking and career development planning. Meanwhile, they can improve the explainability of the whole recommender system. Secondly, the generated results can be used to bridge the semantic gap between CVs and JDs, and further enhance the performance of traditional discriminative models.

However, it is non-trivial to train an LLM for job recommendation. On one hand, given the significant differences between recommendation tasks and NLP tasks, the LLM needs to incorporate more domain-specific knowledge [4]. On the other hand, to better assist with downstream recommendation tasks, the LLM needs to further learn from historical interaction records. Therefore, inspired by InstructGPT [5], in this paper, we propose a three-step training methodology as follows:

† Equal Contribution.
∗ Corresponding authors.
1 https://www.theinsightpartners.com/reports/online-recruitment-market
[Figure 1 contrasts three paradigms: (a) Discriminative Job Recommendation, where a predictor computes a matching score from the CV and JD embeddings; (b) Generative Job Recommendation, where an LLM-based generator produces a JD directly from the input CV; (c) Generation-Enhanced Job Recommendation, where the generated JD is fed to the predictor together with the CV and JD embeddings.]

Fig. 1. Schematic diagram of three distinct job recommendation paradigms.

1) Supervised Fine-Tuning (SFT): This step aims to teach the LLM how to generate an appropriate JD based on a given CV. Specifically, we build a dataset consisting of previously matched CV-JD pairs, and use the instruction-tuning method to train the LLM generator.
2) Reward Model Training (RMT): In this step, we build a dataset consisting of matched and mismatched CV-JD pairs, which contains the recruiter feedback for the job seekers. Then, we train a reward model to distinguish the matched CV-JD pairs from mismatched ones to mimic the real-world recruiter.
3) Reinforcement Learning from Recruiter Feedback (RLRF): In step three, we leverage a Proximal Policy Optimization (PPO) based reinforcement learning method to further align the LLM to the recruiter preference captured by the reward model, making the LLM generation consider not only the preference of the job seeker but also the practical market demands.

Finally, the major contributions of this article can be summarized as follows:
• To the best of our knowledge, this is the first piece of work which proposes an LLM-based generative job recommendation paradigm.
• We propose a novel three-step training methodology with reinforcement learning from recruiter feedback to train a job description generator.
• We evaluated the quality of the generated results with the help of ChatGPT2, and we further conducted extensive experiments on a real-world dataset.

2 https://chat.openai.com/

II. RELATED WORK

In this section, we will summarize the related works in the following three categories, respectively job recommendation, large language models, and LLMs for recommendation.

A. Job Recommendation

In the era of burgeoning online job platforms, a variety of novel job recommendation techniques have been introduced. These approaches can be primarily divided into two categories, respectively text-based methods and behavior-based methods. For text-based methods, PJFNN [3] formulated this task as a joint representation learning problem and utilized CNN-based models to get the representations of job seekers and recruiters, while APJFNN [2] enhanced the above model by taking the abilities of job seekers into consideration and used attention mechanisms for hierarchical ability-aware representation. IPJF [1] conceived an interpretable model to match job seekers and recruiters in a multi-task learning framework. For behavior-based methods, DPGNN [6] proposed to build an interaction graph between job seekers and recruiters to model the directed interactions. DPJF-MBS [7] proposed to utilize memory networks to get the representations of the multi-behavior sequences of different job seekers and recruiters.

B. Large Language Models

Large Language Models (LLMs) are language models consisting of a neural network with many parameters (tens of millions to even trillions), trained on large quantities of unlabeled text using self-supervised or semi-supervised learning methods [8], [9]. Large language models primarily rely on the Transformer [10] architecture, which has become the standard deep learning technique for Natural Language Processing (NLP). Existing LLMs can primarily be divided into two categories, respectively discriminative LLMs and generative LLMs. For discriminative LLMs, BERT [11] proposed a deep bidirectional transformer architecture, and further proposed a Masked Language Model (MLM) objective for model pre-training. RoBERTa [12] further refined the training process of BERT and achieved better performance. XLNet [13] leveraged the permutation of the sequence order, enabling it to learn the context of a word based on all the words before and after it in a sentence. For generative LLMs, GPT [14] proposed to improve language understanding by generative pre-training.
During pre-training, the model learns to predict the next word in a sentence, without any specific task in mind. GPT-2 [15] and GPT-3 [16] further increased the model scale and achieved better performance. InstructGPT [17] further proposed to fine-tune the GPT model using reinforcement learning from human feedback. Inspired by the above studies, in this paper, we also use reinforcement learning to fine-tune the JD generator, and we use BERT for text embedding.

C. LLMs for Recommendation

LLMs have recently gained significant attention in the domain of recommendation systems [4]. Generally, existing studies can be divided into two categories, respectively recommendation based on discriminative LLMs and generative LLMs. For the former category, U-BERT [18] proposed a novel pre-training and fine-tuning method to leverage BERT for recommendation tasks. BERT4Rec [19] proposed to utilize a BERT-based deep bidirectional self-attention architecture to model user behavior sequences. For the latter category, some studies focus on utilizing the zero/few-shot abilities of LLMs, and use the LLMs for recommendation by prompting without fine-tuning [20]–[22]. Moreover, some studies further fine-tune the LLMs, endeavoring to achieve better performance. For example, TALLRec [23] proposed to fine-tune the LLMs by recommendation tuning, where the input is the historical sequence of users and the output is the "yes or no" feedback. InstructRec [24] designed 39 instruction templates and automatically generated a large amount of instruction data for instruction tuning.

III. PROBLEM DEFINITION

Here we introduce the problem formulations of generative job recommendation and generation-enhanced job recommendation. Let S and J denote the entire job seeker set and job set. The feedback matrix between the job seekers and recruiters is denoted as Z ∈ R^{N_s × N_j}, where z_{s,j} = 1 means both job seeker s and the recruiter of job j are satisfied with each other and this pair is matched, and z_{s,j} = 0 means this pair is mismatched. N_s and N_j denote the numbers of the job seekers and jobs, respectively. Furthermore, each job seeker s has a corresponding CV which can be formatted as C = [w_1, ..., w_{l_s}], where w_i is the i-th word in C and l_s is the length of C. Similarly, each job j has a corresponding JD which can be formatted as J = [v_1, ..., v_{l_j}], where v_i is the i-th word in J and l_j is the length of J. Note that we omit some subscripts to facilitate the reading.

Traditionally, the objective of discriminative job recommendation is to train a scoring model that can compute the matching score between a given job seeker s and a job j. However, this traditional paradigm can only recommend existing jobs for job seekers, which may not fulfill the needs of some job seekers. Therefore, in this paper, we propose a novel generative job recommendation paradigm which can be formulated as:

Definition 1 (Generative Job Recommendation): Given a job seeker s with the corresponding C, the goal of generative job recommendation is to train a generator G, which can generate a suitable JD for this user, i.e., G : C → J′.

In the aforementioned definition, the generated J′ should have high quality and encompass the most suitable job information for job seeker s, thereby providing meaningful guidance for s. Furthermore, in this paper, we propose that J′ can also serve as a synopsis of job seeker s, contributing auxiliary support to traditional recommendation tasks. Along this line, the generation-enhanced job recommendation can be formulated as:

Definition 2 (Generation-Enhanced Job Recommendation): Given a job seeker s with the corresponding C, a job j with the corresponding J, and the generated J′, the goal of generation-enhanced job recommendation is to train a model M, which can calculate the matching score between s and j, i.e., M : (C, J, J′) → R.

IV. GENERATIVE RECOMMENDATION FRAMEWORK

As shown in Figure 2, the generative recommendation framework is based on a large language model and consists of three training steps. Specifically, we first convert the JD recommendation task to the NLG format with a manually designed prompt template, and utilize supervised fine-tuning to make the LLM generator understand the recommendation task. Second, we train a reward model to learn the recruiter feedback and capture the interaction information. Third, we utilize reinforcement learning to further align the generator with the recruiting market. We will address the details of all steps in the following sub-sections.

A. Supervised Fine-tuning

In this training step, we propose to train the generator in the supervised fine-tuning way based on the matched CV-JD pairs. First, given a specific job seeker s with the CV C and a job j with the JD J, we build a prompt T to describe the generation task as shown in Figure 3. To maintain consistency with the training data, the original prompt is in Chinese. However, for better illustration, we have translated it to English in Figure 3. The prompt template consists of the following four parts:
• Role: the green words, which aim to keep consistency with the instruction-tuning data of the backbone we use.
• Instruction: the black words, which describe the generation task in natural language.
• Input: the blue words, which contain the information of the job seeker.
• Output: the black words, which are the generation target, i.e., the JD text. Note that this part will be blank in the inference phase.

Then, we propose to train the generator with the causal language model pre-training task. Specifically, given the generator G, the CV C, and the prompt template T, we optimize the negative log-likelihood for generating the JD J as:

L_sft = − log Pr(J | C, T, G) = − Σ_{i=1}^{l_j} log Pr(v_i | v_{<i}, C, T, G),   (1)

where l_j is the length of J and v_i is the i-th word in J. Pr(J | C, T, G) denotes the generation probability of J by the generator model G given the job seeker feature C and the prompt template T.
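As a concrete illustration of the objective in Eq. (1), the following is a minimal sketch, not the authors' implementation, of how the prompt could be assembled and the JD-only negative log-likelihood computed with a Hugging Face causal language model; the backbone name and the exact prompt wording are placeholders.

# Sketch of the SFT objective in Eq. (1): the loss is the causal-LM negative
# log-likelihood of the target JD tokens only; the prompt tokens are masked out.
# The model name and template wording are illustrative, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "bigscience/bloomz-7b1"  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def build_prompt(cv_text):
    # Role + Instruction + Input parts of the template; the Output part is left blank.
    return (
        "Human:\n"
        "Please generate a suitable job description based on the curriculum vitae "
        "of the following candidate:\n"
        f"{cv_text}\n"
        "Assistant:\n"
    )

def sft_loss(cv_text, jd_text):
    prompt_ids = tokenizer(build_prompt(cv_text), return_tensors="pt").input_ids
    target_ids = tokenizer(jd_text + (tokenizer.eos_token or ""), return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    # Labels of -100 mask the prompt, so only the JD tokens contribute to the NLL.
    labels = torch.cat([torch.full_like(prompt_ids, -100), target_ids], dim=1)
    return model(input_ids=input_ids, labels=labels).loss  # mean NLL over JD tokens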
[Figure: the three training steps of GIRL. Step 1 – Supervised Fine-tuning: collect matched CV-JD pairs, construct prompts with the manually designed template, and fine-tune the LLM-based generator with supervised learning. Step 2 – Reward Model Training: collect comparison data (matched and mismatched CV-JD pairs for the same CV), predict matching scores with the reward model, and train it with a ranking loss. Step 3 – PPO: sample new CVs to construct prompts, generate JDs with the policy generator, compute rewards with the reward model, and update the policy generator using PPO.]

Fig. 2. The training framework of the generative recommendation model.

Human:
Please generate a suitable job description based on the curriculum vitae of the following candidate:
Basic information: The candidate is a 27-year-old female with a bachelor's degree, majoring in Software Engineering.
Personal profile:
1. She has research experience in the field of data processing and computation, and development experience with components from ElasticSearch, HBase, and the Flink ecosystem.
2. She possesses strong abilities to read and understand source code, having reviewed core source code from JDK8, Spring, Spring Boot, and Netty.

Assistant:
Job title: Java Developer
Salary range: 25k-40k * 15
Responsibilities:
1. Support the recommendation product and algorithm departments, rapidly iterate daily needs, complete effect analysis and continuous improvements.
2. Track industry developments, timely introduction of external advanced experiences.
Qualifications:
1. Proficient in Java programming basics, with excellent abilities and experience in Java IO and multithreading programming.
2. In-depth understanding of JVM, JVM tuning experience, and experience with distributed systems and concurrency scenarios are preferred.
3. Proficient in applying mainstream development frameworks and open-source systems such as Spring Boot, MyBatis, MySQL, Redis, ES, Kafka, etc.
4. Good stress resistance, communication, learning, collaboration skills, and a strong sense of responsibility.
5. Prior experience in recommendation/search engineering development in Internet companies is preferred.

Fig. 3. The prompt template of training step one.

B. Reward Model Training

In this training step, our aim is to train a reward model U that can predict the matching score for a CV-JD pair, i.e., U : (C, J) → R. The architecture of U is similar to that of the generator model G, but it has a linear prediction head that outputs scalar values. Additionally, the parameter scale of U is smaller than that of G.

To train the reward model U, we collect pairwise training data and construct a ranking task. Typically, a job seeker applies for multiple jobs simultaneously and receives different feedback (matched or rejected) from recruiters. Therefore, we select a matched job J+ and a mismatched job J− for each CV C to construct comparable pairs. We then optimize the pairwise ranking loss to train U as follows:

L_rmt = − log σ(U(C, J+) − U(C, J−)),   (2)

where σ denotes the Sigmoid activation function.

This approach enables the reward model to capture the market preferences for job seekers based on the feedback from recruiters. Moreover, we can use the reward model to predict the matching score between a job seeker and a generated job description, thereby verifying the suitability of the recommendation in advance.
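A minimal sketch of the pairwise ranking loss in Eq. (2), assuming the scalar rewards for the matched and mismatched CV-JD pairs have already been produced by the reward model; the batch values below are dummy numbers for illustration only.

# Eq. (2): push the reward of the matched JD above that of the mismatched JD
# for the same CV, via the negative log-sigmoid of the score difference.
import torch
import torch.nn.functional as F

def reward_ranking_loss(score_pos, score_neg):
    # L_rmt = -log sigmoid(U(C, J+) - U(C, J-)), averaged over the batch.
    return -F.logsigmoid(score_pos - score_neg).mean()

# Dummy usage: rewards for a batch of 4 CV-JD comparison pairs.
score_pos = torch.tensor([1.2, 0.4, 0.9, 2.0])
score_neg = torch.tensor([0.3, 0.8, -0.1, 1.5])
loss = reward_ranking_loss(score_pos, score_neg)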
C. Reinforcement Learning

In this stage, we aim to improve the alignment between the generator G and the recruiter feedback captured by the reward model U through reinforcement learning. Drawing inspiration from InstructGPT [5], we employ the Proximal Policy Optimization (PPO) [25] algorithm to facilitate this alignment process. Specifically, we first utilize the generator G and the reward model U obtained from the first two training steps to initialize the actor-critic model, comprising the actor model G^a and the critic model U^c. Next, we collect an RL training dataset, which only consists of the CVs of job seekers which do not appear in the first two stages. Then, we use the PPO algorithm to train the actor-critic model based on these CVs while freezing the generator and the reward model. Finally, we use the actor as the new generator model. The entire optimization algorithm is an iterative process, and the ensuing sub-sections expound on the details of one iteration.

1) Job Description Generation: We first sample some CVs C^r from the training data and then leverage the actor model G^a to generate JDs J^r = {G^a(C) | C ∈ C^r} for these samples. For simplicity, we take the i-th sample C_i^r with its corresponding generated JD J_i^r as the example to illustrate the following calculation steps.

2) KL Divergence Computation: To ensure the convergence and stability of the RL algorithm, the PPO algorithm uses KL divergence to limit the range of changes in the policy during each update. The KL divergence is a metric for measuring the difference between the current policy, i.e., the actor model G^a, and the old policy, i.e., G. Specifically, given the pair of CV C_i^r and generated JD J_i^r, we can estimate the KL divergence as follows:

KL(C_i^r, J_i^r) = (1 / |J_i^r|) Σ_{v_{i,j} ∈ J_i^r} ( CE(v_{i,j}) − 1 − log CE(v_{i,j}) ),
CE(v_{i,j}) = Pr(v_{i,j} | v_{i,<j}, C_i^r, G^a) / Pr(v_{i,j} | v_{i,<j}, C_i^r, G),   (3)

where v_{i,j} and v_{i,<j} denote the j-th token and the first (j − 1) tokens of the JD J_i^r, respectively.

3) Reward and Advantage Computation: The final reward consists of two different parts, respectively the matching score predicted by the reward model and the KL divergence, and can be formulated as follows:

r_i = U(C_i^r, J_i^r) − λ KL(C_i^r, J_i^r),   (4)

where λ is the coefficient of the KL divergence. Furthermore, the advantage value is the difference between the reward and the value of the input CV estimated by the critic model:

a_i = r_i − U^c(C_i^r).   (5)

4) Actor Model Optimization: After obtaining the above values, we can finally calculate the policy loss, i.e., the loss of the actor model. Here, we use the importance sampling and clipping tricks to estimate the loss as:

L_am = − (1 / |J_i^r|) Σ_{v_{i,j} ∈ J_i^r} min( CE(v_{i,j}) a_i, clip(CE(v_{i,j})) a_i ),   (6)

clip(CE(v_{i,j})) = { 1 + ϵ, if CE(v_{i,j}) > 1 + ϵ;  CE(v_{i,j}), if 1 − ϵ < CE(v_{i,j}) < 1 + ϵ;  1 − ϵ, if CE(v_{i,j}) < 1 − ϵ },   (7)

where ϵ is the clipping parameter.

5) Critic Model Optimization: The critic model loss is the MSE loss between the reward value and the estimated state value:

L_cm = ( r_i − U^c(C_i^r) )^2.   (8)

The above five steps constitute one iteration of the optimization process. By minimizing the actor loss and the critic loss, we can optimize the two models. In the RL process, the reward model and the (old) generator model are frozen. Moreover, the whole RL process is shown in Algorithm 1.

Algorithm 1: Proximal Policy Optimization
Require: Initial actor model G^a, critic model U^c, optimization steps K, minibatch size B^r, epochs E, learning rates α_am and α_cm, clipping parameter ϵ, KL coefficient λ.
1: for iteration = 1, 2, . . . do
2:   Sample a set of CVs C^r from the training data.
3:   Generate JDs J^r for the sampled CVs with the generator model G^a.
4:   Compute the rewards {r_i}_{i=1}^{|C^r|} and the advantages {a_i}_{i=1}^{|C^r|} using Eq. (4) and Eq. (5).
5:   Update the actor model parameters θ(G^a) and the critic model parameters ϕ(U^c) as follows:
6:   for epoch = 1, 2, . . . , E do
7:     Shuffle the dataset D.
8:     Divide D into minibatches of size B^r.
9:     for each minibatch do
10:      Compute the policy loss L_am.
11:      Compute the value function loss L_cm.
12:      Update the actor model parameters using the policy loss and learning rate α_am: θ(G^a) ← θ(G^a) − α_am ∇_θ L_am.
13:      Update the critic model parameters using the value function loss and learning rate α_cm: ϕ(U^c) ← ϕ(U^c) − α_cm ∇_ϕ L_cm.
14:    end for
15:  end for
16: end for
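To make Eqs. (3)–(8) concrete, below is a minimal, self-contained sketch of the losses for a single (CV, generated JD) sample, assuming the per-token probabilities of the generated JD under the current actor and the frozen SFT generator are already available; it is an illustration of the formulas above, not the authors' implementation.

# One PPO-style update for a single sample, following Eqs. (3)-(8).
import torch

def ppo_losses(p_actor,        # Pr(v_j | v_<j, C, actor policy), shape [T]
               p_ref,          # Pr(v_j | v_<j, C, frozen generator G), shape [T]
               reward_score,   # U(C, J'): scalar from the reward model
               value,          # U^c(C): scalar state value from the critic
               kl_coef=0.1, clip_eps=0.2):
    ratio = p_actor / p_ref                                  # CE(v_j), Eq. (3)
    kl = (ratio - 1.0 - torch.log(ratio)).mean()             # KL estimate, Eq. (3)
    reward = reward_score - kl_coef * kl                     # r_i, Eq. (4)
    advantage = (reward - value).detach()                    # a_i, Eq. (5)

    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)              # Eq. (7)
    policy_loss = -torch.min(ratio * advantage, clipped * advantage).mean()   # Eq. (6)
    critic_loss = (reward.detach() - value) ** 2             # Eq. (8)
    return policy_loss, critic_loss

# Dummy usage: 5 generated tokens with fake probabilities and scores.
p_actor = torch.tensor([0.30, 0.22, 0.15, 0.40, 0.05], requires_grad=True)
p_ref = torch.tensor([0.28, 0.25, 0.10, 0.42, 0.06])
policy_loss, critic_loss = ppo_losses(p_actor, p_ref,
                                      reward_score=torch.tensor(0.8),
                                      value=torch.tensor(0.5, requires_grad=True))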
V. GENERATION-ENHANCED RECOMMENDATION FRAMEWORK

In Section IV we introduced the paradigm of generative job recommendation. As we mentioned before, given a CV corresponding to a job seeker, we can utilize LLMs to generate the most suitable JD, thereby providing career development guidance for this job seeker. Furthermore, in this paper, we propose that we can actually regard the above paradigm as a feature extraction process, which can further enhance the performance of traditional discriminative recommendation methods. In this section, we delve into the details of how to leverage the generated results provided by LLMs for enhanced job recommendation.

A. Basic Recommendation Model

As shown in Figure 1 (a), in the paradigm of discriminative recommendation based on text matching, given a job seeker s with the corresponding CV C, and a job j with the corresponding JD J, we first need to get the text embeddings based on a text encoder as:

c = Encoder(C),   j = Encoder(J).   (9)

Then, we can get the matching score by feeding the above embedding vectors to a predictor. In this paper, we studied two different predictors, respectively the MLP predictor:

score = MLP([c; j]),   (10)

where [;] is the concatenation of two vectors, and the dot predictor:

score = c · j,   (11)

where · calculates the dot product of two vectors.

B. Enhanced Recommendation Model

As shown in Figure 1 (c), in the paradigm of generation-enhanced job recommendation, we can get the generated JD J′ based on the CV C and the LLM-based generator G. Then, we can also get the text embedding of J′ as:

j′ = Encoder(J′).   (12)

After that, we propose two different ways to utilize j′ for enhancing the recommendation task, corresponding to the two different predictors. Specifically, for the MLP predictor, we propose to calculate the matching score as:

score = MLP([c; j; j′]).   (13)

For the dot predictor, we first get the enhanced job seeker embedding as:

c′ = MLP([c; j′]).   (14)

Then, we can calculate the dot product as:

score = c′ · j.   (15)
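A minimal sketch of the two enhanced predictors in Eqs. (13)–(15), assuming the CV, JD, and generated JD have already been encoded into fixed-size vectors (e.g., by a BERT encoder); the embedding dimension and the depth/width of the MLPs are assumptions for illustration.

# Generation-enhanced predictors of Eqs. (13)-(15); dimensions are illustrative.
import torch
import torch.nn as nn

class MLPPredictor(nn.Module):
    # Eq. (13): score = MLP([c; j; j']).
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, c, j, j_gen):
        return self.mlp(torch.cat([c, j, j_gen], dim=-1)).squeeze(-1)

class DotPredictor(nn.Module):
    # Eqs. (14)-(15): c' = MLP([c; j']), score = c' . j.
    def __init__(self, dim=768):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, c, j, j_gen):
        c_enh = self.fuse(torch.cat([c, j_gen], dim=-1))
        return (c_enh * j).sum(dim=-1)

# Dummy usage with batch size 2 and 768-dimensional embeddings.
c, j, j_gen = (torch.randn(2, 768) for _ in range(3))
scores_mlp = MLPPredictor()(c, j, j_gen)
scores_dot = DotPredictor()(c, j, j_gen)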
VI. EXPERIMENTS

In this section, we first describe the datasets used in this paper. Then, we propose to evaluate our approach from two different perspectives. We further present some discussions and case studies on generative job recommendation. The experiments are mainly designed to answer the following research questions:
• RQ1: Can our LLM-based generator generate high-quality JDs?
• RQ2: Can the generated results enhance the performance of discriminative job recommendation?
• RQ3: Are the specially designed training methods for the LLM effective?
• RQ4: How do different settings influence the effectiveness of our model?

A. Data Description and Preprocessing

The real-world datasets used in this paper come from one of the largest online recruitment platforms in China. In our datasets, each job seeker and recruiter is de-linked from the production system by securely hashing with one-time salt mapping. On this platform, each job seeker has a Curriculum Vitae (CV), encompassing their basic demographic information, educational background, and work experience among other details. Meanwhile, each job is associated with a Job Description (JD), detailing the responsibilities of the role, the compensation package, and so on. A variety of interaction types may occur between job seekers and jobs, such as browsing, applying, and matching. In this paper, we categorize these interactions into two major types, respectively matched and mismatched.

To train a large language model for generative job recommendation, we built the following three datasets:
• Supervised Fine-tuning Dataset: This dataset contains multiple matched CV-JD pairs, ranging from Apr. 1, 2023, to Apr. 30, 2023.
• Reward Model Training Dataset: This dataset contains multiple matched and mismatched CV-JD pairs, ranging from May 1, 2023, to May 7, 2023.
• Reinforcement Learning Dataset: This dataset contains CVs only, ranging from May 8, 2023, to May 10, 2023.

Furthermore, to evaluate whether the generated results can enhance the performance of traditional discriminative models, we built the following dataset:
• Enhanced Recommendation Dataset: This dataset contains multiple matched and mismatched CV-JD pairs, ranging from May 8, 2023, to May 31, 2023.

Detailed statistics of the above datasets are shown in Table I.

TABLE I
STATISTICS OF THE DATASETS.

Description                                                  Number
# of data for supervised fine-tuning                         153,006
# of data for reward model training                          303,929
# of data for reinforcement learning                         37,600
# of data in training set for enhanced recommendation        37,158
# of data in validation set for enhanced recommendation      4,542
# of data in test set for enhanced recommendation            6,300

B. Evaluation and Baselines

In this paper, we propose to evaluate the effectiveness of our GIRL approach from the following two perspectives. Firstly, with the assistance of ChatGPT, we evaluated the quality of the generated results from a semantic perspective. Secondly, we evaluated whether the generated results can enhance the performance of discriminative recommendation.

For generation quality evaluation, we first selected several baseline methods to compare with our method:
• GIRL: This is the method proposed in this paper, which utilized both SFT and RL for fine-tuning.
• GIRL-SFT: This method is a simplified variant of GIRL which only utilized SFT for fine-tuning.
• Other LLMs: BELLE-7b [26], BLOOMZ-7b [27], LLaMA-7b [28].

Furthermore, we propose to utilize ChatGPT as the evaluator to compare the generation quality of these methods. Specifically, we first input the CV and two different JDs generated by two different methods into the prompt. We then request ChatGPT to evaluate the results from the following three different perspectives:
• Level of details: Whether the generated JD contains enough necessary information about the job.
• Relevance: Whether the generated JD is suitable for the job seeker.
• Conciseness: Whether the generated JD is fluid and has high readability.

The detailed prompt template [29], [30] for generation quality evaluation is shown in Figure 4, from which we can find that the output results of ChatGPT can be divided into three categories, respectively "Win", "Tie", and "Lose". Based on the output results, given the dataset for generation quality evaluation, we selected "Win Rate (Win)", "Tie Rate (Tie)", and "Lose Rate (Lose)", which are obtained by calculating the proportions of the above three results, as three different evaluation metrics. Note that we use a bootstrapping [21] strategy to avoid position bias when using ChatGPT as the ranker. Furthermore, we define "Advantage (Adv.)", which is the difference between "Win Rate" and "Lose Rate", as another evaluation metric to reflect the relative improvement.

[Question]
Please generate a suitable job description based on the curriculum vitae of the following candidate: xxx
[Assistant 1]
XXX
[End of Assistant 1]
[Assistant 2]
XXX
[End of Assistant 2]
[System]
We would like to request your feedback on the performance of two AI assistants in recommending a job description (i.e., JD) to the job seeker displayed above. Please evaluate the given three aspects of their generated job descriptions:
Level of details: The job description must include the job title, job requirements, skill requirements, job responsibilities, and may include salary information.
Relevance: The job requirements need to match the candidate's skills, educational background, and work experience.
Conciseness: The job description must not contain repetition or redundancy. Unnecessary information, such as company introduction, interview format, interview location, and contact information, should be avoided as much as possible.
Please first clarify how each response achieves each aspect respectively. Then, provide a comparison on the overall performance among Assistant 1 - Assistant 2, and you need to clarify which one is better than or equal to another. Avoid any potential bias and ensure that the order in which the responses were presented does not affect your judgment. In the last line, order the two assistants. Please output a single line ordering Assistant 1 - Assistant 2, where '>' means 'is better than' and '=' means 'is equal to'. The order should be consistent with your comparison. If there is no comparison showing that one is better, it is assumed they have equivalent overall performance ('=').

Fig. 4. The prompt template for generation quality evaluation.

For evaluating the effectiveness of the generated results for recommendation enhancement, we selected several baseline methods to compare with our method:
• Base: This method is a traditional two-tower text matching model as shown in Figure 1 (a). We chose BERT [11] as the text encoder for getting the CV and JD embeddings.
• GIRL-SFT: As shown in Figure 1 (c), this method uses the generated JDs for recommendation enhancement. Only SFT is used for fine-tuning the LLM.
• GIRL: This method uses the generated JDs for recommendation enhancement. Both SFT and RL are used for fine-tuning the LLM.

Note that, as we mentioned in Section V, we proposed two different methods for the predictor, respectively MLP and Dot. We will test the performance of different models with these two different predictors. We selected AUC and LogLoss as the evaluation metrics for the enhanced recommendation task.

C. Performance of Generation Quality (RQ1, RQ3)

To validate the quality of the JDs generated by our model, we first built an evaluation set with 200 different CVs which do not appear in the other datasets. Then, we compared GIRL with all the baseline models on this dataset, and the results are shown in Table II.

TABLE II
COMPARISON OF GENERATION QUALITY ACROSS DIFFERENT MODELS.

Model Pair                  Win    Tie    Lose   Adv.
GIRL v.s. LLaMA-7b          0.63   0.07   0.30   0.33
GIRL v.s. BLOOMZ-7b         0.74   0.08   0.18   0.56
GIRL v.s. GIRL-SFT          0.45   0.25   0.26   0.19
GIRL v.s. BELLE-7b          0.45   0.17   0.36   0.09
GIRL-SFT v.s. LLaMA-7b      0.55   0.03   0.42   0.13
GIRL-SFT v.s. BLOOMZ-7b     0.73   0.07   0.19   0.54
GIRL-SFT v.s. BELLE-7b      0.49   0.06   0.41   0.08
BELLE-7b v.s. LLaMA-7b      0.61   0.04   0.34   0.27
BELLE-7b v.s. BLOOMZ-7b     0.65   0.07   0.17   0.48

From the results, we can get the following observations:
1) The performance of the BELLE model significantly surpasses that of LLaMA and BLOOMZ. This underlines that instruction-tuning with instructions on Chinese datasets can substantially enhance the quality of the outputs in Chinese.
2) Both GIRL and GIRL-SFT outperform all the baseline methods, emphasizing the necessity of instruction tuning on domain-specific data.
3) GIRL exceeds GIRL-SFT in performance, demonstrating that reinforcement learning can better align the results generated by the LLMs with human preferences, thereby improving the quality of the generated results.
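As an illustration of how the Win/Tie/Lose rates and the Advantage metric reported in Table II can be aggregated from ChatGPT's pairwise verdicts, here is a minimal sketch; parsing the verdict from the last line of ChatGPT's response is assumed to have already been done.

# Aggregate pairwise verdicts ("win" / "tie" / "lose", from the compared model's
# point of view) into the Win/Tie/Lose rates and the Advantage used in Table II.
from collections import Counter

def quality_metrics(verdicts):
    counts = Counter(verdicts)
    n = len(verdicts)
    win, tie, lose = counts["win"] / n, counts["tie"] / n, counts["lose"] / n
    return {"Win": win, "Tie": tie, "Lose": lose, "Adv.": win - lose}

# Dummy usage: verdicts for 10 evaluation CVs.
print(quality_metrics(["win", "win", "tie", "lose", "win",
                       "win", "lose", "tie", "win", "win"]))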
D. Performance of Enhanced Recommendation (RQ2, RQ3)

We compare GIRL with all the baseline methods on the enhanced recommendation task, and the results are shown in Table III.

TABLE III
OVERALL PERFORMANCE OF DIFFERENT MODELS ON THE DISCRIMINATIVE JOB RECOMMENDATION.

Predictor   Model      AUC(↑)            LogLoss(↓)
MLP         Base       0.6349            0.4043
MLP         GIRL-SFT   0.6438 (+1.4%)    0.3973 (+1.7%)
MLP         GIRL       0.6476 (+2.0%)    0.3908 (+3.3%)
Dot         Base       0.6258            0.4964
Dot         GIRL-SFT   0.6291 (+0.5%)    0.3688 (+20.3%)
Dot         GIRL       0.6436 (+2.8%)    0.3567 (+28.1%)

From the results, we can get the following observations:
1) Both GIRL and GIRL-SFT outperform the Base model, demonstrating that the JDs generated by fine-tuned LLMs can effectively enhance the performance of discriminative job recommendation.
2) GIRL surpasses GIRL-SFT on all the evaluation metrics. The rationale behind this is that through the reward model training stage, our reward model encapsulates extensive real-world experiences. By incorporating this knowledge into the LLMs through reinforcement learning, the generated JDs are enabled to capture job-seeker traits precisely and align with the preferences of recruiters better.

E. Discussion on Generation Number (RQ4)

In Section V we studied how to utilize the generated JD for enhancing discriminative recommendation, where we focused on utilizing a single JD. Indeed, owing to the inherent randomness in the text generation process, given a specific CV, the LLM is capable of generating multiple distinct JDs. In this section, we will explore how to utilize multiple generated JDs and discuss the influence of the number of JDs on the model performance. Specifically, given multiple JDs, we first get the text embedding of each JD by Equation (12). Then, we use mean pooling to fuse these JD embeddings, and calculate the matching score following Equation (13). The results are shown in Figure 5. Note that we employed only 75% of the data in Section VI-D to accelerate the computation process. From the results, we can find that as the number of generated JDs increases, the model performance initially improves before subsequently declining. This suggests that moderately increasing the number of generated JDs can further enhance model performance. However, a larger number of JDs also implies a substantial increase in computational cost. Moreover, the performance of GIRL surpasses that of GIRL-SFT in most cases, which once again affirms the superiority of the RL-based fine-tuning method proposed in this paper.

[Figure: line plot of AUC versus the number of generated JDs for GIRL and GIRL-SFT.]
Fig. 5. Performance of different models with different generation number.

F. Discussion on Cold Start (RQ4)

In this section, we will explore the performance of different models under the cold-start condition on the discriminative job recommendation task. Specifically, the cold-start condition refers to recommending jobs for job seekers who have not appeared in the training set. The results are shown in Table IV. Compared with Table III, we can find that the performance improvement of our models in cold-start conditions is more significant. This indicates that the JDs generated by LLMs can more effectively assist discriminative recommendation models in enhancing performance under cold-start conditions.

TABLE IV
PERFORMANCE OF DIFFERENT MODELS ON THE DISCRIMINATIVE JOB RECOMMENDATION UNDER COLD-START CONDITION.

Predictor   Model      AUC(↑)            LogLoss(↓)
MLP         Base       0.6198            0.4270
MLP         GIRL-SFT   0.6293 (+1.5%)    0.4154 (+2.8%)
MLP         GIRL       0.6347 (+2.4%)    0.4229 (+1.0%)
Dot         Base       0.6136            0.5233
Dot         GIRL-SFT   0.6231 (+1.5%)    0.3827 (+26.9%)
Dot         GIRL       0.6457 (+5.2%)    0.3673 (+29.8%)

G. Case Study

In this section, we will conduct a case study of the generated results from different models for the same CV, and the results are shown in Figure 6. From the results we can find that the vanilla BELLE model without fine-tuning fails to generate JDs in a standard format, and the generated JDs present vague descriptions of job-related skills and requirements, providing inadequate guidance for job seekers. Moreover, we can find that after being trained through reinforcement learning, the GIRL model generates results that are more standardized in format, more detailed and comprehensive in content, and more aligned with the individual circumstances of job seekers. The above results demonstrate the effectiveness of the three-stage training method proposed in this paper.
Input CV:
Basic information: The candidate has a bachelor's degree, majoring in Software Engineering.
personal profile:
1. He is cheerful and positive, with strong organizational and communication skills. His ability to learn quickly enables him to competently undertake work across various fields.
2. He served as a project assistant previously with four years of work experience. Prior job duties involved participating in project initiation, being responsible for holding project
bidding meetings, participating in the setting of project milestones, following up on project progress, and handling the daily management of the project and regular reporting
to the second party.

Generation Result from BELLE:


This candidate is suitable for roles such as Project Manager or Project Assistant. Job responsibilities include project management, progress control, risk management,
communication coordination, and funding supervision. They are expected to participate in project tendering meetings, set project milestones, and carry out daily management
tasks. They will be responsible for the collection and summary of project tail materials. Job requirements include relevant educational background and work experience, strong
communication, organizational and teamwork abilities, along with a robust learning capacity and work attitude. Salary varies according to different levels of work experience.

Generation Result from GIRL-SFT:


Job Title: Requirements Analyst Educational Requirements: Bachelor's Degree Experience Requirements: 1~3 years Salary: (15k~30k) * 12
Job Description:
1. Responsible for project requirement analysis, planning, and writing
2. Responsible for project requirement research, requirement analysis report writing, and requirement exploration, with a deep understanding of customer business processes
and business requirements
3. Able to formulate reasonable, effective, and clear requirement documents, ensuring the standardization and readability of documents, and sharing and explaining these
documents
4. Responsible for project plan formulation, requirement planning, as well as tracking and managing changes in requirements
5. Able to handle complex project requirements, analyze customer business logic, and provide industry-leading business requirements
6. Complete other tasks assigned by leaders.

Generation Result from GIRL:


Job Title: Project Assistant Educational Requirements: Bachelor's Degree Experience Requirements: 3~5 years Salary: (20k~35k) * 12
Job Description:
1. Responsible for project requirement analysis, planning, and planning, and organizing the writing of project requirement specification documents;
2. Responsible for project management, promoting project progress, coordinating all participants, and ensuring the smooth progress of the project;
3. Responsible for project risk management, identifying, evaluating, giving early warning and controlling project risks;
4. Responsible for project document preparation, including requirement specifications, project plans, execution plans, summary reports, etc;
5. Responsible for project delivery, monitoring and tracking the quality and quantity of deliverables.
Job Requirements:
1. Bachelor's degree or above, with more than 3 years of relevant work experience in the Internet or software industry;
2. Possess project management, system design, system analysis design abilities, familiar with software engineering architecture, and understand software project management
process;
3. Possess certain system design and system analysis abilities, able to independently conduct software design and system design work;
4. Possess good team collaboration ability, communication ability and learning ability, and can withstand a certain work pressure;
5. Possess better stress resistance, initiative in work, and can bear a certain work pressure;
6. Possess strong communication ability, logical thinking ability, problem-solving ability and learning ability;
7. Those with software project management experience are preferred.

Fig. 6. A case study of the generation results from different models.

VII. CONCLUSION

Reflecting on the recent advancements in the field of Large Language Models, this study presented a novel generative job recommendation paradigm named GeneratIve job Recommendation based on Large language model (GIRL). Specifically, we first utilized supervised fine-tuning to guide the LLM-based generator in creating an appropriate job description given a specific curriculum vitae. Subsequently, we developed a reward model predicated on feedback from recruiters, and then implemented a proximal policy optimization based reinforcement learning methodology to synchronize the generator with recruiter preferences. Furthermore, we proposed to enhance the job seeker features by the generated results, aiming to improve the performance of the discriminative job recommendation model. The series of experiments conducted on a real-world dataset from a large-scale online recruitment platform provided substantial evidence of the effectiveness of our proposed approach.

REFERENCES

[1] R. Le, W. Hu, Y. Song, T. Zhang, D. Zhao, and R. Yan, "Towards effective and interpretable person-job fitting," in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, November 3-7, 2019. ACM, 2019, pp. 1883–1892.
[2] C. Qin, H. Zhu, T. Xu, C. Zhu, L. Jiang, E. Chen, and H. Xiong, "Enhancing person-job fit for talent recruitment: An ability-aware neural network approach," in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018. ACM, 2018, pp. 25–34.
[3] C. Zhu, H. Zhu, H. Xiong, C. Ma, F. Xie, P. Ding, and P. Li, "Person-job fit: Adapting the right talent for the right job with joint representation learning," ACM Trans. Manag. Inf. Syst., vol. 9, no. 3, pp. 12:1–12:17, 2018.
[4] L. Wu, Z. Zheng, Z. Qiu, H. Wang, H. Gu, T. Shen, C. Qin, C. Zhu, H. Zhu, Q. Liu et al., "A survey on large language models for recommendation," arXiv preprint arXiv:2305.19860, 2023.
[5] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., "Training language models to follow instructions with human feedback," Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022.
[6] C. Yang, Y. Hou, Y. Song, T. Zhang, J. Wen, and W. X. Zhao, "Modeling two-way selection preference for person-job fit," in RecSys '22: Sixteenth ACM Conference on Recommender Systems, Seattle, WA, USA, September 18-23, 2022. ACM, 2022, pp. 102–112.
[7] B. Fu, H. Liu, Y. Zhu, Y. Song, T. Zhang, and Z. Wu, "Beyond matching: Modeling two-sided multi-behavioral sequences for dynamic person-job fit," in Database Systems for Advanced Applications - 26th International Conference, DASFAA 2021, Taipei, Taiwan, April 11-14, 2021, Proceedings, Part II, ser. Lecture Notes in Computer Science, vol. 12682. Springer, 2021, pp. 359–375.
[8] B. Min, H. Ross, E. Sulem, A. P. B. Veyseh, T. H. Nguyen, O. Sainz, E. Agirre, I. Heinz, and D. Roth, "Recent advances in natural language processing via large pre-trained language models: A survey," arXiv preprint arXiv:2111.01243, 2021.
[9] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong et al., "A survey of large language models," arXiv preprint arXiv:2303.18223, 2023.
[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[11] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in NAACL-HLT (1). Association for Computational Linguistics, 2019, pp. 4171–4186.
[12] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis,
L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized BERT
pretraining approach,” CoRR, vol. abs/1907.11692, 2019.
[13] Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and
Q. V. Le, “Xlnet: Generalized autoregressive pretraining for language
understanding,” in NeurIPS, 2019, pp. 5754–5764.
[14] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al., “Improving
language understanding by generative pre-training,” 2018.
[15] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al.,
“Language models are unsupervised multitask learners,” OpenAI blog,
vol. 1, no. 8, p. 9, 2019.
[16] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal,
A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language mod-
els are few-shot learners,” Advances in neural information processing
systems, vol. 33, pp. 1877–1901, 2020.
[17] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin,
C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton,
F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano,
J. Leike, and R. Lowe, “Training language models to follow instructions
with human feedback,” in NeurIPS, 2022.
[18] Z. Qiu, X. Wu, J. Gao, and W. Fan, “U-bert: Pre-training user repre-
sentations for improved recommendation,” in Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 35, no. 5, 2021, pp. 4320–
4327.
[19] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, and P. Jiang, “Bert4rec:
Sequential recommendation with bidirectional encoder representations
from transformer,” in Proceedings of the 28th ACM international confer-
ence on information and knowledge management, 2019, pp. 1441–1450.
[20] J. Liu, C. Liu, R. Lv, K. Zhou, and Y. Zhang, “Is chatgpt a good
recommender? a preliminary study,” arXiv preprint arXiv:2304.10149,
2023.
[21] Y. Hou, J. Zhang, Z. Lin, H. Lu, R. Xie, J. McAuley, and W. X.
Zhao, “Large language models are zero-shot rankers for recommender
systems,” arXiv preprint arXiv:2305.08845, 2023.
[22] D. Sileo, W. Vossen, and R. Raymaekers, “Zero-shot recommendation
as language modeling,” in Advances in Information Retrieval: 44th
European Conference on IR Research, ECIR 2022, Stavanger, Norway,
April 10–14, 2022, Proceedings, Part II. Springer, 2022, pp. 223–230.
[23] K. Bao, J. Zhang, Y. Zhang, W. Wang, F. Feng, and X. He, “Tallrec: An
effective and efficient tuning framework to align large language model
with recommendation,” arXiv preprint arXiv:2305.00447, 2023.
[24] J. Zhang, R. Xie, Y. Hou, W. X. Zhao, L. Lin, and J.-R. Wen, “Recom-
mendation as instruction following: A large language model empowered
recommendation approach,” arXiv preprint arXiv:2305.07001, 2023.
[25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov,
“Proximal policy optimization algorithms,” CoRR, vol. abs/1707.06347,
2017.
[26] Y. Ji, Y. Deng, Y. Gong, Y. Peng, Q. Niu, B. Ma, and X. Li,
“Belle: Be everyone’s large language model engine,” https://github.com/
LianjiaTech/BELLE, 2023.
[27] N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman,
T. L. Scao, M. S. Bari, S. Shen, Z.-X. Yong, H. Schoelkopf, X. Tang,
D. Radev, A. F. Aji, K. Almubarak, S. Albanie, Z. Alyafeai, A. Webson,
E. Raff, and C. Raffel, “Crosslingual generalization through multitask
finetuning,” 2022.
[28] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux,
T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al.,
“Llama: Open and efficient foundation language models,” arXiv preprint
arXiv:2302.13971, 2023.
[29] Z. Chen, J. Chen, H. Zhang, F. Jiang, G. Chen, F. Yu, T. Wang, J. Liang,
C. Zhang, Z. Zhang, J. Li, X. Wan, H. Li, and B. Wang, “Llm zoo: de-
mocratizing chatgpt,” https://github.com/FreedomIntelligence/LLMZoo,
2023.
[30] Z. Chen, F. Jiang, J. Chen, T. Wang, F. Yu, G. Chen, H. Zhang,
J. Liang, C. Zhang, Z. Zhang, J. Li, X. Wan, B. Wang, and H. Li,
“Phoenix: Democratizing chatgpt across languages,” arXiv preprint
arXiv:2304.10453, 2023.
