Generative Job Recommendations With Large Language Model
Zhi Zheng1,2,†, Zhaopeng Qiu1,†, Xiao Hu1, Likang Wu1,2, Hengshu Zhu1*, Hui Xiong3,4*
1 Career Science Lab, BOSS Zhipin
2 University of Science and Technology of China
3 The Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology
4 The Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
zhengzhi97@mail.ustc.edu.cn, zhpengqiu@gmail.com, zhuhengshu@gmail.com, xionghui@ust.hk
arXiv:2307.02157v1 [cs.IR] 5 Jul 2023

Abstract—The rapid development of online recruitment services has encouraged the utilization of recommender systems ...
[Figure: comparison of job recommendation paradigms. (a) The discriminative paradigm: the curriculum vitae (CV) and job description (JD) are encoded into embeddings and fed to a predictor that outputs a recommendation. An LLM-based generator produces a generated JD from the CV, which (c) can additionally be fed to the predictor for generation-enhanced recommendation.]
the length of C. Similarly, each job j has a corresponding JD, which can be formatted as J = [v_1, ..., v_{l_j}], where v_i is the i-th word in J and l_j is the length of J. Note that we omit some subscripts to facilitate the reading. In the traditional paradigm, the goal is to predict the matching score between a given job seeker s and a job j. However, this traditional paradigm can only recommend existing jobs to job seekers, which may not fulfill the needs of some job seekers. Therefore, in this paper, we propose a novel generative job recommendation paradigm, which can be formulated as learning a generator that directly generates a suitable JD for a given CV.

[Figure: the three-step training framework. Step 1 – Supervised Fine-tuning: collect matched data and train a supervised generator. Step 2 – Reward Model Training: collect comparison data and train a reward model. Step 3 – PPO: refine the generator using reinforcement learning.]

We design the prompt template to be consistent with the instruction-tuning data of our backbone model:
• Instruction: the black words, which describe the generation task in natural language.
• Input: the blue words, which contain the information of ..., i.e., the JD text. Note that this part will be blank in the inference phase.

Then, we propose to train the generator with the causal language model pre-training task. Specifically, given the generator G, the CV C, and the prompt template T, we optimize the negative log-likelihood of generating the JD J as:
\mathcal{L}_{sft} = -\log \Pr(J \mid C, T, \mathcal{G}) = -\sum_{i=1}^{l_j} \log \Pr(v_i \mid v_{<i}, C, T, \mathcal{G}),    (1)

where l_j is the length of J, v_i is the i-th word in J, and \Pr(J \mid C, T, \mathcal{G}) denotes the generation probability of J by the generator model G given the job seeker feature C and the prompt template T.
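For illustration, Eq. (1) corresponds to the following minimal PyTorch sketch of a token-level negative log-likelihood over the JD tokens only; the masking convention (prompt positions marked with the ignore index -100) and the tensor shapes are illustrative assumptions, not the exact implementation.

import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the JD tokens (in the spirit of Eq. (1)).

    logits: (batch, seq_len, vocab) next-token scores from the generator.
    labels: (batch, seq_len) target token ids; prompt positions set to -100
            so that only JD tokens contribute to the loss.
    """
    # Shift so that position t predicts token t+1, as in causal LM training.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # prompt/instruction tokens are excluded
    )

# Toy usage with random logits: batch of 2, sequence of 8, vocabulary of 100.
logits = torch.randn(2, 8, 100)
labels = torch.randint(0, 100, (2, 8))
labels[:, :4] = -100  # pretend the first 4 positions belong to the prompt
print(sft_loss(logits, labels))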
Fig. 3. The prompt template of training step one:

Human: Please generate a suitable job description based on the curriculum vitae of the following candidate:
Basic information: The candidate is a 27-year-old female with a bachelor's degree, majoring in Software Engineering.
Personal profile:
1. She has research experience in the field of data processing and computation, and development experience with components from ElasticSearch, HBase, and the Flink ecosystem.
2. She possesses strong abilities to read and understand source code, having reviewed core source code from JDK8, Spring, Spring Boot, and Netty.

Assistant: Job title: Java Developer
Salary range: 25k-40k * 15
Responsibilities:
1. Support the recommendation product and algorithm departments, rapidly iterate on daily needs, and complete effect analysis and continuous improvements.
2. Track industry developments and introduce external advanced experience in a timely manner.
Qualifications:
1. Proficient in Java programming basics, with excellent abilities and experience in Java IO and multithreading programming.
2. In-depth understanding of the JVM; JVM tuning experience and experience with distributed systems and concurrency scenarios are preferred.
3. Proficient in applying mainstream development frameworks and open-source systems such as Spring Boot, MyBatis, MySQL, Redis, ES, Kafka, etc.
4. Good stress resistance, communication, learning, and collaboration skills, and a strong sense of responsibility.
5. Prior experience in recommendation/search engineering development at Internet companies is preferred.

B. Reward Model Training

In this training step, our aim is to train a reward model U that can predict the matching score of a CV-JD pair, i.e., U : (C, J) → R. The architecture of U is similar to that of the generator model G, but it has a linear prediction head that outputs scalar values. Additionally, the parameter scale of U is smaller than that of G.

To train the reward model U, we collect pairwise training data and construct a ranking task. Typically, a job seeker applies for multiple jobs simultaneously and receives different feedback (matched or rejected) from recruiters. Therefore, we select a matched job J^+ and a mismatched job J^- for each CV C to construct comparable pairs. We then minimize the pairwise ranking loss to train U as follows:

\mathcal{L}_{rmt} = -\log \sigma\big(\mathcal{U}(C, J^+) - \mathcal{U}(C, J^-)\big),    (2)

where σ denotes the Sigmoid activation function.

This approach enables the reward model to capture the market preferences for job seekers based on the feedback from recruiters. Moreover, we can use the reward model to predict the matching score between a job seeker and a generated job description, thereby verifying the suitability of the recommendation in advance.
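As an illustration of the pairwise objective, the following is a minimal PyTorch sketch; the scalar scores stand in for the outputs of a hypothetical reward model U, and the -log σ(·) form follows Eq. (2).

import torch
import torch.nn.functional as F

def reward_ranking_loss(score_pos: torch.Tensor, score_neg: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss in the form of Eq. (2).

    score_pos: U(C, J+) for matched CV-JD pairs, shape (batch,).
    score_neg: U(C, J-) for mismatched pairs of the same CVs, shape (batch,).
    """
    # -log sigma(U(C,J+) - U(C,J-)); logsigmoid is the numerically stable form.
    return -F.logsigmoid(score_pos - score_neg).mean()

# Toy usage with made-up scores from a hypothetical reward model.
score_pos = torch.tensor([1.2, 0.3, 2.0])
score_neg = torch.tensor([0.1, 0.5, -1.0])
print(reward_ranking_loss(score_pos, score_neg))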
C. Reinforcement Learning

In this stage, we aim to improve the alignment between the generator G and the recruiter feedback acquired by the reward model U through reinforcement learning. Drawing inspiration from InstructGPT [5], we employ the Proximal Policy Optimization (PPO) [25] algorithm to facilitate this alignment process. Specifically, we first utilize the generator G and the reward model U obtained from the first two training steps to initialize the actor-critic model, comprising the actor model G^a and the critic model U^c. Next, we collect an RL training
dataset, which only consists of the CVs of job seekers that do not appear in the first two stages. Then, we use the PPO algorithm to train the actor-critic model based on these CVs while freezing the original generator and the reward model. Finally, we use the actor as the new generator model. The entire optimization algorithm is an iterative process, and the ensuing sub-sections expound on the details of one iteration.

1) Job Description Generation: We first sample some CVs C^r from the training data and then leverage the actor model G^a to generate JDs J^r = {G^a(C) | C ∈ C^r} for these samples. For simplicity, we take the i-th sample C^r_i with its corresponding generated JD J^r_i as the example to illustrate the following calculation steps.

2) KL Divergence Computation: To ensure the convergence and stability of the RL algorithm, the PPO algorithm uses the KL divergence to limit the range of changes in the policy during each update. The KL divergence is a metric for measuring the difference between the current policy, i.e., the actor model G^a, and the old policy, i.e., G. Specifically, given the pair of CV C^r_i and generated JD J^r_i, we can estimate the KL divergence as follows:

KL(C^r_i, J^r_i) = \frac{1}{|J^r_i|} \sum_{v_{i,j} \in J^r_i} \big( CE(v_{i,j}) - 1 - \log CE(v_{i,j}) \big), \qquad
CE(v_{i,j}) = \frac{\Pr(v_{i,j} \mid v_{i,<j}, C^r_i, \mathcal{G}^a)}{\Pr(v_{i,j} \mid v_{i,<j}, C^r_i, \mathcal{G})},    (3)

where v_{i,j} and v_{i,<j} denote the j-th token and the first (j-1) tokens of the JD J^r_i, respectively.

3) Reward and Advantage Computation: The final reward consists of two parts, namely the matching score predicted by the reward model and the KL divergence, and can be formulated as follows:

r_i = \mathcal{U}(C^r_i, J^r_i) - \lambda \, KL(C^r_i, J^r_i),    (4)

where λ is the coefficient of the KL divergence. Furthermore, the advantage value is the difference between the reward and the value of the input CV estimated by the critic model:

a_i = r_i - \mathcal{U}^c(C^r_i).    (5)
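The per-sequence quantities in Eqs. (3)-(5) can be sketched as follows; this is an illustrative simplification in which the token log-probabilities, the reward-model score, and the critic value are supplied as ready-made inputs rather than produced by the actual models.

import torch

def kl_estimate(logp_actor: torch.Tensor, logp_old: torch.Tensor) -> torch.Tensor:
    """Eq. (3): average of (ratio - 1 - log ratio) over the generated JD tokens.

    logp_actor: (jd_len,) log-probs of the generated tokens under the actor G^a.
    logp_old:   (jd_len,) log-probs of the same tokens under the frozen generator G.
    """
    log_ratio = logp_actor - logp_old
    ratio = log_ratio.exp()
    return (ratio - 1.0 - log_ratio).mean()

def reward_and_advantage(match_score, kl, value, kl_coef=0.1):
    """Eq. (4): r = U(C, J) - lambda * KL;  Eq. (5): a = r - U^c(C)."""
    reward = match_score - kl_coef * kl
    advantage = reward - value
    return reward, advantage

# Toy usage with made-up log-probabilities, reward-model score, and critic value.
logp_actor = torch.log(torch.tensor([0.30, 0.20, 0.50]))
logp_old = torch.log(torch.tensor([0.25, 0.25, 0.40]))
kl = kl_estimate(logp_actor, logp_old)
reward, advantage = reward_and_advantage(match_score=0.8, kl=kl, value=0.5)
print(kl.item(), reward.item(), advantage.item())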
4) Actor Model Optimization: After obtaining the above values, we can finally calculate the policy loss, i.e., the loss of the actor model. Here, we use the importance sampling and clip tricks to estimate the loss.
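A standard PPO-clip form of this loss, written with an importance ratio ρ_i and the advantage a_i from Eq. (5), is given below for concreteness; this is a conventional formulation under the stated assumptions, not necessarily the paper's exact expression:

\rho_i = \frac{\Pr(J_i^r \mid C_i^r, \mathcal{G}^a)}{\Pr(J_i^r \mid C_i^r, \mathcal{G})}, \qquad
\mathcal{L}_{am} = -\min\big(\rho_i \, a_i, \ \mathrm{clip}(\rho_i, 1-\epsilon, 1+\epsilon) \, a_i\big),

where ε is the clipping parameter listed in Algorithm 1.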
5) Critic Model Optimization: The critic model loss is the MSE loss between the reward value and the estimated state value:

\mathcal{L}_{cm} = \big( r_i - \mathcal{U}^c(C^r_i) \big)^2.    (8)

The above five steps constitute one iteration of the optimization process. By minimizing the actor loss and the critic loss, we can optimize the two models. In the RL process, the reward model and the original generator model are frozen. The whole RL process is shown in Algorithm 1.

Algorithm 1: Proximal Policy Optimization
Require: Initial actor model G^a, critic model U^c, optimization steps K, minibatch size B_r, epochs E, learning rates α_am and α_cm, clipping parameter ε, KL coefficient λ.
1: for iteration = 1, 2, ... do
2:   Sample a set of CVs C^r from the training data.
3:   Generate JDs J^r for the sampled CVs with the actor model G^a.
4:   Compute the discounted rewards {r_i}_{i=1}^{|C^r|} and the advantages {a_i}_{i=1}^{|C^r|} using Eq. (4) and Eq. (5).
5:   Update the actor model parameters θ(G^a) and the critic model parameters φ(U^c) as follows:
6:   for epoch = 1, 2, ..., E do
7:     Shuffle the dataset D.
8:     Divide D into minibatches of size B_r.
9:     for each minibatch do
10:      Compute the policy loss L_am.
11:      Compute the value function loss L_cm.
12:      Update the actor model parameters using the policy loss and learning rate α_am: θ(G^a) ← θ(G^a) − α_am ∇_θ L_am.
13:      Update the critic model parameters using the value function loss and learning rate α_cm: φ(U^c) ← φ(U^c) − α_cm ∇_φ L_cm.
14:    end for
15:  end for
16: end for
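To tie steps 4 and 5 together, the following compact sketch shows one minibatch update in the spirit of Algorithm 1. The actor and critic are stood in for by toy linear heads over dummy features, and all rollout quantities are synthetic, so this only illustrates the update mechanics rather than the actual implementation.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for the actor G^a and critic U^c: tiny heads over dummy CV features.
feat_dim, vocab = 16, 50
actor_head = nn.Linear(feat_dim, vocab)   # produces next-token logits
critic_head = nn.Linear(feat_dim, 1)      # produces a scalar state value
opt_actor = torch.optim.SGD(actor_head.parameters(), lr=1e-3)
opt_critic = torch.optim.SGD(critic_head.parameters(), lr=1e-3)

# One minibatch of pre-collected rollout data (dummy values for illustration).
feats = torch.randn(4, feat_dim)          # features of the sampled CVs
tokens = torch.randint(0, vocab, (4,))    # one generated token per sample (simplified)
logp_old = torch.randn(4)                 # log-probs under the frozen generator G
advantages = torch.randn(4)               # a_i as in Eq. (5)
rewards = torch.randn(4)                  # r_i as in Eq. (4)
eps = 0.2                                 # clipping parameter

# Policy (actor) loss with the PPO clip trick.
logp_new = torch.log_softmax(actor_head(feats), dim=-1)[torch.arange(4), tokens]
ratio = (logp_new - logp_old).exp()
clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
loss_actor = -torch.min(ratio * advantages, clipped * advantages).mean()

# Critic loss: MSE between reward and estimated state value, as in Eq. (8).
values = critic_head(feats).squeeze(-1)
loss_critic = ((rewards - values) ** 2).mean()

opt_actor.zero_grad(); loss_actor.backward(); opt_actor.step()
opt_critic.zero_grad(); loss_critic.backward(); opt_critic.step()
print(loss_actor.item(), loss_critic.item())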
V. GENERATION-ENHANCED RECOMMENDATION FRAMEWORK
After that, we propose two different ways to utilize j' for enhancing the recommendation task, corresponding to the two different predictors. Specifically, for the MLP predictor, we propose to calculate the matching score as:

score = MLP([c; j; j']).    (13)

For the dot predictor, we first get the enhanced job seeker embedding as:

c' = MLP([c; j']).    (14)

Then, we can calculate the matching score as the dot product:

score = c' · j.    (15)
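A minimal PyTorch sketch of the two enhanced predictors in Eqs. (13)-(15) follows; the embedding dimension and the two-layer MLPs are illustrative choices, not the paper's configuration.

import torch
import torch.nn as nn

class MLPPredictor(nn.Module):
    """score = MLP([c; j; j']) as in Eq. (13)."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, c, j, j_gen):
        return self.mlp(torch.cat([c, j, j_gen], dim=-1)).squeeze(-1)

class DotPredictor(nn.Module):
    """c' = MLP([c; j']) and score = c' . j, as in Eqs. (14)-(15)."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, c, j, j_gen):
        c_enh = self.mlp(torch.cat([c, j_gen], dim=-1))
        return (c_enh * j).sum(dim=-1)

# Toy usage: c, j, j_gen are the CV, JD, and generated-JD embeddings.
dim = 32
c, j, j_gen = torch.randn(4, dim), torch.randn(4, dim), torch.randn(4, dim)
print(MLPPredictor(dim)(c, j, j_gen).shape, DotPredictor(dim)(c, j, j_gen).shape)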
VI. EXPERIMENTS

In this section, we first describe the datasets used in this paper. Then, we propose to evaluate our approach from two different perspectives. We further present some discussions and case studies on generative job recommendation. The experiments are mainly designed to answer the following research questions:
• RQ1: Can our LLM-based generator generate high-quality JDs?
• RQ2: Can the generated results enhance the performance of discriminative job recommendation?
• RQ3: Are the specially designed training methods for the LLM effective?
• RQ4: How do different settings influence the effectiveness of our model?

A. Datasets

• Supervised Fine-tuning Dataset: This dataset contains multiple matched CV-JD pairs, ranging from Apr. 1, 2023, to Apr. 30, 2023.
• Reward Model Training Dataset: This dataset contains multiple matched and mismatched CV-JD pairs, ranging from May 1, 2023, to May 7, 2023.
• Reinforcement Learning Dataset: This dataset contains CVs only, ranging from May 8, 2023, to May 10, 2023.

Furthermore, to evaluate whether the generated results can enhance the performance of traditional discriminative models, we built the following dataset:
• Enhanced Recommendation Dataset: This dataset contains multiple matched and mismatched CV-JD pairs, ranging from May 8, 2023, to May 31, 2023.

Detailed statistics of the above datasets are shown in Table I.

B. Evaluation and Baselines

In this paper, we propose to evaluate the effectiveness of our GIRL approach from the following two perspectives. Firstly, with the assistance of ChatGPT, we evaluate the quality of the generated results from a semantic perspective. Secondly, we evaluate whether the generated results can enhance the performance of discriminative recommendation.

For generation quality evaluation, we first selected several baseline methods to compare with our method:
• GIRL: This is the method proposed in this paper, which utilizes both SFT and RL for fine-tuning.
• GIRL-SFT: This method is a simplified variant of GIRL which only utilizes SFT for fine-tuning.
• Other LLMs: BELLE-7b [26], BLOOMZ-7b [27], LLaMA-7b [28].

Furthermore, we propose to utilize ChatGPT as the evaluator to compare the generation quality of these methods. Specifically, we first input the CV and two different JDs generated by two different methods into the prompt. We then request ChatGPT to evaluate the results from the following three different perspectives:
• Level of details: Whether the generated JD contains enough necessary information about the job.
• Relevance: Whether the generated JD is suitable for the job seeker.
• Conciseness: Whether the generated JD is fluid and has high readability.

The detailed prompt template [29], [30] for generation quality evaluation is shown in Figure 4, from which we can find that the output results of ChatGPT can be divided into three categories, namely "Win", "Tie", and "Lose". Based on the output results, given the dataset for generation quality evaluation, we selected "Win Rate (Win)", "Tie Rate (Tie)", and "Lose Rate (Lose)", which are obtained by calculating the proportion of the above three results, as three different evaluation metrics. Note that we use a bootstrapping [21] strategy to avoid position bias when using ChatGPT as the ranker. Furthermore, we define "Advantage (Adv.)", which is the difference between "Win Rate" and "Lose Rate", as another evaluation metric to reflect the relative improvement.

Fig. 4. The prompt template for generation quality evaluation:

[Question]
Please generate a suitable job description based on the curriculum vitae of the following candidate: xxx
[Assistant 1]
XXX
[End of Assistant 1]
[Assistant 2]
XXX
[End of Assistant 2]
[System]
We would like to request your feedback on the performance of two AI assistants in recommending a job description (i.e., JD) to the job seeker displayed above. Please evaluate the following three aspects of their generated job descriptions:
Level of details: The job description must include the job title, job requirements, skill requirements, and job responsibilities, and may include salary information.
Relevance: The job requirements need to match the candidate's skills, educational background, and work experience.
Conciseness: The job description must not contain repetition or redundancy. Unnecessary information, such as company introduction, interview format, interview location, and contact information, should be avoided as much as possible.
Please first clarify how each response achieves each aspect respectively. Then, provide a comparison of the overall performance of Assistant 1 and Assistant 2, and clarify which one is better than or equal to the other. Avoid any potential bias and ensure that the order in which the responses were presented does not affect your judgment. In the last line, order the two assistants: output a single line ordering Assistant 1 and Assistant 2, where '>' means 'is better than' and '=' means 'is equal to'. The order should be consistent with your comparison. If no comparison shows that one is better, it is assumed they have equivalent overall performance ('=').
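For reference, the Win/Tie/Lose rates and the Advantage metric described above can be computed from the per-pair verdicts as in the following sketch; the verdict strings and the pooling of order-swapped comparisons are illustrative assumptions about how the judgments might be stored, not the paper's evaluation code.

from collections import Counter

def generation_quality_metrics(verdicts):
    """Win/Tie/Lose rates and Advantage (Win - Lose) from pairwise verdicts.

    verdicts: list of strings in {"win", "tie", "lose"}, one per evaluated CV,
              from the viewpoint of the model being tested.
    """
    counts = Counter(verdicts)
    n = len(verdicts)
    win, tie, lose = counts["win"] / n, counts["tie"] / n, counts["lose"] / n
    return {"Win": win, "Tie": tie, "Lose": lose, "Adv.": win - lose}

# Toy usage: verdicts pooled over both presentation orders of the two JDs,
# so that position bias averages out.
verdicts = ["win", "win", "tie", "lose", "win", "lose", "tie", "win"]
print(generation_quality_metrics(verdicts))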
For evaluating the effectiveness of the generated results for recommendation enhancement, we selected several baseline methods to compare with our method:
• Base: This method is a traditional two-tower text matching model, as shown in Figure 1 (a). We chose BERT [11] as the text encoder for getting the CV and JD embeddings.
• GIRL-SFT: As shown in Figure 1 (c), this method uses the generated JDs for recommendation enhancement. Only SFT is used for fine-tuning the LLM.
• GIRL: This method uses the generated JDs for recommendation enhancement. Both SFT and RL are used for fine-tuning the LLM.

Note that, as mentioned in Section V, we proposed two different methods for the predictor, namely MLP and Dot. We will test the performance of different models with these two different predictors. We selected AUC and LogLoss as the evaluation metrics for the enhanced recommendation task.
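As a small reference for these two metrics, the sketch below computes them with scikit-learn; the labels and scores are made up purely for illustration.

from sklearn.metrics import log_loss, roc_auc_score

# Toy ground-truth matches (1 = matched CV-JD pair) and predicted scores.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.2, 0.65, 0.4, 0.35, 0.1]

print("AUC:", roc_auc_score(y_true, y_score))
print("LogLoss:", log_loss(y_true, y_score))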
TABLE II
COMPARISON OF GENERATION QUALITY ACROSS DIFFERENT MODELS.

Model Pair                Win    Tie    Lose   Adv.
GIRL vs. LLaMA-7b         0.63   0.07   0.30   0.33
GIRL vs. BLOOMZ-7b        0.74   0.08   0.18   0.56
GIRL vs. GIRL-SFT         0.45   0.25   0.26   0.19
GIRL vs. BELLE-7b         0.45   0.17   0.36   0.09
GIRL-SFT vs. LLaMA-7b     0.55   0.03   0.42   0.13
GIRL-SFT vs. BLOOMZ-7b    0.73   0.07   0.19   0.54
GIRL-SFT vs. BELLE-7b     0.49   0.06   0.41   0.08
BELLE-7b vs. LLaMA-7b     0.61   0.04   0.34   0.27
BELLE-7b vs. BLOOMZ-7b    0.65   0.07   0.17   0.48

C. Performance of Generation Quality (RQ1, RQ3)

To validate the quality of the JDs generated by our model, we first built an evaluation set with 200 different CVs that do not appear in any other dataset. Then, we compared GIRL with all the baseline models on this dataset, and the results are shown in Table II. From the results, we can make the following observations:
1) The performance of the BELLE model significantly surpasses that of LLaMA and BLOOMZ. This underlines that instruction-tuning with instructions on Chinese datasets can substantially enhance the quality of the outputs in Chinese.
2) Both GIRL and GIRL-SFT outperform all the baseline methods, emphasizing the necessity of instruction tuning on domain-specific data.
3) GIRL exceeds GIRL-SFT in performance, demonstrating that reinforcement learning can better align the results generated by the LLMs with human preferences, thereby improving the quality of the generated results.

D. Performance of Enhanced Recommendation (RQ2, RQ3)

To demonstrate the effectiveness of the generation results for enhancing the discriminative recommendation task, we ...

[Table III: Overall performance of different models on the discriminative job recommendation.]
[Table IV: Performance of different models on the discriminative job recommendation under the cold-start condition.]

... in Figure 5. Note that we employed only 75% of the data used in Section VI-D to accelerate the computation process. From the results, we can find that as the number of generated JDs increases, the model performance initially improves before ...