Major Project Report
LANGUAGE PROCESSING
by
We declare that this written submission represents our ideas in our own words and, where
others' ideas or words have been included, we have adequately cited and referenced
the original sources. We also declare that we have adhered to all principles of
academic honesty and integrity and have not misrepresented, fabricated or falsified any
idea/data/fact/source in our submission. We understand that any violation of the above
will be cause for disciplinary action by the institute and can also evoke penal action from
the sources which have not been properly cited or from whom proper permission has
not been taken when needed.
Date:
Project Report Approval for Bachelor of Engineering
The project report entitled Resume Analyzer Using Natural Language Processing
by Suryaprakash Dinesh Gaud, Chandraprakash Dinesh Gaud, Dipraj Sandeep Raut, is
approved for the degree of Bachelor of Engineering in Computer Science & Engineering
(AI&ML).
Date: Examiners:
Place:
This is to certify that the project entitled “Resume Analyzer Using Natural
Language Processing” is a bonafide work of “Suryaprakash Dinesh Gaud (09),
Chandraprakash Dinesh Gaud (08), Dipraj Sandeep Raut (20)” submitted to
the University of Mumbai in partial fulfillment of the requirement for the award of
the degree of “Bachelor of Engineering” in “Computer Science & Engineering
(AI&ML)” and has been carried out under my supervision in the Department of Computer
Science and Engineering (AI&ML) of Theem College of Engineering, Boisar. The work
is comprehensive, complete and fit for evaluation.
First and foremost, we thank God Almighty for blessing us immensely and empowering us
at times of difficulty like a beacon of light. Without His divine intervention we could not
have accomplished this project so smoothly.
We are also grateful to the Management of Theem College of Engineering for their
kind support. Moreover, we thank our beloved Principal, Dr. Riyazoddin Siddiqui, and
our Director, Dr. N. K. Rana, for their constant encouragement and valuable advice
throughout the course.
We are profoundly indebted to Prof. K. N. Attarde, Head of the Department of
Computer Science & Engineering (AI&ML), and Prof. MD Ameenuddin, Project
Coordinator, for helping us technically and giving valuable advice and suggestions from
time to time. They have always been our source of inspiration.
Also, we would like to take this opportunity to express our profound thanks to our
guide, Prof. MD Ameenuddin, Assistant Professor, Computer Science & Engineering
(AI&ML), for valuable advice and wholehearted cooperation, without which this
project would not have seen the light of day.
We express our sincere gratitude to all Teaching/Non-Teaching staff members of
Computer Engineering department for their co-operation and support during this project.
The aim of this project is to design and develop “RESUME ANALYZER”, a tool that offers
an easy and helpful solution for applicants as well as recruiters. It parses information
from a resume using natural language processing, finds the keywords, clusters resumes
into sectors based on those keywords and finally shows recommendations, predictions
and analytics to the applicant/recruiter based on keyword matching.
The Resume Analyzer is an innovative Natural Language Processing (NLP) based
system designed to revolutionize the traditional recruitment process by automating and
enhancing the initial screening of job applicants. In today’s highly competitive job
market, companies receive an overwhelming number of resumes for each open position,
making it challenging for recruiters to efficiently identify the most qualified candidates.
This project addresses this issue by developing a cutting-edge NLP-powered solution.
Our Resume Analyzer employs advanced NLP techniques to extract, analyse, and
categorize key information from resumes, such as skills, qualifications, work experience,
and education. The system uses machine learning algorithms to score and rank applicants
based on their compatibility with job requirements. This automated screening process
significantly reduces the time and effort required by human recruiters, allowing them to
focus on more strategic aspects of candidate evaluation.
The project also prioritizes user-friendliness, ensuring that both recruiters and job
seekers can easily interact with the system. The Resume Analyzer is a valuable tool for
organizations seeking to streamline their recruitment processes and find the best-fitting
candidates efficiently. It holds the potential to reshape the hiring landscape, making it
more efficient, transparent, and fair.
Keywords: Natural Language Processing (NLP), Resume Parser, Resume Analysis,
Part-of-speech tagging, Named Entity Recognition (NER).
CONTENTS
Abstract
List of Figures
1 INTRODUCTION
1.1 Background
1.2 Objectives
1.3 Purpose and Scope
1.3.1 Purpose
1.3.2 Scope
2 LITERATURE REVIEW
2.1 Paper Name: Natural Language Processing (almost) from Scratch by Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa
2.2 Paper Name: Resume Information Extraction with A Novel Text Block Segmentation Algorithm by Shicheng Zu, Xiulai Wang and Seth Darren
2.3 Paper Name: A Few Shot Approach to Resume Information Extraction via Prompts by Chengguang Gan, Tatsunori Mori
2.4 Paper Name: “Resume Parser Analysis Using Machine Learning and Natural Language Processing”, International Research Journal of Modernization in Engineering Technology and Science Volume
2.5 Paper Name: NLP based Extraction of Relevant Resume using Machine Learning
2.6 Paper Name: A CV parser Model using Entity Extraction Process and Big Data Tools
2.7 Paper Name: A Resume Parser Using Natural Language Processing Techniques
2.8 Paper Name: B. A Keyword Extraction Method Based on Learning to Rank
2.9 Paper Name: Automatic Extraction of Usable Information from Unstructured Resumes to Aid Search
2.10 Paper Name: Overview of the Speech Recognition Technology
3 SYSTEM ANALYSIS
3.1 Present System
3.1.1 Limitations of present system
3.2 Proposed System
3.2.1 Advantages of proposed system
3.3 Hardware Requirements
3.4 Software Requirements
3.5 Justification of selection technology
4 SYSTEM DESIGN
4.1 Module Division
4.2 Database Design
4.3 Class Diagram
4.4 Event Table
4.5 Data Flow Diagram
5 IMPLEMENTATION
5.1 Programming Language and Frameworks
5.2 Data Collection and Preprocessing
5.3 NLP Model Integration
5.4 Keyword Extraction and Resume Parsing
5.5 Matching Algorithm
5.6 Machine Learning Model
5.7 User Interface
5.8 Feedback Mechanism
5.9 Testing and Evaluation
5.10 Scalability and Efficiency
5.11 Privacy and Data Security
5.12 Documentation and Maintenance
5.13 Ethical Considerations
5.14 Deployment
5.15 Monitoring and Continuous Improvement
6.4 Recommendation and Prediction
6.5 Resume tips and ideas with overall Scorer
6.6 Feedback Form
6.7 Past user ratings and comments
6.8 Admin Login
6.9 Total users, user data table, csv file download link, feedback data table
6.10 Downloaded csv file
6.11 Analytics Sheet
7 Conclusion
Chapter 1
INTRODUCTION
1.1 Background
Corporate companies and recruitment agencies process numerous resumes daily, far more
than can practically be screened by hand. An automated intelligent system is required
which can extract all the vital information from unstructured resumes and transform it
into a common structured format, which can then be ranked for a specific job position.
The parsed information includes personal details (name, email address, phone number),
work experience, education, hobbies, interests, achievements, certifications, projects and
keywords, and finally the cluster of the resume (e.g., Web Development, Data Science).
The parsed information is then stored in a database (MySQL in this case) for later use.
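As a rough illustration of the first parsing step, the sketch below uses regular expressions to pull an email address and a phone number out of raw resume text. The patterns and the `extract_contact` helper are assumptions made for this example, not the tool's actual parser. Real resumes would need more robust rules or a trained NER model: international phone formats vary, and date ranges such as "2019 - 2023" can look like phone numbers.

```python
import re

# Illustrative patterns only (assumed, not taken from the tool's source).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional "+", a digit, at least 8 digits/spaces/hyphens, then a final digit.
PHONE_RE = re.compile(r"\+?\d[\d -]{8,}\d")


def extract_contact(text: str) -> dict:
    """Return the first email address and phone number found in resume text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        # Normalise the phone number by removing spaces and hyphens.
        "phone": re.sub(r"[ -]", "", phone.group(0)) if phone else None,
    }


sample = "Jane Doe\njane.doe@example.com | +91 98765 43210\nSkills: Python, SQL"
print(extract_contact(sample))
```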
Unlike other unstructured data (e.g., email bodies or web page contents), resumes are
semi-structured: information is grouped into discrete sections, each containing data about
the person's contact details, work experience or education. In spite of this, resumes are
difficult to parse because they vary in the types of information they include, the order of
sections, writing style and so on. To parse different kinds of resumes effectively and
efficiently, the model must not rely on the order or type of data.
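One way to avoid relying on section order is to recognise section headings by their wording and file each subsequent line under the most recent heading. The sketch below follows that approach under our own assumptions: the heading vocabulary and the `split_sections` function are hypothetical. With this approach, two resumes that contain the same sections in a different order produce the same grouping.

```python
import re

# Hypothetical mapping from heading text to a canonical section name.
SECTION_HEADINGS = {
    "education": "education",
    "experience": "experience",
    "work experience": "experience",
    "skills": "skills",
    "projects": "projects",
    "certifications": "certifications",
}


def split_sections(text: str) -> dict:
    """Group resume lines under the most recent recognised heading.

    Headings are matched by their wording, not their position, so the
    sections may appear in any order. Lines before the first heading are
    kept under "header" (typically the name and contact details)."""
    sections = {"header": []}
    current = "header"
    for line in text.splitlines():
        # Normalise variants such as "SKILLS:" or "Skills" to "skills".
        key = re.sub(r"[^a-z ]", "", line.lower()).strip()
        if key in SECTION_HEADINGS:
            current = SECTION_HEADINGS[key]
            sections.setdefault(current, [])
        elif line.strip():
            sections[current].append(line.strip())
    return sections
```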
Our tool addresses this tedious process and makes it fast, easy and reliable. Using NLP
techniques, it extracts keywords from the resume and uses them for predictions,
recommendations and analytical representation.
1.2 Objectives
The aim is to design and develop a model that can parse information from unstructured
(PDF) data and transform it into JSON for further processing. The objectives are:
• To build a tool that analyses an applicant's resume and transforms it into a structured
JSON format using parsing techniques and programming fundamentals.
• To make the tool usable by any organization (company/college/individual user) that
handles a resume screening process.
• To keep track of all records in a database for admin-side analytics.
• To provide tips and recommendations based on the applicant's resume.
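The following minimal sketch illustrates the "structured JSON format" objective, assuming a simple record type. The `ParsedResume` fields are a hypothetical subset of those listed in Section 1.1, and the standard `json` module handles the serialisation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ParsedResume:
    # Hypothetical schema: a subset of the fields named in Section 1.1.
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    skills: List[str] = field(default_factory=list)
    education: List[str] = field(default_factory=list)
    predicted_field: Optional[str] = None


def to_json(resume: ParsedResume) -> str:
    """Serialise a parsed resume into a common JSON representation."""
    return json.dumps(asdict(resume), indent=2)


record = ParsedResume(name="Jane Doe", email="jane.doe@example.com",
                      skills=["Python", "SQL"])
print(to_json(record))
```

Fields that were not found in a resume stay `null` or empty in the JSON, so every record has the same shape regardless of the source document.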
1.3 Purpose and Scope
1.3.1 Purpose
1. Research shows that 90% of all CVs/resumes are checked by employers for much less
than 2 minutes.
2. This implies that in most instances recruiters only skim the critical components or
points of interest within a resume and ignore the rest.
3. Therefore, the first goal was to build a tool that covers all parts of a resume
and keeps track of all records within a minimal time span.
5. To make the tool recruiter-friendly by providing user data with export to CSV, along
with insights and analytics.
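The CSV export in point 5 could be as simple as the following sketch, which uses Python's standard `csv` module. The column names are assumptions. Passing `extrasaction="ignore"` to `csv.DictWriter` lets the admin choose which parsed fields to export without raising an error for the rest.

```python
import csv
import io
from typing import Dict, List


def export_csv(records: List[Dict], columns: List[str]) -> str:
    """Write parsed user records to CSV text with a fixed column order.

    Keys not listed in `columns` are skipped; missing keys become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [{"name": "Jane Doe", "predicted_field": "Data Science",
         "resume_score": 72, "email": "jane.doe@example.com"}]
print(export_csv(rows, ["name", "predicted_field", "resume_score"]))
```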
1.3.2 Scope
1. It can be used to get all the resume data into a structured tabular format, including
CSV, so that the organization can use the data for analytics purposes.
3. The user section can also drive more traffic to our tool.
4. Colleges can use it to gain insight into students and their resumes before
placements.
5. It also provides analytics on the roles that users are most often looking for.
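The role analytics in point 5 amount to counting predicted roles across users. The sketch below uses `collections.Counter` and assumes that each user record already carries a predicted role label.

```python
from collections import Counter
from typing import List, Tuple


def top_roles(predicted_roles: List[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent predicted roles with their counts."""
    # Strip whitespace so "Data Science " and "Data Science" are counted together.
    return Counter(role.strip() for role in predicted_roles).most_common(n)


roles = ["Data Science", "Web Development", "Data Science ",
         "Android Development", "Data Science", "Web Development"]
print(top_roles(roles))
```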
Chapter 2
LITERATURE REVIEW
This paper proposes a neural network architecture that can be used for different
natural language processing tasks such as part-of-speech tagging, named entity recognition
and chunking. The paper focuses on benchmarking the proposed network on four standard
NLP tasks: Part-of-Speech tagging (POS), chunking, Named Entity Recognition (NER) and
Semantic Role Labeling (SRL). POS tagging labels each word with a unique tag that
represents its syntactic role. Chunking, also called shallow parsing, labels segments of a
sentence with syntactic constituents such as noun or verb phrases. Named Entity
Recognition classifies atomic elements of a sentence into categories such as ‘person’,
‘location’ or ‘organization’. Semantic Role Labeling assigns semantic roles to the
syntactic constituents of a sentence. In this experimental study, the authors used F1 scores
over chunks for the three tasks NER, CHUNK and SRL; for POS evaluation, per-word
accuracy is used.
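To make the two evaluation measures concrete, consider the usual CoNLL convention: a predicted chunk counts as correct only if both its boundaries and its type match a gold chunk, and F1 is computed over these chunks rather than over individual words. The sketch below assumes BIO-tagged sequences. It is our illustration of the convention, not the authors' evaluation script.

```python
from typing import List, Set, Tuple


def chunks(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into a set of (type, start, end) spans."""
    spans = set()
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # a trailing "O" closes the last chunk
        # A chunk ends at "O", at a new "B-" tag, or at an "I-" tag of another type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != kind):
            spans.add((kind, start, i))
            start, kind = None, None
        # A chunk starts at "B-", or at an "I-" tag that does not continue one.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return spans


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over chunks: a predicted chunk is correct only if type and span match."""
    gold_chunks, pred_chunks = chunks(gold), chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_chunks)
    recall = correct / len(gold_chunks)
    return 2 * precision * recall / (precision + recall)


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Per-word accuracy, the measure used for POS tagging."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

For example, if the gold tags `B-PER I-PER O B-ORG` are predicted as `B-PER I-PER O B-LOC`, three of four tokens are correct. However, chunk F1 is only 0.5, because the mislabelled entity counts as entirely wrong.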
2.2 Paper Name: Resume Information Extraction with A Novel Text Block
Segmentation Algorithm by Shicheng Zu, Xiulai Wang and Seth Darren
In an era of rapid development of deep neural networks, this work addresses the lack of
systematic investigation into resume parsing with neural networks. The authors normalize
the resume parsing process by focusing on six important information fields of a resume:
personal information, education, work experience, projects, skills and publications. As
these are the key factors in any resume, the authors did not focus much on less important
information such as interests and hobbies, leadership and references. The authors found
that BLSTM performed better in text block classification and was more robust. For named
entity recognition, the BLSTM-CNNs-CRF model was found to be effective.
2.3 Paper Name: A Few Shot Approach to Resume Information Extraction
via Prompts by Chengguang Gan, Tatsunori Mori
Prompt learning shows strong performance in many text classification tasks with few
training examples, which makes it useful for NLP settings where samples are dynamic. In
this paper, the authors manually created multiple sets of templates and verbalizers based
on the textual characteristics of resumes. They compared the performance of Masked
Language Models (MLMs), Pre-trained Language Models (PLMs) and Seq2Seq PLMs. They
further improved the design of prompt templates and verbalizers for knowledgeable
prompt-tuning. Their experiments show that verbalizers designed using their rules were
more effective and robust than existing manual templates and automatically generated
prompt methods.
This paper introduces a comprehensive resume parsing system that integrates
machine learning and NLP techniques. Emphasizing the importance of subjective
evaluation and rating scales, the system employs models such as neural networks,
CRF, CNN and Bi-LSTM-CNN for data extraction and classification.
This approach not only automates the parsing of resumes but also aims to ensure
unbiased evaluations by incorporating subjective measures.
2.5 Paper Name: NLP based Extraction of Relevant Resume using Machine
Learning
This paper focuses on content analysis techniques for extracting relevant
information from resumes during the recruitment process. By automating the parsing
and extraction of CVs, the proposed system aims to save time for employers and
streamline the recruitment process. However, specific details on implementation and
evaluation are not provided, leaving room for further exploration in future work.
This paper [3] introduces a novel approach to resume parsing and analysis that uses
big data tools for entity extraction. The system uses NLP with the R language for
preprocessing, cleaning, tokenization, POS tagging and transformation, and emphasizes
the use of Hadoop MapReduce for efficient processing of large datasets. The paper
highlights the importance of accurate models and statistical techniques for effective text
analysis and entity extraction.
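The MapReduce pattern that this paper relies on can be illustrated without Hadoop. The sketch below simulates the map and reduce phases in plain Python to count skill mentions across resumes. It is a stand-in for illustration only, since the paper itself uses R with Hadoop, and the skill vocabulary is assumed.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def map_phase(text: str, vocabulary: Set[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (skill, 1) for each known skill token in one resume."""
    for token in text.lower().replace(",", " ").split():
        if token in vocabulary:
            yield token, 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reducer: sum the emitted counts for each key (skill)."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(resumes: List[str], vocabulary: Set[str]) -> Dict[str, int]:
    """Run the mapper over every resume, then reduce all emitted pairs.

    In Hadoop the framework would also shuffle pairs so that each reducer
    sees only its own keys; a single reducer stands in for that here."""
    pairs = chain.from_iterable(map_phase(text, vocabulary) for text in resumes)
    return reduce_phase(pairs)
```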
The paper proposes a model that uses Natural Language Processing (NLP) techniques to
extract details and statistics from resumes and rank them based on company preferences
and requirements. The model aims to build a job portal where employees and applicants
can upload their resumes for specific jobs. NLP techniques are used to parse the
necessary information and generate structured resumes, which are then ranked
based on the company's skill requirements and the skills mentioned by the applicants.
Techniques such as neural networks, CRF, CNN and segmentation models are used for
information extraction from resumes. The system parses resumes into plain documents,
extracts entities and compares them with the required keywords. The results are
presented in the form of pie charts and bar graphs.
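A simple way to rank resumes against a company's skill requirements, as this paper describes, is to score each applicant by the proportion of required skills that their resume mentions. The coverage score below is an assumption made for illustration, since the paper's exact ranking formula is not given here.

```python
from typing import Dict, List, Set, Tuple


def rank_by_skills(required: Set[str],
                   applicants: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank applicants by the fraction of required skills found in their resume."""
    req = {s.lower() for s in required}
    if not req:
        raise ValueError("at least one required skill is needed")
    scores = {
        name: len(req & {s.lower() for s in skills}) / len(req)
        for name, skills in applicants.items()
    }
    # Highest coverage first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Skills that the company did not ask for (such as "Java" for applicant B in a posting that requires Python, SQL, Machine Learning and Docker) neither help nor hurt the score under this scheme.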
This paper discusses the TransR method for knowledge graph completion. TransR is
an approach that combines graph embedding and rule mining techniques to improve
the accuracy of knowledge graph completion. It incorporates both entity and relation
embeddings to enhance the performance of link prediction and triple classification tasks.
Experimental results on benchmark datasets demonstrate the effectiveness of the TransR
approach, outperforming existing methods in terms of evaluation metrics such as mean
reciprocal rank and precision at different ranks. The paper also discusses the limitations
of the proposed method and suggests future research directions in the field of knowledge
graph completion.
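For reference, the original TransR model (Lin et al., 2015) scores a triple (h, r, t) in two steps. First, it projects the entity embeddings into a relation-specific space using a matrix M_r. It then applies a TransE-style translation. A lower score indicates a more plausible triple. This is the standard formulation and may differ from the exact variant used in the paper reviewed above.

```latex
\mathbf{h}_r = \mathbf{h}\mathbf{M}_r, \qquad
\mathbf{t}_r = \mathbf{t}\mathbf{M}_r, \qquad
f_r(h, t) = \lVert \mathbf{h}_r + \mathbf{r} - \mathbf{t}_r \rVert_2^2,
\qquad \mathbf{h}, \mathbf{t} \in \mathbb{R}^{k},\;
\mathbf{r} \in \mathbb{R}^{d},\;
\mathbf{M}_r \in \mathbb{R}^{k \times d}
```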
The paper proposes a system for automated resume information extraction using natural
language processing (NLP) techniques to support rapid resume search and management.
The system is capable of extracting several important fields from free-format resumes,
including personal information, education, contact telephone number, postal address,
languages known, present company and designation. The proposed system can handle a
large variety of resumes in different document formats with a precision of 91% and a recall
of 88%. The system aims to eliminate the need for job seekers to fill in predefined
templates and allows enterprises to extract the required information from any resume
format automatically. The paper highlights the challenges of extracting information from
non-standardized resume structures and emphasizes the benefits of an automated system
for resume management, including the construction of an electronic resume database and
quick processing of resumes. The performance of the system is evaluated using precision
and recall on a set of resumes that were not used as reference resumes to build
the knowledge base.
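The reported precision and recall can be combined into a single F1 score, which is their harmonic mean. Here precision P = TP/(TP+FP) and recall R = TP/(TP+FN), where TP, FP and FN are the true positives, false positives and false negatives. The resulting figure of about 89.5% is our own arithmetic from the reported values and is not a number stated in the paper.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.91 \times 0.88}{0.91 + 0.88}
    = \frac{1.6016}{1.79}
    \approx 0.895
```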
The paper highlights two key approaches in speech recognition: Hidden Markov Model
(HMM) and Artificial Neural Network (ANN). HMM is a statistical model used for
fast and accurate speech recognition, while ANN mimics biological nervous systems and
offers features like training, parallel processing, rapid judgment, and fault tolerance.
Artificial neural networks (ANN) are employed to improve the adaptability and response
of speech recognition systems to error inputs. Hidden Markov Models (HMM) are
utilized as a statistical model to train the acoustic and voice models in speech recognition
systems, leading to accurate and fast recognition results. The paper addresses challenges
in noisy environments, such as variations in pronunciation, speech rate, pitch, and
formant changes and suggests the use of new signal analysis and processing approaches.
Additionally, representative speech recognition methods, including dynamic time
warping (DTW), vector quantization (VQ) and support vector machines (SVM), are also
mentioned in the paper, but the
Chapter 3
SYSTEM ANALYSIS
The process of hiring has evolved over time. In the first-generation hiring
model, companies would advertise their vacancies in newspapers and on television.
Applicants would send in their resumes by post, and the resumes would be sorted
manually. Once shortlisted, the hiring team would call the applicants for further rounds
of interviews.
Needless to say, this was a time-consuming procedure. As industries grew, so did their
hiring needs, and companies started outsourcing their hiring process. Hiring consultancies
came into existence. These agencies required applicants to upload their resumes on their
websites in particular formats. The agencies would then go through the structured data
and shortlist candidates for the company.
This process had a major drawback: there were numerous agencies, each with its own
unique format. To overcome these problems, an intelligent algorithm was required which
could parse information from any unstructured resume, sort it into clusters and finally
rank it.
• Screening resumes manually is time-consuming and challenging, and the agencies'
differing formats clash with one another.
• The same amount of time and effort is often expended on candidates who are not
qualified as on those who are.
of a resume
• The model uses natural language processing to understand the resume and then
parse the information from it.
• Provides insight for the admin/recruiter through analytics and informative data
fetched from the user's/applicant's resume.
• Tracks and analyzes resumes based on job roles, with fast, safe, real-time predictions.
• Completes the task in less time and provides a more efficient review overall.
• So that other devices within the network can connect through the network URL
• MySQL
• Python
3.5 Justification of selection technology
Frontend
• HTML5:- HTML is the standard markup language for web pages. With HTML you
can create your own website.
• CSS3:- CSS is the language we use to style an HTML document. CSS describes
how HTML elements should be displayed.
Backend
• JSON:- JSON is a text format for storing and transporting data. JSON is
"self-describing" and easy to understand.
Database
Chapter 4
SYSTEM DESIGN
Client
1. Basic Info
2. Skills
3. Keywords
9. Overall Score
Admin
3. View all saved uploaded pdf in Uploaded Resume folder
6. Ratings
8. Experience level
9. Resume score
11. City
12. State
13. Country
Feedback
1. Form filling
2. Rating from 1 – 5
3. Stores the parsed and fetched information from the user, their resume and their
feedback
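The fields listed above suggest two tables, one for parsed user data and one for feedback. The sketch below is an assumed layout, not the project's actual schema. It uses Python's built-in `sqlite3` module only so that it runs without a database server; the report itself stores data in MySQL. The `CHECK` constraint enforces the 1 to 5 rating range.

```python
import sqlite3

# Assumed table layouts based on the admin and feedback fields above.
# The report uses MySQL; SQLite is used here only so the sketch is self-contained.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_data (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    name             TEXT NOT NULL,
    email            TEXT NOT NULL,
    resume_score     INTEGER,
    experience_level TEXT,
    predicted_field  TEXT,
    city             TEXT,
    state            TEXT,
    country          TEXT
);
CREATE TABLE IF NOT EXISTS user_feedback (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    name     TEXT NOT NULL,
    email    TEXT NOT NULL,
    rating   INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    comments TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_feedback(conn: sqlite3.Connection, name: str, email: str,
                 rating: int, comments: str = "") -> bool:
    """Insert one feedback row; return False if the rating violates the 1-5 check."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_feedback (name, email, rating, comments) "
                "VALUES (?, ?, ?, ?)",
                (name, email, rating, comments),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```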
Figure 4.2.0.1: Database Design
Figure 4.3.0.1: Class Diagram for Resume Analyzer
The class diagram gives a pictorial representation of how the process works. The process
starts when the user uploads a resume. While uploading, the user has to provide their
name, email address and phone number. A background process fetches the user's IP
address and, based on it, the user's location details along with some miscellaneous data.
After the resume is uploaded and saved to the root folder, the parser parses the resume
and converts the data into JSON format. The recommender uses this data to predict the
experience level, field of interest and overall resume score, and also provides
recommendations such as skills, tips and ideas, courses and videos (on resume writing and
interview preparation). Once the process is complete, all the data is stored in the
database.
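The field-of-interest prediction and skill recommendations described above can be approximated by keyword matching, in line with the keyword-based clustering mentioned in the abstract. The sketch below is hypothetical. The field names come from the report's examples, but the keyword lists and the `recommend` function are assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical keyword lists; the report does not publish its own.
FIELD_KEYWORDS: Dict[str, Set[str]] = {
    "Data Science": {"machine learning", "pandas", "tensorflow",
                     "scikit-learn", "statistics"},
    "Web Development": {"html", "css", "javascript", "react", "django", "flask"},
    "Android Development": {"android", "kotlin", "java", "flutter"},
}


def recommend(skills: List[str]) -> Dict[str, object]:
    """Predict a field from skill overlap and suggest that field's missing skills."""
    normalized = {s.strip().lower() for s in skills}
    best_field: Optional[str] = None
    best_hits: Set[str] = set()
    for field_name, keywords in FIELD_KEYWORDS.items():
        hits = normalized & keywords
        # Strictly more matches wins, so the first listed field wins a tie.
        if len(hits) > len(best_hits):
            best_field, best_hits = field_name, hits
    suggested = sorted(FIELD_KEYWORDS[best_field] - normalized) if best_field else []
    return {"field": best_field, "matched": sorted(best_hits),
            "suggested_skills": suggested}
```

A resume that matches no keywords returns `None` as its field rather than being forced into an arbitrary category.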
4.4 Event Table
The event table below shows the types of events that can be used with our tool.
The figures below describe how data flows throughout the process.
Figure 4.5.0.3: DFD from Extraction and Recommender
Figure 4.5.0.6: DFD for Admin User Table and Data Visualizations
Chapter 5
IMPLEMENTATION
The implementation of the Resume Analyzer involves turning the methodology into
a functional software application. Below are the key steps and considerations for
implementing this NLP-based system.
Choose a programming language suitable for NLP tasks, such as Python, and select
relevant libraries and frameworks such as spaCy, NLTK or Hugging Face Transformers for
the NLP components, and Flask or Django for web application development.
Gather and preprocess the resume and job description data. Clean the text, tokenize it,
and structure it into a format that can be used for analysis.
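A minimal version of this cleaning and tokenisation step is sketched below using only the standard library. The stop-word list is a small assumed sample; libraries such as NLTK or spaCy provide fuller lists. Keeping `+` and `#` is our own choice, so that skills such as "C++" and "C#" survive cleaning.

```python
import re
from typing import List

# Small illustrative stop-word list (assumed); NLTK or spaCy ship fuller ones.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "with", "for", "on", "at", "is"}


def preprocess(text: str) -> List[str]:
    """Lower-case, replace punctuation with spaces, tokenise and drop stop words."""
    text = text.lower()
    # Keep letters, digits, whitespace, '+' and '#' so "c++" and "c#" survive.
    text = re.sub(r"[^a-z0-9+#\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


print(preprocess("Experienced in Python, C++ and Machine-Learning."))
```

One trade-off of this simple approach is that hyphenated terms such as "Machine-Learning" are split into two tokens. A phrase-aware keyword matcher would be needed to treat them as one skill.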
Integrate pretrained NLP models (e.g., BERT, GPT-3) using appropriate libraries and
APIs. These models will help extract contextual information from text.
Create an algorithm that calculates matching scores between resumes and job
descriptions. This algorithm should consider factors like keyword relevance, skill overlap,
and context.
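One common way to compute such a matching score is cosine similarity between TF-IDF vectors. The sketch below assumes that the resumes and the job description have already been tokenised. The smoothed IDF formula matches the one scikit-learn uses by default, but the functions themselves are illustrative and are not the project's implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF with raw term counts and smoothed IDF: ln((1+n)/(1+df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_resumes(job: List[str],
                 resumes: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Score every resume against the job description, best match first."""
    names = list(resumes)
    vectors = tfidf_vectors([job] + [resumes[name] for name in names])
    job_vec, resume_vecs = vectors[0], vectors[1:]
    scores = {name: cosine(job_vec, vec) for name, vec in zip(names, resume_vecs)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

IDF down-weights terms that appear in most documents, so a rare shared skill contributes more to the score than a common one. This approach only captures keyword overlap. Context-aware matching would need the pretrained models mentioned in the NLP model integration step.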
5.6 Machine Learning Model
Train a machine learning model for scoring and ranking resumes based on the matching
algorithm’s output. Implement the model using libraries like scikit-learn or TensorFlow.
Design and develop a user-friendly web-based interface for uploading resumes and job
descriptions. Ensure a responsive and intuitive user experience. Implement both
recruiter and job seeker interfaces.
Optimize the system for scalability, potentially using cloud services like AWS, Azure, or
Google Cloud to handle increased load efficiently.
5.14 Deployment
Implement monitoring for system performance and user feedback. Continuously refine
the system based on feedback and emerging NLP technologies.
The implementation process may require collaboration between NLP experts,
machine learning engineers, web developers, and domain experts to ensure the
system’s effectiveness and user-friendliness. Regular testing, evaluation, and iterative
improvements are key to making the Resume Analyzer a valuable tool in the recruitment
process.
Chapter 6
RESULTS AND DISCUSSION
6.3 Analysis
6.5 Resume tips and ideas with overall Scorer
6.7 Past user ratings and comments
6.9 Total users, user data table, csv file download link, feedback data table
Figure 6.9.0.1: Total users, user data table, csv file download link, feedback data table
6.11 Analytics Sheet
Future Works
Add more fields for other roles, along with their respective recommendations. Rank
resumes based on their score and view individual user details. Decide more accurately
and authentically whether or not to offer a candidate a job.
REFERENCES
[2] L. S. Chen, L., “Enhancing candidate experience in the recruitment process: A case
study of tech companies,” International Journal of Human Capital Management,
vol. 07, no. 03, pp. 112–125, Dec. 2019.
[4] G. M. A. Brown, K. L., “The role of big data analytics in talent acquisition: A
systematic review,” Journal of Big Data, vol. 5, no. 02, pp. 75–89, 2017.