
Explainable NLP: A Novel Methodology to Generate Human-Interpretable Explanation for Semantic Text Similarity

Tanusree De and Debapriya Mukherjee
Accenture Technology, Building 14, Pritech Park, 560103 Bangalore, India
{tanusree.de, debapriya.mukherjee}@accenture.com

Abstract. Text similarity has significant application in many real-world problems. Text similarity estimation using NLP techniques can be leveraged to automate a variety of tasks that are relevant in business and social contexts. The outcomes given by AI-powered automated systems guide humans in taking decisions. However, since the AI-powered system is a "black-box", for a human to trust its outcome and to take the right decision or action based on it, there needs to be an interface between the human and the machine which can explain the reason for the outcome, and that interface is what we call "Explainable AI". In this paper, we make a twofold attempt: first, 1) we build a state-of-the-art Text Similarity Scoring System which matches two texts based on semantic similarity, and then, 2) we build an Explanation Generation Methodology to generate human-interpretable explanation for the text similarity match score.

Keywords: Bi-directional LSTM · Explainable AI · Explainability · Human-interpretable explanation · Sentence embedding · Sentence extraction network · TextRank · Text similarity scoring framework

1 Introduction

Semantic text similarity measures to what extent two pieces of text carry the same meaning. It can be applied to a variety of tasks in Natural Language Processing (NLP), such as information retrieval, text classification, document clustering, topic detection, question answering, machine translation, text summarization and many others. In recent times a lot of research has been going on in the NLP space, especially on semantic text similarity using neural language modeling techniques; researchers have invented state-of-the-art model architectures such as the Siamese Recurrent Neural Network architecture for sentence similarity and advanced pre-trained embedding models such as word2vec, GloVe, Bi-directional LSTM, Universal Sentence Encoder, ELMo, ULMFiT, BERT and so on. However, all these neural language models are extremely complex and complete "black-boxes": they do not provide the reasons for the outcome or decision they give. Due to this lack of transparency, it is difficult to trust these models and use them in production or in business applications. For a human to trust the outcome given by a "black-box" AI system and to take the right decision or action based on the outcome, there needs to be an interface between the human and the machine which can explain the reasons for the outcome, and that interface is what we call "Explainable AI". Explainability is the key enabler for humans and machines to work productively together. In this paper, we make a twofold attempt: first, 1) we build a state-of-the-art Text Similarity Scoring System which matches two texts based on semantic similarity, and then, 2) we build an Explanation Generation Methodology to generate human-interpretable explanation for the text similarity match score. We demonstrate our Text Similarity Scoring System on the "Resume-to-Job Description (JD) Matching" problem.
We have implemented an approach for training a Hybrid Bi-directional LSTM Siamese Recurrent Neural Network [5] for learning sentence similarity, and leveraging the sentence level similarity scores we have developed a new framework for generating a resume level similarity score. For explainability, we propose a novel approach: we first extract key sentences from the resume using the TextRank algorithm [11, 14, 15]; we then use a trained Sentence Similarity Model, a Bi-directional LSTM Siamese Network Model, to find the similarity between the selected pairs of JD-Resume sentences; and finally, based on the similarity scores of each pair of sentences, we apply a scoring mechanism to estimate the match score for each JD sentence and extract the sentences from the resume which are semantically similar to the top scoring JD sentences. The resume sentences identified through this methodology provide a precise explanation for the Resume-to-JD match. We present a detailed similarity scoring and ranking framework to identify and extract the relevant key sentences from a resume that give human-interpretable explanation for the Resume-to-JD match score.

2 Text Similarity Scoring Framework

In this section, we present a novel framework for text (document) level similarity scoring, where we estimate the similarity between two pieces of text by semantically matching each and every sentence of one text with each and every sentence of the other text. The framework is illustrated in Fig. 1.

Fig. 1. Text similarity scoring framework

In this framework, the system considers one text as the Target Text and the other text as the one that needs to be matched to the Target Text. The system compares each sentence of the Target Text, Text-1, with each and every sentence of Text-2 to score each pair of sentences (Text-1 Sentence i with Text-2 Sentence j) on similarity; based on a threshold k (in Fig. 1 shown as 0.5), it selects the scores above the threshold and takes the average of the selected scores to find the sentence level similarity for each sentence of the Target Text, Text-1. If a sentence of Text-1, when matched with all the sentences of Text-2, has all its similarity scores below the threshold k, it gets a sentence level similarity score of 0. The system repeats the process for every sentence of Text-1 to find the sentence level similarity scores for all the sentences of Text-1. It then takes the average of the sentence level similarity scores to estimate the text level similarity score. This average score is a normalized score because the Target Text is fixed and has a constant number of sentences.

The backbone of the above Text Similarity Scoring Framework is the language model that learns sentence similarity and gives the similarity score for a pair of sentences. The model we have applied is a Bi-directional LSTM Siamese Network, discussed in the next section.
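To make the framework concrete, the following is a minimal Python sketch of the document-level scoring logic. The function and variable names are ours, for illustration only; score_pair stands in for whatever trained sentence-pair model supplies the similarity scores (Sect. 2.1).

Code Sketch (Python): Text Level Similarity Scoring

def text_similarity_score(target_sentences, other_sentences, score_pair, k=0.5):
    """Text level similarity per the framework in Fig. 1.

    target_sentences: sentences of the Target Text (Text-1)
    other_sentences:  sentences of Text-2
    score_pair:       callable returning a similarity score in [0, 1]
    k:                similarity threshold (0.5 in Fig. 1)
    """
    sentence_level_scores = []
    for s_i in target_sentences:
        # Score Text-1 Sentence i against every sentence of Text-2.
        scores = [score_pair(s_i, s_j) for s_j in other_sentences]
        selected = [s for s in scores if s > k]
        # Average of the above-threshold scores; 0 if none exceed the threshold.
        sentence_level_scores.append(sum(selected) / len(selected) if selected else 0.0)
    # Text level score: average over all Target Text sentences.
    return sum(sentence_level_scores) / len(sentence_level_scores)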
2.1 Bi-directional LSTM Siamese Network for Learning Sentence Similarity

After extensive research and experimentation on various neural language modeling techniques, we have found that the state-of-the-art method for sentence similarity is a Siamese Recurrent Neural Network model [5]. A Siamese network is a type of deep learning network that uses two or more identical subnetworks that have the same architecture and share the same parameters and weights. A high-level architecture of a Siamese network is given in Fig. 2.

Fig. 2. High-level architecture of a Siamese network (input layer taking a JD sentence and a resume sentence, Word2Vec embedding layer, Bi-directional LSTM layer producing the sentence representations, and a Siamese layer on top)

As shown in Fig. 2, the model consists of four layers: an input layer, an embedding layer, a Bi-directional LSTM layer and a Siamese layer. The input of the model is a pair of sentences, and we want the Siamese network to learn whether the two sentences are similar or dissimilar. The embedding layer transforms an input sentence into a sequence of word vectors. We have used the Word2Vec model for creating the word embeddings as shown in the figure; however, any other pre-trained word embedding model, such as GloVe, ELMo etc., can also be used. Each sentence, represented as a sequence of word vectors, is passed through a Bi-directional LSTM network, which updates its hidden state via the typical learning process of an RNN. The same Bi-directional LSTM model architecture is applied to the twin networks, and the two networks use the same weights while working in tandem on two different input vectors to compute comparable output vectors. The final representation of each sentence is the last hidden state of the corresponding network. Thus, the twin networks basically transform the two sentences into two feature vectors. In the next layer, referred to as the Siamese layer, a single network is trained with the two inputs, the two sentence vectors; the model applies a pre-defined similarity function to the dense layers of the two sentence vectors to calculate the distance between them. The distance between the two sentence vectors in the representation space is subsequently used to infer the underlying semantic similarity. The smaller the distance, i.e. if the pair of sentence vectors are nearer to each other, the more we project the two sentences as similar; if the distance is large, i.e. the two vectors are farther away from each other, we project them as dissimilar. The model optimizes by comparing the predicted similarity score with the human-annotated label, the ground-truth relatedness, and accordingly adjusts the weights using backpropagation under the mean squared error (MSE) loss function.
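As an illustration of this architecture, the following is a minimal sketch in TensorFlow 2.x Keras. It is not the authors' exact code: the sequence length, layer sizes and function names are our assumptions, and the exp(-L1) similarity follows the MaLSTM-style function given in the pseudo algorithm of Sect. 4.1.

Code Sketch (Python): Bi-directional LSTM Siamese Network

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_siamese_model(embedding_matrix, max_len=50, hidden_units=100):
    vocab_size, dim = embedding_matrix.shape
    # Shared embedding layer initialized with pre-trained word vectors (e.g. Word2Vec).
    embed = layers.Embedding(vocab_size, dim,
                             weights=[embedding_matrix], trainable=False)
    # One Bi-directional LSTM encoder shared by both twins, i.e. tied weights.
    encoder = layers.Bidirectional(layers.LSTM(hidden_units))

    left_in = layers.Input(shape=(max_len,), dtype="int32")   # e.g. JD sentence
    right_in = layers.Input(shape=(max_len,), dtype="int32")  # e.g. resume sentence
    left_vec = encoder(embed(left_in))
    right_vec = encoder(embed(right_in))

    # Siamese layer: exp(-L1 distance) maps the distance into (0, 1],
    # matching the similarity function in the Sect. 4.1 pseudo algorithm.
    similarity = layers.Lambda(
        lambda t: tf.exp(-tf.reduce_sum(tf.abs(t[0] - t[1]), axis=1, keepdims=True))
    )([left_vec, right_vec])

    model = Model([left_in, right_in], similarity)
    # The paper trains with Adam, MSE loss and accuracy as the metric (Sect. 4.1).
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
    return model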
3 The Proposed Methodology to Generate Human-Interpretable Explanation for Text Similarity

The goal of our research is to generate precise, human-interpretable explanation for the match score given by a text matching model. For example, in the Resume-to-JD matching problem, the text matching model scores a resume as a "match" or "not-match". Suppose the model scores a resume as "match". To trust this decision given by the model, and also to take action based on the decision, it is imperative to know why the model has scored the resume as "match". To generate the explanation, we introduce a new methodology, a two-step process. In the first step, we extract key sentences from the resume using the TextRank [11] technique. The objective is to extract representative sentences from the document that describe the key aspects of its content. We do not apply TextRank on the JD, as the sentences in the JD are target sentences that summarize the job requirements, and from a logical standpoint we want to consider all the requirements or sentences of the JD. The sentences extracted from the resume using TextRank summarize the key profile of the candidate. In the second step, the sentences extracted from the resume are semantically matched to the sentences in the JD. This semantic matching between JD sentences and selected resume sentences is done using the base model, i.e. the trained sentence similarity model; we have used our trained Bi-directional LSTM Siamese Network Model. Leveraging the similarity scores given by the model, we have developed an Explanation Generation Methodology to extract the relevant sentences from a resume that provide the top reasons for high or low similarity between a JD and a resume. The approach is to first determine the top n JD sentences that depict the requirements that the resume fulfills, and then investigate the top m sentences in the resume that describe how it fulfills those top n requirements in the JD. These top m sentences in the resume provide human-interpretable explanation for the Resume-to-JD match. The detailed methodology is illustrated in Fig. 3 and Fig. 4 below.

Fig. 3. A framework for sentence extraction, matching and scoring

As depicted in Fig. 3, suppose i sentences are extracted from a JD and j sentences are extracted from a resume using the TextRank algorithm. We then use our trained Bi-directional LSTM Siamese model to generate the sentence similarity scores. A JD sentence, say Sentence-1, is matched with each and every resume sentence, and for every pair a similarity score is obtained; this means we generate j similarity scores for JD Sentence-1. The maximum among the j scores is identified as the best similarity score for JD Sentence-1. We consider a similarity score threshold of 0.5, such that a score >0.5 is considered a match and a score <=0.5 a mismatch. We find the count of pairs for JD Sentence-1 with similarity score >0.5, i.e. the number of matches for JD Sentence-1. Similarly, the JD sentence level Match Scores are calculated for all the JD sentences. We calculate the Match Score for JD Sentence i as:

Match Score for JD Sentence i = (max similarity score for JD Sentence i) × (count of similarity scores > 0.5 for JD Sentence i)   (1)

We rank the JD sentences based on the JD sentence level Match Scores and select the top three ranked sentences, as depicted in Fig. 4 (top table). These three JD sentences describe the requirements that the resume under consideration fulfills.

Fig. 4. A scoring methodology to generate human-interpretable explanation for semantic text similarity

Corresponding to the three selected JD sentences, we look into the similarity scores between the selected JD sentences and the resume sentences, as depicted in Fig. 4 (bottom table). Corresponding to every selected JD sentence, we select the top three scored resume sentences. We finally make a unique list of resume sentences, like the 5 resume sentences shown in Fig. 4, which provide human-interpretable explanation for the Resume-to-JD match. To check the fidelity of the explanation methodology, we compute the resume level score based on the explanation and compare it with the score based on the Resume-to-JD Matching model.
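The selection logic of Fig. 3 and Fig. 4 can be expressed compactly. The sketch below is illustrative; it assumes a precomputed matrix sim[i][j] holding the model's similarity score for JD sentence i versus resume sentence j, and all names are ours.

Code Sketch (Python): JD Sentence Match Scores and Explanation Sentence Selection

import numpy as np

def explanation_sentences(sim, jd_sentences, resume_sentences,
                          threshold=0.5, top_jd=3, top_resume=3):
    """Select resume sentences that explain the Resume-to-JD match.

    sim: 2-D array, sim[i, j] = similarity of JD sentence i and resume sentence j.
    """
    sim = np.asarray(sim)
    # Eq. (1): max similarity x count of above-threshold scores, per JD sentence.
    match_scores = sim.max(axis=1) * (sim > threshold).sum(axis=1)
    # Top n JD sentences: the requirements the resume fulfills best.
    top_jd_idx = np.argsort(match_scores)[::-1][:top_jd]
    # For each selected JD sentence, take its top m resume sentences.
    selected = []
    for i in top_jd_idx:
        selected.extend(np.argsort(sim[i])[::-1][:top_resume])
    # The unique list of resume sentences is the human-interpretable explanation.
    unique_idx = sorted(set(selected))
    return ([jd_sentences[i] for i in top_jd_idx],
            [resume_sentences[j] for j in unique_idx])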
We estimate the resume level explanation score as:

Resume Level Explanation Score = (Σ_i Match Score for JD Sentence i) / (Σ_i count of similarity scores > 0.5 for JD Sentence i)   (2)

Pseudo Algorithm: Explanation Model for Resume-to-JD Match Score

Input: JD Sentences, Resume Sentences, TextRank Model Score, Trained Semantic Sentence Similarity Model (Bi-directional LSTM Siamese Network).

1. Take all sentences of the JD.
2. Take the top n sentences of the resume based on their importance score given by the TextRank model.
3. Create pairs of JD-Resume sentences for each and every JD sentence. This means n pairs of sentences for each JD sentence and (n*k) pairs in total for all the k JD sentences.
4. Using the Trained Semantic Sentence Similarity Model, score the (n*k) pairs of JD-Resume sentences on semantic similarity.
5. For every JD sentence, find the maximum semantic similarity score.
6. For every JD sentence, find the count of semantic similarity scores > 0.5.
7. Calculate the Match Score for JD Sentence i as:
   Match Score for JD Sentence i = (max similarity score for JD Sentence i) * (count of similarity scores > 0.5 for JD Sentence i)
8. Calculate the Match Score for all the JD sentences using the formula given under step 7.
9. Calculate the Resume Level Explanation Score as:
   (Σ_i Match Score for JD Sentence i) / (Σ_i count of similarity scores > 0.5 for JD Sentence i)

Output: Resume Level Explanation Score for Resume-to-JD Match Score
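A short sketch of Eq. (2) and step 9 of the pseudo algorithm, reusing the sim matrix convention from the previous sketch (names are ours):

Code Sketch (Python): Resume Level Explanation Score

import numpy as np

def resume_explanation_score(sim, threshold=0.5):
    """Eq. (2): sum of per-JD-sentence match scores over the total match count."""
    sim = np.asarray(sim)
    counts = (sim > threshold).sum(axis=1)    # matches per JD sentence (step 6)
    match_scores = sim.max(axis=1) * counts   # Eq. (1) per JD sentence (step 7)
    total = counts.sum()
    return match_scores.sum() / total if total else 0.0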
3.1 Overview of the TextRank Algorithm Used for Key Sentence Extraction

We have extensively researched and experimented with many keyword extraction methods, such as TF-IDF, TextRank, RAKE and LDA. After a comparative analysis of the results obtained from these methods, we found TextRank to be the most appropriate method for key sentence extraction from a document. TextRank basically finds out how similar each sentence in a document is to all the other sentences in the document. An important sentence is one that is similar to a significant number of other sentences in the document. This implies that the content or subject matter of that sentence has been (partly or fully) expressed several times in the document, and therefore it is most likely a key sentence. To put it another way, the "important" sentence mentioned above is recommended, or voted for, by several other sentences which are similar to it. The sentences that are highly recommended or voted for by other sentences in the document are likely to be more informative for the given document and are therefore given a higher importance score; based on the importance scores, the model ranks the sentences and extracts the top n ranked sentences, hence the name TextRank. The TextRank algorithm works as follows: the similarities between sentence vectors are stored in a matrix, called the similarity matrix, which is converted into a graph, with sentences as vertices and similarity scores between the sentences as edges. When one vertex links to another one, it is basically casting a vote for that other vertex. The higher the number of votes that are cast for a vertex, the higher the importance of the vertex. Moreover, the importance of the vertex casting the vote determines how important the vote itself is, and this information is also taken into account by the ranking model. Hence, the score associated with a vertex is determined based on the votes that are cast for it and the scores of the vertices casting those votes. The following illustrates the pseudocode of the TextRank algorithm.

• The first step is to split the text into individual sentences.
• The next step is to generate a vector representation (sentence embedding) for each and every sentence.
• Given two sentences S_i and S_j, with a sentence represented by the set of n words that appear in it, as S_i = w_1^i, w_2^i, ..., w_n^i and S_j = w_1^j, w_2^j, ..., w_n^j, the similarity function for S_i and S_j is defined as [11] (a code sketch of this measure follows the list):

  Sim(S_i, S_j) = |{w_k : w_k ∈ S_i and w_k ∈ S_j}| / (log(|S_i|) + log(|S_j|))

• The similarities between sentence vectors are stored in a matrix. The similarity matrix is then converted into a graph, with sentences as vertices and similarity scores as edges. The resulting graph is highly connected, with a weight associated with each edge, indicating the strength of the connections established between various sentence pairs in the text.
• The weighted graph-based ranking formula is depicted in step 7 under Sect. 4.2, "Implemented TextRank".
• After the ranking algorithm is run on the graph, sentences are sorted in descending order of their score, and the top ranked sentences are selected as key sentences.
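For completeness, the word-overlap similarity of [11] quoted above translates directly into Python. This is a minimal sketch; tokenization is assumed to have been done already, and sentences are assumed to have more than one word so that the log terms are positive. Note that our own implementation in Sect. 4.2 instead uses cosine similarity over GloVe-based sentence vectors.

Code Sketch (Python): TextRank Word-Overlap Similarity

import math

def textrank_overlap_similarity(sent_i, sent_j):
    """Sim(Si, Sj) = |words shared by Si and Sj| / (log|Si| + log|Sj|), per [11].

    sent_i, sent_j: lists of word tokens, each of length > 1.
    """
    overlap = len(set(sent_i) & set(sent_j))
    return overlap / (math.log(len(sent_i)) + math.log(len(sent_j)))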
4 Experimentation and Results

To evaluate the efficacy of our explanation methodology, we performed experiments on one of the functions of recruitment, i.e. the assessment and shortlisting of candidates based on the match between the job requirements given in a "JD" (job description document) and the candidate profile as described in the "Resume"; precisely, a "Resume-to-JD" matching application. The fundamental objective of such an application is to minimize human effort in shortlisting resumes against a particular JD. An automated system helps accomplish the task much faster, more efficiently and without bias. However, this also demands explanation for the outcome given by the automated system, first of all to trust the system and secondly to take further decisions on the shortlisted resumes.

Dataset for Experimentation

The text corpus we considered for experimentation consists of 70 resumes of three different profiles, Data Scientist, Web Developer and Software Developer, and corresponding JDs for these profiles. We first created a dataset with 180,000 pairs of sentences, each JD sentence paired with each resume sentence. We labelled the pairs of sentences using a semi-supervised approach; i.e. we manually labelled about 5% of the data, trained a sentence similarity model on it, and used the trained model to score and label the rest of the data. We then combined the manually labelled data with the pseudo-labelled data to create the data for training our Bi-directional LSTM Siamese Network Model. The input data to our Bi-directional LSTM Siamese Network is shown in Table 1 below.

Table 1. Example of input data

Index No | JD sentence | Resume sentence | Label
24156 | Model building using supervised and unsupervised algorithms | Experience in customer analytics and marketing analytics projects based on statistical techniques logistic regression, linear regression, cluster analysis, market basket analysis, CHAID and forecasting techniques | 1
1805 | Model building using supervised and unsupervised algorithms | Developed a response model (logistic regression) using MBPA & SAS for a campaign of a high-end store selling luxury goods to identify (predict) and target the prospect customers for sending promotional offers and to increase the response rate | 1
2160 | Model building using supervised and unsupervised algorithms | Create a procedure manual for operations and compliance in all GCC equity markets in order to streamline processes within the brokerage network | 0
43164 | Experience with HTML, CSS web technologies | Coding using front-end technologies, such as CSS and HTML | 1

4.1 Implemented Resume-to-JD Match Model

We trained a MaLSTM Siamese Model [5] in Python. First, we tokenized each sentence into words and created word vectors using the Gensim Word2Vec model. With the word vectors we created the embedding matrix. We then trained two Bi-directional LSTM networks, LSTM_left, which processed the JD sentence, and LSTM_right, which processed the resume sentence in a given pair. Both networks had the same tied weights, such that LSTM_left = LSTM_right, which typically represents a Siamese network architecture. The final hidden state of each network was obtained as the vector representation for each sentence. Finally, we trained a single network with two inputs, the JD sentence vector and the resume sentence vector; we subtracted the dense layer of the JD sentence vector from the dense layer of the resume sentence vector to calculate the Manhattan distance between the two sentence vectors and passed it through a single neuron with a sigmoid activation function. The sigmoid activation function gives a probability score between 0 and 1, interpreted as the semantic similarity score. The model compares the estimated semantic similarity score with the ground-truth label to optimize the weights through back-propagation using the mean squared error (MSE) loss function.
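A minimal sketch of this preprocessing step, assuming gensim 4.x and NLTK; the helper name, dimensionality and padding convention are our assumptions. The resulting matrix is what the embedding layer in the Sect. 2.1 sketch would be initialized with; the steps correspond to steps 1-3 of the pseudo algorithm that follows.

Code Sketch (Python): Word Embedding Matrix Creation

import numpy as np
from gensim.models import Word2Vec          # assumes gensim 4.x API
from nltk.tokenize import word_tokenize     # assumes NLTK 'punkt' data is installed

def build_embedding_matrix(sentences, dim=300):
    # Steps 1-2: tokenize each sentence into words and train Word2Vec on them.
    tokenized = [word_tokenize(s.lower()) for s in sentences]
    w2v = Word2Vec(tokenized, vector_size=dim, min_count=1)
    # Step 3: assemble the embedding matrix; row 0 is reserved for padding.
    word_index = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        matrix[idx] = w2v.wv[word]
    return word_index, matrix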
Pseudo Algorithm: Bi-directional LSTM Siamese Network

Input: Sentence S_i^(a), Sentence S_j^(b), Label (value: 1, indicating the two sentences are similar; 0, indicating the two sentences are dissimilar). S_i^(a) is the i-th sentence of text a and S_j^(b) is the j-th sentence of text b.

For each pair of sentences in the input data:
1. Use a word tokenizer to break each sentence into words.
2. Apply the Gensim Word2Vec model to create word embeddings, i.e. vector representations of the tokenized words in a sentence.
3. With the word vectors, create the word embedding matrix.
4. Train a Bi-directional LSTM network with the word embedding matrix as input. Train two such networks, one for Sentence 1 and the other for Sentence 2. Assign the same architecture and the same parameters and weights to both networks.
5. Obtain the final hidden state of each network as the vector representation for the corresponding sentence.
6. Train a single network with two inputs: the vector representation of S_i^(a), the vector representation of S_j^(b), and the label.
7. Calculate the similarity between the dense layer of the S_i^(a) vector, denoted by h^(a), and the dense layer of the S_j^(b) vector, denoted by h^(b), using the Manhattan distance metric, and apply a sigmoid activation function so that the semantic similarity score lies between 0 and 1. The pre-defined similarity function is given below:
   g(h^(a), h^(b)) = exp(−‖h^(a) − h^(b)‖_1) ∈ [0, 1]
8. The model then compares the estimated semantic similarity score with the ground-truth label to optimize the weights through back-propagation using the mean squared error (MSE) loss function.

Output: Semantic Similarity Score for each pair of sentences.

We trained our model (step 6 in the pseudo algorithm) with 100 hidden neurons, batch size 64 and 100 epochs. We used Adam as the optimizer, mean squared error as the loss function and accuracy as the evaluation metric. The model accuracy was 92% on the training data and 86% on the test data. The precision, recall and F1 score for the model on the training data for Label 1 ("Similar") were 92%, 85% and 88% respectively, and on the test data, 84%, 77% and 80% respectively. The TensorFlow graphs for the training history are shown in Fig. 5.

Fig. 5. The L.H.S. graph shows that the model loss (mean squared error) decreases as the number of epochs increases; correspondingly, the R.H.S. graph shows that the model accuracy increases with the number of epochs.

After generating the semantic similarity score for each pair of sentences, we applied our Text Similarity Scoring Framework depicted in Fig. 1 to generate the Resume-to-JD match score for each resume when matched against a JD.

Pseudo Algorithm: Resume-to-JD Matching Model

Input: JD Sentences, Resume Sentences, Semantic Similarity Scores for each pair of JD-Resume sentences (given by the Trained Bi-directional LSTM Siamese Model).

1. For each JD sentence i:
2. Select the pairs of JD-Resume sentences that got a semantic similarity score > 0.5.
3. Take the average of the selected scores. This is the Resume-to-JD Match Score for JD Sentence i. (Note: some JD sentences may not get a match score if the selection condition under step 2 is not satisfied.)
4. Calculate the Resume-to-JD Match Score for all the JD sentences.
5. Calculate the Resume-to-JD Match Score for the resume as:
   (Σ_i Resume-to-JD Match Score for JD Sentence i) / (Total Number of JD Sentences)

Output: Resume-to-JD Match Score for a resume.

4.2 Implemented TextRank

We applied the TextRank algorithm in Python to identify relevant sentences from the JD and the resume. The algorithm basically computes weights between sentences by looking at which words overlap. However, we do not want to look for overlap in words like 'the', 'and', 'or', etc. That is why we have done the following text preprocessing, using the open-source Python package NLTK, which provides the most common algorithms for NLP, such as tokenizing, part-of-speech tagging, lemmatizing, stopword removal and so on.

Step 1: Cleaning: We removed punctuation and special characters to make the data noise free. We also lowercased all text.
Step 2: Tokenization: This is essentially breaking down a text into smaller units, such as paragraphs, sentences or words; each of these smaller units is called a token. We used a sentence tokenizer to break the entire text into sentences.
Step 3: Stopword Removal: Stopwords are commonly occurring words in documents, such as 'a', 'the', 'and', 'what' and so on. Their contribution to document comparison is almost insignificant, hence they are removed. NLTK (Natural Language Toolkit) has stopword lists for 16 different languages; we used the English one.
Step 4: Sentence Vectorization: We first created vectors for the words in a sentence by applying the pre-trained GloVe model, and then took the average of the constituent word vectors to arrive at a consolidated vector for the sentence.
Step 5: Similarity Matrix Preparation: We applied the cosine similarity function to find the similarity between two sentences and generated a similarity score for each and every pair of sentences. We created the similarity matrix from these scores.
Step 6: Graph Creation: We built a graph on the similarity matrix. The vertices or nodes of this graph represent the sentences, and the edges represent the similarity scores between the sentences.
Step 7: Application of the TextRank Algorithm: We applied the TextRank algorithm (Mihalcea and Tarau, 2004), conceptually based on Google's PageRank algorithm (Brin and Page, 1998), on our graph to score and rank sentences. The algorithm scores each vertex on importance based on two factors: a) the number of other vertices that the vertex under consideration connects to, and b) the importance of those vertices. The algorithm runs for several iterations, updating all of the sentence scores based on the related sentence scores, until the scores stabilize. Let G = (V, E) be a directed graph with the set of vertices V and set of edges E, where E is a subset of V × V. For a given vertex V_i, let In(V_i) be the set of vertices that point to it (predecessors), and let Out(V_i) be the set of vertices that V_i points to (successors). The score of a vertex V_i is defined as follows (Brin and Page, 1998):

S(V_i) = (1 − d) + d × Σ_{V_j ∈ In(V_i)} (1 / |Out(V_j)|) × S(V_j)   (4)

where d is a damping factor that can be set between 0 and 1, which has the role of integrating into the model the probability of jumping from a given vertex to another random vertex in the graph. The factor d is usually set to 0.85 (Brin and Page, 1998); we have also used this value in our implementation.
Step 8: Key Sentence Extraction: Based on the TextRank scores given by the algorithm, we ranked the sentences and extracted the top N sentences from the document. The algorithm flow is depicted in Fig. 6.

Fig. 6. Implemented TextRank algorithm (input document → cleaning of special characters → word tokenization → stopword removal → sentence vectorization using averaged GloVe word vectors → cosine similarity between sentence pairs → similarity matrix → graph creation → TextRank scoring → ranking → top N sentences)
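Steps 4 through 8 map naturally onto numpy and networkx. The sketch below is our illustrative rendering, assuming the sentence vectors of Step 4 have already been computed (e.g. as averaged GloVe word vectors) and are non-zero; names are ours.

Code Sketch (Python): TextRank Key Sentence Extraction

import numpy as np
import networkx as nx

def textrank_top_sentences(sentence_vectors, sentences, top_n=15, d=0.85):
    """Rank sentences via TextRank over a cosine-similarity graph (Steps 5-8)."""
    vecs = np.asarray(sentence_vectors, dtype=float)
    # Step 5: cosine similarity matrix between all sentence pairs.
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sim_matrix = unit @ unit.T
    np.fill_diagonal(sim_matrix, 0.0)  # a sentence casts no vote for itself
    # Step 6: weighted graph with sentences as vertices, similarities as edges.
    graph = nx.from_numpy_array(sim_matrix)
    # Step 7: PageRank with damping factor d = 0.85 (Brin and Page, 1998).
    scores = nx.pagerank(graph, alpha=d)
    # Step 8: sort sentences by score and keep the top N.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [sentences[i] for i in ranked[:top_n]]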
Output of Implemented TextRank

We present here the output of TextRank applied to a Data Scientist resume. The TextRank algorithm extracted 69 sentences from the resume. Table 2 shows the top 10 sentences extracted from the resume based on their TextRank scores.

Table 2. Top 10 sentences from a Data Scientist resume based on TextRank score

Top 10 Resume sentences | TextRank score
3.6 years of experience in data science (statistics, machine learning, descriptive analytics, predictive analytics, prescriptive analytics, deep learning, artificial neural network, text mining, text classification, sentiment analysis, R, SQL) in various domains like oil & energy, banking, telecom and human resource, and 2.4 years of experience in software engineering (ETL, data warehouse) using Informatica, SQL, Unix | 0.015585
Business problem: to find the hidden insight from the production and non-production incident email text generated due to various server issues | 0.015509
Experience summary: over 3.6 years of experience in statistics, data analytics, machine learning, R | 0.015397
Business problem: to predict whether the mechanical machine will fail or not in next 15 days, so that business can take appropriate preventive maintenance or any alternative action to avoid breakdown | 0.015359
Skill: Statistics and Algorithms: Logistic Regression, Linear Regression, Decision Tree, Random Forest, Support Vector Machine, K Nearest Neighbors, PCA, Naïve Bayes, Sentiment Analysis, K Means Clustering, Association Rule, Text Analytics, Deep Learning (ANN); Programming Language: R; Software (used for project implementation): Basic SAS (not used for Analytics/ML prediction), RDBMS (SQL), ETL: Informatica | 0.015338
Business problem: to predict the kind of failure the mechanical machine is going to face within next 15 days so that the operational team can focus on those areas to avoid shutdown of extraction of oil | 0.015305
Extensively worked on understanding data to find the important features | 0.015193
Work experience: Project 1 (Predicting kind of failure), Oct 2017 - Present (Deloitte) | 0.015156
Used descriptive analytics and visualization to find statistics for various terms | 0.015152
Business problem: to find the churn customers who are going to change the existing telecom operator in next 6 months | 0.015126

In the next stage, for generating the explanation, we considered all the sentences from the Data Scientist JD and the top 15 sentences from the Data Scientist resume based on TextRank.

4.3 Application of the Trained Sentence Similarity Model to Score JD-Resume Sentence Pairs on Similarity

After extracting and selecting key sentences from the resume, we used our trained Bi-directional LSTM Siamese Network Model to generate a similarity score for each pair of sentences. Using the scoring methodology depicted in Fig. 3, we scored the JD sentences on their match with the resume sentences and obtained the output shown in Table 3.

Table 3. Selection of JD sentences for explanation generation

Top three key sentences in the JD describing the top three requirements that the resume fulfills:
• Managing and leading client implementations of AI related products.
• Exposure in implementing data science solutions in production environments.
• Extensive coding in Python or R language.

For the top three selected JD sentences in Table 3, we looked into the similarity scores between these selected JD sentences and the resume sentences, as depicted in Table 4.
Corresponding to every JD sentence, we selected the top three resume sentences based on similarity score, which are highlighted in Table 4. Taking the unique list of resume sentences, we got 9 sentences from the resume that provide human-interpretable explanation for the Resume-to-JD match (implementing the pseudo algorithm given under Sect. 3).

Table 4. Human-interpretable explanation for Resume-to-JD match

Key sentences in the resume that provide human-interpretable explanation for the Resume-to-JD match:
• Business problem: to find the churn customers who are going to change the existing telecom operator in next 6 months.
• Business problem: to find the credit score for individual loan applicants and find the good customers for the bank.
• Business problem: to find the employees who are going to leave the organization in the next one year and the factors causing them to leave.
• Business problem: to find the hidden insight from the production and non-production incident email text generated due to various server issues.
• Business problem: to predict the kind of failure the mechanical machine is going to face within the next 15 days so that the operational team can focus on those areas to avoid shutdown of extraction of oil.
• Business problem: to predict whether the mechanical machine will fail or not in the next 15 days, so that business can take appropriate preventive maintenance or any alternative action to avoid breakdown.
• Extensively worked on understanding data to find the important features.
• Work experience: Project 1 (Predicting kind of failure), Oct 2017 - Present (Deloitte).
• Worked on automating the process as per business requirement.

5 Fidelity of Our Explanation Model

To assess the fidelity of our Explainable AI methodology, we compare the ranking of the resumes against a JD based on our Resume Scoring Framework (described in Sect. 2, "Text Similarity Scoring Framework", with the pseudo algorithm given under Sect. 4, "Experimentation and Results") with the resume ranking obtained from our Explanation Model score (described under Sect. 3, "The Proposed Methodology to Generate Human-Interpretable Explanation for Text Similarity", along with its pseudo algorithm). We tested with a Data Scientist JD and resumes with different skillsets and obtained the result shown in Table 5.

Table 5. Comparison of fidelity between Resume-to-JD Match Model based ranking and Explanation Model based ranking

Table 5 shows that, when matched with a Data Scientist JD, all our Data Scientist resumes ranked higher than the UI and Java Developer resumes with respect to both the Resume-to-JD Matching Model and the Explanation Model. Further, we see that among the Data Scientist resumes, the Resume-to-JD Matching Model based ranking and the Explanation Model based ranking are comparable. Thus, the fidelity of our Explanation Model is quite high.

6 Novelty of the Proposed Framework of Explanation Generation

In our literature survey, we have come across various novel approaches and algorithms for semantic text similarity. However, we have not found much research work on building explainability for the textual similarity score given by AI models. Given that semantic text similarity has huge application, it is critical to understand the rationale behind the similarity score given by the AI system, so that the system can be trusted and decisions can be taken appropriately and faster.
With this objective in mind, we took up Explainable AI for semantic text similarity as our area of research and developed the novel methodology presented in this paper to generate human-interpretable explanation for semantic text similarity.

One of the existing methodologies for Explainable AI is LIME, proposed by Ribeiro et al. [19]. LIME first perturbs the text of interest by removing word(s) from the text to create a dataset of new text samples and uses the black-box AI model to get labels for the new samples; it then trains an interpretable model, basically a regression model, on the newly created dataset with variations and explains the prediction by interpreting the local "white-box" regression model. Basically, LIME generates the feature importance of each word in the model through sensitivity analysis, and the explanations are derived based on the feature importance scores of the words. The words with higher scores and in favor of the predicted label are used for explanation. The fit of the regression model gives the overall LIME score for explainability. One big shortcoming of the LIME approach is that it does not output phrases or sentences as explanation but simply throws out some individual words in favor of the predicted outcome and some individual words that contradict the predicted outcome. Individual words do not provide concrete explanation, and therefore further human intervention is required for carving out explanations from the words.

The novelty of our methodology is that it extracts key sentences from the text that provide human-interpretable explanation for the match. The explanation is comprehensible, and no further human intervention is required to decipher it. Our methodology, as presented in this paper, is more scientific than LIME in that we first use the TextRank algorithm to extract representative sentences from a text that describe the key aspects of its content. Intuitively, this can be interpreted as extracting the principal components of the text that depict its intent. Once we have identified the intent of the two texts by extracting key sentences, we apply a scoring mechanism to estimate the extent to which the intent or key sentences of one text are similar to the intent or key sentences of the other text; and finally, based on the similarity scores between pairs of sentences, we extract appropriate sentences as explanation for the match between the two texts. This methodology is highly intuitive, thorough and effective in precisely generating human-interpretable explanation for semantic text similarity.

7 Conclusion

In this paper, we have introduced a new methodology for explaining the outcome of neural language model based text matching. We have leveraged the concept of text summarization and used the TextRank algorithm to extract important sentences from a document. We have created a detailed framework for scoring each important sentence of the target text (against which the other text is matched) on its match with the sentences of the other text; based on the match score, we ranked the sentences and shortlisted the top n sentences in the target text that have a good match in the other text.
The top n shortlisted sentences explain the aspects of the target text that contribute to the match between the two texts. Then, corresponding to the shortlisted aspects from the target text, we extracted the sentences from the other text having a high similarity score with the shortlisted sentences of the target text. Those sentences extracted from the other text provide human-interpretable explanation for the match between the two texts.

We have established the fidelity of our explanation methodology on the Resume-to-JD matching problem. However, the methodology is applicable to any text matching problem. Our methodology is also model agnostic and is applicable to any black-box text similarity model.

One area for future research is to leverage our explanation methodology for semantic text matching in building Explainable AI solutions for other NLP applications, such as text classification, question answering and so on.

References

1. Ling, M., et al.: Finding function in form: compositional character models for open vocabulary word representation. arXiv:1508.02096v2 (2016)
2. Liu, H., Yin, Q., Wang, W.Y.: Towards explainable NLP: a generative explanation framework for text classification. arXiv:1811.00196v2 (2019)
3. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. arXiv:1908.10084v1 (2019)
4. Utkin, L.V., Kovalev, M.S., Kasimov, E.M.: An explanation method for Siamese neural networks. In: International Scientific Conference Telecommunications, Computing and Control (TELECCON-2019). arXiv:1911.07702 (2019)
5. Mueller, J., Thyagarajan, A.: Siamese recurrent architectures for learning sentence similarity. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) (2016)
6. Kandi, S.M.: Language modelling for handling out-of-vocabulary words in natural language processing (2018). https://doi.org/10.13140/rg.2.2.32252.08329
7. Guo, S.: RésuMatcher: a personalized résumé-job matching system. Expert Syst. Appl. 60, 169-182 (2016)
8. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning, pp. 1188-1196 (2014)
9. Cui, Z., Pan, L., Liu, S.: Hybrid BiLSTM-Siamese network for relation extraction. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1907-1909 (2019)
10. Forman, G., Kirshenbaum, E.: Extremely fast text feature extraction for classification and indexing. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 1221-1230 (2008)
11. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404-411 (2004)
12. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. arXiv:1310.4546 (2013)
13. Beliga, S.: Keyword extraction: a review of methods and approaches. University of Rijeka, Department of Informatics (2014)
14. Subramanian, L., Karthik, R.S.: Keyword extraction: a comparative study using graph based model and RAKE. Int. J. Adv. Res. (2017). https://doi.org/10.21474/ijar01/3616
15. Thushara, M.G., Mownika, T., Mangamuru, R.: A comparative study on different keyword extraction algorithms.
   In: Conference paper (2019). https://doi.org/10.1109/iccme.2019.8819630
16. Bennani-Smires, K., Musat, C., Hossmann, A., Baeriswyl, M., Jaggi, M.: Simple unsupervised keyphrase extraction using sentence embeddings. arXiv:1801.04470v3 (2018)
17. Papagiannopoulou, E., Tsoumakas, G.: A review of keyphrase extraction. arXiv:1905.05044v2 (2019)
18. Yinga, Y., Qingpinga, T., Qinzhenga, X., Pinga, Z., Panpana, L.: A graph-based approach of automatic keyphrase extraction. Procedia Comput. Sci. 107, 248-255 (2017)
19. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135-1144. ACM (2016)
