A Novel Natural Language Processing (NLP) Approach To Automatically Generate Conceptual Class Model From Initial Software Requirements
1 Introduction
Although a considerable body of research deals with the generation of class models
from initial plain text software requirements, existing studies usually require some
manual processing of the textual requirements before the class model is generated,
which makes the whole process semi-automatic. This departs from the spirit of
true automation. Therefore, in this article, we propose a novel and fully automated NLP
approach to generate the class model from early software requirements. An overview
of this study is shown in Fig. 1.
[Fig. 1. Overview of the study. Panels: Rules for Splitting Sentences; Rules for Tokenization; Rules for POS Tagging; Implementation (AR2DT Tool)]
First, we define novel and improved rules for sentence splitting, tokenization
and POS tagging (Sect. 2). Second, we implement the defined rules in the AR2DT tool
(Sect. 2.1). The AR2DT tool has three components: Implementation of Rules,
Class Generation and User Interface. It takes early software requirements as plain text
and generates a conceptual class model with code, as shown in Fig. 1. Finally, we use
three case studies to validate the proposed approach (Sect. 3). A comparative
analysis with the state of the art is given in Sect. 4. The paper is concluded in Sect. 5.
The proposed NLP approach comprises novel rules for sentence splitting, tokenization
and POS tagging, as shown in Fig. 2. The defined rules are applied to the initial plain
text software requirements to generate a conceptual class model. Our proposal is mainly
concerned with the extraction of Noun Plural (NNS), Proper Noun Singular (NNP) and
Proper Noun Plural (NNPS) tokens by matching nouns. The rules are summarized as
follows:
478 M.A. Ahmed et al.
Conversion of Plural to Singular: Plural nouns are converted to singular, e.g.
books is converted to book.
Remove Redundant Classes: Repeated classes are considered only once. This concept
is implemented by defining a dictionary that includes all the irrelevant glossary words,
e.g. user, software, number etc. A set of standard guidelines is followed
while developing the glossary of the dictionary in order to avoid any bias.
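The two rules above can be illustrated with a minimal Python sketch. It is not the AR2DT implementation (which is written in C#): the suffix rules in `singularize` are a deliberately naive assumption, and the `IRRELEVANT` set contains only the three example words named in the text, standing in for the full glossary dictionary.

```python
import re

# Hypothetical stand-in for the glossary dictionary; the text names only
# "user", "software" and "number" as examples of irrelevant words.
IRRELEVANT = {"user", "software", "number"}


def singularize(noun: str) -> str:
    """Naive plural-to-singular conversion using a few suffix rules.

    This is an illustrative assumption, not the rule set used by AR2DT;
    irregular plurals are not handled.
    """
    word = noun.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # libraries -> library
    if re.search(r"(sses|xes|ches|shes)$", word):
        return word[:-2]                # classes -> class, boxes -> box
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # books -> book
    return word


def candidate_classes(nouns):
    """Singularize nouns, drop glossary words, and keep each class once."""
    seen = set()
    result = []
    for noun in nouns:
        name = singularize(noun)
        if name in IRRELEVANT or name in seen:
            continue
        seen.add(name)
        result.append(name)
    return result


print(candidate_classes(["books", "Book", "users", "libraries", "software"]))
# ['book', 'library']
```

Keeping the first occurrence and preserving input order is a design choice of this sketch; the source only states that repeated classes are considered once.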
The identification of classes from plain text is performed on the basis of pre-defined
rules. A class can be described by this equation: C: ∈ [{C, A, O, R}], where C is the
candidate class, A is an attribute of this class, O is an operation or function of
the class and R represents a relationship of the class. The relationships between
classes can be expressed as follows: R: ∈ [{rT, Cr, Rc}], where R is the relationship,
rT is the relationship type i.e. association, Cr is the cardinality and Rc is the related class.
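One way to read the two tuples above is as a pair of record types. The following Python sketch is an assumption about how such a model could be represented; the type and field names (`CandidateClass`, `Relationship`, `rel_type` and so on) and the example values are hypothetical and do not come from AR2DT.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relationship:
    """R = {rT, Cr, Rc} from the text."""
    rel_type: str        # rT: relationship type, e.g. "association"
    cardinality: str     # Cr: cardinality, e.g. "1..*"
    related_class: str   # Rc: name of the related class


@dataclass
class CandidateClass:
    """C = {C, A, O, R} from the text."""
    name: str                                                  # C: candidate class
    attributes: List[str] = field(default_factory=list)        # A
    operations: List[str] = field(default_factory=list)        # O
    relationships: List[Relationship] = field(default_factory=list)  # R


# Hypothetical example values, chosen only to show the shape of the data.
account = CandidateClass(
    name="account",
    attributes=["balance"],
    operations=["deposit"],
    relationships=[Relationship("association", "1..*", "bank")],
)
print(account.name, [r.related_class for r in account.relationships])
```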
2.1 Implementation
AR2DT is developed in Visual Studio 2010 and written in C#, with 1500 lines of code.
SQL Server 2012 is used for storage. In AR2DT, the rules are implemented
through the SharpNLP-1.0.2529 [14] library. Subsequently, a regular expression
library is used to match classes against the dictionary. The interface of
AR2DT running the ATM case study (Sect. 3) is shown in Fig. 3.
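The regular-expression matching step can be sketched as follows, assuming the tagger output is available as a `word/TAG` string. The example sentence is taken from the ATM problem statement, but its tags were written by hand for illustration and are not actual SharpNLP output; the function name `extract_tagged_nouns` is hypothetical.

```python
import re

# Hand-tagged example (an assumption, not real tagger output).
TAGGED = ("Cashier/NN stations/NNS are/VBP owned/VBN by/IN "
          "individual/JJ banks/NNS ./.")

# Match tokens whose tag is NNS, NNP or NNPS. The lookahead makes sure the
# whole tag is consumed, so a token tagged NN is not matched by accident.
NOUN_PATTERN = re.compile(r"(\S+)/(NNPS|NNS|NNP)(?=\s|$)")


def extract_tagged_nouns(tagged_text: str):
    """Return (word, tag) pairs for the noun tags the approach focuses on."""
    return NOUN_PATTERN.findall(tagged_text)


print(extract_tagged_nouns(TAGGED))
# [('stations', 'NNS'), ('banks', 'NNS')]
```

The extracted words would then go through singularization and dictionary filtering before being proposed as classes.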
A text area is provided to type or paste the desired case study. Classes
are identified by pressing the Identify Classes button, behind which the business logic for the
rules of sentence splitting, tokenization and POS tagging is implemented. The
Generate Class Diagram Code button creates the code of the class diagram. The generated
classes can be viewed in a grid view. Operations such as tokenization and splitting
can be performed separately (without generating classes), as shown in Fig. 3.
Details of the AR2DT tool, such as the installation/user manual, executable file, source code and
sample case studies, can be found at [20].
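To make the separately available splitting and tokenization operations concrete, here is a small Python sketch. The regular expressions are simple assumptions chosen for illustration; they are not the improved rules defined in this paper, which are not reproduced in the text, and the function names are hypothetical.

```python
import re


def split_sentences(text: str):
    """Split after '.', '!' or '?' when followed by whitespace and a capital.

    A naive heuristic (assumption): abbreviations such as "e.g. Bank"
    would be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str):
    """Return words (keeping internal hyphens/apostrophes) and punctuation."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*|[^\sA-Za-z0-9]",
                      sentence)


text = ("Human cashiers enter account and transaction data. "
        "The system requires appropriate record-keeping and security provisions.")

for sentence in split_sentences(text):
    print(tokenize(sentence))
```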
3 Validation
Automatic Teller Machine (ATM) Case Study: Rumbaugh et al. [19] first analyzed
the automatic teller machine case study using the OMT methodology. We take the same
problem statement to present the results of our analysis. The initial software requirements
of the ATM, expressed as plain text, are shown in Fig. 4.
Design the software to support a computerized banking network including both human cashiers and
automatic teller machines (ATMs) to be shared by a consortium of banks. Each bank provides its own
computer to maintain its own accounts and process transactions against them. Cashier stations are
owned by individual banks and communicate directly with their own bank's computers. Human
cashiers enter account and transaction data. Automatic teller machines communicate with a central
computer which clears transactions with the appropriate banks. An automatic teller machine accepts a
cash card, interacts with the user, communicates with the central system to carry out the transaction,
dispenses cash, and prints receipts. The system requires appropriate record-keeping and security
provisions. The system must handle concurrent accesses to the same account correctly. The banks will
provide their own software for their own computers.
Rumbaugh et al. [19] took all the nouns and created a list of classes from the case
study. The resulting set contains 23 classes: Software, Consortium, Cash receipt, Cash card, Account
data, Banking network, Bank computer, Bank, Traction, Access, Cashier station, Central
computer, Transaction, ATM, Cashier, Transaction data, Security provision, Record
keeping provision, System, Cost, Receipt, Account, and Customer. In our case, AR2DT
generates 10 classes for the ATM case study, as shown in Fig. 5.
We evaluate the performance of the AR2DT tool on three case studies, i.e. ATM,
Electronic Filling Program (EFP) and Local Hospital Problem (LHP). However, due to
space limitations, we provide details of the ATM case study only; further details can
be found at [20]. We compare the results of the AR2DT tool with those of a study published
in a high-impact journal (i.e. Class-Gen [8]). The results are summarized in Table 1.
It can be seen from Table 1 that the precision, recall and over-specification
results of AR2DT are significantly improved compared to Class-Gen [8].
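The three measures are not defined in the text. A minimal sketch follows, assuming the definitions commonly used in this line of work (e.g. in the CM-Builder evaluation [13]): recall = correct / (correct + missing), precision = correct / (correct + incorrect), and over-specification = extra / (correct + missing). The counts in the usage example are hypothetical and are not taken from Table 1.

```python
def precision(n_correct: int, n_incorrect: int) -> float:
    """Share of generated classes that are correct."""
    return n_correct / (n_correct + n_incorrect)


def recall(n_correct: int, n_missing: int) -> float:
    """Share of expected classes that were found."""
    return n_correct / (n_correct + n_missing)


def over_specification(n_extra: int, n_correct: int, n_missing: int) -> float:
    """Extra (valid but unexpected) classes relative to the expected set."""
    return n_extra / (n_correct + n_missing)


# Hypothetical counts for illustration only (not results from the paper).
correct, incorrect, missing, extra = 8, 2, 2, 1

print(f"precision          = {precision(correct, incorrect):.2f}")
print(f"recall             = {recall(correct, missing):.2f}")
print(f"over-specification = {over_specification(extra, correct, missing):.2f}")
```

Under these definitions, higher precision and recall are better, while a lower over-specification value is better.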
It can be seen from Table 2 that our approach fully automates the requirements-to-design
process, which is a significant contribution. Furthermore, our experimental
results (Sect. 3.1) are more encouraging than those of other studies. However,
we do not yet deal with the generation of associations and operations. We intend to add
these missing features to AR2DT in the near future.
References
1. Meteer, M., Borukhov, B., Crivaro, M., Shafir, M., Thamrongrattanarit, A.: MedLingMap: a
growing resource mapping the bio-medical NLP field. In: Proceedings of the 2012 Workshop
on Biomedical Natural Language Processing (BioNLP 2012), Montreal, Canada, 8 June 2012,
pp. 140–145 (2012)
2. Umber, A., Bajwa, I.S., Asif Naeem, M.: NL-based automated software requirements
elicitation and specification. In: Abraham, A., Lloret Mauri, J., Buford, J.F., Suzuki, J.,
Thampi, S.M. (eds.) ACC 2011. CCIS, vol. 191, pp. 30–39. Springer, Heidelberg (2011). doi:
10.1007/978-3-642-22714-1_4
3. Sneed, H.M.: Testing against natural language requirements. In: Seventh International
Conference on Quality Software. IEEE (2007)
4. Ibrahim, M., Ahmad, R.: Class diagram extraction from textual requirements using natural
language processing (NLP) techniques. In: Second International Conference on Computer
Research and Development, pp. 200–204. IEEE Computer Society, IEEE (2010)
5. Kumar, D.D., Sanyal, R.: Static UML model generator from analysis of requirements
(SUGAR). In: Advanced Software Engineering and Its Applications, pp. 77–84. IEEE (2008)
6. Liu, D., Subramaniam, K., Eberlein, A., Far, B.H.: Natural language requirements
analysis and class model generation using UCDA. In: Orchard, B., Yang, C., Ali, M. (eds.)
IEA/AIE 2004. LNCS (LNAI), vol. 3029, pp. 295–304. Springer, Heidelberg (2004). doi:
10.1007/978-3-540-24677-0_31
7. Deeptimahanti, D.K., Sanyal, R.: Semi-automatic generation of UML models from natural
language requirements. In: ISEC 2011, pp. 165–174. ACM (2011)
8. Elbendak, M., Vickers, P., Rossiter, N.: Parsed use case descriptions as a basis for object-
oriented class model generation. J. Syst. Softw. 84, 1209–1223 (2011). Elsevier
9. Sharma, V.S., Sarkar, S., Verma, K., Panayappan, A., Kass, A.: Extracting high-level
functional design from software requirements. In: 16th Asia-Pacific Software Engineering
Conference. IEEE (2009)
10. Vinay, S., Aithal, S., Desai, P.: An NLP based requirements analysis tool. In: International
Advance Computing Conference. IEEE (2009)
11. Alkhader, Y., Hudaib, A., Hammo, B.: Experimenting with extracting software requirements
using NLP approach. In: ICIA. IEEE (2006)
12. Tripathy, A., Rath, S.K.: Application of natural language processing in object oriented
software development. In: International Conference on Recent Trends in Information
Technology. IEEE (2014)
13. Harmain, H.M., Gaizauskas, R.: CM-builder: a natural language-based CASE tool for object-
oriented analysis. Autom. Softw. Eng. 10, 157–181 (2003). Springer