
International Journal of Computer Science and Business Informatics (IJCSBI.ORG)

ISSN: 1694-2507 (Print)
ISSN: 1694-2108 (Online)
VOL 15, NO 2, MARCH 2015
Table of Contents

Enhancing AODV Routing Protocol to Eliminate Black Hole Attack in MANET
Ei Ei Khin and Thandar Phyu

Adaptive Search Information Technology in the University Library
Andriy Andrukhiv and Dmytro Tarasov

Educational Data Mining: Performance Evaluation of Decision Tree and Clustering
Techniques Using WEKA Platform
Ritika Saxena

Hamiltonian cycle in graphs 4 n
Nguyen Huu Xuan Truong and Vu Dinh Hoa

Enhancing AODV Routing Protocol to Eliminate Black Hole Attack in MANET
Ei Ei Khin
Faculty of Information and Communication Technology
University of Technology (Yatanarpon Cyber City)
Pyin Oo Lwin, Myanmar

Thandar Phyu
Department of Advanced Science and Technology
Ministry of Science and Technology
Nay Pyi Taw, Myanmar

ABSTRACT
A mobile ad hoc network (MANET) is an open wireless system in which mobile
nodes form an arbitrary, temporary network. Because there is no fixed
infrastructure, the mobile nodes exchange routing packets with each other
whenever they want to communicate, and they rely on routing protocols to do so.
However, these routing protocols lack security mechanisms, so MANETs face
various severe attacks. The black hole attack is one such attack and can cause
great damage to the network. As a result, an efficient and simple routing
algorithm for MANETs is very important. This paper presents a simple approach
to find and eliminate the black hole attack in MANETs. The proposed system
slightly modifies the ad hoc on-demand distance vector (AODV) routing protocol
by adding two tables and an alarm packet type. The proposed mechanism removes
the malicious node and chooses a reliable node by using these tables. When a
malicious node is detected, the proposed system automatically sends alarm
packets to all nodes in the network.
Keywords
MANET, AODV, Black Hole, Two Tables, Alarm.

1. INTRODUCTION
MANETs are wireless networks that consist of dynamic mobile nodes. The mobile
nodes may be personal digital assistants (PDAs), laptops, mobile phones or any
other mobile devices. A mobile node can easily join and leave the network, and
the nodes form dynamic topologies based on their connectivity. They can
configure themselves without needing any infrastructure. When the nodes want
to communicate with each other over a wireless channel, they provide
connectivity by forwarding packets among themselves. So, a node may be a
router, a host, or both at the same time.
MANETs have basic characteristics such as an open medium, self-organization,
dynamic mobile nodes and topology, limited resources, lack of a network
infrastructure and lack of defense mechanisms. Because of these factors,
MANETs often suffer from various security attacks [2]. Moreover, the mobile
nodes in a MANET communicate with one another based on mutual trust. Any
mobile node within range of the wireless channel may overhear traffic and
participate in the network. The wireless channel therefore makes MANETs more
prone to various attacks.
So, the security of transmission and communication in MANETs is a challenging
and important issue. To achieve secure communication and transmission, the
types of attack and their impact on the MANET must be understood. Different
types of attack can harm a MANET. They include the wormhole attack, selfish
node misbehaving attack, routing table overflow attack, flooding attack, black
hole attack, sybil attack, impersonation attack, denial of service (DoS)
attack and so forth.
In a black hole attack, the intruder node sends a false reply with a high
sequence number. When it receives data packets, it discards them all. So, it
disrupts the network and causes great damage. In this paper, a defense
mechanism is presented to identify and remove this attack, and a feasible
solution is proposed to obtain a reliable route to the destination.
The rest of this paper is arranged as follows: Section 2 gives an overview of
the black hole attack and the AODV routing protocol. Section 3 reviews related
research on defense mechanisms. Section 4 presents the proposed detection and
prevention mechanism. The simulation results are described in Section 5. Then,
Section 6 concludes the paper.

2. BACKGROUND STUDY
2.1 Overview of AODV Routing Protocol
AODV is a widely used routing protocol for MANETs [7, 9]. It is an extension
of the destination-sequenced distance vector routing protocol [8], and it
adapts to dynamic link conditions while offering low network utilization, low
control message overhead, low memory overhead, and so on. There are two
processes in the AODV routing protocol: route discovery and route maintenance.
In the AODV protocol, when a node needs to send data packets to another node,
it first looks for an existing route in its routing table. If there is an
active, fresh route to the destination, the source node uses this route. If
there is no route, or the route is not fresh, the source node starts the route
discovery process. It sends a Route Request (RREQ) packet to all neighbors,
and a neighbor node sends back a Route Reply (RREP) packet if it is the
destination itself or it has a fresh route. Otherwise, it forwards the RREQ
message. When the source node receives the RREP, it can communicate with the
destination, and vice versa.
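
The reply-or-forward decision that each node makes during route discovery can
be sketched as follows. This is a minimal illustration, not the NS-2
implementation: the names Route and handle_rreq are hypothetical, and "fresh"
is assumed to mean that the stored destination sequence number is at least as
large as the one carried in the RREQ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Route:
    """One routing-table entry (field names are illustrative)."""
    next_hop: str
    dest_seqno: int
    hop_count: int
    active: bool = True


def handle_rreq(node_id: str, dest_id: str, rreq_dest_seqno: int,
                routing_table: Dict[str, Route]) -> str:
    """Decide what a node does with an incoming RREQ.

    Returns "RREP" when the node answers the request and "FORWARD"
    when it rebroadcasts the request to its own neighbors.
    """
    if node_id == dest_id:
        return "RREP"  # the node is the destination itself
    route = routing_table.get(dest_id)
    # Assumed freshness test: the stored sequence number is at least
    # as large as the one carried in the request.
    if route is not None and route.active and route.dest_seqno >= rreq_dest_seqno:
        return "RREP"
    return "FORWARD"
```

A node with no entry for the destination, or with an older sequence number
than the request carries, falls through to "FORWARD".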
In the route maintenance process, whenever a link fails or breaks during
operation, a Route Error (RERR) packet is sent to the nodes on the active
route. A Hello message is sent periodically to maintain the route information.
Although AODV is a well-known reactive routing protocol for MANETs, it has no
security mechanism against these types of attack [1]. Thus, malicious nodes
leave the AODV protocol defenseless against various types of attack.
2.2 Black Hole Attack
The black hole attack is a type of DoS attack and an active attack [10] in
MANETs. In the black hole attack [11, 12], the malicious node uses a false
route reply message to claim that it has the best route to the destination. It
always advertises the highest sequence number and the lowest hop count.
However, when it receives data packets, it discards them all.
For example, consider the scenario in Figure 1. In this figure, S, D and M are
the source, the destination and the malicious node respectively. When S wants
to communicate with D, it first sends a Route Request packet to all neighbor
nodes. Thus, F, E and M receive it. As M is a black hole, it immediately sends
back a Route Reply packet with a high sequence number. When S receives the
Route Reply packet from M, it assumes that this route is fresh enough. Then,
it sends data to the destination along this route. However, M does not forward
any data packets and discards all of them.
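
The scenario above can be illustrated with a small sketch of why the forged
reply wins. It simplifies AODV route selection to "prefer the largest
destination sequence number, then the smallest hop count"; the names Rrep and
pick_route and all numeric values are hypothetical and are not taken from
Figure 1.

```python
from typing import List, NamedTuple


class Rrep(NamedTuple):
    sender: str       # node that sent the Route Reply
    dest_seqno: int   # advertised destination sequence number
    hop_count: int    # advertised distance to the destination


def pick_route(replies: List[Rrep]) -> Rrep:
    """Pick the reply an unprotected source would trust.

    A larger sequence number is treated as fresher; ties are broken
    by the smaller hop count.
    """
    return max(replies, key=lambda r: (r.dest_seqno, -r.hop_count))


# Hypothetical replies: M forges a very high sequence number.
replies = [Rrep("M", 100, 1), Rrep("E", 12, 2), Rrep("F", 11, 3)]
chosen = pick_route(replies)
```

Here chosen.sender is "M", so all data packets from S would be handed to the
black hole.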

Figure 1. AODV protocol with black hole attack

3. RELATED WORKS
Various defense mechanisms against black hole attacks have been proposed in
the literature. Some of these research papers are reviewed here.
Mohammad Abu Obaida et al. [3] presented several modules: Threshold Tester,
Packet Classifier, RREP Sequence Number Tester, Extractor, Alarm Broadcaster
and Blacklist Tester. This mechanism modifies the format of the Route Reply
packet and uses a new packet type, Alarm. The router calculates the range of
accepted sequence numbers and derives a threshold value. When a node exceeds
the threshold value many times, it is identified as an attacker. However, the
calculation of the threshold value is somewhat heavy, so it introduces network
delay. Although a correctly calculated threshold prevents black hole nodes, a
wrong calculation may misclassify an authentic node as a black hole.
Himral, Vig and Chand [5] defined a mechanism to eliminate malicious nodes in
the MANET and to discover reliable paths to the intended node by checking the
sequence number difference between the source and the intermediate node. In
the AODV protocol, the destination sequence number is very important. It is a
32-bit integer value and is used to determine whether a route is fresh enough.
The larger the destination sequence number, the better the route. Their method
assumes that the malicious node sends the first RREP packet, with a high
sequence number, to the source node. The source node stores it as the first
RREP in the table and compares its sequence number with its own. If the
difference is very large, the node is regarded as an attacker and its entry is
eliminated from the table. However, this method cannot find multiple black
hole nodes.
Nital Mistry et al. [4] modify the original AODV routing protocol by using a
new field Mali_node, a timer MOS_WAIT_TIME and a table Cmg_RREP_Tab. The time
period that the source node waits for Route Replies is defined as
RREP_WAIT_TIME, and MOS_WAIT_TIME is defined as half of RREP_WAIT_TIME. Route
Reply packets are kept in the Cmg_RREP_Tab table and Mali_node stores the ID
of the attacker node. The source node analyzes the Route Replies in the
Cmg_RREP_Tab table and discards those with a very high sequence number. The
experimental results demonstrate that this method has a better packet delivery
ratio than the original AODV protocol. However, it has a high processing delay
and the end-to-end delay increases.
Jalil, Ahmad and Manan [6] proposed the ERDA mechanism, which modifies the
recvReply() function of the existing route discovery mechanism of the AODV
routing protocol. The new elements are mali_list to store the IDs of malicious
nodes, rrep_table to keep RREP packets from other nodes and rt_upd to control
the update operation of the routing table. The source node stores RREP packets
in the rrep_tab table and then updates its routing table with the first Route
Reply, which comes from the malicious node. However, the source node then
updates the routing table again with the next Route Reply packet from another
node, even though it has a lower sequence number, because the value of rt_upd
is true. If the value of rt_upd is false, the source node stops updating the
routing table. The source node sets rt_upd to false when it receives a Route
Reply from the destination. The ERDA mechanism thus removes the false Route
Reply entry by replacing it with a later entry. However, it has a high
processing delay.

4. IMPLEMENTATION OF DETECTION AND PREVENTION MECHANISM
In this section, a detection and prevention algorithm for the black hole
attack in the context of the AODV protocol (MAODV) is implemented to isolate
black hole nodes and to discover a safe route from source to destination in a
MANET.
4.1 Route Reply Record Table and Malicious Node Table
The proposed system modifies the procedure of the source node by introducing
two tables and an alarm packet into the existing AODV protocol. These tables
are the Route Reply (RREP) Record Table (RRT) and the Malicious Node Table
(MNT). The RRT stores all RREP packets from the neighbor nodes and the MNT
stores information about malicious nodes. Examples of these two tables are
shown in Table 1 and Table 2. The RRT is kept only by the source node, while
the MNT is kept by all nodes in the MANET so that the black hole node can be
eliminated.
Table 1. Route reply (RREP) record table (RRT)
Time    Dest Node ID   Dest Node Seqno   Next Hop   Hop Count   Reply Source Address   Lifetime   Timestamp
5.203   C              100               M          1           M                      9          20.4855
5.247   C              12                E          2           A                      10         20.4855
5.301   C              10                D          3           B                      9          20.4855
5.302   C              11                G          1           H                      9          20.4855

Table 2. Malicious node table (MNT)
Node ID   Time
P         5.0143
M         10.6542
T         50.4968

4.2 Threshold Value Calculation


The threshold is the average of the differences between the destination
sequence numbers in the RRT and in the routing table in each time interval (t)
for a destination, plus a control parameter. It is used for detecting and
removing the attacker node in the network. The control parameter is variable:
its value depends on the number of nodes, the number of connections, the
network area, the mobility speed and the pause time. It is used to avoid
misclassifying an authentic node as a malicious node.

Threshold = (average sequence number difference) + (control parameter)
4.3 Extension to Routing Table


The proposed system is designed to provide a robust method for detecting and
preventing the black hole attack. For our scheme, the routing table of the
AODV protocol is modified as follows. A reply initiator field is added to the
routing table and stores the ID of the node that originally sent the route
reply. When a malicious node is detected by comparison with the threshold
value, we can find its ID by looking at this field. So, the fields of the
routing table of our proposed protocol are below:

4.4 Alarm Packet
The original AODV protocol uses four different types of packets for
communication: RREQ, RREP, RERR and HELLO.

RREQ and RREP packets are used in the route discovery process to discover a
route to the destination. The RERR packet is used in the route maintenance
process to notify the earlier nodes on the path of a breakage when a link
failure occurs. The HELLO packet is used to maintain connectivity with the
neighbor nodes.
In the proposed system, an ALARM packet is added to the packet types of the
AODV protocol. The ALARM packet is used to notify all neighboring nodes in the
network about the black hole node. The format of the ALARM packet type is
shown in Table 3.
Table 3. ALARM packet format
Type Reserved Hop Count
Broadcast ID
Malicious Node IP Address
Originator IP Address
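
The fields of Table 3 can be sketched as a fixed-size binary record. The paper
does not give field widths or a type code, so everything numeric below is an
assumption: an 8-bit type, a 16-bit reserved field, an 8-bit hop count, a
32-bit broadcast ID and two IPv4 addresses, with a hypothetical type value of
0x10.

```python
import socket
import struct
from dataclasses import dataclass

ALARM_TYPE = 0x10       # hypothetical type code, not taken from the paper
_LAYOUT = "!BHBI4s4s"   # assumed widths: 1 + 2 + 1 + 4 + 4 + 4 = 16 bytes


@dataclass
class AlarmPacket:
    hop_count: int
    broadcast_id: int
    malicious_ip: str    # address of the detected black hole node
    originator_ip: str   # address of the node that raised the alarm

    def pack(self) -> bytes:
        """Serialize the packet in network byte order."""
        return struct.pack(
            _LAYOUT, ALARM_TYPE, 0, self.hop_count, self.broadcast_id,
            socket.inet_aton(self.malicious_ip),
            socket.inet_aton(self.originator_ip),
        )

    @classmethod
    def unpack(cls, data: bytes) -> "AlarmPacket":
        """Parse bytes produced by pack(); reject other packet types."""
        ptype, _reserved, hops, bid, mal, orig = struct.unpack(_LAYOUT, data)
        if ptype != ALARM_TYPE:
            raise ValueError("not an ALARM packet")
        return cls(hops, bid, socket.inet_ntoa(mal), socket.inet_ntoa(orig))
```

A receiving node would unpack the record, add malicious_ip to its own MNT and
use broadcast_id together with originator_ip to ignore duplicates, in the same
way AODV handles repeated RREQs.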

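4.5 Detection and Prevention Algorithm


The following terms are used to express the proposed algorithm:
SN: source node; IN: intermediate node; MN: malicious node; RRT: Route Reply
Record Table; MNT: Malicious Node Table; rep_seqno: destination sequence number
of an RREP stored in RRT; rt_seqno: destination sequence number in the routing
table.

The proposed detection and prevention algorithm is as follows: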


Begin
1. SN broadcasts RREQ to neighbors.
2. Store RREPs into RRT when SN receives RREP from IN until the
   waiting time expires.
3. Retrieve the Seqno from RRT and calculate the Threshold value.
4. Detect and remove the malicious node from RRT.
   for each entry in RRT
      if ((rep_seqno - rt_seqno) > Threshold), then
         assume IN is MN
         remove entry from RRT and store this IN as MN to MNT
         send Alarm message
      end
   end
5. Select the reliable packet from the remaining packets and continue the
   normal operation of AODV protocol.
6. Flush the RRT after completing steps 4-5.
End
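
Steps 3 to 6 of the algorithm can be sketched as a single function. This is a
sketch under stated assumptions rather than the authors' NS-2 code: the names
RrepEntry and detect_and_select are hypothetical, the threshold is taken to be
the average sequence number difference plus a control parameter, and the
"reliable" reply in step 5 is assumed to be the remaining one with the largest
sequence number and, on a tie, the smallest hop count.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class RrepEntry:
    reply_source: str   # node that generated the RREP
    dest_seqno: int     # rep_seqno in the algorithm
    hop_count: int


def detect_and_select(rrt: List[RrepEntry], rt_seqno: int, control: float,
                      mnt: Set[str]) -> Tuple[Optional[RrepEntry], List[str]]:
    """Return (chosen reply, nodes to announce in ALARM packets)."""
    if not rrt:
        return None, []
    # Step 3: threshold from the stored replies (assumed formula).
    diffs = [entry.dest_seqno - rt_seqno for entry in rrt]
    threshold = sum(diffs) / len(diffs) + control
    # Step 4: separate suspicious replies from the rest.
    trusted: List[RrepEntry] = []
    alarms: List[str] = []
    for entry, diff in zip(rrt, diffs):
        if diff > threshold:
            mnt.add(entry.reply_source)        # record in the MNT
            alarms.append(entry.reply_source)  # one ALARM per detected node
        else:
            trusted.append(entry)
    # Step 5: pick a reply from the remaining entries (assumed rule).
    best = max(trusted, key=lambda e: (e.dest_seqno, -e.hop_count), default=None)
    # Step 6: flush the RRT.
    rrt.clear()
    return best, alarms
```

With the four replies of Table 1, an assumed rt_seqno of 9 and an assumed
control parameter of 5, the threshold is 29.25. Only the reply from M
(difference 91) exceeds it, so M is stored in the MNT and the reply from A
(sequence number 12) is selected.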

4.6 Working Principle of the Proposed System


When the source node needs to send data packets to the destination, it sends
an RREQ packet to all neighbors. In the original AODV protocol, the source
node accepts the first fresh RREP from a neighbor node. A malicious node
exploits this by always sending a route reply with a high destination sequence
number ahead of the other neighbor nodes. In contrast, in this paper the
source node keeps all RREPs from neighbor nodes in the RRT until the waiting
time expires. The waiting time is a timer during which the source node waits
for other RREPs after getting the first RREP. We used 0.1 second as the
waiting time.
Then, the source node retrieves the destination sequence numbers from the RRT
and the routing table and calculates the Threshold value using the above
equation. To detect the malicious node, we calculate the difference between
the sequence numbers in the routing table and the RRT. If the difference is
greater than the Threshold, this intermediate node is assumed to be a black
hole node. The source node stores the malicious node ID in the MNT, discards
that entry from the RRT and broadcasts an ALARM message to all nodes in the
network to notify them about the attacking node. Then, the source node chooses
a reliable node from the remaining nodes and continues the normal operation of
the AODV protocol. After the reliable node has been chosen and the malicious
node removed, all data in the RRT must be cleared.

5. SIMULATION RESULTS
We have implemented the black hole attack behavior in the AODV protocol using
Network Simulator (NS-2.34) [13]. The main traffic generator used in this
simulation is Constant Bit Rate (CBR) and the overall simulation parameters
are presented in Table 4. The performance of the AODV protocol and the
proposed protocol under the black hole attack is analyzed and evaluated. The
following metrics are used to analyze the results of our solution.
End-to-End Delay: the average delay between sending a data packet at the
source and receiving it at the destination. It is measured in milliseconds.
Packet Delivery Ratio (PDR): the ratio of the total number of data packets
received by the destinations to the total number transmitted by the sources. A
higher value means better results [14].
Routing Overhead: the ratio of the total number of control packets generated
to the number of data packets transmitted.
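
The three metrics can be written out as small functions. This is a sketch
under assumptions: the function names are hypothetical, PDR is expressed as a
percentage, packets are identified by integer IDs and timestamps are given in
seconds, as they would be after parsing a simulator trace file.

```python
from typing import Dict


def packet_delivery_ratio(sent: int, received: int) -> float:
    """Data packets received by destinations per packet sent, in percent."""
    return 100.0 * received / sent if sent else 0.0


def routing_overhead(control_packets: int, data_packets: int) -> float:
    """Control packets generated per data packet transmitted."""
    return control_packets / data_packets if data_packets else 0.0


def average_end_to_end_delay_ms(send_times: Dict[int, float],
                                recv_times: Dict[int, float]) -> float:
    """Mean delay in milliseconds over the packets that were delivered.

    Both dictionaries map a packet ID to a timestamp in seconds; packets
    missing from recv_times were dropped and are ignored.
    """
    delays = [recv_times[pid] - send_times[pid]
              for pid in recv_times if pid in send_times]
    return 1000.0 * sum(delays) / len(delays) if delays else 0.0
```

For example, 190 packets received out of 200 sent give a PDR of 95.0, and 50
control packets for 200 data packets give a routing overhead of 0.25.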
Table 4. Simulation parameters
Parameter             Value
Simulator             NS-2.34
Area                  800 m x 800 m
Routing Protocol      AODV, BlackholeAODV, MAODV
Simulation time       200 s
Application Traffic   CBR
Number of Nodes       50-200
Malicious Node        1-4
Pause time            2 s
Packet Size           512 bytes
Transmission rate     2 packets/s
Mobility speed        10 m/s
No of Connections     20-40
Movement Model        Random Waypoint
5.1 Performance Analysis on Variation of Malicious Node


We created a network using the simulation parameters shown in Table 4. Figure
2 illustrates the effect of malicious nodes on PDR in the MANET. The number of
malicious nodes used in the simulation is chosen randomly from one to four. It
can be seen that AODV suffers heavily from the black hole attack. In Figure 2,
as the number of malicious nodes in the network increases, the PDR of the AODV
protocol decreases. On the other hand, the experimental results show that the
PDR of the MAODV protocol stays above 95% even as the number of malicious
nodes increases. MAODV has a higher average packet delivery ratio than AODV.
This is because the proposed protocol can prevent the black hole attack that
occurs in the network.

The impact of malicious nodes on the routing overhead and the average
end-to-end delay is presented in Figure 3 and Figure 4. For AODV under attack,
the routing overhead is very high compared with the MAODV protocol. The delay
of MAODV is higher than that of the AODV protocol under attack because of the
additional waiting time for route replies. The delay of the AODV protocol
under the black hole attack decreases because the malicious node replies
immediately without checking its routing table.

5.2 Performance Analysis on Variation of Node


The performance results of all protocols as the network size increases are
shown in Figure 5 to Figure 7. In Figure 5, as the number of nodes in the
network increases, the PDR of AODV decreases. This is because the larger the
number of intermediate nodes on an active route, the more likely a route
failure becomes. The PDR of AODV under attack decreases even more because of
the probability that the malicious node becomes an intermediate node. On the
other hand, the PDR of MAODV is greater than that of AODV under attack because
our detection approach is able to identify and eliminate the malicious node,
which greatly increases the network PDR.

The routing overhead versus the number of nodes is depicted in Figure 6. The
routing overhead for all protocols increases as the network size grows. The
routing overhead of the blackholeAODV protocol is greater than that of the
normal AODV and MAODV protocols because the black hole node is present. The
overhead of MAODV is the same as that of normal AODV except in the 200-node
scenario. This is because the proposed protocol does not generate any
additional requests for discovering secure routes. The impact of the number of
nodes on delay is shown in Figure 7. The delay of the proposed protocol
increases in the 100-node scenario since it has to avoid the malicious node.

6. CONCLUSIONS
In this paper, a simple approach for eliminating the black hole attack in
MANETs is proposed. The proposed mechanism can be applied to remove black hole
nodes and to find a reliable route from source to destination in the MANET. In
this mechanism, the procedure of the source node in the AODV protocol is
modified by introducing two tables and an alarm packet type. These tables are
the Route Reply Record Table (RRT), which stores Route Replies from neighbor
nodes, and the Malicious Node Table (MNT), which stores information about
malicious nodes. The black hole node can be removed and a reliable node can be
chosen by using these tables. The alarm packet type is also proposed to inform
all neighboring nodes about the intruder node when a black hole node is
detected.

To evaluate the applicability of this routing algorithm, we simulated
different scenarios using the AODV protocol and the proposed protocol with
black hole nodes. We considered performance metrics such as routing overhead,
PDR and delay in different scenarios, with the number of nodes and the number
of malicious nodes as variable parameters. The experimental results show that
the proposed system performs better than the AODV protocol. However, the
proposed system assumes that route replies come from more than one node within
the waiting time. If the source node receives only one route reply, from the
black hole node, or if all the route replies received during the waiting time
come from black hole nodes, the malicious node can enter the network.

REFERENCES
[1] Ramaswami, S. S., and Upadhyaya, S. Smart Handling of Colluding Black Hole
Attacks in MANETs and Wireless Sensor Networks using Multipath Routing.
Proceedings of the 2006 IEEE Workshop on Information Assurance, 2006.
[2] Luo, J., Fan, M., and Ye, D. Black Hole Attack Prevention Based on Authentication
Mechanism. IEEE, 2008.
[3] Obaida, M. A., Faisal, S. A., Horaira, M. A., and Roy, T. K. AODV Robust (AODVR):
An Analytic Approach to Shield Ad-hoc Networks from Black Holes. International
Journal of Advanced Computer Sciences and Applications, 2, 8 (2011), pp. 97-102.
[4] Mistry, N., Jinwala, D. C., and Zaveri, M. Improving AODV Protocol against Black
Hole Attacks. International Multi Conference of Engineers and Computer Scientists, 2,
(2010).
[5] Himral, L., Vig, V., and Chand, N. Preventing AODV Routing Protocol from Black
Hole Attack. International Journal of Engineering Science and Technology, 3, 5
(2011).
[6] Jalil, K. A., Ahmad, Z., and Manan, J. A. Mitigation of Black Hole Attacks for AODV
Routing Protocol. Society of Digital Information and Wireless Communications, 1, 2
(2011).
[7] Perkins, C. E., Royer, E. B., and Das, S. Ad-Hoc on Demand Distance Vector (AODV)
Routing. IETF RFC 3561, 2003.
[8] Perkins, C. E., and Bhagwat, P. Highly Dynamic Destination-Sequenced Distance-
Vector Routing (DSDV) for Mobile Computers. ACM SIGCOMM '94 Conference on
Communications Architectures, Protocols and Applications, (1994), pp. 231-241.
[9] Ochola, E., and Eloff, M. A Review of Black Hole Attack on AODV Routing in
MANET. http://icsa.cs.up.ac.za/issa/2011/Proceedings/Research/Ochola_Eloff.pdf.
[10] Shurman, M. A., Yoo, S. M., and Park, S. Black hole Attack in Mobile Ad Hoc
Networks. Proceedings of the 42nd Annual Southeast Regional Conference ACM-SE
42, (2004), pp. 96-97.
[11] Deng, H., Li, W., and Agarwal, D. P. Routing Security in Wireless Ad Hoc Networks.
IEEE Communications Magazine, 40, 10 (2002).
[12] Sandhu, G., and Dasgupta, M. Impact of Black Hole Attack in MANET. International
Journal of Recent Trends in Engineering and Technology, 3, 2 (2010).
[13] Fall, K., and Varadhan, K. The NS Manual. (November 2011),
http://www.isi.edu/nsnam/ns/doc/index.html.
[14] Jaafar, M. A., and Zukarnain, Z. A. Performance Comparisons of AODV, Secure
AODV and Adaptive Secure AODV Routing Protocols in Free Attack Simulation
Environment. European Journal of Scientific Research, 32, 3 (2009), pp. 430-443.

This paper may be cited as:


Khin, E. E., and Phyu, T., 2015. Enhancing AODV Routing Protocol to
Eliminate Black Hole Attack in MANET. International Journal of
Computer Science and Business Informatics, Vol. 15, No.2, pp.1-14.


Adaptive Search Information Technology in the University Library
Andriy Andrukhiv and Dmytro Tarasov
Social Communications and Information Activity Department,
Lviv Polytechnic National University, Ukraine, Lviv, 12 Bandera Str.

ABSTRACT
Information provision for the educational process at a university aims to
implement new information technologies and software in all spheres of academic
activity in order to provide students with high-quality educational material.
The formation of a new information society, the development of the Internet
and the growing number of electronic resources have created new conditions for
research libraries, which traditionally act as guides in the information
environment. The modern library user demands new standards of service to
satisfy information needs during one's studies. At the same time, university
management sets the goal of increasing education quality by organizing the
information support of the educational process with scientific and
methodological literature. Consequently, research on information support of
the educational process in a higher educational institution would allow
effective collaboration between the library and the subdivisions of the
institution, and consequently improve the book supply for the educational
process and optimize the funds needed to buy new materials. The article
describes an algorithm for recommending literature for academic courses. The
algorithm improves the information support of students during course study and
enables academic teachers to form a recommended literature list in an
automated way. The algorithm is the basis for an information system introduced
at Lviv Polytechnic National University, Ukraine.

Keywords
Library information system; OPAC; web-based library systems; e-learning materials,
educational process.

1. INTRODUCTION
The problem of online access to scientific and educational information is a
vital issue nowadays. A few years ago, the center of access to scientific
information was the library, but with the development of information
technologies the Internet took the library's place [1,6]. This change was
caused by the late development of information technologies in Ukrainian
libraries. The ancient and traditional role of the library as an institution
for acquiring, organizing, preserving, retrieving and disseminating
information to users has changed. In the current information society,
libraries are trying to remain influential institutions by providing access to
Internet resources. However, due to the global commercialization of scientific
resources, the Internet may not always provide all the necessary information.
Taking into account the tendency of the scientific community to turn to
electronic documents and the expansion of the range of services, libraries
have to organize their work so as to satisfy user information requirements as
quickly and as well as possible.
To solve this problem, library processes are being automated. This involves:
- computerization of workplaces;
- creation of a data center for storing and processing large-scale data sets;
- selection or in-house development, and support, of appropriate hardware and
software;
- alteration of the principles of library management, taking into
consideration new forms of service, etc.
Solving one library informatization problem leads to a range of other
problems [3]. For example, computerization of a workplace involves:
- computer equipment maintenance (repair, software updates and technical
support);
- equipping the workplace (desk, chair, peripheral devices, etc.);
- access to network resources (connection to the local area network and
electric power supply network);
- selection and purchase of software;
- managing occupational health and safety requirements (organization of
appropriate lighting, safe work with the computer, air-conditioning, heating,
choice of room for installing the computer, etc.);
- training staff to work with the computer.
All-inclusive library automation is a difficult task to implement in the field
of information technologies. It requires the involvement of skilled
librarians, systems analysts, linguists and programmers.

2. THE ACTUALITY OF THE RESEARCH


The reform of the educational system in Ukraine and the implementation of new
educational technologies under the Bologna process significantly increase the
role of libraries in the information support of academic, learning and
research activity at the university. The library must cope with a constantly
increasing information flow and find new ways to collaborate with university
departments, as subjects of the new information and educational environment,
in order to remain a needed and essential institution. Everyone is interested
in improving the information provision of students: the library will have
users for its services, the academic department will be given accreditation
and students will have high-quality sources of information for their studies.
Therefore, information support of the educational process is one of the key
goals of the university and its library.

3. WEB-TOOLS OF INFORMATION SEARCH IN THE UNIVERSITY LIBRARY
Informatization of the Scientific Library of Lviv Polytechnic National
University started in 2007 and a large amount of work has been done since. The
Scientific Library offers the following services: a library website, an
institutional repository [2], an electronic catalogue (OPAC) and several new
special web-based services for our users, such as a form to find UDC,
Ask-a-librarian and a literature recommendation system for studying academic
disciplines at our university (see Fig. 1).

Library web services:
- Web site
- Institutional repository
- Special information search services:
  - Form to find UDC
  - Ask-a-librarian
  - Literature recommendation system
- OPAC

Fig. 1. The Scientific Library user-oriented web services.


Let us briefly describe each information resource.
1. The library website (http://library.lp.edu.ua) plays a significant role in
the life of the library, as it is the primary place where library events
are presented on the Internet. The website has been built with the help
of Drupal 7 CMS. We use Drupal 7 because it has many modules that give us
the opportunity to integrate library services with the website. We
analyze the website every quarter and obtain the following average
statistics:
about 5 000 visits per month;
about 750 pages indexed in Google;
57 200 links to the website.

2. Lviv Polytechnic National University Institutional Repository
(http://ena.lp.edu.ua) was created on May 15, 2010. Currently the
repository contains more than 25 000 items. It has been built with the
help of DSpace software.
3. The OPAC (Online Public Access Catalogue) is an important part of many
digital library collections. It allows users to search for bibliographic
records within library collections. Nowadays, some OPACs also
provide access to electronic resources and databases, in addition to the
traditional bibliographic records. Our OPAC was created in 2008 by the
IT Department of the Scientific Library
(http://library.lp.edu.ua/en/it_department). This solution gave us the
opportunity to connect Google Analytics to the OPAC and to create our own
thematic statistics system. As a result, we know two main facts about
visits: the number of users and the kind of literature they are looking for.
We have 396 212 records in the OPAC and about 500 visits per month.
The next figure shows what kind of literature users are looking for.
[Pie chart: Technical 65%, Humanities 25%, Other 6%, Sociology 4%]

Fig. 2. Thematic searches in OPAC


In addition, we know that the most requested literature in the OPAC is
scientific literature (about 60 percent of search queries), followed by
literature for students (37%) and other literature (3%).
4. Special web-based services for users are:
form to find UDC;
Ask-a-librarian;
literature recommendation system for studying academic
courses in our university.

4. NEW SPECIAL INFORMATION SYSTEM: LITERATURE
RECOMMENDATION SYSTEM
The new service for library users selects the literature needed to study an
academic discipline from the library collection. In Ukraine, the
official document that defines the qualitative and quantitative characteristics
of the process of studying a discipline is the course program. One of the
sections of this document contains a book list, which the student must work
through to learn the material well. To compile a complete, high-quality
literature and reference list for the course, the teacher must be aware of the
literature available in the library, be able to find this literature, and
constantly keep up with new materials within one's research field.
An information system that recommends such literature to the academic
teacher would significantly ease this work. The teacher would only decide
whether or not to include a given item in the recommended literature list of
the course program. Moreover, when the list is formed on the basis of the
library collection, it keeps the information support of the educational
process up to date.
Such an information system for recommended literature selection has been
developed and introduced at Lviv Polytechnic National University. The
algorithm of its functioning is shown in Fig. 3.
[Flowchart: Course programs
    -(1)-> Selection of book authors -(2)-> Search through the library collection by author -(5)->
    -(1)-> Selection of book titles -(3)-> Search through the library collection by book title -(6)->
    -(1)-> Selection of UDC values -(4)-> Changing of UDC values to a certain form -(7)-> Search through the library collection by UDC -(8)->
    Consolidation of search results -(9)-> Ranking of search results]

Fig. 3. The algorithm of formation of literature list for the course


where
1. Bibliographic records from the recommended literature list for the
course;
2. Tuple of book authors from the recommended literature list;
3. Tuple of book titles from the recommended literature list;
4. Tuple of UDC values from the recommended literature list;
5. Results of the search by author through the library collection,
including all books by the authors from the recommended literature list;
6. Results of the search by title through the library collection,
including all book titles from the recommended literature list;
7. Tuple of UDC values adapted into a form suitable for the next steps;
8. Results of the book search through the library collection, including
items with the given UDC values;
9. Combined book list from the different searches.
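The steps of Fig. 3 can be sketched in code as follows. This is only a minimal illustration, not the system's actual implementation: the `Book` record, the exact-match comparison of authors and titles, the sample data, and the `udc_prefix` rule for bringing UDC values to a coarser form are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Hypothetical bibliographic record with only the fields used here."""
    author: str
    title: str
    udc: str


def udc_prefix(udc: str, depth: int = 2) -> str:
    """Assumed adaptation rule (item 7): keep the first `depth` dot-separated parts."""
    return ".".join(udc.split(".")[:depth])


def build_candidate_list(recommended, collection, depth=2):
    """Form the combined book list (item 9) from the three searches."""
    authors = {b.author for b in recommended}                 # item 2
    titles = {b.title.lower() for b in recommended}           # item 3
    udcs = {udc_prefix(b.udc, depth) for b in recommended}    # items 4 and 7

    by_author = [b for b in collection if b.author in authors]            # item 5
    by_title = [b for b in collection if b.title.lower() in titles]       # item 6
    by_udc = [b for b in collection if udc_prefix(b.udc, depth) in udcs]  # item 8

    # Consolidate the results, dropping duplicates but keeping first-seen order.
    return list(dict.fromkeys(by_author + by_title + by_udc))


# Made-up sample data for illustration only.
recommended = [Book("Knuth D.", "The Art of Computer Programming", "004.42.1")]
collection = [
    Book("Knuth D.", "Concrete Mathematics", "51.1"),
    Book("Sedgewick R.", "Algorithms", "004.42.3"),
    Book("Austen J.", "Emma", "821.111"),
]
candidates = build_candidate_list(recommended, collection)
print([b.title for b in candidates])
```

In this sample the first book is found through the author search and the second through the UDC search, while the unrelated third book is left out.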
The next step after list formation is list ranking. The selected
search results are ranked by a multi-criteria estimation of the
relevance of the found documents, taking into account the following criteria:
author and title of a book;
factor of research technical book aging (year of publication);
statistical data on book demand;
number of book pages.
The ranking algorithm is shown in Fig. 4.
We divide the mentioned criteria into two groups: main and specified. Main
criteria are autonomous criteria that are used to find relevant results.
Specified criteria are used to increase the pertinence of the relevant
results obtained with the main criteria. In our case the main criteria are the
title and author of a book, and the specified criteria are the factor of
research technical book aging, statistical data on book demand and the
number of book pages.

4.1 Criterion of Book Author and Title


The assumptions below are based on the thesis that if an expert (staff
member) has chosen books by one author, other books by the same author
may also be valuable for the same expert. Researchers commonly publish
their works within a specific research field and quite rarely move to
another one. For example, an author who works in building and architecture
is unlikely to write subsequent papers on computer processors. In
practice, this means that we look in the library collection for the same
authors as those in the recommended literature list. The situation with
book titles is similar.
4.2 Factor of Research Technical Book Aging
The factor of research technical book aging is important because documents
lose their value as an information source with the passage of time;
consequently, they are used less and less. American researchers R.E. Burton
and R.W. Kebler [5] proposed the term half-life to describe literature
obsolescence.


[Flowchart:
    Forming a set of options
    Forming a criteria set
    For the criterion "book age", defining the class to which the discipline belongs
    Determination of the age limit for a book, depending on the class
    Defining the variant: evaluation criteria are determined by the group of
    experts, by the programmers, or by the user
    Standardization of criteria values by formula (1): formula (2) for min, formula (3) for max
    Determination of criteria importance
    Calculation of the integral criterion by formula (4)
    Sorting by the combined criterion]

Fig.4. Evaluation of literature search results. Ranking algorithm

The half-life is the period of time during which half of all papers
published in some branch of science go out of use. For instance, they found
that the half-life of journal articles in physics is only 4.6 years, in
chemistry it is 8.1 years, in mathematics 10.5 years and in physiology
7.2 years. It must be emphasized that by the aging process we understand the
process of information obsolescence, not the physical aging of the
information carrier. According to the study, 62% of users turn to
journals whose age does not exceed 1.5 years, 31% use journals aged 1.5-5
years and 7% rely on publications older than 10 years. Thus, we can claim
that the age of a document is clearly connected with the intensity of its use
and therefore can be used in ranking.
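One possible way to turn the half-life into a numeric aging factor is sketched below. The text does not state which formula the system uses, so the exponential decay model and the function name `aging_factor` are assumptions of this example; only the half-life values come from the paragraph above.

```python
# Half-life values (in years) quoted in the text from Burton and Kebler.
HALF_LIFE_YEARS = {
    "physics": 4.6,
    "chemistry": 8.1,
    "mathematics": 10.5,
    "physiology": 7.2,
}


def aging_factor(publication_year, current_year, field):
    """Return a value in (0, 1]: 1.0 for a new book, 0.5 at one half-life.

    Assumption: usefulness decays exponentially with the age of the document.
    """
    age = max(0.0, current_year - publication_year)
    return 0.5 ** (age / HALF_LIFE_YEARS[field])


for year in (2015, 2010, 2000):
    print(year, round(aging_factor(year, 2015, "physics"), 3))
```

With such a factor, a book of the same age is penalized more heavily in a fast-aging field such as physics than in mathematics.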
4.3 Statistical Data on Book Demands
To improve reader service and library collection
management, we analyze statistical data on loaned and requested
books [4].
Data on library book use is an important criterion in ranking. Analysis of
information needs and requests can show current reader trends and the
dynamics of reader interests. If a book is popular among users, then it is
valuable for them and satisfies their information needs. The level of demand
is measured by the number of books loaned to readers. These data are kept in
the automated library information system (ALIS). We do not take into account
the time aspect (the number of books loaned during a certain period of time),
because not all books were registered in the catalogue at once. Consequently,
a book could not be found if it was not yet in the catalogue at the moment
when the user needed it. Statistical data have to be analyzed before use in
order to remove spikes of activity caused by the exam period. As in classical
tasks of decision-making theory, a normalizing coefficient that lies in the
area of feasible solutions is introduced.
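A common way to bring criteria such as loan counts to a comparable scale is min-max normalization, sketched below. The standardization formulas referenced in Fig. 4 are not reproduced in the text, so this particular scaling, the function name `normalize` and the sample numbers are assumptions made for illustration.

```python
def normalize(values, maximize=True):
    """Min-max scale `values` to [0, 1].

    With maximize=True the largest raw value maps to 1.0 (e.g. loan counts);
    with maximize=False the smallest raw value maps to 1.0 (e.g. book age).
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [1.0] * len(values)  # all options are equally good
    span = highest - lowest
    if maximize:
        return [(v - lowest) / span for v in values]
    return [(highest - v) / span for v in values]


loans = [0, 5, 10]  # hypothetical loan counts for three books
print(normalize(loans))
print(normalize(loans, maximize=False))
```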
Statistical data on literature demand are subjective. They show some
aspects of a book's value from the reader's point of view, but do not reflect
the real level of readers' information satisfaction. Students are the major
users of the university library and they use the books recommended by the
academic teacher. That is why books recommended by the teacher are in high
demand, while others are in low demand.
4.4 Number of Book Pages
This criterion arose from a peculiarity of the library collection at
the step of transferring information from the library collection to the
electronic catalogue. In some libraries, the definition of a book includes
guidelines for laboratory work or term papers, promotional materials, etc.
This is done to simplify library work. A distinctive feature of academic
textbooks is a large number of pages. For our study, 2,000 guidelines for
laboratory work and term papers, lectures and brochures were selected from
the collection of the Scientific Library of Lviv Polytechnic National
University. It was found that 1910 (or 96 percent) of these documents do not
exceed two publishing sheets. Therefore, it is advisable to analyze books
containing more than 50 pages.
The problem of multi-criteria optimization plays a key role in decision-making
theory. To solve this problem, the partial criteria must be merged into one
integral criterion, whose minimum or maximum is then found. There are several
types of generalized criteria, depending on how the partial criteria are
combined: maximin, multiplicative, and additive (or linear convolution). To
solve the problem of ranking a list of bibliographic descriptions, we need
to build an objective function that combines the partial criteria. With this
function we can determine the relevance of each bibliographic description. To
rank the found literature we use the multiplicative criterion over the
above-mentioned criteria.
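A multiplicative criterion can be sketched as a weighted product of normalized partial criteria. The exact form of the integral criterion is not given in the text, so the weighted-product form, the `floor` guard, the criterion names and the sample values below are assumptions of this example rather than the system's actual function.

```python
import math


def multiplicative_score(criteria, weights, floor=1e-9):
    """Weighted product of normalized criteria (each expected in [0, 1]).

    `floor` is an assumption of this sketch: it keeps a single zero-valued
    criterion from wiping out the whole product.
    """
    return math.prod(max(criteria[name], floor) ** w for name, w in weights.items())


def rank(books, weights):
    """Sort books from most to least relevant by the combined criterion."""
    return sorted(
        books,
        key=lambda b: multiplicative_score(b["criteria"], weights),
        reverse=True,
    )


weights = {"match": 1.0, "aging": 1.0, "demand": 1.0}  # hypothetical importances
books = [
    {"title": "A", "criteria": {"match": 1.0, "aging": 0.4, "demand": 0.9}},
    {"title": "B", "criteria": {"match": 1.0, "aging": 0.8, "demand": 0.5}},
]
print([b["title"] for b in rank(books, weights)])
```

Unlike an additive criterion, the product punishes a book that scores very low on any single criterion, even if its other scores are high.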
5. RESULTS
The literature recommendation system was developed with
PHP+Apache+MySQL by the IT Department of the Scientific Library of Lviv
Polytechnic National University. Data in this system are synchronized daily
with the ALIS through ODBC. A peculiarity of this system
is that the user is given an opportunity to fill in a feedback form (see
Fig. 6), find details of how the system functions and learn how to use the
selected books in the educational process.
The algorithm described in this article is the basis of an
information system that is available at http://library.lp.edu.ua/ttp/. It is a
separate webpage, where the user can look for the necessary subject
and review the recommended references.
If the user thinks that the available references have to be changed, a form is
offered where the user can enter one's details and a list of reference changes.
On the same webpage the user can run the literature
recommendation function for a certain course.
The generated list is analyzed by the teacher. The marked literature that can
be attached to the subject goes to the librarian, who analyzes it and makes
the changes (Fig. 5).


Fig. 5. An example of literature recommendations for the course of
Foreign languages.

Fig. 6. An example of literature recommendations feedback form


For processing and permanent storage of the results of the web module's
work, one may use:
regular automatic data export to the ALIS (for updating user profiles in
the ALIS and for literature ranking in the electronic catalogue
of the ALIS);
automatic sending of information to library staff via e-mail (for the
feedback function).

6. CONCLUSIONS
The article describes a new service that the library offers to its users. It
allows the university to improve the quality of information support of
educational and management processes, and provides an automated solution for
forming recommended literature lists and updating them. The
teacher receives a number of benefits: recommended literature lists
formatted according to the current
bibliographic standards, intelligent algorithms for the automatic selection of
literature for the courses, and notification about new books related to the
course.

7. REFERENCES
[1] Kanamadi, S., & Kumbar, B.D., 2006. Web-based services expected from libraries: A
case study of management institutes in Mumbai city, Webology, 3 (2), Article 26.
Available at: http://www.webology.org/2006/v3n2/a26.html
[2] Tarasov D., Andrukhiv A., 2012. Analysis of the development of Ukrainian
repositories, Modern Problems of Radio Engineering, Telecommunications and
Computer Science: Proceeding of the National University Lviv Polytechnic, p. 383,
Ukraine
[3] Andrukhiv A., Sokil M., Fedushko S., 2014. Integrating new library services into the
University Information System, Library management, 1 (6), Proceedings of the Institute
of Polish Language and Literature, pp. 79-88, Poland
[4] Bhatnagar A., 2012. Web-based library services. Available at:
http://ir.inflibnet.ac.in/dxml/handle/1944/1418
[5] Burton R.E. and Kebler R.W., 1960. The "half-life" of some scientific and technical
literatures, American Documentation, pp. 98-109.
[6] Sannella, M. J., 1994. Constraint Satisfaction and Debugging for Interactive User
Interfaces. Ph.D. Thesis, University of Washington, Seattle, WA,

This paper may be cited as:


Andrukhiv, A. and Tarasov, D., 2015. Adaptive Search Information
Technology in the University Library. International Journal of Computer
Science and Business Informatics, Vol. 15, No. 2, pp. 15-25.


Educational Data Mining: Performance
Evaluation of Decision Tree and Clustering
Techniques Using WEKA Platform

Ritika Saxena
(M.Tech, Software Engineering (CSE))
BBD University, Lucknow.

ABSTRACT
Data mining plays a vital role in information management technology. It is a
computational process of finding patterns in large databases. It mainly focuses on
extracting knowledge from the available data. Different knowledge extraction
tools are used for this purpose. Such tools are common in every sector, be it educational,
organizational, etc. The educational sector can take advantage of these tools in order to
increase the quality of education, but present educational systems are still not using
them. Higher education institutions need to know which student will enrol in which
course and which student needs more assistance. In data mining, users face problems
when a database consists of a large number of features and instances. These kinds of
problems cannot be handled using decision trees alone or clustering alone,
because decision trees depend upon the dataset used and the configuration of the trees.
Similarly, clustering alone does not work for all kinds of patterns. In order to find out
which technique is most suitable, in this paper we evaluate the performance of both
algorithms. Educational data is mined and the algorithms are applied to it so as to
predict the results.

Keywords
Weka, EDM, Decision Trees, Clustering, KDD.

1. INTRODUCTION
Data mining is widely used in diverse areas. It is the process
through which we can analyse different types of data and extract
useful information. Data mining is sometimes referred to as knowledge
discovery in databases (KDD). It helps in analyzing data from different
angles, categorising it and summarizing the relationships identified. In
other words, data mining is the process through which we can find relations
and patterns among different fields of databases. Educational Data Mining
(EDM) is a distinct research field that applies data mining, machine learning
and statistics to information generated in educational settings. The main
goals of educational data mining are predicting students' future learning
behaviour, advancing scientific knowledge, and studying the effects of
educational support. In this research paper we use data mining
methodologies in order to understand students' performance using two
commonly used techniques of data mining. Data mining helps in studying the
performance of students based on their past records. In this paper, we use
two data mining techniques, clustering and decision trees, to obtain results
for the inputs we provide and to compare those results in order to determine
which of the techniques is preferable and gives better results.

2. BACKGROUND STUDY
Data mining is the analysis step of KDD (Knowledge Discovery in Databases).
Data mining tasks are semi-automatic or automatic, depending on the quantity of
data available. Data collection, preparation, interpretation and reporting
are not part of data mining itself, but they are part of KDD. Data
mining can be performed using different tools. The different phases of data
mining are shown below.

[Diagram: cycle of Problem Definition, Data Exploration, Data Preparation,
Modelling, Evaluation and Deployment around the Data]

Data Mining Phases

The amount of data stored in electronic form in educational institutions has
increased dramatically. Historical as well as organisational data is stored in
databases. It is really cumbersome to manage such data manually.
Different relations have to be derived from the data stored in the database.
The categorization of students according to their academic results is an
important task for institutions in order to increase their
credibility. There is no assurance that there are predictors that can
determine whether a student is academically below average,
average or a genius. In the present-day educational system, a
student's performance is determined by a combination of primary results,
secondary results, internal assessments, tests, assignments, attendance,
etc. Therefore, in this research paper we analyze which
algorithm is more efficient and gives better predictions.

3. RELATED WORK
Lakshmi et al. [1] describe students' performance. They used the ID3
algorithm to classify students' performance, according to
which students are allotted an area for their master's studies. ID3 is a
classification algorithm that constructs decision trees using a
top-down, greedy search. In order to select the
attribute that is most useful for classifying the dataset, the information
gain metric is used.
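The information gain used by ID3 to choose the splitting attribute can be illustrated with a short sketch. The attribute names (`attendance`, `tutoring`, `result`) and the four sample records are hypothetical and are not taken from the cited study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical student records.
rows = [
    {"attendance": "high", "tutoring": "yes", "result": "pass"},
    {"attendance": "high", "tutoring": "no", "result": "pass"},
    {"attendance": "low", "tutoring": "yes", "result": "fail"},
    {"attendance": "low", "tutoring": "no", "result": "fail"},
]
for attribute in ("attendance", "tutoring"):
    print(attribute, information_gain(rows, attribute, "result"))
```

In this toy data `attendance` separates the classes perfectly while `tutoring` carries no information, so ID3 would split on `attendance` first.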
Ali [2] also describes how data mining could be used in the educational
sector. As information is collected from students at the time of
admission and saved in the computer, it is beneficial from a business point
of view. He used data mining to classify and cluster the information based on
psychographic, behavioural and demographic variables. This helps in
describing students' profiles, i.e. whether they are successful or
unsuccessful, based on the percentage or GPA secured in secondary
examinations and semesters.
Sembiring [6] created a model based on a psychometric analysis of
students using data mining techniques. He created a rule model of
students' performance based on their psychometric behaviour. The predictor
variables used are Interest, Believe, Family Time and Study Behaviour. The
model uses two main methodologies, kernel k-means
clustering and support vector machine (SVM) classification, as they can be
used on large as well as high-dimensional data that are not linearly
separable.
Bhullar [7] describes a data mining tool that helps to find students who
are weak in academics and need assistance. He used Weka classification
algorithms that provide a balance between precision, speed and interpretability
of results. The J48 algorithm, a decision tree classifier, is used.
Baradwaj [8] collected information on 200 students from VBS
Purvanchal University, Jaunpur (U.P.), such as their previous semester marks
(PSM), class test grade (CTG), seminar performance (SEM), general
proficiency (GP), attendance (ATT) and lab work (LW). Using these records,
he applied a classification technique in order to group students by
percentage into good, average and poor. He also measured entropy so as to check
the impurity.
Romero [9], in his survey, described the different studies carried out in this
field. He described the different types of techniques and educational
environments used, as well as similar work related to educational
data mining. Data mining tools are normally designed for flexibility
rather than simplicity. He explains both aspects: on the one
hand the user has to select the algorithm to apply to the given data, and
on the other hand the algorithm has to be configured before its execution.
XML, PMML, OWL, RDF and SCORM are some of the standards and formats
currently used in this area.
Calders [11] presents four different EDM
papers that represent a crosscut of different application areas of data mining.
M. Abu Tair [12] presented a case study on improving
students' performance by mining the data. They extracted useful
knowledge from the database and, after preprocessing the data, applied mining
techniques such as association rules, classification and outlier detection.

4. METHODOLOGY
In this paper we have used Weka 3.7 as a comparison tool. Weka contains
a collection of machine learning algorithms. With it we can perform
classification, clustering, association rule mining, pre-processing and
visualization of data. In our work we have created a dataset containing
7 attributes in an Excel file saved in CSV format. The process carried out
in this paper is described below.

Data Mining Process:


The dataset used in this study is obtained from the student database
of one of the educational institutions. Initially the marks of the students
are stored, such as their 10th percentage, 12th percentage and semester marks.
First, the marks of different students are collected in a table and then the
processing is carried out. Using Weka 3.7 we can classify and cluster the data
available in our dataset. Weka 3.7 is one of the most efficient tools for
comparing data mining techniques. The results can be tabulated
quickly and accurately with this tool. When we perform such tasks
manually, without any tools, it becomes cumbersome to
reach the desired results and the tasks become time-consuming.
In the first phase, the k-means clustering technique is applied through
Weka 3.7. From the options given in the Weka 3.7 tool we choose the
simple k-means algorithm and then, through its parameter settings, we set the
number of clusters to be formed.
In the second phase, the decision tree classification technique is applied
through Weka 3.7. It helps in visualizing the tree structure of the input dataset.
Then the results of both algorithms are analyzed to determine
which algorithm is more efficient and preferable.

Figure 1. Data Collected through Students Database

The dataset shown above was collected from the university's database. The
Excel file is saved with the .csv extension; CSV stands for comma-separated
values, and this file is used as the input file. The benefit of using this
format is that Weka 3.7 automatically reads the comma-separated values
into a table during processing.
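To show what such a comma-separated input looks like outside Weka, here is a small sketch that reads CSV text with Python's standard `csv` module. The column names and values are hypothetical; the paper does not list the exact attribute names of its dataset.

```python
import csv
import io

# Hypothetical sample: the column names are assumptions, not the paper's attributes.
sample = io.StringIO(
    "student_id,tenth_pct,twelfth_pct,sem1_marks\n"
    "1,78.5,81.0,72\n"
    "2,64.0,70.2,65\n"
)

# Each row becomes a dictionary keyed by the header line.
rows = list(csv.DictReader(sample))
marks = [float(row["sem1_marks"]) for row in rows]
print(rows[0]["tenth_pct"], marks)
```

Values are read as strings, so numeric attributes have to be converted explicitly before any computation.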


4.1. CLUSTERING
Clustering is a method in which clusters or groups are formed. It
groups elements of the same kind into one class or group.
Elements within a group are of a similar type and pattern and differ from those that
belong to other groups. Clustering is one of the main tasks of
data mining and also a common technique for statistical data analysis. It is
used in many different fields such as machine learning, pattern recognition,
bioinformatics, image analysis and information retrieval. There are many
types of clustering algorithms, such as hierarchical clustering, the k-means
algorithm, the expectation-maximization (EM) algorithm, Density-Based Spatial
Clustering of Applications with Noise (DBSCAN), biclustering and
fuzzy clustering.

K-Means Clustering:
The term k-means was first coined by James MacQueen in 1967. It uses an
iterative refinement technique. In simple words, it is an algorithm that
partitions a set of objects into K clusters based on the values of their
attributes; here each object has 7 attributes according to
which clustering is done. The grouping is done by computing the squared
distance between each data point and each cluster centroid and
assigning each point to the centroid at minimum distance.
The basic steps of k-means are the following:
Find the centroids.
Calculate the distance of each object to each centroid.
Group the objects based on the minimum distance, i.e. the closest
centroid.
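The steps above can be sketched in plain Python as follows. This is an illustrative implementation, not Weka's: for reproducibility it seeds the centroids with the first k points (an assumption of this sketch), and the two-dimensional sample points are made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Plain k-means. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iterations):
        # Assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        updated = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, clusters


# Made-up two-dimensional points forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centroids, clusters = kmeans(points, k=2)
print([len(c) for c in clusters])
```

The loop repeats the assignment and update steps until the centroids stop moving, which is the iterative refinement mentioned above.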

Figure 2. Pre-processing the input .csv file


First, we pre-process the data in the Weka 3.7 tool before clustering.
Pre-processing transforms the raw data into an understandable format and
prepares it for further processing. The steps that data
undergo during pre-processing are data cleaning, integration,
transformation, reduction and discretization.
After this we apply k-means clustering through Weka, with the collected
dataset as input.

Figure 3. K-means applied using 3 clusters

We set the number of clusters to 3 and then tabulate the results.
In Weka 3.7 the default number of clusters is 2, but here 3
clusters are formed, which could be related to grades A, B and C.

Figure 4. Clusters 0, 1, 2 are formed after applying algorithm


We can see that 3 clusters (0, 1, 2) are formed, shown within the circle
[Figure 4]. Cluster 0 contains 24 elements, Cluster 1 contains 26
elements and Cluster 2 contains 9 elements. The following figure
shows the cluster visualization for our input dataset. We can
also choose the attributes plotted on the x axis and y axis, according to
which the data is visualized.

Figure 5. Cluster Visualization

From the tool we can see that the percentage of errors is
17.792%.

4.2. DECISION TREES


After clustering the dataset, we apply the decision tree
algorithm to the same dataset. In decision trees the data is organised in the
form of a tree in which the leaf nodes represent the classes to which the
records are assigned. It is one of the simplest and most precise techniques
for mining data and obtaining results efficiently. Thus, this study may also help
students to improve their performance in future grades.


Figure 6. Decision Tree Implementation

Figure 7. Decision Tree Visualization


In this case, all the records were first assigned class labels and then
passed through the decision tree algorithm; the results are
tabulated by visualizing the tree.
With the decision tree, an accuracy of 87.751% is achieved.

5. RESULTS
When clustering is performed and the records are grouped into
their clusters, the number of iterations needed to form 3 clusters was 5. The
root mean squared error value is 17.793. The lower the error
value, the more efficient the algorithm.

Figure 8. Results generated after clustering algorithm analysis

Thereafter, we apply the second data mining technique, i.e.
classification through the decision tree algorithm. Here we have used the J48
algorithm, which is Weka's implementation of the C4.5 algorithm, a successor
of ID3. It creates univariate decision trees. The results obtained after
running the algorithm are shown below.


Figure 9. Results generated after Decision trees algorithm implementation

From the above decision tree results we can see that the root
mean squared error is equal to 0.216, which is much less than that of
the clustering algorithm.
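For reference, the root mean squared error of a set of predictions can be computed as in the sketch below. How the tool derives the reported figure for each algorithm is not described in the text, so this only illustrates the general definition; the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


actual = [1.0, 0.0, 1.0, 1.0]       # hypothetical 0/1 class indicators
predicted = [0.9, 0.2, 0.6, 1.0]    # hypothetical predicted probabilities
print(round(rmse(actual, predicted), 3))
```

A value of 0 means the predictions match the actual values exactly; larger values mean larger average deviations.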
From the above results we can deduce that the decision tree algorithm, i.e.
the classification technique, is preferable to clustering.

6. CONCLUSION
In this paper, two algorithms are applied and their performance is
compared. They are applied to the marks of students retrieved
from the university database so as to grade the students based on their
up-to-date performance. We have used clustering and
decision trees to mine the data because a huge amount of data containing
students' records is available in the university, and it is necessary to
refine the data so that the results can be used for future evaluation.
First we evaluated the performance of the clustering algorithm,
then the performance of the decision tree algorithm, and
then judged which algorithm performs better.
After applying both techniques, we conclude that the decision tree
using the J48 algorithm is more efficient than k-means clustering.
The accuracy achieved with the decision tree is much higher than that
achieved with clustering. So we conclude that classification techniques
are preferable to clustering techniques.

REFERENCES
1. Lakshmi, D. Bhu, Arundathi, S. and Jagadeesh, Dr., Data Mining: A Prediction
for Student's Performance Using Decision Tree ID3 Method, July 2014.
2. Ali, Mohd. Maqsood, Role of Data Mining in Education Sector, April 2013.
3. DeLaFayette Winters, Titus, Educational Data Mining: Collection and Analysis of
Score Matrices for Outcomes-Based Assessment, June 2006.
4. Torre, Javier, Rodriguez, Alejandro, Colomo, Ricardo, Jimenez, Enrique and Alor,
Giner, Improving Accuracy of Decision Trees Using Clustering Techniques,
February 2013.
5. Ranganathan, Sindhuja, Improvements to K-Means Clustering, August 2013.
6. Sembiring, Sajadin, An Application of Predicting Student Performance Using
Kernel K-Means and Smooth Support Vector Machine, August 2012.
7. Singh Bhullar, Manpreet, Member IAENG, Use of Data Mining in Education
Sector, October 2012.
8. Kumar Baradwaj, Brijesh, Mining Educational Data to Analyze Students
Performance, 2011.
9. Romero, Cristobal, Member, IEEE, and Ventura, Sebastian, Senior Member, IEEE,
Educational Data Mining: A Review of the State-of-the-Art, 2010.
10. Patel, Ketul B., Chauhan, Jignesh A. and Patel, Jigar D., Web Mining in
E-Commerce: Pattern Discovery, Issues and Applications, 2011.
11. Calders, Toon and Pechenizkiy, Mykola, Introduction to Special Section on
Educational Data Mining, Volume 13, Issue 2, 2013.
12. Tair, Mohammad M. Abu and El-Halees, Alaa M., Mining Educational Data to
Improve Students Performance: A Case Study, International Journal of Information
and Communication Technology Research, Volume 2, No. 2, February 2012.
13. Huebner, Richard A., Norwich University, A Survey of Educational Data Mining
Research, Research in Higher Education Journal.
14. Baker, Ryan S.J.D. and Yacef, Kalina, The State of Educational Data Mining in
2009: A Review and Future Visions, Journal of Educational Data Mining, Article 1,
Vol 1, No 1, Fall 2009.

This paper may be cited as:


Saxena, R., 2015. Educational Data Mining: Performance Evaluation of
Decision Tree and Clustering Techniques Using WEKA Platform.
International Journal of Computer Science and Business Informatics, Vol.
15, No. 2, pp. 26-37.

Hamiltonian cycle in graphs $\sigma_4 \geq 2n$



Nguyen Huu Xuan Truong
Department of Economic Information System,
Academy of Finance, Vietnam

Vu Dinh Hoa
Department of Information Technology,
HaNoi National University of Education, Vietnam

ABSTRACT
Given a simple undirected graph $G$ with $n$ vertices, we denote by $\sigma_k$ the
minimum value of the degree sum of any $k$ pairwise nonadjacent vertices. The
graph $G$ is said to be hamiltonian if it contains a hamiltonian cycle (a cycle
passing through all vertices of $G$). The problem HC (Hamiltonian Cycle) is a
well-known NPC-problem. Many authors have studied Hamiltonian cycles in graphs
with large degree sums $\sigma_k$, but only for $k = 1, 2, 3$. In this paper, we
study the structure of nonhamiltonian graphs satisfying $\sigma_4 \geq 2n$, and
we prove that the problem HC for the graphs satisfying $\sigma_4 \geq 2tn$ is
NPC for $t < 1$ and is P for $t \geq 1$.
Keywords
hamiltonian cycle, NPC, $\sigma_4$.

1. INTRODUCTION
In this paper, we use the definitions and notation in [4], with the exception
of $K_n$, the complete graph on $n$ vertices. We consider only simple
undirected graphs. Let $G = (V, E)$ be a graph on $n$ vertices with vertex set
$V$ and edge set $E$. A set $S \subseteq V(G)$ is independent if no two of its
elements are adjacent. The independence number of $G$, denoted by $\alpha(G)$,
is defined by setting $\alpha = \max\{|S| : S \subseteq V(G) \text{ is independent}\}$.
We use $\omega(G)$ to denote the number of connected components of $G$. The
graph $G$ is tough (or 1-tough) if $\omega(G - S) \leq |S|$ for every nonempty
subset $S \subseteq V(G)$.
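
The toughness condition can be checked directly on small graphs. The sketch below is a brute-force illustration only: it reads the condition as $\omega(G - S) \leq |S|$ for every nonempty vertex subset $S$, stores a graph as a dictionary of neighbour sets, and takes exponential time. The names `count_components` and `is_tough` and the two sample graphs are assumptions made for this example.

```python
from itertools import combinations


def count_components(adj, removed):
    """Number of connected components left after deleting `removed`."""
    seen = set(removed)
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search from `start`
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


def is_tough(adj):
    """Brute-force 1-toughness test, intended for small examples only."""
    vertices = list(adj)
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            if count_components(adj, subset) > size:
                return False
    return True


# The 5-cycle is tough; the complete bipartite graph K_{2,3} is not,
# because deleting its two-vertex side leaves three components.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
k23 = {"a": {"c", "d", "e"}, "b": {"c", "d", "e"},
       "c": {"a", "b"}, "d": {"a", "b"}, "e": {"a", "b"}}
```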
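For two disjoint graphs $G_1$ and $G_2$, we denote by $G_1 \vee G_2$ the graph
with the vertex set $V(G_1) \cup V(G_2)$ and the edge set
$E(G_1) \cup E(G_2) \cup \{xy \mid x \in V(G_1), y \in V(G_2)\}$. For example,
$K_2 \vee K_3 = K_5$. For a positive integer $k$, we define
$\sigma_k = \min\{\sum_{i=1}^{k} d(x_i) : \{x_1, x_2, \ldots, x_k\} \text{ is independent}\}$.
In the case $k > \alpha$, set $\sigma_k(G) = k(n - \alpha)$. Instead of
$\sigma_k(G)$, sometimes we simply write $\sigma_k$.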
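
To make the degree-sum parameter concrete, the following sketch computes the minimum degree sum over all independent sets of exactly $k$ vertices by exhaustive search. It is an illustration under stated assumptions: the graph is a dictionary of neighbour sets, the name `sigma_k` is chosen for this example, and the function returns `None` when no independent set of size $k$ exists instead of applying any convention for that case.

```python
from itertools import combinations


def sigma_k(adj, k):
    """Minimum degree sum over independent sets of exactly k vertices.

    Returns None if the graph has no independent set of size k.
    """
    best = None
    for group in combinations(adj, k):
        # The group is independent if no two of its vertices are adjacent.
        if all(v not in adj[u] for u, v in combinations(group, 2)):
            total = sum(len(adj[v]) for v in group)
            if best is None or total < best:
                best = total
    return best


# In the 5-cycle every vertex has degree 2 and the largest independent
# set has two vertices.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
sigma2_of_cycle5 = sigma_k(cycle5, 2)   # 4
sigma3_of_cycle5 = sigma_k(cycle5, 3)   # None
```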
If $G$ contains a hamiltonian cycle (a cycle passing through all vertices of
$G$), then $G$ is called hamiltonian; otherwise, $G$ is nonhamiltonian. A graph
$G$ with a hamiltonian path (a path passing through all vertices of $G$) is
said to be traceable. Let $C_k$ be the cycle of length $k$. The graph $G$ is
said to be $k$-connected if $G - S$ is connected for any $S \subset V$ with
$|S| < k < n$. Note that a tough graph is 2-connected, and toughness is a
necessary condition for the existence of a hamiltonian cycle in a graph [6].
There is a polynomial algorithm that recognizes 2-connected graphs in $O(n^3)$
time.
The problems HP and HC are well-known NPC-problems [1] [10].
HP (HAMILTONIAN PATH)
Instance: Graph $G$.
Question: Is $G$ traceable?
HC (HAMILTONIAN CYCLE)
Instance: Graph $G$.
Question: Is $G$ hamiltonian?
Many authors have studied Hamiltonian cycles in graphs with large degree
sums $\sigma_k$, but only for $k = 1, 2, 3$ (see [3] [5] [9], etc.).
For a positive integer $k$, we state the problem $HC_k$ as follows:
$HC_k$
Instance: Given a real $t > 0$ and a graph $G$ satisfying $\sigma_k \geq \frac{k}{2} tn$.
Question: Is $G$ hamiltonian?
In [7], [8], we prove that:

Theorem 1.1 [7]. $HC_2(t < 1)$ is NPC and $HC_2(t \geq 1)$ is P.

Theorem 1.2 [8]. $HC_3(t < 1)$ is NPC and $HC_3(t \geq 1)$ is P.

In this paper, we study the class of graphs satisfying $\sigma_4 \geq 2n$ for the
problem $HC_4$.

2. RESULTS
The following Theorem will be proved in Section 5.

Theorem 2.1. Let $G$ be a 2-connected graph with $\sigma_4 \geq 2n$. If $G$ is
nonhamiltonian then $\alpha = 3$ and $G$ belongs to one of the following three
classes of graphs:
1. Class 1 of 2-connected graphs with = 3 such that there exists
a subset , = 2 so that = 1 2 3 .

Figure 1. Class .
2. Class 2 of 2-connected graphs with = 3 such that there exists
three disjoint complete graphs 1 , 2 , 3 and a vertex ()
and 1 1 , 2 2 , 3 3 so that = ( 1 2
3 ) + 1 2 , 2 3 , 3 1 . Moreover, there exists three vertices
1 1 1 , 2 2 2 , 3 3 3 such that 1 , 2 , 3
() and can possibly be adjacent to the another vertices.

Figure 2. Class .
3. Class 3 of 2-connected graphs with = 3 such that there exists
three disjoint complete graphs 1 , 2 , 3 ( 1 , 2 , 3
3) and distinct vertices , for = 1, 2, 3 so that = 1
2 3 + 1 2 , 2 3 , 3 1 + 1 2 , 2 3 , 3 1 .

Figure 3. Class .

Note that the graph = 1 1 (3 5 ) with 11 satisfies
4 2 and is not 2-connected. In Section 3, we give polynomial
algorithms to recognize whether a given graph belongs to 1 2 3 .
From Theorem 2.1, we conclude the following corollary.

Corollary 2.1. Every 2-connected graph $G$ with $\alpha \geq 4$ and
$\sigma_4 \geq 2n$ is hamiltonian.

For $t < 1$, we prove the following Theorem:

Theorem 2.2. $HC_4(t < 1)$ is NPC.

Proof. The problem $HC_4$ is a subproblem of HC, so it belongs to NP. In order
to prove that $HC_4(t < 1)$ is NPC, we construct a polynomial transformation
from the problem HP to it.
For any graph $G_1$ with $n_1$ vertices, we choose a positive integer
$q \geq \max\left\{\frac{t(n_1 - 1)}{2(1 - t)}, 5\right\}$. Then we construct a
graph $G_2$ from $G_1$ by adding the new vertex set
$\{x_1, x_2, \ldots, x_q, y_1, y_2, \ldots, y_{q-1}\}$ and the edges joining each
vertex of $\{x_1, x_2, \ldots, x_q\}$ to all other vertices. In this way, we
obtain the graph $G_2 = K_q \vee (G_1 \cup \overline{K}_{q-1})$. This
construction can be carried out by a Turing machine in polynomial time.
We observe that the graph $G_2$ has $n_2 = n_1 + 2q - 1$ vertices and
$\sigma_4(G_2) = 4q$. Because $q \geq \frac{t(n_1 - 1)}{2(1 - t)}$, we have
$2q \geq t(n_1 + 2q - 1)$, which implies that $\sigma_4(G_2) \geq 2tn_2$.
Now we prove that $G_2$ has a hamiltonian cycle if and only if $G_1$ has a
hamiltonian path. Indeed, if $G_1$ has a hamiltonian path $P$ then
$C = (P, x_1, y_1, x_2, y_2, \ldots, x_{q-1}, y_{q-1}, x_q)$ is a hamiltonian
cycle in $G_2$.
Conversely, suppose that $G_2$ has a hamiltonian cycle $C$. Observe that each
$y_j$ ($j = 1..q-1$) has only the neighbors $x_i$ ($i = 1..q$), so the vertices
in $\{y_1, y_2, \ldots, y_{q-1}\}$ are adjacent only to the vertices in
$\{x_1, x_2, \ldots, x_q\}$. Then, if we remove all vertices in
$\{x_1, x_2, \ldots, x_q\}$, we obtain $q$ connected components, which are
$y_1, y_2, \ldots, y_{q-1}$ and $G_1$, and each of the connected components has
a hamiltonian path (the rest of $C$ after removing $x_1, x_2, \ldots, x_q$).
Therefore, $G_1$ has a hamiltonian path.
Thus, we have a polynomial transformation from HP to $HC_4(t < 1)$. Since
$HC_4(t < 1) \in$ NP and HP $\in$ NPC, it follows that $HC_4(t < 1) \in$ NPC.
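
The construction in this proof can be sketched in a few lines. The code below is an illustration only, under the reading that $q$ new vertices $x_1, \ldots, x_q$ are joined to every other vertex and $q - 1$ new vertices $y_1, \ldots, y_{q-1}$ are adjacent only to the $x_i$. The function names, the tuple labels `("g", v)`, `("x", i)`, `("y", j)` and the sample path are assumptions made for this example.

```python
def build_g2(adj1, q):
    """Build G2 from G1: add hubs x_1..x_q joined to all other vertices
    and q-1 extra vertices y_1..y_{q-1} adjacent only to the hubs."""
    hubs = [("x", i) for i in range(1, q + 1)]
    adj2 = {("g", v): {("g", w) for w in nbrs} for v, nbrs in adj1.items()}
    for j in range(1, q):
        adj2[("y", j)] = set()
    for x in hubs:
        adj2[x] = set()
    for x in hubs:
        for v in adj2:
            if v != x:
                adj2[x].add(v)
                adj2[v].add(x)
    return adj2


def cycle_from_path(path, q):
    """Extend a hamiltonian path of G1 to a vertex order of a cycle in G2."""
    cycle = [("g", v) for v in path]
    for i in range(1, q):
        cycle += [("x", i), ("y", i)]
    cycle.append(("x", q))
    return cycle


def is_hamiltonian_cycle(adj, cycle):
    """Check that `cycle` visits every vertex once along edges of the graph."""
    n = len(cycle)
    return (n == len(adj) == len(set(cycle))
            and all(cycle[(i + 1) % n] in adj[cycle[i]] for i in range(n)))


# G1 is a path on three vertices (it is traceable), and q = 5.
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
g2 = build_g2(path3, 5)
cycle = cycle_from_path([0, 1, 2], 5)
```

For this sample, `g2` has $n_1 + 2q - 1 = 12$ vertices, every $y_j$ has degree $q = 5$, and `cycle` passes the hamiltonian-cycle check.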

Theorem 2.3. $HC_4(t \geq 1)$ is P.

Proof. Assume that $G$ satisfies $\sigma_4 \geq 2tn$ with $t \geq 1$. First, we
check whether $G$ is 2-connected or not (it can be done in polynomial time).

If $G$ is not 2-connected then $G$ is nonhamiltonian.
If $G$ is 2-connected, then by Theorem 2.1, either $G$ is hamiltonian or $G$
belongs to one of the classes 1, 2, 3 of Theorem 2.1, which can be recognized
in polynomial time (see Section 3). Thus, $HC_4(t \geq 1)$ is P.

3. POLYNOMIAL ALGORITHMS RECOGNIZING THE CLASSES 1, 2, 3
Assume that $S \subset V(G)$ and $H_1, H_2, \ldots, H_k$ are the connected
components of $G - S$. Note that the problem "Given a vertex set $S$ in a graph
$G$, determine $\omega(G - S)$ and whether every connected component of
$G - S$ is complete" can be solved in polynomial time by an $O(n^2)$ algorithm.
In the following, we design polynomial algorithms recognizing the classes
1, 2, 3.
3.1. Algorithm recognizing the class 1
Every graph $G$ in class 1 is not 1-tough. If we remove $S$, then we get three
connected components which are complete.
Input: graph $G$ with $\sigma_4 \geq 2n$.
Output: Is_Graph_1 returns True if $G$ belongs to class 1, else returns False.
Algorithm:
Function Boolean Is_Graph_1;
Begin
  If $G$ is not 2-connected Then Return False;
  For each $S \subset V(G)$ with $|S| = 2$ do
    If ($\omega(G - S) = 3$) and (the three connected components
      of $G - S$ are complete) Then Return True;
  Return False;
End;
Checking whether $G$ is 2-connected can be done in $O(n^2)$ time. Next, there
are $O(n^2)$ iterations, and each iteration requires $O(n^2)$ time. Thus the
overall time required by algorithm Is_Graph_1 is $O(n^4)$.
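
The pseudocode above can be sketched as runnable Python. This is an illustration, not the authors' implementation: the graph is a dictionary of neighbour sets, the helper names are chosen for this example, and 2-connectivity is tested naively by deleting each vertex in turn, so the running time is worse than the bound stated in the text.

```python
from itertools import combinations


def components(adj, removed=frozenset()):
    """Connected components of the graph after deleting `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def is_complete(adj, comp):
    """True if every two vertices of `comp` are adjacent."""
    return all(comp - {v} <= adj[v] for v in comp)


def is_two_connected(adj):
    """Connected, at least three vertices, and no cut vertex."""
    if len(adj) < 3 or len(components(adj)) != 1:
        return False
    return all(len(components(adj, {v})) == 1 for v in adj)


def is_graph_class_1(adj):
    """Look for a 2-vertex set S such that G - S is three complete components."""
    if not is_two_connected(adj):
        return False
    for pair in combinations(adj, 2):
        comps = components(adj, set(pair))
        if len(comps) == 3 and all(is_complete(adj, c) for c in comps):
            return True
    return False


# K_{2,3} passes the test (remove its two-vertex side); the 5-cycle does not.
k23 = {"a": {"c", "d", "e"}, "b": {"c", "d", "e"},
       "c": {"a", "b"}, "d": {"a", "b"}, "e": {"a", "b"}}
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

Like the pseudocode, the sketch does not test the degree-sum condition on the input; it only performs the structural search.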
3.2. Algorithm recognizing the class
For each graph in class 2 , if we remove = {, 1 , 2 , 3 }, then we get
three connected components 1 , 2 , 3 which are complete.
Input: graph with 4 2.
Output: Is_Graph_2 return True if 2 , else return False.
Algorithm:
Function Boolean Is_Graph_ ;
Begin

For each in ()4 do
If ( = 3) and (the connected components
1 , 2 , 3 are complete) Then
If there exists and = {1 , 2 , 3 } such
that:
( 1 () , 2 () , 3 () 1) and
( 1 2 , 2 3 , 3 1 ()) and
(1 + 1 , 2 + 2 , 3 + 3 are complete)
Then Return True;
Return False;
End;
There are $O(n^4)$ iterations, and each iteration requires $O(n^2)$ time, so
the overall time required by algorithm Is_Graph_2 is $O(n^6)$.
3.3. Algorithm recognizing the class
For each graph in class 3 , if we remove = {1 , 2 , 3 , 1 , 2 , 3 }, then
we get three connected components 1 , 2 , 3 which are complete.
Input: graph with 4 2.
Output: Is_Graph_3 return True if 3 , else return False.
Algorithm:
Function Boolean Is_Graph_
Begin
For each in ()6 do
If ( = 3) and (the connected components
1 , 2 , 3 are complete graphs) Then
If there exists 1 , 2 , 3 and 1 , 2 , 3 =
{1 , 2 , 3 } such that:
( 1 2 , 2 3 , 3 1 , 1 2 , 2 3 , 3 1 ()) and
(1 + 1 , 1 , 2 + 2 , 2 , 3 + 3 , 3 are complete)
Then Return True;
Return False;
End;
There are $O(n^6)$ iterations, and each iteration requires $O(n^2)$ time, so
the overall time required by algorithm Is_Graph_3 is $O(n^8)$.

4. PRELIMINARIES
In what follows we assume that $C$ is a longest cycle of $G$. On $C$ (with a
given orientation), we denote the predecessor and successor of a vertex $x$
(along $C$) by $x^-$, $x^+$, and $x^{++} = (x^+)^+$, $x^{--} = (x^-)^-$. In
general, for a positive integer $i$, $x^{+i} = (x^{+(i-1)})^+$ and
$x^{-i} = (x^{-(i-1)})^-$. Moreover, for a vertex set $A \subseteq V(C)$, we
write $A^+ = \{x^+ : x \in A\}$ and $A^- = \{x^- : x \in A\}$. The path joining
two vertices $x$ and $y$ of $C$, along $C$, is denoted by
$x\overrightarrow{C}y$, and the same path in reverse order is denoted by
$y\overleftarrow{C}x$.
In this paper, we consider paths and cycles as vertex sets. If $x$, $y$ are the
end vertices of a path $P$, sometimes we write $xPy$ instead of $P$.
Assume that is a connected component of and () is the set of
neighbors in of all vertices in . A edge sequence is a path joining two
vertices on and its inner vertices belong to . In particular, an
edge joining 2 non-consecutive vertices on is also a edge sequence.

Lemma 4.1. Let be a 2-connected graph. If is nonhamiltonian and is


a connected component of then
+
(a) = = .
(b) There is no edge sequence joining 2 vertices of ()+. Similarly, there
is no edge sequence joining 2 vertices of ().
(c) If , () for then there is no vertex + such that
{+ +, +} (). Similarly, there is no vertex + such that
{+ +, +} ().
(d) For any and for any (), + + 1.

Proof. (a), (b), (c) are presented in [2], so we will prove (d). For any ,
= () + () 1 + () . By (a) and (b), +
+ = () , so + +
1 + + = 1 = 1.

Lemma 4.2 [2]. Assume that $u$, $v$ are nonadjacent vertices and
$d(u) + d(v) \geq n$. Then $G$ is hamiltonian if and only if $G + uv$ is
hamiltonian.
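
Reading the condition of Lemma 4.2 as $d(u) + d(v) \geq n$ for nonadjacent $u$, $v$ (the step commonly known as the Bondy-Chvátal closure), repeated application of the lemma can be sketched as follows. The function name `closure` and the sample graphs are assumptions made for this illustration; the graph is again a dictionary of neighbour sets.

```python
from itertools import combinations


def closure(adj):
    """Repeatedly join nonadjacent vertices u, v with d(u) + d(v) >= n.

    By Lemma 4.2, the result is hamiltonian if and only if the input is.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    n = len(g)
    changed = True
    while changed:
        changed = False
        for u, v in combinations(list(g), 2):
            if v not in g[u] and len(g[u]) + len(g[v]) >= n:
                g[u].add(v)
                g[v].add(u)
                changed = True
    return g


# K_4 minus the edge {0, 3}: the missing edge has degree sum 2 + 2 = 4 = n,
# so the closure is the complete graph K_4.
k4_minus_edge = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
closed = closure(k4_minus_edge)

# In the 5-cycle every nonadjacent pair has degree sum 4 < 5, so nothing is added.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

When the closure turns out to be hamiltonian (for instance, complete), the lemma gives a hamiltonian cycle in the original graph, which is how the proofs in Section 5 use it.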

We conclude the following Lemma from Lemma 4.2.

Lemma 4.3. Assume that such that = () and +


() for any edge (). Then is hamiltonian if and
only if is hamiltonian.

5. PROOFS OF THEOREM 2.1
For what follows, we assume that is nonhamiltonian. Because is 2-
connected, so is cycleable. Let 1 , 2 , , be the connected
components of . Clearly, ( ) 2 for any = 1. . .

Proposition 5.1. 1 , 2 , , are complete graphs.

Proof. We consider a connected component $H_i$. Because $G$ is 2-connected,
$|N_C(H_i)| \geq 2$ and there are at least two vertices $u, v \in N_C(H_i)$.
If $H_i$ is not complete then there are two distinct vertices $x, y \in H_i$
such that $xy \notin E(G)$. By Lemma 4.1 (a, b), $\{x, y, u^+, v^+\}$ is an
independent vertex set, therefore by $\sigma_4 \geq 2n$,
$d(x) + d(y) + d(u^+) + d(v^+) \geq 2n$. However, by Lemma 4.1 (d),
$d(x) + d(u^+) \leq n - 1$ and $d(y) + d(v^+) \leq n - 1$, which implies that
$d(x) + d(y) + d(u^+) + d(v^+) \leq 2n - 2$, a contradiction. Thus, $H_i$ is
complete, and so all connected components of $G - C$ are complete graphs.

Proposition 5.2. ( ) 2
for every = 1. . .

Proof. By Lemma 4.1 (a), + = , therefore



+ = 2 , it implies that ( ) .
2

Proposition 5.3. = 1.

Proof. We consider the case of as follow:


a) 4.
Let for each = 1. .4. Clearly, the vertex set 1 , 2 , 3 , 4 is
independent, by 4 2 we have 1 + 2 + 3 + (4 ) 2.
Moreover, by Proposition 5.2 and 4, 1 +

1 + 2 , so 1 + 2 + 3 + 4 1 + 2 + 3 +
4 + 2 4 + 4, therefore + 4 2, it implies that
+ 4, a contradiction.
Thus, the case 4 does not happen.
b) = 3.
Let 1 , 2 , 3 and we consider each vertex (1 ).
Claim 5.1. + () ().
Proof. Assume to the contrary that + () (), then the vertex
set , , , + is independent, by 4 2 we have + +
+ (+) 2. By Lemma 4.1 (d), + + 1.

Moreover, by Proposition 5.2 we have () 2 1 + 2
and

() 3 1 + 2 . Therefore, + + + +
3 + 2 + 3 + < 2 3, a contradiction.
Claim 5.2. (1 ) = (2 ) = (3 ) = 2.
Proof. If (1 ) 3 then by Claim 5.1, there are at least two vertices
, (1 ) such that +, + () or +, + (), therefore
there exists an edge sequence joining +, +, which contradicts to
Lemma 4.1 (b). Thus, (1 ) = 2. Similarly, we have (2 ) =
(3 ) = 2.
Claim 5.3. 5 6.
Proof. If there exists such that (1 ) (2 ) (3 ),
then the vertex set , , , is independent, by 4 2 we have
+ + + () 2. However, 1 1 +
1 = 1 + 1, 2 + 1, 3 + 1,
1, so + + + 1 + 2 + 3 + + 2 =
+ 2. It implies that + 2 2 and 2, a contradiction. Therefore,
1 2 3 = , and by Claim 5.2, 6.
Moreover, by Lemma 4.1 (a), 4. If = 4 then by Lemma 4.1 (a)
and Claim 5.1, there exists an edge sequence joining two vertices in
1 +, which contradicts Lemma 4.1 (b). Thus, we have 5 6.
If = 5, so = (1 , 2 , 3 , 4 , 5 ). Without loss of generality, by Lemma
4.1 (a, b) and Claim 5.1, we assume that 1 , 3 (1 ), 2 (2 ),
4 (3 ). Then, 5 (2 ) and 2 + = 1 , 3 . It implies that
there exists an edge sequence joining two vertices in 2 +, which
contradicts Lemma 4.1 (b). Therefore, by Claim 5.3, = 6, so =
(1 , 2 , 3 , 4 , 5 , 6 ) and by Claim 5.2, 1 2 = 2
3 = 1 3 = . Without loss of generality, by Lemma
4.1 (a, b) and Claim 5.1, there are two possible case as follow:
(1) Case 1 , 3 (1 ), 2 (2 ), 4 (3 ). Observe that
6 (3 ) and 5 (2 ). Let 1 , 2 , 3 be the paths in
1 , 2 , 3 joining the pair of vertices 1 , 3 , 2 , 5 , 4 , 6
respectively. Then, we have = (1 1 3 2 2 5 4 3 6 1 ) is longer
than , which contradicts the fact that is a longest cycle of .
(2) Case 1 , 4 (1 ), 2 (2 ), 5 (3 ). Observe that
6 (2 ) and 3 (3 ). Let 1 , 2 , 3 be the paths in
1 , 2 , 3 joining the pair of vertices 1 , 4 , 2 , 6 , 3 , 5
respectively. The, we have = (1 1 4 , 3 3 5 , 6 2 2 , 1 ) is
longer than , a contradiction.
Thus, the case = 3 does not happen.

c) = 2.
Without loss of generality, assume that 1 + (1 ) 2 +
(2 ) .
Claim 5.4. (1 ) = 2.
Proof. By (1 ) 2, assume to contrary that (1 ) 3. Let
1 , 2 . By Lemma 4.1 (b) there exists two vertices +, +
1 + (2 ). By Lemma 4.1 (a, b), the vertex set , , +, + is
independent, so + + + + (+) 2. By Lemma 4.1
(d), + + 1. Moreover, 2 1 + 2
1 1 + (1 ) , (+) 1 (1 ) . Therefore +
+ + + (+) 2 2, a contradiction. Thus, (1 ) = 2.
Claim 5.5. (2 ) = 2.
Proof. Assume to contrary that (2 ) 3. Arguing similarly the
proof of Claim 5.4, there exists two vertices +, + 2 +
(1 ). Let 1 , 2 . By Lemma 4.1 (a, b), the vertex set
, , +, + is independent, so + + + + (+) 2.
By Lemma 4.1 (b) and by + (1 ) (2 ), +
(2 ) . Moreover, 1 1 + 1 = 1 + 1,
2 1 + 2 , (+) 2 2 4.
Therefore, + + + + (+) 1 + 2 + +
4 = 2 4, a contradiction. Thus, (2 ) = 2.
By arguing similarly above, observe that (1 )+ (2 ) = 1 =
(2 )+ (1 ) . Without loss of generality, we assume that 2 =
, +2 , 1 = , + with +3 and + . Because is 2-
connected and is a longest cycle of , so 2 = 1, i.e 2 = . Let
1 and 1 be the path in 1 joining + to .

Figure 4. Illustrating the proofs of part c), Proposition 5.3.

If +3 + () then = ( +2 +1 +3 + ) is longer than ,
a contradiction. Therefore, +3 + () and the vertex set , , +3 , +
is independent, so + + (+3 ) + (+) 2. However, by
Lemma 4.1 (d), + (+3 ) 1 and + (+) 1, it
implies that + + (+3 ) + (+) 2 2, a contradiction.
Thus, the case = 2 does not happen.

By these case a), b), c) do not happen, we finish the proof that = 1. Then
has only one connected component. For what follows, let be the
connected component of . The fact that = . By Proposition
5.1, is complete.

Proposition 5.4. () = 2.

Proof. Clearly, () 2 by is 2-connected. Assume that ()


3. For any two vertices , , let , and
, then by Lemma 4.1 (b) the vertex set , +, +, + is independent. So
+ + + (+) + (+) 2. However, by Lemma 4.1 (d),
+ + 1, it implies that + + (+) + 1.
By is 2-connected and is complete, there exists two vertices 0 , 0
() and a hamiltonian path in joining 0 to 0 . Then =
(0 0 +0 +0 0 ) is a hamiltonian cycle of graph = + +0 +0 , i.e.
is hamiltonian. By Lemma 4.2, is hamiltonian if and only if is
hamiltonian, therefore is hamiltonian, which contradicts to the assumption
that is nonhamiltonian. Thus, () = 2.

For what follows, let , be two vertices of () and let be the


hamiltonian path of joining , .

Proposition 5.5. + (+) = +, + .

Proof. Assume to the contrary that there exists +, + such that


(+) (+). Clearly, , . Let , then by Lemma
4.1 (b), the vertex set , +, +, is independent, so + + +
(+) + ( ) 2. However, + 1, 3, it
implies that + + (+) 2 + 2 = + 2. Therefore, by
Lemma 4.2, is hamiltonian if and only if = + ++ is hamiltonian.
Observe that = ( ++ ) is a hamiltonian cycle of , i.e
is hamiltonian, it implies that is hamiltonian, a contradiction.

Now we consider two case of toughness of .

I. is not 1-tough
By is not 1-tough, there exists a vertex set such that has at
least S + 1 connected components. By is 2-connected, 2. Since
+ 1 so 2 1.
Claim 5.6. = .
Proof. Observe that = is 1-tough, if = then
= , which contradicts to the fact that
+ 1. Therefore, . Let = , = . If 1
then 1 + 1 + , a contradiction.
Thus, = 0, i.e = .
Observe that , , otherwise ( ) , a
contradiction. Therefore, is a connected component of . Let
, 1 , 2 , , ( ) be the connected components of .
Claim 5.7. = = 2
Proof. Assume that 3. Let , 1 1 , 2 2 , 3 3 , then the
vertex set , 1 , 2 , 3 is independent, so + 1 + 2 +
(3 ) 2. Observe that + 1 and 1 +
for any = 1, 2, 3. Therefore, + 1 + 2 + 3 +
1 + 2 + 3 + 3 2 2 2 + + 3 = 2 +
+ 1. It implies that 2 + + 1 2, i.e 2 + 1
+ 2 (by 3), which contradicts to the fact that 2 S n 1.
Therefore k 2. By 2, we have = = 2.
By Claim 5.7 and by , we have = , and has three
connected components, such as , 1 , 2 . By Proposition 5.5, 1 = ( +
+ , and 2 = ( + (+) , .
Claim 5.8. 1 , 2 is complete.
Proof. Assume that 1 is not complete. Then there exists pair of
nonadjacent vertices , 1 . Let , then the vertex set , , , +
is independent, so + + + (+) 2. However,
+ 1, (+) 2 + 1, () 1 , () 1 . Therefore
+ + + (+) + 2 1 + 2 + 2 = + 1 . It
implies that + 1 2, i.e 1 , a contradiction. Thus, 1 is
complete. Similarly, we have 2 is complete.

Figure 5. Graph belongs to class .


Clearly, 3 () 5. If 4, there exists a independent set of four
vertices, whose elements are , 1 , 2 and a vertex in S
(without loss of generality, assume that the vertex in is ). By 4 2,
we have + + + 2. However, + 1,
4, () 1 , () 2 . It implies that + +
+ + 1 + 2 + 3 = 2 5, a contradiction. Thus,
= 3.
Conclude that in this Case is not 1-tough, belongs to class 1 .

II. is 1-tough
Let 1 = + + , 2 = (+ ) + . By Lemma 4.1 (c) and Proposition
5.5, we have 1 , 2 are two paths on satisfying , + , +2 1 ,
, +, +2 2 , 1 2 = and if 1 2 then is an end vertex of both
1 , 2 .
Let 1 = 1 , 2 = 2 . Clearly, 1 2 2. We consider three
case of 1 2 .

Case 1. 1 2 = .

Figure 6. Illustrating the Case 1.

Observe that there exists an edge joining a vertex 1 to a vertex
2 , otherwise , = 3, which contradicts to the fact that
is 1-tough.
If there exists pair of nonadjacent vertices 1 , 2 1 , let , then the
vertex set , 1 , 2 , + is independent, so + (1 ) + (2 ) +
(+) 2. By Lemma 4.1 (d), + (+ ) 1, we have (1 ) +
(2 ) + 1. By Lemma 4.2, is hamiltonian if and only if = +
1 2 is hamiltonian.
Arguing similarly, for any pair of nonadjacent vertices 1 , 2 2 , we
have ( 1 ) + ( 2 ) + 1 and is hamiltonian if and only if = +
1 2 is hamiltonian.
Let be the graph obtain from by adding new edges joining all pair of
nonadjacent vertices in the same set 1 , respectively in A2 . By Lemma 4.3,
is hamiltonian if and only if is hamiltonian. We consider graph ,
let 1 be the hamiltonian path of 1 joining + to , and let 2 be the
hamiltonian path of 2 joining to +. Then, we have
= ( +1 2 + ) is a hamiltonian cycle in , i.e is
hamiltonian. Therefore, is hamiltonian, a contradiction.
Thus, the Case 1 does not happen.

Case 2. 1 2 = 1.

Let 1 2 = . Without loss of generality, assume that +2


(+2 ).

Figure 7. Illustrating the Case 2.

Case 2.1. +2 .
If > 1 then = ( + ) is longer than . Therefore,
= 1, let = . If (+) then = ( + + ) is
a hamiltonian cycle in a contradiction. Therefore, (+), and by
Proposition 5.5, (+) and + = 2.

We consider subgraph 2 = 2 = , . If there exists
pair of nonadjacent vertices 1 , 2 2 then the vertex set , +, 1 , 2
is independent, so + + + (1 ) + (2 ) 2. However,
= + = 2 and (1 ), (2 ) 3 = 4, therefore
+ + + (1 ) + (2 ) 2 4, a contradiction. Thus, 2 is
complete.
If then , 2 , so (), which contradicts to
Lemma 4.1 (b). Therefore .
Because is 1-tough, nonhamiltonian, so 7 and +. If there
exists a vertex +2 is adjacent to then we have =
( + + ) is a hamiltonian cycle in , a contradiction.
Therefore, is not adjacent to all vertices in +2 .

Similarly, if there exists a vertex +2 is adjacent to then we


have = ( + ) is a hamiltonian cycle in , a
contradiction. Therefore, is not adjacent to all vertices in +2 .
Conclude that the graph is shown in Figure 8, can possibly be adjacent
to another vertices:

Figure 8. Graph belongs to class .


Clearly, = 3 and belongs to class 2 .

Case 2.2. +2 and .

Clearly, +. If (), then = ( + ) is


a hamiltonian cycle of , a contradiction. Therefore, (). By
and by Lemma 4.1 (b), (), so + by (+ ).
We have the following Claims.
Claim 5.9. 2 1 .
Proof. Assume that 1 . Let , then the vertex set
, +, , is independent, so + (+) + () + ()

2. By Lemma 4.1 (d), + (+) 1 and +
+ 1. Therefore, by Lemma 4.2, is hamiltonian if and only if =
+ is hamiltonian. Observe that = ( + )
is a hamiltonian cycle of , so and are hamiltonian, a contradiction.
Thus, 1 , and by 1 2 = we have 2 1 .

Figure 9. Illustrating the Claim 5.9.

Let 1 = 1 = + , 2 = 2 = + . By +
and by + we have 1 , 2 2. Arguing similarly, for any pair of
nonadjacent vertices (, ) in the same set 1 , respectively in 2 , we have
+ () + 1.
Claim 5.10. There are no edges joining a vertex in 1 to a vertex in 2 .
Proof. Assume to the contrary that there exists an edge joining 1 1
to 2 2 . Clearly, 1 +, 2 +. Let be the graph obtain
from by adding new edges joining all pair of nonadjacent vertices in
the same set 1 , respectively in 2 . By Lemma 4.3, is hamiltonian if
and only if is hamiltonian.
We consider graph , observe that 1 , 2 are complete. Let 1 be the
hamiltonian path in 1 joining 1 to + and let 2 be the path in 2
joining + to 2 . Then, = ( +2 2 1 1 + ) is a
hamiltonian cycle of , i.e is hamiltonian, it implies that is
hamiltonian, a contradiction.
Claim 5.11. 1 , 2 are complete.
Proof. Assume that there exists a pair of nonadjacent vertices 1 , 2
1 . Let , then the vertex set , 1 , 2 , + is independent, so
+ (1 ) + (2 ) + (+) 2. However, + 1,
(+) 2 + 2 and (1 ), (2 ) 1 + 1, therefore +
(1 ) + (2 ) + (+) + 2 1 + 2 + 5 = + 1 + 2. It
implies that 1 2, a contradiction. Thus, 1 is complete.
Similarly, we have 2 is complete.
Claim 5.12. , are not adjacent to any vertex in 2 + .

Proof. Assume that is adjacent to a vertex 2 + . By Claim
5.11, let 1 be the hamiltonian path of 1 joining to +, and let 2
be the hamiltonian path of 2 joining + to . Then,
= ( + 2 1 + ) is a hamiltonian cycle of , a
contradiction. Similarly, if is adjacent to a vertex 2 + , let
2 be the hamiltonian path of 2 joining to +, then =
( 2 + 1 + ) is a hamiltonian cycle of , a
contradiction. Thus, , are not adjacent to any vertex in 2 + .
Claim 5.13. is not adjacent to any vertex in 1 .
Proof. Assume to the contrary that is adjacent to a vertex 1 . Let
2 be the hamiltonian path of 2 joining + to . It happens as one of
two following case:
(1) Case +: Let 1 be the hamiltonian path of 1 joining to +,
we have = ( 1 + +2 ) is a hamiltonian cycle
of , a contradiction.
(2) Case +: Let 1 be the hamiltonian path of 1 joining to ,
we have = ( 1 +2 ) is a hamiltonian cycle
of , a contradiction.
Claim 5.14. is adjacent to all vertices in .
Proof. Assume to the contrary that is not adjacent to a vertex .
Let 1 1 , 2 2 + , then by Claims 5.10, 5.12, 5.13, the
vertex set , , 1 , 2 is independent, so + ( ) + (1 ) +
(2 ) 2. However, , ( ) + 2, (1 ) 1 +
1, (2 ) 2 . Therefore, 2 + ( ) + (1 ) + (2 )
2 + 1 + 2 + 3 = + , it implies that , a
contradiction.
Claim 5.15. is adjacent to all vertices in 1 .
Proof. Assume to the contrary that is not a vertex 1 1 . Let
and 2 2 + . Then by Claim 5.10 and by Claim 5.12, the vertex
set , , 1 , 2 is independent, so + ( ) + (1 ) +
(2 ) 2. However, + 1, 1 + 2, (1 )
1 , (2 ) 2 . Therefore, + ( ) + (1 ) + (2 )
+ 2 1 + 2 + 3 = + 1 , it implies that 1 , a
contradiction.
Let 1 = + , by Claim 5.14, 1 is complete. By Claim 5.15, 1 =
1 + is complete. The graph is shown in Figure 10, in which,

1 , 1 , 2 are complete and 1 , 1 , 2 2. Moreover, the vertex can
possibly be adjacent to another vertices.

Figure 10. Graph belongs to class .


Clearly, 3 () 4. If = 4, then there exists 1 , 1 ,
2 such that the vertex set , , , is independent, so + +
+ ( ) 2. However, + + 1 + 1 + 2
1 = 2, therefore + 2, a contradiction. Thus = 3.
Conclude that in this Case 2.2, belongs to class 2 .

Case 2.3. +2 and .


Arguing similarly the proofs of Case 2.2, let 1 = 1 and 2 = 2
, then for any pair of nonadjacent vertices (, ) together in 1 or 2 ,
we have + () + 1. Observe that + 1 and , +
+ 2 .
Let be the graph obtain from by adding new edges joining all pair of
nonadjacent vertices in the same set 1 , respectively in 2 . By Lemma 4.3,
is hamiltonian if and only if is hamiltonian. We consider graph , let
1 be the hamiltonian path of 1 joining to +, and let 2 be the
hamiltonian path of 2 joining + to +. Then, we have
= ( +2 + 1 + ) is a hamiltonian cycle of , i.e is
hamiltonian. Therefore, is hamiltonian, a contradiction.
Thus, the Case 2.3 does not happen.

Case 3. 1 2 = 2.

Let 1 2 = , . Without loss of generality, we assume that


+2 (+2 ) and +2 (+2 ). Let 1 = 1 , ,
2 = 2 , . Arguing similarly the proofs of Case 2.2, for any pair of
nonadjacent vertices (, ) in the same set 1 , respectively in 2 , we get
+ () + 1.

Figure 11. Illustrating the Case 3.

Case 3.1. +2 or +2 .
Without loss of generality, assume that +2 . If (+) then we
have = ( + + ) is a hamiltonian cycle of , a
contradiction. Therefore (+), i.e 1 and 2 . It implies
that there is no vertex +2 such that 1 2 , a
contradiction.
Thus, the Case 3.1 does not happen.

Case 3.2. ( +2 and ) or ( +2 and ).


Without loss of generality, assume that +2 and . We have the
following Claims:
Claim 5.16. .
Proof. Assume to the contrary that . Arguing similarly the proofs
of Case 2.2, we have () and + + 1. By
Lemma 4.2, is hamiltonian if and only if = + is
hamiltonian. Observe that = ( + ) is a
hamiltonian cycle of , i.e is hamiltoniania. It implies that is
hamiltonian, a contradiction.
Claim 5.17. 1 , 2 2. Moreover, 2 2 {+}.
Proof. Because of +, 1 , so 1 2. If 2 +, then =
( + + ) is a hamiltonian cycle of , a contradiction.
Therefore, 2 +. By Claim 5.16 we have 2 2 {+} and
2 2.
Claim 5.18. There are no edges joining a vertex in 1 to a vertex in 2 .
Proof. Assume to the contrary that there exists 1 1 , 2 2 such
that 1 2 (). Observe that 1 + and 2 +. Let be the
graph obtain from by adding new edges joining all pair of nonadjacent

vertices in the same set 1 , respectively in 2 (note that their degree sum
is greater than + 1). By Lemma 4.3, is hamiltonian if and only if
is hamiltonian. We consider the graph , let 1 be the hamiltonian path
of 1 joining + to 1 , and let 2 be the hamiltonian path of 2 joining
2 to +. Then = ( +1 1 2 2 + ) is a hamiltonian
cycle of , i.e., is hamiltonian. It implies that is hamiltonian, a
contradiction.
Claim 5.19. 1 , 2 are complete.
Proof. Assume that there exists a pair of nonadjacent vertices ,
1 . Let , then the vertex set , , , + is independent, so
+ ( ) + ( ) + (+) 2. However, + 1,
+
( ) 2 + 2 and ( ), ( ) 1 + 2. Therefore, +
( ) + ( ) + (+) + 2 1 + 2 + 7 = + 1 + 3. It
implies that 1 3, a contradiction. Thus, 1 is complete.
Similarly, 2 is complete.
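
The counting arguments in Claims 5.19, 5.23, 5.25 and 5.29 all have the same shape: choose four pairwise nonadjacent vertices, bound each degree from above, and compare the total with the lower bound that the hypothesis guarantees for every independent set of four vertices. As a minimal sketch of that quantity, the function below computes the smallest degree sum over all independent k-sets of a graph. The name `min_degree_sum_independent` and the adjacency-set representation are assumptions made for illustration; the paper's own notation is not recoverable from this extract.

```python
from itertools import combinations


def min_degree_sum_independent(adj, k=4):
    """Smallest sum of degrees over all independent sets of size k.

    adj maps each vertex to the set of its neighbours.
    Returns None when the graph has no independent set of size k.
    """
    best = None
    for subset in combinations(sorted(adj), k):
        # Keep only subsets whose vertices are pairwise nonadjacent.
        if all(v not in adj[u] for u, v in combinations(subset, 2)):
            total = sum(len(adj[v]) for v in subset)
            if best is None or total < best:
                best = total
    return best


# Complete bipartite graph with parts {0,1,2,3} and {4,5,6,7}.
k44 = {v: set() for v in range(8)}
for u in range(4):
    for v in range(4, 8):
        k44[u].add(v)
        k44[v].add(u)

print(min_degree_sum_independent(k44))
```

The enumeration is exponential in k and polynomial of degree k in the number of vertices, so it is meant for checking small examples by hand, not for large graphs.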
Claim 5.20. is not adjacent to any vertex in 1 + .
Proof. Assume to the contrary that is adjacent to 1 1 + . By
Claim 5.19, let 1 be the hamiltonian path of 1 joining + to 1 . We
have = ( + +1 1 ) is a hamiltonian cycle of , a
contradiction.
Claim 5.21. is not adjacent to any vertex in 2 .
Proof. Assume to the contrary that is adjacent to 2 2 . Observe
that 2 +, otherwise = ( + + ) is a hamiltonian
cycle of , a contradiction. Let 2 be the hamiltonian path of 2 joining
+ to 2 . Then, = ( ++2 2 ) is a hamiltonian
cycle of , a contradiction.

Figure 12. Illustrating the proof of Claim 5.21.

Arguing as in the proofs of Claim 5.20 and Claim 5.21, we have:
Claim 5.22. is not adjacent to any vertex in 1 (2 + ).
Claim 5.23. , are adjacent to all vertices in .
Proof. Assume that is not adjacent to . Let 1 + ,
2 + . Then by Claims 5.18, 5.20, 5.21, the vertex set
, , , is independent, so + + ( ) + ( ) 2.
However, () , + 3, ( ) 1 + 1, ( )
2 + 1, therefore + + ( ) + ( ) 2 + 1 +
2 + 5 = + + 1. It implies that 1, a contradiction.
Thus, is adjacent to all vertices in . Similarly, is adjacent to all
vertices in .
Claim 5.24. is not adjacent to any vertex in (2 + ).
Proof. Assume that is adjacent to 2 2 + . Let 1 be the
hamiltonian path of 1 joining to +, and let 2 be the hamiltonian
path of 2 joining + to 2 . Then, we have
+ +
= ( 2 2 1 ) is a hamiltonian cycle of , a
contradiction. Therefore, is not adjacent to any vertex in 2 + .
Moreover, if is adjacent to , by Claim 5.17, let W2 be the
hamiltonian path of 2 joining + to 2 . Then,
+ 2 +
= ( 2 1 ) is a hamiltonian cycle of , a
contradiction. Thus, is not adjacent to .
Claim 5.25. is adjacent to all vertices in 1 .
Proof. Assume to the contrary that is not adjacent to 1 1 . Let
, 2 2 + , then by Claim 5.18 and by Claim 5.24, the
vertex set , , 1 , 2 is independent, so + ( ) + (1 ) +
(2 ) 2. However, + 1, 1 + 2, (1 )
1 , (2 ) 2 . Therefore, + ( ) + (1 ) + (2 )
2 1 + 2 + + 3 = + 1 1. It implies that 1 + 1, a
contradiction.
Arguing as in the proofs of Claim 5.24 and Claim 5.25, we have:
Claim 5.26. is not adjacent to any vertex in { } (1 + ).
Claim 5.27. is adjacent to all vertices in 2 .
Observe that , (), by Lemma 4.1 (b) we have:
Claim 5.28. ().

Claim 5.29. ().
Proof. Assume to the contrary that (). Let 1 1 + ,
2 2 + . Then by Claims 5.18, 5.20, 5.21 and 5.22, the vertex
set , , 1 , 2 is independent, so + ( ) + (1 ) +
(2 ) 2. However, + 2, ( ) + 2, (1 )
1 , (2 ) 2 . Therefore, + ( ) + (1 ) + (2 )
2 + 1 + 2 + 4 = + . It implies that , a
contradiction.
By Claim 5.25, 1 = 1 + is complete. By Claim 5.27, 2 = 2 +
is complete. Moreover, by Claim 5.17, 1 , 2 3. By Claim 5.23
and Claim 5.29, 1 = + , is complete and 1 3. We conclude that
is the graph shown in Figure 13, in which 1 , 1 , 2 are complete and
1 , 2 , 1 3.

Figure 13. Graph belongs to class .


Clearly, = 3 and belongs to class 3 .

Case 3.3. +2 , and +2 , .


Observe that +, 1 + and , + 2 {+}. Let be the
graph obtained from by adding new edges joining all pairs of nonadjacent
vertices in the same set 1 , respectively in 2 (note that their degree sum is
greater than + 1). By Lemma 4.3, is hamiltonian if and only if is
hamiltonian.
We consider the graph . Let 1 be the hamiltonian path of 1 joining
to +, and let 2 be the hamiltonian path of 2 + joining to +.
Then, we have = ( + 2 + 1 + ) is a hamiltonian
cycle of , i.e., is hamiltonian; therefore is hamiltonian, a
contradiction.
Thus, Case 3.3 does not occur.
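
Most contradictions in this section come from writing down an explicit cycle, typically by concatenating hamiltonian paths of complete subgraphs and closing the walk through a few linking edges. The sketch below shows how such a candidate can be checked mechanically. The toy graph (two triangles joined by two edges) and the helper names `build_adj` and `is_hamiltonian_cycle` are assumptions chosen for illustration and do not correspond to a specific graph in the paper.

```python
from itertools import combinations


def build_adj(n, edges):
    """Adjacency sets for a simple undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_hamiltonian_cycle(adj, cycle):
    """True if cycle lists every vertex exactly once and consecutive
    vertices (including last back to first) are adjacent."""
    n = len(adj)
    if n < 3 or len(cycle) != n or set(cycle) != set(adj):
        return False
    return all(cycle[(i + 1) % n] in adj[cycle[i]] for i in range(n))


# Two complete subgraphs on {0,1,2} and {3,4,5}, linked by edges 2-3 and 5-0.
edges = list(combinations([0, 1, 2], 2)) + list(combinations([3, 4, 5], 2))
edges += [(2, 3), (5, 0)]
adj = build_adj(6, edges)

path_1 = [0, 1, 2]  # hamiltonian path of the first complete subgraph
path_2 = [3, 4, 5]  # hamiltonian path of the second complete subgraph
candidate = path_1 + path_2  # closed up by the linking edges 2-3 and 5-0

print(is_hamiltonian_cycle(adj, candidate))
```

Because each part is complete, any ordering of its vertices is a hamiltonian path of that part, so only the endpoints chosen for the linking edges matter.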

6. ACKNOWLEDGEMENTS
This research was supported by the Vietnam National Foundation for Science and
Technology Development under grant 102.01-2012.29.

REFERENCES
[1] Gibbons, A. Algorithmic Graph Theory. Cambridge University Press, 1985.
[2] Bondy, J. A., and Chvátal, V. A Method in Graph Theory. Discrete Math., 15 (1976),
pp. 111-135.
[3] Cuckler, B., and Kahn, J. Hamiltonian cycles in Dirac graphs. Combinatorica, 29,
(2009), pp 299-326.
[4] Diestel, R. Graph Theory. Springer. Third Edition (2005).
[5] Ferrara, M., Jacobson, M. S., and Harris, A. Cycle lengths in a Hamiltonian Graphs
with a pair of vertices having large degree sum. Graphs and Combinatorics, 26 (2010),
pp 215-223.
[6] Jung, H. A. On maximal circuits in finite graphs. Ann. Discrete Math., 3 (1978), pp.
129-144.
[7] Hoa, V. D., and Truong, N. H. X. Hamilton cycle of graphs 2 . Journal Of
Computer Science And Cybernetics., 28, No.2 (2012), pp 153-160.
3
[8] Hoa, V. D., and Truong, N. H. X. Hamiltonian in graphs 3 1. In Proceedings
2
of the 7th National Conference on Fundamental and Applied Information Technology
Research (FAIR7) (Thai Nguyen, Vietnam, June 19-20, 2014). Vietnam Academy of
Science and Technology Press, Hanoi, 2014, pp 60-67.
[9] Krivelevich, M., Lee, C., and Sudakov, B. Long paths and cycles in random subgraphs
of graphs with large minimum degree. Random Structures & Algorithms, 46, No. 2 (2015),
pp. 320-345.
[10] Nishiyama, H., Kobayashi, Y., Yamauchi, Y., Kijima, S., & Yamashita, M. (2015).
The Parity Hamiltonian Cycle Problem. arXiv preprint arXiv:1501.06323.

This paper may be cited as:


Truong, N. H. X. and Hoa, V. D., 2015. Hamiltonian cycle in graphs $\sigma_4 \geq 2n$.
International Journal of Computer Science and Business Informatics,
Vol.15, No. 2, pp. 38-60.
