Abstract :
Data mining is the process of extracting previously unknown or hidden knowledge from existing data.
It is mainly used to predict future outcomes from past data. Classification is a common data mining
technique that assigns data instances to different classes. It can be applied to data sets of all
kinds, from small and simple ones to large and complex ones. A decision tree is a decision support
tool that uses a tree-like model of decisions and their possible consequences. It is one way to
display an algorithm that contains only conditional control statements. In this paper, we study a
COVID-19 dataset of 181,884 instances with 9 attributes. The data is analysed to extract the
important rules that indicate who is likely to test positive for COVID-19. This paper uses one
decision tree algorithm, known as C4.5. The experimental results are
Key words: Data mining, Classification, decision tree, C4.5, Covid19 dataset.
1. Introduction :
A decision tree is a tree structure made up of a root node, branches, and leaf nodes. Every
internal node tests an attribute, each branch carries one outcome of that test, and each leaf node
holds the resulting class label. The root node is the parent of all other nodes and, as the name
suggests, is the topmost node in the tree. In other words, each node represents a feature
(attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome
(class).
Fig. 1 Example of a decision tree showing what to do under different weather conditions.
Figure 1 shows the decision tree model constructed from the weather dataset using the C4.5
algorithm. In this tree, humidity and windy are attributes tested at internal nodes, and the
decision follows the outcome of each test. The class labels yes and no appear at the leaf nodes.
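The node, branch, and leaf structure described above can be sketched in a few lines of Python. This is only an illustration: it assumes the classic weather (play) data, with outlook at the root and humidity and windy tested below it, and it is not a reproduction of the exact tree in Fig. 1. The names WEATHER_TREE, classify, and count_nodes are hypothetical.

```python
# Assumed tree for the classic weather data (illustrative, not taken from Fig. 1).
# An internal node is a dict {"attribute": name, "branches": {value: subtree}};
# a leaf is simply a class label string.
WEATHER_TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy", "branches": {"true": "no", "false": "yes"}},
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf, following the branch that matches each test."""
    node = tree
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node


def count_nodes(tree):
    """Return (number_of_leaves, total_number_of_nodes) for the tree."""
    if not isinstance(tree, dict):
        return 1, 1
    leaves, total = 0, 1  # count this internal node once
    for subtree in tree["branches"].values():
        sub_leaves, sub_total = count_nodes(subtree)
        leaves += sub_leaves
        total += sub_total
    return leaves, total


day = {"outlook": "sunny", "humidity": "high", "windy": "false"}
print(classify(WEATHER_TREE, day))
print(count_nodes(WEATHER_TREE))
```

Each call to classify follows exactly one path from the root to a leaf, which is why every such path can be read as one rule.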
2. Related work :
A decision tree resembles the human decision-making process, which makes it easy to
understand. It can handle both discrete and continuous input data.
An example of a decision tree is shown in Fig. 1.
As for the characteristics of the decision tree used here, the C4.5 algorithm is simulated in the
WEKA tool, and the data type of the data set is categorical only. C4.5 can also take a continuous
data set as input for simulation.
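C4.5 selects the attribute to test at each node by gain ratio, and it handles a continuous attribute by searching for a threshold that splits the values into two ranges. The sketch below illustrates both ideas on a tiny made-up sample, not on the COVID-19 data. It is simplified: the function names are illustrative, it tries midpoints between distinct values, and it omits further refinements that C4.5 and WEKA apply.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(groups):
    """Gain ratio of a candidate split; groups holds one list of labels per branch."""
    groups = [g for g in groups if g]  # ignore empty branches
    all_labels = [y for g in groups for y in g]
    n = len(all_labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    gain = entropy(all_labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups)
    return gain / split_info if split_info > 0 else 0.0


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, gain_ratio)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = gain_ratio([left, right])
        if score > best[1]:
            best = (t, score)
    return best


# Made-up sample: a continuous attribute and a yes/no class.
threshold, score = best_threshold([1, 2, 3, 4], ["no", "no", "yes", "yes"])
print(threshold, score)
```

For a categorical attribute, the same gain_ratio function is called with one group of labels per attribute value, so no threshold search is needed.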
3. Dataset used :
4.
5. Research methodology :
6. Results :
Relation: covid_19
Instances: 181884
Attributes: 9
cough
fever
sore_throat
shortness_of_breath
head_ache
age_60_and_above
gender
test_indication
corona_result
------------------
test_indication = Other
| head_ache <= 0
| | shortness_of_breath <= 0
| | | sore_throat > 0
| | shortness_of_breath > 0
| | | age_60_and_above = None: negative (5.0/1.0)
| head_ache > 0
| | age_60_and_above = None
| | | sore_throat <= 0
test_indication = Abroad
| head_ache <= 0
| | sore_throat > 0
| head_ache > 0
| | age_60_and_above = Yes
test_indication = Contact_with_confirmed
| head_ache <= 0
| | cough <= 0
| | cough > 0
| | | age_60_and_above = None
| | | age_60_and_above = No
| | | | sore_throat <= 0
| | | | | shortness_of_breath <= 0
| | | | | | fever > 0
Number of Leaves : 34
Size of the tree : 57
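In WEKA's J48 text output, each `|` marks one level of depth, and a leaf line ends with `: class (n/e)`, where n is the number of training instances that reach the leaf and e is the number of those that are misclassified. The size of the tree counts all nodes, so 57 nodes with 34 leaves means 23 internal test nodes. A minimal sketch of reading one such line follows; parse_j48_line is a hypothetical helper written for illustration and is not part of WEKA.

```python
import re

# Matches the leaf suffix ": label (covered)" or ": label (covered/errors)".
LEAF_RE = re.compile(r":\s*(\S+)\s*\(([\d.]+)(?:/([\d.]+))?\)\s*$")


def parse_j48_line(line):
    """Split one line of a J48 text tree into depth, test, and optional leaf info."""
    depth = line.count("|")
    body = line.replace("|", "").strip()
    match = LEAF_RE.search(body)
    if match is None:
        # Internal node: only a test, no class label on this line.
        return {"depth": depth, "test": body, "leaf": None}
    leaf = {
        "label": match.group(1),
        "covered": float(match.group(2)),
        "errors": float(match.group(3)) if match.group(3) else 0.0,
    }
    return {"depth": depth, "test": body[: match.start()].strip(), "leaf": leaf}


leaf_line = "| | | age_60_and_above = None: negative (5.0/1.0)"
print(parse_j48_line(leaf_line))

# Internal nodes implied by the reported totals: size of the tree minus leaves.
print(57 - 34)
```

Reading the leaf above together with its parent tests gives one rule: test_indication = Other, head_ache <= 0, shortness_of_breath > 0, and age_60_and_above = None lead to negative, with 1 of the 5 covered instances misclassified.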
Analysing: IR_precision, IR_recall, F_measure, Area_under_ROC, True_positive_rate, True_negative_rate, False_negative_rate, False_positive_rate
Datasets: 1
Resultsets: 1
Sorted by: -
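Most of the measures named above are derived from the four cells of a binary confusion matrix. The sketch below shows those definitions with made-up counts. The numbers are illustrative only and are not results from this experiment, and binary_metrics is a hypothetical helper. Area_under_ROC is left out because it needs ranked prediction scores, not a single confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute per-class measures from confusion-matrix counts for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as the true positive rate
    return {
        "IR_precision": precision,
        "IR_recall": recall,
        "F_measure": 2 * precision * recall / (precision + recall),
        "True_positive_rate": recall,
        "True_negative_rate": tn / (tn + fp),
        "False_negative_rate": fn / (fn + tp),
        "False_positive_rate": fp / (fp + tn),
    }


# Made-up counts, chosen only to illustrate the formulas.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Two identities are useful as a sanity check: the true positive rate and the false negative rate sum to 1, as do the true negative rate and the false positive rate.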
7. Discussion :
8. Conclusion :
In this paper, we used real-world data that is available online. From this data, we extracted
34 conditions that lead to a COVID-19 positive or negative result, using 181884 instances. The
attributes are coronavirus symptoms and related factors: cough, fever, sore_throat,
shortness_of_breath, head_ache, age_60_and_above, gender, test_indication, and corona_result.
Here, corona_result is taken as the class. Because the data is real-world data, the outcome (i.e.,
the 34 conditions) is useful for estimating the chance that a person tests COVID-19 positive or
negative.
References :