26 18MI31032 Shubham Shubhadarshi
BY
SHUBHAM SHUBHADARSHI
(18MI31032)
Acknowledgment
Abstract
1. Introduction
2. Literature Review
3. Methods
3.1. Data Acquisition
3.2. Data Pre-processing
3.3. Data Processing
3.4. Association Rule Mining
5. Conclusion
6. References
Acknowledgment
I want to express my gratitude to Professor J. Maiti and Baneswar Sarker (Ph.D. Research
Scholar) from the Department of Industrial and Systems Engineering, IIT Kharagpur, for their
help in gathering the information needed to finish this project. I am highly thankful to the
Department of Industrial and Systems Engineering, IIT Kharagpur, for supporting this project
through to its successful completion.
A MACHINE LEARNING-BASED MODEL FOR DEVELOPMENT OF HAZARD
TRIANGLE USING ASSOCIATION RULE MINING
Abstract
Knowledge of the hazardous accidents that occurred at industrial facilities in the past
helps management make decisions that improve the safety of the industrial workspace. This
project develops association rules to understand the relationships between the components of
the hazard triangle and to find the hazard triangle patterns that occur frequently in
industrial areas. A hazard is a combination of three components, known as the hazard triangle:
(i) Hazardous Element (HE), (ii) Initiating Mechanisms (IM), and (iii) Target/Threat (T/T).
More than ten thousand accidents from industrial areas were analyzed, and the components of the
hazard triangles were categorized into 21 groups using the unsupervised topic modeling
algorithm Latent Dirichlet Allocation (LDA). From these, 397 unique hazard triangles were
developed, of which 31 account for 50 percent of all accidents. Eighteen significant
association rules were extracted based on three criteria: support (S), confidence (C), and
lift (L). For example, the results show that accidents involving exposure to chemicals, metals,
and materials lead to burns in the arm and elbow (S=2.1%, C=90.3%, L=2.32). Similarly,
accidents involving a fall, slip, or trip due to machinery or a walkway mostly lead to
fracture, wound, and muscle injury (S=1.5%, C=87.4%, L=1.74). It is also found that accidents
involving workers exposed to equipment or machinery lead to thermal or electrical burns in the
head, neck, or trunk (S=2.3%, C=84.0%, L=2.16). The results of this project can be used to
improve safety protocols and reduce frequently occurring accidents.
1. Introduction
For any industry to thrive, its operations must be safe, reliable, and sustainable in the long
run. Industrial hazards need to be identified so that the associated risks can be assessed and
reduced to a tolerable level. According to the domino hypothesis (Heinrich, 1959), most
industrial accidents arise from controllable hazardous conduct and conditions. According to
Reason's Swiss Cheese model, a hazard becomes an accident when a succession of events aligns,
forming a path from hazard to accident (Reason, 1990). Identifying and assessing such pathways
is critical for preventing accidents.
This hazard and risk analysis project aims to identify and assess hazards, the event
sequences that lead to hazards, and the risk associated with hazardous events. Several methods
are available to identify and assess hazards, ranging from simple qualitative procedures to
complex quantitative ones.
According to Ericson (2005), a hazard is the combination of three components known as the
hazard triangle. The three components, illustrated in Fig. 1, are: (i) Hazardous Element (HE):
the primary hazardous source that causes the hazard (e.g., a hazardous energy source). (ii)
Initiating Mechanisms (IM): the events that cause the hazard to arise. (iii) Target/Threat
(T/T): a person or object susceptible to harm, the damage or mishap outcomes, and the
projected damage and loss. Hazard components, particularly IMs, may only activate a hazard in
a specific time sequence, resulting in accidents. A threat can be produced by a human or an
asset that can cause a hazardous scenario. A hazard event is caused by the interaction of
several triggering events or persons, which might result in a potential accident. The hazard
and mishap relations are shown in Fig. 2.
Fig. 1. Hazard Triangle according to Hazard Theory
With the availability of extensive data and high-speed computing facilities, the demand for
data mining techniques that support rational management decision-making in information-related
applications has grown sharply. Data mining technologies include machine learning, cluster
analysis, regression analysis, and neural networks. A machine learning algorithm creates a set
of models, often in the form of decision rules, that emphasize the most important links between
the input features and the decision. In cluster analysis, similar items are grouped into one
cluster, and dissimilar ones are placed in other clusters, depending on specific features.
Association rule mining is a machine learning technique that uses simple 'If-Then'
statements to examine frequently occurring patterns in a dataset or to identify intrinsic links
between independent and dependent variables. The rules are applicable to non-numeric,
categorical data and are generated simply by counting. An association rule consists of two
parts: an antecedent (if) and a consequent (then) (Kaur, 2014; Meenakshi, 2014). An antecedent
is a data item found in the dataset, whereas a consequent is a data item seen in conjunction
with the antecedent. Thus, the 'If-Then' statement has the form 'If condition Then conclusion.'
These rules are created by examining the dataset for frequent 'If-Then' patterns, and the most
relevant associations are then selected using the support, confidence, and lift criteria
(Gupta and Chauhan, 2013).
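The three criteria can be illustrated with a minimal sketch that uses their standard
definitions: support is the share of records containing both antecedent and consequent,
confidence is that share divided by the support of the antecedent, and lift is confidence
divided by the support of the consequent. The function names and the five toy records below
are assumptions made for illustration; they are not the project's code or data.

```python
from typing import Dict, FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    s_rule = support(transactions, antecedent | consequent)
    s_ante = support(transactions, antecedent)
    s_cons = support(transactions, consequent)
    # Guard against antecedents or consequents that never occur.
    confidence = s_rule / s_ante if s_ante else 0.0
    lift = confidence / s_cons if s_cons else 0.0
    return {"support": s_rule, "confidence": confidence, "lift": lift}


# Hypothetical toy records: each accident is a set of coded items.
records = [
    frozenset({"HE4", "IM6", "TA1", "TH3"}),
    frozenset({"HE4", "IM6", "TA1", "TH3"}),
    frozenset({"HE4", "IM6", "TA1", "TH2"}),
    frozenset({"HE3", "IM2", "TA1", "TH2"}),
    frozenset({"HE3", "IM2", "TA2", "TH2"}),
]

metrics = rule_metrics(records, {"HE4", "IM6", "TA1"}, {"TH3"})
print(metrics)
```

A lift above one means the consequent appears with the antecedent more often than its overall
frequency in the records would suggest.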
In this project, a machine learning model is developed to identify frequent hazard
triangles in different industrial areas by analyzing 10351 accidents that happened in the US
between 2020 and 2021. A topic modeling algorithm is used to cluster similar accidents into
categories. Association rule mining is then applied to find frequently occurring hazard
triangles using the criteria of support, confidence, and lift.
2. Literature Review
Several statistical models have been developed to investigate the factors that lead to
hazardous events. Khanzode et al. (2012) reviewed successive generations of accident causation
theories. Pioneering efforts in accident data analysis include Cooper (2000) and Maher and
Summersgill (1996). In recent years, researchers have become more interested in studying
accident data using data mining techniques and algorithms (Arunraj et al., 2013; Cheng et al.,
2013; Verma et al., 2014).
Association rule mining aims to discover frequent patterns, interesting correlations,
associations, or causal structures among groups of objects in transaction databases, relational
databases, or other data warehouses (Jaiswal and Agarwal, 2012). Association rules are widely
employed in various fields, including communications networks, risk and market management, and
inventory control (Bala et al., 2010; Adewole et al., 2014; Agarwal and Mittal, 2019;
Chakraborty et al., 2022).
The apriori algorithm is the favored approach for association rule mining (Agrawal and
Srikant, 1994). Aside from the apriori algorithm, other algorithms for mining association rules
include the aprioriTid and aprioriHybrid algorithms (Agrawal and Srikant, 1994), the Eclat
algorithm (Zaki, 2000), the FP-Growth algorithm (Han et al., 2000), and the continuous
association rule mining algorithm (CARMA) (Hidber, 1999). Although there has been considerable
research on association rule mining, the literature applying it to safety data analysis is
limited.
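As a rough illustration of the level-wise idea behind the apriori algorithm, the sketch below
grows frequent itemsets one item at a time and discards any candidate that has an infrequent
subset. It is a minimal teaching version written under assumptions (the function name, the toy
records, and the 0.3 threshold are all hypothetical), not the implementation used in this
project.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori_frequent_itemsets(
    transactions: List[FrozenSet[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `min_support`."""
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Count step: keep only candidates that meet the support threshold.
        level = {}
        for cand in candidates:
            s = sum(1 for t in transactions if cand <= t) / n
            if s >= min_support:
                level[cand] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy records, not the project dataset.
toy_records = [
    frozenset({"HE4", "IM6", "TA1", "TH3"}),
    frozenset({"HE4", "IM6", "TA1", "TH3"}),
    frozenset({"HE4", "IM6", "TA1", "TH2"}),
    frozenset({"HE3", "IM2", "TA1", "TH2"}),
    frozenset({"HE3", "IM2", "TA2", "TH2"}),
]

frequent_sets = apriori_frequent_itemsets(toy_records, min_support=0.3)
for itemset, s in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Association rules are then formed by splitting each frequent itemset into an antecedent and a
consequent and scoring the split.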
3. Methods
3.1. Data Acquisition
The dataset used in this project is an open-source dataset containing the details of
accidents that happened in US industries between January 2015 and February 2021. The dataset
contains more than sixty thousand accidents. For this project, only the recent accidents
between January 2020 and February 2021 were considered, which amount to 10351 accidents. The
dataset contains 25 features, of which four (SourceTitle, EventTitle, NatureTitle, Part of
Body Title) are used to develop the hazard triangles of the accidents. These features
correspond directly to the components of the hazard triangle: SourceTitle corresponds to the
Hazardous Element, EventTitle to the Initiating Mechanism, NatureTitle to the Threat, and Part
of Body Title to the Target.
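The feature-to-component mapping described above can be sketched as a small lookup. The column
names come from the dataset description; the codes HE, IM, TH, and TA follow the labels used
for the rules in this report. The `topic_of` table, which stands in for the topic assigned to
each text value by clustering, and the example record values are hypothetical.

```python
from typing import Dict

# Dataset column -> hazard triangle component code.
FEATURE_TO_COMPONENT: Dict[str, str] = {
    "SourceTitle": "HE",         # Hazardous Element
    "EventTitle": "IM",          # Initiating Mechanism
    "NatureTitle": "TH",         # Threat
    "Part of Body Title": "TA",  # Target
}


def to_hazard_triangle(
    record: Dict[str, str], topic_of: Dict[str, Dict[str, int]]
) -> Dict[str, str]:
    """Replace each raw text value with its component code plus topic number."""
    triangle = {}
    for column, component in FEATURE_TO_COMPONENT.items():
        topic_id = topic_of[column][record[column]]
        triangle[component] = f"{component}{topic_id}"
    return triangle


# Hypothetical topic assignments and a hypothetical accident record.
topic_of = {
    "SourceTitle": {"Chemicals": 4},
    "EventTitle": {"Exposure to harmful substances": 6},
    "NatureTitle": {"Heat (thermal) burns": 3},
    "Part of Body Title": {"Arm": 1},
}
record = {
    "SourceTitle": "Chemicals",
    "EventTitle": "Exposure to harmful substances",
    "NatureTitle": "Heat (thermal) burns",
    "Part of Body Title": "Arm",
}

triangle = to_hazard_triangle(record, topic_of)
print(triangle)
```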
After clustering the dataset's features, the values are replaced with the title of the
topic they belong to. This yields a hazard triangle for each accident, for a total of 10351
hazard triangles. Not every hazard triangle is unique: there are 397 unique hazard triangles.
Of these, 31 hazard triangles account for 50 percent of all accidents, and 105 hazard
triangles account for 80 percent of all accidents. A Pareto chart showing the hazard triangles
up to a cumulative share of 80 percent is given in Fig. 3.
Fig. 3. Pareto chart showing the 105 most frequent hazard triangles.
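The Pareto-style count behind these figures can be sketched as follows: count how often each
unique triangle occurs, sort by frequency, and find how many of the most frequent triangles
are needed to reach a given cumulative share. The function name and the toy list of triangles
are assumptions made for illustration and do not reproduce the project data.

```python
from collections import Counter
from typing import Hashable, Sequence


def triangles_needed(triangles: Sequence[Hashable], share: float) -> int:
    """Number of most frequent unique triangles whose cumulative share reaches `share`."""
    counts = Counter(triangles)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= share:
            return rank
    return len(counts)


# Hypothetical toy data: each triangle is a (HE, IM, TA, TH) tuple.
toy_triangles = (
    [("HE4", "IM6", "TA1", "TH3")] * 6
    + [("HE3", "IM2", "TA1", "TH2")] * 3
    + [("HE7", "IM6", "TA1", "TH3")] * 1
)

print(len(set(toy_triangles)))               # number of unique triangles
print(triangles_needed(toy_triangles, 0.5))  # triangles covering 50 percent
print(triangles_needed(toy_triangles, 0.8))  # triangles covering 80 percent
```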
Association rules with a lift greater than one are stronger and more interesting, since the
antecedents and the consequent then occur together more often than would be expected if they
were independent. Eighteen association rules have three antecedents (HE, IM, TA). Only these
rules were considered, because the three antecedents together with the consequent (TH) form a
hazard triangle. The first important rule has L=2.32, with antecedents HE4, IM6, TA1, and
consequent TH3 (S=2.1%, C=90.3%), which signifies that the accident occurred when a person was
exposed to a chemical, metal, or material, resulting in a thermal or electrical burn in the arm
or elbow. The second rule has L=1.74, with antecedents HE3, IM2, TA1, and consequent TH2
(S=1.5%, C=87.4%), which signifies that a fracture or wound in the arm or elbow resulted from a
fall, slip, or trip involving powered machinery or a wooden object. The fourth rule has L=2.16,
with antecedents HE7, IM6, TA1, and consequent TH3 (S=1.7%, C=84.0%), which signifies that when
an arm or hand is caught in appliances used for heating or cleaning, it generally results in
thermal or electrical burns. The other association rules can be read in the same way.
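The selection step described above can be sketched as a filter over mined rules: keep a rule
only if its antecedents are exactly one HE, one IM, and one TA item, its consequent is a
single TH item, and its lift exceeds one. The `Rule` class and helper names are assumptions
for illustration. The first three rules carry the values reported in this section; the last
two are hypothetical and exist only to show what the filter rejects.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedents: FrozenSet[str]
    consequents: FrozenSet[str]
    support: float
    confidence: float
    lift: float


def component(item: str) -> str:
    """Component code of an item, e.g. 'HE4' -> 'HE'."""
    return item[:2]


def hazard_triangle_rules(rules: List[Rule], min_lift: float = 1.0) -> List[Rule]:
    """Keep rules of the form {HE, IM, TA} -> {TH} with lift above `min_lift`."""
    selected = [
        r for r in rules
        if r.lift > min_lift
        and sorted(component(i) for i in r.antecedents) == ["HE", "IM", "TA"]
        and [component(i) for i in r.consequents] == ["TH"]
    ]
    # Highest lift first.
    return sorted(selected, key=lambda r: r.lift, reverse=True)


rules = [
    # Values as reported in the text.
    Rule(frozenset({"HE3", "IM2", "TA1"}), frozenset({"TH2"}), 0.015, 0.874, 1.74),
    Rule(frozenset({"HE4", "IM6", "TA1"}), frozenset({"TH3"}), 0.021, 0.903, 2.32),
    Rule(frozenset({"HE7", "IM6", "TA1"}), frozenset({"TH3"}), 0.017, 0.840, 2.16),
    # Hypothetical rules: two antecedents only, and lift below one.
    Rule(frozenset({"HE4", "IM6"}), frozenset({"TH3"}), 0.030, 0.700, 1.50),
    Rule(frozenset({"HE1", "IM1", "TA2"}), frozenset({"TH1"}), 0.020, 0.300, 0.90),
]

selected = hazard_triangle_rules(rules)
for r in selected:
    print(sorted(r.antecedents), "->", sorted(r.consequents), r.lift)
```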
5. Conclusion
This project aimed to develop hazard triangles from past accidents and to identify
patterns in those accidents so that safety measures can be upgraded to prevent or minimize
such incidents in the future. A structured methodology was used to build the hazard triangles.
The results indicate that out of 397 hazard triangles, only 31 account for 50 percent of all
accidents, and 105 account for 80 percent of all accidents. Two initiating mechanisms are the
most frequent: Falls, Slips, or Trips, and Exposed to Equipment or Machinery. They result
mainly in two types of threats: Fractures, Wound or Muscle injury, and Thermal or Electrical
burns. Nineteen association rules were found using the apriori algorithm of association rule
mining. These rules will help the management of industrial facilities to understand different
accidents and make better decisions concerning safety. A limitation of the study lies in the
data used. Text clustering works better on longer texts, but the texts clustered in this
project are short; some are a single word. The clustering accuracy is therefore not the best,
although it is within the average-to-good range. In the future, different topic modeling
algorithms could be used to cluster the data with better accuracy. Furthermore, different
variants of the apriori algorithm, or the other algorithms outlined in the literature review,
can be used to establish association rules for safety-related occurrences, and their results
can be compared with those of the apriori method. Nonetheless, the current project allows
learning from past accident experiences.
6. References
Adewole, K. S., Akintola, A. G., Ajiboye, A. R., Abdulsalam, K. S., 2014. Frequent pattern and
association rule mining from inventory database using apriori algorithm. African Journal
of Computing & ICT, 7(3), 35-41.
Agarwal, R., Mittal, M., 2019. Inventory classification using multi-level association rule mining.
International Journal of Decision Support System Technology, 11(2), 1-9.
Agrawal, R., Srikant, R., 1994. Fast algorithms for mining association rules. International
Conference on Very Large Data Bases, pp. 1–32.
Arunraj, N.S., Mandal, S., Maiti, J., 2013. Modeling uncertainty in risk assessment: An
integrated approach with fuzzy set theory and Monte Carlo simulation. Accident
Analysis and Prevention, 55, 242–255.
Bala, P. K., Sural, S., Banerjee, R. N., 2010. Association rule for purchase dependence in multi-
item inventory. Production Planning and Control, 21(3), 274-285.
Blei, D. M., Ng, A. Y., Jordan, M. I., 2003. Latent Dirichlet Allocation. Journal of Machine
Learning Research, 3, 993-1022.
Cheng, C.-W., Yao, H.-Q., Wu, T.-C., 2013. Applying data mining techniques to analyze the
causes of major occupational accidents in the petrochemical industry. Journal of Loss
Prevention in the Process Industry.
Chakraborty, S., Mallick, B., Chakraborty, S., 2022. Mining of association rules for the
treatment of dental diseases. Journal of Decision Analytics and Intelligent Computing,
2(1), 1-11.
Ericson, C. A., II, 2005. Hazard Analysis Techniques for System Safety. John Wiley &
Sons, Hoboken, New Jersey, USA.
Cooper, M.D., 2000. Towards a model of safety culture. Safety Science, 36, 111–136.
Gupta, D., Chauhan, A. S., 2013. Mining association rules from infrequent itemsets: A survey.
International Journal of Innovative Research in Science, Engineering and Technology,
2(10), 5801-5808.
Han, J., Pei, J., Yin, Y., 2000. Frequent pattern tree: design and construction. SIGMOD ’00
Proceedings of the 2000 ACM SIGMOD International Conference on Management of
Data, Dallas, pp. 1–12.
Heinrich, H., 1959. Industrial Accident Prevention. McGraw-Hill, New York.
Hidber, C., 1999. Online association rule mining. Proceedings of the 1999 ACM SIGMOD
International Conference on Management of Data, Philadelphia, PA, pp. 145–156.
Jaiswal, V., Agarwal, J., 2012. The evolution of the association rules. International Journal of
Modeling and Optimization, 2(6), 726-729.
Kaur, G., 2014. Association rule mining: A survey. International Journal of Computer Science
and Information Technologies, 5(2), 2320-2324.
Khanzode, V.V., Maiti, J., Ray, P.K., 2012. Occupational injury and accident research: a
comprehensive review. Safety Science, 50 (5), 1355–1367.
Lee, C., Song, B., Park, Y., 2012. Design convergent product concepts based on functionality:
an association rule mining and decision tree approach. Expert Systems with
Applications, 39 (10), 9534–9542.
Maher, M.J., Summersgill, I., 1996. A comprehensive methodology for the fitting of predictive
accident models. Accident Analysis and Prevention, 28 (3), 281–296.
Meenakshi, R., 2014. A review on association rule mining. International Journal of Advance
Research in Science and Engineering, 3(5), 299-303.
Reason, J., 1990. Human Error. Cambridge University Press, New York.
Sadoyan, H., Zakarian, A., Mohanty, P., 2006. Data mining algorithm for manufacturing
process control. International Journal of Advanced Manufacturing Technology, 28, 342-
350.
Verma, A., Khan, S. D., Maiti, J., Krishna, O. B., 2014. Identifying patterns of safety-related
incidents in a steel plant using association rule mining of incident investigation reports.
Safety Science, 70, 89–98.
Zaki, M. J., Parthasarathy, S., Ogihara, M., 1997. Parallel algorithms for discovery of
association rules. Data Mining and Knowledge Discovery, 1(4), 343–373.
Zhang, C., Zhang, S., 2002. Association Rule Mining: Models and Algorithms. Springer, New
York.