
Data Mining Techniques

Process Mining
What is process mining?
Process mining applies data science to discover, validate
and improve workflows. By combining data mining and process
analytics, organizations can mine log data from their
information systems to understand the performance of their
processes, revealing bottlenecks and other areas of
improvement.
Process mining leverages a data-driven approach to process
optimization, allowing managers to remain objective in their
decision-making around resource allocation for existing
processes.
Types of Process Mining
Discovery: Process discovery uses event log data to create a
process model without outside influence. Under this
classification, no previous process models would exist to
inform the development of a new process model. This type of
process mining is the most widely adopted.
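As a rough illustration of discovery, the sketch below derives a "directly-follows" graph from an event log, which is one simple form a discovered process model can take. The event log, its column layout, and the function names are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (case_id, activity, timestamp).
event_log = [
    ("c1", "receive order", 1), ("c1", "check stock", 2), ("c1", "ship", 3),
    ("c2", "receive order", 1), ("c2", "check stock", 3),
    ("c2", "reorder", 4), ("c2", "ship", 6),
    ("c3", "receive order", 2), ("c3", "check stock", 4), ("c3", "ship", 5),
]

def build_traces(log):
    """Group events by case and order each case's activities by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, ts in log:
        cases[case_id].append((ts, activity))
    return {cid: [a for _, a in sorted(events)] for cid, events in cases.items()}

def directly_follows(traces):
    """Count how often activity b directly follows activity a."""
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

traces = build_traces(event_log)
dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The edge counts come only from the log, with no prior model, which is the point of discovery.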
Conformance: Conformance checking confirms if the intended
process model is reflected in practice. This type of process
mining compares a process description to an existing process
model based on its event log data, identifying any
deviations from the intended model.
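A minimal sketch of conformance checking, assuming the intended model can be written as a set of allowed steps plus a fixed start and end activity. Real conformance techniques are more elaborate; the model, the traces, and the `deviations` helper here are hypothetical.

```python
# Hypothetical intended model: allowed "directly follows" steps.
allowed = {
    ("receive order", "check stock"),
    ("check stock", "ship"),
}
START, END = "receive order", "ship"

def deviations(trace, allowed, start=START, end=END):
    """Return a list of human-readable deviations for one observed trace."""
    if not trace:
        return ["empty trace"]
    found = []
    if trace[0] != start:
        found.append(f"starts with '{trace[0]}' instead of '{start}'")
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in allowed:
            found.append(f"unexpected step '{a}' -> '{b}'")
    if trace[-1] != end:
        found.append(f"ends with '{trace[-1]}' instead of '{end}'")
    return found

# Observed traces, as they might be reconstructed from an event log.
observed = {
    "c1": ["receive order", "check stock", "ship"],
    "c2": ["receive order", "check stock", "reorder", "ship"],
}
report = {cid: deviations(trace, allowed) for cid, trace in observed.items()}
```

In this sketch a case with an empty deviation list conforms to the intended model, and every other case lists the steps where practice departs from it.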
Enhancement: This type of process mining has also been
referred to as extension, organizational mining, or
performance mining. In this class of process mining,
additional information is used to improve an existing
process model.
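One possible reading of enhancement is to annotate an existing model with timing information from the log. The sketch below assumes timestamps in hours and hypothetical trace data; it computes the mean time per step and flags the slowest step as a candidate bottleneck.

```python
from collections import defaultdict

# Hypothetical timestamped traces: case -> [(activity, time in hours)].
timed_traces = {
    "c1": [("receive order", 0), ("check stock", 1), ("ship", 3)],
    "c2": [("receive order", 0), ("check stock", 2), ("ship", 10)],
}

def mean_transition_times(traces):
    """Average elapsed time for each observed step a -> b."""
    totals = defaultdict(lambda: [0.0, 0])  # step -> [sum of durations, count]
    for events in traces.values():
        for (a, t0), (b, t1) in zip(events, events[1:]):
            entry = totals[(a, b)]
            entry[0] += t1 - t0
            entry[1] += 1
    return {step: total / n for step, (total, n) in totals.items()}

means = mean_transition_times(timed_traces)
bottleneck = max(means, key=means.get)  # step with the largest mean duration
```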
Importance of Process Mining
Process mining helps businesses reduce costs by quantifying
the inefficiencies in their operational models, allowing
leaders to make objective decisions about resource
allocation.

Discovering bottlenecks can not only reduce costs and
expedite process improvement, but also drive more
innovation, higher quality, and better customer retention.
Challenges of Process Mining
● Data Quality: Finding, merging and cleaning data is
usually required to enable process mining. Data might be
distributed over various data sources. It can also be
incomplete or contain different labels or levels of
granularity. Accounting for these differences will be
important to the information that a process model yields.
● Concept drift: Sometimes processes change as they are
being analyzed, resulting in concept drift.
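To make the data quality point concrete, here is a small sketch that merges two logs whose activity labels differ and removes the duplicates that result. The label map, the two source logs, and the helper names are all hypothetical.

```python
# Hypothetical mapping from source-specific labels to one canonical label.
LABEL_MAP = {
    "order received": "receive order",
    "recv_order": "receive order",
    "shipped": "ship",
}

def normalize_label(raw):
    """Trim, lowercase, and map a raw activity label to its canonical form."""
    key = raw.strip().lower()
    return LABEL_MAP.get(key, key)

def merge_logs(*logs):
    """Merge logs of (case_id, activity, timestamp) rows and drop duplicates."""
    merged = {
        (case, normalize_label(activity), ts)
        for log in logs
        for case, activity, ts in log
    }
    return sorted(merged, key=lambda e: (e[0], e[2], e[1]))

crm_log = [("c1", "Order Received", 1), ("c1", "Shipped", 5)]
erp_log = [("c1", "recv_order", 1), ("c1", "check stock", 2)]
clean_log = merge_logs(crm_log, erp_log)
```

Without the normalization step, the same event would appear twice under two names and distort any model built from the log.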
Data Stream Mining
Data Stream
A data stream is a continuous, fast-changing, and ordered
chain of data transmitted at very high speed. It is an
ordered sequence of information for a specific interval.

Data is transferred from the sender's side and appears
immediately at the receiver's side. Streaming does not mean
downloading the data or storing the information on storage
devices.
Sources of Data Stream
● Internet traffic
● Sensors data
● Real-time ATM transaction
● Live event data
● Call records
● Satellite data
● Audio listening
● Watching videos
● Real-time surveillance systems
● Online transactions
What are Data Streams in Data Mining?
Data stream mining is the extraction of knowledge and
valuable insights from a continuous stream of data using
stream processing software.
It can be considered a subset of the general concepts of
machine learning, knowledge extraction, and data mining.
Characteristics of Data Stream in Data Mining
Continuous Stream of Data: The data stream is an infinite, continuous
stream, resulting in big data. In data streaming, multiple data streams
are passed simultaneously.

Time Sensitive: Data streams are time-sensitive, and their elements
carry timestamps. A data element is relevant only for a certain period;
after that, it loses its significance.

Data Volatility: No data is stored in data streaming because it is
volatile. Once the data mining and analysis are done, the information is
summarized or discarded.

Concept Drifting: Data streams are very unpredictable. The data changes
or evolves with time; nothing about it is constant.
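These characteristics suggest processing each element once and keeping only a bounded summary. A minimal sketch, assuming numeric readings with timestamps: a mean over a sliding time window, where expired elements are discarded. The class name and the `horizon` parameter are illustrative, not taken from any particular stream processing product.

```python
from collections import deque

class SlidingWindowMean:
    """Mean of the values seen during the last `horizon` time units."""

    def __init__(self, horizon):
        self.horizon = horizon  # must be positive
        self.window = deque()   # (timestamp, value) pairs still relevant
        self.total = 0.0

    def update(self, timestamp, value):
        """Add one stream element, drop expired ones, return the current mean."""
        self.window.append((timestamp, value))
        self.total += value
        # Elements older than the horizon have lost their significance.
        while self.window[0][0] <= timestamp - self.horizon:
            _, old_value = self.window.popleft()
            self.total -= old_value
        return self.total / len(self.window)

summary = SlidingWindowMean(horizon=10)
readings = [(0, 2.0), (5, 4.0), (12, 6.0)]
means = [summary.update(ts, value) for ts, value in readings]
```

Memory use depends on how many elements fall inside the window, not on the length of the stream, which is what makes the approach workable for an unbounded stream.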
Data Streams in Data Mining Techniques
Data stream mining techniques extract patterns and insights
from a data stream. A vast range of algorithms is available
for stream mining. They fall into four main families:
classification, regression, clustering, and frequent pattern
mining.
Classification & Regression Algorithms
● Lazy Classifier or k-Nearest Neighbor
● Naive Bayes
● Decision Trees
● Logistic Regression / Linear Regression
● Ensembles
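As one illustration of the lazy (k-nearest neighbor) approach on a stream, the sketch below keeps only the most recent labelled examples in a fixed-size window and classifies by majority vote among the nearest ones. The class, its parameters, and the toy stream are assumptions made for this example.

```python
import math
from collections import Counter, deque

class WindowKNN:
    """k-nearest-neighbor classifier over a sliding window of recent examples."""

    def __init__(self, k=3, window_size=100):
        self.k = k
        self.window = deque(maxlen=window_size)  # oldest examples fall out

    def predict(self, x):
        """Majority label among the k stored examples closest to x."""
        if not self.window:
            return None
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], x))
        votes = Counter(label for _, label in nearest[: self.k])
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        """Store one labelled example from the stream."""
        self.window.append((tuple(x), y))

# Hypothetical stream of (features, label) pairs.
stream = [
    ((0.0, 0.1), "low"), ((0.2, 0.0), "low"), ((5.0, 5.1), "high"),
    ((5.2, 4.9), "high"), ((0.1, 0.2), "low"), ((4.8, 5.0), "high"),
]

model = WindowKNN(k=3, window_size=100)
correct = 0
for x, y in stream:
    if model.predict(x) == y:  # test on the element first ...
        correct += 1
    model.learn(x, y)          # ... then train on it
```

Testing on each element before learning from it gives an accuracy estimate without a separate held-out set, and the bounded window lets old examples drop out as the data drifts.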
Clustering Algorithms
● K-means Clustering
● Hierarchical Clustering
● Density-based Clustering
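For clustering, a common way to adapt k-means to a stream is the sequential update, where each arriving point moves its nearest centroid slightly toward itself. The sketch below assumes the initial centroids are supplied by the caller; the class name and sample points are hypothetical.

```python
import math

class OnlineKMeans:
    """Sequential k-means: one centroid update per arriving point."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def update(self, x):
        """Assign x to its nearest centroid and move that centroid toward x."""
        idx = min(
            range(len(self.centroids)),
            key=lambda i: math.dist(self.centroids[i], x),
        )
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]  # keeps a running mean of assigned points
        self.centroids[idx] = [
            c + rate * (xi - c) for c, xi in zip(self.centroids[idx], x)
        ]
        return idx

clusterer = OnlineKMeans(initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
points = [(1.0, 1.0), (9.0, 9.0), (3.0, 3.0), (11.0, 11.0)]
assignments = [clusterer.update(p) for p in points]
```

Each point is seen once and then discarded; only the centroids and their counts are kept.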
Frequent Pattern Mining
● Apriori
● Eclat
● FP-growth
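A compact sketch of Apriori, assuming it is run on a bounded batch or window of transactions taken from the stream rather than on the unbounded stream itself. The transactions and the `min_support` threshold are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {
        frozenset([i]) for i in items if support(frozenset([i])) >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent

window = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(window, min_support=2)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is counted only if all of its smaller subsets were already found frequent.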
The End
