
1. What is a business process? What are the important components of a business process?
A1
A business process is a collection of inter-related events, activities, and decision points that involve a number of actors and objects, and which collectively lead to an outcome that is of value to at least one customer.
Events: Events are the conditions which must exist for the process to be performed. An event is something that happens, as opposed to something that is done on purpose; it can be thought of as the effect which occurs once sufficient cause is provided. Each process starts and ends with an event.

Tasks: A task is the smallest unit into which an activity can be broken down; breaking it down further is not feasible for the purpose at hand. The business process describes the different activities as well as the interrelationships between them. It is important to note that the inter-relationships matter more than the individual tasks: in any system, the whole is greater than the sum of its parts. While conducting BPM exercises, one must therefore take a synergistic view.

The number one problem with BPM today is that most practitioners fail to take this systems viewpoint. Employees of the same business can have conflicting objectives: a human resources professional may optimize their own process while adversely affecting the functioning of the marketing department. Problems are then merely shifted rather than actually resolved. A good understanding of how a process connects to other activities and processes helps solve this problem and achieve sustainable progress.

Decisions: Certain decisions may have to be taken as part of a process. Leaving these decisions entirely to the people involved has undesirable consequences: in the absence of clear guidelines, different people are likely to decide differently, creating inconsistent experiences for customers and lowering quality.

As an example, consider the leave-granting process in any large organization. There are explicit rules defining how much leave a person can take as well as the procedure for getting it approved. Although it may look like the manager is deciding whether to grant leave, all they are doing is following a pre-defined procedure. No matter who the manager is, the decisions remain consistent because they are taken on the basis of rules rather than on the basis of who is involved. Such rules are usually laid down as if-then-else conditions in the process.
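
A minimal sketch of such rule-based decision logic; the thresholds and field names are hypothetical illustrations, since the real rules would come from the organization's leave policy:

```python
# Hypothetical rule-based leave-approval decision: the outcome depends only
# on pre-defined if/then/else rules, not on who the manager is.

def decide_leave_request(days_requested: int, leave_balance: int) -> str:
    if days_requested <= 0:
        return "reject: invalid request"
    elif days_requested > leave_balance:
        return "reject: insufficient leave balance"
    elif days_requested > 5:
        return "escalate: needs senior-manager approval"
    else:
        return "approve"

print(decide_leave_request(days_requested=3, leave_balance=10))  # approve
```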

Inputs: Until it is given inputs, a process cannot function. Correct inputs are like correct food for the process: just as eating unhealthy food makes us unhealthy, supplying the wrong inputs makes the process unhealthy and inefficient. Here are some common inputs required by a process.

People: Processes require people with the correct aptitude and attitude. This is why the breaking down of tasks is so important: in a process-driven organization, an unskilled person can be assigned the mundane jobs while a skilled person is deployed on the important ones. Matching skills with task requirements brings down costs and increases efficiency.

Raw Material: Raw materials need to be made available in a timely manner and at the lowest cost. Some companies have built their procurement processes into core competencies.

Information: The correct information needs to be made available to all the entities in the
process. The worker must have the skill and must be well versed with the procedure. The
manager must get continuous feedback to ensure that the production is on target and as
planned.

Outputs: The outputs from the process must be continuously monitored. This will help in
measuring the effectiveness and efficiency of the process and suggesting changes as and
when required.

2. What is BPM?
A2
Business Process Management (BPM) is the art and science of overseeing how work is
performed in an organization to ensure consistent outcomes and to take advantage of
improvement opportunities.

• The term “improvement” may take different meanings depending on the objectives of the
organization, e.g.

– reducing costs,
– reducing execution times, and
– reducing error rates, but also
– gaining competitive advantage through innovation.

• Improvement initiatives may be one-off or continuous, incremental or radical.

• BPM is about managing processes, i.e. entire chains of events, activities, and decisions.
3. How is BPM useful for organizations?
A3

1. Improved Business Agility

It has always been necessary to modify an organization’s best practices in order to stay abreast of changing market conditions. An efficient BPM implementation permits the business owner to pause a business process, implement changes, and re-execute it. With this, the process can stay on track while changes are implemented or the tasks of its process users are redefined. The end result is a higher level of adaptability to unstable situations.

Greater control and agility allows organizations to alter workflows and re-use or customize them as necessary. Business processes become more responsive through the precise documentation of the steps involved in a given process. This defined knowledge allows organizations to comprehend the possible impact of change on their business processes; an organization that knows the effects of process modifications is more open to options that could improve profitability.

2. Reduced Costs and Higher Revenues

Implementing the right BPM suite in an organization can trim down the costs associated with business process execution. Enhanced processes and improved workforce productivity make this possible; hence, employing the right BPM in the organization can deliver significantly positive results.

The decline in operational costs post-BPM deployment may not be visible right away, but eradicating bottlenecks brings remarkable improvements. For instance, reduced lead time can have a positive effect on how the organization sells its products. It may also mean that consumers get access to services and products suited to their needs within the shortest possible time. Organizations thus see more market demand, followed by higher sales and improved revenue.

3. Higher Efficiency

Deployment of BPM enhances the efficiency of business processes tremendously. This potential comes from the integration of organizational processes from start to finish. Process owners are automatically alerted whenever responsibilities are handed out to individual members, which enables more proficient monitoring of delays and reallocation of tasks among the members. BPM therefore helps eliminate bottlenecks and reduce lead time when implementing and enhancing business processes.
4. Better Visibility

Essentially, BPM makes use of refined software programs to make process automation possible. These programs allow process owners to keep track of performance and see how business processes function in real time. Automation discloses how processes are working without the need for extensive labor and monitoring techniques. This enhanced transparency allows management to gain a better understanding of their processes and to modify structures and processes efficiently while keeping track of outcomes.

5. Compliance, Safety and Security

Reliable BPM practices help keep organizations informed of their obligations, such as financial reporting, labor-law compliance, and the wide range of government rules that organizations must follow. A comprehensive BPM guarantees that organizations comply with the standards and stay up to date with the law.

Overall, organizations that utilize BPM principles find that they can reduce cost and enhance productivity by identifying how processes would work under the best conditions, and then implementing the adjustments needed to establish control, achieve the best performance, and track future outcomes. With all this, it is no wonder that BPM is making a buzz in the world of business and marketing.

4. Which disciplines are closely related to BPM?

A4
Business process management is a discipline, not a technology.

The discipline is grounded in Lean thinking, continuous improvement, total quality management,
and six sigma.

Business processes constantly change, even though many are unaware of it. For example, the
products change, the marketing approach changes, suppliers change, the business strategy
changes, the organizational structure changes, and if that isn’t enough, the competition and
regulatory bodies force change. Or the company gets bought or acquires another company.

BPM projects should focus on how to manage and capitalize upon constant, unrelenting change.

BPM projects should also help business users by providing all the information and collaboration tools they need within the context of a process.
BPM projects should be led by business, not IT, to avoid failure.

Business-led BPM projects can still fail, so best practices matter.

Process governance is an important part of BPM initiatives.

Process governance and data governance initiatives should be aligned.

The best place to start with a BPM initiative is in cross-functional processes with high volumes
of manual work that create a lot of pain in the organization.

5. Explain in detail different phases of the BPM Life cycle.


A5
The steps in a BPM Life Cycle are:
– Model
– Implement
– Execute
– Monitor
– Optimize

Model
Capture the business processes at a high level.
Gather just enough detail to understand conceptually how the process works.
Concentrate on ensuring the high level detail is correct without being distracted by the detail of
how it’s going to be implemented.
Historically carried out by business analysts, but simple-to-use technologies such as SeQuence
are allowing the business manager to undertake this task, as this is typically where the in-depth
knowledge required to model the process lies.

Implement
Extend the model to capture more detail required to execute the process, e.g.
– Recipients
– Form controls and layout
– Email message content
– System integration

Execute
Instances of the process are launched and interacted with by the end users.

Monitor
Measure key performance indicators and process performance.
View these vs. SLAs via graphical dashboards and textual reports to monitor how the process is
performing.
Understand where the bottlenecks/inefficiencies in the process are.

Optimize
Improve the business process and performance against SLAs by reducing the
bottlenecks/inefficiencies identified during monitoring.
Simulate these changes using “what-if” simulation.
Determine which changes will deliver the maximum benefit.
Fine tune the process.
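
As a rough illustration of the Monitor and Optimize phases, the sketch below computes a cycle-time KPI per process instance and flags SLA breaches; the field names and the 48-hour SLA are hypothetical:

```python
# Hypothetical Monitor-phase check: compare each instance's cycle time
# against an assumed 48-hour SLA and flag breaches for optimization.
from datetime import datetime, timedelta

SLA = timedelta(hours=48)

instances = [
    {"id": "po-1", "started": datetime(2024, 1, 1, 9), "ended": datetime(2024, 1, 2, 17)},
    {"id": "po-2", "started": datetime(2024, 1, 1, 9), "ended": datetime(2024, 1, 4, 9)},
]

for inst in instances:
    cycle_time = inst["ended"] - inst["started"]
    status = "OK" if cycle_time <= SLA else "SLA BREACH"
    print(f'{inst["id"]}: cycle time {cycle_time} -> {status}')
```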

6. Who are the stakeholders in the BPM life cycle?


A6
Chief Process Officer

This person is accountable for business process management within the organisation, standardising and streamlining business processes. They own the BPM method, BPM lifecycle, plans and strategy, and are responsible for ensuring that BPM is embedded in the management philosophy.

Business Engineer

These are the subject matter experts for their departments or area. They are not necessarily
technical, but they will know about the strategy for their divisions, the alignment to the overall
business strategy and goals. You could also think of these stakeholders as the senior managers
who will feed information into business process modelling for their specialisms, for example
heads of HR, finance, IT, sales and so on.

Process Designer

Process Designers are the skilled individuals who are responsible for producing the business
process models. They will work with the Business Engineers to research, observe and document
the business processes. Designers will use Business Process Modelling Notation (BPMN) to
model the processes.

Process Participant

These are the frontline or end users of the business processes. They will input to the process modelling by explaining their activities, hand-offs and dependencies within their processes.

Knowledge Worker

Knowledge workers are also process participants, but they use software to perform activities within a process, for example invoicing or payroll. They will have a detailed knowledge of the steps followed within the software applications used.

Process Owner

Each process should have an owner who is responsible for managing the process and identifying
inefficiencies and improvements during the modelling and optimisation stages of the BPM
lifecycle. They work closely with the process participants and process designers.
System Architect

System architects are responsible for developing or configuring the business process management system (BPMS).

Developers

During BPM, new software solutions may be needed, or existing solutions may be integrated with other solutions or customised to improve a business process.

7. How do process-aware systems differ from non-process-aware systems?


A7
As information systems age they become legacy information systems (LISs), embedding
business knowledge not present in other artefacts. LISs must be modernised when their
maintainability falls below acceptable limits but the embedded business knowledge is valuable
information that must be preserved to align the modernised versions of LISs with organisations'
real-world business processes. Business process mining permits the discovery and preservation
of all meaningful embedded business knowledge by using event logs, which represent the
business activities executed by an information system. Event logs can be easily obtained through
the execution of process-aware information systems (PAISs).

However, several non-process-aware information systems also implicitly support organisations' business processes. One published technique obtains event logs from such traditional information systems (without any in-built logging functionality) by statically analysing and modifying LISs, allowing the modified systems to dynamically record event logs. The approach was validated with a case study involving a healthcare information system used in Austrian hospitals, which showed that the technique obtains event logs that effectively and efficiently enable the discovery of embedded business processes. This implies that the techniques provided within the process mining field, which are based on event logs, may also be applied to traditional information systems.
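
For illustration, the sketch below shows the minimal event-log format that process mining relies on: each event carries a case identifier, an activity name, and a timestamp, and grouping events by case recovers each case's activity trace. The sample events are hypothetical:

```python
# Hypothetical event log: one row per executed business activity.
import csv
from io import StringIO

log = """case_id,activity,timestamp
101,Register patient,2024-03-01T08:00
101,Examine patient,2024-03-01T08:30
101,Discharge patient,2024-03-01T10:00
102,Register patient,2024-03-01T09:00
"""

# Group events by case to recover each case's activity trace.
traces = {}
for row in csv.DictReader(StringIO(log)):
    traces.setdefault(row["case_id"], []).append(row["activity"])

for case_id, trace in traces.items():
    print(case_id, "->", " -> ".join(trace))
```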

8. What are the advantages of using process-aware systems?


A8
Process awareness is an important property for information systems, and the shift from task-driven systems to PAISs brings a number of advantages:

– The use of explicit process models provides a means for communication between people.

– Systems driven by models rather than code have fewer problems dealing with change, i.e., if an information system is driven by process models, only the models need to be changed to support evolving or emerging business processes (a minimal sketch of this idea follows the list below).

– The explicit representation of the processes supported by an organization allows their automated enactment. This may lead to better performance.

– The explicit representation of processes enables management support at the (re)design level, i.e., explicit process models support (re)design efforts.

– The explicit representation of processes also enables management support at the control level. Generic process monitoring and mining facilities provide useful information about the process as it unfolds. This information can be used to improve the control (or even the design) of the process.
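
A minimal sketch of the "driven by models rather than code" advantage, with hypothetical activity names: the process is plain data that a generic engine enacts, so changing the process means editing the model, not the engine:

```python
# Hypothetical model-driven enactment: the process model is a data structure,
# and a generic engine runs whatever model it is given.

def enact(model, handlers):
    """Run each activity in the model through its registered handler."""
    for activity in model:
        handlers[activity]()

hiring_process = ["screen cv", "interview", "send offer"]  # the process model

enact(hiring_process, {
    "screen cv": lambda: print("screening CV"),
    "interview": lambda: print("interviewing"),
    "send offer": lambda: print("sending offer"),
})
```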

9. What are events? Discuss various event types in BPMN.


A9
BPMN events

Events, represented with circles, describe something that happens during the course of a process. There are three main event types within business process modeling: start events, intermediate events, and end events. Each type is further defined as either catching (reacting to a trigger) or throwing (triggered by the process itself).

Start events
Each process must begin with an initiating event, called the start event. All start events are catching events: they react to a trigger such as receiving an email, and a sequence flow proceeds from the start event to continue the process. Many start events contain an icon in the middle to define the event's trigger; for example, a start event containing an envelope icon indicates that an arriving message triggers the start of the process.

In Lucidchart, you can add a start event from the BPMN 2.0 shape library. Once you drag shapes onto the canvas, you can click any shape to change its properties in the advanced shape menu at the top of the editor.

This simple shape represents how a process begins by receiving an email. After the user receives
the email, the rest of the BPMN diagram may proceed.


Intermediate events

An intermediate event is any event that occurs between a start and an end event. The
intermediate event circle has a double line, and the event can catch or throw information.
Connecting objects indicate the directional flow, determining whether the event is catching or
throwing.

Lucidchart users can find event types for intermediate events by using the advanced shape menu
that appears when you add a new BPMN shape to the canvas.

The shape below shows a message received in the middle of a process. Notice that the event
circle has a double line around it and that the mail icon is not filled in, indicating that it is a
catching shape.
The following shape is similar to the previous example, except it is throwing a message, not
catching one. Simply put, the message is sent as a step in the process, instead of a message being
received.
End events

Finally, end events are styled with a single thick black line. End events are always thrown
because there is no process to catch after the final event.

In the BPMN example below, the process is completed when a final message is thrown. After a process finishes its work, it is likely that someone will need to be notified, so it is common to end a flow with a thrown message.
10. What are the different types of gateways? Explain with an example of a process model.
A10

BPMN Gateways

Exclusive gateway

Exclusive gateways are drawn as a diamond, either empty or marked with an 'X'.

An exclusive gateway evaluates the state of the business process and, based on the condition, directs the flow into exactly one of two or more mutually exclusive paths. In the example below, an exclusive gateway requires that the mode of transportation be evaluated: one light will be placed in the Old North Church if the British attack by land, two if by sea.

Event-based gateway

Event-based gateways are drawn as a diamond containing a pentagon inside a double circle.

An event-based gateway is similar to an exclusive gateway because both involve one path
in the flow. In the case of an event-based gateway, however, you evaluate which event has
occurred, not which condition has been met.
An example of an event-based gateway is the decision to hold fire until your soldiers can
see the whites of their enemies' eyes. In this process flow, if a certain amount of time
passes without the British coming, the soldiers will go home.

Parallel gateway

Parallel gateways are drawn as a diamond marked with a plus sign.

A parallel gateway is very different from the previous gateways because no condition or event is evaluated. Instead, a parallel gateway represents two or more concurrent tasks in a business flow. It is equivalent to a fork in a UML activity diagram. In the example below, the business process uses a parallel gateway because the company is having its cake and eating it too.
Parallel event-based gateway

Parallel event-based gateways are drawn as a diamond containing a plus sign within a circle.

As the name suggests, this gateway is similar to a parallel gateway. It allows for multiple
processes to happen at the same time, but unlike the parallel gateway, the processes depend
on specific events. You can think of a parallel event-based gateway as a non-exclusive,
event-based gateway where multiple events can trigger multiple processes, but the
processes are still dependent upon specific events.
Inclusive gateway

Inclusive gateways are drawn as a diamond marked with a circle.

An inclusive gateway breaks the process flow into one or more flows. The example below
shows an inclusive gateway that triggers different processes based on the way customers
responded to a product survey. If the customer is satisfied with A, they are added to the
Product A email list. If the customer is satisfied with B, they are added to the Product B
email list. And if the customer is not satisfied with A, they are sent a voucher.
Complex gateway

Complex gateways are drawn as a diamond marked with an asterisk.

As the name signifies, complex gateways are only used for the most complex flows in the
business process. They use words in place of symbols and, therefore, require more
descriptive text. Use the complex gateway if you need multiple gateways to describe the
business flow; otherwise, you should use a simpler gateway.
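
As an illustration of how the gateway types above differ in routing semantics, the sketch below contrasts exclusive, inclusive and parallel behaviour; the conditions and path names are hypothetical:

```python
# Hypothetical gateway routing on a token of process data:
# exclusive fires exactly one path, inclusive fires every path whose
# condition is true, parallel fires all paths unconditionally.

paths = {
    "email Product A list": lambda c: c["satisfied_with_a"],
    "email Product B list": lambda c: c["satisfied_with_b"],
    "send voucher":         lambda c: not c["satisfied_with_a"],
}

customer = {"satisfied_with_a": False, "satisfied_with_b": True}

# Inclusive gateway: all true-condition paths fire.
inclusive = [p for p, cond in paths.items() if cond(customer)]

# Exclusive gateway: only the first matching path fires.
exclusive = next(p for p, cond in paths.items() if cond(customer))

# Parallel gateway: every path fires; no conditions are evaluated.
parallel = list(paths)

print(inclusive)  # ['email Product B list', 'send voucher']
print(exclusive)  # 'email Product B list'
print(parallel)
```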

11. Discuss different perspectives of business process modeling.


A11
Four perspectives of process modeling

The fact that business processes can be targeted at different application areas suggests that business processes may be studied and analyzed from different viewpoints (Giaglis et al. 1999).
In 1994, Curtis summarized the process modeling objectives and goals (Curtis et al. 1994):

● Facilitate human understanding and communication;

● Support process improvement;

● Support process management;

● Automated guidance in performing a process;

● Automated execution support.

However, the term process modeling in this paper does not refer to all process modeling in the information science field, but specifically to business process modeling in BPR. Because of that, we cannot take the goals and objectives offered by Curtis for granted unless we integrate into them the characteristics of business processes and the objectives of BPR. To be able to accommodate the aforementioned goals and objectives, a model must be capable of delivering complete and correct information elements to its users. To provide this information, Curtis suggested, a modeling technique should be capable of representing one or more of the following modeling perspectives (Curtis et al. 1994):

● Functional perspective: represents what process elements (activities) are being performed;

● Behavioral perspective: represents when activities are performed, as well as aspects of how they are performed, through feedback loops, iteration, decision-making conditions, entry and exit criteria, and so on;

● Organizational perspective: represents where and by whom activities are performed, the physical communication mechanisms used for transfer of entities, and the physical media and locations used for storing entities;

● Informational perspective: represents the informational entities produced or manipulated by a process and their relationships.

These perspectives present the different views of the people who observe the business process. Since different people will be involved in the process of process modeling, it is important to capture the different modeling perspectives in order to support communication, strengthen understanding and coordinate joint work.

12. What are swimlanes and pools? What is the significance of using them?
A12

In BPMN, a swimlane is divided into two types: pool and lane. A pool represents a participant who takes part in a process. It is a rectangular container, oriented vertically or horizontally, that can contain flow objects such as tasks and activities. A lane, on the other hand, is a graphical sub-division of a pool, often used to organize and categorize activities within the pool according to function or role.
Pool and lane

Pool and lane sample

A pool is usually used to represent a process in an organization, while a lane represents the activities of a department within that organization. By using pools and lanes, you can identify how a process is done and which department performs each activity.

Defining black box pool

A pool can be shown either as a “white box” with all details exposed, or as a “black box” with all details hidden. When a pool appears as an empty box with no flow objects or lanes, it is regarded as a “black box”. Since a black box pool merely represents a role, you can neither create flow objects in it nor associate sequence flows with it, but you can attach message flows to its boundary.
Black box pool sample

Create a business process diagram beforehand. To create a pool, select either Horizontal Pool or Vertical Pool from the diagram toolbar and click on the business process diagram.

Select pool from diagram toolbar

You can present an organization as a ‘black box’ pool in a business process diagram when the workflow of that organization isn’t the main concern, or when it doesn’t relate to your process. Drawing message flows to and from the black box pool is enough to present the interaction between the two organizations.
To define a black box pool, right click on an empty pool and select Black Box from the pop-up
menu.

Set a pool as black box

Creating nested lanes

Since a lane represents a sub-partition of a pool, it is possible to create lanes within a lane to form a hierarchical structure. These hierarchical lanes are called nested lanes.
In the following figure, the pool IT Consulting is divided into two lanes: Accounting and Customer Service, both representing departments within the same company. Customer Service can be further divided into two nested lanes named Hotline Support and On-site Support.
Nested lane sample

Create a pool in advance. To create a lane, select Lane from the diagram toolbar and click inside the pool. Rename the lane as Accounting.
Create a lane

To create a nested lane, right click on a lane and select Add Child Lane from the pop-up menu.

Add a child lane

On the other hand, you can make an existing lane become a nested lane of another lane. Right click on the lane you want to nest, for example Hotline Support, and select a parent lane for it, for example Customer Service: select Set Parent Lane > Customer Service from the pop-up menu.
Set a parent lane

As a result, Hotline Support becomes a nested lane under Customer Service.

13. Discuss different artifacts for data storage and retrieval in BPM.


A13

In BPMN, Artifacts, Data Objects and special connectors called Associations are used to specify
information, which is not related to the flow of the process. These elements are not executable
and serve for readability and analysis of business processes.

Artifacts provide a mechanism for adding descriptive information about the process. The two
typical Artifacts are Group and Annotation (in BPMN 1.2 Data Objects also belonged to
Artifacts, but in 2.0 they are a separate category). Yet, developers of BPM systems can add as
many Artifacts as needed.

Let’s take a look at how the Group element is used in business process modeling. In graphic
form, it is represented as a rounded rectangle with a dot-and-dash line as border. It surrounds a
group of flow objects but has no influence on process performance.

Fig. 25. Group graphical representation

Groups highlight or categorize flow objects and do not affect the flow of the process. Activities
can be highlighted to show that they are related, and can be categorized, for example, for
analysis purposes or document preparation.
Fig. 26. Group element used in a process

In our example process, “Complete paperwork” (Lesson 4 of our BPMN tutorial), a Group unites the tasks aimed at preparing and signing the job offer. As you can see, such highlighting of activities shows that the tasks are related, but it in no way affects the process flow.

A Group is not an activity like a Task or Sub-Process, nor is it an element of the sequence flow
like a Gateway. Therefore, you cannot connect a Group to a sequence flow or message flow.
However, since Groups are not limited by the pool and swimlane constraints, they can highlight
activities that belong to different pools, which is widely used to show relations of the B2B type.

The next BPMN element that we would like to review is the Text Annotation. Text Annotations
allow the modeler to add descriptive information or notes to the diagram. You can include any
information that could be important to the end user, for example, describe how an element is
used, and add comments, explanations and so on.
All this contributes to the diagram’s informational content, and makes it easy for the business
user to understand the process.

Graphically, an Annotation is an open box with the text placed on either of its sides.

Fig. 27. Text Annotation

You can connect a Text Annotation to a particular element by means of an Association, without affecting the process flow.
Fig.28. Text Annotation in process diagram

In our example process, “Complete paperwork”, the Text Annotation specifies the exact actions
that the accountant has to take when performing the “Open account” task.

The next element, the Association, is a connector that creates a relationship between a piece of information and an artifact or a flow element (such as an event, a task or a gateway). If a text or graphical object does not belong to the process flow, you can still link it to the flow’s objects (see Fig. 29). Associations are usually used to link a Text Annotation or a Data Object to an element of the process flow.

Graphically Associations are represented as a dotted line.

Fig.29. Graphical representation of Association


If needed, an Association can show the direction of a flow, for example, of Data flow. In this
case, an arrowhead is added to the dotted line.

Fig.30. Graphical representation of Association demonstrating direction

When modeling business processes, it is important to ensure that any data used during the process can be collected and managed.

BPMN offers specialized elements that allow you to store and transmit process components
during the process’s execution: Data Objects and Data Storage. Usually these elements are tied
to the performance of Activities.

Graphically, a Data Object is represented by a document shape with one corner bent over.

Fig.31. Graphical representation of a Data Object

Data Objects show the inputs and outputs of Activities, and do not affect the process flow. A
Data Object is tied to the context of the process, so in the diagram it is shown within a process or
a sub-process. Data Objects exist only between the process’s start and end. If a process instance
is cancelled, all of its Data Object instances become inactive and, therefore, inaccessible to any
external processes.

Also, BPMN 2.0 (unlike the previous version) introduces the Data Storage element, which
allows storing information even after the process instance has been completed.

Graphically, it is represented in the following way:

Fig. 32. Graphical representation of Data Storage


Fig.33. Associations in process diagram

Figure 34. features the Employee Recruitment process, where Data Objects and Associations are
used.

In our process, Data Objects either show the outputs of process activities (request for a new employee) or are used in task execution (request for a new hire, candidate database). The request is a simple Data Object, while the candidate database is represented by Data Storage. In BPMN, Data Storage allows for interaction between different processes, which is impossible with simple Data Objects: they can only be used within one process.


14. Differentiate between UML and BPMN.
A14

Starting point
UML Activity: The starting point is defined by an Initial Node. No method of specifying why the Activity was started is available.
BPMN Business Process: The starting point is defined by a Start Event. This implies a specific cause for the Activity to start, although it could be unspecified.
(See also: Initial Node / Start Event)

Basic behavior unit
UML Activity: The basic behavior unit in an Activity is the Action element. UML provides many different forms of Actions, although the simulation makes use of a small subset of these.
BPMN Business Process: The basic behavior unit in an Activity is the Activity element. A number of different Task Types are available; these typically describe different methods of execution (for example, Manual) as opposed to what happens.
(See also: UML Action / BPMN Activity)

Connecting elements
UML Activity: A Control Flow is used to connect the elements on an Activity diagram. A distinguishing feature is that only a single Control Flow can be followed from any node, except for an explicit Fork Node. To restrict flow on a Control Flow, add a Guard.
BPMN Business Process: A Sequence Flow is used to connect the elements on a Business Process diagram. These differ from UML Activity diagrams in that all valid sequence flows are taken by default. To restrict flow on a Sequence Flow, set the conditionType Tagged Value to 'Expression' and create the script in the conditionExpression Tagged Value.
(See also: Control Flow / Sequence Flow)

Decisions and merges
UML Activity: A Decision node is used to explicitly model a decision being made. A Merge node, which uses the same syntax, is used when the potential flows are combined back into one.
BPMN Business Process: A Gateway node set to 'Exclusive' is used when a single path must be selected. It is also used to combine the potential flows again. A direction can be specified as 'Converging' or 'Diverging' to explicitly select between the two modes.
(See also: Decision / Gateway)

Forks and joins
UML Activity: A Fork node is used to concurrently execute multiple nodes, while a Join node, using the same syntax, is used to wait for all incoming flows to become available and leave with a single flow.
BPMN Business Process: A Gateway node set to 'Parallel' is used to explicitly model concurrent execution of multiple nodes. It is also used to wait for all incoming flows to become available and leave with a single flow. A direction can be specified as 'Converging' or 'Diverging' to explicitly select between the two modes.
(See also: Fork/Join / Gateway)

Conditional concurrency
UML Activity: There is no allowance for concurrently executing only some outputs from a node in UML Activities. If you need this, you add Control Flows with the appropriate Guards.
BPMN Business Process: A Gateway node set to 'Inclusive' is used to explicitly model the situation where all outgoing flows with a true condition are executed concurrently.
(See also: Gateway)

Decomposition via an external activity
UML Activity: A Call Behavior Action is used when behavior needs to be further decomposed by referring to an external activity.
BPMN Business Process: Activity elements are set as a Call Activity Sub-Process when behavior needs to be further decomposed by referring to an external activity.
(See also: UML Action / BPMN Activity)

Decomposition without an external activity
UML Activity: This is likewise modeled with a Call Behavior Action.
BPMN Business Process: Activity elements are set as an Embedded Sub-Process when behavior needs to be further decomposed without referring to an external activity.
(See also: UML Activity / BPMN Activity)

15. How are exceptions handled in BPM? Explain.


A15

Exception handling in BPM


Exception handling is one of the prime aspects of software design, contributing to the robustness of an application. It gains more significance in the BPM space, since exceptions have to be dealt with both at a system level and at a business level. Bruce Silver has added one more dimension to exception handling in BPM in his article, wherein exceptions also have to be dealt with in the process model itself.

The significance of exception handling at the process modeling step cannot be ignored. There is a need to handle ACID transactions in BPM with a business solution, and it has to be captured in the model with business semantics. For instance, the classic example of flight, hotel and rental car booking, when deemed a single atomic transaction, cannot always be dealt with at a system level. If flight booking and hotel booking are two different applications in two different domains (if the flight carrier and hotel are different companies), then system transactions cannot be propagated; and even if it seems possible, system-level locks for the transaction may not be obtainable, since the whole transaction may take days to complete. The ideal way to deal with this situation is to have three distinct transactions as business processes, and if one fails, the other transactions must be canceled using a business process. If the flight booking has been confirmed and the hotel booking transaction fails, then the flight booking should also be canceled, with flight cancellation modeled as a separate business process.
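
A minimal sketch of this compensation idea (sometimes called a saga): each booking is its own transaction paired with a cancellation process, and when a step fails, the already-completed steps are compensated in reverse order. The booking functions are hypothetical stand-ins:

```python
# Hypothetical bookings: each step is paired with a compensating action.
def book_flight():  print("flight booked")
def cancel_flight(): print("flight cancelled (compensation process)")
def book_hotel():   raise RuntimeError("no rooms available")
def cancel_hotel(): print("hotel cancelled (compensation process)")
def book_car():     print("car booked")
def cancel_car():   print("car cancelled (compensation process)")

steps = [(book_flight, cancel_flight), (book_hotel, cancel_hotel), (book_car, cancel_car)]

completed = []
try:
    for do, undo in steps:
        do()
        completed.append(undo)
except RuntimeError as err:
    # A business-level failure: compensate every completed step in reverse.
    print(f"step failed: {err}; compensating")
    for undo in reversed(completed):
        undo()
```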

It is imperative for a BPM solution to handle system-level exceptions gracefully. Since process instances can span business days, in case of system exceptions the process state has to be persisted so that the process can recover from the state where it left off. This is not always possible for every process instance; sometimes a process instance may get into an irrecoverable state. In such scenarios, it may not be desirable for the users to create a new process instance altogether. The alternatives for instances made irrecoverable by system exceptions are few; one probable solution is to create a new process instance, associate the application process data with the new instance, and programmatically navigate the instance to the workflow step where it failed.

Another alternative is to create an exception business process. Any system-level exception occurring in any of the application business processes would trigger the exception business process, which would notify the user about the exception. The exception process can also notify a repair/admin team. Having a repair queue of exceptions helps the repair/admin team get first-hand information about the exception without needing notification from the business users about what happened. Irrecoverable process instances can then be dealt with internally within the IT team, without burdening the business users.

In my opinion, a BPM solution should handle both business and system exceptions. Handling every exception at the process model level would clutter the process diagram with fine-grained details, defeating the very purpose of a business process diagram. The need of the hour is a finer distinction between business and system exceptions, made even before a process is modeled. A business exception may not look like an exception at all from a BPMN perspective, since most business exceptions would be handled by a new business process altogether, instead of by an intermediate handler event.

16. Explain different techniques of Qualitative Process Analysis.


A16
Different techniques of Qualitative Process Analysis.

Qualitative research is a method of inquiry that finds widespread employment in social sciences,
market research and other areas. Read on for some of the popular qualitative methods of data
analysis.

● Qualitative analysis aims at securing an in-depth understanding of the “why" and “how" of human behavior and decision making, rather than the “what," “when" and “where." It produces specific information on the cases studied rather than general conclusions, and is used to gain support for a research hypothesis.
● Case Study

● Case studies are the most popular qualitative method of data analysis.
● The case study method focuses on the in-depth study of a single, usually complex, series of events that make up a case. The study covers all aspects and dimensions of the case in question and aims at illustrating viewpoints or theories rather than making comparisons. Even when conducting a series of case studies, comparisons between two or more cases would remain faulty, as each case study has a specific design which need not necessarily be suitable for making comparisons.
● Action Research

● Another popular qualitative data analysis method is action research. As the name suggests, action research is research through actions. It is a systematic and interactive inquiry process that involves a three-step cycle: fact-finding (examining current practices for improvements), planning for improvements, and implementing new practices or taking action.
● Analytic Induction

● Analytic induction is the progressive redefinition of a concept by collecting data, developing analysis, and organizing the findings to construct and test causal links between events and actions.
● The methodology of analytic induction is to inspect initial cases to identify common factors, seek explanations for the existing linkages, and rework the explanations based on findings from new cases. Success depends on testing cases with new varieties of data to validate or revise established linkages, until negative cases cease to exist.
● Ethnography

● Ethnography is the study of people in their natural settings to capture their ordinary and
normal activities. It focuses on capturing the values, ideas, and material practices
articulated by the subject.
● There is no rigid method or process for ethnography, and the tools include other
multi-method qualitative tools, such as:
1. Field Research - The observation of any normal everyday event in the environment where it occurs.
2. Ethnomethodology - Defining and interpreting everyday life through people’s talk and interactions. A related branch is conversation analysis, or fine-grained analysis of natural talk-based interactions to construct patterns of social order.
3. Discourse Analysis - Language and literature are a reflection of the world around the writer, and discourse analysis is the study of the world, society, events and psyche as represented in language and discourse. The forms of discourse analysis include semiotics, deconstruction and narrative analysis.
4. Biographical Research - The analysis of a person’s written account or narrative, usually the life history, trying to identify the epiphany or turning point.
5. Interviews - Direct interviews of the subject, or of people closely associated with the subject, on the life history.
● Comparative Analysis

● Comparative analysis involves analyzing data from different settings or groups at the same point in time, to identify similarities and differences.
● Two qualitative methods of data analysis of a comparative analysis nature are matrix
analysis and constant comparison.
● Matrix analysis, or logical analysis, involves categorizing and arranging the collected data in flow charts, tables, diagrams and other forms of representation that show cause and process in a tabular, pictorial or graphical manner. This approach helps in making comparisons and in constructing hypotheses.
● Constant comparison is comparing new data with previously collected data and coding it in the same way, to develop theoretical categories.
● Frame Analysis

● Frame analysis is rooted in psychiatry and psychology, and explains social phenomena through symbolic-interpretive constructs, or frames, that people adopt in their normal daily lives. Examples of such frames include belief systems, social convictions, phobias, norms and more. The frames people adopt depend largely on the society they live in.
● Framework analysis is the classification and organization of data based on key themes or
concepts, with matrix based subdivisions that illustrate connections between different
frames.
● Grounded Theory

● Grounded theory involves the simultaneous collection and analysis of data, usually
through observations. This approach develops the theory from the data collected rather
than trying to test whether the collected data fits any preconceived theory. The researcher
reads the text or data collected to identify theoretical and analytical codes. A central part
of this exercise is constant comparison, or checking to see if the new data remains
consistent with previously collected data.
● One approach to grounded theory is line-by-line coding, where the researcher codes each line of the data collected instantly, accepting what the interviewee says at face value without the risk of being influenced by preconceived notions that may creep in if the analysis is left for a later stage.
● Interpretative Phenomenological Analysis

● Interpretative Phenomenological Analysis (IPA) is an approach wherein the researcher seeks to understand the experiences of participants the way the participants themselves understand them, rather than trying to analyze an objective record of the experience.

17. What is data mining?


A17

Data mining is the practice of automatically searching large stores of data to discover patterns
and trends that go beyond simple analysis. Data mining uses sophisticated mathematical
algorithms to segment the data and evaluate the probability of future events. Data mining is also
known as Knowledge Discovery in Data (KDD).
The key properties of data mining are:

● Automatic discovery of patterns


● Prediction of likely outcomes
● Creation of actionable information
● Focus on large data sets and databases

Data mining can answer questions that cannot be addressed through simple query and reporting
techniques.

Automatic Discovery
Data mining is accomplished by building models. A model uses an algorithm to act on a set of data. The notion of automatic discovery refers to the execution of data mining models.
Data mining models can be used to mine the data on which they are built, but most types of models are generalizable to new data. The process of applying a model to new data is known as scoring.
Prediction
Many forms of data mining are predictive. For example, a model might predict income based on education and other demographic factors. Predictions have an associated probability (How likely is this prediction to be true?). Prediction probabilities are also known as confidence (How confident can I be of this prediction?).
Some forms of predictive data mining generate rules, which are conditions that imply a given outcome. For example, a rule might specify that a person who has a bachelor's degree and lives in a certain neighborhood is likely to have an income greater than the regional average. Rules have an associated support (What percentage of the population satisfies the rule?).
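
A worked sketch of support and confidence for the bachelor's-degree rule above, on hypothetical records:

```python
# Hypothetical records for the rule:
# "bachelor's degree AND neighborhood N -> above-average income".
records = [
    {"degree": True,  "neighborhood_n": True,  "above_avg_income": True},
    {"degree": True,  "neighborhood_n": True,  "above_avg_income": True},
    {"degree": True,  "neighborhood_n": False, "above_avg_income": False},
    {"degree": False, "neighborhood_n": True,  "above_avg_income": False},
]

antecedent = [r for r in records if r["degree"] and r["neighborhood_n"]]
both = [r for r in antecedent if r["above_avg_income"]]

support = len(both) / len(records)        # share of the population satisfying the rule
confidence = len(both) / len(antecedent)  # how often the antecedent implies the outcome

print(f"support={support:.2f}, confidence={confidence:.2f}")  # support=0.50, confidence=1.00
```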

Grouping
Other forms of data mining identify natural groupings in the data. For example, a model might
identify the segment of the population that has an income within a specified range, that has a
good driving record, and that leases a new car on a yearly basis.

Actionable Information
Data mining can derive actionable information from large volumes of data. For example, a town planner might use a model that predicts income based on demographics to develop a plan for low-income housing. A car leasing agency might use a model that identifies customer segments to design a promotion targeting high-value customers.

18. Differentiate between OLAP and data mining.


A18

Defining OLAP and data mining

OLAP is a design paradigm, a way to seek information out of the physical data store. OLAP is all about summation: it aggregates information from multiple systems and stores it in a multi-dimensional format, which could be a star schema, a snowflake schema or a hybrid schema.

Data mines leverage information from within and outside the organization to aid in answering business questions. They involve ratios and algorithms such as decision trees, nearest-neighbor classification and neural networks, along with clustering of data.

Why are OLAP and data mining considered synonymous?

OLAP and data mining are considered the same due to the perception one holds of their function. To add to the ambiguity, both terms fall under the business intelligence (BI) umbrella. Vendors further complicate the scenario when they offer data mining solutions at the database level. Data mining was once considered a skillfully built statistical solution, but as a result of mergers and acquisitions, specialized tools are now available for predictive purposes.

OLAP has always been prevalent; it is easy to build and use, and is therefore used extensively. Owing to these easy features and the availability of data mines, the two terms began to be used synonymously.

Functions of OLAP and data mining

● OLAP and data mining are used to solve different kinds of analytical problems. OLAP summarizes data and makes forecasts; for example, it answers operational questions like “What are the average sales of cars, by region and by year?"
● Data mining discovers hidden patterns in data and operates at a detailed level instead of a summary level. For instance, in the telecom industry, where customer churn is a key factor, data mining would answer questions like “Who is likely to switch service providers, and for what reasons?”

OLAP and data mining can complement each other. For instance, while OLAP pinpoints problems with the sales of a product in a certain region, data mining could be used to gain insight into the behavior of the individual customers. Similarly, after data mining predicts something like a 5% increase in sales, OLAP could be used to track the net income.
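
For illustration, the OLAP-style question quoted above ("average sales of cars, by region and by year") reduces to an aggregation along dimensions. A minimal sketch using pandas as a stand-in for a cube, with hypothetical figures:

```python
# Hypothetical sales facts with two dimensions (region, year) and one measure.
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "year":   [2023, 2024, 2023, 2024],
    "sales":  [120, 150, 90, 110],
})

# Aggregate (roll up) the measure along the region and year dimensions.
print(sales.groupby(["region", "year"])["sales"].mean())
```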

Can OLAP and data mining exist independently?

Data mining is appropriate for an organization that wants a future perspective on things, whereas an organization that simply wants to improve its operational efficiency can use OLAP. Thus, OLAP and data mining can exist independently. Many mid-sized companies do not use data mining because it requires high-end skills; a data mine should be implemented only when there is a need to address specific business queries. OLAP, on the other hand, can easily be employed to further the goals of any business that can be satisfied by reporting and by associating the various variables.

Users for OLAP and data mining

The customers for OLAP and data mining vary. In a typical organization, OLAP is used by the regular front- and back-office employees, predominantly for organization-wide reporting or small-scale analysis.

Data mining is used by business strategists, who base their business moves on the information thrown up by the data mine.
Inadequacies of OLAP and data mining

OLAP is a dimensional model which can scale up, and its information can be sliced and diced for interrogation. It is a kind of BI cube, refreshed from the source data on a periodic basis. However, an OLAP solution lacks the capacity for predictive analysis.

A data mine is built for eternity, which is a shortcoming, as a model cannot be valid forever. Some data mining tools mitigate this by enabling the retention of older models.

19. Explain the different stages of the knowledge discovery process.


A19

Some people don’t differentiate data mining from knowledge discovery while others view data
mining as an essential step in the process of knowledge discovery. Here is the list of steps
involved in the knowledge discovery process −

● Data Cleaning − In this step, the noise and inconsistent data is removed.

● Data Integration − In this step, multiple data sources are combined.

● Data Selection − In this step, data relevant to the analysis task are retrieved from the

database.

● Data Transformation − In this step, data is transformed or consolidated into forms

appropriate for mining by performing summary or aggregation operations.

● Data Mining − In this step, intelligent methods are applied in order to extract data

patterns.

● Pattern Evaluation − In this step, data patterns are evaluated.

● Knowledge Presentation − In this step, knowledge is represented.

(Figure: the knowledge discovery process, chaining the steps listed above; image not included.)
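
As a rough sketch, the steps above can be chained into a pipeline; each function here is a hypothetical stand-in for the real step:

```python
# Hypothetical stand-ins for the knowledge discovery steps.
def clean(data):       return [d for d in data if d is not None]        # data cleaning
def integrate(*srcs):  return [d for src in srcs for d in src]          # data integration
def select(data):      return [d for d in data if d >= 0]               # data selection
def transform(data):   return [float(d) for d in data]                  # data transformation
def mine(data):        return {"mean": sum(data) / len(data)}           # data mining
def evaluate(pattern): return pattern if pattern["mean"] > 0 else None  # pattern evaluation

source_a, source_b = [1, None, 3], [-2, 4]
pattern = evaluate(mine(transform(select(integrate(clean(source_a), clean(source_b))))))
print(pattern)  # knowledge presentation
```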


20. Explain the architecture of a data mining system.

A20

Introduction

Data mining is a very important process where potentially useful and previously unknown
information is extracted from large volumes of data. There are a number of components involved
in the data mining process. These components constitute the architecture of a data mining
system.
Data Mining Architecture
The major components of any data mining system are data source, data warehouse server, data
mining engine, pattern evaluation module, graphical user interface and knowledge base.

a) Data Sources

Database, data warehouse, World Wide Web (WWW), text files and other documents are the
actual sources of data. You need large volumes of historical data for data mining to be
successful. Organizations usually store data in databases or data warehouses. Data warehouses
may contain one or more databases, text files, spreadsheets or other kinds of information
repositories. Sometimes, data may reside even in plain text files or spreadsheets. World Wide
Web or the Internet is another big source of data.

Different Processes
The data needs to be cleaned, integrated and selected before passing it to the database or data
warehouse server. As the data is from different sources and in different formats, it cannot be used
directly for the data mining process because the data might not be complete and reliable. So, first
data needs to be cleaned and integrated. Again, more data than required will be collected from
different data sources and only the data of interest needs to be selected and passed to the server.
These processes are not as simple as we think. A number of techniques may be performed on the
data as part of cleaning, integration and selection.
b) Database or Data Warehouse Server

The database or data warehouse server contains the actual data that is ready to be processed.
Hence, the server is responsible for retrieving the relevant data based on the data mining request
of the user.
c) Data Mining Engine

The data mining engine is the core component of any data mining system. It consists of a number
of modules for performing data mining tasks including association, classification,
characterization, clustering, prediction, time-series analysis etc.
d) Pattern Evaluation Modules

The pattern evaluation module is mainly responsible for the measure of interestingness of the
pattern by using a threshold value. It interacts with the data mining engine to focus the search
towards interesting patterns.
e) Graphical User Interface

The graphical user interface module communicates between the user and the data mining system.
This module helps the user use the system easily and efficiently without knowing the real
complexity behind the process. When the user specifies a query or a task, this module interacts
with the data mining system and displays the result in an easily understandable manner.
f) Knowledge Base

The knowledge base is helpful in the whole data mining process. It might be useful for guiding
the search or evaluating the interestingness of the result patterns. The knowledge base might
even contain user beliefs and data from user experiences that can be useful in the process of data
mining. The data mining engine might get inputs from the knowledge base to make the result
more accurate and reliable. The pattern evaluation module interacts with the knowledge base on
a regular basis to get inputs and also to update it.

21. Explain the advantages of data mining.


A21

Advantages of Data Mining


Marketing / Retail
Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns, such as direct mail or online campaigns. Through the results, marketers gain an appropriate approach for selling profitable products to targeted customers.
Data mining brings retail companies benefits in the same way as marketing. Through market basket analysis, a store can arrange products appropriately so that customers can conveniently buy frequently-bought products together. It also helps retail companies offer discounts on particular products that will attract more customers.
Finance / Banking
Data mining gives financial institutions information about loans and credit reporting. By
building a model from historical customer data, banks and financial institutions can distinguish
good loans from bad ones. In addition, data mining helps banks detect fraudulent credit card
transactions to protect credit card owners.
Manufacturing
By applying data mining to operational engineering data, manufacturers can detect faulty
equipment and determine optimal control parameters. For example, semiconductor manufacturers
face the challenge that, even when the conditions at different wafer production plants are
similar, the quality of the wafers varies, and some wafers have defects for unknown reasons.
Data mining has been applied to determine the ranges of control parameters that lead to the
production of golden wafers. Those optimal control parameters are then used to manufacture
wafers of the desired quality.
Governments
Data mining helps government agencies by digging into and analyzing records of financial
transactions to build patterns that can detect money laundering or criminal activities.

22.What are different application areas of data mining


A22

Data mining is widely used in diverse areas. There are a number of commercial data mining
systems available today, and yet there are many challenges in this field. In this tutorial, we will
discuss the applications and the trends of data mining.
Data Mining Applications

Here is the list of areas where data mining is widely used −

● Financial Data Analysis

● Retail Industry

● Telecommunication Industry

● Biological Data Analysis

● Other Scientific Applications

● Intrusion Detection

Financial Data Analysis

The financial data in the banking and financial industry is generally reliable and of high quality,
which facilitates systematic data analysis and data mining. Some of the typical cases are as
follows −

● Design and construction of data warehouses for multidimensional data analysis and data

mining.

● Loan payment prediction and customer credit policy analysis.

● Classification and clustering of customers for targeted marketing.

● Detection of money laundering and other financial crimes.

Retail Industry

Data mining has great application in the retail industry because this industry collects large
amounts of data on sales, customer purchasing history, goods transportation, consumption and
services. It is natural that the quantity of data collected will continue to expand rapidly because
of the increasing ease, availability and popularity of the web.

Data mining in retail industry helps in identifying customer buying patterns and trends that lead
to improved quality of customer service and good customer retention and satisfaction. Here is
the list of examples of data mining in the retail industry −
● Design and Construction of data warehouses based on the benefits of data mining.

● Multidimensional analysis of sales, customers, products, time and region.

● Analysis of effectiveness of sales campaigns.

● Customer Retention.

● Product recommendation and cross-referencing of items.

Telecommunication Industry

Today the telecommunication industry is one of the fastest-growing industries, providing various
services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data
transmission, etc. Due to the development of new computer and communication technologies,
the telecommunication industry is rapidly expanding. This is why data mining has become very
important in helping to understand the business.

Data mining in the telecommunication industry helps identify telecommunication patterns,
catch fraudulent activities, make better use of resources, and improve quality of service. Here is
the list of examples for which data mining improves telecommunication services −

● Multidimensional Analysis of Telecommunication data.

● Fraudulent pattern analysis.

● Identification of unusual patterns.

● Multidimensional association and sequential patterns analysis.

● Mobile Telecommunication services.

● Use of visualization tools in telecommunication data analysis.

Biological Data Analysis

In recent times, we have seen tremendous growth in fields of biology such as genomics,
proteomics, functional genomics and biomedical research. Biological data mining is a very
important part of bioinformatics. Following are the aspects in which data mining contributes to
biological data analysis −
● Semantic integration of heterogeneous, distributed genomic and proteomic databases.

● Alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences.

● Discovery of structural patterns and analysis of genetic networks and protein pathways.

● Association and path analysis.

● Visualization tools in genetic data analysis.

Other Scientific Applications

The applications discussed above tend to handle relatively small and homogeneous data sets for
which the statistical techniques are appropriate. Huge amount of data have been collected from
scientific domains such as geosciences, astronomy, etc. A large amount of data sets is being
generated because of the fast numerical simulations in various fields such as climate and
ecosystem modeling, chemical engineering, fluid dynamics, etc. Following are the applications
of data mining in the field of Scientific Applications −

● Data Warehouses and data preprocessing.

● Graph-based mining.

● Visualization and domain specific knowledge.

Intrusion Detection

Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability
of network resources. In this world of connectivity, security has become a major issue. The
increased usage of the internet and the availability of tools and tricks for intruding into and
attacking networks have prompted intrusion detection to become a critical component of network
administration. Here is the list of areas in which data mining technology may be applied for
intrusion detection −

● Development of data mining algorithm for intrusion detection.


● Association and correlation analysis, aggregation to help select and build discriminating

attributes.

● Analysis of Stream data.

● Distributed data mining.

● Visualization and query tools.

23.What is association mining/market basket analysis?


A23

A Gentle Introduction on Market Basket Analysis  —  Association Rules

Introduction
Market Basket Analysis is one of the key techniques used by large retailers to uncover
associations between items. It works by looking for combinations of items that occur together
frequently in transactions. To put it another way, it allows retailers to identify relationships
between the items that people buy.

Association rules are widely used to analyze retail basket or transaction data. They are intended
to identify strong rules discovered in transaction data using measures of interestingness.

An example of Association Rules

● Assume there are 100 customers


● 10 of them bought milk, 8 bought butter and 6 bought both of them.
● bought milk => bought butter
● support = P(Milk & Butter) = 6/100 = 0.06
● confidence = support/P(Milk) = 0.06/0.10 = 0.6
● lift = confidence/P(Butter) = 0.6/0.08 = 7.5

Note: this example is extremely small. In practice, a rule needs the support of several hundred
transactions before it can be considered statistically significant, and datasets often contain
thousands or millions of transactions.

Ok, enough for the theory, let’s get to the code.


The dataset we are using today comes from ​UCI Machine Learning repository​. The dataset is
called “Online Retail” and can be found ​here​. It contains all the transactions occurring between
01/12/2010 and 09/12/2011 for a UK-based and registered online retailer.
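As a minimal, hedged sketch of what that code might look like (assuming pandas and the mlxtend
library are installed, and that the UCI file has been downloaded locally under the hypothetical
name "Online Retail.xlsx"):

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

df = pd.read_excel("Online Retail.xlsx")
df = df.dropna(subset=["InvoiceNo", "Description"])

# One-hot encode each invoice into a basket of items (True/False per item).
basket = (df.groupby(["InvoiceNo", "Description"])["Quantity"]
            .sum().unstack(fill_value=0) > 0)

# Frequent itemsets with at least 2% support, then rules ranked by lift.
frequent_itemsets = apriori(basket, min_support=0.02, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
print(rules.sort_values("lift", ascending=False).head())

The 2% support floor is an arbitrary illustrative choice; in practice it is tuned to the dataset.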

24.Explain Apriori algorithm with help of an example.


A24

Apriori Algorithm

With the quick growth of e-commerce applications, vast quantities of data accumulate in
months, not years. Data mining, also known as Knowledge Discovery in Databases (KDD), is
used to find anomalies, correlations, patterns, and trends in such data in order to predict outcomes.

Apriori algorithm is a classical algorithm in data mining. It is used for mining frequent
itemsets and relevant association rules. It is devised to operate on a database containing a
lot of transactions, for instance, items brought by customers in a store.

It is very important for effective market basket analysis, and it helps customers purchase their
items with more ease, which increases the sales of the markets. It has also been used in the
field of healthcare for the detection of adverse drug reactions (ADRs): it produces association
rules that indicate which combinations of medications and patient characteristics lead to ADRs.

Association rules

Association rule learning is a prominent and well-explored method for determining relations
among variables in large databases. Let us take a look at the formal definition of the problem of
association rules given by Rakesh Agrawal, the President and Founder of the Data Insights
Laboratories.

Let I = {i1, i2, i3, …, in} be a set of n attributes called items and D = {t1, t2, …, tm} be a set of
transactions, called the database. Every transaction ti in D has a unique transaction ID and
consists of a subset of the items in I.

A rule is defined as an implication X → Y, where X and Y are subsets of I (X, Y ⊆ I) and have
no element in common, i.e., X ∩ Y = ∅. X and Y are called the antecedent and the consequent
of the rule, respectively.

Let’s take an easy example from the supermarket sphere. The example that we are considering is
quite small; in practical situations, datasets contain millions or billions of transactions. The set
of items is I = {Onion, Potato, Burger, Milk, Beer}, and the database consists of six transactions.
Each transaction is a tuple of 0’s and 1’s, where 0 represents the absence of an item and 1 the
presence.

An example for a rule in this scenario would be {Onion, Potato} => {Burger}, which means
that if onion and potato are bought, customers also buy a burger.

Transaction ID   Onion   Potato   Burger   Milk   Beer
t1               1       1        1        0      0
t2               0       1        1        1      0
t3               0       0        0        1      1
t4               1       1        0        1      0
t5               1       1        1        0      0
t6               1       1        1        1      1

There are multiple rules possible even from a very small database, so in order to select the
interesting ones, we use constraints on various measures of interest and significance. We
will look at some of these useful measures such as support, confidence, lift and conviction.

Support

The support of an itemset X, supp(X), is the proportion of transactions in the database in which
the itemset X appears. It signifies the popularity of an itemset.

supp(X) = (number of transactions in which X appears) / (total number of transactions)

In the example above, supp(Onion) = 4/6 ≈ 0.667.

If the sales of a particular product (item) above a certain proportion have a meaningful
effect on profits, that proportion can be considered as the support threshold. Furthermore,
we can identify itemsets that have support values beyond this threshold as significant
itemsets.
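
As a quick sanity check of the support numbers, here is a small sketch that recomputes them
from the six example transactions (represented as Python sets):

transactions = [
    {"Onion", "Potato", "Burger"},          # t1
    {"Potato", "Burger", "Milk"},           # t2
    {"Milk", "Beer"},                       # t3
    {"Onion", "Potato", "Milk"},            # t4
    {"Onion", "Potato", "Burger"},          # t5
    {"Onion", "Potato", "Burger", "Milk", "Beer"},  # t6
]

def support(itemset):
    # Fraction of transactions containing every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"Onion"}))  # 4/6 ≈ 0.667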

Confidence

Confidence of a rule is defined as follows:

conf(X → Y) = supp(X ∪ Y) / supp(X)

It signifies the likelihood of item Y being purchased when item X is purchased. So, for the
rule {Onion, Potato} => {Burger},

conf({Onion, Potato} → {Burger}) = supp({Onion, Potato, Burger}) / supp({Onion, Potato})
= (3/6) / (4/6) = 0.75

This implies that for 75% of the transactions containing onion and potato, the rule is correct.
It can also be interpreted as the conditional probability P(Y|X), i.e., the probability of finding
the itemset Y in a transaction given that the transaction already contains X.

It can give some important insights, but it also has a major drawback. It only takes into
account the popularity of the itemset X and not the popularity of Y. If Y is very popular in
general, there will be a higher probability that a transaction containing X will also contain Y,
thus inflating the confidence. To overcome this drawback there is another measure called lift.

Lift

The lift of a rule is defined as:

lift(X → Y) = supp(X ∪ Y) / (supp(X) × supp(Y))

This signifies the likelihood of the itemset Y being purchased when itemset X is purchased,
while taking into account the popularity of Y.

In our example above,

lift({Onion, Potato} → {Burger}) = supp({Onion, Potato, Burger}) / (supp({Onion, Potato}) × supp({Burger}))
= (3/6) / ((4/6) × (4/6)) = 1.125

If the value of lift is greater than 1, it means that the itemset Y is likely to be bought with
itemset X, while a value less than 1 implies that itemset Y is unlikely to be bought if the
itemset X is bought.

Conviction

The conviction of a rule can be defined as:

conv(X → Y) = (1 − supp(Y)) / (1 − conf(X → Y))

For the rule {Onion, Potato} => {Burger}:

conv({Onion, Potato} → {Burger}) = (1 − supp({Burger})) / (1 − conf({Onion, Potato} → {Burger}))
= (1 − 4/6) / (1 − 0.75) ≈ 1.33

The conviction value of 1.33 means that the rule {Onion, Potato} => {Burger} would be
incorrect 33% more often if the association between X and Y were purely random chance.
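
These three measures can be verified with a couple of lines that reuse the `transactions` list and
`support` function from the earlier sketch:

X, Y = {"Onion", "Potato"}, {"Burger"}

conf = support(X | Y) / support(X)                  # 0.5 / 0.667 = 0.75
lift = support(X | Y) / (support(X) * support(Y))   # 0.5 / 0.444 ≈ 1.125
conv = (1 - support(Y)) / (1 - conf)                # 0.333 / 0.25 ≈ 1.33

print(conf, lift, conv)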
How does Apriori algorithm work?

So far, we have learned what the Apriori algorithm is and why it is important.

A key concept in the Apriori algorithm is the anti-monotonicity of the support measure. It
assumes that

1. All subsets of a frequent itemset must be frequent


2. Similarly, for any infrequent itemset, all its supersets must be infrequent too

Let us now look at an intuitive explanation of the algorithm with the help of the example
we used above. Before beginning the process, let us set the support threshold to 50%, i.e.,
only those itemsets are significant whose support is at least 50%.

Step 1: Create a frequency table of all the items that occur in all the transactions. For our
case:

Item        Frequency (No. of transactions)
Onion(O)    4
Potato(P)   5
Burger(B)   4
Milk(M)     4
Beer(Be)    2

Step 2: We know that only those elements are significant for which the support is greater than or
equal to the threshold support. Here, the support threshold is 50%, hence only those items are
significant which occur in at least three of the six transactions, and such items are Onion(O),
Potato(P), Burger(B), and Milk(M). Therefore, we are left with:

Item        Frequency (No. of transactions)
Onion(O)    4
Potato(P)   5
Burger(B)   4
Milk(M)     4

The table above represents the single items that are purchased by customers frequently.
Step 3: The next step is to make all possible pairs of the significant items, keeping in mind
that the order doesn’t matter, i.e., AB is the same as BA. To do this, take the first item and
pair it with all the others, giving OP, OB, OM. Similarly, take the second item and pair it
with the items that follow it, i.e., PB, PM. We only consider the later items because PO
(same as OP) already exists. So, all the pairs in our example are OP, OB, OM, PB, PM, BM.

Step 4: We will now count the occurrences of each pair in all the transactions.

Itemset   Frequency (No. of transactions)
OP        4
OB        3
OM        2
PB        4
PM        3
BM        2
Step 5: Again, only those itemsets are significant which meet the support threshold, and those
are OP, OB, PB, and PM.

Step 6: Now let’s say we would like to look for a set of three items that are purchased together.
We will use the itemsets found in step 5 and create a set of 3 items.

To create sets of 3 items, another rule, called self-join, is required. It says that from the item
pairs OP, OB, PB and PM we look for two pairs with an identical first letter, and so we get

● OP and OB, this gives OPB


● PB and PM, this gives PBM

Next, we find the frequency for these two itemsets.

Itemset   Frequency (No. of transactions)
OPB       3
PBM       2

Applying the threshold rule again, we find that OPB is the only significant itemset.

Therefore, the set of 3 items that was purchased most frequently is OPB.

The example that we considered was a fairly simple one and mining the frequent itemsets
stopped at 3 items but in practice, there are dozens of items and this process could continue
to many items. Suppose we got the significant sets with 3 items as OPQ, OPR, OQR, OQS
and PQR and now we want to generate the sets of 4 items. For this, we look at the sets
which have the first two letters in common, i.e.,

● OPQ and OPR gives OPQR


● OQR and OQS gives OQRS

In general, we have to look for sets which only differ in their last letter/item.

Now that we have looked at an example of the functionality of Apriori Algorithm, let us
formulate the general process.

General Process of the Apriori algorithm

The entire algorithm can be divided into two steps:

Step 1: Apply minimum support to find all the frequent sets with k items in a database.

Step 2: Use the self-join rule to find the frequent sets with k+1 items with the help of
frequent k-itemsets. Repeat this process from k=1 to the point when we are unable to apply
the self-join rule.

This approach of extending a frequent itemset one at a time is called the “bottom up”
approach.
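
To make the procedure concrete, here is a minimal, illustrative Python sketch of this bottom-up
process over the six example transactions (items abbreviated as in the walkthrough); it is a
teaching aid, not an optimized implementation:

transactions = [
    {"O", "P", "B"}, {"P", "B", "M"}, {"M", "Be"},
    {"O", "P", "M"}, {"O", "P", "B"}, {"O", "P", "B", "M", "Be"},
]
min_count = 3  # 50% support threshold on 6 transactions

def frequent(candidates):
    # Keep candidate itemsets appearing in at least min_count transactions.
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) >= min_count}

items = {i for t in transactions for i in t}
level = frequent({frozenset([i]) for i in items})
while level:
    print(sorted(sorted(itemset) for itemset in level))
    # Self-join: union pairs of k-itemsets that share all but one item.
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = frequent(candidates)

Run on the example, this prints the frequent single items, then {OP, OB, PB, PM}, and finally
{OPB}, matching the step-by-step walkthrough above.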
Mining Association Rules

Till now, we have looked at the Apriori algorithm with respect to frequent itemset
generation. There is another task for which we can use this algorithm, i.e., finding
association rules efficiently.

For finding association rules, we need to find all rules having support greater than the
threshold support and confidence greater than the threshold confidence.

But, how do we find these? One possible way is brute force, i.e., to list all the possible
association rules and calculate the support and confidence for each rule, then eliminate the
rules that fail the threshold support and confidence. But this is computationally very heavy
and prohibitive, as the number of possible association rules increases exponentially with the
number of items.

Given there are n items in the set I, the total number of possible association rules is
3^n − 2^(n+1) + 1.

We can also use another way, which is called the two-step approach, to find the efficient
association rules.

The two-step approach is:

Step 1: Frequent itemset generation: Find all itemsets for which the support is greater than
the threshold support following the process we have already seen earlier in this article.

Step 2: Rule generation: Create rules from each frequent itemset using the binary partition
of frequent itemsets and look for the ones with high confidence. These rules are called
candidate rules.

Let us look at our previous example to get an efficient association rule. We found that OPB
was the frequent itemset. So for this problem, step 1 is already done. So, let’s see step 2. All
the possible rules using OPB are:

OP → B, OB → P, PB → O, B → OP, P → OB, O → PB

If X is a frequent itemset with k elements, then there are 2^k − 2 candidate association rules.
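
As an illustrative sketch (reusing the `transactions` list from the Apriori sketch above), the
candidate rules for {O, P, B} can be enumerated by binary partition:

from itertools import combinations

def support(itemset):
    # Fraction of transactions containing the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

itemset = frozenset({"O", "P", "B"})
for r in range(1, len(itemset)):              # yields 2**k - 2 rules for k items
    for antecedent in combinations(itemset, r):
        X = frozenset(antecedent)
        Y = itemset - X
        print(f"{sorted(X)} -> {sorted(Y)}: confidence = {support(X | Y) / support(X):.2f}")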

We will not go deeper into the theory of the Apriori algorithm for rule generation.
25.What is classification and prediction? Compare classification and prediction
methods.
A25

There are two forms of data analysis that can be used to extract models describing important
classes or to predict future data trends. These two forms are as follows −

● Classification

● Prediction

Classification models predict categorical class labels, while prediction models predict
continuous-valued functions. For example, we can build a classification model to categorize bank
loan applications as either safe or risky, or a prediction model to predict the expenditures in
dollars of potential customers on computer equipment given their income and occupation.

What is classification?

Following are the examples of cases where the data analysis task is Classification −

● A bank loan officer wants to analyze the data in order to know which customers (loan

applicants) are risky and which are safe.

● A marketing manager at a company needs to analyze whether a customer with a given

profile will buy a new computer.

In both of the above examples, a model or classifier is constructed to predict the categorical
labels. These labels are risky or safe for loan application data and yes or no for marketing data.

What is prediction?

Following are the examples of cases where the data analysis task is Prediction −

Suppose the marketing manager needs to predict how much a given customer will spend during
a sale at his company. In this example we are asked to predict a numeric value, so the data
analysis task is an example of numeric prediction. In this case, a model or a predictor will
be constructed that predicts a continuous-valued function, or ordered value.
Note − Regression analysis is a statistical methodology that is most often used for numeric
prediction.

How Does Classification Work?

With the help of the bank loan application that we have discussed above, let us understand the
working of classification. The Data Classification process includes two steps −

● Building the Classifier or Model

● Using Classifier for Classification

Building the Classifier or Model

● This step is the learning step or the learning phase.

● In this step the classification algorithms build the classifier.

● The classifier is built from the training set made up of database tuples and their

associated class labels.

● Each tuple that constitutes the training set belongs to a predefined class, as determined

by its class label attribute. These tuples can also be referred to as samples, objects or data points.

Using Classifier for Classification


In this step, the classifier is used for classification. Here the test data is used to estimate the
accuracy of classification rules. The classification rules can be applied to the new data tuples if
the accuracy is considered acceptable.
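
As a hedged sketch of these two steps with scikit-learn (the dataset here is synthetic, generated
purely for illustration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled training tuples (e.g., loan applications).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # step 1: build the classifier
y_pred = clf.predict(X_test)                                   # step 2: use it on test data
print("estimated accuracy:", accuracy_score(y_test, y_pred))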

Classification and Prediction Issues

The major issue is preparing the data for Classification and Prediction. Preparing the data
involves the following activities −

● Data Cleaning − Data cleaning involves removing the noise and treatment of missing

values. The noise is removed by applying smoothing techniques, and the problem of

missing values is solved by replacing a missing value with the most commonly occurring

value for that attribute.

● Relevance Analysis − The database may also have irrelevant attributes. Correlation

analysis is used to determine whether any two given attributes are related.

● Data Transformation and reduction − The data can be transformed by any of the

following methods.

○ Normalization − The data is transformed using normalization. Normalization

involves scaling all values of a given attribute to make them fall within a
small specified range. Normalization is used when, in the learning step, neural

networks or methods involving distance measurements are used.

○ Generalization − The data can also be transformed by generalizing it to the

higher concept. For this purpose we can use the concept hierarchies.

Note − Data can also be reduced by some other methods such as wavelet transformation,
binning, histogram analysis, and clustering.
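
As a small sketch of these preparation steps, assuming scikit-learn is available (the toy array
is invented for illustration):

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Toy data: [age, income], with two missing values (np.nan).
data = np.array([[25.0, 50000.0],
                 [np.nan, 64000.0],
                 [47.0, np.nan],
                 [25.0, 58000.0]])

# Cleaning: replace each missing value with the most common value of the attribute.
cleaned = SimpleImputer(strategy="most_frequent").fit_transform(data)

# Transformation: normalize each attribute into the range [0, 1].
normalized = MinMaxScaler(feature_range=(0, 1)).fit_transform(cleaned)
print(normalized)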

Comparison of Classification and Prediction Methods

Here is the criteria for comparing the methods of Classification and Prediction −

● Accuracy − Accuracy of a classifier refers to its ability to predict the class label

correctly; accuracy of a predictor refers to how well it can guess the value of the

predicted attribute for new data.

● Speed − This refers to the computational cost in generating and using the classifier or

predictor.

● Robustness − It refers to the ability of classifier or predictor to make correct predictions

from given noisy data.

● Scalability − Scalability refers to the ability to construct the classifier or predictor

efficiently, given a large amount of data.

● Interpretability − It refers to the extent to which the classifier or predictor can be understood.

26.Differentiate between supervised learning and unsupervised learning?


A26

Supervised and Unsupervised Machine Learning Algorithms

Supervised Machine Learning


The majority of practical machine learning uses supervised learning.

Supervised learning is where you have input variables (x) and an output variable (Y) and you use
an algorithm to learn the mapping function from the input to the output.
Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data (x),
you can predict the output variables (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training
dataset can be thought of as a teacher supervising the learning process. We know the correct
answers, the algorithm iteratively makes predictions on the training data and is corrected by the
teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Supervised learning problems can be further grouped into regression and classification problems.

Classification: A classification problem is when the output variable is a category, such as “red”
or “blue” or “disease” and “no disease”.
Regression: A regression problem is when the output variable is a real value, such as “dollars” or
“weight”.
Some common types of problems built on top of classification and regression include
recommendation and time series prediction respectively.

Some popular examples of supervised machine learning algorithms are:

Linear regression for regression problems.


Random forest for classification and regression problems.
Support vector machines for classification problems.
Unsupervised Machine Learning
Unsupervised learning is where you only have input data (X) and no corresponding output
variables.

The goal for unsupervised learning is to model the underlying structure or distribution in the data
in order to learn more about the data.

This is called unsupervised learning because, unlike supervised learning above, there are no
correct answers and there is no teacher. Algorithms are left to their own devices to discover and
present the interesting structure in the data.

Unsupervised learning problems can be further grouped into clustering and association problems.

Clustering: A clustering problem is where you want to discover the inherent groupings in the
data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.
Some popular examples of unsupervised learning algorithms are:
k-means for clustering problems.
Apriori algorithm for association rule learning problems.
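
A compact, hedged side-by-side sketch of the two settings with scikit-learn (tiny made-up
numbers, purely illustrative):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])

# Supervised: output labels y are given, so the model learns Y = f(X).
y = np.array([2.1, 4.0, 6.2, 15.9, 18.1, 20.0])
print(LinearRegression().fit(X, y).predict([[4.0]]))

# Unsupervised: no y; the algorithm finds the grouping structure on its own.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))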

27.Explain classification using decision tree induction.


A27
Data Mining - Decision Tree Induction
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node
holds a class label. The topmost node in the tree is the root node.

The following decision tree is for the concept buy_computer that indicates whether a customer
at a company is likely to buy a computer or not. Each internal node represents a test on an
attribute. Each leaf node represents a class.

The benefits of having a decision tree are as follows −

● It does not require any domain knowledge.

● It is easy to comprehend.

● The learning and classification steps of a decision tree are simple and fast.

Decision Tree Induction Algorithm


A machine learning researcher named J. Ross Quinlan developed a decision tree algorithm in
1980 known as ID3 (Iterative Dichotomiser). Later, he presented C4.5, which was the successor of
ID3. ID3 and C4.5 adopt a greedy approach. In this algorithm, there is no backtracking; the
trees are constructed in a top-down recursive divide-and-conquer manner.

Generating a decision tree from training tuples of data partition D


Algorithm: Generate_decision_tree

Input:
Data partition, D, which is a set of training tuples
and their associated class labels.
attribute_list, the set of candidate attributes.
Attribute_selection_method, a procedure to determine the
splitting criterion that best partitions the data
tuples into individual classes. This criterion includes a
splitting_attribute and either a splitting point or a splitting subset.

Output:
A decision tree

Method:
create a node N;

if tuples in D are all of the same class C then
    return N as a leaf node labeled with class C;

if attribute_list is empty then
    return N as a leaf node labeled with
    the majority class in D; // majority voting

apply Attribute_selection_method(D, attribute_list)
    to find the best splitting_criterion;
label node N with splitting_criterion;

if splitting_attribute is discrete-valued and
    multiway splits are allowed then // not restricted to binary trees
    attribute_list = attribute_list - splitting_attribute; // remove splitting attribute

for each outcome j of splitting_criterion
    // partition the tuples and grow subtrees for each partition
    let Dj be the set of data tuples in D satisfying outcome j; // a partition
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else
        attach the node returned by
        Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
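
For comparison, here is a hedged scikit-learn counterpart to the pseudocode above, trained on a
tiny made-up buy_computer-style dataset (all feature values are invented for the sketch):

from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [age, income]; label: 1 = buys a computer, 0 = does not.
X = [[25, 30000], [32, 60000], [45, 80000], [52, 40000], [23, 20000], [40, 90000]]
y = [0, 1, 1, 0, 0, 1]

tree = DecisionTreeClassifier(criterion="entropy")  # information-gain-style splits, as in ID3/C4.5
tree.fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))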

28.Explain tree pruning.


A28
Tree Pruning

Tree pruning is performed in order to remove anomalies in the training data due to noise or
outliers. The pruned trees are smaller and less complex.

Tree Pruning Approaches

There are two approaches to prune a tree −

● Pre-pruning − The tree is pruned by halting its construction early.

● Post-pruning − This approach removes a sub-tree from a fully grown tree.

Cost Complexity

The cost complexity is measured by the following two parameters −

● Number of leaves in the tree, and

● Error rate of the tree.
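
A hedged sketch of both approaches using scikit-learn's decision trees: pre-pruning by halting
growth early (max_depth), and post-pruning via cost-complexity pruning (ccp_alpha), which trades
the number of leaves against the error rate:

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Pre-pruning: stop construction early by limiting the depth.
pre_pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Post-pruning: grow fully, then remove sub-trees using cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
post_pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=0).fit(X, y)

print(pre_pruned.get_n_leaves(), post_pruned.get_n_leaves())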

29.What is clustering? Explain k-means clustering with the help of example.

A29

A Hospital Care chain wants to open a series of Emergency-Care wards within a region. We
assume that the hospital knows the locations of all the most accident-prone areas in the region.
It has to decide the number of Emergency Units to be opened and their locations, so that all the
accident-prone areas are covered in the vicinity of these Emergency Units.

The challenge is to decide the locations of these Emergency Units so that the whole region is
covered. This is where K-means clustering comes to the rescue!

Before getting to K-means clustering, let us first understand what clustering is.

A cluster refers to a group of similar objects; clustering is grouping those objects into clusters.
In order to learn clustering, it is important to understand the scenarios that lead to clustering
different objects. Let us identify a few of them.

What is Clustering?

Clustering is dividing data points into homogeneous classes or clusters:

● Points in the same group are as similar as possible

● Points in different groups are as dissimilar as possible

When a collection of objects is given, we put the objects into groups based on similarity.
Application of Clustering:

Clustering is used in almost all fields. You can infer some ideas from Example 1 to come up
with a lot of clustering applications that you would have come across.

Listed here are few more applications, which would add to what you have learnt.

● Clustering helps marketers improve their customer base and work on the target areas. It

helps group people (according to different criteria such as willingness, purchasing

power, etc.) based on their similarity in many ways related to the product under

consideration.

● Clustering helps in identification of groups of houses on the basis of their value, type and

geographical locations.

● Clustering is used to study earthquakes. Based on the areas hit by an earthquake in a

region, clustering can help analyse the next probable location where an earthquake can

occur.

Clustering Algorithms:

A clustering algorithm tries to analyse natural groups of data on the basis of some similarity. It

locates the centroid of each group of data points. To carry out effective clustering, the algorithm

evaluates the distance between each point and the centroid of the cluster.

The goal of clustering is to determine the intrinsic grouping in a set of unlabelled data.
What is K-means Clustering?

K-means (MacQueen, 1967) is one of the simplest unsupervised learning algorithms that solve

the well-known clustering problem. K-means clustering is a method of vector quantization,

originally from signal processing, that is popular for cluster analysis in data mining.

K-means Clustering – Example 1:

A pizza chain wants to open its delivery centres across a city. What do you think would be the

possible challenges?

● They need to analyse the areas from where the pizza is being ordered frequently.

● They need to understand how many pizza stores have to be opened to cover delivery

in the area.

● They need to figure out the locations for the pizza stores within all these areas in order to

keep the distance between the store and delivery points minimum.
Resolving these challenges involves a lot of analysis and mathematics. We will now learn how

clustering can provide a meaningful and easy method of sorting out such real-life challenges.

K-means Clustering Method:

If k is given, the K-means algorithm can be executed in the following steps:

● Partition the objects into k non-empty subsets.

● Identify the cluster centroids (mean points) of the current partition.

● Compute the distance from each point to each centroid and assign each point to the
cluster whose centroid is nearest.

● After re-allotting the points, find the centroids of the newly formed clusters.

● Repeat the previous two steps until the cluster assignments no longer change.
The step-by-step process:

Now, let's consider the problem in Example 1 and see how we can help the pizza chain come up
with delivery centres based on the K-means algorithm; a short sketch follows.
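
As a minimal sketch for the pizza example, assuming scikit-learn and made-up (x, y)
coordinates of frequent delivery addresses (the cluster centers suggest where to place k = 3
stores):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical coordinates of frequent pizza-order locations.
order_locations = np.array([
    [1.0, 1.2], [1.3, 0.9], [0.8, 1.1],    # neighbourhood A
    [5.1, 5.3], [4.9, 5.0], [5.2, 4.8],    # neighbourhood B
    [9.0, 1.0], [8.8, 1.3], [9.2, 0.9],    # neighbourhood C
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(order_locations)
print("proposed store locations:\n", kmeans.cluster_centers_)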

Similarly, for opening Hospital Care Wards:

K-means clustering will group the locations of the most accident-prone areas into clusters and

define a cluster center for each cluster, which will be the location where an Emergency Unit will

open. These cluster centers are the centroids of the clusters and are at a minimum distance

from all the points of a particular cluster; hence, the Emergency Units will be at a minimum

distance from all the accident-prone areas within a cluster.


Here is another example for you, try and come up with the solution based on your understanding

of K-means clustering.

30.What is process mining?

A30

When people hear the words data mining, they nowadays have an idea what it means. With data

mining, we often mean a process of analyzing data from several perspectives and summarizing it

into useful information. With the support from this information, we can then make decisions that

affect the success of a company. However, even when data mining is familiar to people, process

mining still seems to be a new topic for many. Very often I encounter questions like, “What is

process mining? What do you do when you do process mining?”

On a general level, we aim to do the same with process mining as with data mining - to analyze

data from different perspectives and summarize it into information that can be used when making

business decisions. But this time the context is the business processes of an organization. In

process mining, we take the data that exists in the information systems of a company and use that

to visualize what is actually happening in the company’s processes and how they are executed in

real life. Almost all IT systems store data in databases and create logs that can be described in

process mining terms as "event data". This is the basis for process mining and the analyses

conducted.
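
As a hedged illustration, a few lines with the open-source pm4py library (assuming a recent
pm4py version; "log.xes" is a placeholder for an event log exported from an information system):

import pm4py

# Read the event data extracted from the IT systems.
log = pm4py.read_xes("log.xes")

# Discover a process model from the log and visualize it.
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
pm4py.view_petri_net(net, initial_marking, final_marking)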
31.Compare data mining and process mining

A31

DATA MINING VS. PROCESS MINING: WHAT ARE THE DIFFERENCES?

Patterns versus processes

We use data mining to analyze data and to detect or predict patterns. For example: which target

groups buy which products, where does my marketing campaign have the greatest effect, etc ...

Data mining has no direct link with business processes, as opposed to process mining. The latter

focuses on discovering, controlling and improving actual business processes. By analyzing data

derived from the IT systems that support our processes, process mining gives us a true,

end-to-end view of how business processes operate.

Static versus dynamic

Data mining analyzes static information. In other words: data that is available at the time of

analysis. Process mining on the other hand looks at how the data was actually created. Process

mining techniques also allow users to generate processes dynamically based on the most recent

data. Process mining can even provide a real-time view of business processes through a live feed.

Arbitrary versus specific

Data mining will look for hidden patterns in data collections, but does not answer specific

questions. Process mining techniques on the other hand allow you to specifically look for

answers to clear and predefined questions.

Results versus causes

A data mining analysis reveals certain patterns, but does not answer the question of how those

patterns have been established. Data mining is limited solely to the analysis of results. Process
mining on the other hand can provide insight into how results were arrived at. The technique

does not search for patterns in the data, but for causal processes.

Mainstream versus deviations

In data mining, it is important to focus on the major patterns within a data set. Data that fall outside

these mainstream patterns are often not included in the analysis. In process mining, exceptions can

sometimes be at least as important. Exceptions may be an early indicator of inefficiencies or

opportunities for improvement.


32.Discuss different types of Process mining? Demonstrate it diagrammatically.

A32

Process mining types

Process mining can be used for different purposes, as mentioned in the introduction. The

literature mentions different types of process mining [3][4][5][6]; the main types of process

mining are:

Process discovery makes a model out of the event log without using any additional information.

It is used to learn how the process flows.

Process conformance compares an existing process model with the event log of that process.

With this technique, it is possible to check if reality and the model correlate.

Process enhancement tries to extend or improve an already existing process. While process

conformance only checks the existing model, process enhancement changes this model. Van der
Aalst [6] calls this re-engineering.

The fourth type that is mentioned by Van der Aalst [6] is operational support: applying process

mining at run time, while processes are still executing. For example, predictions could help with
meeting deadlines and estimating remaining costs.
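
A hedged sketch of the first two types with pm4py (same caveats as the earlier sketch: a recent
pm4py version is assumed and "log.xes" is a placeholder):

import pm4py

log = pm4py.read_xes("log.xes")

# Process discovery: build a model from the event log alone.
net, im, fm = pm4py.discover_petri_net_inductive(log)

# Process conformance: replay the log against the model to see where
# reality and the model diverge.
diagnostics = pm4py.conformance_diagnostics_token_based_replay(log, net, im, fm)
print(diagnostics[:3])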

Figure 1 [6] positions these four types of process mining. Figure 2 is an adapted figure from van

der Aalst et al. [2] that describes the needed input and delivered output of the different process

mining types. The operational support part of the figure is added to the original figure.

Figure 1: Overview positioning the different types of process mining and the role of log

abstractions and model abstractions. Reprinted from [6].


Figure 2: The basic types of process mining explained in terms of input and output. Adapted

from [2].

33.Process Mining covers different perspectives? Discuss each of them in detail.

A33

Process mining perspectives

How a process is used, by whom, and for what reason is valuable information for an auditor.

Process mining could help with determining these aspects. These aspects represent the following

different process mining perspectives that could be distinguished [1][7][8].

● The ​process perspective​​ focuses on the flow of activities. The goal in this perspective

is to find good descriptions of all possible paths, usually resulting in a process model.

● The organisational perspective focuses on what the organisational structure looks

like. The goal of this perspective is to determine which roles are present and how they

interact.

● The ​case perspective​​ focuses on the characteristics of a case, like its path and actors.

● Van der Aalst et al. [2] mention a fourth process mining perspective in the process

mining manifesto, the time perspective. This perspective focuses on the timing of
events, which can help discover bottlenecks, monitor resources, predict the

remaining time and measure service levels.

These perspectives help to determine how a process was executed, who was involved and what

happened [7]. So the perspectives indicate how a user looks at a process model, while the

process mining types determine how process mining can be used.

34.Name the different Process Mining software.

A34

The technology is often applied to the most common and complex business processes executed
in most organizations, such as ​order to cash​, ​accounts payable​ and ​supply chain management​. An
organization might use process mining software to find the cause of unexpected delays in invoice
processing, for example, by examining the logs of the accounts payable module in an ERP
system. Analysis of an ​audit log​ can also spot deviations from important regulations, such as
U.S. ​Sarbanes-Oxley Act (SOX)​ rules for archiving business records or ​HIPAA​ requirements for
protecting medical records.

The broader benefits of the efficiencies enabled by process mining software include reduced risk,
improved customer satisfaction, more revenue, and better transparency of IT systems and
business processes.
At least a dozen commercial software vendors specialize in process mining software (well-known
examples include Celonis, Fluxicon Disco and UiPath Process Mining), and academic organizations
offer free tools such as ProM and pm4py. Many of the products have integrations that allow
them to work with popular brands of enterprise applications.
