Process Risk Simulation

Analyzing the effects of risks and controls in business processes

Ralf Angeli, Martin Kling
May 2011

Business White Paper



Ralf Angeli is a business analyst and software designer for ARIS products at Software AG. Among other things, he is responsible for the functional specification of ARIS Business Simulator.

The first decade of the 21st century showed that, while we cannot see into the future, no industry can afford to ignore potential what-if scenarios in its planning. Testament to this are the recent oil disaster in the US, the global financial crisis, and the nuclear meltdown following the earthquake and tsunami in Japan. While it is impossible to plan fully for the unexpected, some of the subsequent damage might have been reduced or prevented by using scenario management and simulation techniques.

Martin Kling is responsible at corporate level for the ARIS Solution for Governance, Risk and Compliance at Software AG. Besides driving the future development of features and capabilities that help customers increase their GRC maturity, he is strongly involved in the supervision of customer projects during setup and delivery and is a known author on various GRC topics in books, articles and blogs.

Despite its growing maturity, simulation is still regarded by some as complicated and impractical from a management perspective, even though the shortcomings of static analysis of risk positions pertaining to business processes, projects, insurances or trading are well documented. Simulation is still perceived by some as an approach which requires too much data, too much expertise, and specialist skill sets to implement. However, the reality is somewhat different, and this mindset is in stark contrast to the results reported by those who have leveraged simulation. According to Gartner [1], simulation usage has grown from below 10% of process-related projects to over 35%, and simulation based on real data has grown from one or two percent to over 10%.

In a world where business models and constraints are changing faster than ever and investments are becoming ever higher, simulation can provide a means of guarding against disaster and achieving high-margin objectives. In fact, simulation is now available as part of business process management software and is easy to use by anyone in a business context. Investment in the technology is very low compared with the savings which can be achieved. Generally speaking, it is fairly easy to save 1 million by reducing risk event occurrences and optimizing process performance in parallel. ROI on simulation projects is a given in most cases.

This white paper is intended to introduce the use of business process simulation with a special focus on risk events and control activities. With the help of a small sample process, the authors demonstrate how the effects of risk occurrence, as well as the effects of controls introduced to counter the risks, can be determined by conducting a simulation study with ARIS Business Simulator. In particular, the modeling and configuration of risks and controls in a process model, the simulation of the process model and the interpretation of the simulation results will be discussed.

A significant part of business process management is the analysis and improvement of processes. The analysis may involve many aspects, such as media breaks, added value, throughput times, bottlenecks, resource utilization or cost. But often, operational risk is ignored. Given that this type of risk arises from the operations of an organization, it is inherent in the execution of business processes. In the Basel II regulations, operational risk is defined as the risk of loss resulting from inadequate or failed internal processes, people and systems or from external events. [2] In the context of business processes, loss is not just associated with financial damages but also with negative effects on process performance due to additional work or poor availability of resources.

In many cases, those responsible for dealing with the results of risk occurrences (e.g. damaged products or services) are the same ones that are responsible for the regular execution of the affected process itself. Any mistake (risk occurrence) which has a work activity consequence will automatically result in a resource bottleneck, which has to be compensated for by speeding things up or by neglecting activities that keep the product or service from being delivered as soon as possible. Such a reaction may lead to increasing risk probabilities and, in consequence, to more severe bottlenecks should the respective risks occur. The result would be a positive feedback loop which increasingly disrupts the provision of products and services.

Depending on the process, these effects can be quite significant and should therefore be taken into consideration during the process analysis. In order to do this, the risks have to be identified, analyzed and evaluated. With the resulting information an organization can, based on its risk appetite, decide which risks can be accepted and which risks have to be avoided or reduced by means of countermeasures such as process changes or controls.

Business process modeling is the basis for the analysis of scenarios. It helps to identify operational risks in complex business processes, and allows discussions around where to place effective controls for countering the risks. Once the risks and controls are identified and specified, their effects can be analyzed and evaluated. This can be done with calculation or simulation methods.

MODELING RISKS IN BUSINESS PROCESSES

In the context of a business process, a risk event can occur during the execution of an activity, regardless of whether it is caused by the activity or triggered externally. The occurrence of a risk event is not certain but always associated with a probability. And when a risk event occurs, it has undesired consequences, such as the loss of money, materials or resources. These traits of risk can be expressed in a model.

In the ARIS risk management methodology a risk is modeled with a risk object. Such an object is connected to a function object in order to convey that the risk may occur during the execution of the function. The probability of risk occurrence can be specified in an attribute of the risk object which accepts a value between 0 and 1. Every time the connected function is executed during simulation, the value is used to determine whether the risk occurs or not. The direct effect of a risk occurrence is financial loss. The amount of this loss can be given as a fixed or distributed value. If a probability distribution is specified, a value is sampled from the distribution every time the risk occurs.

Figure 1 shows how risks appear in an event-driven process chain (EPC). The model, which serves as an example for this article, describes (in an over-simplified way) the process of a credit transfer. Assuming that a paper-based credit transfer order is received by a bank, a bank employee enters the data into a banking system. The banking system then electronically transfers a message with the order data to the bank that holds the account to which the funds should be transferred. The sending of the message marks the end of the process.

Figure 1: Credit transfer process with risks

Both functions in the process have risks associated with them. The risk of incorrect details being transferred from the paper-based order to the banking application is related to the first function. In the example it is assumed that this risk has a probability of 10% and causes damages which are uniformly distributed between 10 and 30 monetary units every time it occurs. Related to the second function is the risk of the banking system failing during the transfer. This risk has a probability of 0.1% and causes damages which follow a normal distribution with a mean value of 10,000 and a standard deviation of 1,000 monetary units every time it occurs.

In order to make the model more realistic, some settings regarding the frequency of the process, the processing times and the available resources are made: The transfer of data from the paper-based order into the system takes 20 seconds and is done by one bank employee who is available all the time. In order to have the employee working at capacity, the process needs to be started every 20 seconds. Therefore the frequency of the start event is set to 4,320 occurrences per day. Finally, the transfer of the order data to the receiving bank keeps the banking system busy for 5 seconds. (Note that the values used throughout the example are purely fictitious and do not claim to be overly realistic.)
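The configuration described above can be illustrated with a small Monte Carlo sketch. This is not how ARIS Business Simulator is implemented; it only assumes that each function execution draws one random number per attached risk and, if the risk occurs, samples a damage value from the configured distribution. The names `simulate_risks`, `starts_per_day` and the seed are hypothetical and chosen for illustration.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60
ENTRY_TIME_S = 20          # processing time of the data entry function
SIM_DAYS = 10              # simulation period used in the example

# One process start every 20 seconds keeps the employee at capacity.
starts_per_day = SECONDS_PER_DAY // ENTRY_TIME_S


def simulate_risks(executions, seed=1):
    """Sample both risks of the sample process once per process execution."""
    rng = random.Random(seed)
    stats = {
        "data_transfer_error": {"count": 0, "damage": 0.0},
        "system_failure": {"count": 0, "damage": 0.0},
    }
    for _ in range(executions):
        # Risk on the first function: probability 10%, damage uniform in [10, 30].
        if rng.random() < 0.10:
            stats["data_transfer_error"]["count"] += 1
            stats["data_transfer_error"]["damage"] += rng.uniform(10, 30)
        # Risk on the second function: probability 0.1%, damage ~ N(10,000, 1,000).
        if rng.random() < 0.001:
            stats["system_failure"]["count"] += 1
            stats["system_failure"]["damage"] += rng.gauss(10_000, 1_000)
    return stats


if __name__ == "__main__":
    print("starts per day:", starts_per_day)
    for name, values in simulate_risks(starts_per_day * SIM_DAYS).items():
        print(name, values["count"], round(values["damage"], 2))
```

The sketch ignores processing times and resources, so it only reproduces the risk-related figures; the exact counts depend on the random seed.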


With the model configured, the simulation can be started. In the example a simulation period of ten days is used. It is possible to have a look at the results during the simulation. This is shown in Figure 2, which provides an impression of the user interface of ARIS Business Simulator.

Figure 2: ARIS Business Simulator simulating the sample process with risks

In the right part of the toolbar the controls for the simulation (e.g. run, pause, advance stepwise or reset) are located. Below is a representation of the process model. Objects in the model are highlighted when they are active. Around the objects a few object-related simulation results, like the activation count of a risk or function, are shown. Below the model view is a pane with statistics tables which contain the main output of the simulation and which will be used later for the analysis. In the screenshot the cumulative risk statistics are visible which provide information about the number of risk occurrences and the amount of damages resulting from these occurrences, among other things. At the bottom of the window the simulation status is displayed.

ANALYSIS OF RISK EFFECTS

The results of the simulation run are the basis for analysis. In our example the main goal is to find out how often risk events occurred and how large the losses due to these occurrences are. These values can be interesting in their own right in order to get a better understanding of the cost involved in the execution of the process, but they can also be used to evaluate the usefulness of controls which should reduce the losses. (Controls are introduced further below.)

Over the course of ten days the process was executed 43,199 times. The throughput time was always 25 seconds. The bank employee worked 240 hours at 100% capacity. (The model contains a person type object, so it was not necessarily one person who has been working for those 240 hours.) Since the process was executed 43,199 times, so were the two functions in the process. This means there were 43,199 opportunities for each risk to occur. The actual occurrence counts are much lower, though, due to the probabilities which were set. The respective numbers can be found in the cumulative risk statistics which are shown in Table 1.

Table 1: Cumulative risk statistics (shortened)

                                  Data transfer error    System failure
Number of occurrences                           4,227                45
Accumulated amount of damages               84,457.51        458,378.98

The interpretation of the results in the table is straightforward, and the outcome was somewhat to be expected in this simplistic example. The data transfer error occurs 4,227 times and the system fails 45 times throughout the 43,199 opportunities for risk occurrence. These results are in line with the probabilities given to the risk objects. Data transfer errors account for 84,458 and system failures for 458,379 monetary units of damages. So even though system failures are much rarer than data transfer errors, their overall damages are much higher because of the high amount of damages per occurrence. Based on these results alone, one would probably try to improve the system reliability in order to reduce the larger share of damages. However, whether this is the right strategy obviously also depends upon the cost of reducing these damages. And for this no information is available yet.
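As a rough plausibility check of the statement that the results are in line with the configured probabilities, the expected values can be computed directly from the settings given earlier. The following sketch is only arithmetic on the figures quoted in the text; the dictionary layout and the function name `expected` are hypothetical.

```python
EXECUTIONS = 43_199  # process executions reported for the ten-day run

risks = {
    "Data transfer error": {
        "p": 0.10,
        "mean_damage": (10 + 30) / 2,   # mean of a uniform distribution on [10, 30]
        "observed_n": 4_227,
        "observed_damage": 84_457.51,
    },
    "System failure": {
        "p": 0.001,
        "mean_damage": 10_000.0,        # mean of the configured normal distribution
        "observed_n": 45,
        "observed_damage": 458_378.98,
    },
}


def expected(risk):
    """Return (expected occurrences, expected accumulated damages)."""
    occurrences = EXECUTIONS * risk["p"]
    return occurrences, occurrences * risk["mean_damage"]


for name, risk in risks.items():
    exp_n, exp_damage = expected(risk)
    per_occurrence = risk["observed_damage"] / risk["observed_n"]
    print(f"{name}: expected {exp_n:.1f} occurrences / {exp_damage:,.0f} damages, "
          f"observed {risk['observed_n']} / {risk['observed_damage']:,.2f} "
          f"({per_occurrence:,.2f} per occurrence)")
```

The expected counts are about 4,320 and 43 occurrences, close to the simulated 4,227 and 45; the observed average damages per occurrence (roughly 20 and 10,186 monetary units) likewise sit near the configured means.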

So far it has been demonstrated how the consequences and the amount of financial damages from risk occurrence can be determined with simulation. The following sections show how controls can be used to counter risks, either by preventing them or by mitigating their effects. Since controls usually come at a cost, the explanations will also provide some information about how to decide whether it is worthwhile to introduce a control or not.

Controls are a tricky thing to implement. First, a clear understanding of how the business (process) works is required to make sure the control covers all major possible variations of a process in order to be effective most of the time. It makes no sense to implement controls solely based on a to-be process because quite often the as-is process is substantially different. (Techniques like the reconstruction feature of ARIS Process Performance Manager allow for better confidence on this issue.)

Secondly, it has to be kept in mind that any control activity requires additional effort and does not directly create value. To put it bluntly, controls are on the cost side of business. In addition, they often have to be built into processes where the pain of the original risk is not experienced directly. For example, they might be built into production or sales processes, yet it is only the owners of the revenue recognition processes in financial accounting who see the damages of an incorrect financial statement first hand. This problem leads to issues concerning the thoroughness and willingness of control execution. In addition to being placed where it can be effective, the control must be efficient. However, the efficiency of a control can only be determined in comparison with the effects achieved, in this case the reduction of damages.


Controls are artefacts or procedures which reduce the probability of risk occurrence (preventive controls) or reduce the damages resulting from a risk occurrence (detective controls). Preventive controls are conducted before or at the time a risk may occur. An example is the four-eye principle, which can be applied to reduce the probability of inadequate or fraudulent actions. Detective controls are conducted after the time a risk may have occurred. They are supposed to discover the occurrence of a risk event and thereby open up the possibility to start corrective actions, but can also comprise mechanisms which directly reduce damages. Examples are inspections or insurances.

In ARIS, controls are function objects with a special symbol. In order to indicate where they are carried out, they can be placed in an EPC, either directly in the control flow or attached to a function which is part of the control flow. The control also has to be connected to the risk it affects. Such a connection can be established in a business controls diagram. (Note for users of ARIS Risk & Compliance Manager: Simulation also supports the risk-based modeling approach, where controls are not added to event-driven process chains but only connected to risks. The models do not have to be changed to use the simulation, though, because the controls can be pulled in via the risks.)

The behavior of a control is also governed by the values of certain attributes. As mentioned before, controls can either be preventive or detective. Depending on which type is chosen, different effects have to be configured. For preventive controls the percentage by which the probability of a connected risk is reduced has to be specified. For detective controls it is the percentage by which the damages of an occurred risk are reduced. For both types it is also possible to specify a control effectiveness value. This is a percentage which indicates the probability that the control is carried out successfully and is thereby able to reduce the risk occurrence probability or the damages, respectively. The effectiveness of a control is evaluated every time the control is carried out, and the control effect is only applied if the control was effective.
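The interplay of control type, effect percentage and effectiveness can be sketched in a few lines. This is an illustrative interpretation of the attributes described above, not the actual algorithm of ARIS Business Simulator. It assumes that preventive controls are evaluated before the risk and detective controls after it, and that a "prevented" occurrence is one that would have happened at the original probability but not at the reduced one. The names `Control` and `execute_function` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Control:
    kind: str             # "preventive" or "detective"
    effect: float         # fraction (0..1) by which probability or damage is reduced
    effectiveness: float  # probability (0..1) that the control is carried out successfully


def execute_function(rng, risk_probability, sample_damage, controls):
    """Evaluate one risk for one function execution (assumed semantics)."""
    reduced_probability = risk_probability
    damage_factor = 1.0
    for control in controls:
        # Effectiveness is evaluated on every execution of the control.
        if rng.random() >= control.effectiveness:
            continue  # control failed, so its effect is not applied
        if control.kind == "preventive":
            reduced_probability *= 1.0 - control.effect
        elif control.kind == "detective":
            damage_factor *= 1.0 - control.effect

    draw = rng.random()
    if draw >= risk_probability:
        # The risk would not have occurred even without controls.
        return {"occurred": False, "prevented": False, "damage": 0.0, "reduction": 0.0}
    if draw >= reduced_probability:
        # The risk would have occurred, but a preventive control stopped it.
        return {"occurred": False, "prevented": True, "damage": 0.0, "reduction": 0.0}

    gross = sample_damage(rng)
    net = gross * damage_factor
    return {"occurred": True, "prevented": False, "damage": net, "reduction": gross - net}


if __name__ == "__main__":
    rng = random.Random(42)
    check = Control(kind="detective", effect=1.0, effectiveness=0.95)
    print(execute_function(rng, 0.10, lambda r: r.uniform(10, 30), [check]))
```

With an effect of 100%, an effective preventive control always prevents the risk and an effective detective control removes the full damage; with smaller effect values, an effective control only lowers the probability or the damage.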

Figure 3 shows how controls appear in a process model. It contains the same process which was used as an example in the previous sections. Compared to Figure 1, two controls have been added to the process. These have been connected to the functions where they are supposed to be carried out and to the risks they affect. (The connections to the risks were made in business controls diagrams which are not shown in the picture.)

Figure 3: Credit transfer process with risks and controls

In order to counter the risk of a data transfer error, an additional check of the transferred data has been introduced. The double check control is detective because it detects an error after its occurrence and consequently makes it possible to correct it. In the given example an effectiveness of 95% is assumed. If the control is effective, it will reduce the damage by 100%. A cost of 0.2 monetary units accrues per execution of the control. (This is the easiest way to account for additional effort in the case of control execution. A more complex option could be a complete re-work process that would be triggered when the control found an error.)

The risk of a system failure is countered by the provision of a backup system. This has been modeled as a preventive control because the backup system is supposed to make the whole system, i.e. the combination of the primary system and the backup system, more robust. For the backup system control an effectiveness of 90% is assumed. If it is effective, it reduces the probability of a system failure by 100%. Per execution of the control a cost of 0.4 monetary units accrues.

The placement of the controls in the figure hints at their types. The detective control was placed after the risk and the preventive control before the risk it affects. However, the position in the graphical representation of the model has no relevance for when exactly the simulation executes a control. This is determined only by the control type. The simulation also makes sure that controls which are attached to the same function as the controlled risk are executed before (preventive controls) or after (detective controls) the risk is evaluated.

With the model in place, the simulation run can be started. This time it is carried out completely before looking at the results. The user interface of ARIS Business Simulator with the extended model and the results can be seen in Figure 4.

Figure 4: ARIS Business Simulator simulating the sample process with risks and controls


The goal of the analysis on the control side is to decide whether it is sensible to introduce the controls specified for the risks in this process. In the example there is no regulatory or contractual need for them, so it is possible to concentrate on the monetary consequences. The basic idea is to compare the loss reduction a control achieves with the cost it causes. If the loss reduction is higher than the cost and the risk is higher than the risk appetite, then the control should be implemented. (Such a decision obviously depends on other factors as well, but for the sake of simplicity we neglect those in the example.) Information about the loss reduction can be found in the cumulative risk statistics and the cumulative function statistics. The cumulative risk statistics were already used for the previous analysis above. In this analysis a few more performance indicators, which are related to the effects of controls, were included (see Table 2).

Table 2: Cumulative risk statistics (shortened; including effects of controls)

                                           Data transfer error    System failure
Number of occurrences                                    4,308                 4
Number of prevented occurrences                              0                46
Number of detected occurrences                           4,092                 0
Accumulated amount of damages                         4,218.82         40,659.62
Accumulated amount of prevented damages                   0.00        463,933.53
Accumulated amount of damage reduction               81,898.83              0.00

For the data transfer error risk a detective control was introduced. The results show that 4,092 out of 4,308 occurrences were detected by the control. For each of these occurrences a 100% damage reduction was applied. In consequence 4,218.82 monetary units of damage accrued, whereas 81,898.83 monetary units of damage reduction could be achieved. Since the control has no preventive effect, the amount of prevented damages is zero.

For the system failure risk, a preventive control was introduced. With the control in place the risk only occurred 4 times and 46 occurrences were prevented. That means without the control, the risk would have occurred 50 times. From a monetary point of view this means that 40,659.62 monetary units of damages actually accrued but 463,933.53 units were prevented.

The above results already look quite positive. They are related to risk, however. Regarding the question about the net effect of the introduced controls, the cumulative function statistics (see Table 3) have more interesting information to offer because they provide the data per control. We can see that both controls were executed 43,199 times. Based on the effectiveness values given to the controls, the double check control was effective 41,013 times and the backup system control 38,884 times out of the total 43,199 executions. These values represent the number of times a control was able to reduce a risk probability or reduce a loss. The actual number of risk preventions or risk reductions can be lower, though.

Table 3: Cumulative function statistics (shortened)

                                                  Double check    Backup system
Process folders processed                               43,199           43,199
Number of effective controls                            41,013           38,884
Number of successful effective controls                  4,092               46
Accumulated amount of prevented damages                   0.00       463,933.53
Accumulated amount of unprevented damages                 0.00        40,659.62
Accumulated amount of damage reduction               81,898.83             0.00
Accumulated amount of failed damage reduction         4,218.82             0.00

The detective double check control reduced damages in case of 4,092 risk occurrences. However, as seen above the data transfer risk occurred 4,308 times. That means in 216 cases the control was not effective. For these quantities the statistics also include monetary equivalents. The double check control could reduce 81,898.83 monetary units of damages and failed to reduce 4,218.82 units. Since there is a one-to-one relationship between the risk and the control these values match those for the data transfer risk in the cumulative risk statistics.
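The reported counts can be cross-checked with simple arithmetic. The sketch below uses only figures quoted in the statistics above; the variable names are hypothetical, and the comparison with the configured effectiveness values (95% and 90%) is an informal sanity check rather than a statistical test.

```python
EXECUTIONS = 43_199

# Double check control (detective) and the data transfer error risk
occurrences = 4_308
detected = 4_092
damage_reduced = 81_898.83
damage_not_reduced = 4_218.82
double_check_effective = 41_013

# Backup system control (preventive)
backup_effective = 38_884

undetected = occurrences - detected                 # occurrences the control missed
gross_damage = damage_reduced + damage_not_reduced  # damages without the control

detection_rate = detected / occurrences
double_check_rate = double_check_effective / EXECUTIONS
backup_rate = backup_effective / EXECUTIONS

print(f"undetected occurrences: {undetected}")
print(f"damages without the double check: {gross_damage:,.2f}")
print(f"detection rate: {detection_rate:.1%} (configured effectiveness: 95%)")
print(f"double check effective in {double_check_rate:.1%} of executions")
print(f"backup system effective in {backup_rate:.1%} of executions (configured: 90%)")
```

The observed shares of effective executions lie close to the configured effectiveness values, which is what one would expect if effectiveness is sampled independently on every execution.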


The preventive backup system control prevented the system failure risk 46 times from occurring. In the remaining four times the control was not effective. (Note that this conclusion is only possible because in this example an effective control reduces the risk probability by 100%. If the risk probability were not fully reduced, even an effective control might not prevent the risk from occurring.) This equates to prevented damages of 463,933.53 monetary units and unprevented damages of 40,659.62 units. Again, these numbers can also be found in the cumulative risk statistics for the system failure risk because in this example there is a one-to-one relationship between the risk and the control.

The interesting pieces of information for the analysis goal are the amount of damage reduction by the double check control (81,898.83) and the amount of prevented damages by the backup system control (463,933.53). These need to be compared with the costs of the controls, which can be found in the function cost statistics (not shown here). For the double check control these are 8,639.80 monetary units and for the backup system control 17,279.60. This means the introduction of the controls has the following net effects:

                 Double check    Backup system
Savings             81,898.83       463,933.53
Cost                 8,639.80        17,279.60
Net effect          73,259.03       446,653.93

In both cases the amount of savings due to damage reduction or prevention is higher than the cost for the control, i.e. both controls should be implemented from a monetary point of view.
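The cost and net effect figures can be reproduced from the per-execution costs stated earlier (0.2 and 0.4 monetary units) and the 43,199 control executions. The following is a minimal sketch of that arithmetic; the function name `net_effect` and the dictionary layout are hypothetical.

```python
EXECUTIONS = 43_199  # each control was executed once per process execution

controls = {
    "Double check": {"unit_cost": 0.2, "savings": 81_898.83},
    "Backup system": {"unit_cost": 0.4, "savings": 463_933.53},
}


def net_effect(executions, unit_cost, savings):
    """Return (total control cost, savings minus cost)."""
    cost = executions * unit_cost
    return cost, savings - cost


for name, control in controls.items():
    cost, net = net_effect(EXECUTIONS, control["unit_cost"], control["savings"])
    verdict = "worthwhile" if net > 0 else "not worthwhile"
    print(f"{name}: cost {cost:,.2f}, net effect {net:,.2f} ({verdict})")
```

The verdict here considers the monetary comparison only; as noted above, a real decision would also take the risk appetite and other factors into account.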

Process risk simulation is a strong means to back up business decisions concerning the risk and performance aspects as well as the control of business processes. Following standard modeling conventions, risks and controls may be integrated into business processes and their behavior and effects simulated.

For the sake of demonstrating the concepts, the examples used above have been kept simple, and the analysis was mostly focused on the monetary aspects of risks and controls. However, process risk simulation is not limited to those areas. The analysis could cover aspects of time and resources as well, for example by defining follow-up activities which are to be carried out once a risk is detected. Such activities can temporarily influence the process performance. In such a case it can be interesting to track the performance over time and to see how the system recovers from the occurrence of a risk event. This is where simulation really comes into play. Here it has a clear advantage over calculation methods, which can hardly provide information about the dynamic behavior of a process.

It should also have become apparent that the usefulness of process simulation grows with the size, complexity and interdependency of the business processes under scrutiny. The more complex business processes are, the more difficult their analysis is and the more simulation becomes preferable as a means to support the analysis. If risks and controls occur in different processes at different times, the effects can no longer be determined with simpler methods. The analysis of such complex systems can be supported further by design-of-experiment and optimization capabilities in order to find the most desirable among different process and risk scenarios.

