Sim Excel Intro


Introduction to Simulation: Random Variables and Modeling in Excel without Add-Ins

OBJECTIVES
The objective of this note is to introduce the notion of simulation using Excel or a similar tool. We will cover how to generate random numbers from either a continuous or a discrete distribution. We will also discuss the formulation of simulation models for simple problems.

Excel tools discussed:
o Model Building: using parameters and cell names
o Simple Excel Functions: IF, MAX, MIN, VLOOKUP, RAND, RANDBETWEEN

Examples:
o A Simple Newsvendor Problem
o A Single-Server Queue

Managerial Issue:
o Measurement, depiction and management of variability

SIMULATION

Simulation is a descriptive methodology for analyzing systems having one or more random components. With this technique, one builds a computer model that simulates the random components so that the modeler can investigate the system's performance in greater detail. Often, the goal is to determine how potential changes to the system's controllable parameters would impact the system's overall performance. In many cases, there is no clear optimal decision, but rather a variety of plausible alternatives. The level of modeling sophistication one can achieve in a spreadsheet is limited due to the sequential order of calculations in a spreadsheet model. However, for many problems, it is more than adequate.

1 September 2010. Chester Chambers prepared this note solely as a basis for classroom discussion. Please do not duplicate without permission.

Simulation Intro w/ Examples Page 1 of 7

GENERATING RANDOM NUMBERS

One of the most important aspects of a spreadsheet simulation model is the generator used for simulating a pseudo-random number. While it is impossible to generate a truly random number, using combinatorial techniques from mathematics, a sequence of numbers can be generated in such a way that it is virtually indiscernible from a truly random sequence. The key idea is that we only have to generate a random number uniformly distributed between 0 and 1, often represented as U(0,1).

A discrete random variable X can be described by a probability mass function (pmf). This can be viewed simply as a table which depicts the probability that X will take a realized value x, written P(X = x). Some familiar discrete distributions include the Bernoulli, geometric, Poisson, binomial, and hypergeometric. We are especially interested in the cumulative distribution function (cdf) of X, which is the probability that X takes a value less than or equal to x, written P(X ≤ x). The cdf is always a probability, so it is always between 0 and 1. In a similar way, a continuously distributed variable X is described by a probability density function (pdf) rather than a pmf. The technical reason for this distinction is that a density is not directly interpretable as a probability. Nevertheless, in this case also the cdf exists and is analogous to the discrete case.

All simulations generate pseudo-random variables by exploiting the fact that the cdf, evaluated at a continuous random variable, is itself uniformly distributed between 0 and 1. Thus, the random number is treated as a realized value of the cdf, which is inverted to generate the value of X that corresponds to it. As an illustration, suppose a variable X takes the value 0 with probability .4 and the value 1 with probability .6. Thus cdf(0)=.4 and cdf(1)=1.0. If the random number takes a realized value of .312, one infers that it was generated by a value of X = 0; similarly, if the random number takes a value of .542, one infers that the realized value X = 1 generated this result. It should be intuitive that using this scheme, roughly 40% of the time a value of X=0 would be realized and 60% of the time the value of X=1 would be realized. We will defer until later the question of what happens if a value precisely equal to .4 is generated as a random number.

Using Excel to Generate Random Numbers

Excel has a single random number generator, RAND(), and it generates values uniformly on [0,1] (sometimes written as U[0,1]; strictly, RAND() returns a value greater than or equal to 0 and less than 1). We generally need numbers generated from other distributions, and so we need a method for building them. There are two cases to consider: the continuous case and the discrete case.

Generating Random Numbers from a Discrete Distribution

Suppose we want to generate random numbers that follow a discrete probability distribution like the one shown below, which describes processing times at a local business.

Processing Time (Min.)    2     4     6
Probability               .4    .5    .1

Building this distribution in Excel can be done using the built-in random number generator RAND(). We want to generate the value 2 40% of the time, the value 4 50% of the time, and the value 6 10% of the time. This is done by associating portions of the interval [0,1] with the values 2, 4, and 6. Conceptually, we want to map intervals to numbers as shown below:

Random number falls in    0 to .4    .4 to .9    .9 to 1
Value generated           2          4           6

There are multiple approaches to accomplish this task. Let us consider a few for the sake of illustration. One approach involves a Nested If statement. This is a statement in which one of the choices based on an If statement is another If statement. For this case, with =RAND() entered in cell A1, we can use

=IF(A1<0.9,IF(A1<0.4,2,4),6)

Both comparisons must refer to the same random number, which is why the formula points to cell A1 rather than calling RAND() twice. The logic is that Excel generates a random number uniformly distributed on [0,1]. If that number is greater than or equal to 0.9, Excel returns the value 6 because the first comparison in the If statement yields a value of False. If the first comparison yields a value of True, then Excel compares the random number to 0.4. If the number is below both 0.9 and 0.4, then surely it is below 0.4 and Excel returns the value of 2. If the first comparison yields a value of True and the second yields a value of False, then the number is below 0.9 but not below 0.4, and Excel returns a value of 4.

Because the random number sits in its own cell, it can also be used in multiple locations. For example, with =RAND() still in cell A1, the following formula also generates the relevant processing time:

=2*IF(A1>=0,1,0) + 2*IF(A1>=0.4,1,0) + 2*IF(A1>=0.9,1,0)

Each term adds 2 once the random number has reached the corresponding breakpoint, so the total is 2, 4, or 6.
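The two IF-based formulas can be checked outside Excel with a short sketch. The Python below is an illustration only and is not part of the original spreadsheets: the function names are hypothetical, `random.random()` stands in for RAND(), and the additive version uses >= at every breakpoint (an assumption made here so that both versions agree when the random number lands exactly on .4 or .9).

```python
import random
from collections import Counter


def nested_if(u):
    """Mirror of =IF(A1<0.9, IF(A1<0.4, 2, 4), 6) for a single draw u."""
    if u < 0.9:
        return 2 if u < 0.4 else 4
    return 6


def additive_if(u):
    """Mirror of the sum of three 2*IF(...) terms, one per breakpoint."""
    # A Python bool counts as 0 or 1 in arithmetic, like IF(test, 1, 0).
    return 2 * (u >= 0.0) + 2 * (u >= 0.4) + 2 * (u >= 0.9)


def simulated_shares(n, seed=0):
    """Draw n uniform numbers and report how often each value is produced."""
    rng = random.Random(seed)
    counts = Counter(nested_if(rng.random()) for _ in range(n))
    return {value: counts[value] / n for value in (2, 4, 6)}


if __name__ == "__main__":
    print(simulated_shares(100_000))
```

With a large number of draws the reported shares should sit close to .4, .5, and .1, which is the property the interval mapping is meant to deliver.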

Yet a third approach is to use a Lookup function such as

VLOOKUP(RAND(), table, column)

The logic is that Excel can look up the value of RAND() in the first column of a table and return an appropriate number (2, 4, or 6) from a specific column in the same table. The table we need will have 2 columns. In the first column we place the cumulative probabilities 0, .4, and .9 (you don't need the value 1). The values in the table's first column must be in ascending order. In the second column we place the values 2, 4, and 6. It is important to note that these values are lagged by one relative to the cdf. So cdf(2)=.4 is associated with the value 4 in the table. The reason is that VLOOKUP, in its default approximate-match mode, selects the last row whose first-column entry is less than or equal to the lookup value. So a lookup value of exactly .4 will return the value 4. The final lookup table will look like:

           Column A    Column B
(Row 1)    0           2
(Row 2)    .4          4
(Row 3)    .9          6

Suppose the six table numbers shown above are located in cells $A$1:$B$3. Then the command

=VLOOKUP(RAND(),$A$1:$B$3,2)

(placed in any free cell) instructs Excel to look up the value generated by RAND() in the first column of the table $A$1:$B$3 and return a value from the second column of the table. The value returned from the second column depends on which row the lookup value falls into in the first column. By way of example, suppose RAND() generates the value .334. This value is bigger than 0 but less than .4. Excel interprets this value as falling in the first row of the table (the highest numbered row where the breakpoint is less than or equal to the lookup value). The value returned by VLOOKUP is therefore from the first row of the second column, namely 2.

A key benefit of this approach is that it is easily extended to settings with more than 3 rows. Imagine writing a nested IF statement using 10 values instead of 3. You will quickly see that the VLOOKUP function is far easier to follow and to write. A description of how to use the RAND() function to generate samples from a wide range of distributions is contained in the Excel file ProbabilityDistributions.xls, which is included on the course Blackboard page.
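The table lookup can be mimicked in a few lines of Python, which may help in seeing why the breakpoints 0, .4, .9 are paired with the values 2, 4, 6. This is a sketch rather than a description of how Excel is implemented: `approx_lookup` is a hypothetical helper, and the standard-library `bisect` module is used here to find the last breakpoint that does not exceed the lookup value.

```python
import bisect
import random

BREAKPOINTS = [0.0, 0.4, 0.9]  # first table column: cumulative probabilities, ascending
VALUES = [2, 4, 6]             # second table column: processing times in minutes


def approx_lookup(u, breakpoints=BREAKPOINTS, values=VALUES):
    """Return the value in the last row whose breakpoint is <= u."""
    row = bisect.bisect_right(breakpoints, u) - 1
    if row < 0:
        # Excel reports an error when the lookup value is below the first row.
        raise ValueError("lookup value is below the first breakpoint")
    return values[row]


def draw_processing_time(rng):
    """One simulated processing time, the analogue of a lookup on RAND()."""
    return approx_lookup(rng.random())


if __name__ == "__main__":
    rng = random.Random(0)
    print([draw_processing_time(rng) for _ in range(10)])
```

Extending the distribution to ten outcomes only requires longer lists; the lookup function itself does not change, which is the same advantage the table approach has over a nested IF.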

Generating Random Numbers for a Continuous Probability Distribution


Let's begin by building a random number generator for the standard normal distribution. This case is representative of how we build a random number generator for any continuous distribution.

For any distribution whose density f(x) is known, you can compute its cumulative distribution function (cdf), denoted by F(s). For each value s, F(s) represents the probability of obtaining an outcome less than or equal to the value s. For the standard normal distribution, we have the following picture of the density and its cdf:

[Figure: density and cdf of the standard normal distribution]

In the picture above, we have divided the range of F(s) (i.e., the interval [0,1] on the y-axis) into 20 uniform subintervals of length .05. For each of the corresponding 19 break-points .05, .10, .15, ..., .95 along the y-axis (these are the heights of the horizontal lines), determine where the corresponding horizontal line intersects the cdf, and drop an imaginary perpendicular from that intersection point down to the x-axis. Mark the spot on the x-axis where the perpendicular hits using a clearly visible dot. Repeat the process for each of the 19 horizontal lines, and generate 19 dots along the x-axis. A careful analysis of the dots generated along the x-axis should convince you that these points fall in a pattern prescribed by the distribution's density (in this case, the standard normal). These dots are literally too regular to be random, but we will fix that in a moment.

The dot procedure we have described above is equivalent to applying the inverse cumulative distribution function (inverse cdf for short) of the standard normal to the 19 uniformly spaced values .05, .10, .15, ..., .95. Effectively, we have taken a collection of uniformly spaced values and passed them through the inverse cdf of the standard normal distribution to generate values that fit that distribution. Using a parallel approach, we could take these 19 values and pass them through the inverse of any continuous cdf to generate values that fit the distribution of choice. You will find that many useful statistical distributions have inverse cumulative distribution functions built into Excel. For example, the inverse cdf for the standard normal is NORMSINV() (named NORM.S.INV() in newer versions of Excel).

But as noted earlier, applying the inverse cdf to the regularly spaced points .05, .10, ..., .95 generates normal values (the dots) that are not quite random. To make these points truly random, we apply the inverse cdf to randomly generated values on [0,1], i.e., values that come from U[0,1]. This can be done using the built-in random number generator RAND(). Using RAND() ensures the heights of the horizontal lines have some random variability; the inverse cdf ensures these are transformed into random values from the desired distribution. For example, the random number generator for the standard normal distribution is =NORMSINV(RAND()).

Result: Building a random number generator for any continuous probability distribution can be accomplished by applying the distribution's inverse cdf to the computer's random number generator for U[0,1] (which in Excel is the RAND() function).
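The inverse-cdf idea for the standard normal can be tried outside Excel as well. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, Python's `statistics.NormalDist.inv_cdf` plays the role of NORMSINV(), and `random.random()` plays the role of RAND(). It produces both the 19 regularly spaced dots and a set of random draws.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def regular_dots():
    """Inverse cdf applied to the 19 break-points .05, .10, ..., .95."""
    return [STD_NORMAL.inv_cdf(k / 20) for k in range(1, 20)]


def standard_normal_draws(n, seed=0):
    """Inverse cdf applied to uniform random numbers instead of a regular grid."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # inv_cdf is only defined strictly between 0 and 1
            draws.append(STD_NORMAL.inv_cdf(u))
    return draws


def share_within_one_sd(draws):
    """Fraction of draws between -1 and 1 (about 68% for a standard normal)."""
    return sum(-1.0 <= x <= 1.0 for x in draws) / len(draws)


if __name__ == "__main__":
    print([round(x, 2) for x in regular_dots()])
    print(share_within_one_sd(standard_normal_draws(1000)))
```

The regular dots bunch together near zero and spread out in the tails, which is the pattern the density prescribes; the random draws follow the same pattern without the artificial regularity.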

If you wish to convince yourself that this works, generate 1000 values using =NORMSINV(RAND()) and create a histogram using the Data Analysis Toolpak add-in. You will see that the resulting histogram looks Normal.

Example 1: The Newsvendor Model

The newsvendor model, a classical problem in decision analysis, can be illustrated using a simple example. Consider a setting in which every day, by 5:00 am, the newsvendor must decide how many copies of the daily newspaper to buy for 10 cents each. The newsvendor is only interested in buying the papers because he can sell them later that day for 25 cents each. Unsold copies have no value and so are discarded, while unmet demand is lost forever. The newsvendor has collected the following data on potential demand, which is uncertain:

P(0) = Prob {Demand = 0} = 1/10
P(1) = Prob {Demand = 1} = 3/10
P(2) = Prob {Demand = 2} = 4/10
P(3) = Prob {Demand = 3} = 2/10

The following observations are intuitive:
1) if no newspapers are purchased, the actual and expected profit per day is zero cents;
2) the order quantity on any day should be 0, 1, 2, or 3 newspapers, since demand never exceeds 3;
3) the quantity sold cannot exceed the random demand, nor can it exceed the order quantity; and
4) the profit on a given day is the random sales times 25 cents less the order quantity times 10 cents.

To build a simulation model for this newsvendor, we have to develop a discrete random number generator and then link it to a table that relates demand to sales to obtain profit. We may obtain a simulated distribution of profit for every admissible order quantity. Then, we can choose the order quantity which has the most appealing profile of profit.

In the companion spreadsheet Newsvendor.xls a simulation model is presented. A section of that model is replicated below. It shows that if 2 newspapers are purchased every morning, the expected profit will be about 17.5 cents per day. One can also use the sheet to uncover that the expected profit if 1 or 3 newspapers are bought is about 12.5 cents.

It is important to recognize two key points about this simple model. First, the logic is quite simple. We consider what happens over a single period: in this case a day. Next we simulate a large number of days, and look at the distribution of outcomes. Finally, we make decisions, taking the distributions of outcomes under each of the possible decisions into account. The second key point is that we can never forget these are random events. The spreadsheet segment shown above shows the Average Profit after 100 and after 10,000 iterations. Just because the long-run expected value is 17.5 does not mean that each day (or any day) will produce exactly that outcome. In this case, looking at the first 100 days shows a significantly different average profit level than that seen when a much larger sample is taken. The distinction between an expected value and a single realization of an outcome can never be overlooked.
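The newsvendor logic (draw a demand, cap sales at the order quantity, compute profit) is small enough to sketch in Python. This is not a transcription of Newsvendor.xls, whose layout is not reproduced in this note; the function names are hypothetical, and the prices and demand probabilities are the ones given in the example.

```python
import random

PRICE, COST = 25, 10  # cents per copy sold and per copy bought
DEMAND_PMF = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}


def draw_demand(rng):
    """Discrete inverse-cdf draw: walk up the cumulative probabilities."""
    u = rng.random()
    cumulative = 0.0
    for demand, prob in DEMAND_PMF.items():
        cumulative += prob
        if u < cumulative:
            return demand
    return max(DEMAND_PMF)  # guards against floating-point round-off in the sum


def profit(order_qty, demand):
    """Sales are limited by both demand and the order quantity."""
    return PRICE * min(order_qty, demand) - COST * order_qty


def expected_profit(order_qty):
    """Exact expectation, for comparison with the simulated average."""
    return sum(p * profit(order_qty, d) for d, p in DEMAND_PMF.items())


def simulated_average_profit(order_qty, days, seed=0):
    """Average profit over a number of independent simulated days."""
    rng = random.Random(seed)
    return sum(profit(order_qty, draw_demand(rng)) for _ in range(days)) / days


if __name__ == "__main__":
    for q in range(4):
        print(q, expected_profit(q),
              simulated_average_profit(q, 100),
              simulated_average_profit(q, 10_000))
```

Comparing the 100-day and 10,000-day averages for the same order quantity illustrates the second key point above: short runs can sit noticeably away from the long-run expected value.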


Example 2: The M/M/1 Single Server Queue

In the newsvendor simulation it is convenient to define a period as a single day. This works because yesterday's decisions have no impact today and today's decisions have no carry-over to tomorrow. Each period of analysis was independent. In queuing models, we typically consider customers who arrive sequentially and are served by a single provider. If the times between arrivals and service times are random variables, the start of service for the next customer that arrives may be delayed if the system is still serving a customer that arrived earlier. In other words, successive discrete arrival and service epochs are linked across time, so the logic of the simulation must account for this. In these types of settings, we can view a customer arriving to or leaving from the system as a discrete event. The logic of the model is built around these events. This is commonly labeled as Discrete Event Simulation.

In this classical problem, the times between arrivals, or Inter-Arrival Times, as well as the Service Times are assumed to have an exponential distribution. When only one server is considered, this single-server queueing system is often designated as an M/M/1 System. The notation reads: Inter-Arrival Times are exponentially distributed / Service Times are exponentially distributed / one server is present. To make the problem specific, consider a setting in which, on average, one customer arrives per minute, and the server is capable of serving 4/3 of a customer per minute. Equivalently, we could say that the average Inter-Arrival Time is 1 minute and the average Service Time is 3/4 minute. Intuition and common sense suggest that in such settings the server will be busy 75% of the time on average. As long as all times involved are exponentially distributed, this intuition is correct.

The cdf of a random variable X that has an exponential distribution is given by the formula:

$$F(x) = 1 - e^{-rx},$$

where r is known as the rate of X. Note that the mean of X is 1/r. Let y be the realization of the random variable Y that is a uniformly distributed random number. Thus to invert the cdf to get x, we must solve the equation:

$$y = 1 - e^{-rx}, \quad \text{or} \quad 1 - y = e^{-rx}.$$

You should recall that $\ln(e^{-rx}) = -rx$, where ln refers to the natural logarithm. After rearranging terms it follows that

$$x = -\frac{\ln(1 - y)}{r}.$$

We know that x will be positive because the log of a value between 0 and 1 is always negative; thus $-\ln(1 - y)$ will be positive. Given this inversion, the simulation model shown in TransientMM1.xls has been developed. The snapshot of the model that follows may be used to explain how events are dependent on each other.
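The inversion can be checked numerically with a short Python sketch. The helper name is hypothetical and `random.random()` stands in for RAND(); in a worksheet the same step could be written as =-LN(1-RAND())/r, although whether TransientMM1.xls uses exactly that form is an assumption.

```python
import math
import random


def exponential_from_uniform(y, rate):
    """Solve y = 1 - exp(-rate * x) for x; requires 0 <= y < 1 and rate > 0."""
    return -math.log(1.0 - y) / rate


def average_of_draws(rate, n, seed=0):
    """Average of n exponential draws; should be close to the mean 1/rate."""
    rng = random.Random(seed)
    return sum(exponential_from_uniform(rng.random(), rate) for _ in range(n)) / n


if __name__ == "__main__":
    print(average_of_draws(rate=1.0, n=100_000))        # inter-arrival times, mean 1
    print(average_of_draws(rate=4.0 / 3.0, n=100_000))  # service times, mean 3/4
```

Because `random.random()` never returns 1, the argument of the logarithm stays strictly positive and every generated time is nonnegative.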

The system is assumed to be empty at the start of the simulation. Consequently the first Inter-Arrival Time is also the first Arrival Time. Since the system is empty, this customer receives service immediately and we record that the server will next become available when this service is completed. This time is calculated as the first arrival time plus the service time. The Arrival Time for Customer 2 is the Arrival Time for Customer 1 plus the second Inter-Arrival Time. In this particular instance, the second customer arrives before service is completed for the first customer and so must wait. The server becomes available to handle Customer 2 only after completing the service for Customer 1. Thus the Available Time for Customer 2 is the completion time for Customer 1. Speaking more generally, the Available Time for customer n will always be the Completion Time for customer n-1. In this way, the customers are connected. The period of analysis is the time between arrivals.

Using this as the basic building block, we can consider as many customers as we like and look at statistics about this system such as average waiting times and the average number of customers waiting in line. Since the server is busy 75% of the time, 25% of arriving customers, including Customer 4 (in this case), find that the system is empty when they arrive. After these data are collected over a long period of time, we know that the system will become stable and give an average wait time of roughly 2.360 units. But reaching such a Steady State can take quite a few arrivals. Thus, it is entirely possible that steady state is not attained before the system under consideration ceases to operate. Such would be the case in a clinic that opens at 8:00 am but closes for lunch at noon. More details of this model will be presented in class.
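The customer-by-customer bookkeeping described above can be sketched in Python as follows. This is a minimal illustration and not a reproduction of TransientMM1.xls: the function and variable names are hypothetical, the rates (1 arrival per minute, 4/3 services per minute) come from the example, and exponential times are generated by inverting the cdf of a uniform random number.

```python
import math
import random


def simulate_single_server(n_customers, arrival_rate=1.0, service_rate=4.0 / 3.0, seed=0):
    """Simulate an initially empty single-server queue, one customer per step."""
    rng = random.Random(seed)

    def expo(rate):
        return -math.log(1.0 - rng.random()) / rate

    arrival = 0.0         # arrival time of the current customer
    server_free_at = 0.0  # completion time of the previous customer
    total_wait = 0.0
    total_service = 0.0
    no_wait = 0
    for _ in range(n_customers):
        arrival += expo(arrival_rate)         # previous arrival plus inter-arrival time
        start = max(arrival, server_free_at)  # wait if the server is still busy
        service = expo(service_rate)
        server_free_at = start + service      # completion time for this customer
        total_wait += start - arrival
        total_service += service
        if start == arrival:
            no_wait += 1
    return {
        "avg_wait": total_wait / n_customers,
        "utilization": total_service / server_free_at,
        "share_no_wait": no_wait / n_customers,
    }


if __name__ == "__main__":
    for n in (50, 500, 200_000):
        print(n, simulate_single_server(n))
```

Runs with only a few dozen customers can differ substantially from long runs, which is the transient behavior discussed above. For reference, the textbook M/M/1 expression for the mean wait in queue, ρ/(μ − λ), evaluates to 2.25 minutes for these rates, and individual simulated runs scatter around their long-run value.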

TAKE-AWAYS

After considering the comments made above, the examples discussed in class, the reading from the text, and the lecture material shown in class, students should develop an appreciation for at least 4 key points.

o Many problem settings are not easily formulated as optimization problems with fixed parameters, objectives, and constraints. This is particularly true when variability is a major component of the problem setting.
o Creating a model that behaves as the system behaves, including its variable elements, can be a useful tool to develop an understanding of what is going on. This model can then be used to inform decision makers about system potential or improvements.
o If possible parameter values can be described using distributions, we can use simple techniques to generate values that mimic the distribution of those parameters. Excel provides several tools useful for doing this.
o While spreadsheets still have major limitations, we can use their ability to generate values from relevant distributions, and to replicate relationships among terms, to generate results that describe the behavior of complex systems. This is particularly valuable when dealing with a system that does not lend itself well to more traditional forms of modeling.