Statistical and Probability Tools For Cost Engineering
Statistics is the field of study in which data are collected for the purpose of
drawing conclusions and making inferences. Descriptive statistics is the
summarization and description of data; inferential statistics is the
estimation, prediction, and/or generalization about a population based on
the data from a sample.
Median: The median is the middle number when the data observations are
arranged in ascending or descending order. If the number n of
measurements is even, the median is the average of the two middle
measurements in the ranking. For a symmetric data set, the mean equals
the median. If the median is less than the mean, the data set is skewed to
the right; if the median is greater than the mean, the data set is skewed to
the left.
Mode: The mode is the measurement that occurs most often in the data set. If
the observations have two modes, the data set is said to have a bimodal
distribution. When the data set is multi-modal, the mode is no longer a
viable measure of central tendency. In a large data set, the modal class
is the class containing the largest frequency, and the simplest definition of
the mode is then the midpoint of the modal class.
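The three measures of central tendency above can be computed directly with Python's standard library; the sketch below uses a small hypothetical data set (the values are illustrative, not from the text):

```python
import statistics

# Hypothetical sample of cost estimates (in $1,000s)
data = [42, 38, 45, 38, 51, 44, 38, 47, 60]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted observations
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
# The median (44) is less than the mean (~44.78),
# so this data set is slightly skewed to the right.
```

Note how the three measures can disagree on the same data, which is why skewness can be read off from the mean-versus-median comparison described above.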
Comparison of the Mean, Median, and Mode
The mean is the most commonly used measure of central location.
However, it is affected by extreme values. For example, the high incomes of
a few employees will inflate the mean income of a small company. In such
situations, the median may be a better measure of central tendency.
The median is of most value in describing large data sets and is often used in
reporting salaries, ages, sale prices, and test scores. The mode is frequently
applied in marketing.
Measures of central tendency do not describe the spread of the data set,
which may be of greater interest to the decision-maker.
Range: The difference between the largest and the smallest values of the
data set. The range uses only the two extreme values and ignores the rest
of the data set. One instinctive attempt to measure the dispersion would be
to find the deviation of each value from the mean and then calculate the
average of these deviations. One will find that this value is always zero, an
answer which is no accident. An alternative is to calculate the
average absolute deviation. However, this measure is rarely used because it
is difficult to handle algebraically and lacks the convenient mathematical
properties possessed by the variance.
Standard Deviation: The positive square root of the variance. The
population standard deviation is denoted by σ; the sample standard
deviation is denoted by s.
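The dispersion measures discussed above can be compared side by side. The sketch below (with an illustrative data set) shows that the average deviation from the mean is indeed always zero, alongside the range, the mean absolute deviation, and the sample variance and standard deviation:

```python
import statistics

# Hypothetical data set, e.g., labor-hour estimates
data = [10, 12, 15, 18, 20]
mean = statistics.mean(data)  # 15.0

range_ = max(data) - min(data)                      # largest minus smallest
avg_dev = sum(x - mean for x in data) / len(data)   # always zero, by algebra
mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation
var_s = statistics.variance(data)                   # sample variance (n - 1 divisor)
std_s = statistics.stdev(data)                      # sample standard deviation s

print(range_, avg_dev, mad, var_s, std_s)
```

The zero average deviation falls out of the definition of the mean: positive and negative deviations cancel exactly, which is why the variance squares them instead.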
Discrete Random Variables
Binomial Distribution
The name binomial arises from the fact that the probabilities p(x), x = 0, 1,
2, ..., n, are the terms of the binomial expansion (q + p)^n.
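A binomial probability can be computed directly from that expansion term, C(n, x) p^x q^(n-x). The sketch below implements it with the standard library; the defect-rate example is an assumption for illustration:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): the x-th term of the binomial expansion (q + p)^n, q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Hypothetical example: probability of exactly 2 defective items
# in a batch of 10 when each item is defective with p = 0.1
print(binomial_pmf(2, 10, 0.1))  # ~0.1937

# The terms sum to (q + p)^n = 1 over x = 0, 1, ..., n
total = sum(binomial_pmf(x, 10, 0.1) for x in range(11))
print(total)
```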
The graphic form of f(x) is a smooth curve, and the area under the curve
corresponds to probabilities for x. For example, the area A beneath the
curve between the two points a and b is the probability Pr(a < x < b).
Because there is no area over a single point, the probability associated with any
particular value of x, say x = a, is equal to zero. Thus Pr(a < x < b) = Pr(a ≤ x ≤ b);
in other words, the probability is the same regardless of whether the
endpoints of the interval are included. The total area under the curve,
which is the total probability for x, equals 1.
The areas under most probability density functions are obtained by the use
of calculus or other numerical methods, which is often a difficult procedure.
However, as with commonly used discrete probability distributions,
tables exist for finding probabilities under commonly used continuous
probability distributions.
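For the normal distribution, the tabled areas can also be computed in software. The sketch below uses the closed-form relation between the standard normal CDF and the error function (an implementation choice here, not something the text prescribes) to find Pr(a < x < b) as an area under the curve:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Pr(a < x < b) is the area between a and b; whether the endpoints
# are included does not matter, since the area over a point is zero.
a, b = -1.0, 1.0
prob = normal_cdf(b) - normal_cdf(a)
print(prob)  # ~0.6827 for one standard deviation around the mean

# The total area under the curve approaches 1
print(normal_cdf(10.0))
```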
Figure 2.2 Two Normal Distributions
The graph of a normal distribution is called a normal curve and it has the
following characteristics: it is bell-shaped and symmetric about the mean;
its mean, median, and mode are equal; it approaches, but never touches,
the horizontal axis; and the total area under the curve equals 1.
2.3 Monte Carlo Simulation
Develop a frequency distribution from these n-values. This
distribution is the simulated (i.e., empirical) distribution of total
system cost.
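This step can be sketched in code: simulate n trial values of total system cost, then bin them into a frequency distribution. The cost elements, their distributions, and their parameters below are illustrative assumptions, not the text's model:

```python
import random
from collections import Counter

random.seed(1)  # reproducible trials

def simulate_total_cost():
    # Hypothetical cost elements with assumed distributions
    labor = random.uniform(80, 120)           # uniform between bounds
    material = random.gauss(200, 15)          # normal: mean 200, sd 15
    overhead = random.triangular(40, 70, 50)  # min, max, most likely
    return labor + material + overhead

n = 10_000
totals = [simulate_total_cost() for _ in range(n)]

# Frequency distribution: count trials falling in $10-wide bins
bins = Counter(int(t // 10) * 10 for t in totals)
for lower in sorted(bins):
    print(f"{lower}-{lower + 10}: {bins[lower]}")
```

The printed table is the simulated (empirical) distribution of total system cost; with more trials it converges toward the underlying distribution.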
Monte Carlo Simulation - Crystal Ball - Examples
Initial situation
A product is to be developed, tested, and introduced into the market. To
examine the PROFIT & LOSS potential of the product, the following model
has been constructed:
Model
What-if scenarios
We have seen that if our model follows our expectations, we will achieve a
profit of $1,200,000. However, this is a very unstable reading, as the
following example demonstrates:
If we were to assume that the product could sell not 400,000 units but only
300,000, the product would no longer achieve a profit but a loss of
$600,000. This demonstrates how changing only one variable can result in
the product never even reaching the development stage. A classical what-if
analysis will help us understand the bounds of our PROFIT & LOSS value
by supplying worst, probable, and best case values to our model.
After examining this result table, we have not reached a conclusion but
merely gathered more information, allowing us to depict the range of profit
and loss for our product. We do not know the probability of any given value,
nor do we know what exact PROFIT & LOSS we can expect.
Assigning a distribution
Let us now concentrate on each variable separately, starting with the
research costs. We can assume that the research costs will vary between
$1.3 million and $2.3 million, and that each value will emerge with the
same probability. Graphically depicted, our assumptions would look like
this:
Because all values are equally likely to appear, this distribution is called a
uniform distribution.
Running a simulation
If we were to assume that each of our variable costs (research, production,
shipping, and wages) were uniformly distributed, and our marketing strategy
costs were normally distributed, we could set a minimum and maximum
range for each, using our worst and best case examples from before. Let us
also assume that the number of units sold, as well as the unit price, is
uniformly distributed between the worst and best case examples. The
PROFIT & LOSS value, which is a calculated value, we will define as our
forecast cell.
After defining all variables (assumptions and forecasts), we are ready to
perform a simulation. In every trial, a number will be picked
randomly for each assumption cell, from within the range specified and with
the distribution specified. From these assumptions, the forecast cell will be
calculated and the result graphically plotted, producing a graph that
shows each result achieved in the simulation and the probability of
reaching that result.
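The same assumption-cell/forecast-cell loop can be sketched without Crystal Ball using plain Python. All ranges and parameters below are illustrative assumptions standing in for the model's actual figures, which the text does not fully list:

```python
import random

random.seed(42)  # reproducible simulation

def simulate_profit():
    # Assumption cells (all figures are illustrative, not the original model's)
    research   = random.uniform(1_300_000, 2_300_000)  # uniform, as in the text
    production = random.uniform(2.0, 4.0)              # cost per unit
    shipping   = random.uniform(0.5, 1.5)              # cost per unit
    wages      = random.uniform(900_000, 1_100_000)
    marketing  = random.gauss(1_000_000, 150_000)      # normal, as in the text
    units      = random.uniform(300_000, 450_000)      # units sold
    price      = random.uniform(15.0, 20.0)            # unit price
    # Forecast cell: PROFIT & LOSS calculated from the assumptions
    revenue = units * price
    costs = research + wages + marketing + units * (production + shipping)
    return revenue - costs

trials = [simulate_profit() for _ in range(100_000)]

# Certainty of achieving a profit: fraction of trials above 0
certainty = sum(t > 0 for t in trials) / len(trials)
print(f"P(profit > 0) = {certainty:.2%}")
```

A histogram of `trials` would reproduce the forecast chart described next; the certainty figure depends entirely on the assumed ranges.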
This chart shows the PROFIT & LOSS forecast achieved in the 100,000 trials
specified. On the horizontal axis we see the PROFIT & LOSS expected; on
the vertical axis we see the probability with which a certain PROFIT & LOSS
is expected. At the bottom of the window, we are shown that with 100%
certainty the total will lie between a minimum of negative infinity and
a maximum of infinity. If we want to know the probability of
achieving a profit, we can change the lower bound of our range to 0. From
this diagram we can read that the certainty of achieving a profit is 25.25%.
It is now not likely that this product will be produced.
From this sensitivity chart, we can see that the number of units sold has the
greatest effect on our business model and that our production costs will
have a relatively small effect on our final PROFIT & LOSS.