Distribution and Loss Functions: Alex Robinson

March 9, 2018
Probability Distributions

• A probability distribution links the outcomes of some random
event to the probability that each outcome occurs.
• They come in two flavors: discrete and continuous.
• Discrete distributions are used when the outcomes can be listed
one by one (a finite or countably infinite set). For example,
rolling a six-sided die.
• When we roll a six-sided die, there are six outcomes: rolling a
1, 2, 3, 4, 5, or 6.
Probability Distributions cont.

• Continuous distributions are used when we have an infinite
number of outcomes that cannot be listed one by one.
• For example, imagine randomly choosing any real number
(whole numbers, fractions, irrational numbers included)
between 1 and 10.
• There is an equal chance that any given number will be
chosen. However, there are infinitely many possible numbers.
• So, the probability that any one number (say, 8.47563) will be
chosen is vanishingly small.
Discrete Distributions: Example
• Suppose we randomly chose 20 integers between 1 and 10,
with the frequency that each integer was chosen in the table
below:

Outcome   Frequency
1         1
2         2
3         2
4         1
5         4
6         1
7         3
8         2
9         1
10        3
Discrete Distributions: Example
• Treat each of the 20 draws as equally likely. Since we chose
20 numbers, this means that if a given number appears n times,
the probability assigned to that number is n/20. Adding to our
table, we get:

Outcome   Frequency   Probability
1         1           0.05
2         2           0.1
3         2           0.1
4         1           0.05
5         4           0.2
6         1           0.05
7         3           0.15
8         2           0.1
9         1           0.05
10        3           0.15
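
The probability column can be reproduced with a few lines of Python. This is a minimal sketch; the names `frequency`, `total_draws`, and `pmf` are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Frequencies from the example table: outcome -> number of times drawn.
frequency = {1: 1, 2: 2, 3: 2, 4: 1, 5: 4, 6: 1, 7: 3, 8: 2, 9: 1, 10: 3}

total_draws = sum(frequency.values())  # 20 draws in the example

# Empirical probability of each outcome: count / total number of draws.
# Fractions keep the arithmetic exact; convert to float only for display.
pmf = {outcome: Fraction(count, total_draws)
       for outcome, count in frequency.items()}

for outcome, p in pmf.items():
    print(outcome, frequency[outcome], float(p))
```

Because every draw is counted exactly once, the probabilities add up to 1.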
Discrete Distribution Example

• Suppose we place the outcomes from our example into 5 equal
bins - 1 and 2, 3 and 4, 5 and 6, and so on. How do we find the
probability that an outcome is in one of these bins?
• For bin one, it is Pr(1) + Pr(2) = 0.05 + 0.1 = 0.15. More
generally, if X is our random event, and we want to know the
probability that it will fall into some set of outcomes, say
{1, 2, . . . , n}, then the formula is:

$$\Pr(X \in \{1, 2, \ldots, n\}) = \sum_{i=1}^{n} \Pr(X = i)$$

This is the probability mass of the distribution for the
particular set.
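
As a rough sketch of the binning step, the probability of a set is just the sum of the masses of its members. The helper name `prob_in_set` and the tuple representation of bins are assumptions made for illustration.

```python
# Probabilities from the example table: outcome -> Pr(X = outcome).
pmf = {1: 0.05, 2: 0.10, 3: 0.10, 4: 0.05, 5: 0.20,
       6: 0.05, 7: 0.15, 8: 0.10, 9: 0.05, 10: 0.15}


def prob_in_set(pmf, outcomes):
    """Pr(X in outcomes): add up the mass of each outcome in the set."""
    return sum(pmf.get(i, 0.0) for i in outcomes)


# The five equal bins from the example.
bins = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
bin_probs = {b: prob_in_set(pmf, b) for b in bins}

for b, p in bin_probs.items():
    print(b, round(p, 2))
```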
The Probability Mass Function
• If we graph the probability masses of all possible outcomes,
we get the following:

[Figure: graph of the probability masses]

This is the graph of the Probability Mass Function (PMF) for
our example distribution.
The Cumulative Distribution Function

• Related to the PMF is the cumulative distribution function
(CDF).
• The CDF measures the cumulative probability of the
distribution - starting at 0 and ending at a total of 1.
• If we want to know Pr(X < 7), we use the CDF:

Pr(X < 7) = Pr(X = 1, 2) + Pr(X = 3, 4) + Pr(X = 5, 6)

Pr(X < 7) = 0.15 + 0.15 + 0.25 = 0.55

More generally:
$$\Pr(X < x) = \sum_{i < x} \Pr(X = i)$$
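
The running total can be sketched in Python with `itertools.accumulate`. One assumption to note: the dictionary `cdf` below stores Pr(X ≤ x), so Pr(X < 7) is read off as the entry for 6. Variable names are illustrative.

```python
from itertools import accumulate

outcomes = list(range(1, 11))
probs = [0.05, 0.10, 0.10, 0.05, 0.20, 0.05, 0.15, 0.10, 0.05, 0.15]

# cdf[x] = Pr(X <= x): cumulative sum of the probability masses.
cdf = dict(zip(outcomes, accumulate(probs)))

# For integer outcomes, Pr(X < 7) is the same as Pr(X <= 6).
pr_less_than_7 = cdf[6]
print(round(pr_less_than_7, 2))
```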
Graphing the CDF

If we graph the CDF of our example distribution we get:

[Figure: graph of the CDF]

Notice that the CDF ranges from 0 to 1 and never decreases.

Expected Value
• The expected value of a random event X is the value that
we "expect" X to take on average over many repetitions.
• For discrete distributions, the expected value is a weighted
average of the outcomes - with the associated probabilities as
the weights.
• The expected value of our example distribution is

E(X) = 1(0.05) + 2(0.1) + 3(0.1) + 4(0.05) + 5(0.2)
     + 6(0.05) + 7(0.15) + 8(0.1) + 9(0.05) + 10(0.15)
     = 5.85

• For many continuous distributions, the expected value of X is
given as the mean parameter.
• For example, if X is normally distributed with a mean (µ) of
1, then E(X) = 1.
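
For the discrete example, the weighted average can be checked with a short Python sketch. The dictionary `pmf` restates the example probabilities, and the function name `expected_value` is illustrative.

```python
# Probabilities from the example table: outcome -> Pr(X = outcome).
pmf = {1: 0.05, 2: 0.10, 3: 0.10, 4: 0.05, 5: 0.20,
       6: 0.05, 7: 0.15, 8: 0.10, 9: 0.05, 10: 0.15}


def expected_value(pmf):
    """Weighted average of the outcomes, with probabilities as weights."""
    return sum(outcome * p for outcome, p in pmf.items())


print(round(expected_value(pmf), 2))
```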
Loss Functions
• Suppose we have a parameter, Q, that we do not want the
quantity measured by our example distribution (X) to exceed.
If we were measuring sales, Q could be our capacity.
• It would be useful to know by how much we expect X to
exceed Q.
• If X has outcomes 1, . . . , n (imagine an n-sided die), with
p_i denoting the probability of outcome i, then the general
formula is as follows:

$$\begin{aligned}
E(\max(X - Q, 0)) &= \sum_{i=1}^{n} p_i \max(i - Q, 0) \\
                  &= \sum_{i \ge Q} p_i (i - Q)
\end{aligned}$$

We need the max(X − Q, 0) since we only want to consider
situations where X exceeds Q.
Loss Functions - Example
• Recall our example discrete distribution from earlier.
• Suppose our Q is 7. We want to know by how much we
expect X to exceed 7.
• Using our formula from the previous slide:

$$\begin{aligned}
E(\max(X - 7, 0)) &= \sum_{i=1}^{10} p_i \max(i - 7, 0) \\
                  &= \sum_{i \ge 7} p_i (i - 7) \\
                  &= p_7 (7 - 7) + p_8 (8 - 7) + p_9 (9 - 7) + p_{10} (10 - 7) \\
                  &= 1(0.1) + 2(0.05) + 3(0.15) \\
                  &= 0.65
\end{aligned}$$
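
The same calculation can be sketched in Python. The function name `expected_loss` is an assumption for illustration; it applies the max(i − Q, 0) form directly, so outcomes at or below Q contribute nothing.

```python
# Probabilities from the example table: outcome -> Pr(X = outcome).
pmf = {1: 0.05, 2: 0.10, 3: 0.10, 4: 0.05, 5: 0.20,
       6: 0.05, 7: 0.15, 8: 0.10, 9: 0.05, 10: 0.15}


def expected_loss(pmf, q):
    """E(max(X - q, 0)): expected amount by which X exceeds q."""
    return sum(p * max(outcome - q, 0) for outcome, p in pmf.items())


print(round(expected_loss(pmf, 7), 2))
```

Setting Q at or above the largest outcome gives an expected loss of 0, and setting Q = 0 recovers the expected value of X, which makes a quick sanity check.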
Continuous Distributions

• Recall that continuous distributions have an infinite number of
outcomes.
• The simplest continuous distribution is the Uniform
distribution.
• The uniform distribution assigns equal probability to every
outcome in a given interval. So a uniform distribution
between 0 and 10 assigns the same probability to every
number in between 0 and 10, and 0 probability elsewhere.
• Another common continuous distribution is the Normal
distribution. This distribution assigns higher probabilities to
outcomes near the mean - given as a parameter - and lower
probabilities the further away from the mean you get.
Continuous Distributions cont.

• In a uniform distribution, if we assign equal probability to
every number between 0 and 10 being chosen, then the
probability of any one number must be tiny! After all, there
are too many numbers between 0 and 10 to count!
• In fact, if X is a continuous random event, and x is an
outcome, then Pr(X = x) = 0.
• Instead, for continuous distributions (not just uniform), we
think in terms of intervals. So, we might ask what the
probability is that X falls between a and b.
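
A small Python sketch illustrates the interval view for the uniform distribution between 0 and 10. It assumes the standard result that the probability of an interval is its length divided by the length of the whole range; the function name `uniform_interval_prob` is illustrative.

```python
def uniform_interval_prob(a, b, low=0.0, high=10.0):
    """Pr(a <= X <= b) for X uniform on [low, high]."""
    left = max(a, low)      # clip the interval to the range of X
    right = min(b, high)
    overlap = max(right - left, 0.0)
    return overlap / (high - low)


# Shrinking an interval around a single point drives its probability to 0.
x = 8.47563
for half_width in (1.0, 0.1, 0.001, 0.0):
    print(half_width, uniform_interval_prob(x - half_width, x + half_width))
```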
The Probability Density Function

• For discrete distributions, we were able to construct the PMF
by dividing outcomes into bins and summing up the probabilities.
• Now, we divide the outcomes into infinitely many, infinitely
small bins.
• The result is a smoothed version of the PMF - called a
probability density function.
• The height of the PDF at a point is not itself a probability.
It shows how likely outcomes near that point are relative to
outcomes elsewhere.
Example PDFs
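The PDF for a uniform distribution between a and b is:

$$f_X(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

For the normal distribution with a mean of 0 and standard
deviation of 1 it is:

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$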
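
As a sketch, the standard textbook densities for these two distributions can be written in Python as follows. The function names are illustrative, and the formulas are the usual ones for the uniform and standard normal distributions rather than anything specific to these slides.

```python
import math


def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]: flat inside, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def standard_normal_pdf(x):
    """Density of the normal distribution with mean 0 and standard deviation 1."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


print(uniform_pdf(5.0, 0.0, 10.0))
print(standard_normal_pdf(0.0))
```

Note that the uniform density on a short interval can exceed 1 (for example, on [0, 0.5] it is 2), which is one way to see that a density is not a probability.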
More PDFs

• The real use of PDFs is to figure out Pr(a ≤ X ≤ b) - the
probability that a random event falls within a certain interval.
• Graphically, this is the area under the PDF between the ends
of the interval, a and b (on the x axis).
• Mathematically, this is an integral:

$$\Pr(a \le X \le b) = \int_a^b f_X(x)\,dx$$

where f_X(x) gives the probability density at x.
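
To make the "area under the PDF" idea concrete, here is a minimal Python sketch that approximates the integral with the midpoint rule, which is the "many small bins" picture with a finite number of bins. The helper `integrate` and the choice of 10,000 bins are assumptions for illustration; the comparison value uses the known identity Pr(−1 ≤ Z ≤ 1) = erf(1/√2) for a standard normal Z.

```python
import math


def standard_normal_pdf(x):
    """Density of the normal distribution with mean 0 and standard deviation 1."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


# Pr(-1 <= X <= 1) for a standard normal X, two ways.
approx = integrate(standard_normal_pdf, -1.0, 1.0)
exact = math.erf(1.0 / math.sqrt(2.0))
print(approx, exact)
```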

Continuous CDFs
• For discrete distributions we were able to use the PMF to find
the cumulative probability of the distribution in the form of
the CDF.
• The same is possible for continuous distributions - we can use
the PDF to find a "running" total of probability for the
distribution (also called the CDF).
• Just as for PDFs we thought of adding up an infinite number
of infinitely small bins, we do the same for the CDF.
• Now, instead of just adding up each bin, we take a cumulative
sum over the whole distribution.
• Mathematically, this is once again an integral:

$$\Pr(X \le x) = F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$$
Examples of Continuous CDFs
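The CDF for a uniform distribution between a and b is:

$$F_X(x) = \begin{cases} 0 & x < a \\ \dfrac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$$

The CDF for the normal distribution with a mean of 0 and
standard deviation of 1 is:

$$F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$$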
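
Both CDFs can be sketched in Python using only the standard library. The function names are illustrative. The normal CDF has no elementary closed form, so this sketch relies on the standard identity that expresses it through the error function `math.erf`.

```python
import math


def uniform_cdf(x, a, b):
    """Pr(X <= x) for X uniform on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def standard_normal_cdf(x):
    """Pr(X <= x) for X normal with mean 0 and standard deviation 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(uniform_cdf(2.5, 0.0, 10.0))
print(standard_normal_cdf(0.0))
```

An interval probability then follows by subtraction: Pr(a ≤ X ≤ b) = F_X(b) − F_X(a).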
Continuous Loss Functions

• We can also define loss functions for continuous distributions.

• Recall that a loss function tells us E(max(X − Q, 0)) for a
continuous random event X and some parameter Q.
• Just like the PDF and CDF, the continuous loss function
formula has an integral rather than a sum:

$$E(\max(X - Q, 0)) = \int_{Q}^{\infty} f_X(x)(x - Q)\,dx$$

• Rather than calculating this by hand, common continuous loss
functions are often evaluated using tables.
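
As an alternative to a table lookup, the loss integral can be approximated numerically. The sketch below does this for the standard normal distribution and compares the result with the well-known closed form φ(Q) − Q(1 − Φ(Q)), where φ is the standard normal PDF and Φ its CDF. The function names, the upper cutoff of 10, and the number of bins are all assumptions chosen for illustration, and the numeric version only makes sense for Q below the cutoff.

```python
import math


def standard_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_loss_closed_form(q):
    """E(max(X - q, 0)) for a standard normal X, via the closed form."""
    return standard_normal_pdf(q) - q * (1.0 - standard_normal_cdf(q))


def normal_loss_numeric(q, upper=10.0, steps=50_000):
    """Midpoint-rule approximation of the loss integral from q to `upper`.

    The infinite upper limit is replaced by `upper`; the density beyond
    10 standard deviations is negligible for this illustration.
    """
    width = (upper - q) / steps
    total = 0.0
    for k in range(steps):
        x = q + (k + 0.5) * width
        total += standard_normal_pdf(x) * (x - q)
    return total * width


q = 0.5
print(normal_loss_closed_form(q), normal_loss_numeric(q))
```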

