Basics of Statistics


Review of Statistics:

What do we need to consider in the decision-making process?

In statistics we collect data, then we summarize the data and take decisions based on the summary and,
sometimes, on further analysis.

→Mean

→Variance & standard deviation

Mean: The mean represents the central tendency of the data values. This leads us to the notion of “random
variables.”

Suppose you want to invest money in some venture. This is the only asset you have. If you lose it you will
be out on the street.

A 15 % B 50%

It is guaranteed. Here there is a 60% chance you will lose it all.

You are very rich. You can easily absorb the loss.

µ (mu) = population mean.


Random variables model the uncertainties that we face before taking a decision, which is very much the
case in business.

The random variables can be qualitative or quantitative. We are going to study quantitative random
variables.

Quantitative variables are of two types: discrete variables and continuous variables.

Discrete Variable:

Discrete random variables assume specific values on the number line, which can be fractional or
whole numbers. The values can be arranged in an ordered array, and they are countable.

Continuous Variable:

A continuous random variable assumes values from a continuous interval on the number line. By its
very nature, there are infinitely many values that it can assume.

Difference between continuous and discrete variables:


Aspects/Nature of random variable:

Whatever kind of random variable we are facing, we need to know two aspects of its nature.

• The set of values it can assume. This is called the sample space and is denoted by S.
• An idea of the likelihood of each value.

Both pieces of information are included in what we call the distribution of the random variable.

[0,10] means x ranges from 0 to 10. Both 0 and 10 are included.

(0,10) means x ranges from 0 to 10 but 0 & 10 are not included

(0,10] means x ranges from 0 to 10; 0 is not included but 10 is.

{1, 1.2, 2, 4} means x can take only these 4 values.


How do we write distribution?

This is an example of a tabular distribution. Here f(x) is the probability mass function. In the case of a discrete
distribution, the sum of the probability masses must always be exactly 1.0.

Example: Casting a fair or unbiased six-sided die

Event: Each value the random variable can take can be called an event. A condition such as x<3 is also an event.
Here 1, 1.5 & 2 are less than 3.

The probability of an event is equal to the sum of the probabilities of the outcomes that define the event.

P(x<3)=0.1+0.2+0.15=0.45
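The same calculation can be sketched in Python. This is a minimal illustration, assuming the tabular distribution used in the Excel example of these notes; the names `pmf` and `event_probability` are made up for this sketch.

```python
# Tabular distribution from these notes: value x -> probability mass f(x).
pmf = {1: 0.1, 1.5: 0.2, 2: 0.15, 3: 0.05, 3.2: 0.2, 4: 0.3}


def event_probability(pmf, event):
    """Sum the probabilities of the outcomes that satisfy the event."""
    return sum(p for x, p in pmf.items() if event(x))


# For a valid discrete distribution the masses add up to 1.
total = sum(pmf.values())

# The event x < 3 is defined by the outcomes 1, 1.5 and 2.
p_less_than_3 = event_probability(pmf, lambda x: x < 3)

print(round(total, 10), round(p_less_than_3, 10))
```

Passing the event as a function keeps the helper reusable for other events, such as x >= 3.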

Information of a distribution:

There are two pieces of summarized information we always need to know about any distribution.

1. Mean
2. Standard deviation

Expected value of X: E(X) = ∑ xi·f(xi)

Standard Deviation: SD = √Variance, where Variance = ∑ f(xi)·(xi − E(X))^2
Using Excel:

x      f(x)    x*f(x)    x - E(X)    (x - E(X))^2    f(x)*(x - E(X))^2
1      0.1     0.1       -1.69       2.8561          0.28561
1.5    0.2     0.3       -1.19       1.4161          0.28322
2      0.15    0.3       -0.69       0.4761          0.071415
3      0.05    0.15       0.31       0.0961          0.004805
3.2    0.2     0.64       0.51       0.2601          0.05202
4      0.3     1.2        1.31       1.7161          0.51483
Sum            2.69                                  1.2119

E(X) = 2.69
Variance = 1.2119
SD = √1.2119 = 1.100863
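The spreadsheet calculation above can also be reproduced in a few lines of Python. This is only a sketch; the variable names are illustrative and the data are the x and f(x) columns of the table.

```python
import math

# x -> f(x), taken from the first two columns of the table.
pmf = {1: 0.1, 1.5: 0.2, 2: 0.15, 3: 0.05, 3.2: 0.2, 4: 0.3}

# E(X): sum of x * f(x) over all values.
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of f(x) * (x - E(X))^2 over all values.
variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(sd, 6))
```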
Cumulative Probability:
We are mostly going to work with the normal distribution.

• It’s a continuous distribution
• It is bell shaped
• It’s perfectly symmetric about the mean.

From the curve we see that σ1 < σ2

How to state a normal distribution:

x ~ N(µ, σ)

It means x is distributed normally with mean µ and standard deviation σ.

x ~ N(100, 25)

x is distributed normally with a mean of 100 and standard deviation of 25.

Two questions:
1. How do we compute or determine probabilities associated with a normal distribution?
2. If we already know the probability, how do we find the corresponding value of the random variable?
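Both questions can be sketched with Python's standard library class `statistics.NormalDist`, using the N(100, 25) example from above. The cut-off value 125 and the probability 0.90 are arbitrary choices for illustration, not values from these notes.

```python
from statistics import NormalDist

# x ~ N(100, 25): mean 100, standard deviation 25.
x = NormalDist(mu=100, sigma=25)

# Question 1: probability that x is at most 125 (cumulative probability).
p_at_most_125 = x.cdf(125)

# Question 2: the value below which 90% of the distribution lies.
value_at_90_percent = x.inv_cdf(0.90)

print(round(p_at_most_125, 4), round(value_at_90_percent, 2))
```

`cdf` goes from a value to a probability, and `inv_cdf` goes from a probability back to a value.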

Remember that in a continuous distribution, f(x) is a mathematical function called the probability density
function. For a discrete distribution it is called the probability mass function.

The curve is called the probability density function (PDF) curve.

How do we calculate probability?

In a continuous distribution, P(x=a)=0, where a is a specific value from the sample space.

For example:

P(x=18)=0

P(x=12)=0

(because a continuous distribution has infinitely many possible values)

P(x=a) is read as the probability that x is exactly equal to a.

The sum of the probabilities of all outcomes is equal to 1.

In case of continuous distribution we normally calculate the probability of intervals.
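As a rough sketch of an interval probability, the cumulative probability at the lower end can be subtracted from the cumulative probability at the upper end. The helper name `interval_probability` and the interval 75 to 125 are assumptions made for this illustration; the distribution is the N(100, 25) example.

```python
from statistics import NormalDist


def interval_probability(dist, a, b):
    """P(a < x < b) as the difference of two cumulative probabilities."""
    return dist.cdf(b) - dist.cdf(a)


x = NormalDist(mu=100, sigma=25)

# Probability that x falls within one SD of the mean.
p_within_one_sd = interval_probability(x, 75, 125)

# An interval of zero width has zero probability, matching P(x=a)=0.
p_single_point = interval_probability(x, 18, 18)

print(round(p_within_one_sd, 4), p_single_point)
```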


Z score:

It is the distance of a value of the random variable from the mean, measured in units of SD: Z = (x − µ) / σ

Standard Normal Distribution:

We can find Z(x) for any value x of a distribution.

In the case of a normal distribution, the Z(x) associated with it is

• Always normally distributed
• It always has a mean of zero
• It always has an SD of one

Z is called the standard normal distribution. All other normal distributions are arbitrary, but they
can be converted to the standard normal distribution.
The standard normal distribution has a mean of 0 and an SD of 1.
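A small sketch of this conversion, assuming the N(100, 25) example and an arbitrary value x = 150 chosen only for illustration: the Z score is computed first, and the cumulative probability is then the same whether it is read from the original distribution or from the standard normal distribution.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Distance of x from the mean, in units of standard deviation."""
    return (x - mu) / sigma


mu, sigma = 100, 25
x = 150

z = z_score(x, mu, sigma)

# Cumulative probability from the original distribution ...
p_original = NormalDist(mu, sigma).cdf(x)
# ... and from the standard normal distribution N(0, 1) at z.
p_standard = NormalDist(0, 1).cdf(z)

print(z, round(p_original, 4), round(p_standard, 4))
```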

Sum of independent random variables:


Consider three processes: the first is the admission application, the second is the summary of applications, and the last is
decision making.

What is the distribution of the total production time per unit, T?

Another question:

Labor cost: for P1, W1=$20/hour; for P2, W2=$40/hour; for P3, W3=$60/hour

What is the distribution of per unit labor cost?

Here we are treating the process times as independent random variables.

How do we determine that two random variables are independent?

For example, consider exploring the relationship between drinking coffee and GPA.

Consider X = number of cups of coffee taken per day

Y = GPA.

Cov(x, y) = [∑ (xi−µx)(yi−µy)] / n

The covariance will be zero if the two variables are independent; otherwise it will generally
have a positive or negative value.

Positive value means direct relationship between x and y, and negative value means inverse
relationship.
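The covariance formula can be sketched as a short Python function. The coffee and GPA numbers below are hypothetical, invented purely to show the sign of the result; they are not data from these notes.

```python
def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical observations (illustration only).
cups_per_day = [0, 1, 2, 3, 4]
gpa_rising = [3.0, 3.1, 3.2, 3.3, 3.4]    # moves with the number of cups
gpa_falling = [3.4, 3.3, 3.2, 3.1, 3.0]   # moves against the number of cups

cov_direct = covariance(cups_per_day, gpa_rising)
cov_inverse = covariance(cups_per_day, gpa_falling)

print(round(cov_direct, 4), round(cov_inverse, 4))
```

A positive result corresponds to the direct relationship described above, and a negative result to the inverse relationship.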

Under the assumption of independence we get the following results.

• T is normally distributed
• The mean or expected value of T, E(T)=E(t1)+E(t2)+E(t3)
=60+120+180=360 minutes

As T is the sum of the random variables t1, t2 & t3, the mean or expected value of T is the sum of the means
of t1, t2 & t3.

Now we have to determine standard deviation but for calculating SD we need to calculate variance first.

Var(T)=Var(t1)+Var(t2)+Var(t3)

=15^2+30^2+60^2=225+900+3600=4725 min^2
Standard deviation σT = √Var(T) = √(4725 min^2) = 68.74 min

The answer is T ~ N(360, 68.74).

Now what is the chance that a product will take more than 7 hours to produce?

Now 7 hours=420 mins

The question is P(T > 420 min)

=1-NORM.DIST(420,360,68.74,1)

=0.191371
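The whole calculation for T can be cross-checked outside Excel. The sketch below assumes the means (60, 120, 180 minutes) and standard deviations (15, 30, 60 minutes) used in the sums above, and uses `statistics.NormalDist` in place of NORM.DIST.

```python
import math
from statistics import NormalDist

# Mean and SD of the three process times, in minutes.
means = [60, 120, 180]
sds = [15, 30, 60]

# For independent variables, means add and variances add.
mean_T = sum(means)
var_T = sum(s ** 2 for s in sds)
sd_T = math.sqrt(var_T)

# T is normally distributed with this mean and SD.
T = NormalDist(mu=mean_T, sigma=sd_T)

# P(T > 420) = 1 - P(T <= 420).
p_over_7_hours = 1 - T.cdf(420)

print(mean_T, var_T, round(sd_T, 2), round(p_over_7_hours, 4))
```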

Note:

To calculate the probability that T is less than a value, use NORM.DIST.

For the probability that T is more than a value, use 1-NORM.DIST.

Solution for distribution of labor cost:

W=labor cost at P1+Labor cost at P2+Labor cost at P3

= W1*(t1/60) + W2*(t2/60) + W3*(t3/60)
= 20*(t1/60) + 40*(t2/60) + 60*(t3/60)
= (1/3)*t1 + (2/3)*t2 + t3
Results (assuming independence)

1. W is normally distributed
2. E(W)=(1/3)E(t1)+(2/3)E(t2)+E(t3)
=(60/3)+(2*120/3)+180
=20+80+180=$280

Now we need to calculate the standard deviation, and for this we need to calculate the
variance first.
Var(W)=((1/3)^2)*Var(t1)+((2/3)^2)*Var(t2)+Var(t3)

=((1/3)*15)^2+((2/3)*30)^2+(60)^2

=(5)^2+(20)^2+(60)^2

=25+400+3600

=4025

So SD σW=√4025=$63.44

So the answer is W ~ N(280, 63.44).

Summarizing Theory:

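Let Y = a1*x1 + a2*x2 + a3*x3, where each xi is normally distributed, the xi are independent, and ai is the coefficient of xi.

Then the following results hold

1. Y is normally distributed
2. E(Y)=a1*E(x1)+a2*E(x2)+a3*E(x3)
3. Var(Y)=(a1σx1)^2+(a2σx2)^2+(a3σx3)^2

The same pattern holds for every value of i.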
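As a sketch of this summary, a small helper can return the mean and SD of a weighted sum of independent normal variables. The function name `linear_combination` is an assumption made for this illustration; it is applied to the labor cost example with coefficients 1/3, 2/3 and 1.

```python
import math


def linear_combination(coeffs, means, sds):
    """Mean and SD of Y = sum(a_i * x_i) for independent x_i."""
    mean_y = sum(a * m for a, m in zip(coeffs, means))
    var_y = sum((a * s) ** 2 for a, s in zip(coeffs, sds))
    return mean_y, math.sqrt(var_y)


# Labor cost W = (1/3)*t1 + (2/3)*t2 + t3, with times in minutes.
mean_W, sd_W = linear_combination(
    coeffs=[1 / 3, 2 / 3, 1],
    means=[60, 120, 180],
    sds=[15, 30, 60],
)

print(round(mean_W, 2), round(sd_W, 2))
```

Setting every coefficient to 1 gives the plain sum used for the total time T.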
