
1

Business
Analytics
2

Objectives:
• Statistical learning, including quantitative and qualitative analysis techniques
• Predictive analytics using linear, polynomial and logistic regression techniques, and model comparison
• The use of the above analysis and visualization to aid decision making

3

Content:
• Business Analytics - Introduction
• Statistical Methods for Business Analytics
• Basics of Hypothesis Testing
• Correlation and Regression
• Multiple Linear Regression
• Model Comparison and Performance
• Classification
• Time Series Analysis
4

Till now…
5

All Together:
Mode: 24
Five-number summary (min, Q1, median, Q3, max): 14, 21, 23, 25, 30
SD: 3.235
Skewness: -0.329
N: 75
Mean: 22.8
Median: 23
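A summary like this can be reproduced from raw data. The sketch below is a minimal Python version, assuming the raw values are available as a list (they are not given on the slide). The helper name `describe` is hypothetical; the quartiles use Python's "inclusive" method, which is intended to match Excel's QUARTILE.INC, and the skewness uses the adjusted Fisher-Pearson formula that Excel documents for SKEW.

```python
import statistics


def describe(data):
    """Summary statistics like those on the 'All Together' slide (hypothetical helper)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    # 'inclusive' quartiles interpolate between order statistics (QUARTILE.INC style)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Adjusted Fisher-Pearson skewness, the formula Excel documents for SKEW
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)
    return {
        "N": n,
        "Mean": mean,
        "Median": median,
        "Mode": statistics.mode(data),  # first mode encountered if there are ties
        "Five numbers": (min(data), q1, median, q3, max(data)),
        "SD": sd,
        "Skewness": skew,
    }


print(describe([1, 2, 2, 3, 4]))
```

A positive skewness means a longer right tail; the slide's -0.329 indicates a slight left skew, consistent with the mean (22.8) sitting just below the median (23).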
6

Introduction
• What is Business Analytics
• Types of Data Analysis: Descriptive, Predictive and Prescriptive
• Big Data Analytics – Volume, Velocity, Variety
• Data Mining
• Data Visualization
• Data Analytics Lifecycle
• Business Intelligence vs. Data Science
7

Summary
1) Functions and Their Variables
2) Statistical Learning
3) Estimating the Function f
4) Purpose of Estimating f:
• Inferences
• Predictions
5) Methods to Estimate f
• Parametric
• Non-parametric
6) Prediction Accuracy vs Interpretability
7) Supervised vs Unsupervised Learning
8

Interval Estimates
9
10
11
12
13

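Sampling Distribution

[Figure: sampling distribution of \( \bar{x} \). The central area \( 1-\alpha \) of all \( \bar{x} \) values lies between \( \mu - z_{\alpha/2}\sigma_{\bar{x}} \) and \( \mu + z_{\alpha/2}\sigma_{\bar{x}} \), with area \( \alpha/2 \) in each tail.]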
14

Margin of Error
A point estimator cannot be expected to provide the exact value of the population parameter.

An interval estimate can be computed by adding a margin of error to, and subtracting it from, the point estimate:
Point Estimate ± Margin of Error
\( \bar{x} \) ± Margin of Error

The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
15

Interval Estimate - Population Known

Sampling
distribution
of x

/2 1 -  of all /2


x values

x 
z /2  x z /2  x
16

Interval Estimate - Population Known


Sampling
distribution
of x
1 -  of all
/2 /2
x values
interval
does not x
 interval
include  z /2  x z /2  x includes 
[------------------------- x -------------------------]
[------------------------- x -------------------------]
[------------------------- x -------------------------]
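The figure's point, that about \( 1-\alpha \) of all intervals built this way contain \( \mu \), can be checked by simulation. This is a minimal sketch with made-up population values (μ = 50, σ = 10, n = 36, all assumptions); the function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=50.0, sigma=10.0, n=36, conf=0.95, reps=2000, seed=1):
    """Fraction of intervals xbar ± z_{alpha/2} * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * sigma / n ** 0.5  # margin of error, identical for every sample
    hits = 0
    for _ in range(reps):
        # draw one sample of size n from a normal population and take its mean
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


print(coverage())
```

The printed fraction should sit close to 0.95; any single interval either contains μ or it does not, and the 95% describes the long-run share of intervals that do.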
17

Interval Estimate - Population Known


Interval Estimate of

x  z /2
n

where: x is the sample mean


1 - is the confidence coefficient
z/2 is the z value providing an area of
/2 in the upper tail of the standard
normal probability distribution
 is the population standard deviation
n is the sample size
18

Interval Estimate - Population Known


Discount Sounds has retail outlets throughout the United
States. The firm is evaluating a potential location for a new
outlet, based in part, on the mean annual income of the
individuals in the marketing area of the new location.
A sample of size n = 36 was taken; the sample mean income is
$41,100. The population is not believed to be highly skewed.
The population standard deviation is estimated to be $4,500,
and the confidence coefficient to be used in the interval
estimate is 0.95.

$41,100 + $1,470
or
$39,630 to $42,570
19

Command:
CONFIDENCE.NORM(alpha, standard_dev, size)
• Alpha: Required. The significance level used to compute the confidence level. The confidence level equals 100*(1 - alpha)%; in other words, an alpha of 0.05 indicates a 95 percent confidence level.
• Standard_dev: Required. The population standard deviation for the data range; it is assumed to be known.
• Size: Required. The sample size.

Sample Average: 41100
Sample Size: 36
Population Std Deviation: 4500
At 95% Level of Confidence
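A Python sketch of the same calculation, assuming CONFIDENCE.NORM returns the margin of error \( z_{\alpha/2}\,\sigma/\sqrt{n} \), which is consistent with the interval being written as x̄ ± CONFIDENCE.NORM. The helper name `confidence_norm` is hypothetical; `statistics.NormalDist().inv_cdf` supplies \( z_{\alpha/2} \).

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error z_{alpha/2} * sigma / sqrt(n), mirroring CONFIDENCE.NORM."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper-tail critical value z_{alpha/2}
    return z * standard_dev / sqrt(size)


xbar = 41100  # Discount Sounds sample mean income
margin = confidence_norm(0.05, 4500, 36)
low, high = xbar - margin, xbar + margin
print(f"margin = {margin:.2f}, interval = {low:.2f} to {high:.2f}")
```

Using the exact \( z_{.025} \approx 1.95996 \) instead of the rounded 1.96 changes the margin by only a few cents, so it still rounds to the slide's $1,470.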
20


Interval Estimate is
\( \bar{x} \) ± CONFIDENCE.NORM(alpha, standard_dev, size)
21

Interval Estimate: Population Unknown


Interval Estimate
s
x  t /2
n

where: 1 - = the confidence coefficient


t/2 = the t value providing an area of /2
in the upper tail of a t distribution
with n - 1 degrees of freedom
s = the sample standard deviation
22

Interval Estimate: Population Unknown


A reporter for a student newspaper is writing an article on the cost of off-
campus housing. A sample of 16 one-bedroom apartments within a half-
mile of campus resulted in a sample mean of $750 per month and a sample
standard deviation of $55.

Let us provide a 95% confidence interval estimate of the mean rent per
month for the population of one- bedroom efficiency apartments within a
half-mile of campus. We will assume this population to be normally
distributed.

At 95% confidence,  = 0.05, and /2 = 0.025.


t0.025 is based on n - 1 = 15 degrees of freedom.
In the t distribution table we see that t0.025 = 2.131.
23

Interval Estimate: Population Unknown


24

Interval Estimate: Population Unknown


Interval Estimate
s
x  t.025 Margin
n of Error
55
750  2.131  750  29.30
16

We are 95% confident that the mean rent per month


for the population of one-bedroom apartments within
a half-mile of campus is between $720.70 and $779.30.
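The t-based interval can be sketched the same way. Python's standard library has no t distribution, so this minimal sketch assumes the critical value is read from a t table (here the slide's \( t_{0.025} = 2.131 \) for 15 degrees of freedom); with SciPy installed, `scipy.stats.t.ppf(0.975, 15)` gives the same value. The helper name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """xbar ± t_{alpha/2} * s / sqrt(n); t_crit comes from a t table with n - 1 df."""
    margin = t_crit * s / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin


# Rent example: mean $750, s = $55, n = 16, t_{0.025} with 15 df = 2.131
low, high = t_interval(750, 55, 16, 2.131)
print(f"${low:.2f} to ${high:.2f}")
```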
25

Command:
CONFIDENCE.T(alpha, standard_dev, size)
• Alpha: Required. The significance level used to compute the confidence level. The confidence level equals 100*(1 - alpha)%; in other words, an alpha of 0.05 indicates a 95 percent confidence level.
• Standard_dev: Required. The sample standard deviation for the data range.
• Size: Required. The sample size.

Sample Average: 750
Sample Size: 16
Sample Std Deviation: 55
At 95% Level of Confidence
26

27

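Interval Estimate: Process

Can the population standard deviation σ be assumed known?
• Yes: use σ, and the interval estimate is \( \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \)
• No: use the sample standard deviation s to estimate σ, and the interval estimate is \( \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} \)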
30

Hypothesis Test: Null and Alternative

One-tailed (lower-tail): \( H_0: \mu \ge \mu_0 \), \( H_a: \mu < \mu_0 \)
One-tailed (upper-tail): \( H_0: \mu \le \mu_0 \), \( H_a: \mu > \mu_0 \)
Two-tailed: \( H_0: \mu = \mu_0 \), \( H_a: \mu \ne \mu_0 \)
31

Hypothesis Test: Steps


Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance \( \alpha \).
Step 3. Collect the sample data and compute the value of the test statistic.

p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value < \( \alpha \).
32

Hypothesis Test: Steps


Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance \( \alpha \).
Step 3. Collect the sample data and compute the value of the test statistic.

Critical Value Approach
Step 4. Use the level of significance to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.
33

Hypothesis Test: Exercise


The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis test, with a 0.05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.
H0: μ ≤ 12
Ha: μ > 12
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47 \]
For z = 2.47, cumulative probability = 0.9932.
p-value = 1 − 0.9932 = 0.0068
34

Hypothesis Test: Exercise


H0: μ ≤ 12
Ha: μ > 12
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47 \]
For \( \alpha = .05 \), \( z_{.05} = 1.645 \)
Reject H0 if z > 1.645
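The critical value approach differs only in Step 4: instead of a p-value, it compares z with \( z_{\alpha} \). A minimal sketch with the hypothetical helper `upper_tail_critical`, where `NormalDist().inv_cdf` replaces the table lookup:

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_critical(xbar, mu0, sigma, n, alpha):
    """Critical value approach for H0: mu <= mu0 vs Ha: mu > mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))  # Step 3: test statistic
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # Step 4: critical value z_alpha
    return z, z_alpha, z > z_alpha  # Step 5: reject H0 if z > z_alpha


z, z_alpha, reject = upper_tail_critical(13.25, 12, 3.2, 40, 0.05)
print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
```

Both approaches always reach the same decision, because z > z_α exactly when the upper-tail area beyond z is smaller than α.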
35

Hypothesis Test: Exercise


36

There is sufficient statistical evidence to infer that Metro EMS is not meeting the response goal of 12 minutes.
37

Command:
Z.TEST(array, x, [sigma])

• Array: Required. The array or range of data against which to test x.
• x: Required. The value to test.
• Sigma: Optional. The population (known) standard deviation. If omitted, the sample standard deviation is used.

Sample Average: 13.25
Population Std Deviation: 3.2
At 95% Level of Confidence
Ho: µ ≥ 12
Ha: µ < 12
38

Command:
Z.TEST(array, x, [sigma])
Z.TEST always gives the p-value to the right-hand (upper-tail) side of the sample mean.

Sample Average: 10.75
Population Std Deviation: 3.2
At 95% Level of Confidence
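Microsoft documents Z.TEST as 1 − NORM.S.DIST((AVERAGE(array) − x)/(sigma/√n), TRUE), which is why it always reports the right-hand tail. The sketch below follows that formula in Python. `z_test` is a hypothetical name, and the all-equal arrays are stand-ins whose means match the slides (13.25 and 10.75), not real data.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def z_test(array, x, sigma=None):
    """Upper-tail p-value, following the formula documented for Excel's Z.TEST."""
    if sigma is None:
        sigma = stdev(array)  # sample standard deviation when sigma is omitted
    z = (fmean(array) - x) / (sigma / sqrt(len(array)))
    return 1 - NormalDist().cdf(z)  # area to the right of z


# Stand-in data: 40 response times whose mean is 13.25 (illustration only)
times = [13.25] * 40
p_upper = z_test(times, 12, 3.2)  # use directly for Ha: mu > 12
p_lower = 1 - p_upper  # for Ha: mu < 12, take the complement
print(p_upper, p_lower)
```

For the lower-tail hypotheses Ho: µ ≥ 12, Ha: µ < 12, the relevant p-value is therefore 1 − Z.TEST(...); with a sample mean of 10.75, Z.TEST itself returns a value near 0.99, and its complement is the small lower-tail p-value.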
39

Hypothesis Test: Two-Tailed Hypothesis Test
1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), compute the probability that z is greater than or equal to the value of the test statistic. If z is in the lower tail (z < 0), compute the probability that z is less than or equal to the value of the test statistic.
3. Double the tail area obtained in step 2 to obtain the p-value.
The rejection rule:
Reject H0 if the p-value < \( \alpha \).
\( H_0: \mu = \mu_0 \)
\( H_a: \mu \ne \mu_0 \)
40

Hypothesis Test: Two-Tailed Hypothesis Test
1. Compute the value of the test statistic z.
2. The critical values will occur in both the lower and upper tails of the standard normal curve.
3. Use the standard normal probability distribution table to find \( z_{\alpha/2} \) (the z-value with an area of \( \alpha/2 \) in the upper tail of the distribution).
The rejection rule is:
Reject H0 if \( z < -z_{\alpha/2} \) or \( z > z_{\alpha/2} \).

\( H_0: \mu = \mu_0 \)
\( H_a: \mu \ne \mu_0 \)
41

Hypothesis Test: Exercise


The production line for Glow toothpaste is designed
to fill tubes with a mean weight of 6 oz. Periodically,
a sample of 30 tubes will be selected in order to
check the filling process.
Quality assurance procedures call for the
continuation of the filling process if the sample
results are consistent with the assumption that the
mean filling weight for the population of toothpaste
tubes is 6 oz.; otherwise the process will be adjusted.
Assume that a sample of 30 toothpaste tubes
provides a sample mean of 6.1 oz. The population
standard deviation is believed to be 0.2 oz. Perform
a hypothesis test, at the 0.03 level of significance, to
help determine whether the filling process should
continue operating or be stopped and corrected.
42

Hypothesis Test: Exercise


The production line for Glow toothpaste is designed to fill
tubes with a mean weight of 6 oz.
Assume that a sample of 30 toothpaste tubes provides a sample
mean of 6.1 oz. The population standard deviation is believed
to be 0.2 oz. Perform a hypothesis test, at the 0.03 level of
significance, to help determine whether the filling process
should continue operating or be stopped and corrected.
H0: μ = 6    α = .03
Ha: μ ≠ 6
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{6.1 - 6}{0.2/\sqrt{30}} = 2.74 \]
For \( \alpha/2 = 0.03/2 = 0.015 \), \( z_{.015} = 2.17 \)
Reject H0 if z < −2.17 or z > 2.17
43

Hypothesis Test: Exercise


H0: μ = 6    α = .03
Ha: μ ≠ 6
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{6.1 - 6}{0.2/\sqrt{30}} = 2.74 \]
For z = 2.74, cumulative probability = 0.9969
p-value = 2 × (1 − 0.9969) = 0.0062
44

Hypothesis Test: Exercise



H0: μ = 6
Ha: μ ≠ 6
45

Hypothesis Test: Exercise



There is sufficient statistical evidence to infer that the alternative hypothesis is true (i.e., the mean filling weight is not 6 ounces).
H0: μ = 6
Ha: μ ≠ 6
46

Modification
Command: 2 * MIN(p Value, 1 – p Value), where p Value = Z.TEST(array, x, [sigma])
Sample Average: 6.1
Sample Size: 30
Population Std Deviation: 0.2
At 97% Level of Confidence
Ho: µ = 6
Ha: µ ≠ 6
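Because Z.TEST reports only the upper tail, 2 * MIN(p, 1 − p) doubles whichever tail is smaller, so the modification works whether x̄ falls above or below μ0. A sketch under that assumption, computing the upper-tail value from summary statistics instead of an array; both helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper(xbar, mu0, sigma, n):
    """Upper-tail area of the kind Z.TEST reports, computed from summary statistics."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)


def two_tailed_from_upper(p):
    """The slide's modification: 2 * MIN(p, 1 - p)."""
    return 2 * min(p, 1 - p)


p = z_test_upper(6.1, 6, 0.2, 30)  # Glow toothpaste
p_two = two_tailed_from_upper(p)
print(f"Z.TEST-style p = {p:.4f}, two-tailed p = {p_two:.4f}")
```

A sample mean of 5.9 (the same distance below 6) gives the same two-tailed p-value, which is the reason for taking the minimum.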
47

Modification
Command: 2 * MIN(p Value, 1 – p Value), where p Value = Z.TEST(array, x, [sigma])
Sample data: 27, 25, 23, 26, 20, 22, 19, 28, 23, 16, 27, 23, 24, 19, 22, 21, 28, 12, 17, 26
Population Std Deviation: 4.394
At 95% Level of Confidence
Ho: µ = 20
Ha: µ ≠ 20
48

One Sample t test


A coffee shop relocates to Italy and wants to
make sure that all lattes are consistent.
They believe that each latte has an average
of 4 oz. of espresso. If this is not the case,
they must increase or decrease the amount.
A random sample of 25 lattes shows a mean
of 4.6 oz. of espresso and a standard
deviation of 0.489 oz. Use alpha = 0.05 and
run a one-sample t-test to compare the sample
mean with the hypothesized population mean of 4 oz.
Solution
49

• Sample Average: 4.6
• Sample Size: 25
• Sample Std. Deviation: 0.489
• At 95% Level of Confidence

Ho: µ = 4
Ha: µ ≠ 4
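A minimal sketch of the two-tailed one-sample t test for the latte data. As with the rent interval, the critical value is assumed to come from a t table: \( t_{0.025} \approx 2.064 \) for n − 1 = 24 degrees of freedom. The helper name `one_sample_t` is hypothetical.

```python
from math import sqrt


def one_sample_t(xbar, mu0, s, n, t_crit):
    """Two-tailed one-sample t test; t_crit is t_{alpha/2} with n - 1 df from a table."""
    t = (xbar - mu0) / (s / sqrt(n))  # test statistic
    return t, abs(t) > t_crit  # reject H0 if t < -t_crit or t > t_crit


# Latte example: mean 4.6 oz, s = 0.489 oz, n = 25, hypothesized mean 4 oz
t, reject = one_sample_t(4.6, 4, 0.489, 25, 2.064)
print(f"t = {t:.3f}, reject H0: {reject}")
```

Since t is far beyond ±2.064, the sample suggests the mean espresso amount differs from 4 oz, so the shop would need to adjust it.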
50

Thank You!
