Advanced Regression
in Excel
The Excel Statistical Master
By Mark Harmon
Copyright © 2011 Mark Harmon

No part of this publication may be reproduced
or distributed without the express permission
of the author.
mark@ExcelMasterSeries.com
www.ExcelMasterSeries.com
ISBN: 978-0-9833070-6-8

Table of Contents
Using Dummy Variable Regression in Excel To Perform Conjoint Analysis
  Step-By-Step Video Showing How To Perform Conjoint Analysis Using Dummy Variable Regression in Excel In Order To Find Out Which Product Attributes Your Customers Value The Most
  The 6 Steps of Performing Conjoint Analysis
    Step 1) List All Product Attributes For 1 Product
    Step 2) Make a List of All Possible Combinations of Those Attributes
    Step 3) Have Consumer Rate Each Attribute Combination
    Step 4) Prepare Completed Survey for Regression
      Dummy Variables to Be Removed From Input Data To Prevent Collinearity
    Step 5) Run Regression in Excel
    Step 6) Derive Attribute Utilities From Regression Output
  An Example of Using a Dummy Variable
  The Problem of Collinearity - and How To Solve It
  The Product Utilities - The Measure of Customer Liking
How To Quickly Read the Output of Regression in Excel
  Step-By-Step Video About How To Quickly Read and Understand the Output of Excel Regression
  The 4 Most Important Parts of Regression Output
    1) Overall Regression’s Accuracy
      R Square
      Adjusted R Square
    2) Probability That This Output Was Not By Chance
      Significance of F
    3) Individual Regression Coefficient Accuracy
      P-value of each coefficient and the Y-intercept
    4) Visual Analysis of Residuals
      Charting the Residuals
      The Residual Chart
Logistic Regression Analysis in Excel
  Customer Quality Scores Are Created With Logistic Regression
  Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel
  What is Logistic Regression?
  An Example of Logistic Regression In Action
    Create the Predictive Equation
    The Logit
    Calculating the Logit Variables - A, B, and Constant
    Optimizing the Logit Variables in the Excel Solver
    The Final, Most Accurate Predictive Equation
    You'll Have To Tweak the Constraints in the Excel Solver
The Four Steps of Regression in Excel (Including 2 Crucial Ones Always Skipped)
  Step-By-Step Video Showing How To Do All 4 Steps of Regression in Excel, Including the 2 Crucial Initial Steps That No One Does
  Crucial Step 1) Graphing the Data
  Crucial Step 2) Running Correlation Analysis on All Variables Simultaneously
    Remove Input Variables That Have Low Correlation With Output Variable
    Remove Input Variables Highly Correlated With Other Input Variables
    Adding New Input Variables To The Regression Analysis
  Step 3 – Run the Regression in Excel
  Step 4) Analysis of Excel Output
How To Do Nonlinear Regression Using the Excel Solver
  The Solver dialogue box has the following 4 parameters that need to be set:
    Objective
    Decision Variables
    Constraints
    Selection of Solving Method: GRG Nonlinear
  Solver Tips
    Initial Solver Settings
      Show Iteration Results
      Use Automatic Scaling
      Assume Non-Negative
      Bypass Solver Reports

Using Dummy Variable Regression in Excel To Perform Conjoint Analysis
Dummy Variable Regression is a great tool for business managers. Dummy
Variable Regression, for example, provides the means to perform very useful
analysis such as Conjoint Analysis. Conjoint analysis quantifies how desirable
each product attribute choice is relative to the other available choices for a single
product. In other words, the marketer learns which product choices a consumer
values most and by how much. In this article and the linked video, you will learn
exactly how to perform Conjoint Analysis in Excel using Dummy Variable
Regression. That may sound like advanced stuff but it’s really quite a bit simpler
than you might imagine.
The video linked below will make the entire procedure of using Dummy Variable
Regression in Excel to perform Conjoint Analysis much easier to understand:

Step-By-Step Video Showing How To Perform Conjoint Analysis Using Dummy Variable Regression in Excel In Order To Find Out Which Product Attributes Your Customers Value The Most
Instructional Video
Go to http://www.youtube.com/watch?v=EMbiGPGlBEM to view a video from Excel Master Series about how to use Dummy Variable Regression in Excel to perform Conjoint Analysis.
(Is Your Internet Connection and Sound Turned On?)
The ultimate objective of Conjoint Analysis is to quantify the consumer’s degree of
liking for each of the choices for one product. The “Utility” of an attribute is the
value associated with the consumer’s degree of liking for that choice.

The 6 Steps of Performing Conjoint Analysis

A brief explanation of how Conjoint Analysis and Dummy Variable Regression
are used together to arrive at the Utility for each product attribute is as follows
and also in the linked video above:
Step 1) List All Product Attributes For 1 Product

The marketer lists all of the available choices that a consumer has for one
product. The marketer starts by listing all of the overall attribute categories such
as color and add-ons. The marketer then lists all of the available choices within
each attribute category. For example, here the marketer would be listing all
available colors and add-ons.
List Of All Product Attributes

Step 2) Make a List of All Possible Combinations of Those Attributes
The marketer then creates a list of all possible combinations of choices available
to the consumer for that one product.
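Outside of Excel, the list of all possible combinations can be sketched in a few lines of Python. This is only an illustration: the attribute categories and choices below are hypothetical (the manual names color and add-ons but this excerpt does not list the actual choices), and `all_combinations` is a helper name invented for the example.

```python
from itertools import product

# Hypothetical attribute categories and their available choices.
attributes = {
    "color": ["red", "white"],
    "add_on": ["none", "case", "charger"],
}

def all_combinations(attrs):
    """Return every combination made of one choice per attribute category."""
    names = list(attrs)
    return [dict(zip(names, combo))
            for combo in product(*(attrs[name] for name in names))]

cards = all_combinations(attributes)
for number, card in enumerate(cards, start=1):
    print(number, card)
```

Each dictionary in `cards` corresponds to one combination that the consumer would be asked to rate.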

Step 3) Have Consumer Rate Each Attribute Combination
This list of all possible combinations is handed to the consumer. The consumer
rates each combination on a scale of 1 (least desirable) to 10 (most desirable).

Step 4) Prepare Completed Survey for Regression

The survey results are arranged so that Dummy Variable Regression can be run
on them. Each product choice is assigned its own Dummy Variable and one
Dummy Variable from each overall attribute category is removed. This will be
explained below and also in more detail in the linked video.
Dummy Variables in a regression are variables that can only assume two values.
One Dummy Variable must be created for each product choice.
Dummy Variables to Be Removed From Input Data To Prevent Collinearity
Step 5) Run Regression in Excel

Dummy Variable Regression is then run on the survey results data.

Step 6) Derive Attribute Utilities From Regression Output
The Utility for each product attribute is derived directly from the coefficients of the
resulting regression equation.
Excel Regression Output

How To Derive The Utilities From the Output
An Example of Using a Dummy Variable

For example, if the product comes only in the colors red and white, there will be
a Dummy Variable for red and one for white. The Dummy Variable for the color
red can take values of only 1 or 0 because the product will either be red or not.
The same applies for the white Dummy Variable, and all other Dummy Variables.
When the survey is returned, the survey data is converted into the proper layout
for the Regression function in Excel. Each Dummy Variable assigned to a
specific attribute will be assigned the value of 0 or 1, depending on whether that
attribute was an element of the combination that is currently being rated.
Watching this done in the linked video is probably the easiest way to understand
how to do it.
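As a rough illustration of that conversion, the following Python sketch turns each rated combination into 0/1 Dummy Variables. The function name `dummy_encode` and the column naming scheme are assumptions made for this example; the manual itself performs this step by hand in an Excel worksheet.

```python
def dummy_encode(rows, choices_by_category):
    """Turn each rated combination into 0/1 dummy variables, one per choice."""
    encoded = []
    for row in rows:
        dummies = {}
        for category, choices in choices_by_category.items():
            for choice in choices:
                # 1 if this choice is part of the combination being rated, else 0
                dummies[f"{category}_{choice}"] = 1 if row[category] == choice else 0
        encoded.append(dummies)
    return encoded

# The red/white example from the text; the two survey rows are hypothetical.
choices = {"color": ["red", "white"]}
survey_rows = [{"color": "red"}, {"color": "white"}]
encoded_rows = dummy_encode(survey_rows, choices)
print(encoded_rows)
```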

The Problem of Collinearity - and How To Solve It

One problem can occur when Dummy Variables are inputs to a regression. The
problem of Collinearity or Multicollinearity occurs when any independent variable
can be used to predict the value of any other independent variable. For example,
if the product comes in only red or white, you can predict whether the product is
red if you know whether or not the product is white. This is Collinearity.
Collinearity and Multicollinearity are corrected by removing one Dummy Variable
from each choice category. For example, if color choices are red or white, the
Dummy Variable for one of those colors would be removed. Collinearity is then
solved. You cannot predict whether or not the product is red if you do not know
whether the product is white (because the Dummy Variable for white has been
removed).
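The red/white case can be checked with a small Python sketch. The four rows are hypothetical data, and `drop_dummies` is a helper invented here; the sketch only shows that one dummy fully determines the other until one of them is removed.

```python
# Dummy-coded colors for four rated combinations (hypothetical data).
rows = [
    {"color_red": 1, "color_white": 0},
    {"color_red": 0, "color_white": 1},
    {"color_red": 1, "color_white": 0},
    {"color_red": 0, "color_white": 1},
]

# With both dummies present, red is fully determined by white in every row.
perfectly_collinear = all(r["color_red"] == 1 - r["color_white"] for r in rows)
print(perfectly_collinear)

def drop_dummies(rows, dropped):
    """Remove the named dummy columns (one per attribute category)."""
    return [{k: v for k, v in r.items() if k not in dropped} for r in rows]

reduced = drop_dummies(rows, {"color_white"})
print(reduced[0])
```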
The data can now be run as a regular regression using Excel’s regression tool.
The linked video shows how to do this in detail.
The regression is run and a regression equation is obtained.
The Product Utilities - The Measure of Customer Liking

The “Utilities” of each of the product choices are set to equal the value of the
coefficients of the regression equation. The “Utility” is the degree of liking that the
consumer attached to that product choice.
For example, the marketer will find out how important the color red was
compared to each of the other product choices during the purchase decision.
Utilities of product choices that were associated with the Dummy Variables that
were removed to prevent collinearity will be assigned the value of 0.
We now have Utilities for each attribute. Now, the overall attractiveness of a
particular combination of choices can be calculated by adding up the individual
Utilities associated with each of the choices. The sum of the Utilities for each
combination is the regression’s prediction of the consumer’s degree of liking for that
combination of product choices.
The removal of the individual Dummy Variables does not affect the accuracy or
completeness of the answer. Adding up the Utilities for each combination will
produce a figure that will be very close to the consumer’s actual rating for that
combination. An example of this is shown in the video.
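A minimal sketch of that addition follows. All numbers are hypothetical and are not taken from the manual's example, and the sketch assumes that the regression's Y-intercept is added to the summed Utilities to obtain the predicted rating.

```python
# Hypothetical regression output: a Y-intercept plus one coefficient per kept dummy.
intercept = 4.0
utilities = {
    "color_red": 1.5,
    "color_white": 0.0,   # removed dummy, so its Utility is 0
    "add_on_case": 2.0,
    "add_on_none": 0.0,   # removed dummy, so its Utility is 0
}

def predicted_rating(chosen, utilities, intercept):
    """Add the Utilities of the chosen attributes to the intercept (assumed)."""
    return intercept + sum(utilities[choice] for choice in chosen)

score = predicted_rating(["color_red", "add_on_case"], utilities, intercept)
print(score)
```

Comparing `score` with the rating the consumer actually gave that combination shows how closely the regression reproduces the survey.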

Showing the Regression Equation Predicts Nearly the Same Score as the
Customer's Ranking of Card 13, Even Though Dummy Variables Were
Removed

How To Quickly Read the Output of Regression Analysis Done in Excel
There is a lot more to the Excel Regression output than just the regression
equation. If you know how to quickly read the output of a Regression done in Excel,
you’ll know right away the most important points of a regression: whether the overall
regression was a good fit, whether this output could have occurred by chance,
whether or not all of the independent input variables were good predictors, and
whether the residuals show a pattern (which means there’s a problem).
Excel Regression Output With Color-Coding Added
This video will illustrate exactly how to quickly and easily understand the output
of Regression performed in Excel:

Step-By-Step Video About How To Quickly Read and Understand the Output of Excel Regression
(Is Your Sound and Internet Connection Turned On?)
The 4 Most Important Parts of Regression Output
1) Overall Regression Equation’s Accuracy (R Square and Adjusted R Square)
2) Probability That This Output Was Not By Chance (ANOVA – Significance of F)
3) Individual Regression Coefficient and Y-Intercept Accuracy
4) Visual Analysis of Residuals
Some parts of the Excel Regression output are much more important than
others. The goal here is for you to be able to glance at the Excel Regression
output and immediately understand it, so we will focus our attention only on the
four most important parts of the Excel regression output.
1) Overall Regression’s Accuracy
R Square
This is the most important number of the output. R Square tells how well the
regression line approximates the real data. This number tells you how much of
the output variable’s variance is explained by the input variables’ variance.
Ideally we would like to see this at least 0.6 (60%) or 0.7 (70%).
Adjusted R Square
This is quoted most often when explaining the accuracy of the regression
equation. Adjusted R Square is more conservative than R Square because it is
never greater than R Square. Another reason that Adjusted R Square is quoted
more often is that when new input variables are added to the Regression
analysis, Adjusted R Square increases only when the new input variable makes
the Regression equation more accurate (improves the Regression equation’s
ability to predict the output). R Square never goes down when a new variable is
added, whether or not the new input variable improves the Regression equation’s
accuracy.
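For readers who want to see where these two numbers come from, here is a small Python sketch using the standard textbook formulas. The actual and predicted values are hypothetical, and the function names are chosen for this example only; Excel reports both figures directly in its Regression output.

```python
def r_square(actual, predicted):
    """Share of the output variable's variance explained by the regression."""
    mean_y = sum(actual) / len(actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_residual / ss_total

def adjusted_r_square(r2, n_observations, n_inputs):
    """Penalize R Square for the number of input variables used."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_inputs - 1)

# Hypothetical data: five observations, one input variable.
actual = [1, 2, 3, 4, 5]
predicted = [1.1, 1.9, 3.2, 3.9, 4.9]
r2 = r_square(actual, predicted)
adj = adjusted_r_square(r2, n_observations=len(actual), n_inputs=1)
print(r2, adj)
```

Because the penalty grows with `n_inputs`, an added input variable raises Adjusted R Square only if it improves the fit by enough to outweigh that penalty.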
2) Probability That This Output Was Not By Chance
Significance of F
This indicates the probability that the Regression output could have been
obtained by chance. A small Significance of F confirms the validity of the
Regression output. For example, if Significance of F = 0.030, there is only a 3%
chance that the Regression output was merely a chance occurrence.

3) Individual Regression Coefficient Accuracy
P-value of each coefficient and the Y-intercept
The P-Values of each of these provide the likelihood that they are real results
and did not occur by chance. The lower the P-Value, the higher the likelihood
that that coefficient or Y-Intercept is valid. For example, a P-Value of 0.016 for a
regression coefficient indicates that there is only a 1.6% chance that the result
occurred only as a result of chance.

4) Visual Analysis of Residuals
Charting the Residuals

The Residual Chart
The residuals are the difference between the Regression’s predicted value and
the actual value of the output variable. You can quickly plot the Residuals on a
scatterplot chart. Look for patterns in the scatterplot. The more random (without
patterns) and centered around zero the residuals appear to be, the more likely it
is that the Regression equation is valid.
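The residuals themselves are simple to compute. The sketch below uses hypothetical numbers and takes each residual as the actual value minus the predicted value, which is an assumed sign convention; the mean is printed as a quick check that the residuals are centered around zero.

```python
def residuals(actual, predicted):
    """Residual for each observation: actual value minus predicted value."""
    return [y - p for y, p in zip(actual, predicted)]

# Hypothetical actual and predicted values of the output variable.
actual = [10, 12, 15, 19, 24]
predicted = [11, 12, 14, 20, 23]

res = residuals(actual, predicted)
mean_residual = sum(res) / len(res)
print(res)
print(mean_residual)
```

Plotting `res` against the observation number (or against the predicted values) gives the scatterplot in which to look for patterns.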
There are many other pieces of information in the Excel regression output but the
above four items will give a quick read on the validity of your Regression.
Hand Calculation of Regression Problems
Go to http://excelmasterseries.com/Excel_Statistical_Master/Regression.php to view how to solve regression problems by hand (no Excel).
(Is Your Internet Connection Turned On?)
You'll quickly see why you always want to use Excel to solve statistical problems!

Logistic Regression Analysis in Excel
Wouldn’t it be great if there was a more accurate way to predict whether your
prospect will buy rather than just taking an educated guess? Well, there is…if
you have enough data on your previous prospects. The tool that makes this
possible is called Logistic Regression and can be easily implemented in Excel.
Customer Quality Scores Are Created With Logistic Regression
Marketers use Logistic Regression to rank their prospects with a quality score
which indicates that prospect’s likelihood to buy. The more data you’ve collected
from previous prospects, the more accurately you’ll be able to use Logistic
Regression in Excel to calculate your new prospect’s probability of purchasing.

Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel
Instructional Video
Go to http://www.youtube.com/watch?v=NHOO7iceJrw to view a video from Excel Master Series about how to use Logistic Regression in Excel to predict if your next prospect will buy (or not).
(Is Your Internet Connection and Sound Turned On?)
What is Logistic Regression?

Logistic Regression calculates the probability of an event occurring, such as the
purchase of a product. In general, the thing being predicted in a Regression
equation is represented by the dependent variable or output variable and is
usually labeled as the Y variable in the Regression equation. In the case of
Logistic Regression, this “Y” is binary. In other words, the output or dependent
variable can only take the values of 1 or 0. The predicted event either occurs or it
doesn’t occur: your prospect either will buy or won’t buy. Occasionally this type
of output variable is also referred to as a Dummy Dependent Variable.
An Example of Logistic Regression In Action
Here is a marketing example showing how Logistic Regression works. The
embedded video walks through this example in Excel as well.
Suppose that you have collected three pieces of data on each of your previous
prospects:
1) The prospect’s age
2) The prospect’s gender (1 = Male and 0 = Female)
3) Whether the prospect purchased or not (did purchase, Y = 1; did not purchase, Y = 0)

Create the Predictive Equation
With the above data, you could create a predictive equation that would calculate
a new prospect’s probability of purchasing by inputting this new prospect’s age
and gender. This predictive equation will be in the form of:
P(X) = e^L / (1 + e^L)
P(X) represents the probability of event X occurring.
The Logit
Event X is a purchase. In other words, P(X) is the probability that Y = 1.
P(X) has only one variable. That is L, which is called the Logit.
The Logit, L = Constant + A * Age + B * Gender
L, the Logit, has 3 variables: Constant, A, and B. They must be known before
P(X) can be calculated. Those 3 variables can be found in Excel by using the
Excel Solver. The Excel Solver will find the optimal combination of those 3
variables that causes the resulting P(X) to most accurately predict whether Y = 1
or 0 for all previous prospects.
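To make the two formulas concrete, here is a short Python sketch of the Logit and of P(X). The values used for Constant, A, and B are hypothetical placeholders, not the values that the Solver finds in the manual's example.

```python
import math

def logit(constant, a, b, age, gender):
    """L = Constant + A * Age + B * Gender."""
    return constant + a * age + b * gender

def purchase_probability(constant, a, b, age, gender):
    """P(X) = e^L / (1 + e^L), the probability that Y = 1."""
    L = logit(constant, a, b, age, gender)
    return math.exp(L) / (1 + math.exp(L))

# Hypothetical Logit variables and a hypothetical 40-year-old male prospect.
p = purchase_probability(constant=-3.0, a=0.05, b=0.5, age=40, gender=1)
print(round(p, 4))
```

Whatever values Constant, A, and B take, the result always lies between 0 and 1, which is what allows it to be read as a probability.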


Calculating the Logit Variables - A, B, and Constant
Here’s how the optimal set of Logit variables (Constant, A, and B) is found
in Excel:
Using Excel, each recorded prospect has the following calculation performed:
P(X)^Y * [1 - P(X)]^(1-Y)
The Y refers to Y = 1 if the prospect bought and Y = 0 if the prospect didn’t buy.
The P(X) is the probability of purchase that will be calculated using the equation
listed above. In Excel, the P(X) calculation is initially performed by the Excel
Solver using Logit variables (Constant, A, and B) which are not optimal. The
Excel Solver will then continuously try new combinations of these variables until
the optimal P(X) is found.
Optimizing the Logit Variables in the Excel Solver
Here’s how the Excel Solver knows when it has found the correct combination
of these 3 variables so that the resulting P(X) equation most accurately predicts
whether Y = 1 or 0:
The expression P(X)^Y * [1 - P(X)]^(1-Y) is maximized when P(X) is most accurate. It
approaches its highest value (1) when Y = 1 and P(X) approaches 1. It also
approaches its highest value (1) when Y = 0 and P(X) approaches 0. When Y = 1
and P(X) = 1, that is a 100% correct prediction by P(X) that Y = 1. When Y = 0
and P(X) = 0, that is a 100% correct prediction by P(X) that Y = 0.
Each prospect has a separate P(X)^Y * [1 - P(X)]^(1-Y) value calculated for him or
her.
The sum of each P(X)^Y * [1 - P(X)]^(1-Y) calculation for all prospects is taken.
The only variables that exist when calculating P(X)^Y * [1 - P(X)]^(1-Y) are Y and
the variables of P(X), which are Constant, A, and B. Using the Excel Solver, these
variables are adjusted until their values maximize the sum of all
P(X)^Y * [1 - P(X)]^(1-Y) values.

The Final, Most Accurate Predictive Equation
When the sum of P(X)^Y * [1 - P(X)]^(1-Y) is maximized, then the final resulting
P(X) equation is as accurate as possible at predicting whether Y will be 1 or 0.
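The quantity that the Solver maximizes can be sketched in Python as follows. The four prospects and both sets of Logit variables are hypothetical, and the function names are invented for the example; the sketch only evaluates the target sum for a given Constant, A, and B and does not perform the optimization itself.

```python
import math

def purchase_probability(constant, a, b, age, gender):
    """P(X) = e^L / (1 + e^L) with L = Constant + A * Age + B * Gender."""
    L = constant + a * age + b * gender
    return math.exp(L) / (1 + math.exp(L))

def prospect_term(p, y):
    """P(X)^Y * [1 - P(X)]^(1-Y) for one prospect."""
    return p ** y * (1 - p) ** (1 - y)

def solver_objective(params, prospects):
    """Sum of the per-prospect terms: the target the Solver is asked to maximize."""
    constant, a, b = params
    return sum(
        prospect_term(purchase_probability(constant, a, b, age, gender), y)
        for age, gender, y in prospects
    )

# Hypothetical prospects as (age, gender, Y).
prospects = [(25, 1, 0), (30, 0, 0), (45, 1, 1), (50, 0, 1)]

baseline = solver_objective((0.0, 0.0, 0.0), prospects)   # every P(X) = 0.5
better = solver_objective((-7.5, 0.2, 0.0), prospects)    # older prospects score higher
print(baseline, better)
```

A higher sum means the P(X) values sit closer to the observed Y values. Note that textbook maximum-likelihood fitting maximizes the product of these terms (equivalently, the sum of their logarithms); the sketch follows the sum described in this chapter.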
The Excel Solver Dialogue Box

Stated another way, we now have a predictive equation P(X ) which uses the
optimal combination of Constant, A, and B which most accurately calculates the
probability that Y = 1 given a prospect’s age and gender.
The embedded video provides a clear picture of all of this in action in Excel.
The use of the Excel Solver does require some hand-tweaking to ensure that the
most accurate answer is obtained. The video shows an example of this.
Ultimately what the Solver is doing is adjusting variables Constant, A, and B to
maximize the sum of the column of
P(X)^Y * [1 - P(X)]^(1-Y) values. The answer obtained by the Solver should
maximize that sum and provide realistic answers for the probabilities of each
prospect, including the new one.
You'll Have To Tweak the Constraints in the Excel Solver
You’ll probably find that you have to experiment by applying constraints to the
variables that Solver is adjusting in order to maximize the target sum. The
variables that Solver adjusts are called Decision Variables. Solver allows you to
create constraints on the value of any Decision Variable.

Adding a Constraint to the Solver
In the video, you will be able to watch how a Decision Variable is constrained to
make the final answer more accurate. The Decision Variable called Constant was
constrained to always remain above -25 during the Solver analysis. This resulted
in the most accurate and realistic maximization of the sum of the
P(X)^Y * [1 - P(X)]^(1-Y) values.
Conclusion – Logistic Regression in Excel Is an Incredible Predictor but Not the Simplest Analysis
Logistic Regression is not the simplest type of analysis to understand or perform.
Hopefully this article and video have provided a much clearer picture for you.

The Four Steps of Regression in Excel
(Including Two Crucial Steps That Most People Skip)
Running a Regression in Excel is fairly easy. So is running one incorrectly. There
are two crucial steps that should always be performed on the data before any
Regression is run. Fortunately these two steps are very quick and easy
to do in Excel. They are:
1) Graph the Data

2) Run Correlation Analysis On All Variables
Following is a video of this article showing how to perform all four steps to
Regression in Excel, including the above two crucial steps at the beginning:

Step-By-Step Video Showing How To Do All 4 Steps of Regression in Excel, Including the 2 Crucial Initial Steps That No One Does, But Should
(Is Your Sound and Internet Connection Turned On?)
Why You Need To Run The 2 Crucial Steps Before Doing Regression
Here’s why you need to run the two crucial steps prior to regressing any data in
Excel:

Crucial Step 1) Graphing the Data

Whether or not you are using Excel to run a Regression, you should always
graph the data before doing anything else. Eyeballing the data will allow you to
quickly determine whether there is any relationship between the independent
(input) variables and the dependent (output) variable. You also want to evaluate
whether the graph generally appears to be linear or possibly quadratic. Excel’s
Regression Tool works well only for reasonably linear data. Eyeballing the data
upfront will tell you very quickly whether Excel’s Linear Regression is the right
tool for the job.
Graphing The Data To Check If It Is Linear
The input and output variables will be graphed together. The y-axis of the chart
will provide the scale for plotting of those values. The x-axis will provide a
measure of whatever continuum was used, e.g. time, to collect the values of all of
the variables. Excel’s charting function is the way to go here. The above linked
video shows exactly how to chart all the data in Excel.

Crucial Step 2) Running Correlation Analysis on All Variables Simultaneously
There are two good reasons for doing this. First, we want to remove any input
variables which are clearly not good predictors of the output variable. Second, we
want to make sure that none of the input variables have a high correlation with
(are good predictors of) other input variables.
Running Correlation Analysis on the Data To Prevent Collinearity and also To Remove Input Variables That Have Low Correlation With the Output Variable
Correlation of multiple variables is easily done in Excel using the Correlation
Data Analysis tool. The linked video shows exactly how to do that.
Remove Input Variables That Have Low Correlation With Output Variable
After you have run Correlation Analysis on the data, you will want to remove any
input variables that have a low correlation with the output variable. A Correlation
Coefficient with an absolute value of less than 0.4 (between -0.4 and +0.4)
between the output variable and an input variable indicates that the input variable
is not a good predictor of the output. That input variable should be removed from
the Regression Analysis. The linked video provides an example of this.
Data Columns Before Removing Input Variable With Low Correlation To Output
Data Columns After Removing Input Variables With Low Correlation To Output
Remove Input Variables Highly Correlated With Other Input Variables
After looking at the Correlation Coefficients between the input and output
variables, look at the Correlation Coefficients between the input variables
themselves. You do not want to use pairs of input variables that are good
predictors of each other in a Regression. This will cause a Regression error
known as Collinearity or Multicollinearity. One variable from any pair of
highly correlated input variables should be removed prior to running the Regression
Analysis. Variables can be considered highly correlated if the absolute value of
their Correlation Coefficient is greater than 0.7 (greater than +0.7 or less than
-0.7).
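Both screening rules can be sketched in Python. The data and variable names below are hypothetical, and `pearson` and `screen_inputs` are helpers written for this example; in Excel the same correlations come from the Correlation Data Analysis tool.

```python
import math

def pearson(x, y):
    """Pearson Correlation Coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def screen_inputs(inputs, output, low=0.4, high=0.7):
    """Flag weak predictors, then highly correlated pairs among the rest."""
    weak = [name for name, values in inputs.items()
            if abs(pearson(values, output)) < low]
    kept = [name for name in inputs if name not in weak]
    collinear_pairs = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]
                       if abs(pearson(inputs[a], inputs[b])) > high]
    return weak, collinear_pairs

# Hypothetical data: two near-duplicate inputs and one unrelated input.
output = [1, 2, 3, 4, 5, 6]
inputs = {
    "ad_count": [2, 4, 6, 8, 10, 12],
    "ad_spend": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1],
    "noise": [5, 1, 4, 2, 6, 3],
}
weak, collinear_pairs = screen_inputs(inputs, output)
print(weak, collinear_pairs)
```

Inputs listed in `weak` would be removed first; for each pair in `collinear_pairs`, one of the two variables would then be dropped before running the Regression.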

Adding New Input Variables To The Regression Analysis

Here are a few hints about adding new input variables to a Regression Analysis.
First, build up a Regression by starting with a small number of input variables
and add any new ones one at a time. Second, good new input variables
noticeably increase Adjusted R Square and also lower Standard Error without
significantly changing the existing Regression Coefficients.

Step 3 – Run the Regression in Excel

When you are satisfied with the output of the data graph and the Correlation
Analysis, go ahead and run the Regression with Excel. An example of how to do
this is shown in the above video.
The Excel Regression Dialogue Box

Final Step 4) Analysis of Excel Output

The final step of Excel Regression is Analysis of the Excel output. Please refer to
the chapter of this manual that goes into detail about how to quickly read and
understand the output of regression done in Excel.
Excel Regression Output With Color Coding Added
Conclusion - Plotting the Data and Running Correlation Can Be BIG Time Savers
Plotting the data and running Correlation Analysis prior to running a Regression
can save you lots of time that you might otherwise have to spend making
adjustments to your Regression after running it.

How To Do Nonlinear Regression Using the Excel Solver
Excel Solver is one of the best and easiest curve-fitting devices in the world, if
you know how to use it. Its curve-fitting capabilities make it an excellent tool to
perform nonlinear regression. The Excel Solver will find the equation of the linear
or nonlinear curve which most closely fits a set of data points.
One very important caveat must be added: the user must first determine the
general type of the curve and input that information into Solver at the start. This
information is in the form of the general equation that defines the curve, such as
a0 + a1*x + a2*x^2 = c or a*ln(x^b) = c. Solver then calculates all needed variables
which produce the equation which most closely fits the data points. We will run
through an example here.
In this problem we are going to show how to use the Excel Solver to calculate an
equation which most closely describes the relationship between sales and
number of ads being run. The purpose of this equation is to be able to predict the
number of sales based upon the number of ads that will be run.
A marketing manager has collected the following data on the company’s sales
vs. the number of ads that were running at different times.
Sales Number of Ads Running

50 6700
55 7500
59 8700
62 8900
75 8800
95 10900
110 11200
125 11400
140 11500
180 12300

Here is an Excel scatter plot of that data:
We would like to create an equation from this data that allows us to predict the
sales based upon the number of ads currently running.
The first step is to eyeball the data and estimate what general type of curve this
graph probably is. In this case it appears to be a graph in which y rises by a
diminishing amount for each increase in x. A formula for such a curve would have the
general form:
Y = A1 + A2 * X^B1
Sales = A1 + A2 * (Number of Ads Running)^B1
We can use the Excel Solver to solve for A1, A2, and B1. We need to arrange
the data in a form that can be input into the Excel Solver as follows:

This table shows the arrangement of data and the calculations. Here we have
created an Excel model based upon our model of:
Sales = A1 + A2 * (Number of Ads Running)^B1
One example of this formula in action is explained for Cell E16. We are listing the
variables that we are solving for (A1, A2, and B1) in cells B3 to B5. In Solver
language, these cells that we are changing are called Decision Variables.
We have arbitrarily set our Decision Variables to:
A1 = 100
A2 = 100
B1 = 0.05
We now take the difference between the actual number of sales and the number
of sales predicted by our model with our arbitrary settings for the Decision
Variables. The square of each difference is taken and then all squares are
summed up.
We are trying to find the settings for the Decision Variables that will minimize the
sum of the squares of the differences. In other words, we are trying to find A1,
A2, and B1 that will minimize the number in cell G13.
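The same objective can be reproduced outside Excel as a cross-check. The
following is a minimal Python sketch, not the spreadsheet itself. It assumes
that the first column of the data table is Number of Ads Running and the second
is Sales, and the names predict_sales and sum_squared_error are made up for
illustration.

```python
# Data from the table above: number of ads running (x) and sales (y).
ADS = [50, 55, 59, 62, 75, 95, 110, 125, 140, 180]
SALES = [6700, 7500, 8700, 8900, 8800, 10900, 11200, 11400, 11500, 12300]


def predict_sales(ads, a1, a2, b1):
    """Model from the text: Sales = A1 + A2 * Ads^B1."""
    return a1 + a2 * ads ** b1


def sum_squared_error(a1, a2, b1):
    """Sum of squared differences between actual and predicted sales."""
    return sum(
        (actual - predict_sales(ads, a1, a2, b1)) ** 2
        for ads, actual in zip(ADS, SALES)
    )


# Objective value at the arbitrary starting settings A1=100, A2=100, B1=0.05.
initial_sse = sum_squared_error(100, 100, 0.05)
print(f"Objective at starting values: {initial_sse:,.0f}")
```

The printed number plays the role of cell G13 before Solver is run. Any
settings of A1, A2, and B1 that give a smaller number are a better fit.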
Once the Solver has been installed as an add-in (to install it: File /
Options / Add-Ins / Manage / Excel Add-Ins / Go / Solver Add-In), you can
access the Solver in Excel 2010 by: Data / Solver.
The following blank Solver dialogue box comes up:

The Solver dialogue box has the following 4 parameters that need to be set:
1) The Objective Cell – This is the target cell whose value we are trying to
maximize, minimize, or set to a certain value.
2) Minimize or Maximize the Target, or attempt to achieve a certain value in
the Objective cell.
3) Decision Variables – A set of variables that will be changed by the Excel
Solver in order to optimize the target cell.
4) Constraints – These are the limitations that the problem subjects the
Solver to during its calculations.
Once again, here is the data table for Solver inputs:

Objective:
We are trying to minimize Cell G13, the sum of the square of differences
between the actual and predicted sales.
Decision Variables:
We are changing A1, A2, and B1 (cells B3 to B5) to minimize our Objective, Cell
G13. The Decision Variables are therefore Cells B3 to B5.
Constraints:
There are none for this curve-fitting operation.
Selection of Solving Method: GRG Nonlinear
The GRG Nonlinear method is used when the equation producing the objective is
not linear but is smooth (continuous). Examples of smooth nonlinear functions in
Excel are:
=1/C1, =Log(C1), and =C1^2
These functions have graphs that are curved (nonlinear) but have no breaks or
sharp corners (smooth) over the range where they are defined.
Our sales equation appears to be smooth and nonlinear:

Here is the completed Solver dialogue box:
Here is a close-up of the Solver Objective, Decision Variables, and Constraints:

If we now hit the Solve button, we get the following result:

Solver has optimized the Decision Variables to minimize the objective function as
follows:
A1 = -445,616
A2 = 437,247
B1 = 0.00911
The Objective is minimized to: 2,556,343
We can now create an Excel graph of the Actual Sales vs. the Predicted Sales as
follows:
Solver calculates that Sales can be predicted from Number of Ads Running by
the following equation:
Sales = -445616 + 437247 * (Number of Ads Running)^0.00911
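To see the equation in use, the short Python sketch below plugs each number of
ads into it and recomputes the sum of squared differences. It is an
illustration only: it assumes the first data column is Number of Ads Running
and the second is Sales, and it uses the coefficients as rounded in the text,
so the recomputed sum should land close to the reported Objective but not
exactly on it.

```python
# Data from the table above: number of ads running (x) and sales (y).
ADS = [50, 55, 59, 62, 75, 95, 110, 125, 140, 180]
SALES = [6700, 7500, 8700, 8900, 8800, 10900, 11200, 11400, 11500, 12300]

# Coefficients as reported (rounded) in the Solver result above.
A1, A2, B1 = -445616, 437247, 0.00911


def predict_sales(ads):
    """Fitted equation: Sales = A1 + A2 * Ads^B1."""
    return A1 + A2 * ads ** B1


predicted = [predict_sales(x) for x in ADS]
sse = sum((y - p) ** 2 for y, p in zip(SALES, predicted))

# Actual vs. predicted sales for each data point, then the objective.
for x, y, p in zip(ADS, SALES, predicted):
    print(f"ads={x:>3}  actual={y:>6}  predicted={p:>9.0f}")
print(f"Sum of squared differences: {sse:,.0f}")
```

Because A1 and A2 are large and nearly cancel, small rounding in B1 shifts the
predictions noticeably. Use the unrounded cell values when making real
forecasts.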
The trickiest part of this problem is the first step: eyeballing the data to
determine what kind of curve the data follows. You should take time to
evaluate whether you are fitting the correct curve type.

Solver Tips
You may notice that if you run this problem through the Solver multiple times,
you will get slightly different answers. Each run of Solver’s GRG algorithm can
produce different values for the Decision Variables. You are trying to find
the values for the Decision Variables that minimize the objective function (cell
G13) the most.
When the Solver runs the GRG algorithm, it starts its calculations from the
values currently in the Decision Variable cells. After a run, those cells hold
the result of that run, so the next run starts from a slightly different point.
That is why different answers can appear during each run. Choose the Decision
Variable values from the run which produces the lowest value of the Objective.
Keep running the Solver until the Objective stops decreasing. That should give
you the best values of the Decision Variables that Solver can find. That was
done in the example above.
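One way to judge whether repeated runs have settled on a good answer is an
independent cross-check. For any fixed B1, the model is linear in A1 and A2, so
those two values can be computed directly by ordinary least squares, leaving
only B1 to scan. The Python sketch below does this. It is not what Solver’s GRG
algorithm does. The list of candidate B1 values and the function name
fit_for_exponent are assumptions chosen for illustration, as is the reading of
the first data column as Number of Ads Running.

```python
# Data from the table above: number of ads running (x) and sales (y).
ADS = [50, 55, 59, 62, 75, 95, 110, 125, 140, 180]
SALES = [6700, 7500, 8700, 8900, 8800, 10900, 11200, 11400, 11500, 12300]


def fit_for_exponent(b1):
    """Best A1, A2 and objective for a fixed B1 (simple linear least squares)."""
    z = [x ** b1 for x in ADS]  # transformed input, Ads^B1
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(SALES) / n
    s_zz = sum((zi - z_mean) ** 2 for zi in z)
    s_zy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, SALES))
    a2 = s_zy / s_zz
    a1 = y_mean - a2 * z_mean
    sse = sum((yi - (a1 + a2 * zi)) ** 2 for zi, yi in zip(z, SALES))
    return sse, a1, a2, b1


# Candidates: the B1 that Solver reported plus a coarse grid (assumed range).
candidates = [0.00911] + [k / 100 for k in range(1, 101)]
best_sse, best_a1, best_a2, best_b1 = min(fit_for_exponent(b) for b in candidates)
print(f"B1={best_b1}  A1={best_a1:,.0f}  A2={best_a2:,.0f}  objective={best_sse:,.0f}")
```

If the lowest objective in this scan is noticeably below the value in cell G13,
that suggests Solver stopped early and another run is worthwhile.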
Initial Solver Settings:
Here are some Solver settings that you want to configure prior to running the
Solver for most problems. These settings are found when you click the Options
button:
Show Iteration Results: Leave this unchecked. When checked, it pauses the GRG
Solver after each iteration and displays the result for that iteration. Very
rarely is there a reason for doing that.
Use Automatic Scaling: Leave this box unchecked. You would only use this
option if you had reason to believe that inputs of the Solver were measured using
different scales.
Assume Non-Negative: Only check this if you are sure that none of the
variables can ever be negative. In this example that is clearly not so, because
the solution for A1 is negative.
Bypass Solver Reports: Leave this box unchecked. There is no advantage to
not having Solver reports for each Solver run.

Summary
Excel Solver is an easy-to-use and powerful nonlinear regression tool as a result
of its curve-fitting capacity. One use of this is to calculate predictive sales
equations for your company. It will work as long as you have properly determined
the correct general curve type in the beginning.

Meet the Author
Mark Harmon is a master number cruncher. Creating overloaded Excel spreadsheets
loaded with complicated statistical analysis is his idea of a good time. His profession as
an Internet marketing manager provides him with the opportunity and the need to
perform plenty of meaningful statistical analysis at his job.
Mark Harmon is also a natural teacher. As an adjunct professor, he spent five years
teaching more than thirty semester-long courses in marketing and finance at the Anglo-
American College in Prague and the International University in Vienna, Austria. During
that five-year time period, he also worked as an independent marketing consultant in the
Czech Republic and performed long-term assignments for more than one hundred clients.
His years of teaching and consulting have honed his ability to present difficult subject
matter in an easy-to-understand way.
Harmon received a degree in electrical engineering from Villanova University and an MBA
in marketing from the Wharton School.
