Notes For Business Analytics Part II

Regression Analysis
12/21/2020 Slides used for Educational Purpose only

Need for Regression
▪ The correlation coefficient gives only the degree of relationship or association between two variables.
▪ It cannot help you estimate or predict the response variable for a given value of the independent variable.
▪ The response variable is called the dependent variable.
▪ In the present problem involving Sigma Property, ‘% Occupancy’ is the independent variable and ‘Revenue’ is the dependent variable.
Objectives of Regression Analysis
▪ Explain the variation in the dependent variable in terms of one or more independent variables.
▪ Describe the nature of the relationship precisely by way of an equation.
▪ Validate the regression equation statistically.
▪ Predict the value of the dependent variable based on the target values of the independent variables.
▪ Remove any variables that do not contribute much toward explaining the variation in the dependent variable.
Part I-Simple Linear Regression

Regression Model
Simple Linear Regression Model
▪ In this model, the dependent variable is a linear function of one independent variable. For the present case, Revenue may be structured as a linear function of % occupancy.
▪ Based on sample data collected for the dependent and independent variables, a model is postulated connecting the dependent variable with the independent variable in the form of a linear equation. Symbolically, we write the sample regression line as follows:
$\hat{y} = b_0 + b_1 x_1$
where
ŷ is the estimate for the dependent variable (revenue)
x1 is the independent variable (% individual occupancy)
b0 and b1 are determined by the statistical least squares method. b1 is called the regression coefficient (slope) and b0 is the constant term (intercept).
Historical Perspective
For background, it is worth pointing out that the estimates for b0 and b1 obtained by the least squares method are called ‘Best Linear Unbiased Estimates’ (BLUE). This result is associated with Gauss and Markov (the Gauss-Markov theorem) and holds in the context of General Linear Models, which cover Multiple Linear Regression as well.

Values of b0 and b1 in the case of the simple linear regression model
$\sum y = n b_0 + b_1 \sum x_1$
$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2$
Here n denotes the sample size.
Solving these two normal equations,
$b_1 = \dfrac{\sum (x_1 - \bar{x}_1)(y - \bar{y})}{\sum (x_1 - \bar{x}_1)^2}$
$b_0 = \bar{y} - b_1 \bar{x}_1$
Simple Linear Regression
To understand the nitty-gritty of simple regression, let us take the present problem, for which the relevant data are given below (refer to file Hotel1.csv).
Revenue PercentOccupancy
514.44 65.70
463.12 61.10
598.18 78.20
454.92 65.40
453.80 63.50
502.23 70.60
626.26 81.20
498.70 72.00
514.46 72.90
623.29 81.70
454.77 62.10
385.57 53.40
You postulate the model for the population in the standard form as follows:
Y = β0 + β1X1
Y is the Revenue measured in $1000, β0 is the intercept and β1 is the slope corresponding to the independent variable X1 (PercentOccupancy).
Scatter Diagram-Revenue versus Percent Occupancy

Simple Linear Regression
The estimated regression model to test the population model is
$\hat{y} = b_0 + b_1 x_1$
where ŷ is the estimated dependent variable (Revenue),
x1 is the independent variable (% Occupancy in the sample data), and
b0 and b1 are the intercept and slope, determined by the statistical least squares method.
For the Hotel1 data, the fitted line is
$\hat{y} = -60.3747 + 8.2318\,x_1$
with x1 entered in percentage points. If the % Occupancy is projected at 85%, then substituting x1 = 85 gives a predicted Revenue (in $1000) of approximately 639.3.
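As a check on the arithmetic, the closed-form least squares formulas for b1 and b0 can be applied directly to the twelve Hotel1 observations listed above. The sketch below is only an illustration; the function name `fit_simple_regression` is an assumption, not something from the slides. With occupancy entered in percentage points, it gives an intercept near -60.37 and a slope near 8.23.

```python
# Revenue (in $1000) and PercentOccupancy, copied from the Hotel1.csv listing.
revenue = [514.44, 463.12, 598.18, 454.92, 453.80, 502.23,
           626.26, 498.70, 514.46, 623.29, 454.77, 385.57]
occupancy = [65.70, 61.10, 78.20, 65.40, 63.50, 70.60,
             81.20, 72.00, 72.90, 81.70, 62.10, 53.40]


def fit_simple_regression(x, y):
    """Return (b0, b1) from the closed-form least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


b0, b1 = fit_simple_regression(occupancy, revenue)
predicted_at_85 = b0 + b1 * 85  # occupancy entered as 85, not 0.85
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, revenue at 85% = {predicted_at_85:.2f}")
```

Entering occupancy as a fraction (0.85) instead would scale the slope by 100 and leave the prediction unchanged.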

Backtracking Ability of the Model
[Figure: actual (red) versus predicted (blue) Revenue]
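The comparison the chart makes can also be tabulated. The sketch below assumes the intercept -60.3747 from the notes and a slope of about 8.2318 per percentage point of occupancy, which is what the least squares formulas give for the Hotel1 data. It prints actual against predicted Revenue for each observation. The R² summary at the end is an addition for illustration; the slides do not report it.

```python
# Revenue (in $1000) and PercentOccupancy from the Hotel1.csv listing.
revenue = [514.44, 463.12, 598.18, 454.92, 453.80, 502.23,
           626.26, 498.70, 514.46, 623.29, 454.77, 385.57]
occupancy = [65.70, 61.10, 78.20, 65.40, 63.50, 70.60,
             81.20, 72.00, 72.90, 81.70, 62.10, 53.40]

b0, b1 = -60.3747, 8.2318  # assumed coefficients; slope is per percentage point

# "Backtracking": predict the observations that were used to fit the line.
predicted = [b0 + b1 * x for x in occupancy]
residuals = [y - p for y, p in zip(revenue, predicted)]

# Share of the variation in Revenue that the fitted line accounts for.
y_bar = sum(revenue) / len(revenue)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - y_bar) ** 2 for y in revenue)
r_squared = 1 - ss_res / ss_tot

for x, y, p in zip(occupancy, revenue, predicted):
    print(f"{x:6.2f}  actual {y:7.2f}  predicted {p:7.2f}")
print(f"R^2 = {r_squared:.3f}")
```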

Part II-Multiple Linear Regression

Multiple Linear Regression
Multiple Linear Regression is an extension of the simple linear regression model in which there is more than one independent variable. In the present context of Sigma Property, we add one more independent variable, namely % Group Occupancy.
You postulate the model for the population in the standard form as follows:
Y = β0 + β1X1 + β2X2
Y is the Revenue measured in $, β0 is the intercept, β1 is the slope corresponding to the independent variable X1 (% Individual Occupancy) and β2 is the slope corresponding to the independent variable X2 (% Group Occupancy).
Estimated Regression Model
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$
where
ŷ is the estimate for the dependent variable (revenue)
x1 is the independent variable (% individual occupancy)
x2 is the independent variable (% group occupancy)
b0, b1, and b2 represent the intercept and the slopes of the two independent variables, respectively.
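With two independent variables, the least squares estimates come from solving the normal equations (X'X) b = X'y, the matrix analogue of the two normal equations used in simple regression. The sketch below is a minimal pure-Python illustration. The notes give no group occupancy data, so the numbers are made up and generated exactly from y = 10 + 4·x1 + 2·x2, which lets you check that the fit recovers those coefficients. The function names are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_multiple_regression(columns, y):
    """Return [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(values) for values in zip(*columns)]  # intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data, for illustration only (not from Sigma Property).
individual_occ = [40, 45, 50, 55, 60, 65]
group_occ = [20, 15, 25, 10, 30, 18]
revenue = [10 + 4 * a + 2 * g for a, g in zip(individual_occ, group_occ)]

coefficients = fit_multiple_regression([individual_occ, group_occ], revenue)
print([round(c, 4) for c in coefficients])
```

In practice a statistics package would be used instead; the point here is only that b0, b1 and b2 are determined jointly.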

Linear Programming

Linear Programming

Definition of LPP

Why use Linear Programming?

Characteristics of a Linear Programming Problem

Graphical Representation of LP

Example 2

"List five areas of application of Linear Programming (LP)
and discuss the usefulness of LP in three of these areas."
• Linear Programming is basically a planning tool for finding the best solution under constraints or limiting factors. The main objective of a linear programming problem is to maximize or minimize some numerical value.

Various Applications of this technique
• 1. Industrial Applications:
– Product Mix problems
– Blending problems
– Production scheduling
– Assembly line balancing
– Inventory management
• 2. Management applications
– Media selection problems
– Portfolio selection
– Profit/Sales Maximization
– Transportation problems
• 3. It can also be used to solve diet problems, flight scheduling, agriculture problems and many more.

• Linear Programming is a mathematical optimization tool for finding the optimal solution to real-world problems in which the variables are linearly related to each other and are subject to a set of constraints. In simple terms, a business scenario can involve many parameters, with the goal of maximizing profit or minimizing cost, loss or error. Such problems can be formulated as a Linear Programming model by identifying the decision variables and constraints involved, and then solved using the simplex method.
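To make "decision variables and constraints" concrete, here is a small sketch for a two-variable problem. It does not implement the simplex method. It uses the corner-point idea behind the graphical approach: for a bounded feasible region, an optimum lies at a vertex, so it is enough to intersect the constraint boundaries pairwise and evaluate the objective at the feasible intersections. The product-mix numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical product-mix problem (illustrative numbers, not from the notes):
# maximize profit 3x + 5y, where x and y are the quantities of two products.
objective = (3.0, 5.0)
constraints = [          # each entry (a, b, c) means a*x + b*y <= c
    (1.0, 0.0, 4.0),
    (0.0, 2.0, 12.0),
    (3.0, 2.0, 18.0),
    (-1.0, 0.0, 0.0),    # x >= 0
    (0.0, -1.0, 0.0),    # y >= 0
]


def corner_points(constraints, tol=1e-9):
    """Intersect every pair of constraint boundaries; keep the feasible points."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:   # parallel boundaries have no single intersection
            continue
        x = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            points.append((x, y))
    return points


def solve_by_corner_points(objective, constraints):
    """Return (best_point, best_value) for a bounded two-variable maximization."""
    cx, cy = objective
    best = max(corner_points(constraints), key=lambda p: cx * p[0] + cy * p[1])
    return best, cx * best[0] + cy * best[1]


best_point, best_value = solve_by_corner_points(objective, constraints)
print(best_point, best_value)
```

Enumeration like this only scales to toy problems; with many variables and constraints, the simplex method mentioned above visits vertices selectively rather than listing them all.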
• The various industry applications of this technique are:
• Advertising: Maximizing the reach of an advertisement given the slots available, the cost of advertising in each slot and the budget available for advertising the product.
• Manufacturing: Maximizing the profit earned on manufactured products given the available raw materials and the cost of manufacturing each product, including the cost of labour and of advertising each product.
• Petroleum Industry: Maximizing the number of barrels sold for optimum profit by choosing the combination of crude oils to be used, subject to constraints on parameters like hydrocarbon and sulphur content per unit of petroleum.
• Resource Management (Shift Scheduling): Finding the optimum number of resources to employ at a given time to minimize cost and maximize profit, based on the shift allowance per employee, the project budget for the week/month and the number of leaves to be given per week to each employee.
• Transportation: Finding the optimum route or number of drivers to use based on peak-hour traffic conditions, the cost of petrol and time taken to transport on a given route, and the profit gained per shipment.
• Portfolio Optimization: Finding the optimum portfolios to invest in, given a budget allocated for investment, based on the % growth of the shares and the dividends received per dollar invested.

• Civil Engineering Applications: LP has been adopted in many construction planning tasks, such as steel cutting, template building and earthwork blending. It can be used to optimize the use of construction equipment and so help yield higher profits.
• Cost management in public transportation can be optimized using LP. For example, in Pune's local bus services there are a few routes on which multiple buses are run daily but all of them are overcrowded. Similarly, there are a few routes on which the buses are comparatively less crowded. During certain peak hours, a few buses from the less crowded routes can be diverted to the heavily crowded routes. This allocation can be optimized using LP.
• Managing the shelf life of any perishable product is always a challenge in a demand-and-supply industry. With the help of LP, the supply of a product to an area's stores can be determined from the demand for the product in that area.
• LP can be used to determine the optimal utilization of manpower in any operation. Companies can identify the training and development the current workforce needs on the basis of operational requirements; this helps the company increase each person's productivity and makes the workforce more skilful and accurate in handling the work.

Introduction to Machine Learning

Learning from Data
• Can we learn about the world around us using
data?
• Model building from data
– Take data as input
– Find patterns in the data
– Summarize the pattern in a mathematically precise
way
• Machine learning automates this model building.
The Challenge
• Data unfortunately contains noise. If not,
machine learning would be trivial!
• Think of Data = Information + Noise
• The challenge is to identify the information
content and distill away the noise.
• To help do this, machine learning uses a train and
test approach.
Overfitting vs. Underfitting
• If the model we finish with ends up
– modeling the noise as well, we call it “overfitting”: bad for prediction!
– not modeling all the information, we call it “underfitting”: bad for prediction!
• The hope is that the model that does the best on
testing data manages to capture/model all the
information but leave out all the noise.
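The train and test approach starts by holding back part of the data, so that the model is judged on observations it has not seen. A minimal sketch of such a split follows; the function name, the 25% test fraction and the fixed seed are assumptions chosen for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # stand-in for 20 observations
train, test = train_test_split(data)
print(len(train), len(test))
```

A model that overfits tends to look much better on `train` than on `test`; one that underfits tends to do poorly on both.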
Machine Learning tasks
1. Supervised learning: Building a mathematical model using
data that contains both the inputs and the desired outputs
(ground truth).
– Examples:
• Determining if an image has a horse. The data would include images
with and without the horse (the input), and for each image we would
have a label (the output) indicating if there is a horse in that image.
• Determining if a client might default on a loan
• Determining if a call center employee is likely to quit
– Since we have desired outputs, model performance can be
evaluated by comparisons.
Machine Learning Tasks
2. Unsupervised learning: Building a mathematical model
using data that contains only inputs and no desired outputs.
– Used to find structure in the data, such as groupings or clusters of data points, that is, to discover patterns and group the inputs into categories.
– Example: an advertising platform segments the population into smaller groups with similar demographics and purchasing habits, helping advertisers reach their target market with relevant ads.
– Since no labels are provided, there is no specific way to compare
model performance in most unsupervised learning methods.
Tools and techniques
• Supervised learning
– Regression: desired output is a continuous number
– Classification: desired output is a category
• Unsupervised learning
– Clustering: Grouping data
– Dimensionality reduction: Compressing data
– Association rule learning: If X then Y
Intro to Clustering

Clustering
• Clustering is an Unsupervised Learning Technique
• A Cluster: collection of objects that are similar
• Objective is to group similar data points into a group
– Segmenting customers into similar groups
– Automatically organizing similar files/emails into folders
• Simplifies data by reducing many data points into a few
clusters
Distance
• To define “similarity” you need a measure of distance
• Examples of common distance measures
– Euclidean Distance
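Euclidean distance is the straight-line distance between two points: the square root of the sum of squared coordinate differences. A minimal sketch, with an illustrative function name:

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

From Python 3.8 onward the standard library's `math.dist` computes the same quantity.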
Types of Clustering
1. Connectivity based clustering (Hierarchical clustering): based on the idea that related objects are closer to each other. We can then create a hierarchy of clusters/groups.
– Useful when you want flexibility in how many clusters you ultimately want. For example, imagine grouping items on an online marketplace like Etsy or Amazon.
– In terms of outputs from the algorithm, in addition to cluster assignments you also build a tree (dendrogram) that tells you about the hierarchies between the clusters. You can then pick the number of clusters you want from this tree.
– In a dendrogram, the y-axis marks the distance at which the clusters merge, while the objects are placed along the x-axis.
– Algorithms can be agglomerative (start with each object as its own cluster and merge them into larger clusters) or divisive (start with the complete data and divide it into partitions).
Types of Clustering
2. Centroid based clustering (e.g. K-Means clustering): The objective is to find K clusters/groups. These groups are defined by creating a centroid for each group. The centroids are like the heart of the cluster: they “capture” the points closest to them and add them to the cluster.
– A large K produces smaller groups and a small K produces larger groups
– K-Means uses Euclidean distances and is the most popular
– Other variants like K-medians and K-medoids use other distance measures
Clustering

Data we will work with
– Customer Spend Data
• AVG_Mthly_Spend: The average monthly amount spent by customer
• No_of_Visits: The number of times a customer visited in a month
• Item Counts: Count of Apparel, Fruits and Vegetables, and Staple Items purchased
• Can we cluster similar customers together?

Connectivity Based: Hierarchical Clustering
• Hierarchical Clustering techniques create clusters in a hierarchical, tree-like structure
• Any type of distance measure can be used as a measure of similarity
• The tree-like cluster output is called a Dendrogram
• Techniques either start with individual objects and sequentially combine them (Agglomerative), or start from one cluster of all objects and sequentially divide them (Divisive)
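The agglomerative variant can be sketched in a few lines. The version below assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; the notes do not prescribe either choice, and the customer figures are made up. It records each merge and the distance at which it happened, which is the information a dendrogram displays.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns the clusters (lists of point indices) and the merge history as
    (distance, cluster_a, cluster_b) tuples.
    """
    clusters = [[i] for i in range(len(points))]  # start: every object alone
    history = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((d, clusters[i][:], clusters[j][:]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, history


# Made-up (average monthly spend, number of visits) pairs, for illustration only.
customers = [(10, 2), (11, 2), (12, 3), (50, 9), (52, 10), (51, 9)]
clusters, history = single_linkage(customers, n_clusters=2)
print(clusters)
for distance, left, right in history:
    print(f"merged {left} and {right} at distance {distance:.3f}")
```

Stopping at a different `n_clusters` corresponds to cutting the dendrogram at a different height.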
Distance between objects
Centroid based: K-Means Clustering
• K-Means is probably the most used clustering technique
• Aims to partition the n observations into k clusters so as to minimize the within-cluster sum of squares (i.e. variance).
• Computationally less expensive compared to hierarchical techniques.
• Have to pre-define K, the number of clusters

Choosing the optimal K
• Usually subjective, based on striking a good
balance between compression and accuracy
• The “elbow” method is commonly used: plot the within-cluster sum of squares against K and pick the K at which the curve bends
Lloyd’s algorithm
1. Choose K initial centroids.
2. Compute the squared Euclidean distance of each object from these K centroids. Assign each object to the closest centroid, forming clusters.
3. Compute the new centroid (mean) of each cluster based on the objects assigned to it.
4. Repeat steps 2 and 3 until convergence, usually defined as the point at which there is no movement of objects between clusters.
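The four steps translate almost line for line into code. The sketch below is a minimal pure-Python version. Picking the starting centroids at random from the data, the fixed seed and the iteration cap are assumptions added for illustration, and the customer figures are made up.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # step 1: pick K starting centroids
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:             # step 4: no point changed cluster
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels


# Made-up (average monthly spend, number of visits) pairs, for illustration only.
customers = [(10, 2), (11, 2), (12, 3), (50, 9), (52, 10), (51, 9)]
centroids, labels = k_means(customers, k=2)
print(labels)
print(centroids)
```

On less clearly separated data the result can depend on the starting centroids, so the algorithm is often run from several random starts.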
TIME SERIES FORECASTING

VISUALIZING TIME SERIES COMPONENTS

Steps in Forecasting
1. Problem definition
2. Gathering information
3. Preliminary (exploratory) analysis
4. Choosing and fitting models
5. Using and evaluating a forecasting model
The objective of this lesson is to explore several time series data sets and apply visual methods using R to extract information.

Problem Definition
Time series forecasting involves
• Understanding the historical pattern of the data
• Using past knowledge to forecast the future
Before a forecasting problem is taken up, a decision needs to be made regarding the forecast horizon.

Forecast Range
Different industries need different forecast ranges for different purposes.
Example: Airline industry: interested in passenger volume forecasts. Passenger volume is the driving force behind all its operations.
• Long-term forecast: 5-10 years
– Required for strategic decision making
– Acknowledging limited reliability of these forecasts
• Mid-term forecast: 2-5 years
– Manpower hiring
– Decisions on addition/alteration of new and existing routes
• Short-term forecast: 2 weeks – 6 months
– Manpower rostering
– Dynamic pricing

Forecast Range
• Supply Chain: Responds to customer demand
– A very long-range forecast will not serve the purpose well
– In addition to past demand, lead time and planned advertising and other marketing activity must be incorporated into the forecast horizon
• Contract Research Organization doing clinical trials
– 2000 trials running simultaneously across the world
– Need to forecast monthly, for the next 6 months, each of some 5000 items required for trials

Gathering Information
Historical data are required for future prediction.
If the volume of data is limited, forecasts will not be reliable enough.
If data are available from the very distant past, the older data may not be useful at all.

Example: Clay Brick Production
Series not stable
Use stable part for forecast

Example: Crest Toothpaste
[Figure: Weekly Market Share by week, 0001W07 to 0006W13; vertical axis 0 to 0.6]
Use later part only for forecast
Components of Time Series
Graphs highlight the variety of patterns inherent to a time series (TS).
A TS can be split into several components, each representing one of the underlying categories of patterns.
Time Series Components

▪ Systematic component
– Trend
– Seasonal component
– Cyclic component
▪ Irregular component (Error or Random Component)

Trend
• Long term movement of a series: either increasing or
decreasing

Example: Demand of Bricks
[Figure: monthly demand for bricks, 1988-01 to 1991-12; vertical axis "Bricks", 75 to 215; annotations: Low Demand (Jan), High Demand (May)]
Within each year, the demand for bricks follows a repetitive pattern.
In a particular month (Jan), demand is the lowest.
In other months, demand fluctuates.

Seasonality
• Represents stable intra-year fluctuations that repeat year after year with respect to timing, direction and magnitude
• Normal variations that recur every year to the same extent
• A yearly series does not have seasonality

Seasonality
• Demand for winter clothes
• Airline and train ticket demand
• Incidence of influenza or vector-borne diseases
Stock prices typically will not show any seasonal pattern

Example: Sale of Shoes
2011-13 demand increasing

2013-15 stable demand
2015 onwards demand declining

Cyclical Component
• In addition to the stable within-year fluctuation, demand for this particular style of shoes shows an increase over a period of years and then a decrease

Systematic Components
• Trend, Seasonality, Cyclicality are part of
systematic component
• These patterns are interpretable
• These can be estimated
• Forecast of time series involves estimation
and extrapolation of these components
We focus on Trend and Seasonality only
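One way to estimate trend and seasonality is classical additive decomposition: smooth the series with a centered moving average to get the trend, then average what is left over for each position in the seasonal cycle. The notes do not specify a method, so this choice, the function names and the quarterly series below are assumptions. The series is built from a known trend (100 + 2t) and a known seasonal pattern so that the output can be checked.

```python
def centered_moving_average(series, period):
    """Trend estimate; None where the window does not fit. Assumes an even period."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # End points get half weight so that each season counts exactly once.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_effects(series, trend, period):
    """Average detrended value for each position in the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    effects = [sum(b) / len(b) for b in buckets]
    mean_effect = sum(effects) / period
    return [e - mean_effect for e in effects]  # centre so the effects sum to zero


# Hypothetical quarterly series: linear trend plus a repeating seasonal pattern.
true_seasonal = [5, -3, -6, 4]
series = [100 + 2 * t + true_seasonal[t % 4] for t in range(12)]

trend = centered_moving_average(series, period=4)
seasonal = seasonal_effects(series, trend, period=4)
print(trend)
print(seasonal)
```

With real data the two estimates will not be exact; whatever remains after subtracting them from the series is the irregular part.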

Irregular Component
The error or variability associated with the series is the Irregular component.
This component is a random component.
The part of the series that cannot be explained through the Systematic component forms the Irregular Component.
Other names for this component are Error and White Noise.
This component is assumed to have a normal distribution with mean 0 and constant variance σ².
