Business Analytics

Chapter 1
Introduction
Three developments spurred recent explosive growth in the use of analytical methods in business applications:
First development:
 Technological advances, Internet social networks, and data generated from personal electronic devices produce incredible amounts of data for businesses.
 Businesses want to use these data to improve the efficiency and profitability of their operations, better understand their customers, price their products more effectively, and gain a competitive advantage.
 Examples of such technological advances include improved point-of-sale scanner technology and the collection of data through e-commerce.
Second development:
 Ongoing research has resulted in numerous
methodological developments, including:
>Advances in computational approaches to effectively handle and explore massive amounts of data
>Faster algorithms for optimization and simulation, and
>More effective approaches for visualizing data.

Third development:
 The methodological developments were paired with an explosion in computing power and storage capability.
 Better computing hardware, parallel computing, and cloud computing have enabled businesses to solve big problems faster and more accurately than ever before.
 Cloud computing, the more recent development, is the remote use of hardware and software over the Internet.
Figure 1.1 - Google Trends Graph of Searches on the Term Analytics
Figure 1.1 is a graph generated by Google Trends that displays the search volume for the word analytics from 2004 to 2013 (projected) on a percentage basis from the peak.
 The figure clearly illustrates the recent increase in interest in analytics.

Decision Making
Managers' responsibility:
 To make strategic, tactical, or operational decisions.
Strategic decisions
 Involve higher-level issues concerned with the overall direction of the organization.
 These decisions define the organization's overall goals and aspirations for the future.
Tactical decisions
 Concern how the organization should achieve the goals and objectives set by its strategy.
 They are usually the responsibility of mid-level management.
Operational decisions
 Affect how the firm is run from day to day.
 They are the domain of operations managers, who are the closest to the customer.
Decision making can be defined as the following process:
1. Identify and define the problem
2. Determine the criteria that will be used to evaluate alternative solutions
3. Determine the set of alternative solutions
4. Evaluate the alternatives
5. Choose an alternative
 Consider the case of the Thoroughbred Running Company (TRC). Historically, TRC had been a catalog-based retail seller of running shoes and apparel. TRC sales revenue grew quickly as it changed its emphasis from catalog-based sales to Internet-based sales.
 Recently, TRC decided that it should also establish retail stores in the malls and downtown areas of major cities. This is a strategic decision that will take the firm in a new direction that it hopes will complement its Internet-based strategy.
 TRC middle managers will therefore have to make a variety of tactical decisions in support of this strategic decision, including how many new stores to open this year, where to open these new stores, how many distribution centers will be needed to support the new stores, and where to locate these distribution centers.
 Operations managers in the stores will need to make day-to-day decisions regarding, for instance, how many pairs of each model and size of shoes to order from the distribution centers and how to schedule their sales personnel.
Common approaches to making decisions:
 Tradition
 Intuition
 Rules of thumb
 Using the relevant data available
Business Analytics Defined
Business analytics
 Scientific process of transforming data into insight for making better decisions.
 Used for data-driven or fact-based decision making, which is often seen as more objective than other alternatives for decision making.
Tools of business analytics can aid decision making by:
 Creating insights from data
 Improving our ability to more accurately forecast for planning
 Helping us quantify risk
 Yielding better alternatives through analysis and optimization
A Categorization of Analytical Methods and Models
Descriptive analytics
 It encompasses the set of techniques that describe what has happened in the past.
Examples - data queries, reports, descriptive statistics, data visualization (data dashboards), data-mining techniques, and basic what-if spreadsheet models.
Data query - It is a request for information with certain characteristics from a database.
Data dashboards - Collections of tables, charts, maps, and summary statistics that are updated as new data become available.
Uses of dashboards
 To help management monitor specific aspects of the company's performance related to their decision-making responsibilities.
 For corporate-level managers, daily data dashboards might summarize sales by region, current inventory levels, and other company-wide metrics.
 Front-line managers may view dashboards that contain metrics related to staffing levels, local inventory levels, and short-term sales forecasts.
Predictive analytics
 It consists of techniques that use models constructed from past data to predict the future or ascertain the impact of one variable on another.
 Survey data and past purchase behavior may be used to help predict the market share of a new product.
Techniques used in Predictive Analytics:
Data mining
 Used to find patterns or relationships among elements of the data in a large database; often used in predictive analytics.
Example:
A large grocery store chain might be interested in developing a new targeted marketing campaign that offers a discount coupon on potato chips. By studying historical point-of-sale data, the store may be able to use data mining to predict which customers are the most likely to respond to an offer on discounted chips by purchasing higher-margin items such as beer or soft drinks in addition to the chips, thus increasing the store's overall revenue.
Techniques used in Predictive Analytics (contd.):
Simulation
 It involves the use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision.
Example:
Banks often use simulation to model investment and default risk in order to stress test financial models. Simulation is also used in the pharmaceutical industry to assess the risk of introducing a new drug.

Prescriptive Analytics
 It indicates a best course of action to take.
Models used in prescriptive analytics:
Optimization models
 Models that give the best decision subject to the constraints of the situation.
Simulation optimization
 Combines the use of probability and statistics to model uncertainty with optimization techniques to find good decisions in highly complex and highly uncertain settings.
Decision analysis
 Used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events.
 It also employs utility theory, which assigns values to outcomes based on the decision maker's attitude toward risk, loss, and other factors.
Optimization models (Model, Field: Purpose)
 Portfolio models, Finance: Use historical investment return data to determine the mix of investments that yields the highest expected return while controlling or limiting exposure to risk.
 Supply network design models, Operations: Provide the cost-minimizing plant and distribution center locations subject to meeting the customer service requirements.
 Price markdown models, Retailing: Use historical data to yield revenue-maximizing discount levels and the timing of discount offers when goods have not sold as planned.
Big Data
Big data
 A set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time.
 Big data represents opportunities.
 It also presents analytical challenges from a processing point of view and consequently has itself led to an increase in the use of analytics.
 More companies are hiring data scientists who know how to process and analyze massive amounts of data.
 Walmart handles over one million purchase transactions per hour.
 Facebook processes more than 250 million picture uploads per day.
Business Analytics in Practice
Figure 1.2 - The Spectrum of Business Analytics
 Companies that apply analytics often follow a trajectory similar to that shown in Figure 1.2.
 Organizations start with basic analytics in the lower left.
 As they realize the advantages of these analytic techniques, they often progress to more sophisticated techniques in an effort to reap the derived competitive advantage.
 Predictive and prescriptive analytics are therefore sometimes referred to as advanced analytics.
Types of applications of analytics by application area:
Financial analytics
Use of predictive models
 To forecast future financial performance
 To assess the risk of investment portfolios and projects
 To construct financial instruments such as derivatives
Financial analytics (contd.)
Use of prescriptive models
 To construct optimal portfolios of investments
 To allocate assets, and
 To create optimal capital budgeting plans.
 Simulation is also often used to assess risk in the financial sector.
Example for use of prescriptive models:
GE Asset Management uses optimization models to decide how to invest its own cash received from insurance policies and other financial products, as well as the cash of its clients such as Genworth Financial. The estimated benefit from the optimization models was $75 million over a five-year period.
Example for use of simulation:
Hypo Real Estate International deployed simulation models to successfully manage commercial real estate risk.
Human resource (HR) analytics
 New area of application for analytics
 The HR function is charged with ensuring that the organization:
 Has the mix of skill sets necessary to meet its needs
 Is hiring the highest-quality talent and providing an environment that retains it, and
 Achieves its organizational diversity goals.
Example for Human Resource (HR) Analytics:
Sears Holdings Corporation (SHC), owner of retailers Kmart and Sears, Roebuck and Company, has created an HR analytics team inside its corporate HR function. The team uses descriptive and predictive analytics to support employee hiring and to track and influence retention.
Marketing analytics
 Marketing is one of the fastest growing areas for the application of analytics.
 A better understanding of consumer behavior through the use of scanner data and data generated from social media has led to an increased interest in marketing analytics.
Marketing analytics (contd.)
A better understanding of consumer behavior through marketing analytics leads to:
 The better use of advertising budgets
 More effective pricing strategies
 Improved forecasting of demand
 Improved product line management, and
 Increased customer satisfaction and loyalty
Example of high-impact marketing analytics:
Automobile manufacturer Chrysler teamed with J. D. Power and Associates to develop an innovative set of predictive models to support its pricing decisions for automobiles. These models help Chrysler to better understand the ramifications of proposed pricing structures (a combination of manufacturer's suggested retail price, interest rate offers, and rebates) and, as a result, to improve its pricing decisions. The models have generated an estimated annual savings of $500 million.
Figure 1.3 - Google Trends for Marketing, Financial, and Human Resource Analytics, 2004–2012
 While interest in marketing, financial, and human resource analytics is increasing, the graph clearly shows the pronounced increase in interest in marketing analytics.
Health care analytics
Descriptive, predictive, and prescriptive analytics are used:
 To improve patient, staff, and facility scheduling
 Patient flow
 Purchasing
 Inventory control
 Use of prescriptive analytics for diagnosis and treatment
Example for use of prescriptive analytics for diagnosis and treatment:
Working with the Georgia Institute of Technology, Memorial Sloan-Kettering Cancer Center developed a real-time prescriptive model to determine the optimal placement of radioactive seeds for the treatment of prostate cancer. Using the new model, 20–30 percent fewer seeds are needed, resulting in a faster and less invasive procedure.
Supply chain analytics
 The core service of companies such as UPS and FedEx is the efficient delivery of goods, and analytics has long been used to achieve efficiency.
 The optimal sorting of goods, vehicle and staff scheduling, and vehicle routing are all key to profitability for logistics companies such as UPS, FedEx, and others like them.
 Companies can benefit from better inventory and processing control and more efficient supply chains.
Example for supply chain analytics:
ConAgra Foods uses predictive and prescriptive analytics to better plan capacity utilization by incorporating the inherent uncertainty in commodities pricing. ConAgra realized a 100 percent return on its investment in analytics in under three months, an unheard-of result for a major technology investment.
Analytics for government and nonprofits
 To drive out inefficiencies
 To increase the effectiveness and accountability of programs
 Analytics for nonprofit agencies
 To ensure their effectiveness and accountability to their donors and clients.
Example of analytics for government agencies:
The New York State Department has worked with IBM to use prescriptive analytics in the development of a more effective approach to tax collection. The result was an increase in collections from delinquent payers of $83 million over two years.
Example of analytics for nonprofit agencies:
Catholic Relief Services (CRS) is the official international humanitarian agency of the U.S. Catholic community. The CRS mission is to provide relief for the victims of both natural and human-made disasters and to help people in need around the world through its health, educational, and agricultural programs.
CRS uses an analytical spreadsheet model to assist in the allocation of its annual budget based on the impact that its various relief efforts and programs will have in different countries.
Sports analytics
 Used for player evaluation and on-field strategy in professional sports.
 To assess players for the amateur drafts and to decide how much to offer players in contract negotiations.
 Professional motorcycle racing teams use sophisticated optimization for gearbox design to gain competitive advantage.
Sports analytics (contd.)
 The use of analytics for off-the-field business decisions is also increasing rapidly.
 Using prescriptive analytics, franchises across several major sports dynamically adjust ticket prices throughout the season to reflect the relative attractiveness and potential demand for each game.
Web analytics - It is the analysis of online activity, which includes, but is not limited to, visits to Web sites and social media sites such as Facebook and LinkedIn.
Leading companies apply descriptive and advanced analytics to data collected in online experiments to:
 Determine the best way to configure Web sites,
 Position ads, and
 Utilize social networks for the promotion of products and services
 Online experimentation involves exposing various subgroups to different versions of a Web site and tracking the results.
 Because of the massive pool of Internet users, experiments can be conducted without risking the disruption of the overall business of the company.
 Such experiments are proving to be invaluable because they enable the company to use trial-and-error in determining statistically what makes a difference in their Web site traffic and sales.
Chapter 2
Descriptive Statistics
Overview of Using Data: Definitions and Goals
Data
 The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Variable
 A characteristic or a quantity of interest that can take on different values.
Observation
 Set of values corresponding to a set of variables.
Variation
 The difference in a variable measured over observations.
Random variable/uncertain variable
 A quantity whose values are not known with certainty.
 When we collect data, we are gathering past observed values, or realizations of a variable.
 By collecting these past realizations of one or more variables, our goal is to learn more about the variation of a particular business situation.
Table 2.1 - Data for Dow Jones Industrial Index Companies

For the data in Table 2.1:
Variables: Symbol, Industry, Share Price, and Volume
Observation: Each row in Table 2.1
Variation: Time, customers, items, etc.
Types of Data
Population
 All elements of interest
Example:
With the thousands of publicly traded companies in the
United States, tracking and analyzing all of these stocks every day
would be too time consuming and expensive.
Sample
 Subset of the population
Random sampling - a sampling method to gather a representative sample of the population data.
Example:
The Dow represents a sample of 30 stocks of large public
companies based in the United States, and it is often interpreted
to represent the larger population of all publicly traded
companies.
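A minimal sketch of simple random sampling, assuming a hypothetical population of placeholder ticker labels (not real stock data). It uses Python's standard `random.Random.sample`, which draws without replacement, so every subset of the requested size is equally likely; the sample size of 30 only mirrors the size of the Dow.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the sample is reproducible
    return rng.sample(population, n)

# Hypothetical population of 3,000 placeholder ticker labels (illustrative only).
population = [f"TICKER{i:04d}" for i in range(3000)]
sample = simple_random_sample(population, 30, seed=42)
print(len(sample), sample[:3])
```

Fixing the seed makes the illustration repeatable; in practice the seed would be omitted or recorded so the sample can be audited.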
Quantitative data
 Data on which numeric and arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed.
Example:
The values for Volume in the Dow data in Table 2.1 can be summed to calculate a total volume of all shares traded by companies included in the Dow.
Categorical data
 Data on which arithmetic operations cannot be performed.
Example:
The data in the Industry column in Table 2.1 are categorical. Arithmetic cannot be performed on them, but the number of companies in the Dow that are in the telecommunications industry can be counted.
Cross-sectional data
 Data collected from several entities at the same, or approximately the same, point in time.
Example:
The data in Table 2.1 are cross-sectional because they describe the 30 companies that comprise the Dow at the same point in time (April 2013).
Time series data
 Data collected over several time periods.
 Graphs of time series data are frequently found in business and economic publications.
 Help analysts understand what happened in the past, identify trends over time, and project future levels for the time series.
Figure 2.1 - Dow Jones Index Values Since 2002
Example:
Figure 2.1 illustrates that the DJI was near 10,000 in 2002 and climbed to above 14,000 in 2007. However, the financial crisis in 2008 led to a significant decline in the DJI to between 6000 and 7000 by 2009. Since 2009, the DJI has been generally increasing and topped 14,000 in April 2013.
Sources of data
Experimental study - A variable of interest is first identified.
 Then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest.
Example:
If a pharmaceutical firm is interested in conducting an experiment to learn about how a new drug affects blood pressure, then blood pressure is the variable of interest in the study. The dosage level of the new drug is another variable that is hoped to have a causal effect on blood pressure. To obtain data about the effect of the new drug, researchers select a sample of individuals. The dosage level of the new drug is controlled as different groups of individuals are given different dosage levels. Before and after the study, data on blood pressure are collected for each group. Statistical analysis of these experimental data can help determine how the new drug affects blood pressure.
Nonexperimental study or observational study
 Makes no attempt to control the variables of interest.
 A survey is perhaps the most common type of observational study.
Example:
Figure 2.2 - Customer Opinion Questionnaire used by Chops City Grill Restaurant
Figure 2.2 shows a customer opinion questionnaire used by Chops City Grill in Naples, Florida. Note that the customers who fill out the questionnaire are asked to provide ratings for 12 variables, including overall experience, the greeting by hostess, the table visit by the manager, overall service, and so on. The response categories of excellent, good, average, fair, and poor provide categorical data that enable Chops City Grill management to maintain high standards for the restaurant's food and service.
In some cases, the data needed for a particular application already exist from an experimental or observational study already conducted. Companies maintain a variety of databases about their employees, customers, and business operations.
Modifying Data in Excel
Table 2.2 - Top 20 Selling Automobiles in United States in March 2011
Figure 2.3 - Top 20 Selling Automobiles Data Entered into Excel with Percent Change in Sales from 2010
 Figure 2.3 shows the data from Table 2.2 entered into an Excel spreadsheet, and the percent change in sales for each model from March 2010 to March 2011 has been calculated.
 This is done by entering the formula =(D2-E2)/E2 in cell F2 and then copying the contents of this cell to cells F3 to F21.
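The formula =(D2-E2)/E2 computes the change relative to the earlier year. A minimal Python sketch of the same calculation follows, assuming (as the formula implies) that column D holds March 2011 sales and column E holds March 2010 sales; the model names and sales figures are hypothetical, not the Table 2.2 values.

```python
def percent_change(current, previous):
    """Relative change from previous to current, like =(D2-E2)/E2 in Excel."""
    if previous == 0:
        raise ValueError("previous value must be nonzero")
    return (current - previous) / previous

# Hypothetical rows: (model, March 2011 sales, March 2010 sales).
rows = [("Model A", 33000, 30000), ("Model B", 27000, 30000)]
for model, sales_2011, sales_2010 in rows:
    # Format as a percentage, as Excel's percent number format would display it.
    print(f"{model}: {percent_change(sales_2011, sales_2010):.1%}")
```

A positive result means sales grew from 2010 to 2011; a negative result means they declined.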
Sorting and filtering data in Excel
Illustration - To sort the automobiles by March 2010 sales
Step 1: Select cells A1:F21
Step 2: Click the DATA tab in the Ribbon
Step 3: Click Sort in the Sort & Filter group
Step 4: Select the check box for My data has headers
Step 5: In the first Sort by dropdown menu, select Sales (March 2010)
Step 6: In the Order dropdown menu, select Largest to Smallest
Step 7: Click OK
Figure 2.4 - Using Excel's Sort Function to Sort the Top Selling Automobiles Data
Figure 2.5 - Top Selling Automobiles Data Sorted by Sales in March 2010
 The result of using Excel's Sort function for the March 2010 data is shown in Figure 2.5.
 Although the Honda Accord was the best-selling automobile in March 2011, both the Toyota Camry and the Toyota Corolla/Matrix outsold the Honda Accord in March 2010.
 Note that while Sales (March 2010), which is in column E, is sorted, the data in all other columns are adjusted accordingly.
Sorting and filtering data in Excel
Illustration - Using Excel's Filter function to see the sales of models made by Toyota
Step 1: Select cells A1:F21
Step 2: Click the DATA tab in the Ribbon
Step 3: Click Filter in the Sort & Filter group
Step 4: Click on the Filter Arrow in column B, next to Manufacturer
Step 5: Select only the check box for Toyota. You can easily deselect all choices by unchecking (Select All)
Figure 2.6 - Top Selling Automobiles Data Filtered to Show Only Automobiles Manufactured by Toyota
 The result (Figure 2.6) is a display of only the data for models made by Toyota.
 Of the 20 top-selling models in March 2011, Toyota made three of them.
 Further filter the data by choosing the down arrows in the other columns.
 All data can be made visible again by clicking on the down arrow in column B and checking (Select All) or by clicking Filter in the Sort & Filter group again from the DATA tab.
Conditional Formatting of Data in Excel:
 Makes it easy to identify data that satisfy certain conditions in a data set.
Illustration - To identify the automobile models in Table 2.2 for which sales had decreased from March 2010 to March 2011.
Step 1: Starting with the original data shown in Figure 2.3, select cells F1:F21
Step 2: Click on the HOME tab in the Ribbon
Step 3: Click Conditional Formatting in the Styles group
Step 4: Select Highlight Cells Rules, and click Less Than from the dropdown menu
Step 5: Enter 0% in the Format cells that are LESS THAN: box
Step 6: Click OK
Figure 2.7 - Using Conditional Formatting in Excel to Highlight Automobiles with Declining Sales from March 2010
 Here, the models with decreasing sales (Toyota Camry, Ford Focus, Chevrolet Malibu, and Nissan Versa) are now clearly visible.
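The three spreadsheet operations above (sort, filter, and the "Less Than 0%" highlight rule) can be mirrored in plain Python to show what each one does to the rows. This is a sketch only: the record layout, field names, and rows below are hypothetical illustrations, not the Table 2.2 data, and the D/E column mapping is assumed from the percent-change formula.

```python
from dataclasses import dataclass

@dataclass
class SalesRow:
    model: str
    manufacturer: str
    sales_2011: int  # assumed to correspond to column D
    sales_2010: int  # corresponds to column E, Sales (March 2010)

    @property
    def pct_change(self) -> float:
        return (self.sales_2011 - self.sales_2010) / self.sales_2010

# Hypothetical rows for illustration only.
rows = [
    SalesRow("Model A", "Maker X", 30000, 32000),
    SalesRow("Model B", "Maker Y", 35000, 28000),
    SalesRow("Model C", "Maker X", 34000, 33000),
]

# Sort by March 2010 sales, Largest to Smallest. Whole records move
# together, so the other fields stay aligned, as in Excel's Sort.
by_2010 = sorted(rows, key=lambda r: r.sales_2010, reverse=True)

# Filter: keep only one manufacturer, like checking one box in the Filter list.
maker_x = [r for r in rows if r.manufacturer == "Maker X"]

# Conditional formatting rule "Less Than 0%": flag models with declining sales.
declining = [r.model for r in rows if r.pct_change < 0]

print([r.model for r in by_2010])  # ['Model C', 'Model A', 'Model B']
print([r.model for r in maker_x])  # ['Model A', 'Model C']
print(declining)                   # ['Model A']
```

Unlike Excel's Filter, the list comprehension builds a new list rather than hiding rows, so the original data are never altered.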
Figure 2.8 - Using Conditional Formatting in Excel to Generate Data Bars for the Top Selling Automobiles Data
 We can choose Data Bars from the Conditional Formatting dropdown menu in the Styles group of the HOME tab in the Ribbon.
 Data bars are essentially a bar chart drawn inside the cells that shows the magnitude of the cell values.
 The width of the bars in this display is proportional to the values of the variable for which the bars have been drawn; a value of 20 creates a bar twice as wide as that for a value of 10.
 Negative values are shown to the left side of the axis; positive values are shown to the right.
 Cells with negative values are shaded in a color different from that of cells with positive values.
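The proportional-width idea behind data bars can be sketched as text output. This is a simplification that follows the description above (bar length proportional to the absolute value, negatives drawn left of the axis); Excel's own data bar scaling options may differ, and the cell values used are hypothetical.

```python
def data_bar(value, max_abs, half_width=10):
    """Render one value as a text data bar around an axis marker '|'.

    Bar length is proportional to |value| / max_abs. Negative values grow
    to the left of the axis ('-'), positive values to the right ('+').
    """
    length = round(abs(value) / max_abs * half_width) if max_abs else 0
    left = "-" * length if value < 0 else ""
    right = "+" * length if value > 0 else ""
    return left.rjust(half_width) + "|" + right.ljust(half_width)

values = [20, 10, -10]  # hypothetical cell values
max_abs = max(abs(v) for v in values)
for v in values:
    print(f"{v:>4} {data_bar(v, max_abs)}")
```

With these values, the bar for 20 is twice as long as the bar for 10, matching the proportionality described in the notes.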
Creating Distributions from Data
Frequency distributions for categorical data
Frequency distribution
 A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes, typically referred to as bins when dealing with distributions.
Table 2.3 - Data from a Sample of 50 Soft Drink Purchases
 The data in Table 2.3 are taken from a sample of 50 soft drink purchases.
 Each purchase is for one of five popular soft drinks, which define the five bins: Coca-Cola, Diet Coke, Dr. Pepper, Pepsi, and Sprite.
Table 2.4 - Frequency Distribution of Soft Drink Purchases
The frequency distribution summarizes information about the popularity of the five soft drinks:
 Coca-Cola is the leader, Pepsi is second, Diet Coke is
third, and Sprite and Dr. Pepper are tied for fourth.
 The frequency distribution of soft drink purchases (Table 2.4) is obtained by counting the number of times each soft drink appears in Table 2.3.
 Coca-Cola appears 19 times, Diet Coke appears 8 times, Dr. Pepper appears 5 times, Pepsi appears 13 times, and Sprite appears 5 times.
 This frequency distribution provides a summary of how the 50 soft drink purchases are distributed across the five soft drinks.
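Counting how often each category appears is exactly what Python's `collections.Counter` does. The sketch below builds a list that reproduces only the counts reported above for Table 2.3 (the order of purchases is arbitrary, since the individual rows are not reproduced here).

```python
from collections import Counter

# A list matching only the reported counts for Table 2.3 (order is arbitrary).
purchases = (["Coca-Cola"] * 19 + ["Diet Coke"] * 8 + ["Dr. Pepper"] * 5
             + ["Pepsi"] * 13 + ["Sprite"] * 5)

freq = Counter(purchases)  # maps each bin (soft drink) to its frequency
for drink, count in freq.most_common():
    print(f"{drink:<10} {count}")
print("Total", sum(freq.values()))
```

`most_common()` lists bins from highest to lowest frequency, which reproduces the ranking discussed above (Coca-Cola first, Pepsi second, Diet Coke third).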
Figure 2.9 - Creating a Frequency Distribution for Soft Drinks Data in Excel
 Figure 2.9 shows the sample of 50 soft drink purchases in an Excel spreadsheet.
 Column D contains the five different soft drink categories as the bins.
 In cell E2, enter the formula =COUNTIF($A$2:$B$26, D2), where A2:B26 is the range for the sample data, and D2 is the bin (Coca-Cola) to match.
 The COUNTIF function in Excel counts the number of times a certain value appears in the indicated range.
 In this case, we want to count the number of times Coca-Cola appears in the sample data. The result is a value of 19 in cell E2, indicating that Coca-Cola appears 19 times in the sample data.
 The formula in cell E2 can be copied to cells E3 to E6 to get frequency counts for Diet Coke, Pepsi, Dr. Pepper, and Sprite. Because the formula uses the absolute reference $A$2:$B$26, the data range stays fixed when the formula is copied, while the relative reference D2 changes to D3, D4, and so on.
Table 2.5 - Relative Frequency and Percent Frequency Distributions of Soft Drink Purchases
 Table 2.5 shows that the relative frequency for Coca-Cola is 19/50 = 0.38, the relative frequency for Diet Coke is 8/50 = 0.16, and so on.
 From the percent frequency distribution, it is seen that 38 percent of the purchases were Coca-Cola, 16 percent of the purchases were Diet Coke, and so on.
 Note that 38 percent + 26 percent + 16 percent = 80 percent of the purchases were the top three soft drinks.
Relative frequency and percent frequency distributions
Relative frequency distribution
 It is a tabular summary of data showing the relative frequency for each bin.
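The relative frequency of a bin is its frequency divided by the total number of observations n, and the percent frequency is that value times 100. A minimal sketch using the soft drink frequencies reported above:

```python
def relative_frequencies(freq):
    """Relative frequency of each bin = bin frequency / n."""
    n = sum(freq.values())
    return {k: v / n for k, v in freq.items()}

freq = {"Coca-Cola": 19, "Diet Coke": 8, "Dr. Pepper": 5, "Pepsi": 13, "Sprite": 5}
rel = relative_frequencies(freq)
pct = {k: 100 * v for k, v in rel.items()}  # percent frequency distribution

top3 = sorted(pct.values(), reverse=True)[:3]
print(rel["Coca-Cola"], rel["Diet Coke"])  # 0.38 0.16
print(round(sum(top3)))                    # 80
```

The relative frequencies always sum to 1 (and the percent frequencies to 100), which is a quick check on a hand-built table.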
Percent frequency distribution
 Summarizes the percent frequency of the data for each bin.
 Used to provide estimates of the relative likelihoods of different values of a random variable.
Frequency distributions for quantitative data
Three steps necessary to define the classes for a frequency distribution with quantitative data:
1. Determine the number of nonoverlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.
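The three steps can be sketched as two small helpers. The width rule used here, (largest value minus smallest value) divided by the number of bins and rounded up to a whole number, is a common rule of thumb and an assumption on our part, since these notes do not state how the width is chosen. The limits produced for a lower limit of 10, width 5, and five bins match the 10–14, 15–19, ... classes used for the audit time data.

```python
import math

def bin_width(smallest, largest, num_bins):
    """Rule-of-thumb width (assumed): (largest - smallest) / bins, rounded up."""
    return math.ceil((largest - smallest) / num_bins)

def bin_limits(lower, width, num_bins):
    """Nonoverlapping integer bins such as 10-14, 15-19, ..."""
    return [(lower + i * width, lower + (i + 1) * width - 1) for i in range(num_bins)]

print(bin_limits(10, 5, 5))
# [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
```

For integer data such as days, each upper limit is one less than the next lower limit, which keeps the bins nonoverlapping.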
Table 2.6 - Year-End Audit Times (Days)
 Figure 2.10 shows the data from Table 2.6 entered into an Excel worksheet.
 The sample of 20 audit times is contained in cells A2:D6.
 The upper limits of the defined bins are in cells A10:A14.
We can use the FREQUENCY function in Excel to count the number of observations in each bin:
Step 1. Select cells B10:B14
Step 2. Enter the formula =FREQUENCY(A2:D6, A10:A14). The range A2:D6 defines the data set, and the range A10:A14 defines the bins.
Step 3. Press CTRL+SHIFT+ENTER
 Excel will then fill in the values for the number of observations in each bin in cells B10 through B14 because these were the cells selected in Step 1 above.
Table 2.7 - Frequency, Relative Frequency, and Percent Frequency Distributions for the Audit Time Data
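A sketch of what =FREQUENCY(data, bins) computes, together with the relative and percent frequencies of Table 2.7. Excel's FREQUENCY returns one more count than there are bins (values above the last upper limit); selecting only B10:B14 keeps the first five. The audit times below are hypothetical values chosen to exercise the function, not the Table 2.6 data.

```python
from bisect import bisect_left

def frequency(data, upper_limits):
    """Count values per bin given ascending upper limits.

    Bin i holds values in (upper_limits[i-1], upper_limits[i]]; the extra
    final count holds values above the last limit, as in Excel's FREQUENCY.
    """
    counts = [0] * (len(upper_limits) + 1)
    for x in data:
        # bisect_left finds the first upper limit >= x, i.e. x's bin.
        counts[bisect_left(upper_limits, x)] += 1
    return counts

upper = [14, 19, 24, 29, 34]                   # like cells A10:A14
audit_days = [11, 14, 15, 18, 19, 22, 27, 33]  # hypothetical, not Table 2.6
counts = frequency(audit_days, upper)[:-1]     # keep 5 cells, like B10:B14
n = len(audit_days)
for limit, f in zip(upper, counts):
    print(limit, f, f / n, f"{100 * f / n:.1f}%")
```

Dropping the overflow count is safe here only because no value exceeds 34; if it did, those observations would be silently omitted, which is also what happens when too few cells are selected in Excel.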
Figure 2.10 - Using Excel to Generate a Frequency Distribution for the Audit Times Data
Histogram
 A common graphical presentation of quantitative data
 Constructed by placing the variable of interest on the horizontal axis and the selected frequency measure (absolute frequency, relative frequency, or percent frequency) on the vertical axis.
 The frequency measure of each class is shown by drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the corresponding frequency measure.
 Histograms can be created in Excel using the Data Analysis ToolPak. Following are the steps to create a histogram in Excel:
Step 1. Click the DATA tab in the Ribbon
Step 2. Click Data Analysis in the Analysis group
Step 3. When the Data Analysis dialog box opens, choose Histogram from the list of Analysis Tools, and click OK
Step 4. When the Histogram dialog box appears:
In the Input Range: box, enter A2:D6
In the Bin Range: box, enter A10:A14
Under Output Options:, select New Worksheet Ply:
Select the check box for Chart Output
Click OK
Figure 2.11 - Histogram for the Audit Time Data
 In Figure 2.11, note that the class with the greatest frequency is shown by the rectangle appearing above the class of 15–19 days.
 The height of the rectangle shows that the frequency of this class is 8.
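The rectangle-per-class idea can be sketched as a text histogram, rotated so that classes run down the page and bar length plays the role of rectangle height. The frequencies 4, 8, and 5 for 10–14, 15–19, and 20–24 are stated in these notes; the counts 2 and 1 for the last two classes are derived from the cumulative figures quoted for the audit data (19 of 20 audits within 29 days), and the label 30–34 assumes the same bin width of 5.

```python
def text_histogram(classes, freqs, symbol="#"):
    """One row per class; the bar length equals the class frequency."""
    width = max(len(c) for c in classes)
    lines = [f"{c.rjust(width)} | {symbol * f} ({f})" for c, f in zip(classes, freqs)]
    return "\n".join(lines)

classes = ["10-14", "15-19", "20-24", "25-29", "30-34"]
freqs = [4, 8, 5, 2, 1]
print(text_histogram(classes, freqs))
```

The longest bar appears on the 15–19 row, matching the tallest rectangle in the Excel histogram.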
Figure 2.12 - Creating a Histogram for the Audit Time Data using Data Analysis ToolPak in Excel
Figure 2.13 - Completed Histogram for the Audit Time Data using Data Analysis ToolPak in Excel
 In Figure 2.13, we have modified the bin ranges in column A by typing the values shown in Figure 2.13 into cells A2:A6 so that the chart created by Excel shows both the lower and upper limits for each bin.
 We have also removed the gaps between the columns in the histogram in Excel to match the traditional format of histograms.
 To remove the gaps between the columns in the histogram created by Excel, follow these steps:
Step 1. Right-click on one of the columns in the histogram and select Format Data Series…
Step 2. When the Format Data Series pane opens, click the Series Options button and set the Gap Width to 0%
Histogram
 Provides information about the shape, or form, of a distribution.
Skewness
 Lack of symmetry
 Important characteristic of the shape of a distribution
Figure 2.14 - Histograms Showing Distributions with Different Levels of Skewness
Panel A:
Moderately skewed to the left
Here, the tail extends farther to the left than to the right.
Example: Exam scores, with no scores above 100 percent, most of the scores above 70 percent, and only a few really low scores.
Panel B:
Moderately skewed to the right
The tail extends farther to the right than to the left.
Example: Housing prices; a few expensive houses create the skewness in the right tail.
Panel C:
Symmetric
The left tail mirrors the shape of the right tail.
Example: Data for SAT scores, the heights and weights of people, and so on lead to histograms that are roughly symmetric.
Panel D:
Highly skewed to the right
Example: Data on housing prices, salaries, purchase amounts, and so on often result in histograms skewed to the right.
Cumulative Distributions
Cumulative frequency distribution
 A variation of the frequency distribution that provides another tabular summary of quantitative data.
 Uses the number of classes, class widths, and class limits developed for the frequency distribution.
 Shows the number of data items with values less than or equal to the upper class limit of each class.
 Consider the class with the description "Less than or equal to 24." The cumulative frequency for this class is simply the sum of the frequencies for all classes with data values less than or equal to 24.
 The sum of the frequencies for classes 10–14, 15–19, and 20–24 indicates that 4 + 8 + 5 = 17 data values are less than or equal to 24. Hence, the cumulative frequency for this class is 17.
 In addition, the cumulative frequency distribution in Table 2.8 shows that four audits were completed in 14 days or less and that 19 audits were completed in 29 days or less.
 The cumulative relative frequency distribution can be computed either by summing the relative frequencies in the relative frequency distribution or by dividing the cumulative frequencies by the total number of items.
 Using the latter approach, we found the cumulative relative frequencies in column 3 of Table 2.8 by dividing the cumulative frequencies in column 2 by the total number of items (n = 20).
 The cumulative percent frequencies were again computed by multiplying the cumulative relative frequencies by 100.
 The cumulative relative and percent frequency distributions show that 0.85 of the audits, or 85 percent, were completed in 24 days or less, 0.95 of the audits, or 95 percent, were completed in 29 days or less, and so on.
Table 2.8 - Cumulative Frequency, Cumulative Relative Frequency, and Cumulative Percent Frequency Distributions for the Audit Time Data
Measures of Location
Mean/Arithmetic Mean
 Average value for a variable
 The sample mean is denoted by $\bar{x}$:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where
$n$ = sample size
$x_1$ = value of variable x for the first observation
$x_2$ = value of variable x for the second observation
$x_n$ = value of variable x for the nth observation
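As a rough sketch of the sample mean formula in code, the Python function below sums the observations and divides by n. It is applied to the five class sizes used in the median and mode illustrations, because the 12 home prices of Table 2.9 are not listed here; the function name `sample_mean` is our own illustration, not part of any tool mentioned in the text.

```python
def sample_mean(values):
    """Return x-bar = (x1 + x2 + ... + xn) / n for a non-empty sequence."""
    n = len(values)  # n = sample size
    if n == 0:
        raise ValueError("the sample must contain at least one observation")
    return sum(values) / n


class_sizes = [46, 54, 42, 46, 32]  # class size data for five college classes
print(sample_mean(class_sizes))  # 44.0
```

Python's built-in `statistics.mean` gives the same result; in Excel the AVERAGE function plays this role.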
Illustration
 Computation of the mean home selling price for the sample of 12 home sales:
Table 2.9 - Data on Home Sales in Cincinnati, Ohio, Suburb
Median
 Value in the middle when the data are arranged in ascending order
 Middle value, for an odd number of observations
 Average of the two middle values, for an even number of observations
Illustration
When the number of observations is odd
Consider the class size data for a sample of five college classes:
46 54 42 46 32
Arrange the class size data in ascending order:
32 42 46 46 54
 Because n = 5 is odd, the median is the middlemost value in the data set, 46.
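The two median rules above (middle value for odd n, average of the two middle values for even n) can be sketched in Python as follows. The helper name `median` is hypothetical; the even-n call uses only the two middle home prices quoted in the text, since the full Table 2.9 data are not reproduced here.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # arrange the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the middlemost value.
        return ordered[mid]
    # Even n: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([46, 54, 42, 46, 32]))  # 46
# Only the two middle home prices from the text, not the full Table 2.9 data:
print(median([199_500, 208_000]))  # 203750.0
```

Python's `statistics.median` follows the same two rules, and Excel's MEDIAN function does the same job in a worksheet.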
Illustration
When the number of observations is even
Consider the data on home sales in Cincinnati, Ohio, Suburb (Table 2.9). Arrange the data in ascending order.
 Because n = 12 is even, the median is the average of the middle two values, 199,500 and 208,000:
$\text{Median} = \frac{199{,}500 + 208{,}000}{2} = 203{,}750$
Mode
 Value that occurs most frequently in a data set
Illustration
Consider the class size data:
32 42 46 46 54
Observe that 46 is the only value that occurs more than once.
Mode is 46.
Multimodal data
 Data contain at least two modes.
Bimodal data
 Data contain exactly two modes.
Figure 2.15 - Calculating the Mean, Median, and Modes for the Home Sales Data using Excel
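A minimal Python sketch of finding every mode, roughly analogous to Excel's MODE.MULT: count how often each value occurs and keep all values tied for the highest count. The helper `modes` is our own; the standard library's `statistics.multimode` (Python 3.8+) is shown for comparison, and the bimodal list is a made-up example.

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)  # value -> number of occurrences
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


class_sizes = [46, 54, 42, 46, 32]
print(modes(class_sizes))      # [46]
print(multimode(class_sizes))  # [46]

bimodal = [1, 1, 2, 2, 3]  # hypothetical bimodal data
print(modes(bimodal))  # [1, 2]
```

If no value repeats, this sketch (and `multimode`) returns every value, whereas Excel's MODE.SNGL returns #N/A in that case.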
 The mean can be found in Excel using the AVERAGE function. The value for the mean in cell E2 is calculated using the formula =AVERAGE(B2:B13).
 The median of a data set can be found in Excel using the MEDIAN function. The value for the median in cell E3 is found using the formula =MEDIAN(B2:B13).
 The Excel MODE.SNGL function returns only a single most-often-occurring value.
 For multimodal distributions, we must use the MODE.MULT function in Excel to return more than one mode.
To find both of the modes in Excel, we take these steps:
Step 1. Select cells E4 and E5.
Step 2. Enter the formula =MODE.MULT(B2:B13).
Step 3. Press CTRL+SHIFT+ENTER.
 Excel enters the values for both modes of this data set in cells E4 and E5: $138,000 and $254,000.
Geometric mean
 nth root of the product of n values
 Used in analyzing growth rates in financial data
 Sample geometric mean:
$\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$
Illustration
Consider the percentage annual returns and growth factors for the mutual fund data over the past 10 years.
Table 2.10 - Percentage Annual Returns and Growth Factors for the Mutual Fund Data
 We will determine the mean rate of growth for the fund over the 10-year period.
Solution:
Product of the growth factors:
Geometric mean of the growth factors: $\bar{x}_g = (x_1 x_2 \cdots x_{10})^{1/10} \approx 1.029$
 We conclude that annual returns grew at an average annual rate of (1.029 − 1) × 100% = 2.9%.
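The geometric mean and its conversion to an average growth rate can be sketched as below. Because the growth factors of Table 2.10 are not reproduced here, the example uses two hypothetical yearly returns, +21% and 0% (growth factors 1.21 and 1.00); the function names are our own.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, and all values must be positive")
    return math.prod(values) ** (1 / len(values))


def mean_growth_rate_percent(growth_factors):
    """Average growth rate per period, in percent: (geometric mean - 1) x 100."""
    return (geometric_mean(growth_factors) - 1) * 100


# Hypothetical two-year example: returns of +21% and 0%.
factors = [1.21, 1.00]
print(round(geometric_mean(factors), 6))            # 1.1
print(round(mean_growth_rate_percent(factors), 6))  # 10.0
```

Here the geometric mean is √1.21 = 1.1, an average growth rate of 10% per year, whereas the arithmetic mean of the two returns would be 10.5%. The same conversion applied to a geometric mean of 1.029 gives the 2.9% rate stated above. For long series, `statistics.geometric_mean` (Python 3.8+) computes the result through logarithms, which avoids overflow or underflow of the raw product.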
Figure 2.16 - Calculating the Geometric Mean for the Mutual Fund Data Using Excel
 In Figure 2.16, the value for the geometric mean in cell C13 is found using the formula =GEOMEAN(C2:C11).
Measures of Variability
Range
 Found by subtracting the smallest value from the largest value in a data set.
Illustration
Consider the data on home sales in Cincinnati, Ohio, Suburb:
Largest home sales price = $456,250
Smallest home sales price = $108,000
Range = Largest value − Smallest value = $456,250 − $108,000 = $348,250
Drawback
 Range is based on only two of the observations and thus is highly influenced by extreme values.
Variance
 Measure of variability that utilizes all the data
 It is based on the deviation about the mean, which is the difference between the value of each observation ($x_i$) and the mean.
 The deviations about the mean are squared while computing the variance.
Sample variance:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Population variance:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Table 2.12 - Computation of Deviations and Squared Deviations about the Mean for the Class Size Data
Computation of sample variance for the class size data ($\bar{x} = 44$):
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{5 - 1} = 64$
Standard deviation
 Positive square root of the variance
 Measured in the same units as the original data
 For a sample, $s = \sqrt{s^2}$
 For a population, $\sigma = \sqrt{\sigma^2}$
Coefficient of variation
 Measures the standard deviation relative to the mean
 Expressed as a percentage: $\left(\frac{\text{Standard deviation}}{\text{Mean}} \times 100\right)\%$
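The measures of variability above can be checked on the five class sizes with a short Python sketch that rebuilds the deviation and squared-deviation columns of Table 2.12 and then derives the sample variance, standard deviation, and coefficient of variation. The function `variability_summary` is a hypothetical helper, and the population variance entry assumes, purely for illustration, that the five classes were the whole population.

```python
import math


def variability_summary(values):
    """Range, deviations, variance, standard deviation, and CV for a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations for the sample variance")
    mean = sum(values) / n
    deviations = [x - mean for x in values]       # x_i - x-bar
    squared = [d * d for d in deviations]         # (x_i - x-bar)^2
    sample_variance = sum(squared) / (n - 1)      # s^2 divides by n - 1
    sample_sd = math.sqrt(sample_variance)        # s = sqrt(s^2)
    return {
        "range": max(values) - min(values),
        "mean": mean,
        "deviations": deviations,
        "squared_deviations": squared,
        "sample_variance": sample_variance,
        "population_variance": sum(squared) / n,  # sigma^2 divides by N
        "sample_sd": sample_sd,
        "cv_percent": sample_sd / mean * 100,     # (s / x-bar) x 100
    }


summary = variability_summary([46, 54, 42, 46, 32])
for key, value in summary.items():
    print(key, value)
```

With a mean of 44, the squared deviations sum to 256, so s² = 256/4 = 64, s = 8, the range is 54 − 32 = 22, and the coefficient of variation is 8/44 × 100 ≈ 18.2%.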
