Topic 3 Overview of Using Data


Vincent Hoang (2022), Lecture 3


Camm et al. (2016), Chapter 2
Review

BA capability
Source: Modified from Gartner
Two main goals

DATA TYPES, SOURCES, INTRODUCTION TO MODELS


PRIVACY
Data for Business Analytics
• Data: numbers or textual data that are collected through some type
of measurement process
• Information: result of analysing data; that is, extracting meaning from
data to support evaluation and decision making
Metrics and Data Classification
• Metric - a unit of measurement that provides a way to objectively
quantify performance.
• Measurement - the act of obtaining data associated with a metric.
• Measures - numerical values associated with a metric.
Types of Metrics
• Discrete metric - one that is derived from counting something.
◦ A delivery is either on time or not
◦ An order is complete or incomplete
◦ An invoice can have one, two, three, or any number of errors
◦ The number of incomplete orders each day (or the number of errors per invoice)

• Continuous metrics are based on a continuous scale of measurement.
◦ Any metrics involving dollars, length, time, volume, or weight, for example, are continuous.
Properties and Scales of measurement
• Scales of measurement describe how variables are defined and categorised.
◦ Four common scales of measurement: nominal, ordinal, interval and ratio.

• Each scale of measurement has properties that determine how to properly analyse the data.
◦ Four properties are of interest: identity, magnitude, equal intervals and a minimum value of zero.
Four properties of data
• Identity: each value has a unique meaning.
• Magnitude: the values have an ordered relationship to one another, so there is a specific order to the values.
• Equal intervals: the distance between adjacent points on the scale is the same everywhere, so the difference between data points one and two equals the difference between data points five and six.
• A minimum value of zero: the scale has a true zero point, below which no values exist. Degrees of temperature can fall below zero and still have meaning, so temperature has no true zero. Weight cannot fall below zero, so it does.
Source: https://studyonline.unsw.edu.au/sites/default/files/UNSW2.png
Nominal scale of measurement
• The nominal scale of measurement defines the identity property of data. The
data can be placed into categories. Examples: eye colour and country of birth.
• This scale doesn’t have any form of numerical meaning. The data can’t be
multiplied, divided, added or subtracted from one another. It’s not possible to
measure the difference between data points.
• Nominal data can be broken down again into three categories:
◦ Nominal with order: data can be sub-categorised in order, e.g. “cold, warm, hot and very hot”.
◦ Nominal without order: data whose categories have no natural order, such as male and female.
◦ Dichotomous: having only two categories or levels, such as “yes” and “no”.
Ordinal scale of measurement
• The ordinal scale defines data that is placed in a specific order.
• While each value is ranked, there’s no information that specifies what
differentiates the categories from each other.
• These values can’t be added to or subtracted from.
• Examples:
◦ Satisfaction ratings in a survey, where 1 = happy, 2 = neutral and 3 = unhappy.
◦ Where someone finished in a race: first place, second place or third place. The data show the order in which the runners finished, but not how far the first-place finisher was in front of the second-place finisher.
Interval scale of measurement
• The interval scale has the properties of nominal and ordinal data, and in addition the difference between data points can be quantified.
• This type of data shows both the order of the values and the exact differences between them.
• Values can be added to or subtracted from each other, but not multiplied or divided. For example, 40 degrees is not twice as hot as 20 degrees.
• Zero is an arbitrary point on the scale, not the absence of the quantity. For example, if you measure degrees, zero is still a temperature.
• Data points on the interval scale have the same difference between them. The difference on the scale
between 10 and 20 degrees is the same between 20 and 30 degrees.
• This scale is used to quantify the difference between variables, whereas the other two scales are used to
describe qualitative values only.
Ratio scale of measurement
• The ratio scale has all four properties of measurement: identity, magnitude, equal intervals and a minimum value of zero.
• The data can be categorised, placed in order, measured in equal intervals and expressed as exact values. Examples: weight, height and distance.
• Data in the ratio scale can be added, subtracted, divided and multiplied.
• Ratio scales also differ from interval scales in that the scale has a ‘true zero’.
◦ Zero means the complete absence of the quantity, so values cannot be negative. Examples are height and weight: no one can be negative centimetres tall or weigh negative kilos.
◦ Examples of the use of this scale are calculating shares or sales.

• Of all types of data on the scales of measurement, data scientists can do the most with
ratio data points.
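The way the four scales build on one another can be sketched as a small lookup. This is only an illustration of the slides above: the names PROPERTIES, PROPERTY_COUNT and OPERATIONS are hypothetical, and the "count categories" and "rank" labels are shorthand added here for what nominal and ordinal data allow.

```python
# Properties in the order in which the scales accumulate them.
PROPERTIES = ("identity", "magnitude", "equal intervals", "true zero")

# Each scale has the first n properties of the tuple above.
PROPERTY_COUNT = {"nominal": 1, "ordinal": 2, "interval": 3, "ratio": 4}

# Operations that are meaningful on each scale (illustrative labels).
OPERATIONS = {
    "nominal": ("count categories",),
    "ordinal": ("count categories", "rank"),
    "interval": ("count categories", "rank", "add", "subtract"),
    "ratio": ("count categories", "rank", "add", "subtract",
              "multiply", "divide"),
}


def properties_of(scale):
    """Return the measurement properties that a scale possesses."""
    return PROPERTIES[:PROPERTY_COUNT[scale]]


def is_meaningful(scale, operation):
    """Return True if the operation makes sense for data on this scale."""
    return operation in OPERATIONS[scale]


for scale in PROPERTY_COUNT:
    print(f"{scale:8s} {', '.join(properties_of(scale))}")
```

For example, is_meaningful("interval", "multiply") is False, which matches the point that 40 degrees is not twice 20 degrees.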
Populations and Samples

• Population - all items of interest for a particular decision or investigation
◦ all married drivers over 25 years old
◦ all subscribers to Netflix
• Sample - a subset of the population
◦ a list of individuals who rented a comedy from Netflix in the past year
• The purpose of sampling is to obtain sufficient information to draw a valid inference about a population.
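A minimal sketch of drawing a sample to make an inference about a population is given below. The population is simulated and entirely hypothetical (monthly viewing hours for 10,000 subscribers), and the sample size of 500 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: hours watched last month by 10,000 subscribers.
population = [max(0.0, random.gauss(20, 5)) for _ in range(10_000)]

# Simple random sample without replacement.
sample = random.sample(population, k=500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean is used as an estimate of the population mean. In practice the population mean is unknown, which is why the sample is needed.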
Data collection: primary data
• Data collection for research and analytics can broadly be divided into two
major types: primary data and secondary data.
• Primary data is collected “at source” and specifically for the research at
hand.
◦ The data source could be individuals, groups, organizations, etc.
◦ Data would be actively elicited or passively observed and collected.
◦ Surveys, interviews, and focus groups all fall under the ambit of primary data.
◦ The main advantage of primary data is that it is tailored specifically to the questions
posed by the research project.
◦ The disadvantages are cost and time.
Data collection: secondary data
• Secondary data is that which has been previously collected for a
purpose that is not specific to the research at hand.
• Examples: sales records, industry reports, and interview transcripts
from past research are data that would continue to exist whether or
not the project at hand had come to fruition.
Examples of Data Sources
• Annual reports
• Accounting audits
• Financial profitability analysis
• Economic trends
• Marketing research
• Operations management performance
• Human resource measurements
• Web behaviours: page views, visitor’s country, time of view, length of time, origin and destination
paths, products they searched for and viewed, products purchased, what reviews they read, and
many others
Source: https://www.questionpro.com/blog/data-collection-methods/
Big Data
• Big data refers to massive amounts of business data (volume) from
a wide variety of sources (variety), much of which is available in real
time (velocity), and much of which is uncertain or unpredictable
(veracity).
• The effective use of big data has the potential to transform societies,
economies, industries, and organisations.
• For businesses, using big data will become a key basis of
competition for existing companies.
The Seven V's characteristics of Big Data Analytics
Source: Saggi, M.K., Jain, S., 2018. A Survey Towards an Integration of Big Data Analytics to Big Insights for Value-Creation. Information
Processing & Management 54, 758-790
https://images.app.goo.gl/3LVgJtvisCssSVHf8
A case study
• You would like to know what determines the happiness of students
studying in Vietnamese universities.
• Discuss the following questions
◦ What data do you need?
◦ List some variables and, for each variable, identify its type (quantitative or qualitative) and scale of measurement.
◦ If you have many variables, how would you decide which variables are better than others?
◦ What are possible data sources and how do you collect data?
Source: https://studyonline.unsw.edu.au/sites/default/files/UNSW2.png
Reliability and Validity
• These two concepts apply both to data and to research design and testing. With respect to data (in the field of statistics):
◦ Reliability - data are consistent and accurate (or accurately collected/measured).
◦ Validity - data correctly measure what they are supposed to measure.
Class discussion
• A survey question that asks a customer to rate the quality of the food
in a restaurant for the purpose of measuring customer satisfaction.
Are the data collected reliable and valid?
• If not, suggest better types of data to measure customer satisfaction.
Data Privacy: how can or how should?
• Legal Standards: established by law, order, or rule to compel treatment of certain classes of data

• Ethical Standards: established by industry bodies or professional organizations which seek to establish non-legally binding treatment of information
◦ Most academic/ Science/ Medical/ Legal fields have broad ethical standards-making bodies, some of which
address use of data
◦ Marketing/Advertising associations and alliances or initiatives that also provide some broad guidelines

• Policy Standards: established by a company or agency’s own published Data Privacy policy
◦ Companies should have formal privacy policies that are actively disclosed to consumers. Generally, these
policies outline what is captured & shared, and outline opt-out or opt-in procedures

• Good Judgment Standards: one should always stop to ask “Is this a good idea?” and “What
might be the consequences?”
Personally Identifiable Information (PII)
• Definition: Any information about an individual maintained by an agency, including
◦ any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and
◦ any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information

• Examples?
PII-Related Regulations

https://www.strac.io/blog/pii-laws-regulations-worldwide#how-is-pii-regulated
Other sources of consumer information
• Consumer financial information: any information that
◦ a consumer provides to a financial institution to obtain a financial product or service from the institution
◦ results from a transaction between the consumer and the institution involving a financial product or service
◦ a financial institution otherwise obtains about a consumer in connection with providing a financial product or service

• Data collected by telecommunications companies about a consumer's telephone calls
◦ It includes the time, date, duration and destination number of each call, the type of network a consumer subscribes to, and any other information that appears on the consumer's telephone bill.

• Protected health information


Two main goals

DATA TYPES, SOURCES, INTRODUCTION TO MODELS


PRIVACY
Models in Business Analytics
• A model is an abstraction or representation of a real system, idea, or object.
◦ Captures the most important features
◦ Three forms of a model:
◦ a written or verbal description,
◦ a visual representation,
◦ a mathematical formula, or a spreadsheet
Examples (1)
• The sales of a new product, such as a first-generation iPad or 3D
television, often follow a common pattern.
◦ Verbal description: The rate of sales starts small as early adopters begin to
evaluate a new product and then begins to grow at an increasing rate over time
as positive customer feedback spreads. Eventually, the market begins to
become saturated, and the rate of sales begins to decrease.
Examples (2)
• The sales of a new product, such as a first-generation iPad or 3D television, often follow a common pattern.
◦ Visual model: A sketch of sales as an S-shaped curve over time
Examples (3)
• The sales of a new product, such as a first-generation iPad or 3D
television, often follow a common pattern.
◦ Mathematical model: S = a × e^(b × e^(c × t)),
where S is sales, t is time, e is the base of natural logarithms, and a, b and c are constants.
◦ Often we use data to estimate this equation, i.e. to estimate the values of a, b and c.
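As an illustration, the sketch below evaluates one S-shaped curve that uses exactly the symbols S, t, e, a, b and c: the Gompertz form S = a × e^(b × e^(c × t)). The functional form is an assumption here, and the constants are hypothetical values chosen only to produce an S-shape. In practice a, b and c would be estimated from sales data.

```python
import math


def gompertz_sales(t, a, b, c):
    """Sales at time t under the assumed form S = a * e^(b * e^(c * t))."""
    return a * math.exp(b * math.exp(c * t))


# Hypothetical constants: a is the saturation level of sales;
# b < 0 and c < 0 give slow early growth, then acceleration, then saturation.
A, B, C = 10_000.0, -5.0, -0.5

sales_by_period = [gompertz_sales(t, A, B, C) for t in range(21)]

for t in (0, 5, 10, 20):
    print(f"t={t:2d}  S={sales_by_period[t]:,.0f}")
```

With these values sales start near zero, grow fastest in the middle periods and level off just below a, matching the verbal description of early adopters, word-of-mouth growth and market saturation.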
Real World vs Model World
• A model is an abstraction, or simplification, of the real world.
• A model is a laboratory (an artificial environment) in which we can experiment and test ideas without the costs and risks of experimenting with real systems and organizations.
• Source: Powell & Baker 2016. Business Analytics: The Art of Modelling with Spreadsheets. Wiley, New York.
Formulation
• We abstract the essential features of the
real world, leaving behind all the
nonessential detail and complexity.
• We then construct our laboratory by
combining our abstractions with specific
assumptions and building a model of the
essential aspects of the real world.
• This is the process of model
formulation.
Decision models
• Inputs: data on
◦ Uncontrollable inputs - quantities that can change but cannot be controlled
◦ Decision variables (options) - controllable and selected at the discretion of the decision maker
Features of a model
• Data for all variables used in the model:
◦ Uncontrollable inputs - quantities that can change but cannot be controlled
◦ Decision options - decision variables, which refer to the possible choices, or courses of action, that we might take.
◦ Outcomes - the consequences of the decisions, i.e. the performance measures we use to evaluate the results of taking action. Examples include revenue, cost, profit, efficiency and market share.

• Structure.
Model structure
• Structure refers to the logic and the mathematics that link the
elements of our model together.
• A simple example might be the equation P = R - C, in which profit is
calculated as the difference between revenue and cost.
• Another example might be the relationship F = I + P - S, in which
final inventory is calculated from initial inventory, production, and
shipments.
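The two relationships above can be written as small functions. This is a minimal sketch; the function names and the input numbers are hypothetical.

```python
def profit(revenue, cost):
    """P = R - C: profit is revenue minus cost."""
    return revenue - cost


def final_inventory(initial_inventory, production, shipments):
    """F = I + P - S: final inventory after production and shipments."""
    return initial_inventory + production - shipments


# Hypothetical inputs for illustration only.
print(profit(revenue=120_000, cost=95_000))
print(final_inventory(initial_inventory=200, production=500, shipments=450))
```

Here the inputs are the data, and the function bodies are the structure that links them to the outcomes.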
Analysis
• Once built, we can use the model to test ideas and evaluate solutions.

• This process applies logic to take us from our assumptions and abstractions to a set of derived conclusions. It also relies on mathematics and reason to explore the implications of our assumptions. This exploration process leads, hopefully, to insights about the problem confronting us.

• Sometimes, these insights involve an understanding of why one solution is beneficial and another is not; at other times, the insights involve understanding the sources of risk in a particular solution.

• In other situations, the insights involve identifying the decisions that are most critical to a good result, or identifying the inputs that have the strongest influence on a particular outcome.
Interpretation
• To make the model insights useful, we must first translate them into the terms of the real world and then communicate them to the actual decision makers involved.

• Only then do model insights turn into useful managerial insights. And only then can we begin the process of evaluating solutions in terms of their impact on the real world.
Descriptive models
• Descriptive models explain behaviour and allow users to evaluate potential decisions by asking “what-if?” questions.
• Example: An outsourcing decision model
◦ Q = production volume
◦ Production cost: $125 × Q plus fixed cost of $50,000
◦ Outsourcing cost: $175 × Q
• Breakeven Point:
$50,000 + $125 × Q = $175 × Q
$50,000 = 50 × Q
Q = 1,000
• If Q < 1,000, outsourcing is cheaper.
An outsourcing decision model
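The breakeven calculation ($50,000 + $125 × Q = $175 × Q) can be checked with a short what-if sketch. The cost figures come from the breakeven equation above. The function names and the test volumes are hypothetical.

```python
FIXED_COST = 50_000          # fixed cost of in-house production ($)
UNIT_PRODUCTION_COST = 125   # in-house cost per unit ($)
UNIT_OUTSOURCING_COST = 175  # outsourcing cost per unit ($)


def production_cost(q):
    """Total cost of producing q units in-house."""
    return FIXED_COST + UNIT_PRODUCTION_COST * q


def outsourcing_cost(q):
    """Total cost of buying q units from the supplier."""
    return UNIT_OUTSOURCING_COST * q


def cheaper_option(q):
    """Answer the what-if question for a given production volume q."""
    if production_cost(q) < outsourcing_cost(q):
        return "produce"
    if production_cost(q) > outsourcing_cost(q):
        return "outsource"
    return "indifferent"


# Volume at which the two costs are equal.
breakeven_q = FIXED_COST / (UNIT_OUTSOURCING_COST - UNIT_PRODUCTION_COST)
print(breakeven_q)

for q in (500, 1_000, 1_500):
    print(q, production_cost(q), outsourcing_cost(q), cheaper_option(q))
```

Changing q and re-running is the "what-if?" analysis that a descriptive model supports.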
Predictive Models
• Predictive models focus on what will happen in the future. Many
predictive models are developed by analysing historical data and
assuming that the past is representative of the future.
• A sales-promotion decision model in the grocery industry: managers typically need to
know how best to use pricing, coupons, and advertising strategies to influence sales.
• Grocers often study the relationship of sales volume to these strategies by conducting
controlled experiments to identify the relationship between them and sales volumes.
That is, they implement different combinations of pricing, coupons, and advertising,
observe the sales that result, and use analytics to develop a predictive model of sales
as a function of these decision strategies.
A Sales-Promotion Decision Model
Model:

Total Sales = 1105.55 + 56.18 × Price + 123.88 × Coupon + 5.25 × Advertising

If the price is $6.99, no coupons are offered, and no advertising is done (the experiment corresponding to week 1), the model estimates sales as:

Total Sales = 1105.55 + 56.18 × 6.99 + 123.88 × 0 + 5.25 × 0 = 1,498.25 units
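The same calculation as a small function, so that other combinations of price, coupon and advertising can be tried. The coefficients are those of the model above. The function and argument names are hypothetical.

```python
def predict_total_sales(price, coupon, advertising):
    """Predicted unit sales from the fitted sales-promotion model."""
    return 1105.55 + 56.18 * price + 123.88 * coupon + 5.25 * advertising


# Week 1 of the experiment: price $6.99, no coupon, no advertising.
week1_sales = predict_total_sales(price=6.99, coupon=0, advertising=0)
print(round(week1_sales, 2))
```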


Prescriptive models
• Prescriptive models help decision makers identify the best solution to a
decision problem. “Best” here refers to objectives in the optimisation problems at
hand.
• Optimization - finding values of decision variables that minimize (or maximize)
something such as cost (or profit)
– Objective function - the equation that expresses the quantity to be minimized (or maximized)
– Optimal solution - values of the decision variables at the minimum (or
maximum) point
A Prescriptive Pricing Model
• A firm wishes to determine the best pricing for one of its products in order
to maximize revenue.
• Analysts determined the following model:
Sales = −2.9485 × Price + 3,240.9
Total Revenue = Price × Sales

• Identify the price that maximizes total revenue.
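One way to find the revenue-maximizing price is sketched below. Substituting the sales equation into Total Revenue gives a quadratic in Price, whose maximum lies at Price = 3,240.9 / (2 × 2.9485). The grid search over whole-dollar prices is an added cross-check, and its price range is an assumption.

```python
SLOPE = -2.9485     # change in unit sales per $1 increase in price
INTERCEPT = 3240.9  # predicted unit sales at a price of zero


def sales(price):
    """Predicted unit sales at a given price."""
    return SLOPE * price + INTERCEPT


def total_revenue(price):
    """Objective function: revenue = price * sales."""
    return price * sales(price)


# Analytical optimum: the vertex of the quadratic SLOPE*p^2 + INTERCEPT*p.
best_price = -INTERCEPT / (2 * SLOPE)

# Cross-check: best whole-dollar price on an assumed range of $0 to $1,099.
best_whole_dollar_price = max(range(0, 1100), key=total_revenue)

print(f"optimal price: {best_price:.2f}")
print(f"best whole-dollar price: {best_whole_dollar_price}")
print(f"revenue at optimum: {total_revenue(best_price):,.0f}")
```

Price is the decision variable, total_revenue is the objective function, and best_price is the optimal solution.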


Model assumptions
• Assumptions are made to
◦ simplify a model and make it more tractable (i.e. able to be analysed or
solved).
◦ better characterize historical data or past observations.

• The task of the modeler is to select or build an appropriate model that best represents the behaviour of the real situation.
Model Assumptions - Example
• Economic theory tells us that demand for a product is negatively related to
its price. Thus, as prices increase, demand falls, and vice versa (modelled
by price elasticity - the ratio of the percentage change in demand to the
percentage change in price).
• A key assumption in developing a model is the type of relationship
between demand and price.
• CRUCIAL to check whether assumptions are reasonable and hold in the real world. For example, is it true for every good or service that demand goes down when price goes up?
A Linear Demand Prediction Model
• As price increases, demand falls. A simple model is: D = a - bP
• where D is the demand, P is the unit price, a is a constant that estimates the demand when the price is zero, and b is the slope of the demand function.
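A minimal sketch of a linear demand model of the form D = a - bP follows. The values of a and b are hypothetical. In practice they would be estimated from data.

```python
def linear_demand(price, a, b):
    """Demand under the linear model D = a - b * P."""
    return a - b * price


# Hypothetical constants: demand of 20,000 units at a price of zero,
# falling by 10 units for every $1 increase in price.
A, B = 20_000, 10

for price in (0, 500, 1_000):
    print(price, linear_demand(price, A, B))
```

Under this assumption each $1 price increase removes the same number of units of demand, whatever the starting price.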
A Non-Linear Demand Prediction Model
• Assumes price elasticity is constant (constant ratio of % change in demand to % change in price): D = c × P^(-d)
• where c is the demand when the price is 1 and d > 0 is the price elasticity.
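The constant-elasticity assumption can be checked numerically. The sketch assumes the standard constant-elasticity form D = c × P^(-d), with hypothetical values for c and d, and measures elasticity as the ratio of the change in log demand to the change in log price between two prices.

```python
import math


def constant_elasticity_demand(price, c, d):
    """Demand under the assumed form D = c * P^(-d); price must be positive."""
    return c * price ** (-d)


def elasticity_between(p0, p1, c, d):
    """Ratio of the change in log demand to the change in log price."""
    d0 = constant_elasticity_demand(p0, c, d)
    d1 = constant_elasticity_demand(p1, c, d)
    return (math.log(d1) - math.log(d0)) / (math.log(p1) - math.log(p0))


# Hypothetical constants for illustration only.
C, D = 20_000.0, 1.5

for p0, p1 in ((1, 2), (10, 11), (100, 150)):
    print(p0, p1, round(elasticity_between(p0, p1, C, D), 6))
```

The measured elasticity equals -d for every pair of prices. The negative sign reflects that demand falls as price rises.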
