


Week 8: Measurement and Data (Science)

Dr. Kenan Kalaycı

CRICOS code 00025B 1

Lesson objectives
At the end of this lesson, you will be able to:

1. Identify the challenges to estimation

2. Choose between different estimation methods


Profit maximization



In God we trust;
all others must bring data.


“It would be nice if all of the data which sociologists require could be enumerated
because then we could run them through IBM machines and draw charts as the
economists do. However,

not everything that can be counted counts,

and not everything that counts can be counted.”

Cameron, W.B., 1963. Informal sociology: A casual introduction to sociological thinking (Vol. 21). Random House.

Risks with glorifying Data
Management maxim: “You can’t manage what you don’t measure”.
→ “What gets measured gets managed”.
Data should be the foundation for the decision-making process, not a substitute for good judgement.

Risks with glorifying Data

• Many important things cannot be quantified.

• Data can be manipulated (juking the stats).
• Data generating process can be biased.
• The past is not always a reliable indicator of the future.

Data science
Data science is an interdisciplinary field that combines tools from
statistics, computer science, machine learning and various social
sciences to extract insights from data.

Big Data: Digital technologies led to the creation and accumulation
of massive amounts of unstructured data.

Data became cheap. Its complement, data scientist, became

What makes data big?
• Currently, every person generates about 100+ GB of data each day.
• Data is collected in real-time.
• Multiple data sources: messages, tweets, images and video posted
to social networks; readings from sensors; GPS signals from cell
phones, security camera recordings, and more.
• High dimensionality.

Data Science Manager
A data science manager is usually responsible for:
• recruiting data engineers and data scientists;
• identifying problems that need to be solved;
• putting the right people on the right problem;
• setting goals and priorities;
• managing the data science process.
Ideally, data science managers should be generalists who have
knowledge of the software and hardware being used, have good
communication skills and domain knowledge.


Ceteris paribus


Challenges with estimation
Ceteris paribus: other things being equal or held constant
• Everything else is usually not constant.
- Selection bias
- Unobserved variables (e.g. willingness to pay of
consumers who did not buy the good)
- Measurement error
• These are problems with internal validity => biased estimates

The Furious Five

• Randomised trials/experiments
• Regression
• Instrumental variables
• Difference-in-Differences
• Regression Discontinuity Design (RDD)


Market experiments
• Becoming increasingly common using online platforms
• Any business can run an advertisement experiment on
• Usually limited to individual companies
- Findings are rarely in public domain
• Ethical concerns
- People don’t like being unfavourably discriminated against


Advantages of experiments
1. The experimenter controls the values of key variables
a. Keep them constant or vary them across treatments
2. Randomized treatments
a. Enables the experimenter to create a counter-factual
3. Replicability
• Happenstance field data is hard to replicate
- Many variables change over time and place and often are not
- Data collection procedures are often not transparent
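The value of randomised treatments can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the customers, the baseline outcome distribution, the true effect of 5 and the opt-in rule are all hypothetical and chosen only to contrast random assignment with self-selection.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # assumed treatment effect, for illustration only


def simulate(assign, n=10_000, seed=0):
    """Return the treated-minus-control difference in mean outcomes.

    assign(baseline, rng) decides whether a unit receives the treatment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(50, 10)  # outcome the unit would have without treatment
        if assign(baseline, rng):
            treated.append(baseline + TRUE_EFFECT)
        else:
            control.append(baseline)
    return mean(treated) - mean(control)


# Randomised treatment: a coin flip, unrelated to the baseline outcome.
randomised = simulate(lambda baseline, rng: rng.random() < 0.5)

# Self-selection: units with a high baseline opt in to the treatment.
self_selected = simulate(lambda baseline, rng: baseline > 50)

print(f"randomised estimate:    {randomised:.2f}")
print(f"self-selected estimate: {self_selected:.2f}")
```

Under random assignment the control group is a valid counterfactual, so the difference in means lands close to the assumed effect. Under self-selection the same comparison also picks up the pre-existing difference between the groups, which is the selection bias listed among the challenges with estimation.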
Experimental design
Avoiding confounds
• Between the control and the treatment there should be only one
variable that is different. This way you can attribute a difference in
results to this particular variable.
• If you change two variables at once, you don’t know whether the
effect was caused by one variable or the other. Also, if you don’t
find an effect, it might be because the effects of the two
variables cancel each other out.


Linear Regression

• Of limited use in demand estimation unless we have
experimental data on all agents’ Willingness to Pay and
Willingness to Sell.


Instrumental Variables (IV)
• Used in many economic applications when correlation between
the explanatory variables and the error term is suspected, for
example due to omitted variables, measurement error, or
simultaneity.


Instrumental Variables (IV)
We replace the actual values of the explanatory variable with
predicted values that are related to the actual explanatory
variable but uncorrelated with the error term.
e.g. To identify the demand function, we look for determinants of the
supply of fish that do not affect the demand for fish and, similarly,
to identify the supply function we look for determinants of the
demand for fish that do not affect the supply.
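The fish market logic can be sketched with simulated data. Everything here is an assumption made for illustration: the linear demand and supply equations, their coefficients, and a supply shifter standing in for something like weather at sea. The two stages follow the description above: predict price from the instrument, then regress quantity on the predicted price.

```python
import random
from statistics import mean


def cov(xs, ys):
    """Sample covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(1)
DEMAND_SLOPE = -2.0  # assumed true effect of price on quantity demanded

prices, quantities, weather = [], [], []
for _ in range(20_000):
    z = rng.gauss(0, 1)  # supply shifter (the instrument), unrelated to demand
    u = rng.gauss(0, 2)  # unobserved demand shock
    v = rng.gauss(0, 1)  # unobserved supply shock
    # Demand: q = 20 - 2p + u.  Supply: q = 2 + p - 3z + v.
    # Setting demand equal to supply gives the market-clearing price.
    p = (18 + 3 * z + u - v) / 3
    q = 20 + DEMAND_SLOPE * p + u
    prices.append(p)
    quantities.append(q)
    weather.append(z)

# Naive regression of quantity on price: biased, because price moves with u.
ols_slope = cov(prices, quantities) / cov(prices, prices)

# Stage 1: predict price from the instrument.
first_stage = cov(weather, prices) / cov(weather, weather)
p_mean, z_mean = mean(prices), mean(weather)
predicted_prices = [p_mean + first_stage * (z - z_mean) for z in weather]

# Stage 2: regress quantity on the predicted price.
iv_slope = cov(predicted_prices, quantities) / cov(predicted_prices, predicted_prices)

print(f"OLS slope: {ols_slope:.2f}, IV slope: {iv_slope:.2f}, true slope: {DEMAND_SLOPE}")
```

Because the predicted price varies only with the supply shifter, it is uncorrelated with the demand shock, and the second-stage slope should land near the assumed demand slope while the naive slope does not.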


Instrumental Variables (IV)
Common source of good IVs:

• Natural experiments (e.g. sudden policy change, technology

- Needs to be unanticipated
• Experiments of nature
- Rainfall, floods, earthquakes, droughts,


Difference in Differences (DID)
• Outcomes are observed for two groups for two time periods.
• One of the groups is exposed to a treatment in the second period but not
in the first period.
• The other group is not exposed to the treatment at all.
• The average gain in the second (control) group is subtracted from the
average gain in the first (treatment) group. This removes biases in
second-period comparisons between the treatment and control groups that
could result from permanent differences between those groups, as well as
biases from comparisons over time in the treatment group that could
result from trends.
DID relies on the parallel trends assumption: without the treatment, both groups
would have followed the same trend. Failure of this assumption is a common problem
and biases the estimator.
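The subtraction described above is simple arithmetic. The sketch below uses made-up outcome values for two hypothetical groups (for example, weekly sales per store); none of the numbers come from the lecture.

```python
def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences: treatment-group gain minus control-group gain."""
    def mean(values):
        return sum(values) / len(values)

    treatment_gain = mean(treat_post) - mean(treat_pre)
    control_gain = mean(control_post) - mean(control_pre)
    return treatment_gain - control_gain


# Hypothetical outcomes in each period.
treat_pre = [10.0, 12.0, 11.0]     # treatment group, period 1 (mean 11)
treat_post = [15.0, 17.0, 16.0]    # treatment group, period 2 (mean 16)
control_pre = [8.0, 9.0, 10.0]     # control group, period 1 (mean 9)
control_post = [10.0, 11.0, 12.0]  # control group, period 2 (mean 11)

effect = did_estimate(treat_pre, treat_post, control_pre, control_post)
print(effect)  # (16 - 11) - (11 - 9) = 3.0
```

The control group's gain of 2 stands in for the trend the treatment group would have followed anyway, which is only valid if the parallel trends assumption holds.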
Regression Discontinuity Design (RDD)
• Measures the impact of a treatment, by applying a treatment
assignment mechanism based on a continuous eligibility index.
• If treatment is assigned to those either above or below a certain
“cut-off” point, RDD can be used to measure the difference in
outcomes of individuals clustered around the defined cut-off point.
• Need to determine a ‘bandwidth’ around the cut-off point within
which individual units are shown to be statistically comparable. If
so, the difference in outcomes between those above and below
the cut-off can be attributed to the treatment.
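A minimal simulated sketch of the idea, assuming a hypothetical eligibility score with a cut-off of 50, a bandwidth of 10 and a true jump of 4 at the cut-off. Fitting a separate line on each side of the cut-off is one common way to compare units near the threshold; it is used here as an illustrative choice, not as the only way to run an RDD.

```python
import random
from statistics import mean

CUTOFF = 50.0
BANDWIDTH = 10.0
TRUE_EFFECT = 4.0  # assumed jump in the outcome at the cut-off


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


rng = random.Random(2)
below, above = [], []
for _ in range(20_000):
    score = rng.uniform(0, 100)  # continuous eligibility index
    treated = score >= CUTOFF    # treatment assigned at or above the cut-off
    outcome = 10 + 0.2 * score + TRUE_EFFECT * treated + rng.gauss(0, 1)
    if abs(score - CUTOFF) <= BANDWIDTH:  # keep only units near the cut-off
        (above if treated else below).append((score - CUTOFF, outcome))

# With scores centred at the cut-off, each intercept is the predicted
# outcome exactly at the cut-off, approached from that side.
left_at_cutoff, _ = fit_line(*zip(*below))
right_at_cutoff, _ = fit_line(*zip(*above))
rdd_estimate = right_at_cutoff - left_at_cutoff

print(f"estimated jump at the cut-off: {rdd_estimate:.2f}")
```

A plain difference in means inside the bandwidth would also pick up the underlying slope of the outcome in the score, which is why the sketch compares the two fitted lines at the cut-off itself.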
Regression Discontinuity Design (RDD)
Common assignment variables:
• Test/exam scores, GPA => Common to have a cut-off score for eligibility
• Age, birthdate => Common to have a cut-off age/birthdate for eligibility
• Geographic location => Similar nearby locations might fall under different
administrative units
- Tweed Heads (NSW) & Coolangatta (QLD)
• Employment/unemployment duration

• Treatment assignment at the threshold can be "as good as random" if there is
randomness in the assignment variable and the agents considered
(individuals, firms, etc.) cannot perfectly manipulate their treatment status.


How do you estimate demand for something that doesn’t exist?

Prelaunch Demand Estimation

• Focus groups
• Hypothetical consumer surveys or choice experiments
• Test marketing
- Crowdfunding (e.g. Kickstarter)
- Actual product
- Minimum viable product


Can we generalise results from laboratory
experiments to real markets?


External validity

• Applying the conclusions of an economic study outside the
context of that study
- Environment, culture, market, people, time period
• Requires more studies (replications) that vary the context


Machine Learning (ML)

Machine learning algorithms build a mathematical model based on
training data, to make predictions on new data.

ML is used:
- to predict a future that looks mostly like the past;
- for pattern recognition;
- for decision making.
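As a minimal sketch of "build a model on training data, then predict on new data", the example below fits a straight line by least squares. The training numbers and their interpretation (advertising spend and sales) are hypothetical, and a linear model is just the simplest possible choice of model.

```python
from statistics import mean


def fit(xs, ys):
    """Least squares fit of y = intercept + slope * x on the training data."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


# Hypothetical training data: advertising spend (x) and sales (y).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]

intercept, slope = fit(train_x, train_y)


def predict(x):
    """Apply the fitted model to an input that was not in the training data."""
    return intercept + slope * x


prediction = predict(5.0)
print(f"predicted sales at spend 5.0: {prediction:.2f}")
```

The prediction is only as good as the assumption that the new observation comes from the same process as the training data, which is the sense in which ML predicts a future that looks mostly like the past.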


Correlation != Causation
But sometimes correlation is all you need.

Data scientists are often only interested in
prediction, not causality:
• If you liked the movie Frozen, you might like
the movie Madagascar.

Machine Learning Paradigms
1. Unsupervised learning - uses training data that contains the
inputs but not the outputs to build an algorithm to uncover
patterns in the data.
- E.g. cluster groups with similar purchasing habits for targeting
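The clustering example can be sketched with k-means, one common unsupervised algorithm (chosen here for illustration; the slide does not name a specific method). The customer data and the starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (store visits per month, average basket in dollars).
customers = [(1.0, 10.0), (2.0, 12.0), (1.5, 11.0), (8.0, 60.0), (9.0, 65.0), (8.5, 62.0)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])

print(labels)
print(centroids)
```

No outcome variable is supplied: the algorithm only sees the inputs and groups customers with similar purchasing habits. In practice the features would usually be rescaled first so that the larger-valued one (basket size here) does not dominate the distance.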


Machine Learning
2. Supervised learning – uses training data that contains both the
inputs and the desired outputs to build an algorithm to predict the
output when it is not observed.
- E.g. classification algorithms (spam/not spam)
3. Reinforcement Learning – to learn how to take actions to
maximize cumulative reward.
- Trade-off between exploration and exploitation (of current knowledge)
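The exploration and exploitation trade-off can be illustrated with an epsilon-greedy rule on a three-option problem (for example, three advert variants). This is a sketch under assumptions: the success rates, the 10% exploration share and the number of rounds are all hypothetical, and epsilon-greedy is only one simple reinforcement learning strategy.

```python
import random

rng = random.Random(3)
TRUE_RATES = [0.2, 0.5, 0.8]  # assumed success rate of each option, unknown to the learner
EPSILON = 0.1                 # share of rounds spent exploring

pulls = [0] * len(TRUE_RATES)
values = [0.0] * len(TRUE_RATES)  # running average reward of each option

for _ in range(10_000):
    if rng.random() < EPSILON:
        arm = rng.randrange(len(TRUE_RATES))  # explore: try a random option
    else:
        arm = max(range(len(TRUE_RATES)), key=lambda a: values[a])  # exploit the best so far
    reward = 1.0 if rng.random() < TRUE_RATES[arm] else 0.0
    pulls[arm] += 1
    values[arm] += (reward - values[arm]) / pulls[arm]  # incremental mean update

best = max(range(len(TRUE_RATES)), key=lambda a: values[a])
print("times each option was chosen:", pulls)
print("estimated success rates:", [round(v, 2) for v in values])
```

Without the exploration rounds the learner could lock onto the first option that ever paid off; with too many of them it wastes rounds on options it already knows are worse.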


A data scientist needs to be able to write code. The most popular
programming languages used by data scientists are:
• R
- Rich ecosystem, open-source library of packages
• Python
- Better suited for machine learning at a large scale
- Code is easier to maintain and more robust than R code
Data Extraction Tools:
• SQL: Structured Query Language
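A short sketch of data extraction with SQL, run from Python through the standard-library sqlite3 module. The `sales` table, its columns and its rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE sales (product TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("coffee", 4.0, 10), ("tea", 3.0, 5), ("coffee", 4.5, 8), ("tea", 3.5, 4)],
)

# Total units and revenue per product, highest revenue first.
rows = conn.execute(
    """
    SELECT product, SUM(quantity) AS units, SUM(price * quantity) AS revenue
    FROM sales
    GROUP BY product
    ORDER BY revenue DESC
    """
).fetchall()
conn.close()

for product, units, revenue in rows:
    print(product, units, revenue)
```

The query does the filtering and aggregation inside the database, so only the summary rows are handed to R or Python for analysis.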
Data Visualisation

Data visualisation is the graphical representation of information and data.

• Provides an accessible way to see and understand trends,
outliers, and patterns in data.
• Uses visual elements like charts, graphs, maps, and animations.
• Tools: Tableau, Power BI, QlikView, Chart Studio, FusionCharts,
Highcharts, Datawrapper, Sisense, Chart.js, D3.js, …


Data Visualisation Principles
1. Show the data.
2. Induce the viewer to think about the substance rather than about methodology, graphic
design, the technology of graphic production or something else.
3. Avoid distorting what the data has to say.
4. Present many numbers in a small space.
5. Make large data sets coherent.
6. Encourage the eye to compare different pieces of data.
7. Reveal the data at several levels of detail, from a broad overview to the fine structure.
8. Serve a reasonably clear purpose: description, exploration, tabulation or decoration.
9. Be closely integrated with the statistical and verbal descriptions of a data set.

Tufte, Edward (1983). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press. ISBN 0-9613921-4-2.

Reproducibility of data science reports

Reproducibility: An independent researcher should be able to
replicate the analysis and achieve the same results.

• Enables verifiability of claims.

• Increases robustness of the findings.
• Helps others to build on the reported research.


Reproducibility of data science reports
Best practices:
1. The research report should be accompanied by the original data.
2. The data generating process should be explained in enough detail for
others to replicate it (e.g. the experiment).
3. Data analysis should be fully automated and the code to produce
the results should be made publicly available.
4. The analysis code should be written in a clear and concise way.
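Two small habits that support these practices are sketched below: publishing a fingerprint of the input data, and fixing the random seed of any step that involves random draws. The data values, the seed and the helper names are hypothetical, and the bootstrap interval is just an example of an analysis step that depends on randomness.

```python
import hashlib
import json
import random
from statistics import mean

DATA = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7]  # hypothetical raw observations
SEED = 2024                                  # fixed and reported with the results


def data_fingerprint(data):
    """SHA-256 of the data, so readers can check they analyse the same input."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()


def bootstrap_mean_interval(data, seed, draws=1000):
    """95% bootstrap interval for the mean; the seed makes it repeatable."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(data, k=len(data))) for _ in range(draws))
    return means[int(0.025 * draws)], means[int(0.975 * draws)]


first_run = bootstrap_mean_interval(DATA, SEED)
second_run = bootstrap_mean_interval(DATA, SEED)

print("data fingerprint:", data_fingerprint(DATA))
print("interval:", first_run, "identical on rerun:", first_run == second_run)
```

Because the whole analysis runs from one script with a fixed seed, an independent researcher who has the same data should obtain exactly the same interval.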


Artificial Intelligence
Combining multiple ML algorithms together to solve complex problems.
1. Define the domain structure:
• Break a complex problem into composite tasks that can be solved with ML.
- So far very successful in well-defined structured games (Chess, Go,
- Domain expertise (e.g. economic theory) is valuable in business applications
2. Generate the necessary data
- Conduct experiments, build data management systems.
- Possible to simulate data for ML in games like Go.
3. Build ML algorithms for each task and combine the information to make


Mental health and wellbeing
Do you struggle with mid-semester
Student Services offers a range of mindfulness programs, events
and resources so you can reduce stress, think more positively and
improve your wellbeing.

UQ students can contact the Counselling and Crisis line 24 hours a
day, 7 days a week.

Book an appointment online (students can access 10 free counselling sessions)

or email questions to
Thank you for your attention!
Dr. Kenan Kalaycı | Senior Lecturer
School of Economics
