COMPSCI 5590-f23-DS-rr-lecture1-3


COMP-SCI 5590-0012

Econometrics of data science

LECTURE 1-3: DESCRIPTIVE STATISTICS


Rezwana Rafiq
Adjunct Instructor, University of Missouri-Kansas City
Assistant Project Scientist, University of California Irvine
Recap
▪ Descriptive statistics consists
of methods for organizing,
displaying, and describing data
by using tables, graphs, and
measures.

Median speed before the law = 57.87 mph


Median speed after the law = 59.45 mph

Recap
▪ Some methods and techniques are available to summarize and interpret the data.
• Point estimates: these are the single values of the sample statistics/estimates
that are used to estimate the population parameters.
• Graphical presentations: commonly used graphical representations of data
(e.g., boxplots, histograms, bar charts)

Descriptive Statistics: Contents
▪ Point estimates
▪ Measures of central tendency: arithmetic mean, median, mode
▪ Measures of variability: variance, standard deviation, range
▪ Measures of position: quartiles, interquartile range, percentile
▪ Measures of distribution: skewness, kurtosis
▪ Measures of association: covariance, correlation
▪ Graphical representation

* Properties of estimators

Measures of Association: Covariance
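▪ If we consider two variables X and Y, the covariance between them tells us how they
change together.

▪ Population: $\sigma_{XY} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)$

▪ Sample: $s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

▪ If the sign of the covariance is positive, X and Y tend to increase or decrease together.
▪ If the sign of the covariance is negative, X and Y tend to vary in opposite directions.
▪ Drawback: Since covariance depends on the units of measurement, its magnitude is not
informative.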
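
As a minimal sketch of the two covariance definitions and the units drawback, the following computes both versions on a small made-up data set. The speed and travel-time values and the function names are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
from statistics import fmean

def population_covariance(x, y):
    """Average product of deviations from the means, dividing by N."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def sample_covariance(x, y):
    """Same sum of products, dividing by n - 1 instead of N."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical data: speed (mph) and travel time (minutes) on five trips.
speed_mph = [50.0, 55.0, 60.0, 65.0, 70.0]
time_min = [24.0, 22.0, 20.0, 18.5, 17.0]

cov_pop = population_covariance(speed_mph, time_min)
cov_mph = sample_covariance(speed_mph, time_min)

# Converting speed to km/h rescales the covariance by the same factor,
# so the magnitude changes while the sign stays the same.
MPH_TO_KMH = 1.609344
speed_kmh = [s * MPH_TO_KMH for s in speed_mph]
cov_kmh = sample_covariance(speed_kmh, time_min)

print(cov_pop, cov_mph, cov_kmh)
```

The negative sign says that higher speeds go with shorter travel times in this toy data; the change in magnitude after the unit conversion is why the raw covariance is hard to interpret on its own.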

Measures of Association: Correlation
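▪ Correlation is a standardized measure of the degree of linear
association between two variables.
▪ The correlation lies within the interval [-1, 1].

▪ Population: $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$    Sample: $r_{XY} = \frac{s_{XY}}{s_X s_Y}$
where $\sigma_{XY}$ ($s_{XY}$) is the population (sample) covariance and $\sigma_X$, $\sigma_Y$ ($s_X$, $s_Y$) are the
population (sample) standard deviations of X and Y.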

[Figure panels: travel time vs. speed; accident vs. speed]

Fig. source: Washington, S. et al. (2020) Stat. and Econ. Methods for Transportation Data Analysis
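
To illustrate the standardization, here is a small sketch that computes the sample correlation as the covariance divided by the product of the standard deviations. The data values and the function name are hypothetical, not taken from the lecture or the cited figure.

```python
from math import sqrt
from statistics import fmean

def sample_correlation(x, y):
    """Pearson correlation r = s_xy / (s_x * s_y).

    The 1/(n - 1) factors in s_xy, s_x and s_y cancel, so only the
    sums of squares and cross-products are needed.
    """
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: speed and travel time on five trips.
speed_mph = [50.0, 55.0, 60.0, 65.0, 70.0]
time_min = [24.0, 22.0, 20.0, 18.5, 17.0]

r_mph = sample_correlation(speed_mph, time_min)

# Unlike covariance, correlation does not change when units change.
speed_kmh = [s * 1.609344 for s in speed_mph]
r_kmh = sample_correlation(speed_kmh, time_min)

print(r_mph, r_kmh)
```

Both calls give the same value close to -1, which reflects a strong negative linear association regardless of the units used for speed.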

Correlation: Examples
Positive correlation
▪ Educational qualification vs. income
▪ No. of workers vs. income
▪ Campus size vs. student enrollment
▪ State population vs. federal allocation

Negative correlation
▪ Educational qualification vs. unemployment rate
▪ Interest rate vs. number of people buying houses
▪ Time spent on Facebook vs. test score
▪ Gas price vs. number of trips made by car

Fig. source: Washington, S. et al. (2020) Stat. and Econ. Methods for Transportation Data Analysis

Correlation: Examples
No correlation
▪ Shoe size vs. intelligence
▪ Hair color vs. number of trips made
▪ Height vs. exam score
▪ Amount of time spent watching TV vs. heating bill

Fig. source: Google.com

Measures of Association: Example

Properties of estimators
▪ In practice, we typically do not know the population parameters,
so we collect sample data to estimate them.
▪ An estimator is a rule that combines sample data to give an estimated value of a
population parameter.
▪ Some desirable properties of an estimator:
▪ Unbiasedness: An estimator $\hat{\theta}$ of a population parameter $\theta$ is called unbiased if
its expected value coincides with the true value of the unknown parameter,
i.e., $E[\hat{\theta}] = \theta$.
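
Unbiasedness can be checked by simulation. The sketch below is an illustration under assumed settings (a normal population with variance 4, samples of size 5, and a fixed random seed), none of which come from the lecture. It compares the variance estimator that divides by n with the one that divides by n - 1.

```python
import random

rng = random.Random(0)      # fixed seed so the run is repeatable
TRUE_VARIANCE = 4.0         # assumed population: Normal(mean=0, sd=2)
N = 5                       # sample size
REPLICATIONS = 20000        # number of simulated samples

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(REPLICATIONS):
    sample = [rng.gauss(0.0, 2.0) for _ in range(N)]
    mean = sum(sample) / N
    sum_sq_dev = sum((v - mean) ** 2 for v in sample)
    total_div_n += sum_sq_dev / N
    total_div_n_minus_1 += sum_sq_dev / (N - 1)

# Averages over many samples approximate the expected values.
avg_div_n = total_div_n / REPLICATIONS
avg_div_n_minus_1 = total_div_n_minus_1 / REPLICATIONS

print(TRUE_VARIANCE, avg_div_n, avg_div_n_minus_1)
```

The average of the n - 1 version should land near the true variance, while the average of the n version should fall short by roughly the factor (n - 1)/n. This is the usual reason for the n - 1 divisor in the sample variance.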

Properties of estimators
▪ Efficiency: There is typically more than one unbiased estimator for a given
population parameter. An estimator $\hat{\theta}_1$ is more efficient than another estimator $\hat{\theta}_2$
(both unbiased) if the variance of $\hat{\theta}_1$ is lower than the variance of $\hat{\theta}_2$.

▪ Sufficiency: The more information we have, the closer we get to what we
want to know. An estimator is called sufficient if it contains all the information in
a sample about the parameter it estimates.
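
The efficiency comparison can be sketched with a simulation. For a normal population, the sample mean and the sample median are both unbiased estimators of the population mean, so their variances can be compared directly. The population settings, sample size, and seed below are assumptions made only for this illustration.

```python
import random
from statistics import fmean, median, pvariance

rng = random.Random(1)      # fixed seed so the run is repeatable
MU, SIGMA = 10.0, 2.0       # assumed population: Normal(10, sd=2)
N = 25                      # sample size
REPLICATIONS = 5000         # number of simulated samples

means, medians = [], []
for _ in range(REPLICATIONS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(fmean(sample))
    medians.append(median(sample))

# Spread of each estimator across the repeated samples.
var_of_mean = pvariance(means)
var_of_median = pvariance(medians)

print(fmean(means), fmean(medians))   # both should be close to MU
print(var_of_mean, var_of_median)     # the mean should vary less
```

Under these assumptions the sample mean shows the smaller variance, so it is the more efficient of the two estimators for this population.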

Properties of estimators
▪ Consistency: An estimator is said to be consistent if the probability of its being
close to the true value of the parameter to be estimated increases with
increasing sample size.
▪ This property indicates that $\hat{\theta}$ will not differ from $\theta$ as $n \to \infty$.
▪ This is formally expressed as: $P(|\hat{\theta} - \theta| > c) \to 0$ as $n \to \infty$, for any constant $c > 0$.

Here $\hat{\theta}$ is an estimator, $\theta$ is the population parameter, and $n$ is the sample size.
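
A rough way to see consistency is to estimate, by simulation, the probability that an estimator misses the true value by more than a constant c, and watch it shrink as the sample size grows. The sketch below uses the sample mean as the estimator; the population settings, the value of c, the sample sizes, and the function name are all assumptions for illustration.

```python
import random

rng = random.Random(2)      # fixed seed so the run is repeatable
MU, SIGMA = 0.0, 2.0        # assumed population: Normal(0, sd=2)
C = 0.5                     # tolerance: how far off counts as a miss
REPLICATIONS = 1000         # simulated samples per sample size

def miss_probability(n):
    """Estimate P(|sample mean - MU| > C) for samples of size n."""
    misses = 0
    for _ in range(REPLICATIONS):
        xbar = sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n
        if abs(xbar - MU) > C:
            misses += 1
    return misses / REPLICATIONS

sample_sizes = [10, 100, 1000]
miss_probs = [miss_probability(n) for n in sample_sizes]

for n, p in zip(sample_sizes, miss_probs):
    print(n, p)
```

The estimated miss probability should drop toward zero as n increases, which is the behavior the consistency property describes.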
