
Econometrics 2022-2023

Internal training
Overview of the competition
• Research: study ideas. You need to read papers to get ideas, and you NEED STATISTICS KNOWLEDGE.
• Code: Python, statsmodels.
• For the demo, a simple model: linear regression.
• Report: you need to know how to read papers and how to write one.
Outline
• Probability:
• Random variables.
• Probability.
• Distribution function.
• Statistics:
• Population and samples.
• Mean, Variance, Median, Mode, Skew, Kurtosis.
• Special random variables.
• Hypothesis testing, Confidence interval.
• Examples of models
• Linear regression
Probability
• A random variable is a numerical quantity whose value depends on the outcome of a random experiment.
• Eg1: Flip a fair coin 5 times. A random variable X could indicate whether we get exactly 3 heads (X = 1 if so, X = 0 otherwise).
• Eg2: Roll 3 dice. Y could indicate whether the maximum of the 3 dice exceeds 4 (Y = 1 if so, Y = 0 otherwise).
• Eg3: Roll 3 dice. Z is the number of times that 1 shows up.
• Eg4: Let T be a random real number from the interval [0,1].
• Since we DO NOT know in advance which values these variables will take, we use probability to describe them.
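As a rough illustration of Eg1 to Eg4, the sketch below draws a few realisations of X, Y, Z and T using only Python's standard library. The function name `simulate_examples` and the seed are assumptions made for this example. Y and Z are read from the same roll of 3 dice purely for convenience.

```python
import random

def simulate_examples(rng):
    """Draw one realisation of the variables from Eg1 to Eg4."""
    flips = [rng.random() < 0.5 for _ in range(5)]   # Eg1: 5 fair coin flips, True = head
    x = 1 if sum(flips) == 3 else 0                  # X = 1 if exactly 3 heads
    dice = [rng.randint(1, 6) for _ in range(3)]     # one roll of 3 fair dice
    y = 1 if max(dice) > 4 else 0                    # Eg2: maximum exceeds 4
    z = dice.count(1)                                # Eg3: number of times 1 shows up
    t = rng.random()                                 # Eg4: uniform on [0, 1) (1.0 itself is never returned)
    return x, y, z, t

rng = random.Random(0)  # arbitrary seed so the run is repeatable
draws = [simulate_examples(rng) for _ in range(5)]
for x, y, z, t in draws:
    print(f"X={x}  Y={y}  Z={z}  T={t:.3f}")
```

Each run of the experiment gives different values, which is why these quantities are described by probabilities rather than by a single number.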
Probability
• Come back to Eg3 above:
• Eg3: Roll 3 dice. Z is the number of times that 1 shows up.
• Then the probability mass function (pmf) of this discrete random variable is:
  z          0    1    2    3
  P(Z = z)   p0   p1   p2   p3
• Z can also be 0 (no 1 shows up), and the probabilities sum to 1: p0 + p1 + p2 + p3 = 1.
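The pmf of Eg3 can be computed exactly. A minimal sketch, assuming the dice are fair and independent, so that Z follows a binomial distribution with 3 trials and success probability 1/6. The function name `pmf_number_of_ones` is made up for this example, and the value z = 0 (no 1 shows up) is included so that the probabilities sum to 1.

```python
from fractions import Fraction
from math import comb

def pmf_number_of_ones(n_dice=3):
    """Exact pmf of Z = number of dice showing 1, for fair independent dice."""
    p = Fraction(1, 6)  # probability that a single die shows 1
    return {
        k: comb(n_dice, k) * p**k * (1 - p)**(n_dice - k)
        for k in range(n_dice + 1)
    }

pmf = pmf_number_of_ones()
for z, prob in pmf.items():
    print(f"P(Z = {z}) = {prob}")
print("sum =", sum(pmf.values()))
```

Using `Fraction` keeps the probabilities exact, which makes it easy to check that they add up to 1.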

• Come back to Eg4 above:
• Eg4: Let T be a random real number from the interval [0,1].
• Since there are uncountably many real numbers in this interval, and T is equally likely to fall anywhere in it:
• P(T = 1) = 0; P(0.5 < T < 0.75) = 0.25
Probability
• The pmf (discrete case) and the pdf (continuous case) describe the distribution of a random variable.
• A probability density function (f) for a continuous random variable gives the density at each value, not a probability.
• For a continuous variable, the probability at any single point is zero. Probabilities come from integrating f over an interval.
• In Eg4, T is called a continuous uniform random variable on [0,1], with f(t) = 1 for 0 ≤ t ≤ 1 and f(t) = 0 elsewhere.
• To calculate the probability that T belongs to the interval [0.1, 0.9], we take the integral of f from 0.1 to 0.9, which gives 0.8.
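The integral for Eg4 can be checked numerically. The sketch below assumes the uniform density on [0,1] (equal to 1 inside the interval, 0 outside) and uses a simple midpoint Riemann sum. Both helper names, `uniform_pdf` and `integrate`, are made up for this example.

```python
def uniform_pdf(t, low=0.0, high=1.0):
    """Density of a continuous uniform variable on [low, high]."""
    return 1.0 / (high - low) if low <= t <= high else 0.0

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

p_interval = integrate(uniform_pdf, 0.1, 0.9)   # P(0.1 <= T <= 0.9)
p_quarter = integrate(uniform_pdf, 0.5, 0.75)   # P(0.5 < T < 0.75)
p_point = integrate(uniform_pdf, 1.0, 1.0)      # P(T = 1): an interval of zero width

print(round(p_interval, 6), round(p_quarter, 6), p_point)
```

The density at t = 1 is 1, yet P(T = 1) is 0 because the interval has zero width. This is the difference between a density and a probability.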
Probability
• Another concept is the expectation of a random variable X: E(X).
• For Eg3: E(Z) = p1*1 + p2*2 + p3*3 (the outcome Z = 0 adds nothing to the sum). This is the expectation, or expected value, of Z: the long-run average of Z if we roll the dice again and again.
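To connect the formula with the "many rolls" interpretation, the sketch below computes E(Z) for Eg3 exactly and compares it with the average of Z over many simulated rolls. It assumes fair, independent dice. The seed and the number of repetitions are arbitrary choices.

```python
import random
from fractions import Fraction
from math import comb

# Exact pmf of Z (number of 1s among 3 fair dice), then E(Z) = sum of z * P(Z = z)
pmf = {k: comb(3, k) * Fraction(1, 6)**k * Fraction(5, 6)**(3 - k) for k in range(4)}
expected = sum(k * p for k, p in pmf.items())

# Long-run average of Z over many simulated rolls
rng = random.Random(42)
n_rolls = 100_000
total = sum([rng.randint(1, 6) for _ in range(3)].count(1) for _ in range(n_rolls))
simulated_mean = total / n_rolls

print("exact E(Z)     =", expected)
print("simulated mean =", round(simulated_mean, 3))
```

The simulated average gets closer to the exact value as the number of rolls grows.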
Statistics
• All of the formulas and concepts from the previous Probability section describe the population.
• A population is the entire group that you want to draw conclusions about.
• A sample is the specific group that you will collect data from.
• In short, the population contains all of the data. A sample is just a small subset of the population.
Statistics
• For a given sample (data), we can calculate its statistics (mean, variance,…) using the describe() method of a pandas DataFrame in Python, e.g. df.describe().
• One special continuous random variable that is used a lot in theory is the normally distributed one (Normal distribution).
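As a small illustration, the sketch below draws a sample from a Normal distribution and computes a few summary statistics by hand. It uses Python's standard `statistics` module so that it runs without extra packages. The population parameters (mean 10, standard deviation 2), the sample size and the seed are all assumptions made for this example.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population: Normal with mean 10 and standard deviation 2
sample = [rng.gauss(10.0, 2.0) for _ in range(1000)]

summary = {
    "count": len(sample),
    "mean": statistics.fmean(sample),
    "std": statistics.stdev(sample),        # sample standard deviation (divides by n - 1)
    "variance": statistics.variance(sample),
    "min": min(sample),
    "median": statistics.median(sample),
    "max": max(sample),
}
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

The sample mean and standard deviation come out close to, but not exactly equal to, the population values 10 and 2.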
Statistics
• Remember that the formulas for the mean and variance are applied to samples.
• Eg: We have a set of data (a sample) and we wish to find the true expected value of the population. If we find the mean by taking the sum and dividing by the number of data points, that is called the sample mean.
• We will do something called estimation: by interval, by testing or by models.
- If the sample mean is X-bar, then the 95% confidence interval for the population mean is [X-bar - a, X-bar + a], where a is the margin of error of the estimate.
- It means that if we repeated the sampling many times, about 95% of the intervals built this way would contain the true value of the population mean.
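The interval [X-bar - a, X-bar + a] can be sketched in a few lines. The version below assumes a reasonably large sample and uses the normal approximation a = z * s / sqrt(n), where s is the sample standard deviation. For small samples a t-distribution quantile would normally replace z. The function name, the true mean of 5.0 and the seed are assumptions made for this example. The loop at the end checks the "95% of the time" interpretation by repeating the sampling many times.

```python
import math
import random
import statistics

def mean_confidence_interval(data, confidence=0.95):
    """Return (low, high) for the population mean, using a normal approximation."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    a = z * s / math.sqrt(n)                                   # margin of error
    return x_bar - a, x_bar + a

TRUE_MEAN = 5.0  # hypothetical population mean, known here only because we simulate
rng = random.Random(7)
sample = [rng.gauss(TRUE_MEAN, 1.0) for _ in range(100)]
low, high = mean_confidence_interval(sample)
print(f"95% CI from one sample: [{low:.3f}, {high:.3f}]")

# Repeat the sampling many times and count how often the interval covers the true mean
trials = 1000
hits = 0
for _ in range(trials):
    data = [rng.gauss(TRUE_MEAN, 1.0) for _ in range(100)]
    lo, hi = mean_confidence_interval(data)
    hits += lo <= TRUE_MEAN <= hi
coverage = hits / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The coverage should come out close to 0.95. The true mean is fixed, and it is the interval that changes from sample to sample.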
Statistics
• Hypothesis testing is a test concerning the true value of a population parameter.
• The difference between a confidence interval (CI) and a hypothesis test (HT) is that a test only considers 1 candidate value at a time, whereas a CI contains infinitely many values.
• A test also has a significance level, alpha: the probability that we reject the null hypothesis given that the null hypothesis is true. A 95% confidence level corresponds to alpha = 0.05.
• A clearer, more illustrative example will be given in the Linear regression part.
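Until the regression example, a test for a population mean can serve as a small illustration. The sketch below is a two-sided z-test under a large-sample normal approximation. It is one possible test, not necessarily the one used later in the training. The function name `z_test_mean`, the simulated data and the seed are assumptions made for this example.

```python
import math
import random
import statistics

def z_test_mean(data, mu0, alpha=0.05):
    """Two-sided test of H0: population mean == mu0 (normal approximation)."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)
    z = (x_bar - mu0) / (s / math.sqrt(n))                    # test statistic
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p_value, p_value < alpha                        # True means "reject H0"

rng = random.Random(3)
# Hypothetical data whose true mean is 5.5
sample = [rng.gauss(5.5, 1.0) for _ in range(200)]

z_far, p_far, reject_far = z_test_mean(sample, mu0=5.0)
z_own, p_own, reject_own = z_test_mean(sample, mu0=statistics.fmean(sample))

print(f"H0: mean = 5.0         -> z = {z_far:.2f}, p = {p_far:.4f}, reject = {reject_far}")
print(f"H0: mean = sample mean -> z = {z_own:.2f}, p = {p_own:.4f}, reject = {reject_own}")
```

Each call tests a single candidate value mu0, in contrast to a confidence interval, which reports a whole range of plausible values at once.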
