Eda 223 Reviewer All Lacan
SAMPLING
• Process of selecting a small number of elements (sample) from a larger defined target group of elements (population).
• The information gathered from the small group will allow judgments to be made about the larger group.
BASICS OF SAMPLING THEORY
➢ Population
• The studied group of individuals.
➢ Element
• Basic unit of information that has a unique meaning and subcategories of distinct value.
➢ Defined target population
• Chosen group of individuals that is studied.
➢ Sampling unit
• Unit of a population that is used for statistical research or study.
➢ Sampling Frame
• Eligible members of a population from which samples are drawn.
SAMPLING ERROR
• Any type of bias that is attributable to mistakes in either,
o Drawing a sample
o Determining the sample size
I. DEFINING A POPULATION OF INTEREST
Population of interest is entirely dependent on,
a.) Management Problem
b.) Research Problems and;
c.) Research Design
Some bases for defining Population:
o Geographic Area
o Demographics
o Usage/Lifestyle
o Awareness
SAMPLING FRAME
• List of population elements (people, companies, houses, cities, etc.) from which units to be sampled can be selected.
• Difficult to get an accurate list.
• Sample frame error – occurs when certain elements of the population are accidentally omitted or not included on the list.
FOR POPULATION: A study investigating the prevalence of a rare medical condition in a specific region.
SAMPLING FRAME: List of patient records from hospitals, clinics, or medical registries within the region.
* Care must be taken to make sure your sampling frame is adequate for your needs. *
A good sample frame for a project on living conditions would:
• Include all individuals in the target population.
• Exclude all individuals not in the target population.
• Include all accurate information that can be used to contact selected individuals.
II. SAMPLING METHODS
1. PROBABILITY SAMPLING
• Sampling technique in which the researcher chooses samples from a larger population using a method based on the Theory of Probability.
• Random selection
TYPES OF PROBABILITY SAMPLING:
1. Simple Random Sampling – every unit has an equal nonzero chance of being selected.
Ex. In studying the reading habits of students at a university, the researchers assign a unique number to each student and then use a random number generator to select a sample of 100 students from the entire student population.
2. Systematic Random Sampling – the defined target population is ordered, and the sample is selected according to position using a skip interval.
Ex. Individuals are picked according to their assigned numbers at a uniform interval (numbers divisible by 5 → 5th, 10th, 15th, etc.)
3. Stratified Random Sampling – population is divided into different subgroups and samples are selected from each.
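The three probability sampling types above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the 1,000-student frame, and the department sizes are all hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Every unit has the same chance of selection (drawn without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, skip):
    """Take every `skip`-th unit of the ordered population: 5th, 10th, 15th, ..."""
    return population[skip - 1::skip]

def stratified_sample(strata, k_per_stratum, seed=None):
    """Draw a simple random sample of size k_per_stratum from each subgroup."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

# Hypothetical sampling frame: students numbered 1..1000
student_ids = list(range(1, 1001))
srs = simple_random_sample(student_ids, 100, seed=1)
every_fifth = systematic_sample(student_ids, skip=5)

# Hypothetical strata: employee numbers grouped by department
departments = {
    "sales": list(range(1, 41)),
    "marketing": list(range(41, 71)),
    "HR": list(range(71, 91)),
}
by_department = stratified_sample(departments, 5, seed=1)

print(len(srs), every_fifth[:3], {d: len(s) for d, s in by_department.items()})
```

The seed only makes the draw repeatable; leaving it as `None` gives a fresh random sample on each run.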
Ex. Stratifying employees working in different departments (sales, marketing, and HR).
2. NON-PROBABILITY SAMPLING
• Sampling technique in which the researcher selects samples based on the subjective judgment of the researcher rather than random selection.
... speaking to a member of that club to forward the survey to its co-members, creating a chain referral.
2. SAMPLE SIZE OF A SIMILAR STUDY
• Without reviewing the methods used in these studies, you may run the risk of repeating errors that were made in determining the sample size for another study.
3. PUBLISHED TABLES
2. Level of Confidence
• The risk level, or confidence level, is based on the Central Limit Theorem.
Example: We measure the heights of 40 randomly chosen men and get a mean height of 175 cm. We also know the standard deviation of men’s height is 20 cm:
1ST → GIVEN:
$\bar{x}$ (sample mean) = 175 cm, $s$ = 20 cm, $n$ = 40
$$CI = \bar{x} \pm z\left(\frac{s}{\sqrt{n}}\right)$$
$$\rightarrow CI = 175 \pm 1.95996\left(\frac{20}{\sqrt{40}}\right)$$
$$\rightarrow CI = 175 \pm 6.198$$
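The interval above can be reproduced with Python's standard library. This is a minimal sketch, assuming a 95% confidence level (which is what z = 1.95996 corresponds to); the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper, margin) for CI = mean ± z * sd / sqrt(n)."""
    # Two-sided critical value, e.g. about 1.95996 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin, margin

# Values from the height example: mean 175 cm, sd 20 cm, n = 40
low, high, margin = z_confidence_interval(175, 20, 40)
print(round(margin, 3), round(low, 3), round(high, 3))
```

The margin comes out to about 6.198, so the interval runs from roughly 168.8 cm to 181.2 cm.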
SLOVIN’S FORMULA
$$n = \frac{N}{1 + Ne^{2}}$$
Where:
$n$ − sample size
$e$ − margin of error
$N$ − total population
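As a quick illustration of Slovin's formula, the sketch below computes a sample size for a hypothetical population of 1,000 with a 5% margin of error. Both numbers and the function name are assumptions for illustration only; rounding up to a whole respondent is a common convention, not something stated in these notes.

```python
from math import ceil

def slovin_sample_size(N, e):
    """Slovin's formula: n = N / (1 + N * e^2)."""
    return N / (1 + N * e ** 2)

# Hypothetical inputs: population of 1,000 and a 5% margin of error
n_raw = slovin_sample_size(1000, 0.05)
n_needed = ceil(n_raw)  # round up to a whole number of respondents
print(round(n_raw, 3), n_needed)
```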
TYPES OF VARIABLES:
1. Nominal – data that are named or labeled by category; any numbers assigned are labels only, with no order or magnitude.
CORRELATION COEFFICIENT
Pearson’s Correlation / Product Moment Correlation Coefficient
- Measures the direction and strength of the linear relationship between two quantitative variables.
- Denoted by r
▪ POSITIVE CORRELATION
𝑾𝑯𝑬𝑹𝑬: 𝑿 ↑ 𝒀 ↑
▪ NEGATIVE CORRELATION
𝑾𝑯𝑬𝑹𝑬: 𝑿 ↑ 𝒀 ↓
▪ ZERO CORRELATION
-absence of any systematic tendency
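PEARSON – r FORMULA
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - (\sum x)^{2}\right]\left[n\sum y^{2} - (\sum y)^{2}\right]}}$$
Where:
$n$ − number of paired observations
$x$, $y$ − the paired scores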
STUDENT   MATH SCORES (X)   PHYSICS SCORES (Y)   X²      Y²      XY
A         65                65                   4225    4225    4225
B         66                63                   4356    3969    4158
C         68                64                   4624    4096    4352
D         71                67                   5041    4489    4757
E         65                70                   4225    4900    4550
F         68                62                   4624    3844    4216
G         67                69                   4489    4761    4623
H         62                71                   3844    5041    4402
I         60                70                   3600    4900    4200
J         69                68                   4761    4624    4692
TOTAL:    661               669                  43789   44849   44175
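$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - (\sum x)^{2}\right]\left[n\sum y^{2} - (\sum y)^{2}\right]}}$$
$$r = \frac{10(44175) - (661)(669)}{\sqrt{\left[10(43789) - (661)^{2}\right]\left[10(44849) - (669)^{2}\right]}}$$
Pearson r = -0.483774464
MODERATE NEGATIVE CORRELATION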
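The computation above can be checked directly from the raw scores. This is a minimal sketch of the raw-sums form of the Pearson formula; the function name `pearson_r` is hypothetical.

```python
from math import sqrt

# Scores from the table (students A to J)
math_scores = [65, 66, 68, 71, 65, 68, 67, 62, 60, 69]
physics_scores = [65, 63, 64, 67, 70, 62, 69, 71, 70, 68]

def pearson_r(x, y):
    """Pearson r from raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(math_scores, physics_scores)
print(round(r, 6))
```

Note that each bracket in the denominator multiplies the sum of squares by n before subtracting the squared sum, as in the substituted line 10(43789) − (661)².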
$$r_s = 1 - \frac{6\sum d^{2}}{n(n^{2} - 1)}$$
Where:
$r_s$ − coefficient of (rank) correlation
$d$ − difference between the paired ranks
$n$ − number of cases
Ex. In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.
Sample   Level of Education (x)   Rank (x)   Income (y)   Rank (y)   Difference (d)   Squared Difference (d²)
A        Preparatory              5.5        45           4          1.5              2.25
B        Elementary               3.5        35           5          -1.5             2.25
C        University               1          18           7          -6               36
D        Elementary               3.5        50           2.5        1                1
E        Secondary                2          65           1          1                1
F        Illiterate               7          50           2.5        4.5              20.25
G        Preparatory              5.5        20           6          -0.5             0.25
TOTAL:                                                                                63
n = 7
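Plugging the table into the rank-correlation formula can be sketched as follows. The ranks are copied from the table; the variable names are hypothetical. Because the data contain tied ranks, the 6Σd² form is an approximation, so treat the result as illustrative.

```python
# Ranks copied from the table (samples A to G)
rank_education = [5.5, 3.5, 1, 3.5, 2, 7, 5.5]
rank_income = [4, 5, 7, 2.5, 1, 2.5, 6]

n = len(rank_education)
d = [rx - ry for rx, ry in zip(rank_education, rank_income)]
sum_d2 = sum(di ** 2 for di in d)

# r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, r_s)
```

With Σd² = 63 and n = 7 this gives r_s = 1 − 378/336 = −0.125, a weak negative rank correlation under this formula.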
REGRESSION
- For prediction
$$y = mx + b$$
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - (\sum x)^{2}}$$
Where:
$m$ − slope
$x$ − x variable
$b$ − intercept
Intercept (b)
$$b = \bar{y} - m\bar{x}$$
Where:
$$\bar{y} = \frac{\sum y}{n}; \quad \bar{x} = \frac{\sum x}{n}$$
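The slope and intercept formulas above translate directly into code. This is a minimal sketch using made-up points that lie exactly on y = 2x + 1, so the expected answer is easy to verify by hand; the data and the function name `fit_line` are hypothetical and are not the soup data from the problem that follows.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)  # b = y_bar - m * x_bar
    return m, b

# Hypothetical data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, b = fit_line(xs, ys)
predicted = m * 6 + b  # prediction at x = 6
print(m, b, predicted)
```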
Problem: The following table shows the percent of water and the number of calories in various canned soups to which 100g of water are added.
a.) Find the equation of the regression line for the data. Round constants to the nearest hundredth.
b.) Use the equation to find the expected number of calories in a soup that is 89% water. Round to the nearest whole number.
Regression Equation
$$y = mx + b$$
Where:
y-intercept, $b = \bar{y} - m\bar{x}$
$b$ = 767.122807
- Status quo
- Reduce probability of one error and the other one goes up
Z-TEST STATISTICS (σ KNOWN)
- Convert sample statistics to standardized Z variable
Z test
$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$
- Compare to critical Z value(s)
➔ If the Z test statistic falls in the critical region, reject $H_0$; otherwise, do not reject $H_0$
ONE-TAIL Z TEST FOR MEAN (σ KNOWN)
ASSUMPTIONS
o Population is normally distributed
o If not normal, use large samples
o Null hypothesis has = sign only
o Z test statistic:
$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
P-VALUE TEST
99% → 2.58 (critical z-value)
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}}$$
GIVEN:
average   372.5
Z-value   1.5 → $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
GIVEN:
average            788.00
number of bulbs    30.00
dev (population)   40.00
significance       0.05
confidence         0.95
mean               800.00
z-value            -1.643167673 → $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
z-value (crit)     -1.959963985 → NORM.S.INV(0.05/2)
p-value            0.101 → $2\left(0.5 + \int_{0}^{-1.6432} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^{2}}{2}}\,dx\right)$
p-value (crit)     0.05
Ho is not rejected
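The bulb example can be checked with the standard library. This is a minimal sketch, assuming a two-tailed test (as the NORM.S.INV(0.05/2) critical value suggests); the function name `one_sample_z` is hypothetical. Computed directly, the two-tailed p-value is about 0.1003; the 0.101 in the notes is consistent with a z-table lookup at z = −1.64.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)), population sigma known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

alpha = 0.05
z = one_sample_z(788, 800, 40, 30)

std_normal = NormalDist()                 # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(alpha / 2)    # same role as NORM.S.INV(0.05/2)
p_value = 2 * std_normal.cdf(-abs(z))     # two-tailed p-value

reject_h0 = p_value < alpha
print(round(z, 6), round(z_crit, 6), round(p_value, 4), reject_h0)
```

Both decision rules agree here: |z| is smaller than 1.96 and the p-value is larger than 0.05, so Ho is not rejected.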
Problem for t-test (SINGLE MEAN): Past experience indicates that the time required for high school seniors to complete a standardized test is a normal random variable with a mean of 35 minutes. A random sample of 20 high school seniors took an average of 33.1 minutes to complete this test, with a standard deviation of 4.3 minutes. Test the hypothesis, at the 0.05 level of significance, that $\mu = 35$ minutes against the alternative that $\mu < 35$ minutes.
t-test, sample standard deviation known, 1 tail (left)
Ho: The mean time required to complete the test is 35 minutes ($\mu = 35$)
H1: The mean time required to complete the test is less than 35 minutes ($\mu < 35$)
GIVEN:
average           33.1
no. of students   20
dev (sample)      4.3
significance      0.05
confidence        0.95
mean              35
Df → $n - 1$      19
t-value           -1.976060073 → $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
t-value (crit)    -1.729132812 → T.INV(0.05, 19)
-1.976060073 < -1.729132812
Ho is rejected
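The t statistic and the left-tail decision above can be sketched as follows. Python's standard library has no t-distribution, so this sketch takes the critical value from the notes (T.INV(0.05, 19)) as a given constant rather than computing it; the function name `one_sample_t` is hypothetical.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, s, n):
    """t = (x_bar - mu) / (s / sqrt(n)), using the sample standard deviation."""
    return (sample_mean - mu0) / (s / sqrt(n))

n = 20
df = n - 1
t_value = one_sample_t(33.1, 35, 4.3, n)

# Critical value copied from the notes: T.INV(0.05, 19)
t_crit = -1.729132812

# Left-tailed test: reject Ho when t falls below the critical value
reject_h0 = t_value < t_crit
print(df, round(t_value, 6), reject_h0)
```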
t-stat to p value
Ho: The two population means are equal ($\mu_1 = \mu_2$)
H1: The two population means are not equal ($\mu_1 \neq \mu_2$)
GIVEN:
1st number       25
1st ave          81
1st dev          5.2
2nd number       36
2nd ave          76
2nd dev          3.4
significance     0.05
diff             0
z-value          4.221685587 → $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}}$
z-value (crit)   1.959963985 → NORM.S.INV(1 − 0.05/2)
p-value          2.42482E-05 → $2\left(0.5 - \int_{0}^{4.2217} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^{2}}{2}}\,dx\right)$
p-value (crit)   0.05
Reject Ho
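A short sketch of the two-sample z-test above, assuming a two-tailed alternative (the means are "not equal"); the function name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """z = ((x1_bar - x2_bar) - d0) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - d0) / standard_error

alpha = 0.05
z_two = two_sample_z(81, 5.2, 25, 76, 3.4, 36, d0=0)

std_normal = NormalDist()
z_crit_two = std_normal.inv_cdf(1 - alpha / 2)   # upper two-tailed critical value
p_two = 2 * (1 - std_normal.cdf(abs(z_two)))     # two-tailed p-value

reject_equal_means = abs(z_two) > z_crit_two
print(round(z_two, 6), round(z_crit_two, 6), p_two, reject_equal_means)
```

The p-value is on the order of 2.4E-05, far below 0.05, which matches the "Reject Ho" decision.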
Problem z-test (double mean): A manufacturer
claims that the average tensile strength of thread A
exceeds the average tensile strength of thread B by at
least 12 kilograms. To test this claim, 50 pieces of
each type of thread were tested under similar
conditions. Type A thread had an average tensile
strength of 86.7 kilograms with a standard deviation
of 6.28 kilograms, while type B thread had an
average tensile strength of 77.8 kilograms with a
standard deviation of 5.61 kilograms. Test the
manufacturer’s claim using a 0.05 level of
significance.
Ho: The difference in average tensile strength does not exceed 12 kilograms ($\mu_A - \mu_B \le 12$)
H1: The difference in average tensile strength exceeds 12 kilograms ($\mu_A - \mu_B > 12$)
GIVEN:
1st number       50
1st ave          86.7
1st dev          6.28
2nd number       50
2nd ave          77.8
2nd dev          5.61
significance     0.05
diff             12
z-value          -2.603103417 → $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}}$
z-value (crit)   1.644853627 → NORM.S.INV(1 − 0.05)
Do Not Reject Ho
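The thread example differs from a plain comparison of means in two ways: the hypothesized difference d0 is 12 rather than 0, and the test is one-tailed. A minimal sketch, assuming the right-tailed setup used in these notes (critical value NORM.S.INV(1 − 0.05)); variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
d0 = 12  # hypothesized difference in mean tensile strength (kg)

mean_a, sd_a, n_a = 86.7, 6.28, 50
mean_b, sd_b, n_b = 77.8, 5.61, 50

standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
z_thread = ((mean_a - mean_b) - d0) / standard_error

# Right-tailed critical value, same role as NORM.S.INV(1 - 0.05)
z_crit_right = NormalDist().inv_cdf(1 - alpha)

reject_thread_h0 = z_thread > z_crit_right
print(round(z_thread, 6), round(z_crit_right, 6), reject_thread_h0)
```

The observed difference (8.9 kg) is below 12 kg, so z is negative and cannot fall in the right-tail critical region.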
LESSON 9: TEST OF INDEPENDENCE
The Chi-square test of independence is a statistical hypothesis test used to determine whether two categorical or nominal variables are likely to be related or not.
𝐻0 : 𝑉𝑎𝑟𝑖𝑎𝑏𝑙𝑒𝑠 𝑎𝑟𝑒 𝑖𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡
𝐻1 : 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒𝑠 𝑎𝑟𝑒 𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡
Example problem: A criminologist conducted a survey to determine whether the incidence of certain types of crime varied from one part of a large city to another. The particular
crimes of interest were assault, burglary, larceny, and homicide. The following table shows the numbers of crimes committed in four areas of the city during the past year.
STEP PROCESS
OBSERVED VALUES (Actual)
           TYPE OF CRIME
DISTRICT   Assault   Burglary   Larceny   Homicide   Total
A          162       118        451       18         749
B          310       196        996       25         1527
C          258       193        458       10         919
D          280       175        390       19         864
Total      1010      682        2295      72         4059
EXPECTED VALUES USING,
$$EXPECTED = \frac{\text{TOTAL ROW ALIGNED} \times \text{TOTAL COLUMN ALIGNED}}{\text{OVERALL TOTAL}}$$
WHERE:
TOTAL ROW – TOTAL IN A ROW
TOTAL COLUMN – TOTAL IN A COLUMN
OVERALL TOTAL – TOTAL OF ALL
Expected
           TYPE OF CRIME
DISTRICT   Assault    Burglary   Larceny    Homicide   Total
A          186.3735   125.8482   423.4922   13.28603   749
B          379.963    256.5691   863.3814   27.08647   1527
C          228.6746   154.4119   519.612    16.30155   919
D          214.9889   145.1707   488.5144   15.32594   864
Total      1010       682        2295       72         4059
CHI-SQUARE TABLE USING,
$$CHI = \frac{(OBSERVED - EXPECTED)^{2}}{EXPECTED}$$
WHERE:
OBSERVED – OBSERVED DATA IN THE SAME CELL
EXPECTED – EXPECTED DATA IN THE SAME CELL
Chi
           TYPE OF CRIME
DISTRICT   Assault    Burglary   Larceny    Homicide   Total
A          3.187508   0.489438   1.786755   1.672546   7.136247
B          12.88238   14.29875   20.37072   0.160721   47.71257
C          3.760725   9.643294   7.305519   2.435937   23.14548
D          19.65888   6.129233   19.86654   0.880775   46.53542
Total      39.48949   30.56071   49.32953   5.14998    124.5297
CHI-SQUARED STAT. (OVERALL SUM), DEGREE OF FREEDOM
$$DF = (COLUMN - 1) \times (ROW - 1)$$
Chi squared stat   124.5297127
df                 9
Decision: Reject Ho
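The three tables above (observed, expected, chi) can be generated from the observed counts alone. This is a minimal sketch; the 0.05 significance level and the table critical value 16.919 for df = 9 are assumptions, since the notes state the decision but not the level used.

```python
# Observed counts: rows are districts A-D,
# columns are Assault, Burglary, Larceny, Homicide
observed = [
    [162, 118, 451, 18],
    [310, 196, 996, 25],
    [258, 193, 458, 10],
    [280, 175, 390, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total * column total / overall total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic = sum over all cells of (O - E)^2 / E
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(col_totals) - 1) * (len(row_totals) - 1)

critical_value = 16.919  # chi-square table value for alpha = 0.05, df = 9 (alpha assumed)
reject_independence = chi_square > critical_value
print(round(chi_square, 4), dof, reject_independence)
```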
HYPOTHESIS (NULL & ALTERNATIVE)
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$
$H_1$: At least two means differ from each other
Decision: Do Not Reject Ho
MEAN AND TOTAL SUM OF THE SCORES FOR EACH SAMPLE
         MACHINE
         1        2         3         4        5         6         Total:
         17.5     16.4      20.3      14.6     17.5      18.3      104.6
         16.9     19.2      15.7      16.7     19.2      16.2      103.9
         15.8     17.7      17.8      20.8     16.5      17.5      106.1
         18.6     15.4      18.9      18.9     20.5      20.1      112.4
Total:   68.8     68.7      72.7      71       73.7      72.1      427
Mean     17.2     17.175    18.175    17.75    18.425    18.025    106.75
SQUARES OF EACH SCORE AND THEIR SUMS
         MACHINE
         X₁²       X₂²       X₃²       X₄²      X₅²       X₆²
         306.25    268.96    412.09    213.16   306.25    334.89
         285.61    368.64    246.49    278.89   368.64    262.44
         249.64    313.29    316.84    432.64   272.25    306.25
         345.96    237.16    357.21    357.21   420.25    404.01
Total:   1187.46   1188.05   1332.63   1281.9   1367.39   1307.59
SUM OF SQUARES (SS EACH) USING,
$$SS = \sum X^{2} - \frac{(\sum X)^{2}}{n}$$
DEGREES OF FREEDOM
$$df_{total} = N - 1$$
$$df_{within} = N - k$$
$$df_{between} = k - 1$$
df
Between   5
Within    18
Total     23
$$MS_{between} = \frac{SS_{between}}{df_{between}}$$
$$MS_{within} = \frac{SS_{within}}{df_{within}}$$
MS
Between   1.0676667
Within    3.48
Total
F-statistics   F
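The sums of squares, mean squares, and F ratio can be sketched from the machine data using the SS formula above. This is an illustrative reconstruction, not the notes' own worksheet: the notes list MS between and MS within but leave the F value blank, and the ratio F = MS between / MS within used here is the standard one-way ANOVA definition.

```python
# One list per machine (columns of the data table)
machines = [
    [17.5, 16.9, 15.8, 18.6],
    [16.4, 19.2, 17.7, 15.4],
    [20.3, 15.7, 17.8, 18.9],
    [14.6, 16.7, 20.8, 18.9],
    [17.5, 19.2, 16.5, 20.5],
    [18.3, 16.2, 17.5, 20.1],
]

N = sum(len(group) for group in machines)  # total number of scores
k = len(machines)                          # number of groups
grand_total = sum(sum(group) for group in machines)
correction = grand_total ** 2 / N          # (sum X)^2 / N

# SS = sum X^2 - (sum X)^2 / n, applied to the total and to the group totals
ss_total = sum(x * x for group in machines for x in group) - correction
ss_between = sum(sum(group) ** 2 / len(group) for group in machines) - correction
ss_within = ss_total - ss_between

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within
print(df_between, df_within, round(ms_between, 7), round(ms_within, 2), round(f_stat, 3))
```

The F ratio comes out to about 0.31, well below 1, which is consistent with the "Do Not Reject Ho" decision.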