MMW (Data Management) - Part 2
Data Management-Part 2
Luzviminda T. Orilla, PhD
Measures of Relative Position
Percentiles and Quartiles
Percentiles and quartiles are useful when you want to know where a score is located relative to the other scores.
A percentile is a data value below which the specified percentage of the data falls.
The median is the 50th percentile.
The 25th, 50th, and 75th percentiles are the lower quartile Q1, middle quartile Q2, and upper quartile Q3, respectively; they divide the data into four equal parts.
Together with the minimum and maximum values, the quartiles form the five-number summary: min value, Q1, median, Q3, and max value.
Quartiles are useful for box plots.
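As a rough illustration, the five-number summary can be computed with Python's standard library. The scores below are hypothetical, and the "inclusive" method of `statistics.quantiles` is only one of several quartile conventions, so other tools may report slightly different values for Q1 and Q3.

```python
import statistics

# Hypothetical scores, used only for illustration (not from the lecture).
scores = [70, 55, 85, 60, 95, 65, 80, 75, 90]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))


summary = five_number_summary(scores)
print(summary)
```

These five values are the ones a box plot draws: the box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.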
Normal Distribution
The normal distribution is an extremely important concept, because it occurs so often in the data we collect from the natural world, as well as in many of the more theoretical ideas that are the foundation of statistics.
• Characteristics of a Normal Distribution
• Area under the Curve
Areas under the curve that are symmetric about the mean are
equal.
The total area under the curve is 1.
• Empirical Rule for a Normal Distribution
• In a normal distribution, approximately
68% of the data lie within 1 standard deviation of the mean.
95% of the data lie within 2 standard deviations of the mean.
99.7% of the data lie within 3 standard deviations of the mean.
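The percentages in the empirical rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to compute the area under the standard normal curve within k standard deviations of the mean; the helper name `area_within` is an assumption made for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s) of the mean: {area:.4f}")
```

The three areas round to 68%, 95%, and 99.7%, which is where the rule's figures come from.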
• Example. The heights of a large group of people are assumed to be
normally distributed. Their mean height is 66.5 inches, and the
standard deviation is 2.4 inches. Find and interpret the intervals
representing one, two, and three standard deviations of the mean.
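A minimal sketch of this example in Python, assuming the stated mean of 66.5 inches and standard deviation of 2.4 inches; the function name `interval` is hypothetical.

```python
mean_height = 66.5  # inches, from the example
sd_height = 2.4     # inches, from the example

# Approximate share of the data within k standard deviations (empirical rule).
share = {1: "68%", 2: "95%", 3: "99.7%"}


def interval(mean, sd, k):
    """Return (lower, upper) bounds k standard deviations around the mean."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))


intervals = {k: interval(mean_height, sd_height, k) for k in (1, 2, 3)}
for k, (low, high) in intervals.items():
    print(f"About {share[k]} of heights lie between {low} and {high} inches.")
```

Under the normality assumption, about 68% of the people are between 64.1 and 68.9 inches tall, about 95% are between 61.7 and 71.3 inches, and about 99.7% are between 59.3 and 73.7 inches.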
• z-score of $x_i$ in a population: $z_{x_i} = \dfrac{x_i - \mu}{\sigma}$
• z-score of $x_i$ in a sample: $z_{x_i} = \dfrac{x_i - \bar{x}}{s}$
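A z-score is the distance of a value from the mean, measured in standard deviations. The sketch below applies that definition to the height example (mean 66.5 inches, standard deviation 2.4 inches); the height of 70 inches is a hypothetical value chosen for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical height of 70 inches, compared with the example's mean and
# standard deviation.
z = z_score(70.0, mean=66.5, sd=2.4)
print(f"z = {z:.2f}")
```

A positive z-score means the value is above the mean, a negative one means it is below, and a z-score of 0 means the value equals the mean.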
• Standard Normal Distribution
• Bivariate data are data sets in which each subject has two
observations associated with it.
INTERVAL REMARKS
0.9-0.99 Very High
0.7-0.89 High
0.5-0.69 Moderate
0.3-0.49 Low
BELOW 0.3 Very Low
• Given data: Happiness vs. Life Expectancy

Country        Happiness    Life Expectancy (LE)
Japan          6.8          80.80
South Korea    6.2          74.20
China          6.3          70.40
Taiwan         6.2          76.40
Indonesia      6.6          78.00
Philippines    6.4          69.00
Singapore      6.8          77.60
Vietnam        6.1          69.40
India          6.2          63.00
Bangladesh     5.7          59.50
Correlation of Life Expectancy with the Happiness Index

Variable           r-value    Significance (P), α = 0.05         Null Hypothesis
Life Expectancy    0.817      CV: 4.05; TV: 2.101; PV = 0.000    Rejected
*p < 0.05
Interpretation
0.9-0.99     Very highly correlated
0.7-0.89     Highly correlated
0.5-0.69     Moderately correlated
0.3-0.49     Low correlation
Below 0.3    Very low correlation
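The reported r-value can be recomputed from the happiness and life expectancy data. The sketch below uses the standard Pearson formula and then reads the strength off the interpretation scale; the function names `pearson_r` and `interpret` are assumptions, and the cutoffs treat each interval's lower bound as the threshold.

```python
import math

# Data from the Happiness vs. Life Expectancy table (Japan ... Bangladesh).
happiness = [6.8, 6.2, 6.3, 6.2, 6.6, 6.4, 6.8, 6.1, 6.2, 5.7]
life_expectancy = [80.8, 74.2, 70.4, 76.4, 78.0, 69.0, 77.6, 69.4, 63.0, 59.5]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interpret(r):
    """Map |r| to the remarks in the interpretation scale."""
    size = abs(r)
    if size >= 0.9:
        return "Very high"
    if size >= 0.7:
        return "High"
    if size >= 0.5:
        return "Moderate"
    if size >= 0.3:
        return "Low"
    return "Very low"


r = pearson_r(happiness, life_expectancy)
strength = interpret(r)
print(f"r = {r:.3f} ({strength})")
```

The result rounds to the reported 0.817, which falls in the 0.7-0.89 interval of the scale.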
STATISTICAL HYPOTHESES
The critical value is $t_{crit} = 2.101$. If $t_{cal}$ falls in the critical region for rejection of $H_0$, we reject $H_0$. Thus, if $|t_{cal}| > |t_{crit}|$, reject $H_0$. If not, retain $H_0$.
STEP 3
Calculate the test Statistic
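$$t_{cal} = r\sqrt{\frac{n-2}{1-r^2}}$$

$$t = 0.82\sqrt{\frac{10-2}{1-(0.82)^2}} = 0.82\sqrt{\frac{8}{1-0.6724}} = 0.82\sqrt{\frac{8}{0.3276}} = 0.82\sqrt{24.42002} = 0.82(4.9416)$$

$$t = 4.05$$

$$t_{obt} = 4.05 > t_{0.05} = 2.101$$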
STEP 4
Make a decision
Since tcal (4.05) is greater than tcrit (2.101), we reject the
null hypothesis that the correlation is zero. The computed t-value
exceeds the value required for significance at the 0.05 probability level.
This leads us to say that there exists a real correlation between the
happiness index and life expectancy.
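Steps 3 and 4 can be sketched in a few lines of Python. The inputs are the rounded r = 0.82 and n = 10 used in the computation, and the critical value 2.101 is copied from the text as given. Standard t tables list 2.306 for 8 degrees of freedom at the two-tailed 0.05 level, and the decision is the same with either value. The function name `t_statistic` is an assumption.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n pairs is zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.82        # rounded correlation used in the text
n = 10          # number of countries
t_crit = 2.101  # critical value as quoted in the text

t_cal = t_statistic(r, n)
reject_null = abs(t_cal) > abs(t_crit)

print(f"t_cal = {t_cal:.2f}")
print("Reject H0" if reject_null else "Retain H0")
```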
STEP 5 Interpretation
As the happiness index increases, life expectancy also tends to
increase.
Linear Regression
Linear regression is an approach for modeling the
relationship between a dependent variable
(outcome) and one or more explanatory
variables. The case of one explanatory variable
is called simple linear regression.
$\hat{y} = ax + b$
• where
$a = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$ and $b = \bar{y} - a\bar{x}$
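The slope and intercept formulas translate directly into code. The sketch below applies them to the happiness and life expectancy data given earlier, with happiness as x and life expectancy as y; the function name `fit_line` is an assumption for this example.

```python
# Data from the Happiness vs. Life Expectancy table (Japan ... Bangladesh).
happiness = [6.8, 6.2, 6.3, 6.2, 6.6, 6.4, 6.8, 6.1, 6.2, 5.7]
life_expectancy = [80.8, 74.2, 70.4, 76.4, 78.0, 69.0, 77.6, 69.4, 63.0, 59.5]


def fit_line(x, y):
    """Return slope a and intercept b of the least-squares line y-hat = a*x + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * (sum_x / n)  # b = y-bar - a * x-bar
    return a, b


a, b = fit_line(happiness, life_expectancy)
print(f"y-hat = {a:.3f}x + ({b:.3f})")
```

For these data the slope comes out near 16.66 and the intercept near -33.63, so each additional point on the happiness index corresponds to a predicted increase of roughly 16.7 years of life expectancy within the observed range.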
• A scatterplot is a graph of plotted points showing
the relationship between two numerical variables.
Examining a Scatterplot
1. Describe the overall pattern of a scatterplot by the form,
direction, and strength of the relationship.
2. Then look for any striking deviations from the pattern.
Identify each occurrence of an outlier.
Using the previous data and developing the predictive equation