Chapter 5 - RM
Basics of Statistics
Research Methods
Dr. Asif Mahmood
Institute of Business & Management,
UET Lahore
Descriptive Statistics
• Descriptive statistics are used by researchers to report on
populations and samples
• They speed up and simplify comprehension of a group’s characteristics
• They do not aim to use the data to learn about the population that the
sample of data is thought to represent (e.g., demographics)
Types of descriptive statistics: Univariate Analysis and Bivariate Analysis
• Organize Data
– Tables (Frequency Distributions, Relative Frequency Distributions,
Cross-tabulations and Contingency tables)
– Graphs (Bar Chart or Histogram, Frequency Polygon, Scatterplot,
etc.)
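A frequency distribution and its relative frequencies can be sketched in a few lines of Python. The response data and the helper name `frequency_table` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data, not from the slides)
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

def frequency_table(values):
    """Return {value: (count, relative_frequency)} for a list of values."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, count / total) for value, count in counts.items()}

table = frequency_table(responses)

# Print one row per category: label, frequency, relative frequency
for value, (count, rel) in sorted(table.items()):
    print(f"{value:<10} {count:>3} {rel:.2f}")
```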
Univariate Analysis
• Summarize Data
− Central Tendency (Mean, Median,
Mode)
− Variation {Range, Interquartile
Range, Variance, Standard
Deviation, shape of distribution
(kurtosis, skewness)}
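The summary measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on a small hypothetical data set; it uses the sample (N − 1) versions of the variance and standard deviation.

```python
import statistics

# Hypothetical scores (illustrative data, not from the slides)
scores = [2, 4, 4, 5, 7, 8]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variation
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance, divides by N - 1
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(mean, median, mode, value_range, variance, std_dev)
```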
Statistical Analysis
Inferential Statistics
• Testing a hypothesis and drawing conclusions
about a population, based on a sample
• Statistical inference consists of:
– selecting a statistical model of the process that generates the
data
– deducing propositions from the model
• Forms of inferential statistics:
– Estimation Statistics
• point estimate, interval estimate
– Hypothesis Testing
• t-test, chi-square test, or ANOVA to test whether a hypothesis
about the population (e.g., about its mean) is supported by the sample
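As an illustration of the two forms of estimation, the sketch below returns a point estimate of the mean together with an interval estimate. It assumes a normal approximation for the critical value (a t critical value would give a slightly wider interval for small samples); the function name `mean_with_interval` and the data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_with_interval(sample, confidence=0.95):
    """Point estimate of the mean plus an approximate confidence interval."""
    n = len(sample)
    point = statistics.mean(sample)                  # point estimate
    se = statistics.stdev(sample) / math.sqrt(n)     # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return point, (point - z * se, point + z * se)   # interval estimate

# Hypothetical sample (illustrative data, not from the slides)
sample = [98, 102, 101, 97, 103, 99, 100, 100]
point, (lower, upper) = mean_with_interval(sample)
print(point, lower, upper)
```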
The Shape of A Distribution
Normal distribution
• Majority of scores lie around the centre of the
distribution, and look the same on both sides
Two main ways in which a distribution can deviate from
normal:
1. Skew (Lack of symmetry): Most frequent scores are
clustered at one end of the scale
• Positively skewed: The frequent scores are
clustered at the lower end and the tail points
towards the higher or more positive scores
• Negatively skewed: The frequent scores are
clustered at the higher end and the tail points
towards the lower or more negative scores
2. Kurtosis (Pointiness): the degree to which scores cluster in
the tails (ends) of the distribution
• Positive kurtosis (leptokurtic): It has many scores in the
tails (heavy-tailed distribution) and is pointy
• Negative kurtosis (platykurtic): Thin in the tails (has light
tails) and tends to be flatter than normal
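Skew and kurtosis can be quantified from the central moments of the data. The sketch below uses the simple moment-based (population) formulas, which is an assumption; statistical packages often apply small-sample corrections and may report slightly different values. Kurtosis is reported as excess kurtosis, so a normal distribution scores about zero.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean)**k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Positive when the tail points to higher scores, negative when lower."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Positive for heavy tails (leptokurtic), negative for light tails (platykurtic)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical data sets (illustrative, not from the slides)
print(skewness([1, 1, 1, 2, 10]))                          # tail toward high scores
print(skewness([-10, -2, -1, -1, -1]))                     # tail toward low scores
print(excess_kurtosis([1, 2, 3, 4, 5]))                    # flat, light tails
print(excess_kurtosis([-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]))  # heavy tails
```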
Measures of Central Tendency
• Mode
– The mode in a distribution of data is simply the score that occurs most
frequently
– It may give you the most likely experience rather
than the “typical” or “central” experience
[Figure: Bimodal Distribution; bar chart of Count against IQ]
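A distribution can have more than one mode. As a small sketch, `statistics.multimode` (Python 3.8+) returns every most-frequent value; the IQ scores below are hypothetical and are not the data behind the slide's chart.

```python
from statistics import multimode

# Hypothetical IQ scores with two equally frequent values
iq_scores = [98, 98, 98, 102, 105, 110, 110, 110, 120]

# multimode lists all values that share the highest frequency
modes = multimode(iq_scores)
print(modes)  # [98, 110]
```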
N.B. In symmetric distributions, the
mean, median, and mode are the same,
whereas in skewed data, the mean and
median lie further toward the skew than
the mode
[Figure: Symmetric and Skewed distributions with the Mode, Median, and Mean marked]
The dispersion in a distribution
• Range
– Difference between the largest and smallest score in the data set
• The Interquartile Range (IQR) is the difference between the 75th
and 25th percentile scores (the upper and lower quartiles)
• Quartiles are the three values that split the sorted data into four equal
parts
• Second Quartile (Median) splits the data into two equal parts
• Lower Quartile is the median of the lower half of the data
• Upper Quartile is the median of the upper half of the data
• Percentiles are points that split the data into 100 equal parts
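The quartile definitions above translate directly into code. This sketch takes the lower and upper quartiles as the medians of the lower and upper halves of the sorted data; it assumes that, when the number of scores is odd, the overall median is excluded from both halves (other conventions exist and give slightly different quartiles). The helper name `quartiles` and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (lower quartile, median, upper quartile) using median-of-halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle score when n is odd so both halves have equal size
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
    )

# Hypothetical scores (illustrative data, not from the slides)
q1, q2, q3 = quartiles([1, 3, 4, 6, 7, 8, 10, 12, 15])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```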
The smaller the variance, the closer the individual scores are to the
mean.
Statistical Model
Degrees of freedom for a sample: N − 1
• Sum of squared error and the mean squared error are used to assess the fit
of a model
When the model is the mean, the mean
squared error has a special name: the
variance. Its square root is the standard deviation.
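To make the link between model fit and variance concrete, the sketch below treats the mean as the model, sums the squared errors, and divides by the degrees of freedom (N − 1 for a sample). The scores and the helper name `sum_of_squared_errors` are hypothetical.

```python
def sum_of_squared_errors(scores, model_value):
    """Total squared deviation of the scores from the model's prediction."""
    return sum((x - model_value) ** 2 for x in scores)

# Hypothetical scores (illustrative data, not from the slides)
scores = [1, 2, 3, 3, 4]

mean = sum(scores) / len(scores)         # the model: predict the mean for everyone
ss = sum_of_squared_errors(scores, mean)
df = len(scores) - 1                     # degrees of freedom for a sample
variance = ss / df                       # mean squared error of the mean model
std_dev = variance ** 0.5                # back in the original units

print(mean, ss, variance, std_dev)
```

Any other single-value model fits worse: `sum_of_squared_errors(scores, 3)` is larger than `ss`, because the mean minimizes the sum of squared errors.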
Important Definitions
Sampling variation
• Samples will vary because they contain different members of the
population
Sampling distribution
• A sampling distribution is the frequency distribution of sample means (or
any other sample statistic) from the same population
Standard error of the mean (SE) or standard error
• The standard deviation of sample means
Important
Central limit theorem
• As samples get large (>30), the sampling distribution has a normal
distribution with a mean equal to the population mean, and a standard
deviation approximated by $\sigma_{\bar{X}} = s / \sqrt{N}$, where $s$ is the
standard deviation of the sample and $N$ is the sample size
• A large standard error (relative to the sample mean) means that there is
a lot of variability between the means of different samples and so the
sample might not be representative of the population
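The definitions above can be checked with a small simulation: draw many samples from one population, collect their means, and compare the standard deviation of those means with the standard error estimated from a single sample. The population (normal, mean 100, SD 15), the sample size of 36, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores with mean about 100 and SD about 15
population = [random.gauss(100, 15) for _ in range(10_000)]
sample_size = 36

# Sampling distribution: the means of many samples from the same population
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(2_000)
]
mean_of_means = statistics.mean(sample_means)  # close to the population mean
sd_of_means = statistics.stdev(sample_means)   # the standard error, by definition

# Standard error estimated from one sample: s / sqrt(N)
one_sample = random.sample(population, sample_size)
estimated_se = statistics.stdev(one_sample) / math.sqrt(sample_size)

print(mean_of_means, sd_of_means, estimated_se)
```

With these settings the standard deviation of the sample means should come out near 15 / √36 = 2.5, and the single-sample estimate should land in the same neighbourhood.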
Central Limit Theorem, Again
[Figure: sampling distribution for sample size 1]
Standardization
• Z score
– It transforms the original distribution to one in which the mean becomes zero
and the standard deviation becomes 1 without changing the symmetry of the
distribution. The process is called Standardization
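Standardization can be sketched as subtracting the mean from each score and dividing by the standard deviation. The example assumes the sample standard deviation is used; the helper name `z_scores` and the data are hypothetical.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical scores (illustrative data, not from the slides)
z = z_scores([2, 4, 4, 5, 7, 8])

# After standardization the mean is 0 and the standard deviation is 1
print(statistics.mean(z), statistics.stdev(z))
```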
Standard Error
• General Formula
Exercise
Solution
Type I and Type II Errors
Effect size (e)
• The separation between the null
hypothesis value and a particular
value specified for the alternative
hypothesis
α- Level (Type I error)
• α states what chance of making an
error (by falsely concluding that the
null hypothesis should be rejected)
we, as researchers, are willing to
tolerate in the particular research context.
E.g., an intelligent student is not passed; an innocent person is punished
β-level (Type II error)
• Occurs when we believe that there is no effect in the population when, in reality,
there is (we falsely ‘fail to reject null hypothesis’)
• Cohen (1992) suggests that the maximum acceptable probability of a Type II
error would be .2 (or 20%)
• β error is generally considered less severe or costly than an α error
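The relationship between effect size, α, and β can be illustrated with a short calculation. This sketch assumes a one-sided z-test with a known population standard deviation, which is a simplification chosen for illustration; the function name `type_ii_error` and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def type_ii_error(effect, sigma, n, alpha=0.05):
    """Beta for a one-sided z-test of H0: mu = mu0 against mu = mu0 + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection cut-off under H0
    shift = effect * math.sqrt(n) / sigma      # effect measured in standard errors
    return std_normal.cdf(z_crit - shift)      # chance of failing to reject H0

# Hypothetical scenario: effect of 5 points, SD of 15, sample of 50
beta = type_ii_error(effect=5, sigma=15, n=50)
power = 1 - beta
print(round(beta, 3), round(power, 3))
```

Under these assumptions, a larger sample or a larger effect lowers β, which is how a study can be planned to keep β at or below Cohen's suggested .2.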
Type I and Type II Errors in Testing a Hypothesis
[Table: decisions about the null hypothesis classified as Correct or Wrong]
Type I and Type II Errors (Skip this)