WK 1 3


STATISTICS (Week 1-3)

Basic Statistical Concepts

What is Statistics?
- It is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

Two Branches of Statistics
1. Descriptive Statistics - using the data gathered on a group to describe or reach conclusions about that same group
(E.g. class average, range of scores in an exam)
2. Inferential Statistics - a researcher gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was drawn.

Two Types of Variable
Variable – a characteristic of interest about an object under investigation
Independent
- Manipulated
- Causes the change
Dependent
- the variable that the investigator measures to determine the effect of the independent variable

DATA
– the set of values collected from the variables

Quantitative - deals with numbers and things you can measure.
1. Discrete
- Countable
- Data are obtained by counting
(Example: the number of children in a family)
2. Continuous
- Can assume an infinite number of values in an interval between any two specific values
(E.g. temperature)

Qualitative - deals with characteristics and descriptors that can’t be easily measured, but can be observed subjectively

Four Levels of Data Measurement
1. Nominal – lowest level of data measurement
- for identification and classification
2. Ordinal – used to reflect some rank or order of individuals or objects
3. Interval – zero is arbitrary (e.g. temperature)
4. Ratio – highest level of data measurement
- zero is absolute (e.g. height)

Statistic vs Parameter
- Parameter measures the characteristic of a population.
- Statistic measures the characteristic of a sample.

Sampling - the process of selecting certain members or a subset of the population to make statistical inferences from them and to estimate characteristics of the whole population.

Reasons for sampling
1. The sample can save money.
2. The sample can save time.
3. For a given resource, the sample can broaden the scope of the study.
4. Since the research process is sometimes destructive, sampling can save the product.
5. If getting the population is impossible, sampling is the only option.

1. Probability Sampling Techniques - every member of the population has a known, nonzero chance of being selected (an equal chance under simple random sampling). (S3C)
a. Simple Random Sampling
- the most basic random sampling technique
- the basis for other random sampling techniques
- every item of a population has the same chance to be selected
- every sample of a fixed size has the same chance of selection as every other sample of that size.

b. Systematic Random Sampling

c. Cluster Sampling

d. Stratified Sampling
- more efficient than simple random sampling
- you are confident that there is a representation of items across the entire population

2. Non-probability Sampling Techniques - the researcher selects samples based on subjective judgement. (ConJuSQou)

a. Convenience Sampling
- convenient for the researcher since the samples are easy to obtain.
- inexpensive

b. Judgmental or Purposive Sampling
- the opinions of the preselected experts are essential
- you cannot generalize the results of their opinion

c. Snowball Sampling

d. Quota Sampling

Survey Error
1. Coverage error - occurs if there are groups of items excluded from the sampling frame, so that they have no chance to be selected. Coverage error results in a selection bias.
2. Nonresponse error - Nonresponse to sample surveys is one of the most serious problems that occur in practical applications of sampling methodology. It arises from the failure to collect data on all items in the sample and results in nonresponse bias. Follow-up is required for nonresponses after a specific period, because not everyone will respond to your surveys.
3. Sampling error - happens when there are variations or chance differences from sample to sample.
4. Measurement error - three sources of measurement error are ambiguous wording of questions, the Hawthorne effect, and respondent error.

CENTRAL TENDENCIES

Median
- the measure of the location or centrality of the observations
- rank your data from smallest to largest and look for the middle value
- not affected by outliers

Mean
- the average
- the most common measure of central tendency
- X-bar represents it
- computed by summing all the values of the data or observations and dividing by the number of observations
a) Sample Mean - the sum of the values in a sample divided by the number of data points in the sample.
b) Population Mean - the sum of the values in the population divided by the population size, N.

Mode
- the value that occurs most frequently
- Like the median and unlike the mean, extreme values do not affect the mode
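
The three measures above can be compared on a small data set. The sketch below is only an illustration: the exam scores are made up, and it uses Python's standard `statistics` module. It computes each measure with and without one extreme value to show that the mean moves while the median and mode stay put.

```python
import statistics

# Hypothetical exam scores (not from the notes).
scores = [70, 75, 80, 80, 85, 90]
# Same scores plus one deliberately extreme value (an outlier).
scores_with_outlier = scores + [150]

# Mean: sum of the values divided by the number of values.
mean_plain = statistics.mean(scores)
mean_outlier = statistics.mean(scores_with_outlier)

# Median: middle value after ranking from smallest to largest.
median_plain = statistics.median(scores)
median_outlier = statistics.median(scores_with_outlier)

# Mode: the value that occurs most frequently.
mode_plain = statistics.mode(scores)
mode_outlier = statistics.mode(scores_with_outlier)

print("mean:  ", mean_plain, "->", mean_outlier)
print("median:", median_plain, "->", median_outlier)
print("mode:  ", mode_plain, "->", mode_outlier)
```

With these numbers the mean rises from 80 to 90 once the outlier is added, while the median and mode both remain 80.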

Range
- the difference between the largest observation and the smallest observation
- sets the boundaries

Quartile
- divides the number of data points into four equal parts, or quarters
- Quartile 1 is the middle number between the smallest number and the median of the data set.
- Quartile 2 is the median
- Quartile 3 is the middle number between the median and the largest number of the data set.

Interquartile range
- more resistant to outliers than the range
- it contains information only about the difference between the upper and lower quartiles
- can be computed by using the formula Q3 - Q1.

Variance
- the average squared deviation or difference of the data points from their mean
a) Sample Variance - the sum of the squared differences around the sample mean divided by the sample size minus 1.
b) Population Variance - the sum of the squared differences around the population mean divided by the population size, N

Difference:
1. Sample variance comes from your sample, while population variance comes from the population.
2. They also differ in the denominator. When you compute the sample variance and the sample standard deviation, use n-1 in the denominator instead of n (the sample mean still divides by n). The reason is that using n in the denominator of the sample formulas results in a statistic that tends to underestimate the population variance.

Standard Deviation
- measures the spread of your data set from the mean
- If the data points are far from the mean, the data are more spread out and the standard deviation is high.
- If the SD is low, the data points are close to their average value.
a) Sample Standard Deviation - measures the spread or dispersion of the sample data set. It is represented by (S)
b) Population Standard Deviation - measures the spread or dispersion of the population data set. It is represented by (σ)

The Coefficient of Variation (CV)
- measures the variation as a percentage rather than in terms of the units of the data.
- measures the spread of the data relative to the mean by computing the relative variability
- the ratio of the standard deviation to the mean (average)
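
The measures of spread above can be sketched on one small data set. This is an illustration only: the observations are made up, and it relies on Python's standard `statistics` module, where `variance`/`stdev` divide by n-1 (sample) and `pvariance`/`pstdev` divide by N (population). Quartile conventions differ slightly between textbooks and software, so the quartiles from `statistics.quantiles` may not match a hand computation that uses another rule.

```python
import statistics

# Hypothetical observations (not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Quartiles: n=4 asks for the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range = Q3 - Q1

# Variance and standard deviation, sample vs population.
sample_var = statistics.variance(data)       # denominator n - 1
population_var = statistics.pvariance(data)  # denominator N
sample_sd = statistics.stdev(data)
population_sd = statistics.pstdev(data)

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv_percent = sample_sd / statistics.mean(data) * 100

print("range:", data_range, "IQR:", iqr)
print("sample variance:", sample_var, "population variance:", population_var)
print("sample SD:", sample_sd, "population SD:", population_sd)
print("CV (%):", cv_percent)
```

Treating the same eight values as a sample gives a larger variance (32/7) than treating them as a population (32/8 = 4), which is the effect of the n-1 denominator.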

Shapes
- measures of shape are tools that can describe the shape of a distribution of data.
a) Skewness can be seen on the tail of your curve. It could be right-skewed or left-skewed.
b) Kurtosis represents the peak of a distribution
i. Leptokurtic distributions - if the peak of the distribution is high and thin.
ii. Platykurtic distributions - if the peak of the distribution is flat and spread out.
iii. Mesokurtic distributions - these have the shape of a normal distribution.

Covariance
- measures the strength of the linear relationship between two numerical variables (X and Y).

Coefficient Of Correlation
- indicates the relative strength of the linear relationship between two numerical variables. This correlation means that as one variable changes in value, the other variable also changes, either increasing or decreasing.
1. The coefficient of correlation of the sample data is represented by r. When you use sample data, the coefficient of correlation is unlikely to be exactly +1, 0, or -1, as compared to the population data.

Strength: the strength of the relationship increases with the absolute value of the correlation coefficient.
- A perfectly linear relationship, at either extreme +1 or extreme -1, has the strongest relationship. In actual practice, though, you rarely see this type of perfect relationship in a data set.
- A zero coefficient represents no linear relationship.
- Not perfectly linear are those correlation coefficients whose value is between 0 and +1 or between 0 and -1. There is a relationship, and the strength of the relationship depends on how closely the data points fall to the line.

Direction: You can determine the direction of the relationship based on the sign of the correlation coefficient.
- In positive coefficients, as the value of one variable increases, the value of the other variable increases, and there is an upward slope of the graph.
- In negative coefficients, as the value of one variable increases, the value of the other variable decreases, and there is a downward slope of the graph.
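
The sign and strength rules above can be checked numerically. The sketch below is an assumption-laden illustration: the notes give no formulas, so it uses the usual sample formulas (covariance with n-1 in the denominator, and r as the covariance divided by the product of the two sample standard deviations). The data and the helper names `sample_covariance` and `correlation` are made up for this example.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of paired deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Sample coefficient of correlation r = cov(x, y) / (s_x * s_y)."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

cov_up = sample_covariance(hours, scores)
r_up = correlation(hours, scores)            # positive: upward slope

# Reversing the scores makes them fall as hours rise.
r_down = correlation(hours, scores[::-1])    # negative: downward slope

# Points exactly on a straight line give a perfect relationship.
r_perfect = correlation(hours, [2 * h + 1 for h in hours])

print("covariance:", cov_up)
print("r (increasing):", round(r_up, 4))
print("r (decreasing):", round(r_down, 4))
print("r (perfect line):", round(r_perfect, 4))
```

Here r is close to +1 for the increasing data and close to -1 for the reversed data, while only the points that lie exactly on a line reach 1.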
