
STATISTICS IN RESEARCH

Statistics refers to numerical facts. Examples of statistics in this sense are the number that
represents the income of a family, the number of students enrolled in a class, and the like.

Statistics is a group of methods that are used to collect, organize, present, analyze, and
interpret data to make decisions. (This sense refers to the field of study.)

Types of Statistics

Descriptive Statistics – consists of methods for organizing, displaying, and describing data
by using tables, graphs, and summary measures.

Inferential Statistics – consists of methods that use sample results to help make predictions.
This is also called inductive reasoning or inductive statistics.

Population Versus Sample

Population – consists of all elements (individuals, items, or objects) whose characteristics
are being studied. The population being studied is called the target population.

Sample – a portion of the population selected for study.

A sample that represents the characteristics of the population as closely as possible is called a
representative sample.

A sample drawn in such a way that each element of the population has an equal chance of
being selected is called a random sample.

Four Basic Methods of Sampling

1. Random Sampling. This is done by using chance methods or random numbers. For
example, number each subject in the population. Write each number on a card, place the
cards in a bowl, and select as many cards as needed. The subjects whose numbers are
selected compose the sample.

2. Systematic Sampling. This is done by numbering each subject of the population and
then selecting every kth number. For example, there are 5000 families in a city. Fifty
families are needed as a sample for an experiment. Since 5000/50 = 100, then k = 100. This
means that every 100th subject would be selected. However, the first subject would be
selected at random from subjects 1 to 100. Suppose subject 88 was selected; then the
sample would consist of subjects whose numbers were 88, 188, 288, and so on until 50
families were obtained.

3. Stratified Sampling. If a population has distinct groups, it is possible to divide the
population into these groups and to draw simple random samples (SRSs) from each of the
groups. The groups are called strata. Strata are designed so that the members of each
stratum are more homogeneous, that is, more similar to each other. The results are then
grouped together to form the sample. This technique is particularly useful in populations
that can be stratified into groups by gender, race, or geography.

4. Cluster Sampling. This method uses intact groups called clusters. Suppose a medical
researcher wants to study the patients in Metro Manila. It would be very costly and
time-consuming to obtain a random sample since they would be spread over different parts
of Metro Manila. Rather, a few hospitals could be selected at random and the patients in
these hospitals would be studied as a cluster.

Basic Terms

Element or Member of a sample or population is a specific subject or object about which
the information is collected.
Variable is a characteristic under study that assumes different values for different elements.
Observation or measurement is the value of a variable for an element.
Parameter is any measurable characteristic of a population.
Data are numbers or measurements that are collected as a result of observation,
interview, questionnaire, experiment, and so forth.

Types of Variables

A. Quantitative Variable – a variable that can be measured numerically.

1. Discrete Variable – a variable whose values are countable.

2. Continuous Variable – a variable that can assume any numerical value over a certain
interval or intervals.

B. Qualitative or Categorical Variable – a variable that cannot assume a numerical value
but can be classified into two or more nonnumeric categories.

Frequency Distribution

The frequency distribution is an arrangement of the data which shows the frequency of
different values or groups of values of a variable. It can be constructed directly from the raw data.

Class Frequency – refers to the number of observations belonging to a class interval, or the
number of items within a category.
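The systematic sampling example described under Four Basic Methods of Sampling (5000 families, a sample of 50, k = 100, random start 88) can be sketched in Python as follows. The function name `systematic_sample` and its parameters are illustrative assumptions, not part of the original text.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the 1-based subject numbers chosen by systematic sampling."""
    k = population_size // sample_size            # sampling interval: 5000 // 50 = 100
    if start is None:
        # The first subject is chosen at random from subjects 1 to k.
        start = random.Random(seed).randint(1, k)
    # Take every kth subject after the starting subject.
    return [start + j * k for j in range(sample_size)]

sample = systematic_sample(5000, 50, start=88)
print(sample[:3], sample[-1])   # [88, 188, 288] 4988
```

Passing `start=88` reproduces the worked example; leaving `start` unset draws the first subject at random from 1 to k.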
Class Interval – a grouping or category defined by a lower limit and an upper limit, such as
12-14, 15-17, 18-20, and so forth.
In the class interval 21-23, for example, 21 is the lower limit and 23 is the upper limit.

Class Marks – are the midpoints of the classes; they are found by adding the lower and
the upper limit and dividing by 2.

Range – the difference between the highest and lowest values in the set of data.

Suggested Steps in the Construction of a Frequency Distribution

1. Find the range.
2. Determine the tentative number of groups (called classes) to use. The maximum number
of classes is 15-20 no matter how many observations there are. The ideal number of class
intervals is somewhere between 5 and 15.
3. Determine the approximate size of the class interval by dividing the range by the desired
number of class intervals.
4. Write the class intervals starting with the lowest lower limit as determined by the
researcher’s choice.
5. Determine the class frequency for each class interval by referring to the tally column.
6. Compute the class mark.

Summation Notation – is used to denote the sum of values. Its symbol is Ʃ.

Cumulative Frequency – is obtained by cumulating absolute frequencies.

Relative Frequency Distribution – is a tabular arrangement of data showing the proportion in
percent of each frequency to the total frequency.

Graphical devices to represent a frequency distribution

Histogram – consists of a set of rectangles having bases on a horizontal axis which are centered
on the class marks. The base widths correspond to the class size and the heights of the
rectangles correspond to the class frequencies.

Frequency Polygon – is a line graph of class frequencies plotted against the class marks. It is
made by connecting the midpoints of the rectangular tops in the histogram or joining the
plotted points for the class marks and their corresponding frequencies.

Pie Graph – is a circle that is divided into sections or wedges according to the percentage
of frequencies in each category of the distribution.
Since there are 360° in a circle, the frequency for each class must be converted into a
proportional part of the circle. This conversion is done using the formula
degree = (f/n)(360°), where f is the frequency and n is the sum of the frequencies.
Convert each frequency to a percentage by dividing it by n and multiplying by 100%.

Measures of Central Tendency

The number which gives a summary of the characteristics of a given set of data is called the
measure of central tendency or measure of central location.
Such measures of central tendency can be computed in two ways: one for
ungrouped data and one for grouped data.

Ungrouped Data or Raw Data – are those data which are not yet organized or arranged into a
frequency distribution.

1. Arithmetic Mean

The arithmetic mean or arithmetic average is defined as the sum of the values of a variable
divided by the number of observations.
The symbol for the sample mean is x bar (x̄), and for the population mean is the Greek letter
mu (μ).

x̄ = Ʃx/n or μ = Ʃx/N, where
x̄ is the sample mean,
μ is the population mean,
N or n is the total number of items in the population or in the sample,
x is the observed value,
Ʃ is the summation notation.
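The suggested steps for constructing a frequency distribution, together with the cumulative frequency, the relative frequency, the pie-graph conversion degree = (f/n)(360°), and the arithmetic mean of the raw data, can be sketched in Python as below. The 20 scores are hypothetical values made up for illustration, the function name is an assumption, and the sketch assumes integer data so that class limits such as 12-16 and 17-21 do not overlap.

```python
import math

def frequency_distribution(data, tentative_classes):
    """Build a frequency table for integer data following the suggested steps."""
    low, high = min(data), max(data)
    data_range = high - low                               # step 1: find the range
    width = math.ceil(data_range / tentative_classes)     # step 3: approximate class size
    n = len(data)
    rows, lower, cumulative = [], low, 0
    while lower <= high:                                  # step 4: write the class intervals
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)        # step 5: class frequency
        cumulative += f
        rows.append({
            "interval": (lower, upper),
            "class_mark": (lower + upper) / 2,            # step 6: class mark
            "f": f,
            "cf": cumulative,                             # cumulative frequency
            "percent": 100 * f / n,                       # relative frequency in percent
            "degrees": 360 * f / n,                       # pie-graph wedge size
        })
        lower = upper + 1
    return rows

# Hypothetical raw scores (not taken from the original text).
scores = [12, 14, 15, 17, 19, 20, 21, 22, 23, 24,
          25, 26, 27, 29, 30, 31, 33, 35, 38, 41]
table = frequency_distribution(scores, tentative_classes=6)
for row in table:
    print(row)

ungrouped_mean = sum(scores) / len(scores)                # x̄ = Ʃx/n
print(f"mean = {ungrouped_mean:.2f}")                     # mean = 25.10
```

Because the class size is rounded up and classes are added until the highest value is covered, the final number of classes can differ by one from the tentative number chosen in step 2.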
Mean for Grouped Data

To compute the arithmetic mean of grouped data, we need to determine the midpoint of
each class interval. The computation assumes that the class mark of each class is the average
value of all items falling in that class.
1. Long method: x̄ = Ʃfx/n, where
x̄ = the sample mean
x = the class midpoint or class mark
f = the corresponding frequency
n = the total number of items

The median of ungrouped data arranged in an array (increasing or decreasing order of
magnitude) is the middle value when the number of items is odd, or the arithmetic average of
the two middle values when the number of items is even. The median is usually denoted by
Mdn.

Median for Grouped Data

Mdn = Lmd + [(n/2 – cf)/f] i

where Mdn = median
Lmd = lower class boundary of the median class
n = total number of observations
f = frequency of the median class
cf = cumulative frequency before the median class
i = class size

The mode of grouped data is defined as the midpoint of the class interval with the highest
frequency (modal class). The mode obtained in this manner is called a crude mode, because
it is just a rough approximation of the actual mode. So, to determine the true mode, we use
the formula

Mo = LMo + [d1/(d1 + d2)] i

where Mo = mode
LMo = lower class boundary of the modal class
d1 = difference between the frequency of the modal class and the frequency
of the next class lower in value
d2 = difference between the frequency of the modal class and the frequency of
the class next higher in value
i = class interval or class width

Note: the modal class is the class interval with the highest frequency.

HYPOTHESIS TESTING

Hypothesis testing is a decision-making process for evaluating claims about a population.

The z-test and the t-test are the statistical tests for hypothesis testing.

Null hypothesis (Ho) – states that there is no difference between a parameter and a specific
value.

Alternative hypothesis (Ha) – states a specific difference between a parameter and a
specific value.

Possible sets of statistical hypotheses

1. two-tailed test
Ho: parameter = specific value
Ha: parameter ≠ specific value
2. left-tailed test
Ho: parameter = specific value
Ha: parameter < specific value
3. right-tailed test
Ho: parameter = specific value
Ha: parameter > specific value

When to reject and to accept Ho

               reject Ho           accept Ho
Ho is true     Type I error        correct decision
Ho is false    correct decision    Type II error

Steps in hypothesis testing

1. State the null and alternative hypotheses.
2. Select the level of significance (0.10, 0.05, 0.01). Note: the level of significance is the
maximum probability of committing a Type I error.
3. Determine the critical value and the rejection region/s.
4. State the decision rule.
5. Compute the test statistic.
6. Make a decision, whether to reject or accept the null hypothesis.
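The grouped-data formulas given earlier for the mean (x̄ = Ʃfx/n), the median (Mdn = Lmd + [(n/2 – cf)/f] i), and the mode (Mo = LMo + [d1/(d1 + d2)] i) can be sketched in Python as follows. The frequency table is hypothetical, the function name is an assumption, and the sketch assumes integer class limits of equal width, so that the lower class boundary is taken as the lower limit minus 0.5, and a single modal class.

```python
def grouped_stats(classes):
    """classes: list of (lower_limit, upper_limit, frequency) in increasing order."""
    n = sum(f for _, _, f in classes)
    i = classes[0][1] - classes[0][0] + 1          # class size for integer limits

    # Mean: each class mark stands in for every item in its class.
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

    # Median: find the class in which the cumulative frequency reaches n/2.
    cf = 0                                         # cumulative frequency before the class
    for lo, hi, f in classes:
        if cf + f >= n / 2:
            median = (lo - 0.5) + ((n / 2 - cf) / f) * i
            break
        cf += f

    # Mode: locate the modal class, then compare it with its two neighbours.
    m = max(range(len(classes)), key=lambda j: classes[j][2])
    lo, _, f_modal = classes[m]
    f_lower = classes[m - 1][2] if m > 0 else 0
    f_higher = classes[m + 1][2] if m < len(classes) - 1 else 0
    d1, d2 = f_modal - f_lower, f_modal - f_higher
    mode = (lo - 0.5) + (d1 / (d1 + d2)) * i
    return mean, median, mode

# Hypothetical frequency table (not taken from the original text).
table = [(12, 16, 3), (17, 21, 4), (22, 26, 5), (27, 31, 4), (32, 36, 2), (37, 41, 2)]
mean, median, mode = grouped_stats(table)
print(f"{mean:.2f} {median:.2f} {mode:.2f}")       # 25.00 24.50 24.00
```

Here the median and modal class are both 22-26, with lower class boundary 21.5, cf = 7, f = 5, and d1 = d2 = 1.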
The z-test

It is a statistical test for the mean of a population. It can be used when the sample size is
greater than or equal to 30, or when the population is normally distributed and the standard
deviation is known. The formula is

z = (x̄ – μ) / (σ/√n)

where x̄ – sample mean
μ – hypothesized mean
σ – population standard deviation
n – sample size

Test of Hypothesis (Two Populations)

The null hypothesis under test is
Ho: μ1 = μ2
The test statistic appropriate for the purpose is

z = (x̄ – ȳ) / √(σ1²/m + σ2²/n)

where m = sample size for group 1
n = sample size for group 2
x̄ = mean of group 1
ȳ = mean of group 2
σ1², σ2² = variances of groups 1 and 2
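As a rough illustration of the one-sample and two-population z statistics, the sketch below computes both and applies a two-tailed decision rule at the 0.05 level. All numbers are hypothetical, the function names are assumptions, and the critical value comes from Python's standard-library `statistics.NormalDist` rather than a printed z table.

```python
from math import sqrt
from statistics import NormalDist

def z_one_sample(xbar, mu, sigma, n):
    """z = (x̄ – μ) / (σ/√n)"""
    return (xbar - mu) / (sigma / sqrt(n))

def z_two_sample(xbar, ybar, var1, var2, m, n):
    """z = (x̄ – ȳ) / √(σ1²/m + σ2²/n)"""
    return (xbar - ybar) / sqrt(var1 / m + var2 / n)

def reject_two_tailed(z, alpha=0.05):
    """Reject Ho when |z| exceeds the two-tailed critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return abs(z) > critical

# Hypothetical one-sample case: x̄ = 52, hypothesized μ = 50, σ = 8, n = 64.
z1 = z_one_sample(52, 50, 8, 64)
print(z1, reject_two_tailed(z1))                     # 2.0 True

# Hypothetical two-population case: means 75 and 72, variances 36 and 64, m = 36, n = 64.
z2 = z_two_sample(75, 72, 36, 64, 36, 64)
print(round(z2, 4), reject_two_tailed(z2))           # 2.1213 True
```

For a left-tailed or right-tailed test, the comparison would use `inv_cdf(alpha)` or `inv_cdf(1 - alpha)` on one side only.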

The t-test

The t-test is a statistical test for the mean of a population and is used when the population
is normally or approximately normally distributed, the population standard deviation is
unknown, and n < 30. The formula for the t test is

t = (x̄ – μ) / (s/√n)

where s is the sample standard deviation. The degrees of freedom are d.f. = n – 1.

The formula for the t test is similar to that of the z test. But since the population standard
deviation is unknown, the sample standard deviation is used. The critical values for a t test
are found in a t table.

Paired t-Test

The method of paired t-test reduces the problem of comparing the means of two populations
to that of a one-sample t-test.

A common method of obtaining matched pairs is to match an experimental unit with itself,
leading to what is widely referred to as a “before-after” study.

t = d̄ / (sd/√n)

where
d̄ = ∑d/n, d = difference of the pair
sd = √{[∑d² – (∑d)²/n] / (n – 1)}
n = number of pairs
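The one-sample t statistic and the paired t statistic can be sketched in Python as follows. The data are hypothetical and the function names are assumptions. The Python standard library has no t distribution, so the sketch only computes the statistic and the degrees of freedom; the critical value would still be read from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def t_one_sample(sample, mu0):
    """t = (x̄ – μ) / (s/√n), with d.f. = n – 1."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1

def t_paired(before, after):
    """t = d̄ / (sd/√n), computed on the differences of each pair."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    s_d = sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
    return d_bar / (s_d / sqrt(n)), n - 1

# Hypothetical sample tested against a hypothesized mean of 50.
t1, df1 = t_one_sample([48, 52, 55, 51, 49, 53, 50, 54], 50)
print(round(t1, 4), df1)          # 1.7321 7

# Hypothetical before-after measurements on the same five units.
before = [210, 235, 208, 190, 172]
after = [200, 225, 205, 185, 170]
t2, df2 = t_paired(before, after)
print(round(t2, 4), df2)          # 3.5233 4
```

Calling `t_one_sample` on the list of differences with a hypothesized mean of 0 gives the same value as `t_paired`, which is the sense in which the paired test reduces to a one-sample t-test.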

Sample Size

It is necessary to determine the size of the sample to make an accurate estimate. It depends
on the maximum error of estimate, the population standard deviation, and the degree of
Measures of Variation


The range is the simplest measure of variation to calculate. It is just the difference between
the largest and the smallest value in a given set of data. A much larger range suggests greater
variation or dispersion.

The range has a disadvantage of being influenced by extreme values called outliers. Another
is that it is based on two values only. All the other values in the set are being ignored.


The standard deviation is the most commonly used measure of variation. The standard
deviation indicates how closely the values of a given data set are clustered around the mean.

A lower value of the standard deviation means that the values of that data set are
spread over a smaller range around the mean. On the other hand, a large value of the standard
deviation means that the values of that data set are spread over a larger range around the mean.
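A small Python sketch can make the contrast between the range and the standard deviation concrete. The three data sets are hypothetical, and the sketch assumes the sample standard deviation (divisor n – 1) as computed by the standard-library function `statistics.stdev`, since the text does not state which formula it uses.

```python
from statistics import mean, stdev

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical data sets with the same mean but different spread.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]
# Same as `tight` except for one extreme value (an outlier).
with_outlier = [48, 49, 50, 51, 95]

for name, values in [("tight", tight), ("wide", wide), ("with_outlier", with_outlier)]:
    print(name, mean(values), data_range(values), round(stdev(values), 2))
```

The first two sets share a mean of 50, but the second has the larger range and the larger standard deviation. In the third set a single outlier raises the range from 4 to 47, which shows how the range depends on only the two extreme values.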
