Statistical Biology Module
Overview:
Statistics affects many facets of our lives. In everyday life, whether at home or at work, we keep records and read reports. An item in a record or report is a fact expressed as a numerical value or described by its quality or kind. A single item or fact is referred to as a datum; the color of leaves, the number of students in a class, height, width, and the number of bacterial colonies are all examples of data. How to deal with such data is the major concern of statistics.
Objectives:
1. Define biostatistics and identify its importance.
2. Explain the methods of collecting statistical data and variables.
3. Discuss different sampling techniques
Content:
Statistics is a science that deals with the collection, organization, analysis, interpretation, and presentation of information that can be stated numerically.
Major areas of Statistics:
1. Descriptive Statistics - includes anything done to the data that is designed to summarize or describe, without going any further; that is, without attempting to infer anything that goes beyond the data themselves.
2. Statistical Inference - comprises the methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. Such analysis requires generalizations that go beyond the data.
Biostatistics is statistics applied to the biological sciences.
Perhaps the most difficult part of statistics is the logic associated with inductive inference, yet all scientific evidence is based on this type of statistical inference. The same logic is used, though not always explicitly, when a physician practices medicine: what has been observed for a large group of patients is applied to make a specific decision about a particular patient.
When taking a clinical history, conducting a physical examination, or requesting laboratory analyses,
radiographic evaluations or test, a physician is collecting information (data) to help choose
diagnostic and therapeutic actions. The decisions reached are based on knowledge obtained from
training, from literature, from experience, or from some similar sources.
General principles are applied to the specific situation at hand in order to reach the best decision possible for a particular patient. Much of basic medical training centers on this type of reasoning - from the general to the specific - which is called deductive reasoning.
We conduct experiments and comparative studies to focus on questions that arise from our work.
We study a few patients (or experimental animals), and from what we observe we try to make rational inferences about what happens in general. This type of reasoning - from the specific subject(s) at hand to the general - is called inductive reasoning. This approach to medical research - pushing back the bounds of knowledge concerning human health - follows what is known as the scientific method, which has four basic steps.
1. Making observations, i.e., gathering data.
2. Generating a hypothesis: the underlying law and order suggested by the data.
3. Deciding how to test the hypothesis: what critical data are required?
4. Experimenting (or observing): this leads to an inference that either rejects or affirms the hypothesis. If the hypothesis is rejected, we go back to step 2.
If it is affirmed, this does not necessarily mean it is true, only that in the light of current knowledge and methods it appears to be so. The hypothesis is constantly refined and tested as more knowledge becomes available.
All data collected from biological systems have variability. The statistician is concerned with summarizing trends in data and drawing conclusions in spite of the uncertainty created by this variability. An understanding of statistics will enhance your ability to interpret data, whether for the purpose of treating a particular patient or for drawing general conclusions from a research study, and will also enable you to distinguish fact from fancy in everyday life.
Summary:
Biostatistics deals with the collection, organization, presentation, analysis and interpretation of
biological information that can be stated numerically.
Activity:
A. Suppose that a set of measurements representing the total rainfall in the province of Sultan Kudarat during the month of July has been recorded for the past 15 years.
Classify each of the following statements about the data as descriptive or inferential statistics.
1. The average rainfall over the 15 years is 3.0 cm.
2. For each of the past 15 years, it rained during the month of July.
3. Next July we expect rain.
4. This July 2021 we expect between 3.2 and 3.4 cm of rain.
B. 1. Decide what type of reasoning must be employed in each situation below in order to give a diagnosis and treatment. Discuss why.
a. Stroke patient
b. Yellowing of leaves of your potted plant.
c. Swelling of gums and painful tooth.
C. Differentiate the following:
1. Statistics and Biostatistics
2. Deductive and Inductive reasoning
3. Descriptive and Inferential statistics
Overview:
The basic unit of statistical analysis is data. There are generally two types of data and there is no
formula for selecting the best method to be used in gathering data. It depends on the researcher’s
design of the study, the type of data, the time available to complete the study, and the financial
capacity.
Objectives:
Content:
Classification of Data
1. Quantitative Data - data that can be expressed in numbers. These are things that can be measured, such as weight, length, number of colonies, and mortality rate.
2. Qualitative Data - facts for which no numerical measure exists. They are usually expressed in categories or kinds. Examples are the color of the skin, which could be black, brown, or white; a person's sex, which is male or female; and the presence or absence of metallic sheen in a bacterial colony.
In order to assure the accuracy of data, one must know the right sources and methods of collecting them.
1. Primary Data - refer to information gathered directly from an original source, or based on direct or firsthand experience.
2. Secondary Data - refer to information taken from published data previously gathered by other individuals or agencies, or data that come from sources other than the respondents.
Methods of Collecting Data
1. Interview Method - a person-to-person exchange between the interviewer and the interviewee.
2. Questionnaire Method - written responses are given to prepared questions. A questionnaire is a list of questions intended to elicit answers to the problem of a study. Questionnaires may be mailed, sent online, or hand-carried.
3. Registration Method - a method of gathering information enforced by certain laws. Examples are the registration of births, deaths, motor vehicles, marriages, and licenses.
4. Observation Method - the investigator observes the behavior of persons or organisms and their outcomes. This is usually used when the subjects cannot talk or write.
5. Experimental Method - used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. Scientific researchers usually use the experimental method.
Collected data must be organized in order to show significant characteristics. They can be presented in three forms: textual, tabular, and graphical.
Kinds of graphs
a.Bar graph
b. Pie graph
c. Line graph
A variable is a characteristic or attribute associated with the population being studied.
Types of Variables
1. Categorical or qualitative variables are classified according to some attribute or category.
Ex. Gender, religion, blood type, civil status.
Categories may be ordered, and may or may not be assigned specific numerical values, such as performance rating (poor, fair, good, very good, excellent) or IQ score (low, average, high).
2. Numerical-valued or quantitative variables are classified according to numerical characteristics, such as height, age, pulse rate, number of children, and speed. Numerical-valued variables are often grouped into class intervals.
Ex. Age in years: 5-9, 10-14, 15-19, and 20 and above.
Height in cm: 100-149, 150-199, 200-249.
Scales of Measurement
In selecting the statistical tool to be used for drawing inferences from a random sample, the type of measurement scale must be carefully considered. Measurements are classified into four scales.
1. Nominal Scale - a measurement scale that classifies elements into two or more categories or classes, the numbers indicating that the elements are different, but not according to order or magnitude.
Ex.
Table 1. Distribution of Medical Students of University of the Philippines Grouped According to Race
And Civil Status
Race Single Married Widow/er Separated Total
American 10 5 0 1 16
Chinese 29 8 5 10 52
Japanese 18 11 1 3 33
Filipino 32 3 4 20 59
Total 89 27 10 34 160
The medical students are classified according to race and civil status.
2.Ordinal Scale - is a measurement scale that ranks individuals in terms of the degree to which they
possess a characteristic of interest.
Ex.
Table 2. Anxiety Level of Patients with Mental Disorder in Hospital Q.
Sex 0 1 2 3 Total
Male 9 16 2 1 28
Female 21 10 4 7 42
Total 30 26 6 8 70
Legend: 0 = not anxious
1 = low anxiety level
2 = moderate anxiety level
3 = high anxiety level
3. Interval Scale - a measurement scale that, in addition to ordering scores from high to low, establishes a uniform unit in the scale, so that any equal distance between two scores is of equal magnitude. For example, the difference between aptitude scores of 80 and 90 equals the difference between scores of 90 and 100 (both being equal to 10).
4. Ratio Scale - a measurement scale that, in addition to being an interval scale, also has an absolute zero.
Summary:
Scales of Measurement
Application:
Identify the scale of measurement (nominal, ordinal, interval, or ratio) of each of the following:
1. 25 ft.
2. Medium size
3. 30%
4. 6 meter
5. 4 colonies
6. Male
7. Absent
8. 100 seeds
9. Blue eyes
10. 500 acre
Overview:
Analysis of data in research work requires that the size of the population be determined and specified if possible, so that the required sample size can easily be calculated based on the sampling techniques and research design. If the population is small, it is sometimes convenient to obtain the information by collecting data for the whole population (total enumeration). However, if the population is large, more time and money can be saved by measuring only a sample drawn from the population. When the measurement is destructive, sampling is of course unavoidable for obvious reasons.
Objectives:
At the end of the lesson, you should be able to:
1. compute the sample size;
2. enumerate the different sampling methods;
3. identify the use of different sampling methods in data collection.
Content:
Population - the group of all study units about which a particular investigation may provide information. Population size is denoted by "N".
Target population - the whole group of study units to which we are interested in applying our conclusions.
Study population - the group of study units to which we can legitimately apply our conclusions.
Sample - a subset or representative part of the population; hence, the sample must possess the same characteristics as the population. Sample size is denoted by "n".
(Diagram: a sample is drawn from the population by sampling; conclusions about the population are then drawn from the sample by inference.)
Types of Sampling:
1. Non-Probability or Judgment Sampling
Sampling based on a judgment selection of "typical" or representative elements of the population under study, considering arbitrarily set criteria.
1.1. Purposive Sampling - a sample is drawn from the population based on a preconceived idea of what constitutes the representative elements.
1.2. Quota Sampling - a sample is drawn for convenience and on the basis of a quota.
1.3. Haphazard Sampling - sampling is done haphazardly.
1.4. Volunteer Sampling - sampling which involves volunteers.
1.5. Convenience or Accidental Sampling - sampling where the elements of the sample are those readily accessible to the sampler.
2. Probability Sampling - sampling in which a definite set of rules and procedures for drawing the sample is followed. It allows one to evaluate the probability of each element being part of the sample, even prior to drawing the actual sample. Probability samples are suitable for statistical analysis and scientific research.
2.1. Simple Random Sampling - a sample is drawn from the whole population, without replacement and with equal probability of selection for every possible sample. Methods of simple random sampling are:
a. The box method
b. Use of a table of random numbers
c. Use of a computer software package that generates random numbers
2.2. Systematic Sampling - a method of sampling wherein a sample is drawn by taking every kth unit in the population, starting from a unit drawn at random. This is used when there is a ready list of the total population. It is the most practical way of sampling.
2.3. Stratified Sampling - a sampling procedure wherein the population is divided into non-overlapping strata. Each stratum is homogeneous, and a random sample is drawn independently from each stratum. This scheme is used so that the different groups of a population are adequately represented in the sample.
2.4. Cluster Sampling - the total population is divided into a number of relatively small subdivisions, and some of these subdivisions or clusters are randomly selected for inclusion in the overall sample.
2.5. Multi-stage Sampling - the technique uses several stages or phases in getting the sample from the general population. However, selection of the sample is still done at random. It is useful in conducting nationwide surveys involving a large universe.
Sampling is advisable if the population is equal to or greater than 100; for a population of less than 100, total population or census is advisable. For a scientific determination of sample size, the formula below was suggested by Calmorin and Calmorin (1997):

Ss = (NV + Se²(1 − p)) / (NSe + V²p(1 − p))

Where:
Ss = sample size
N = total population size
V = the standard value (2.58) at the 1 percent level of probability with 0.99 reliability
Se = sampling error (0.01)
p = the largest possible proportion (0.50)
For instance, if the total population is 500, the standard value at the 1% level of probability is 2.58 with 99% reliability, the sampling error is 1% or 0.01, and the proportion of the target population is 50% or 0.50, then the sample size is computed as follows:
Given:
N = 500
V = 2.58
Se = 0.01
p = 0.50
Ss = (NV + Se²(1 − p)) / (NSe + V²p(1 − p))
Ss = (500 × 2.58 + (0.01)²(1 − 0.50)) / (500 × 0.01 + (2.58)²(0.50)(1 − 0.50))
Ss = 1290.00005 / 6.6641
Ss = 193.57 or 194
For a population of 500, the required sample size is therefore 194 subjects.
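The computation above can be checked with a short Python function (a minimal sketch; the function name is our own):

```python
# A minimal sketch of the Calmorin and Calmorin (1997) formula above
# (the function name is our own choice).
import math

def calmorin_sample_size(N, V=2.58, Se=0.01, p=0.50):
    """Ss = (N*V + Se^2*(1 - p)) / (N*Se + V^2*p*(1 - p))"""
    return (N * V + Se**2 * (1 - p)) / (N * Se + V**2 * p * (1 - p))

ss = calmorin_sample_size(500)
print(round(ss, 2))      # 193.57
print(math.ceil(ss))     # 194 subjects for a population of 500
```

Rounding up the result gives the 194 subjects reported above.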
Summary:
In gathering statistical information for data analysis, the researcher must:
1. identify the subject of the study;
2. delimit or determine the scope and coverage of the subject of the study;
3. determine the population and sample size;
4. determine the sampling methods or techniques to be utilized;
5. prepare the necessary data-gathering instruments for purposes of investigation.
There are two types of samples: the probability sample and the nonprobability sample.
Activity:
Choose the best answer among the choices.
1.The best random sampling design because every individual in the population has equal chance of
inclusion in the sample is
a. Stratified random sampling
b. Simple random sampling
c. Restricted random sampling
2.The sampling design in which all individuals in the population are arranged in methodical manner
and the nth name may be chosen in the construction of the sample is
a. Systematic sampling
b. Stratified random sampling
c. Unrestricted random sampling
3.The sampling design based on selecting the individuals as samples according to the criteria of the
researcher which serve as controls is
a. Quota sampling
b. Incidental sampling
c. Purposive sampling
d. Cluster sampling
4.The sampling design which is intended to improve the validity of the sample and is applicable
when the population being studied is homogeneous is
a. Cluster sampling
b.Simple random sampling
c. stratified sampling
5. Sampling design in which the population is grouped into small units such as blocks or districts is
a.Purposive sampling
b. Quota sampling
c. Cluster sampling
6. Sampling design in which the researcher simply takes the closest individuals as subjects of the study because they are most available is
study because they are most available is
a. Quota sampling
b. Purposive sampling
c.Cluster sampling
7. The sampling design which is popular in the field of opinion research is
a. Incidental sampling
b. Cluster sampling
c. Quota sampling
II. Compute the sample size for each of the following populations. Show your solution.
1. 230
2. 340
3. 570
4. 890
5. 2,300
CHAPTER II Organization and Presentation of Data
Overview:
Gathered data can be made more interesting by presenting them in the form of graphs and tables. For instance, readers do not appreciate reading a statistical report on the current population of the different countries of the world if the report is just a list of numbers running from one paragraph to another.
Data types that are tabulated are frequency distributions, correlated data, and time series data. There is no need to construct a frequency distribution if the number of observations is less than 30. Data presented in a frequency distribution table are called grouped data, and those that are not are called ungrouped data.
Objectives
1. present the data in different forms.
2. determine the appropriate graph for particular information.
3. construct a frequency table.
Overview:
After applying the different methods of collecting data, the raw data gathered from primary or
secondary sources should be organized and presented in summarized form. This lesson focuses on
the different forms of data presentation, and the different types of graphs and charts.
Objectives :
1. Textual. This form of presentation combines text and figures, presenting the data in paragraph form.
2. Tabular. This form of presentation uses a statistical table that shows the data in a more concise and systematic manner. The table facilitates the analysis of relationships among data.
3. Graphical. This form of presentation is the most interesting and the most effective means of organizing and presenting statistical data. The important relationships among data can easily be seen merely by looking at colorful figures that are creatively designed.
Different types of graphs/charts
A. Area. This type of chart displays quantitative data graphically. It is based on the line chart. The area between the axis and the line is commonly emphasized with colors, textures, and hatchings. Commonly one compares two or more quantities with an area chart.
B. Bar. This type of data presentation is composed of bars or rectangles of equal width. Bars can be drawn horizontally or vertically, in single or paired bar graphs. The length of each rectangle is proportional to the frequency of the observed item or the magnitude of the class interval being studied. Information can easily be drawn by reading this graph in two dimensions. It can be made more interesting if different colors or shades are applied to give distinction to each bar. In some cases, bars can be drawn in opposite directions to illustrate contrasting situations.
A bar chart with vertical bars has categories on the x-axis; a bar chart with horizontal bars has categories on the y-axis.
b. Column. This is a data visualization where each category is represented by a rectangle, with the height of the rectangle proportional to the value being plotted. Column charts are also known as vertical bar charts.
C. Pie Chart. This represents the relationships of the different components of a data set. It is the ideal graph if you want to show the partition of a whole. The angles or sectors should be proportional to the percentage components of the data. The use of different colors or legends is helpful to identify each component easily.
a. Doughnut. This is a built-in chart type. Doughnut charts are meant to express a "part-to-whole" relationship, where all pieces together represent 100%. Doughnut charts work best to display data with a small number of categories.
D. Line Graph. This type of data presentation shows relationships between two sets of quantities. It is often used to predict growth trends, such as sales and population, over a long period of time.
E. Scatter. This type illustrates the relationship between two variables; points are plotted in a Cartesian plane. It is like making a line graph except that there is no need to connect the points.
To facilitate making graphs, you can use Microsoft Excel to create your chart. Excel will guide you through the steps of selecting the chart type and adding chart titles and labels. Before starting, select the data or range that you want to convert into a chart. The following discussion is a step-by-step procedure on how to create a chart.
Example:
Six months of births of female and male babies.
X Y
20 35
30 25
40 65
50 45
60 50
70 80
1. Select the range A1:A7. Hold down the Ctrl key and then select the range B1:B7. (Both ranges of data will appear on the chart.)
2. Click the Insert tab on the toolbar, then click Recommended Charts. A box will open as shown in Figure 2.1.
3. Click All Charts if you want to view all the types of charts. Click Column, or any other type of chart you want to use, in the Chart type list, and then select the first chart sub-type in the second row. Click the Press and Hold to View Sample button; a dialog box will open. At this point you will see how your chart will look.
Note: You can select the data you want in the chart and press ALT+F1 to create a chart immediately, but it might not be the best chart for the data. If you don't see a chart you like or want to use, select the Change Chart Type or All Charts tab to see all chart types.
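If Excel is not available, the same X/Y table can be charted with Python's matplotlib library (a hypothetical alternative to the Excel steps above; the title, labels, and file name are illustrative choices):

```python
# Chart the six-month X/Y table with matplotlib instead of Excel
# (title, axis labels, and output file name are illustrative choices).
import matplotlib
matplotlib.use("Agg")             # draw off-screen; no display required
import matplotlib.pyplot as plt

x = [20, 30, 40, 50, 60, 70]      # values from column A (A2:A7)
y = [35, 25, 65, 45, 50, 80]      # values from column B (B2:B7)

fig, ax = plt.subplots()
ax.bar([str(v) for v in x], y)    # a column (vertical bar) chart
ax.set_title("Six Months of Births")
ax.set_xlabel("X")
ax.set_ylabel("Y")
fig.savefig("births_chart.png")   # export the finished chart
```

Swapping `ax.bar` for `ax.barh` would give a horizontal bar chart, and `ax.plot` a line graph.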
STEPS IN INTERPRETING GRAPHS, CHARTS, AND TABLES
1. Read the title of the graph, chart, or table. The title tells what information is being displayed.
2. Look at the legend of the graph, chart, or table. It explains the symbols and colors used in the graph or chart.
3. Read the labels of the graph, chart, or table. The labels tell you what variables or parameters are being displayed.
4. Draw conclusions based on the data. You can reach conclusions faster with graphs or charts than with a data table or a written description of the data.
Summary:
Chart Type with Description
Important characteristics of a large mass of data can be readily assessed by grouping the data into different classes and then determining the number of observations that fall in each class. To obtain information quickly from numerical data, the data must be organized in some systematic fashion, such as in the form of a frequency distribution.
A frequency table is a device for organizing and representing grouped data. When the data contain more than 30 cases, a frequency distribution table is constructed to make the task more manageable and to save time in calculating different statistics. The following steps in constructing a frequency table are helpful.
Objectives:
Content:
Step 1: Determine the range (R), the difference between the highest and lowest values in the data set.
Step 2: Determine the class size (i) by dividing the range by the desired number of class intervals. The number of classes for a frequency distribution table varies from 5 to 20, depending mainly on the number of observations in the data set. It is preferable to have more classes as the size of the data increases. The decision about the number of classes is made by the data organizer.
Step 3: Determine the number of observations falling into each class interval, i.e., find the class frequencies. This is done by using a tally or score sheet.
Example:
Construct the frequency distribution table of the data of the ages of patients in Hospital Q, May 2000
25 28 27 30 32 25 31 26 29 6
31 20 21 32 18 50 53 60 50 54
45 40 37 25 20 27 32 24 29 30
25 24 10 12 15 28
Solution:
Steps:
1. Determine the range: R = 60 - 6 = 54.
2. Determine i (the class size). Divide the range by a convenient number of classes of the same size. Here we choose 6 as the number of classes (n):
i = 54/6
i = 9
3. Construct a frequency distribution table having 6 classes of size 9.
4. Tally the data and determine the frequency of each class.
5. Determine the class mark or midpoint and the class boundaries.
Since the lowest value is 6, this becomes the lower limit of the first class interval. The upper limit of this interval is 14, obtained by adding (9 - 1) or 8 to 6. The lower limit of each succeeding interval is found by adding the class size (i = 9) to the previous lower limit: 6 + 9 = 15, 15 + 9 = 24, then 33, 42, 51, 60. The same procedure gives the upper limits: 14 + 9 = 23, 23 + 9 = 32, then 41, 50, 59, 68.
a. Class Mark
The class mark is the midpoint of the class interval: add the lower limit and the upper limit, then divide by two.
b. Class Boundary
The class boundary is also known as the exact limit. It is obtained by subtracting 0.5 from the lower limit of each interval and adding 0.5 to the upper limit. (Refer to Table 1.1.)
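The steps above can be sketched in Python (a minimal illustration using the Hospital Q ages; variable names are our own, and the loop extends the intervals until the largest age is covered, which reproduces the seven classes 6-14 through 60-68):

```python
# A minimal sketch of the frequency-table steps for the Hospital Q ages.
import math

ages = [25, 28, 27, 30, 32, 25, 31, 26, 29, 6,
        31, 20, 21, 32, 18, 50, 53, 60, 50, 54,
        45, 40, 37, 25, 20, 27, 32, 24, 29, 30,
        25, 24, 10, 12, 15, 28]

n_classes = 6
rng = max(ages) - min(ages)          # Step 1: range = 60 - 6 = 54
i = math.ceil(rng / n_classes)       # Step 2: class size = 54/6 = 9

table = []
lower = min(ages)
while lower <= max(ages):            # Steps 3-4: tally each interval
    upper = lower + i - 1
    freq = sum(lower <= a <= upper for a in ages)
    mark = (lower + upper) / 2       # Step 5: class mark
    table.append((lower, upper, freq, mark))
    lower += i

for lo, hi, f, mark in table:
    # class boundaries are the class limits minus/plus 0.5
    print(f"{lo}-{hi}: f={f}, class mark={mark}, boundaries {lo - 0.5}-{hi + 0.5}")
```

Running this reproduces the tally for each interval, e.g. 20 of the 36 patients fall in the 24-32 class.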
After the data have been collected and tabulated, the next step is to sketch the graph to make the data more presentable, easier to understand, and more appealing to the reader.
FREQUENCY HISTOGRAM (refer to Table 1.1). The frequency is represented on the vertical axis and the class intervals on the horizontal axis. The ordered pairs of points on the vertical and horizontal axes are plotted by placing bars in the graph area.
(Figure: frequency histogram of the ages, with frequency on the vertical axis and the class intervals 6-14, 15-23, 24-32, 33-41, 42-50, 51-59, and 60-68 on the horizontal axis.)
FREQUENCY POLYGON (refer to Table 1.2 above). Points connected by line segments are utilized in the frequency polygon.
(Figure: frequency polygon over the same class intervals, 6-14 through 60-68.)
CUMULATIVE FREQUENCY OGIVE. The ogive is commonly used in statistical reports and texts.
(Figure: cumulative frequency ogive over the class intervals 6-14 through 60-68.)
(Figure: relative frequency polygon, with relative frequency in percent on the vertical axis and age in years, 6-14 through 60-68, on the horizontal axis.)
Summary:
The advantage of a frequency distribution is that it condenses and simplifies data without losing the essential details. The frequency distribution achieves condensation of the data by losing the identity of the individual values. Despite this loss of identity, a great deal is gained by the condensation:
1. All the information revealed by the array can be obtained from the frequency distribution with greater ease.
2. The distribution shows clearly where the individual values concentrate.
3. With the data formed into a frequency distribution, comparisons between two or more series can be made more readily; frequency tables are indispensable for speeding up the computation of many other descriptive measures.
Activity:
Construct a distribution of the following amounts of sulfur oxide (kg) emitted by an industrial plant on 80 days:
158 264 112 110 204 147 162 205 208 133
181 248 261 209 214 180 243 118 179 187
128 155 192 77 225 193 94 139 286 194
216 135 246 200 241 90 176 167 169 235
184 257 201 132 237 107 190 145 181 318
285 266 201 170 223 275 239 175 98 227
152 230 296 219 105 173 62 180 229 246
194 123 159 227 268 191 185 144 83 259
1. Construct a frequency distribution table and graph: Frequency polygon and Frequency Histogram
2. Construct the less than and greater than cumulative frequency ogive
3. Construct a relative frequency table and graph relative frequency polygon
Note: use 7 as the number of classes.
Chapter 3. Descriptive Statistics
Overview:
Descriptive statistics are used to describe the basic features of the data in a study. They provide
simple summaries about the sample and the measures. Together with simple graphics analysis,
they form the basis of virtually every quantitative analysis of data.
Objectives:
1. Determine the characteristics of the mean, median and mode.
2. Compute the mean, median and mode of the ungrouped data and grouped data.
3. Determine the use of mean, median and mode
Content:
Mean:
Solution for the population mean:
μ = Σx / N
μ = (500 + 450 + 460 + 450 + 400) / 5
μ = 2260 / 5
μ = 452 patients
Solution for the sample mean:
x̄ = Σx / n
x̄ = (5 + 12 + 15 + 20 + 15 + 24 + 0 + 24 + 30 + 10 + 16 + 18) / 12
x̄ = 189 / 12
x̄ = 15.75 hours
Median:
The median is the value of the middle term in a data set that has been ranked in increasing or decreasing order. As is obvious from the definition, the median divides a ranked data set into two equal parts. The calculation of the median consists of the following two steps:
1. Rank the data set in increasing or decreasing order.
2. Find the middle term. The value of this term is the median.
Note that if the number of observations in a data set is odd, the median is given by the value of the middle term in the ranked data. However, if the number of observations is even, the median is given by the average of the values of the two middle terms. To illustrate this, consider the following values:
Examples:
1.When the number of observations is odd, say n= 9. Find the median of: 13 43 23 20 51
64 49 80 55
Solution:
Arrange the data in ascending or descending order.
13 20 23 43 49 51 55 64 80
The median is the middle score of the data; therefore, 49 is the median.
2. When the number of observations is even, say n=10.
Find the median of: 34 56 89 42 26 14 28 56 78 98
Solution:
Arrange the data in ascending or descending order.
14 26 28 34 42 56 56 78 89 98
The median is the average of the two middle scores; therefore, taking the average of the 5th and 6th values:
Median = (42 + 56) / 2 = 49
Mode
In statistics, the mode is the value that occurs with the highest frequency in a
data set. If there is no common score, the said data has no mode. A
distribution with only one mode is said to be unimodal while a distribution
with two or more modes is described as multi-modal.
Examples:
1. Find the mode of the following scores:
0 1 3 5 3 3 8 9 3 4
Solution:
Simply take the most frequently appearing value. The mode of the given data is 3.
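The worked examples above can be verified with Python's built-in statistics module:

```python
# The ungrouped-data examples above, checked with the statistics module.
import statistics

patients = [500, 450, 460, 450, 400]
print(statistics.mean(patients))         # 452

hours = [5, 12, 15, 20, 15, 24, 0, 24, 30, 10, 16, 18]
print(statistics.mean(hours))            # 15.75

odd_set = [13, 43, 23, 20, 51, 64, 49, 80, 55]       # n = 9 (odd)
print(statistics.median(odd_set))        # 49

even_set = [34, 56, 89, 42, 26, 14, 28, 56, 78, 98]  # n = 10 (even)
print(statistics.median(even_set))       # 49.0, the average of 42 and 56

scores = [0, 1, 3, 5, 3, 3, 8, 9, 3, 4]
print(statistics.mode(scores))           # 3
```

The module sorts the data internally, so the raw (unranked) values can be passed directly.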
Mean of Grouped Data:
Mean = Σfx / N
where f is the frequency, x is the class mark or midpoint, and N is the total frequency.
Class Interval f x fx
40 – 44 4 42 168
45 – 49 3 47 141
50 – 54 4 52 208
55 – 59 3 57 171
60 – 64 10 62 620
65 – 69 2 67 134
70 – 74 5 72 360
75 – 79 8 77 616
80 – 84 3 82 246
85 – 89 6 87 522
90 – 94 2 92 184
Total: N = 50, Σfx = 3,370
x̄ = Σfx / N = 3,370 / 50 = 67.4 is the mean of the scores of the 50 students.
Median:
To compute the median from grouped data we must also determine the "less than" cumulative frequency (<CF). The median is the sum of the lower boundary of the median class and a fractional part of the class interval size:
Md = Lm + ((N/2 − <CF) / f) i
Where:
Md = median
Lm = lower boundary of the median class
N = total frequency
<CF = less than cumulative frequency below the median class
f = frequency of the median class
i = class interval size
Table 3.1.1 Scores of Students in Statistics
Class Interval f x fx <CF
40 – 44 4 42 168 4
45 – 49 3 47 141 7
50 – 54 4 52 208 11
55 – 59 3 57 171 14
60 – 64 10 62 620 24
65 – 69 2 67 134 26
70 – 74 5 72 360 31
75 – 79 8 77 616 39
80 – 84 3 82 246 42
85 – 89 6 87 522 48
90 – 94 2 92 184 50
Total: N = 50, Σfx = 3,370
Md = Lm + ((N/2 − <CF) / f) i
Md = 64.5 + ((25 − 24) / 2) (5)
Md = 64.5 + (0.5)(5)
Md = 67
Mode
The mode in a frequency distribution lies within the class interval with the highest frequency. The class interval with the highest frequency is known as the modal class. A crude mode may be determined by taking the class mark with the highest frequency. However, this rough approximation may be improved by considering the frequencies adjoining the modal class.
Mo = Lm + (∆1 / (∆1 + ∆2)) i
Where:
Lm is the lower boundary of the modal class (the class interval with the highest frequency)
∆1 is the difference between the highest frequency and the frequency of the class above it
∆2 is the difference between the highest frequency and the frequency of the class below it
i is the class interval size
Using Table 3.1.1:
Class Interval    f
40 – 44           4
45 – 49           3
50 – 54           4
55 – 59           3
60 – 64          10    ← modal class
65 – 69           2
70 – 74           5
75 – 79           8
80 – 84           3
85 – 89           6
90 – 94           2
Mo = Lm + (∆1 / (∆1 + ∆2)) i
Mo = 59.5 + (8 / (8 + 7))(5)
Mo = 59.5 + 2.67
Mo = 62.17
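The modal-class computation can be verified with a small sketch; Δ1 and Δ2 are taken against the next higher and next lower classes, as in the worked example:

```python
# Grouped-data mode: Mo = Lm + (d1 / (d1 + d2)) * i
freqs  = [4, 3, 4, 3, 10, 2, 5, 8, 3, 6, 2]   # frequencies for 40-44 ... 90-94
lowers = list(range(40, 95, 5))                # lower limits of each class
i = 5

k = freqs.index(max(freqs))                    # modal class: 60-64 (f = 10)
d1 = freqs[k] - freqs[k + 1]                   # vs. next higher class (65-69): 10 - 2 = 8
d2 = freqs[k] - freqs[k - 1]                   # vs. next lower class (55-59): 10 - 3 = 7
mode = (lowers[k] - 0.5) + (d1 / (d1 + d2)) * i
print(round(mode, 2))  # 62.17
```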
Summary:
A measure of central tendency is a summary statistic that
represents the center point or typical value of a dataset. These measures
indicate where values in a distribution fall and are also referred to as the
central location of a distribution. You can think of it as the tendency of
data to cluster around a middle value. In statistics, the three most
common measures of central tendency are the mean, median, and mode.
Each of these measures calculates the location of the central point using a
different method. Choosing the best measure of central tendency depends
on the type of data you have.
Activity:
Refer to table 3.1.2
1. Find the mean AIDS cases of the four hospitals.
2. Which hospital has the highest mean?
3. Which year had the highest number of AIDS cases?
4. Which year had the lowest?
Table 3.1.2. AIDS Cases
Year    Hospital A    Hospital B    Hospital C    Hospital D
1999       500           200           211           100
2000       400           350           250           100
2001       100           140           620           250
2002        80           140           401           300
2003        50           175           200           180
B. Find the mean, median, and mode of the frequency distribution below:
Class Interval    f
95 – 99          10
90 – 94          20
85 – 89          25
80 – 84          28
75 – 79          12
70 – 74           8
Quartiles
For quartiles, k = 1, 2, 3.
Qk = LQk + ((kN/4 − <CF) / fQk) i
Where:
LQk = lower boundary of the class interval where the quartile class is found
<CF = “less than” cumulative frequency before the quartile class
fQk = frequency of the quartile class
i = class interval size
Deciles:
For deciles, k = 1, 2, 3, …, 9.
Dk = LDk + ((kN/10 − <CF) / fDk) i
Percentiles:
For percentiles, k = 1, 2, 3, …, 99.
Pk = LPk + ((kN/100 − <CF) / fPk) i
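Because the quartile, decile, and percentile formulas differ only in the divisor (4, 10, or 100), one hypothetical helper covers all three; `grouped_quantile` is an illustrative name, not from the module:

```python
# Generic grouped quantile: Lq + ((k*N/m - CF) / f) * i,
# where m = 4 (quartiles), 10 (deciles), or 100 (percentiles).
def grouped_quantile(intervals, k, m):
    N = sum(f for _, _, f in intervals)
    target = k * N / m
    cf = 0
    for lo, hi, f in intervals:
        if cf + f >= target:         # quantile class reached
            return (lo - 0.5) + ((target - cf) / f) * (hi - lo + 1)
        cf += f

intervals = [(40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3),
             (60, 64, 10), (65, 69, 2), (70, 74, 5), (75, 79, 8),
             (80, 84, 3), (85, 89, 6), (90, 94, 2)]

q2  = grouped_quantile(intervals, 2, 4)     # second quartile
p50 = grouped_quantile(intervals, 50, 100)  # 50th percentile
print(q2, p50)  # both equal the median, 67.0
```

Q2, D5, and P50 all locate the same point, which is a quick self-check: they must agree with the grouped median.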
Summary:
The most common and most widely used point measure is the percentile. Any of the
foregoing point measures can give the values obtained from a set of observations a
common, meaningful frame of reference. If an individual is given a percentile value of
75, this means that in a typical sample of 100, he would excel above 75 individuals of
lower rank. The quartile is used when the researcher wants to determine within which
class limits one-fourth of the cases fall, and the decile when the division is into tenth
parts.
Activity:
A. Find the IQ scores that fall at the following positions: Q2, D4, D6, D3, P10, P75, P90
IQ Scores of 22 students.
87 90 95 96 97 98 99 100 100 100 100
88 102 102 103 105 105 105 107 108 110
B. 1. Find the Q3, Q1, D5, D8, P25, P75, P50 of the distribution below. Write the
formula and the solution.
Class Interval Frequency
7-9 13
10-12 18
13-15 25
16-18 20
19-21 17
22-24 10
Overview:
The measures of variation enable us to know how varied the observations
are: whether there are extreme values in the distribution, or whether the
values are very close to each other. If the measure is zero, there is no
variation at all; the observations are all alike, or homogeneous.
Otherwise, they are heterogeneous. The common measures of variation are
the range, variance, standard deviation, and coefficient of variation.
Objectives:
1. Determine the measures of variability.
2. Describe the characteristics of each measure of variability.
3. Compute the range, mean absolute deviation (MAD), variance, standard deviation (SD), and coefficient of variation (CV).
Content:
Range
The range is the simplest measure of variation of a distribution. To get
the range, subtract the lowest score or observation from the highest score.
Mean Absolute Deviation (MAD)
MAD = Σ|x − x̄| / N     for ungrouped data
MAD = Σf|x − x̄| / N    for grouped data
Ex. Find the MAD of the ages of scientists: 34, 35, 45, 56, 32, 25, and 40.
Solution:
Find the mean: (34 + 35 + 45 + 56 + 32 + 25 + 40) / 7 = 38.14
x       x − x̄     |x − x̄|
34      −4.14       4.14
35      −3.14       3.14
45       6.86       6.86
56      17.86      17.86
32      −6.14       6.14
25     −13.14      13.14
40       1.86       1.86
Total              53.14
MAD = 53.14 / 7 = 7.59; that is, on average the scientists’ ages deviate by about
7.59 years from the mean age of 38.14 years.
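The MAD example can be reproduced as:

```python
# Mean absolute deviation: MAD = sum(|x - mean|) / N
ages = [34, 35, 45, 56, 32, 25, 40]
mean = sum(ages) / len(ages)                        # 38.14...
mad = sum(abs(x - mean) for x in ages) / len(ages)
print(round(mean, 2), round(mad, 2))  # 38.14 7.59
```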
Variance
The variance is another measure of variation which can be used instead of the range.
The variance considers the deviation of each observation from the mean. To obtain
the variance of a distribution, compute the deviation of each raw score from the
mean, square the deviations, and add them. Finally, divide the resulting sum by N,
the total number of cases.
A. Grouped Data:
1. Population Variance for Grouped Data
σ² = Σf(x − x̄)² / N
2. Sample Variance for Grouped Data (computational form)
s² = [N Σfx² − (Σfx)²] / [N(N − 1)]
B. Ungrouped Data:
1. Population Variance for Ungrouped Data
σ² = Σ(x − x̄)² / N
2. Sample Variance for Ungrouped Data
s² = Σ(x − x̄)² / (N − 1)
The standard deviation is the square root of the variance:
σ = √σ²    (population)
s = √s²    (sample)
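A minimal sketch contrasting the population divisor N with the sample divisor N − 1, reusing the scientists’ ages from the MAD example:

```python
# Population variance divides by N; sample variance divides by N - 1.
ages = [34, 35, 45, 56, 32, 25, 40]
n = len(ages)
mean = sum(ages) / n
ss = sum((x - mean) ** 2 for x in ages)   # sum of squared deviations

pop_var = ss / n
samp_var = ss / (n - 1)
pop_sd, samp_sd = pop_var ** 0.5, samp_var ** 0.5
print(round(pop_var, 2), round(samp_var, 2))  # 86.69 101.14
```

The sample variance is always the larger of the two, since it divides the same sum of squares by a smaller number.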
Coefficient of Variation
The coefficient of variation is a measure of relative variability. It may be defined
as the ratio of the standard deviation to the arithmetic mean, expressed as a
percentage. This measure is used to compare two sets of data to determine whether
they are similarly or differently “scattered”. The CV formula is:
CV = (standard deviation / mean) × 100%
CVfemale = (4 / 148) × 100% = 2.70%
Comparing the relative variation in the heights of the male and female students,
it can be seen that the male students have a higher CV than the female students.
Thus, the male students’ heights are more varied.
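Using the figures shown for the female students (mean height 148, s.d. 4; the male figures are not reproduced in this module), the CV computation looks like:

```python
# Coefficient of variation: CV = (sd / mean) * 100%
def cv(sd, mean):
    return sd / mean * 100

cv_female = cv(4, 148)       # from the worked example
print(round(cv_female, 2))   # 2.7 (i.e., 2.70%)
```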
Summary:
To facilitate easy and accurate computation of the mean, standard deviation,
and variance, a scientific calculator may be used. Note: every calculator works
differently, so you should know how your calculator works to perform the task.
Steps in finding the mean and the standard deviation for ungrouped data:
1. Clear the memory.
The memory of your calculator may be cleared by pressing the reset button at
the back or by pressing the Shift key, then AC/ON, then the = button.
2. Set the calculator to the SD mode or its equivalent (Stat, Stat0, or Stat 1
mode).
3. Input the data one at a time followed by the M+ key.
4. When all the data have been stored, press the shift-keys followed by the =
button for the mean.
5. Using the Shift key with the standard-deviation function, the standard deviation is obtained.
6. The variance is simply calculated by squaring the standard deviation.
Activity:
1. At SKSU-Tacurong Campus, upon the release of the CHED allowance of the
scholars, twenty scholars pledged the following donations in pesos to the
PPA of nurses during the pandemic.
50 100 75 40 60 50 200 25 30 35
50 45 60 100 200 200 150 150 100 45
Compute the range, MAD, variance, standard deviation, and coefficient of
variation. Show the formula and solution. Check your computation using the
calculator program.
2. Calculate the population and sample standard deviation of the following
scores:
15 15 25 13 32 30 23 26 65 45 44
Overview :
This special topic is included to provide health science students, especially students in
the nursing program and biology students conducting health-related research, with basic
information on essential health care statistics, the use of statistical formulae, and the
interpretation of statistical calculations for the analysis of patient health conditions.
Objectives:
1. Identify the parameters in health research.
2. Determine the data sampling and collection in health research.
3. Calculate and process health data for analysis.
Overview:
Health statistics data are data that are collected from hospital in-patients and out-
patients and recorded by those who work in the health care industry.
Quantitative research guides health care decision makers with statistics: numerical
data collected from measurements or observations that describe the characteristics
of specific population samples. Descriptive statistics summarize the utility, efficacy,
and costs of medical goods and services. Increasingly, health care organizations
employ statistical analysis to measure their performance outcomes.
Objectives:
1. Define health data.
2. Identify the vital statistics.
3. Determine the importance of the vital statistics.
Content:
Morbidity and Mortality
Morbidity pertains to disease cases; morbidity data are obtained to supply
information on the occurrence of disease. Mortality, on the other hand, refers to
death cases and provides data on the occurrence of death.
Demographic Variables
Since health care is concerned with human health conditions, the characteristics of
human populations must be studied. Data which describe the human population, such
as age, gender, income, and health status, need to be considered in health care
analysis. These variables are referred to as demographic variables. The size of the
human population, and how it changes over time, is also considered a demographic
variable.
Vital Statistics
Data which show significant records of events and dates in a human life are vital
statistics. A few vital statistics that are important to record are birth, death,
marriage, mortality, and morbidity.
Data Requestors
Health statistics data are also important to the hospital administration to determine
and assess the quality of service rendered by its staff to the patients. The following
are the usual requestors:
1. administration and governing board
2. medical staff
3. outside agencies: DOH, LGUs, researchers from the academe, pharmaceutical
companies, economists, and others
4. other organizations
The ten great public health achievements identified by the CDC, made possible only
by health statistics and research, are:
1. Routine immunization of children
2. Motor-vehicle safety
3. Workplace safety
4. Control of infectious diseases
5. Declines in deaths from heart disease and stroke
6. Safer and healthier foods
7. Healthier mothers and babies
8. Family planning
9. Fluoridation of drinking water
10. Recognition of tobacco as a health hazard
Two major achievements in the 21st century made possible by health statistics and
research are:
Personalized medicine: In personalized medicine, an individual's genetic
profile and his or her unique biochemistry are used to customize treatment.
For example, which medicines are likely to provide the best results with the
fewest side effects?
Disease modification: If a person is diagnosed early enough, it might be
possible to inhibit the disease so that it never debilitates the person.
The other benefits of health statistics and research are in the fields of prolonging life;
preventing diseases by identifying lifestyle risk factors; preventing infectious diseases
through vaccines and randomized controlled trials; preventing disabilities; improving
access to health services; and understanding cultural norms and working around them.
Summary:
Health statistics data must be handled carefully and with full knowledge. They must be
properly processed and analyzed to arrive at valid results that serve the intended
purpose. Hospitals and other large provider service organizations implement data-
driven, continuous quality improvement programs to maximize efficiency.
Government health and human service agencies gauge the overall health and well-
being of populations with statistical information. Researchers employ scientific
methods to gather data on human population samples. The health care industry
benefits from knowing consumer market characteristics such as age, sex, race,
income and disabilities. These "demographic" statistics can predict the types of
services that people are using and the level of care that is affordable to them.
Activity:
1. Where can a researcher collect vital health statistics of patients positive for
the COVID-19 virus in Region 12?
2. Suppose you are researching medicines effective against pneumonia.
Give at least 5 possible data items you would gather in hospitals to address your
research problem.
3. Give the importance of demographic health profiles.
Overview:
Health care facilities devise and use rates to determine the percentage of an event.
The formulae in the following section are based on the rate formula, where n is the
number of times something happens and N is the number of times it could have
happened.
Objectives:
1. Compute the death rates and morbidity rates.
2. Identify the data needed for death and morbidity rates.
3. Analyze the implication of the result of the computed rates.
Content:
Death or mortality rates may be classified as gross death rate (GDR) or net death rate
(NDR). The gross death rate represents the death rate including all deaths, while the
net death rate represents the death rate excluding deaths within 48 hours of
admission. The following formulae are used:
a. GDR = (number of deaths / total number of discharges) × 100%
b. NDR = [(number of deaths − deaths under 48 hours) / (total discharges − deaths under 48 hours)] × 100%
The formula below is applied to determine the newborn death rate (NBDR):
c. NBDR = (number of newborn deaths / total number of newborn discharges, including deaths) × 100%
Ex. The following data were obtained from MCU-FDTMF. Calculate the GDR, NDR,
and NBDR.
                  Admission    Discharges    Deaths < 48 hrs    Deaths ≥ 48 hrs
Adult/children       285          301               2                 13
Newborn               12           19               1                  3
Solution:
GDR = (19 / 320) × 100% = 5.94%
NDR = [(19 − 3) / (320 − 3)] × 100% = (16 / 317) × 100% = 5.05%
Thus, the GDR is 5.94 deaths for every 100 discharges.
NBDR = (4 / 23) × 100% = 17.39%
Hence, 17.39% is the newborn mortality rate.
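Following the example’s arithmetic (note that the newborn denominator, 23, adds the 4 newborn deaths to the 19 newborn discharges), the three rates can be sketched as:

```python
# Hospital death rates from the worked example (MCU-FDTMF data).
deaths_total = 2 + 13 + 1 + 3          # all deaths: adult/children + newborn
deaths_under_48h = 2 + 1               # deaths within 48 hours of admission
total_discharges = 301 + 19            # adult/children + newborn discharges
newborn_deaths = 1 + 3
newborn_discharges = 19 + newborn_deaths   # the example counts deaths in the denominator

gdr = deaths_total / total_discharges * 100                  # gross death rate
ndr = (deaths_total - deaths_under_48h) / (total_discharges - deaths_under_48h) * 100
nbdr = newborn_deaths / newborn_discharges * 100             # newborn death rate
print(round(gdr, 2), round(ndr, 2), round(nbdr, 2))  # 5.94 5.05 17.39
```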
Morbidity rate
Morbidity pertains to disease. The morbidity rate can be measured or calculated in
four ways, according to prevalence, incidence, complications, and fatality.
a. Prevalence rate
The prevalence rate is the ratio between cases of a known disease and the entire
population.
Prevalence = (number of cases / population) × 100%
Example:
A total of 42,325 tuberculosis cases are known in the country. Compute the
prevalence of the disease if the current Philippine population is 87 million.
Solution:
Prevalence = (42,325 / 87,000,000) × 100% = 0.0486%
Therefore, the prevalence of tuberculosis in the country is 0.0486 per 100
population, or 4.86 per 10,000 population.
b. Incidence Rate:
While prevalence refers to existing, known cases of a disease, the incidence
rate refers to the rate of newly reported cases of the disease.
Of the 87 million population of the country in the recent year, 150,000 newly
reported cases of diabetes mellitus were reported to the Health Department.
Determine the incidence of diabetes mellitus.
Solution:
Incidence = (150,000 / 87,000,000) × 100% = 0.1724%
The incidence of diabetes mellitus is about 1.724 per 1,000 population.
c. Complication Rate
A complication refers to a disorder which arises after admission and modifies the
patient’s condition. Medical malpractice may result in a complication or even
the death of a patient.
Complications = (complication cases / population at risk) × 100%
In January 2006, 3 out of the 62 cancer patients had undergone a surgical
procedure due to complications. Calculate the complication rate.
Solution:
Complications = (3 / 62) × 100% = 4.84%
The complication rate, therefore, is 4.84%, or about 4.84 cases per 100
patients at risk.
d. Fatality Rate
The fatality rate is the rate of death cases due to a particular disease.
Example: If there were 75,895 hypertension cases reported in the year 2000 and
3,612 died, what is the fatality rate of hypertension that year?
Solution:
Fatality = (3,612 / 75,895) × 100% = 4.76%
The data reveal a fatality rate of 4.76 per 100 cases due to hypertension.
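All four morbidity rates are instances of the basic rate formula (n / N) × 100%; the worked examples can be reproduced with one small helper (`rate` is an illustrative name):

```python
# Morbidity rates: each is (cases / reference population) * 100%.
def rate(n, N):
    return n / N * 100

prevalence   = rate(42_325, 87_000_000)   # known TB cases in the population
incidence    = rate(150_000, 87_000_000)  # newly reported diabetes cases
complication = rate(3, 62)                # post-admission complications
fatality     = rate(3_612, 75_895)        # deaths among hypertension cases
print(round(prevalence, 4), round(incidence, 4),
      round(complication, 2), round(fatality, 2))  # 0.0486 0.1724 4.84 4.76
```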
Summary:
1. If you cannot measure it, you cannot improve it: meaningful quality
improvement must be data-driven.
2. Managed care means managing the processes of care, not managing
physicians and nurses.
3. The right data in the right format at the right time in the right hands: if
clinicians are going to manage care, they definitely need data. They need the
right data delivered in the right format, at the right time, and in the right
place. And the data have to be delivered into the right hands, the clinicians
involved in operating and improving any given process of care.
Activity:
(Note: the data used in the problems were just for computation purposes only)
1. Calculate:
a. Gross Death Rate
b. Net Death Rate
c. Fetal Mortality
2. A total of 12,000 Filipinos were diagnosed with HIV in 2000 of which 4,500
are females and the rest are males. If the Philippines population at that time
was 85 million, determine the following:
a. Prevalence of HIV for every 1,000.
b. Prevalence of HIV in males for every 1,000.
c. Prevalence of HIV in females for every 1,000
Overview:
Testing the significance of the difference between two means, two standard deviations,
two proportions, or two percentages is an important area of inferential statistics.
Comparisons between two or more variables often arise in research or in experiments, and
to make valid conclusions regarding the results of a study, one has to apply an appropriate
test statistic. This chapter deals with the different test statistics that are
commonly used in research studies.
Objectives:
1. Formulate hypotheses.
2. Discuss the level of significance and the probability of committing an error.
3. Compute the z- and t-tests.
4. Analyze and interpret the results of statistical testing.
Overview:
A statistical hypothesis is a preconceived idea about the value of a population parameter
which can be validated or verified through statistical procedures or tests. It is an assertion,
presumption, or tentative theory which aims to explain facts about the real world. In
attempting to reach decisions, it is advantageous to make assumptions about the target
populations. Such assumptions, which may or may not be correct, are called statistical
hypotheses.
Objectives:
1. Formulate the null and alternative hypothesis.
2. Differentiate the Type I and Type II errors.
3. Identify the steps in hypothesis testing
4. Discuss the three types of alternative hypothesis.
Content:
Null hypothesis: the hypothesis of no difference or no effect, denoted Ho; it is the
hypothesis actually tested.
Alternative hypothesis: any hypothesis that differs from a given null hypothesis is called an
alternative hypothesis, denoted Ha. It is sometimes considered the researcher’s
working hypothesis.
For example, if Ho: p = 0.5, the alternative hypothesis might be Ha: p ≠ 0.5, Ha: p > 0.5, or
Ha: p < 0.5.
Rejection of the null hypothesis leads to the acceptance of the alternative hypothesis.
1. Type I error or alpha error (α). A Type I error is committed when the
researcher rejects a null hypothesis when in fact it is true.
2. Type II error or beta error (β). A Type II error is committed when the
researcher accepts a null hypothesis when in fact it is false.
Level of Significance
When a researcher tests a hypothesis, he is not certain that the decision is 100%
correct. However, he is confident at a certain level that the decision is correct; say, 99% of
the decisions he makes are correct ones. The confidence level is then 99%, or the level of
significance is 1%. When the confidence level is 95%, the level of significance is 5%; when
the confidence level is 90%, the level of significance is 10%. The higher the confidence
level, the more certain we are that the decision to reject the null hypothesis is correct.
The level of significance is the probability of committing a Type I (alpha, α) error,
that is, the probability of rejecting a correct null hypothesis.
Power of a Test
Power of a test is the probability of not committing a Type II error or beta (β) error.
Test Statistics
The test statistic is a mathematical formula that allows researchers to determine the
likelihood of obtaining the sample outcomes if the null hypothesis were true. The value of the
test statistic is used to make a decision regarding the null hypothesis: it is the basis for
deciding whether to reject or accept it. The rejection region lies at either the left or right
tail of the normal curve if a one-tailed test is being used. On the other hand, the rejection
region lies at both tails of the normal curve if a two-tailed test is used.
Rejection Region. When the test statistics lies on the rejection region, then
the null hypothesis will be rejected.
Non-Rejection Region. The non-rejection region, also known as the acceptance
region, is the region of values for which the null hypothesis is not rejected;
under Ho its probability is 1 − α. When the test statistic lies within the non-
rejection region, the null hypothesis will be accepted; equivalently, the critical
value is greater than the computed value of the test statistic.
Critical Value. The critical value is a value that separates the non- rejection
region and the rejection region.
The use of a one-tailed or two-tailed test depends on how the alternative hypothesis is
formulated. If the alternative hypothesis is non-directional, use the two-tailed test;
if it is directional, use the one-tailed test. In a two-tailed test, the two rejection
regions lie at both tails of the normal curve, each holding half of the alpha value; if
α = 0.05, the area at each tail is α/2 = 0.025. In a one-tailed test, the rejection region
lies at either the left or the right tail of the normal curve.
[Sketches of the rejection regions on the normal curve:
Ha: μ > μo — right-tailed; rejection region of area α at the right tail, critical value z = 1.645 (α = 0.05); non-rejection region of area 1 − α.
Ha: μ ≠ μo — two-tailed; rejection regions of area α/2 at each tail.
Ha: μ < μo — left-tailed; rejection region of area α at the left tail, critical value z = −1.645 (α = 0.05).]
1. State or formulate the null hypothesis (Ho) and the alternative hypothesis (Ha).
2. Specify the level of significance (α) to be used. The level of significance is the statistical standard
specified for rejecting the null hypothesis (Ho). If a 5% level of significance is used, there is a
probability of 0.05 of rejecting Ho when it is true. The most frequently used levels of
significance in hypothesis testing are the 5% and 1% levels.
3. Select the most appropriate test statistic or statistical tool. There is a specific statistical tool or
test statistic appropriate for each kind of statistical hypothesis. Also identify the type of
statistical test, either one-tailed or two-tailed, depending on how the alternative hypothesis is
expressed.
4. Compute the actual value of the test statistic from the sample data (e.g., z-test, t-test, or F-test).
5. Establish the critical (rejection) region, or the tabular value for the selected test statistic from the
statistical table, based on the degrees of freedom (for the t-test and F-test) and the level of
significance (α). Take note of the type of statistical test to be used, whether one-tailed or
two-tailed, as determined in steps 1 and 3.
6. Make the decision, conclusion, and recommendation(s). The computed or observed value of the
sample statistic is compared with the tabular or critical value (or values) of the test statistic. This
is the basis for accepting or rejecting the null hypothesis. Accepting Ho implies rejecting the
alternative hypothesis (Ha); likewise, rejecting Ho means accepting Ha. Given below are
guidelines in making a decision for a given null hypothesis:
6.1. Reject the null hypothesis (Ho) if the computed value is greater than or equal (≥) to the
tabular value.
6.2. Accept the null hypothesis (Ho) if the computed value is less than (<) the tabular value.
Making conclusion and recommendation are the last part in hypothesis testing. At this point, the
researcher will explain his decision based on the result of his statistical analysis.
Summary:
Interpreting the outcome of the research does not end by simply saying the null hypothesis is
accepted or rejected. It is the primary obligation of the researcher to further explain the implication
of the result, drawing a conclusion that answers the original problem and making the necessary
recommendations; in some instances, this should be supported by the related review of literature.
Activity:
A. Formulate the null hypothesis (Ho) and 3 possible alternative hypotheses (Ha) for the following
problems, and identify whether each hypothesis test is left-tailed, right-tailed, or non-directional.
Illustrate the rejection region and indicate the percentage of acceptance.
1. A doctor wants to know if the average recovery time of a patient taking a particular
medication is one month. Consider a 5% level of significance.
2. A drug store wants to know if the average sale of paracetamol is more than 100 per day.
Lesson 5.2. z- test : Testing Hypothesis
Overview:
A z-test is a type of hypothesis test: a way for you to figure out if results from a test are valid or
repeatable. For example, if someone said they had found a new drug that cures cancer, you would
want to be sure it was probably true. A hypothesis test will tell you if it’s probably true, or probably
not true. A z-test is used when your data are approximately normally distributed (i.e., the data have
the shape of a bell curve when you graph them).
Objectives:
Content:
Z-test on the Comparison Between the Population Mean and Sample Mean
A significance test can be applied to test whether a mean based on a sample of size n differs
significantly, or otherwise, from a population mean μ. The one-sample z-test is a statistical test for the
mean of a population and can be used when the following requirements are satisfied:
1. We want to test the significant difference between the population mean (μ) and the sample
mean.
2. The population variance (σ²) or population standard deviation (σ) is known. However, if
the population standard deviation is not known, a z-test is still applicable provided that the
sample size is sufficiently large (n > 30) and the sample data are normally distributed.
The tabular or critical values of z are obtained from a standard normal (z) table.
Example:
A company that makes children’s battery-operated toy cars claims that its products have a mean life span of 5
years with a standard deviation of 2 years. Test the hypothesis that μ = 5 years against the alternative
hypothesis that μ ≠ 5 years if a random sample of 40 toy cars is tested and found to have a mean life span of only
3 years. Use the 0.05 level of significance.
1. Ho: The mean lifespan of the battery-operated toy cars is 5 years
(Ho: μ = 5 years)
Ha: The mean lifespan of the battery-operated toy cars is not 5 years
(Ha: μ ≠ 5 years)
2. α = 5% or 0.05; two-tailed
3. Use the z-test as the test statistic.
4. Computation:
Given: x̄ = 3, μ = 5, σ = 2, n = 40
Zc = (x̄ − μ) / (σ / √n)
Zc = (3 − 5) / (2 / √40)
Zc = −6.32
5. Critical value: Ztab = ±1.96 (two-tailed, α = 0.05).
6. Decision: Since |Zc| = 6.32 > Ztab = 1.96, reject Ho: μ = 5 years at the α = 5% level of
significance (two-tailed test) and accept the alternative hypothesis. Thus, there is enough
evidence to conclude that the mean life span of the toys is not equal to 5 years.
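The toy-car example can be checked in Python; the two-tailed p-value is derived from the standard normal CDF via `math.erf`:

```python
import math

# One-sample z-test: z = (xbar - mu) / (sigma / sqrt(n))
def z_test(xbar, mu, sigma, n):
    return (xbar - mu) / (sigma / math.sqrt(n))

z = z_test(3, 5, 2, 40)
# Two-tailed p-value from the standard normal CDF, Phi(x) = 0.5*(1 + erf(x/sqrt(2)))
p_two_tailed = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(z, 2), abs(z) > 1.96)  # -6.32 True -> reject Ho
```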
z-test: Testing the Differences between Two means (large Independent samples)
Zc = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)    if the population standard deviations are known
Zc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)    if the population standard deviations (σ) are unknown but n > 30
Example:
A tissue culture propagator of rare cardboard ornamental plant wants to validate the claim that there is
no significant difference in the mean survival of cardboard in culture media A and culture media B. He
randomly selected 55 samples in each media. The mean survival and standard deviation are shown in
table below. At 1% level of significance, determine if there is enough evidence to reject the claim of no
difference in the mean survival in media A and media B.
          Media A        Media B
mean      x̄1 = 94       x̄2 = 90
s.d.      s1 = 4.2       s2 = 3.8
n         n1 = 55        n2 = 55
Step 1. Ho: x̄1 = x̄2 (claim); there is no significant difference between the mean survival in media A and
media B.
Ha: x̄1 ≠ x̄2; there is a significant difference between the mean survival in media A and
media B.
Step 2. The level of significance is α = 1%.
Step 3. Z-test (two-sample case, or two means). This is a two-tailed test (Ha: x̄1 ≠ x̄2).
Step 4. Compute the test value using the formula for unknown population standard deviations (σ) with n > 30.
Zc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Zc = (94 − 90) / √(4.2²/55 + 3.8²/55)
Zc = 5.24
Step 5. Determine the rejection or critical value from table 5.2.1. Take note that this is a two-
tailed test, so the level of significance should be divided by 2 (α/2); at α = 1%, the critical
value is ±2.575.
Step 6. Since Zc = 5.24 > 2.575, reject Ho. Therefore, we conclude that there is a significant
difference between the survival rates, and the claim of no difference is not true. In this case,
the mean survival of cardboard tissue culture using media A is significantly higher than
using media B.
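A sketch of the two-sample z-test with unknown σ and n > 30, using the media A/B figures:

```python
import math

# Two-sample z-test: z = (x1bar - x2bar) / sqrt(s1^2/n1 + s2^2/n2)
def two_sample_z(x1, s1, n1, x2, s2, n2):
    return (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

z = two_sample_z(94, 4.2, 55, 90, 3.8, 55)
print(round(z, 2))  # 5.24
```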
Summary:
1. The z-test is used when we want to test the significant difference between the population
mean (μ) and the sample mean.
2. The population variance (σ²) or population standard deviation (σ) must be known. However, if
the population standard deviation is not known, a z-test is still applicable provided that the
sample size is sufficiently large (n > 30) and the sample data are normally distributed.
Activity:
1. The mean yield of rice per hectare in Mindanao was established as 4 tons with a standard deviation of
350 kgs. A group of agriculture students from the college of agriculture of a certain SUC claims that the
mean harvest this year is less due to unfavorable weather conditions. A sample of 100 randomly selected
hectares averages 3,750 kgs per hectare. Test the hypothesis that the mean yield this year is no
different from the established mean, using a 1% level of significance. Assume that the population is
normal.
2. Suppose that a standardized test for college biology exists with a mean of 125 and a standard
deviation of 8. A random sample of 40 college students from a normal population takes this
standardized test, and the resulting mean is 121. Do the randomly selected students perform below the
norm group? Use α = 1%.
Overview:
A t-test is a type of inferential statistic used to determine if there is a significant difference between the
means of two groups, which may be related in certain features. A t-test is used as a hypothesis testing
tool, which allows testing of an assumption applicable to a population.
Objectives:
Content:
t-test: Testing the Difference between the Population Mean and the Sample Mean
In applying the t-test for comparing two means, certain requirements or assumptions should be satisfied,
such as:
1. The population standard deviation (σ) is unknown (hence, the sample standard deviation is
used instead).
t-test formula:
t = (x̄ − μ) / (s / √n)
Ex. The average length of time for people to vote using the old procedure during the
presidential election in precinct A is 55 minutes. Using computerization as a new election
method, a random sample of 20 registrants was taken and found to have a mean voting
time of 30 minutes with a standard deviation of 1.5 minutes. Test the hypothesis that
the population mean is greater than the sample mean at the 5% level of significance.
Solution: 1. Ho: x̄ = μ (There is no significant difference in voting time between the traditional and
computerized methods.)
Ha: x̄ < μ (The traditional method of voting has a longer voting time than the
computerized method.)
2. α = 5%; one-tailed
3. Use the t-test as the test statistic.
4. Computation: t = (x̄ − μ) / (s / √n)
Given: sample mean = 30 min., μ = 55 min., s = 1.5 min., n = 20
t = (30 − 55) / (1.5 / √20)
t computed = −74.54
5. Determine the critical or tabular value (from the percentage points of the t-distribution table):
df = n − 1 = 20 − 1 = 19, one-tailed at the 5% level of significance.
ttab = 1.729 (the t-critical value)
6. Decision rule:
Since the absolute value of the computed t (|tc| = 74.54) is greater than the t-critical value,
ttab(0.05, df = 19) = 1.729, reject Ho and accept Ha.
Statistical analysis: There is a significant difference between the means.
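With the figures as given (x̄ = 30, μ = 55, s = 1.5, n = 20), the one-sample t statistic can be evaluated directly; note how large its magnitude is relative to the critical value 1.729:

```python
import math

# One-sample t-test: t = (xbar - mu) / (s / sqrt(n))
xbar, mu, s, n = 30, 55, 1.5, 20
t = (xbar - mu) / (s / math.sqrt(n))
print(round(t, 2))      # -74.54
print(abs(t) > 1.729)   # True -> reject Ho at alpha = 0.05, df = 19
```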
When the two samples are drawn from normally distributed populations with the assumption that their
variances are equal, the t-test with the following formula is used:
t = (x̄1 − x̄2) / √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) × (1/n1 + 1/n2) ]
where:
x̄1, x̄2 are the sample means
n1, n2 are the sample sizes
s1², s2² are the sample variances
Example:
A course in physics is taught to 10 students by the traditional method. Another group of 11 students was given
the same course by means of another method. At the end of the semester, the same test was administered to
each group. The 10 students under method A made an average of 82 with a standard deviation of 5, while the 11
students under method B made an average of 78 with a standard deviation of 6. Test the null hypothesis of no
significant difference in the performance of the two groups of students at 5% level of significance.
Solution:
1. Ho: x̄1 = x̄2 (There is no significant difference between the average scores of the two groups of students.)
Ha: x̄1 > x̄2 (The mean score of the first group is higher than the mean score of the second group.)
2. α = 5%; one-tailed
3. Use the t- test as test statistics.
4. Computation:
t = (x̄1 − x̄2) / √{ [((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)] · (1/n1 + 1/n2) }
t = (82 − 78) / √{ [((9)(25) + (10)(36)) / 19] · (1/10 + 1/11) }
t = 4 / 2.4245 = 1.65
5. df = n1 + n2 − 2; df = 10 + 11 − 2 = 19; ttab(0.05, df=19) = 1.729
6. Since tc = 1.65 < ttab = 1.729, the t tabular value is higher than the computed value, so the null hypothesis (Ho) is accepted.
Analysis: The difference between the means is not significant at the 5% level of significance. It implies that method A is as effective as method B.
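The pooled-variance computation above can be reproduced from the summary statistics alone; this sketch follows the formula in the text, and the function name is illustrative.

```python
import math

def pooled_two_sample_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    # Standard error of the difference of means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# Summary statistics from the physics-course example.
t = pooled_two_sample_t(m1=82, s1=5, n1=10, m2=78, s2=6, n2=11)
print(round(t, 2))  # 1.65, below the critical value 1.729, so Ho is not rejected
```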
When the two samples are correlated, as when the same group is measured twice (e.g., pre-test and post-test scores), the paired t-test is used:

t = d̄ / (sd/√n)

where:
d̄ = ∑d/n - mean of the differences
sd - standard deviation of the differences
n - number of paired observations
Example:
To determine whether the students' performance in college algebra will improve after enrolling in the subject for one term at the 1% level of significance, a 60-item pre-test and post-test are administered to them on the first day and last day of classes, respectively; the same test is given as pre-test and post-test. The results are as follows:
Student   Pre-test score   Post-test score   Difference, d   d²
A 34 45 -11 121
B 23 32 -9 81
C 40 46 -6 36
D 31 57 -26 676
E 24 39 -15 225
F 45 48 -3 9
G 27 27 0 0
H 32 33 -1 1
I 12 18 -6 36
J 45 45 0 0
∑d = −77   ∑d² = 1185
Solution:
1. Ho: d̄ = 0 (There is no significant difference between the pre-test and post-test scores.)
Ha: d̄ < 0 (The post-test scores are higher than the pre-test scores.)
2. α = 1%; one-tailed
3. t-test will be used
4. Computations:
t = d̄ / (sd/√n)
d̄ = ∑d/n = −77/10 = −7.7
sd² = [n(∑d²) − (∑d)²] / [n(n − 1)] = [10(1185) − (−77)²] / [10(9)] = 65.79
sd = 8.1111
√n = √10 = 3.1623
t = −7.7 / (8.1111/3.1623) = −3.00
5. df = n − 1 = 10 − 1 = 9; ttab(0.01, df=9) = 2.821
6. Reject Ho, because the absolute value of the computed t (|tc| = 3.00) is greater than the tabular value (ttab = 2.821), and accept Ha.
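Recomputing the paired t statistic directly from the table above, using only the standard library (differences taken as pre − post, as in the table, so t comes out negative):

```python
import math
import statistics

# Pre-test and post-test scores of students A through J (from the table above).
pre  = [34, 23, 40, 31, 24, 45, 27, 32, 12, 45]
post = [45, 32, 46, 57, 39, 48, 27, 33, 18, 45]

d = [p - q for p, q in zip(pre, post)]   # differences, pre - post
n = len(d)
d_bar = statistics.mean(d)               # mean difference, -7.7
sd = statistics.stdev(d)                 # sample sd of the differences
t = d_bar / (sd / math.sqrt(n))          # paired t statistic
print(round(t, 2))
```

Since |t| exceeds the tabular value 2.821, the decision to reject Ho stands.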
Summary:
5. The population standard deviation (σ) is unknown (hence, the sample standard deviation is used instead).
Activity:
1. The data below are assumed to be the result of an experiment on the culture of mud crab Scylla serrata in a fishpond with and without pellets as supplemental feed. Find out if there is a significant difference in kilogram weight at the 5% level of significance. Note: compute the standard deviation of the without-pellet (control) and with-pellet (experimental) groups. Complete the table and follow the 6 steps in hypothesis testing of the t-test. Consider the formula for sd below.

Control: sd = √[∑(x − x̄)² / (n − 1)]
Experimental: sd = √[∑(x − x̄)² / (n − 1)]
2. Use the t-test to determine if there is a significant difference between the scores of the students in Statistics in the pre-test and post-test. Use the 1% level of significance.

Method      Scores
Pre-test    45  44  40  56  55  52  51  50  53  47
Post-test   48  55  51  59  64  62  54  52  54  50
Chapter VI. ANALYSIS OF VARIANCE
Overview:
The analysis of variance (ANOVA) is a method for dividing the variation observed in
experimental data into different parts, each part assignable to a known source, cause,
or factor. If the investigator is to relate different parts of the variation to particular
causal circumstances, experiments must be designed to allow this to occur in a
methodical manner. The analysis of variance is therefore inextricably associated with
the design of experiments. It is used to determine the significance of the difference
between the means of a number of different populations. Analysis of variance is the
most common statistical tool used in experimental research.
Objectives:
Overview:
A one-way ANOVA F-test involves one independent variable as the basis for classification.
This is usually applied in a completely randomized design (CRD). Experiments where only a
single factor varies while all others are kept constant are known as single-factor or one-
way classification experiments. In such experiments, the treatments consist solely of
the different levels of the single variable factor.
Objectives:
Content:
d. Experimental Design - includes the plan and the actual procedure of laying out the
experiment. The three basic principles involved in designing experiments are also
known as the elements of experimental design.
1. Single-factor experiment
2. Two-factor experiment
3. Three-factor experiment
A CRD is one where the treatments are assigned completely at random so that each experimental
unit has the same chance of receiving any one treatment. For the CRD, any difference among
experimental units receiving the same treatment is considered experimental error. The CRD is
appropriate only for experiments with homogeneous experimental units, such as laboratory
experiments where environmental effects are relatively easy to control.
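A completely randomized assignment can be sketched in Python; the treatment names and replication count below are illustrative, not taken from the module.

```python
import random

def crd_layout(treatments, replications, seed=None):
    """Assign each treatment to `replications` experimental units
    completely at random (CRD): every unit has the same chance
    of receiving any one treatment."""
    units = treatments * replications   # each treatment replicated equally
    rng = random.Random(seed)
    rng.shuffle(units)                  # random order = random assignment
    return units

# Illustrative example: 3 treatments, 4 replications -> 12 experimental units.
layout = crd_layout(["T1", "T2", "T3"], replications=4, seed=1)
print(layout)
```

Shuffling the replicated treatment labels guarantees equal replication while keeping the assignment to units random.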
A manufacturer of bicycle tires has developed a new design that he claims has an average life span of 5 years with
a standard deviation of 1.2 years. A dealer of the product claims that the average life span of 150 samples of the
tires is only 3.5 years. Test the difference of the population and sample means at the 5% level of significance.
Solution: follow the steps of hypothesis testing.
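Since the population standard deviation (σ = 1.2 years) is known and n = 150 is large, the z-test applies. A sketch of the computation (the conclusion then follows from comparing |z| with the critical value, e.g., 1.96 for a two-tailed test at α = 5%):

```python
import math

def one_sample_z(sample_mean, pop_mean, sigma, n):
    """One-sample z statistic: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (sigma / math.sqrt(n))

# Values from the bicycle-tire example.
z = one_sample_z(sample_mean=3.5, pop_mean=5, sigma=1.2, n=150)
print(round(z, 2))  # |z| is far beyond 1.96, so Ho is rejected
```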
Significance level
To test the null hypothesis, one must first set the level of significance. The level of significance is the probability of
making a Type I error, and it is denoted by the symbol α. A Type I error is committed by rejecting the null
hypothesis Ho when in fact Ho is true. A Type II error is committed by accepting Ho when it is false, and its
probability is denoted by the symbol β. The most commonly used level of significance is 5%.
Since the alternative hypothesis Ha is formulated to be different from Ho, we can consider these three
types of alternative hypothesis:
One-tailed
a. Ha: μ > μo (one-tailed/one-sided test to the right; the rejection region lies to the right of the critical value, e.g., z = 1.645 at α = 5%, with the non-rejection region to its left)
b. Ha: μ < μo (one-tailed/one-sided test to the left)
Two-tailed
c. Ha: μ ≠ μo (two-tailed/two-sided test)
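The critical value z = 1.645 for a right-tailed test at α = 5% can be verified with Python's standard library:

```python
from statistics import NormalDist

# Right-tailed test at alpha = 0.05: the critical z leaves 5% in the upper tail.
z_right = NormalDist().inv_cdf(0.95)
# Two-tailed test at alpha = 0.05: 2.5% in each tail.
z_two = NormalDist().inv_cdf(0.975)
print(round(z_right, 3), round(z_two, 2))  # 1.645 1.96
```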
Step 1: Type your data into Excel, in a single column. For example, if you have ten
items in your data set, type them into cells A1 through A10.
Step 2: Click the “Data” tab and then click “Data Analysis” in the Analysis
group.
Step 3: Click “Descriptive Statistics” and then click “OK.”
Step 4: Type an input range into the “Input Range” box, for example, “A1:A10.”
Step 5: Check the “Labels in first row” check box if you have titled the
column in row 1, otherwise leave the box unchecked.
Step 6: Type a cell location into the “Output Range” box. For example, type
“C1.” Make sure that two adjacent columns do not have data in
them.
Step 7: Click the “Summary Statistics” check box and then click “OK” to display Excel
descriptive statistics. A list of descriptive statistics will be returned in the
column you selected as the Output Range.
Example:
The following examples will illustrate how MS Excel will help you resolve exercises in a very
convenient way.
Mean 81.35
Standard Error 1.973875429
Median 80.5
Mode 90
Standard Deviation 8.827439278
Sample Variance 77.92368421
Kurtosis -0.88771227
Skewness 0.02342738
Range 32
Minimum 65
Maximum 97
Sum 1627
Count 20
You can use other statistical functions to solve other problems like z-test, t-test,
correlation, regression, and many more.
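Outside Excel, the same summary statistics can be computed with Python's standard library; the sample data below are hypothetical, not the module's data set.

```python
import statistics

data = [65, 70, 72, 80, 81, 85, 90, 90, 94, 97]   # hypothetical scores

summary = {
    "Mean": statistics.mean(data),
    "Median": statistics.median(data),
    "Mode": statistics.mode(data),
    "Standard Deviation": statistics.stdev(data),   # sample sd, as Excel reports
    "Sample Variance": statistics.variance(data),
    "Range": max(data) - min(data),
    "Minimum": min(data),
    "Maximum": max(data),
    "Sum": sum(data),
    "Count": len(data),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```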