
Chapter I Basic Concepts in Statistics

Overview:
Statistics affects many facets of our lives. In everyday life, whether at home or at work,
we usually keep records and read reports. An item in a record or report is a fact that is
expressed as a numerical value or described by its quality or kind. A single item
or fact is referred to as a datum. The color of the leaves, the number of students in the
class, the height and width of an object, and the number of bacterial colonies are all examples of data.
How to deal with data is the major concern of statistics.
Objectives:
1. Define biostatistics and identify its importance.
2. Explain the methods of collecting statistical data and variables.
3. Discuss different sampling techniques

Lesson 1.1. Biostatistics and its Importance


Overview:
For most people the word “statistics” is a scary thing that must be avoided as much as
possible. They think of statistics as a collection of numbers and formulas that have vague
meanings. Actually, without noticing it, people often apply statistics in their everyday life.
When a clinician records the result of a physical examination of a patient, the clinician is collecting
data to aid in diagnosing the patient’s illness and in determining the appropriate
medical treatment to be prescribed to the patient.
Objectives:
The students should be able to:
1. Define statistics and biostatistics
2. Discuss the inductive and deductive reasoning in medical diagnoses.
3. Explain the scientific method employed in medical research.

Content:
Statistics is a science that deals with the collection, organization, analysis, interpretation and
presentation of information that can be stated numerically.
Major areas of Statistics:
1. Descriptive Statistics- this includes anything done to the data which is designed to
summarize or describe, without going any further; that is, without attempting to infer
anything that goes beyond the data themselves.
2. Statistical Inference- comprises the methods concerned with the analysis of a subset
of data leading to predictions or inferences about the entire set of data. This analysis
requires generalizations which go beyond the data.
Biostatistics is statistics applied to the biological sciences.
Perhaps the most difficult aspect of statistics is the logic associated with inductive inference, yet all
scientific evidence is based on this type of statistical inference. The same logic is used, though not
always explicitly, when a physician practices medicine: what is observed for a particular patient is
combined with what has previously been observed for a large group of patients
to make a specific decision about that particular patient.

When taking a clinical history, conducting a physical examination, or requesting laboratory analyses,
radiographic evaluations or other tests, a physician is collecting information (data) to help choose
diagnostic and therapeutic actions. The decisions reached are based on knowledge obtained from
training, from literature, from experience, or from some similar sources.
General principles are applied to the specific situation at hand in order to reach the best decision
possible for a particular patient. This type of reasoning, from the general to the specific, is called
deductive reasoning. Much of basic medical training centers around deductive reasoning.

We conduct experiments and comparative studies to focus on questions that arise from our work.
We study a few patients (or experimental animals), and from what we observe we try to make
rational inferences about what happens in general. This type of reasoning, from the specific
subject(s) at hand to the general, is called inductive reasoning. This approach to
medical research, pushing back the bounds of knowledge concerning human health, follows what is
known as the scientific method, which has four basic steps.
1. Making observations, i.e., gathering data.
2. Generating a hypothesis: the underlying law and order suggested by the data.
3. Deciding how to test the hypothesis: what critical data are required?
4. Experimenting (or observing): this leads to an inference that either rejects or affirms the
hypothesis. If the hypothesis is rejected, then we go back to step 2.
If it is affirmed, this does not necessarily mean it is true, only that in the light of current
knowledge and methods it appears to be so. The hypothesis is constantly refined and tested
as more knowledge becomes available.

All data collected from biological systems have variability. The statistician is concerned with
summarizing trends in data and drawing conclusions in spite of the uncertainty created by variability in the
data. An understanding of statistics will enhance your ability to interpret data, whether for the
purpose of treating a particular patient or for drawing general conclusions from a research study, as
well as enable you to distinguish fact from fancy in everyday life.

Summary:

Biostatistics deals with the collection, organization, presentation, analysis and interpretation of
biological information that can be stated numerically.

Activity:

A. Suppose that a set of measurements representing the total rainfall in the province of Sultan
Kudarat during the month of July has been recorded for the past 15 years.
Write Descriptive or Inferential for each of the following statements based on the data above.
1. The average rainfall over the 15 years is 3.0 cm.
2. For 15 years, the month of July has had rain.
3. Next July we expect rain.
4. This July 2021 we expect between 3.2 and 3.4 cm of rain.

B. Decide what reasoning must be employed in each situation below in order to give a diagnosis
and treatment. Discuss why.
a. Stroke patient
b. Yellowing of leaves of your potted plant.
c. Swelling of gums and painful tooth.
C. Differentiate the following:
1. Statistics and Biostatistics
2. Deductive and Inductive reasoning
3. Descriptive and Inferential statistics

Lesson 1.2. Statistical data and Variables

Overview:

The basic unit of statistical analysis is data. There are generally two types of data and there is no
formula for selecting the best method to be used in gathering data. It depends on the researcher’s
design of the study, the type of data, the time available to complete the study, and the financial
capacity.

Objectives:

1. Identify the types and kinds of data.


2. Explain the methods in collecting data.
3. Discuss the types of variables.
4. Determine the scales of measurement.

Content:

Classification of Data

1. Quantitative Data- data that can be expressed in numbers. These are the things that can be
measured or counted, like weight, length, number of colonies, and mortality rate.
2. Qualitative Data- facts for which no numerical measure exists. They are usually
expressed in categories or kinds. Examples are the color of the skin, which could be black, brown
or white; a person’s sex, which is male or female; and the presence or absence of a metallic
sheen in a colony of bacteria.

In order to assure the accuracy of data, one must know the right sources and methods of collecting
them.

Types of data according to sources

1. Primary Data- refers to information which is gathered directly from an original
source, or which is based on direct or firsthand experience.
2. Secondary Data- refers to information which is taken from published data previously
gathered by other individuals or agencies, or data which comes from
sources other than the respondents.

Methods of Collecting Data.

1. Interview Method- person to person exchange between the interviewer and interviewee.
2. Questionnaire Method- written responses are given to prepared questions. A questionnaire is
a list of questions which are intended to elicit answers to the problem of a study.
A questionnaire may be mailed, sent online or hand-carried.
3. Registration Method- a method of gathering information that is enforced by certain laws. Examples
are the registration of births, deaths, motor vehicles, marriages and licenses.
4. Observation method- the investigator observes the behavior of persons or organisms and
their outcomes. This is usually used when the subjects cannot talk and write.
5. Experimental Method- this method is used when the objective is to determine the cause and
effect relationship of certain phenomena under controlled condition. Scientific researchers
usually use the experimental method.

Collected data must be organized in order to show significant characteristics. They can be
presented in three forms

1. Textual – data is presented in paragraph form.


2. Tabular – data is presented in rows and columns.
3. Graphical – data is presented in visual form.

Kinds of graphs
a.Bar graph
b. Pie graph
c. Line graph

A variable is a characteristic or attribute associated with the population being studied.

Types of Variables
1. Categorical or qualitative variables are classified according to some attributes or categories.
Ex. Gender, religion, blood type, civil status.
Categories may be ordered, and may or may not be assigned specific numerical values,
such as: Performance rating (poor, fair, good, very good, excellent); IQ level (low, average,
high).
2. Numerical-valued or quantitative variables are variables that are classified according to
numerical characteristics such as height, age, pulse rate, number of children, and
speed. Numerical-valued variables are often grouped into class intervals.
Ex. Age in years: 5-9, 10-14, 15-19, and 20 & above.
Height in cm: 100-149, 150-199, 200-249.
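
The grouping of a numerical-valued variable into class intervals can be sketched in a few lines of Python. This is only an illustration: the function name and the sample ages are hypothetical, ages are assumed to be whole years, and because the example above lists no interval below 5, such ages are returned as unclassified.

```python
def age_group(age_in_years):
    """Return the class interval label for an age in whole years."""
    if age_in_years < 5:
        return None  # assumption: the example lists no interval below 5
    if age_in_years <= 9:
        return "5-9"
    if age_in_years <= 14:
        return "10-14"
    if age_in_years <= 19:
        return "15-19"
    return "20 & above"

ages = [7, 12, 16, 23]  # hypothetical ages, for illustration only
groups = [age_group(age) for age in ages]
print(groups)
```

Each age falls into exactly one interval, which is what allows the grouped values to be counted later in a frequency table.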

Numerical-valued variables are classified as:


1. Discrete – is a variable whose values are obtained by counting.
Ex. Number of children, number of persons with blue eyes, number of patients with T.B.,
number of males and females in a Statistics class.
2. Continuous – is a variable whose values are obtained by measuring, such as temperature,
distance, area, density, age, and height. These values cannot be put into a list because they can
take any value in some interval of real numbers.

Scales of Measurement

In selecting the statistical tool to be used for drawing inferences on a random sample, the type of
measurement scale must be carefully identified. Measurement scales are classified into four types.

1. Nominal Scale - is a measurement scale that classifies elements into two or more categories or
classes. Any numbers used indicate only that the elements are different, not their order or
magnitude.
Ex.
Table 1. Distribution of Medical Students of University of the Philippines Grouped According to Race
and Civil Status

Race        Single   Married   Widow/er   Separated   Total
American      10        5         0           1         16
Chinese       29        8         5          10         52
Japanese      18       11         1           3         33
Filipino      32        3         4          20         59
Total         89       27        10          34        160
The medical students are classified according to race and civil status.
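
Because nominal data only name categories, the arithmetic that makes sense for them is counting. As a minimal sketch (the variable names are illustrative), the row, column and grand totals of Table 1 can be recomputed from the cell counts:

```python
# Cell counts from Table 1, in the order Single, Married, Widow/er, Separated.
statuses = ["Single", "Married", "Widow/er", "Separated"]
counts = {
    "American": [10, 5, 0, 1],
    "Chinese": [29, 8, 5, 10],
    "Japanese": [18, 11, 1, 3],
    "Filipino": [32, 3, 4, 20],
}

# Row totals: number of students of each race.
row_totals = {race: sum(row) for race, row in counts.items()}

# Column totals: number of students of each civil status.
column_totals = [sum(column) for column in zip(*counts.values())]

grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(statuses, column_totals)))
print(grand_total)
```

The totals agree with the last row and last column of the table.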

2. Ordinal Scale - is a measurement scale that ranks individuals in terms of the degree to which they
possess a characteristic of interest.
Ex.
Table 2. Anxiety Level of Patients with Mental Disorder in Hospital Q

Sex       0    1    2    3   Total
Male      9   16    2    1     28
Female   21   10    4    7     42
Total    30   26    6    8     70

Legend: 0 = not anxious
1 = low anxiety level
2 = moderate anxiety level
3 = high anxiety level
3. Interval Scale – is a measurement scale that, in addition to ordering scores from high to
low, establishes a uniform unit in the scale so that any equal distance between two
scores is of equal magnitude. For example, the difference between aptitude scores of 80 and 90
is equal to the difference between aptitude scores of 90 and 100 (both being equal to 10).

4. Ratio Scale – is a measurement scale that, in addition to being an interval scale, has an
absolute zero in the scale.

Summary:
Scales of Measurement

Nominal: each number represents a category.
Ordinal: the above, and greater than and less than relationships.
Interval: the above, and units of measurement.
Ratio: the above, and an absolute zero.

Application:

A. Evaluate the data below. Write Qualitative or Quantitative.

1. 25 ft.
2. Medium size
3. 30%
4. 6 meters
5. 4 colonies
6. Male
7. Absent
8. 100 seeds
9. Blue eyes
10. 500 acres

B. Write Primary or Secondary data.

1. Number of public vehicles in the city of Tacurong
2. Enrollees of SKSU from 2010 to 2020
3. Information from a diary
4. Data from the Daily Inquirer
5. TB patients of St. Louise Hospital from Jan. to June 2020
6. Information from a police investigator
7. Information from the victim of an accident
8. Response from your respondent
9. Data from the State of the Nation Address of the Philippines
10. Data from a research journal

C. Write D if discrete and C if continuous.

1. Number of foreigners migrating to the Philippines
2. Length of hair
3. Boiling point of water, 100°C
4. John’s height is 160 cm
5. Number of children in Brgy Sebu with a missing tooth
6. Average speed of UB express along national highways
7. Number of online students present in a Zoom meeting
8. Number of leaves affected by leaf rot
9. Leaf width and length
10. Number of vaccinated Filipinos

D. Write the advantages and disadvantages of each method of collecting data.


Lesson 1.3. Sampling Techniques

Overview:
Analysis of data in research work requires that the size of the population be determined and
specified if possible, so that the required sample size can easily be calculated based on sampling
techniques and research designs. If the population is small, it is sometimes convenient to obtain the
information by collecting the data for the whole population (total enumeration). However, if
the population is large, time and money can be saved by measuring only a sample drawn from
the population. When the measurement is destructive, sampling is of course unavoidable, because
measuring every unit would destroy the whole population.

Objectives:
At the end of the lesson, you should be able to:
1. compute the sample size;
2. enumerate the different sampling methods;
3. identify the use of different sampling methods in data collection.

Content:

Population – is the group of all study units about which a particular investigation may provide
information. Population size is denoted by “N”.
Target population – is the whole group of study units to which we are interested in applying our
conclusions.
Study population – is the group of study units to which we can legitimately apply our conclusions.
Sample – a subset or a representative part of the population; hence, the sample must possess the
same characteristics as the population. Sample size is denoted by “n”.
[Figure: a sample is drawn from the population by sampling; conclusions about the population are drawn from the sample by inference.]

Types of Sampling:
1. Non-Probability or Judgment Sampling
Sampling is based on a judgment selection of “typical” or representative elements of the
population under study, considering arbitrarily set criteria.
1.1. Purposive Sampling – a sample is drawn from the population where what constitutes the
representative elements or sample is already a preconceived idea.
1.2. Quota Sampling – a sample is drawn for convenience and on the basis of a quota.
1.3. Haphazard Sampling – sampling is done haphazardly.
1.4. Volunteer Sampling – sampling which involves volunteers.
1.5. Convenience or Accidental Sampling – sampling where the elements of the sample are those
that are readily accessible to the sampler.
2. Probability Sampling – sampling in which a definite set of rules and procedures for drawing the
sample is followed. It allows one to evaluate the probability of each element being
part of the sample, even prior to drawing the actual sample. Probability samples are suitable
for statistical analysis and scientific research.
2.1. Simple Random Sampling – a sample is drawn from the whole population,
without replacement and with equal probability of selection for every possible sample.
Methods of simple random sampling are:
a. The box method
b. Use of a table of random numbers
c. Use of a computer software package that generates random numbers
2.2. Systematic Sampling – a method of sampling wherein a sample is drawn by taking, say,
every kth unit in the population, starting from the ith unit drawn at random. This is
used when there is a ready list of the total population. It is often the most practical way of sampling.

2.3. Stratified Sampling – a sampling procedure wherein the population is divided into
non-overlapping strata. These strata are homogeneous, and a random sample is drawn
independently from each stratum. This scheme is used so that different groups of a
population are adequately represented in the sample.

2.4. Cluster Sampling – the total population is divided into a number of relatively small
subdivisions, and some of these subdivisions or clusters are randomly selected for
inclusion in the overall sample.

2.5. Multi-stage Sampling – the technique uses several stages or phases in getting the sample
from the general population. However, selection of the sample is still done at random. It
is useful in conducting nationwide surveys involving a large universe.
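
The probability sampling schemes above can be illustrated with Python's standard `random` module, which plays the role of the computer-generated random numbers mentioned under simple random sampling. This is a sketch under stated assumptions, not a prescribed procedure: the function names, the population of 100 numbered units, the two strata and the four clusters are all hypothetical, and the systematic sampler assumes the list length is at least the sample size.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every possible sample is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th unit, starting from a unit chosen at random among the first k."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random starting position, 0 to k - 1
    return [population[start + j * k] for j in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw an independent simple random sample from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and include all of their units."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(2021)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))     # hypothetical list of 100 numbered units

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

strata = {"units 1-50": population[:50], "units 51-100": population[50:]}
strat = stratified_sample(strata, 5, rng)

clusters = {"A": population[0:25], "B": population[25:50],
            "C": population[50:75], "D": population[75:100]}
clus = cluster_sample(clusters, 2, rng)

print(srs)
print(sys_sample)
print(strat)
print(len(clus))
```

Multi-stage sampling can be built from the same pieces, for example by selecting clusters first and then drawing a simple random sample within each selected cluster.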

Determination of Sample Size (n)

Important criteria in determining the sample size (n).


1. Size (N) and variability of the population.
2. The error that will be tolerated or accepted. This is the desired precision.
3. Degree of confidence desired for the estimate of the parameter. That is, one needs
to specify the confidence coefficient, (1 - α) x 100%, desired.
4. Resources available to obtain the data and the time duration to produce output.
5. Safety/risk of the enumerators.

Sampling is advisable if the population is equal to or more than 100, but it is inapplicable to a
population of less than 100. Total enumeration (census) is advisable for a population of less than 100 for
categorization purposes. To have a scientific determination of sample size, the formula below was
suggested by Calmorin and Calmorin (1997).

$$\mathit{Ss} = \frac{NV + \mathit{Se}^{2}(1 - p)}{N\,\mathit{Se} + V^{2}\,p(1 - p)}$$

Where:
Ss = Sample size
N = Total number of population
V = The standard value (2.58) at 1 percent level of probability with 0.99 reliability
Se = Sampling error (0.01)
p = The largest possible proportion (0.50)
For instance, if the total population is 500, the standard value at 1% level of probability is 2.58 with
99% reliability, the sampling error is 1% or 0.01, and the proportion of the target population is 50%
or 0.50, then the sample size is computed as follows:

Given:
N = 500
V = 2.58
Se =0.01
P = 0.50
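$$\mathit{Ss} = \frac{NV + \mathit{Se}^{2}(1 - p)}{N\,\mathit{Se} + V^{2}\,p(1 - p)}$$

$$\mathit{Ss} = \frac{500(2.58) + (0.01)^{2}(1 - 0.50)}{500(0.01) + (2.58)^{2}(0.50)(1 - 0.50)}$$

$$\mathit{Ss} = \frac{1290 + (0.0001)(0.50)}{5 + (6.6564)(0.50)(0.50)} = \frac{1290.00005}{6.6641}$$

Ss = 193.57 or 194
For a population of 500, the sample size is 194, which represents the subjects of the study.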
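
The computation above can be repeated for any population size with a short Python function. This is a sketch that follows the arithmetic of the worked example (products in the numerator and denominator); the function name is illustrative, and rounding to the nearest whole number is an assumption based on 193.57 being reported as 194.

```python
def sample_size(N, V=2.58, Se=0.01, p=0.50):
    """Unrounded sample size Ss for a population of N units.

    V  : standard value at the 1 percent level of probability
    Se : sampling error
    p  : largest possible proportion
    """
    numerator = N * V + Se**2 * (1 - p)
    denominator = N * Se + V**2 * p * (1 - p)
    return numerator / denominator

raw = sample_size(500)
print(round(raw, 2), round(raw))  # 193.57 194
```

The defaults reproduce the values used in the example; a different reliability level or sampling error can be tried by passing other values for `V` and `Se`.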

Summary:
In gathering statistical information for data analysis, the researcher:
1. must first identify the subject of the study.
2. delimit or determine the scope and coverage of the subject of the study.
3. determine the population and sample size.
4. determine the sampling methods or techniques to be utilized.
5. prepare the necessary data gathering instruments for purposes of investigation.

There are two types of samples: the probability sample and the nonprobability sample.
Activity:
I. Choose the best answer among the choices.
1. The best random sampling design, because every individual in the population has an equal chance of
inclusion in the sample, is
a. Stratified random sampling
b. Simple random sampling
c. Restricted random sampling

2. The sampling design in which all individuals in the population are arranged in a methodical manner
and every nth name may be chosen in the construction of the sample is
a. Systematic sampling
b. Stratified random sampling
c. Unrestricted random sampling

3. The sampling design based on selecting the individuals as samples according to the criteria of the
researcher which serve as controls is
a. Quota sampling
b. Incidental sampling
c. Purposive sampling
d. Cluster sampling

4. The sampling design which is intended to improve the validity of the sample and is applicable
when the population being studied is homogeneous is
a. Cluster sampling
b. Simple random sampling
c. Stratified sampling

5. A population of 900 has a sample size of
a. 218
b. 217
c. 219
d. 220

6. Sampling is inapplicable to a population of
a. 100
b. 110
c. 99

7. Which of the following does not belong to the group?
a. Quota sampling
b. Incidental sampling
c. Cluster sampling

8. The sample size for a population of 750 is
a. 210
b. 211
c. 208

9. A population of 2000 has a sample size of
a. 236
b. 238
c. 232

10. The sampling design in which the population is grouped into small units such as blocks or districts is
a. Purposive sampling
b. Quota sampling
c. Cluster sampling

11. Which of the following does not belong to the group?
a. Purposive sampling
b. Multi-stage sampling
c. Cluster sampling

12. The sampling design in which the researcher simply takes the closest individuals as subjects of the
study because they are most available is
a. Quota sampling
b. Purposive sampling
c. Cluster sampling

13. A population of 300 has a sample size of
a. 181
b. 166
c. 165

14. The sampling design which is popular in the field of opinion research is
a. Incidental sampling
b. Cluster sampling
c. Quota sampling

15. The sample size for a population of 550 is
a. 196
b. 194
c. 192

II. Compute the sample size for each of the following populations. Show your solution.
1. 230
2. 340
3. 570
4. 890
5. 2,300

CHAPTER II Organization and Presentation of Data

Overview:
Gathered data can be made more interesting by presenting them in the form of graphs and
tables. For instance, readers do not appreciate reading a statistical report on the current
population of the different countries in the world if the report is just a list of numbers running from
one paragraph to another.
Data types that are tabulated are frequency distributions, correlated data and time series data.
There is no need to construct a frequency distribution if the number of observations is less than 30.
Data that are presented in frequency distribution table form are called grouped data, and those that
are not are called ungrouped data.

Objectives
1. present the data in different forms.
2. determine the appropriate graph for a particular information.
3. construct a frequency table.

Lesson 2.1. Data Presentation

Overview:
After applying the different methods of collecting data, the raw data gathered from primary or
secondary sources should be organized and presented in summarized form. This lesson focuses on
the different forms of data presentation, and the different types of graphs and charts.

Objectives :

At the end of the lesson, you should be able to:


1. differentiate the different forms of data presentation;
2. familiarize yourself with the different types of graphs and charts; and
3. appreciate the use of tables, graphs, and charts in data presentation.

Different forms of data presentation


1. Textual. This form of presentation combines text and numerical facts in paragraphs to
explain the summary of data gathered. It usually discusses the highlights of the data.

2. Tabular. This form of presentation uses a statistical table that shows the data in a
more concise and systematic manner. The table facilitates the analysis of
relationships in the data.

Advantages of Tabular Presentation


a. It provides the reader a good grasp of the meaning of the quantitative
relationship of the data presented in the report.
b. The systematic arrangement of columns and rows makes the table
understandable by the reader.
c. The rows and columns facilitate comparison.
d. It gives a vivid picture of the whole data; thus, decision-making will be easier.
e. It saves time for the reader to analyze and interpret data.

Example of Tabular Form

Summary of Number of Students Enrolled in ABC School for SY 2019-2020

Curriculum Year      Number of Students       Total
Level                Boys        Girls
First Year            500         685         1,185
Second Year           490         670         1,160
Third Year            450         650         1,100
Fourth Year           400         625         1,025
Total               1,840       2,630         4,470

3. Graphical. This form of presentation is the most interesting and the most effective means of
organizing and presenting statistical data. The important relationships in the data can easily be seen
by merely looking at colorful figures that are creatively designed.
Different types of graphs/charts
A. Area. This type of chart displays quantitative data graphically. It is based on the line chart.
The area between the axis and the line is commonly emphasized with colors, textures, and
hatchings. An area chart is commonly used to compare two or more quantities.
B. Bar. This type of data presentation is composed of bars or rectangles of equal
widths. The bars can be drawn horizontally or vertically, in single or paired bar graphs. The length of each
rectangle is proportional to the frequency of the observed item or the magnitude of the class
interval of the item being studied. Information can easily be drawn by reading this graph in a
two-way manner. It can be made more interesting if different colors or
different shades are applied to distinguish each bar. In some cases, bars
can be drawn in opposite directions to illustrate contrasting situations.

Bar chart with vertical bars: categories are on the x-axis.
Bar chart with horizontal bars: categories are on the y-axis.

C. Column. This is a data visualization where each category is represented by a rectangle, with the
height of the rectangle being proportional to the values being plotted. Column charts are also known
as vertical bar charts.
D. Pie Chart. This represents the relationships among the different components of the data. It is the ideal graph if you want to
show the partition of a whole. The angles of the sectors should be proportional to the percentage components of the
data. The use of different colors or legends will be helpful to identify each component easily.

E. Doughnut. This is a built-in chart type. Doughnut charts are meant to express a “part-to-whole” relationship,
where all pieces together represent 100%. Doughnut charts work best to display data with a small number of
categories.

F. Line Graph. This type of data presentation shows relationships between two sets of quantities. This type is
often used to predict growth trends such as sales and population for a long period of time.
G. Scatter. This type illustrates the relationship between two variables; points are plotted in a Cartesian plane. It
is like making a line graph except that there is no need to connect the points.
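
For a pie chart, the rule that sector angles are proportional to the percentage components can be worked out directly. The sketch below is an illustration only: it uses the year-level totals from the ABC School enrollment table shown earlier, and the variable names are hypothetical.

```python
# Year-level totals from the ABC School enrollment table.
totals = {
    "First Year": 1185,
    "Second Year": 1160,
    "Third Year": 1100,
    "Fourth Year": 1025,
}

whole = sum(totals.values())  # the whole that the pie represents

# Each sector's share of 100 percent and of the 360 degrees of the circle.
sectors = {
    level: {"percent": 100 * count / whole, "angle": 360 * count / whole}
    for level, count in totals.items()
}

for level, sector in sectors.items():
    print(f"{level}: {sector['percent']:.1f}% -> {sector['angle']:.1f} degrees")
```

The percentages add up to 100 and the angles to 360, which is why a pie chart suits a single set of data that partitions a whole.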


To facilitate making the graphs, you can use Microsoft Excel to create your chart.
Excel will guide you through the steps of selecting the chart type and adding chart titles and
labels. Before starting, select the data or range that you want
to convert into a chart. The following discussion is a step-by-step procedure on how to
create a chart.

Example:
Six Months birth of Female and Male babies.
X Y
20 35
30 25
40 65
50 45
60 50
70 80
1. Select the range A1:A7. Hold down the Ctrl key and then select the
range B1:B7. (Both ranges of data will appear on the chart.)
2. Click the Insert tab on the ribbon, then click Recommended Charts. The
Insert Chart dialog box will open as shown in Figure 2.1.
3. Click the All Charts tab if you want to view all the types of charts. Click
Column or any type of chart you want to use in the chart type list,
and then select the first chart sub-type in the second row. Click the
Press and Hold to View Sample button shown in the dialog box.
At this point you will see what your chart will look like.

Step 1 – 3 Chart Type Dialogue


4. Release the mouse button and click OK. You can see a preview of the
chart. You are free to edit and improve the chart by selecting the Quick
Layout, Change Colors, and Change Chart Types. You can also edit
or delete the chart title if you want.

Step 4, Chart Dialogue Box

Note: You can select the data you want in the chart and press ALT+F1 to
create a chart immediately, but it might not be the best chart for the data.
If you don’t see a chart you like or want to use, select Change Chart
Type or the All Charts tab to see all chart types.
STEPS IN INTERPRETING GRAPHS, CHARTS, AND TABLES
1. Read the title of the graph, chart, or table. The title tells what information is being displayed.
2. Look at the legend of the graph, chart, or table. It explains the symbols and colors used in the graph or chart.

3. Read the labels of the graph, chart, or table. The labels tell you what variables or parameters are being
displayed.

4. Draw conclusions based on the data. You can reach conclusions faster with graphs or
charts than with a data table or a written description of the data.
Summary:
Chart Types with Descriptions

Chart type    Description
Area          Trends can be emphasized effectively because it illustrates the magnitude of
              change over time.
Bar           This chart type is ideal if you want to make comparisons among individual
              items with two-way reading.
Column        This is useful in showing changes over a period. It has the same function as
              the bar chart.
Pie           This type of chart compares the sizes of each sector as they relate to the whole
              unit. It illustrates the partition of parts with a total of 100% and is applicable if
              there is only one kind of data to be analyzed.
Doughnut      It also shows the comparisons between the whole and the parts, but this type can
              be used to show more than one set of data.
Line          It illustrates the trends in data with equal intervals. It is two-way reading.
Scatter       It illustrates the relationship between two variables.

Lesson 2.2. Frequency Distribution


Overview:

Important characteristics of a large mass of data can be readily assessed by grouping the data into different classes
and then determining the number of observations that fall in each of the classes. To obtain information quickly
from numerical data, the data must be organized in some systematic fashion, such as in the form of a frequency
distribution.

A frequency table is a device for organizing and representing grouped data. When the data contain more than 30
cases, a frequency distribution table is constructed to make the task more manageable and to save time in
calculating different statistics. The following steps are helpful in constructing a frequency table.

Objectives:

1. construct a frequency distribution table

2. construct the cumulative frequency and relative frequency table .

3.graph the frequency distribution, ogive and relative frequency.

Content:

General Rules for performing Frequency Distribution:

Step 1: Determine the range (R) of the distribution


The range refers to the difference between the highest and the lowest number.
Range = Highest number – Lowest number
R=H–L

Step 2: Determine the class size (𝑖) by dividing the range by the desired number of class intervals. The
number of classes for a frequency distribution table varies from 5 to 20, depending mainly on the number
of observations in the data set. It is preferable to have more classes as the size of a data set increases. The
decision about the number of classes is arbitrarily made by the data organizer.

Step 3: Determine the number of observations falling into each class interval, that is, find the class frequencies.
This is done by using a tally or score sheet.

Example:

Construct the frequency distribution table for the ages of patients in Hospital Q, May 2000.

Age of Patients of Hospital Q, May 2000

25 28 27 30 32 25 31 26 29 6
31 20 21 32 18 50 53 60 50 54
45 40 37 25 20 27 32 24 29 30
25 24 10 12 15 28

Solution:
Steps:

1. Find the Range.


Range =Highest number – Lowest number
= 60 – 6
=54

2. Determine i (the class size). Divide the range by a convenient number of classes having the same size.
For example, we choose 6 as the number of classes.
i = 54/6
i = 9
3. Construct a frequency distribution table with a class size of 9. Starting from the lowest value, 6,
six classes cover only 6 to 59, so a seventh class (60-68) is needed to include the highest value, 60.
4. Tally the data and determine the frequency of each class.
5. Determine the class mark or midpoint and the class boundaries.

Table 1.1 Ages of Patients in Hospital Q, May 2000

Age in Years        Tally Marks                 Frequency   Class Mark/   Class Boundaries
(class interval)                                            Midpoint
60-68               /                               1           64        59.5 - 68.5
51-59               //                              2           55        50.5 - 59.5
42-50               ///                             3           46        41.5 - 50.5
33-41               //                              2           37        32.5 - 41.5
24-32               /////-/////-/////-/////        20           28        23.5 - 32.5
15-23               /////                           5           19        14.5 - 23.5
6-14                ///                             3           10        5.5 - 14.5
                                                  N = 36

Since the lowest number is 6, this becomes the lower limit of the first class interval. The upper limit of this
interval is 14, obtained by adding (9 - 1) = 8 to 6. The lower limit of each succeeding class interval is found by
adding the class size (i = 9) to the previous lower limit: 6 + 9 = 15, 15 + 9 = 24, then 33, 42, 51, and 60. The
upper limits are found in the same way: 14 + 9 = 23, 23 + 9 = 32, then 41, 50, 59, and 68.
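The construction of the class limits, frequencies, class marks, and class boundaries can also be sketched in Python. This is only an illustration: the function name `frequency_table` and its layout are assumptions made for this example, not part of the module.

```python
# Ages of patients in Hospital Q, May 2000 (the 36 values listed in the example)
ages = [25, 28, 27, 30, 32, 25, 31, 26, 29, 6,
        31, 20, 21, 32, 18, 50, 53, 60, 50, 54,
        45, 40, 37, 25, 20, 27, 32, 24, 29, 30,
        25, 24, 10, 12, 15, 28]


def frequency_table(data, class_size):
    """Return rows of (lower limit, upper limit, frequency, class mark, boundaries)."""
    rows = []
    lower = min(data)                      # the lowest value starts the first class
    while lower <= max(data):
        upper = lower + class_size - 1     # e.g. 6 + (9 - 1) = 14
        freq = sum(lower <= value <= upper for value in data)
        class_mark = (lower + upper) / 2   # midpoint of the class limits
        boundaries = (lower - 0.5, upper + 0.5)
        rows.append((lower, upper, freq, class_mark, boundaries))
        lower += class_size                # next lower limit: add the class size
    return rows


table = frequency_table(ages, class_size=9)
for lower, upper, freq, mark, (low_b, up_b) in table:
    print(f"{lower}-{upper}: f={freq}, class mark={mark}, boundaries={low_b}-{up_b}")
```

The rows come out from the lowest class to the highest, which is the reverse of the order in which Table 1.1 is printed.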

a. Class Mark

It is the midpoint of the class interval. Add the lower limit and the upper limit, then divide by two.

For 60-68: (60 + 68)/2 = 64

For 51-59: (51 + 59)/2 = 55, and so on.

b. Class Boundary

It is also known as the exact limit, and is obtained by subtracting 0.5 from the lower limit of each interval
and adding 0.5 to the upper limit (refer to Table 1.1).

For example, the class 60-68 has class boundaries of 59.5-68.5.

After the data have been collected and tabulated, the next step is to graph them so that the data are more
presentable, easier to understand, and more appealing to the reader.

FREQUENCY HISTOGRAM (refer to Table 1.1). The frequency is plotted on the vertical axis and the class
intervals on the horizontal axis. A bar is drawn over each class interval, with its height equal to the frequency
of that class.

Table 1.2. Ages of Patients in Hospital Q, May 2000

Age in Years       Frequency
(class interval)
60-68              1
51-59              2
42-50              3
33-41              2
24-32              20
15-23              5
6-14               3
N=36

[Figure: Frequency histogram. Vertical axis: frequency (0 to 20). Horizontal axis: age (6 to 14, 15 to 23, 24 to 32, 33 to 41, 42 to 50, 51 to 59, 60 to 68).]

FREQUENCY POLYGON. In the frequency polygon, a point is plotted for each class interval and the points are
connected by line segments (refer to Table 1.2 above).

[Figure: Frequency polygon. Vertical axis: frequency (0 to 25). Horizontal axis: age (6 to 14, 15 to 23, 24 to 32, 33 to 41, 42 to 50, 51 to 59, 60 to 68).]

The cumulative frequency ogive is commonly used in statistical reports and texts.

Table 1.3. Ages of Patients in Hospital Q, May 2000

Age in Years       Frequency   Class Boundaries   <CF   >CF
(class interval)
60-68              1           59.5 - 68.5        36    1
51-59              2           50.5 - 59.5        35    3
42-50              3           41.5 - 50.5        33    6
33-41              2           32.5 - 41.5        30    8
24-32              20          23.5 - 32.5        28    28
15-23              5           14.5 - 23.5        8     33
6-14               3           5.5 - 14.5         3     36
<CF = less than cumulative frequency; >CF = greater than cumulative frequency. Each <CF entry is obtained
by accumulating the frequencies, starting from the frequency of the interval containing the lowest score up to
the frequency of the interval containing the highest score. For >CF, accumulate the frequencies starting from
the highest interval down to the lowest. For example, for <CF the first entry is 3, the next is 3 + 5 = 8, then
8 + 20 = 28, and so on.
The cumulative frequency ogive is presented below.

[Figure: Cumulative frequency ogive. Vertical axis: frequency (0 to 40). Horizontal axis: age (6 to 14, 15 to 23, 24 to 32, 33 to 41, 42 to 50, 51 to 59, 60 to 68).]
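The accumulation behind the <CF and >CF columns can be sketched in Python as follows. The variable names are assumptions chosen for this illustration; the frequencies are those of Table 1.3, listed from the lowest class to the highest.

```python
from itertools import accumulate

classes = ["6-14", "15-23", "24-32", "33-41", "42-50", "51-59", "60-68"]
frequencies = [3, 5, 20, 2, 3, 2, 1]

# <CF: running total from the lowest class upward
less_than_cf = list(accumulate(frequencies))

# >CF: running total from the highest class downward, then put back in class order
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for name, f, lcf, gcf in zip(classes, frequencies, less_than_cf, greater_than_cf):
    print(f"{name}: f={f}, <CF={lcf}, >CF={gcf}")
```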

Relative Frequency Table:

Age in Years       Frequency (f)   Relative Frequency (RF), %
(class interval)
60-68              1               2.78
51-59              2               5.56
42-50              3               8.33
33-41              2               5.56
24-32              20              55.56
15-23              5               13.89
6-14               3               8.33
N=36                               100.01 (differs from 100 only because of rounding)

RF = (frequency of the class interval / total number of observations) x 100%
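As a quick check of the RF formula, the relative frequencies can be computed in Python. This is a minimal sketch; the dictionary layout and variable names are assumptions for illustration only.

```python
# Class frequencies from the age distribution of Hospital Q
frequencies = {"6-14": 3, "15-23": 5, "24-32": 20, "33-41": 2,
               "42-50": 3, "51-59": 2, "60-68": 1}

total = sum(frequencies.values())   # N = 36

# RF = (class frequency / total number of observations) x 100
relative_frequency = {interval: f / total * 100 for interval, f in frequencies.items()}

for interval, rf in relative_frequency.items():
    print(f"{interval}: {rf:.2f}%")
```

Before rounding, the relative frequencies add up to 100%; small differences in a printed total come only from rounding each entry to two decimals.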


Relative Frequency Graph

[Figure: Relative frequency graph. Vertical axis: relative frequency in percent (0 to 60). Horizontal axis: age in years (6 to 14, 15 to 23, 24 to 32, 33 to 41, 42 to 50, 51 to 59, 60 to 68).]

Summary:

The purpose and advantage of a frequency distribution is to condense and simplify data without losing the
essential details. The frequency distribution condenses the data at the cost of the identity of the
individual values. Despite this loss of identity, a great deal is gained by the condensation:
1. The information revealed by the array can be obtained from the frequency distribution with greater
ease.
2. The distribution shows clearly where the individual values are concentrated and how they vary above
and below that concentration.
3. Once the data are formed into a frequency distribution, comparisons between two or more series can
be made more readily. Frequency tables are also indispensable for speeding up the computation of
many other descriptive measures.

Activity:
Construct a frequency distribution of the following amounts of sulfur oxide (kg) emitted by an industrial
plant on 80 days.
158 264 112 110 204 147 162 205 208 133
181 248 261 209 214 180 243 118 179 187
128 155 192 77 225 193 94 139 286 194
216 135 246 200 241 90 176 167 169 235
184 257 201 132 237 107 190 145 181 318
285 266 201 170 223 275 239 175 98 227
152 230 296 219 105 173 62 180 229 246
194 123 159 227 268 191 185 144 83 259

1. Construct a frequency distribution table and graph the frequency polygon and the frequency histogram.
2. Construct the less than and greater than cumulative frequency ogives.
3. Construct a relative frequency table and graph the relative frequency polygon.
Note: Use 7 as a convenient number of classes.
Chapter 3. Descriptive Statistics
Overview:

Descriptive statistics are used to describe the basic features of the data in a study. They provide
simple summaries about the sample and the measures. Together with simple graphics analysis,
they form the basis of virtually every quantitative analysis of data.

Objectives:

1. To determine the measures of central tendency of the gathered data
2. To measure the position and location of the scores
3. To evaluate the data variation

Lesson 3.1 . Measures of Central Tendency


Overview:
Have you ever found your average grade, your average expenses, the middle score in a test, the classmate
of middle height, or the most common fashion at a party you attended? These are common procedures
done by each one of us and, without knowing it, we perform statistical tasks quite often. There
are three measures of central tendency, namely the mean, the median, and the mode.
In this lesson, we first work on ungrouped data, which give information on each member of the population
or sample individually, and then on grouped data, which are presented in the form of
a frequency distribution table.

Objectives:
1. Determine the characteristics of the mean, median and mode.
2. Compute the mean, median and mode of the ungrouped data and grouped data.
3. Determine the use of mean, median and mode

Content:

A. Measures of Central Tendency of Ungrouped Data


Mean
The mean is the most frequently used measure of central tendency because it is subject to less error,
it is rigidly defined, and it is easily calculated.
To find the mean, add all the items or observations, then divide the sum by the total number of
observations. In symbols, for a population of N observations:
$$\mu = \frac{\sum x}{N}$$
Example: (Population Mean)
The numbers of Covid patients in 5 countries affected by the latest strain of the virus are:
Country A = 500, B = 450, C = 460, D = 450, and E = 400. Find the population mean of the patients
affected by the virus.

Solution:
$$\mu = \frac{\sum x}{N} = \frac{500 + 450 + 460 + 450 + 400}{5} = \frac{2260}{5} = 452 \text{ patients}$$

Example: (Sample Mean)

A sample of 12 high school students was asked how many hours they had spent watching
television last week. The responses are listed below.
5 12 15 20 15 24 0 24 30 10 16 18

$$\bar{x} = \frac{\sum x}{n} = \frac{5+12+15+20+15+24+0+24+30+10+16+18}{12} = \frac{189}{12} = 15.75 \text{ hours}$$
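The two mean computations can be checked with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are assumptions, and the data are the ones given in the two examples.

```python
from statistics import mean

# Population example: Covid patients in countries A to E
patients = [500, 450, 460, 450, 400]

# Sample example: hours of television watched by 12 students
tv_hours = [5, 12, 15, 20, 15, 24, 0, 24, 30, 10, 16, 18]

population_mean = mean(patients)   # sum of the values divided by N
sample_mean = mean(tv_hours)       # sum of the values divided by n

print(population_mean, sample_mean)
```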

Median:
The median is the value of the middle term in a data set that has been ranked in increasing or
decreasing order. As is obvious from the definition, the median divides a ranked data set
into two equal parts. The calculation of the median consists of the following two steps:
1. Rank the data set in increasing or decreasing order.
2. Find the middle term. The value of this term is the median.
Note that if the number of observations in a data set is odd, then the median is given by the
value of the middle term in the ranked data. However, if the number of observations is even,
then the median is given by the average of the values of the two middle terms. To illustrate this,
consider the following values:
Examples:
1. When the number of observations is odd, say n = 9.
Find the median of: 13 43 23 20 51 64 49 80 55
Solution:
Arrange the data in ascending or descending order.
13 20 23 43 49 51 55 64 80
The median is the middle (5th) score of the data; therefore, 49 is the median.
2. When the number of observations is even, say n = 10.
Find the median of: 34 56 89 42 26 14 28 56 78 98
Solution:
Arrange the data in ascending or descending order.
14 26 28 34 42 56 56 78 89 98
The median is the average of the two middle scores, the 5th and 6th values:
$$\text{Median} = \frac{42 + 56}{2} = 49$$
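The two median examples can be reproduced in Python. The helper `median_by_ranking` is a hypothetical name written for this sketch to mirror the two steps above; `statistics.median` from the standard library is used only as a cross-check.

```python
from statistics import median


def median_by_ranking(values):
    """Rank the data, then take the middle term (or average the two middle terms)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                                   # odd n: single middle term
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2  # even n: average of two terms


odd_scores = [13, 43, 23, 20, 51, 64, 49, 80, 55]
even_scores = [34, 56, 89, 42, 26, 14, 28, 56, 78, 98]

print(median_by_ranking(odd_scores), median_by_ranking(even_scores))
```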

Mode
In statistics, the mode is the value that occurs with the highest frequency in a
data set. If no value occurs more often than the others, the data set has no mode. A
distribution with only one mode is said to be unimodal, while a distribution
with two or more modes is described as multimodal.

Examples:
1. Find the mode of the following scores:
0 1 3 5 3 3 8 9 3 4

Solution:
Simply find the most frequently appearing value. The mode of the
given data is 3.

2. Find the mode of:
2 4 8 6 4 6 8 6 7 0 4 6 6 4 4

Solution:
There are two values that each appear five times; therefore, the modes of the
given data are 4 and 6.
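Counting how often each value appears can be sketched in Python with `collections.Counter`, and the modes can be listed with `statistics.multimode` (available from Python 3.8). The variable names are assumptions for this illustration.

```python
from collections import Counter
from statistics import multimode

scores_1 = [0, 1, 3, 5, 3, 3, 8, 9, 3, 4]
scores_2 = [2, 4, 8, 6, 4, 6, 8, 6, 7, 0, 4, 6, 6, 4, 4]

counts_2 = Counter(scores_2)      # how many times each value occurs

modes_1 = multimode(scores_1)     # list of all values with the highest count
modes_2 = multimode(scores_2)

print(modes_1, modes_2, counts_2[4], counts_2[6])
```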

B. Mean, Median, and Mode of Grouped Data

Data which are arranged in a frequency distribution are called grouped data.
When the number of items is too large, it is best to compute the
measures of central tendency and variability using the frequency
distribution.

$$\text{Mean} = \frac{\sum fx}{N}$$
where f is the frequency, x is the class mark or midpoint, and N is the
total number of observations (the total frequency).


Example:
Table 3.1. Scores of Students in Statistics

Class Interval   f    x    fx
40 – 44          4    42   168
45 – 49          3    47   141
50 – 54          4    52   208
55 – 59          3    57   171
60 – 64          10   62   620
65 – 69          2    67   134
70 – 74          5    72   360
75 – 79          8    77   616
80 – 84          3    82   246
85 – 89          6    87   522
90 – 94          2    92   184
                 N=50      ∑fx=3,370

$$\bar{x} = \frac{\sum fx}{N} = \frac{3{,}370}{50} = 67.4$$
The mean of the scores of the 50 students is 67.4.
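The grouped mean can be verified with a short Python sketch. The tuple layout (lower limit, upper limit, frequency) and the variable names are assumptions made for this example; the numbers are those of Table 3.1.

```python
# Table 3.1 as (lower limit, upper limit, frequency)
classes = [(40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3), (60, 64, 10),
           (65, 69, 2), (70, 74, 5), (75, 79, 8), (80, 84, 3), (85, 89, 6),
           (90, 94, 2)]

n = sum(f for _, _, f in classes)                                    # N
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)  # sum of f times class mark

grouped_mean = sum_fx / n
print(n, sum_fx, grouped_mean)
```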

Median:

To compute the median from grouped data, we must first determine the "less than"
cumulative frequency (<CF). The median is the sum of the lower boundary of the median class and a
fractional part of the class interval size.

$$Md = L_m + \left(\frac{\frac{N}{2} - {<}CF}{f}\right) i$$
Where:
Md = median
L_m = lower class boundary of the median class
N = total frequency
<CF = less than cumulative frequency of the class below the median class
f = frequency of the median class
i = class interval size
Table 3.1.1. Scores of Students in Statistics

Class Interval   f    x    fx    <CF
40 – 44          4    42   168   4
45 – 49          3    47   141   7
50 – 54          4    52   208   11
55 – 59          3    57   171   14
60 – 64          10   62   620   24
65 – 69          2    67   134   26
70 – 74          5    72   360   31
75 – 79          8    77   616   39
80 – 84          3    82   246   42
85 – 89          6    87   522   48
90 – 94          2    92   184   50
                 N=50      ∑fx=3,370

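What is the median of the scores?

Step 1. Construct the <CF column of the table.
Step 2. Compute N/2 = 50/2 = 25.
Step 3. Locate where 25 falls in the <CF column.
Step 4. Identify the median class. The first class whose <CF is at least 25 is 65-69, so it is the
median class. Its lower boundary is 64.5, its frequency f is 2, and the <CF of the class below it is 24.
Step 5. Compute the median.

$$Md = L_m + \left(\frac{\frac{N}{2} - {<}CF}{f}\right) i$$
$$Md = 64.5 + \left(\frac{25 - 24}{2}\right) 5$$
$$Md = 64.5 + (0.5)(5)$$
$$Md = 67$$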
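The five steps can be sketched in Python as follows. The function name `grouped_median` and the tuple layout are assumptions made for illustration; the sketch also assumes that all classes have the same size and that limits are whole numbers, so that the boundary is the lower limit minus 0.5.

```python
# Table 3.1.1 as (lower limit, upper limit, frequency)
classes = [(40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3), (60, 64, 10),
           (65, 69, 2), (70, 74, 5), (75, 79, 8), (80, 84, 3), (85, 89, 6),
           (90, 94, 2)]


def grouped_median(classes):
    """Md = Lm + ((N/2 - <CF) / f) * i for an equal-width frequency table."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = n / 2                                        # Step 2: N/2
    class_size = classes[0][1] - classes[0][0] + 1      # i
    cf_below = 0                                        # <CF of the classes already passed
    for lower, _, f in classes:
        if cf_below + f >= half:                        # Steps 3-4: median class found
            lower_boundary = lower - 0.5                # Lm
            return lower_boundary + (half - cf_below) / f * class_size
        cf_below += f
    raise ValueError("median class not found")


median_score = grouped_median(classes)
print(median_score)
```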

Mode
The mode of a frequency distribution lies within the class interval with the highest frequency.
This class interval is known as the modal class. A crude mode may be
determined by taking the class mark of the modal class. However, this rough
approximation may be improved by considering the frequencies of the classes adjoining the modal class.

$$Mo = L_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$
Where:
L_m is the lower class boundary of the modal class (the class interval with
the highest frequency)
Δ1 is the difference between the frequency of the modal class and the frequency
of the class just before it (the next lower class interval)
Δ2 is the difference between the frequency of the modal class and the frequency
of the class just after it (the next higher class interval)
i is the class interval size
Using Table 3.1.1:
Class Interval   f
40 – 44          4
45 – 49          3
50 – 54          4
55 – 59          3
60 – 64          10   (modal class)
65 – 69          2
70 – 74          5
75 – 79          8
80 – 84          3
85 – 89          6
90 – 94          2

Here Δ1 = 10 − 3 = 7 and Δ2 = 10 − 2 = 8.

$$Mo = L_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$
$$Mo = 59.5 + \left(\frac{7}{7 + 8}\right) 5$$
$$Mo = 59.5 + 2.33$$
$$Mo = 61.83$$
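A minimal Python sketch of the grouped-mode formula is given below. It rests on stated assumptions: the name `grouped_mode` is hypothetical, Δ1 is taken from the class just before the modal class in value (55-59, f = 3), Δ2 from the class just after it (65-69, f = 2), a missing neighbour counts as frequency 0, and the first class with the highest frequency is used as the modal class.

```python
# Table 3.1.1 as (lower limit, upper limit, frequency)
classes = [(40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3), (60, 64, 10),
           (65, 69, 2), (70, 74, 5), (75, 79, 8), (80, 84, 3), (85, 89, 6),
           (90, 94, 2)]


def grouped_mode(classes):
    """Mo = Lm + (d1 / (d1 + d2)) * i, with classes listed from lowest to highest."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                        # position of the modal class
    before = freqs[m - 1] if m > 0 else 0              # next lower class
    after = freqs[m + 1] if m < len(freqs) - 1 else 0  # next higher class
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    lower, upper, _ = classes[m]
    class_size = upper - lower + 1                     # i
    return (lower - 0.5) + d1 / (d1 + d2) * class_size


mode_score = grouped_mode(classes)
print(round(mode_score, 2))
```

With these assumptions the result lies slightly below the class mark 62, because the class before the modal class has a higher frequency than the class after it.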

Summary:
A measure of central tendency is a summary statistic that
represents the center point or typical value of a dataset. These measures
indicate where values in a distribution fall and are also referred to as the
central location of a distribution. You can think of it as the tendency of
data to cluster around a middle value. In statistics, the three most
common measures of central tendency are the mean, median, and mode.
Each of these measures calculates the location of the central point using a
different method. Choosing the best measure of central tendency depends
on the type of data you have.

Activity:
A. Refer to Table 3.1.2.
1. Find the mean of the AIDS cases of each of the four hospitals.
2. Which hospital has the highest mean?
3. Which year has the highest number of AIDS cases?
4. Which year has the lowest number of cases?
Table 3.1.2. AIDS Cases
Year   Hospital A   Hospital B   Hospital C   Hospital D
1999   500          200          211          100
2000   400          350          250          100
2001   100          140          620          250
2002   80           140          401          300
2003   50           175          200          180

B. Find the median and the mode. Show the solution.


1. 22 25 22 20 23 24 23 21 20 20 28 29 30
2. 100 150 450 455 1000 6000 811 455 1000
3. 5 5/5 0 -12 2/5 3/5 5
4. 90 45 70 50 90 28
5. -30 -20 -20 -1 -10
C. Calculate the mean, median, and mode of the distribution below. Show the
formula and the solution.

Class Interval   f
95-99 10
90-94 20
85-89 25
80-84 28
75-79 12
70-74 8

Lesson 3.2 . Measures of Position


Overview:
Measures of position include not only central location but also any
position depending on the number of equal divisions in a given
distribution. If we divide the distribution into four equal divisions, then we
have quartiles, denoted by Q1, Q2, Q3, and Q4. The most commonly used
measures of position are the quartiles, deciles, and percentiles.
Objectives:
1. Determine the formulas of the measures of position.
2. Describe the use and importance of the measures of position.
3. Compute the quartiles, deciles, and percentiles of grouped and
ungrouped data.
Content:
A. Ungrouped Data
Quartiles
Quartiles divide a distribution into four equal parts. Q1, the first quartile,
locates the point which is greater than 25% of the items in a distribution.
Q3 is the 3rd quartile: $Q_3 = \frac{3N}{4}$th item (75% of the observations lie below this value)
Q2 is the 2nd quartile: $Q_2 = \frac{2N}{4}$th item, or the median
Q1 is the 1st quartile: $Q_1 = \frac{1N}{4}$th item
Deciles
Deciles are values that divide a distribution into 10 equal parts (D1, D2, D3, D4, ..., D10).
D1 is the first decile: $D_1 = \frac{1N}{10}$th item
D5 is the fifth decile: $D_5 = \frac{5N}{10}$th item, or the median
D8 is the eighth decile: $D_8 = \frac{8N}{10}$th item
Percentiles
Percentiles are values that divide the distribution into 100 equal parts. P10, the tenth
percentile, locates the point that is greater than 10 percent of the items in
the distribution.
P1 is the first percentile: $P_1 = \frac{1N}{100}$th item
P50 is the fiftieth percentile: $P_{50} = \frac{50N}{100}$th item, or the median
P62 is the sixty-second percentile: $P_{62} = \frac{62N}{100}$th item
B. Grouped Data

Quartiles
k in the formula is equal to 1, 2, 3, or 4.

$$Q_k = L_{Q_k} + \left(\frac{\frac{kN}{4} - {<}CF}{f_{Q_k}}\right) i$$

Where: L_Qk = the lower class boundary of the class interval where the quartile is found (the quartile class)
<CF = the less than cumulative frequency of the class before the quartile class
f_Qk = the frequency of the quartile class
i = the class interval size
Deciles:
k in the formula is equal to 1, 2, 3, ..., 10.

$$D_k = L_{D_k} + \left(\frac{\frac{kN}{10} - {<}CF}{f_{D_k}}\right) i$$

Percentiles:
k in the percentile formula is equal to 1, 2, 3, ..., 100.

$$P_k = L_{P_k} + \left(\frac{\frac{kN}{100} - {<}CF}{f_{P_k}}\right) i$$
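Because the three grouped formulas differ only in the divisor (4, 10, or 100), they can be sketched as one Python function. The name `grouped_quantile`, its parameters, and the use of the Table 3.1 scores as sample data are assumptions made for this illustration; the sketch also assumes equal class sizes and whole-number class limits.

```python
# Scores of students in Statistics (Table 3.1) as (lower limit, upper limit, frequency)
classes = [(40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3), (60, 64, 10),
           (65, 69, 2), (70, 74, 5), (75, 79, 8), (80, 84, 3), (85, 89, 6),
           (90, 94, 2)]


def grouped_quantile(classes, k, parts):
    """k-th point when the distribution is divided into `parts` equal parts.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts                           # kN/4, kN/10 or kN/100
    class_size = classes[0][1] - classes[0][0] + 1   # i
    cf_before = 0                                    # <CF before the current class
    for lower, _, f in classes:
        if f > 0 and cf_before + f >= target:
            return (lower - 0.5) + (target - cf_before) / f * class_size
        cf_before += f
    raise ValueError("k must not exceed the number of parts")


q1 = grouped_quantile(classes, 1, 4)
q2 = grouped_quantile(classes, 2, 4)      # same as the median
q3 = grouped_quantile(classes, 3, 4)
d4 = grouped_quantile(classes, 4, 10)
p90 = grouped_quantile(classes, 90, 100)
print(q1, q2, q3, d4, p90)
```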

Summary:
The most common and most widely used point measure is the percentile.
Any of the foregoing point measures gives the values obtained from a set of observations
a common and meaningful frame of reference. If an individual is given a percentile rank of 75,
this means that in a typical sample of 100, he would rank above 75 individuals. If the
researcher wants to locate the point below which one-fourth of the cases fall, the quartile is
used; the decile is used when the distribution is divided into tenths.

Activity:
A. Find the IQ scores that correspond to the following positions: Q2, D4, D6, D3, P10, P75, P90

IQ Scores of 22 students.
87 90 95 96 97 98 99 100 100 100 100
88 102 102 103 105 105 105 107 108 110

B. Find Q3, Q1, D5, D8, P25, P75, and P50 of the distribution below. Write the
formula and the solution.
Class Interval Frequency
7-9 13
10-12 18
13-15 25
16-18 20
19-21 17
22-24 10

Lesson 3.3.Measures of Variability

Overview:
The measures of variation enable us to know how varied the observations
are, whether there are extreme values in the distribution, or whether the
values are very close to each other. If the measure is zero, it means that there
is no variation at all. The observations are all alike, or homogeneous.
Otherwise, they are heterogeneous. The common measures of variation are
the range, mean absolute deviation, variance, standard deviation, and coefficient of variation.

Objectives:
1. Determine the measures of variability.
2. Describe the characteristics of each measure of variability.
3. Compute the range, mean absolute deviation, variance, standard deviation, and CV.

Content:

Range
The range is the simplest measure of the variation of a distribution. To get
the range, subtract the lowest score or observation from the highest score.

R = Highest observation – Lowest observation

Mean Absolute Deviation

To find the mean absolute deviation, subtract the mean from each raw
score, then take the absolute values of the differences and add them.
The sum is called the sum of the absolute deviations from the mean. Next, divide this number
by N, the total number of cases. In symbols:

$$MAD = \frac{\sum |x - \bar{x}|}{N} \quad \text{for ungrouped data}$$

$$MAD = \frac{\sum f\,|x - \bar{x}|}{N} \quad \text{for grouped data}$$

Ex. Find the MAD of the ages of scientists: 34, 35, 45, 56, 32, 25, and 40.
Solution:
Find the mean: (34 + 35 + 45 + 56 + 32 + 25 + 40)/7 = 267/7 = 38.14
x       x − x̄     |x − x̄|
34      -4.14      4.14
35      -3.14      3.14
45      6.86       6.86
56      17.86      17.86
32      -6.14      6.14
25      -13.14     13.14
40      1.86       1.86
Total              53.14
$$MAD = \frac{53.14}{7} = 7.59$$
This means that, on average, the age of a scientist is 7.59 years above or below the mean age of 38.14 years.
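The same computation can be sketched in Python. The variable names are assumptions for this illustration; the data are the seven ages from the example.

```python
ages = [34, 35, 45, 56, 32, 25, 40]

n = len(ages)
mean_age = sum(ages) / n                                # 267 / 7

absolute_deviations = [abs(x - mean_age) for x in ages]  # |x - mean| for each age
mad = sum(absolute_deviations) / n

print(round(mean_age, 2), round(sum(absolute_deviations), 2), round(mad, 2))
```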

Variance
The variance is another measure of variation which can be used instead of the range.
The variance considers the deviation of each observation from the mean. To obtain
the variance of a distribution, compute the deviation of each raw score from the mean.
Then, square the deviations and add them. Finally, divide the
resulting sum by N, the total number of cases, for the population variance, or by N − 1
for the sample variance.
A. Grouped Data:
1. Population Variance for Grouped Data

$$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{N}$$

2. Sample Variance for Grouped Data

$$s^2 = \frac{N \sum f x^2 - \left(\sum f x\right)^2}{N (N - 1)}$$

B. Ungrouped Data:
1. Population Variance for Ungrouped Data

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$$

2. Sample Variance for Ungrouped Data

$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$$

Application: Fill in the table and compute the population and sample variance of the data.
Table 3.3.1. IQ Scores
IQ Score   f    x    fx    x²    fx²    x − x̄    (x − x̄)²    f(x − x̄)²
75-79      10
80-84      12
85-89      25
90-94      34
95-99      19
100-104    15
           N=115     ∑fx=        ∑fx²=                        ∑f(x − x̄)²=
Standard Deviation
The standard deviation is another measure of variability. It is the most commonly used
indicator of the degree of dispersion and is also the most dependable measure for
estimating the variability in the total population from which the sample came.

a. Population Standard Deviation (σ)

$$\sigma = \sqrt{\sigma^2}$$

b. Sample Standard Deviation (s)

$$s = \sqrt{s^2}$$
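The variance and standard deviation formulas can be checked with Python's standard `statistics` module. This is a sketch under stated assumptions: the ungrouped data are the seven ages of scientists used in the MAD example, the grouped data are the class marks and frequencies of Table 3.1, and all variable names are chosen only for illustration.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

# Ungrouped data: ages of scientists
ages = [34, 35, 45, 56, 32, 25, 40]
pop_var = pvariance(ages)     # divides the sum of squared deviations by N
samp_var = variance(ages)     # divides the sum of squared deviations by N - 1
pop_sd = pstdev(ages)         # square root of the population variance
samp_sd = stdev(ages)         # square root of the sample variance

# Grouped data: (class mark x, frequency f) from Table 3.1
grouped = [(42, 4), (47, 3), (52, 4), (57, 3), (62, 10), (67, 2),
           (72, 5), (77, 8), (82, 3), (87, 6), (92, 2)]
n = sum(f for _, f in grouped)
sum_fx = sum(f * x for x, f in grouped)
sum_fx2 = sum(f * x * x for x, f in grouped)
grouped_mean = sum_fx / n

# Population variance: sum of f(x - mean)^2 divided by N
grouped_pop_var = sum(f * (x - grouped_mean) ** 2 for x, f in grouped) / n
# Sample variance: (N * sum fx^2 - (sum fx)^2) / (N (N - 1))
grouped_samp_var = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

print(round(pop_var, 2), round(samp_var, 2), round(pop_sd, 2), round(samp_sd, 2))
print(round(grouped_pop_var, 2), round(grouped_samp_var, 2), round(sqrt(grouped_samp_var), 2))
```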

Coefficient of Variation
The coefficient of variation is a measure of relative variability. It may be defined
as the ratio of the standard deviation to the arithmetic mean, expressed as a
percentage. This measure is used to compare two sets of data to determine if they are
similarly or differently “scattered”. The CV formula is:
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$

Application: Suppose two groups of students are to be compared in terms of height.
Group    Mean height   Standard deviation   CV
Male     162 cm        10 cm                6.17%
Female   148 cm        4 cm                 2.70%
Solution:
$$CV_{male} = \frac{10}{162} \times 100\% = 6.17\%$$
$$CV_{female} = \frac{4}{148} \times 100\% = 2.70\%$$
Comparing the relative variation in the heights of the male and female
students, it can be seen that the male students have a higher CV than the female
students. Thus, the heights of the male students are more varied.
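The comparison can be sketched in Python. The function name `coefficient_of_variation` is an assumption for this example; the means and standard deviations are those given in the application.

```python
def coefficient_of_variation(standard_deviation, mean):
    """CV = (standard deviation / mean) x 100, expressed in percent."""
    return standard_deviation / mean * 100


cv_male = coefficient_of_variation(10, 162)
cv_female = coefficient_of_variation(4, 148)

more_varied = "male" if cv_male > cv_female else "female"
print(round(cv_male, 2), round(cv_female, 2), more_varied)
```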

Summary:
To facilitate easy and accurate computation of the mean, standard deviation,
and variance, a scientific calculator may be used. Note: Every calculator works
differently, so you should know how your calculator works to perform the task.

Steps in finding the mean and the standard deviation for ungrouped data:
1. Clear the memory.
The memory of your calculator may be cleared by pressing the reset button at
the back or by pressing the shift key, then AC/ON, then the = button.
2. Set the calculator to the SD mode or its equivalent (Stat, Stat0, or Stat 1
mode).
3. Input the data one at a time, each followed by the M+ key.
4. When all the data have been stored, press the shift-keys followed by the =
button for the mean.
5. Using the shift-, the standard deviation is obtained.
6. The variance is simply calculated by squaring the standard deviation.

For grouped data:

1. Clear the memory.
2. Set the calculator to the LR mode or its equivalent (Stat, Stat0, or REG-Lin mode).
3. Input the data one at a time as follows:
class mark, ; key, M+ key
4. When all the data have been stored, press the shift-keys followed by the = button
for the mean.
5. Using the shift-, the standard deviation is obtained.
6. The variance is simply calculated by squaring the standard deviation.

Activity:
1. At SKSU-Tacurong Campus, upon the release of the CHED allowance of the
scholars, twenty scholars pledged the following donations in pesos to the
PPA of nurses during the pandemic.
50 100 75 40 60 50 200 25 30 35
50 45 60 100 200 200 150 150 100 45
Compute the range, MAD, variance, standard deviation, and coefficient of
variation. Show the formula and solution. Check your computation using the
calculator program.
2. Calculate the population and sample standard deviation of the following
scores:
15 15 25 13 32 30 23 26 65 45 44

3. Which group is the most heterogeneous?


Group1 scores: 100 123 122 150 146 141 132 122
Group2 scores: 102 102 132 154 124 136 125 135
Group3 scores: 150 120 130 114 112 105 136 104
Show your solution as proof of your answer.
4. Find the variance, standard deviation, and CV of the bacterial colonies
after 24 h of exposure.
Bacterial colony   f
0-5 2
6-11 3
12-17 5
18-23 8
24-29 7
30-35 4
36-41 5
42-47 3
48-53 3
Chapter 4. Health Care Statistics

Overview :
This special topic is included to provide health science students, especially students in the
nursing program and biology students conducting health-related research, with basic
information on essential health care statistics, the use of statistical formulae, and the
interpretation of statistical calculations for the analysis of patient health conditions.

Objectives:
1. To identify the parameters in health research.
2. To determine the data sampling and collection methods in health research.
3. To calculate and process health data for analysis.

Lesson 4.1.Fundamental Counting Techniques

Overview:
Health statistics data are data that are collected from hospital in-patients and out-
patients and they are recorded by those who work in the health care industry.
Quantitative research guides health care decision makers with statistics--numerical
data collected from measurements or observation that describe the characteristics
of specific population samples. Descriptive statistics summarize the utility, efficacy
and costs of medical goods and services. Increasingly, health care organizations
employ statistical analysis to measure their performance outcomes.

Objectives:
1. define the health data
2. identify the vital statistics
3. determine the importance of the vital statistics.

Content:
Morbidity and Mortality
Morbidity pertains to disease cases; morbidity data are obtained to show the
occurrence of disease. Mortality refers to death cases; mortality data show the
occurrence of death.

Demographic Variables
Since health care is concerned with human health conditions, the characteristics of the
human population must be studied. Data which describe the human population, such
as age, gender, income, and health status, need to be considered in health care
analysis. These variables are referred to as demographic variables. The size of the
human population, and how it changes over time, is also considered a demographic
variable.

Vital Statistics
Data which record the significant events and dates in a human life are vital
statistics. A few vital statistics that are important to record are birth, death,
marriage, mortality, and morbidity.

Types of Health Statistics Data


Dates, test results, diagnoses, treatment procedures, treatment outcomes, and
assessments are a few of the types of health statistics data. These data can be found in
the patient's medical records, admission and discharge reports, transfer records, and census
reports, and they are very useful to physicians in correctly diagnosing a patient and giving
precise treatment.

Data Requestors
Health statistics data are also important to the hospital administration in determining
and assessing the quality of service rendered by its staff to the patients. The following
are the usual requestors:
1. administration and governing board
2. medical staff
3. outside agencies – DOH, LGU, researchers from the academe, pharmaceutical companies,
economists, and others
4. other organization-

Importance of Health Statistics:

The ten great public health achievements identified by the CDC that were made possible
only by health statistics and research are:
1. Routine immunization of children
2. Motor-vehicle safety
3. Workplace safety
4. Control of infectious diseases
5. Declines in deaths from heart disease and stroke
6. Safer and healthier foods
7. Healthier mothers and babies
8. Family planning
9. Fluoridation of drinking water
10. Recognition of tobacco as a health hazard
Two major achievements in the 21st century made possible by health statistics and
research are:
• Personalized medicine: In personalized medicine, an individual's genetic
profile and his or her unique biochemistry are used to customize treatment.
For example, which medicines are likely to provide the best results with the
fewest side effects?
• Disease modification: If a person is diagnosed early enough, it might be
possible to inhibit the disease so that it never debilitates the person.
The other benefits of health statistics and research are in the fields of prolonging life;
preventing diseases by identifying lifestyle risk factors; preventing infectious diseases
through vaccines and randomized controlled trials; preventing disabilities; improving access
to health services; and understanding lifestyles and cultural norms and working around them.

Summary:

Health statistics data must be handled carefully and with full knowledge. They must be
properly processed and analyzed to arrive at valid results that serve their
purpose. Hospitals and other large provider service organizations implement data-driven,
continuous quality improvement programs to maximize efficiency.
Government health and human service agencies gauge the overall health and well-
being of populations with statistical information. Researchers employ scientific
methods to gather data on human population samples. The health care industry
benefits from knowing consumer market characteristics such as age, sex, race,
income and disabilities. These "demographic" statistics can predict the types of
services that people are using and the level of care that is affordable to them.

Activity:
1. Where can a researcher collect vital health statistics of patients who tested positive
for the Covid virus in Region 12?
2. Suppose you are researching the medicines effective against pneumonia.
Give at least 5 possible data items you will gather in the hospitals to address your
research problem.
3. Give the importance of demographic health profiles.

Lesson 4.2. Rates

Overview:

Health care facilities devise and use rates to determine the percentage of an event.
The formulae in the following section are based on the rate formula,
rate = (n / N) x 100%, where n is the number of times something happens and N is the
number of times it could have happened.
Objectives:
1. Compute the death rates and morbidity rates.
2. Identify the data needed for death and morbidity rates.
3. Analyze the implication of the result of the computed rates.

Content:

Death and Mortality Rate

The death or mortality rate may be classified as the gross death rate (GDR) or the net
death rate (NDR). The gross death rate represents the death rate including all deaths,
while the net death rate represents the death rate excluding deaths under 48 hours after
admission. The following formulae are used.

a. $$GDR = \frac{\text{number of deaths}}{\text{total number of discharges}} \times 100\%$$

b. $$NDR = \frac{\text{number of deaths} - \text{deaths under 48 hrs}}{\text{total number of discharges} - \text{deaths under 48 hrs}} \times 100\%$$

The formula below is applied to determine the newborn death rate (NBDR).

c. $$NBDR = \frac{\text{number of newborn deaths}}{\text{total number of newborn discharges including deaths}} \times 100\%$$

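Ex. The following data were obtained from MCU-FDTMF. Calculate the GDR, NDR,
and NBDR.
                 Admissions   Discharges   Deaths <48 hrs   Deaths ≥48 hrs
Adult/children   285          301          2                13
Newborn          12           19           1                3

Solution:
The total number of deaths is 2 + 13 + 1 + 3 = 19, the total number of discharges is
301 + 19 = 320, and the number of deaths under 48 hours is 2 + 1 = 3.
$$GDR = \frac{19}{320} \times 100\% = 5.94\%$$
$$NDR = \frac{19 - 3}{320 - 3} \times 100\% = \frac{16}{317} \times 100\% = 5.05\%$$
Thus the GDR is 5.94%, or 5.94 deaths for every 100 discharges, and the NDR is 5.05%.

For newborns, there are 1 + 3 = 4 deaths and 19 + 4 = 23 discharges including deaths.
$$NBDR = \frac{4}{23} \times 100\% = 17.39\%$$
Hence, the newborn death rate is 17.39%.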
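The three death rates can be sketched in Python. The function names are assumptions made for this illustration, and the inputs are the totals used in the worked solution (19 deaths, 320 discharges, 3 deaths under 48 hours, 4 newborn deaths, 23 newborn discharges including deaths).

```python
def gross_death_rate(deaths, discharges):
    """GDR = deaths / total discharges x 100."""
    return deaths / discharges * 100


def net_death_rate(deaths, discharges, deaths_under_48h):
    """NDR removes deaths under 48 hours from both the numerator and the denominator."""
    return (deaths - deaths_under_48h) / (discharges - deaths_under_48h) * 100


def newborn_death_rate(newborn_deaths, newborn_discharges_incl_deaths):
    """NBDR = newborn deaths / newborn discharges (including deaths) x 100."""
    return newborn_deaths / newborn_discharges_incl_deaths * 100


gdr = gross_death_rate(deaths=19, discharges=320)
ndr = net_death_rate(deaths=19, discharges=320, deaths_under_48h=3)
nbdr = newborn_death_rate(newborn_deaths=4, newborn_discharges_incl_deaths=23)

print(round(gdr, 2), round(ndr, 2), round(nbdr, 2))
```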
Morbidity rate
Morbidity pertains to disease. The morbidity rate can be measured or calculated in four
ways, according to prevalence, incidence, complications, and fatality.

a. Prevalence rate
The prevalence rate is the ratio between the known cases of a disease and the entire
population.

$$\text{Prevalence} = \frac{\text{number of cases}}{\text{population}} \times 100\%$$

Example:
A total of 42,325 tuberculosis cases are known in the country. Compute the
prevalence of the disease if the current Philippine population is 87 million.

Solution:
$$\text{Prevalence} = \frac{42{,}325}{87{,}000{,}000} \times 100\% = 0.0486\%$$
Therefore, the prevalence of tuberculosis in the country is 0.0486 per 100 population, or
4.86 per 10,000 population.

b. Incidence Rate:
While prevalence refers to the existing known cases of a disease, the incidence
rate refers to the rate of newly reported cases of the disease.

$$\text{Incidence} = \frac{\text{newly reported cases}}{\text{population at the midperiod}} \times 100\%$$

Example:
Of the 87 million population of the country in the recent year, 150,000 new
cases of diabetes mellitus were reported to the Health Department.
Determine the incidence of diabetes mellitus.

Solution:
$$\text{Incidence} = \frac{150{,}000}{87{,}000{,}000} \times 100\% = 0.1724\%$$
The incidence of diabetes mellitus is 172.4 per 100,000 population.

c. Complication Rate
A complication refers to a disorder which arises after admission and modifies the
patient's condition. Medical malpractice may result in a complication or even
the death of a patient.

$$\text{Complications} = \frac{\text{complication cases}}{\text{population at risk}} \times 100\%$$

Example:
In January 2006, 3 out of the 62 cancer patients had undergone a surgical
procedure due to complications. Calculate the complication rate.

Solution:
$$\text{Complications} = \frac{3}{62} \times 100\% = 4.84\%$$
The complication rate, therefore, is 4.84 for every 100 patients at risk.

d. Fatality Rate
The fatality rate is the rate of deaths due to a particular disease.

$$\text{Fatality} = \frac{\text{number of deaths from a given disease}}{\text{number of disease cases reported}} \times 100\%$$

Example: If there were 75,895 hypertension cases reported in the year 2000 and
3,612 died, what is the fatality rate of hypertension that year?

Solution:
$$\text{Fatality} = \frac{3{,}612}{75{,}895} \times 100\% = 4.76\%$$
The data reveal a fatality rate of 4.76%, or 4.76 deaths for every 100 reported cases
of hypertension.
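All four morbidity rates follow the same pattern, n divided by N, scaled to a chosen base. A minimal Python sketch is given below; the function name `rate` and its `per` parameter are assumptions made for this illustration, and the inputs are the figures from the four examples.

```python
def rate(n, total, per=100):
    """Number of events n out of `total`, expressed per `per` (100 gives a percentage)."""
    return n / total * per


# a. Prevalence of tuberculosis: 42,325 known cases in a population of 87 million
prevalence_pct = rate(42_325, 87_000_000)
prevalence_per_10k = rate(42_325, 87_000_000, per=10_000)

# b. Incidence of diabetes mellitus: 150,000 new cases in 87 million
incidence_pct = rate(150_000, 87_000_000)
incidence_per_100k = rate(150_000, 87_000_000, per=100_000)

# c. Complication rate: 3 of 62 cancer patients
complication_pct = rate(3, 62)

# d. Fatality rate of hypertension: 3,612 deaths among 75,895 reported cases
fatality_pct = rate(3_612, 75_895)

print(round(prevalence_pct, 4), round(prevalence_per_10k, 2))
print(round(incidence_pct, 4), round(incidence_per_100k, 1))
print(round(complication_pct, 2), round(fatality_pct, 2))
```

Changing `per` only changes the base of the statement (per 100, per 10,000, per 100,000); the underlying ratio stays the same.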

Summary:
1. If you cannot measure it, you cannot improve it. Meaningful quality
improvement must be data-driven.
2. Managed care means managing the processes of care, not managing
physicians and nurses.
3. The right data in the right format at the right time in the right hands. If
clinicians are going to manage care, they definitely need data. They need the right
data delivered in the right format, at the right time, and in the right place. And the
data have to be delivered into the right hands: the clinicians involved in operating and
improving any given process of care.

Activity:
(Note: The data used in the problems are for computation purposes only.)

1. The following was reported by the South Cotabato Provincial Hospital in
January 2006.

Services    Admissions   Discharges   Deaths <48 hrs   Deaths ≥48 hrs
Surgical    98           101          8                144
Pediatric   28           27           2                7
Newborn     12           9            1                1

Calculate:
a. Gross Death Rate
b. Net Death Rate
c. Fetal Mortality

2. A total of 12,000 Filipinos were diagnosed with HIV in 2000 of which 4,500
are females and the rest are males. If the Philippines population at that time
was 85 million, determine the following:
a. Prevalence of HIV for every 1,000.
b. Prevalence of HIV in males for every 1,000.
c. Prevalence of HIV in females for every 1,000

3. The Department of Health reported the following cases:

Disease          New Diagnosis   Death
Covid 19         152             8
Breast Cancer    71              5
Syphilis         324             3
Determine the following:


a. Incidence of Covid 19
b. Incidence of Breast Cancer
c. Incidence of Syphilis
d. Fatality of Breast Cancer
e. Fatality of Covid 19

Chapter 5. Hypothesis Testing

Overview:
Testing the significance of the difference between two means, two standard deviations,
two proportions, or two percentages is an important area of inferential statistics.
Comparisons between two or more variables often arise in research or in experiments, and
to make valid conclusions regarding the result of a study, one has to apply an appropriate
test statistic. This chapter discusses the different test statistics that are
commonly used in research studies.
Objectives:
1. Formulate hypotheses.
2. Discuss the level of significance as the probability of committing an error.
3. Compute the z-test and the t-test.
4. Analyze and interpret the result of statistical testing.

Lesson 5.1. Statistical Hypothesis:

Overview:
A statistical hypothesis is a preconceived idea about the value of a population parameter
which can be validated or verified through statistical procedure or tests. It is an assertion,
presumption, or tentative theory which aims to explain facts about the real world. In
attempting to reach decisions, it is advantageous to make assumptions about the target
populations. Such assumptions, which may be correct or not are called statistical
hypotheses.

Objectives:
1. Formulate the null and alternative hypotheses.
2. Differentiate Type I and Type II errors.
3. Identify the steps in hypothesis testing.
4. Discuss the three types of alternative hypothesis.

Content:

Null Hypothesis, denoted by Ho, is a statement that there is no significant
relationship or no significant difference between two or more variables, or that one variable does
not affect another variable. In statistical research, hypotheses should be written in null form.
In many instances, we formulate a statistical hypothesis for the sole purpose of rejecting or
nullifying it.
For example, suppose we want to know whether method A is more effective than method B
in teaching high school Mathematics. Ho: There is no significant difference between the
effectiveness of Method A and Method B.

Alternative hypothesis- any hypothesis that differs from a given null hypothesis is called an
alternative hypothesis, denoted by Ha. It is sometimes considered the researcher's
working hypothesis.
For example: If Ho: p = 0.5, the alternative hypothesis might be Ha: p ≠ 0.5, or Ha: p > 0.5, or
Ha: p < 0.5.
Rejection of the null hypothesis leads to the acceptance of the alternative hypothesis.

Types of Decision errors


When dealing with hypothesis tests, there are four possible outcomes: two
outcomes lead to an incorrect decision and the other two lead to a correct decision. The
outcomes are described in the given table.

Table 5.1 Possible Outcomes for a Hypothesis Test

Decision                         H0 is true                H0 is false
Fail to reject H0 (accept H0)    Correct decision          Type II error
                                 (probability = 1 − α)     (probability = β)
Reject H0                        Type I error              Correct decision
                                 (probability = α)         (probability = 1 − β)

Based on the given table, a researcher commits an error if a true H0 is rejected or
a false H0 is accepted. When a researcher rejects a true H0, he commits a Type I error or
alpha error (α). When a researcher accepts a false H0, he commits a Type II error or beta
error (β).

1. Type I error or alpha error (α). A Type I error is committed when the
researcher rejects a null hypothesis when in fact it is true.
2. Type II error or beta error (β). A Type II error is committed when the
researcher accepts a null hypothesis when in fact it is false.

Level of Significance
When a researcher tests a hypothesis, he is not certain that the decision is 100%
correct. However, he is confident at a certain level that the decision is correct, say 99%.
In that case, the confidence level is 99% and the level of significance
is 1%. When the confidence level is 95%, the level of significance is 5%, and
when the confidence level is 90%, the level of significance is 10%. The higher the
confidence level, the more certain it is that a decision to reject the null hypothesis is
correct.

Level of significance is the probability of committing a Type I error or alpha (α) error,
that is, the probability of rejecting a true null hypothesis.

Power of a Test
Power of a test is the probability of not committing a Type II error or beta (β) error, that is, 1 − β.

Test Statistics
The test statistic is a mathematical formula that allows researchers to determine the
likelihood of obtaining the sample outcomes if the null hypothesis were true. The value of the
test statistic is used as the basis for deciding whether to reject or accept the null hypothesis.
The rejection region lies at either the left or the right tail of the normal curve if a one-tailed
test is being used. On the other hand, the rejection region lies at both tails of the normal
curve if a two-tailed test is utilized.

[Figure: normal curve showing the non-rejection region and the rejection region.]

• Rejection Region. When the test statistic lies in the rejection region, the
null hypothesis will be rejected. The area of the rejection region equals the
level of significance (α), the probability of making a Type I error.
• Non-Rejection Region. The non-rejection region is also known as the
acceptance region; its area is 1 − α. When the test statistic lies within the
non-rejection region, that is, when the critical value is greater than the
computed value of the test statistic, the null hypothesis will be accepted.

• Critical Value. The critical value is the value that separates the non-rejection
region and the rejection region.

One tailed and Two tailed-test

The use of a one-tailed or a two-tailed test depends on how the alternative hypothesis is
formulated. If the alternative hypothesis is non-directional, use the
two-tailed test. If the alternative hypothesis is directional, use the one-tailed test.
In a two-tailed test, the two rejection regions lie at both tails of the normal curve, and each
has half of the alpha value. If α = 0.05, the area in each tail is α/2 = 0.025. In a
one-tailed test, the rejection region lies at either the left or the right tail of the normal curve.

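[Figure: right-tailed test, Ha: μ > μ0. Rejection region of area α in the right tail beyond the critical value z = 1.645; non-rejection region of area 1 − α.]

[Figure: two-tailed test, Ha: μ ≠ μ0. Rejection regions of area α/2 in each tail; non-rejection region of area 1 − α.]

[Figure: left-tailed test, Ha: μ < μ0. Rejection region of area α in the left tail beyond the critical value z = −1.645; non-rejection region of area 1 − α.]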
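
The critical values that bound these regions can also be computed instead of read from a table. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the helper name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist


def z_critical(alpha, tails=1):
    """Positive critical z for significance level alpha.

    tails=1 puts all of alpha in one tail; tails=2 puts alpha/2 in each tail.
    For a left-tailed test, use the negative of the returned value.
    """
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: one-tailed {z_critical(alpha, 1):.3f}, "
          f"two-tailed {z_critical(alpha, 2):.3f}")
```

Splitting alpha between the two tails is the only difference between the one-tailed and two-tailed cases, which is why a two-tailed test needs a larger critical value at the same level of significance.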

Steps in Hypothesis Testing


The goal of hypothesis testing is to determine whether a claim about a population parameter, such as the
mean, is likely to be true. The following are the six steps of hypothesis testing:

1. State or formulate the null hypothesis (Ho) and the alternative hypothesis (Ha).

2. Specify the level of significance (α) to be used. The level of significance is the statistical standard
which is specified for rejecting the null hypothesis (Ho). If a 5% level of significance is
used, there is a probability of 0.05 of rejecting the Ho when it is true. The most frequently used
levels of significance in hypothesis testing are the 5% and the 1% levels.

3. Select the most appropriate test statistic or statistical tool. There is specific statistical tool or test
statistic that is appropriate for each kind of statistical hypothesis. Identify also the type of
statistical test as either one –tailed test or two-tailed test depending how the alternative
hypothesis is being expressed.

4. Compute the actual value of the test statistic from the sample data (i.e. z-test, t-test, F-test,
etc.).

5. Establish the critical (rejection) region or the tabular value for the selected test statistic from the
statistical table based on the degrees of freedom (for the t-test and F-test) and the level of
significance (α). Take note of the type of statistical test to be used, whether it is a one-tailed test or
a two-tailed test, as elaborated in steps 1 and 3.

6. Make the decision, conclusion and recommendation/s. The computed or observed value of the
test statistic is compared with the tabular or critical value (or values) of the test statistic. This
is the basis for whether to accept or reject the null hypothesis. Accepting the Ho implies rejecting
the alternative hypothesis (Ha); in like manner, rejecting the Ho means accepting the Ha. Given
below are guidelines in making a decision for a given null hypothesis:

6.1. Reject the null hypothesis (Ho) if the absolute value of the computed value is greater than or
equal (≥) to the tabular value.

6.2. Accept the null hypothesis (Ho) if the absolute value of the computed value is less than (<) the
tabular value.

Making conclusion and recommendation are the last part in hypothesis testing. At this point, the
researcher will explain his decision based on the result of his statistical analysis.
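
Guidelines 6.1 and 6.2 amount to a single comparison. As a minimal sketch, assuming the computed and tabular values are compared in absolute value (the function name `decide` and the sample numbers are illustrative only):

```python
def decide(computed, tabular):
    """Apply the decision guidelines of step 6, comparing absolute values."""
    if abs(computed) >= abs(tabular):
        return "Reject Ho"
    return "Accept Ho"


# Illustrative values: a computed z of -6.32 against a tabular value of 1.96,
# and a computed t of 1.65 against a tabular value of 1.729.
print(decide(-6.32, 1.96))
print(decide(1.65, 1.729))
```

Comparing absolute values lets the same rule serve left-tailed, right-tailed and two-tailed tests, provided the tabular value was looked up for the correct type of test.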

Summary:

Interpreting the outcome of the research may not just end by simply saying the null hypothesis is
accepted or rejected.

It is the primary obligation of the researcher to further explain the implication of the result, to draw
conclusions by answering the original problem, and to make the necessary recommendations. In some
instances, this should be supported by a related review of literature.

Activity:

A. Formulate the null hypothesis (Ho) and 3 possible alternative hypotheses (Ha) for each of the following
problems, and identify whether the hypothesis test is left-tailed, right-tailed or non-directional. Illustrate
the rejection region and indicate the % of acceptance.

1. A doctor wants to know if the average recovery time of a patient taking a particular
medication is one month. Consider a 5% level of significance.

2. The researcher wants to determine if there is a significant difference between the
performance of the faculty from the different colleges at the 1% level of significance.

3. A drug store wants to know if the average sale of paracetamol is more than 100 per day.
Lesson 5.2. z- test : Testing Hypothesis

Overview:

A Z-test is a type of hypothesis test, a way for you to figure out if results from a test are valid or
repeatable.

For example, if someone said they had found a new drug that cures cancer, you would want to be sure it
was probably true. A hypothesis test will tell you if it's probably true, or probably not true. A Z-test is
used when your data is approximately normally distributed (i.e. the data has the shape of a bell
curve when you graph it).

Objectives:

1. Identify the requirements and characteristics of z-test.

2. Compute and process the data in z-test.

3. Follow the steps of hypothesis testing in z-test analysis.

4. Analyze the z result of the data.

Content:

Z-test on the Comparison Between the Population Mean and Sample Mean

A significance test can be applied to test whether a mean based on a sample of size n differs
significantly, or otherwise, from a population mean, μ. The one-sample z-test is a statistical test for the
mean of a population and can be used when the following requirements are satisfied:

1. When we want to test the significance of the difference between the population mean (μ) and the sample
mean.

2. When the sample size is large (n ≥ 30).

3. When the population (N) is normally distributed.

4. When the population variance (σ²) or population standard deviation (σ) is known. However, if
the population standard deviation is not known, a z-test is still applicable provided that the
sample size is sufficiently large (n > 30) and the sample data are normally
distributed.

5. When the samples are independent or taken at random.


$$ z_0 = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \quad \text{(if } \sigma \text{ is known)} $$

$$ z_0 = \frac{\bar{x} - \mu}{s_d / \sqrt{n}} \quad \text{(if } \sigma \text{ is unknown, with large samples, } n > 30 \text{, and normally distributed sample data)} $$

Where z0 = z-computed value (one sample case)
x̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
sd = sample standard deviation
σ/√n or sd/√n = standard error of the mean or standard deviation of the
mean.

The tabular or critical values of z are obtained from the following table.

Table 5.2.1 Critical Values of z.

                     Level of Significance
Test Type            0.10      0.05      0.025     0.01
One-tailed test      ±1.28     ±1.645    ±1.96     ±2.33
Two-tailed test      ±1.645    ±1.96     ±2.24     ±2.58

Example:
A company that makes children's battery-operated toy cars claims that its products have a mean life span of 5
years with a standard deviation of 2 years. Test the hypothesis that μ is equal to 5 years against the alternative
hypothesis that μ ≠ 5 years if a random sample of 40 toy cars is tested and found to have a mean life span of only
3 years. Use the 0.05 level of significance.
1. H0: The mean life span of the battery-operated toy cars is 5 years
(H0: μ = 5 years)
Ha: The mean life span of the battery-operated toy cars is not 5 years
(Ha: μ ≠ 5 years)


2. α = 5% or 0.05; two-tailed
3. Use the z-test as the test statistic.
4. Computation:
Given: x̄ = 3
μ = 5
σ = 2
n = 40

$$ z_c = z_0 = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3 - 5}{2 / \sqrt{40}} = -6.32 $$

5. Critical region: ztab = ±1.96

6. Decision: Since the absolute value of the computed z (|zc| = 6.32) is greater than ztab = 1.96,
reject the Ho: μ = 5 years at the α = 5% level of significance (two-tailed test) and accept the
alternative hypothesis. Thus, there is enough evidence that the mean life span of the toys
is not equal to 5 years.
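
The toy-car example can be reproduced in a few lines. This is a minimal sketch under the example's assumptions (σ known, two-tailed test at α = 0.05); the function name `one_sample_z` is hypothetical, and the critical value is computed with the standard library's `NormalDist` rather than read from Table 5.2.1.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


alpha = 0.05
z_computed = one_sample_z(sample_mean=3, pop_mean=5, pop_sd=2, n=40)
z_tabular = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject_h0 = abs(z_computed) >= z_tabular

print(f"z computed = {z_computed:.2f}, z tabular = ±{z_tabular:.2f}")
print("Reject Ho" if reject_h0 else "Accept Ho")
```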

z-test: Testing the Differences between Two means (large Independent samples)

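$$ z_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$

if the population standard deviations are known;

$$ z_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

if the population standard deviations (σ) are unknown but n > 30.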

Example:

A tissue culture propagator of a rare cardboard ornamental plant wants to validate the claim that there is
no significant difference in the mean survival of cardboard in culture media A and culture media B. He
randomly selected 55 samples in each media. The mean survival and standard deviation are shown in the
table below. At the 1% level of significance, determine if there is enough evidence to reject the claim of no
difference in the mean survival in media A and media B.

                      Media A       Media B
Mean survival         x̄1 = 94      x̄2 = 90
Standard deviation    sd1 = 4.2     sd2 = 3.8
Sample size           n1 = 55       n2 = 55

Solution (following the steps on hypothesis testing)

Step 1. Ho: x̄1 = x̄2 (Claim): There is no significant difference between the mean survival in media A and
media B.
Ha: x̄1 ≠ x̄2: There is a significant difference between the mean survival in media A and
media B.
Step 2. The level of significance is α = 1%.
Step 3. Z-test (two sample case or two means). This is a two-tailed test (Ha: x̄1 ≠ x̄2).
Step 4. Compute the test value using the formula for an unknown population standard deviation (σ) with n > 30.

$$ z_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{94 - 90}{\sqrt{\dfrac{4.2^2}{55} + \dfrac{3.8^2}{55}}} = 5.24 $$

Step 5. Determine the rejection or critical value from Table 5.2.1. Take note that this is a two-tailed
test, so the level of significance should be divided by 2 (α/2).

ztab(0.005) = ±2.58

Step 6. Decision rule and conclusion.

Since zc = 5.24 is greater than ztab(0.005) = 2.58, we reject the null hypothesis and
accept the alternative hypothesis.

Therefore, we conclude that the claim of no significant difference between the survival of cardboard
in media A and media B is not true. In this case, the mean survival of cardboard tissue culture using
media A is significantly higher than using media B.
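
As a check on the arithmetic, the two-sample z computation for the culture media example can be sketched as follows. The function name `two_sample_z` is an assumption for illustration, and the sample standard deviations stand in for the unknown σ because both samples are large (n = 55).

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """z = (x1_bar - x2_bar) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    return (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)


alpha = 0.01
z_computed = two_sample_z(mean1=94, mean2=90, sd1=4.2, sd2=3.8, n1=55, n2=55)
z_tabular = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed, alpha/2 in each tail
reject_h0 = abs(z_computed) >= z_tabular

print(f"z computed = {z_computed:.2f}, z tabular = ±{z_tabular:.2f}")
print("Reject Ho" if reject_h0 else "Accept Ho")
```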

Summary:

The following are the requirements to be considered in employing the z-test:

1.When we want to test significant difference between the population mean (μ) and the sample
mean.

2. When sample size is large (n≥ 30).

3. When the population (N) is normally distributed.

4. When the population variance (σ²) or population standard deviation (σ) is known. However, if
the population standard deviation is not known, a z- test is still applicable provided that the
sample size is sufficiently large (n>30) and the distribution of the sample data are normally
distributed.

5. When the samples are independent or taken at random.


Activity:

Show your solution following the 6 steps in hypothesis testing.

1. The mean yield of rice per hectare in Mindanao was established as 4 tons with a standard deviation of
350 kgs. A group of agriculture students from the college of agriculture of a certain SUC claims that the
mean harvest this year is less due to unfavorable weather conditions. A sample of 100 randomly selected
hectares averages 3,750 kgs per hectare. Test the hypothesis that the mean yield this year is no
different from the established mean using the 1% level of significance. Assume that the population is
normal.

2. Suppose that the standardized test for College Biology exists with a mean of 125 and standard
deviation of 8. A random sample of 40 college students from a normal population takes this
standardized test, and the resulting mean is 121. Do the randomly selected students perform below the
normal group? Use alpha 1% level.

Lesson 5.3 t-test and Analysis

Overview:

A t-test is a type of inferential statistic used to determine if there is a significant difference between the
means of two groups, which may be related in certain features. A t-test is used as a hypothesis testing
tool, which allows testing of an assumption applicable to a population.

Objectives:

1. Identify the requirements and characteristics of the t-test.

2. Compute and process the data in the t-test.

3. Follow the steps of hypothesis testing in t-test analysis.

4. Analyze the t result of the data.

Content:

t-test: Testing the Differences between the Population Mean and the Sample Mean .
In applying t-test for comparing two means, certain requirements or assumptions should be satisfied
such as:

1. The population must be at least approximately or nearly normally distributed.

2. The population should be independent ( samples are taken at random)

3. The population variances are homogeneous or equal.

4. The samples are small, less than 30 (n<30).

5. The population standard deviation (σ) is unknown ( hence, the sample standard deviation is
used instead).

6. An interval or ratio scale of measurement is used.

t-test formula:

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

Ex. The average length of time for people to vote using the old procedure during the
presidential election in precinct A is 55 minutes. Using computerization as a new election
method, a random sample of 20 registrants was used and found to have a mean length of
voting time of 30 minutes with a standard deviation of 1.5 minutes. Test the hypothesis that
the sample mean is less than the population mean at the 5% level of significance.

Solution: 1. Ho: x̄ = μ (There is no significant difference in the voting length of the traditional and
computerization methods.)

Ha: x̄ < μ (The traditional method of voting has a longer length of voting than the
computerization method.)
2. α = 5%; one-tailed

3. The t-test is the appropriate test statistic.

4. Computation:

Sample mean = 30 min.

μ = 55 min.

sd = 1.5 min.

n = 20

$$ t = \frac{(\bar{x} - \mu)}{s_d}\sqrt{n} = \frac{(30 - 55)}{1.5}\sqrt{20} = -74.54 $$

5. Determine the critical value or the tabular value (from the percentage points of the t distribution table):
df = n − 1; df = 20 − 1 = 19, one-tailed at the 5% level of significance.
ttab = 1.729, also referred to as the t-critical value.
6. Decision rule:
Since the absolute value of the computed t (|tc| = 74.54) is greater than the absolute value of the t-critical value,
which is ttab(0.05, df = 19) = 1.729, reject the Ho and accept the Ha.
Statistical analysis: There is a significant difference between the means.
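
A quick way to check the arithmetic of the voting-time example is sketched below. The function name `one_sample_t` is hypothetical. Python's standard library has no t distribution, so the tabular value 1.729 (df = 19, one-tailed, α = 0.05) is entered by hand from the t table rather than computed.

```python
from math import sqrt


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """t = (x_bar - mu) / (s / sqrt(n)), with df = n - 1."""
    return (sample_mean - pop_mean) / (sample_sd / sqrt(n))


t_computed = one_sample_t(sample_mean=30, pop_mean=55, sample_sd=1.5, n=20)
t_tabular = 1.729   # from the t table: df = 19, one-tailed, alpha = 0.05
reject_h0 = abs(t_computed) >= t_tabular

print(f"t computed = {t_computed:.2f}, t tabular = {t_tabular}")
print("Reject Ho" if reject_h0 else "Accept Ho")
```

The very large magnitude of t reflects how small the standard error (1.5/√20 ≈ 0.34 minutes) is relative to the 25-minute difference between the means.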
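When the two samples are drawn from normally distributed populations with the assumption that their
variances are equal, the t-test with the following formula will be used:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

where:
x̄1, x̄2 - means
n1, n2 - sample sizes
s1², s2² - variances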

Example:
A course in physics is taught to 10 students by the traditional method. Another group of 11 students was given
the same course by means of another method. At the end of the semester, the same test was administered to
each group. The 10 students under method A made an average of 82 with a standard deviation of 5, while the 11
students under method B made an average of 78 with a standard deviation of 6. Test the null hypothesis of no
significant difference in the performance of the two groups of students at 5% level of significance.

Solution:
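1. Ho: x̄1 = x̄2 (There is no significant difference between the average scores of the two groups of
students.)
Ha: x̄1 > x̄2 (The mean score of the first group is higher than the mean score of the second group.)
2. α = 5%; one-tailed
3. Use the t-test as the test statistic.
4. Computation:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{82 - 78}{\sqrt{\left[\dfrac{(10 - 1)(5)^2 + (11 - 1)(6)^2}{10 + 11 - 2}\right]\left(\dfrac{1}{10} + \dfrac{1}{11}\right)}} $$

$$ t = \frac{4}{2.4245} = 1.65 $$

5. df = n1 + n2 − 2; df = 10 + 11 − 2 = 19; ttab(0.05, df = 19) = 1.729

6. tc = 1.65 < ttab = 1.729. Since the tabular t value is higher than the computed t, the null
hypothesis (Ho) is accepted.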

Analysis: The difference between the means is not significant at 5% level of significance . It implies that method
A is as effective as method B.
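
The pooled-variance computation for the two teaching methods can be sketched as follows. The function name `pooled_t` is an assumption for illustration, and the tabular value 1.729 (df = 19, one-tailed, α = 0.05) is typed in from the t table because the standard library has no t distribution.

```python
from math import sqrt


def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sample t with pooled variance (equal population variances assumed)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error


t_computed = pooled_t(mean1=82, mean2=78, sd1=5, sd2=6, n1=10, n2=11)
df = 10 + 11 - 2
t_tabular = 1.729   # from the t table: df = 19, one-tailed, alpha = 0.05
reject_h0 = abs(t_computed) >= t_tabular

print(f"t computed = {t_computed:.2f}, df = {df}, t tabular = {t_tabular}")
print("Reject Ho" if reject_h0 else "Accept Ho")
```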

t-test on the Significance of the Difference Between Two Correlated Means


When comparing two correlated means, the t-test is the appropriate test statistic. A
typical example is comparing the results of the pre-test and the post-test administered
to a group of individuals. The two tests must be the same.

$$ t = \frac{\sum d / n}{s_d / \sqrt{n}} $$

where: d = difference between the pre-test and post-test scores

n = number of observations

sd = standard deviation of the differences

Example:
To determine whether the students' performance in college algebra will improve after enrolling in the subject
for one term, at the 1% level of significance, a 60-item pre-test and post-test are administered to them on the
first day and last day of classes respectively. The same test is given as pre-test and post-test. The results
are as follows:

Student   Pre-test score   Post-test score   Difference, d   d²
A         34               45                -11             121
B         23               32                -9              81
C         40               46                -6              36
D         31               57                -26             676
E         24               39                -15             225
F         45               48                -3              9
G         27               27                0               0
H         32               33                -1              1
I         12               18                -6              36
J         45               45                0               0
                                             ∑d = -77        ∑d² = 1185
Solution:

1. Ho: Students' performance in algebra did not improve. (μ1 = μ2)

Ha: Students' performance in algebra did improve. (μ1 < μ2)

2. α = 1 % ; one- tailed
3. t-test will be used

4. Computations:

$$ t = \frac{\sum d / n}{s_d / \sqrt{n}} $$

*Compute the sample variance:

$$ s_d^2 = \frac{n\left(\sum d^2\right) - \left(\sum d\right)^2}{n(n - 1)} = \frac{10(1185) - (-77)^2}{10(9)} = 65.79 $$

sd = 8.1111

∑d/n = -77/10 = -7.7

√n = √10 = 3.1623

$$ t = \frac{-7.7}{8.1111 / 3.1623} = -3.00 $$

5. df = n − 1 = 10 − 1 = 9; ttab(0.01, df = 9) = 2.821 (one-tailed)

6. Reject the Ho, because the absolute value of the computed t (|tc| = 3.00) is greater than the
tabular value (ttab = 2.821), and accept the Ha.

Analysis: There is a significant difference in the performance of students in
algebra. It means the performance of the students in college algebra
significantly improved.
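
The paired computation can be reproduced directly from the pre-test and post-test scores in the table. This is a minimal sketch; the variable names are illustrative, and the tabular value 2.821 (df = 9, one-tailed, α = 0.01) is typed in from the t table.

```python
from math import sqrt
from statistics import mean, stdev

pre = [34, 23, 40, 31, 24, 45, 27, 32, 12, 45]
post = [45, 32, 46, 57, 39, 48, 27, 33, 18, 45]

# d = pre-test score minus post-test score, as in the table.
d = [a - b for a, b in zip(pre, post)]
n = len(d)

mean_d = mean(d)        # sum(d) / n
sd_d = stdev(d)         # sample standard deviation of the differences
t_computed = mean_d / (sd_d / sqrt(n))

t_tabular = 2.821       # from the t table: df = 9, one-tailed, alpha = 0.01
reject_h0 = abs(t_computed) >= t_tabular

print(f"sum d = {sum(d)}, mean d = {mean_d}, sd = {sd_d:.4f}")
print(f"t computed = {t_computed:.2f}, t tabular = {t_tabular}")
print("Reject Ho" if reject_h0 else "Accept Ho")
```

Working from the raw differences avoids hand-rounding intermediate values such as √n, which can noticeably shift the final t.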

Summary:

The t-test is utilized to determine the significance of the difference between two means. Certain
requirements or assumptions should be satisfied:

1. The population must be at least approximately or nearly normally distributed.


2. The population should be independent ( samples are taken at random)

3. The population variances are homogeneous or equal.

4. The samples are small, less than 30 (n<30).

5. The population standard deviation (σ) is unknown ( hence, the sample standard deviation is
used instead).

6. An interval or ratio scale of measurement is used.

Activity:

1. The data below are assumed to be the result of an experiment on the culture of mud crab (Scylla
serrata) in a fishpond with and without pellets as supplemental feed. Find out if there is a
significant difference in weight (in kilograms) at the 5% level of significance. Note: compute the standard
deviation of the group without pellets (control) and the group with pellets (experimental). Complete the table
and follow the 6 steps in hypothesis testing for the t-test. Use the formula for sd below for both the
control and the experimental group.

$$ s_d = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

2. Use the t-test to determine if there is a significant difference between the scores of the students in
Statistics in the pre-test and post-test. Use the 1% level of significance.

Method      Scores
Pre-test    45  44  40  56  55  52  51  50  53  47
Post-test   48  55  51  59  64  62  54  52  54  50

Chapter VI. ANALYSIS OF VARIANCE

Overview:

The analysis of variance (ANOVA) is a method for dividing the variation observed in
experimental data into different parts, each part assignable to a known source, cause,
or factor. If an investigator is to relate different parts of the variation to particular causal
circumstances, experiments must be designed to allow this to occur in a methodical
manner. The analysis of variance is inextricably associated with the design of
experiments. It is used to determine the significance of the difference between the
means of a number of different populations. Analysis of variance is the most common
statistical tool used in experimental research.

Objectives:

1. Identify the characteristics of particular F-test.

2. Determine when to use a certain F-test design and analysis.

3. To interpret the statistical analysis of F-test.

4. Appreciate the importance of F test in the statistical endeavor.

Lesson 6.1. One- Way Analysis of Variance ( CRD )

Overview:

A one-way ANOVA F-test involves one independent variable as a basis for classification.
This is usually applied in the completely randomized design (CRD). Experiments where only a
single factor varies while all others are kept constant are known as single-factor or
one-way classification experiments. In such experiments, the treatments consist solely of
the different levels of the single variable factor.

Objectives:

Content:

Common terms used in experimental designs:

a. Experiment- a planned inquiry to obtain new facts or to confirm or deny the results of a
previous experiment.

b. Treatment- denotes any procedure the effect of which is to be measured and
compared with the effects of other treatments, e.g., medicinal plants, concentration
of extract, time of exposure, etc.

c. Experimental unit- the unit of material to which one application of a treatment is
applied, e.g., bacteria, fungi, tomato, fish, etc.

d. Experimental Design- includes the plan and the actual procedure of laying out the
experiment. Three basic principles are involved in designing experiments. They are also
known as the elements of experimental design:

1. Replication- refers to the repetition of the same experiment at a given period of
time. A treatment is said to be replicated when it is applied to a number of
experimental units in the experiment. The functions of replication are to:

• Provide an estimate of experimental error used for tests of significance;

• Improve the precision of the experiment by reducing the standard error
of the mean;

• Increase the scope of inference of the experiment;

• Effect control of error variance.

2. Randomization- refers to the manner in which the treatments are assigned to the
experimental units.

3. Local or Error control- Experimental variance or error is the measure of the
variation or difference among experimental units/plots treated alike.

h. Blocking- putting experimental units that are homogeneous or similar together, as
much as possible, in the same group and assigning all treatments into each group
separately and independently.

Types of research based on the number of Factors Involved

1. Single- factor experiment

2. Two-factor experiment
3. Three-factor experiment

4. Four or more factor

Completely Randomized Design (CRD)

CRD is one where the treatments are assigned completely at random so that each experimental
unit has the same chance of receiving any one treatment. For the CRD, any difference among
experimental units receiving the same treatment is considered as experimental error. Hence, the CRD
is only appropriate for experiments with homogeneous experimental units, such as laboratory
experiments where environmental effects are relatively easy to control.
A manufacturer of bicycle tires has developed a new design that he claims has an average life span of 5 years with
a standard deviation of 1.2 years. A dealer of the product claims that the average life span of 150 samples of the
tires is only 3.5 years. Test the difference of the population and sample means at the 5% level of significance.
Solution: follow the steps of hypothesis testing

Significance level
To test the null hypothesis, one must first set the level of significance. The level of significance is the probability of
making a Type I error, and it is denoted by the symbol α. A Type I error is committed when the
alternative hypothesis Ha is accepted when in fact the null hypothesis Ho is true. The probability of making a
Type II error is denoted by the symbol β. The most commonly used level of significance is 5%.

One tailed and two tailed tests

A test is called a one-tailed test if the rejection region lies on one extreme side of the distribution, and
two-tailed if the rejection region is located on both ends of the distribution.

Since the alternative hypothesis Ha is formulated to be different from the Ho, we can consider these three
types of alternative hypothesis:
One-tailed
a. Ha: μ > μ0
(one-tailed/sided test to the right)

[Figure: right-tailed test. Rejection region in the right tail beyond z = 1.645; non-rejection region to its left.]
Lesson 3.4. COMPUTER APPLICATION USING MS EXCEL


Introduction:
Statistics is a mathematical science involving the collection, measurement, enumeration or
estimation, analysis, interpretation, and presentation of data on natural or social
phenomena. Through the application of various tools and techniques, raw data become
meaningful and generate information for decision-making purposes. The systematic
arrangement of data and information exhibits the relationships among things. Statistics now
holds a central position in almost every field of research, such as industry, commerce,
trade, physics, chemistry, economics, mathematics, biology, botany, psychology,
astronomy, and management decision making. In this lesson, we will discuss
statistical tools with the use of a computer application, specifically Microsoft Excel, which
can help in the calculation and interpretation of data in a very efficient and effective manner.

At the end of the lesson, you should be able to:

1. identify the steps in computing descriptive statistics using MS Excel; and


2. appreciate the ease of computing problems with the aid of computer.

STEPS ON HOW TO COMPUTE THE DESCRIPTIVE STATISTICS USING
MS EXCEL ANALYSIS TOOLPAK:

Step 1: Type your data into Excel, in a single column. For example, if you have ten
items in your data set, type them into cells A1 through A10.

Step 2: Click the “Data” tab and then click “Data Analysis” in the Analysis
group.

Step 3: Highlight “Descriptive Statistics” in the pop-up Data Analysis window
and click “OK”.

Step 4: Type an input range into the “Input Range” text box. For this
example, type “A1:A10” into the box.

Step 5: Check the “Labels in first row” check box if you have titled the
column in row 1, otherwise leave the box unchecked.

Step 6: Type a cell location into the “Output Range” box. For example, type
“C1.” Make sure that two adjacent columns do not have data in
them.

Step 7: Click the “Summary Statistics” check box and then click “OK” to display Excel
descriptive statistics. A list of descriptive statistics will be returned in the
column you selected as the Output Range.

Example:
The following examples will illustrate how MS Excel will help you resolve exercises in a very
convenient way.

Consider the scores obtained by 20 students in their college entrance test.

Student Score Student Score


A 81 K 90
B 90 L 90
C 78 M 90
D 65 N 88
E 85 O 76
F 79 P 94
G 80 Q 97
H 81 R 73
I 74 S 72
J 75 T 69
Solution

Step 1. Input the given data into Excel.

Steps 2 and 3. Click “Data”, click “Data Analysis”, then click “Descriptive Statistics”.

Steps 4 to 7. Type the input range “B2:B21”, select whether the data are grouped by
columns or rows, check “Labels in first row” if the column is titled, type the output
range “D4”, check “Summary Statistics”, and click “OK”.

TABLE 4.1 Results of the Statistical Function

Statistical Function Results

Mean 81.35
Standard Error 1.973875429
Median 80.5
Mode 90
Standard Deviation 8.827439278
Sample Variance 77.92368421
Kurtosis -0.88771227
Skewness 0.02342738
Range 32
Minimum 65
Maximum 97
Sum 1627

Count 20
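
For readers without Excel, most of the figures in Table 4.1 can be cross-checked with Python's standard `statistics` module. This is only a sketch for verification, not part of the Excel procedure; kurtosis and skewness are left out because the standard library does not provide them.

```python
from math import sqrt
import statistics

# Scores of students A to T from the example above.
scores = [81, 90, 78, 65, 85, 79, 80, 81, 74, 75,
          90, 90, 90, 88, 76, 94, 97, 73, 72, 69]

sd = statistics.stdev(scores)   # sample standard deviation (n - 1 divisor)

summary = {
    "Mean": statistics.mean(scores),
    "Standard Error": sd / sqrt(len(scores)),
    "Median": statistics.median(scores),
    "Mode": statistics.mode(scores),
    "Standard Deviation": sd,
    "Sample Variance": statistics.variance(scores),
    "Range": max(scores) - min(scores),
    "Minimum": min(scores),
    "Maximum": max(scores),
    "Sum": sum(scores),
    "Count": len(scores),
}

for name, value in summary.items():
    print(f"{name:<20}{value}")
```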

You can use other statistical functions to solve other problems like z-test, t-test,
correlation, regression, and many more.
