Biostat - Descriptive Statistics - Lec 2

1st SEMESTER | 2022-2023 SEPTEMBER 5, 2022
LESSON 2: DESCRIPTIVE STATISTICS

LECTURER: Mr. Lordiel M. Miasco, LPT
TOPIC OUTLINE
INTRODUCTION
ORGANIZING DATA
NUMERICAL MEASURES

INTRODUCTION

What is meant by statistics?
o STATISTIC
- A statistic is a number used to communicate a piece of information.
- e.g. the inflation rate is 2%, your grade point average is 2.5, or the price of a new iPhone is P70,000.
- Each of these statistics is a numerical fact and communicates a very limited piece of information that is not very useful by itself. However, if we recognize that each of these statistics is part of a larger discussion, then the question “What is Statistics?” becomes applicable.

o STATISTICS
- Statistics is a set of knowledge and skills used to organize, summarize, and analyze data.
- e.g. the inflation rate for the calendar year was 0.7%.
- By applying statistics, we could compare this year’s inflation rate with past observations of inflation to see whether it is higher, lower, or about the same.
- This shows that statistics is more than the presentation of numerical information.
- It is about collecting and processing information to create a conversation, to stimulate additional questions, and to provide a basis for making decisions.
- Specifically, we define statistics as the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

Why study statistics?
- A good working knowledge of statistics is useful for:
➢ summarizing and analyzing data to provide information that is useful and supports decision making;
➢ making valid comparisons and predicting the outcomes of decisions;
➢ making professional and personal decisions;
➢ understanding the world and being conversant in whatever career you are in.

What are the uses and applications of statistics?
- Statistical techniques are applied in all fields of research, such as marketing, accounting, business, quality control, politics, sports, health administration, education, etc.
- Statistical techniques or tools are used extensively in quantitative research.

TYPES OF STATISTICS
- When we use statistics to generate information for decision making from data, we use either:
➢ Descriptive statistics
▪ It is the method of organizing, summarizing, and providing a description of the sample data in an informative way.
➢ Inferential statistics
• Making inferences, hypothesis testing, determining relationships, and making predictions.
- Their application depends on the questions asked and the type of data available.
- Sometimes, masses of unorganized data, such as the census of the population, are of little value as they are. However, descriptive statistics can be
TAGUIBAO | BSMT 2D
used to organize data into a meaningful form.

o DESCRIPTIVE DATA

EXAMPLE 1: A poll found that 49% of the people in a survey knew the name of the first book of the Bible. The statistic 49 describes the percentage (proportion) of persons who knew the first book of the Bible.
- In the example, the statistic 49 is simply the proportion of persons. We are not drawing conclusions but just describing the percentage.

EXAMPLE 2: According to consumer reports, Sharp washing machine owners reported 9 problems per 100 machines in 2011. The statistic 9 describes the number of problems out of every 100 reported machines.
- The statistic used here simply describes a proportion of 9 out of every 100. No conclusion is being drawn; only the actual data is being described.

“Sometimes, we must make decisions with a limited set of data. For example, we would like to know the operating characteristics, such as fuel efficiency measured in km/L, of SUVs currently in use. If we spent a lot of time, money, and effort, all the owners of SUVs could be surveyed. In this case, our goal would be to survey the population of SUV owners. However, using inferential statistics, we can survey a limited number of SUV owners and collect a sample from the population. Samples are often used to obtain reliable estimates of population parameters. In the process, we can save time, money, and effort in collecting the data.”

TYPES OF DATA
- “datum” is the singular form and refers to just one piece of information.
- “data” is the plural form and refers to a group of information, facts, observations, statistics, records, and reports.
- Data can be classified into two:
➢ Ungrouped Data
• Raw and unorganized information
EXAMPLE: Scores of 20 students on a 20-item quiz: 15, 8, 9, 14, 18, 7, 17, 3, 15, 19, 7, 20, 13, 15, 18, 15, 2, 9, 2, 5
➢ Grouped data
• It is organized or processed data that is presented in a frequency table.
EXAMPLE: Rating of 70 students on their performance task.
RATING      FREQUENCY
Superior    6
Good        28
Average     21
Poor        12
Inferior    3

ORGANIZING DATA
- We use statistics and analytics to summarize and analyze data and information to support our decisions. For this reason, graphical representation of data is important for easily visualizing the amount of data that we process.
- Descriptive statistics organize data to show the general pattern of the data, to identify where values tend to concentrate, and to expose extreme data values.

CONSTRUCTING FREQUENCY TABLES
- A frequency table is a grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
EXAMPLE: Suppose that you are a car sales agent, and you want to summarize last month's sales by location.
➢ The first step is to sort the vehicles sold last month according to their location.
➢ Then tally or count the vehicles that were sold in each of the four locations. The four locations are used to develop a frequency table with four mutually exclusive or distinctive classes. Mutually exclusive classes means that a particular vehicle can be assigned to only one class. In addition, the frequency table must be collectively exhaustive; that is, every vehicle sold last month is accounted for in the table. If every vehicle is included in the table, then the table is collectively exhaustive, and the total number of vehicles will be equal to all vehicles sold last month.

What is a relative class frequency?
- You can convert class frequencies into relative class frequencies to show the fraction of the total number of observations in each class.
- A relative frequency captures the relationship between a class frequency and the total number of observations.
EXAMPLE:
➢ In the example, we want to know the percentage of the total cars sold at each of the four locations.
➢ To convert a frequency table to a relative frequency table, each of the class frequencies is divided by the total number of observations.
➢ Now suppose you want to summarize last month's sales by the profit earned on each vehicle. We can describe profit by using a frequency distribution.

What is a frequency distribution?
- It is a grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.

How do we develop a frequency distribution?
- The following example shows the steps for developing a frequency distribution.
- Our goal is to summarize the quantitative variable profit with a frequency distribution and display it using charts and graphs. With this information, we can easily answer the following questions:
➢ What is the typical profit for each sale?
➢ What is the largest or maximum profit for any sale?
➢ What is the smallest or minimum profit for any sale?
➢ Around what value do the profits tend to cluster?

➢ We show the profit of each of the 180 vehicle sales. This data set of 180 vehicle sales is raw or ungrouped because it is simply a listing of the individual observed profits.
➢ By searching the list, we can find the smallest or minimum profit, $294, and the largest or maximum profit, $3292. These are about the only two facts we can read directly from the raw listing.
➢ It is difficult to determine the typical profit or to visualize where the profits tend to cluster.
➢ To do this, the raw data must be summarized in a frequency distribution table so that it can be interpreted easily.

STEPS ON HOW TO MAKE A DISTRIBUTION TABLE:

STEP 1: Decide on the number of classes.
To do this, we use the “2 to the k” rule: choose the smallest number of classes k such that 2^k is greater than the number of observations n.
In the example, there were 180 vehicles sold.
n = 180
k = 7 (7 classes)
2^7 = 128 (less than 180)
*we cannot use 7 as our k because it gives a result less than the total number of vehicles sold, which is 180. We try a number greater than 7 to get a result greater than 180.
k = 8 (8 classes)
2^8 = 256 (greater than 180)

STEP 2: Determine the class interval.
Generally, the class interval is the same for all classes. The classes, taken all together, must cover at least the distance from the minimum value in the data up to the maximum value. We have the formula:
i ≥ (Maximum value − Minimum value) / k
where:
i = class interval
k = number of classes
Minimum value = 294
Maximum value = 3292
k = 8 classes
i ≥ (3292 − 294) / 8 = 374.75
*in practice, the interval size is rounded up to some convenient number, such as a multiple of 10 or 100. For the example, we round the class interval up to 400 because a multiple of 100 is a reasonable choice.

STEP 3: Set the individual class limits.
We state clear class limits so that we can put each observation into only one category. This means we must avoid overlapping or unclear class limits. We must always remember that the classes must be mutually exclusive, which means that each observation can belong to only one class.
Number of classes = 8
Class interval = 400
*we base the lower limit of the first class on the minimum value in our data set. Since our minimum value is 294, we use 200 as our lower limit. Since our class interval is 400, we add this to the lower limit, which gives us 600 as the upper limit of the first class. For the rest of the classes, you continue to add 400. From 600, add 400 to get 1000 for the second class. From 1000, add 400 to get 1400 for the third class, and so on. Remember to check that the maximum value fits in a class. Since we have decided that there are 8 classes, check that the maximum value fits in the 8th class.

STEP 4: Tally.
Tally the vehicle profits into the classes and determine the number of observations in each class. This simply means assigning each profit to the class it falls under.
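Steps 1 to 3 can be checked with a short Python sketch. It assumes only the figures quoted in the notes (n = 180, minimum 294, maximum 3292, first lower limit 200); the function names are illustrative and not part of the lecture.

```python
import math

def number_of_classes(n):
    """Smallest k such that 2**k is greater than n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def round_up(value, multiple):
    """Round value up to the next multiple (e.g. the next multiple of 100)."""
    return math.ceil(value / multiple) * multiple

n, lowest, highest = 180, 294, 3292

k = number_of_classes(n)              # Step 1
raw_interval = (highest - lowest) / k  # Step 2, before rounding
interval = round_up(raw_interval, 100)

# Step 3: class limits, starting from the lower limit of 200 used in the notes
first_lower = 200
limits = [(first_lower + j * interval, first_lower + (j + 1) * interval)
          for j in range(k)]

print(k, raw_interval, interval)
print(limits[0], limits[-1])
```

With these inputs the sketch gives 8 classes, a raw interval of 374.75 rounded up to 400, and classes running from 200 up to 600 through 3000 up to 3400, so the maximum of 3292 fits in the 8th class.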
When the profits are tallied, the table will appear like this:

*the number of observations in each class is called the frequency. For the first class, which is from 200 up to 600, the frequency is 8. In the second class, which is 600 up to 1000, the frequency is 11. Since the total number of profits is 180, the frequencies should sum to 180.

FREQUENCY DISTRIBUTION TABLE

*now we have made the frequency distribution table. We have organized the data into a frequency distribution that summarizes the profits of the vehicles. From the table, we can observe the following:
→ The profits from vehicle sales range between $200 and $3400.
→ The vehicle profits are classified using a class interval of 400.
→ We can see that the profits are concentrated between $1000 and $3000, that is, between the 3rd class and the 7th class. These classes contain a total of 157 vehicles, so about 87% of the vehicles fall in this range.
→ For each class, we can determine the typical profit, which we call “the class midpoint”. This is halfway between the lower limit and the upper limit of each class.
→ The class midpoint is found by adding the lower limit and the upper limit and dividing the sum by 2. For Class 1, we add 200 and 600 to get 800. Dividing 800 by 2 gives a class midpoint of 400. The class midpoint of 400 best represents the profit for the eight vehicles in Class 1. The largest concentration, or highest frequency, of vehicles sold is in the class of $1800 up to $2200, the 5th class, with a frequency of 45. The class midpoint of the 5th class is found by adding $1800 and $2200, which gives $4000, and dividing by two to get $2000. Therefore, the typical profit in the class that has the highest frequency is $2000.

RELATIVE FREQUENCY DISTRIBUTION TABLE
- With a frequency distribution, we can make a clear presentation and summary of last month’s profit data.
- With relative frequencies, we can determine the proportion of sales in each class.
- We may want to know what percentage of the vehicle profits are in the 1000 up to 1400 class. To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations, which is 180. The frequency of the 1000 up to 1400 class is 23, so its relative frequency is 23 / 180 = 0.128. Therefore, 12.8% of the vehicle profits are in the 1000 up to 1400 class.
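The class midpoint and relative frequency calculations above can be sketched in Python. The profit figures are the ones quoted in the notes; the rating counts reuse the 70-student performance-task table from the earlier example. Function and variable names are assumptions made for illustration.

```python
import math

def class_midpoint(lower, upper):
    """Halfway point between the lower and upper class limits."""
    return (lower + upper) / 2

def relative_frequency(class_frequency, total):
    """Fraction of all observations that fall in one class."""
    return class_frequency / total

total_sales = 180
mid_first = class_midpoint(200, 600)      # Class 1
mid_fifth = class_midpoint(1800, 2200)    # Class 5 (highest frequency)
rel_third = relative_frequency(23, total_sales)    # 1000 up to 1400 class
share_3_to_7 = relative_frequency(157, total_sales)

# Relative frequency table for the 70 student ratings
ratings = {"Superior": 6, "Good": 28, "Average": 21, "Poor": 12, "Inferior": 3}
n_students = sum(ratings.values())
relative = {label: count / n_students for label, count in ratings.items()}

print(mid_first, mid_fifth, round(rel_third, 3), round(share_3_to_7, 2))
print({label: round(value, 3) for label, value in relative.items()})
```

Because every observation belongs to exactly one class, the relative frequencies of a table always add up to 1.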
NUMERICAL MEASURES
- This section is concerned with two numerical ways of describing quantitative variables, namely:
➢ Measures of location
• These are often referred to as “averages”.
• Their purpose is to pinpoint the center of the distribution of data.
• An average is a measure of location. It shows the central value of the data. If we only consider the measures of location in the data, we may draw an erroneous conclusion.
➢ Measures of dispersion
• This is often called “the variation or the spread in the data.”
• To describe the dispersion, we consider the range, the variance, and the standard deviation.

o MEASURES OF LOCATION
- There is not just one measure of location; in fact, there are many. We will consider four measures of location:
1. Arithmetic mean
• This is commonly called the “mean” or “average”.
• This is the most widely used and widely reported measure of location.
• We study the mean as both a population parameter and a sample statistic.
➢ Population mean
- Many studies involve all the values in a population. For example, there are 1000 medical technology students enrolled at Velez College. This is a population value because we considered the enrollment of all medical technology students at Velez College.
- For raw data (that is, data that have not been grouped in a frequency distribution), the population mean is the sum of all the values in the population divided by the number of values in the population.

To find the population mean, we use the formula:
μ = Σx / N
where:
μ = represents the population mean. It is the Greek lowercase letter “mu.”
N = is the number of values in the population.
x = represents any particular value.
Σ = is the Greek capital letter “sigma” and indicates the operation of adding.
Σx = is the sum of the x values in the population.
- Any measurable characteristic of a population is called a parameter. The mean of a population is an example of a parameter. A PARAMETER is a characteristic of a population.

EXERCISE:
➢ This problem is treated as a population because we are considering all the exits on the I-75.
➢ We add the distances between the 42 exits, which gives a total of 192 miles.
➢ To find the arithmetic mean, we divide this total by 42: μ = 192 / 42 ≈ 4.57 miles.
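As a quick check of the population mean formula, here is a minimal Python sketch. The individual exit distances are not reproduced in these notes, so the exercise is computed from the stated total (192 miles over 42 exits); the small list used with `population_mean` is a made-up illustration, not data from the lecture.

```python
def population_mean(values):
    """mu = (sum of all x) / N for a list of population values."""
    return sum(values) / len(values)

# I-75 exercise, using only the totals given in the notes
total_miles = 192
number_of_exits = 42
mean_distance = total_miles / number_of_exits

# Hypothetical population of five values, to show the function itself
example_population = [2, 4, 4, 5, 10]
example_mean = population_mean(example_population)

print(round(mean_distance, 2), example_mean)
```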
➢ Sample mean
- We often select a sample of the population to estimate a specific characteristic of the population.
EXAMPLE: A quality assurance department needs to be assured that a jar of orange marmalade labeled 12 ounces actually contains that amount. It would be very expensive and time consuming to check the weight of each jar. Therefore, a sample of 20 jars is selected, the mean of the sample is determined, and that value is used to estimate the amount in each jar.
- For raw data (that is, data that have not been grouped in a frequency distribution), the mean is the sum of all the sample values divided by the total number of sample values.
- To find the mean of a sample, we use the formula:
x̄ = Σx / n
where:
x̄ = represents the sample mean.
n = is the number of values in the sample.
x = represents any particular value.
Σ = is the Greek capital letter “sigma” and indicates the operation of adding.
Σx = the summation of the x values.
- Any measurable characteristic of a sample is called a “statistic.”

1.A. PROPERTIES OF ARITHMETIC MEAN
- The arithmetic mean has several different properties:
→ To compute a mean, the data must be at the interval or ratio level. Ratio level data include such data as ages, income, and weight.
→ All the values are included in computing the mean.
→ The mean is unique. That is, there is only one mean in a set of data.
→ The sum of the deviations of each value from the mean is 0.
e.g.
*the mean of 3, 8 and 4 is 5. If we subtract the mean from each value, 3 minus 5 gives −2, 8 minus 5 gives 3, and 4 minus 5 gives −1. Adding these together, (−2) + 3 + (−1) = 0, so the sum of the deviations of each value from the mean is equal to 0.
• We can now consider the mean as the balance point of the data. However, the mean does have a weakness. Because the mean uses all the values in a set of data, if one or two values are extremely large or extremely small compared to the majority of the data, the mean might not be an appropriate average to represent the data.

2. Median
• We have stressed that when one or two values are extremely large or extremely small, the arithmetic mean might not be representative. The center of such data is better described by a measure of location called the median.
• The median is the midpoint of the values after they have been ordered from the minimum to the maximum value.
• The data must be at least at the ordinal level of measurement.
• It is not affected by extremely low or extremely high values.
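The two properties discussed above (deviations from the mean sum to zero, and the mean is pulled by extreme values while the median is not) can be seen in a few lines of Python using the standard `statistics` module. The values 3, 8 and 4 come from the notes; the extreme value 100 is a hypothetical addition for illustration.

```python
from statistics import mean, median

values = [3, 8, 4]
m = mean(values)
deviations = [x - m for x in values]   # each value minus the mean
total_deviation = sum(deviations)

# Add one hypothetical extreme value and compare the two averages
with_extreme = values + [100]
mean_after = mean(with_extreme)
median_before = median(values)
median_after = median(with_extreme)    # average of the two middle values

print(m, deviations, total_deviation)
print(mean_after, median_before, median_after)
```

The mean jumps from 5 to 28.75 once the extreme value is added, while the median only moves from 4 to 6.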
EXAMPLE: Suppose that you are seeking to buy a condominium in California.
Your real estate agent says that the typical price of the units currently available is $110,000. Would you still want to look if your budgeted price is $75,000? This may seem out of your price range. However, looking at the prices of the individual units, you find that they are $60,000, $65,000, $70,000, $80,000, and a super deluxe penthouse unit worth $275,000. The arithmetic mean price is $110,000, as your real estate agent reported. But the $275,000 price pulls the arithmetic mean upward, causing it to be an unrepresentative average. A price around $70,000 is a more typical or representative average. In cases like this, the median gives a more accurate measure of location.
• For larger data sets, finding the median by manually listing the values in ascending or descending order would prove to be difficult. In such cases, a formula for the position of the median is given:
Median position = (n + 1) / 2
where:
n = total number of data in the set
EXERCISE:
➢ The first step is to arrange the number of hours in ascending order.
➢ Then we identify the two middle times.
➢ The arithmetic mean of the two middle observations gives us the median hours.
➢ The median is found by averaging the two middle values. The middle values are 5 hours and 7 hours, and the mean of these two values is 6.
➢ So, we conclude that adults spend a median of 6 hours every month on Facebook.

2.A. PROPERTIES OF MEDIAN
→ It is not affected by extremely large or extremely small values. Therefore, the median is a valuable measure of location when such values do occur.
→ It can be computed for ordinal level data or higher. Ordinal level data can be ranked from low to high.

3. Mode
• The mode is the value that occurs most frequently.
• The mode is especially useful for summarizing nominal level data.
EXAMPLE:
A company has developed 5 bath oils. The bar chart shows the results of a marketing survey to find which bath oil consumers prefer. The largest number of respondents favored Lamoure, as evidenced by the highest bar, so Lamoure is the mode.
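The condominium example and the idea of the mode can be checked with Python's `statistics` module. The prices are the five quoted above, and the quiz scores are the 20 ungrouped scores listed earlier in these notes. The list of hours is hypothetical, chosen only so that its two middle values are 5 and 7 as in the exercise, and `median_position` is an illustrative helper name.

```python
from statistics import mean, median, multimode

prices = [60_000, 65_000, 70_000, 80_000, 275_000]
mean_price = mean(prices)      # pulled upward by the penthouse
median_price = median(prices)  # middle value of the ordered prices

def median_position(n):
    """Position of the median in an ordered list of n values."""
    return (n + 1) / 2

# Hypothetical hours whose two middle values are 5 and 7
hours = [1, 3, 5, 7, 9, 10]
median_hours = median(hours)   # average of the two middle values

quiz_scores = [15, 8, 9, 14, 18, 7, 17, 3, 15, 19,
               7, 20, 13, 15, 18, 15, 2, 9, 2, 5]
modes = multimode(quiz_scores)  # list of the most frequent value(s)

print(mean_price, median_price, median_position(len(prices)))
print(median_hours, modes)
```

`multimode` returns a list because a data set can have more than one mode; when no value repeats, it returns every value.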
EXERCISE:
What is the modal distance of the following values shown below:
➢ The first step is to organize the distances into a frequency table. This will help us determine the distance that occurred most frequently.
➢ The distance that occurs most often is 1 mile. This happened 8 times; that is, there are 8 exits that are 1 mile apart. So, the modal distance between the exits is 1 mile.
➢ In summary, we can determine the mode for all levels of data: nominal, ordinal, interval, and ratio.
➢ The mode also has the advantage of not being affected by extremely high or low values.
➢ The mode also has disadvantages, which is why it is used less frequently than the mean and median. For many sets of data there is no mode because no value appears more than once. Conversely, for some data sets, there is more than one mode.

SYMMETRY AND SKEWNESS
- There are two types:
➢ Normal or Symmetrical Distribution
- It is a symmetrical distribution, and it is also mound shaped. This distribution has the same shape on either side of the center.
- For any symmetrical, mound-shaped distribution, the mean, median, and mode are located at the center and are always equal.
- Logically, any of the three measures would be appropriate to represent the center.
➢ Nonsymmetrical or Skewed Distribution
- The distribution is skewed if the values of the three measures are different.
• Positively skewed
- The arithmetic mean has the largest value of the three measures. This is because the mean is influenced by a number of extremely high values. The
median has the next largest value, and the mode is the smallest of the three measures.
- If the distribution is highly skewed, the mean would not be a good measure to use. The median and mode would be more representative.
• Negatively skewed
- The mean is the lowest of the three measures. This is because the mean is influenced by a number of extremely low values. The median is larger than the mean, while the mode is the largest of the three measures.

4. Weighted mean
• It is a convenient way to compute the arithmetic mean when there are several observations of the same value.
EXAMPLE:
Suppose a restaurant sold medium, large, and extra-large soft drinks for 1.84 dollars, 2.07 dollars, and 2.40 dollars respectively. Of the last 10 drinks sold, 3 were medium, 4 were large, and 3 were extra-large. To find the mean price of the last 10 drinks sold, we could use the usual formula for the arithmetic mean.
➢ An easier way to find the mean selling price is to determine the weighted mean. That is, we multiply each observation by the number of times it occurs.
➢ The formula of the weighted mean is:
x̄w = (w1x1 + w2x2 + ... + wnxn) / (w1 + w2 + ... + wn)
Note: the denominator of the weighted mean is always the sum of the weights.
➢ Using the formula in the problem:
x̄w = (3(1.84) + 4(2.07) + 3(2.40)) / (3 + 4 + 3) = 21.00 / 10 = 2.10 dollars
EXERCISE:
➢ To find the mean hourly rate, we multiply each hourly rate by the number of employees earning that rate, add the products, and divide by the total number of employees.

Why do we have to study dispersion?
- The measures of location only describe the center of the data. They do not tell us anything about the spread of the data.
- Measures of dispersion let us compare the spread in two or more distributions.
EXAMPLE:
A computer monitor is assembled in Baton Rouge and also in Tucson. The arithmetic mean hourly output in both plants is 50. Based on the two means, you might conclude that the distributions are identical. Production records for 9 hours at the two plants, however, reveal that this conclusion is not correct. The Baton Rouge
production varies from 48 to 52 assemblies per hour, while production at the Tucson plant is more erratic, ranging from 40 to 60 per hour. Therefore, the hourly output for Baton Rouge is clustered near the mean of 50, while for Tucson it is more dispersed.
- We will consider several measures of dispersion.
- The variance and the standard deviation use all the values in the data set and are based on deviations from the arithmetic mean.

o MEASURES OF DISPERSION
1. Range
• It is the difference between the maximum and minimum values in a data set.
• Sometimes, the range is interpreted as an interval.
Range = Maximum value − Minimum value
• It is widely used in production management and control applications because it is very easy to calculate and understand.
EXERCISE:
Find the range of the number of computer monitors produced per hour for the Baton Rouge and the Tucson plants and interpret the two ranges.
➢ The range for the Baton Rouge plant is 4, found as the difference between the maximum of 52 and the minimum of 48.
➢ The range for the Tucson plant is 20, found as the difference between the maximum of 60 and the minimum of 40.
➢ Therefore, there is less dispersion in the hourly production of the Baton Rouge plant than in the Tucson plant because the range of 4 computer monitors is less than the range of 20.
• A limitation of the range is that it is based on only 2 values. It does not take all the values into consideration.

2. Variance
• It measures the mean amount by which the values in the population or sample vary from their mean.
• It is the arithmetic mean of the squared deviations from the mean.
EXAMPLE:
The chart shows the number of cappuccinos sold at a Starbucks in Orange County and in Ontario between 4 and 5 PM for a sample of 5 days. We can see that the three measures for the two sets of data are exactly the same for both locations. We can get a clear difference between the two sets of data by calculating the variance.
➢ For Orange County, the variance is 400.
➢ For Ontario, the variance is 370.
➢ If we compare the variances for Orange County and Ontario, we conclude that the sales in Ontario are more concentrated, which means they are nearer to the mean of 50.
• The variance shows the closeness or clustering of the data relative to the mean or center of the distribution.
• The variance has an important advantage over the range because it uses all the values in the computation.
➢ Population Variance
- It is the mean of the squared differences between each value and the mean.
- The formula is as follows:
σ² = Σ(x − μ)² / N
where:
σ² = the population variance. The symbol is the Greek letter sigma, and σ² is read as “sigma squared.”
x = value of a particular observation in the population
μ = is the arithmetic mean of the population
N = number of observations in the population.
- For populations whose values are near the mean, the population variance will be small.
- For populations whose values are dispersed from the mean, the population variance will be large.
- The variance overcomes the weakness of the range by using all the values in the population.

EXERCISE:
The number of traffic citations issued last year is posted. Determine the population variance.

- The variance can be used to compare the dispersion in two or more sets of observations.
- The smaller the variance, the more closely the data are clustered around the mean. The larger the variance, the more the data are scattered away from the mean.
➢ Population Standard Deviation
- By taking the square root of the variance, we transform it to the same unit of measurement used for the original data.
- This is the square root of the population variance.
- The formula for this is as follows:
σ = √[Σ(x − μ)² / N]
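The population variance and standard deviation formulas can be sketched in Python as follows. The data behind the exercises are not reproduced in these notes, so the five values below are hypothetical: they were chosen to have a mean of 50 and a population variance of 400, the figures quoted for Orange County. The hand-written function is compared with `statistics.pvariance` from the standard library.

```python
import math
from statistics import pvariance

def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_std_dev(values):
    """sigma = square root of the population variance."""
    return math.sqrt(population_variance(values))

# Hypothetical daily sales with mean 50 (not the lecture's actual data)
daily_sales = [20, 40, 50, 60, 80]

sigma_squared = population_variance(daily_sales)
sigma = population_std_dev(daily_sales)

print(sigma_squared, sigma, pvariance(daily_sales))
```

The variance is in squared units, while the standard deviation of 20 is back in the original unit (cups sold).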
➢ Sample Variance
- The formula is as follows:
s² = Σ(x − x̄)² / (n − 1)
where:
s² = the sample variance.
x = value of a particular observation in the sample
x̄ = is the sample mean.
n = number of observations in the sample.
- The conversion from the population variance to the sample variance is not as direct. It requires a change in the denominator. The denominator is n − 1 for the sample variance instead of only n.
- Dividing by n tends to underestimate the population variance. The use of n − 1 in the denominator provides the appropriate correction for this tendency. The primary use of sample statistics like s² is to estimate population parameters like σ². So, n − 1 is preferred when computing the sample variance.

EXERCISE:
➢ The first step is to find the average or the mean.
➢ The result is 10 dollars squared because this is still the computation for the variance.

➢ Sample Standard Deviation
- It is used as an estimator of the population standard deviation.
- This is simply the square root of the sample variance.
- The formula is as follows:
s = √[Σ(x − x̄)² / (n − 1)]
➢ Since the sample variance in the example is 10, we take its square root, which gives 3.16 dollars as the sample standard deviation.

INTERPRETATION AND USES OF STANDARD DEVIATION
- The standard deviation is used to compare the spread in 2 or more sets of observations.

THE EMPIRICAL RULE
- The distribution of values can have any shape. However, for a symmetrical, bell-shaped frequency distribution, we can be more precise in explaining the dispersion about the mean. These relationships involving the standard deviation and the mean are described by the Empirical Rule, sometimes called the “Normal Rule.”
- The empirical rule states that for a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations lie within plus and minus one standard deviation of the mean; about 95% of the observations lie within plus and minus two standard deviations of the mean; and practically all (99.7%) lie within plus and minus three standard deviations of the mean.
- The original figure illustrates these relationships for a mean of 100 and a standard deviation of 10.
- Applying the empirical rule to a symmetrical, bell-shaped distribution, practically all the observations lie between the mean plus and minus three standard deviations. Thus, if the mean is equal to 100 and the standard deviation is equal to 10, practically all the observations lie between 70 and 130.

EXERCISE:
The monthly apartment rental rates near a university approximate a symmetrical, bell-shaped distribution. The sample mean is $500; the standard deviation is $20.
Using the Empirical Rule, answer these questions:
1. About 68% of the monthly rentals are between what two amounts?
- Between $480 and $520
2. About 95% of the monthly rentals are between what two amounts?
- Between $460 and $540
3. Almost all of the monthly rentals are between what two amounts?
- Between $440 and $560

o MEASURES OF POSITION
- There are other ways to describe the variation and spread in a set of data.
- One method is to determine the location of values that divide a set of observations into equal parts.
- These measures include:
➢ Quartiles
• These are values of an ordered data set (min. to max.) that divide the data into four intervals.
• The first quartile, usually labeled Q1, is the value below which 25% of the observations occur.
• The third quartile, usually labeled Q3, is the value below which 75% of the observations occur.
➢ Deciles
• These are values of an ordered data set that divide the data into 10 intervals or equal parts.
➢ Percentiles
• These are values of an ordered data set that divide the data into 100 intervals or equal parts.
• Percentile scores are frequently used to report the results of national standardized tests such as the NMAT and LSAT.
- To formalize the computation procedure, the location of a percentile is given by:
Lp = (n + 1) × P / 100
where n is the number of observations and P is the desired percentile.

EXERCISE:
Morgan Stanley is an investment company with offices located throughout the United States. Listed below are the commissions earned last month by a sample of 15 brokers at the Morgan Stanley office in Oakland, CA. Locate the median, the first quartile, and the third quartile for the commissions earned.
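Before working through the exercise, the sample variance, sample standard deviation, and Empirical Rule from this part of the notes can be sketched in Python. The wage data for the variance exercise are not reproduced in these notes, so the five hourly wages below are hypothetical values chosen to give a sample variance of 10, matching the quoted result. The rental figures (mean $500, standard deviation $20) are from the exercise above; `empirical_intervals` is an illustrative name.

```python
from statistics import variance, stdev

# Hypothetical hourly wages in dollars (mean 17), not the lecture's data
wages = [12, 20, 16, 18, 19]
s_squared = variance(wages)  # sample variance, divides by n - 1
s = stdev(wages)             # square root of the sample variance

def empirical_intervals(mean, sd):
    """Ranges holding about 68%, 95% and 99.7% of a bell-shaped distribution."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

rent_intervals = empirical_intervals(500, 20)

print(s_squared, round(s, 2))
print(rent_intervals)
```

Note that `statistics.variance` and `statistics.stdev` use n − 1, while `pvariance` and `pstdev` divide by N and are meant for whole populations.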
➢ The first step is to arrange the data from the smallest commission to the largest.
➢ The median value is the observation in the center, which is the same as the 50th percentile. So, P is equal to 50. Substituting the values, L50 = (15 + 1) × 50 / 100, which gives 8. The median is therefore in the 8th position from the smallest number, which gives us 2,038 dollars as our median.
➢ To locate the 1st quartile:
L25 = (15 + 1) × 25 / 100 = 4
We get 1,721 dollars, which is in the 4th position from the smallest number, and it is the first quartile.
➢ To locate the 3rd quartile:
L75 = (15 + 1) × 75 / 100 = 12
We get 2,205 dollars, which is in the 12th position from the smallest number, and it is the third quartile.
➢ There may be instances where the location comes out as a decimal, for example 5.25 when finding the first quartile:
We first locate the 5th position in the ordered array and then move 0.25 of the way toward the 6th position to reach position 5.25. To do this, we locate the fifth position, which is $1758, and the sixth position, which is $1787. We know that 5.25 is between the 5th and 6th positions. We calculate the distance between the values in the 5th and 6th positions, which is 29. We multiply 29 by 0.25 to get 7.25. We then add 7.25 to the value in the 5th position, which is $1758. Therefore, the value at position 5.25 is $1765.25.

PRACTICE EXERCISES
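As a way to check quartile and percentile work like the steps above, here is a minimal Python sketch of the location formula and of the interpolation used when the location is a decimal. It uses only numbers quoted in the notes (n = 15, and the values $1758 and $1787); the function names are illustrative. The sample size of 20 in the last line is an assumption, used only because it produces a location of 5.25.

```python
def percentile_location(n, p):
    """Lp = (n + 1) * P / 100, a 1-based position in the ordered data."""
    return (n + 1) * p / 100

def interpolate(lower_value, upper_value, fraction):
    """Move the given fraction of the way from lower_value to upper_value."""
    return lower_value + fraction * (upper_value - lower_value)

n = 15
median_location = percentile_location(n, 50)
q1_location = percentile_location(n, 25)
q3_location = percentile_location(n, 75)

# Decimal location 5.25: a quarter of the way from the 5th to the 6th value
value_at_5_25 = interpolate(1758, 1787, 0.25)

print(median_location, q1_location, q3_location, value_at_5_25)
print(percentile_location(20, 25))  # assumed n = 20 gives 5.25
```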