Statistics A Review

STATISTICS

A REVIEW
Learning Outcomes
1. Use a variety of statistical tools to process and manage
numerical data.
2. Use methods of linear regression and correlation to predict the
value of a variable given certain conditions.
3. Advocate the use of statistical data in making important
decisions.
Definition
Statistics is the science of collecting, organizing, summarizing, presenting,
analyzing, and interpreting data to assist in making more effective decisions.

Descriptive Statistics - methods of collecting, organizing, summarizing, and
presenting data in an informative way.

Inferential Statistics - methods concerned with the analysis of a subset of
data leading to predictions or inferences about the entire set of data, that
is, methods that generalize results beyond the data collected.

Population - the entire set of individuals or objects being studied.

Sample - a portion or part of the population of interest.

Parameter - a descriptive measure or characteristic of a population.

Statistic - a descriptive measure or characteristic of a sample.

Census - the process of gathering information from every element of the
population.

Survey - the process of gathering information from every element of a sample.

Variable - an observable characteristic of a person or object that can assume
different values.

Data - measurements or observations for a variable.

Types of Variable
1. Qualitative - non-numeric
2. Quantitative - can be reported numerically

Qualitative      Quantitative
Civil Status     Age
Nationality      No. of children
Religion         Ounces of ice cream
Degree earned
Gender
Letter Grade
Job
Types of Quantitative Variables
1. Discrete Variables
- can assume values that can be counted
2. Continuous Variables
- can assume an infinite number of values between any two
specific values

Levels of Measurement
Nominal Level - characterized by data that consist of names, labels, or
categories only. The data cannot be arranged in an ordering scheme (such
as low to high).
Example:
Survey responses of yes, no, undecided

Ordinal Level - involves data that may be arranged in some order, but
differences between data values either cannot be determined or are
meaningless.
Example:
Course grades A, B+, B, C+, C, D, F or AF

Interval Level - like the ordinal level, with the additional property that
the difference between any two data values is meaningful. However, there
is no natural zero starting point (where none of the quantity is present).
Example:
IQ, Temperature

Ratio Level - possesses the characteristics of the interval level, and there
exists a true zero. Differences and ratios are both meaningful.
Example:
Height, Salary, Time

Table 1: Examples of Levels of Measurement

Nominal Level          Ordinal Level                      Interval Level   Ratio Level
Zip code               Judging (1st, 2nd, 3rd)            SAT score        Weight
Gender                 Rating scale (Average, good, VG)   SAI              Age
Eye color
Political affiliation
Major field
Nationality

METHODS OF COLLECTING DATA
1. Direct or Interview Method - this is one of the most effective methods of collecting original
data. To obtain accurate responses, well-trained interviewers should conduct the interview. The
interviewers can be of great help to the respondents in answering questions that the respondents
do not understand.
2. Indirect or Questionnaire Method - this is one of the easiest methods of data gathering. It takes
time to prepare because questionnaires need to be attractive. They can include illustrations, pictures,
and sketches. Their contents, especially the directions, must be precise, clear, and self-explanatory.
3. Registration Method - through this method, the respondents provide information in
compliance with certain laws, policies, rules, regulations, decrees, or standard practices.
Examples are marriage contracts, birth certificates, motor vehicle registrations, firearm licenses,
and the registration of corporations, real estate, voters, etc.
4. OTHER METHODS
4.1. Observation - this method is used to gather data regarding the attitudes, behavior, values,
and cultural patterns of the samples under investigation.
4.2. Telephone Interview - this method is employed if the questions to be asked are brief and
few. An example is a check made on listeners of certain radio programs, such as asking which
program a listener's radio is tuned in to.
4.3. Experiments - this method is applied to gather data if the investigator wants to
control the factors affecting the variable being studied. An example is when the researcher
aims to determine the different factors affecting the academic performance of students,
such as the methods or approaches used in teaching.

PRESENTING AND DESCRIBING DATA
1. Textual form - the presentation is in narrative or paragraph form. The data are within the text of
the paragraph. This form may not immediately catch the interest of the reader. However, it can present a
more comprehensive picture of the data because of the further written explanation of their nature.
2. Tabular form - makes use of rows and columns, as in a frequency table or frequency distribution. The
data are presented in a systematic and orderly manner, which catches one’s attention and may
facilitate the comprehension and analysis of the data presented.
3. Graphical form - the numerical data provided in a frequency distribution can be made more
interesting and easier to understand when depicted in graphical form. A graph is a pictorial or
geometrical representation of a given set of data.

Organizing Data
Categorical Distribution
Twenty-five inductees were given a blood test to determine their blood
type.

A B B AB O O O
B AB B B B O A
O A O O O AB AB
A O B A

Categorical Frequency Distribution Table

Class   Tally        Frequency   Percent
A       IIIII        5           20%
B       IIIII-II     7           28%
O       IIIII-IIII   9           36%
AB      IIII         4           16%

More people have type O blood than any other type.
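
The tally above can also be reproduced programmatically. The following is a minimal sketch in Python, assuming the 25 blood types are typed in exactly as listed; the variable names are illustrative only.

```python
from collections import Counter

# Blood types of the 25 inductees, copied row by row from the data above.
blood_types = (
    "A B B AB O O O "
    "B AB B B B O A "
    "O A O O O AB AB "
    "A O B A"
).split()

counts = Counter(blood_types)   # frequency of each class
total = len(blood_types)        # number of observations

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for blood_type in ("A", "B", "O", "AB"):
    frequency = counts[blood_type]
    percent = 100 * frequency / total
    print(f"{blood_type:<6}{frequency:>10}{percent:>8.0f}%")
```

Each percent is the class frequency divided by the total number of observations, multiplied by 100.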


Grouped Frequency Distribution

Ages of Patients in Hospital A


25 28 27 30 32 25 31 26 29
31 20 21 32 18 50 53 60 50
45 40 37 25 20 27 32 24 29
25 24 10 12 15 28 6 54 30
Grouped Frequency Distribution Table

Class Limits   Class Boundaries   Tally        Frequency
6 – 14         5.5 – 14.5         III          3
15 – 23        14.5 – 23.5        IIIII        5
24 – 32        23.5 – 32.5        20 tallies   20
33 – 41        32.5 – 41.5        II           2
42 – 50        41.5 – 50.5        III          3
51 – 59        50.5 – 59.5        II           2
60 – 68        59.5 – 68.5        I            1
Total                                          36
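
The grouped table can be checked with a short script. This is a sketch only, assuming seven classes of width 9 starting at a lower limit of 6, as in the table; the function and constant names are made up for illustration.

```python
# Ages of the 36 patients, copied from the data above.
ages = [
    25, 28, 27, 30, 32, 25, 31, 26, 29,
    31, 20, 21, 32, 18, 50, 53, 60, 50,
    45, 40, 37, 25, 20, 27, 32, 24, 29,
    25, 24, 10, 12, 15, 28, 6, 54, 30,
]

CLASS_WIDTH = 9          # 6-14, 15-23, ... each class covers 9 whole-number ages
FIRST_LOWER_LIMIT = 6    # lower limit of the first class
NUM_CLASSES = 7


def grouped_frequencies(values):
    """Return a list of (lower_limit, upper_limit, frequency) rows."""
    rows = []
    for k in range(NUM_CLASSES):
        lower = FIRST_LOWER_LIMIT + k * CLASS_WIDTH
        upper = lower + CLASS_WIDTH - 1
        frequency = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, frequency))
    return rows


table = grouped_frequencies(ages)
for lower, upper, frequency in table:
    # Class boundaries sit half a unit outside the class limits.
    print(f"{lower:>2} - {upper:<2}   {lower - 0.5:>4} - {upper + 0.5:<4}   {frequency}")
```

The class boundaries are obtained by subtracting 0.5 from each lower limit and adding 0.5 to each upper limit, so adjacent classes share a boundary.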
Data Presentation
Histogram - a bar graph in which the horizontal scale represents classes of data values
and the vertical scale represents frequencies. The heights of the bars correspond to the
frequency values, and the bars are drawn adjacent to each other (without gaps).

Figure 1: Histogram
Figure 2: Relative Frequency


Frequency Polygon
- uses line segments connected to points located directly above the class
midpoint values.
Ogive
- a line graph that depicts cumulative frequencies, just as a cumulative
frequency distribution lists them.
- uses class boundaries along the horizontal scale; the graph begins with
the lower boundary of the first class and ends with the upper boundary of
the last class.
Figure 3: Ogive
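
As an illustration, the points of an ogive for the Hospital A ages can be computed as follows. This is a sketch, assuming the class frequencies from the grouped table and class boundaries that lie half a unit beyond the class limits; it only computes the points and does not draw the graph.

```python
from itertools import accumulate

# Class boundaries and frequencies for the Hospital A ages (classes 6-14 through 60-68).
boundaries = [5.5, 14.5, 23.5, 32.5, 41.5, 50.5, 59.5, 68.5]
frequencies = [3, 5, 20, 2, 3, 2, 1]

# The ogive starts at zero on the lower boundary of the first class,
# then rises by each class frequency at that class's upper boundary.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cumulative))

for boundary, running_total in ogive_points:
    print(f"{boundary:>5}  {running_total}")
```

The last point pairs the upper boundary of the last class with the total number of observations.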
Pie Graph
- used to visually depict qualitative data
- a circle divided into sections according to the percentage of
frequencies in each category of the distribution
Bar Graph
- represents the data by using vertical or horizontal bars whose
heights or lengths represent the frequencies of the data
Time Series Graph
- represents data that have been collected at different points in time
Line Graph
- used to show trends, such as increases or decreases in sales, scores,
or population per year.
Relative Frequency Graph
- shows the proportion or percentage of the data in each class instead of
the raw frequency; also known as a percentage frequency graph
Pareto Chart
- a type of chart that contains both bars and a line graph, where
individual values are represented in descending order by bars, and the
cumulative total is represented by the line
Figure 4: Pareto Chart
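
The numbers behind a Pareto chart are easy to compute by hand or in code. The sketch below reuses the blood type frequencies from the categorical distribution earlier as sample input (an illustrative choice, not part of the original figure) and produces the descending bars and the cumulative percentage line.

```python
# Frequencies from the blood type example (A: 5, B: 7, O: 9, AB: 4).
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}

# Bars are ordered from the largest frequency to the smallest.
ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
total = sum(frequencies.values())

running = 0
pareto_rows = []
for category, frequency in ordered:
    running += frequency
    cumulative_percent = round(100 * running / total)
    pareto_rows.append((category, frequency, cumulative_percent))

for category, frequency, cumulative_percent in pareto_rows:
    print(f"{category:<3} {frequency:>2}  {cumulative_percent:>3}%")
```

The bar heights come from the second column and the line from the third, which always ends at 100%.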
Measures of Central Tendency
Once data are collected, it is useful to summarize the data set by identifying a value
around which data are centered.

Arithmetic Mean or Average


- numerical balancing point of the data set. It is calculated by adding all the data values
and dividing the sum by the total number of data points.

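Sample Mean ($\bar{x}$)
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$$

Population Mean (µ)
$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum x}{N}$$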

Example: 6 test scores in Statistics

12 10 15 8 5 18

$$\bar{x} = \frac{12 + 10 + 15 + 8 + 5 + 18}{6} = \frac{68}{6} = 11.333\ldots$$

The mean score in Statistics is approximately 11.3.


Median (Md)
- simply the middle number in an ordered set of data

If n is odd: Md = middle score
If n is even: $$Md = \frac{\text{sum of the two middle scores}}{2}$$

Example:
1. Six students borrowed these numbers of books in a library:
1, 2, 3, 4, 6, 7 (3 and 4 are the middle scores)

$$Md = \frac{3 + 4}{2} = 3.5$$

2. Five students borrowed these numbers of books in a library:
2, 2, 3, 3, 3

Md = 3

Mode (Mo)
- most frequently occurring number in a data set

Examples:
1. 2 5 8 3 2 2 3 1 (Mode is 2)
2. 1 1 3 5 7 8 3 4 (Bimodal; the modes are 1 and 3)
3. 2 4 6 8 9 3 1 5 (No mode)
Midrange (MR)
- the value midway between the highest and lowest values in the original data
set

$$MR = \frac{\text{highest score} + \text{lowest score}}{2}$$
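
The four measures above can be verified with a few lines of Python. This is a minimal sketch using the example data from this section; `modes` and `midrange` are helper names invented here, and `modes` returns an empty list when no value repeats, to match the "No mode" convention used in the examples.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return every value tied for the highest count, or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]


def midrange(values):
    """Return the value midway between the highest and lowest values."""
    return (max(values) + min(values)) / 2


scores = [12, 10, 15, 8, 5, 18]

print(round(mean(scores), 1))            # mean of the six test scores
print(median([1, 2, 3, 4, 6, 7]))        # even n: average of the two middle scores
print(median([2, 2, 3, 3, 3]))           # odd n: the middle score
print(modes([2, 5, 8, 3, 2, 2, 3, 1]))   # one mode
print(modes([1, 1, 3, 5, 7, 8, 3, 4]))   # bimodal
print(modes([2, 4, 6, 8, 9, 3, 1, 5]))   # no mode
print(midrange(scores))                  # midrange of the six test scores
```

Note that `median` sorts the data internally, so the values do not need to be ordered first.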
Weighted Mean
- found by multiplying each value by its corresponding weight and dividing the
sum of the products by the sum of the weights.

$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum wX}{\sum w}$$

where w1, w2, w3, ..., wn are the weights and X1, X2, X3, ..., Xn are the values.
Example:

Subject    Units (w)   Letter Grade   Numerical Value (X)
PE         2           A              4
Calculus   5           C+             2.5
English    3           B              3

$$\bar{X} = \frac{2 \cdot 4 + 5 \cdot 2.5 + 3 \cdot 3}{10} = 2.95$$

The QPI is 2.95.
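
The same computation can be written as a small function. This is a sketch, with the units as weights and the numerical grade values as the data; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight-times-value products divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


units = [2, 5, 3]            # weights (w): PE, Calculus, English
grade_values = [4, 2.5, 3]   # numerical values (X) of the letter grades

qpi = weighted_mean(grade_values, units)
print(round(qpi, 2))
```

Because Calculus carries the most units, its grade of 2.5 pulls the weighted mean below the simple average of the three grade values.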
Measures of Dispersion
• Range
- difference between the highest and lowest values

• Standard Deviation
- never negative
- zero only if all the data values are the same
- uses the same units as the original data set
- can be influenced by outliers

• Variance
- the square of the standard deviation
- does not use the same units as the data (it is in squared units)
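Population Standard Deviation (σ)

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

where x - individual value
µ - population mean
N - population size

Sample Standard Deviation (s)

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where x - individual value
$\bar{x}$ - sample mean
n - sample size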
Let A = 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
B = 4, 4, 4, 5, 5, 5, 5, 6, 6, 6
C = 0, 0, 0, 0, 0, 10, 10, 10, 10, 10
D = 0, 5, 5, 5, 5, 5, 5, 5, 5, 10
• Find the mean and median for each data set.
• Find the range and the standard deviation of each data set.
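
One way to check answers to this exercise is with Python's `statistics` module. The sketch below reports both the population standard deviation (`pstdev`, dividing by N) and the sample standard deviation (`stdev`, dividing by n - 1), since the exercise does not say which one is intended; the `summarize` helper is a name made up for this example.

```python
from statistics import mean, median, pstdev, stdev

data_sets = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    "C": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "D": [0, 5, 5, 5, 5, 5, 5, 5, 5, 10],
}


def summarize(values):
    """Return the center and spread measures asked for in the exercise."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "population_sd": pstdev(values),   # divides by N
        "sample_sd": stdev(values),        # divides by n - 1
    }


summaries = {name: summarize(values) for name, values in data_sets.items()}
for name, summary in summaries.items():
    print(name, {key: round(value, 3) for key, value in summary.items()})
```

All four sets share the same mean and median, so only the range and standard deviation reveal how differently the values are spread.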
Measures of Positions
• Percentiles - divide the data set into 100 equal groups
• Deciles - divide the data set into 10 equal groups
• Quartiles - divide the data set into quarters
• z-score (standard score)
- the z-score for a given data value x is the number of
standard deviations that x is above or below the mean of the data

population: $$z = \frac{x - \mu}{\sigma}$$
sample: $$z = \frac{x - \bar{x}}{s}$$
Example:
A basketball player, Carl, is 78 inches tall and a volleyball player, Jane, is 76
inches tall. Carl is obviously taller by 2 inches, but which player is relatively
taller? Does Carl’s height among men exceed Jane’s height among women?
Men have a mean height of 69 inches and a standard deviation of 2.8 inches,
while women have a mean height of 63.6 inches and a standard deviation of 2.5
inches.

Carl: $$z = \frac{x - \mu}{\sigma} = \frac{78 - 69}{2.8} = 3.21$$

Jane: $$z = \frac{x - \mu}{\sigma} = \frac{76 - 63.6}{2.5} = 4.96$$

Carl’s height is 3.21 standard deviations above the mean, but Jane’s height is a
whopping 4.96 standard deviations above the mean.

∴ Jane’s height among women is relatively greater than Carl’s height among
men.
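
The comparison can be reproduced with a one-line z-score function. This is a sketch; Carl's figures assume a men's mean height of 69 inches, which is the value that reproduces the stated z-score of 3.21.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / standard_deviation


# Assumption: men's mean height of 69 inches (consistent with z = 3.21).
carl_z = z_score(78, 69, 2.8)
jane_z = z_score(76, 63.6, 2.5)

print(round(carl_z, 2))
print(round(jane_z, 2))
print("Jane" if jane_z > carl_z else "Carl", "is relatively taller")
```

Because z-scores are unit-free, they allow values from two different distributions to be compared directly.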

Exercises

1. The average teacher’s salary in a particular city is P54,166. If the standard
deviation is P10,200, find the salaries corresponding to the following z
scores.
a. 2
b. −1.6
c. 2.5
2. The mean time to download a PDF file is 12 min with a standard deviation of 4 min. Belle’s
download time is 20 min. John’s download time is 6 min. How does Belle’s download time
compare with John’s?

3. Cheryl has taken two quizzes in her history class. She scored 15 on the first quiz, for which
the mean of all scores was 12 and the standard deviation was 2.4. Her score on the second
quiz, for which the mean of all scores was 11 and the standard deviation was 2.0, was 14.
In comparison to her classmates, did Cheryl do better on the first quiz or the second quiz?

4. Roland received a score of 70 on a test for which the mean score was 65.5. Roland has
learned that the z-score for his test is 0.6. What is the standard deviation for this set of test
scores?
Box and Whisker Plot (box plot)
- used to provide a visual summary of a set of data
- it shows the median, the 1st and 3rd quartiles, and the
minimum and maximum values of a data set

Example:
Construct a box-and-whisker plot for the following data:
Number of Rooms Occupied in a Resort during an 18-day period

86 77 58 45 94 96 83 76 75
65 68 72 78 85 87 92 55 61
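
The five values needed to draw the box-and-whisker plot can be computed as shown below. This is a sketch that assumes the common textbook convention of taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data; other quartile conventions can give slightly different values.

```python
from statistics import median

# Rooms occupied on each of the 18 days, copied from the data above.
rooms = [
    86, 77, 58, 45, 94, 96, 83, 76, 75,
    65, 68, 72, 78, 85, 87, 92, 55, 61,
]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the median position
    upper_half = ordered[-half:]   # values above the median position
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


minimum, q1, med, q3, maximum = five_number_summary(rooms)
print(minimum, q1, med, q3, maximum)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum values.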
God Bless!
