
CHAPTER 5: DATA MANAGEMENT

TOPICS:

• Gathering and organizing data


• Measures of central tendency
• Measures of dispersion
• Probabilities and normal distribution
• Linear regression and correlation

INTRODUCTION TO STATISTICS
STATISTICS

• A branch of Mathematics dealing with the collection, organization, presentation, analysis, and
interpretation of data.

USES OF STATISTICS

• Business and Economics: to forecast sales, stocks, etc.
• Marketing: to study consumer behavior
• Sports: to summarize the performance of athletes
• Medicine: to determine the efficacy of a drug

“Medical students may not like statistics, but as doctors they will.” Prof. Martin Bland

• Psychology: to study human behavior
• Criminology: statistics on crime and criminal justice are critical for governments to implement
anti-crime programs effectively, protect the well-being of the population, and assess the
social impact of public expenditures and policies.

DESCRIPTIVE STATISTICS

• Involves methods of organizing, summarizing, and presenting data; its conclusions are limited to
the data set at hand
• Used to say something about, or describe, a set of information collected
o Tools: Measures of Central Tendency; Measures of Variability
o Example: In a Math test, 32 out of 40 students received a passing mark. The
average score of the class is 82 out of 100.
INFERENTIAL STATISTICS

• Involves using the information from a sample to draw conclusions about the population
• Used to say something about a larger group (population) using information collected from a
small part of the population (sample)
o Tools: Hypothesis Testing; Regression Analysis
o Example: In a sample survey, 65% of Filipino Generation Z respondents preferred milk tea
to coffee, while only 34% of Filipino Millennials preferred milk tea to coffee.

KEY CONCEPTS OF STATISTICS

POPULATION

• ALL the members of the subject of interest
• The entire group of people, things, or events having at least one trait in common (Sprinthall)
• Adding too many common traits can limit the size of the population.
• Example (each added trait narrows the population):
o A group of students
o A group of male students
o A group of male students attending the Statistics class
o A group of male students attending the Statistics class with an iPhone
o A group of male students attending the Statistics class with an iPhone and earphones
• A measure of a population is a PARAMETER: any measure obtained in gauging the entire population

SAMPLE

• SELECTED members of the subject of interest
• A small number of observations taken from the total number making up a population
• Any set of observations or data that is not the totality of the entire population
• Example: In a population of 100, 1 is considered a sample. 30 is clearly a sample. 99 taken from
100 is still considered a sample. Not until we include the last member (making it 100) can we
claim that it is a population and no longer a sample.
• A measure of a sample is a STATISTIC: any measure obtained in gauging the sample

A STATISTIC is an estimate of the PARAMETER.


KEY CONCEPTS OF STATISTICS
VARIABLES

• The characteristics that we measure

CONSTANTS

• The characteristics that are held fixed

EXAMPLES
Scenario 1
When all the freshman students of BSU were asked, it was found that, on average, they sleep 3.7
hours per day during exam week. Among thirty (30) randomly selected students, however, the average
was 3.6 hours per day.

POPULATION: all the freshman students of BSU
Parameter: 3.7 hours

SAMPLE: thirty (30) randomly selected students
Statistic: 3.6 hours

Variable: number of hours of sleep
Constant: year level of the BSU students

Scenario 2
From 100 randomly selected residents of Batangas City, it was found that 13% had COVID-19 in 2020.
According to the City Health Office, however, 11.9% of all the residents of Batangas City had
COVID-19 in 2020.

POPULATION: all the residents of Batangas City
Parameter: 11.9%

SAMPLE: 100 randomly selected residents of Batangas City
Statistic: 13%

Variable: occurrence of COVID-19
Constants: disease, year

Scenario 3
5% of Asian men suffer from red-green color blindness. From 250 randomly selected men in the
Philippines, it was found that 3% suffer from this type of color blindness.

POPULATION: all Asian men
Parameter: 5%

SAMPLE: 250 randomly selected men in the Philippines
Statistic: 3%

Variable: occurrence of color blindness
Constants: type of color blindness, race

DATA MANAGEMENT

• Data is everywhere
• It is observable or measurable.
• Data are the quantities (numbers) or qualities (attributes) measured or observed that are to be
collected and analyzed (Asaad, 2004)

DATA GATHERING METHODS

1. Direct or Interview Method
2. Indirect or Questionnaire Method
3. Registration Method
4. Observation Method
5. Experimental Method

TWO TYPES OF DATA

1. Qualitative Data
• deals with categories or attributes
• Examples: eye color, ethnicity, brand of ice cream

2. Quantitative Data
• numerical data
• Discrete: obtained through counting (finite)
o The number of households in a particular community
o Number of students in a class
• Continuous: obtained by measuring; can take infinitely many values within an interval
o Family income
o Weight/height
o Wind speed

FOUR LEVELS OF MEASUREMENT

1. Nominal Scale
• assigns names or labels to observations in a purely arbitrary sequence
• Labels are used to classify the respondents or objects without ordering
• Examples: Gender, Religion, Civil status

2. Ordinal Scale
• assigns numbers or labels to observations with an implied ordering
• Used for ranking respondents' preferences
• Examples:
o Stage of cancer (I, II, III, IV)
o Size of t-shirt (small, medium, large)
o Educational level (elementary, secondary, tertiary)
o Satisfaction level (very satisfied, satisfied, neutral, dissatisfied, very dissatisfied)

3. Interval Scale
• reflects the distance between rank positions of the respondents or objects in equal units
• the distance between any two values on the scale is of known size
• No true zero point (zero is itself a value on the scale, not the absence of the quantity)
• Values can be added or subtracted, but not meaningfully multiplied or divided
• Examples: Temperature (in °C or °F), SAT scores

4. Ratio Scale
• Reflects the existence of a true absolute zero point as its origin
• Unlike the interval scale, it has no negative values
• The ratio of two scale points is independent of the unit of measurement
• Examples: Distance, Weight, Height

PART 2

MEAN

• Most widely used measure of central tendency
• Also known as the average
• Computed by adding all the values and dividing the sum by the number of values

Example

1. Six friends in a Math class of 20 students receive test grades of 92, 84, 65, 76, 88, and 90.
Find the mean of these test scores.
2. The ages of five candidates for graduation are the following: 18, 17, 18, 19, and 18. Find
their average age.
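
As a quick check of the two examples above, here is a minimal Python sketch. The helper name `arithmetic_mean` is only an illustrative choice, not part of the lesson.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)

# Data taken from the two examples above
test_grades = [92, 84, 65, 76, 88, 90]
ages = [18, 17, 18, 19, 18]

grade_mean = arithmetic_mean(test_grades)
age_mean = arithmetic_mean(ages)

print(grade_mean)  # 82.5
print(age_mean)    # 18.0
```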

MEDIAN

• The midpoint of the data array
• First arrange the data from least to greatest (or vice versa)
• With an odd number of values, the median is the middle value; with an even number of values, it
is the mean of the two middle values

Example

1. Seven mothers were selected and given a blood pressure check. Their blood pressures were
recorded as follows: 135, 121, 119, 116, 130, 121, 131. Find the median.

2. Eight novels were randomly selected and the numbers of pages were recorded as follows:
415, 398, 402, 400, 420, 415, 407, 425. Find the median.
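
The two median examples can be checked with a short Python sketch using the standard library's `statistics.median`, which sorts the data and averages the two middle values when the count is even.

```python
from statistics import median

# Data taken from the two examples above
blood_pressures = [135, 121, 119, 116, 130, 121, 131]   # 7 values (odd count)
novel_pages = [415, 398, 402, 400, 420, 415, 407, 425]  # 8 values (even count)

# Sorting first makes the middle value(s) easy to see
print(sorted(blood_pressures))  # [116, 119, 121, 121, 130, 131, 135]
print(sorted(novel_pages))      # [398, 400, 402, 407, 415, 415, 420, 425]

bp_median = median(blood_pressures)  # middle (4th) value
pages_median = median(novel_pages)   # mean of the 4th and 5th values

print(bp_median)     # 121
print(pages_median)  # 411.0
```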
MODE

• most frequently occurring score in a distribution

Example

1. Find the mode of the given data set: 15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22
and 30.

2. The typing speeds of ten stenographers, in words per minute, are as follows: 121, 110, 120, 119,
112, 121, 118, 115, 107, 115. Find the mode.

3. Find the mode of the given data: 2, 5, 8, 9, 11, 4, 7, 23

Every value occurs only once, so there is no mode.
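
The three mode examples can be worked through with the sketch below. The helper `modes` is a hypothetical name, and it assumes the convention that a data set has no mode when every value occurs exactly once.

```python
from collections import Counter

def modes(data):
    """Return every most-frequent value, or [] when all values occur once."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(value for value, count in counts.items() if count == highest)

# Data taken from the three examples above
data_1 = [15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22, 30]
data_2 = [121, 110, 120, 119, 112, 121, 118, 115, 107, 115]
data_3 = [2, 5, 8, 9, 11, 4, 7, 23]

print(modes(data_1))  # [22]
print(modes(data_2))  # [115, 121]
print(modes(data_3))  # []
```

The second data set has two modes (it is bimodal), because 115 and 121 each occur twice.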
WEIGHTED MEAN

EXAMPLE
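
A weighted mean multiplies each value by its weight, adds the products, and divides by the sum of the weights. The sketch below illustrates this; the scores and weights are hypothetical values chosen for illustration and are not taken from the lesson.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical data: three scores and how heavily each one counts
scores = [80, 90, 70]
weights = [3, 2, 5]

result = weighted_mean(scores, weights)
print(result)  # (240 + 180 + 350) / 10 = 77.0
```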
FREQUENCY DISTRIBUTION TABLE

• used to organize raw data
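
As an illustration, the sketch below tallies raw data into a simple frequency table. It reuses the data set from the first mode example; the layout of the printed table is an arbitrary choice.

```python
from collections import Counter

# Raw data reused from the first mode example
raw_data = [15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22, 30]

# Count how many times each value occurs
frequency = Counter(raw_data)

print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")

total = sum(frequency.values())
print("Total observations:", total)
```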

TOPIC 2: MEASURES OF CENTRAL TENDENCY (GROUPED DATA)

MEAN
MEDIAN

MODE

EXAMPLE

Compute the mean, median, and mode of the scores of the students in a basic statistics test.
MEAN
MEDIAN

MODE
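
The sketch below shows one common way to compute the grouped-data mean, median, and mode. It assumes the usual textbook formulas (class midpoints for the mean; lower class boundary, cumulative frequency, and class width for the median and mode), which may differ in notation from the ones used in class. The class intervals and frequencies are hypothetical.

```python
# Hypothetical grouped scores: (lower limit, upper limit, frequency)
classes = [(10, 19, 3), (20, 29, 7), (30, 39, 12), (40, 49, 6), (50, 59, 2)]

n = sum(f for _, _, f in classes)      # total frequency
width = classes[1][0] - classes[0][0]  # class width (10 here)

# Mean: sum of (frequency * class midpoint) divided by n
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Median: L + ((n/2 - cf) / f) * width, where L is the lower boundary of the
# median class and cf is the cumulative frequency before that class
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= n / 2:
        grouped_median = (lo - 0.5) + ((n / 2 - cumulative) / f) * width
        break
    cumulative += f

# Mode: L + (d1 / (d1 + d2)) * width, where d1 and d2 are the differences
# between the modal class frequency and its neighbours
freqs = [f for _, _, f in classes]
i = freqs.index(max(freqs))
d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
d2 = freqs[i] - (freqs[i + 1] if i < len(freqs) - 1 else 0)
grouped_mode = (classes[i][0] - 0.5) + (d1 / (d1 + d2)) * width

print(round(grouped_mean, 2), round(grouped_median, 2), round(grouped_mode, 2))
```

The lower boundary is taken as the lower limit minus 0.5, which assumes whole-number class limits.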
TOPIC 3: MEASURES OF DISPERSION (UNGROUPED DATA)

MEASURES OF DISPERSION

• A measure of variability of a set of data is a number that conveys the idea of a spread for the
data set
o Range
o Standard Deviation
o Variation/ Variance

RANGE

• Symbolized by R, it describes the variability of scores by merely providing the width of the
entire distribution.
• It can be found by simply determining the difference between the highest score and the lowest
score.
• This difference is always a single value.
R = HIGHEST VALUE - LOWEST VALUE
• Example: Find the range of the number of ounces dispensed by Machine 1 and Machine 2
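
A minimal sketch of the range formula follows. Since no machine readings are listed here, it reuses the six test grades from the earlier mean example.

```python
# Test grades reused from the mean example
grades = [92, 84, 65, 76, 88, 90]

# R = highest value - lowest value
data_range = max(grades) - min(grades)

print(data_range)  # 92 - 65 = 27
```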

VARIANCE

• The variance for a given data set is the square of the standard deviation of the data.

STANDARD DEVIATION

• The Standard Deviation is a measure of how spread out numbers are


• Symbol: 𝜎 (Greek letter sigma)
PROCEDURES IN COMPUTING STANDARD DEVIATION

FINDING THE VARIANCE AND STANDARD DEVIATION
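
The sketch below walks through the usual steps: find the mean, square each deviation from the mean, average the squared deviations to get the variance, then take the square root to get the standard deviation. It reuses the test grades from the mean example. Whether to divide by n (population, the σ convention) or by n - 1 (sample) depends on the setting, so both are shown; which one the lesson intends is an assumption.

```python
import math

# Test grades reused from the mean example
grades = [92, 84, 65, 76, 88, 90]
n = len(grades)

# Step 1: mean of the data
mean = sum(grades) / n

# Step 2: squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in grades]

# Step 3: variance (population divides by n, sample divides by n - 1)
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

# Step 4: standard deviation is the square root of the variance
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```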

STANDARD DEVIATION PROBLEM SOLVING


MEASURES OF DISPERSION (GROUPED DATA)
MEASURES OF RELATIVE POSITION (Z-SCORE)

Z-SCORE

• A z-score measures the distance between an observation and the mean, measured in units of
standard deviation

o z-score (positive): the score is above the mean
o z-score (0): the score is the same as the mean
o z-score (negative): the score is below the mean
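
The three cases above can be illustrated with the standard formula z = (x - mean) / standard deviation. The mean of 80 and standard deviation of 5 below are hypothetical values chosen so the results come out evenly.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in units of standard deviation."""
    return (x - mean) / sd

# Hypothetical class results: mean 80, standard deviation 5
mean, sd = 80, 5

above = z_score(90, mean, sd)    # positive: above the mean
at_mean = z_score(80, mean, sd)  # zero: equal to the mean
below = z_score(70, mean, sd)    # negative: below the mean

print(above, at_mean, below)  # 2.0 0.0 -2.0
```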
TOPIC 4: NORMAL DISTRIBUTION

PROPERTIES OF A NORMAL DISTRIBUTION

• The distribution curve is bell-shaped
• The curve is symmetrical about its center
• The mean, median, and mode coincide at the center
• The tails of the curve flatten out indefinitely along the horizontal axis but never touch it
(the curve is asymptotic to the baseline)
• The area under the curve is 1; thus, it represents the probability, proportion, or percentage
associated with specific sets of measurement values

EMPIRICAL RULE FOR NORMAL DISTRIBUTION

In a normal distribution, approximately

• 68% of the data lie within 1 standard deviation of the mean.
• 95% of the data lie within 2 standard deviations of the mean.
• 99.7% of the data lie within 3 standard deviations of the mean.
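
These percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later) to compute the area within k standard deviations of the mean; the helper name `area_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k) * 100:.2f}%")
```

The printed values are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.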
FOUR STEP PROCESS

In finding the areas under the normal curve given a z-value:

1. Express the given z-value in 3-digit form (for example, z = 2 becomes 2.00)
2. Using the z-table, find the first two digits in the left column
3. Match the third digit with the appropriate column to the right
4. Read the area (or probability) at the intersection of the row and column.

Find the area that corresponds to z = 2

Finding the area that corresponds to z = 2 is the same as finding the area between z = 0 and
z = 2 (0 < z < 2)

Find the area that corresponds to z = 2.47

Finding the area that corresponds to z = 2.47 is the same as finding the area between z = 0 and
z = 2.47 (0 < z < 2.47)

Determine the area under the standard normal curve to the right of z = 1.63. Because the area to
the right of z = 0 is 0.5, this is 0.5 minus the area between z = 0 and z = 1.63 (0 < z < 1.63).

Find the area under the standard normal curve between z = -0.37 and z = 1.03 (-0.37 < z < 1.03).
This is the sum of the area between z = -0.37 and z = 0 and the area between z = 0 and z = 1.03.
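
The four problems above can be cross-checked without a printed table. The sketch below assumes the z-table lists areas between z = 0 and the given z, and uses `statistics.NormalDist` (Python 3.8 and later); the helper name `area_from_zero` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_from_zero(z):
    """Area between 0 and z, the quantity assumed to be listed in the z-table."""
    return abs(standard_normal.cdf(z) - 0.5)

area_2 = area_from_zero(2)                                    # 0 < z < 2
area_2_47 = area_from_zero(2.47)                              # 0 < z < 2.47
right_of_1_63 = 0.5 - area_from_zero(1.63)                    # z > 1.63
between = area_from_zero(-0.37) + area_from_zero(1.03)        # -0.37 < z < 1.03

print(round(area_2, 4))         # about 0.4772
print(round(area_2_47, 4))      # about 0.4932
print(round(right_of_1_63, 4))  # about 0.0516
print(round(between, 4))        # about 0.4928
```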

USING SCIENTIFIC CALCULATOR


TOPIC 5: LINEAR REGRESSION AND CORRELATION

SCATTERPLOT

• A scatter plot is a graph of ordered pairs (x, y) of numbers consisting of the independent
variable, x, and the dependent variable, y.
• The independent variable is the variable that can be controlled and manipulated.
• The dependent variable is the variable that cannot be controlled or manipulated; its value
depends on the independent variable.
• The independent variable is plotted on the horizontal axis and the dependent variable on the
vertical axis.
• The purpose of this graph is to determine the nature of the relationship between the variables.
The relationship may be positive linear, negative linear, curvilinear, or no discernible
relationship.

EXAMPLE

1. Construct a scatter plot for the given data


[Figure: scatter plots showing positive correlation, negative correlation, and no correlation]

This illustrates a perfect positive relationship. When computed, the correlation coefficient is
equal to 1.
CORRELATION

• A statistical method used to determine whether there is a relationship between variables and the
strength of that relationship
• Pearson Correlation Coefficient
o Degree of linear association/relationship between two variables (measured on at least the
interval scale)
o Measured by the correlation coefficient (r)
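
A minimal sketch of the Pearson correlation coefficient follows, using the standard definition (sum of cross-deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and all data values are hypothetical, chosen so that the perfect positive and perfect negative cases are easy to see.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # y increases exactly in step with x
y_falling = [10, 8, 6, 4, 2]  # y decreases exactly in step with x

r_positive = pearson_r(x, y_rising)
r_negative = pearson_r(x, y_falling)

print(r_positive)  # perfect positive relationship, r = 1
print(r_negative)  # perfect negative relationship, r = -1
```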
TOPIC 5: REGRESSION

LINEAR REGRESSION ANALYSIS

• Linear regression is the simplest and most commonly used statistical measure for prediction
studies.
• It is concerned with finding an equation that uses the known values of one or more variables,
called the independent or predictor variables, to estimate the unknown value of a quantitative
variable called the dependent or criterion variable.
• It predicts a variable (y) that depends on a second variable (x), based on the regression
equation of a given set of data.
• After a scatter plot is constructed and the value of the correlation coefficient is deemed to be
significant, an equation of the regression line is determined.
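
As a sketch of how a regression line y = a + bx can be determined, the code below uses the ordinary least-squares formulas for the slope b and intercept a. The function name `fit_line` and the data (hours studied versus test score) are hypothetical and serve only to illustrate the calculation.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (x) and test score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)

# Use the fitted equation to estimate y for a new x value
predicted = intercept + slope * 6

print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

A positive slope corresponds to a positive linear relationship, and a negative slope to a negative one.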
CHARACTERISTICS OF A REGRESSION LINE

POSITIVE LINEAR RELATIONSHIP

NEGATIVE LINEAR RELATIONSHIP

