MMW Reviewer Data Management

Mathematics in the Modern World

Data Management
Introduction to Statistics

Statistics is a branch of science pertaining to the methods of collecting, organizing, presenting, analyzing and interpreting data, and then drawing conclusions based on the data.

Descriptive Statistics tries to summarize or describe a collection of data. It is a set of methods to describe data that we have collected.

Some of the most commonly used statistical treatments are percentages, measures of central tendency (such as the mean, median and mode), measures of variation (such as range, average deviation, standard deviation, variance and coefficient of variation) and measures of skewness and kurtosis.

Inferential Statistics is used to draw conclusions and make predictions based on the analysis of numeric data. It is a set of methods used to make a generalization, estimate, prediction or decision.

A pair of one measure of central tendency and one measure of variation can be used to draw a conclusion; a commonly used pair is the mean and standard deviation.

Population is defined as the complete or entire collection of elements (persons or things) to be studied, while sample refers to the representative part or finite number of elements chosen from the population.

In relation to population and sample, the next step is to differentiate a parameter from a statistic.

Parameter is a numerical value calculated from a population.

Statistic is a number that describes a set of observations in a sample.

Variables are characteristics or values that vary across individuals. A variable can be qualitative or quantitative.

Qualitative Variables, also known as categorical variables, are used to represent character, class or kind but not amount. Some examples of qualitative variables are gender, religion, nationality, favorite color and birthplace.

Quantitative Variables are variables that can be measured on a numeric or quantitative scale. They can be classified as discrete or continuous.

A quantitative variable is discrete if it uses natural numbers or counting numbers. Some examples of discrete variables are number of students enrolled in STA111, number of iPad units in a store and number of buildings in Metro Manila.

A quantitative variable is continuous if it can take any value within a range, including decimals or fractions. Some examples of continuous variables are height, weight, length, width and speed of a bullet.

Levels of measurement are used to determine the statistical tool that can be used to describe a set of data. There are four levels of measurement: Nominal, Ordinal, Interval and Ratio.

The first level is called the Nominal level. In this level, names are assigned to objects for the purpose of identifying them or showing that they belong to a group or category. The data cannot be arranged in an ordering system. Examples of data under this level are religion, nationality or race, gender, birthplace and course.

The second level is the Ordinal level. In this level, words or numbers are assigned to objects to represent the rank or order between them. It implies ranking, order or inequalities. Examples are class rank, contest winners, degree of burn and cancer stages.

Interval level is the third level of measurement. It refers to quantitative measurements used to identify and rank, but in this scale the differences between two items can also be determined. Operations such as multiplication and division are not meaningful because interval scales do not have a true zero point. An example of interval data is temperature.

Lastly, the fourth level of measurement is the Ratio level. It is similar to the interval scale, but ratio has a true zero point, and operations such as multiplication and division are therefore meaningful. Examples of data under ratio are income, age, height, weight, area and volume.

1. Gender
2. Land area
3. Contest winners
4. Kids' height in cm
5. Athletes' age in years

Exercise 17: State the level of measurement of each of the following.
1. Blood type
2. Doctors' salary
3. Latin honors
4. Temperature in Fahrenheit
5. Student number


SAMPLING AND SAMPLING TECHNIQUES

Sampling is the process of choosing elements, such as persons, objects or groups, from a known population of interest to be included in a study in order to generate a fair result. Sampling is done to reduce cost, since it is less expensive to conduct a survey on a sample than on the whole population. Another advantage of using a sample instead of a population is that data can be obtained faster. Also, greater scope and accuracy are expected since the volume of work in encoding and computing will be reduced.

There are two types of sampling techniques: probability sampling and non-probability sampling.

Probability sampling or random sampling gives all members of the population a known and equal chance of being part of the sample. In other words, the selection of one individual does not affect the chance of anyone else in the population being selected.

Simple random sampling is also called the lottery or the fishbowl method. It uses a scientific calculator or computer program to generate random numbers, or a table of random numbers, to select the elements to include in the sample.

In Systematic Skip Sampling, elements are listed numerically and then every "kth" element from the list is selected, beginning at a randomly selected starting point.

Stratified Random Sampling is a method where the population is divided into sub-groups (called strata) based on some well-known characteristics of the population, such as age, gender or socio-economic status; a random sample is then taken from each stratum. The selection of elements is made separately within each stratum, usually by random or systematic sampling methods.

In stratified random sampling, the number of samples per stratum may be equal or proportional.

Example 24: A study is conducted on 1,000 college students of the University of the East. Two hundred students will be selected to be part of the study. How many samples are needed per year level using equal distribution?

N = 1,000 (number of population)
Number of groups = 4 (First Year, Second Year, Third Year and Fourth Year)

With 200 students divided equally among the 4 groups, each year level must be represented by 200 / 4 = 50 students.

Example 25: A study is conducted on 1,000 college students of the University of the East. Two hundred students will be selected to be part of the study. The number of students per year level is presented in a table. How many samples are needed per year level using proportional allocation?

Year Level      Population (Ni)
First Year      300
Second Year     250
Third Year      250
Fourth Year     200

Use the formula:

$$n_i = \frac{n \times N_i}{N}$$

where n_i is the number of samples per year level, N_i is the population of students per year level, N is the total population of students and n is the total sample needed.

Year Level      Population (Ni)     n_i
First Year      300                 n_I = (200 × 300) / 1,000 = 60
Second Year     250                 n_II = (200 × 250) / 1,000 = 50
Third Year      250                 n_III = (200 × 250) / 1,000 = 50
Fourth Year     200                 n_IV = (200 × 200) / 1,000 = 40

Cluster Sampling is a method where the researcher divides the population into groups, or clusters. Elements within a cluster are heterogeneous or dissimilar. Clusters are selected at random, then all units in the selected clusters are used as the sample.

Unlike probability sampling, non-probability sampling does not give everyone an equal chance of being selected to be part of the sample. Non-probability sampling procedures are much less desirable, as they will almost certainly contain sampling biases.

Some of the methods under non-probability sampling are quota, convenience and purposive sampling.

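The equal and proportional allocations of Examples 24 and 25 can be checked with a short Python sketch. The function names equal_allocation and proportional_allocation are illustrative and do not come from the reviewer; the rounding step is an assumption that matters only when n × Ni / N is not a whole number.

```python
# Population per year level, taken from Example 25.
year_levels = {
    "First Year": 300,
    "Second Year": 250,
    "Third Year": 250,
    "Fourth Year": 200,
}
total_sample = 200  # n, the total sample needed


def equal_allocation(n, strata):
    """Split n evenly across the strata (Example 24)."""
    return {name: n // len(strata) for name in strata}


def proportional_allocation(n, strata):
    """Apply n_i = n * N_i / N to every stratum (Example 25)."""
    population = sum(strata.values())  # N, the total population
    # round() is an assumption for cases where the quotient is not whole.
    return {name: round(n * size / population) for name, size in strata.items()}


equal = equal_allocation(total_sample, year_levels)
proportional = proportional_allocation(total_sample, year_levels)

print(equal)
print(proportional)
```

With rounding, the proportional shares may not always add up to n exactly for other data, so the total should be checked before drawing the sample.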
DESCRIPTIVE MEASURES

Measures of Central Tendency are descriptive measures that are used to describe the center of a set of data, arranged numerically. Three different types of "average" will be discussed: the mean, the median and the mode.

The most commonly used measure of central tendency is the mean. It is also called the computed average. It is defined as the sum of the values divided by the total number of items.

The median is the middle value in a set of data. It is the value which divides the distribution into two equal parts, with one half of the values lower than the median and the other half higher than the median.

The third measure of central tendency is the mode. It is easily found by inspection. It is the value in the distribution whose frequency is higher than that of any other value.

A distribution with only one mode is called unimodal, while if it has two modes, it is called bimodal. If it has more than two modes, the distribution is called multimodal. The mode does not exist in a distribution if no value is repeated.

Exercise 33: Determine the mean, median and mode of the given set of data.
1. 8, 10, 13, 13, 16
2. 2, 5, 3, 8, 5, 7, 2
3. 12, 10, 15, 14, 11, 18
4. 1, 9, 10, 2, 9, 4, 2, 1
5. 3, 6, 4, 4, 6, 3, 6, 3, 4

Remark 34: Best use of the mean, median and mode.

The mean is computed if the values are in interval or ratio scale. The mean is influenced by outliers that may be at the extremes of the data set. The median is used for ordinal scale. Unlike the mean, the median is not influenced by outliers at the extremes of the data set. The mode is practical for nominal data. In some cases, the mode may not exist or may not be very meaningful.

Now, consider the given set of data:

Set A: 9, 12, 13, 15, 15, 17, 24

Set B: 7, 11, 15, 15, 17, 19, 21

Set C: 11, 11, 15, 15, 15, 18, 20

Using the measures of central tendency, it seems that the sets are equal (that is, the mean, median and mode are all 15). But obviously, the sets of data are different. For instance, the values of Set A are more dispersed or scattered than those of Sets B and C. These measures alone are not enough to describe a given set of data; we need to use other descriptive measures to further describe a distribution.

Definition 35: Measures of Dispersion or Variability describe the spread or the scattering of the values around the mean.

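The observation that Sets A, B and C share the same mean, median and mode can be verified with Python's standard statistics module. This is a minimal sketch; the dictionary names data_sets and summary are illustrative only.

```python
from statistics import mean, median, multimode

# The three data sets given in the reviewer.
data_sets = {
    "Set A": [9, 12, 13, 15, 15, 17, 24],
    "Set B": [7, 11, 15, 15, 17, 19, 21],
    "Set C": [11, 11, 15, 15, 15, 18, 20],
}

# multimode returns a list, so unimodal, bimodal and multimodal
# data are all handled the same way.
summary = {
    name: (mean(values), median(values), multimode(values))
    for name, values in data_sets.items()
}

for name, (avg, mid, modes) in summary.items():
    print(name, "mean:", avg, "median:", mid, "mode(s):", modes)
```

All three sets report 15 for each measure, which is why a measure of dispersion is needed to tell them apart.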
Definition 36: The range is the difference between the highest and lowest value/observation.

Example 37: Using the data above, the range of Set A is 24 − 9 = 15.

Definition 38: The average deviation is the average of the absolute distances of each value from the mean. The formula is given by:

$$AD = \frac{\sum |x - \bar{x}|}{n}$$

where $\bar{x}$ is the mean, $x$ are the values and $n$ is the number of values.

Exercise 39: Compute the average deviation of Set A in the data above.

Definition 40: Variance measures how much variability there is in the entire distribution. The standard deviation is the most commonly used measure of dispersion. It is the positive square root of the variance. The formulas are as follows:

Variance

Standard Deviation

Exercise 41: Using the above data, compute the variance and the standard deviation of Set A.

Definition 42: In a symmetrical or normal distribution, the mean, median and mode all fall at the same point; that is, they are equal.

Definition 43: In a positively skewed distribution, the extreme scores are larger, thus the mean is larger than the median.

Definition 44: In a negatively skewed distribution, the order of the measures of central tendency is the opposite of that in the positively skewed distribution, with the mean being smaller than the median, which is smaller than the mode.

Remark 46: When Sk = 0, the distribution is Normal or Symmetrical; when Sk > 0, the distribution is Positively Skewed; and when Sk < 0, the distribution is Negatively Skewed.

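The range, average deviation, variance and standard deviation can be sketched in Python as follows. Set C is used so that the exercises on Set A are left to the reader. The helper names value_range and average_deviation are illustrative. Since the reviewer's variance formula did not survive in this copy, both common conventions are shown: dividing by n (population) and dividing by n − 1 (sample). Which one the course intends is an assumption to confirm.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

set_c = [11, 11, 15, 15, 15, 18, 20]


def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


def average_deviation(values):
    """Mean of the absolute distances from the mean: sum(|x - mean|) / n."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)


print("range:", value_range(set_c))
print("average deviation:", average_deviation(set_c))

# Assumption: both conventions are shown because the source formula is missing.
print("population variance (divide by n):", pvariance(set_c))
print("sample variance (divide by n - 1):", variance(set_c))
print("population standard deviation:", pstdev(set_c))
print("sample standard deviation:", stdev(set_c))
```

In either convention the standard deviation is the positive square root of the corresponding variance, as stated in Definition 40.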
HYPOTHESIS TESTING

Definition 47: A statistical hypothesis is a conjecture concerning one or more populations whose veracity can be established using sample data.

Definition 48: Parametric tests are applied to data that are normally distributed. Moreover, it is assumed that the measurement of variables is at either the interval or ratio level.

Definition 49: Nonparametric tests do not require a normal distribution, and the variables of interest are at the nominal or ordinal level.

Table 50: Below is the summary of some of the different statistical tests.

CORRELATION

Definition 51: Correlation measures the strength of the linear association between two quantitative variables: the independent variable and the dependent variable. The independent variables are variables that can be manipulated or controlled, while dependent variables are those that cannot be controlled.

Definition 52: The most commonly used technique to calculate the coefficient of correlation is the Pearson Product Moment Correlation Coefficient. The formula is given by

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

where X = the observed data from the independent variable, Y = the observed data from the dependent variable, N = sample size and r = degree of relationship between X and Y.

Remark 53: The correlation coefficient ranges from -1 to +1. A value of -1.00 represents a perfect negative correlation, while a value of +1.00 represents a perfect positive correlation. If the value is equal to 0.00, it means that there is no linear relationship between the variables.

Number of Absences (x)     Final Grade (y)
6                          82
2                          86
15                         43
9                          74
12                         58
5                          90
8                          78

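As an illustration, the Pearson coefficient for the absences and final grade data can be computed with the sketch below. It assumes the usual raw-score form of the formula, r = (NΣXY − ΣXΣY) / sqrt([NΣX² − (ΣX)²][NΣY² − (ΣY)²]). The function name pearson_r is illustrative and not part of the reviewer.

```python
from math import sqrt

# Data from the reviewer's table: absences (x) and final grade (y).
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]


def pearson_r(xs, ys):
    """Raw-score Pearson product moment correlation coefficient."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(absences, grades)
print(round(r, 3))
```

The result is close to -0.94, a strong negative correlation: in this data, students with more absences tend to have lower final grades.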