
Chapter 1.

Nature of Statistics

1. Definition of Statistics
⮚ Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a
given set of experimental data or real-life studies. Statistics studies methodologies to gather, review, analyze
and draw conclusions from data, according to Grant and Kenton (2019).
⮚ Statistics is a collection of mathematical techniques that help to analyze and present data. Statistics is also used
in associated tasks such as designing experiments and surveys and planning the collection and analysis of data
from these (Kalla, 2008).
⮚ Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting
and presenting empirical data (Bren, 2019).

According to Grant and Kenton (2019), statistics is a term used to summarize a process that an analyst uses to
characterize a data set. If the data set is a sample from a larger population, the analyst can develop
interpretations about the population based primarily on the statistical outcomes from the sample. Statistical analysis
involves gathering and evaluating data and then summarizing the data in a mathematical form.

2. Branches of Statistics
Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can represent either the
entire population or a sample of it. Descriptive statistics are broken down into measures of central tendency and
measures of variability (spread) (Kenton, 2019).
Inferential statistics aims to draw conclusions from a sample and generalize them to the population. It uses
probability theory to determine the probability of the characteristics of the sample. The most common methods
include hypothesis tests and analysis of variance (ANOVA) (Singh, 2018).

3. Types of Data
Quantitative data are measures of values or counts and are expressed as numbers; they are data about numeric
variables (ABS, 2019). Discrete data are quantitative data that can be counted and have a finite number of possible
values, whereas continuous data are quantitative data that can be measured and have an infinite number of possible
values within a selected range. Qualitative data, by contrast, are measures of 'types' and may be represented by a
name, symbol, or number code; they are data about categorical variables.

4. Scale of Measurement
Scales of measurement refer to the ways in which variables/numbers are defined and categorized. Each scale of
measurement has certain properties, which in turn determine which statistical analyses are appropriate to use.
The four scales of measurement are nominal, ordinal, interval, and ratio (Cornell, 2016).
Nominal: Categorical data and numbers that are simply used as identifiers or names represent a nominal scale
of measurement. At this level of measurement, the numbers in a variable are used only to classify the data, and
words, letters, and alphanumeric symbols can also be used.
Survey on Why People Travel
Reason                        %
Personal business             14.6
Visit Friends or Relatives    33
Work-related                  22.5
Leisure                       30
The second level of measurement is the ordinal level of measurement. This level of measurement depicts some
ordered relationship among the variable’s observations (SS, 2019). Furthermore, an ordinal scale of measurement
represents an ordered series of relationships or rank order.
Example: Individuals competing in a contest may be fortunate to achieve first, second, or third place. First,
second, and third place represent ordinal data.

Interval Scale: An interval scale has ordered numbers with meaningful divisions, and the magnitude between
consecutive intervals is equal. Interval scales do not have a true zero; for example, 0 degrees Celsius does not mean the
absence of heat (Bisht, 2019). Interval scales have the properties of identity, magnitude, and equal distance.
For example, on a Fahrenheit or Celsius thermometer, 90° is hotter than 45°, and the difference between 10° and 30°
is the same as the difference between 60° and 80°. However, because there is no true zero, 90° is not twice as hot as 45°.
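The claim that an interval scale keeps equal differences equal but does not support ratios can be checked numerically. The short Python sketch below converts the Celsius temperatures from the example to Fahrenheit using F = 9/5 × C + 32; the function name `celsius_to_fahrenheit` is our own choice for illustration, not part of any library.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit (F = 9/5 * C + 32)."""
    return c * 9 / 5 + 32


# Equal intervals in Celsius (10->30 and 60->80 are both 20 degrees apart)
# remain equal after conversion to Fahrenheit.
diff_low_f = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(10)
diff_high_f = celsius_to_fahrenheit(80) - celsius_to_fahrenheit(60)
print(diff_low_f, diff_high_f)  # 36.0 36.0

# Ratios are not meaningful: 90 C / 45 C = 2, but the same two temperatures
# expressed in Fahrenheit give a different ratio.
ratio_c = 90 / 45
ratio_f = celsius_to_fahrenheit(90) / celsius_to_fahrenheit(45)
print(ratio_c, round(ratio_f, 2))  # 2.0 1.72
```

Because the "doubling" disappears when only the unit changes, a statement such as "twice as hot" has no meaning on an interval scale.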

Ratio scales have all of the characteristics of interval scales as well as a true zero, which refers to complete
absence of the characteristic being measured. Physical characteristics of persons and objects can be measured with ratio
scales; thus, height and weight are examples of ratio measurement (Lee, 2019).
Example: A value of 0 means a complete absence of height or weight. A person who is 1.2 meters (4 feet)
tall is two-thirds as tall as a 1.8-meter (6-foot) tall person. Similarly, a person weighing 45.4 kg (100 pounds) is
two-thirds as heavy as a person who weighs 68 kg (150 pounds).
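On a ratio scale, changing units only multiplies every value by the same constant, so ratios such as "two-thirds as tall" survive the change. A small Python sketch using the exact definitions 1 ft = 0.3048 m and 1 lb = 0.45359237 kg (the metric figures in the example above are rounded, so feet and pounds are used as inputs here):

```python
import math

FT_TO_M = 0.3048        # exact definition of the international foot
LB_TO_KG = 0.45359237   # exact definition of the international pound


def ratio(a: float, b: float) -> float:
    """Return how many times a is as large as b."""
    return a / b


height_ratio_ft = ratio(4, 6)
height_ratio_m = ratio(4 * FT_TO_M, 6 * FT_TO_M)
weight_ratio_lb = ratio(100, 150)
weight_ratio_kg = ratio(100 * LB_TO_KG, 150 * LB_TO_KG)

# Multiplying both values by the same conversion factor leaves the ratio unchanged.
print(math.isclose(height_ratio_ft, height_ratio_m))   # True
print(math.isclose(weight_ratio_lb, weight_ratio_kg))  # True
print(round(height_ratio_m, 4))                        # 0.6667
```

Contrast this with the Celsius/Fahrenheit case, where the conversion also adds a constant (32), which is what breaks ratios on an interval scale.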

5. Methods of sampling from a population (Shantikumar, 2018)


There are several different sampling techniques available, and they can be subdivided into two groups:
probability sampling and non-probability sampling. In probability (random) sampling, you start with a complete sampling
frame of all eligible individuals from which you select your sample. In this way, all eligible individuals have a chance of
being chosen for the sample, and you are better able to generalize the results from your study.
Probability sampling methods tend to be more time-consuming and expensive than non-probability sampling. In
non-probability (non-random) sampling, you do not start with a complete sampling frame, so some individuals have no
chance of being selected.
Consequently, you cannot estimate the effect of sampling error and there is a significant risk of ending up with a
non-representative sample which produces non-generalizable results. However, non-probability sampling methods tend
to be cheaper and more convenient, and they are useful for exploratory research and hypothesis generation.

Probability Sampling Methods

a. Simple random sampling


Each individual is chosen entirely by chance and each member of the population has an equal chance, or
probability, of being selected. For example, if you have a sampling frame of 1000 individuals, labelled 0 to 999, use
groups of three digits from the random number table to pick your sample. So, if the first three numbers from the
random number table were 094, select the individual labelled “94”, and so on.
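Instead of a printed random number table, a computer's pseudo-random number generator can make the selection. The minimal Python sketch below uses the standard library's `random.sample`, which draws without replacement, so every individual labelled 0 to 999 has the same chance of being selected. The sample size of 50 and the seed are arbitrary illustrative choices.

```python
import random

POPULATION_SIZE = 1000  # sampling frame labelled 0 to 999, as in the example
SAMPLE_SIZE = 50        # illustrative sample size

rng = random.Random(2019)  # fixed seed so the same draw can be reproduced
sample = sorted(rng.sample(range(POPULATION_SIZE), SAMPLE_SIZE))
print(sample[:10])  # first ten selected labels
```

Fixing the seed is optional; it simply lets someone else repeat exactly the same selection.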

b. Systematic sampling
Individuals are selected at regular intervals from the sampling frame. The intervals are chosen to ensure an
adequate sample size. If you need a sample of size n from a population of size x, you should select every (x/n)th individual
for the sample. For example, if you wanted a sample size of 100 from a population of 1000, select every 1000/100 =
10th member of the sampling frame.
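The selection rule above can be sketched in Python. One detail is our addition rather than part of the description above: following common practice, the starting point is chosen at random within the first interval, and then every k-th member is taken, where k = x/n. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame: list, n: int, rng: random.Random) -> list:
    """Select every k-th member of the frame, where k = len(frame) // n."""
    k = len(frame) // n          # sampling interval, e.g. 1000 // 100 = 10
    start = rng.randrange(k)     # random start within the first interval (assumed practice)
    return frame[start::k][:n]   # every k-th member from the start, capped at n


frame = list(range(1000))        # hypothetical sampling frame of 1000 labels
chosen = systematic_sample(frame, 100, random.Random(7))
print(len(chosen), chosen[:3])
```

With 1000 individuals and n = 100, the interval is 10, so the chosen labels are 10 apart, matching the worked example.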

c. Stratified sampling
In this method, the population is first divided into subgroups (or strata) whose members share a similar characteristic. It is
used when we might reasonably expect the measurement of interest to vary between the different subgroups, and we
want to ensure representation from all the subgroups. For example, in a study of stroke outcomes, we may stratify the
population by sex, to ensure equal representation of men and women.
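A Python sketch of stratified sampling, assuming a hypothetical population of 700 women and 300 men and drawing an equal number at random from each stratum (matching the "equal representation" example). The names `stratified_sample` and `per_stratum` are our own.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, per_stratum, rng):
    """Group people into strata by `key`, then draw `per_stratum` at random from each stratum."""
    strata = defaultdict(list)
    for person in population:
        strata[key(person)].append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))  # simple random sample within the stratum
    return sample


# Hypothetical population: each person is (id, sex).
population = [(i, "Female") for i in range(700)] + [(i, "Male") for i in range(700, 1000)]
sample = stratified_sample(population, key=lambda p: p[1], per_stratum=50, rng=random.Random(1))
print(sum(1 for p in sample if p[1] == "Female"), sum(1 for p in sample if p[1] == "Male"))  # 50 50
```

A simple random sample of 100 from this population would be expected to contain about 30 men; stratifying guarantees 50, which is the point of the method when men and women must both be well represented.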

d. Clustered sampling
In a clustered sample, subgroups of the population are used as the sampling unit, rather than individuals. The
population is divided into subgroups, known as clusters, which are randomly selected to be included in the study.
Cluster sampling is a sampling method in which the entire population of the study is divided into externally
homogeneous, but internally heterogeneous, groups called clusters. Essentially, each cluster is a mini-representation
of the entire population.
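A minimal Python sketch of one-stage cluster sampling: whole clusters are chosen at random, and every member of a chosen cluster enters the sample. The clusters here (20 hypothetical classrooms of 30 pupils each) are invented for illustration.

```python
import random

# Hypothetical population divided into 20 clusters (e.g. classrooms) of 30 pupils each.
clusters = {f"class_{c}": [f"pupil_{c}_{p}" for p in range(30)] for c in range(20)}

rng = random.Random(42)
chosen_clusters = rng.sample(sorted(clusters), 4)  # randomly select 4 whole clusters

# Every individual in a selected cluster is included in the sample.
sample = [pupil for name in chosen_clusters for pupil in clusters[name]]
print(chosen_clusters, len(sample))  # 4 cluster names and 120 pupils
```

Note that the random draw happens at the cluster level only; no individuals are sampled within a chosen cluster.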

Non-Probability Sampling Methods

a. Convenience sampling
Convenience sampling is perhaps the easiest method of sampling, because participants are selected based on
availability and willingness to take part. Useful results can be obtained, but the results are prone to significant bias,
because those who volunteer to take part may be different from those who choose not to (volunteer bias), and the
sample may not be representative of other characteristics, such as age or sex. Note: volunteer bias is a risk of all non-
probability sampling methods.

b. Quota sampling
This method of sampling is often used by market researchers. Interviewers are given a quota of subjects of a
specified type to attempt to recruit. For example, an interviewer might be told to go out and select 20 adult men, 20
adult women, 10 teenage girls and 10 teenage boys so that they could interview them about their television viewing.
Ideally the quotas chosen would proportionally represent the characteristics of the underlying population.
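Quota sampling can be sketched as filling fixed counts from whoever happens to be available, which is why it is non-random. In the Python sketch below, the stream of people the interviewer meets is simulated and hypothetical; the quotas are those from the example. The function name `fill_quotas` is our own.

```python
import random

quotas = {"adult man": 20, "adult woman": 20, "teenage girl": 10, "teenage boy": 10}


def fill_quotas(arrivals, quotas):
    """Accept each arriving person only while the quota for their group is still unfilled."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person_id, group in arrivals:
        if counts[group] < quotas[group]:
            counts[group] += 1
            accepted.append((person_id, group))
        if counts == quotas:  # stop once every quota is met
            break
    return accepted, counts


# Simulated stream of people met by the interviewer. Who is accepted depends only on
# order of arrival, not on a random draw from a complete sampling frame.
rng = random.Random(3)
arrivals = [(i, rng.choice(list(quotas))) for i in range(500)]
accepted, counts = fill_quotas(arrivals, quotas)
print(counts)
```

The first 20 adult men who turn up are interviewed and everyone after them is turned away, so the sample mirrors the quotas but not necessarily the population within each group.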

c. Judgement (or Purposive) Sampling


Also known as selective, or subjective, sampling, this technique relies on the judgement of the researcher when
choosing who to ask to participate. Researchers may thus implicitly choose a “representative” sample to suit their
needs, or specifically approach individuals with certain characteristics. This approach is often used by the media when
canvassing the public for opinions and in qualitative research.

d. Snowball sampling
This method is commonly used in social sciences when investigating hard-to-reach groups. Existing subjects are
asked to nominate further subjects known to them, so the sample increases in size like a rolling snowball. For example,
when carrying out a survey of risk behaviors amongst intravenous drug users, participants may be asked to nominate
other users to be interviewed.
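Snowball recruitment behaves like a breadth-first walk through a referral network: seeds are recruited first, then the people they nominate, then the people those nominees name, and so on. A Python sketch with a small hypothetical network (each key lists the people that participant might nominate); `snowball_sample` is our own name.

```python
from collections import deque

# Hypothetical referral network: participant -> people they could nominate.
referrals = {
    "A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
    "D": ["F"], "E": [], "F": ["A"], "G": ["H"], "H": [],
}


def snowball_sample(seeds, referrals, target_size):
    """Recruit seeds, then their nominees wave by wave, until target_size is reached."""
    recruited, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        recruited.append(person)
        for nominee in referrals.get(person, []):
            if nominee not in seen:  # never recruit the same person twice
                seen.add(nominee)
                queue.append(nominee)
    return recruited


print(snowball_sample(["A"], referrals, target_size=10))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

G and H are never reached from seed A, which illustrates a known limitation: the sample can only include people connected to the starting participants.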

6. Data Collection Methods (Jovancic, 2019)


Quantitative data collection methods
a. Closed-ended Surveys and Online Quizzes
Closed-ended surveys and online quizzes are based on questions that give respondents predefined answer
options to opt for. There are two main types of closed-ended surveys – those based on categorical and those based on
interval/ratio questions.
Categorical survey questions can be further classified into dichotomous (‘yes/no’), multiple-choice, and checkbox questions,
and can be answered with a simple “yes” or “no” or a specific piece of predefined information.
Interval/ratio questions, on the other hand, can
consist of rating-scale, Likert-scale, or matrix questions
and involve a set of predefined values to choose from on
a fixed scale.

b. Open-Ended Surveys and Questionnaires

Open-ended surveys and questionnaires are the opposite of closed-ended ones. The main difference between the two is
that closed-ended surveys offer predefined answer options the respondent must choose from, whereas open-ended
surveys allow respondents much more freedom and flexibility when providing their answers.

7. Measures of Central Tendency (Laerd, 2018)
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central
position within that set of data. As such, measures of central tendency are sometimes called measures of central
location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under different conditions, some
measures of central tendency become more appropriate to use than others. In the following sections, we will look at the
mean, mode and median, and learn how to calculate them and under what conditions they are most appropriate to be
used.

When to use the mean


The mean is usually the best measure of central tendency to use when your data distribution is continuous and
symmetrical, such as when your data is normally distributed. However, it all depends on what you are trying to show
from your data. For example, consider the wages of staff at a factory below:

The mean salary for these ten staff is Php 30.7k. However, inspecting the raw data suggests that this mean value
might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in the Php
12k to 18k range. The mean is being skewed by the two large salaries. Therefore, in this situation, we would like to have
a better measure of central tendency. As we will find out later, taking the median would be a better measure of central
tendency in this situation.
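The factory wage table is not reproduced in this module, so the Python sketch below uses hypothetical salaries (in thousands of pesos) with the same pattern: most values between 12 and 18 and two very large ones. It shows how the two large values pull the mean up while the median stays with the typical worker.

```python
import statistics

# Hypothetical monthly salaries in Php thousands (not the factory data from the text).
salaries = [12, 13, 14, 15, 15, 16, 17, 18, 60, 90]

print(statistics.mean(salaries))    # mean of 27: pulled up by the two large salaries
print(statistics.median(salaries))  # 15.5: close to what most workers earn
```

Removing the two largest salaries would lower the mean to 15 but leave the median almost unchanged, which is why the median is preferred for skewed data.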

When to use the median


The median is usually preferred to other measures of central tendency when your data set is skewed (i.e., forms
a skewed distribution) or you are dealing with ordinal data.

When to use the mode


The mode is the least used of the measures of central tendency, but it is the only one that can be used with nominal
data. For this reason, the mode is the best measure of central tendency when dealing with nominal data. The mean
and/or median are usually preferred for all other types of data, although this does not mean the mode is never used
with them.
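Because nominal categories have no numeric value or order, the mode is found simply by identifying the most frequent category. A Python sketch using the percentages from the "Survey on Why People Travel" example; the list of raw responses in the second part is hypothetical.

```python
import statistics

# Percentages from the "Survey on Why People Travel" example.
travel_survey = {
    "Personal business": 14.6,
    "Visit Friends or Relatives": 33,
    "Work-related": 22.5,
    "Leisure": 30,
}
modal_reason = max(travel_survey, key=travel_survey.get)
print(modal_reason)  # Visit Friends or Relatives

# With raw (hypothetical) nominal responses, statistics.mode returns the most common value.
responses = ["Leisure", "Work-related", "Leisure", "Personal business", "Leisure"]
print(statistics.mode(responses))  # Leisure
```

A mean or median of these categories would have no meaning, which is why the mode is the only appropriate measure here.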

8. Measures of Variability/Spread (Investopedia, 2019)


Measures of variability are statistics that describe the amount of difference and spread in a data set. These
measures include variance, standard deviation, and standard error of the mean. If the numbers corresponding to these
statistics are high, the scores or values in the data set are widely spread out and not tightly centered
around the mean (Alleydog.com, 2019).
According to Laerd (2019), measuring the spread of data values is important for several reasons, one of the main ones
being its relationship with measures of central tendency. A measure of spread gives us an idea of how well the mean,
for example, represents the data. If the spread of values in the data set is large, the mean is not as representative of
the data as when the spread is small, because a large spread indicates that there are probably large differences between
individual scores. Additionally, in research, little variation within each data group is often seen as positive, as it
indicates that the scores within the group are similar.

Variance
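The variance is the average of the squared differences from the mean. To calculate the variance, first find the
difference between each data point and the mean; then square these differences and average the results. When the
data are a sample rather than the whole population, the sum of squared differences is usually divided by n − 1 instead
of n (the sample variance); this is the version reported by statistical software such as JASP.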
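The steps above (subtract the mean, square, average) translate directly into Python. The sketch uses a small made-up data set and compares the hand calculation with the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # small made-up data set; its mean is 5

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]       # difference from the mean, squared
population_variance = sum(squared_diffs) / len(data)   # average over the n values
sample_variance = sum(squared_diffs) / (len(data) - 1) # sample version: divide by n - 1

print(population_variance)        # 4.0
print(round(sample_variance, 3))  # 4.571

# The standard library gives the same two results.
print(statistics.pvariance(data), statistics.variance(data))
```

The two versions differ only in the divisor; with larger samples the difference between dividing by n and by n − 1 becomes small.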

Standard Deviation
Standard deviation is a statistic that measures how far a group of numbers lies from the mean, using the square
root of the variance. The variance calculation uses squares because squaring weights outliers more heavily than data
very near the mean. Squaring also prevents differences above the mean from canceling out those below; without it, the
deviations from the mean would always sum to zero.
Standard deviation is calculated as the square root of the variance, which is obtained from the deviation of each data
point relative to the mean. If the points are further from the mean, there is higher deviation within the data; if they
are closer to the mean, there is lower deviation. So, the more spread out the group of numbers, the higher the
standard deviation.
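A short Python sketch of the two points above: unsquared deviations cancel out, and the standard deviation (the square root of the variance) grows as the data spread out. Both data sets are made up and share the same mean of 5.

```python
import math
import statistics

tight = [4, 5, 5, 5, 6]   # values close to their mean of 5
spread = [1, 3, 5, 7, 9]  # same mean of 5, but more spread out

# Unsquared deviations always sum to zero, so they cannot measure spread on their own.
print(sum(x - statistics.mean(spread) for x in spread))  # 0

# Standard deviation = square root of the (population) variance.
sd_tight = math.sqrt(statistics.pvariance(tight))
sd_spread = math.sqrt(statistics.pvariance(spread))
print(round(sd_tight, 3), round(sd_spread, 3))  # 0.632 2.828
```

In practice `statistics.pstdev` (population) and `statistics.stdev` (sample, n − 1) compute these values directly.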
Statistical Software

About JASP
⮚ In recognition of Bayesian pioneer Sir Harold Jeffreys, JASP stands for Jeffreys’s Amazing Statistics Program.

⮚ It is released under a Free and Open Source license.

⮚ JASP is currently supported by long-term, multi-million euro grants that help fund a team of motivated software
developers, academics, and students.

⮚ The JASP application is written in C++, using the Qt toolkit. The analyses themselves are written in either R or C++.
The display layer (where the tables are rendered) is written in JavaScript and is built on top of jQuery UI and
WebKit.

JASP generally produces APA-style results tables and plots to ease publication. It promotes open science by
integration with the Open Science Framework and reproducibility by integrating the analysis settings into the results.
Activity Data:

Res  Sex     Age      Civil Status  FMI     BLS/DRRM          BRGY.     Resilience  Preparedness  Adaptation
1    Female  Younger  Married       Higher  With Training     Malinong  4.13        3.25          3.64
2    Female  Older    Single        Lower   Without Training  Malinong  3.33        2.63          3.14
3    Male    Younger  Married       Lower   Without Training  Malinong  3.47        2.88          3.00
4    Male    Younger  Single        Lower   With Training     Malinong  3.00        2.38          3.07
5    Female  Older    Married       Higher  Without Training  Malinong  3.93        3.38          4.64
6    Male    Older    Married       Higher  With Training     Malinong  4.07        2.94          4.21
7    Female  Older    Single        Higher  Without Training  Malinong  4.07        3.38          4.07
8    Female  Older    Married       Higher  With Training     Malinong  4.20        3.31          4.29
9    Male    Younger  Married       Lower   With Training     Malinong  3.67        3.25          4.07
10   Female  Younger  Married       Higher  With Training     Malinong  3.60        4.13          3.29
11   Female  Younger  Single        Lower   Without Training  Higugma   4.27        4.38          4.21
12   Male    Younger  Married       Higher  With Training     Higugma   4.40        3.69          4.43
13   Female  Older    Married       Lower   Without Training  Higugma   4.20        4.31          4.43
14   Male    Younger  Single        Lower   Without Training  Higugma   4.73        4.56          4.43
15   Female  Younger  Single        Lower   Without Training  Higugma   4.67        3.94          4.07
16   Male    Younger  Married       Higher  Without Training  Higugma   4.53        4.50          4.64
17   Female  Older    Single        Lower   With Training     Higugma   4.27        4.44          4.57
18   Male    Older    Single        Lower   Without Training  Higugma   4.40        4.38          4.43
19   Male    Older    Single        Lower   Without Training  Higugma   4.53        4.56          4.71
20   Male    Younger  Single        Higher  Without Training  Higugma   4.53        4.31          4.36
21   Female  Younger  Married       Lower   Without Training  Paglaum   4.53        4.69          4.57
22   Male    Older    Single        Higher  With Training     Paglaum   2.73        2.38          2.57
23   Male    Older    Married       Lower   With Training     Paglaum   2.73        2.56          2.57
24   Male    Younger  Single        Higher  Without Training  Paglaum   2.00        2.19          2.5
25   Female  Older    Married       Lower   Without Training  Paglaum   2.07        2.31          2.07
26   Female  Younger  Married       Lower   Without Training  Paglaum   1.93        1.94          1.93
27   Male    Younger  Married       Higher  Without Training  Paglaum   3.33        2.31          2.21
28   Female  Younger  Married       Higher  Without Training  Paglaum   2.60        2.13          2.21
29   Male    Younger  Married       Lower   Without Training  Paglaum   2.27        2.13          2.00
30   Female  Younger  Single        Lower   With Training     Paglaum   2.73        2.38          2.57
Legend:
Sex: 1-Male; 2-Female
Age: 1-Younger (below 47 years old); 2-Older (47 years old and above)
Civil Status: 1-Single; 2-Married
Basic Life Support Training (BLS/DRRM): 1-With training; 2-Without training
Family Income (FMI): 1-Lower income (below Php 6,807); 2-Higher income (Php 6,807 and above)
Barangay: 1-Malinong; 2-Paghigugma; 3-Gabinuligay

Mean Scale     Interpretation
4.50 - 5.00    Very high
3.50 - 4.49    High
2.50 - 3.49    Average
1.50 - 2.49    Low
1.00 - 1.49    Very Low
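To apply the Mean Scale consistently when discussing results, each mean can be mapped to its verbal interpretation. A Python sketch; the function name `interpret_mean` is our own, and treating each band's lower bound as a cut-off (so a mean such as 4.495, which falls between the printed bands, is read as "High") is our assumption.

```python
def interpret_mean(mean: float) -> str:
    """Map a mean score (1.00 to 5.00) to its interpretation in the Mean Scale table."""
    # Lower bound of each band, checked from the highest band down.
    bands = [(4.50, "Very high"), (3.50, "High"), (2.50, "Average"), (1.50, "Low")]
    for lower_bound, label in bands:
        if mean >= lower_bound:
            return label
    return "Very Low"


print(interpret_mean(3.635))  # High
print(interpret_mean(2.07))   # Low
```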

Practice:

1. What is the profile of the participants?


2. What is the level of resilience of the participants when taken as a whole and grouped according to sex, age, civil
status, family income, basic life support training and barangay?

1. What is the profile of the participants?

Procedure:

Step 1. Open the data file.

Step 2. Select Descriptives and click Descriptive Statistics.

Step 3. Select the following variables and click the arrow to transfer them to the variable box.

Step 4. Click Frequency tables and Distribution plots.

Step 5. Go to Results.
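As a cross-check on the frequency tables, the counts and percentages for a categorical variable can be computed in plain Python. The sketch below copies the Sex and Age columns of the Activity Data (respondents 1 to 30 in order); it only reproduces the numbers and is not how JASP computes them internally. The function name `frequency_table` is our own.

```python
from collections import Counter

# Sex and Age of respondents 1-30, copied from the Activity Data table.
sex = ["Female", "Female", "Male", "Male", "Female", "Male", "Female", "Female", "Male", "Female",
       "Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male", "Male", "Male",
       "Female", "Male", "Male", "Male", "Female", "Female", "Male", "Female", "Male", "Female"]
age = ["Younger", "Older", "Younger", "Younger", "Older", "Older", "Older", "Older", "Younger", "Younger",
       "Younger", "Younger", "Older", "Younger", "Younger", "Younger", "Older", "Older", "Older", "Younger",
       "Younger", "Older", "Older", "Younger", "Older", "Younger", "Younger", "Younger", "Younger", "Younger"]


def frequency_table(values):
    """Return {category: (count, percent)}, most frequent category first."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, round(100 * n / total, 1)) for category, n in counts.most_common()}


print(frequency_table(sex))  # 15 Female and 15 Male (50.0% each)
print(frequency_table(age))  # 18 Younger (60.0%) and 12 Older (40.0%)
```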
2. What is the level of resilience of the participants when taken as a whole and grouped according to sex, age, civil
status, family income, basic life support training and barangay?

Step 1. Open the data file.

Step 2. Select Descriptives and click Descriptive Statistics.

Step 3. Select Resilience and click the arrow to transfer it to the variable box. Then select a grouping variable (for
example, Sex) and click the arrow to transfer it to the Split box.

Step 4. Go to Results.

Descriptive Statistics
                  Resilience
                  Female    Male
Valid             15        15
Missing           0         0
Mean              3.635     3.626
Std. Deviation    0.894     0.913
Minimum           1.930     2.000
Maximum           4.670     4.730

Based on the results, both the male (M = 3.63, SD = 0.913) and female (M = 3.64, SD = 0.894) groups have a high level
of resilience. The female group has a slightly higher mean than the male group, indicating that the female group's level
of resilience was slightly higher, although the difference of about 0.01 is very small. In addition, the female group's
smaller standard deviation indicates slightly more consistent scores than the male group's.
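The grouped results can be cross-checked outside JASP. The plain-Python sketch below splits the Resilience scores (the first of the three score columns in the Activity Data) by Sex and computes the same summaries. `statistics.stdev` uses the sample (n − 1) formula, which reproduces the standard deviations in the JASP table; the function name `describe` is our own.

```python
import statistics

# Sex and Resilience of respondents 1-30, copied from the Activity Data table.
sex = ["Female", "Female", "Male", "Male", "Female", "Male", "Female", "Female", "Male", "Female",
       "Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male", "Male", "Male",
       "Female", "Male", "Male", "Male", "Female", "Female", "Male", "Female", "Male", "Female"]
resilience = [4.13, 3.33, 3.47, 3.00, 3.93, 4.07, 4.07, 4.20, 3.67, 3.60,
              4.27, 4.40, 4.20, 4.73, 4.67, 4.53, 4.27, 4.40, 4.53, 4.53,
              4.53, 2.73, 2.73, 2.00, 2.07, 1.93, 3.33, 2.60, 2.27, 2.73]


def describe(values):
    """Summary statistics in the same order as the JASP Descriptive Statistics table."""
    return {
        "Valid": len(values),
        "Mean": round(statistics.mean(values), 3),
        "Std. Deviation": round(statistics.stdev(values), 3),  # sample SD (n - 1)
        "Minimum": min(values),
        "Maximum": max(values),
    }


for group in ("Female", "Male"):
    scores = [r for s, r in zip(sex, resilience) if s == group]
    print(group, describe(scores))
```

Getting the same means and standard deviations as JASP is a quick way to confirm that the right column and grouping variable were used.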

Activity 1 (Mean and Standard Deviation)


Process the following problems, prepare a table of results, and discuss the table:

1. What is the level of preparedness of the participants when taken as a whole and grouped according to civil status and
barangay?

Note: Present your answers in a Microsoft Word document (A4 paper, 1-inch margins, Calibri font, 11-point font size, and
1.5 line spacing). After presenting your answers, paste your JASP results on the last page as statistical evidence.

References:

Alleydog.com (2019), Measures of Variability, https://www.alleydog.com/glossary/definition.php?term=Measures+Of+Variability

Australian Bureau of Statistics (2019), https://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+quantitative+and+qualitative+data

Bren, D. (2018), Definition of Statistics, https://www.stat.uci.edu/what-is-statistics/

Cornell, I. (2016), Scale of Measurement, http://lsc.cornell.edu/wp-content/uploads/2016/01/Intro-to-measurement-and-statistics.pdf

Grant, M. and Kenton, W. (2019), Definition of Statistics, https://www.investopedia.com/terms/s/statistics.asp

Investopedia (2019), Standard Deviation and Variance, https://www.investopedia.com/ask/answers/021215/what-difference-between-standard-deviation-and-variance.asp

Jovanic, N. (2019), 5 Data Collection Methods for Obtaining Quantitative, https://www.leadquizzes.com/blog/data-collection-methods/

Kalla, S. (2008), Definition of Statistics, https://explorable.com/what-is-statistics

Kenton, W. (2019), Descriptive Statistics, https://www.investopedia.com/terms/d/descriptive_statistics.asp

Lee, J. (2019), https://www.britannica.com/topic/measurement-scale

Shantikumar, S. (2018), Methods of Sampling from a Population, https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/methods-of-sampling-population

Singh, S. (2018), Inferential Statistics, https://towardsdatascience.com/statistics-descriptive-and-inferential-63661eb13bb5

Statistics Solution (2019), https://www.statisticssolutions.com/data-levels-of-measurement/
