Lecture 1 - Introduction To Statistics


#CreatingADigitalNation

INTRODUCTION TO STATISTICS

Dr. Syaiba Balqish Ariffin


Semester February 2022

www.unimy.edu.my
• Heading Towards the Future

Contents
▪ 5 Steps in Statistics : Collect, Organize, Present, Analyze, Interpret
▪ Types of Statistics: Descriptive Statistics vs. Inferential Statistics
▪ Types of Study: Observational Study vs. Experimental Study
▪ Population vs. Sample
▪ Parameter vs. Statistic
▪ Sources of Data/Data Collection – Survey, Interview, Census, Experimental
▪ Sampling Methods – Sampling with Probability vs. Sampling with non-Probability
▪ Types of Variables: Quantitative (Continuous/Discrete), Qualitative (Categorical)
▪ Measurement Scales: Nominal, Ordinal, Interval, Ratio

Malaysia’s Premier Digital Technology University | Creating A Digital Nation


Why Learn Statistics ?
Statistics is the branch of mathematics that transforms numbers into useful information for
decision-making.

Statistics tells you about the factors, risks, and effects associated with making a decision and allows
you to understand and reduce the variation (error) in the decision-making process.

Statistics provides you with methods for making better sense of the numbers used every
day to describe or analyze the world we live in.
What is Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to
assist in making more effective decisions.

5 IMPORTANT STEPS in STATISTICS

• Collect
• Organize
• Present
• Analyze
• Interpret
Types of Statistics
DESCRIPTIVE STATISTICS - consists of procedures used to summarize and describe the important
characteristics of a set of measurements.

Example:
• Tabulating
• Graphing
• Numerical Measures
• Correlation

INFERENTIAL STATISTICS - consists of procedures used to make inferences about population
characteristics from information contained in a sample drawn from this population.

Example:
• Sampling distribution
• Hypothesis testing
• Confidence interval
• Linear Regression (Statistical Model)
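
The contrast between the two types can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the exam scores are made-up values, and the confidence interval uses the normal critical value 1.96 as a simplification (for a sample this small, a t critical value would be more appropriate).

```python
import math
import statistics

# Hypothetical sample of 10 exam scores (made-up values for illustration).
scores = [62, 75, 58, 81, 69, 73, 90, 66, 77, 70]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(scores)
median = statistics.median(scores)
stdev = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: an approximate 95% confidence interval for the
# unknown population mean, based only on the sample.
margin = 1.96 * stdev / math.sqrt(len(scores))
ci = (mean - margin, mean + margin)

print(f"mean = {mean}, median = {median}, stdev = {stdev:.2f}")
print(f"approx. 95% CI for the population mean: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The first three numbers only describe the ten scores at hand; the interval is a statement about the population the scores were drawn from.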
Basic Vocabulary of Statistics
DATA - Data are the different values associated with a variable (or collection of variables).
PRIMARY DATA are collected specifically for the analysis desired.
SECONDARY DATA have already been compiled and are available for statistical analysis.

VARIABLE - A variable is a characteristic that changes or varies over time and/or for different
individuals or objects under consideration.
A CONSTANT has a fixed numerical value.
EXPERIMENTAL UNIT - An experimental unit is the individual or object on which a variable is
measured. A single measurement or data value results when a variable is actually measured
on an experimental unit.

POPULATION - A population is the set of all measurements of interest.


SAMPLE - A sample is a subset of measurements selected from the population of interest.

PARAMETER - A parameter is a measure that describes a characteristic of a population.


STATISTIC - A statistic is a measure that describes a characteristic of a sample.
Basic Vocabulary of Statistics
UNIVARIATE - Univariate data result when a single variable is measured on a single experimental unit.
BIVARIATE - Bivariate data result when two variables are measured on a single experimental unit.
MULTIVARIATE - Multivariate data result when more than two variables are measured.

VARIABLE X - Independent variable, factor, attribute, input

VARIABLE Y - Dependent variable, response, output

QUALITATIVE - Qualitative variables measure a quality or characteristic on each experimental unit.

QUANTITATIVE - Quantitative variables measure a numerical quantity or amount on each experimental
unit.

DISCRETE - A discrete variable can assume only a finite or countable number of values.
CONTINUOUS - A continuous variable can assume the infinitely many values corresponding to the
points on a line interval.
Example of a study
Suppose we want to know the average height of male students in UNIMY.
What is the data, variable, experimental unit, population, sample, parameter, statistic?
Data - Heights recorded from male students in UNIMY
Variable - Height
Experimental unit (EU) - A male student
Population - All male students in UNIMY
Sample - 100 male students selected from the population
Parameter - Average height of all male students
Statistic - Average height of the 100 sampled male students
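
The height example can be mimicked with simulated data. This is only a sketch: the population size (2,000), the mean (170 cm), and the spread (7 cm) are invented for illustration and are not figures from UNIMY.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 2,000 male students, simulated.
population = [random.gauss(170, 7) for _ in range(2000)]

# Parameter: a measure that describes the whole population.
parameter = statistics.mean(population)

# Sample: 100 students chosen at random, without replacement.
sample = random.sample(population, 100)

# Statistic: a measure that describes the sample; it estimates the parameter.
statistic = statistics.mean(sample)

print(f"parameter = {parameter:.1f} cm, statistic = {statistic:.1f} cm")
```

In practice the parameter is usually unknown, because measuring every member of the population is too costly; the statistic is what we can actually compute.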
Source of Data/Data Collection
• Surveys through questionnaires
• Interviews
• Experiments (don't have to be in the lab)
• Census
Types of Study
(i) Observational Study
The researcher does not apply any treatment to the sample under study and can only observe the
behavior of the objects/subjects. The study does not affect the objects/subjects.
Example: A study to see if customers who enter a clothing store pay for their goods with cash or
with a credit card.

(ii) Experimental Study
The researcher applies a treatment to the sample under study, which affects the objects/subjects.
Example: A scientific experiment done on rats to see the effect of a new medication in lowering
the blood glucose level of these rats.
For the following, identify the type of study involved:

(i) JPJ stationed its personnel at main traffic junctions to observe the behavior of road
users. (Observational)
(ii) A Chemistry student uses different amounts of hydrochloric acid in his experiment in the
lab. (Experimental)
(iii) A farmer uses different types of fertilizers on his chili crops to find the best fertilizer.
(Experimental)
Sampling methods
A sample should have the same characteristics as the population it is representing. The goal
of sampling is to collect data that are representative of the entire population of interest.

Sampling can be:

(i) with replacement: a member of the population may be chosen more than once (picking a
candy from a bowl and putting it back before the next pick)

(ii) without replacement: a member of the population may be chosen only once (drawing lottery tickets)
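
A minimal sketch of the two modes using Python's standard `random` module; the five-member population is made up for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of five members.
population = ["A", "B", "C", "D", "E"]

# With replacement: each draw is made from the full population,
# so the same member can appear more than once.
with_replacement = random.choices(population, k=5)

# Without replacement: a chosen member is removed from further draws,
# so each member can appear at most once.
without_replacement = random.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Note that `random.sample` raises a `ValueError` if `k` exceeds the population size, whereas `random.choices` accepts any `k`.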
Sampling methods
Sampling methods can be:

(i) Random (each member of the population has an equal chance of being selected)

This sampling strategy is called randomization.

(ii) Non-random (without probability)

Sampling error/bias occurs when:

(i) A non-random (non-probability) sampling method is applied.
(ii) The sample is inadequate or too small to represent the population.
(iii) The sample contains information not related to the population.
Random Sampling Methods
(with probability)
(i) simple random sample - each sample of the same size has an equal chance of being selected.

(ii) stratified sample - divide the population into groups called strata and then take a random
sample from each stratum. Members within each stratum have similar characteristics.

(iii) cluster sample - divide the population into groups called clusters, randomly select some of the
clusters, and include every member of the selected clusters in the sample. Members within each cluster
have different characteristics, so each cluster resembles the population on a small scale.

(iv) systematic sample - randomly select a starting point and take every n-th piece of data from a listing
of the population.
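
The four methods can be sketched as small Python functions. This is an illustration under stated assumptions: the 12-student population and the faculty labels are hypothetical, the function names are invented here, and cluster sampling follows the usual textbook definition (whole groups are chosen at random and every member of a chosen group is kept).

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 students, each tagged with a faculty.
population = list(enumerate(["CS", "Business", "Design"] * 4))

def simple_random(pop, n):
    """Every subset of size n is equally likely."""
    return random.sample(pop, n)

def stratified(pop, per_stratum):
    """Group by faculty (the strata), then sample randomly within each group."""
    strata = {}
    for unit in pop:
        strata.setdefault(unit[1], []).append(unit)
    return [u for group in strata.values()
            for u in random.sample(group, per_stratum)]

def cluster(pop, cluster_size, n_clusters):
    """Split the listing into clusters, pick whole clusters at random."""
    clusters = [pop[i:i + cluster_size] for i in range(0, len(pop), cluster_size)]
    chosen = random.sample(clusters, n_clusters)
    return [u for c in chosen for u in c]

def systematic(pop, step):
    """Random starting point, then every step-th unit in the listing."""
    start = random.randrange(step)
    return pop[start::step]

print(simple_random(population, 4))
print(stratified(population, 1))
print(cluster(population, 3, 2))
print(systematic(population, 4))
```

Stratified sampling guarantees that every faculty is represented, while cluster sampling trades that guarantee for convenience, since only the chosen groups need to be visited.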
Probability Sampling

Non-probability Sampling
Qualitative Data
Qualitative data are generally described by words or letters. They are not as widely used as
quantitative data because many numerical techniques do not apply to qualitative data. For
example, it does not make sense to find an average hair color or blood type.

Qualitative data can be separated into two subgroups:

▪ Nominal, if the categories have no natural order (gender - male or female; blood type - A, B, AB, O).

▪ Ordinal, if the categories follow a natural order (education - primary school,
secondary school and university).
Quantitative Data
Quantitative data are always numbers and are the result of counting or measuring attributes of a
population.

Quantitative data can be separated into two subgroups:

▪ Discrete, if it is the result of counting (the number of students of a given ethnic group in a class, the
number of books on a shelf, ...).

▪ Continuous, if it is the result of measuring (distance traveled, weight of luggage, ...).


Measurement Scales
Nominal – consist of categories in each of which the number of respective observations is recorded.
The categories are in no logical order and have no particular relationship. The categories are said to
be mutually exclusive since an individual, object, or measurement can be included in only one of
them.

Ordinal – contains more information than nominal data. It consists of distinct categories in which order
is implied. Values in one category are larger or smaller than values in other categories (e.g. rating -
excellent, good, fair, poor).

Interval – is a set of numerical measurements in which the distance between numbers is of a
known, constant size (same width of interval).

Ratio – consists of numerical measurements where the distance between numbers is of a known,
constant size; in addition, there is a nonarbitrary zero point.
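
One practical consequence of the four scales is which summaries make sense for each. The sketch below is illustrative only: the observations are made up, and the pairing of scale with summary follows common textbook guidance, not anything stated in these slides.

```python
import statistics

# Hypothetical observations, one list per measurement scale.
phone_brand = ["Nokia", "Samsung", "Nokia", "Sony Ericsson"]  # nominal
rating = ["poor", "good", "fair", "excellent", "good"]        # ordinal
temperature_c = [21.5, 23.0, 19.5, 22.0]                      # interval
run_time_s = [13.2, 15.8, 14.1]                               # ratio

# Nominal: only counting is meaningful, so the mode is the natural summary.
most_common_brand = statistics.mode(phone_brand)

# Ordinal: categories can be ranked, so a median is meaningful.
order = ["poor", "fair", "good", "excellent"]
ranks = sorted(order.index(r) for r in rating)
median_rating = order[ranks[len(ranks) // 2]]  # odd count: middle rank

# Interval: differences are meaningful, so a mean can be computed,
# but 0 degrees Celsius does not mean "no temperature".
mean_temp = statistics.mean(temperature_c)

# Ratio: a true zero makes ratios meaningful ("about 1.2 times as long").
slowest_vs_fastest = max(run_time_s) / min(run_time_s)

print(most_common_brand, median_rating, mean_temp, round(slowest_vs_fastest, 2))
```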
Nominal
Nominal data is the simplest form of data, and is defined as data that is used for naming or
labelling variables.

Ordinal
Ordinal data is a type of categorical data in which the values follow a natural order.

Interval
Interval data is measured numerical data that has equal distances between adjacent values, but no
meaningful zero.

Ratio
Ratio data is measured numerical data that has equal distances between adjacent values and a
meaningful zero.

Kelvin as a unit of noise temperature


Can you identify these variables as being either qualitative or quantitative (continuous or discrete),
and name their measurement scale?

1. Number of books read in a week: 5, 3, 7, 9. (Discrete)

2. Types of hand phones: Nokia, Samsung, Sony Ericsson. (Nominal)

3. Time taken to finish a 100-meter run: 13.2 sec, 15.8 sec. (Ratio)

4. Subjects taken this semester: Geometry, English, Statistics. (Nominal)

5. Academic Year: 1st year, 2nd year, 3rd year. (Ordinal)

6. Height, Weight, Age, Exam scores. (Ratio)

7. Gender, Eye Colour, Laptop brand, Race. (Nominal)
