[STATS] Module 3
MGT1103 | a.y. 2023-2024
[Module 1] Data and Data Preparation
categorical
Qualitative
Represents categories
Labels or names to identify distinguishing characteristics.
Can be defined by two or more categories
Coded into numbers for data processing
Summarize the data with a frequency distribution.
Ex: marital status, grade in a course
numerical
Use numbers to identify the distinguishing characteristics of each observation.
Quantitative
Represent meaningful numbers
Discrete or continuous
o DISCRETE assumes a countable number of values.
- Need not be whole numbers
- Ex: number of children in a family
o CONTINUOUS assumes an uncountable number of values within an interval
- Often measured in discrete values
- Ex: weight of a newborn baby
Data preparation
We often spend a considerable amount of time inspecting and preparing the data for the subsequent analysis.
o Counting and sorting
o Handling missing values
o Subsetting
Counting and sorting
Among the very first tasks analysts perform.
Gain a better understanding and insights into the data.
Help to verify that the data set is complete or determine if there are missing values.
Sorting allows us to review the range of values for each variable.
Sort based on a single or multiple variables.
Dealing with missing values:
Omission strategy
Observations with missing values are excluded from the subsequent analysis.
Imputation strategy
Missing values are replaced with some reasonable imputed values.
o Numeric variables: replace with the average
o Categorical variables: replace with the predominant category.
subsetting
Process of extracting a portion of the data set that is relevant.
NOTE!
In order to choose the appropriate techniques for summarizing and analyzing variables, we need to distinguish between the different measurement scales.
Scales of measurement
nominal
Least sophisticated
Represents categories or groups
Values differ by labels or names
Ex: marital status
ordinal
Stronger level of measurement
Categorize and rank data with respect to some
characteristic.
Cannot interpret the difference between the ranked values;
the numbers assigned are arbitrary.
Ex: reviews from 1 star (poor) to 5 stars (outstanding).
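A minimal Python sketch of why the numbers on an ordinal scale are arbitrary. The labels between "poor" and "outstanding", both coding tables, and the sample reviews are assumptions made up for illustration; only the order of the codes carries information, so any order-preserving recoding gives the same ranking.

```python
# Hypothetical coding of an ordinal scale: the numbers only encode order.
RANK = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "outstanding": 5}

# A second, equally valid coding with different gaps between the ranks.
ALT_RANK = {"poor": 10, "fair": 20, "good": 25, "very good": 70, "outstanding": 100}

# Hypothetical sample of reviews.
reviews = ["good", "poor", "outstanding", "good", "fair"]

# Ranking is meaningful and does not depend on which coding is used.
ranked = sorted(reviews, key=RANK.get)
ranked_alt = sorted(reviews, key=ALT_RANK.get)
best = max(reviews, key=RANK.get)

# Differences between codes are NOT meaningful: they change with the coding.
gap = RANK["good"] - RANK["fair"]
gap_alt = ALT_RANK["good"] - ALT_RANK["fair"]

print(ranked, best, gap, gap_alt)
```

Both codings sort the reviews identically, while the gap between "fair" and "good" depends entirely on the coding chosen.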
NOTE!
Nominal and ordinal scales are used for CATEGORICAL
VARIABLES.
Typically expressed in words but are coded into numbers
for purposes of data processing.
Typically count the number of observations that fall into
each category (or find the percentage).
Unable to perform meaningful arithmetic operations.
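Counting the observations in each category, or finding the percentage, can be sketched in Python as a small frequency distribution. The sample values and the helper name `frequency_distribution` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample: marital status recorded for eight respondents.
marital_status = ["single", "married", "married", "divorced",
                  "single", "married", "widowed", "married"]

def frequency_distribution(values):
    """Return {category: (count, percent)}, most frequent category first."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.most_common()}

freq = frequency_distribution(marital_status)
for category, (count, percent) in freq.items():
    print(f"{category:<10}{count:>3}{percent:>7.1f}%")
```

Counts and percentages are the meaningful summaries here; averaging the category codes would not be.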
interval
Categorize and rank; differences are meaningful.
Zero value is arbitrary and does not reflect absence of
characteristic.
Ratios are not meaningful
Ex: temperature in degrees Celsius or Fahrenheit
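A short Python sketch of why ratios are not meaningful on an interval scale, using temperature. The readings are hypothetical; the Celsius to Fahrenheit conversion is the standard formula.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Hypothetical readings chosen for illustration.
cool_c, warm_c = 10.0, 20.0
cool_f, warm_f = c_to_f(cool_c), c_to_f(warm_c)

# The ratio depends on the unit, so "twice as hot" has no meaning.
ratio_c = warm_c / cool_c
ratio_f = warm_f / cool_f

# Differences are meaningful: a 10-degree gap in Celsius is the same
# Fahrenheit gap wherever it sits on the scale.
diff_f_low = c_to_f(20.0) - c_to_f(10.0)
diff_f_high = c_to_f(40.0) - c_to_f(30.0)

print(ratio_c, ratio_f, diff_f_low, diff_f_high)
```

The two ratios disagree because zero degrees is an arbitrary point in both units, while the two differences agree.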
ratio
Strongest level of measurement
A true zero point, reflects absence of characteristic
Ratios are meaningful
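The data preparation steps described in these notes (counting and sorting, the omission and imputation strategies for missing values, and subsetting) can be sketched with plain Python. The records, field names, and the use of `None` to mark a missing value are all assumptions made for illustration, not part of the course material.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"name": "Ana",  "age": 34,   "status": "married"},
    {"name": "Ben",  "age": None, "status": "single"},
    {"name": "Cara", "age": 28,   "status": None},
    {"name": "Dan",  "age": 40,   "status": "married"},
]

# Counting: number of missing values per variable.
missing = {key: sum(r[key] is None for r in records) for key in records[0]}

# Omission strategy: keep only observations with no missing values.
complete = [r for r in records if None not in r.values()]

# Imputation strategy: average for the numeric variable,
# predominant category for the categorical variable.
avg_age = mean(r["age"] for r in records if r["age"] is not None)
top_status = Counter(
    r["status"] for r in records if r["status"] is not None
).most_common(1)[0][0]
imputed = [
    {**r,
     "age": avg_age if r["age"] is None else r["age"],
     "status": top_status if r["status"] is None else r["status"]}
    for r in records
]

# Sorting on multiple variables: by status, then by age within status.
by_status_age = sorted(imputed, key=lambda r: (r["status"], r["age"]))

# Subsetting: extract only the relevant portion of the data set.
married = [r for r in imputed if r["status"] == "married"]

print(missing)
print([r["name"] for r in by_status_age])
print([r["name"] for r in married])
```

Omission shrinks the data set, while imputation keeps every observation at the cost of inserting estimated values; which one is preferable depends on how much data is missing.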