STAE Lecture Notes - LU1
LEARNING OBJECTIVES
• Understand the concepts of a population, sample, parameter, statistic, random variable and data
• Distinguish between descriptive and inferential statistics
• Distinguish between categorical and numerical variables
• Identify the four different scales of measure
• Know the difference between raw data and frequency data
Population parameter
A population parameter is a constant value (usually unknown) that describes some measurable aspect of a
population. Population parameters are generally denoted using Greek letters.
Sample
A sample is a subset of the population of interest. Samples are generally used to collect information since it is
not always possible or feasible to consider the entire population. The total number of elements in a sample is
denoted by n.
Sampling unit
A sampling unit is the object being measured, counted or observed.
Sample statistic
A number calculated from sampled data, which describes a measurable aspect of a sample, is called a statistic.
Sample statistics are generally denoted using Roman letters.
Notation
                     Sample statistic        Population parameter
Mean                 $\bar{x}$ (x-bar)       $\mu$ (mu)
Variance             $s^2$ (s-squared)       $\sigma^2$ (sigma-squared)
Standard deviation   $s$                     $\sigma$ (sigma)
Proportion           $p = x/n$               $\pi$ (pi)
Size                 $n$                     $N$
Descriptive statistics
Descriptive statistics comprise those methods used to organise and describe information that has been
collected in a sample.
Inferential statistics
Inferential statistics comprise those methods and techniques used for making generalisations, predictions or
estimates about the population using sampled data.
Random variable
A characteristic of the elements of a population (or sample) for which the observed values differ from element
to element is called a variable. In probability theory, where a variable assumes certain values with certain
associated probabilities, the variable is called a random variable. Variables are denoted by capital letters, e.g.,
X, Y, Z, and the actual values assumed by the random variables are denoted by lower case letters, e.g., x, y, z.
For example, let X = the height of boys in metres. Here X is a random variable, which measures the variable
“height”. If three boys are selected at random, i.e., n = 3, and their respective heights are 1.40 m, 1.37 m and
1.41 m, then the realisations of the random variable X are denoted as xi, for i = 1, 2, 3:
x1 = 1.40 x2 = 1.37 x3 = 1.41
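The three realisations above can be used to illustrate the sample statistics listed under Notation. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only. It assumes the usual sample variance with an n - 1 divisor, which is what `statistics.variance` computes, although these notes have not yet stated the formula.

```python
import statistics

# Realisations of X = height of boys in metres, taken from the example above.
heights = [1.40, 1.37, 1.41]

n = len(heights)                          # sample size, n
x_bar = statistics.mean(heights)          # sample mean, x-bar
s_squared = statistics.variance(heights)  # sample variance, s^2 (n - 1 divisor)
s = statistics.stdev(heights)             # sample standard deviation, s

print(f"n = {n}")
print(f"x-bar = {x_bar:.4f} m")
print(f"s^2 = {s_squared:.6f} m^2")
print(f"s = {s:.4f} m")
```

These values describe only the three sampled boys. The corresponding population parameters (mu, sigma-squared, sigma) remain unknown.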
Data
The collection of all variables measured forms the data.
Unit of measurement
The unit of measurement of a variable is the standard unit used to express a quantity. For example, if
measurements are made in whole seconds, the unit of measurement is one second, i.e., unit = 1. If a stopwatch
records the length of time to solve a problem in tenths of a second, the unit of measurement is 0.1 second, i.e.,
unit = 0.1.
Sigma notation
In Mathematics, sigma notation is the standard notation used to represent summation. It is a convenient and
simple way to write long sums in a compact form. It is denoted by the Greek capital letter sigma, $\Sigma$. If a
random variable X consists of n observations $x_1, x_2, \ldots, x_n$, the sum of all n values is represented in sigma
notation as $\sum_{i=1}^{n} x_i$, or simply as $\sum x$.
For example, if X = the number of children in a household where $x_1 = 2$, $x_2 = 3$ and $x_3 = 0$, then the total is
$\sum x = \sum_{i=1}^{3} x_i = x_1 + x_2 + x_3 = 2 + 3 + 0 = 5$.
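The household example can be written out as a short Python sketch to show what the summation index does. The variable names are illustrative. Note that Python lists are indexed from 0, so the observation written x_i in the notes is stored at position i - 1.

```python
# Observed values of X = number of children in a household (from the example above).
x = [2, 3, 0]
n = len(x)

# Sigma notation spelled out: add x_i for i = 1, ..., n.
# Python lists are 0-indexed, so x_i is stored at x[i - 1].
total = 0
for i in range(1, n + 1):
    total += x[i - 1]

# The built-in sum() performs the same summation in one call.
assert total == sum(x)
print(total)
```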
Exercise 1.1
Consider the results of the 2 semester tests for STAE:
1) All the STAE students form the ______
2) A selection of 50 STAE students is a ______
3) Each test is a ______
4) The sampling unit is ______
5) The results from all 2 tests form the ______
6) The average mark for Test 1 is a ______
7) To test whether the current group of STAE students perform better than groups from previous years is the
process of ______
1.3. Variable Type
It is important to identify a variable in terms of its type, namely categorical or numerical. This distinction
determines the appropriate analyses that can be performed on a variable.
Categorical variables
Categorical variables are also known as qualitative variables. Such variables allow for classification based on
some characteristic. For example, gender can be classified as male or female. The values of categorical variables
are often recorded as numerical values, e.g., coding the gender variable as 1 = Male and 2 = Female, but
these values have no numerical meaning as they simply denote the categories of the variable.
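A small Python sketch may make the point about coded categories concrete. The coding 1 = Male, 2 = Female comes from the text. The five responses are invented for illustration, as are the variable names.

```python
from collections import Counter

# Hypothetical coded responses (invented data); coding as in the text.
gender_codes = [1, 2, 2, 1, 2]
labels = {1: "Male", 2: "Female"}

# Counting how many observations fall in each category is a meaningful summary.
counts = Counter(labels[code] for code in gender_codes)
print(counts["Male"], counts["Female"])

# Arithmetic on the codes runs without error, but the result has no
# interpretation: the "average gender" depends only on the arbitrary coding.
mean_code = sum(gender_codes) / len(gender_codes)
print(mean_code)
```

Recoding the categories (for example 0 = Male, 1 = Female) would change `mean_code` but leave the counts unchanged, which is why only the counts are a meaningful summary here.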
Numerical variables
Numerical variables are also known as quantitative variables. Such variables are naturally measured as
numbers. For example, a person’s height in centimetres. Arithmetic operations can be performed on the
variables as the values have numerical meaning. Numerical variables are further classified as either discrete
or continuous. Discrete variables assume values obtained by counting and take only separate, countable values,
for example the number of children in a household. Continuous variables assume values obtained by
measuring and can take any value within an interval of the real line, such as the variable age.
Nominal
A categorical variable is measured on a nominal scale if the variable consists of two or more categories with
no intrinsic order. For example, a person’s eye colour could be classified as brown, blue, green or grey. There
is no logical way in which these four categories can be ordered.
Ordinal
A categorical variable is measured on an ordinal scale if the variable consists of two or more categories that
can be ordered or ranked. For example, a person’s age classified as young, middle-aged or old. The three
possible values of this variable are ordered in a logical way. Another example is an anxiety rating on a scale
from 1 to 5 where 1 = no anxiety and 5 = high anxiety. In this case numbers are used to reflect the measurement
in an order from low to high, i.e., a score of 4 indicates higher anxiety than a score of 2, but that does not mean
a person who rated 4 is twice as anxious as a person who rated 2.
Interval
A numerical variable (discrete or continuous) is measured on an interval scale if the values of the variable can
be arranged in order and differences between data values are meaningful, but there is no true or absolute zero
(the value of zero is an arbitrary reference point), so ratios between values are not meaningful. For example,
temperature in degrees Celsius. The values are numerical and ordered. A temperature of 0 °C does not mean
an absence of temperature, i.e., the scale has an arbitrary zero value. The difference between
10 °C and 20 °C is the same as the difference between 30 °C and 40 °C, namely a 10-degree difference.
However, 20 °C is not twice as hot as 10 °C, i.e., ratios are not meaningful.
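The claim about ratios can be checked numerically. The sketch below re-expresses the same two temperatures in kelvin, a scale that is not used in these notes but that does have an absolute zero. The conversion K = °C + 273.15 is the standard one, and the function and variable names are illustrative only.

```python
def celsius_to_kelvin(c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return c + 273.15

low_c, high_c = 10.0, 20.0
low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)

# Differences are the same on both scales, so they are meaningful.
diff_c = high_c - low_c
diff_k = high_k - low_k

# Ratios change when the arbitrary zero point is moved, so the
# Celsius ratio of 2 does not mean "twice as hot".
ratio_c = high_c / low_c
ratio_k = high_k / low_k

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The difference stays at 10 degrees on both scales. The ratio falls from 2 to about 1.035 once the zero point is no longer arbitrary.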
Ratio
A numerical variable (discrete or continuous) is measured on a ratio scale if the values of the variable can be
arranged in order, there is a true or absolute zero, differences between data values are meaningful, and ratios
between values are meaningful. For example, the amount of money in a bank account in Rand. The values are
numerical and ordered. An amount of R0 implies an absence of money, i.e., the scale has an absolute zero value.
The difference between R10 and R20 is the same as the difference between R30 and R40, namely a R10
difference. R20 is twice as much money as R10, i.e., ratios are meaningful.
Frequency data
Frequency data are raw data in aggregated format, where individual data values, or ranges of values, are listed with
a count of the number of times each value/range appeared in the dataset. This count is referred to as the
frequency of occurrence, or simply the frequency. It shows how the data are distributed across the scale.
Frequency data provide an overview of the sampled information.
Univariate frequency data represent counts of a single variable, and bivariate frequency data represent counts
of the combination of two variables. Steps to enter frequency data into the calculator are discussed in Section
2.1.3.
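As an illustration separate from the calculator steps, the aggregation from raw data to frequency data can be sketched in Python with `collections.Counter`. The six observations and the variable names below are invented for the example.

```python
from collections import Counter

# Hypothetical raw data for six respondents (invented for illustration).
household_size = [2, 4, 2, 3, 4, 2]
gender = ["F", "M", "F", "F", "M", "M"]

# Univariate frequency data: count of each value of a single variable.
univariate = Counter(household_size)

# Bivariate frequency data: count of each combination of two variables.
bivariate = Counter(zip(gender, household_size))

for value in sorted(univariate):
    print(value, univariate[value])

for g, size in sorted(bivariate):
    print(g, size, bivariate[(g, size)])
```

In both cases the frequencies add up to the number of raw observations, here 6.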
Exercise 1.2
Data were collected for a random sample of 20 coffee consumers. The survey yielded the following nine
variables, and the data are given in Table 1. For each variable, identify the type and the scale of measure.
Variable                 Type          Scale of measure
Consumer ID number
Gender
Age
Highest qualification
Household size