INTRODUCTION

1. WHAT IS STATISTICS?

Statistics is the science of planning studies and experiments for collecting, displaying, analyzing
and interpreting data, and of making decisions based on the data. Statistics courses generally cover
all of these aspects.

2. TYPES OF STATISTICAL APPLICATION


Statistical methods can be divided into two main branches: descriptive and inferential.

Descriptive Statistical Methods


In descriptive statistics, the objective is to describe the properties of a set of scores or a group of
data that have been gathered. The properties that are often of interest to us are, for example, the
centre, the spread and the shape of the dataset.
Consider the following set of ages of persons in a residential area:
28, 38, 45, 47, 51, 56, 58, 60, 63, 63, 65, 66, 66, 67, 68, 70
At a glance we could say that:
- the ages range from 28 to 70 (spread)
- the middle age is somewhere around 60 (centre)
- the shape of the distribution of the ages is ? (draw a graph to see the shape)
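
The three properties above can be checked with a short script. This is only a sketch using Python's standard library. The helper name decade_counts and the choice to group the ages by decade are assumptions made for illustration and are not part of the notes.

```python
import statistics

# Ages of persons in the residential area, as listed above.
ages = [28, 38, 45, 47, 51, 56, 58, 60, 63, 63, 65, 66, 66, 67, 68, 70]

# Spread: smallest value, largest value, and the range between them.
smallest, largest = min(ages), max(ages)
age_range = largest - smallest

# Centre: the median (middle value) and the mean (arithmetic average).
median_age = statistics.median(ages)
mean_age = statistics.mean(ages)


def decade_counts(values):
    """Count how many values fall in each decade (20-29, 30-39, ...)."""
    counts = {}
    for value in values:
        decade = (value // 10) * 10
        counts[decade] = counts.get(decade, 0) + 1
    return dict(sorted(counts.items()))


print(f"spread: {smallest} to {largest} (range {age_range})")
print(f"centre: median {median_age}, mean {mean_age}")

# Shape: a crude text histogram, one star per person.
for decade, count in decade_counts(ages).items():
    print(f"{decade}-{decade + 9}: {'*' * count}")
```

The median (61.5) and the mean (56.9375) both support the rough reading of "around 60", and the text histogram is one simple way to draw the graph suggested in the last bullet.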

Inferential Statistical Methods


In inferential statistics, the interest is in collections of data so large that not all of the data are
accessible. However, samples can be taken from these large collections and used to make inferences
about them.
The relation between the samples and the larger collections of data (known as populations) from
which they have been drawn is the subject of inferential statistical methods.

3. TYPES OF MEASUREMENT
Measurement consists of rules for assigning numbers to attributes of objects. By definition, any
set of rules for assigning numbers to attributes of objects is measurement.

Variables differ in how well they can be measured, i.e., in how much information their
measurement scale can provide. A factor that determines the amount of information that can be
provided by a variable is the type of measurement scale used.

The level of measurement of the data helps us to decide the appropriate statistical procedure to
use. There are four levels of measurement for variables:
(i) nominal, (ii) ordinal, (iii) interval and (iv) ratio.

Nominal Scale
Nominal variables allow only for qualitative classification, i.e. they can be measured only in
terms of whether the individual items belong to some distinct categories. We can neither quantify
nor rank-order the categories.

The nominal level of measurement is characterized by data that consist of names, labels or
categories only. Typical examples of nominal variables are gender, race, color, city, etc. For
example, for the variable "race" we can only say that two individuals in different categories are
of a different race; we cannot say which category has more of the quality represented by the variable.

A nominal scale is a measurement system that does not possess the property of magnitude. It
lacks numerical significance and therefore should not be used for calculations.

Ordinal Scales
Data at the ordinal level of measurement can be arranged in some order, in terms of which has less
and which has more of the quality represented by the variable. However, we still cannot tell how
much more or how much less one category has than another.

An example of an ordinal variable is the socioeconomic status of families; we know that upper-
middle is higher than middle but we cannot say how much higher.
Rank ordering people in a classroom according to height and assigning the shortest person the
number "1", the next shortest person the number "2", etc. is another example of an ordinal scale.

Ordinal data allow us to make relative comparisons, but cannot provide the magnitude of the
difference. As with the nominal scale, computation of most numerical statistics is not appropriate
when the scale type is ordinal.
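
The height-ranking example can be sketched in a few lines of Python. The names and heights below are hypothetical values chosen for illustration only. The point is that the ranks preserve the order of the heights but not the sizes of the differences between them.

```python
# Hypothetical heights in cm (illustrative values, not from the notes).
heights_cm = {"Ali": 150, "Ben": 151, "Chen": 170}

# Rank from shortest (1) to tallest, as in the classroom example.
ordered_names = sorted(heights_cm, key=heights_cm.get)
ranks = {name: position for position, name in enumerate(ordered_names, start=1)}

# Gaps between neighbours on the ordinal (rank) scale.
rank_gap_1 = ranks["Ben"] - ranks["Ali"]
rank_gap_2 = ranks["Chen"] - ranks["Ben"]

# Gaps between the same neighbours in the underlying measurements.
height_gap_1 = heights_cm["Ben"] - heights_cm["Ali"]
height_gap_2 = heights_cm["Chen"] - heights_cm["Ben"]

print("ranks:", ranks)
print("rank gaps:", rank_gap_1, rank_gap_2)
print("height gaps (cm):", height_gap_1, height_gap_2)
```

Both rank gaps equal 1, although the underlying height gaps are 1 cm and 19 cm. This is why arithmetic on ranks says nothing about the magnitude of a difference.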

Interval Scales
With interval data, we can rank-order the items that are measured and also quantify and compare the
sizes of the differences between them, i.e. the data possess the properties of magnitude and intervals.
Thus, it is appropriate to compute numerical statistics of data that are measured at the interval
level.

However, like nominal and ordinal data, interval data do not have a rational (natural) zero
starting point.

For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval
scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees,
and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.
However, 0 degrees does not represent an absence of heat.

Ratio Scales
Ratio scales are measurement systems that possess all three properties: magnitude, intervals, and
rational zero. The added power of a rational zero allows ratios of numbers to be meaningfully
interpreted, thus allowing for statements such as "x is twice as large as y".

For example, A's height of 180 cm is 1.2 times B's height of 150 cm. A height of 0 cm represents
no height.
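
The difference between the interval and ratio scales can be illustrated by changing units. The sketch below uses the standard conversion formulas. The function names are chosen for illustration, and the Celsius reading of the temperatures is an assumption, since the notes do not say which unit the 20, 30 and 40 degrees are in.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def cm_to_inches(cm):
    """Standard conversion: 1 inch = 2.54 cm."""
    return cm / 2.54


# Interval scale: ratios of DIFFERENCES survive a change of unit.
diff_ratio_c = (40 - 20) / (40 - 30)
diff_ratio_f = (celsius_to_fahrenheit(40) - celsius_to_fahrenheit(20)) / (
    celsius_to_fahrenheit(40) - celsius_to_fahrenheit(30)
)

# Ratios of the VALUES themselves do not survive, because the zero
# point of each temperature scale is arbitrary.
value_ratio_c = 40 / 20
value_ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

# Ratio scale: height has a true zero, so the ratio is the same in any unit.
height_ratio_cm = 180 / 150
height_ratio_in = cm_to_inches(180) / cm_to_inches(150)

print("difference ratios:", diff_ratio_c, diff_ratio_f)
print("temperature value ratios:", value_ratio_c, value_ratio_f)
print("height ratios:", height_ratio_cm, height_ratio_in)
```

An increase from 20 to 40 degrees is twice an increase from 30 to 40 degrees in either unit. By contrast, 40 is twice 20 only in Celsius: in Fahrenheit the same temperatures are 104 and 68. The height ratio of 1.2 is the same in centimetres and in inches.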

4. TYPES OF VARIABLE
A variable is a characteristic of interest about individuals in a population that takes on different
values for different individuals. The gender and weight of a person are examples of variables
because the values of these quantities vary from one individual to another.

Examples:
- Favourite colour: blue, green, brown, etc.
- Weight of girls (in kg): 55, 50, 49, etc.

There are two broad types of variables: qualitative (categorical, attribute) and quantitative
(numerical).

Qualitative Variable
Values of a qualitative variable are data that arise from observations that are separated into
distinct categories. Such data are discrete in nature and there are a finite number of possible
categories into which each observation may fall.
Qualitative data are classified as nominal if there is no natural order between the categories (e.g.
eye colour) or ordinal if an ordering exists (e.g. grades of examination results, socio-economic
status).

Example: Socio-economic status: low, middle or high.

Quantitative Variable
Quantitative or numerical data arise when the observations are counts or measurements. The data
are said to be:

- discrete if the measurements are integers (e.g. number of people in a household, number
  of cigarettes smoked per day). When pictured on the number line, the set of all the
  possible values consists only of isolated points.

- continuous if the measurements can take on any value, usually within some range (e.g.
  weight). When pictured on the number line, the set of all values consists of intervals.

5. BASIC TERMS
Population and Sample
A population is the entire collection of individuals or measurements about which information is
desired.
A sample is a subset of the population selected for study. A random sample of size n from a
population is a subset of n elements from that population, chosen in such a way that every
possible subset of size n has the same chance of being selected as any other.
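
A random sample in this sense can be drawn with Python's standard library. This is a sketch only: treating the 16 ages from the earlier example as the whole population, the sample size of 5, and the seed value are all assumptions made for illustration.

```python
import math
import random

# Assumption: treat the 16 ages from the earlier example as the population.
population = [28, 38, 45, 47, 51, 56, 58, 60, 63, 63, 65, 66, 66, 67, 68, 70]
n = 5  # sample size, chosen arbitrarily for this sketch

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(42)

# random.sample selects n distinct positions without replacement, so every
# subset of n individuals is equally likely to be chosen.
sample = rng.sample(population, k=n)

# Number of possible subsets of size n from 16 individuals.
number_of_subsets = math.comb(len(population), n)

print("sample:", sample)
print("possible subsets of size", n, ":", number_of_subsets)
```

There are 4368 possible subsets of 5 individuals out of 16, so each particular subset has a 1 in 4368 chance of being the sample.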

Parameters and Statistics


A parameter is a numerical quantity measuring some aspect or characteristic of a population of
scores. The characteristic can be the central tendency of the scores, the dispersion of scores, etc.
in the population. Population parameters are usually unknowns that are estimated by values
computed from samples.

A statistic is a number calculated to describe an important feature of sample data. Statistics are
used as estimates for their corresponding, unknown population parameters.

The following are several parameters of importance in statistical analyses. Greek symbols are
usually used to represent parameters, and the symbols for the associated statistic are given on the
right.

Quantity             Parameter   Statistic
Mean                 µ           x̄
Standard deviation   σ           s
Proportion           π           p
Correlation          ρ           r
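
The distinction between a parameter and a statistic can be made concrete with a small sketch. It assumes, purely for illustration, that the 16 ages from the earlier example form the entire population, so that the parameters can be computed exactly. The sample size, the seed and the "aged 60 or over" proportion are likewise hypothetical choices.

```python
import random
import statistics

# Assumption: these 16 ages are the whole population of interest.
population = [28, 38, 45, 47, 51, 56, 58, 60, 63, 63, 65, 66, 66, 67, 68, 70]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divisor N)
pi = sum(age >= 60 for age in population) / len(population)  # proportion aged 60+

# Statistics: computed from a random sample and used as estimates.
rng = random.Random(0)  # arbitrary seed so the draw is repeatable
sample = rng.sample(population, k=6)
x_bar = statistics.mean(sample)  # sample mean, estimates mu
s = statistics.stdev(sample)     # sample standard deviation (divisor n - 1)
p = sum(age >= 60 for age in sample) / len(sample)  # sample proportion

print(f"parameters: mu={mu}, sigma={sigma:.2f}, pi={pi}")
print(f"statistics: x_bar={x_bar:.2f}, s={s:.2f}, p={p:.2f}")
```

A different sample would give different values of the statistics, while the parameters stay fixed. In practice the parameters are unknown, and only the sample values are available.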

Univariate, Bivariate, and Multivariate Data


Depending on how many variables we are measuring on the individuals or objects in our sample,
we will have one of the three following types of data sets:
- Measurements made on only one variable per observation (univariate).
- Measurements made on two variables per observation (bivariate).
- Measurements made on many variables per observation (multivariate).

In this course, we will concentrate on univariate and bivariate data sets only.
