Stats Methods

CHAPTER ONE
Statistics is the branch of Mathematics that focusses on the analysis of data. Data are
the raw observations made by a researcher, and statistics provides the mathematical
techniques that transform these raw data into useful information.
DATA – STATISTICS – INFORMATION
DESCRIPTIVE STATISTICS
Descriptive statistics contains techniques used to describe a data set. It
focusses on the collection, summarizing and characterization of data.
Examples of descriptive techniques are Tables, Charts (Graphs),
Averages (Means), Variances, etc.
INFERENTIAL STATISTICS
Inference means that we are drawing conclusions about a population
parameter by using a sample statistic. The mathematics of probability
plays an important role here.
POPULATION: A population is the complete set of observations of a
phenomenon that we study.
SAMPLE: A sample is a smaller subset taken from the population.
Note that the population size = N and the sample size = n (with n less
than N).
POPULATION                                      SAMPLE
Company with 100 000 clients (N = 100 000)      Choose 1000 from total (n = 1000)
University with 20 000 students (N = 20 000)    Choose 500 from total (n = 500)
PARAMETER AND A STATISTIC: A parameter is any measure
calculated from a population, while a statistic is a measure calculated
from a sample.
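The distinction between a parameter and a statistic can be illustrated with a small Python sketch. The population below is simulated and purely hypothetical (the spend values, the seed and the variable names are assumptions, not data from these notes); only the sizes N = 100 000 and n = 1000 come from the company example above.

```python
import random
import statistics

# Hypothetical population: monthly spend of N = 100 000 clients (simulated values).
random.seed(1)  # fixed seed so the sketch gives the same numbers on every run
population = [random.gauss(500, 120) for _ in range(100_000)]

# Parameter: a measure calculated from the whole population.
population_mean = statistics.fmean(population)

# Statistic: the same measure calculated from a sample of n = 1000.
sample = random.sample(population, 1000)
sample_mean = statistics.fmean(sample)

print(f"N = {len(population)}, parameter (population mean) = {population_mean:.2f}")
print(f"n = {len(sample)}, statistic (sample mean) = {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what inferential statistics reasons about.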
GROWTH OF STATISTICS
Statistical techniques developed from disciplines such as Mathematics,
Computer Science and Physics.
The first Statistics department was established by Karl Pearson in 1911
in London.
Other statisticians who played an important role in the development of
statistics as a discipline include R.A. Fisher, E.S. Pearson, J. Neyman,
W. Gosset, J. Tukey, and many more.
Statistical techniques can be applied in a wide variety of areas, e.g.
Management, Marketing, Psychology, Economics, Agriculture, Sport,
Medicine, Engineering, Science and many more areas where data exist.
Software for statistical analysis includes Python, SAS, SPSS, R,
Statistica, etc. The availability of these packages makes it possible for
many researchers to apply statistics in their research.
Organizations in South Africa applying Statistics include Government
institutes (ARC, CSIR, Stats SA), Insurance companies (SANLAM, Old
Mutual), Banks (FNB, ABSA, Nedbank), Universities, etc.
Statistician is a career as unlimited as our minds, and there exist many
possibilities for theoretical developments and practical applications.
2013 was the International Year of Statistics. Statistics was celebrated by
over 2000 organizations in over 120 countries. For more information see
the website http://www.worldofstatistics.org/
Harvard Business Review: “Data scientist: The sexiest job of the 21st
century”
WHY DO WE TAKE SAMPLES?
It is important that the samples we choose are representative of the
population we are studying. We take samples:
• To save money
• Because the population may be too large to work with
• To save time
SAMPLING METHODS
1. SIMPLE RANDOM SAMPLING
2. SYSTEMATIC SAMPLING
3. STRATIFIED SAMPLING
4. CLUSTER SAMPLING
SIMPLE RANDOM SAMPLING
• Sampling with replacement: every selection has probability 1/N
• Sampling without replacement: the first selection has probability 1/N,
the second selection has probability 1/(N-1)
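A minimal Python sketch of the two ways of drawing a simple random sample, using the standard `random` module. The numbering of the items from 1 to 181, the sample size of 13 and the seed are illustrative assumptions borrowed from the later examples in these notes.

```python
import random

random.seed(42)  # fixed seed, only so the sketch is reproducible
population = list(range(1, 182))  # items numbered 1..N with N = 181 (assumed)
N = len(population)
n = 13

# With replacement: an item goes back after each draw, so repeats are possible
# and every draw has probability 1/N.
with_replacement = random.choices(population, k=n)

# Without replacement: an item can be chosen only once, so the second draw
# is made from the N - 1 items that remain.
without_replacement = random.sample(population, k=n)

print("With replacement:   ", sorted(with_replacement))
print("Without replacement:", sorted(without_replacement))
print("P(any item on first draw) =", 1 / N)
print("P(any remaining item on second draw, no replacement) =", 1 / (N - 1))
```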
SYSTEMATIC SAMPLING
Divide the population (N) into n groups of K items each, where
K = N/n
Example: choose a systematic sample of n = 13 from a population of N = 181.
K = 181/13
= 13.9
≈ 14
Choose a random number between 1 and 14 (say 9), then systematically
choose every 14th number after 9 as an observation in our sample:
(9; 23; 37; 51; 65; 79; 93; 107; 121; 135; 149; 163; 177)
n = 13
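The systematic selection can be sketched in a few lines of Python. The function name `systematic_sample` is a hypothetical helper, and the starting point 9 is passed in by hand to match the example (in practice it would be drawn at random between 1 and K).

```python
def systematic_sample(N, n, start):
    """Return n positions: the start, then every K-th position after it."""
    k = round(N / n)  # 181 / 13 = 13.9, rounded to 14 as in the example
    return [start + i * k for i in range(n)]


sample = systematic_sample(N=181, n=13, start=9)
print(sample)
```

Starting at 9 with K = 14, the last selected position is 9 + 12 × 14 = 177, which still falls inside the population of 181.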
STRATIFIED SAMPLING
Divide the population (N) according to some strata
Items / people within a stratum are the same (homogeneous)
Choose a sample (n=13) from a population (N=181)
The students in the sample must be representative of their year of study
The strata in this example are the years of study. Choose a sample by
choosing students from each of the strata.
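One common way to choose from each stratum is proportional allocation, sketched below in Python. The notes do not give the number of students per year, so the strata sizes here are hypothetical (chosen only so that they add up to N = 181), as are the student labels and the seed.

```python
import random

random.seed(7)  # fixed seed, only so the sketch is reproducible

# Hypothetical strata sizes (NOT from the notes); they sum to N = 181.
strata = {"1st year": 80, "2nd year": 60, "3rd year": 41}
N = sum(strata.values())
n = 13

sample = {}
for year, size in strata.items():
    students = [f"{year} student {i}" for i in range(1, size + 1)]
    # Proportional allocation: each stratum's share of the sample matches its
    # share of the population. With other sizes the rounded shares may not
    # add up to n exactly and would need a manual adjustment.
    share = round(n * size / N)
    sample[year] = random.sample(students, share)

for year, chosen in sample.items():
    print(year, len(chosen))
```

With these assumed sizes the sample contains 6 first years, 4 second years and 3 third years, so every year of study is represented.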
CLUSTER SAMPLING
Divide the population into clusters
Items / people within a cluster are different (heterogeneous)
Choose a sample of students from the population N = 181. The students
in the sample must be representative of the year of study of the students.
The clusters in this example are represented by provinces. Choose a
sample by selecting a few clusters, for example Western Cape and
Gauteng. These two clusters now form the sample. Both contain
1st, 2nd and 3rd years, so the sample is representative of the year of
study of students.
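A rough Python sketch of cluster sampling follows. The list of provinces, the 30 students per province and the student identifiers are all hypothetical and do not reproduce the N = 181 population of the notes; the point is only that whole clusters are selected at random and every member of a chosen cluster enters the sample.

```python
import random

random.seed(3)  # fixed seed, only so the sketch is reproducible

# Hypothetical clusters: province -> list of (student id, year of study).
provinces = ["Western Cape", "Gauteng", "Free State", "Limpopo", "Eastern Cape"]
years = ["1st", "2nd", "3rd"]
clusters = {
    province: [(f"{province}-{i}", years[i % 3]) for i in range(1, 31)]
    for province in provinces
}

# Randomly select whole clusters, then keep every student in them.
chosen = random.sample(provinces, 2)
sample = [student for province in chosen for student in clusters[province]]

print("Chosen clusters:", chosen)
print("Sample size:", len(sample))
print("Years of study in sample:", sorted({year for _, year in sample}))
```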
CHAPTER 2
VARIABLES
Variables refer to measurements taken on any phenomenon. These
measurements differ / vary from individual to individual and therefore we
call each of them a variable. We will consider two types of variables, namely
categorical and numerical (discrete or continuous) variables.
CATEGORICAL (QUALITATIVE) VARIABLES
Has categories (NOMINAL / ORDINAL SCALE)
• Eye colour: Brown, Green, Blue
• Type of car: Mazda, Ford, Toyota
• Gender: Male, Female
NUMERICAL (QUANTITATIVE) DISCRETE VARIABLES
Contains only integers (whole numbers) (RATIO / INTERVAL SCALE)
• Number of students: 1, 10, 100, 181
• Number of pens: 1, 2 or 4
NUMERICAL (QUANTITATIVE) CONTINUOUS VARIABLES
Can contain integers (whole numbers) and numbers with decimals
(RATIO / INTERVAL SCALE)
• Weight: 60 kg, 70.5 kg
• Length: 1.5 m, 1.99 m, 2 m
TYPES OF MEASUREMENT SCALES FOR DATA
NOMINAL
For categorical variables
There is no order in the categories
• Yes / No
• Male / Female
• Mazda, Ford, BMW, Fiat
ORDINAL
For categorical variables
There is an order in the categories
• Small / Medium / Large
• Good / Better / Best
INTERVAL SCALE
For numerical variables
There is no true origin (zero) of measurement
• Evaluate your lecturer’s lecturing skills by giving him a mark
between 1 and 5. In this case students may give him different
ratings.
• There is no origin for their ratings.
• Measuring the temperature (can be negative, zero or positive).
• No origin to start from.
RATIO SCALE
For numerical variables
There is a true origin (zero) of measurement
• Measure your lecturer’s height in metres. In this case all students
will give the same measure for his height.
• There is an origin for their measurements.
TABLES AND GRAPHS FOR CATEGORICAL VARIABLES
Summary table
Bar chart
Pie chart
Pareto diagram
SUMMARY TABLE
Suppose we measured the variable: year of study of students.
The categories of this variable are: 1st year, 2nd year, 3rd year and Post
Graduate (P.G.).
The following is a summary table of n=100 students according to year of
study.
The frequency is the number of students in that category.
BAR GRAPH
PIE CHART
The pie chart is a circle taking up 360 degrees.
Each category represents a specific part of the pie chart.
For 1st years the angle size is
40/100 × 360 = 144 degrees
For 2nd years the angle size is
35/100 × 360 = 126 degrees
For 3rd years the angle size is
20/100 × 360 = 72 degrees
For post graduates the angle size is
5/100 × 360 = 18 degrees
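The angle calculations above follow one rule, angle = frequency / n × 360, which can be checked with a short Python sketch. The frequencies 40, 35, 20 and 5 are read off the calculations above; the dictionary and variable names are illustrative.

```python
# Frequencies per year of study, as used in the angle calculations (n = 100).
frequencies = {"1st year": 40, "2nd year": 35, "3rd year": 20, "Post graduate": 5}
n = sum(frequencies.values())

# Each slice gets the same share of 360 degrees as its share of the n students.
angles = {category: freq * 360 / n for category, freq in frequencies.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
print("Total:", sum(angles.values()), "degrees")
```

The four angles add up to 360 degrees, which is a quick check that no category was left out.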
PARETO DIAGRAM
Cumulative percentage distribution for the year of study data
A Pareto diagram shows the percentage (as bars) and the cumulative
percentage (as a line) on one graph.
The categories of year of study are ordered from the largest to the
smallest frequency, not from 1st year to post graduate.
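The numbers behind a Pareto diagram can be computed as in the Python sketch below. It assumes the year of study frequencies 40, 35, 20 and 5 (n = 100) implied by the pie chart calculations; plotting is left out, and the variable names are illustrative.

```python
from itertools import accumulate

# Assumed frequencies per year of study (taken from the pie chart example).
frequencies = {"1st year": 40, "2nd year": 35, "3rd year": 20, "Post graduate": 5}
n = sum(frequencies.values())

# Order the categories from the largest to the smallest frequency.
ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

percentages = [freq * 100 / n for _, freq in ordered]
cumulative = list(accumulate(percentages))  # running total of the percentages

for (category, _), pct, cum in zip(ordered, percentages, cumulative):
    print(f"{category:15} {pct:6.1f}% {cum:6.1f}%")
```

In this data set the largest-to-smallest order happens to coincide with 1st year to post graduate; with other frequencies the two orders would differ.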
TABLES AND GRAPHS FOR TWO CATEGORICAL VARIABLES
CONTINGENCY TABLE
Consider the variables gender and year of study
SIDE BY SIDE BAR GRAPH
Graphically displays the contingency table
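A contingency table simply counts how many observations fall into each combination of the two categories. The Python sketch below shows the idea; the eight records are hypothetical and are not the data behind the table in the notes.

```python
from collections import Counter

# Hypothetical observations (gender, year of study); NOT the data from the notes.
records = [
    ("Male", "1st year"), ("Female", "1st year"), ("Female", "2nd year"),
    ("Male", "2nd year"), ("Female", "1st year"), ("Male", "3rd year"),
    ("Female", "3rd year"), ("Female", "2nd year"),
]
counts = Counter(records)  # frequency of each (gender, year) combination

genders = ["Male", "Female"]
years = ["1st year", "2nd year", "3rd year"]

# Print the table: one row per gender, one column per year, plus row totals.
print(f"{'':8}" + "".join(f"{year:>10}" for year in years) + f"{'Total':>10}")
for gender in genders:
    row = [counts[(gender, year)] for year in years]
    print(f"{gender:8}" + "".join(f"{c:>10}" for c in row) + f"{sum(row):>10}")
```

A side by side bar graph would draw, for each year of study, one bar per gender with heights equal to these counts.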
ORGANIZING NUMERICAL DATA
INCLUDES BOTH DISCRETE AND CONTINUOUS DATA
ORDERED ARRAY
Arranges the data from small to large
6; 17; 19; 22; 28; 30; 50
STEM AND LEAF DISPLAY
Uses the ordered array above
From this we can see the shape of the data and identify outliers
If there are decimals, round the data to one decimal place; in that case
3 | 5 refers to 3.5
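A stem and leaf display for the ordered array 6; 17; 19; 22; 28; 30; 50 can be produced with the Python sketch below. It assumes whole numbers, with the tens digit as the stem and the units digit as the leaf; empty stems are printed so that gaps in the data stay visible.

```python
from collections import defaultdict

data = [6, 17, 19, 22, 28, 30, 50]  # ordered array from the notes

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit is the stem, units digit the leaf
    stems[stem].append(leaf)

# Print every stem between the smallest and the largest, including empty ones.
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")
```

The empty row for stem 4 shows that 50 lies somewhat apart from the rest of the data.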
TABLES AND GRAPHS FOR NUMERICAL DATA
FREQUENCY DISTRIBUTION TABLE
Order the data
17 20 21 22 25 30 30 31 41 47 50 57 80 81
Choose the number of classes represented by intervals
Large data sets: 15
Small data sets: 5
Interval width
= Range (largest minus smallest data value) / number of classes
= (81-17) / 5
= 64 / 5
= 12.8
≈ 13
Choose the lowest data value (17) as the lower boundary of the first
interval. Display the frequencies that fall within each pair of boundaries.
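The steps for building the frequency distribution can be sketched in Python as follows. Two choices here are assumptions rather than something the notes state: the width is rounded up with `math.ceil`, and each class includes its lower boundary but excludes its upper boundary.

```python
import math

data = [17, 20, 21, 22, 25, 30, 30, 31, 41, 47, 50, 57, 80, 81]
num_classes = 5  # small data set

# Interval width: range / number of classes, rounded up (64 / 5 = 12.8 -> 13).
width = math.ceil((max(data) - min(data)) / num_classes)

lower = min(data)  # lowest data value is the first boundary (17)
table = []
for i in range(num_classes):
    lo = lower + i * width
    hi = lo + width
    freq = sum(lo <= x < hi for x in data)  # count values in [lo, hi)
    table.append((lo, hi, freq))

for lo, hi, freq in table:
    print(f"{lo} but less than {hi}: {freq}")
```

The frequencies add up to 14, the number of observations, which confirms that every value was placed in exactly one class.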
RELATIVE FREQUENCY DISTRIBUTION AND PERCENTAGE DISTRIBUTION
RELATIVE FREQUENCY COLUMN
= frequency / number of observations
PERCENTAGE DISTRIBUTION COLUMN
= relative frequency × 100
CUMULATIVE FREQUENCY DISTRIBUTION
Cumulative means that, for each class boundary in turn, we add up the
frequencies of all observations that are less than that boundary.
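The three columns can be computed together, as in the Python sketch below. It reuses the 14 ordered values from the frequency distribution example and assumes class boundaries 17, 30, 43, 56, 69 and 82 (width 13, lower boundary included in each class); these boundaries are an assumption based on that example.

```python
from itertools import accumulate

data = [17, 20, 21, 22, 25, 30, 30, 31, 41, 47, 50, 57, 80, 81]
boundaries = [17, 30, 43, 56, 69, 82]  # assumed class boundaries (width 13)
n = len(data)

# Frequency of each class [lo, hi).
frequencies = [
    sum(lo <= x < hi for x in data)
    for lo, hi in zip(boundaries, boundaries[1:])
]

relative = [f / n for f in frequencies]    # frequency / number of observations
percentage = [100 * r for r in relative]   # relative frequency x 100

# Number of observations less than each boundary: 0 below 17, all 14 below 82.
cumulative = [0, *accumulate(frequencies)]
cumulative_pct = [100 * c / n for c in cumulative]

for boundary, c, pct in zip(boundaries, cumulative, cumulative_pct):
    print(f"less than {boundary}: {c:2d} observations ({pct:5.1f}%)")
```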
HISTOGRAM
Graph for numerical data summarized in a frequency, relative frequency
or percentage distribution
POLYGON
Line graph for numerical data summarized in a frequency, relative
frequency or percentage distribution
Calculate the midpoint for each interval and plot the frequency (for
example) against the midpoints
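The points of a frequency polygon can be prepared as in the short Python sketch below. The boundaries and frequencies are assumed from the frequency distribution example (classes of width 13 starting at 17); the plotting itself is left to whichever graphing tool is used.

```python
boundaries = [17, 30, 43, 56, 69, 82]  # assumed class boundaries
frequencies = [5, 4, 2, 1, 2]          # assumed class frequencies

# Midpoint of a class = (lower boundary + upper boundary) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Each (midpoint, frequency) pair is one point on the polygon.
points = list(zip(midpoints, frequencies))
print(points)
```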
THE CUMULATIVE PERCENTAGE POLYGON (OGIVE)
Line graph for numerical data summarized in a cumulative percentage
distribution
From this graph we can approximate the quartiles and percentiles of
the variable
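Reading a quartile off the ogive amounts to linear interpolation between two plotted points. The Python sketch below illustrates this under the same assumed boundaries and frequencies as the frequency distribution example; `read_ogive` is a hypothetical helper name.

```python
from itertools import accumulate

boundaries = [17, 30, 43, 56, 69, 82]  # assumed class boundaries
frequencies = [5, 4, 2, 1, 2]          # assumed class frequencies
n = sum(frequencies)

# Cumulative percentage of observations less than each boundary.
cum_pct = [100 * c / n for c in [0, *accumulate(frequencies)]]


def read_ogive(p):
    """Approximate the value below which p percent of the data fall."""
    for i in range(len(boundaries) - 1):
        x0, x1 = boundaries[i], boundaries[i + 1]
        y0, y1 = cum_pct[i], cum_pct[i + 1]
        if y0 <= p <= y1 and y1 > y0:
            # Straight-line interpolation along this segment of the ogive.
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("p must be between 0 and 100")


q1, median, q3 = read_ogive(25), read_ogive(50), read_ogive(75)
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}")
```

These are approximations based on the grouped data, so they can differ from the quartiles calculated directly from the raw observations.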
GRAPHICAL PRESENTATION OF BIVARIATE NUMERICAL DATA
A SCATTER PLOT IS USED TO DISPLAY TWO NUMERICAL VARIABLES
GRAPHICALLY
CHAPTER THREE
NUMERICAL DESCRIPTIVE MEASURES
MEASURES OF CENTRAL TENDENCY
MEAN
MEDIAN (position)
MODE (most)
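The three measures can be computed with Python's standard `statistics` module. As an illustration, the sketch below applies them to the 14 ordered values used in the frequency distribution example; applying them to that data set is an assumption made here, not a calculation from the notes.

```python
import statistics

data = [17, 20, 21, 22, 25, 30, 30, 31, 41, 47, 50, 57, 80, 81]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # middle position of the ordered data
mode = statistics.mode(data)      # value that occurs most often

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

With an even number of observations (n = 14) the median is the average of the 7th and 8th ordered values, here 30 and 31.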
