
STATS METHODS

CHAPTER ONE

Statistics is the branch of Mathematics that focuses on the analysis of data. Data
are the raw observations made by a researcher, and statistics provides the
mathematical techniques that transform these raw data into useful information.

DATA – STATISTICS – INFORMATION

DESCRIPTIVE STATISTICS

Descriptive statistics comprises techniques used to describe a data set. It
focuses on the collection, summarization and characterization of data.
Examples of descriptive techniques are Tables, Charts (Graphs),
Averages (Means), Variances, etc.

INFERENTIAL STATISTICS

Inference means that we are drawing conclusions about a population
parameter by using a sample statistic. The mathematics of probability
plays an important role here.

POPULATION: A population is the complete set of observations (of size N)
of a phenomenon that we study.
SAMPLE: A sample is a smaller subset (of size n) taken from the population.

Note that the population size = N and the sample size = n (with n less
than N).
POPULATION                                     SAMPLE
Company with 100 000 clients (N = 100 000)     Choose 1000 from total (n = 1000)
University with 20 000 students (N = 20 000)   Choose 500 from total (n = 500)

PARAMETER AND A STATISTIC: A parameter is any measure
calculated from a population, while a statistic is a measure calculated
from a sample.

GROWTH OF STATISTICS

Statistical techniques developed from disciplines such as Mathematics,
Computer Science and Physics.

The first Statistics department was established by Karl Pearson in 1911
in London.

Other statisticians who played an important role in the development of
statistics as a discipline include R.A. Fisher, E.S. Pearson, J. Neyman,
W. Gosset, J. Tukey, and many more.

Statistical techniques can be applied in a wide variety of areas, e.g.
Management, Marketing, Psychology, Economics, Agriculture, Sport,
Medicine, Engineering, Science and many more areas where data exist.

Software for statistical analysis includes Python, SAS, SPSS, R,
Statistica, etc. The availability of these packages makes it possible for
many researchers to apply statistics in their research.

Organizations in South Africa that apply Statistics include Government
institutes (ARC, CSIR, Stats SA), Insurance companies (SANLAM, Old
Mutual), Banks (FNB, ABSA, Nedbank), Universities, etc.

Statistician is a career as unlimited as our minds, and there exist many
possibilities for theoretical developments and practical applications.

2013 was the International Year of Statistics. Statistics was celebrated by
over 2000 organizations in over 120 countries. For more information see
the website http://www.worldofstatistics.org/

Harvard Business Review: “Data Scientist: The Sexiest Job of the 21st
Century”

WHY DO WE TAKE SAMPLES?

• To save money
• Population may be too large to work with
• To save time

It is important that the samples we choose are representative of the
population we are studying.

SAMPLING METHODS

1. SIMPLE RANDOM SAMPLING
2. SYSTEMATIC SAMPLING
3. STRATIFIED SAMPLING
4. CLUSTER SAMPLING

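SIMPLE RANDOM SAMPLING

• Sampling with replacement: a selected item is returned to the
population, so every selection has probability 1/N of picking any
given item.
• Sampling without replacement: a selected item is not returned, so the
first selection has probability 1/N and the second selection has
probability 1/(N-1).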
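
The two ways of drawing a simple random sample can be sketched in Python. This is only an illustration: the population of items numbered 1 to 181 and the sample size of 13 are assumptions borrowed from the examples further on, and the seed is fixed only to make the run repeatable.

```python
import random

random.seed(1)  # fixed seed so the sketch gives the same result every run

N = 181                             # assumed population size
n = 13                              # assumed sample size
population = list(range(1, N + 1))  # hypothetical items numbered 1..N

# With replacement: an item goes back after selection, so it may be drawn again.
with_replacement = random.choices(population, k=n)

# Without replacement: an item can be drawn at most once.
without_replacement = random.sample(population, k=n)

print(with_replacement)
print(without_replacement)
```

The sample drawn without replacement never contains the same item twice, while the sample drawn with replacement may.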
SYSTEMATIC SAMPLING

Divide the population (N) into groups of K items each, where
K = N/n
Example: choose systematically a sample of n = 13 from a population of N = 181.
K = 181/13
= 13.9
≈ 14
Choose a random number between 1 and 14 (say 9), then systematically take
every 14th number after 9 as an observation in the sample:
(9; 23; 37; 51; 65; 79; 93; 107; 121; 135; 149; 163; 177)
n = 13
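
The systematic selection above can be reproduced with a short Python sketch. The starting point is fixed at 9 to match the example; in practice it would be drawn at random between 1 and K. The variable names are illustrative.

```python
N = 181            # population size
n = 13             # sample size
K = round(N / n)   # 181/13 = 13.9, rounded to 14

start = 9          # fixed to match the example; normally random.randint(1, K)

# Take every K-th item, beginning at the starting point.
sample = list(range(start, N + 1, K))[:n]
print(sample)
```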

STRATIFIED SAMPLING
Divide the population (N) into strata according to some characteristic.
Items / people within a stratum are similar (homogeneous).
Example: choose a sample (n = 13) from a population (N = 181) of students.
The students in the sample must be representative of their year of study.
The strata in this example are the years of study. Choose the sample by
selecting students from each of the strata.
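
One possible way to choose students from each stratum is proportional allocation, where each stratum contributes to the sample in proportion to its size. The notes do not say how many students are in each year, so the stratum sizes and student labels below are hypothetical. Proportional allocation is itself an assumption, not necessarily the method intended here.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical stratum sizes (not given in the notes); they add up to N = 181.
strata_sizes = {"1st year": 80, "2nd year": 60, "3rd year": 41}
N = sum(strata_sizes.values())
n = 13

# Hypothetical student labels within each stratum.
strata = {
    year: [f"{year} student {i}" for i in range(1, size + 1)]
    for year, size in strata_sizes.items()
}

# Proportional allocation: stratum share of the sample = n * (stratum size / N).
allocation = {year: round(n * size / N) for year, size in strata_sizes.items()}

# Simple random sample (without replacement) inside every stratum.
sample = {year: random.sample(strata[year], allocation[year]) for year in strata}

for year, students in sample.items():
    print(year, len(students))
```

With these sizes the rounded allocation adds up to 13. With other sizes the rounding may give a total slightly different from n, which then has to be adjusted by hand.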
CLUSTER SAMPLING
Divide the population into clusters.
Items / people within a cluster are different (heterogeneous).
Example: choose a sample of students from the population (N = 181). The
students in the sample must be representative of the year of study of the
students.

The clusters in this example are provinces. Choose the sample by
selecting a few clusters, for example Western Cape and Gauteng. These
two clusters now form the sample. Both contain 1st, 2nd and 3rd years,
so the sample is representative of the year of study of the students.
CHAPTER 2
VARIABLES

Variables refer to measurements taken on any phenomenon. These
measurements differ (vary) from individual to individual, and therefore we
call them variables. We will consider two types of variables, namely
categorical and numerical (discrete or continuous) variables.

CATEGORICAL (QUALITATIVE) VARIABLES

Has categories (NOMINAL / ORDINAL SCALE)

• Eye colour: Brown, Green, Blue
• Type of car: Mazda, Ford, Toyota
• Gender: Male, Female

NUMERICAL (QUANTITATIVE) DISCRETE VARIABLES

Contains only integers (whole numbers) (RATIO SCALE / INTERVAL SCALE)

• Number of students: 1, 10, 100, 181
• Number of pens: 1, 2 or 4

NUMERICAL (QUANTITATIVE) CONTINUOUS VARIABLES

Can contain integers (whole numbers) and numbers with decimals
(RATIO SCALE / INTERVAL SCALE)

• Weight: 60 kg, 70.5 kg
• Length: 1.5 m, 1.99 m, 2 m

TYPES OF MEASUREMENT SCALES FOR DATA

NOMINAL
For categorical variables
There is no order in the categories
• Yes / No
• Male / Female
• Mazda, Ford, BMW, Fiat

ORDINAL
For categorical variables
There is an order in the categories
• Small / Medium / Large
• Good / Better / Best

INTERVAL SCALE
For numerical variables
There is no true origin (zero) of measurement
• Evaluate your lecturer’s lecturing skills by giving him a mark
between 1 and 5. In this case students may give him different
ratings.
• There is no origin for their ratings.
• Measuring the temperature (can be negative, zero or positive).
• No origin to start from.

RATIO SCALE

For numerical data
There is an origin (zero) of measurement
• Measure your lecturer’s height in metres. In this case all students
will give the same measure for his height.
• There is an origin for their measurements.

TABLES AND GRAPHS FOR CATEGORICAL VARIABLES

Summary table
Bar chart
Pie chart
Pareto diagram

SUMMARY TABLE

Suppose we measured the variable: year of study of students.
The categories of this variable are: 1st year, 2nd year, 3rd year and Post
Graduate (P.G.).
The following is a summary table of n = 100 students according to year of
study.
The frequency is the number of students in that category.

Year of study    Frequency
1st year         40
2nd year         35
3rd year         20
P.G.             5
Total            100

BAR GRAPH
PIE CHART

The pie chart is a circle taking up 360 degrees.
Each category represents a specific part of the pie chart.

For 1st years the angle size is
40/100 x 360 = 144 degrees

For 2nd years the angle size is
35/100 x 360 = 126 degrees

For 3rd years the angle size is
20/100 x 360 = 72 degrees

For post graduates the angle size is
5/100 x 360 = 18 degrees
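
The angle calculations above follow one rule: angle = frequency / n x 360. A minimal Python sketch, using the frequencies 40, 35, 20 and 5 from the calculations (the variable names are illustrative):

```python
# Frequencies per year of study, taken from the angle calculations above.
frequencies = {"1st year": 40, "2nd year": 35, "3rd year": 20, "P.G.": 5}
n = sum(frequencies.values())  # total number of students (100)

# Each category gets a slice proportional to its frequency.
angles = {category: f * 360 / n for category, f in frequencies.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:g} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.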

PARETO DIAGRAM

Cumulative percentage distribution for the year of study data

A Pareto diagram shows the percentage (as bars) and the cumulative
percentage (as a line) on one graph.
The categories of year of study are ordered by frequency from large to
small, not by year from 1st year to post graduate.
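
The numbers behind a Pareto diagram can be sketched in Python: order the categories by frequency from large to small, convert to percentages, and accumulate. The frequencies 40, 35, 20 and 5 are the ones used in the pie chart calculations. The variable names are illustrative.

```python
from itertools import accumulate

frequencies = {"1st year": 40, "2nd year": 35, "3rd year": 20, "P.G.": 5}
n = sum(frequencies.values())

# Order categories from the largest to the smallest frequency.
ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

percentages = [f * 100 / n for _, f in ordered]   # heights of the bars
cumulative = list(accumulate(percentages))        # points on the cumulative line

for (category, _), pct, cum in zip(ordered, percentages, cumulative):
    print(f"{category:<10}{pct:>6.1f}%{cum:>8.1f}%")
```

For this data set the order by frequency happens to coincide with the order by year, and the last cumulative percentage is 100.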
TABLES AND GRAPHS FOR TWO CATEGORICAL VARIABLES

CONTINGENCY TABLE
Consider the variables gender and year of study

SIDE BY SIDE BAR GRAPH
Graphically displays the contingency table

ORGANIZING NUMERICAL DATA
INCLUDES BOTH DISCRETE AND CONTINUOUS DATA

ORDERED ARRAY
Arranges the data from small to large
6; 17; 19; 22; 28; 30; 50

STEM AND LEAF DISPLAY
Uses the ordered array above
From this we can see the shape of the data and identify outliers

If there are decimals, round the data to one decimal place. The stem is
then the whole number and the leaf is the decimal digit:
3 | 5 refers to 3.5
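
A stem and leaf display for the ordered array 6; 17; 19; 22; 28; 30; 50 can be sketched in Python. It assumes whole numbers, with the tens digit as the stem and the units digit as the leaf. Showing stems that have no leaves is a layout choice of this sketch.

```python
from collections import defaultdict

data = [6, 17, 19, 22, 28, 30, 50]  # the ordered array from the notes

# Split every value into a stem (tens) and a leaf (units).
leaves = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    leaves[stem].append(leaf)

# Print every stem from smallest to largest, including stems without leaves.
lines = []
for stem in range(min(leaves), max(leaves) + 1):
    row = " ".join(str(leaf) for leaf in leaves[stem])
    lines.append(f"{stem} | {row}".rstrip())

print("\n".join(lines))
```

Keeping the empty stem in the display makes gaps in the data visible, which helps when looking for outliers.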
TABLES AND GRAPHS FOR NUMERICAL DATA

FREQUENCY DISTRIBUTION TABLE

Order the data
17 20 21 22 25 30 30 31 41 47 50 57 80 81

Choose the number of classes (intervals)
Large data sets – about 15 classes
Small data sets – about 5 classes

Interval width
= Range (largest minus smallest data value) / number of classes
= (81-17) / 5
= 64 / 5
= 12.8
≈ 13
Choose the lowest data value (17) as the lower boundary of the first
interval. Display the frequencies that fall within each interval.

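
The steps above can be sketched in Python for the 14 ordered values. The sketch assumes that each class includes its lower boundary and excludes its upper boundary. The notes do not state this convention explicitly.

```python
import math

data = [17, 20, 21, 22, 25, 30, 30, 31, 41, 47, 50, 57, 80, 81]
num_classes = 5  # small data set

# Interval width = range / number of classes, rounded up (12.8 -> 13).
width = math.ceil((max(data) - min(data)) / num_classes)

# Boundaries start at the lowest data value: 17, 30, 43, 56, 69, 82.
lower = min(data)
boundaries = [lower + i * width for i in range(num_classes + 1)]

# Assumption: a value x belongs to a class when lo <= x < hi.
frequencies = [
    sum(lo <= x < hi for x in data)
    for lo, hi in zip(boundaries, boundaries[1:])
]

for lo, hi, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lo} to less than {hi}: {f}")
```

The frequencies add up to the number of observations (14), which confirms that every value was placed in exactly one class.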
RELATIVE FREQUENCY DISTRIBUTION AND PERCENTAGE DISTRIBUTION

RELATIVE FREQUENCY COLUMN
= frequency / number of observations

PERCENTAGE DISTRIBUTION COLUMN
= relative frequency x 100

CUMULATIVE FREQUENCY DISTRIBUTION
Cumulative means that, for each class, we add up the observations that
are less than the lower boundary of that class.
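
A Python sketch of the three columns, using the 14 ordered values and classes of width 13 starting at 17. It assumes that each class includes its lower boundary and excludes its upper boundary. The variable names are illustrative.

```python
data = [17, 20, 21, 22, 25, 30, 30, 31, 41, 47, 50, 57, 80, 81]
boundaries = [17, 30, 43, 56, 69, 82]  # width 13, starting at the lowest value
n = len(data)

# Frequency per class (assumption: lo <= x < hi).
frequencies = [
    sum(lo <= x < hi for x in data)
    for lo, hi in zip(boundaries, boundaries[1:])
]

relative = [f / n for f in frequencies]    # frequency / number of observations
percentage = [r * 100 for r in relative]   # relative frequency x 100

# Cumulative frequency: number of observations below each boundary.
cumulative = [sum(x < b for x in data) for b in boundaries]
cumulative_pct = [c * 100 / n for c in cumulative]

for b, c, p in zip(boundaries, cumulative, cumulative_pct):
    print(f"less than {b}: {c} observations ({p:.1f}%)")
```

The cumulative column starts at 0 (nothing lies below 17) and ends at all 14 observations, or 100%.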
HISTOGRAM
Graph for numerical data summarized in a frequency, relative frequency
or percentage distribution
POLYGON

Line graph for numerical data summarized in a frequency, relative
frequency or percentage distribution.
Calculate the midpoint of each interval and plot the frequency (for
example) against the midpoints.
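
The midpoints for a polygon can be computed as (lower boundary + upper boundary) / 2. The sketch below uses classes of width 13 starting at 17. The frequencies are counted from the 14 ordered values in the frequency distribution section, assuming each class includes its lower boundary.

```python
boundaries = [17, 30, 43, 56, 69, 82]  # class boundaries (width 13)
frequencies = [5, 4, 2, 1, 2]          # counted from the 14 ordered values

# Midpoint of each interval = (lower boundary + upper boundary) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# The polygon joins these (midpoint, frequency) points with straight lines.
points = list(zip(midpoints, frequencies))
print(points)
```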
THE CUMULATIVE PERCENTAGE POLYGON (OGIVE)
Line graph for numerical data summarized in a cumulative percentage
distribution.
From this graph we can approximate the quartiles and percentiles of
the variable.

GRAPHICAL PRESENTATION OF BIVARIATE NUMERICAL DATA

A scatter plot is used to display two numerical variables graphically.

CHAPTER THREE
NUMERICAL DESCRIPTIVE MEASURES

MEASURES OF CENTRAL TENDENCY

MEAN
MEDIAN (position)
MODE (most)
