Stats Methods
CHAPTER ONE
Statistics is the branch of mathematics that focuses on the analysis of data. Data are
the raw observations made by a researcher, and statistics provides the mathematical
techniques that transform these raw data into useful information.
DESCRIPTIVE STATISTICS
INFERENTIAL STATISTICS
Note that the population size is N and the sample size is n (with n less
than N).
POPULATION                                    SAMPLE
Company with 100 000 clients (N = 100 000)    Choose 1000 from total (n = 1000)
N = 20 000                                    n = 500
Harvard Business Review – “Data Scientist: The Sexiest Job of the 21st
Century”
SAMPLING METHODS
STRATIFIED SAMPLING
Divide the population (N) into strata according to some characteristic
Items / people within a stratum are similar (homogeneous)
Choose a sample (n = 13) from a population (N = 181)
The students in the sample must be representative of their year of study
The strata in this example are the years of study. Build the sample by choosing
students from each of the strata.
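The example above can be sketched in code. This is only an illustration under stated assumptions: the notes give N = 181 and n = 13 but not the number of students per year, so the three stratum sizes below are hypothetical, and proportional allocation (each stratum contributes in proportion to its size) is one common way to make the sample representative, not necessarily the one intended in the notes. The helper name `proportional_allocation` is also made up for this sketch.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    exact = {k: n * size / total for k, size in strata_sizes.items()}
    alloc = {k: int(v) for k, v in exact.items()}  # round every share down first
    # Give the leftover units to the strata with the largest fractional remainders.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical stratum sizes (not from the notes); they only need to sum to N = 181.
strata_sizes = {"Year 1": 70, "Year 2": 56, "Year 3": 55}
allocation = proportional_allocation(strata_sizes, n=13)

rng = random.Random(1)  # fixed seed so the draw is repeatable
sample = {
    # Draw student numbers 1..size at random, without replacement, inside each stratum.
    year: rng.sample(range(1, strata_sizes[year] + 1), k)
    for year, k in allocation.items()
}
print(allocation)  # {'Year 1': 5, 'Year 2': 4, 'Year 3': 4}
```

With these assumed sizes, every year of study is represented in the sample of 13 in roughly the same proportion as in the population of 181.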
CLUSTER SAMPLING
Divide the population into clusters
Items / people within a cluster are different (heterogeneous)
Choose a sample of students from the population (N = 181). The students
in the sample must be representative of the students' years of study.
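A minimal sketch of cluster sampling for the same population follows. The notes do not say how the clusters are formed or how many are chosen, so the 10 "tutorial groups" and the choice of 2 clusters are assumptions made purely for illustration. The sketch uses one-stage cluster sampling: whole clusters are drawn at random and every student in a chosen cluster joins the sample.

```python
import random

rng = random.Random(7)  # fixed seed so the draw is repeatable

# Hypothetical setup: 181 students (numbered 1..181) spread over 10 tutorial groups.
students = list(range(1, 182))
rng.shuffle(students)  # mix the students so each group is heterogeneous
clusters = [students[i::10] for i in range(10)]  # groups of 18 or 19 students

# One-stage cluster sampling: pick whole clusters at random, keep everyone in them.
chosen = rng.sample(range(len(clusters)), 2)
sample = [s for c in chosen for s in clusters[c]]
print(len(sample))
```

Unlike stratified sampling, only some of the groups contribute to the sample, which is why each cluster should itself be a mixed (heterogeneous) miniature of the population.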
NOMINAL
For categorical variables
There is no order in categories
Yes / No
Male / Female
Mazda, Ford, BMW, Fiat
ORDINAL
For categorical variables
Order in categories
Small / Medium / Large
Good / Better / Best
INTERVAL SCALE
For numerical variables
There is no true origin (zero point) of measurement
Evaluate your lecturer’s lecturing skills by giving him a mark
between 1 and 5. In this case students may give him different
ratings.
There is no origin for their ratings.
Measuring the temperature (can be negative, zero or positive).
No origin to start from.
RATIO SCALE
Summary table
Bar chart
Pie chart
Pareto diagram
SUMMARY TABLE
BAR GRAPH
PIE CHART
PARETO DIAGRAM
CONTINGENCY TABLE
Consider the variables gender and year of study
ORDERED ARRAY
Arranges the data from smallest to largest
6; 17; 19; 22; 28; 30; 50
Order data
17 20 21 22 25 30 30 31 41 47 50 57 80 81
Interval width
Range (largest minus smallest data value) / number of classes
= (81 - 17) / 5
= 64 / 5
= 12.8
≈ 13 (rounded up)
Choose the lowest data value (17) as the lower boundary of the first interval.
Count the data values that fall within each interval and display these frequencies.
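The steps above can be checked with a short script. It uses the 14 data values and the 5 classes from the example; the one assumption is the boundary convention, which the notes do not state: each interval here includes its lower boundary and excludes its upper boundary.

```python
import math

data = [17, 20, 21, 22, 25, 30, 30, 31, 41, 47, 50, 57, 80, 81]
num_classes = 5

# Interval width: range / number of classes, rounded up (64 / 5 = 12.8 -> 13).
width = math.ceil((max(data) - min(data)) / num_classes)
lower = min(data)  # the first interval starts at the lowest data value (17)

freq = []
for i in range(num_classes):
    lo = lower + i * width
    hi = lo + width
    # Assumed convention: lower boundary included, upper boundary excluded.
    count = sum(lo <= x < hi for x in data)
    freq.append((lo, hi, count))

for lo, hi, count in freq:
    print(f"{lo} to under {hi}: {count}")
# 17 to under 30: 5
# 30 to under 43: 4
# 43 to under 56: 2
# 56 to under 69: 1
# 69 to under 82: 2
```

The frequencies add up to 14, the number of observations, which is a quick check that no value was missed or counted twice.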
RELATIVE FREQUENCY DISTRIBUTION AND PERCENTAGE DISTRIBUTION
MEAN
MEDIAN (position)
MODE (most)