Without understanding how to analyze data, a researcher cannot interpret the data or draw any conclusions or
recommendations from them.

A) Statistical data analysis: finding a test

The following plan outlines the process required to identify the most efficient statistical test or tests for any
research-derived quantitative data. The first five points should be addressed prior to data collection; they are
important because they may frame the measurement approach you adopt to collect the data.

They will also feed into the later selection of an analytic strategy. The last three points apply after data collection
and lead directly into the selection of an analytical technique.
1) Identify what type of research you are undertaking, e.g. hypothesis testing, inferring statistical robustness
of a psychometric tool, exploring data, evaluating an intervention of some sort;
2) Identify the type of data you have obtained - are these data categorical, ordinal or interval in nature? (see
glossary for a definition of 'levels of measurement');
3) If hypothesis testing, decide whether one- or two-tailed tests are most appropriate to answer the research
question;
4) Identify dependent and independent variables (DVs and IVs);
5) Conduct a power analysis (see Chapter 7 for more on this) prior to collecting your data - this will
provide you with a good idea of the sample sizes needed for your statistical tests to detect the effects of interest;
6) Screen the data for missing values, outliers and types of distribution (normal, skewed, kurtotic, etc.)
and clean data where necessary via transformation and/or exclusion;
7) Identify sample sizes contained within your data (total sample size, group sizes, etc.);
8) Decide on the test you wish to use.
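
Steps 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure. The sample-size function uses the normal approximation for comparing two independent group means (a dedicated power-analysis package based on the t distribution would give slightly larger values), and the outlier screen uses the common 1.5 x IQR rule. The function names, the effect size of 0.5 and the example scores are all assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist, quantiles


def sample_size_per_group(effect_size, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate n per group for comparing two independent means.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = z.inv_cdf(1 - tail_alpha)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def screen(values):
    """Count missing values (None) and flag outliers with the 1.5 * IQR rule."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = quantiles(present, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    outliers = [v for v in present if v < low or v > high]
    return {"n_missing": len(values) - len(present), "outliers": outliers}


# Hypothetical planning values: a medium effect (d = 0.5), alpha = .05, power = .80.
n_two_tailed = sample_size_per_group(0.5)
n_one_tailed = sample_size_per_group(0.5, two_tailed=False)

# Hypothetical scores with one missing value and one extreme score.
report = screen([12, 14, 15, None, 15, 16, 17, 18, 95])
print(n_two_tailed, n_one_tailed, report)
```

For the same effect size and power, the one-tailed calculation asks for fewer participants than the two-tailed one, which is one reason for settling step 3 before step 5.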

Finding an appropriate test

Fundamental questions which will narrow the search for a suitable test include:

 Whether your data are categorical, ordinal or interval level. Only interval-level data can draw on parametric
tests; other data types are restricted to non-parametric tests;

 Whether you are interested in only one variable (requiring univariate tests), two variables (requiring bivariate
tests) or more than two variables (requiring multivariate tests).

Having answered these questions, you should be able to identify the test or tests most appropriate to your aims and
data. You can do this by answering the following questions:

1. Are you interested in differences between groups, e.g. pre- and post-intervention, or between work teams? If so,
possible tests will include:

Differences between two groups only by level of data:

 t-tests - if data are interval (unrelated-samples t-tests are used where the samples are unmatched;
related-samples t-tests where the samples are matched or include the same participants);

 Mann-Whitney U or Wilcoxon - if data are ordinal (Mann-Whitney U for independent groups, Wilcoxon
for related groups);

 McNemar change test or Chi-square test of independence - if data are nominal (McNemar change test
for related groups, Chi-square test of independence for independent groups).
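
The distinction between unrelated and related samples changes the calculation itself, not just the name of the test. The sketch below computes both t statistics for the same hypothetical pre- and post-intervention scores. It assumes equal variances (pooled) for the unrelated case and returns only the statistic and degrees of freedom; obtaining a p-value would normally be left to a statistics package. The function names and data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev, variance


def unrelated_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


def related_t(first, second):
    """Related-samples (paired) t statistic on the pairwise differences; returns (t, df)."""
    diffs = [b - a for a, b in zip(first, second)]
    std_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / std_error, len(diffs) - 1


# Hypothetical scores for the same five participants before and after an intervention.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

t_paired, df_paired = related_t(before, after)
# Treating the same scores as if they came from two unmatched groups:
t_unrelated, df_unrelated = unrelated_t(after, before)
print(round(t_paired, 3), df_paired, round(t_unrelated, 3), df_unrelated)
```

With these data the related-samples statistic is larger than the unrelated one, because pairing removes the stable differences between participants from the error term.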

Differences between more than two groups by level of data:

 ANOVA (analysis of variance) or MANOVA - if data are interval (repeated-measures ANOVA for related
groups, one-way ANOVA for independent groups; whether ANOVA or MANOVA is selected will depend on the
number of independent and dependent variables of interest - if there are two or more of each, MANOVA will be
appropriate);

 Kruskal-Wallis or Friedman - if data are ordinal (Kruskal-Wallis if independent groups, Friedman if
related groups);

 Complex chi-square - if data are nominal.
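
The choices under question 1 amount to a lookup on three facts: level of data, number of groups, and whether the groups are related. A minimal sketch of that lookup follows. The function name and the string labels are assumptions made for illustration, and the ANOVA versus MANOVA decision, which also depends on the number of variables, is deliberately left out.

```python
def difference_test(level, n_groups, related):
    """Return the test suggested in the text for a question about group differences.

    level: "interval", "ordinal" or "nominal"
    n_groups: number of groups being compared (2 or more)
    related: True for matched/repeated groups, False for independent groups
    """
    if n_groups < 2:
        raise ValueError("a difference test needs at least two groups")
    two_groups = {
        ("interval", True): "related-samples t-test",
        ("interval", False): "unrelated-samples t-test",
        ("ordinal", True): "Wilcoxon",
        ("ordinal", False): "Mann-Whitney U",
        ("nominal", True): "McNemar change test",
        ("nominal", False): "Chi-square test of independence",
    }
    more_groups = {
        ("interval", True): "repeated measures ANOVA",
        ("interval", False): "one-way ANOVA",
        ("ordinal", True): "Friedman",
        ("ordinal", False): "Kruskal-Wallis",
        ("nominal", True): "complex chi-square",
        ("nominal", False): "complex chi-square",
    }
    table = two_groups if n_groups == 2 else more_groups
    return table[(level, related)]


# Example: ordinal ratings from three independent work teams.
print(difference_test("ordinal", 3, related=False))
```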

2. Are you interested in relationships between factors/variables, e.g. between job satisfaction and performance
at work? If so, possible tests include:

 Pearson's product moment correlation co-efficient - if data are interval;

 Spearman rank correlation co-efficient - if data are ordinal;
 Phi co-efficient - if data are nominal.
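
The three coefficients are closely related: Spearman's coefficient is Pearson's formula applied to ranks, and phi is the equivalent measure for a 2 x 2 table of counts. The sketch below shows each calculation using only the Python standard library. The function names and the data (tenure, satisfaction ratings and the 2 x 2 counts) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product moment correlation for two interval-level variables."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))


def ranks(values):
    """Rank from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson's formula applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def phi(table):
    """Phi coefficient for a 2 x 2 table of counts [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: years of tenure and a 1-5 job satisfaction rating.
tenure = [1, 2, 3, 4, 5]
satisfaction = [2, 4, 5, 4, 5]
r = pearson(tenure, satisfaction)
rho = spearman(tenure, satisfaction)

# Hypothetical counts: rows = trained / untrained, columns = promoted / not promoted.
phi_value = phi([[20, 10], [5, 15]])
print(round(r, 3), round(rho, 3), round(phi_value, 3))
```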

3. Are you exploring patterns in the data set, e.g. in a questionnaire measure which purports to tap into various
different 'constructs'? If so, possible tests include:

Exploratory principal components factor analysis - looks for groups of variables that share common variance,
from the assumption that these groupings are 'caused' by the same unobservable (latent) factors. Has some tight
restrictions in terms of type, level of data and sample size;

 Cluster analysis - groups variables together on the basis of similarity of the patterns of scores on them.
Less restrictive than factor analysis in terms of property requirements of the data;

 Multi-dimensional scaling - looks for variables that share similar patterns of scores across respondents,
and draws a plot of variables so that those responding most similarly are located proximally on the plot.
Again, less restrictive than factor analysis. A variation of this technique with even fewer restrictions on data
type is multi-dimensional scalogram analysis (MSA).
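
Of these techniques, cluster analysis of variables is the easiest to sketch by hand. The example below is one simple variant, not a full implementation: it uses single-linkage agglomerative clustering with 1 - r (one minus the Pearson correlation) as the distance between questionnaire items, and stops merging once the closest clusters are further apart than a cut-off. The cut-off of 0.5, the item names and the scores are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def cluster_variables(scores, max_distance):
    """Group variables whose score patterns are similar.

    scores: dict mapping variable name -> list of scores (one per respondent)
    max_distance: stop merging when the closest clusters are further apart than this
    """
    clusters = [{name} for name in scores]

    def distance(first, second):
        # Single linkage: distance between the two closest members.
        return min(1 - pearson(scores[a], scores[b]) for a in first for b in second)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        if distance(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] |= clusters[j]  # merge the closest pair (i < j, so index i stays valid)
        del clusters[j]
    return clusters


# Hypothetical responses of five people to four questionnaire items.
items = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [2, 3, 4, 5, 6],
    "q3": [5, 3, 4, 1, 2],
    "q4": [4, 2, 5, 1, 3],
}
groups = cluster_variables(items, max_distance=0.5)
print(sorted(sorted(group) for group in groups))
```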

4. Are you interested in categorizing participants according to certain characteristics, e.g. whether two groups
of workers are best distinguished by variations in commitment, or by job and organizational tenure?

 Discriminant function analysis - requires two or more continuous predictor variables and attempts to
categorize cases according to these predictor variables into a categorical dependent variable.

5. Do you wish to predict one or more outcome factors using the data you have collected, e.g. predicting
individual performance by examining workers’ affective reactions to the workplace?

 Simple regression - if you have one interval-level predictor variable and one interval-level outcome
variable;
 Multiple regression - if you have one interval-level outcome variable and more than one interval-,
ordinal or categorical-level predictor variable and wish to determine which predictor variable(s) best
predict(s) the outcome variable;
 Logistic regression - if you have one categorical outcome variable and two or more categorical or
interval-level predictor variables and need to determine the best predictor variable(s);
 Discriminant function analysis - if you have two or more interval-level predictor variables and a
categorical outcome variable.
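
The simplest of these, simple regression, can be written out directly from the least-squares formulas. The sketch below fits one interval-level predictor to one interval-level outcome and uses the fitted line to make a prediction. The function name and the affect and performance scores are hypothetical, and the other regression types listed above would normally be run in a statistics package rather than by hand.

```python
def simple_regression(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_squared = s_xy ** 2 / (s_xx * s_yy)  # proportion of outcome variance explained
    return slope, intercept, r_squared


# Hypothetical data: affective reaction score (predictor) and performance rating (outcome).
affect = [1, 2, 3, 4, 5]
performance = [2, 4, 5, 4, 5]

slope, intercept, r_squared = simple_regression(affect, performance)
predicted = intercept + slope * 6  # predicted performance for an affect score of 6
print(round(slope, 2), round(intercept, 2), round(r_squared, 2), round(predicted, 2))
```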

6. Do you wish to infer the statistical robustness of a questionnaire scale, e.g. an existing scale such as the
OPQ or 16-PF, or a new scale developed to measure, say, workplace integrity?

 Alpha, split-half, Guttman reliability analysis - produces a reliability coefficient by examining
correlations between scale items or between half-scale means;

 Confirmatory factor analysis - if the factor structure is established, e.g. with the 16-PF's sixteen
personality factors, the number of expected factors can be specified;
 Exploratory factor analysis - if the factor structure is unknown, or alternative factor solutions are
suspected, can be used to identify patterns of scale items.
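
As an illustration of the first option, the alpha coefficient (Cronbach's alpha) can be computed from the item variances and the variance of the total scale score. The sketch below is a minimal version using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a scale.

    responses: one row per respondent, one column per scale item.
    """
    n_items = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four people to a three-item scale (1-5 ratings).
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, i.e. that the scale is internally consistent.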

B) Statistical data analysis: the tests

Table 1. Bivariate data analysis: applications, restrictions and interpretation

Test: t-test
Used to: Examine differences between two groups of participants, e.g. before and immediately after an
organizational intervention
Requirements/restrictions: DV interval- or ratio-level; sensitive to outliers; dependent variable normally
distributed

Test: Mann-Whitney U test
Used to: Examine differences between two independent groups of participants
Requirements/restrictions: DV ordinal level

Test: Wilcoxon test
Used to: Examine differences between two related groups of participants
Requirements/restrictions: DV ordinal level

Test: McNemar change test
Used to: Examine differences between two related groups of participants
Requirements/restrictions: DV nominal level

Test: Chi-square test of independence
Used to: Examine differences between two independent groups of participants
Requirements/restrictions: DV nominal level

Test: ANOVA
Used to: Examine variable differences between more than two groups of participants, e.g. before, immediately
after and 4 months after an organizational intervention; the likelihood of staff in each ...
Requirements/restrictions: DV interval- or ratio-level; sensitive to outliers; DV normally distributed; each group
randomly sampled from the population; linear relationships between all dependent variables

Test: Phi co-efficient
Used to: Examine relationships between two variables
Requirements/restrictions: Nominal level variables

Test: Spearman rank correlation coefficient
Used to: Examine relationships between two variables
Requirements/restrictions: Ordinal level variables

Test: Correlational analysis
Used to: Examine relationships between two variables, e.g. tenure and job satisfaction; perception of
organizational communication and production performance; underpins multiple regression and factor analysis
Requirements/restrictions: Interval-level variables; linear relationship between variables

Table 2. Multivariate data analysis: applications, restrictions and interpretation

Test: Multiple regression
Used to: Predict outcomes, using one or more predictor variables, e.g. predicting staff turnover from age,
organizational commitment, rating of supervisor
Requirements/restrictions: Interval-level DV; linear relationship between variables; homoscedasticity (residuals
should have constant variance); residuals and variables should be normally distributed; large sample (see Chapter 7
for more on sampling issues)

Test: Exploratory factor analysis (principal components)
Used to: Explore patterns in a data set by examining correlations between variables and describing these patterns
as parsimoniously as possible, e.g. to explore the factor structure of a new measure of corporate identity
Requirements/restrictions: Interval-level data; normally distributed data; number of items in analysis should allow
for at least 3 items per hypothesized factor; sample should have 3 times the number of members as items; items
must adequately cover all areas of the research domain (i.e. be content valid)

Test: Confirmatory factor analysis (principal components or maximum likelihood)
Used to: Test the adequacy of a theoretical prediction by examining variable correlations, e.g. to test the factor
structure of the 'Big 5' personality ...
Requirements/restrictions: As above

Test: Hierarchical cluster analysis
Used to: Identify homogeneous groups of cases based on selected attributes, e.g. attempting to predict whether or
not someone is married according to their age, sociability rating, and total number of friends
Requirements/restrictions: Predictor variables must be interval-level and may need to be standardized so that they
fall on the same scale

Test: K-means cluster analysis
Used to: As for hierarchical cluster analysis, but you must know beforehand how many clusters you expect
Requirements/restrictions: Knowledge of expected number of clusters

Test: Discriminant function analysis
Used to: Identify the distinguishing characteristics of one or more groups based on several potentially
discriminating measures, e.g. identify whether two work teams are best differentiated in terms of cohesiveness or
structure
Requirements/restrictions: Interval-level data; normal distributions; sensitive to outliers; each group contains >20
people if there are fewer than 5 predictors; no individual can appear in both groups; the spread of scores on
predictors should be roughly equal

Test: Reliability analysis
Used to: Examine the internal consistency of a proposed or existing scale containing a number of given items, e.g.
with a new measure of organizational climate, or an existing measure of job satisfaction
Requirements/restrictions: Interval-level data; normally distributed data; construct validity of proposed scale items

Test: Multidimensional scaling (not MSA)
Used to: Determine the pattern or structure contained in a matrix of observations and present this in a
psychologically meaningful way, e.g. by exploring people's representations of their psychological contract at work
Requirements/restrictions: Predictor variables must be interval-level; multi-dimensional scalogram analysis (MSA)
is a variant of this approach and does not require interval-level data

Test: MANOVA
Used to: Examine differences between more than two groups of participants on more than one variable, i.e. on a
combination of variables, thereby addressing variable combinations or interactions by group
Requirements/restrictions: DV interval- or ratio-level; sensitive to outliers; dependent variable normally
distributed; each group randomly sampled from the population; linear relationships between all dependent variables

