Item Analysis & Reliability


Item analysis, reliability and validity; further multivariate stats


• Using statistics covered in the main sessions
(including correlation and factor analysis) to
explore reliability and validity
• An introduction to scale reliability stats
(Cronbach’s alpha)
• Mixed methods as an approach to validity
• Some pointers to additional material
– the Rasch model
– further multivariate methods
Item analysis

• Statistical checks on the items that make up a test
Item analysis
• Used with items which can be marked
as right or wrong
• And when the set of item scores is
added to give a single score
• Assumption: items measure a single
trait
Useful stats

• Facility

• Discrimination
Item difficulty

Facility = the fraction of persons tested who answered the item correctly (p).

It is usually recommended that p values should fall within the range 0.2 to 0.8.
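The facility calculation can be sketched in a few lines. This is a minimal illustration only: the response matrix is made up, and the function names `facility` and `flag` are hypothetical helpers, not part of any package.

```python
# Rows are persons, columns are items; 1 = right, 0 = wrong (made-up data).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 1],
]

def facility(responses):
    """Return p for each item: the fraction of persons answering it correctly."""
    n_persons = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_persons for i in range(n_items)]

def flag(p, low=0.2, high=0.8):
    """Label an item against the recommended 0.2 to 0.8 range."""
    if p < low:
        return "too hard"
    if p > high:
        return "too easy"
    return "ok"

p_values = facility(responses)
for i, p in enumerate(p_values, start=1):
    print(f"item {i}: p = {p:.2f} ({flag(p)})")
```

Here item 1 is answered correctly by everyone and item 3 by no one, so both fall outside the recommended range.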
Discrimination

• Does the item discriminate between high scorers and low scorers on the whole test?

• Can be tested via the item-total correlation
Some revision

Spearman

Pearson
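As a reminder of how the two coefficients differ, here is a small sketch with both written out by hand. The data are made up, and the helper names (`pearson`, `ranks`, `spearman`) are illustrative; Spearman is computed here as Pearson applied to ranks, with ties given their average rank.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up scores: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because y always increases with x, the rank-based coefficient is 1, while Pearson falls slightly short of 1 because the relationship is curved.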
Correlation
[Figure: scatterplot “IQ and attitude to school”; x-axis: attitude to school (0 to 20), y-axis: IQ score (40 to 180)]
Correlation
Correlations

                                           IQ score   attitude to school
IQ score             Pearson Correlation   1.000      .564**
                     Sig. (2-tailed)       .          .000
                     N                     40         40
attitude to school   Pearson Correlation   .564**     1.000
                     Sig. (2-tailed)       .000       .
                     N                     40         40
**. Correlation is significant at the 0.01 level (2-tailed).
No Correlation
Correlations

                                     random nos 1   random nos 2
random nos 1   Pearson Correlation   1.000          .087
               Sig. (2-tailed)       .              .595
               N                     40             40
random nos 2   Pearson Correlation   .087           1.000
               Sig. (2-tailed)       .595           .
               N                     40             41

[Figure: scatterplot of random nos 2 (y-axis) against random nos 1 (x-axis)]
Strong correlation
Correlations

                                 IQ score   NEWIQ
IQ score   Pearson Correlation   1.000      .995**
           Sig. (2-tailed)       .          .000
           N                     40         40
NEWIQ      Pearson Correlation   .995**     1.000
           Sig. (2-tailed)       .000       .
           N                     40         40
**. Correlation is significant at the 0.01 level (2-tailed).

[Figure: scatterplot of IQ score (y-axis) against NEWIQ (x-axis)]
Discrimination

• Does the item discriminate between high scorers and low scorers on the whole test?

• Can be tested via the item-total correlation
Part of Alpha output from SPSS

Item-Total Statistics

          Scale Mean if   Scale Variance    Corrected Item-     Cronbach's Alpha
          Item Deleted    if Item Deleted   Total Correlation   if Item Deleted
ite0001   3.1071          8.766             .722                .861
ite0003   2.9643          7.888             .897                .844
ite0005   2.8929          8.099             .749                .855
ite0007   2.9643          8.332             .705                .859
ite0009   2.9643          8.406             .674                .861
ite0002   2.9643          9.295             .324                .886
ite0004   2.8214          8.671             .503                .875
ite0006   2.8571          9.238             .308                .889
ite0008   2.7500          8.935             .402                .883
ite0010   2.9643          7.888             .897                .844
Interpreting the scores
• The item-total correlation for a right/wrong item is a point biserial correlation:
above 0.3 is considered ‘good’;
0.2 to 0.3 is considered ‘workable’;
below 0.2 is considered unacceptable.
Discrimination Index

D = [(N in top 27% getting it right) - (N in bottom 27% getting it right)] / (N in a 27% group)
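The index can be sketched as follows. The data are made up, and the function name and the rounding of the 27% group size to a whole number of persons are assumptions for illustration; persons with tied total scores at a cut point would be split arbitrarily by the sort.

```python
def discrimination_index(responses, item, fraction=0.27):
    """(right in top group - right in bottom group) / group size."""
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    group_size = max(1, round(fraction * len(responses)))
    bottom = order[:group_size]   # lowest total scores
    top = order[-group_size:]     # highest total scores
    right_top = sum(responses[i][item] for i in top)
    right_bottom = sum(responses[i][item] for i in bottom)
    return (right_top - right_bottom) / group_size

# Ten persons, five right/wrong items (made-up data).
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]

for item in range(5):
    print(f"item {item + 1}: D = {discrimination_index(responses, item):.2f}")
```

With ten persons each 27% group contains three people, so D can only take values in steps of one third.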
What makes a good question?

                             FACILITY
Discrimination   Below 40%   40%-60%      Above 60%
> 0.40           Difficult   ACCEPTABLE   Easy
0.30-0.39        Marginal    Improvable   Marginal
0.20-0.29        Reject      Marginal     Reject
< 0.20           REJECT (all facility levels)
Scale qualities
Reliability
• the extent to which the scores on the test
are measured consistently

– Parallel-form reliability
– Split-half reliability
– Internal consistency reliability

– Test-retest reliability
– Inter-rater reliability
• Parallel-form reliability

Correlations

                           A1       A2
A1   Pearson Correlation   1        .721**
     Sig. (2-tailed)       .        .000
     N                     28       25
A2   Pearson Correlation   .721**   1
     Sig. (2-tailed)       .000     .
     N                     25       25
**. Correlation is significant at the 0.01 level (2-tailed).
• Split-half reliability
• Spearman-Brown formula
Correlations

                             ODD      EVEN
ODD    Pearson Correlation   1        .807**
       Sig. (2-tailed)       .        .000
       N                     28       28
EVEN   Pearson Correlation   .807**   1
       Sig. (2-tailed)       .000     .
       N                     28       28
**. Correlation is significant at the 0.01 level (2-tailed).
****** Method 1 (space saver) will be used for this analysis ******

R E L I A B I L I T Y A N A L Y S I S - S C A L E (S P L I T)

Reliability Coefficients

N of Cases = 28.0 N of Items = 10

Correlation between forms =.8068 Equal-length Spearman-Brown = .8931

Guttman Split-half = .8828 Unequal-length Spearman-Brown =.8931

5 Items in part 1 5 Items in part 2

Alpha for part 1 = .8926 Alpha for part 2 = .6125
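The equal-length Spearman-Brown figure in the output above can be reproduced from the correlation between the two halves. A minimal sketch, using the standard prophecy formula; the function name and the `factor` parameter are illustrative.

```python
def spearman_brown(r_half, factor=2):
    """Predicted reliability when a test is lengthened by `factor`."""
    return factor * r_half / (1 + (factor - 1) * r_half)

# "Correlation between forms" from the SPSS output above.
r_between_halves = 0.8068
full_length = spearman_brown(r_between_halves)
print(round(full_length, 4))  # 0.8931
```

The correlation of .8068 describes two five-item halves; stepping it up to the full ten items gives the .8931 reported by SPSS.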


• Alpha
R E L I A B I L I T Y A N A L Y S I S - S C A L E (A L P H A)

N of Cases = 28.0

Statistics for   Mean     Variance   Std Dev   N of Variables
Scale            3.2500   10.4167    3.2275    10

Item-total Statistics

          Scale Mean   Scale Variance   Corrected     Squared       Alpha
          if Item      if Item          Item-Total    Multiple      if Item
          Deleted      Deleted          Correlation   Correlation   Deleted

ITE0001   3.1071       8.7659           .7222         .             .8610
ITE0003   2.9643       7.8876           .8968         .             .8437
ITE0005   2.8929       8.0992           .7487         .             .8547
ITE0007   2.9643       8.3320           .7052         .             .8587
ITE0009   2.9643       8.4061           .6744         .             .8611
ITE0002   2.9643       9.2950           .3244         .             .8863
ITE0004   2.8214       8.6706           .5027         .             .8746
ITE0006   2.8571       9.2381           .3080         .             .8892
ITE0008   2.7500       8.9352           .4015         .             .8827
ITE0010   2.9643       7.8876           .8968         .             .8437

Reliability Coefficients 10 items

Alpha = .8782 Standardized item alpha = .8839
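For reference, a short sketch of how Cronbach's alpha is computed from raw item scores. It uses the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The data are made up and are not the 28 cases behind the output above.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha from a persons-by-items matrix of scores."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Six persons, three right/wrong items (made-up data).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 4))
```

Alpha rises as the items covary more strongly, because the variance of the total score then grows relative to the sum of the separate item variances.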


Internal consistency and
dimensionality

• Surprisingly, internally consistent scales may not be uni-dimensional
High alpha – but unidimensional?
• High alpha can be achieved if there are
sets of questions that correlate with each
other within a set but not necessarily
across the sets
• Follow alpha by factor analysis (then
perhaps alpha for each factor separately)
• Report factors
• Consider whether the overall scale still has
meaning
Factor analysis

     A     B     C     D     E
A    1     0.8   0.9   0.1   0.2
B          1     0.6   0.3   0.1
C                1     0.3   0.1
D                      1     0.9
E                            1

A, B and C correlate strongly with one another, as do D and E, but correlations across the two groups are low.
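To connect this matrix with the point about alpha and dimensionality, the sketch below treats A to E as five items of one scale (an assumption made purely for illustration) and computes the standardized alpha from the mean inter-item correlation, k × r̄ / (1 + (k − 1) × r̄).

```python
# Upper triangle of the correlation matrix shown above (items A to E).
upper = {
    ("A", "B"): 0.8, ("A", "C"): 0.9, ("A", "D"): 0.1, ("A", "E"): 0.2,
    ("B", "C"): 0.6, ("B", "D"): 0.3, ("B", "E"): 0.1,
    ("C", "D"): 0.3, ("C", "E"): 0.1,
    ("D", "E"): 0.9,
}

def mean_correlation(items):
    """Mean correlation among the named items."""
    rs = [r for (a, b), r in upper.items() if a in items and b in items]
    return sum(rs) / len(rs)

def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Correlations between the A-B-C group and the D-E group.
cross = [r for (a, b), r in upper.items() if a in "ABC" and b in "DE"]

print("alpha, all five items:", round(standardized_alpha(5, mean_correlation("ABCDE")), 3))
print("mean r within A, B, C:", round(mean_correlation("ABC"), 3))
print("mean r within D, E:   ", round(mean_correlation("DE"), 3))
print("mean r across groups: ", round(sum(cross) / len(cross), 3))
```

The whole-scale alpha comes out at roughly 0.79 even though the cross-group correlations average below 0.2, which is the pattern factor analysis is meant to expose.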
Factor structure of the scale investigated above (alpha = 0.88)
Rotated Component Matrix (a)

Component
1 2 3
ITE0005 .897
ITE0010 .873
ITE0003 .873
ITE0006 .724
ITE0009 .661
ITE0008 .929
ITE0007 .750
ITE0004
ITE0002 .895
ITE0001 .731
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
Links between question difficulty and scale reliability
[Figure: plot of qu2 (y-axis) against qu1 (x-axis)]
Negative alphas
Are unusual but can be the result of
• Questions that are too easy
• Errors in coding
• Sampling error
OR
• The questions really not measuring the same thing (right answers to some go with wrong answers to others)
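A small made-up illustration of how alpha goes negative when right answers on one item go with wrong answers on another, and how recoding changes the picture. Treating the second item as one that should have been reverse-scored is an assumption for the example, standing in for an "error in coding".

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha from a persons-by-items matrix of scores."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Five persons, two items that mostly disagree (made-up data).
as_coded = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 1],
]

# The same data with the second item reverse-scored (1 becomes 0 and vice versa).
recoded = [[first, 1 - second] for first, second in as_coded]

alpha_as_coded = cronbach_alpha(as_coded)
alpha_recoded = cronbach_alpha(recoded)
print(round(alpha_as_coded, 2), round(alpha_recoded, 2))
```

As coded, the total scores barely vary because the two items cancel each other out, so the sum of the item variances exceeds the total variance and alpha drops below zero.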
• Classical test Theory

• Item Response Theory


Rasch Model

• http://www.rasch.org/memo42.htm

• http://www.rasch.org/memo62.htm
Validity
• Does the test measure what it sets out to
measure?

• Concurrent validity
• Discriminant validity
• Predictive validity

• Reliability and validity


Wider issues of validity in
quantitative studies
• Threats to validity in experimental designs

• A simple design:
• OXO
Issues of validity at the design
stage – experimental designs
Internal threats to validity
• History
• Maturation
• Testing
• Instrumentation
• Selection
• Statistical regression
• Mortality
External threats to validity
• Interaction of selection bias and
treatment
• Interaction between testing and
treatment
• Reaction to being in an experiment
Hawthorne Effect
• Treatment group do better because they
are in a privileged group – or do they?

The Hawthorne defect: Persistence of a flawed theory

“Like other hallowed but unproven concepts in psychology, the so-called Hawthorne
effect has a life of its own.”

By Berkeley Rice

http://www.cs.unc.edu/~stotts/204/nohawth.html
Compensatory effect
John Henry Effect
• Control group do better because they
are not going to let the ‘smarties’ get the
better of them
True experiment

R   X   Oe
R   -   Oc
Quasi experiments – no R

O   X   O
O       O
Mixed methodology
• Use quasi experimental designs to reveal
possible consequences of actions
• Use interpretative designs to check the
causal relationship between the actions
and the consequences – from several
perspectives
– (including enquiring about the known threats
to validity – history, maturation etc)
Another view of mixed
methodologies
Linking qualitative and quantitative data

• qualitative work gives rich exemplification of generalisable
relationships established by statistical methods (Sci
Paradigm)
• quantitative work establishes the generalisability of
hypotheses which emerge from a qualitative enquiry (Sci
Paradigm)
• qualitative and quantitative work are used together
(iteratively) to deepen the understanding of the particular
cases on which we have been working. (Interp. Paradigm)
It’s …
• NOT the purpose of qualitative work simply to give rich
exemplification of generalisable relationships
established by statistical methods – to give a human
face to a statistical study.
• NOT that quantitative work should be used to establish
the generalisability of hypotheses which emerge from a
qualitative enquiry - as if this is in some way a
necessary step in order that the qualitative findings can
be taken seriously.
• BUT qualitative and quantitative work are used together
(iteratively) to deepen the understanding of the
particular cases on which we have been working.
An example
[Figure: “C1 site differences”, a scatterplot of sites with Student Negotiation on the x-axis (2.0 to 3.4) and Shared Control on the y-axis (1.5 to 3.5). One region is labelled “Shared Control > Student Negotiation” and the other “Student Negotiation > Shared Control”. Sites plotted: SpprtMS, WrkbdAst, ELDrama, ESOL LS, ITSkills, Connect2, ADMPA, GNVQBus, Pth4 Prn, AVCET&T, CACHE, Voc Path, BTECHlth, Engneer, ASPsych.]
Explaining the High SC/Low SN
grouping
Support for Mature Students
• Self assessment, negotiation based on
assignments, individual learning plans
agreed and reviewed by tutor and student

Workbased assessment
• Individual support from tutor (underground
working)
Explaining the High SC/Low SN
grouping: ISOLATION
Workbased assessment
• Different geographical placements

Support for Mature Students and IT skills
• Same room but different times

ESOL
• Same room, same times but different languages
Does isolation feature elsewhere?
[Figure: the “C1 site differences” scatterplot of Shared Control against Student Negotiation, repeated with the two annotations below]

Isolation was a feature of the ‘top left’ sites

In 4 of the ‘bottom right’ sites isolation was ‘not at all evident in this site’

Isolation (broadly defined) appeared to be a factor related to a site culture in which
there was low student negotiation.
Some further multivariate stats
EFA & CFA
• Exploratory factor analysis (EFA): used to explore
relationships amongst variables
• Confirmatory factor analysis (CFA): used to test expected
relationships between variables
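Path analysis
[Figure: path diagram linking IQ, mot and ach, annotated with the terms exogenous variable, endogenous variable, mediator variable, direct effects (path coefficients), variance and disturbances; the values .5, .6, .4, .3 and .9 appear on the diagram]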
Structural equation modelling
• One indicator per latent: SEM=PA
• No dependent: SEM=CFA

Source: http://www2.chass.ncsu.edu/garson/PA765/structur.htm
Multilevel Modelling
• Bennett – 1976, Teaching Styles and Pupil
Progress
– Children taught in a formal style did better

• Aitkin et al – 1981, Teaching Styles and Pupil Progress: A Re-Analysis,
British Journal of Educational Psychology, v51 n2 p170-86
– When pupils’ grouping into classes was taken
into account, this difference disappeared
What kind of problem can this
explore?
• Pupils taught by formal methods do better than those
taught by informal methods
– Formal teaching methods are best
BUT
• All formal teachers are in mixed schools
• All informal teachers in single sex schools
– So mixed schools are best
BUT
• All mixed schools are in one LA that spends a lot on
education
• All single sex schools are in another LA that does not spend
a lot on education
– So really it’s resourcing that matters
And also…
• So MLM groups people appropriately
– eg class, school, local authority
But it also
• Allows the use of covariates at any level of
grouping
– eg initial test scores at the class level, teacher
experience at the school level, funding to schools at
the LA level
• Allows the exploration of interactions
– eg is the variation between schools greater for
children with low initial test scores
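The sketch below is not a multilevel model. It only illustrates, with made-up scores for three hypothetical classes, why grouping matters: the variation in pupil scores is split into a part that lies between classes and a part that lies within them.

```python
from statistics import mean

# Made-up test scores for pupils in three hypothetical classes.
classes = {
    "class A": [52, 55, 58, 61],
    "class B": [40, 43, 45, 48],
    "class C": [66, 68, 71, 75],
}

all_scores = [s for scores in classes.values() for s in scores]
grand_mean = mean(all_scores)

# Total variation of pupil scores around the overall mean.
ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

# Variation of class means around the overall mean, weighted by class size.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in classes.values())

# Variation of pupils around their own class mean.
ss_within = sum((s - mean(v)) ** 2 for v in classes.values() for s in v)

share_between = ss_between / ss_total
print(f"share of variation lying between classes: {share_between:.2f}")
```

When most of the variation lies between classes, pupils in the same class are far from independent observations, which is the situation in which an analysis that ignores the grouping can mislead.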
A really helpful website:
• http://www.cmm.bristol.ac.uk/learning-training/multilevel-models/what-why.shtml
Including a useful video on the principles
• http://www.cmm.bristol.ac.uk/learning-training/videos/jr-clioday_files/Default.htm
and on an application
• http://www.cmm.bristol.ac.uk/learning-training/videos/kj-clioday_files/Default.htm
