
Evaluation Techniques

Construct Validity - The extent to which a test actually measures the construct that it purports
to measure

• Is concerned with inferences about test scores

• Determined by correlating scores on a test with scores from other tests

– Known group Validity

– Convergent Validity

– Discriminant Validity
Face Validity - The extent to which a test appears to be job-related

➢ Enhances perceptions of fairness

➢ Increases applicant motivation

❖ Reduces the chance of legal challenge

❖ Face validity can be increased by using job knowledge tests and work samples

❖ Applicants might fake tests of individual differences
➢ Types of Validity
■ Face - Measure looks like what it assesses
■ Content - Measure assesses entire variable
■ Criterion-related - Measure relates to what is expected
■ Construct - Interpretation of a measure’s meaning
Validity - the correlation of the test with an independent criterion, whereas reliability is the
self-correlation of the test.
Reliability - a necessary but not sufficient condition for validity.
● A test cannot be Valid without being Reliable.
● A test can be Reliable without being Valid.
Cost Effectiveness - If two tests have equivalent validities, then costs should be considered
– Wonderlic Personnel Test vs Wechsler Adult Intelligence Scale
– Group Testing vs Individual Testing

– Virtual vs. real-time testing (e.g., computerized adaptive testing, CAT)


Utility - The degree to which a selection device improves the quality of a personnel system,
above and beyond what would have occurred had the instrument not been used.
Selection works best when…
• You have many job openings

• You have many more applicants than openings

• You have a valid test

• The job in question has a high salary

• The job is not easily performed or easily trained

Common Utility Methods


Taylor-Russell Tables - Estimate the percentage of future employees that will be successful;
a measure of the impact of the overall testing procedure

• Three components

– Validity

– Base rate (% of employees successful: successful employees ÷ total employees)

– Selection ratio (hired ÷ applicants)


Proportion of correct decisions - plot current employees' test scores against their job performance:
● Vertical line: test score cutoff (e.g., passing score, 50% score, or average score)
● Horizontal line: job performance standard; employees below it are not performing well
● Quadrants:
I: high JP, low TS (wrong decision - would have been rejected despite performing well)
II: high JP, high TS (correct decision to hire)
III: low JP, high TS (wrong decision to hire)
IV: low JP, low TS (correct decision not to hire)
● Proportion of correct decisions with test = (Q2 + Q4) ÷ (Q1 + Q2 + Q3 + Q4)
● Baseline of correct decisions = successful employees (high JP) ÷ total employees
= (Q1 + Q2) ÷ (Q1 + Q2 + Q3 + Q4)
Example:
• Suppose we have

– a test validity of .40

– a selection ratio of .30

– a base rate of .50

• Using the Taylor-Russell Tables, what percentage of future employees would be successful?
Proportion of Correct Decisions

1. Proportion of Correct Decisions With Test (employee test scores): estimating test effectiveness
(Quadrant II + Quadrant IV) ÷ (Quadrants I + II + III + IV)
This calculates the percentage of time we expect to be accurate in making a selection
decision, and it should be higher than the baseline below.
2. Baseline of Correct Decisions (scores on the criterion)
Successful employees ÷ Total employees = (Quadrants I + II) ÷ (Quadrants I + II + III + IV)
The proportion-of-correct-decisions method is easier to use but less accurate than the
Taylor-Russell tables.

• Quad 1 - employee did poorly on the test but performed well on the job

• Quad 2 - employee scored well on the test and performed well on the job

• Quad 3 - employee scored high on the test but performed poorly on the job

• Quad 4 - employee scored low on the test and performed poorly on the job


• Proportion of Correct Decisions With Test
(Quadrant II + Quadrant IV) ÷ (Quadrants I + II + III + IV)
= (10 + 11) ÷ (5 + 10 + 4 + 11) = 21 ÷ 30 = .70

• Baseline of Correct Decisions
(Quadrants I + II) ÷ (Quadrants I + II + III + IV)
= (5 + 10) ÷ (5 + 10 + 4 + 11) = 15 ÷ 30 = .50

• Proportion of Correct Decisions With Test
(Quadrant II + Quadrant IV) ÷ (Quadrants I + II + III + IV)
= (8 + 6) ÷ (4 + 8 + 6 + 2) = 14 ÷ 20 = .70

• Baseline of Correct Decisions
(Quadrants I + II) ÷ (Quadrants I + II + III + IV)
= (4 + 8) ÷ (4 + 8 + 6 + 2) = 12 ÷ 20 = .60
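
As a quick arithmetic check, a minimal Python sketch of the two proportions (function names are illustrative, not from the notes):

```python
# Quadrant labels follow the notes: Q2 = high test score/high performance,
# Q4 = low test score/low performance are the correct decisions.

def proportion_correct_with_test(q1, q2, q3, q4):
    """(Quadrant II + Quadrant IV) / total employees."""
    return (q2 + q4) / (q1 + q2 + q3 + q4)

def baseline_correct(q1, q2, q3, q4):
    """Successful employees (Quadrants I + II) / total employees."""
    return (q1 + q2) / (q1 + q2 + q3 + q4)

# First example: Q1=5, Q2=10, Q3=4, Q4=11
print(proportion_correct_with_test(5, 10, 4, 11))  # 0.7
print(baseline_correct(5, 10, 4, 11))              # 0.5
```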
Lawshe Tables
• Gives the probability of a particular applicant being successful, given:
– Validity coefficient

– Base rate

– Applicant score

– Table 6.5

The Brogden-Cronbach-Gleser Model


• Gives an estimate of utility by estimating the amount of money an organization would save
if it used the test to select employees.
Savings =(n) (t) (r) (SDy) (m) - cost of testing
• n= Number of employees hired per year
• t= average tenure
• r= test validity
• SDy = standard deviation of job performance in dollars: the dollar difference between a
good worker and an average worker (workers one standard deviation apart); when
performance is normally distributed, often estimated as 40% of annual salary
• m=mean standardized predictor score of selected applicants

Selection ratio - the ratio of the number of openings to the number of applicants
Validity coefficient - the correlation between the test and job performance (r)
Base rate of current performance - the percentage of employees currently on the job who
are considered successful
SDy - the difference in performance (measured in dollars) between a good and an average
worker (workers one standard deviation apart)
Calculating m
• For example, we administer a test of mental ability to a group of 100 applicants and hire
the 10 with the highest scores. The average score of the 10 hired applicants was 34.6, the
average test score of the other 90 applicants was 28.4, and the standard deviation of all test
scores was 8.3. The desired figure would be:

• (34.6 - 28.4) ÷ 8.3 = 6.2 ÷ 8.3 = .75

• You administer a test of mental ability to a group of 150 applicants, and hire 35 with the
highest scores. The average score of the 35 hired applicants was 35.7, the average test
score of the other 115 applicants was 24.6, and the standard deviation of all test scores was
11.2. The desired figure would be:
» (35.7 - 24.6) ÷ 11.2 = 11.1 ÷ 11.2 = .99
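
The same m calculation as a small Python sketch (the function name is illustrative):

```python
def mean_standardized_predictor(mean_hired, mean_others, sd_all):
    """m = (mean score of hired - mean score of non-hired) / SD of all scores."""
    return (mean_hired - mean_others) / sd_all

print(round(mean_standardized_predictor(34.6, 28.4, 8.3), 2))   # 0.75
print(round(mean_standardized_predictor(35.7, 24.6, 11.2), 2))  # 0.99
```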
Standardized Selection Ratio - a table can also be used to convert the selection ratio into m
(this is where the m = 1.40 in the example below comes from).

Example:
– Suppose:

• we hire 10 auditors per year

• the average person in this position stays 2 years

• the validity coefficient is .40

• the average annual salary for the position is $30,000

• we have 50 applicants for ten openings.

– Our utility would be:


Savings = (n) (t) (r) (SDy) (m) - cost of testing
= (10 × 2 × .40 × $12,000 × 1.40) – (50 applicants × $10 per test)
= $134,400 – $500 = $133,900
(SDy = 40% of the $30,000 salary = $12,000; m = 1.40 corresponds to the .20 selection ratio)
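
A minimal Python sketch of the savings estimate, assuming the (50 × 10) term means 50 applicants tested at $10 per test:

```python
# Variable names mirror the Brogden-Cronbach-Gleser formula in the notes.
def utility_savings(n, t, r, sd_y, m, cost_of_testing):
    return n * t * r * sd_y * m - cost_of_testing

# Auditor example: cost of testing assumed to be 50 applicants x $10 per test.
print(round(utility_savings(n=10, t=2, r=0.40, sd_y=12_000, m=1.40,
                            cost_of_testing=50 * 10)))  # 133900
```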
• Test Bias

– Technical aspects of the test

– A test is biased if there are group differences in test scores (e.g., race, gender) that
are unrelated to the construct being measured (e.g., integrity)

• Test Fairness

– Includes bias as well as political and social issues

– A test is fair if people of equal probability of success on a job have an equal chance
of being hired

• Adverse Impact
– Occurs when the selection rate for one group is less than 80% of the rate for the
highest scoring group
                         Male    Female
Number of applicants      50       30
Number hired              20       10
Selection ratio          .40      .33

– .33 ÷ .40 = .83 > .80 (no adverse impact)
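
A small Python sketch of the four-fifths (80%) rule check from the example above (the function name is illustrative):

```python
def has_adverse_impact(hired_a, applicants_a, hired_b, applicants_b):
    """True if the lower selection ratio is below 80% of the higher one."""
    ratio_a = hired_a / applicants_a
    ratio_b = hired_b / applicants_b
    low, high = sorted((ratio_a, ratio_b))
    return low / high < 0.80

# Male: 20 of 50 hired (.40); female: 10 of 30 hired (.33)
print(has_adverse_impact(20, 50, 10, 30))  # False -> .83 >= .80, no adverse impact
```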


Other Fairness Issues

• Predictive Bias

– The predicted level of job success favors one group over the other

• Single-Group Validity

– Test predicts for one group but not another

– Very rare

• Differential Validity

– Test predicts for both groups but better for one

– Also very rare


Making the Hiring Decision
Linear Approaches to Making the Selection Decision
• Unadjusted Top-down Selection

– Top-down selection - applicants are rank-ordered on the basis of their test scores
and selected in straight rank order.
Advantages
• Higher quality of selected applicants

• Objective decision making


Disadvantages
• Less flexibility in decision making

• Adverse impact = less workforce diversity

• Ignores measurement error

• Assumes test score accounts for all the variance in performance (Zedeck,
Cascio, Goldstein & Outtz, 1996).
– Compensatory - A method of making selection decisions in which a high score on
one test can compensate for a low score on another test. For example, a high GPA
might compensate for a low GRE score.

• Rule of 3 - A variation on top-down selection in which the names of the top three
applicants are given to a hiring authority who can then select any of the three. A technique
often used in the public sector. Gives more flexibility to the selectors

• Passing Scores - The minimum test score that an applicant must achieve to be considered
for hire. A means for reducing adverse impact and increasing flexibility
– Who will perform at an acceptable level?
A passing score is a point in a distribution of scores that distinguishes acceptable
from unacceptable performance (Kane, 1994).

– Uniform Guidelines (1978) Section 5H:


Passing scores should be reasonable and consistent with expectations of acceptable
proficiency
Advantages
– Increased flexibility in decision making

– Less adverse impact against protected groups


Disadvantages
– Lowered utility

– Can be difficult to set

– Multiple cutoff - A selection strategy in which applicants must meet or exceed the
passing score on more than one selection test.
• All applicants take every test at the same time
• Must achieve the passing score on each test; failing any one test removes the
applicant from consideration
• Can lead to different decisions than the regression approach
– Multiple hurdle - The practice of administering one test at a time, so that applicants
must pass that test before being allowed to take the next one. Often used to reduce
the costs associated with applicants failing one or more tests.

• All applicants take the first test

• Only those who pass the first test take the next test, and so on

• Useful when there are many applicants and the tests are costly and time-consuming

• Banding - A statistical technique based on the standard error of measurement that allows
similar test scores to be grouped.
– It is a compromise between top down hiring and passing scores

– Banding attempts to hire the top scorers while allowing flexibility for affirmative action

– SEM banding (standard error of measurement)

• Tests differences between scores for statistical significance

• How many points apart do two applicants' scores have to be before they are
considered significantly different?

– Attempts to hire the top test scorers while still allowing some flexibility for
affirmative action (Campion et al., 2001).

– To compute you need the standard deviation and reliability of the test

Standard error of measurement (SEM) = SD × √(1 − reliability)

– The band is established by multiplying the standard error by 1.96 (the 95% confidence level)
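
A minimal Python sketch of the band computation; the SD and reliability values are assumed for illustration, not from the notes:

```python
import math

# SEM = SD * sqrt(1 - reliability); the band width is 1.96 * SEM.
def band_width(sd, reliability):
    sem = sd * math.sqrt(1 - reliability)
    return 1.96 * sem

width = band_width(sd=8.3, reliability=0.90)
print(round(width, 2))  # 5.14 -> scores within ~5 points are treated as equivalent
```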

Types of SEM Bands

● Fixed - the band stays in place until everyone within it has been considered
● Sliding - the band moves down as the top scorer within it is selected
● Diversity-based - females and minorities are given preference when selecting from within a band

Advantages of Banding
- Helps reduce adverse impact, increase workforce diversity, and increase
perceptions of fairness (Zedeck et al., 1996).
- Allows you to consider secondary criteria relevant to the job (Campion et al.,
2001).
Disadvantages of Banding
- Lose valuable information
- Lower the quality of people selected
- Sliding bands may be difficult to apply in the private sector
- Banding without minority preference may not reduce adverse impact

• The Regression Approach


– A statistical procedure used to predict one variable on the basis of another variable
• Y = a +bX
• Where:
Y = the predicted criterion score
a = a numerical value reflecting the regression line's intercept on the y-axis
(the predicted value of Y when X = 0)
b = a numerical value reflecting the slope of the regression line
X = the predictor score for a given individual
– Validate our measure to discover what our intercept and slope are (i.e.,
find out what the values of a and b are)
• Think back to Criterion-Validation
– Use the values of a and b so we can input an applicant's score on the selection
measure (our predictor, X) to estimate how well they will do on our criterion

Example:
Test individuals on an intelligence test (X; our predictor) and job performance (Y;
our criterion)
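
A minimal Python sketch of the prediction step, with assumed values of a and b for illustration only:

```python
# Y = a + bX, once a and b have been estimated in the validation study.
def predict_criterion(a, b, x):
    """Predicted criterion score for an applicant with predictor score x."""
    return a + b * x

# Assumed validation results: intercept a = 1.5, slope b = 0.08.
print(round(predict_criterion(a=1.5, b=0.08, x=110), 1))  # 10.3
```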

Compensatory Approach (Multiple Regression)


– Statistical procedure used to predict one variable on the basis of two or more other
variables
– An approach where:
• All Applicants take every test/go through every selection procedure
• Scores are weighted and combined to determine each applicant's predictor
score (similar to simple regression, but with multiple predictors):
Y’ = a + b1X1 + b2X2
• A high score on one test/procedure can compensate for a low score on
another test

Example:
We assess individuals on their GRE scores, GPA, Letters of Recommendation,
and Vita/Resume for graduate school admissions. Different weights are given based on
their validity with performance in graduate school.
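
A minimal Python sketch of the compensatory combination, with an assumed intercept and weights (not validated values):

```python
# Y' = a + b1*X1 + b2*X2, generalized to any number of predictors.
def predicted_success(a, weights, scores):
    return a + sum(b * x for b, x in zip(weights, scores))

a, weights = 0.5, [0.002, 0.6]  # assumed b1 for GRE score, b2 for GPA
print(round(predicted_success(a, weights, [320, 3.2]), 2))  # 3.06
print(round(predicted_success(a, weights, [300, 3.9]), 2))  # 3.44
```

Note how the two applicants end up with comparable predicted scores: a strong GRE offsets a weaker GPA, and vice versa.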

Evaluating Employee Performance


Effective and Legal Performance Appraisal Systems
