PsyAss Lecture Notes


PSYCHOLOGICAL ASSESSMENT NOTES

Prepared and Screened by:


Prof. Jose J. Pangngay, MS Psych, RPm

CHAPTER I: BRIEF HISTORY OF PSYCHOLOGICAL TESTING


A. Ancient Roots
• Chinese Civilization – testing was instituted as a means of selecting who, of the many applicants, would obtain government jobs
• Greek Civilization – tests were used to measure intelligence and physical skills
• European Universities – these universities relied on formal exams in conferring degrees and honors
B. Individual Differences
• Charles Darwin – believed that despite our similarities, no two humans are exactly alike. Some of these individual differences are more “adaptive” than others, and these differences lead to more complex, intelligent organisms over time.
• Francis Galton – established the testing movement; introduced anthropometric records of students; pioneered the application of the rating-scale
and questionnaire methods and the free-association technique; also pioneered the use of statistical methods for the analysis of psychological tests.
He used the Galton bar (visual discrimination of length) and the Galton whistle (determining the highest audible pitch). Moreover, he noted that
persons with mental retardation tend to have a diminished ability to discriminate among heat, cold, and pain.
C. Early Experimental Psychologists
• Johann Friedrich Herbart – proposed mathematical models of the mind; considered the father of pedagogy as an academic discipline; argued that psychology could never be an experimental science
• Ernst Heinrich Weber – sensory thresholds; just noticeable differences (JND)
• Gustav Theodor Fechner – mathematics of sensory thresholds of experience; founder of psychophysics; considered one of the founders of
experimental psychology; the Weber-Fechner Law was the first to relate sensation and stimulus
• Wilhelm Wundt – considered one of the founders of Psychology; first to set up a psychology laboratory
• Edward B. Titchener – studied under Wundt; brought Structuralism to America; his brain is still on display in the psychology department at Cornell
• Guy Montrose Whipple – pioneer of human ability testing; conducted seminars that changed the field of psychological testing
• Louis Leon Thurstone – major contributor to factor analysis; his approach to measurement was termed the law of comparative judgment
D. The Study of Mental Deficiency and Intelligence Testing
• Jean Esquirol – provided the first accurate description of mental retardation as an entity separate from insanity.
• Edouard Seguin – pioneered modern educational methods for teaching people who are mentally retarded/intellectually disabled
• James McKeen Cattell – an American psychologist who coined the term “mental test”
• Alfred Binet – the father of IQ testing
• Lewis M. Terman – revised the Binet-Simon scale into the Stanford-Binet and popularized the IQ, computed from mental age and chronological age (see the worked formula after this list)
IQ Classification according to the Stanford-Binet
Over 140 : Genius
120-140 : Very Superior
110-119 : Superior
90-109 : Average
80-89 : Dullness
70-79 : Borderline Deficiency
Under 70 : Feeble-mindedness
• Charles Spearman – introduced the two-factor theory of intelligence (General ability or “g” – required for performance on mental tests of all kinds;
and Special abilities or “s” – required for performance on mental test of only one kind)
• Thurstone – Primary Mental Abilities
• David Wechsler – Wechsler Intelligence Tests (WISC, WAIS)
• Raymond Cattell – introduced the components of “g” (Fluid “g” – the ability to see relationships, as in analogies and letter and number series, also known
as the primary reasoning ability, which decreases with age; and Crystallized “g” – acquired knowledge and skills, which increase with age)
• Guilford – theorized the Structure of Intellect model, a “many-factor” theory of intelligence
(6 types of operations X 5 types of contents X 6 types of products = 180 elementary abilities)
• Vernon and Carroll – introduced the hierarchical approach in “g”
• Sternberg – introduced the “3 g’s” (Academic g, Practical g, and Creative g)
• Howard Gardner – conceptualized the multiple intelligences theory
• Henry Goddard – translated the Binet-Simon test from French into English and introduced it to the United States
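
The ratio IQ adopted in the Stanford-Binet divides mental age (MA) by chronological age (CA) and multiplies by 100; a quick worked example of the standard formula:

\[
IQ = \frac{MA}{CA} \times 100, \qquad \text{e.g., } MA = 10,\ CA = 8 \;\Rightarrow\; IQ = \frac{10}{8} \times 100 = 125
\]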
E. World War I
• Robert Yerkes – pioneered the first group intelligence tests, known as the Army Alpha (for literate examinees) and the Army Beta (for functionally illiterate examinees)
• Arthur S. Otis – introduced multiple-choice and other “objective” item types
• Robert S. Woodworth – devised the Personal Data Sheet (known as the first personality test), which aimed to identify soldiers at risk for shell shock
F. Personality Testers
• Herman Rorschach – slow rise of projective testing; Rorschach Inkblot Test
• Henry Murray & Christina Morgan – Thematic Apperception Test
• Early 1940’s – structured tests were being developed on the strength of their better psychometric properties
• Raymond B. Cattell – 16 Personality Factors
• McCrae & Costa – Big 5 Personality Factors
G. Psychological Testing in the Philippines
• Virgilio Enriquez – Panukat ng Ugali at Pagkatao or PUP
• Aurora R. Palacio – Panukat ng Katalinuhang Pilipino or PKP
• Anadaisy Carlota – Panukat ng Pagkataong Pilipino or PPP
• Gregorio E.H. Del Pilar – Masaklaw na Panukat ng Loob or Mapa ng Loob

CHAPTER II: PSYCHOLOGICAL TESTING AND PSYCHOLOGICAL ASSESSMENT


A. Psychological Testing vs. Psychological Assessment
Objective
Testing: typically, to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.
Assessment: typically, to answer a referral question, solve a problem, or arrive at a decision through the use of tools of evaluation.

Focus
Testing: how one person or group compares with others (nomothetic).
Assessment: the uniqueness of a given individual, group, or situation (idiographic).

Process
Testing: may be individual or group in nature. After test administration, the tester will typically add up “the number of correct answers or the number of certain types of responses… with little if any regard for the how or mechanics of such content.”
Assessment: typically individualized. In contrast to testing, assessment more typically focuses on how an individual processes rather than simply the results of that processing.

Role of Evaluator
Testing: the tester is not the key to the process; practically speaking, one tester may be substituted for another tester without appreciably affecting the evaluation.
Assessment: the assessor is the key to the process of selecting tests and/or other tools of evaluation, as well as in drawing conclusions from the entire evaluation.

Skill of Evaluator
Testing: typically requires technician-like skills in administering and scoring a test as well as in interpreting a test result.
Assessment: typically requires an educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data.

Outcome
Testing: typically yields a test score or series of test scores.
Assessment: typically entails a logical problem-solving approach that brings to bear many sources of data designed to shed light on a referral question.

Duration
Testing: shorter, lasting from a few minutes to a few hours.
Assessment: longer, lasting from a few hours to a few days or more.

Sources of Data
Testing: one person, the test taker only.
Assessment: collateral sources, such as relatives or teachers, are often used in addition to the subject of the assessment.

Qualification for Use
Testing: knowledge of tests and testing procedures.
Assessment: knowledge of testing and other assessment methods, as well as of the specialty area assessed (psychiatric disorders, job requirements, etc.).

Cost
Testing: inexpensive, especially when group testing is done.
Assessment: very expensive; requires intensive use of highly qualified professionals.

B. Tools of Psychological Assessment


1. Tests – a measuring device or procedure used to describe the ability, knowledge, skills, or attitudes of an individual
• Measurement – the process of quantifying the amount or number of a particular occurrence of an event, situation, phenomenon, object, or
person
• Assessment – the process of synthesizing the results of measurement with reference to some norms and standards
• Evaluation – the process of judging the worth of any occurrence of an event, situation, phenomenon, object, or person, concluding with a
particular decision
2. Interviews – a tool of assessment in which information is gathered through direct, reciprocal communication. Has three types (structured,
unstructured and semi-structured).
3. Portfolio Assessment – work samples are used as an assessment tool
4. Case-History Data – records, transcripts, and other accounts in any media that preserve archival information, official and informal accounts, and
other data and items relevant to the assessee
5. Behavioral Observation – monitoring the actions of others or oneself by visual or electronic means while recording qualitative and/or quantitative
information regarding those actions, typically for diagnostic or related purposes and either to design an intervention or to measure the outcome of an
intervention.

C. Parties in Psychological Assessment


1. Test Authors and Developers – create tests or other methods of assessment
2. Test Publishers – they publish, market, and sell tests, thus controlling their distribution
3. Test Reviewers – they prepare evaluative critiques of tests based on their technical and practical merits
4. Test Users – professionals such as clinicians, counselors, school psychologists, human resource personnel, consumer psychologists, experimental
psychologists, social psychologists, etc. that use these tests for assessment
5. Test Sponsors – institutional boards or government agencies that contract test developers or publishers for various testing services
6. Test Takers – those who are taking the tests; those who are subject to assessment
7. Society at Large

D. Three-Tier System of Psychological Tests


1. Level A
– these tests can be administered, scored, and interpreted by responsible non-psychologists who have carefully read the manual
and are familiar with the overall purpose of testing. Educational achievement tests fall into this category.
– Examples: Achievement tests and other specialized (skill-based) aptitude tests
2. Level B
– these tests require technical knowledge of test construction and use, plus appropriate advanced coursework in psychology and related courses
– examples: Group intelligence tests and personality tests
3. Level C
– these tests require an advanced degree in Psychology or License as Psychologist and advanced training/supervised experience in a particular
test
– Examples: Projective tests, Individual Intelligence tests, Diagnostic tests
E. General Types of Psychological Tests According to Variable Measured
1. Ability Tests
- Assess what a person can do
- Includes Intelligence Tests, Achievement Tests and Aptitude Tests
- Best conditions are provided to elicit a person’s full capacity or maximum performance
- There are right and wrong answers
- Objective of motivation: for the examinee to do his best
2. Tests of Typical Performance
- Assess what a person usually does
- Includes personality tests, interest/attitude/values inventories
- Typical performance can still manifest itself even in conditions not deemed as best
- There are no right or wrong answers
- Objective of motivation: for the examinee to answer questions honestly

F. Specific Types of Psychological Tests


1. Intelligence Test
– measures general potential
– Assumption: makes fewer assumptions about specific prior learning experiences
– Validation process: Content Validity and Construct Validity
– examples: WAIS, WISC, CFIT, RPM
2. Aptitude Test
- Measures an individual’s potential for learning a specific task, ability or skill
- Assumption: No assumptions about specific prior learning experiences
- Validation process: Content validity and Predictive Validity
- Examples: DAT, SATT
3. Achievement Test
- This test provides a measure for the amount, rate and level of learning, success or accomplishment, strengths/weaknesses in a particular
subject or task
- Assumption: Assumes prior relatively standardized educational learning experiences
- Validation process: Content validity
- Example: National Achievement Test
4. Personality Test
- measures traits, qualities, attitudes or behaviors that determine a person’s individuality
- can measure overt or covert dispositions and levels of adjustment as well
- can be measured idiographically (unique characteristics) or nomothetically (common characteristics)
- construction strategies: theory-guided inventories, factor-analytically derived inventories, criterion-keyed inventories
- examples: NEOPI, 16PF, MBTI, MMPI
5. Interest Inventory
- Measures an individual’s preferences for certain activities or topics, thereby helping determine occupational choice or career decisions
- Measure the direction and strength of interest
- Assumption: interests, though somewhat unstable, must have a certain stability, or else they could not be measured
- Stability is said to start at 17 years old
- Broad lines of interest are more stable.
- Specific lines of interest are less stable; they can change a lot.
- Example: CII
6. Attitude Inventory
- Direct observation of how a person behaves in relation to certain things
- Projective techniques
- Attitude questionnaires or scales (Bogardus Social Distance Scale, 1925)
- Reliabilities are good but not as high as those of tests of ability
- Attitude measures have not generally correlated very highly with actual behavior
- Specific behaviors, however, can be predicted from measures of attitude toward the specific behavior
7. Values Inventory
- Purports to measure generalized and dominant interests
- Validity is extremely difficult to determine by statistical methods
- The only observable criterion is overt behavior
- Employed less frequently than interest in vocational counseling and career decision-making
8. Trade Test
- This test determines skills, special abilities that make an individual fit for the job
9. Diagnostic Test
- This test can uncover and focus attention on weaknesses of individuals for remedial purposes
10. Power Test
- Requires an examinee to exhibit the extent or depth of his understanding or skill
- Contains items with varying levels of difficulty
11. Speed Test
- Requires the examinee to complete as many items as possible
- Contains items of uniform and generally simple level of difficulty
12. Creativity Test
- A test which assesses an individual’s ability to produce new/original ideas, insights, or artistic creations that are accepted as being of social,
aesthetic, or scientific value
- Can assess the person’s capacity to find unusual or unexpected solutions for vaguely defined problems
13. Neuropsychological Test
- Measures cognitive, sensory, perceptual and motor performance to determine the extent, locus and behavioral consequences of brain damage,
given to persons with known or suspected brain dysfunction
- Example: Bender-Gestalt II
14. Objective Test
- Standardized test
- Administered individually or in groups
- Objectively scored
- There is a limited number of response options
- Uses norms
- There is a high level of reliability and validity
- Examples: Personality Inventories, Group Intelligence Test
15. Projective Test
- Test with ambiguous stimuli which measures wishes, intrapsychic conflicts, dreams and unconscious motives
- Administered individually
- Scored subjectively
- With low levels of reliability and validity
- Examples: Rorschach Inkblot Test, TAT, HTP, SSCT, DAP
16. Norm-Referenced Test – raw scores are converted to standard scores
17. Criterion-Referenced Test – raw scores are interpreted against a cut-off score or performance standard

G. Psychological Tests are used in the following settings:


1. Educational Settings
- Basis for admission and placement to an academic institution
- Identify developmental problems or exceptionalities for which a student may need special assistance
- Assist students in educational or vocational planning
2. Clinical Settings
- For diagnosis and treatment planning
3. Counseling Settings
- Counseling in schools, prisons, government or private institutions
4. Geriatric Settings
- Assessment for the aged
5. Business Settings
- Selection of employees; classification of individuals into positions suited for them
- Basis for promotion
6. Military Settings
- For proper selection of military recruits
- For placement in the military duties
7. Government and Organizational Credentialing
- For promotional purposes
- For licensing, certification or general credentialing of professionals
8. Courts
- Evaluate the mental health of people charged with a crime
- Investigating malingering cases in courts
- Making child custody/annulment/divorce decisions
9. Academic Research Settings

H. Uses of Psychological Test


1. Classification – assigning a person to one category rather than the other
a. Placement – refers to sorting of persons into different programs appropriate to their needs/skills (example: a university mathematics placement
exam is given to students to determine if they should enrol in calculus, in algebra or in a remedial course)
b. Screening – refers to quick and simple tests/procedures to identify persons who might have special characteristics or needs (example:
identifying children with exceptional thinking ability, where the top 10% are singled out for more comprehensive testing)
c. Certification – determining whether a person has at least the minimum proficiency in some discipline/activity (example: right to practice
medicine after passing the medical board exam; right to drive a car)
d. Selection – example: provision of an opportunity to attend a university; opportunity to gain employment in a company or in a government
2. Diagnosis and Treatment Planning – diagnosis conveys information about strengths, weaknesses, etiology and best choices for treatment (example:
IQ tests are absolutely essential in diagnosing intellectual disability)
3. Self-Knowledge – psychological tests also supply a potent source of self-knowledge and in some cases, the feedback a person receives from
psychological tests is so self-affirming that it can change the entire course of a person’s life.
4. Program Evaluation – another use of psychological tests is the systematic evaluation of educational and social programs (they are designed to
provide services which improve social conditions and community life)
5. Research – psychological tests also play a major role in both the applied and theoretical branches of behavioral research

I. Objectives of Psychometrics
1. To measure behavior (overt and covert)
2. To describe and predict behavior and personality (traits, states, personality types, attitudes, interests, values, etc.)
3. To determine signs and symptoms of dysfunctionality (for case formulation, diagnosis, and basis for intervention/plan for action)

J. Assumptions about Psychological Testing and Assessment


1. Psychological traits and states exist.
• Trait - characteristic behaviors and feelings that are consistent and long lasting.
• State -temporary behaviors or feelings that depend on a person's situation and motives at a particular time
2. Psychological traits and states can be quantified and measured.
3. Test-related behavior predicts non-test-related behavior.
• Postdict – to estimate or suppose something that took place in the past; to conjecture something that occurred beforehand
• Predict - say or estimate that (a specified thing) will happen in the future or will be a consequence of something
4. Tests and other measurement techniques have strengths and weaknesses.
5. Various sources of error are part of the assessment process.
• Error – the long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
• Error variance – the component of test score attributable to sources other than the trait or ability being measured
6. Testing and assessment can be conducted in a fair and unbiased manner.
7. Testing and assessment benefit society.

K. Cross-Cultural Testing
1. Parameters where cultures vary
- Language
- Test Content
- Education
- Speed (Tempo of Life)
2. Culture Free Tests
- An attempt to eliminate culture so that nature can be isolated
- Impossible to develop, because culture influences an individual from birth onward
- The interaction between nature and nurture is cumulative and cannot be disentangled
3. Culture Fair Tests
- These tests were developed because of the failure of culture-free tests
- Nurture is not removed, but the parameters are common and fair to all
- Can be done using three approaches such as follows:
✓ Fair to all cultures
✓ Fair to some cultures
✓ Fair only to one culture

CHAPTER III: RESEARCH AND STATISTICS REFRESHER


A. Research Method
Purpose
Qualitative: to gain an understanding of underlying reasons and motivations; to provide insights into the setting of a problem, generating ideas and/or hypotheses for later quantitative research; to uncover prevalent trends in thought and opinion; to explore causality.
Quantitative: to quantify data and generalize results from a sample to the population of interest; to measure the incidence of various views and opinions in a chosen sample; to suggest causality. Sometimes followed by qualitative research, which is used to explore some findings further.

Philosophical Assumptions
Qualitative: post-positivist perspective; naturalistic; social, multiple, and subjective reality where the researcher interacts with that being researched.
Quantitative: positivist perspective; objective reality; the researcher is independent of that which is researched.

Research Method
Qualitative: phenomenology, case study, ethnography, grounded theory, cultural studies.
Quantitative: experimental, quasi-experimental, single subject, comparative, correlational.

Time Element
Qualitative: conducted when time is not limited, because of the extensive interviewing.
Quantitative: most suitable when time and resources are limited.

Research Problem & Hypotheses/Assumptions
Qualitative: the question is evolving, general, and flexible; hypotheses are being generated.
Quantitative: the hypothesis is an informed guess or prediction; hypotheses are being tested.

Sample
Qualitative: usually a small number of non-representative cases; respondents selected to fulfill a given quota; sampling depends on what needs to be learned; more focused geographically; a control group is not required.
Quantitative: usually a large number of cases representing the population of interest; randomly selected respondents; sampling focus is on probability and “representativeness”; more dispersed geographically; a control group or comparison group is necessary to determine the impact.

Data Collection
Qualitative: unstructured or semi-structured techniques, e.g., individual depth interviews or group discussions.
Quantitative: structured techniques such as online questionnaires and standardized tests.

Data Analysis
Qualitative: non-statistical analysis.
Quantitative: statistical analysis.

Outcome
Qualitative: exploratory and/or investigative; findings are not conclusive and cannot be used to make generalizations about the population of interest.
Quantitative: used to recommend a final course of action.

B. Research Designs
Descriptive-Qualitative (Case Study/Ethnography)
▪ Detailed descriptions of specific situation(s) using interviews, observations, and document review.
▪ The researcher’s task is to describe things as they are.
Descriptive-Quantitative
▪ Numerical descriptions (frequency, average) of specific situations.
▪ The researcher’s task is to measure things as they are.
Correlational/Regression Analysis
▪ Quantitative analyses of the strength of relationships between two or more variables.
Quasi-Experimental Research
▪ Comparing a group that receives a particular intervention with another group that is similar in characteristics but did not receive the intervention.
▪ There is no random assignment used.
Experimental Research
▪ Using random assignment to assign participants to an experimental or treatment group and a control or comparison group.
Meta-analysis
▪ Synthesis of results from multiple studies to determine the average impact of a similar intervention across the studies.

C. Scales of Measurement
1. Primary Scales of Measurement
a. Nominal: a non-parametric measure, also called a categorical variable; simple classification. We do not need to count to distinguish one
item from another.
Example: Sex (Male and Female); Nationality (Filipino, Japanese, Korean); Color (Blue, Red and Yellow)
b. Ordinal: a non-parametric scale wherein cases are ranked or ordered; they represent position in a group where the order matters but not the
difference between the values.
Example: 1st, 2nd, 3rd, 4th and 5th; Pain threshold in a scale of 1 – 10, 10 being the highest
c. Interval: a parametric scale with equal intervals of measurement, where the difference between two values is meaningful; values have a
fixed unit and magnitude but no true zero point.
Example: Temperature (Fahrenheit and Celsius only)
d. Ratio: a parametric scale similar to the interval scale but with a true zero point, so that relative proportions on the scale make sense. (A sketch of the statistics each scale level permits follows this list.)
Example: Height and Weight; Speed of a car (70 KpH)
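
As a quick reference tying the four levels together, here is a minimal Python sketch of the statistics conventionally considered permissible at each scale level; the mapping follows the definitions above and the usual treatment of Stevens' scale levels, and the dictionary and function names are illustrative only.

```python
# Permissible statistics by scale of measurement (a study-aid sketch,
# following the conventional treatment of Stevens' scale levels)
PERMISSIBLE_STATS = {
    "nominal":  ["mode", "frequency counts", "chi-square"],
    "ordinal":  ["mode", "median", "percentiles", "Spearman rho"],
    "interval": ["mode", "median", "mean", "SD", "Pearson r"],
    "ratio":    ["all interval statistics", "ratios and geometric mean"],
}

def stats_for(scale: str) -> list[str]:
    """Return the statistics conventionally permitted at a given scale level."""
    return PERMISSIBLE_STATS[scale.lower()]

print(stats_for("ordinal"))   # ['mode', 'median', 'percentiles', 'Spearman rho']
```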

2. Comparative Scales of Measurement


a. Paired Comparison: a comparative technique in which a respondent is presented with two objects at a time and asked to select one object
according to some criterion. The data obtained are ordinal in nature.
Example: Pairing the different brands of cold drinks with one another, please put a check mark in the box corresponding to your preference
(a check in a brand’s column means that brand was preferred over the brand in that row).

Brand                    Coke   Pepsi   Sprite   Limca
Coke                      –
Pepsi                     ✓               ✓
Sprite                    ✓
Limca                     ✓      ✓        ✓
No. of Times Preferred    3      1        2        0
b. Rank Order: respondents are presented with several items simultaneously and asked to rank them in order of priority. This is an ordinal scale
that describes the favoured and unfavoured objects, but does not reveal the distance between the objects. The resultant data in rank order is
ordinal data. This yields a better result when comparisons are required between the given objects. The major disadvantage of this technique is
that only ordinal data can be generated.
Example: Rank the following brands of cold drinks you like most and assign it a number 1. Then find the second most preferred brand and
assign it a number 2. Continue this procedure until you have ranked all the brands of cold drinks in order of preference. Also remember that no
two brands should receive the same rank order.
Brand Rank
Coke 1
Pepsi 3
Sprite 2
Limca 4
c. Constant Sum: respondents are asked to allocate a constant sum of units, such as points, rupees, or chips, among a set of stimulus objects with
respect to some criterion. For example, you may wish to determine how important the attributes of price, fragrance, packaging, cleaning power,
and lather of a detergent are to consumers. Respondents might be asked to divide a constant sum to indicate the relative importance of the
attributes. The advantage of this technique is that it saves time. However, its main disadvantages are that respondents may allocate more or fewer
points than those specified, and that respondents might be confused.
Example: Among the attributes of a detergent, please allocate 100 points so that your allocation reflects the relative
importance you attach to each attribute. The more points an attribute receives, the more important the attribute is. If an attribute is not at all
important, assign it zero points. If an attribute is twice as important as some other attribute, it should receive twice as many points.
Attribute Number of Points
Price 50
Fragrance 05
Packaging 10
Cleaning power 30
Lather 05
Total Points 100
d. Q-Sort Technique: a comparative scale that uses a rank-order procedure to sort objects based on similarity with respect to some
criterion. The important characteristic of this methodology is that comparisons among the different responses of a single respondent matter
more than comparisons between different respondents. It is therefore a comparative method of scaling rather than an absolute rating
scale. In this method the respondent is given a large number of statements describing the characteristics of a product, or a large number of
brands of a product.
Example: The bag given to you contains pictures of 90 magazines. Please choose the 10 magazines you prefer most, 20 magazines you like, 30
magazines toward which you are neutral (neither like nor dislike), 20 magazines you dislike, and 10 magazines you prefer least.
Prefer Most (10) | Like (20) | Neutral (30) | Dislike (20) | Prefer Least (10)
3. Non-Comparative Scales of Measurement
a. Continuous Rating Scales: respondents rate the objects by placing a mark at the appropriate position on a continuous line that runs from
one extreme of the criterion variable to the other.
Example: How would you rate the TV advertisement as a guide for buying?
Strongly Agree Strongly Disagree
10 9 8 7 6 5 4 3 2 1
b. Itemized Rating Scale: a scale having numbers or brief descriptions associated with each category. The categories are
ordered in terms of scale position, and respondents are required to select one of the limited number of categories that best describes the
product, brand, company, or product attribute being rated. Itemized rating scales are widely used in marketing research. They can take
graphic, verbal, or numerical form.
c. Likert Scale: respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements
that range from very positive to very negative toward the attitudinal object. Respondents generally choose from five alternatives (say, strongly
agree, agree, neither agree nor disagree, disagree, strongly disagree). A Likert scale may include a number of items or statements.
A disadvantage of the Likert scale is that it takes longer to complete than other itemized rating scales because respondents have to read each
statement. Despite this disadvantage, the scale has several advantages: it is easy to construct, administer, and use.
Example: I believe that ecological questions are the most important issues facing human beings today.
1 2 3 4 5
Strongly Disagree Disagree Neutral Agree Strongly Agree
d. Semantic Differential Scale: This is a seven-point rating scale with end points associated with bipolar labels (such as good and bad, complex
and simple) that have semantic meaning. It can be used to find whether a respondent has a positive or negative attitude towards an object. It
has been widely used in comparing brands and company images. It has also been used to develop advertising and promotion strategies and in
a new product development study.
Example: Please indicate your attitude towards work using the scale below:
Attitude towards work
Boring : : : : : : : Interesting
Unnecessary : : : : : : : Necessary
e. Stapel Scale: the Stapel scale was originally developed to measure the direction and intensity of an attitude simultaneously. Modern versions
of the Stapel scale place a single adjective as a substitute for the semantic differential when it is difficult to create pairs of bipolar adjectives.
The modified Stapel scale places a single adjective in the center of an even number of numerical values.
Example: Select a plus number for words that you think describe personnel banking of a bank accurately. The more accurately you think the
word describes the bank, the larger the plus number you should choose. Select a minus number for words you think do not describe the bank
accurately. The less accurate you think the word describes the bank, the larger the minus number you should choose.
Friendly Personnel:       +5 +4 +3 +2 +1 | -1 -2 -3 -4 -5
Competitive Loan Rates:   +5 +4 +3 +2 +1 | -1 -2 -3 -4 -5

D. Descriptive Statistics
1. Frequency Distributions – distribution of scores by frequency with which they occur
2. Measures of Central Tendency – a statistic that indicates the average or midmost score between the extreme scores in a distribution
a. Mean – formula: X̄ = ΣX / N (for ungrouped distributions); X̄ = Σ(fX) / N (for grouped distributions)
b. Median – the middle score in a distribution
c. Mode – the most frequently occurring score in a distribution
3. Measures of Variability – a statistic that describes the amount of variation in a distribution
a. Range – the difference between the highest and the lowest scores
b. Interquartile range – the difference between Q3 and Q1
c. Semi-Interquartile range – interquartile range divided by 2
d. Standard Deviation – the square root of the averaged squared deviations about the mean
4. Measures of Location
a. Percentiles – an expression of the percentage of people whose score on a test or measure falls below a particular raw score
b. Quartiles – one of the three dividing points between the four quarters of a distribution, each typically labelled Q1, Q2 and Q3
c. Deciles – the nine dividing points that split a distribution into 10 equal parts
5. Skewness - a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean

a. Positive skew – relatively few scores fall at the positive (high) end, so the tail extends to the right
b. Negative skew – relatively few scores fall at the negative (low) end, so the tail extends to the left
6. Kurtosis - the sharpness of the peak of a frequency-distribution curve (a computational sketch of these descriptive statistics follows this list).
a. Platykurtic – relatively flat in its center
b. Leptokurtic – relatively peaked in its center
c. Mesokurtic – neither extremely peaked nor flat in its center
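
A minimal computational sketch of the descriptive statistics defined above, using Python's statistics module and scipy.stats (both assumed available); the score list is hypothetical.

```python
import statistics
from scipy import stats

scores = [12, 15, 15, 16, 18, 19, 21, 24, 30]    # hypothetical raw scores

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)                # middle score
mode = statistics.mode(scores)                    # most frequent score

# Variability
score_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)    # quartile cut points
iqr = q3 - q1                                     # interquartile range
semi_iqr = iqr / 2                                # semi-interquartile range
sd = statistics.pstdev(scores)                    # sqrt of averaged squared deviations

# Shape
skewness = stats.skew(scores)                     # > 0 means the tail is to the right
excess_kurtosis = stats.kurtosis(scores)          # 0 = mesokurtic reference
```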

E. The Normal Curve and Standard Scores

1. “z” Scores – Mean of 0, SD of 1 (Formula: z = (X − X̄) / SD)
2. T scores – Mean of 50, SD of 10 (Formula: z-score X 10 + 50)
3. Stanines – Mean of 5, SD of 2 (Formula: z-score X 2 + 5)
4. Sten – Mean of 5.5, SD of 2 (Formula: z-score X 2 + 5.5)
5. IQ scores – Mean of 100, SD of 15
6. A scores – Mean of 500, SD of 100
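
A short sketch converting one raw score into the standard scores listed above; the raw score and test norms are made-up values, and clamping stanines and stens to their 1-9 and 1-10 ranges reflects common practice.

```python
raw, test_mean, test_sd = 62, 50, 8            # hypothetical raw score and test norms

z = (raw - test_mean) / test_sd                # z score: mean 0, SD 1
t_score = 10 * z + 50                          # T score: mean 50, SD 10
stanine = min(9, max(1, round(2 * z + 5)))     # stanine: mean 5, SD 2, clamped to 1..9
sten = min(10, max(1, round(2 * z + 5.5)))     # sten: mean 5.5, SD 2, clamped to 1..10
deviation_iq = 15 * z + 100                    # deviation IQ: mean 100, SD 15

print(z, t_score, stanine, sten, deviation_iq)
```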

F. Inferential Statistics
1. Parametric vs. Non-Parametric Tests
Requirements
Parametric: normal distribution; homogeneous variance; interval or ratio data
Non-Parametric: normal distribution not required; homogeneous variance not required; nominal or ordinal data
Common Statistical Tools (parametric : non-parametric counterpart)
• Pearson’s correlation : Spearman’s correlation
• Independent-measures t-test : Mann-Whitney U test
• One-way, independent-measures ANOVA : Kruskal-Wallis H test
• Paired t-test : Wilcoxon signed-rank test
• One-way, repeated-measures ANOVA : Friedman’s test

2. Measures of Correlation
a. Pearson’s Product Moment Correlation – parametric test for interval data
b. Spearman Rho’s Correlation – non-parametric test for ordinal data
c. Kendall’s Coefficient of Concordance – non-parametric test for ordinal data
d. Biserial Correlation – for one artificially dichotomized variable and one continuous variable
e. Point-Biserial Correlation – for one true dichotomous variable and one continuous variable
f. Phi Coefficient – non-parametric test for dichotomous nominal data
g. Lambda – non-parametric measure of association for two nominal variables (one dependent, one independent)
3. Chi-Square Test
a. Goodness of Fit – used to measure differences and involves nominal data and only one variable with 2 or more categories
b. Test of Independence – used to measure correlation and involves nominal data and two variables with two or more categories
4. Comparison of Two Groups
a. Paired t-test – a parametric test for paired groups with normal distribution
b. Unpaired t-test – a parametric test for unpaired groups with normal distribution
c. Wilcoxon Signed-Rank Test – a non-parametric test for paired groups with non-normal distribution
d. Mann-Whitney U test – a non-parametric test for unpaired groups with non-normal distribution
5. Comparison of Three or More Groups
a. Repeated measures ANOVA – a parametric test for matched groups with normal distribution
b. One-way/Two-Way ANOVA – a parametric test for unmatched groups with normal distribution
c. Friedman test – a non-parametric test for matched groups with non-normal distribution
d. Kruskal-Wallis H test – a non-parametric test for unmatched groups with non-normal distribution
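
To make the parametric/non-parametric pairings concrete, here is a minimal sketch using scipy.stats; the data are hypothetical, and each parametric call is shown next to its non-parametric counterpart from the lists above.

```python
from scipy import stats

group_a = [12, 14, 15, 18, 20, 22]    # hypothetical interval-scale scores
group_b = [10, 11, 13, 13, 16, 19]

# Correlation: Pearson for interval/ratio data, Spearman for ranked data
r, p_r = stats.pearsonr(group_a, group_b)
rho, p_rho = stats.spearmanr(group_a, group_b)

# Two unpaired groups: independent t-test vs. Mann-Whitney U
t, p_t = stats.ttest_ind(group_a, group_b)
u, p_u = stats.mannwhitneyu(group_a, group_b)

# Two paired groups: paired t-test vs. Wilcoxon signed-rank
t_rel, p_rel = stats.ttest_rel(group_a, group_b)
w, p_w = stats.wilcoxon(group_a, group_b)
```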

CHAPTER IV: PSYCHOMETRIC PROPERTIES OF A GOOD TEST


A. Reliability – the stability or consistency of the measurement
1. Goals of Reliability
a. Estimate errors in psychological measurement
b. Devise techniques to improve testing so errors are reduced
2. Sources of Measurement Error
Source of Error → Type of Test Prone to It → Measure Used to Estimate It
• Inter-scorer differences and interpretation → tests scored with a degree of subjectivity → scorer reliability
• Time sampling error → tests of relatively stable traits or behavior → test-retest reliability (rtt), a.k.a. stability coefficient
• Content sampling error → tests for which consistency of results, as a whole, is required → alternate-form reliability (a.k.a. coefficient of equivalence) or split-half reliability (a.k.a. coefficient of internal consistency)
• Inter-item inconsistency → tests that require inter-item consistency → split-half reliability or more stringent internal-consistency measures, such as KR-20 or Cronbach’s alpha
• Inter-item inconsistency and content heterogeneity combined → tests that require inter-item consistency and homogeneity → internal-consistency measures plus additional evidence of homogeneity
• Time and content sampling error combined → tests that require stability and consistency of results, as a whole → delayed alternate-form reliability
3. Types of Reliability
a. Test-Retest Reliability
– compare the scores of individual who have been measured twice by the instrument
– this is not applicable for tests involving reasoning and ingenuity
– a longer interval will result in a lower correlation coefficient, while a shorter interval will result in a higher correlation
– the ideal time interval for test-retest reliability is 2-4 weeks
– source of error variance is time sampling
– utilizes Pearson r or Spearman rho
b. Parallel-Forms/Alternate Forms Reliability
– same persons are tested with one form on the first occasion and with another equivalent form on the second
– the administration of the second, equivalent form either takes place immediately or fairly soon.
– the two forms should be truly parallel: independently constructed tests designed to meet the same specifications, containing the same
number of items, with items expressed in the same form, covering the same type of content, with the same range of difficulty, and with the
same instructions, time limits, illustrative examples, format, and all other aspects of the test
– has the most universal applicability
– for immediate alternate forms, the source of error variance is content sampling
– for delayed alternate forms, the source of error variance is time sampling and content sampling
– utilizes Pearson r or Spearman rho
c. Split-Half Reliability
– Two scores are obtained for each person by dividing the test into equivalent halves (odd-even split or top-bottom split)
– The reliability of the test is directly related to the length of the test
– The source of error variance is content sampling
– Utilizes the Spearman-Brown formula to step the half-test correlation up to full length (see the sketch after these reliability types)
d. Other Measures of Internal Consistency/Inter-Item Reliability – source of error variance is content sampling and content heterogeneity
• KR-20 – for dichotomous items with varying level of difficulty
• KR-21 – for dichotomous items with uniform level of difficulty
• Cronbach Alpha/Coefficient Alpha – for non-dichotomous items (likert or other multiple choice)
• Average Proportional Distance – focuses on the degree of difference that exists between item scores.
e. Inter-Rater/Inter-Observer Reliability
– Degree of agreement between raters on a measure
– Source of error variance is inter-scorer differences
– Utilizes Cohen’s Kappa statistic, Pearson r or Spearman rho
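
A minimal sketch of two of the estimates above, split-half reliability with the Spearman-Brown correction and coefficient (Cronbach's) alpha, run on made-up dichotomous item data; this illustrates the formulas only and is not a scoring routine from any published test.

```python
from statistics import pvariance
from scipy.stats import pearsonr

items = [            # rows = examinees, columns = item scores (hypothetical)
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]

# Split-half: odd-even halves, then the Spearman-Brown correction
odd = [sum(row[0::2]) for row in items]
even = [sum(row[1::2]) for row in items]
r_half, _ = pearsonr(odd, even)
r_split = 2 * r_half / (1 + r_half)          # Spearman-Brown formula

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)
k = len(items[0])
item_vars = [pvariance([row[i] for row in items]) for i in range(k)]
total_var = pvariance([sum(row) for row in items])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```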
4. Standard Error of Measurement
– an index of the amount of inconsistency, or the amount of expected error, in an individual’s score
– the higher the reliability of the test, the lower the SEM (see the worked formulas after this list)
• Confidence Interval – a range or band of test scores that is likely to contain the true score
• Standard error of the difference – a statistical measure that can aid a test user in determining how large a difference should be
before it is considered statistically significant
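
The usual formulas behind these indices, with a worked example under assumed values (SD = 15, rxx = .91):

\[
SEM = SD\sqrt{1 - r_{xx}} = 15\sqrt{1 - .91} = 4.5
\]
\[
95\%\ \text{CI} \approx X \pm 1.96 \times SEM, \qquad SE_{diff} = \sqrt{SEM_1^2 + SEM_2^2}
\]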
5. Factors Affecting Test Reliability
a. Test Format
b. Test Difficulty
c. Test Objectivity
d. Test Administration
e. Test Scoring
f. Test Economy
g. Test Adequacy
B. Validity – a judgment or estimate of how well a test measures what it purports to measure in a particular context
1. Types of Validity
a. Face Validity – the least stringent type of validity; whether a test looks valid to test users, examiners, and examinees
b. Content Validity – whether the test covers the behavior domain to be measured which is built through the choice of appropriate content areas,
questions, tasks and items
c. Criterion-Related Validity – indicates the test effectiveness in estimating an individual’s behavior in a particular situation
• Concurrent Validity – the extent to which test scores may be used to estimate an individual’s present standing on a criterion
• Predictive Validity – the extent to which scores on a test predict future behavior or scores on another test taken in the future
• Incremental Validity – this type of validity is related to predictive validity wherein it is defined as the degree to which an additional predictor
explains something about the criterion measure that is not explained by predictors already in use
d. Construct Validity
– A test designed to measure a construct must estimate the existence of an inferred, underlying characteristic based on a limited sample of
behavior
– this can be established through any of the following:
• Convergent and Discriminant Validation
✓ Convergent Validity – a test correlates highly with other variables with which it should correlate (example: Extraversion, which is
highly correlated with sociability)
✓ Divergent/Discriminant Validity – a test does not correlate significantly with variables from which it should differ (example: Optimism, which is
negatively correlated with Pessimism)
• Factor Analysis – a statistical technique for analyzing the interrelationships among behavioral data
✓ Principal Components Analysis – a method of data reduction
✓ Common Factor Analysis – assumes the factor underlies and predicts scores on the items; classified into two types
(Exploratory Factor Analysis, for summarizing data, and Confirmatory Factor Analysis, for testing whether a hypothesized factor structure generalizes)
2. Test Bias
- This is a factor inherent in a test that systematically prevents accurate, impartial measurement
• Rating Error – a judgment resulting from the intentional or unintentional misuse of rating scales
o Severity Error/Strictness Error – less than accurate rating or error in evaluation due to the rater’s tendency to be overly critical
o Leniency Error/Generosity Error – a rating error that occurs as a result of a rater’s tendency to be too forgiving and
insufficiently critical
o Central Tendency Error – a type of rating error wherein the rater exhibits a general reluctance to issue ratings at either a
positive or negative extreme and so all or most ratings cluster in the middle of the rating continuum
o Proximity Error – rating error committed due to proximity/similarity of the traits being rated
o Primacy Effect – “first impression” affects the rating
o Contrast Effect – the prior subject of assessment affects the latter subject of assessment
o Recency Effect – tendency to rate a person based from recent recollections about that person
o Halo Effect – a type of rating error wherein the rater views the object of the rating with extreme favour and tends to bestow
ratings inflated in a positive direction
o Impression Management
o Acquiescence
o Non-acquiescence
o Faking-Good
o Faking-Bad
3. Test Fairness
- This is the extent to which a test is used in an impartial, just and equitable way
4. Factors Influencing Test Validity
a. Appropriateness of the test
b. Directions/Instructions
c. Reading Comprehension Level
d. Item Difficulty
e. Test Construction factors
f. Length of Test
g. Arrangement of Items
h. Patterns of Answer
C. Norms – designed as reference for evaluating or interpreting individual test scores
1. Types of Norms
a. Developmental Norms
- Mental Age
* Basal Age
* Ceiling Age
* Partial Credits
- Intelligence Quotient
- Grade Equivalent Norms
- Ordinal Scales
b. Within Group Norms
- Percentiles (a computational sketch follows this list of norm types)
- Standard Scores
c. Relativity Norms
- National Norms
- Co-norms
- Local Norms
- Subgroup Norms
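
A minimal sketch of how a within-group percentile norm is applied; the norm sample and raw score are hypothetical, and the computation follows the "percentage falling below" definition given under Measures of Location.

```python
# Percentile rank: percentage of the norm group scoring below a given raw score
norm_sample = sorted([35, 41, 44, 47, 50, 52, 55, 58, 61, 66])   # hypothetical norms

def percentile_rank(raw_score: float) -> float:
    below = sum(score < raw_score for score in norm_sample)
    return 100 * below / len(norm_sample)

print(percentile_rank(55))   # 60.0: the examinee scored above 60% of the norm group
```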

CHAPTER V: TEST DEVELOPMENT


A. Standardization
1. Test Administration Procedure
- There should be uniformity in the instructions and in the testing conditions. Test administration includes carefully following standard procedures
so that the test is used in the manner specified by the test developers. The test administrator should ensure that test takers work within
conditions that maximize their opportunity for optimum performance. As appropriate, test takers, parents, and organizations should be involved in
the various aspects of the testing process.
2. Scoring
- There should be a consistent mechanism and procedure in scoring. Accurate measurement necessitates adequate procedures for scoring the
responses of test takers. Scoring procedures should be audited as necessary to ensure consistency and accuracy of application.
3. Interpretation
- There should be common interpretations among similar results. Many factors can impact the valid and useful interpretations of test scores.
These can be grouped into several categories including psychometric, test taker, and contextual, as well as others.
a. Psychometric Factors: Factors such as the reliability, norms, standard error of measurement, and validity of the instrument are important
when interpreting test results. Responsible test use considers these basic concepts and how each impacts the scores and hence the
interpretation of the test results.
b. Test Taker Factors: factors such as the test taker’s group membership, and how that membership may impact the results of the test, are
critical in the interpretation of test results. Specifically, the test user should evaluate how the test taker’s gender, age, ethnicity,
race, socioeconomic status, marital status, and so forth impact the individual’s results.
c. Contextual Factors: The relationship of the test to the instructional program, opportunity to learn, quality of the educational program, work
and home environment, and other factors that would assist in understanding the test results are useful in interpreting test results. For
example, if the test does not align to curriculum standards and how those standards are taught in the classroom, the test results may not
provide useful information.
B. Objectivity
1. Time-Limit Tasks – every examinee gets the same amount of time for a given task
2. Work-Limit Tasks – every examinee has to perform the same amount of work
3. Issue of Guessing
C. Stages in Test Development
1. Test Conceptualization
2. Test Construction
3. Test Tryout
4. Item Analysis
a. Item Difficulty Index
- 0.00-0.20 : Very Difficult : Unacceptable
- 0.21-0.40 : Difficult : Acceptable
- 0.41-0.60 : Moderate : Highly Acceptable
- 0.61-0.80 : Easy : Acceptable
- 0.81-1.00 : Very Easy : Unacceptable
b. Item-Reliability Index – the higher the index, the greater the test’s internal consistency
c. Item-Validity Index – the higher the index, the greater the test’s criterion-related validity
d. Item Discrimination Index or Item-Total Correlation
- The item discrimination index is used for dichotomous items, while the item-total correlation is used for alternative formats
- Statistical tools used are the Pearson r and the point-biserial correlation
- The acceptable index is 0.30 and above (see the sketch at the end of this section)
5. Test Revision
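
A minimal sketch of the two item statistics above, run on made-up dichotomous responses; the point-biserial shown is the uncorrected item-total correlation (the item's own score is included in the total), which slightly inflates the index in real use.

```python
from scipy.stats import pointbiserialr

responses = [        # rows = examinees, columns = items; 1 = correct (hypothetical)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

totals = [sum(row) for row in responses]

for i in range(len(responses[0])):
    item = [row[i] for row in responses]
    p = sum(item) / len(item)                  # difficulty: proportion answering correctly
    r_pb, _ = pointbiserialr(item, totals)     # discrimination: item-total correlation
    print(f"item {i + 1}: difficulty = {p:.2f}, discrimination = {r_pb:.2f}")
```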

CHAPTER VI: ETHICAL STANDARDS IN PSYCHOLOGICAL ASSESSMENT


A. Responsibilities of Test Publishers
1. The publisher is expected to release tests of high quality
2. The publisher is expected to market its products in a responsible manner
3. The publisher is expected to restrict distribution of tests only to persons with proper qualifications
B. Publication and Marketing Issues
1. The most important guideline is to guard against premature release of a test
2. The test authors should strive for a balanced presentation of their instruments and refrain from one-sided presentation of information
C. Competence of Test Purchasers
D. Responsibilities of Test Users
1. Best interest of clients
2. Informed Consent
3. Duty to Warn
4. Confidentiality
5. Expertise of Test Users
6. Obsolete Tests and The Standard of Care
7. Responsible Report Writing
8. Communication of Test Results
9. Consideration of Individual Differences
