Joni M. Lakin
Auburn University
Abstract
Verbal and quantitative reasoning tests provide valuable information about cognitive abilities
that are important to academic success. Information about these abilities may be particularly
valuable to teachers of students who are English-language learners (ELL), because leveraging
reasoning skills to support comprehension is a critical aptitude for their academic success.
However, due to concerns about cultural bias, many researchers advise exclusive use of
nonverbal tests with ELL students despite a lack of evidence that nonverbal tests provide greater
validity for these students. In this study, a culturally and linguistically diverse sample of students
was administered a test measuring verbal, quantitative, and nonverbal reasoning. The two-year
predictive relationship between ability and achievement scores revealed that nonverbal scores
had weaker correlations with future achievement than quantitative and verbal reasoning ability
for ELL and non-ELL students. Results do not indicate differential prediction and do not support
Cognitive ability tests that measure verbal, quantitative, and nonverbal reasoning skills
are widely used by schools to provide valuable information to teachers hoping to differentiate
instruction according to their students’ cognitive strengths (Lohman, 2009; Lohman & Hagen, 2001b).
Verbal and quantitative reasoning skills are particularly critical for academic success because of
the heavy reliance on these skills in traditional academic domains. This may be even more true
for English-language learner (ELL) students, for whom leveraging verbal reasoning skills to
support comprehension is a critical aptitude for school success and language acquisition. For
example, knowing an ELL student has relatively weak verbal reasoning skills, a teacher might
provide that student with more linguistic support than other ELL students need. However,
despite the potential utility of such information, many researchers advise the exclusive use of
nonverbal tests with ELL students, suggesting that the linguistic and cultural demands of the
items create measurement bias (Lewis, 2001; McCallum, Bracken, & Wasserman, 2001;
Naglieri & Ronning, 2000). While this argument is persuasive to many, there is little direct
evidence that nonverbal tests provide more useful information about ELL students’ academic success.
This study evaluated the validity and fairness of the Cognitive Abilities Test (CogAT,
Form 6; Lohman & Hagen, 2001a) in predicting reading and math achievement in a sample of
Hispanic ELL and Hispanic and White non-ELL students. The CogAT consists of verbal,
quantitative, and nonverbal batteries. These batteries have been found to provide strong
predictive validity for achievement in non-ELL populations (Lohman & Hagen, 2002). The
purpose of the study was to explore whether the batteries provide similar predictive validity for ELL students.
Multi-battery ability tests assess cognitive ability by sampling multiple content domains.
Such tests are useful for teachers because they provide information about the range of a student’s
talents. Both individually and group-administered tests can provide multiple test scores for
students that contrast their performance in various domains (Sattler, 2008). Teachers can use this
information to target both student weaknesses—for extra practice and instructional support—and
student strengths—for enrichment opportunities that make school more enjoyable and
challenging.
A common misconception is that ability tests should enable users to measure innate
ability that is uninfluenced by educational opportunity. In fact, ability tests measure developed
and well-practiced reasoning skills (Anastasi, 1980). Rather than providing qualitatively distinct
information from achievement tests, ability and achievement tests differ in the degree to which
they tap into recent and specific learning accomplishments versus general and long-term
acquisitions (Anastasi, 1980; Lohman, 2001). Thus, ability tests offer a different, broader
perspective on developed knowledge and skills that can be contrasted with more narrowly
focused achievement test performance and can be useful to teachers who want to adapt the pace
and content of their instruction to students who differ widely in the speed and readiness with which they learn.
The misconception about ability tests measuring innate capabilities leads many to
conclude that mean differences on verbal and quantitative ability tests by definition reflect bias
in the assessments. Large mean differences have been documented between ELL and non-ELL
students on a range of ability tests (Lakin & Lohman, in press; Palmer, Olivarez, Willson, &
Fordyce, 1989; Patterson, Mattern, & Kobrin, 2007). Thus, a number of researchers and
educators have called for the exclusive use of nonverbal tests in assessing the cognitive strengths
of culturally and linguistically diverse students (Lewis, 2001; Naglieri & Ford, 2003).
However, the existence of mean differences is not in itself evidence of bias (Jensen,
1980; Reynolds, 1982). When test users interpret scores appropriately by controlling for
opportunity to learn, mean differences do not negate the utility of the tests for differentiating
instruction. Furthermore, the exclusive use of nonverbal tests to predict achievement and make
academic placement decisions has been widely criticized by many researchers, because those
tests clearly under-represent the domain of interest and lack the obvious links that verbal and
quantitative reasoning have to learning and school success (Braden, 2000; Figueroa, 1989; Lakin
& Lohman, in press; Ortiz & Dynda, 2005). They also do not provide teachers with a clear path for differentiating instruction.
To support the contention that nonverbal tests are more valid and useful for
differentiating instruction for ELL students, researchers must first show conclusive evidence of
bias for tests measuring verbal and quantitative reasoning, and, second, that nonverbal tests
provide an effective alternative. The evidence of bias for verbal and quantitative tests is not
conclusive because most proponents of nonverbal assessments rely solely on mean differences as
evidence of bias. Evidence of differential prediction for tests measuring verbal and quantitative
ability would be more conclusive, but such evidence has not been found in previous research.
For example, despite finding large mean differences, Palmer et al. (1989) found no differences in
the regression slopes between language proficiency groups when predicting achievement scores
from the Kaufman Assessment Battery for Children (K-ABC; Kaufman & Kaufman, 1983) using
ability scores from the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler,
1974).
In contrast, there is strong evidence that nonverbal tests do not provide an effective
alternative to verbal and quantitative tests because these tests often yield low validities for
predicting reading and math achievement (both critical domains of academic development). For
a sample of ELL students, Borghese (2009) reported correlations between the Universal
Nonverbal Intelligence Test (UNIT; Bracken & McCallum, 1998) and achievement of r = .28 for
reading achievement. Prediction of math achievement was stronger at r = .51. Jones (2006) found
correlations below .10 between UNIT scores in first grade and reading achievement on the Texas
Assessment of Knowledge and Skills (TAKS) in third grade for both ELL and non-ELL students.
Even in non-ELL samples, the correlations between nonverbal tests and achievement usually
range between .3 and .6 (e.g., Balboni, Naglieri, & Cubelli, 2010; Naglieri & Ronning, 2000).
These values are far below what is typically observed for CogAT verbal and quantitative
batteries with non-ELL samples, which predict their relevant domain of achievement (reading
and mathematics, respectively) with correlations of .75-.80 (Lakin & Lohman, in press). Lakin
and Lohman (in press) showed that differences in correlations of this magnitude (.5 vs. .8) have substantial consequences for the accuracy of identification and placement decisions.
The purpose of this study was to provide additional data on the predictive validity of
verbal, quantitative, and nonverbal test batteries for culturally and linguistically diverse students.
Three research questions were addressed:
1. Are there substantial mean differences in verbal, quantitative, and nonverbal ability scores between groups that differ in ethnicity and English proficiency?
2. Are the same achievement and ability measures useful as predictors of future achievement for ELL and non-ELL students?
3. Does the nonverbal battery play a more important role in predicting later achievement for
ELL students?
Methods
Participants
Two schools in Arizona participated in the Project Bright Horizons study developed by a
team of researchers and school administrators (see Lohman, Korb, & Lakin, 2008). The data
used in this study came from students in the sample who were in 3rd to 5th grade in the first year
of the study and reported either White or Hispanic ethnicity. The sample consisted of 124
Hispanic ELL students, 161 Hispanic non-ELL students, and 72 White non-ELL students.
Ethnicity was based on district data, which relies on U.S. Census classifications. Other ethnic
groups of non-ELL students (Asian, American Indian, and African American) included fewer
than 30 students each and were omitted from the analyses. Table 1 provides additional demographic information.
[Table 1]
ELL status in this study relied on district classifications reported by the schools.
These classifications were based partially on student scores on the Stanford English
Language Proficiency Test (SELP; Harcourt Educational Measurement, 2003). For this study,
students were classified based on their ELL status in year 1. The range of English proficiency
varied considerably within the ELL group: 13% were first-year ELL students (i.e., low
proficiency) while another 33% were reclassified by the second year of the study (likely high
proficiency).
In the first year of the study, students completed both ability and achievement tests in the
late spring. In the second year, only achievement tests were administered. The achievement tests
were administered as part of the schools’ annual accountability testing. Only students with
complete test records were used. The variables with the greatest proportion of missing data
were the year-two achievement scores. Unlike many studies, in this case, White students had
missing scores more often than ELL and Hispanic students, perhaps due to differences in school
mobility.
Measures
Ability test
The CogAT consists of three separate batteries measuring verbal, quantitative, and
nonverbal reasoning (Lohman & Hagen, 2001a). In this study, students received the appropriate
level of the CogAT given their grade level (levels A to C, respectively). The verbal (65 items),
quantitative (60 items), and nonverbal batteries (65 items) each consist of three subtests that use
different item formats. Universal scale scores on a vertical scale spanning grades K through 12
were used in this study. A previous research study on the same dataset indicated that the factor
structure of this test was consistent for ELL and non-ELL students, though the variance of the
verbal factor was attenuated for ELL students (Author, 2010). Another study found that the
reliability of the verbal battery was adequate (Φ = .82) for ELL students, though lower than that
of non-ELL students (Φ = .96). See Lakin and Lai (in press) for a detailed exploration of the generalizability of these scores.
All tests on the CogAT begin with directions that are read aloud by the teacher. In this
study, teachers read directions in Spanish as well as English when appropriate. All three subtests
of the verbal battery and one subtest of the quantitative battery require the examinee to complete
some reading in English. On the verbal battery, students must read either individual words
(verbal classification and verbal analogies) or short sentences (sentence completion). On the
quantitative relations subtest, students read individual words (e.g., foot, gallon). The other
quantitative subtests and all of the nonverbal battery do not require reading.
Achievement test
The Arizona Instrument to Measure Standards Dual Purpose Assessment (AIMS DPA)
was designed to yield normative and criterion-referenced information about student achievement.
Thirty to fifty percent of items on the AIMS DPA come from the TerraNova achievement tests
(CTB/McGraw-Hill, 2002). The remaining items were developed by educators specifically for
the AIMS DPA to better align the test with state educational goals (Arizona Department of
Education, 2006). Reading/language arts and mathematics subtests of the AIMS DPA each
contained approximately 80 items. Separate scale scores are reported for mathematics and
reading.
Procedure
In separate models with year 2 reading and math achievement as the criterion, regression
analyses explored the incremental prediction of the ability tests when year-one achievement
scores were available. The order of entry for predictor variables was based on prior research,
which indicates that the best predictor of future achievement is prior achievement followed by
the ability to reason in the domain and then by general reasoning skills (Lohman, 2009). Thus,
year-one achievement scores entered first, followed by domain-relevant, year-one ability tests
scores (verbal or quantitative), and finally nonverbal ability scores. Variables for ethnicity (1 =
Hispanic, 0 = non-Hispanic) and ELL status (1 = ELL; 0 = non-ELL) were then entered as a
block. Finally, interaction terms of the ability scores with ELL status and Hispanic background
were entered as a block. To explore the utility of nonverbal tests for ELL students, a separate
series of regressions compared variance accounted for with different combinations of predictors.
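As an illustration of the hierarchical entry described above, the following sketch (in Python, with hypothetical variable and file names; the study's actual analysis software and variable names are not specified) enters the predictor blocks in order and reports the change in R-squared at each step.

```python
# A minimal sketch of the hierarchical regressions described above. Column and
# file names (math_y2, math_y1, quant, nonverbal, ell, hispanic,
# bright_horizons.csv) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def incremental_r2(df: pd.DataFrame, criterion: str, blocks: list) -> None:
    """Enter predictor blocks in order and report the change in R-squared."""
    terms, prev_r2 = [], 0.0
    for block in blocks:
        terms.extend(block)
        model = smf.ols(f"{criterion} ~ {' + '.join(terms)}", data=df).fit()
        print(f"+ {block}: R2 = {model.rsquared:.3f} (delta = {model.rsquared - prev_r2:.3f})")
        prev_r2 = model.rsquared

df = pd.read_csv("bright_horizons.csv")  # hypothetical file
# Order of entry follows the text: prior achievement, domain-relevant ability,
# nonverbal ability, group indicators, then group-by-ability interactions.
incremental_r2(
    df,
    criterion="math_y2",
    blocks=[
        ["math_y1"],
        ["quant"],
        ["nonverbal"],
        ["ell", "hispanic"],
        ["ell:quant", "ell:nonverbal", "hispanic:quant", "hispanic:nonverbal"],
    ],
)
```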
Design
Interaction terms and regression residuals form the basis for analyzing differences in the
magnitude of the relationships between the predictor tests and the achievement criterion tests. In
the predictive bias framework originally outlined by Cleary (1968; see also Cleary, Humphreys,
Kendrick, & Wesman, 1975), two types of differential prediction were defined. One type of
differential prediction was defined by an interaction of group membership with predictors in the
regression analysis and reflected bias in the slope of the regression lines. Differences in
regression slopes indicate that the predictors being used are less relevant to the criterion for one
group versus another. In this study, an interaction of the ability test scores with ethnicity or ELL
status might indicate that the tests are less predictive of achievement for those students.
Cleary (1968) defined another type of differential prediction as persistent under- or over-
prediction for one group. This form of differential prediction is detected by analyzing regression
residuals for evidence that one group’s observed criterion scores are significantly higher or lower
than the model predicts (Reynolds, 1982). In the absence of an interaction of group membership
with predictor variables, differences in residuals indicate that the regression slopes for two
groups are nearly parallel, but do not coincide. For this type of differential prediction with
parallel regression lines, Cleary et al. (1975) explained, “the test can be used within each group
with the same accuracy of prediction” (p. 27; see also Reynolds, 1982).
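The slope-bias check described above can be sketched as follows (Python, with hypothetical variable and file names; the study's actual code is not available): a model without group-by-ability interactions is compared to one that adds them as a block.

```python
# A sketch of the Cleary-style slope-bias check: does adding group-by-ability
# interaction terms improve prediction? Column and file names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("bright_horizons.csv")  # hypothetical file

restricted = smf.ols(
    "read_y2 ~ read_y1 + verbal + nonverbal + ell + hispanic", data=df
).fit()
full = smf.ols(
    "read_y2 ~ read_y1 + verbal + nonverbal + ell + hispanic"
    " + ell:(verbal + nonverbal) + hispanic:(verbal + nonverbal)",
    data=df,
).fit()

# F-test for the interaction block; a significant result would indicate that
# the regression slopes differ across groups (slope bias).
f_stat, p_value, df_diff = full.compare_f_test(restricted)
print(f"Interaction block: F = {f_stat:.2f}, p = {p_value:.3f}")
```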
Results
Descriptive statistics are reported in Table 2. Mean differences were large between the
ELL and non-ELL groups (-1.0 to -2.1 SD). The differences were largest for verbal reasoning and
math achievement and somewhat smaller for quantitative reasoning and year-one reading
achievement. Nonverbal reasoning scores and reading achievement in year two showed the
smallest differences, though they were still substantial. Mean differences between the two non-
ELL groups were much smaller. Only verbal reasoning and year-one reading showed moderate differences.
[Table 2]
Restriction of range can attenuate correlations with other variables. In Table 2, the ratios of variance are
reported for each test. As an example, on the quantitative battery, the variance ratio of 1.7 for
ELL and non-ELL Hispanic groups indicated that the variance of non-ELL Hispanic students
was 70% greater than the variance for ELL Hispanic students. Across the board, non-ELL
students were much more variable than ELL students, and White students were more
variable than Hispanic students. Despite this finding, there was no apparent floor effect in
the histograms of test scores. The data for all three groups also satisfied Bracken’s (2007)
heuristic for floor effects in that the range of scores extended above and below the mean by
2 SDs.
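The group comparisons summarized in Table 2 (Cohen's d and variance ratios) can be computed as in the following sketch; the two arrays stand in for the score vectors of any pair of groups being compared.

```python
# A sketch of the descriptive comparisons in Table 2: Cohen's d (pooled SD)
# and the ratio of sample variances for two groups of scores.
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def variance_ratio(a: np.ndarray, b: np.ndarray) -> float:
    """Values above 1 mean group a is more variable than group b."""
    return a.var(ddof=1) / b.var(ddof=1)

# A ratio of 1.7 (as for the quantitative battery) means the first group's
# variance is 70% larger than the second group's.
```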
Patterns of Correlations
Hispanic ELL students had substantially lower correlations between tests, which may be
related to their restricted variability in scores. See Table 3. Despite this, the pattern of
correlations between achievement and ability tests was consistent with previous research. For
all three groups of students, math achievement correlated most strongly with quantitative
reasoning and reading achievement correlated most strongly with verbal reasoning. Even for
ELL students, nonverbal ability scores had significantly lower correlations 1 with year-one
achievement than verbal had with reading and quantitative had with math. Furthermore, the
relationship between the ability scores and achievement remained strong through year two.
[Table 3]
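The footnoted comparison of correlations uses a Fisher r-to-z transformation; a sketch of the basic form of that test for two correlation coefficients follows (illustrative values only, not the study's coefficients).

```python
# A sketch of a Fisher r-to-z comparison of two correlation coefficients from
# samples of size n1 and n2 (two-tailed). Values below are illustrative only.
import numpy as np
from scipy import stats

def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> tuple:
    z1, z2 = np.arctanh(r1), np.arctanh(r2)      # Fisher r-to-z transform
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # SE of the difference z1 - z2
    z = (z1 - z2) / se
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return z, p

z, p = fisher_z_test(r1=0.75, n1=124, r2=0.50, n2=124)
print(f"z = {z:.2f}, p = {p:.3f}")
```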
The strong correlations between ability scores and year-two achievement highlight their
relevance to future academic success. However, ability tests can also provide incremental
prediction beyond the data that schools already have—namely, previous achievement test scores.
Thus, a series of regression models tested the incremental prediction of year-two achievement by the ability scores.
Math Achievement
Year-one math achievement accounted for 64% of the variance in year-two math achievement. See Table 4. Quantitative reasoning added further to the variance accounted for, and nonverbal added an additional 1%.
Hispanic) entered the model, they did not account for an appreciable amount of variance.
Interaction variables between ethnicity and the ability scores also did not contribute to
prediction, indicating that the regression slope was the same for all three groups.
Reading Achievement
Year-one reading achievement accounted for 70% of the variance in year-two reading achievement. See Table 5.
Verbal reasoning added an additional 1% to the variance explained, but nonverbal reasoning
1 Using a Fisher r-to-z transformation (p < .05). See Hays (1994).
failed to improve prediction any further. ELL status and ethnicity accounted for a significant but
negligible amount of variance (less than 1%). In the final model, neither of these coefficients
was significant. Inspection of the coefficients before the interactions were
added indicated that ELL status had a slight negative effect on achievement (b = -.10).
Interactions between ELL status and test scores failed to add significantly to the prediction of reading achievement.
[Table 5]
Regression residuals across groups were analyzed with a one-way ANOVA to detect consistent under- or over-
prediction for one group (Reynolds, 1982). The same regression analyses for reading and
mathematics achievement were repeated and residuals recorded without the effects for ELL and
ethnicity status included. Means and SDs are reported in Table 6. For math achievement, there
was no main effect for residuals, indicating that the three groups of students did not vary
significantly in the fit of the regression model. For reading achievement, however, there was a
significant effect, F(2, 373) = 3.80, p < .025. Follow-up tests using Tukey’s comparisons
indicated that there was significant, though slight, under-prediction of reading achievement for
Hispanic, non-ELL students in year two of around 6 points on the reading achievement scale (an
effect size of about .17). On average, both White non-ELL and Hispanic ELL students showed slight over-prediction of reading achievement.
[Table 6]
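The residual analysis can be sketched as follows (hypothetical column and file names): the model is refit without group terms, and its residuals are compared across groups with a one-way ANOVA and Tukey follow-ups.

```python
# A sketch of the residual analysis: fit the common model, then test whether
# mean residuals differ by group. Column and file names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("bright_horizons.csv")  # hypothetical file

common = smf.ols("read_y2 ~ read_y1 + verbal + nonverbal", data=df).fit()
residuals = common.resid  # observed minus model-predicted year-two scores

# One-way ANOVA: do mean residuals differ across the three groups?
groups = [residuals[df["group"] == g] for g in df["group"].unique()]
print(stats.f_oneway(*groups))

# Tukey's HSD follow-up: which pairs of groups differ, and by how much?
print(pairwise_tukeyhsd(endog=residuals, groups=df["group"]))
```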
Given the lower correlations between ability and achievement for ELL students, and the
arguments by some researchers that nonverbal tests provide more valid information about the
abilities of ELL students, combinations of ability scores were explored to see if they
improved predictive validity for ELL students. See Table 7. For math achievement, quantitative
reasoning added the most predictive variance: 42% when entered first and adding 14%
incrementally when nonverbal was added first. For reading achievement, verbal reasoning
added the most predictive variance: 28% when entered first and adding 13% incrementally
when nonverbal was added first. When entered first, nonverbal ability accounted for just 30% of
variance for math achievement and 17% of variance for reading achievement. When entered second, nonverbal ability added little incremental variance for either criterion.
[Table 7]
Discussion
The research questions addressed (1) the presence of mean differences, (2) the pattern of
correlations between ability and achievement tests across groups, and (3) the interaction of
nonverbal tests with ELL and Hispanic group membership. Large mean differences were found
between the observed test scores for ELL and non-ELL students, while small-to-negligible
differences were found between Hispanic and White non-ELL students. For math achievement,
these differences translated into a small, but significant, positive main effect for Hispanic
students in the regression analysis indicating that their year 2 achievement scores were higher
than those for White and ELL students with similar achievement and ability scores in year 1. For
reading achievement, the tests indicated a small negative main effect of ELL status indicating
that ELL students’ scores were lower than for the other two groups when controlling for prior achievement and ability.
An interaction between ELL or Hispanic variables and ability test scores would indicate
differential prediction between the three groups. However, none of the interaction terms entered
in the final step of the regression analyses were statistically significant. This finding indicates
that the same test variables are similarly important to the prediction of later achievement for all
three groups of students. This conclusion is further supported by the table of observed test
correlations, which showed that the same ability tests were most important for predicting
achievement in all three groups. One contradictory finding came from the analysis of residuals,
which revealed that Hispanic non-ELL students’ reading achievement was somewhat under-predicted.
Separate analyses explored whether nonverbal ability scores were particularly important
in predicting achievement for ELL students. For math achievement, nonverbal tests were clearly
inferior to quantitative tests in predicting year-two achievement. For reading achievement, verbal
ability was clearly the best predictor for ELL students. In contrast to the recommended
use of nonverbal tests for ELL students, nonverbal ability scores did not appear to provide
similar predictive validity compared to quantitative or verbal ability and did not add much
incremental prediction beyond those scores even for ELL students. Thus, although nonverbal
ability tests can play an important role as part of an assessment battery, their relationship to
current and future achievement is not as strong as for verbal and quantitative ability tests.
Therefore, for teachers seeking guidance on how best to adapt instruction to the cognitive
strengths of their ELL students, this study provides evidence that, overall, nonverbal tests do not
provide superior information about the cognitive strengths and academic promise of ELL
students. As with non-ELL students, the most relevant information comes from verbal and quantitative reasoning scores.
It would be reasonable to expect these results to generalize to other nonverbal ability tests
that are primarily unidimensional. The CogAT nonverbal battery consists of three item formats:
figure analogies, figure classification, and paper folding. The figure analogies format is related to
the item formats used by the Naglieri Nonverbal Ability Test and Raven’s Progressive Matrices
and shows strong convergent validity with those tests (Lohman, Korb, & Lakin, 2008). On this
basis, it is reasonable to assume that these findings would generalize to those tests.
The consistency of the regression slope between ELL and non-ELL students indicated
that the tests provide similar information about the future achievement of all three groups of
students. For educators seeking to differentiate instruction, verbal and quantitative reasoning
tests show equally strong predictive accuracy for reading and mathematics achievement,
respectively. In this study, nonverbal measures did not provide an effective alternative and were
less useful for making decisions about which students are most likely to succeed in traditional
academic domains relative to other students with similar linguistic and cultural backgrounds.
The main effects of ELL status for reading achievement and Hispanic background for
math achievement in addition to the slight underprediction of reading achievement for non-ELL
Hispanic students indicate that the use of those scores requires careful interpretation. As Cleary
et al. (1975) explained, despite the presence of main effects in the regression (or mean
differences in observed scores), “when the [regression] lines are parallel, the test can be used
within each group with the same accuracy of prediction” (p. 27; see also Reynolds, 1982).
Recently, there have been innovations in making appropriate inferences about ability when using
tests that are affected by opportunity to learn or access to the curriculum. This is discussed in the
next section.
The common misconception that ability tests measure innate intelligence often leads to
the (mistaken) conclusion that mean differences must be interpreted either as immutable group
differences in intelligence or as test bias (Jensen, 1980; Lohman, 2006a). In fact, ability tests
measure developed capabilities that are impacted by educational experience and opportunity to
learn (Anastasi, 1980; Martinez, 2000). This does not negate their utility for making inferences
about students’ intellectual capacity as long as opportunity to learn is taken into account. In fact,
comparing the performance of ELL students to appropriate norm groups (i.e., those with similar
opportunities to learn) is critical for making valid inferences about the cognitive abilities of ELL
students (Author, 2010). Comparing ELL students to national norms based on predominantly
non-ELL students will not provide appropriate inferences about the skills of ELL students.
Two strategies have recently been suggested to account for group differences that likely
reflect different degrees of opportunity to learn. Lohman (2006b, 2009) proposed the use of local
subgroup norms to provide a rudimentary adjustment for opportunity to learn when identifying
students for gifted programs and talent development. Weiss, Saklofske, Prifitera, and Holdnack
(2006) used national subgroup norms based on proxies for acculturation, including years in U.S.
schools, to provide multiple perspectives on student scores for the WISC-IV. Contextualizing
student scores with multiple norm comparisons can identify students from minority cultural or
linguistic backgrounds who excel relative to their educational opportunities even when they may
not compare favorably to the national norms (Callahan, 2009; Gándara, 2005; Weiss et al.,
2006). Lohman (2006b, 2009) provides practical guidance as to how local norms can be
developed and used by teachers for the identification of students for gifted and talented
programs. Additional research is needed to expand the use of local norms to instructional
differentiation as well as to explore the practicality and political feasibility of these solutions.
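As a simple illustration of local subgroup norming (hypothetical column and file names; not the procedure of any particular district), percentile ranks can be computed within each subgroup rather than against national norms:

```python
# A sketch of local subgroup norms: percentile ranks of verbal scores computed
# within each ELL-status group. Column and file names are hypothetical.
import pandas as pd

df = pd.read_csv("district_scores.csv")  # hypothetical file

# Percentile rank of each student's verbal score within their own subgroup.
df["verbal_local_pr"] = df.groupby("ell_status")["verbal"].rank(pct=True) * 100

# Students who excel relative to peers with similar opportunity to learn.
top_within_group = df[df["verbal_local_pr"] >= 90]
print(top_within_group.head())
```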
Instructional differentiation
Although in this study a multi-battery test has been found to provide useful information
about the cognitive abilities of ELL and non-ELL students, it does not follow that all students
identified with, for example, verbal strengths require the same instructional interventions
(Callahan, 2009). Recommendations for non-ELL students are already available (Lohman & Hagen, 2001b), but little guidance exists on instructional differentiation for ELL students.2 In fact, a wide range of research is needed to guide teachers’
use of assessment data to make appropriate educational decisions for ELL students (Young,
2009).
Limitations
Although no evidence of significant differential prediction was found in this study, there
may be other undetected sources of bias. For instance, if there is bias in both the predictor and
criterion, it will not be detected in the correlations between them (Cronbach, 1970). Given the central role that
achievement tests play in the modern educational system, bias in the criteria of this study
(reading and math achievement) deserves critical analysis that is beyond the scope of this paper.
Another important limitation is the unusual ethnic makeup of this study. Less than one-
third of the sample was White, which makes their relative weight in determining the shape of the
common regression line smaller than it would be in schools with a majority of White students.
2 It should be noted that efforts to capitalize on the apparent nonverbal strengths of ELL students (sometimes
misconstrued as spatial strengths) neglect the impact of opportunity to learn on ability scores. Many ELL students
may in fact have relative strengths in verbal and quantitative reasoning that are obscured by use of national norms.
However, the regression lines for all three groups were nearly identical. To the extent that the
White students in this sample are similar to the population of White students in the U.S. as a
whole, there is no reason to expect that the common regression line would be much different if the sample had included a larger proportion of White students.
Finally, it should be noted that the choice of assessment should depend on the type of
instructional differentiation being considered (Callahan, 2009). This study focused on traditional
academic domains and thus the CogAT was appropriate for predicting success. However, if a
talent development program were targeting skills beyond general reasoning and verbal and
quantitative domains, other tests might be more appropriate. Multiple indicators of student
aptitude are always critical to making decisions about gifted and talented program placement.
Conclusion
This study confirmed that within ELL groups and Hispanic and White ethnic groups,
multi-battery ability tests provide useful and valid information about the future performance of
students. The exclusive use of nonverbal tests does not appear warranted when assessing ELL
students with some level of English proficiency and when interpreting scores using appropriate
normative comparisons. In fact, assessing the verbal reasoning skills of ELL students may be
particularly helpful for teachers. Verbal reasoning skills, which include the ability to make sense
of incomplete verbal information, are critical for the academic success of ELL students who must
constantly leverage these skills to make sense of teachers, other students, and reading materials.
Knowledge about which students struggle to make connections within verbal information may
help teachers target those students for additional linguistic support. Although verbal reasoning
scores for ELL students have limitations in their psychometric qualities relative to scores for
non-ELL students, they are still very useful in this regard. Efforts to improve these measures and their interpretation for ELL students are warranted.
References
Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in
Anastasi, A. (1980). Abilities and the measurement of achievement. New Directions for Testing
Author (2010). Multidimensional ability tests and culturally and linguistically diverse students:
Balboni, G., Naglieri, J.A., & Cubelli, R. (2010). Concurrent and predictive validity of the Raven
Bracken, B. A. (2007). Creating the optimal preschool testing situation. In B. A. Bracken, & R. J.
Nagle (Eds.), Psychoeducational assessment of preschool children (4th ed., pp. 137-154).
Bracken, B.A., & McCallum, R.S. (1998). Universal Nonverbal Intelligence Test examiner’s
Callahan, C.M. (2009). Myth 3: A family of identification myths: Your sample must be the same
as the population. There is a "silver bullet" in identification. There must be "winners" and
Cleary, T.A. (1968). Test bias: Prediction of grades of Negro and White students in integrated
Cleary, T.A., Humphreys, L.G., Kendrick, S.A., & Wesman, A. (1975). Educational uses of tests
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Gándara, P. (2005). Fragile futures: Risk and vulnerability among Latino high achievers.
Harcourt Educational Assessment. (2003). Stanford English Language Proficiency Test. San
Jensen, A. R. (1980). Bias in mental testing. New York, NY: The Free Press.
Jones, C.K. (2006). The relationship of language proficiency, general intelligence, and reading
Kaufman, A.S., & Kaufman, N.L. (2004). Kaufman Assessment Battery for Children, Second
Lakin, J.M., & Lai, E.R. (in press). Multi-group generalizability analysis of verbal, quantitative,
and nonverbal ability tests for culturally and linguistically diverse students. Educational
Lakin, J.M., & Lohman, D.F. (in press). The predictive accuracy of verbal, quantitative, and
nonverbal reasoning tests: Consequences for talent identification and program diversity.
Lewis, J. D. (2001). Language isn't needed: Nonverbal assessments and gifted learners. Growing
Lohman, D. F. (2001, November). Aptitude for college: The importance of reasoning tests for
minority admissions. Talk given at Rethinking the SAT: The future of standardized testing
http://faculty.education.uiowa.edu/dlohman/
Lohman, D. F. (2006a). Beliefs about differences between ability and accomplishment: From
Lohman, D. F. (2006b). Practical advice on using the Cognitive Abilities Test as part of a talent
Lohman, D.F. (2009). Identifying academically talented students: Some general principles, two
Lohman, D. F., & Hagen, E. P. (2001a). Cognitive Abilities Test (Form 6). Itasca, IL: Riverside.
Lohman, D. F. & Hagen, E. P. (2001b). Cognitive Abilities Test (Form 6): Interpretive guide for
Lohman, D. F., & Hagen, E. P. (2002). Cognitive Abilities Test (Form 6): Research handbook.
Lohman, D. F., Korb, K. A., & Lakin, J. M. (2008). Identifying academically gifted English-
language learners using nonverbal tests: A comparison of the Raven, NNAT, and CogAT.
Erlbaum Associates.
Naglieri, J. A. (1996). Naglieri Nonverbal Ability Test (NNAT). San Antonio, TX: Harcourt
Naglieri, J. A., & Ford, D. Y. (2003). Addressing underrepresentation of gifted minority children
using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47, 155-160.
Naglieri, J. A., & Ronning, M. E. (2000). The relationship between general ability using the
Naglieri Nonverbal Ability Test (NNAT) and Stanford Achievement Test (SAT) reading
Ortiz, S. O., & Dynda, A.M. (2005). Use of intelligence tests with culturally and linguistically
Intellectual Assessment: Theories, Tests, and Issues (2nd ed., pp. 545-556). New York:
Guilford Press.
Palmer, D.J., Olivarez, A., Willson, L.V., & Fordyce, T. (1989). Ethnicity and language
Patterson, B.F., Mattern, K.D., & Kobrin, J.L. (2007). Validity of the SAT for predicting FYGPA:
2007 SAT validity sample [Statistical Report]. New York, NY: College Board.
Raven, J. C., Court, J. H., & Raven, J. (1996). Manual for Raven’s Progressive Matrices and
Psychologists Press.
Reynolds, C.R. (1982). Methods for detecting construct and predictive bias. In R.A. Berk,
Handbook of methods for detecting test bias (pp. 199-227). Baltimore, MD: Johns
Sattler, J.M. (2008). Assessment of children: Cognitive foundations (5th edition). La Mesa, CA:
Author.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children-Revised (WISC-R). New York:
Psychological Corporation.
Weiss, L. G., Saklofske, D.H., Prifitera, A., & Holdnack, J. A. (2006). WISC-IV Advanced
Young, J.W. (2009). A Framework for Test Validity Research on Content Assessments Taken by
Table 1

                                      Percent
Ethnicity            Total N   Female   FRL   Home lang.   Grade 3   Grade 4
Hispanic ELL             128       45    98          100        45        35
Hispanic non-ELL         161       55    94           16        20        38
White non-ELL             72       44    44            4        19        36

Note. FRL = eligible for free or reduced-price lunch. Home lang. = primary home language other
than English.
Table 2

Means (SDs)
                             ELL           CogAT                          AIMS DPA
                    Grade   prog.   Verbal   Quant.   Nonverbal   Y1 Rdg   Y1 Math   Y2 Rdg   Y2 Math
                             yrs
Hispanic ELL          3.8     3.9    150.7    160.0       173.0    419.3     409.6    450.1     431.5
  (N = 124)          (0.8)   (1.5)   (11.2)   (14.0)      (17.7)   (32.0)    (31.1)   (39.0)    (31.6)
Hispanic non-ELL      4.2            177.5    182.0       192.7    468.4     470.5    498.1     485.0
  (N = 161)          (0.8)           (16.9)   (18.3)      (17.8)   (39.8)    (37.3)   (44.5)    (34.7)
White non-ELL         4.3            190.4    186.0       197.4    487.6     481.2    503.1     489.9
  (N = 72)           (0.8)           (24.5)   (21.4)      (21.8)   (50.1)    (49.4)   (59.9)    (41.6)

Cohen's d effect sizes
Hispanic ELL - Hispanic non-ELL       -1.9     -1.4        -1.1     -1.4      -1.8     -1.1      -1.6
Hispanic ELL - White non-ELL          -2.1     -1.4        -1.2     -1.6      -1.7     -1.0      -1.6
Hispanic - White non-ELL              -0.6     -0.2        -0.2     -0.4      -0.2     -0.1      -0.1

Variance ratios
Hispanic non-ELL / Hispanic ELL       2.26     1.70        1.01     1.55      1.44     1.30      1.20
White non-ELL / Hispanic ELL          4.78     2.34        1.51     2.44      2.52     2.36      1.74
White / Hispanic non-ELL              2.12     1.37        1.49     1.58      1.75     1.81      1.44
Table 3
Correlations Between Tests in Year 1 and 2 Across ELL and Ethnic Groups
Table 4
Table 5
Table 6

                      Mathematics          Reading
                        M       SD        M       SD
Hispanic ELL          0.97    27.83    -3.08    22.43
Hispanic non-ELL      2.16    25.33     3.71    22.4
White non-ELL        -6.31    33.23    -2.56    26.5
Table 7