Motor Speech Assessment for Children
Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS).

Method: Participants were 81 children between 36 and 79 months of age who were referred to the Mayo Clinic for diagnosis of speech sound disorders. Children were given the DEMSS and a standard speech and language test battery as part of routine evaluations. Subsequently, intrajudge, interjudge, and test–retest reliability were evaluated for a subset of participants. Construct validity was explored for all 81 participants through the use of agglomerative cluster analysis, sensitivity measures, and likelihood ratios.

Results: The mean percentage of agreement for 171 judgments was 89% for test–retest reliability, 89% for intrajudge reliability, and 91% for interjudge reliability. Agglomerative hierarchical cluster analysis showed that total DEMSS scores largely differentiated clusters of children with CAS vs. mild CAS vs. other speech disorders. Positive and negative likelihood ratios and measures of sensitivity and specificity suggested that the DEMSS does not overdiagnose CAS but sometimes fails to identify children with CAS.

Conclusions: The value of the DEMSS in differential diagnosis of severe speech impairments was supported on the basis of evidence of reliability and validity.

Key Words: childhood apraxia of speech, dynamic assessment, test development, differential diagnosis, pediatric motor speech disorders, speech sound disorders
a Mayo Clinic, Rochester, Minnesota
b The Ohio State University, Columbus

Correspondence to Edythe A. Strand: strand.edythe@mayo.edu
Editor: Jody Kreiman
Associate Editor: Julie Liss
Received March 24, 2012
Revision received July 9, 2012
Accepted August 18, 2012
DOI: 10.1044/1092-4388(2012/12-0094)

Journal of Speech, Language, and Hearing Research • Vol. 56 • 505–520 • April 2013 • © American Speech-Language-Hearing Association

Pediatric speech sound disorders (SSDs) result from a variety of etiologies and represent impairment at a number of different levels of speech production, including linguistic–phonologic and/or motor speech. One of the many challenges in differential diagnosis of SSD is determining to what degree motor speech impairment (i.e., deficits in planning–programming movement sequences and/or executing the movement) contributes to the child's SSD. In this article, we report evidence regarding the reliability and validity of the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new instrument intended to address this challenge. The DEMSS is intended to assist in differential diagnosis of SSD in younger children and/or those with more severe impairments, including children who have little or no functional verbal communication but can at least attempt imitation.

The DEMSS is specifically designed to identify those children who have difficulty with praxis for speech. This level of motoric difficulty is now most commonly designated by the label childhood apraxia of speech (CAS; e.g., American Speech-Language-Hearing Association [ASHA], 2007; Terband & Maassen, 2010). Isolating deficits in the ability to plan and program movement transitions between articulatory postures for volitional speech in children with SSD is difficult, in part because speech and language processes are interactive (Goffman, 2004; Kent, 2004; Strand, 1992). Because these motor deficits often occur along with deficits in phonology, symptomatology may reflect a combination of linguistic (phonologic) and motor speech deficits (CAS and/or dysarthria; Crary, 1993; Rvachew, Hodge, & Ohberg, 2005; Smith & Goffman, 2004). In the absence of a physiologic marker for CAS (Shriberg, 2003; Shriberg, Aram, & Kwiatkowski, 1997), clinicians are left to make this diagnosis on the basis of behavioral characteristics.

A number of possible discriminative behavioral characteristics for CAS have been proposed (e.g., Davis, Jakielski, & Marquardt, 1998; Ozanne, 1995; Shriberg et al., 2003). In an effort to encourage greater uniformity in methods used for diagnosis, treatment, and study of CAS, ASHA (2007) undertook an examination of this area of
practice. On the basis of an extensive review of the literature and expert testimony, authors of the resulting position statement came to the following conclusion:

[T]hree segmental and suprasegmental features that are consistent with a deficit in the planning and programming of movements for speech have gained some consensus among investigators in apraxia of speech in children: (a) inconsistent errors on consonants and vowels in repeated productions of syllables or words, (b) lengthened and disrupted coarticulatory transitions between sounds and syllables; and (c) inappropriate prosody, especially in the realization of lexical or phrasal stress. (p. 1)

Although these characteristics are broadly endorsed, the position statement contains the acknowledgment that no list of specific characteristics has yet been empirically validated for differential diagnosis.

Although ASHA's (2007) position statement and other frequently cited publications (e.g., Davis et al., 1998) posit behavioral characteristics as discriminative for the diagnosis of CAS, they do not directly lead to assessment procedures. Typical strategies used in the clinical assessment of SSDs include taking a thorough history describing phonetic and phonemic inventories, administering oral structural–functional examinations, as well as giving articulation tests and/or measures of phonologic performance. Tests of articulation and phonology are designed for children who have at least a rudimentary speech sound inventory. Yet, there is a need for a tool to examine the performance of children who are very young and/or very limited in their speech production skills, including those who are nonverbal.

Motor Speech Examination

One type of assessment tool that may be adapted for use with these younger children is a motor speech examination (MSE), which is frequently used with adults to determine the presence of, or to rule out difficulty with, speech motor planning and programming (Duffy, 2005; McNeil, Robin, & Schmidt, 2009; Yorkston, Beukelman, Strand, & Hackel, 2010). It allows the clinician to observe speech production across utterances that vary in length and phonetic complexity using hierarchically organized stimuli—conditions that systematically vary programming demands. It also allows the clinician to make observations of behaviors frequently associated with deficits in speech praxis, including consonant and vowel distortions due to lengthened and/or distorted movement transitions, timing errors, dysprosody, and inconsistency across repeated trials. However, MSEs are less frequently used in assessing child SSDs (Skahan, Watson, & Lof, 2007), and, when used, they often lack evidence of psychometric quality (e.g., Guyette, 2001; McCauley, 2003; McCauley & Strand, 2008). Specifically, McCauley and Strand (2008) found that of six published tests designed to assist in the diagnosis of motor speech disorders, only one—the Verbal Motor Production Assessment for Children (Hayden & Square, 1999)—provided evidence of validity, and none provided adequate evidence of reliability. Thus, this report is designed to describe our efforts in developing an MSE supported by more extensive evidence of reliability and validity.

Reliability and validity. Reliability evidence is fundamental to a test's development because measures of reliability estimate the degree to which a test is vulnerable to various sources of error (e.g., variation due to readministrations, testers, or within-tester inconsistencies), which constitute threats to the test's validity. This is especially true for an MSE for children because of the high potential for error presented by the nature of young children and their response to the artificialities of speech testing (Kent, Kent, & Rosenbek, 1987).

Validity refers to the degree a test measures what it purports to measure. Validity evidence for a test's use in diagnosis—that is, evidence supporting its contribution to accurate diagnosis—can be provided using several different approaches (Downing & Haladyna, 2006). Contrasting groups and correlations between the test under study and a gold standard (i.e., acknowledged valid measure) are particularly common methods (McCauley, 2001). Cluster analysis—a statistical method that divides data into meaningful groups sharing common characteristics—may provide an equally compelling method for exploration of a test's validity for use in diagnosis. In studies of communication disorders, researchers have used cluster analysis primarily to (a) probe constructs such as the existence of homogeneous subgroups within broader diagnoses, including autism spectrum disorders (e.g., Verté et al., 2006), specific language disorders (e.g., Conti-Ramsden & Botting, 1999), and SSDs (Arndt, Shelton, Johnson, & Furr, 1977; Johnson, Shelton, & Arndt, 1982; Peter & Stoel-Gammon, 2008) and (b) identify clusters of co-occurring speech and nonspeech characteristics in CAS (Ball, Bernthal, & Beukelman, 2002).

When used in test development, cluster analysis allows one to begin with a heterogeneous group of children (e.g., children with SSDs) and determine to what extent the test and its components (e.g., subscores) can identify meaningful subgroups—for example, ones that might be related to motor speech status or other unanticipated participant variables. When test-determined clusters match other bases for participant groupings (e.g., diagnosis using different methods), it provides valuable evidence of the test's construct (diagnostic) validity. This report on the DEMSS makes use of this type of evidence.

The Dynamic Evaluation of Motor Speech Skill

As a motor speech examination, the DEMSS systematically varies the length, vowel content, prosodic content, and phonetic complexity within sampled utterances. Thus, it is not intended to serve as yet another articulation test or test of phonologic proficiency, both of which typically sample all segments in a language. Rather, the DEMSS is designed specifically to examine the speech movements of younger children and/or children who are more severely impaired, even those who may not yet produce many sounds, syllables, or words. Therefore, instead of sampling all American English speech sounds, the DEMSS focuses on earlier
developing consonant sounds paired with a variety of vowels in several earlier developing syllable shapes. The stimuli and the scoring system are designed to allow examination of characteristics that have been frequently posited or reported in the literature to be associated with CAS (e.g., ASHA, 2007; Campbell, 2003; Davis et al., 1998; Strand, 2003), including lengthened and disrupted coarticulatory transitions between sounds and syllables (articulatory accuracy), inconsistency of errors on consonants and vowels across repeated trials, and prosodic accuracy such as lexical stress.

The DEMSS comprises nine subtests (see Table 1), consisting of 66 utterances. These 66 utterances are associated with 171 items (i.e., judgments) that contribute to four types of subscores (overall articulatory accuracy of the word, vowel accuracy, prosodic accuracy, and consistency). Judgments of overall articulatory accuracy are made for all 66 utterances, judgments of vowel accuracy are made for 56 of these utterances, judgments of prosodic accuracy (viz., lexical stress accuracy) are made for 21 of the utterances, and judgments of consistency are made for 28 of the utterances. The DEMSS total score is the sum of the four subscores (overall articulatory accuracy, vowel accuracy, prosodic accuracy, and consistency). Items are scored either during testing or from a videotape sample following administration of the test.

The DEMSS uses dynamic assessment (Bain, 1994; Glaspey & Stoel-Gammon, 2007; Lidz & Peña, 1996), in which multiple attempts are elicited for scoring as the clinician uses cues and other strategies (e.g., slowed rate or simultaneous production) designed to facilitate performance. Dynamic assessment may be especially important in assessing SSDs. For some children who have few sounds, syllables, or words, even modest support may allow them to produce the utterance more accurately, showing emerging skills. Further, when the child is attempting to imitate specific speech movements with cuing, he or she may increase attention and/or effort toward achieving a particular spatial or temporal target. This allows observation of groping, segmentation, timing errors, or other characteristics associated with CAS that are frequently not evident in spontaneous utterances or in noncued repetitions.

The cuing used in dynamic assessment has the potential to facilitate judgments of severity and therefore prognosis, as well as to facilitate treatment planning. For example, if a child consistently needs considerable cuing to correctly produce a target or never produces it correctly despite cuing, his or her problem is seen as more severe and the prognosis for rapid improvement as more guarded (Peña et al., 2006). Treatment planning is facilitated in that the types of cues that proved helpful during the administration of the test suggest cuing strategies that are likely to be useful in treatment. Further, reviewing errors on specific vowels and across particular syllable shapes facilitates choices of content and complexity of early stimulus sets. Because the entire DEMSS uses dynamic assessment, judgments of severity and prognosis are facilitated beyond tools currently available.

During administration of the DEMSS, the clinician asks the child to imitate a series of words, with the child's eyes directed to the clinician's face as much as possible. Depending on the child's initial imitation of an utterance, the clinician may elicit one or more additional imitative attempts, with various levels of cuing, before scoring is completed. Visual, temporal, and tactile cues are implemented to help the child improve accuracy of production over repeated trials, with multidimensional scoring reflecting his or her responsiveness to cuing. (Please refer to the Appendix for examples of the multidimensional scoring.)

Table 2 describes basic rules for assigning specific scores within the four subscores—overall articulatory accuracy, vowel accuracy, prosody (lexical stress accuracy), and consistency (with higher scores reflecting poorer performance). Only vowel accuracy and prosody are always scored on the first attempt at an utterance. Overall articulatory accuracy may be scored based on subsequent trials. Consistency is always scored after all attempts on an

Note. Only items within specific utterance types contribute to subscores for consistency, vowel accuracy, or prosodic accuracy.
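The subscore bookkeeping just described can be sketched as follows. This is an illustration only, based on the item counts stated above; the per-item point values are hypothetical, and the actual scoring rules (Table 2) are not reproduced here.

```python
# Item counts per DEMSS subscore, as described in the text; the four
# subscores cover the 171 scored judgments, and the total score is the
# sum of the four subscores (higher scores reflect poorer performance).
ITEM_COUNTS = {
    "overall_articulatory_accuracy": 66,  # judged for all 66 utterances
    "vowel_accuracy": 56,
    "prosodic_accuracy": 21,              # lexical stress accuracy
    "consistency": 28,
}

def demss_total(subscores):
    """Sum the four subscores into the DEMSS total score.

    `subscores` maps each subscore name to its summed item points
    (hypothetical values; actual point assignments follow Table 2).
    """
    missing = set(ITEM_COUNTS) - set(subscores)
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    return sum(subscores[name] for name in ITEM_COUNTS)
```

For example, `demss_total({"overall_articulatory_accuracy": 40, "vowel_accuracy": 22, "prosodic_accuracy": 6, "consistency": 10})` returns 78.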
Table 3. Participant characteristics ordered by DEMSS score within diagnostic group.

Group       ID  DEMSS  GFTA  Sex  Age (mos)  PPVT  RLS  ELS
Non-CAS      6      0    97   F       72       93  100   75
             8      0   105   F       62      114  105  115
            65      4    69   M       59      126  119  115
            59      4    72   M       62      119   94  108
            50      4    97   M       50      101   81   95
            45      6    61   M       72       77   93   93
            17      7    92   F       54      120  102   89
             5      7    97   M       61      108   85   87
            75     10    83   M       62      119  122  101
            43     11    80   M       52      123  110   95
            42     11    86   M       53      115  110  103
            49     12   106   M       49       97   87   93
            35     13    97   M       61       87   82   84
             2     17    80   M       66      110  108  100
            61     18    84   M       38       89   81   80
            14     18    91   M       51      100   95   95
            47     19    77   M       60       77   85   80
            80     20    54   M       63      109   97   80
            81     20    77   M       46      101  103   93
            24     20    91   M       64      111   97  102
            54     21   106   M       39      129  112  122
            23     22    71   F       53       99  104  103
            36     22    88   F       40      110  116  114
            21     22    92   M       64       94   78   73
            26     23    90   F       58        —   85   84
            72     24    40   M       75       92   97   73
            46     25    46   F       61        —   85   99
            38     25    85   M       45       98   98   92
            78     27    60   M       61      112  115   88
            73     28    77   F       45      119  107   92
            83     29    73   M       47      103  107   96
            19     30    80   M       44       93   90   69
            12     32    78   M       48       95   92   86
            18     34    95   F       44      106   86   92
            44     38    45   M       72       93   88   87
            15     41   107   M       39      126   96  100
            27     42    49   M       67       78   58   70
            29     42    78   M       40      102  104  100
            74     45    78   M       47       77   88   74
            63     46    70   M       56       63   73   67
            62     50    78   M       45      118  107  110
            10     52    80   M       64       63   67   63
            48     54    82   M       56       92   91   78
            60     55    78   M       57      102   83   78
            52     57    73   M       56       95   95   80
            11     62    69   M       73       70   69   59
            76     62    76   M       45       91   90   83
             3     63    83   F       39      109  114  112
            34     82    81   M       36       92   92  102
             1     82    93   M       39       97   94   81
            58     83    62   F       54      112  105   93
            32     85    82   F       53       77    —    —
             4     95    60   M       71      103   78   86
            71    101    56   F       47      103  104   86
            20    103    81   M       36       94   85   86
            31    106    71   M       39      108   96  100
             7    109    84   M       46       77    —    —
            13    113    60   M       45      110  113  119
            51    120    49   M       61       88   81   70
            37    134    69   M       43       85   86   82
            79    205    40   M       56       83   76   82
Mild CAS    66     54    56   M       56      119   99  123
            57     74    55   F       63       99   94   85
            16     99    47   M       71       86   69   65
             9    112    50   M       68       90   84   70
            30    123    62   F       45      123   90   92
            28    145    40   F       69       84   74   75
            67    159    65   M       44       83   73   72
            56    159    81   M       37       90    —    —
Severe CAS  41     46    46   M       69      102  103   99
            25     73    58   M       48       85   76   86
            70    161   <40   M       71       80    —   64
            82    212    71   M       53      106   89   74
            39    232    78   M       38       90   90   80
            40    237    40   M       74       88   92    —
            64    240    40   M       70       75   59   64
            22    260    52   F       45      101   87   79
            55    270    40   F       79       68    —    —
            77    328    60   M       84       64   50   50
            53    337    59   M       37       98   86   90
            33    425    40   M       43       95   84   57

Note. Em dashes indicate that child attention or lack of time prevented completion of the entire test and of scoring. ID = identification; GFTA = standard score on the Goldman Fristoe Test of Articulation—Second Edition; PPVT = Peabody Picture Vocabulary Test; RLS = Receptive Language standard score on the Oral and Written Language Scales or the Preschool Language Scales; ELS = Expressive Language standard score on the Oral and Written Language Scales or the Preschool Language Scales; CAS = childhood apraxia of speech; M = male; F = female.

of the four assessing clinicians. Two clinicians (fourth and fifth authors) underwent training on the test. The test's developer (first author) spent one session with them, explaining and demonstrating the administration and scoring. The clinicians were also given a sheet of scoring instructions. The first author then co-administered or observed video recordings of at least five of each of their administrations prior to beginning the study. Feedback was provided on both methods of elicitation and scoring. For those children assessed by the fourth assessing clinician, the DEMSS was administered by the first author, the fourth author, or the fifth author immediately following their evaluation.

Given that the study was conducted in the context of standard clinical practice at the Mayo Clinic, three of the four clinicians who conducted the clinical exam and speech diagnosis also administered the DEMSS as part of the assessment protocol. In order to reduce possible bias, the subscores (overall articulatory accuracy, vowel accuracy, prosodic accuracy, and consistency) and total scores on the DEMSS were not calculated prior to dictating the report, which was done soon after the assessment session. These reports included a statement of differential diagnosis based on the examiner's clinical observations and examination of language and articulation test results. Although observations
0.5, 0.6, 1.1} and for Participant B were {0.7, 0.2, 0.5, 0.0}, the dissimilarity between the two participants would be calculated as the square root of (0.3 − 0.7)² + (0.5 − 0.2)² + (0.6 − 0.5)² + (1.1 − 0.0)². The dissimilarity between two clusters is defined as the mean of all pairwise dissimilarities between participants. The cluster analysis is summarized using a dendrogram, which graphically represents a series of nested clusters in such a way that the members of each cluster are individually identified and the distance or dissimilarity between clusters is indicated by the vertical axis.

To supplement the cluster approach, we used the gap statistic to assess the number of clusters supported by the data (Tibshirani, Walther, & Hastie, 2001). The gap statistic is based on summing across clusters the within-cluster sum of squared errors (SSE). Additional clusters will always reduce the total SSE, but the gap statistic can be used to determine the point at which increasing the number of clusters no longer reduces the total SSE beyond what might be expected by chance. Thus, the gap statistic is not a direct hypothesis test of the number of clusters in a data set but instead helps examine the efficiency (parsimoniousness) of a particular solution.

To evaluate the clustering in the context of other measurements, we compared scores across the clusters for the following variables: gender, age, GFTA, PPVT, receptive language standard score (RLS), and expressive language standard score (ELS). Gender was evaluated with a χ² test, whereas the others were evaluated with a nonparametric Kruskal–Wallis test due to skewness in some measures.

Discrimination Methods

To minimize the potential confound of diagnosis and DEMSS administration having been completed by the same individuals, we structured the study so that the clinician's diagnostic activities and calculation of DEMSS scores were separate tasks. Weeks after all of the children had completed the protocol and after cluster data analyses were completed, the experimenters went back to the children's medical records. At that time, we categorized children on the basis of whether their medical record indicated a diagnosis of (a) CAS (characteristics of apraxia of speech as the major contributor to their SSD), (b) mCAS (phonologic impairment and/or residual articulation errors, with a contribution of at least mild difficulty with speech praxis), or (c) any other SSD. We then compared the diagnostic categorization with the results of the DEMSS cluster analysis.

We used several methods to evaluate the ability of the DEMSS total score and the four subscores to discriminate between participants with CAS (i.e., those with CAS or mCAS diagnoses) and those without CAS. Unlike the cluster analysis described above, these methods explicitly used the participants' clinically defined CAS status, which was taken as the reference standard in this study. When a test is used for diagnosis, sensitivity refers to the extent to which the test correctly identifies those with the disorder as having the disorder, whereas specificity refers to the extent to which it correctly identifies those without the disorder as not having the disorder (Dollaghan, 2007). The cut point of a test or subtest is the score that separates scores considered indicative of disorder from those that are considered indicative of no disorder. The cut points used in this study were empirically derived from univariate binary logistic regression models. The cut points were defined as the smallest value of the variable (total score or subscore) such that the estimated probability of having CAS (viz., having a clinical diagnosis of CAS or mCAS) is greater than 50%.

Because sensitivity and specificity are both affected by the sample base rate (i.e., the likelihood that affected individuals will be included in the sample), we also calculated two other measures of accuracy that are less likely to be affected by sample base rates (Dollaghan, 2007). This is important in this study because there is almost certainly a referral bias at the Mayo Clinic, a tertiary medical center to which many families come for a second opinion about possible CAS. The additional accuracy measures were positive and negative likelihood ratios: LR+ and LR−, respectively. LR+ indicates how much more likely a positive test result is for someone with the disorder than for someone without the disorder. LR−, on the other hand, indicates how much more likely a negative test result is for someone with the disorder than for someone without the disorder. For diagnostic tests, Dollaghan (2007) recommended LR+ exceeding 10 and LR− being less than 0.10.

As a final method for examining the discrimination ability of the DEMSS, we used a nonparametric estimate of the area under the receiver operating characteristic (AUROC) curve and 95% bootstrap confidence intervals to quantify discrimination. The AUROC can be thought of as the average sensitivity of a measurement as well as an estimate of the probability of correctly classifying two children, one with CAS and one without CAS, based only on the DEMSS. We compared the discriminative ability of the DEMSS total score and subscores by using the method described by DeLong, DeLong, and Clarke-Pearson (1988).

Results

Reliability

Results for the reliability analyses are reported in order for test–retest reliability, intrajudge reliability, and interjudge reliability. For each type of reliability, we report the mean percentage of agreement for the 171 judgments as well as the ICC. Figure 1 illustrates the observed agreement for the three reliability conditions.

Test–retest reliability. For test–retest reliability, the mean percentage of agreements was 89%, with a range from 77% to 99%. The data for these calculations were obtained from the 11 participants for whom test–retest reliability data were obtained and consisted of mean percentage of agreement across the 171 judgments per participant at Time 1 and Time 2. The test–retest column of Table 4 shows the ICCs for the total DEMSS score and for the subscores. These ranged from 0.82 for the consistency subscore to > 0.99 for the total DEMSS score.

Intrajudge reliability. The mean intrajudge agreement of the 171 judgments was 89%, with a range from 81% to
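The dissimilarity measures described in the cluster analysis section above (Euclidean distance between subscore profiles, and average linkage between clusters) can be sketched in Python. This is an illustration using the example profiles from the text, not the authors' code.

```python
import math

def dissimilarity(p, q):
    # Euclidean distance between two participants' subscore profiles
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def cluster_dissimilarity(c1, c2):
    # Average linkage: the mean of all pairwise participant dissimilarities
    pairs = [(p, q) for p in c1 for q in c2]
    return sum(dissimilarity(p, q) for p, q in pairs) / len(pairs)

# Example profiles from the text: Participant A = {0.3, 0.5, 0.6, 1.1},
# Participant B = {0.7, 0.2, 0.5, 0.0}
a = [0.3, 0.5, 0.6, 1.1]
b = [0.7, 0.2, 0.5, 0.0]
d = dissimilarity(a, b)  # sqrt(0.4**2 + 0.3**2 + 0.1**2 + 1.1**2), about 1.21
```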
Table 4. Intraclass correlation coefficient estimate (95% confidence interval [CI]) for three components of the reliability analysis.

Variable                                  Test–retest        Intrajudge        Interjudge
Total DEMSS score                       1.00 [.99, 1.00]   .92 [.77, .98]   .98 [.95, .99]
Vowel accuracy subscore                  .99 [.98, 1.00]   .89 [.67, .97]   .94 [.87, .98]
Overall articulatory accuracy subscore   .99 [.98, 1.00]   .95 [.84, .98]   .98 [.96, .99]
Prosodic accuracy subscore               .99 [.96, 1.00]   .31 [–.27, .73]  .91 [.78, .96]
Consistency subscore                     .82 [.48, .95]    .38 [–.20, .77]  .73 [.44, .88]

Validity

The cluster analysis using four DEMSS subscores (overall articulatory accuracy of the word, vowel accuracy, prosody, and consistency) is summarized in the dendrogram in Figure 2. Although the clustering algorithm does not take diagnosis into account, children who had been diagnosed with CAS or mCAS in their initial clinical evaluation appear in the figure as boxes and triangles, respectively, to identify to what degree children with this diagnosis are clustered together. The vertical axis of the dendrogram represents the dissimilarity between clusters. A horizontal line drawn at any point on the dendrogram will divide the participants into clusters, that is, groups of participants whose profiles on the four subscores (rather than the DEMSS total scores) are similar. Key clusters are labeled to guide the reader.

This dendrogram illustrates that there appear to be three major clusters of children. Cluster A consists of 15 children: three who had been diagnosed with CAS, five who had been diagnosed with mild CAS, and seven who had been diagnosed with some other SSD. Cluster B is composed of seven participants, who had been diagnosed with CAS. As might be expected, there is considerable variability within the cluster (as indicated by the horizontal distances among individuals), even though the algorithm clearly distinguishes this group from the rest of the children. Cluster C includes all other participants, including two children who had been clinically diagnosed with CAS and three children who had been diagnosed with at least mild difficulty with praxis for speech movement (mCAS).

Although not designed for use in the framework of hypothesis testing, the gap statistic provides a heuristic for determining the number of clusters in a data set. The gap statistics for the first five clusters were, in order, 0.42, 0.99, 1.02, 0.88, and 0.82. This sequence shows strong evidence against the single-cluster solution in that there is a marked increase in the gap statistic for the two-cluster solution, corresponding to an appreciable reduction in the within-cluster sum of squares. The gap statistic shows the three-cluster solution to be optimal. Given that there is only a marginal increase from two to three clusters, the two-cluster solution is the more parsimonious of the two. In this sense, we have found the gap statistic to be generally consistent with the above interpretation of the dendrogram.

Using Clusters A, B, and C identified in the dendrogram, we found no differences across the clusters in gender (p = .81) or age (p = .32). Presenting data in the order Cluster A, B, and C, respectively, we did observe significant differences in scores on the GFTA (Mdn = 65, 60, and 78; p = .03), PPVT (Mdn = 86, 96, and 102; p < .01), RLS (Mdn = 86, 88, and 95; p < .01), and ELS (Mdn = 81, 77, and 89; p = .03).

Table 5 provides measures of how well the DEMSS total score and subscores discriminate among children with and without CAS, regardless of severity. AUROC values
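The gap-statistic heuristic described above can be sketched in Python. This is a simplified one-dimensional illustration (toy data, a tiny k-means, uniform reference samples), not the authors' implementation, which followed Tibshirani et al. (2001).

```python
import math
import random

def sse(clusters):
    # Within-cluster sum of squared errors, summed across clusters
    total = 0.0
    for c in clusters:
        mean = sum(c) / len(c)
        total += sum((x - mean) ** 2 for x in c)
    return total

def kmeans_1d(xs, k, iters=25):
    # Minimal 1-D k-means with deterministic quantile seeding
    xs = sorted(xs)
    if k == 1:
        return [xs]
    cents = [xs[round(i * (len(xs) - 1) / (k - 1))] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - cents[i]))
            clusters[nearest].append(x)
        cents = [sum(c) / len(c) if c else cents[i]
                 for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def gap_statistic(xs, k, n_ref=20, seed=1):
    # gap(k) = mean log(SSE) over uniform reference samples - log(SSE) of data;
    # extra clusters always shrink SSE, so the gap asks whether the reduction
    # exceeds what uniform (structureless) data would give by chance.
    rng = random.Random(seed)
    log_w = math.log(sse(kmeans_1d(xs, k)))
    lo, hi = min(xs), max(xs)
    ref_logs = []
    for _ in range(n_ref):
        sample = [rng.uniform(lo, hi) for _ in xs]
        ref_logs.append(math.log(sse(kmeans_1d(sample, k))))
    return sum(ref_logs) / n_ref - log_w

# Two well-separated groups of toy scores: the gap statistic favors k = 2
scores = [0.0, 1.0, 2.0, 3.0, 20.0, 21.0, 22.0, 23.0]
```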
Figure 2. Dendrogram indicating the sequence of cluster merges beginning with each participant in his or her own cluster at bottom and ending
with all participants merged into a single cluster at top. The vertical axis represents the dissimilarity between merging clusters. A horizontal line
drawn at any point will divide the participants into clusters. We have indicated the point at which there are three clusters and have labeled them
A, B, and C. Although clinical status was not used in the cluster analysis, we have indicated participants with CAS and mCAS, and we have
identified them by their study identifier.
Note (Table 5). AUROC = area under the receiver operating characteristic curve; LR+ = likelihood ratio for a positive test result; LR− = likelihood ratio for a negative test result.
a Cutoff is defined as the minimum value such that the estimated probability of CAS from the logistic regression model is greater than 0.5.
b Based on the indicated logistic regression cutoff.
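The accuracy measures defined in the note above (sensitivity, specificity, LR+, LR−) and a nonparametric AUROC can be sketched in Python. The labels, scores, and cutoff below are hypothetical toy values, and edge cases (e.g., perfect specificity, where LR+ is undefined) are ignored.

```python
def accuracy_measures(labels, flagged):
    # labels: True if the child has CAS/mCAS by the reference standard;
    # flagged: True if the score falls at or above the cutoff.
    tp = sum(l and f for l, f in zip(labels, flagged))
    fn = sum(l and not f for l, f in zip(labels, flagged))
    fp = sum(f and not l for l, f in zip(labels, flagged))
    tn = sum(not l and not f for l, f in zip(labels, flagged))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "LR+": sens / (1 - spec),  # assumes spec < 1
        "LR-": (1 - sens) / spec,  # assumes spec > 0
    }

def auroc(labels, scores):
    # Nonparametric AUROC: probability that a randomly chosen affected
    # child scores higher (poorer) than an unaffected child; ties count 1/2.
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 4 affected and 4 unaffected children, cutoff = 4
labels = [True, True, True, True, False, False, False, False]
scores = [9, 8, 7, 3, 4, 2, 1, 0]
flagged = [s >= 4 for s in scores]
```

With these values, sensitivity and specificity are both .75, LR+ = 3.0, LR− is about 0.33, and the AUROC is .9375.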
scores across children were primarily due to differences among children rather than other factors. In the intrajudge context only, prosody and consistency subscores were less reliable. The poorer performance of these two subscores, however, did not undermine the value of the DEMSS total score in discriminating groups of children. The smaller number of utterances judged for prosody (n = 21) and consistency (n = 28) than for either vowels (n = 56) or overall articulatory accuracy subscores (n = 66) likely contributed to the poorer ICCs. Also, the clinicians anecdotally reported that these judgments were among the most difficult they were required to make, consistent with the findings of Munson, Bjorum, and Windsor (2003), suggesting the need for ample practice in judging stress. We plan to provide numerous examples of judgments of stress when we create a training tape for test administration.

Validity

The DEMSS was designed to help clinicians identify the subset of children with severe SSDs who exhibit difficulty in motor skill, particularly motor programming difficulty. The label CAS is most frequently used for these children, who seem to have problems executing accurate movement gestures to reach specific spatial and temporal targets for speech (in the absence of weakness or other neuromuscular deficits). The ability of the DEMSS to distinguish children with characteristics associated with CAS versus those without, within a population of children with SSDs, represents a vital contribution to evidence of its validity.

In this study, agglomerative hierarchical cluster analysis was selected as the preferred method to examine the validity of the DEMSS in part because it could provide evidence that the test differentiated subgroups of children based on their patterns of performance in motor speech skill. The content of the DEMSS was designed not only to include information about children's articulatory accuracy but also to focus on vowel accuracy, prosody, and consistency of error productions on repeated attempts, which have been posited as important behavioral markers for the label of CAS. The three clusters identified using the DEMSS closely resembled groups that were based solely on a clinical diagnosis (CAS, mCAS, and other SSD), which had been made based on an earlier and more comprehensive set of test observations. Although the DEMSS was part of that test protocol, the scores were not tallied until weeks later, and they were not used in establishing the clinical diagnosis. Although this represents a limitation of this study, it is consistent with the incomplete nature of any examination of validity.

The variability among the seven children in one cluster (Cluster B, all of whom had been diagnosed with CAS) illustrates the expected heterogeneity of behaviors observed within the diagnosis. For the children participating in this study, the total DEMSS score as well as the overall articulatory accuracy, vowel, and consistency subscores were all effective in distinguishing among groups of children within a larger group with SSDs. Although the findings suggest that any of those discriminating subscores might be used as a substitute for the whole test in differential diagnosis, the data also suggest that the total DEMSS is the best discriminator.

When logistic regression modeling was used to determine optimal cut points on the DEMSS, measures of sensitivity and specificity as well as likelihood ratios showed that the test did not overidentify CAS. However, a few children who had been given a diagnosis of CAS (n = 2) or mCAS (n = 3) based on the comprehensive speech evaluation were not successfully identified by the DEMSS. This is probably because the DEMSS had been designed specifically to address the significant challenges posed by children who are younger and/or exhibit more severe speech impairments. Thus, some children who were less severely affected probably performed well on the simpler stimuli incorporated in the DEMSS but were given the diagnosis of CAS or mCAS because of their performance on more phonetically complex or multisyllabic stimuli used during other testing conducted as part of the entire evaluation. For example, participants 41 and 25 were not identified by the DEMSS as having CAS, although the clinical examination showed significant difficulty with praxis. They exhibited numerous vowel distortions in connected speech. Direct imitation using visual attention to the clinician's face facilitated their performance on this isolated word task. Groping and voicing errors (frequently noted in children with CAS) were also noted in more difficult tasks but are not used in the scoring of the DEMSS. This likely contributed to the DEMSS failing to identify them. This issue points to the likelihood that no one task or test is sufficient for differential diagnosis. This finding was not unexpected and emphasizes the fact that the DEMSS is best used as part of a test battery for children who are younger and/or demonstrate more severely impaired speech acquisition. Until an alternate version of the DEMSS is designed for older children with more speech output, existing measures may be more appropriate for children with those skills (e.g., Rvachew et al., 2005; Thoonen, Maassen, Gabreels, & Schreuder, 1999; Thoonen, Maassen, Wit, Gabreels, & Schreuder, 1996).

This study showed that the DEMSS could identify groups with similar profiles of motor speech characteristics (movement accuracy, vowel accuracy, consistency, and prosody). We have been using the term CAS given the literature suggesting these characteristics as being supportive of the label for this subgroup of children with speech sound disorders. Although theories of CAS accounting for the pathophysiology of the disorder are in early stages of development (e.g., ASHA, 2007; Caruso & Strand, 1999; Nijland, Maassen, & van der Meulen, 2003; Shriberg, Lohmeier, Strand, & Jakielski, 2012; Terband & Maassen, 2010; Terband, Maassen, Guenther, & Brumberg, 2009), there is now fairly good agreement that these characteristics should be present to warrant the use of the diagnostic label. Although a larger number of characteristics may also be present (e.g., sound omissions, substitution errors, reduced phonemic and phonetic inventories), they are less likely to be discriminative than movement accuracy, vowel accuracy, and prosodic accuracy. There are reasons to expect these
the validity of these variables as part of a behavioral phenotype for CAS.

Although the development of theories of CAS and explication of its nature and most consistently discriminative signs continue, it is important to be able to identify children who may benefit most from treatment approaches and strategies that directly facilitate motor learning and speech motor control. The ASHA (2007) position statement on CAS was a start toward more consistent participant description so that researchers and consumers of research could be more confident about the similarity of children studied and treated under the label CAS. The reliability and validity data for the DEMSS provided in this study suggest that this test has the potential to further advance that clinical and research goal.

Acknowledgments

The first author acknowledges the support of Mayo Clinic CTSA through National Center for Research Resources Grant UL1RR024150. We also thank Heather Clark and David Ridge for reading and commenting on earlier versions of this article.

References

American Speech-Language-Hearing Association. (2007). Childhood apraxia of speech: Nomenclature, definition, roles and responsibilities, and a call for action [Position statement]. Rockville, MD: Author.
Arndt, W. B., Shelton, R. L., Johnson, A. F., & Furr, M. L. (1977). Identification and description of homogeneous subgroups within a sample of misarticulating children. Journal of Speech and Hearing Research, 20, 263–292.
Bain, B. A. (1994). A framework for dynamic assessment in phonology: Stimulability revisited. Clinics in Communication Disorders, 4, 12–22.
Ball, L. J., Bernthal, J. E., & Beukelman, D. R. (2002). Profiling communication characteristics of children with developmental apraxia of speech. Journal of Medical Speech-Language Pathology, 10, 221–229.
Campbell, T. (2003). Childhood apraxia of speech: Clinical symptoms and speech characteristics. In L. D. Shriberg & T. F. Campbell (Eds.), Proceedings of the 2002 Childhood Apraxia of Speech Symposium (pp. 37–40). Carlsbad, CA: The Hendrix Foundation.
Carrow-Woolfolk, E. (1997). Oral & Written Language Scales. Circle Pines, MN: AGS.
Caruso, A., & Strand, E. (1999). Motor speech disorders in children: Definitions, background and a theoretical framework. In A. Caruso & E. A. Strand (Eds.), Clinical management of motor speech disorders in children (pp. 1–27). New York, NY: Thieme.
Conti-Ramsden, G., & Botting, N. (1999). Classification of children with specific language impairment: Longitudinal considerations. Journal of Speech, Language, and Hearing Research, 42, 1195–1204.
Crary, M. A. (1993). Developmental motor speech disorders. San Diego, CA: Singular.
Davis, B. L., Jakielski, K. J., & Marquardt, T. P. (1998). Developmental apraxia of speech: Determiners of differential diagnosis. Clinical Linguistics & Phonetics, 12, 25–45.
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics, 44, 837–845.
Dollaghan, C. (2007). The handbook for evidence-based practice in communication disorders. Baltimore, MD: Brookes.
Downing, S. M., & Haladyna, T. M. (2006). Handbook of test development. Mahwah, NJ: Erlbaum.
Duffy, J. (2005). Motor speech disorders: Substrates, differential diagnosis and management (2nd ed.). St. Louis, MO: Elsevier Mosby.
Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test—III. Circle Pines, MN: AGS.
Glaspey, A., & Stoel-Gammon, C. (2007). A dynamic approach to phonological assessment. Advances in Speech Language Pathology, 9, 286–296.
Goffman, L. (2004). Assessment and classification: An integrative model of language and motor contributions to phonological development. In A. Kamhi & K. Pollock (Eds.), Phonological disorders in children: Clinical decision making in assessment and intervention (pp. 51–64). Baltimore, MD: Brookes.
Goldman, R., & Fristoe, M. (2000). Goldman Fristoe Test of Articulation—Second Edition. Circle Pines, MN: AGS.
Guyette, T. W. (2001). Review of the Apraxia profile. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 57–58). Lincoln, NE: Buros Institute of Mental Measurements.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2003). The elements of statistical learning. New York, NY: Springer-Verlag.
Hayden, D., & Square, P. (1999). Verbal Motor Production Assessment for Children. San Antonio, TX: Pearson, PsychCorp.
Johnson, A. F., Shelton, R. L., & Arndt, W. B. (1982). A technique for identifying the subgroup membership of certain misarticulating children. Journal of Speech and Hearing Research, 25, 162–166.
Kehoe, M., Stoel-Gammon, C., & Buder, E. H. (1995). Acoustic correlates of stress in young children's speech. Journal of Speech and Hearing Research, 38, 338–350.
Kent, R. D. (2004). Models of speech motor control: Implications from recent developments in neurophysiological and neurobehavioral science. In B. Maassen, R. D. Kent, H. F. M. Peters, P. H. M. van Lieshout, & W. Hulstijn (Eds.), Speech motor control in normal and disordered speech (pp. 3–28). London, England: Oxford University Press.
Kent, R. D., Kent, J., & Rosenbek, J. (1987). Maximal performance tests of speech production. Journal of Speech and Hearing Disorders, 52, 367–387.
Lidz, C. S., & Peña, E. D. (1996). Dynamic assessment: The model, its relevance as a non-biased approach and its application to Latino American preschool children. Language, Speech, and Hearing Services in Schools, 27, 367–372.
McCauley, R. J. (2001). Assessment of language disorders in children. Mahwah, NJ: Erlbaum.
McCauley, R. J. (2003). Review of Screening Test for Developmental Apraxia of Speech (2nd ed.). In B. Plake, J. C. Impara, & R. A. Spies (Eds.), The fifteenth mental measurements yearbook (pp. 786–789). Austin, TX: Pro-Ed.
McCauley, R. J., & Strand, E. A. (2008). A review of standardized tests of nonverbal oral and speech motor performance in children. American Journal of Speech-Language Pathology, 17, 1–11.
McLeod, S., Harrison, L. J., & McCormack, J. (2012). The intelligibility in context scale: Validity and reliability of a subjective rating measure. Journal of Speech, Language, and Hearing Research, 55, 648–656.
Appendix
DEMSS Content Scoring
General Instructions
• Make sure the child's face is in good view on the camera. Check the audio levels. Make sure there is little background noise.
• Draw the child's attention to your face for every item.
• Use a moderate to slightly slow (but natural) rate and natural prosody.
• Use quick reinforcers that keep the child's attention on your face.
• Score online, so that you will be sure to do the appropriate cuing for the third and fourth attempts. Then, if necessary, go back and score using videotape.
• Give up to four trials.
• Use an X only if there was no attempt.
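The trial-and-cue logic implied by the instructions above (up to four trials, cuing on the third and fourth attempts, X for no attempt) can be sketched as follows. The returned fields and cue placement are illustrative assumptions, not the DEMSS scoring rubric.

```python
# Sketch of the dynamic administration rule: up to four imitation
# trials, with cuing assumed to begin on the third trial, and "X"
# recorded when the child makes no attempt at all. (Assumed logic,
# for illustration; not the actual DEMSS scoring values.)
def administer_item(attempts):
    """attempts: booleans per trial (True = accurate imitation);
    an empty list means the child made no attempt."""
    if not attempts:
        return "X"  # no attempt: record an X per the instructions
    for trial, accurate in enumerate(attempts[:4], start=1):
        cued = trial >= 3  # cues added on the third and fourth trials
        if accurate:
            return {"trial": trial, "cued": cued}
    return {"trial": None, "cued": True}  # not accurate within four trials

print(administer_item([]))                    # prints: X
print(administer_item([False, False, True]))  # prints: {'trial': 3, 'cued': True}
```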
Rules for Dynamic Administration and Scoring