A Validated Facial Grading Scale: The Future of Facial Ageing Measurement Tools?
To cite this article: Alastair Carruthers & Jean Carruthers (2010) A validated facial grading scale:
The future of facial ageing measurement tools?, Journal of Cosmetic and Laser Therapy, 12:5,
235-241, DOI: 10.3109/14764172.2010.514920
REVIEW ARTICLE
Abstract
Validated, standardized scales for measurement of ageing and response to cosmetic procedures are lacking. Numerous scales
have been published and, while they allow intra-study comparisons, almost all remain non-validated, and their heterogeneity
makes comparisons across studies impossible. A set of validated, objective, quantitative scales has been developed in
association with experts from different specialties within aesthetic medicine. These scales allow evaluation of the key signs
of ageing that cause individuals to seek cosmetic procedures. Each scale is a five-point photonumeric scale based on
computer-simulated photographs incorporating each aspect to be evaluated in a stepwise manner. Validation studies show
that these scales have good intra- and inter-rater reliability, with intra-class correlation coefficients generally in the range
0.85–0.95, and similar test–retest correlations.
Key Words: aesthetic medicine, facial wrinkles, rating scales, skin ageing, validation
Introduction
In recent years, the use of botulinum neurotoxins (BoNTs) and facial fillers to improve the appearance of facial wrinkles has become common. Indeed, since 1997, the number of cosmetic procedures performed in the USA has increased by 446%, and data from 2005 showed that 66% of all cosmetic procedures performed were non-surgical (1). In general, patients present for such treatment, and return for retreatment, because they notice signs of ageing, which includes both intrinsic ageing and photo-ageing. The increased interest in non-surgical cosmetic procedures has necessitated the development of scales to measure the degree of ageing and the severity of facial wrinkles. Scales are also necessary to evaluate the level of improvement resulting from cosmetic procedures. In the past, the scales used by clinicians have often been non-validated, and a wide range of such scales exist. Additional scales were developed by pharmaceutical companies during the regulatory approval processes for cosmetic products, but these scales are proprietary and not available for general use. Thus, there is a need for properly validated scales that are freely available to clinicians. Today, such a set of scales exists, which can be used by all to measure the ageing process itself and the effects of treatments designed to improve the signs of ageing, and can also be used as part of the regulatory approval process for new products.
Validated scales are also important for assessing patient-reported outcomes, such as quality of life and satisfaction with treatment. These are more subjective measures but are important for assessing the success of cosmetic procedures. However, objective tools, which are more reliable and not influenced by the subjective assessment of the patient, have to be considered superior. Importantly, recent systematic literature reviews have demonstrated the heterogeneous nature of methods used to rate patient satisfaction, and highlight the need for validated reliable measures for reporting it (2,3). For example, a systematic review by Kosowski et al. identified 47 patient-reported outcome measures assessing facial appearance after a cosmetic procedure, from 442 articles (3). None of the measures identified satisfied all relevant international guidelines for the development and validation of health outcomes questionnaires (3). Similarly, Fagien and Carruthers identified 17 separate outcome measures in 23 studies of treatment with BoNT (2). Most of these were based on Likert-type scales, ranging from three-point to 11-point scales, and many focused on the glabellar area.
Correspondence: Alastair Carruthers, Suite 820 - 943 West Broadway, Vancouver, BC, Canada V5Z 4E1. Fax: 1 604 714 0223. E-mail: alastair@carruthers.net
validation process, and many are subjective in their assessment of wrinkle severity (2). In addition, analysis using some instruments requires equipment that may not be widely available. For example, some instruments rely on skin replicas using silicone rubber or similar materials (12,13), while others use optical systems to evaluate the three-dimensional topography and morphology of the skin (14,15). Importantly, there is currently no standard approach to the assessment of wrinkles, making direct comparisons between studies impossible (3).
A set of new, validated facial grading scales
Clearly, there is a need for a validated, objective, quantitative rating scale for evaluating the aesthetic signs of ageing. Furthermore, the ideal scale would allow the response of wrinkles to cosmetic treatment to be monitored. Recently, a set of validated grading scales has been published for brow positioning (19), forehead lines (20), melomental folds (marionette lines) (21) and crow’s feet (22). The scales are designed to be used in everyday clinical practice, and could also be used in clinical trials to assess the outcomes of treatment with BoNT or facial fillers.
Each scale was developed as a five-point scale using computer-simulated photography. Specific anatomical changes resulting from ageing were identified in consultation with a clinician, and were incorporated into photographs to create five representative images, with the aspect under consideration showing a stepwise variation (Figures 1–4). When ‘lines’ were formed as a result of movement of different muscle groups (e.g. crow’s feet and forehead lines), static and dynamic pictures were included. Approximately 50 images (per scale validation set) were selected from a database of photographs from 100 individuals, based on quality and equal distribution across each representative scale. Using a standardized computer randomization program, 35 images per target area or validation set were randomly selected from the 50 for final inclusion in the pool.
The scales were assessed and validated at an international meeting of physicians involved in aesthetic medicine, representing a number of specialties, including dermatology, ophthalmology, plastic surgery and dermatologic surgery (1). As part of the development process, the expert panel, all of whom were formally trained in using the scales, discussed each scale and graded it twice, with an overnight break between the two ratings. Intra- and inter-rater variability analysis was performed for each scale.
The Brow Positioning Grading Scale was developed to provide objective quantification of the severity of eyebrow malposition (19). The scale ranges from 0 (youthful, refreshed look and high-arch eyebrow) to 4 (flat eyebrow with barely any arch, marked visibility of folds and very tired appearance) (Figure 1). Intra-class correlation coefficients (ICCs) for the two evaluations of the scale were 0.697 and 0.660, suggesting an acceptable level of agreement between the experts. Test–retest correlation coefficients ranged from 0.678 to 0.912.
The Forehead Lines Grading Scale was developed to objectively quantify resting (static) and hyperkinetic (dynamic) forehead lines, and ranges from 0 (no wrinkles) to 4 (deeper wrinkles at rest and deeper furrows with facial expression) (Figure 2) (20). ICCs were calculated for the first and second ratings, respectively, for static (0.846 and 0.863) and dynamic (0.852 and 0.892) forehead lines, with a high level of agreement between the experts. The test–retest correlation coefficients (static forehead lines, 0.846–0.942; dynamic forehead lines, 0.859–0.941) were also high for each expert.
The Marionette Lines Grading Scale ranges from 0 (no visible fold, continuous skin line) to 4 (extremely long and deep folds detrimental to facial appearance) (Figure 3) (21). As with the Forehead Lines Grading Scale, agreement between the experts was high, with ICCs of 0.873 and 0.891 after the first and second grading, respectively. Intra-rater test–retest correlations were also high (0.845–0.966).
The Crow’s Feet Grading Scale was developed to quantify the severity of lateral canthal lines at rest (static) and at maximum contracture of the orbicularis oculi muscle (dynamic) (22). The scale grades wrinkles from 0 (none) to 4 (severe) (Figure 4). Based on the ICCs, there was considerable agreement between the experts. For the static scale, the ICCs were 0.893 and 0.882 for the first and second ratings, respectively, while those for the dynamic scale were 0.879 and 0.892. Intra-rater test–retest correlation coefficients were 0.904–0.968 for the static scale and 0.888–0.951 for the dynamic scale.
Advantages of this approach
This approach to creating a series of grading scales to evaluate various aesthetic parameters has numerous advantages. Importantly, the scales were validated
Figure 1. Reference images for the five-point Brow Positioning Grading Scale (19). (Reproduced with permission from Dermatologic
Surgery, Wiley-Blackwell, and Merz Pharmaceuticals GmbH.)
Figure 2. Reference images for the five-point Forehead Lines Grading Scale: (A) static grading scale; (B) dynamic grading scale (20).
(Reproduced with permission from Dermatologic Surgery, Wiley-Blackwell, and Merz Pharmaceuticals GmbH.)
by a team of international physicians representing a cross-section of age, experience and sex. As noted above, all were experts practicing aesthetic medicine, trained in different disciplines. This was essential to bring in different perspectives and expertise in assessing beauty during the ageing process (1). Each instrument was designed as a five-point scale that, unlike the common four-point scales used in this area, includes a mid-point. The grading of a continuous process such as ageing is facilitated if there are clearly identified centre and end points to the scale (1).
Development of the scales was based on computer-simulated photography using similar methodology for each instrument. For each scale, anatomical features specific to the aspect being evaluated were incorporated into the photographs in a stepwise manner. Importantly, for those aspects where lines are formed as a result of movement of different muscle groups (such as crow’s feet and forehead lines), static and dynamic pictures were included (1). This is especially important for assessing the efficacy of botulinum toxins, which affect muscle activity, while validation of scales when the muscles are at rest is more important for facial fillers.
During the validation process, each of the nine participants graded each scale twice, with an overnight break between the two ratings. This allowed intra- and inter-rater variability to be evaluated, with the former depicted visually as bubble graphs (Figure 5). These provide an informative method of comparing grading between the different raters (1). For forehead lines, test–retest reliability coefficients were high, indicating sufficiently high stability of the ratings after an overnight interval (20). For the dynamic rating of forehead maximum lift, scores of 3 (fine wrinkles present at rest and deeper lines with facial expression) and 4 (deeper wrinkles at rest and deeper furrows with facial expression) showed a
Figure 3. Reference images for the five-point Marionette Lines Grading Scale (21). (Reproduced with permission from Dermatologic
Surgery, Wiley-Blackwell, and Merz Pharmaceuticals GmbH.)
Figure 4. Reference images for the five-point Crow’s Feet Grading Scale: (A) static grading scale; (B) dynamic grading scale (22).
(Reproduced with permission from Dermatologic Surgery, Wiley-Blackwell, and Merz Pharmaceuticals GmbH.)
large cluster and therefore a good result in the correlation coefficients (Figure 5). For marionette lines (21) and crow’s feet (22), test–retest reliability coefficients were also high, again indicating good stability of the ratings after the overnight break. Regarding inter-rater variability, ICCs were found to be high for forehead lines, crow’s feet and marionette lines, and moderately high for brow positioning (19–22).
Possible limitations
Computer-processed, morphed images, although standardized to a site-specific area, do not translate clinically to the multiple physical changes that occur in an ageing face (20–22). The photographic evaluation and live patient evaluation are distinct, and thus evaluations using the scale in the context of a clinical trial should be performed on standardized photographs rather than (live) physical examination.
Using the Brow Positioning Grading Scale, test–retest reliability was moderate, indicating rather weak stability of the ratings after an overnight interval (19). The ICCs for brow positioning were also of a moderate level, and it may be that the number of possible variations in eyebrow shape, arch and upper eyelid, together with sex differences, explains why the consensus among the evaluators was not higher during the grading process. The wide range of differences in size of the brow (artificially thinned in females and bushy/broad in males), placement (determined by genetic and racial differences) and shape (arched or straight) makes it harder to set standardized grading. For the Forehead Lines Grading Scale (20), comparison of the distribution of results for static and dynamic forehead lines suggests that scoring of both components is essential to provide an accurate assessment.
Important considerations
The assessment of lines and wrinkles is performed from photographs, and it is therefore critical that the before and after pictures are standardized carefully. Quan et al. have documented a simple method by which facial lesions and other features can be documented accurately using only a single landmark, based on a ‘clock-face’ technique (23). When combined with a method such as the Frankfort horizontal (a line joining the upper external auditory meatus and the infraorbital rim), simple and accurate positioning of facial features can be achieved (24).
Advancing assessments in aesthetics
The four validated scales described herein have been presented at a number of international meetings, where they have been well received by clinicians working in aesthetic medicine. Indeed, the scales are now in clinical use in the USA. When presented in Europe there has been great interest and a desire for further information, and it is hoped that European clinicians will start to use these scales in their daily practice. Looking to the future, it seems likely that aesthetic doctors will see the introduction of further scales to measure the impact of their interventions, starting with the face/neck and then broadening to include other areas of the body. These new scales will, of course, present new
[Figure 5 image: nine bubble-graph panels, one per rater (VIEW = Forehead Max Lift, EXPERT_NO = 1 to 9). Vertical axis: Rating time 1 (0–4); horizontal axis: Rating time 2 (0–4).]
Figure 5. Example of bubble graphs comparing raters of the Forehead Lines Grading Scale (dynamic rating) (20). The size of each ‘bubble’
corresponds to the frequency of each pair of scores on the initial grading and subsequent re-grading. (Reproduced with permission from
Dermatologic Surgery, Wiley-Blackwell, and Merz Pharmaceuticals GmbH.)
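The quantities behind a bubble graph of this kind can be illustrated with a minimal Python sketch. It assumes one rater's two sessions are stored as equal-length lists of scores from 0 to 4. The ratings shown are hypothetical, the function names are illustrative only, and the Pearson coefficient is an assumption, since the source does not state which test–retest coefficient was used.

```python
from collections import Counter
from math import sqrt


def pair_frequencies(time1, time2):
    """Count how often each (first rating, second rating) pair occurs.

    Each count would correspond to the size of one bubble in the graph.
    """
    if len(time1) != len(time2):
        raise ValueError("both sessions must rate the same set of images")
    return Counter(zip(time1, time2))


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Assumption: used here only as one possible test-retest coefficient.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (0-4) given by one rater to ten images on two days.
time1 = [0, 1, 1, 2, 2, 3, 3, 3, 4, 4]
time2 = [0, 1, 2, 2, 2, 3, 3, 4, 4, 4]

bubbles = pair_frequencies(time1, time2)
for (first, second), count in sorted(bubbles.items()):
    print(f"time 1 = {first}, time 2 = {second}: {count} image(s)")

r = pearson_r(time1, time2)
print(f"test-retest correlation (Pearson, assumed): {r:.3f}")
```

Pairs on the diagonal (same score on both occasions) indicate stable ratings; a large diagonal bubble, such as the cluster described for scores of 3 and 4, pulls the coefficient upwards.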
challenges, but they will also help to refine aesthetic practice and drive the search for new solutions.
Conclusion
Four validated scales for grading of facial lines and wrinkles have been developed. These five-point photonumeric scales span the severity of wrinkles in each position for which patients commonly seek correction. Each scale is well stratified for consistent rating and, together, they represent an advance in aesthetic procedures. The scales are currently being used with success in the USA and are likely to come to Europe soon, where we look forward to sharing them more widely with colleagues.
Acknowledgements
Editorial support for the preparation of this manuscript was provided by Ogilvy 4D; funding was provided by Merz Pharmaceuticals GmbH.
Conflicts of interest: The authors are consultants and investigators to Allergan Medical, Bioform Medical and Merz Pharmaceuticals GmbH.
References
1. Carruthers A, Carruthers J. ‘Scale Summit’. Dermatol Surg. 2008;34(suppl 2):S149.
2. Fagien S, Carruthers JD. A comprehensive review of patient-reported satisfaction with botulinum toxin type A for aesthetic procedures. Plast Reconstr Surg. 2008;122:1915–25.
3. Kosowski TR, McCarthy C, Reavey PL, Scott AM, Wilkins EG, Cano SJ, et al. A systematic review of patient-reported outcome measures after facial cosmetic surgery and/or non-surgical facial rejuvenation. Plast Reconstr Surg. 2009;123:1819–27.
4. Goodman G. Botulinum toxin for the correction of hyperkinetic facial lines. Australas J Dermatol. 1998;39:158–63.
5. Honeck P, Weiss C, Sterry W, Rzany B. Reproducibility of a four-point clinical severity score for glabellar frown lines. Br J Dermatol. 2003;149:306–10.
6. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378–82.
7. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
8. Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol. 1971;11:101–9.
9. Hund T, Ascher B, Rzany B. Reproducibility of two four-point clinical severity scores for lateral canthal lines (crow’s feet). Dermatol Surg. 2006;32:1256–60.
10. Narins RS, Brandt F, Leyden J, Lorenc ZP, Rubin M, Smith S. A randomized, double-blind, multicenter comparison of the efficacy and tolerability of Restylane versus Zyplast for the correction of nasolabial folds. Dermatol Surg. 2003;29:588–95.
11. Day D, Littler C, Swift R, Gottlieb SL. The wrinkle severity rating scale: A validation study. Am J Clin Dermatol. 2004;5:49–52.
12. Lemperle G, Holmes RE, Cohen SR, Lemperle SM. A classification of facial wrinkles. Plast Reconstr Surg. 2001;108:1735–50.
13. Hatzis J. The wrinkle and its measurement – A skin surface profilometric method. Micron. 2004;35:201–19.
14. Grove GL, Grove MJ, Leyden JJ. Optical profilometry: An objective method for quantification of facial wrinkles. J Am Acad Dermatol. 1989;21:631–7.
15. Akazaki S, Nakagawa H, Kazama H, Osanai O, Kawai M, Takema T, et al. Age-related changes in skin wrinkles assessed by a novel three-dimensional morphometric analysis. Br J Dermatol. 2002;147:689–95.
16. Jacobi U, Chen M, Frankowski G, Sinkgraven R, Hund M, Rzany B, et al. In vivo determination of skin surface topography using an optical 3D device. Skin Res Technol. 2004;10:207–14.
17. Kane MA. Classification of crow’s feet patterns among Caucasian women: The key to individualizing treatment. Plast Reconstr Surg. 2003;112(5 suppl):33S–9S.
18. Alexiades-Armenakas M. A quantitative and comprehensive grading scale for rhytides, laxity, and photoageing. J Drugs Dermatol. 2006;5:808–9.
19. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated brow positioning grading scale. Dermatol Surg. 2008;34(suppl 2):S150–S4.
20. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated grading scale for forehead lines. Dermatol Surg. 2008;34(suppl 2):S155–S60.
21. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated grading scale for marionette lines. Dermatol Surg. 2008;34(suppl 2):S167–S72.
22. Carruthers A, Carruthers J, Hardas B, Kaur M, Goertelmeyer R, Jones D, et al. A validated grading scale for crow’s feet. Dermatol Surg. 2008;34(suppl 2):S173–S8.
23. Quan LT, Nikko A, Orengo I. Surgical pearl: Accurate documentation of facial lesions using only one landmark. J Am Acad Dermatol. 2001;44:1043–4.
24. Carruthers A, Carruthers J, Flynn T. Surgical pearl: Accurate documentation of facial lesions using only one landmark. J Am Acad Dermatol. 2003;49:359–60.