
NATURE | Vol 467 | 9 September 2010

OPINION

Time to automate identification


Taxonomists should work with specialists in pattern recognition, machine learning and artificial intelligence,
say Norman MacLeod, Mark Benfield and Phil Culverhouse — more accuracy and less drudgery will result.

[Image credits, left to right: Nat. Hist. Mus., London; M. Benfield/Gulf SERPENT Project; N. Cattlin/FLPA]

An imaging system designed to identify marine zooplankton was recently adopted by scientists working for the US government to monitor the Deepwater Horizon oil spill. By measuring the size of oil droplets produced after chemical dispersants had broken up the oil, modellers could predict the depths at which the plume was accumulating. Only two instruments exist that can measure oil droplets while distinguishing them from other matter suspended in the water column, such as zooplankton, marine snow and gas bubbles, at depths of down to 1,500 metres. The deployment by the US National Oceanic and Atmospheric Administration of one of them, the digital holographic imaging (DHI) system developed jointly by the Massachusetts Institute of Technology in Cambridge and the Woods Hole Oceanographic Institution in Massachusetts, is a working example of something that should be happening on a grand scale: the shared use across diverse disciplines of generalized automated identification technologies.

Taxonomists who identify, describe and name species (who practise alpha taxonomy, as it is known in the trade) are central to many research programmes in applied biology, ecology and conservation. University cuts are shrinking this already small community. What's more, there is no tradition of, much less a requirement for, independent testing and verification of the accuracy of the identifications that taxonomists produce, unlike in virtually every other scientific discipline. Automating species identification using technologies developed by researchers in pattern recognition, artificial intelligence and machine learning would transform alpha taxonomy from a cottage industry dependent on the expertise of a few individuals to a testable and verifiable science accessible to anyone needing to recognize objects. Indeed, a concerted interdisciplinary research and development effort, within the next decade, could lead to automated systems capable of high-throughput identifications for hundreds or thousands of categories of living as well as non-living specimens.

Human error

Many taxonomists use sophisticated technologies to capture images, sounds, even the smells and tastes of biological specimens. But most routine identifications involve a small group of experts scattered around the world assessing diagnostic data qualitatively: commonly the size, shape or texture of specimens, or the presence or absence of certain features. Surprisingly few blind-test studies have been published to assess the accuracy of taxonomists' findings objectively [1–7]. Those that have been carried out are worrying. For instance, a blind test to resolve a controversy about the pattern of extinction in one marine animal group about 65.5 million years ago [1] resulted in species lists that were so different as to make consensus impossible. Such inconsistencies shouldn't be a surprise given that, in controlled visual-cognition studies, humans frequently miss items presented in a scene, count some objects more than once and misclassify others.

Hopes are high among researchers and funding bodies that DNA bar-coding, by which a species is recognized according to a marker in its mitochondrial genome, will increase the accuracy of identifications and ease bottlenecks resulting from a shortage of trained and experienced taxonomists. But bar codes are generally used to assign organisms to taxonomic categories that have already been defined on the basis of morphological traits. In other words, a bar code isn't useful until the reference species has already been identified by multiple experts. The technique is still relatively expensive, slow and difficult to implement in the field except in certain situations, for example in laboratories on oceanographic research vessels. Moreover, researchers frequently need to identify non-living objects as well as living ones. Ecologists studying plankton, for example, commonly count 'fibres', 'detritus' or 'egg-like particles' that may or may not be alive.

In focusing on bar-coding, stakeholders have
overlooked the greater promise of machine learning to transform taxonomy and the identification of natural objects in general.

DAISY, DAISY, give me your answer, do

Computer systems now exist for classifying objects into between 2 and 30 categories. These systems already deliver faster, more accurate and more consistent semi- or fully automated identifications than any human taxonomist. For instance, a group of entomologists at the Natural History Museum in London have used the Digital Automated Identification System (DAISY) to identify with 100% accuracy 15 species of parasitic wasp from digital images of wings, with each identification taking less than a second [8]. Similarly, oceanographers from the University of Plymouth, UK, have used the Dinoflagellate Categorisation by Artificial Neural Network (DiCANN) system to identify phytoplankton species with about 72% accuracy, the same as experts [7]. People who make a particular assessment routinely, such as counting the dorsal spines on sticklebacks, can return accuracies of 84–95% under test conditions. However, trained personnel often need to deliver one-off species identifications. In tests based on this scenario, people make choices consistent with their own previous selections only 67–83% of the time, and consistency across different identifiers can be as low as 43% [9].

[Figure: Automated systems such as DAISY (screenshot, bottom) can be more reliable than experts when it comes to identifying specimens. Credit: Nat. Hist. Mus., London]

In practice, most current research programmes will require a considerable scale-up of DAISY's and DiCANN's capacities. Biodiversity assessment teams, biostratigraphers guiding drilling operations and zooplankton ecologists on oceanographic research vessels all need tools capable of identifying hundreds or even thousands of different species. Automated systems for this level of diversity are not currently available, but neither are they the stuff of science fiction as many researchers imagine.

As a first step, taxonomists should team up with specialists working in pattern recognition, machine learning and artificial intelligence, as well as technology engineers, software designers and mathematicians. (A good example of such a team is the Scientific Committee on Oceanic Research's Working Group 130; www.scor-wg130.net.) Artificial-intelligence researchers are using information from experimental psychology and computational neuroscience to design computer systems that recognize human faces or household objects based on their visual properties. Engineers are developing systems to acquire high-resolution, in situ, 'hyperspectral' images of organisms in three dimensions, by sensing visible, ultraviolet and infrared light. Machine learning and pattern-recognition scientists are developing algorithms that incorporate various characteristics of objects (many of which may not be detectable by humans) into identification programs; bat echolocation calls that lie outside the range of the human auditory system are one example. Finally, software designers are improving the user interfaces of classification programs.

Currently, grant applications for such interdisciplinary projects are falling between the boundaries defined by funding bodies in engineering and the life sciences. Funding specifically for collaborations on automated species identification should be supplied by the European Union's framework programmes (Europe's main instrument for funding research), and national research councils such as the US National Science Foundation and the UK Natural Environment Research Council. Charitable organizations, such as the Wellcome Trust in London and the Alfred P. Sloan Foundation in New York, should follow suit.

We estimate that a modest level of investment in these kinds of projects (roughly US$1 million to $2 million per annum over 5–10 years) would encourage the development of large-scale automated identification systems, at least for some high-profile groups across the taxonomic spectrum (for protists, plants and animals). In addition, a public competition, in which researchers are asked to develop an automated system capable of identifying species within a standard set of complex but generalized scenes, such as in photographs of a coral reef, would attract public interest and encourage diverse groups of scientists to explore the technologies available. This would be similar to the 'visual object classes' challenge recently set up by the EU-funded Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL), which promotes the development of computer systems that recognize types of common objects.

This investment would pay huge dividends across a range of disciplines. In the past 50 years, academic centres worldwide have cut back or discontinued many taxonomic training and research programmes. As a result, there are only about 4,000–6,000 professional taxonomists worldwide [10], only a subset of whom are routinely engaged in species identifications. Meanwhile, the demand for identifying natural objects has escalated. Agriculturalists and border security staff, for example, increasingly need to identify potential pest species where and when they encounter them. Even areas of research not directly concerned with species identification stand to benefit, as the Deepwater Horizon oil droplets example demonstrates.

Far from making alpha taxonomists obsolete, automated identification systems would free them from the drudgery of routine identifications. This would allow them to focus on the more conceptually difficult issues of discovering, describing and revising species concepts, and establishing how species function within natural systems and fit into higher taxonomic and ecological groups.

Norman MacLeod is in the Palaeontology Department of the Natural History Museum, Cromwell Road, London SW7 5BD, UK. Mark Benfield is in the Department of Oceanography and Coastal Sciences, School of the Coast and Environment, Louisiana State University, Baton Rouge, Louisiana 70803, USA. Phil Culverhouse is at the Centre for Robotics and Neural Systems, University of Plymouth, Plymouth PL4 8AA, UK.
e-mails: n.macleod@nhm.ac.uk; mbenfie@lsu.edu; pculverhouse@plymouth.ac.uk

1. Colquhoun, W. P. Ergonomics 2, 367–372 (1959).
2. Zachariasse, W. J. et al. Utrecht Micropaleontological Bulletins 17, 1–265 (1978).
3. Simpson, R., Culverhouse, P. F., Ellis, R. & Williams, R. IEEE Conf. Neural Networks Ocean Eng. 223–230 (IEEE, 1991).
4. Ginsburg, R. N. Mar. Micropaleontol. 29, 67–68 (1997).
5. Kelly, M. G. Water Res. 35, 2784–2788 (2001).
6. Gobalet, K. W. J. Archaeol. Sci. 28, 377–386 (2001).
7. Culverhouse, P. F., Williams, R., Reguera, B., Herry, V. & González-Gil, S. Mar. Ecol. Prog. Ser. 247, 17–25 (2003).
8. Gauld, I. D., O'Neill, M. A. & Gaston, K. J. in Hymenoptera: Evolution, Biodiversity and Biological Control (eds Austin, A. D. & Dowton, M.) 303–312 (CSIRO, 2000).
9. Culverhouse, P. F. in Automated Taxon Identification in Systematics: Theory, Approaches and Applications (ed. MacLeod, N.) 25 (CRC Press, 2007).
10. The Global Taxonomy Initiative; go.nature.com/fha3T2

© 2010 Macmillan Publishers Limited. All rights reserved
