Use of Machine Learning To Investigate Illegal Wildlife Trade On Social Media
Article Impact Statement: Machine learning can be used to monitor and assess the extent of
Unsustainable harvesting is one of the major threats driving the global extinction crisis
(Maxwell et al. 2016). Among those groups whose threat status has been comprehensively
assessed by the International Union for Conservation of Nature (IUCN) (IUCN 2016),
unsustainable harvesting for commercial trade, subsistence, or recreation is now the most
prevalent threat impacting threatened marine species, and the second most prevalent (after
agriculture and aquaculture) for terrestrial and freshwater species (Maxwell et al. 2016).
This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process, which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1111/cobi.13104.
(Dalberg Global Development Advisors 2012). Wildlife trade escalates into a crisis when an
On land, illegal wildlife trade is threatening the persistence of high-profile species, such
as rhinoceroses (Di Minin et al. 2015a), as well as many lesser-known species (Rosen &
Smith 2010; Phelps & Webb 2015). Animals and plants are traded live as pets and collectors'
items, or dead for medicine, ornaments, meat, and trophies (UNODC 2016). Demand for
seafood is also increasing, with illegal fishing threatening the persistence of many species
(FAO 2016). Illegal wildlife trade is considered to be among the largest illegitimate
businesses after illegal narcotics, involving criminal organizations and terrorist groups
borders, corruption and weak regulations and enforcement, illegal wildlife trade continues to
In recent years, the scale and nature of illegal wildlife trade have changed dramatically.
solutions, vast outreach and anonymity for illegal wildlife traders (Hastie & McCrea-Steele
2014). While law enforcement actions have been partially successful in controlling illegal
wildlife trade on major e-commerce platforms, the trade appears to have moved to alternative
platforms, such as the ‘dark web’ (Harrison et al. 2016) and social media (Yu & Jia 2015).
Recent evidence suggests that illegal wildlife trade over the ‘dark web’ occurs in small
quantities (Roberts & Hernandez-Castro 2017). This might be partly because the ‘dark web’
lacks popularity, and accessing the platform and locating illegal wildlife products requires
venue for illegal wildlife trade (Hastie & McCrea-Steele 2014; Yu & Jia 2015). Wildlife
dealers active on social media release photos and information about wildlife products to
attract and interact with potential customers, while also informing their existing network of
contacts about available products. Currently, the lack of tools for efficient monitoring of
high-volume social media data limits the capability of law enforcement agencies to curb
illegal wildlife trade. In fact, identification of species and/or wildlife products traded on
social media is often manual (Hinsley et al. 2016; Eid & Handal 2017) and time-consuming,
information extraction is therefore a crucial step in preventing illegal wildlife trade on social
media.
algorithms that learn from data without human guidance. In recent years, growing volumes of
data and computational power have led to considerable advances in machine learning.
In particular, so-called deep learning algorithms have provided state-of-the-art results for
tasks in computer vision and natural language processing (LeCun et al. 2015). These tasks
include classifying image contents, locating objects and their outlines in images, or inferring
the meaning of a text. Applying these techniques to high-volume social media data allows
investigating human behaviour at an unprecedented scale (Ruths & Pfeffer 2014). Despite
their potential, approaches combining new techniques and data sources are still rarely used in
Using machine learning to monitor and assess the extent of illegal wildlife trade on
framework in which machine learning is used to investigate illegal wildlife trade on social
allow accessing user-generated content via an application programming interface (API). Such
large-scale aggregate databases of social media activity include posts with images, videos,
and text, as well as information on networks of users (Fig. 1a). This information often
contains metadata for geographical location and a timestamp indicating when the content was
uploaded to the service. However, such ‘big’ data sources require filtering out information
irrelevant to illegal wildlife trade (e.g. ‘pangolin armoured vehicle’ as opposed to pangolin
taxa in Fig. 1a). Without automating the process, filtering high-volume content for relevant
content pertaining to illegal wildlife trade automatically. Neural networks, for instance, which
excel in recognising and classifying the content of photographic images (Krizhevsky et al.
2012), may be trained to detect which species or wildlife products appear in an image, while
also classifying their setting (a natural habitat as opposed to a marketplace) (Fig. 1b). When
processing video (Karpathy et al. 2014), neural networks can be trained to look for cues in
audio (e.g. bird-specific calls) in addition to the stream of images (Liao et al. 2013).
Natural language processing (Goldberg 2015), in turn, can be used to (i) infer the meaning of
a verbal description (whether an animal or plant is for sale or observed in nature); (ii) detect
locations and species mentioned on social media; and (iii) classify the sentiment of social
media users towards illegal wildlife trade. Most importantly, neural networks can process
However, in order to learn to associate inputs and outputs, such as images and their
respective labels, neural networks require human-verified training data. When provided with
example, used neural networks to identify species from camera trap images with 92%
accuracy (Norouzzadeh et al. 2017). Openly available datasets, such as ImageNet, which
includes 14 million images representing 22,000 classes, can provide initial training data for
many species (Jia Deng et al. 2009). However, more specific training data is needed to
identify wildlife products (e.g. pangolin scales or rhinoceros horn). Furthermore, there is also a
need to consider whether a wildlife product can be traded legally, and to account for
the source of the specimens traded (e.g. captive-bred or wild-sourced) (Hinsley et al. 2016).
initiatives involving experts will be needed to create sufficiently specific datasets. Advances
in machine learning, coupled with rich training datasets, may even allow detecting alternative
Once the original information derived from social media is filtered and datasets are
created (Fig. 1c), analysing the data will help to understand trends and patterns of illegal wildlife
trade on social media. The location metadata can be used to analyse the spatiotemporal
dynamics of illegal trade (e.g. the type and quantity of wildlife products traded; the nodes
of trade routes; the types of routes that exist between trade nodes and how they change
over time). Using this information in combination with other biodiversity knowledge
products, such as the IUCN Red List, can help assess whether the species or products are
traded outside the species' range, or whether the species is listed as threatened in the IUCN
Red List (IUCN 2016). Furthermore, information available on user profiles and the global
connections between them can help identify the key exporter, intermediary, and importer
countries, by using social network analysis techniques. Finally, natural language processing
will help assess which species or wildlife products are discussed on social media, and the
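The range check and the identification of exporter, intermediary, and importer countries described in this paragraph can be illustrated with a minimal sketch. It assumes that posts have already been filtered and reduced to three fields (`species`, `posted_from`, `ships_to`); these field names, the placeholder species and country codes, and the range table are all hypothetical and stand in for data that would come from platform metadata and a source such as the IUCN Red List. Simple counting is used here in place of a full social network analysis.

```python
from collections import Counter

# Hypothetical, hand-made records; real posts would come from a platform API.
posts = [
    {"species": "species_a", "posted_from": "AA", "ships_to": "BB"},
    {"species": "species_a", "posted_from": "BB", "ships_to": "CC"},
    {"species": "species_b", "posted_from": "AA", "ships_to": "CC"},
    {"species": "species_a", "posted_from": "AA", "ships_to": "BB"},
]

# Placeholder range table: species -> countries where it occurs naturally.
species_range = {"species_a": {"AA"}, "species_b": {"AA", "CC"}}


def outside_range(posts, species_range):
    """Return posts located in a country outside the species' listed range.

    Species missing from the range table are treated as out of range.
    """
    return [p for p in posts
            if p["posted_from"] not in species_range.get(p["species"], set())]


def trade_edges(posts):
    """Count directed (origin, destination) country pairs across all posts."""
    return Counter((p["posted_from"], p["ships_to"]) for p in posts)


def country_roles(edges):
    """Label each country by whether it only sends, only receives, or both."""
    senders = {src for src, _ in edges}
    receivers = {dst for _, dst in edges}
    roles = {}
    for country in senders | receivers:
        if country in senders and country in receivers:
            roles[country] = "intermediary"
        elif country in senders:
            roles[country] = "exporter"
        else:
            roles[country] = "importer"
    return roles


edges = trade_edges(posts)
print(outside_range(posts, species_range))
print(country_roles(edges))
```

In practice, edge weights and timestamps would be retained so that changes in routes over time can be followed, and a dedicated graph library would likely replace the hand-rolled role labelling.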
While the characteristics of social media data provide a great opportunity to track
illegal wildlife trade, there are still challenges and caveats (e.g. noisy or unreliable data)
associated with using social media content for research purposes (Di Minin et al. 2015b; Tsou
2017). In addition, scientists and practitioners have the ethical responsibility to minimize
potential harm to people who share illegal wildlife trade content on social media platforms
(Zook et al. 2017). Another problem is that a wealth of relevant data on illegal wildlife trade
is currently not open to research via APIs. For this reason, manual observation, filtering and
trade, remains important (Hinsley et al. 2016; Eid & Handal 2017). However, human effort,
which is currently used to manually process large amounts of data, should instead be directed
towards training models that can be used to automatically investigate illegal wildlife trade.
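As a minimal sketch of how manually labelled posts could be reused as training data for an automatic filter, the example below trains a bag-of-words naive Bayes classifier with Laplace smoothing. This is a deliberately simple stand-in for the neural networks discussed in this article, not the method proposed here, and the labelled posts, the function names `tokenize`, `train`, and `predict`, and the two-class labelling scheme are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-labelled posts: 1 = possibly trade-related, 0 = irrelevant.
labelled_posts = [
    ("pangolin scales for sale contact me", 1),
    ("rhino horn available price on request", 1),
    ("ivory bracelet for sale fast shipping", 1),
    ("pangolin armoured vehicle on parade", 0),
    ("saw a pangolin in the wild today", 0),
    ("army unveils new armoured vehicle", 0),
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the class with the higher log-posterior under naive Bayes."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                smoothed = (word_counts[label][token] + 1) / (n_words + len(vocab))
                score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(labelled_posts)
print(predict("pangolin scales for sale", model))
print(predict("pangolin armoured vehicle spotted", model))
```

With only six training posts the classifier is purely illustrative; the point is that each manually reviewed post can become a labelled example, so that expert effort accumulates into a reusable model instead of being spent on repeated manual screening.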
The proposed methods and analyses are relevant for the implementation of the
Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES).
Expert groups mobilized via CITES and the IUCN should be used to generate datasets to help
train neural networks. Given the global reach of social media, creating partnerships between
CITES parties, social media companies and scientists working on artificial intelligence will
generate adequate resources and momentum to help stop the illegal wildlife trade on social
media.
Acknowledgments
E.D.M. thanks the Academy of Finland 2016–2019, Grant 296524, for support. C.F. thanks
the University of Helsinki for support via an Early Career Grant to E.D.M. T.H. was funded
References
Conway AJ, Moilanen A. 2015a. Identification of policies for a sustainable legal trade in
Di Minin E, Tenkanen H, Toivonen T. 2015b. Prospects and challenges for social media data
Eid E, Handal R. 2017. Illegal hunting in Jordan: using social media to assess impacts on
FAO. 2016. The state of world fisheries and aquaculture. FAO, Rome.
Goldberg Y. 2015. A Primer on neural network models for natural language processing.
arXiv:1510.00726.
Harrison JR, Roberts DL, Hernandez-Castro J. 2016. Assessing the extent and nature of
Hastie J, McCrea-Steele T. 2014. Wanted - dead or alive: exposing online wildlife trade.
Hinsley A, Lee TE, Harrison JR, Roberts DL. 2016. Estimating the extent and structure of
IUCN. 2016. The IUCN Red List of Threatened Species. IUCN, Gland.
Jia Deng, Wei Dong, Socher R, Li-Jia Li, Kai Li, Li Fei-Fei. 2009. ImageNet: A large-scale
Recognition:248–255.
video classification with convolutional neural networks. Computer Vision and Pattern
Systems:1–9.
Liao H, McDermott E, Senior A. 2013. Large scale deep neural network acoustic modeling
with semi-supervised training data for YouTube video transcription. 2013 IEEE
Proceedings:368–373.
Maxwell SL, Fuller RA, Brooks TM, Watson JEM. 2016. The ravages of guns, nets and
arXiv:1703.05830:1–12.
Phelps J, Webb EL. 2015. ‘Invisible’ wildlife trades: Southeast Asia’s undocumented illegal
Ripple WJ et al. 2016. Bushmeat hunting and extinction risk to the world’s mammals. Royal
Roberts DL, Hernandez-Castro J. 2017. Bycatch and illegal wildlife trade on the dark web.
Oryx 51:393–394.
Rosen GE, Smith KF. 2010. Summarizing the evidence on the international trade in illegal
Ruths D, Pfeffer J. 2014. Social media for large studies of behavior. Science 346:1063–1064.
Tsou M. 2017. Research challenges and opportunities in mapping social media and Big Data.
UNODC. 2016. World Wildlife Crime Report: Trafficking in protected species. UNODC,
Vienna.
Yu X, Jia W. 2015. Moving targets: tracking online sales of illegal wildlife. Traffic,
Cambridge.
Zook M et al. 2017. Ten simple rules for responsible big data research. PLOS Computational
Biology 13:e1005399.