Use of machine learning to investigate illegal wildlife trade on social media

Enrico Di Minin 1,2*, Christoph Fink 1, Tuomo Hiippala 1, Henrikki Tenkanen 1

1 Digital Geography Lab, Department of Geosciences and Geography, University of

Helsinki, FI-00014, Finland; 2 School of Life Sciences, University of KwaZulu-Natal,

Durban, 4000, South Africa.

* Corresponding author: Dr. Enrico Di Minin, email:

Running head: Wildlife trade

Article Impact Statement: Machine learning can be used to monitor and assess the extent of

illegal wildlife trade on social media platforms.

Unsustainable harvesting is one of the major threats driving the global extinction crisis

(Maxwell et al. 2016). Among those groups whose threat status has been comprehensively

assessed by the International Union for Conservation of Nature (IUCN) (IUCN 2016),

unsustainable harvesting for commercial trade, subsistence, or recreation is now the most

prevalent threat impacting threatened marine species, and the second most prevalent (after

agriculture and aquaculture) for terrestrial and freshwater species (Maxwell et al. 2016).

Wildlife trade is a multi-billion-dollar industry, in which thousands of animals, plants, and

associated products are traded globally as food, pets, medicines, clothing, and trophies

(Dalberg Global Development Advisors 2012). Wildlife trade escalates into a crisis when an

increasing proportion is illegal and unsustainable, directly threatening the persistence of

many species in the wild (Ripple et al. 2016).

On land, illegal wildlife trade is threatening the persistence of high-profile species, such

as rhinoceroses (Di Minin et al. 2015a), as well as many lesser-known species (Rosen &

Smith 2010; Phelps & Webb 2015). Animals and plants are traded live as pets and collectors'

items, or dead for medicine, ornaments, meat, and trophies (UNODC 2016). Demand for

seafood is also increasing, with illegal fishing threatening the persistence of many species

(FAO 2016). Illegal wildlife trade is considered to be among the largest illegitimate

businesses after illegal narcotics, involving criminal organizations and terrorist groups

(Dalberg Global Development Advisors 2012). Encouraged by poverty, poorly monitored

borders, corruption and weak regulations and enforcement, illegal wildlife trade continues to

grow (Dalberg Global Development Advisors 2012; UNODC 2016).

In recent years, the scale and nature of illegal wildlife trade have changed dramatically.

The Internet is becoming a major market for wildlife products, as it provides cost-effective

solutions, vast outreach and anonymity for illegal wildlife traders (Hastie & McCrea-Steele

2014). While law enforcement actions have been partially successful in controlling illegal

wildlife trade on major e-commerce platforms, the trade appears to have moved to alternative

platforms, such as the ‘dark web’ (Harrison et al. 2016) and social media (Yu & Jia 2015).

Recent evidence suggests that illegal wildlife trade over the ‘dark web’ occurs in small

quantities (Roberts & Hernandez-Castro 2017). This might be partly because the ‘dark web’

lacks popularity, and accessing the platform and locating illegal wildlife products require

technical skills and know-how.

With an estimated 2.5 billion users, easy access has turned social media into an important

venue for illegal wildlife trade (Hastie & McCrea-Steele 2014; Yu & Jia 2015). Wildlife

dealers active on social media release photos and information about wildlife products to

attract and interact with potential customers, while also informing their existing network of

contacts about available products. Currently, the lack of tools for efficient monitoring of

high-volume social media data limits the capability of law enforcement agencies to curb

illegal wildlife trade. In fact, identification of species and/or wildlife products traded on

social media is often manual (Hinsley et al. 2016; Eid & Handal 2017) and time-consuming,

potentially leading to outdated and ineffective solutions to the problem. Automating

information extraction is therefore a crucial step in preventing illegal wildlife trade on social

media.


Within the broader field of artificial intelligence, machine learning focuses on

algorithms that learn from data without human guidance. In recent years, growing volumes of

data and computational power have led to considerable advances in machine learning.

In particular, so-called deep learning algorithms have provided state-of-the-art results for

tasks in computer vision and natural language processing (LeCun et al. 2015). These tasks

include classifying image contents, locating objects and their outlines in images, or inferring

the meaning of a text. Applying these techniques to high-volume social media data allows

investigating human behaviour at an unprecedented scale (Ruths & Pfeffer 2014). Despite

their potential, approaches combining new techniques and data sources are still rarely used in

addressing the biodiversity crisis (Di Minin et al. 2015b).

Using machine learning to monitor and assess the extent of illegal wildlife trade on

social media platforms is a frontier topic in conservation science. Here, we propose a

framework in which machine learning is used to investigate illegal wildlife trade on social

media platforms (Fig. 1). Several platforms, such as Facebook, Twitter, Weibo, and Flickr

(see for a full list),

allow accessing user-generated content via an application programming interface (API). Such

large-scale aggregate databases of social media activity include posts with images, videos,

and text, as well as information on networks of users (Fig. 1a). This information often

contains metadata for geographical location and a timestamp indicating when the content was

uploaded to the service. However, such ‘big’ data sources require filtering out information

irrelevant to illegal wildlife trade (e.g. ‘pangolin armoured vehicle’ as opposed to pangolin

taxa in Fig. 1a). Without automating the process, filtering high-volume content for relevant

information demands excessive time and resources.
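
As a minimal sketch of the filtering problem described above, the following Python example applies a simple keyword rule before any machine learning is involved. The term lists and example posts are hypothetical and chosen only for illustration; a realistic filter would need far richer vocabularies and a trained model.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
SPECIES_TERMS = {"pangolin", "rhino", "rhinoceros"}
EXCLUDE_PHRASES = {"armoured vehicle", "armored vehicle"}


def is_candidate(text: str) -> bool:
    """Return True if a post mentions a species term and no excluded phrase."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    mentions_species = bool(words & SPECIES_TERMS)
    is_excluded = any(phrase in lowered for phrase in EXCLUDE_PHRASES)
    return mentions_species and not is_excluded


# Invented example posts, not real social media data.
posts = [
    "Pangolin scales available, message me for price",
    "The Pangolin armoured vehicle on parade today",
    "Lovely sunset over the bay",
]
candidates = [p for p in posts if is_candidate(p)]
```

Here only the first post is retained. Hand-written rules of this kind do not scale to the variety of language found on social media, which is why automated, learned classifiers are of interest.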

Machine learning, however, can be used to identify verbal, visual or audio-visual

content pertaining to illegal wildlife trade automatically. Neural networks, for instance, which

excel in recognising and classifying the content of photographic images (Krizhevsky et al.

2012), may be trained to detect which species or wildlife products appear in an image, while

also classifying their setting (a natural habitat as opposed to a marketplace) (Fig. 1b). When

processing video (Karpathy et al. 2014), neural networks can be trained to use cues

in audio (e.g. bird-specific calls) alongside the stream of images (Liao et al. 2013).

Natural language processing (Goldberg 2015), in turn, can be used to (i) infer the meaning of

a verbal description (whether an animal or plant is for sale or observed in nature); (ii) detect

locations and species mentioned on social media; and (iii) classify the sentiment of social

media users towards illegal wildlife trade. Most importantly, neural networks can process

verbal, visual and audio-visual content simultaneously.
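
To illustrate task (i) above, the following Python sketch trains a small text classifier to separate sale offers from nature observations. It deliberately uses a naive Bayes model rather than a neural network, so that the example stays self-contained; the training sentences and the labels for_sale and observed are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # add-one (Laplace) smoothing avoids zero probabilities
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training sentences, not real social media data.
examples = [
    ("rhino horn for sale, contact me", "for_sale"),
    ("selling pangolin scales, good price", "for_sale"),
    ("ivory bracelet available, price negotiable", "for_sale"),
    ("saw a rhino in the park this morning", "observed"),
    ("pangolin spotted on our night safari", "observed"),
    ("beautiful elephants seen at the waterhole", "observed"),
]
model = train(examples)
```

With this toy data, predict(model, "pangolin scales for sale") returns for_sale and predict(model, "spotted a rhino on safari") returns observed. A production system would need far more labelled data and, as argued here, a model able to combine text with images and video.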

However, in order to learn to associate inputs and outputs, such as images and their

respective labels, neural networks require human-verified training data. When provided with

consistently labelled data, neural networks perform at a high level. A recent paper, for

example, used neural networks to identify species from camera trap images with 92%

accuracy (Norouzzadeh et al. 2017). Openly available datasets, such as ImageNet, which

includes 14 million images representing 22,000 classes, can provide initial training data for

many species (Jia Deng et al. 2009). However, more specific training data is needed to

identify wildlife products (e.g. pangolin scales or rhinoceros horn). Furthermore, there is also

a need to consider whether a wildlife product can be traded legally or not, and to account for

the source of the specimens traded (e.g. captive-bred or wild-sourced) (Hinsley et al. 2016).

Unfortunately, such training datasets are rarely available. Therefore, crowd-sourced

initiatives involving experts will be needed to create sufficiently specific datasets. Advances

in machine learning, coupled with rich training datasets, may even allow detecting alternative

terms used for selling wildlife products on social media.
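
One way to obtain consistently labelled data from several annotators is to keep a label only when enough of them agree. The Python sketch below assumes a simple majority vote with a hypothetical agreement threshold of 0.6; the image identifiers and labels are invented for illustration, and real initiatives may weight annotators by expertise instead.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Return the majority label, or None if agreement is below the threshold."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Invented annotations from three hypothetical experts per image.
annotations = {
    "img_001": ["rhino_horn", "rhino_horn", "rhino_horn"],
    "img_002": ["pangolin_scales", "fish_scales", "pangolin_scales"],
    "img_003": ["ivory", "bone", "resin"],
}
training_labels = {k: aggregate_labels(v) for k, v in annotations.items()}
```

In this example img_001 and img_002 receive a label, whereas img_003 is returned as None and could be sent back for further expert review rather than added to the training set.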

Once the original information derived from social media is filtered and datasets are

created (Fig. 1c), analysing the data will help understand trends and patterns of illegal wildlife

trade on social media. The location metadata can be used to analyse the spatiotemporal

dynamics of illegal trade (e.g. the type and quantity of wildlife products traded; the

nodes of trade routes; the types of routes that exist between trade nodes and how they change

over time). Using this information in combination with other biodiversity knowledge

products, such as the IUCN Red List, can help assess whether the species or products are

traded outside the species' range, or whether the species is listed as threatened in the IUCN

Red List (IUCN 2016). Furthermore, information available on user profiles and the global

connections between them can help identify the key exporter, intermediary, and importer

countries, by using social network analysis techniques. Finally, natural language processing

will help assess which species or wildlife products are discussed on social media, and the

users’ preferences and sentiment towards them. Such information, in turn, can inform

campaigns for behavioural change.
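
The social network analysis step can be sketched in Python as follows, under the assumption that each observed connection has already been reduced to a directed (seller country, buyer country) pair. The role definitions based on link direction, and the placeholder country codes A, B and C, are illustrative assumptions rather than an established method.

```python
from collections import defaultdict


def classify_countries(links):
    """Label each country by the direction of its observed links."""
    out_deg = defaultdict(int)  # number of links in which a country sells
    in_deg = defaultdict(int)   # number of links in which a country buys
    for seller, buyer in links:
        out_deg[seller] += 1
        in_deg[buyer] += 1
    roles = {}
    for country in set(out_deg) | set(in_deg):
        if out_deg[country] and in_deg[country]:
            roles[country] = "intermediary"
        elif out_deg[country]:
            roles[country] = "exporter"
        else:
            roles[country] = "importer"
    return roles


# Invented links between placeholder countries, not real trade data.
links = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C")]
roles = classify_countries(links)
```

Here A is labelled an exporter, B an intermediary and C an importer. Real analyses would also need to weight links by volume and account for users whose location is unknown or misreported.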

While the characteristics of social media data provide a great opportunity to track

illegal wildlife trade, there are still challenges and caveats (e.g. noisy or unreliable data)

associated with using social media content for research purposes (Di Minin et al. 2015b; Tsou

2017). In addition, scientists and practitioners have the ethical responsibility to minimize

potential harm to people who share illegal wildlife trade content on social media platforms

(Zook et al. 2017). Another problem is that a wealth of relevant data on illegal wildlife trade

is currently not open to research via APIs. For this reason, manual observation, filtering and

classification of content, particularly to assess whether content pertains to legal or illegal

trade, remains important (Hinsley et al. 2016; Eid & Handal 2017). However, human effort,

which is currently used to manually process large amounts of data, should rather be directed

to help train models that can be used to automatically investigate illegal wildlife trade.

The proposed methods and analyses are relevant for the implementation of the

Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES).

Expert groups mobilized via CITES and the IUCN should be used to generate datasets to help

train neural networks. Given the global reach of social media, creating partnerships between

CITES parties, social media companies and scientists working on artificial intelligence will

generate adequate resources and momentum to help stop the illegal wildlife trade on social

media.



E.D.M. thanks the Academy of Finland 2016–2019, Grant 296524, for support. C.F. thanks

the University of Helsinki for support via an Early Career Grant to E.D.M. T.H. was funded

by the Finnish Cultural Foundation. H.T. thanks the DENVI doctoral programme at

University of Helsinki for support.


Figure 1. Framework to (a) mine, (b) filter, and (c) identify relevant data on the illegal wildlife trade
from social media platforms using machine learning. Photo in (c) was obtained from Twitter.

