
TRECVID 2010 – An Introduction to the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics

Paul Over {over@nist.gov}
George Awad {gawad@nist.gov}
Jon Fiscus {jfiscus@nist.gov}
Brian Antonishek {brian.antonishek@nist.gov}
Information Access Division
National Institute of Standards and Technology
Gaithersburg, MD 20899-8940, USA

Martial Michel {martial.michel@nist.gov}
Systems Plus
One Research Court, Suite 360
Rockville, MD 20850

November 2, 2010

1 Introduction

The TREC Video Retrieval Evaluation (TRECVID) 2010 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last 10 years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by NIST and other US government agencies. Many organizations and individuals worldwide contribute significant time and effort.

In 2010, TRECVID turned to new and different data and to some new tasks. 73 teams (see Table 1) from various research organizations — 27 from Europe, 32 from Asia, 12 from North America, 1 from Africa, and 1 from South America — completed one or more of six tasks:

1. content-based copy detection

2. instance search

3. known-item search

4. semantic indexing

5. surveillance event detection

6. multimedia event detection

400 hours of short videos from the Internet Archive (archive.org), available under Creative Commons licenses (IACC), were used for semantic indexing, known-item search, and copy detection. Unlike the professionally edited broadcast news and educational programming used previously, the IACC videos reflect a wide variety of content, style, and source device, determined only by the self-selected donors. About 180 hours of Sound and Vision video were reused for the instance search pilot. 45 hours of airport surveillance video were reused for the surveillance event detection task. About 100 hours from a new test collection of Internet videos (HAVIC) were used for the multimedia event detection pilot.

Copy detection submissions were evaluated at NIST based on ground truth created automatically with tools donated by the INRIA-IMEDIA group. Instance search results were judged by NIST assessors, as were the semantic indexing results, with additional semantic indexing assessments done in France under the European Quaero program (QUAERO, 2010). Known-item search topics and associated ground truth were created by NIST assessors, so submissions could be scored automatically. Multimedia and surveillance event detection were scored using ground truth created manually by the Linguistic Data Consortium under contract to NIST.

This paper is an introduction to the evaluation framework — the tasks, data, and measures for the workshop. For detailed information about the approaches and results, the reader should see the various site reports and the results pages available at the back of the workshop notebook and on the TRECVID website.
Disclaimer: Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.

2 Data

2.1 Video

Internet Archive Creative Commons (IACC) video

Approximately 8000 Internet Archive videos (50 GB, 200 hours) with Creative Commons licenses in MPEG-4/H.264 and with durations between 10 seconds and 3.5 minutes were used as test data. Most videos had some donor-supplied metadata available, e.g., title, keywords, and description. Another 3200 IACC videos (50 GB, 200 hours) with durations between 3.6 and 4.1 minutes were designated for use in system development.

LIMSI and VecSys Research provided automatic speech recognition for the English speech in the IACC video.

Georges Quénot and Stéphane Ayache of LIG (Laboratoire d'Informatique de Grenoble) again organized a collaborative annotation by TRECVID participants of 130 features against the IACC videos, using an active learning scheme designed to improve the efficiency of the process (Ayache & Quénot, 2008).

Sound and Vision data

In 2006 the Netherlands Institute for Sound and Vision generously provided 400 hours of Dutch TV news magazine, science news, news reports, documentaries, educational programming, and archival video in MPEG-1 format for use within TRECVID. About 180 hours of Sound and Vision video, previously used for testing feature extraction and ad hoc search, were reused in 2010 for testing instance search.

The video had already been automatically divided into shots by Christian Petersohn at the Fraunhofer (Heinrich Hertz) Institute in Berlin. These shots served as the predefined units of evaluation.

Roeland Ordelman and Marijn Huijbregts at the University of Twente had provided the output of an automatic speech recognition system run on the Sound and Vision data. Christof Monz of Queen Mary, University of London had contributed machine translation (Dutch to English) for the Sound and Vision video based on the University of Twente's automatic speech recognition (ASR). The LIMSI Spoken Language Processing Group had produced a speech transcription for the TRECVID 2007-2009 Sound and Vision data using its recently developed Dutch recognizer.

iLIDS Multiple Camera Tracking Data

The iLIDS Multiple Camera Tracking data consisted of ∼150 hours of indoor airport surveillance video collected in a busy airport environment by the United Kingdom (UK) Home Office Scientific Development Branch (HOSDB). The dataset utilized 5 frame-synchronized cameras.

The training video consisted of the ∼100 hours of data used for the SED 2008 evaluation. The evaluation video consisted of an additional ∼50 hours of data from the Imagery Library for Intelligent Detection Systems' (iLIDS) multiple camera tracking scenario (UKHO-CPNI, 2007 (accessed June 30, 2009)).

One third of the evaluation video was annotated by the Linguistic Data Consortium using a triple-pass annotation procedure. Seven of the ten annotated events were used for the 2010 evaluation.

Heterogeneous Audio Visual Internet (HAVIC) Corpus

The Heterogeneous Audio Visual Internet (HAVIC) Corpus is a new, large corpus of Internet multimedia files collected by the Linguistic Data Consortium. The corpus contained ∼3400 video clips comprising ∼114 hours of MPEG-4 (MPEG-4, 2010) formatted files containing H.264 (H.264, 2010) encoded video and MPEG-4 Advanced Audio Coding (AAC, 2010) encoded audio. The data was collected specifically to contain 100 instances of each of three events: “Assembling a Shelter”, “Batting in a run”, and “Making a Cake”. The data was evenly divided into a ∼57 hour development set and a ∼57 hour evaluation set, each set containing ∼50 instances per event.
3 Semantic indexing

A potentially important asset to help video search/navigation is the ability to automatically identify the occurrence of various semantic features/concepts such as “Indoor/Outdoor”, “People”, “Speech”, etc., which occur frequently in video information. The ability to detect features is an interesting challenge by itself, but it takes on added importance to the extent it can serve as a reusable, extensible basis for query formation and search. The semantic indexing task was a follow-on to the feature extraction task. It was coordinated by NIST and by Georges Quénot under the Quaero program and had the following additional, new objectives:

• to increase the number of semantic concepts most systems can extract and the number evaluated

• to support experiments using relations in a simple ontology among the concepts to be detected

• to offer a “lite” version of the task to encourage new participation

The semantic indexing task was as follows. Given a standard set of shot boundaries for the semantic indexing test collection and a list of concept definitions, participants were asked to return, for each concept in the full set of concepts, at most the top 2 000 video shots from the standard set, ranked according to the highest probability of detecting the presence of the concept. The presence of each concept was assumed to be binary, i.e., it was either present or absent in the given standard video shot. If the concept was true for some frame (sequence) within the shot, then it was true for the shot. This is a simplification adopted for the benefits it afforded in pooling of results and approximating the basis for calculating recall.

130 concepts were selected for the TRECVID 2010 semantic indexing task. These include all the TRECVID “high level features” from 2005 to 2009 plus a selection of LSCOM concepts, so that we end up with a number of generic-specific relations among them. The goal is to promote research on methods for indexing many concepts and using ontology relations between them. It is also expected that these concepts will be useful for the content-based (known-item) search task. Including the TRECVID 2005 to 2009 features favors the reuse of already available annotations and judgments and encourages cross-domain evaluations.

Two types of submissions were considered: full submissions, in which participants submitted results for all 130 concepts, and lite submissions, in which participants submitted results for only 10 concepts. TRECVID evaluated only 30 concepts - 20 based on judgments done at NIST and 10 judged under the Quaero program in France. The 10 features from the lite set were included in the 20 judged at NIST. The 10 lite concepts overlap with the 2008 and 2009 high-level feature tasks. The 30 concepts evaluated in 2010 were as follows; those marked with an asterisk are the lite concepts:

[4] * Airplane-flying, [6] Animal, [7] Asian-People, [13] Bicycling, [15] * Boat-Ship, [19] * Bus, [22] Car-Racing, [27] Cheering, [28] * Cityscape, [29] * Classroom, [38] Dancing, [39] Dark-skinned-People, [41] * Demonstration-Or-Protest, [44] Doorway, [49] Explosion-Fire, [52] Female-Human-Face-Closeup, [53] Flowers, [58] Ground-Vehicles, [59] * Hand, [81] Mountain, [84] * Nighttime, [86] Old-People, [100] Running, [105] * Singing, [107] Sitting-down, [115] Swimming, [117] * Telephone, [120] Throwing, [126] Vehicle, and [127] Walking.

Concepts were defined in terms a human judge could understand. Some participating groups made their feature detection output available to participants in the search task, which helped in the search task and contributed to the collaborative nature of TRECVID.

The fuller concept definitions provided to system developers and NIST assessors are listed with the detailed semantic indexing runs at the back of the workshop notebook and in Appendix B of this paper.

Work at Northeastern University (Yilmaz & Aslam, 2006) has resulted in methods for estimating standard system performance measures using relatively small samples of the usual judgment sets, so that larger numbers of features can be evaluated using the same amount of judging effort. Tests on past data showed the new measure (inferred average precision) to be a good estimator of average precision (Over, Ianeva, Kraaij, & Smeaton, 2006). This year mean extended inferred average precision (mean xinfAP) was used, which permits the sampling density to vary (Yilmaz, Kanoulas, & Aslam, 2008). This allowed the evaluation to be more sensitive to shots returned below the lowest rank (100) previously pooled and judged. It also allowed the sampling density to be greater among the highest ranked items, which contribute more to average precision than those ranked lower.
3.1 Data

As mentioned earlier, the IACC test collection contained approximately 8000 files/videos in MPEG-4/H.264 format and 146 788 shots. Development data contained 3200 files/videos and approximately 119 685 shots. Testing concept detection and known-item search on the same data offered the opportunity to assess the quality of concepts being used in search.

3.2 Evaluation

Each group was allowed to submit up to 4 runs, and in fact 39 groups submitted a total of 150 runs, including 100 full runs and 50 lite runs.

For each concept, 3 pools were created and randomly sampled as follows. The top pool sampled 100% of shots ranked 1-10 across all submissions. The medium pool sampled 20% of shots ranked 11-100 and not already included in a pool. The bottom pool sampled 5% of shots ranked 101-2000 and not already in a pool. Human judges (assessors) were presented with the pools - one assessor per concept - and they judged each shot by watching the associated video and listening to the audio. In all, 117 058 shots were judged; 537 314 shots fell into the unjudged part of the overall samples. All full runs were also treated as lite runs by looking at their performance on just the 10-feature lite subset.
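As an illustration of the stratified pooling just described, the sketch below draws the three pools for a single concept. It is only a minimal rendering of the procedure under the stated stratum boundaries and sampling rates, not the pool-building code actually used at NIST, and all function and variable names in it are our own.

```python
import random

# Strata as described above: (lowest rank, highest rank, sampling rate)
STRATA = [(1, 10, 1.00),     # top pool: 100% of shots ranked 1-10
          (11, 100, 0.20),   # medium pool: 20% of shots ranked 11-100
          (101, 2000, 0.05)] # bottom pool: 5% of shots ranked 101-2000

def sample_pools(runs, seed=0):
    """Build the judgment sample for one concept.

    `runs` is a list of ranked shot-ID lists, one per submission.
    Returns a dict mapping each sampled shot ID to the sampling rate
    of the stratum it was drawn from (needed later for estimation).
    """
    rng = random.Random(seed)
    sampled, already_pooled = {}, set()
    for low, high, rate in STRATA:          # work from the top pool down
        stratum = set()
        for run in runs:
            for shot in run[low - 1:high]:  # ranks `low` .. `high`
                if shot not in already_pooled:
                    stratum.add(shot)
        already_pooled |= stratum
        for shot in rng.sample(sorted(stratum), round(rate * len(stratum))):
            sampled[shot] = rate
    return sampled
```

Recording the sampling rate with every sampled shot is what makes the inferred measures described next computable.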
3.3 Measures

The sample_eval software, a tool implementing xinfAP, was used to calculate inferred recall, inferred precision, inferred average precision, etc., for each result, given the sampling plan and a submitted run. Since all runs provided results for all evaluated concepts, runs can be compared in terms of the mean inferred average precision across all 30 (or 10 lite) evaluated concepts. The results also provide some information about “within concept” performance.
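The estimation principle behind these inferred measures is inverse-probability weighting over the sampled judgments. The sketch below shows that principle for the number of relevant shots (and hence precision) in the top k of a run; it is a simplified illustration using the per-shot sampling rates recorded above, not a reimplementation of xinfAP or of the sample_eval tool.

```python
def inferred_relevant_at_k(ranked_shots, judgments, rates, k):
    """Estimate how many of a run's top-k shots are relevant.

    judgments: shot ID -> True/False, for the judged sample only.
    rates: shot ID -> sampling rate of the stratum the shot came from.
    Each judged relevant shot stands in for roughly 1/rate shots from
    its stratum, which is how a 20% or 5% sample can support estimates
    over the whole pool.
    """
    estimate = 0.0
    for shot in ranked_shots[:k]:
        if judgments.get(shot):             # judged and found relevant
            estimate += 1.0 / rates[shot]
    return estimate

def inferred_precision_at_k(ranked_shots, judgments, rates, k):
    return inferred_relevant_at_k(ranked_shots, judgments, rates, k) / k
```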

3.4 Results

Readers should see the results section at the back of the notebook and on the TRECVID website for details about the performance of each run.

4 Known-item search

The known-item search task models the situation in which someone knows of a video, has seen it before, believes it is contained in a collection, but doesn't know where to look. To begin the search process, the searcher formulates a text-only description, which captures what the searcher remembers about the target video. This task is very different from the TRECVID ad hoc search task, in which the systems began with a textual description of the need together with several image and video examples of what was being looked for.

4.1 Task

Given a text-only description of the video desired (i.e., a topic) and a test collection of video with associated metadata:

• automatically return a list of up to 100 video IDs ranked by probability of being the one sought. There was no time limit on automatic searches, but the elapsed time for each search - from the time the topic is presented to the system until the search result for the topic is frozen as complete - had to be submitted with the system output; or

• interactively return the ID of the sought video and the elapsed time to find it. No more than 5 minutes could elapse from the time the topic is presented to the system/searcher until the search result for the topic was frozen as complete. Interactive systems were able to query a web-based service to find out if a given video file was the known item sought - this to simulate the fact that searchers looking for their own known item would recognize it if they found it and stop the search. Each such query was logged and all logs were published with the TRECVID workshop results.

The topic also contained a list of 1-5 words or short phrases, each identifying an object/person/location that should be visible in the target video.

4.2 Data

The test data set (IACC.1.A) was 200 hours drawn from the IACC.1 collection using videos with durations between 10 seconds and 3.5 minutes.

4.3 Topics

300 text-only topics were created by NIST assessors. For each of the random sample of IACC videos assigned to them, they were told to watch the video at least once, pause, and then formulate a brief textual query that would likely be satisfied only by the video they just watched. Finally they were asked to choose from the topic 1 to 5 objects, people, or events and list those as part of the topic.

4.4 Evaluation

Since the target video was determined for each topic during topic creation, evaluation could be automatic.

4.5 Measures

Automatic runs were scored against the ground truth using the mean inverted rank at which the known item was found, or zero if it was not found. For interactive runs, which returned either one or no known items per topic, mean inverted rank measures the fraction of all topics for which the known item was found. For interactive runs, elapsed time and user satisfaction were also measured.
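The scoring just described reduces to a few lines of code. The following is our own minimal sketch of mean inverted rank, with an assumed input format; it is not the scoring tool actually used.

```python
def mean_inverted_rank(results, ground_truth):
    """results: topic ID -> ranked list of returned video IDs (possibly empty).
    ground_truth: topic ID -> the single known-item video ID.

    A topic scores 1/rank if the known item appears in the list and 0
    otherwise. An interactive run returns at most one ID per topic, so
    its mean inverted rank is simply the fraction of topics for which
    the known item was found.
    """
    total = 0.0
    for topic, target in ground_truth.items():
        returned = results.get(topic, [])
        if target in returned:
            total += 1.0 / (returned.index(target) + 1)  # ranks start at 1
    return total / len(ground_truth)
```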
4.6 Results

For results please see the results page for each run at the back of the workshop notebook.

5 Instance search pilot

An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, surveillance, law enforcement, protection of brand/logo use) is to find more video segments of a certain specific person, object, or place, given one or more visual examples of the specific item.

In 2010 this was a pilot task - evaluated by NIST but intended mainly to explore task definition and evaluation issues using data and an evaluation framework in hand. The task was a first approximation to the desired full task, using a smaller number of topics, a simpler identification of the target entity, and less accuracy in locating the instance than would be desirable in a full evaluation of the task.

5.1 Task

The instance search task for the systems was as follows. Given a collection of test videos, a master shot reference, and a collection of queries that delimit a person, object, or place entity in some example video, locate for each query the 1000 shots most likely to contain a recognizable instance of the entity. Each query consisted of a set of:

• 5 or so example frame images drawn at intervals from a video containing the item of interest. For each frame image:

– the rectangular region within the frame image containing the item of interest

– a binary mask of an inner region of interest within the rectangle

– the inner region against a gray background

– the frame image with the inner region outlined in red

– a list of vertices for the inner region

• the video from which the images were selected

• an indication of the target type taken from this set of strings (PERSON, CHARACTER, LOCATION, OBJECT)

5.2 Data

Test data: Sound and Vision data from TRECVID 2007-2009 (tv9.sv.test).

5.3 Topics

In a first approximation to the full task, most queries were created by NIST and targeted actors that appeared as themselves or as characters in Sound and Vision programs - in different clothes, costumes, settings, etc. In a few cases objects (including logos) and locations were targeted.

As this was a pilot task, participants were encouraged to help by examining the test data and contributing up to 5 topics per team with non-person/character targets. Several teams did so. See Appendix A for a listing of the topics.

5.4 Evaluation, Measures

This pilot version of the task was treated as a form of search and evaluated accordingly, with average precision for each query in each run and per-run mean average precision over all queries. While speed and location accuracy were also definitely of interest here, of these two, only speed was measured in the pilot.
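For concreteness, average precision and its mean over queries can be computed as in the sketch below. This is a generic rendering assuming binary relevance judgments keyed by shot ID; it is not the scoring code used for the official results.

```python
def average_precision(ranked_shots, relevant):
    """ranked_shots: the up-to-1000 shots returned for one query, best first.
    relevant: the set of shot IDs judged relevant for that query."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank      # precision at each relevant shot
    return precision_sum / len(relevant)      # normalize by total relevant

def mean_average_precision(run, judgments):
    """run: query ID -> ranked shot list; judgments: query ID -> relevant set."""
    return sum(average_precision(run.get(q, []), rel)
               for q, rel in judgments.items()) / len(judgments)
```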
5.5 Results

For results please see the results page for each run at the back of the workshop notebook.

6 Multimedia event detection pilot

The 2010 Multimedia Event Detection (MED) Pilot evaluation was the first evaluation of technologies that search multimedia video clips for events of interest to a user. An event for MED:

• is a complex activity occurring at a specific place and time;

• involves people interacting with other people and/or objects;

• consists of a number of human actions, processes, and activities that are loosely or tightly organized and that have significant temporal and semantic relationships to the overarching activity;

• is directly observable.

A user searching for events in multimedia material may be interested in a wide variety of potential events. Since it is an intractable task to build special purpose detectors for each event a priori, a technology is needed that can take as input a human-centric definition of an event that developers (and eventually systems) can use to build a search query. The events for MED were defined via an event kit, which consisted of:

• An event name, which is a mnemonic title for the event.

• An event definition, which is a textual definition of the event.

• An evidential description, which is a textual listing of the attributes that are indicative of an event instance. The evidential description provides a notion of some potential types of visual and acoustic evidence indicating the event's existence, but it is not an exhaustive list nor is it to be interpreted as required evidence.

• A set of illustrative video examples, each containing an instance of the event. The examples are illustrative in the sense that they help form the definition of the event, but they do not demonstrate all the inherent variability or potential realizations.

The 2010 MED evaluation was a pilot for two main reasons. First, only three events, Assembling a Shelter, Batting in a run, and Making a cake, were used for the evaluation. Future evaluations will involve ten events, and new events will be tested each year. Second, the data resources were small, on the order of 100 hours as opposed to 1000s of hours for future evaluations.

6.1 Data

A new collection of Internet multimedia (i.e., video clips containing both audio and video streams) was provided to MED participants. The data, which was collected and distributed by the Linguistic Data Consortium, consists of publicly available, user-generated content posted to various Internet video hosting sites. Instances of the events were collected by specifically searching for target events using text-based Internet search engines. All video data was reviewed to protect privacy, remove offensive material, etc., prior to inclusion in the corpus.

Video clips were provided in MPEG-4 formatted files. The video was encoded to the H.264 standard. The audio was encoded using MPEG-4's Advanced Audio Coding (AAC) standard.

The video data collection was divided into two data sets:

• Development data consisting of 1746 total clips (c. 56 hours). The development data set includes nominally 50 instances of each of the three MED events; the rest of the clips are not on any of the three MED events.

• Evaluation data consisting of 1742 total clips (c. 59 hours). The evaluation data set included nominally 50 instances per event.

6.2 Evaluation

Sites submitted system outputs for any combination of the three events. Outputs included a detection score, which expresses the strength of evidence supporting the existence of the event, and a detection decision (yes/no) for each event observation.

Submission performance was computed using the Framework for Detection Evaluation (F4DE) toolkit. Groups were required to submit a primary run, which is the run they expected to be their best performing system, and were optionally allowed to submit multiple runs with contrastive conditions.
6.3 Measures

Since detection system performance is a tradeoff between probability of miss and false alarms, this task used the Normalized Detection Cost (NDC) measure for evaluating system performance. NDC is a weighted linear combination of the system's missed detection probability and false alarm probability. Participants were provided a graph of the Decision Error Tradeoff (DET) curve for each event their system detected; the DET curves were plotted for all events.
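The sketch below shows the general shape of such a cost measure, computing miss and false alarm probabilities from a run's hard yes/no decisions for one event. The cost weights, the target-event prior, and the normalization used here are illustrative assumptions only, not the official MED 2010 parameters; the authoritative implementation is the F4DE toolkit mentioned above.

```python
def normalized_detection_cost(decisions, labels,
                              c_miss=1.0, c_fa=1.0, p_target=0.5):
    """decisions, labels: parallel lists of booleans over the evaluated clips
    (system said "yes" / clip truly contains the event).
    c_miss, c_fa, p_target are ILLUSTRATIVE values, not the official ones."""
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    p_miss = sum(1 for d, t in zip(decisions, labels) if t and not d) / n_target
    p_fa = sum(1 for d, t in zip(decisions, labels) if d and not t) / n_nontarget
    cost = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
    # Normalize by the better of the two trivial systems (always "no" /
    # always "yes"), so that 1.0 marks "no better than a trivial system".
    return cost / min(c_miss * p_target, c_fa * (1 - p_target))
```

Sweeping a threshold over the run's detection scores, instead of using the hard decisions, yields the miss/false-alarm pairs that are plotted as the DET curve.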
6.4 Results

Readers are asked to see the results pages at the back of the workshop notebook and on the TRECVID website for information about each run's performance.

7 Copy detection

As used here, a copy is a segment of video derived from another video, usually by means of various transformations such as addition, deletion, modification (of aspect, color, contrast, encoding, ...), camcording, etc. Detecting copies is important for copyright control, business intelligence and advertisement tracking, law enforcement investigations, etc. Content-based copy detection offers an alternative to watermarking.

As the audio plays an important role in detecting copied videos, this year systems were required to submit runs for only one required query type (video + audio queries). Systems had the option to individually evaluate video-only and audio-only query types. Two application profiles were required to be simulated: one that assumes a balanced cost for misses and false alarms, and one that tolerates no false alarms (thus a very high cost for false alarms). Systems were required to submit a decision score threshold believed to correspond to the best performance for the run.

The required system task was as follows: given a test collection of videos and a set of 11256 queries, determine for each query the place, if any, at which some part of the query occurs, with possible transformations, in the test collection. Two thirds of the queries contained copies.

A set of 8 possible video transformations was selected to reflect actually occurring video transformations and applied to each of 201 untransformed (base) queries using tools developed by IMEDIA to include some randomization at various decision points in the construction of the query set. In total 1608 video-only queries were constructed. For each query, the tools took a segment from the test collection, optionally transformed it, embedded it in some video segment which did not occur in the test collection, and then finally applied one or more transformations to the entire query segment. Some queries contained no test segment; others were composed entirely of the test segment. Video transformations included camcording simulation, picture-in-picture, insertion of patterns, reencoding, change of gamma, decreasing the quality, and post production alterations. The video transformations used were documented in detail as part of the TRECVID Guidelines.

1407 audio-only queries were generated by Dan Ellis at Columbia University along the same lines as the video-only queries: an audio-only version of the set of 201 base queries was transformed by seven techniques that were intended to be typical of those that would occur in real reuse scenarios: (1) bandwidth limitation, (2) other coding-related distortion (e.g., subband quantization noise), and (3) variable mixing with unrelated audio content.

A script to construct 11256 audio + video queries was provided by NIST. These queries comprised all the combinations of transformed audio (7) and transformed video (8) from a given base audio+video query (201).

7.1 Data

The new Internet Archive video collection was used as a source for reference and non-reference videos. This year's testing and development videos (11200 files), totaling 400 hours with durations of less than 4.1 minutes, were used as a source from which the test query generation tools chose reference video. The non-reference video collection was selected from a set of 12480 videos with a total duration of 4000 hours and durations between 10 and 30 minutes.

7.2 Evaluation

In total in 2010, 22 participant teams submitted 78 runs for evaluation. 41 runs were submitted as balanced runs and 37 as no-false-alarms runs. Copy detection submissions were evaluated separately for each transformation, according to:

• How many queries they find the reference data for, or correctly tell us there is none to find

• When a copy is detected, how accurately the run locates the reference data in the test data

• How much elapsed time is required for query processing

After creating the query set, it was found that 5 base queries had to be dropped from evaluation because there exist multiple answers for them in the reference set or because some were taken from corrupted original videos.

7.3 Measures (per transformation)

• Minimal Normalized Detection Cost Rate: a cost-weighted combination of the probability of missing a true copy and the false alarm rate. For TRECVID 2010 the cost model assumed copies are very rare (e.g., 0.005/hr), and two application profiles were required: the “Balanced” profile, in which misses and false alarms are assigned a cost of 1, and the “Nofa” profile, in which a false alarm is assigned a cost of 1000 times the cost of a miss. Other realistic scenarios were of course possible. The normalized minimal detection cost rate (minNDCR) reduced in 2010 to two terms involving two variables: the probability of a miss (Pmiss) and the number of false alarms (FA). The total length of queries per transformation in hours was 4.6. For the “Nofa” profile:

minNDCR = Pmiss + 108.7 ∗ FA

For the same queries under the “Balanced” profile:

minNDCR = Pmiss + 0.1 ∗ FA

• Copy location accuracy: mean F1 score combining the precision and recall of the asserted copy location versus the ground truth location

• Copy detection processing time: mean processing time (s)

Finally, the submitted run threshold was used to calculate the actual NDCR and F1, and those results were compared to the minNDCR and F1 obtained using the optimal threshold calculated from the DET curve.
if the observed action is a taggable event, a reason- contrastive conditions. System submissions were
able interpretation rule was used. The rule was, “if aligned to the reference annotations scored for missed
according to a reasonable interpretation of the video, detections / false alarms.
the event must have occurred, then it is a taggable
event”. Importantly, the annotation guidelines were
designed to capture events that can be detected by 8.3 Measures
human observers, such that the ground truth would Since detection system performance is a tradeoff be-
contain observations that would be relevant to an op- tween probability of miss vs. rate of false alarms,
erator/analyst. In what follows we distinguish be- this task used the Normalized Detection Cost Rate
tween event types (e.g., parcel passed from one person (NDCR) measure for evaluating system performance.
to another), event instance (an example of an event NDCR is a weighted linear combination of the sys-
type that takes place at a specific time and place), tem’s Missed Detection Probability and False Alarm
and an event observation (event instance captured Rate (measured per unit time). Participants were
by a specific camera). provided a graph of the Decision Error Tradeoff
(DET) curve for each event their system detected;
8.1 Data the DET curves were plotted over all events (i.e., all
days and cameras) in the evaluation set.
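The NDCR used here has the same linear form as the copy detection measure but charges false alarms per unit of source video time rather than per query. The sketch below illustrates only that arithmetic: the weight `beta` is an assumed placeholder for the official cost parameters, and the temporal alignment of system observations to the reference, which produces the miss and false alarm counts, is not shown.

```python
def sed_ndcr(n_missed, n_false_alarms, n_ref_observations, hours_of_video,
             beta=1.0):
    """n_missed / n_ref_observations -> probability of missed detection.
    n_false_alarms / hours_of_video -> false alarm rate per hour.
    `beta` is an ASSUMED weight standing in for the official cost model."""
    p_miss = n_missed / n_ref_observations
    fa_rate = n_false_alarms / hours_of_video
    return p_miss + beta * fa_rate
```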
8.4 Results

Readers are asked to see the results pages at the back of the workshop notebook and on the TRECVID website for information about each run's performance.

9 Summing up and moving on

This introduction to TRECVID 2010 has provided basic information on the goals, data, evaluation mechanisms and metrics used. Further details about each particular group's approach and performance for each task can be found in that group's site report. The raw results for each submitted run can be found in the results section at the back of the notebook.

10 Authors' note

TRECVID would not have happened in 2010 without support from the National Institute of Standards and Technology (NIST), the Intelligence Advanced Research Projects Activity (IARPA), and the Department of Homeland Security (DHS). The research community is very grateful for this. Beyond that, various individuals and groups deserve special thanks:

• Brewster Kahle (Internet Archive's founder) and R. Manmatha (U. Mass, Amherst) suggested in December of 2008 that TRECVID take another look at the resources of the Archive.
• Cara Binder and Raj Kumar helped explain how to query and download automatically from the Internet Archive.

• Shin'ichi Satoh at NII along with Alan Smeaton and Brian Boyle at DCU arranged for the mirroring of the video data.

• Georges Quénot with Franck Thollard, Andy Tseng, Bahjat Safadi from LIG and Stéphane Ayache from LIF shared coordination of the semantic indexing task, organized the community annotation of 130 features, and provided judgments for 10 features under the Quaero program.

• Georges Quénot provided the master shot reference for the IACC.1 videos.

• At DCU, Colum Foley and Kevin McGuinness helped segment the instance search topic examples and built the oracle for interactive instance search.

• The LIMSI Spoken Language Processing Group and VecSys Research provided ASR for the IACC.1 videos.

• Laurent Joyeux (INRIA-Rocquencourt) updated the copy detection query generation code.

• Matthijs Douze from INRIA-LEAR volunteered a camcorder simulator to automate the camcording transformation for the copy detection task.

• Emine Yilmaz (Microsoft Research) and Evangelos Kanoulas (U. Sheffield) updated their xinfAP code (sample_eval.pl) to estimate additional values and made it available.

Finally we want to thank all the participants and other contributors on the mailing list for their enthusiasm and diligence.

11 Appendix A: Instance search topics

9001 PERSON - George W. Bush

9002 PERSON - George H. W. Bush

9003 PERSON - J. P. Balkenende

9004 PERSON - Bart Bosch

9005 CHARACTER - Professor Fetze Alsvanouds from the University of Harderwijk (Aart Staartjes)

9006 PERSON - Prince Bernhard

9007 CHARACTER - The Cook (Alberdinck Thijn: Gijs de Lange)

9008 PERSON - Jeroen Kramer

9009 CHARACTER - Two old ladies, Ta en To

9010 CHARACTER - one of two office workers (Kwelder of Benema en Kwelder: Harry van Rijthoven)

9011 PERSON - Colin Powell

9012 PERSON - Midas Dekkers

9013 OBJECT - IKEA logo on clothing

9014 CHARACTER - Boy Zonderman (actor in leopard tights and mesh top: Frank Groothof)

9015 OBJECT - black robes with white bibs worn by Dutch judges and lawyers

9016 OBJECT - zebra stripes on pedestrian crossing

9017 OBJECT - KLM logo

9018 LOCATION - interior of the Dutch parliament

9019 OBJECT - Kappa logo

9020 OBJECT - Umbro logo

9021 OBJECT - tank

9022 OBJECT - Willem Wever van

12 Appendix B: Concepts/Features

The features labeled with an asterisk comprise the “lite” subset.

004 *Airplane Flying - An airplane flying in the sky

006 Animal - Shots depicting an animal (no humans)

007 Asian People - People of Asian ethnicity

013 Bicycling - A person riding a bicycle

015 *Boat-Ship - Shots of a boat or ship


019 *Bus - Shots of a bus

022 Car Racing - Shots of scenes at car races

027 Cheering - One or more people cheering or applauding

028 *Cityscape - View of a large urban setting, showing skylines and building tops (not just street-level views of urban life)

029 *Classroom - Images of school or university style classroom scenes

038 Dancing - One or more people dancing, not necessarily with each other

039 Dark-skinned People - People who are dark skinned due to African or African/American descent (ethnicity)

041 *Demonstration or Protest - One or more people protesting. May or may not have banners or signs

044 Doorway - An opening you can walk through into a room or building

049 Explosion Fire - Shots of an explosion or a fire

052 Female-Human-Face-Closeup - Closeup of a female human's face (face must clearly fill more than 1/2 of the height or width of a frame but can be from any angle and need not be completely visible)

053 Flowers - Pictures of flowers

058 Ground vehicles - Refers to ground vehicles, which include any of the following: Agricultural vehicle (tractor, combine), Armored vehicle, Automobile, Bicycle, Bus, Construction vehicle, Emergency vehicle, Limousine, Livestock carrier, Motor Scooter, Motorcycle, Truck, Box truck, Pickup truck, RV, bulldozer, quads. Excludes interiors of cars and badly destroyed vehicles

059 *Hand - A close-up view of one or more human hands, where the hand is the primary focus of the shot

081 Mountain - Shots depicting a mountain or mountain range with the slopes visible

084 *Nighttime - Shots that take place (outdoors) at night. Included is the continuation of a story if ambiguous (if a story starts at night, it probably ends at night). Excluded are sports events under lights

086 Old People - Seniors or elderly people

100 Running - One or more people running

105 *Singing - One or more people singing

107 Sitting Down - Person in the act of sitting down

115 Swimming - One or more people swimming

117 *Telephones - All kinds of phones. If only the headset is visible, it was not included

120 Throwing - A person throwing some object

126 Vehicle - Anything used for transporting people or goods, such as a car, bus, truck, cart, plane, etc.

127 Walking - One or more people walking

References

AAC. (2010). http://www.iso.org/iso/catalogue_detail.htm?csnumber=42739.

Ayache, S., & Quénot, G. (2008). Video Corpus Annotation Using Active Learning. In Proceedings of the 30th European Conference on Information Retrieval (ECIR'08) (pp. 187–198). Glasgow, UK.

H.264. (2010). http://www.itu.int/rec/T-REC-H.264-201003-I/en.

MPEG-4. (2010). http://mpeg.chiariglione.org/.

Over, P., Ianeva, T., Kraaij, W., & Smeaton, A. F. (2006). TRECVID 2006 Overview. www-nlpir.nist.gov/projects/tvpubs/tv6.papers/tv6overview.pdf.

QUAERO. (2010). QUAERO homepage. www.quaero.org/modules/movie/scenes/home/.

Rose, T., Fiscus, J., Over, P., Garofolo, J., & Michel, M. (2009). The TRECVid 2008 Event Detection Evaluation. In IEEE Workshop on Applications of Computer Vision (WACV). IEEE.
UKHO-CPNI. (2007 (accessed June 30, 2009)). Imagery Library for Intelligent Detection Systems. http://scienceandresearch.homeoffice.gov.uk/hosdb/cctv-imaging-technology/video-based-detection-systems/i-lids/.

Yilmaz, E., & Aslam, J. A. (2006). Estimating Average Precision with Incomplete and Imperfect Judgments. In Proceedings of the Fifteenth ACM International Conference on Information and Knowledge Management (CIKM). Arlington, VA, USA.

Yilmaz, E., Kanoulas, E., & Aslam, J. A. (2008). A simple and efficient sampling method for estimating AP and NDCG. In SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 603–610). New York, NY, USA: ACM.
Table 1: Participants and tasks

Task Location Participants


−−− ∗∗∗ KIS ∗∗∗ −−− SIN Europe Aalto University School of Science & Technology
−−− −−− −−− −−− −−− SIN Europe Aristotle University of Thessaloniki
CCD −−− −−− −−− −−− −−− Asia Asahikasei Co.
CCD INS ∗∗∗ ∗∗∗ −−− ∗∗∗ NorthAm AT&T Labs - Research
−−− −−− −−− ∗∗∗ SED −−− Asia Beijing Jiaotong Univ.
CCD INS KIS −−− SED SIN Asia Beijing Univ. of Posts and Telecom.-MCPRL
CCD ∗∗∗ −−− ∗∗∗ −−− SIN Europe Brno Univ. of Technology
−−− ∗∗∗ KIS MED SED SIN NorthAm Carnegie Mellon Univ.
∗∗∗ ∗∗∗ KIS −−− −−− ∗∗∗ Asia Chinese Academy of Sciences - MCG
CCD −−− KIS −−− ∗∗∗ SIN Asia City Univ. of Hong Kong
−−− ∗∗∗ −−− MED −−− SIN NorthAm Columbia Univ.
−−− −−− −−− −−− SED −−− NorthAm Computer Research Inst. of Montreal
−−− ∗∗∗ −−− −−− −−− SIN Europe DFKI-MADM
−−− INS KIS −−− −−− −−− Europe Dublin City Univ.
−−− ∗∗∗ −−− ∗∗∗ ∗∗∗ SIN Europe EURECOM
−−− ∗∗∗ −−− −−− −−− SIN NorthAm Florida International Univ.
−−− ∗∗∗ −−− −−− −−− SIN Asia France Telecom Orange Labs (Beijing)
−−− −−− −−− −−− −−− SIN Asia Fudan Univ.
∗∗∗ −−− −−− −−− −−− SIN Asia Fuzhou Univ.
∗∗∗ INS KIS −−− −−− ∗∗∗ Europe Hungarian Academy of Sciences
CCD ∗∗∗ ∗∗∗ MED −−− ∗∗∗ NorthAm IBM T. J. Watson Research Center
−−− INS KIS MED −−− SIN Europe Informatics and Telematics Inst.
CCD ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Europe INRIA-TEXMEX
−−− −−− −−− ∗∗∗ SED SIN Europe INRIA-willow
−−− ∗∗∗ −−− −−− −−− SIN Europe IRIT - Equipe SAMoVA
−−− −−− KIS −−− −−− −−− Asia Inst.for Infocomm Research
CCD −−− −−− −−− −−− −−− Europe Istanbul Technical Univ.
−−− INS −−− −−− ∗∗∗ SIN Europe JOANNEUM RESEARCH
−−− INS KIS MED ∗∗∗ SIN NorthAm KB Video Retrieval
CCD −−− −−− ∗∗∗ ∗∗∗ ∗∗∗ Asia KDDI R&D Labs and SRI International
−−− −−− −−− −−− −−− SIN Europe Lab. d’Informatique Fondamentale de Marseille
−−− INS ∗∗∗ ∗∗∗ −−− SIN Europe Lab. d’Informatique de Grenoble for IRIM
−−− −−− −−− −−− −−− SIN Europe LSIS / UMR CNRS & USTV
−−− −−− −−− MED −−− −−− NorthAm Mayachitra, Inc.
CCD INS −−− −−− ∗∗∗ −−− Asia Nanjing Univ.
CCD −−− −−− −−− −−− −−− Asia National Chung Cheng Univ.
CCD INS ∗∗∗ ∗∗∗ ∗∗∗ SIN Asia National Inst. of Informatics
−−− ∗∗∗ −−− −−− −−− SIN Asia National Taiwan Univ.
−−− ∗∗∗ KIS ∗∗∗ ∗∗∗ −−− Asia National Univ. of Singapore
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ SED SIN Asia NHK Science and Technical Research Labs
−−− −−− −−− MED −−− −−− Asia Nikon Corporation
CCD −−− −−− −−− −−− −−− Asia NTNU and Academia Sinica
CCD −−− −−− −−− −−− −−− Asia NTT Communication Science Labs-CSL
−−− INS −−− −−− −−− −−− Asia NTT Communication Science Labs-NII
−−− −−− KIS −−− −−− SIN Asia NTT Communication Science Labs-UT
−−− ∗∗∗ ∗∗∗ −−− −−− SIN Europe Oxford/IIIT
CCD −−− −−− −−− SED −−− Asia Peking Univ.-IDM
−−− −−− −−− ∗∗∗ −−− SIN Europe Quaero consortium
−−− −−− −−− −−− SED −−− Europe Queen Mary, Univ. of London
−−− −−− ∗∗∗ −−− −−− SIN Asia Ritsumeikan Univ.
CCD −−− ∗∗∗ −−− −−− ∗∗∗ Asia Shandong Univ.
−−− −−− −−− −−− −−− SIN Asia SHANGHAI JIAOTONG UNIVERSITY-IS
−−− −−− −−− −−− SED −−− NorthAm Simon Fraser Univ.
CCD −−− −−− −−− ∗∗∗ ∗∗∗ Asia Sun Yat-sen Univ. - GITL
CCD −−− −−− −−− −−− −−− Europe Telefonica Research
Task legend. CCD:copy detection; INS:instance search; KIS:known-item search; MED:multimedia event detection;
SED: surveillance event detection; SIN:semantic indexing; − − −:no run planned; ∗ ∗ ∗:planned but not submitted
Table 2: Participants and tasks

Task Location Participants


∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ SED SIN Asia Tianjin Univ.
−−− INS −−− −−− −−− −−− Europe TNO ICT - Multimedia Technology
∗∗∗ INS −−− −−− −−− −−− Asia Tokushima Univ.
−−− ∗∗∗ −−− ∗∗∗ SED SIN Asia/NorthAm Tokyo Inst. of Techn. + Georgia Inst. of Techn.
CCD ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Asia Tsinghua Univ.-IMG
CCD ∗∗∗ −−− −−− ∗∗∗ SIN Europe TUBITAK - Space Technologies Research Inst.
−−− −−− −−− −−− −−− SIN Europe Universidad Carlos III de Madrid
−−− INS KIS ∗∗∗ ∗∗∗ SIN Europe Univ. of Amsterdam
CCD −−− −−− −−− −−− −−− Europe Univ. of Brescia
CCD −−− −−− −−− −−− −−− SouthAm Univ. of Chile
−−− ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ SIN Asia Univ. of Electro-Communications
−−− −−− −−− ∗∗∗ ∗∗∗ SIN NorthAm Univ. of Illinois at U-C & NEC Labs America
∗∗∗ ∗∗∗ KIS −−− −−− −−− Europe Univ. of Klagenfurt
∗∗∗ ∗∗∗ −−− ∗∗∗ −−− SIN Europe Univ. of Marburg
∗∗∗ ∗∗∗ ∗∗∗ −−− ∗∗∗ SIN Africa Univ. of Sfax
−−− −−− ∗∗∗ −−− ∗∗∗ SIN Asia Waseda Univ.
∗∗∗ INS ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Asia Xi’an Jiaotong Univ.
∗∗∗ −−− KIS ∗∗∗ ∗∗∗ ∗∗∗ NorthAm York Univ.
Task legend. CCD:copy detection; INS:instance search; KIS:known-item search; MED:multimedia event detection;
SED: surveillance event detection; SIN:semantic indexing; − − −:no run planned; ∗ ∗ ∗:planned but not submitted

Table 3: Instance search pooling and judging statistics

Columns: topic number; total submitted; unique submitted; % of total that were unique; max. result depth pooled; number judged; % of unique that were judged; number relevant; % of judged that were relevant
9001 36874 18349 49.8 150 3514 19.2 61 1.7
9002 36779 18153 49.4 100 2514 13.8 28 1.1
9003 36828 18800 51.0 150 3730 19.8 140 3.8
9004 35507 17745 50.0 160 3897 22.0 140 3.6
9005 36836 19423 52.7 100 2706 13.9 36 1.3
9006 37578 18904 50.3 120 3066 16.2 27 0.9
9007 38722 19965 51.6 100 2712 13.6 14 0.5
9008 36850 19544 53.0 260 6397 32.7 135 2.1
9009 37515 18797 50.1 130 3381 18.0 75 2.2
9010 36457 19531 53.6 120 3165 16.2 68 2.1
9011 36256 16508 45.5 100 2333 14.1 4 0.2
9012 36476 19287 52.9 170 4376 22.7 174 4.0
9013 37230 18276 49.1 100 2367 13.0 25 1.1
9014 38082 17974 47.2 100 2583 14.4 15 0.6
9015 38126 18173 47.7 140 3286 18.1 80 2.4
9016 35811 16873 47.1 160 3412 20.2 27 0.8
9017 35540 16249 45.7 110 2335 14.4 20 0.9
9018 37201 16996 45.7 120 2795 16.4 52 1.9
9019 32932 15374 46.7 100 1925 12.5 6 0.3
9020 36035 18134 50.3 100 2254 12.4 38 1.7
9021 35162 16496 46.9 170 3622 22.0 28 0.8
9022 38585 18207 47.2 100 2400 13.2 15 0.6
Table 4: 2010 Participants not submitting any runs

Task Location Participants


−−− −−− −−− −−− −−− ∗∗∗ NorthAm AKiiRA Media Systems Inc.
−−− −−− −−− ∗∗∗ ∗∗∗ −−− Asia Beijing University of Post-Telecom.-MCU
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ NorthAm Binatix Inc.
−−− −−− ∗∗∗ ∗∗∗ ∗∗∗ −−− Asia Chinese Academy of Sciences - AIBT
∗∗∗ −−− −−− −−− −−− −−− NorthAm CMART Systems
−−− ∗∗∗ ∗∗∗ −−− −−− ∗∗∗ Europe Commissariat à l'Energie Atomique, LIST
−−− ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ NorthAm CompuSensor Technology Corporation
−−− −−− ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Europe Consorzio Milano Ricerche
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ −−− NorthAm Florida Atlantic University
−−− ∗∗∗ ∗∗∗ ∗∗∗ −−− −−− NorthAm Harvard
∗∗∗ −−− −−− −−− ∗∗∗ −−− Asia Hong Kong Polytechnic University
−−− ∗∗∗ −−− −−− ∗∗∗ −−− Europe Imagelab - University of Modena and Reggio Emilia
−−− ∗∗∗ −−− −−− ∗∗∗ ∗∗∗ NorthAm ITT Geospatial Systems
−−− ∗∗∗ ∗∗∗ −−− −−− ∗∗∗ Asia Kobe University
∗∗∗ −−− −−− −−− −−− −−− Asia Korea Advanced Institute of Science and Technology
−−− ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Asia Mangalore University
−−− −−− −−− −−− −−− ∗∗∗ Asia National Cheng Kung University
−−− ∗∗∗ ∗∗∗ −−− −−− ∗∗∗ Europe Open University
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Asia Peking University-ICST
−−− −−− ∗∗∗ ∗∗∗ −−− −−− Europe Politecnico Di Milano
−−− ∗∗∗ −−− −−− −−− −−− Australia RMIT University School of CS&IT
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Asia Shanghai Jiao Tong University-IICIP
−−− −−− ∗∗∗ −−− −−− −−− Asia Sichuan University of China
∗∗∗ −−− −−− −−− −−− −−− Asia Sun Yat-sen University - IST
−−− ∗∗∗ −−− −−− −−− −−− Asia Tsinghua University-THEEIE
−−− −−− −−− −−− ∗∗∗ ∗∗∗ Europe Universidad Autónoma de Madrid
−−− −−− −−− −−− ∗∗∗ −−− SouthAm Universidade Federal do Paraná
−−− −−− ∗∗∗ −−− −−− −−− NorthAm University of South Carolina
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ Europe University of Ioannina
−−− −−− ∗∗∗ −−− −−− −−− Asia University of Malaya
−−− −−− ∗∗∗ −−− −−− ∗∗∗ NorthAm University of Maryland
∗∗∗ ∗∗∗ −−− ∗∗∗ −−− −−− NorthAm University of Ottawa
∗∗∗ ∗∗∗ ∗∗∗ −−− −−− ∗∗∗ SouthAm University of Sao Paulo
−−− −−− −−− ∗∗∗ −−− ∗∗∗ Europe University of Sheffield
∗∗∗ −−− −−− −−− −−− ∗∗∗ Europe University of Surrey
∗∗∗ ∗∗∗ −−− ∗∗∗ −−− ∗∗∗ Asia Zhejiang University
Task legend. CCD:copy detection; INS:instance search; KIS:known-item search; MED:multimedia event detection;
SED: surveillance event detection; SIN:semantic indexing; − − −:no run planned; ∗ ∗ ∗:planned but not submitted
