2007, Video Recog. Systems - Rail & Transit

Video Recognition Systems
Video Technology for Security: Problems and Solutions

Dr. Dmitry Gorodnichy
http://vrs.iit.nrc.ca
www.perceptual-vision.com

Rail and Urban Transit Security Workshop
Montreal, November 2007

“Computer Vision allows computers to see.
Perceptual Vision allows computers to understand what they see.”
Outline

Intro: Video Technology (VT)
– History of VT
– NRC/IIT Video Recognition Systems project
– VT within GoC (VT4NS meeting)
– Important VT facts: What vendors don’t tell
Part 1: On Intelligent Surveillance
– Need-to-know facts: status quo, real challenges, real solutions
• On “motion-detection”
• Next-generation surveillance: object-detection based
– Introducing ACE Surveillance™ (Annotated Critical Evidence)
– Case study/Demo: NRC Commissioners use ACE Surveillance™
Part 2: On Video-based Face Recognition
– Very different from Photograph-based:
• Different constraints, Different approaches, Different applications
– Solutions/Demo: Recognizing actors in movies

History of Video

First video (motion picture):
when it became possible to display video frames fast (>12 fps)
XX century
Analog video systems (capture/storage)
Digital video systems
Approaches developed: Computer Vision, Pattern Recognition, Video Recognition
Video recognition systems:
when it became possible to process video frames fast (>12 fps)
XXI century
3. Video Recognition Systems (Dmitry Gorodnichy) 4. Video Recognition Systems (Dmitry Gorodnichy)
Video Recognition – area of XXIst century
• Aka
- Video Analysis and Content Extraction (VACE),
- Intelligent Video, Smart Video, …
- Perceptual Vision
• IS NOT about capturing data (better lenses, grabbers, coders, transmitters), but about understanding captured data (better theory)
• Very young area and IS NOT:
• Pattern (Face) Recognition & Machine Learning
• Computer Vision & Image Processing
• Neurobiology & Biological Vision
– But requires a mixture of expertise in all of the above!

Video Recognition Systems
• Started 2001 within NRC/IIT
– Formerly, as Perceptual Vision project
• Do both research / services and development / licensing
– Worked on Canadarm2 (2001-2)
– Known for Nouse® (Nose as mouse) tool for disabled (2003-7)
• Emphasis on Security & Surveillance since 2004
– Intelligent surveillance
– Face recognition from Video
• Work with Industry, Academics & OGDs:
– Esp. CBSA, RCMP, DRDC.
• Partner of USA DTO / VACE program (← handout)
(Disruptive Technology Office / Video Analysis and Content Extraction)
VRS Role
[Diagram labels: Social values; driver; impact; Knowledge/Discovery; Technology/Services; papers, conferences (CVPR, CRV); clients: CBSA, RCMP, DRDC, DND/CF, CSIS, PPTC, CIC, PCO, PSERC, TC, CATSA, CPFC; other clients: Health, Media, Education; VRS, IIT, NRC; Acoustics / IMS; Flight Facility; Universities; OGDs; Industry; …]

VRS expertise
• Object detection and tracking
– Automated Teleoperator
– ACE Surveillance™
• Faces in Video
– Face detection, tracking
– Face recognition from Video
• Other
– Image Search (Roth)
– Marker-based tracking (Fiala)
Events organized

Since 2004:
IEEE-archived Intern. Workshops on Video Processing and Recognition (VideoRec’08 - in Windsor, May 27-30, 2008)
Goal: Focus academic effort on newly emerged area.

Ottawa, June 5, 2007:
First federal departments meeting on Deploying Video Technologies for National Security (VT4NS’07)
Goal: Discuss the ways to synchronize the effort in developing VT solutions and setting VT standards for the new century within GoC.

VT4NS’07 attendees
(* VT developers, + VT users)
** USA DTO/VACE (Disruptive Technology Office/Video Analysis and Content Extraction)
** NRC/IIT/Video Recognition Systems
+ NRC/Administrative Services and Property Management Branch / Security Operations
* NRC/ Institute for Aerospace Research/Flight Research Laboratory
* CRC (Industry of Canada, Communications Research Centre)/Advanced Video Systems
** CRIM (Computer Research Institute of Montreal)
+ CBSA (Canada Border Services Agency)/Laboratory and Scientific Services Directorate
+* RCMP/ Surveillance Technology Section / Covert Video (CV), Remote Sensing Technologies (RST) and Special Purpose Vehicle (SPV) units
+ RCMP/ Technical Security Branch
+* DRDC/Automated Intelligent Systems/UAV
+* DRDC/Network Information Operations Section
+* DRDC/Centre for Operations Research & Analysis (CORA)
+* CPRC (Canadian Police Research Center)
+ Transport Canada / Security Technology / Security and Emergency Preparedness
+ Office of the Privacy Commissioner of Canada
+ DND/Forces (several depts.)
VT4NS Links
(to help translation only)
• NRC-Administrative Services and Property Management Branch (NRC-ASPM) Security Operations (link: http://www.nrc-cnrc.gc.ca/institutes/aspm_e.html),
• NRC-Institute for Aerospace Research (NRC-IAR) Flight Research Laboratory (link: http://iar-ira.nrc-cnrc.gc.ca/flight_main_e.html),
• Communications Research Centre Canada (CRC) Advanced Video Systems (link: http://www.crc.ca/en/html/crc/home/research/broadcast/advanced_video),
• Canada Border Services Agency (CBSA) Laboratory and Scientific Services Directorate (link: http://www.cbsa-asfc.gc.ca/media/facts-faits/035-eng.html),
• Royal Canadian Mounted Police (RCMP) Surveillance Technology Section (link: http://www.rcmp-grc.gc.ca/bc/lmd/surrey/content/services/fis_e.htm),
• RCMP Technical Security Branch (link: http://www.rcmp-grc.gc.ca/tsb/),
• Defence Research & Development Canada (DRDC) Automated Intelligent Systems (link: http://www.drdc-rddc.gc.ca/researchtech/tis/activ2_e.asp),
• DRDC Network Information Operations Section (link: http://www.ottawa.drdc-rddc.gc.ca/html/NIO-102-section_e.html),
• DRDC Centre for Operations Research & Analysis (link: http://www.drdc-rddc.gc.ca/researchcentres_e.asp),
• Transport Canada Security Technology (link: http://www.tc.gc.ca/en/menu.htm),
• Office of the Privacy Commissioner of Canada (link: http://www.privcom.gc.ca/),
• several National Defense and the Canadian Forces (DND/Forces) departments (link: http://www.forces.gc.ca/site/home_e.asp),
• Computer Research Institute of Montreal's Vision and Imaging Group (in French only) (link: http://www.crim.ca/fr/index.html),
• Canadian Police Research Center (link: http://www.cprc.org/)

VT4NS Report
• No national / regional VT program yet.
• Decisions influenced by vendors / short-term solutions
→ No national standards for capturing / saving video data.
• E.g. over 30 different video systems deployed in Ottawa
→ No policy to handle evidence:
• E.g. is data original, not altered
• Many local initiatives, not coordinated
– City of Calgary (traffic abnormalities detection with CCTV cams)
– Cornwall Canada US border* (DVR). Pilot project #1 “port-runner”
– Ottawa/Montreal Airports* (CCTV, PTZ DVR), …
• This is about to be changed (2007)
• Follow the USA DTO/VACE model
Facts to know
(what VT vendors may not tell)
1. Video capture is no longer expensive or bad
– Composite video/RCA (CCTV analog)
– USB2 cams and digitizers
– Firewire cams
– Wireless & IP cameras
– Multi-channel framegrabbers for CCTV
2. Beware of “high resolution” cameras
  1. It’s unlikely to be the real resolution
  2. It doesn’t help make video more “intelligent”
3. It’s Intelligence that’s missing

Problems
1. Environment/Setup – light/weather, field of view …
2. Objects/Activities – non-collaborative actions
3. Misconceptions (in interest of vendors)
  1. The more, the better – NO
  2. A human can see, so the system will (one day) – NO
  3. “Baggage of the past”: using old tools for NEW problems
4. Real-time constraint – for “alarm” systems
5. Resolution
  1. Video image is small (720x480 NTSC)
  2. Objects occupy small part: <1/8 of image

More on resolutions & formats: video sources
• NTSC:
– Vert. res. (fixed): 487 active lines (interlaced) out of 525
– Horiz. res. (variable): 330 (TV), 210 (VHS) - Due to fuzziness!
– 60 half-frames / second
• VCD: 320x240 mpeg1 – for TV (VHS tape)
• DVD: 720x240 mpeg2 – most suited for digital recordings
• HDTV: 1920x1080, but… for humans it’s sound (DolbyDigital) not video that makes the difference!

But is resolution really a problem?
[Video: recorded from TV (320 x 240 video)]
Humans watch movies on TV without a problem…
Despite “bad” resolution + orientation, expression, occlusion
(Faces are 30x30 pixels!)
You don’t have a problem recognizing people & activities
Yet computers fail… - Is something wrong with computer approaches?
(Even on a studio-taken video with perfect FOV and lighting!)
Intelligent Surveillance:
problems & solutions

Two Big problems
1. Storage space consumption
• Typical assignment:
2-16 cameras, 7 or 30 days of recording, 2-10 Mb per min.
⇒ 1.5 GB per day per camera / 20 - 700 GB total !
2. Data management and retrieval
• London bombing video backtracking experience:
“Manual browsing of millions of hours of digitized video from thousands of cameras proved impossible within time-sensed period”
[by the Scotland Yard trying to back-track the suspects]
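
The storage estimate in item 1 above is plain multiplication. A minimal sketch, taking the slide's figure of 1.5 GB per day per camera as given; the function and parameter names are illustrative and not from the talk:

```python
def total_storage_gb(cameras: int, days: int, gb_per_camera_day: float = 1.5) -> float:
    """Archive size in GB for continuous recording on all cameras."""
    return cameras * days * gb_per_camera_day

# Smallest and largest setups quoted on the slide.
low = total_storage_gb(cameras=2, days=7)     # 2 * 7 * 1.5
high = total_storage_gb(cameras=16, days=30)  # 16 * 30 * 1.5
print(f"{low:.0f} GB to {high:.0f} GB")       # 21 GB to 720 GB
```

This reproduces the slide's rounded range of 20 - 700 GB.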
Main bottleneck

This is now affordable:
• “highest picture quality and resolution”,
• “complete Pan/Tilt control”,
• “powerful 44X Zoom”,
• “total remoteness”,
• “multi-channel support of up-to 32 cameras”,
• “extra fast capture of 240 fps”
but…
• you may just not have time to browse it all in order to detect the important information.

Intelligent Surveillance

Objective
To replace / assist human personnel
To make video data manageable (esp. for long-term monitoring)
To make surveillance affordable: time-wise, space-wise

“For video surveillance to be operational, it is critical to store only that video data which is useful, i.e. the data containing new evidence”.
1. Evidence = objects, events of interest
2. New = succinct and non-redundant.
Possible only with video recognition!
Misconception about “Motion-based” capture
• Term “Motion-based” is coined to make people believe that video recognition is happening, which is not!
• It’s actually illumination-change-based, as it uses simple pixel brightness comparison:
| Bij(t) – Bij(t-1) | > N for K pixels ⇒ “alarm”
– Which often happens not because of motion!
• Light changes
• Noise
– Especially: Outdoors & in long-term monitoring

Noise & changes in video (demo)
– Changing light / weather (esp. in 24/7 monitoring)
• Wind, precipitations
– Against sun/light, out of focus, blurred, thru glass
• Reflections, diffraction, optical interferences
– Image transmission, compression losses
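
The pixel-brightness comparison |Bij(t) – Bij(t-1)| > N for K pixels can be sketched in a few lines. This is only an illustration of the rule as stated on the slide, reading “for K pixels” as “at least K pixels”; the frame representation (nested lists of grayscale values) and the function name are assumptions. The example shows an alarm caused by a lighting change in a scene where nothing moves.

```python
from typing import List

Frame = List[List[int]]  # grayscale brightness per pixel, 0-255 (assumed layout)

def change_alarm(prev: Frame, curr: Frame, n: int, k: int) -> bool:
    """Alarm if at least k pixels changed brightness by more than n."""
    changed = sum(
        1
        for row_prev, row_curr in zip(prev, curr)
        for b_prev, b_curr in zip(row_prev, row_curr)
        if abs(b_curr - b_prev) > n
    )
    return changed >= k

# A static 4x4 scene in which nothing moves...
scene = [[100] * 4 for _ in range(4)]
# ...but the light drops and every pixel gets 30 levels darker.
darker = [[b - 30 for b in row] for row in scene]

print(change_alarm(scene, scene, n=20, k=8))   # False: identical frames
print(change_alarm(scene, darker, n=20, k=8))  # True: alarm without any motion
```

The rule has no notion of objects, so any sufficiently large brightness change triggers it, whatever the cause.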
Next-generation surveillance

Solution:
- Do as much as possible Video Recognition in real-time BEFORE saving video !
- Object-based surveillance (not change-based) !
Example: ACE Surveillance™ technology

ACE Surveillance™ (Annotated Critical Evidence)

Definition: Critical Evidence Snapshot (CES) - video snapshot that provides information that is both useful and new.
Definition: ACE Surveillance - surveillance that deals with extraction and manipulation of Annotated Critical Evidence.
- Based on recent advances in object detection / tracking.
- Replaces video clips with annotated JPG images
– Compresses 1 Gb of video into 2 Mb of easy to browse still images (can hold several years of evidence on a single computer).
– Shown annotations: size, velocity, colour of detected objects.
- Enables new Zoom-on-Evidence™ browsing
Object Detection and Tracking results
• Tested 24/7 in many outdoor and indoor setups
• On ordinary computer, in real-time with up to 8 cameras.
• Demo: 24 hours of monitoring outdoors (long-term, low traffic) ← video
• Demo: Hockey players tracking indoor (high-traffic, multiple fast-moving objects) ← video

Motion-based capture
• Many captured snapshots are useless: either noise or redundant
• Without visual annotation, motion information is lost.
• Hourly distribution of snapshots is not very useful
ACE Capture
• Each captured snapshot is useful.
• Object location and velocity shown using graphical annotation
• Hourly distribution of snapshots is indicative of what happened in each hour and provides a good summarization of activities over a long period of time.

ACE Surveillance
Applications / Limitations
Ready to
1. For existing CCTV systems
• Works with stationary cameras only
2. For security desks with a computer
• Up to 8 cameras on a single (3 GHz / 2 GB RAM) PC
NRC commissioner example:
– Installed: January 2007.
– Archive of more than 6 months of evidence data.
– 2 entrances (ADT-installed CCTV cams) + USB webcam at the desk
– Became an indispensable daily routine
Example: Monitoring in XX-th century
A dedicated officer has to look at the monitors at all times.
If he is away / looked elsewhere, an event may pass unnoticed.

Example: Monitoring in XXI-st century
• In real-time mode: watch closely if alarm sounds.
• If away from desk: Last captured CES shows whether anything happened. Then play-back all CES-es.
• In archival mode: “zoom on evidence” – zoom on a day, on hour, then on event - point and click (for high res as needed)
Zoom-on-the-evidence™ Browsing
[Figure panels: Back Door Entry, Delivery Entry; on week-day, on week-end]

Demo
• Monitoring NRC premises with ACE Surveillance
• Browsing data with Zoom-on-evidence ACE Browser
Future trends
• In software (video recognition algorithms):
– Better object detection & tracking
• For complex motions
• For moving cameras
– Better annotation: activity recognition
• In hardware:
– Smart PTZ cams: PTZ on objects
– Smart IP cams: send only when/what is needed
– Video + hi/res photocamera / other sensors
– Synchronized cameras
• In mentality / logistics:
– More inter-department VT initiatives
– More constrained/proper setups and tasks

Video-based Face Recognition:
problems & solutions
Why in video?
Video-based information is
- most available
- least intrusive
Video provides:
- identification at distance
+
Ready infrastructure
(CCTV already used everywhere for surveillance)

Biometric modalities summary
Hierarchy of affordability / applicability of different biometrics modalities
(from NATO Biometrics workshop, Ottawa, Oct. 2004)
[Diagram: from Public level / Unconstrained environment (CCTV →) down to Detainee’s level / Controlled environment (Passport →); labels: soft biometrics, bio-measurements, registered ids]
Current situation: computers fail, humans succeed
Face recognition systems performance
(from biometrics forums, NATO Biometrics workshop, Ottawa, Oct. 2004)
[Bar chart: performance (0-100), by humans and by computers, in photos and in video]
While humans easily recognize a person in video (with faces < 40 pixels!), computer performance on video is much worse than that on photos!

Intentional misconception?
Over last 5 years over $XX.XXX.XXX already spent on applying face recognition to video data…
And what?
Face Recognition Vendor Test (www.frvt.org) is still seen: “in making the video data of better quality”
Approaching NEW problem with OLD tools ?
Instead of developing approaches which can deal with low-resolution data
Important
Photographic facial data and video-acquired facial data are two different image-based modalities
– different nature of data
– different biometrics
– different approaches
– different testing benchmarks
Face recognition in video requires video-based framework

Photos vs Video
Photos:
- High spatial resolution
- No temporal knowledge
E.g. faces:
1. in controlled environment (similar to fingerprint registration)
2. “nicely” forced-positioned
3. 60 pixels IOD
Video:
- Low spatial resolution
- High temporal resolution (individual frames of poor quality)
E.g. faces:
1. in unconstrained environment (in a “hidden” camera setup)
2. don’t look into camera, don’t even face camera
3. 10-20 pixels IOD
(IOD = intra-ocular distance)
Yet, for humans, video (even of this “low” quality) is often even more informative than a photograph !
Canonical Face Model used in passports
Adopted by ICAO’02 for passport-type documents
(used in Canada, USA, EU)
• One picture per person
• IOD=60 (Width=120 pixels)
Used
- To store faces in databases
- In recognition algorithms
• But can it be used for video?
• Should it be used ?

Face Detection results
• Psychological study: people recognize faces starting from IOD > 10 pixels
• Good news (2002): computers can also detect faces
– with i.o.d >= 10 pixels
– in poor illumination,
– with different orientations: +/- 45°
– different facial expressions
– motion rejects with spuriously detected faces
Canonical face model suitable for video-based recognition
Proposed in 2004:
• IOD=12 pixel
• 24 x 24 size is sufficient
[Figure: face template with dimensions marked i.o.d, 2·IOD and 24]
• Is much easier to extract from video (by computers).
• Eyes can be automatically aligned.
• Many of these can be extracted for the same person, as s/he is being tracked.

Proper approaches
1. Work on low-res images (as long as i.o.d.>10)
2. Accumulate facial data over time
Good video-based recognition implies accumulation of data over time!
Anything based on a single frame won’t be good.
Types of multi-frame facial data fusion
• Super-resolution
• Neuro-biological (synaptic adaptation)
• 3D face models
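
To illustrate why accumulating data over time helps, here is a minimal sketch. It uses plain averaging of per-frame match scores, which is a deliberately simple stand-in and not any of the fusion types listed above. The score dictionaries, the identity labels and the function name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

def accumulate_scores(frame_scores: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Average per-identity match scores over all frames of one tracked face."""
    totals: Dict[str, float] = defaultdict(float)
    count = 0
    for scores in frame_scores:
        count += 1
        for name, score in scores.items():
            totals[name] += score
    if count == 0:
        return {}
    return {name: total / count for name, total in totals.items()}

# Hypothetical scores from some single-frame matcher (higher = better match).
track = [
    {"A": 0.40, "B": 0.45},  # frame 1 alone would pick B
    {"A": 0.70, "B": 0.30},
    {"A": 0.65, "B": 0.35},
]
fused = accumulate_scores(track)
best = max(fused, key=fused.get)
print(best)  # A
```

A decision taken from the first frame alone would be wrong here; the decision over the whole track is not.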
VRS technology
(person annotation in TV programs)
Problem: Recognize faces in 160x120 mpeg1 video
Approach: neural network based accumulation of low-res facial data
Demo: 98% real-time recognition of four people in a video clip.

Discussion
• IOD < 10, body/gait biometrics should be used
• IOD > 11, “some” face recognition from video can be performed:
– Accumulation over time techniques may, in some cases (many snapshots under good angles), enable “1 to many” ICAO identification.
– Sufficient for many “1 to few” recognition tasks.
Good applications:
- Monitoring of limited-access premises
- Multiple-camera person tracking and backtracking,
- Verification (e.g. with access card)
Bottleneck:
- Angle of view, quality of video (see Introduction)
- General “1 to many (>1000)” recognition is still unresolved
Future Trend:
- Forced face registration (as in check-ins / “hidden” eye-level cameras)
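
The IOD thresholds in the Discussion can be written as a small rule. A sketch under stated assumptions: eye positions are given as (x, y) pixel coordinates, the function names and return labels are illustrative, and the band between 10 and 11 pixels, which the slide leaves open, is returned as "undecided". The scale factor refers to the 12-pixel IOD of the video-based canonical face model.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
CANONICAL_IOD = 12  # pixels, from the video-based canonical face model

def intra_ocular_distance(left_eye: Point, right_eye: Point) -> float:
    """Distance in pixels between the two eye centres."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])

def suggested_modality(iod_pixels: float) -> str:
    """Apply the slide's thresholds: <10 body/gait, >11 face, otherwise open."""
    if iod_pixels < 10:
        return "body/gait"
    if iod_pixels > 11:
        return "face"
    return "undecided"  # assumption: the slide does not cover 10-11 pixels

d = intra_ocular_distance((50, 40), (74, 40))
print(d)                      # 24.0
print(suggested_modality(d))  # face
print(CANONICAL_IOD / d)      # 0.5, scale factor to reach the 12-pixel IOD
```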
