

Object Recognition

The Ventral Visual Processing Stream


- occipitotemporal → temporal regions
- essentially synonymous with the inferotemporal cortex
o three segments: posterior, central and anterior inferotemporal cortex
o receives information flowing from the primary visual cortex via extrastriate regions
such as V2, V3 and V4
o Cells in posterior regions fire to relatively simple stimuli, cells further along in the
ventral stream fire to more complex and specific stimuli
- Receptive fields become larger as we move further along the ventral stream.
o Useful for object recognition because it can allow an object to be identified regardless
of its size or where it is located in space
o Almost always include the foveal or central region of processing
- Cells in the ventral processing stream are often sensitive to colour
o important for figure ground separation
- Some evidence indicates a columnar structure in which nearby cells tend to respond best to
similar properties
- Clusters of cells within the ventral stream appear to code for particular visual categories (e.g.
faces or body parts)

Deficits in Visual Object Recognition


Visual Agnosia: syndrome that deprives a person of the ability to use information in a particular
sensory modality to recognize objects; it manifests in only one of the senses!
1. Apperceptive and Associative Agnosias
- Apperceptive:
o fundamental difficulty in forming a percept (= a mental impression of something
perceived by the senses)
o Rudimentary visual processing is intact, but person has lost the ability to connect this
basic visual information into an entity or a whole
o Little or no ability to discriminate between shapes or to copy/match them
o Caused by diffuse damage to the occipital lobe and the surrounding areas
- Associative:
o basic visual information can be integrated to form a meaningful perceptual whole, yet
that particular perceptual whole cannot be linked to stored knowledge
➔ Patient could copy a picture, but not draw the same picture from memory
o ability to perform easy perceptual grouping is intact, but there are subtle deficits in
more complex perception
o Caused by damage to the occipitotemporal regions of both hemispheres and
subadjacent white matter

2. Prosopagnosia
- = selective inability to recognize or differentiate among faces
- Tends to occur with damage to the ventral stream of the right hemisphere
(whereas visual agnosia for words tends to occur with damage to comparable regions of the
left hemisphere)

- Patients can determine that a face is a face (some aspects of high-level visual processing
seem to be intact), but have lost the ability to link a particular face to a particular person
- Compensation by relying on distinctive visual nonfacial information (hairstyle etc.)
- Either acquired through neurological injury or present as "developmental prosopagnosia"
- Developmental prosopagnosia: Anterior portions of the temporal lobe are not as highly
activated by images of faces
- In some cases prosopagnosic patients show evidence of some degree of face recognition even
though they do not have conscious access to that information:
o P300 ERP response differed for familiar versus unfamiliar faces in one patient with
acquired prosopagnosia
o Evidence for implicit face knowledge in patients with developmental prosopagnosia
o Interference effect between familiarity of faces and performance in some patients
➔ Some patients retain information about faces in memory, although it is not
available in a way that allows for the explicit naming or categorizing of faces

3. Category-Specific Deficits in Object Recognition


- Category-specific deficit: A patient can also have trouble identifying a certain category of
objects, even though the ability to recognize other categories of items in that same modality is
undisturbed.
- Often not true agnosic deficits, but rather deficits within the semantic memory system

Theoretical Issues in Visual Object Recognition

1. Sparse vs. Population Coding for Objects

What kind of code does the brain use to represent objects?


a) Sparse Coding?
= small but specific group of cells responds to the presence of a given object
extreme form: grandmother cell theory ("there is a particular cell in your ventral
processing stream for each object")
➔ Very unlikely (what if that particular cell dies? How is newly learned
information stored?)
b) Population Coding = pattern of activity across a large population of cells codes for
individual objects
a. Would be more resilient than grandmother cell system, because losing a
couple of cells would not alter any given representation very much
b. Would not allow us to integrate the visual information with other, stored
information because all cells are needed to code for the visual information
c) Answer probably lies somewhere in-between, researchers are still trying to understand
what point along the continuum best describes how information is represented
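The trade-off between (a) and (b) can be sketched with a toy simulation (all cell counts and numbers below are hypothetical, purely for illustration): losing a few random cells barely degrades a distributed population pattern, whereas a grandmother-cell code is all-or-none, failing completely if its single cell happens to die.

```python
import random

random.seed(0)

N_CELLS = 1000

# Hypothetical toy codes: one dedicated "grandmother cell" per object
# vs. a distributed pattern across 200 of 1000 cells.
grandmother_code = {"apple": {42}}
population_code = {"apple": set(random.sample(range(N_CELLS), 200))}

def overlap_after_lesion(code, obj, dead_cells):
    """Fraction of an object's representation that survives losing cells."""
    units = code[obj]
    surviving = units - dead_cells
    return len(surviving) / len(units)

# Lesion 5% of cells at random.
dead = set(random.sample(range(N_CELLS), 50))

# Population pattern degrades gracefully (roughly 95% survives on average);
# the grandmother code is either fully intact or completely gone.
print(overlap_after_lesion(population_code, "apple", dead))
print(overlap_after_lesion(grandmother_code, "apple", dead))
```

The point of the sketch is only the qualitative contrast: graceful degradation versus catastrophic, all-or-none loss.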

2. The Problem of Invariance in Recognition

- Form-cue invariance: the brain's categorization of an object is constant regardless of the form
of the cue that represents that object (e.g. painting, photo or logo of an apple)
- Perceptual Constancy: Objects can be recognized even if seen from different angles, at
different positions or sizes, and under different kinds of illumination
→ Our mental representation of objects seems to be fairly abstract and independent of the original
stimulus conditions

Adaptation method:
o Helpful in telling whether a particular brain region supports such abstract
representations of objects
o Participants become adapted to an item after looking at it for some time.
→ presentation of same object: brain activity remains at a low level
→ presentation of new object: brain activity increases
o If a brain region continues to show adaptation to a new depiction of the same type of
object, the brain region is treating that new depiction just like the old version to which
it was already adapted, and is therefore showing evidence of form-cue invariance
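The inference behind the adaptation method can be sketched as a toy decision rule (the response levels here are made-up numbers, not real BOLD data):

```python
# Hypothetical response levels: a region responds weakly to repeats of what it
# treats as "the same stimulus" and rebounds for what it treats as novel.
ADAPTED = 0.3   # assumed low response after repeated exposure
REBOUND = 1.0   # assumed response to a genuinely novel stimulus

def infer_invariance(response_to_new_depiction,
                     adapted_level=ADAPTED, rebound=REBOUND):
    """If the response to a new depiction of the same object stays near the
    adapted level, the region treats both depictions alike (form-cue
    invariance); if it rebounds, the region codes them as different."""
    midpoint = (adapted_level + rebound) / 2
    return "invariant" if response_to_new_depiction < midpoint else "not invariant"

# E.g. a region still responding at ~0.35 to a photo after adapting to a line
# drawing of the same object would count as form-cue invariant:
print(infer_invariance(0.35))  # invariant
print(infer_invariance(0.95))  # not invariant
```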

Lateral occipital complex (LOC)


o Located at the section of the ventral stream anterior to the retinotopically mapped
extrastriate areas such as V2 and V4
o More responsive to shapes than to textures
o Perceptual constancy across variations in size and location of the shape
o Exhibits a similar response to both line drawings and photographs of the same object,
indicating form-cue invariance
o Not selective for particular categories of objects
→ The LOC represents a stage in visual processing in which retinotopic representations are
transformed into relatively abstract shape representations that support recognition across
variation in size, precise form, and location

Position invariance:
- ability to recognize an object regardless of where it appears
- arises from ventral stream cells that have position preferences

Viewpoint invariance:
- Debate: do neural representations depend on the viewpoint from which the object is
seen? How does the brain take the two-dimensional information from the retina and
create a three dimensional representation so that it can be recognized from any
viewpoint?
Explanation 1 (David Marr):
The brain creates a viewpoint-independent three-dimensional representation of an object that is
built up from two-dimensional information:
1. Primal sketch: segments dark from light regions, groups them together via gestalt principles
2. From the primal sketch, the visual system deduces the relative depth of different surfaces
and edges, constructs a representation of what parts of an object are in front or behind
3. Finally, the system develops a full three-dimensional, viewpoint-independent representation

Explanation 2:
- Recognition of objects depends on some kind of systematic integration or interpolation of
viewer-centered representations
- The system makes a guess about what an object might be, compares that to stored
representations of objects, measures the difference and generates a different hypothesis if
the match is too poor

- Evidence supports some degree of viewpoint dependency in recognition: the brain uses
both viewpoint-independent and viewpoint-dependent codes in different subsets of cells
- There is a progression along the ventral stream from viewpoint dependence at earlier
stages to viewpoint invariance at later stages

3. Feature-Based Versus Configural Coding of Objects


Integrating parts of objects into whole objects is an important function of the visual system and
requires additional processing beyond that of individual parts (remember apperceptive agnosia!)
a. Do we tend to rely more on individual features or on the way those features are
put together when we attempt to identify an object?
Features and configural information matter differently to the two hemispheres:
o Lesions to the temporal lobe of the left hemisphere:
disrupt the ability to perceive local features, but not global (holistic) features of
the item
o Lesions to the temporal lobe of the right hemisphere:
opposite effect
→ Two distinct neural mechanisms:
▪ one lateralized to the left ventral stream (important for analyzing the
parts of objects)
▪ one lateralized to the right ventral stream (important for analyzing
whole forms)
Configural information is more important for object categories for which we have a
lot of expertise:
Inversion effect: the typical disruption in performance/configural processing when
stimuli are turned upside down
Especially notable for…

• face recognition
• object categories the participants have expertise in
• same-race faces

b. How are individual features combined into whole shapes?


Conjunctive encoding: assumes that features are explicitly conjoined through hierarchical
processing in which lower-level regions representing features send their output to
higher-level regions representing the shapes that result from the joining of those features.

Nonlocal binding: assumes that the whole object is represented simply by the co-activation
of units that represent the parts of the object in particular locations. The whole is
perceived when the units representing all the constituent parts are simultaneously activated.

➔ Still debated, evidence tends to support the conjunctive coding model
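The contrast between the two binding schemes can be sketched in a few lines (the feature names and the "cup" conjunction unit are invented for illustration, not taken from any real model):

```python
# Lower-level feature units, coded as (feature, location) pairs.
features_present = {("curve", "top"), ("line", "bottom")}

# Conjunctive encoding: a dedicated higher-level unit fires only when its
# specific combination of lower-level features is active.
CONJUNCTION_UNITS = {"cup": {("curve", "top"), ("line", "bottom")}}

def conjunctive_recognize(active_features):
    """Return the objects whose conjunction unit's required features
    are all contained in the active feature set."""
    return [obj for obj, needed in CONJUNCTION_UNITS.items()
            if needed <= active_features]

# Nonlocal binding: no dedicated whole-object unit; the "whole" simply IS
# the simultaneous co-activation of the part units themselves.
def nonlocal_represent(active_features):
    return frozenset(active_features)  # the pattern itself is the representation

print(conjunctive_recognize(features_present))  # ['cup']
print(nonlocal_represent(features_present))
```

The design difference the sketch highlights: conjunctive encoding needs an explicit higher-level unit per shape, while nonlocal binding needs none but leaves the whole implicit in the co-active parts.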



4. Category Specificity: Are there specific neural modules within the ventral stream
specialized for recognizing specific categories of objects?

Evidence exists for at least some specialized modules:


• Fusiform Face Area (FFA)
• Parahippocampal Place Area (PPA)
• Extrastriate Body Area (EBA)
• Visual Word Form Area (VWFA)

1. Does the brain treat faces differently than other objects?


o Researchers found face-selective clusters of cells & also cells that are selective to only
components/aspects of faces (e.g. eyes) in the right inferotemporal cortex of primates
o Double dissociations in humans with prosopagnosia vs. agnosia for objects confirm that
face recognition relies on a separate neural system from other object recognition
o Brain Imaging Studies also support this idea:
- Recordings of electrical potentials indicate that the brain responds
differently to an image that is perceived as a face versus some other object
- PET and fMRI studies: Ventral regions of the extrastriate cortex (later
referred to as the FFA) in the right hemisphere are critical for discriminating
among the physical features of faces
- Additional evidence indicates that recognition of faces does not rely only on the
FFA, but on a variety of regions within the ventral stream,
▪ e.g. also the occipital face area (OFA)
▪ Anatomical imaging studies:
strong anatomical connection between FFA and OFA
▪ Superior temporal sulcus (STS):
most important for processing the changeable aspects of the face (eye
gaze, expression, lip movement,…), sensitive to facial gestures,
important role in interpreting movements that have social significance
➔ Several areas within occipital and ventral temporal regions are involved in face
processing, with the right hemisphere playing a predominant role
- Posterior regions: create configural representation of a face and extract the
invariants of the face that make it unique

2. Why are faces special? Are they really special?


- Hypothesis: The FFA might code for configural processing and we are just highly
experienced with faces:
o Expertise with a particular object category leads to increased activity in
the FFA
o Activation in the FFA increases as people become better trained to
recognize novel objects
- Critique: The objects used in these studies (cars, birds, …) are face-like, as they
have a configural structure!
- Answer: The FFA is innately specialized for processing faces, but is also relied upon
for other stimulus categories as they become more and more face-like.

3. Bodies, Places and Words are special too


a. Bodies:
- Based on configural information too (just like faces, supported by inversion
studies)
- Neural responses to bodies in regions of the temporal lobe

- Multiple areas in the visual cortex are responsive to images of bodies:


o Extrastriate body area (EBA) in the occipitotemporal cortex
o Fusiform body area (FBA) located in the fusiform gyrus
(right next to the FFA → together they can represent a whole person)
b. Places:
- Parahippocampal place area (PPA) located in a ventral cortical region
bordering the hippocampus: responds strongly to visual scenes such as
landscapes, rooms, houses, streets
c. Visually presented words:
- Visual word form area (VWFA) in a ventral stream region in the left
hemisphere
- The VWFA must gain its specialization through experience (whereas the areas
mentioned above are innately programmed)
The ventral stream has functional units that represent critical kinds of objects, but some
researchers propose that the processing of visual objects occurs in a more distributed manner
across the entire ventral visual processing stream: specific object categories might each be
associated with a specific pattern of activation across this wide expanse of neural tissue

Object Recognition in Tactile and Auditory Modalities


There are similarities in brain organization across modalities (touch, sound, vision):
- E.g. early cortical areas always code basic features, whereas higher-level areas are critical for
organizing sensations into representations of recognizable objects
- Lesions in higher-level regions → deficits in object recognition

1. Agnosias in Other Modalities

- Auditory Agnosia:
normal processing of auditory information but an inability to link that sensory information to
meaning
o Verbal auditory agnosia (or pure-word-deafness): words cannot be understood,
although the ability to attach meaning to nonverbal sounds is intact
o Nonverbal auditory agnosia: ability to attach meaning to words is intact, but the
ability to do so for nonverbal sounds is disrupted (e.g. for a car horn, dog bark,.. )
o Mixed auditory agnosia: ability to attach meaning to both verbal and nonverbal
sounds is affected. Ability to hear the sounds is intact, they are not deaf!

- Somatosensory Agnosia:
Person is unable to recognize an item by touch but can recognize the object in other modalities
o One type: inability to use tactile information to create a percept
o Second type (sometimes called tactile asymbolia): percept is more or less intact but
cannot be associated with meaning

2. Tactile Object Recognition


Relevant brain regions:
- Secondary somatosensory regions
- Insula
- LOC (codes object properties that are shared between vision and touch but not audition)

3. Auditory Object Recognition



Higher levels of the auditory cortex are crucial in auditory pattern recognition (e.g. water
dripping, egg cracking,…)
- Animal vocalization: superior temporal gyrus (bilaterally)
- Tool sounds: numerous areas in the left hemisphere, including motor areas
- Voices: superior temporal lobe
- Voices of familiar people also activate the FFA more than the voices of unfamiliar people

4. What Versus Where Across Modalities


The distinction between "what" and "where" pathways seems to be a basic organizational
feature that transcends any specific modality
