How The Brain Interprets Complex Visual Scenes Is An Enduring Mystery For Researchers

How the brain interprets complex visual scenes is an enduring mystery for researchers. This
process occurs extremely rapidly - the "meaning" of a scene is interpreted within 1/20th of a
second, and, even though the information processed by the brain may be incomplete, the
interpretation is usually correct.
Occasionally, however, visual stimuli are open to interpretation. This is the case with
ambiguous figures - images which can be interpreted in more than one way. When an
ambiguous image is viewed, a single image impinges upon the retina, but higher order
processing in the visual cortex leads to a number of different interpretations of that image.
Only one of these interpretations is available to our conscious awareness at any one time.
Repeated viewing of the image leads to perceptual reversal, whereby first one, and then the
other, interpretation is perceived. For psychologists and neuroscientists, ambiguous
figures provide a means by which the functioning of the human visual system can be
investigated.
Salvador Dali's 1940 painting Slave Market with the Disappearing Bust of Voltaire (top) is
an example of an ambiguous figure. In this painting, the two nuns just left of centre can also
be perceived as the bust of the French writer and philosopher Voltaire. When looking at the
painting, our perception of the painting switches from one interpretation to the other.
In a study published in 2002, Lizann Bonnar, then at the University of Glasgow, and her
colleagues investigated the stimuli which drive perception of the visual scene depicted in
Dali's painting. Participants were presented with a cropped greyscale version of the painting,
consisting solely of the area containing the nuns. A "bubble" filter was used to enhance or
obscure certain features of that part of the painting. The researchers found that the
participants reported seeing the bust of Voltaire when the finer details of the painting were
obscured, and reported seeing the nuns when large-scale features were obscured.
This experiment showed the importance of scale information in perception. The researchers
specifically manipulated the spatial frequency content of the painting (that is, the periodicity
with which image intensity changes). Large-scale features change little over a given distance,
and therefore have a low spatial frequency, while fine-grained features change much more
over the same distance, and so have a high spatial frequency.
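The relationship between scale and rate of change can be illustrated with a small sketch. It is not taken from the study: it assumes a single row of 256 pixels and two made-up sinusoidal luminance patterns, and the helper names `grating` and `mean_change_per_pixel` are hypothetical. It simply measures how much brightness changes, on average, from one pixel to the next.

```python
import math

def grating(cycles, n_pixels=256):
    """Luminance profile in [0, 1] with `cycles` full periods across one row of pixels."""
    return [0.5 + 0.5 * math.sin(2 * math.pi * cycles * x / n_pixels)
            for x in range(n_pixels)]

def mean_change_per_pixel(row):
    """Average absolute luminance change between neighbouring pixels."""
    return sum(abs(b - a) for a, b in zip(row, row[1:])) / (len(row) - 1)

# Assumed example patterns: 2 cycles stands in for a large-scale feature,
# 32 cycles for a fine-grained one.
coarse = grating(cycles=2)
fine = grating(cycles=32)

coarse_change = mean_change_per_pixel(coarse)
fine_change = mean_change_per_pixel(fine)

print("coarse pattern, mean change per pixel:", coarse_change)
print("fine pattern, mean change per pixel:  ", fine_change)
```

Over the same distance, the fine pattern changes far more than the coarse one, which is the sense in which fine-grained features are "high" and large-scale features are "low" on this measure.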
In a second experiment, the participants were shown random noise patterns before the
cropped greyscale painting. One group was shown a pattern with a high spatial frequency,
the other a pattern with a low spatial frequency. Afterwards, the former reported seeing the
bust of Voltaire, while the latter reported seeing the nuns. This showed that previous
experience is an important factor in perception. Prior exposure to one frequency channel
biased the participants toward the interpretation carried by the other: those who had viewed
fine-grained noise saw the coarse-scale bust, and those who had viewed coarse noise saw the
fine-scale nuns.
Aude Oliva, head of the Computational Visual Cognition Laboratory at the Massachusetts
Institute of Technology, has been using a similar approach to gain a better understanding of
the processing of information in the visual cortex. For more than 10 years, Oliva and her
colleagues have been creating and using hybrid images that consist of two superimposed
images, both of which have been altered with specialized filtering software.
Using these filters, sharp facial features, such as wrinkles and other blemishes, are removed
from one image, and coarse features, such as the shape of the mouth or nose, are removed
from the other. The two images are then superimposed; because features with a high spatial
frequency are visible only from up close, and those with low spatial frequencies are only
visible from further away, superimposition of the two produces a single image whose
perception changes as a function of viewing distance.
Thus, the hybrid is a single image with two stable percepts; at a given distance, only one of
the images is visible, and it is this image that dominates processing in the visual system; the
other image is perceived as something lacking internal organization (noise).
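The construction can be sketched in a few lines. This is an illustration under stated assumptions, not the filtering software used by Oliva's group: a single row of pixels stands in for each image, a simple box blur stands in for the low-pass filter, the two input signals are made up, and all function names are hypothetical. Viewing from a distance is approximated by blurring the hybrid again.

```python
import math

def box_blur(signal, radius):
    """Low-pass filter: replace each sample with the mean of its neighbourhood."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def high_pass(signal, radius):
    """Keep only fine detail: the signal minus its blurred version."""
    return [s - b for s, b in zip(signal, box_blur(signal, radius))]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

N = 256
RADIUS = 8  # assumed blur radius; a larger value removes more fine detail

# Two made-up "images" (one row each), both with coarse and fine structure.
image_a = [math.sin(2 * math.pi * 2 * x / N) + 0.5 * math.sin(2 * math.pi * 40 * x / N)
           for x in range(N)]
image_b = [math.cos(2 * math.pi * 3 * x / N) + 0.5 * math.cos(2 * math.pi * 50 * x / N)
           for x in range(N)]

# Hybrid: coarse content of image_a plus fine content of image_b.
hybrid = [lo + hi for lo, hi in zip(box_blur(image_a, RADIUS), high_pass(image_b, RADIUS))]

far_view = box_blur(hybrid, RADIUS)      # distance washes out fine detail
near_detail = high_pass(hybrid, RADIUS)  # fine detail available up close

print("far view vs image_a:   ", correlation(far_view, image_a))
print("far view vs image_b:   ", correlation(far_view, image_b))
print("near detail vs image_a:", correlation(near_detail, image_a))
print("near detail vs image_b:", correlation(near_detail, image_b))
```

In this toy setting the blurred hybrid resembles `image_a` and its fine detail resembles `image_b`, mirroring the way a hybrid image yields one percept from afar and another from up close.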
Above is an example of the hybrid images created by Oliva's group. From up close, the image
is perceived as Albert Einstein, because only the sharp features are visible; but if you step a
few metres away from the monitor, the blurred features become visible, and the image of
Marilyn Monroe emerges.
Oliva's group has been using this and similar images to investigate the role of different
frequency channels in image recognition, and the time course over which this process
occurs. What they have found is that when participants were shown hybrid images for
durations of 30 milliseconds, they recognized only the low spatial frequency component of
the image; when the images were displayed for 150 milliseconds, they recognized only the
high spatial frequency component. In both cases, the participants were oblivious to the other
interpretation of the image.
Participants were also shown hybrid images in which a sad face (high spatial frequency) was
superimposed on an angry face (low spatial frequency), the two faces being of different
sexes. When the images were displayed for 50 milliseconds, and the participants were asked
to determine the emotion of the face they had seen, they always reported seeing an angry
face; but when asked to determine the sex of the person in the image, they reported seeing a
male as often as they reported seeing a female, although the two faces had different spatial
frequencies.
Thus, selection of frequency bands during fast image recognition appears to be flexible - in
some cases, the brain picks out characteristics with a low spatial frequency, while in others, it
discriminates those with a high spatial frequency. It seems that the brain is adept at selecting
the frequency band containing the most information relevant to a particular task. Again, the
participants were unaware that the images they viewed contained information in the other
frequency range.
The work carried out by Oliva's group shows that the brain extracts large-scale
features slightly earlier than fine-grained features. Large scale features are processed within
50 milliseconds, giving an overall impression of the visual scene. The processing of fine-
grained details begins slightly later, at around 100 milliseconds. The fine- and coarse-grained
features are extracted separately, and processed in parallel through different channels, in
successively higher order areas of the visual cortex. In a process called perceptual grouping,
the information from the channels is then seamlessly recombined at visual cortical areas of
the highest order to produce a coherent, and usually unambiguous, image.
Count the face
Nature Sadness Illusion
Illusion of man’s face with depressed looks.
Crying Illusion
Optical illusion of a man’s face in rocks with crying looks
Count The Number Of Blocks
Illusionary picture of number of blocks shown
Have you ever experienced that moment, when the drawing of the old witch transforms into a
young lady? Or stared at a Magic Eye picture of coloured dots to suddenly see a butterfly emerge?
Or that moment in the film The Matrix when Neo sees the data that makes the world in zeros and
ones?
