Saccade-Confounded Image Statistics Explain Visual Crowding

a r t ic l e s
Saccade-confounded image statistics explain

visual crowding
Anirvan S Nandy1,3 & Bosco S Tjan1,2
Processing of shape information in human peripheral visual fields is impeded beyond what can be expected by poor spatial
resolution. Visual crowding, the inability to identify objects in clutter, has been shown to be the primary factor limiting shape
perception in peripheral vision. Despite the well-documented effects of crowding, its underlying causes remain poorly understood.
Given that spatial attention both facilitates learning of image statistics and directs saccadic eye movements, we propose that
the acquisition of image statistics in peripheral visual fields is confounded by eye-movement artifacts. Specifically, the image
statistics acquired under a peripherally deployed spotlight of attention are systematically biased by saccade-induced image
© 2012 Nature America, Inc. All rights reserved.
displacements. These erroneously represented image statistics lead to inappropriate contextual interactions in the periphery and
cause crowding.
In humans, the central 2° of the visual field is extensively represented on the target in that an outward flanker, that is, the one more eccentric
in both the retina (with the highest cone density in the fovea) and than the target, has a greater crowding effect than an equally spaced
the primary visual cortex (V1), and is therefore well suited for tasks inward (less eccentric) flanker8. We refer to this as inward-outward
that require resolving fine visual details. Saccadic eye movements asymmetry. Third, the crowding zone is not circular, but is instead
allow the visual system to bring objects of interest to the central visual markedly elongated along the radial axis so that radially positioned
field (that is, to foveate). The rest of visual space falls on the peri flankers produce more interference than laterally (that is, tangentially)
pheral retina. Shape perception, or form vision, suffers in the visual positioned flankers9. We refer to this as radial-tangential anisotropy.
periphery. Among the various deficits in peripheral form vision, per Any viable model of crowding must reproduce these well-defined
haps the most disruptive ones are those attributable to visual crowd properties of the crowding zone.
ing. Crowding is the inability to recognize target objects in clutter Previous studies have offered explanations for some, but not all, of
(Fig. 1a). In the periphery, surrounding objects (flankers) that are these characteristics. Bouma’s law on scaling has been addressed in
within a critical distance of the target impair target identification. terms of ‘combining fields’ that are implemented by a fixed number
This deficit cannot be explained by the lower spatial resolution in of cortical neurons irrespective of eccentricity10. The inward-outward
the periphery. In normally sighted individuals, crowding is less con asymmetry has been explained in terms of asymmetric cortical dis
sequential because it is compensated by foveating saccades. However, tances of near and far flankers that are otherwise equidistant from
npg
crowding is detrimental for individuals who do not have a functioning the target in visual space11. It has also been speculated that ‘ecologi
fovea, as they must rely on their peripheral visual fields for everyday cal factors’, including optic flow and saccadic eye movements, might
tasks, such as reading and object recognition. underlie the inward-outward asymmetry12. Currently, there is no
There has been persistent, but unresolved, debate about the neural satisfactory explanation for radial-tangential anisotropy. Past studies
underpinnings of crowding (see ref. 1 for a review) since Bouma for have chosen to incorporate anisotropy as an assumption in their
mally described crowding four decades ago2. Many theories invoke models13. Moreover, no existing model of crowding can simultane
some form of pre-attentive processing in the early stages of visual ously account for all three spatial characteristics of the crowding zone
processing, such as inappropriate feature integration 3–5 and posi much less explain the possible neural underpinnings of crowding.
tional averaging6, as the underlying cause of crowding. Others claim We propose such a unified model, with all but one of the parameters
a lack of spatial resolution in the attentional mechanism itself as the constrained by anatomical and behavioral data from studies unrelated
primary cause7. to crowding, and provide testable predictions of the model, some with
The crowding zone, the spatial extent over which flankers affect pertinent clinical implications.
target identification, exhibits several robust characteristics (Fig. 1b). Statistical regularities of the visual environment are thought to be
First, the size of the crowding zone scales linearly with eccentricity. important for shaping the connectivity and the response properties
Along the radial axis (the line connecting the fovea to the target), the of the visual cortex14. Receptive field properties of neurons in V1
crowding zone extends roughly to half the target eccentricity2. This is can be derived from the statistics of natural images15,16. The response
referred to as Bouma’s law. Second, flankers have an asymmetric effect of a V1 neuron further depends on the context surrounding the
1Department of Psychology, University of Southern California, Los Angeles, California, USA. 2Neuroscience Graduate Program, University of Southern California,
Los Angeles, California, USA. 3Present address: Systems and Computational Neurobiology Laboratories, The Salk Institute for Biological Studies, La Jolla, California,
USA. Correspondence should be addressed to A.S.N. (nandy@salk.edu) or B.S.T. (btjan@usc.edu).
Received 20 September 2011; accepted 17 November 2011; published online 8 January 2012; doi:10.1038/nn.3021
nature NEUROSCIENCE VOLUME 15 | NUMBER 3 | MARCH 2012 463

a r t ic l e s
Figure 1 Characteristics of crowding in peripheral vision. a

(a) Demonstration of crowding: when fixating on the red ‘–’ it should be
s +asn
easy to identify the letter s on the left; the equidistant s on the right,
which is flanked (crowded) by other letters, is much harder to identify.
When fixation is shifted to the green ‘+’, the formerly crowded s becomes b 10° 5° 2.5°
easier to identify. Image is adapted from ref. 10. (b) The extent of
crowding (crowding zone, orange polygons) can be estimated by measuring
a
ve
Fo
target-identification performance at peripheral locations (demarcated
by *) with flankers placed at various relative positions around the target.
The estimated zones have three robust signatures: they scale up linearly
with eccentricity of the target (Bouma’s law), they are markedly elongated
along the axis connecting the target to the fovea (radial axis), and when
tested with a single flanker (green dashed contour), as opposed to a pair
(orange solid contours), flankers that are more eccentric than the target
in
d
are more effective in crowding the target than are flankers that are less
eccentric. Image is adapted from ref. 9.
Radial
t
ou
d
neuron’s classical receptive field17. Such contextual interactions are
Tangential
mediated in part by anatomical connections extending laterally across
multiple cortical columns18. The patterns of these lateral connec
tions suggest that orientation statistics of natural images have shaped
their formation19.
We argue that the acquisition of the orientation statistics of
natural images in peripheral vision is confounded by eye movements.

Specially, we propose that the same spatial attentional mechanism Another important role of spatial attention is to drive saccadic eye
that directs gaze and helps to acquire relevant image statistics in movements. Covert shifts of spatial attention to salient objects in the
central vision causes an acquisition of misrepresented image periphery are typically followed by a saccadic eye movement that
statistics in peripheral vision. Given the fundamental impor brings the fovea to the peripheral target23 (Fig. 2a). If we assume
tance of orientation statistics in form vision20, these erroneously that the onset of the saccade happens before the retraction of the
represented image statistics would, in turn, lead to contextual inter attentional spotlight, a critical difference between central and periph
actions in the periphery that are inappropriate for form vision and eral vision emerges: the window of temporal overlap between spatial
cause crowding. attention and saccade-produced image displacement is present in the
Attending a spatial location promotes neural responses and periphery, but not in the fovea (Fig. 2b).
enhances contextual effects at the attended location21. Spatial atten Thus, the learning of orientation statistics at any particular periph
tion also mediates learning in the visual cortex22. We assume that eral cortical location in V1 will essentially be confounded by the
spatial attention is important for promoting the learning of image saccade-produced image displacement, which is not part of the natural
statistics at the attended location and facilitating the formation of scene. This should cause an overestimation of repeated patterns along
cortical connectivity that conforms to these statistics. By image statis the direction that connects the saccade target in the periphery to the
tics, we specifically refer to the pair-wise statistics of oriented edges, fovea (radial direction). We refer to such misrepresented statistics
which we assume are encoded in terms of the functional weights of as saccade-confounded image statistics. These saccade-confounded
lateral connections. In other words, if two nearby neurons tend to be statistics, if represented in lateral connections, would lead to inap
npg
correlated in their stimulus-evoked activity, then, under the spotlight propriate and radially biased contextual interactions in the periphery.
of attention, they are more likely to form long-lasting lateral connec The inappropriate contextual interactions would lead to crowding
tions that encode this correlation. and form the basis of an elongated crowding zone, with the long axis
pointing toward the fovea.
a
Fixation RESULTS
Here we present a quantitative model that implements our theory
on the anisotropic processing of image statistics in peripheral vision.
Attention t0 t1 t2+ t3
b t0 t1 t2 t3 Figure 2 The interaction of spatial attention and saccades. (a) Typical

sequence of fixation, covert deployment of attention to a salient object
in the periphery and subsequent saccade to the attended spot. All
Temporal modulation of attention
Fovea illustrations are in retinal coordinates. The arrow (third panel) shows the
direction of image displacement during the saccade. (b) Schematic of
temporal modulation of spatial attention at the fovea and at the peripheral
retinal location where covert attention was deployed. Because the saccade
is elicited by covert attention, temporal overlap between attention and eye
Periphery
movement is most likely to occur when attention is at the covertly attended
location in the periphery. We assume that at this peripheral location,
Time
during the time interval from the start of the execution of the saccade (t2)
Eye movement Image affected by till the next fixation (t3) (red box), attention and eye movement overlap
Saccade planning eye movement with a nonzero probability. The image displacement resulting from the
Deploying attention saccade confounds the image statistics acquired during the overlap.
464 VOLUME 15 | NUMBER 3 | MARCH 2012 nature NEUROSCIENCE

a r t ic l e s
Figure 3 Spatial consequences of isotropic lateral interaction zone in V1. a Lateral interaction zone b 9
10
(a) A simple geometry of V1 is assumed with cortical hypercolumns Reference 8
V1 7
arranged in a hexagonal mosaic. The receptive fields of the computational hyper-column 6
5
elements in the hypercolumns scale up linearly with eccentricity. 3
4
Each hypercolumn is assumed to have lateral (long-range horizontal) 6 2
5 1
connections in an isotropic neighborhood of hypercolumns on the cortex Receptive fields 4 Fovea
3
(lateral interaction zone). The radius of the neighborhood in cortical (visual space) 2
1
distance or, equivalently, the number of hypercolumns is independent Fovea
of eccentricity. (b) The extent of the lateral interaction zone is projected
to visual space for three reference hypercolumns at eccentricities 2°,
d = 0.1 × eccentricity + d0
4° and 6°. The radius of the zones is six hypercolumns as suggested by
several studies (Online Methods). (c) Half the end-to-end distances of c d
the interaction zones along the radial axis (the line joining the receptive 3.0
4
field center of the hypercolumn to the fovea) are plotted against the Bouma’s law
2.5
eccentricity of the corresponding reference hypercolumn. The dashed
0.5 × Dradial
line is the prediction of Bouma’s law2 (Fig. 1b). (d) The radial distance 2.0
from the receptive field center of a reference hypercolumn to the outer 1.5
extremity (dout) and to the inner extremity (din) of the interaction zone is
1.0
plotted against the eccentricity of the reference hypercolumn. That dout
is always greater than din (for nonzero eccentricities) explains the inward- 0.5
outward asymmetry. 0
0 1 2 3 4 5 6
6
Eccentricity (deg)
d 3 dout
Our model is based on three crucial and specific assumptions, the first
din
two of which have been well established. The first assumption is that 2
Distance (deg)
dout
the acquisition of image statistics occurs primarily at attended spatial 1
1 2 3 4 5 6
locations21,22. It is worth noting that the feedback connections from 0
Eccentricity (deg)
din
the secondary visual cortex (V2) to V1, which may mediate top-down 1 din dout
attention, have roughly the same anatomical spread (~6 mm in radius, 2 Dradial
independent of eccentricity) in V1 as do the lateral connections in
V1 (ref. 18). Thus, our second assumption is that the physiological about six hypercolumns (Online Methods), which is consistent with
footprint of spatial attention, be it defined by the spatial extent of the measured extent of lateral connections in V1 (ref. 18).
the lateral or the feedback connections, is constant in size (6 mm in Furthermore, when we split the end-to-end radial extent into two
radius) in V1 and is independent of eccentricity. Finally, we assume parts, the distance from the receptive field center of the reference
that spatial attention and any subsequent eye movement that it elicits hypercolumn to the outer and to the inner extremity of the radial
overlap in time; that is, the eyes move before the spotlight of attention extent, the asymmetry becomes readily apparent (Fig. 3d). If we
that elicited the eye movement is fully retracted. consider distances from the receptive field of the reference hyper
The underlying cortical architecture of our model consists of a column, the receptive fields at the outer extremity are farther away
mosaic of cortical ‘hypercolumns’ (Fig. 3a and Online Methods) in visual space than the receptive fields at the inner extremity, but
that are laterally connected. Each model hypercolumn consists of a the corresponding hypercolumns are equidistant from the reference
set of filters that extract orientation information from a local region hypercolumn on the cortex11.
of visual space (receptive field) in a fashion that is analogous to that As has been proposed previously10,11, we verified that the simple
npg
of orientation tuned neurons in a V1 cortical column. The recep assumption of constant-sized lateral interaction zones in V1 explains
tive fields of the hypercolumns tile visual space and their sizes scale both the properties of scaling (Bouma’s law) and the inward-outward
linearly with eccentricity24. We refer to the set of hypercolumns with asymmetry of the crowding zone. However, to explain the radial-
which a reference hypercolumn has lateral connections as the lateral tangential anisotropy of the crowding zone and to account for crowd
interaction zone (Fig. 3a). ing, we need to know the strength of the lateral interactions.
Geometry of lateral interactions Saccade-confounded image statistics

Our model assumes that the physiological footprint of the spotlight Our basic premise is that spatial attention serves a dual role of driving
of spatial attention and the lateral interaction zone of a reference saccadic eye movements and facilitating the learning of orientation
hypercolumn are approximately the same in V1. They are isotropic on statistics of the visual world. We further assume a temporal over
the cortex and independent of eccentricity. Assuming that the lateral lap between the deployment of spatial attention and the subsequent
connections in an interaction zone are modified under the spotlight saccade that it drives. To examine the nature of image statistics
of spatial attention, a geometric analysis of the footprint of spatial seen at a peripheral location under such conditions, we performed
attention or, equivalently, the lateral interaction zone, should reveal simulations of saccadic eye movements. The simulated system makes
the maximum spatial extent of crowding. saccades to different attended locations in the periphery.
The spatial extent of lateral interaction zones of constant cortical In the context of a visual scene, we measured pair-wise joint spik
size scales up with eccentricity (Fig. 3b). To quantify this result, we ing statistics (mutual information; Online Methods, equation (11))
calculated the end-to-end extent of the receptive fields along the radial between each of the oriented filters in a reference hypercolumn
axis (Fig. 3c). The coincidence with Bouma’s law is simply a result of and each of the oriented filters in neighboring hypercolumns in the
the linear scaling of the receptive fields with eccentricity and the corti lateral interaction zone (Fig. 4a). Such pair-wise statistics may deter
cal size of the interaction zone being independent of eccentricity. The mine the strength of lateral interactions between V1 neurons. In our
radius of the interaction zone that is required to match Bouma’s law is simulations, spatial attention (Fig. 2b) enables the learning of these

a r t ic l e s
Figure 4 Pair-wise image statistics. (a) Each

model hypercolumn (blue circles) consists a b Veridical statistics
d Saccade-confounded statistics
Tangential
of a set of eight oriented filters that extract
4
orientation information from a local patch of a
3
visual scene. The highlighted red circle shows
2
a reference hypercolumn whose receptive
1
field is centered at 2° in the peripheral field.
Fovea
The ensemble of circles depicts the extent of Radial
the lateral interaction zone of this reference.
Pair-wise mutual information (Online Methods,
equation (12)) was calculated between an
oriented filter in the reference hypercolumn Reference
hypercolumn
and each of the neighboring oriented filters
in the interaction zone. Two such filters in
c e
the reference hypercolumn were selected for
illustration: one of the filters (green) is oriented
along the radial axis, whereas the other (blue)
I (ΘR,i; ΘN,j)
is oriented along the orthogonal tangential
Pair-wise
axis. (b,c) True (veridical) statistics of the mutual
simulated visual environment: pair-wise mutual information
information between the reference filters (green

in b, blue in c) and all neighboring filters in the
interaction zone, gathered under a spotlight Neighbor
hypercolumn Low Mutual information High
of attention without eye movements. At each
neighboring location, the oriented thick and thin bars depict the oriented filters at this location. The colors of the oriented bars depict the magnitude of
the mutual information. For each hypercolumn, the oriented filter with the highest mutual information is highlighted with a thick line. (d,e) Saccade-
confounded statistics: pair-wise statistics of the same simulated visual environment gathered under the attentional spotlight during temporally
overlapping eye movements. Although the veridical statistics implicate smooth continuation of contours (illustrated with overlaid dashed circles in c),
the saccade-confounded ones favor repetition of co-oriented fragments (overlaid dashed lines in e). The time constant of the decay of spatial attention
(λ) was 16 ms for these simulations.
pair-wise joint statistics in the interaction zone at the attended loca inappropriate integration is elongated along the radial axis and has
tion. The time constant, λ, which represents the expected temporal the shape and anisotropic extent of the psychophysically measured
overlap between spatial attention and a saccade, is the only unde zone of crowding (Fig. 1b), with an aspect ratio between 1.54 and
termined parameter of our simulated system (others were set by 2.48, depending on eccentricity. Furthermore, we found a zone of
anatomical and behavioral data from previous studies unrelated to under-integration in the proximal neighborhood of the reference
visual crowding). hypercolumn and a zone of over-integration in the distal neighbor
The true (veridical) statistics of the simulated visual environment hood. This suggests that the process of inappropriate integration,
consist of co-circular patterns consistent with second-order orien which is thought to be the underlying cause of crowding, is twofold:
tation statistics of natural images17,20 and with patterns that have features from the target object are weakly bound, whereas features
been proposed in mathematical models25 (Fig. 4b,c). Such statistics, from the clutter surrounding the target are excessively bound. The
if encoded in the weights of the lateral connections, would facilitate qualitative shape of the zones (radial elongation, proximal under-
grouping of related features into continuous contours. In contrast, integration, distal over-integration) was preserved across moder
npg
saccade-confounded statistics (30,000 simulated saccades, λ = 16 ms; ate values of the parameter λ (4 or 8 ms; Supplementary Fig. 1).
Fig. 4d,e) deviate from true statistics in two ways. First, there is a However, the simulations suggest that a larger time constant (about
preference for co-orientation18,26. Second, the spatial extent of the 16 ms; Fig. 5) is necessary to match the spatial extent dictated by
mismatch between the veridical and the confounded statistics has a Bouma’s law. Notably, the learning process in our simulations is ideal
strong radial bias irrespective of the reference orientation. Essentially, ized and noise free. In a real neural system, the process of attention-
as a result of the saccade, a given representation of a visual pattern gated learning will essentially be subjected to noise and will reflect
under the attentional spotlight repeats itself across hypercolumns the frequency of exposure to the statistics.
whose receptive fields lie along the eye-movement trajectory.
Contextual interactions, if dominated by co-orientation, as in the Predictions of the theory
saccade-confounded statistics, should bias either the integration of We argue that the patterns of lateral connectivity in the cortex, and
features of similar orientation into a texture field or the estimation of therefore the nature and extent of crowding, reflect the patterns of
orientations to be the average of the surround. This provides a basis saccadic eye movements and the statistics of the visual world. Besides
for the finding that crowding appears to be a process of averaging27,28. providing a unified and coherent account of the existing data on
It also offers a rationale for the recent proposals that crowding is crowding, our theory also predicts that changes in the pattern of
caused by a peripheral visual system that encodes texture as opposed saccades or in the statistics of the visual world will lead to reorgani
to form information29,30 and justifies the future development of these zation in the patterns of lateral connectivity and to the shape of the
theories to include anisotropic texture processing. crowding zone. Here we offer several empirically testable predictions
To provide an estimate of the spatial extent and strength of the puta regarding the shape of the crowding zone in situations in which either
tive inappropriate feature interactions in the peripheral field result the pattern of saccades or the image statistics deviate from the normal
ing from the use of non-veridical image statistics, we computed the scenario that we have considered so far.
deviation of saccade-confounded statistics from veridical statistics Given that most saccades in humans have magnitudes of 15° or
(Fig. 5 and Online Methods, equation (13)). The overall region of less31, our theory predicts that the radial-tangential anisotropy

a r t ic l e s
10 Normalized arrangements (Supplementary Fig. 4a and Supplementary Note).

∆ mutual
9
information Data from six observers revealed that flankers oriented orthogo
8
nally to the radial axis had a greater extent of crowding irrespec
7
Increase
6
tive of their spatial arrangement, consistent with our prediction
5 (Supplementary Fig. 4b).
4 Many individuals with central vision loss as a result of age-related
3 macular degeneration develop the use of a stable retinal location in the
2
periphery for fixation during form-vision tasks. This is known as
1
the preferred retinal locus (PRL) and is typically located just outside
Fovea
the central scotoma. Given that the stable PRL is used for fixation,
saccadic eye movements for some of these individuals are radial with
respect to the PRL34 and not to the anatomical fovea, which is in the
scotoma. Under such circumstances, the visual system is exposed to
Decrease
PRL-centric saccade statistics, and our theory would predict that the
crowding zone measured at the PRL should no longer be elongated, as
the PRL no longer experiences a radial bias in eye movements, and that
the elongated axes of the crowding zones at other peripheral locations
should point toward the PRL (Supplementary Fig. 5). Preliminary
Figure 5 Zones of inappropriate integration. The pooled and normalized results measured with a scanning laser ophthalmoscope from indi
differences (Online Methods, equation (13)) between saccade-confounded viduals with age-related macular degeneration suggest that the zone
(Fig. 4d,e) and veridical (Fig. 4b,c) image statistics (mutual information) of crowding measured at the PRL is indeed circular (S.T.L. Chung &
between a reference hypercolumn and neighboring hypercolumns are Y. Lin, ARVO Abstr., 49:1509, 2008). Further studies are needed to
shown in visual space for three reference hypercolumns at 2°, 4° and 6°.
The color scale shows the magnitude and sign of the deviation from the
determine the shape of the crowding zones at non-PRL locations and
veridical statistics, indicative of inappropriate integration: shades of red to assess the time course of the predicted reorganization.
indicate that the mutual information between a reference hypercolumn
and an adjacent hypercolumn is higher in saccade-confounded statistics DISCUSSION
than in veridical statistics, implying over-integration; shades of blue Our model explains the qualitative differences in form vision between
indicate lower mutual information than the veridical, implying under- the fovea and periphery, as exemplified by visual crowding, with
integration. Elliptical fits (dashed lines at 40% of peak normalized
out having to postulate a specialized mechanism that is not shared
difference) illustrate the elongated shape of the spatial extent of
inappropriate integration. The time constant of the decay of spatial between central and peripheral vision. We began by assuming that
attention (λ) was set at 16 ms. lateral interaction zones in V1 are isotropic and constant in size on
the cortex. Specific interactions in the zones are learned under the
spotlight of attention, which overlaps in time with the subsequent
would be less pronounced for eccentricities beyond 15° and should saccadic eye movements that it elicits. We found that this minimal set
approach the aspect ratio predicted by the cortically isotropic lateral of assumptions can explain form vision deficits in peripheral vision.
interaction zone alone (about 1.05). The crowding zone in infants Specifically, we found that the scaling law and the inward-outward
should be similarly circular and defined mainly by the geometry of asymmetry of crowding are consequences of the extent of lateral con
the lateral interaction zone, as their visual systems would not have nections in V1 being isotropic and independent of eccentricity, and
had sufficient exposure to the biased statistics resulting from saccades. the sizes of the receptive fields of V1 neurons increasing linearly with
npg
Amblyopic individuals with strong foveal crowding32 are also likely eccentricity. The elliptical shape of the crowding zone can be caused
to have circular crowding zones at the fovea, where there should not by distorted image statistics encoded in lateral connections between
be a directional bias in the saccade statistics, particularly in cases of V1 hypercolumns. The distortion is attributable to the fact that spatial
anisometropic amblyopia in which there is no gaze offset between the attention facilitates the acquisition of image statistics at the attended
amblyopic and the fellow eye. retinal location and that there is temporal overlap between the dura
Prevailing differences in image statistics between the upper and tion of the spatial attention at a retinal location and the saccade that
lower visual fields33 should result in different shapes and spatial prop it elicits. Given that saccades in normal vision are generally radial
erties of crowding zones across the horizontal meridian. We made with respect to the fovea, the acquired image statistics are mostly
detailed measurements of the crowding zone (Supplementary Fig. 2 confounded in the radial direction.
and Supplementary Note) in the lower and upper visual field and Our quantitative results illustrate an important aspect of the anom
found evidence that the crowding zones in the upper visual field were alous contextual interactions underlying crowding that has not been
less elongated than those in the lower visual field (Supplementary fully explored empirically: diminished binding of target features35
Fig. 3 and Supplementary Table 1). This difference likely reflects resulting from proximal weakening of connectivity combined with
the greater incidence of oriented structure that is typically present in inappropriate and spurious binding of distracter features resulting
the lower visual field compared with the upper field. These statistics from distal strengthening of connectivity in the lateral interaction
would help drive the greater elongation in the lower field. zone. This dual nature of the binding deficiency explains our previous
Neurons in V1 typically respond most vigorously to moving stim finding with classification images that crowding reduces the use of
uli whose orientation is orthogonal to the direction of motion. Our valid features while increasing the number of invalid features used by
theory would predict a greater spatial extent of crowding for flankers the visual system5. Furthermore, the co-oriented connectivity pattern
oriented orthogonally to the radial axis as compared to those ori (Fig. 4d,e) suggests a texture-like processing of the peripheral field29,
ented in parallel to the radial axis. We measured the spatial extent rather than a Gestalt-like smooth contour integration process. We
of crowding for such oriented flankers in both radial and tangential surmise that such a texture-like representation of the peripheral field,

a r t ic l e s
although insufficient for accurate object identification, may serve a the lateral interaction zone (Fig. 3b), this anisotropy, with an average
useful purpose such that there is no ecological reason to impose a aspect ratio of 1.05, is insufficient to explain the human data (aspect
strict temporal separation between covert spatial attention and the ratio ≈ 2.2). This finding lends credence to our theory that crowd
subsequent saccade that it elicits. Suppression of detailed form infor ing originates in V1 as a result of extra-classical interactions. At the
mation from the vast expanse of the peripheral fields might prevent same time, our theory does not preclude the possibility that crowding
upstream object processing areas (for example, LOC) from getting occurs at multiple levels in the visual system47.
overloaded, whereas the texture-like representation may aid in the The V4 finding46 further suggests that the anisotropy in the crowd
detection of salient objects36. Three issues raised by our theory war ing zone cannot be a result of any anisotropy in the CMF along the
rant additional scrutiny: temporal overlap between attention and sac radial and tangential axes10. With the purported anisotropy in the
cades, saccadic suppression, and the neural loci of crowding. CMF, as was suggested in a functional magnetic resonance imaging
study48, a circle on the cortex will project to an ellipse in visual space,
Temporal overlap between attention and saccades but with the major axis along the tangential direction, orthogonal to
Although attention has been a highly active area of research, the the observed crowding zone.
temporal dynamics concerning the extinction of attention during
saccadic eye movements, as opposed to immediately before or after Conclusion
a saccade, has not been characterized. We assumed an exponential Form vision in the periphery is markedly degraded beyond its limited
decay function to model the temporal overlap between attention and spatial resolution, as demonstrated by the phenomenon of crowding.
a saccade and chose to parametrically explore the effect of varying We found that a small amount of temporal overlap between spatial
the time constant of the decay. Our simulation results indicate that attention and saccadic eye movements can cause the acquisition of
even moderate values of overlap between attention and saccadic eye erroneous image statistics by the neurons in the visual cortex that
movement, as little as 4 ms, are able to produce anisotropy in lateral serve peripheral vision.
connection weights. Electrophysiological and psychophysical experi The misrepresented statistics exhibit preferences for co-orientation
ments are needed to confirm the parameters of the overlap. and repetition and are spatially elongated along the radial axis. The
It is possible that there could be a small, but significant, temporal consequent contextual interactions would thus render object iden
overlap between spatial attention and eye movements, even at the tification against a cluttered background particularly difficult in
fovea. For example, this could happen if attention is divided between the periphery. The spatial extents of the inappropriate interactions
the fovea and the periphery. In this case, the periphery will continue dictated by our theory quantitatively match the observed size and
to exhibit the radial bias, whereas the bias at the fovea will essentially scaling of the zones of visual crowding.
be isotropic. This is consistent with the finding that interaction zones
in the fovea are approximately circular9. Methods
Methods and any associated references are available in the online version
Saccadic suppression of the paper at http://www.nature.com/natureneuroscience/.
One of the objections that could be raised to our model is that the
Note: Supplementary information is available on the Nature Neuroscience website.
phenomenon of saccadic suppression would prevent the retinal motion
blur from affecting the plasticity of the early visual cortex. There is con Acknowledgments
siderable debate about the mechanisms underlying saccadic suppres The authors would like to thank I. Biederman, A. Disney, J. Hirsch, M. Jadi,
sion; some have argued for an extra-retinal suppressive mechanism37, R. Millin and J. Reynolds for helpful comments on the manuscript. This research
was supported by US National Institutes of Health grant EY017707 (B.S.T.).
whereas others have argued for a visual-masking mechanism38.
Although both mechanisms might contribute toward saccadic
npg
AUTHOR CONTRIBUTIONS
suppression, albeit unequally39, there is little evidence of complete A.S.N. and B.S.T. developed the theory, designed the experiments and collected
suppression in the early visual cortex40. Instead there is a growing the data. A.S.N. analyzed the data and ran the model simulations. A.S.N. and B.S.T.
consensus that peri-saccadic stimuli are indeed processed by the early wrote the manuscript.
visual system41 and that these signals are prevented from reaching COMPETING FINANCIAL INTERESTS
awareness at a later stage in visual processing42. By attributing crowd The authors declare no competing financial interests.
ing to contextual interactions in V1, we allow crowding to be shaped
by retinal motion blur, yet remain consistent with the observation Published online at http://www.nature.com/natureneuroscience/.
that any such eye-movement induced motion is not perceived under Reprints and permissions information is available online at http://www.nature.com/
normal circumstances43. reprints/index.html.
Anisotropy and the neural loci of crowding 1. Levi, D.M. Crowding—an essential bottleneck for object recognition: a mini-review.
Area V2 has been proposed as a possible locus of crowding because Vision Res. 48, 635–654 (2008).
2. Bouma, H. Interaction effects in parafoveal letter recognition. Nature 226, 177–178
the scaling of its receptive fields with eccentricity matches that of the (1970).
crowding zones30; however, radial-tangential anisotropy is not evident 3. Levi, D.M., Hariharan, S. & Klein, S. Suppressive and facilitatory spatial interactions
in peripheral vision: peripheral crowding is neither size invariant nor simple contrast
in V2 receptive fields44 and the theory that implicated V2 did not masking. J. Vis. 2, 167–177 (2002).
address the issue of anisotropy. Area V4 has also been proposed 1 as a 4. Pelli, D.G., Palomares, M.C. & Majaj, N.J. Crowding is unlike ordinary masking:
result of the reported anisotropy in V4 receptive field size45. There is distinguishing feature integration from detection. J. Vis. 4, 1136–1169
(2004).
evidence that a V4 receptive field represents a convergence of infor 5. Nandy, A.S. & Tjan, B.S. The nature of letter crowding as revealed by first- and
mation from a circular patch of V1 (ref. 46). The observed asymmetry second-order classification images. J. Vis. 7, 1–26 (2007).
and anisotropy in a V4 receptive field is completely determined by the 6. Greenwood, J.A., Bex, P. & Dakin, S.C. Positional averaging explains crowding with
letter-like stimuli. Proc. Natl. Acad. Sci. USA 106, 13130–13135 (2009).
transformation of visual space according to the cortical magnifica 7. He, S., Cavanagh, P. & Intriligator, J. Attentional resolution and the locus of visual
tion factor (CMF) of V1. As illustrated in our geometric analysis of awareness. Nature 383, 334–337 (1996).

a r t ic l e s
8. Petrov, Y., Popple, A.V. & McKee, S.P. Crowding and surround suppression: not to 27. Greenwood, J.A., Bex, P.J. & Dakin, S.C. Crowding changes appearance. Curr. Biol.
be confused. J. Vis. 7, 1–9 (2007). 20, 496–501 (2010).
9. Toet, A. & Levi, D.M. The two-dimensional shape of spatial interaction zones in the 28. Parkes, L., Lund, J., Angelucci, A., Solomon, J.A. & Morgan, M. Compulsory averaging
parafovea. Vision Res. 32, 1349–1357 (1992). of crowded orientation signals in human vision. Nat. Neurosci. 4, 739–744
10. Pelli, D.G. Crowding: a cortical constraint on object recognition. Curr. Opin. (2001).
Neurobiol. 18, 445–451 (2008). 29. Levi, D.M. & Carney, T. Crowding in peripheral vision: why bigger is better. Curr.
11. Motter, B.C. & Simoni, D.A. The roles of cortical image separation and size in active Biol. 19, 1988–1993 (2009).
visual search performance. J. Vis. 7, 1–15 (2007). 30. Freeman, J. & Simoncelli, E.P. Metamers of the ventral stream. Nat. Neurosci. 14,
12. Petrov, Y. & Popple, A.V. Crowding is directed to the fovea and preserves only feature 1195–1201 (2011).
contrast. J. Vis. 7, 1–9 (2007). 31. Bahill, A.T., Adler, D. & Stark, L. Most naturally occurring human saccades have
13. van den Berg, R. & Roerdink, J.B.T.M. & Cornelissen, F.W. A neurophysiologically magnitudes of 15 degrees or less. Invest. Ophthalmol. Vis. Sci. 14, 68–69 (1975).
plausible population code model for feature integration explains visual crowding. 32. Levi, D.M., Song, S. & Pelli, D.G. Amblyopic reading is crowded. J. Vis. 7, 1–17
PLoS Comput. Biol. 6, e1000646 (2010). (2007).
14. Geisler, W.S. Visual perception and the statistical properties of natural scenes. 33. Bruce, N.D.B. & Tsotsos, J.K. A statistical basis for visual field anisotropies.
Annu. Rev. Psychol. 59, 167–192 (2008). Neurocomputing 69, 1301–1304 (2006).
15. Karklin, Y. & Lewicki, M.S. Emergence of complex cell properties by learning to 34. White, J.M. & Bedell, H.E. The oculomotor reference in humans with bilateral
generalize in natural scenes. Nature 457, 83–86 (2009). macular disease. Invest. Ophthalmol. Vis. Sci. 31, 1149–1161 (1990).
16. Olshausen, B.A. & Field, D.J. Emergence of simple-cell receptive field properties 35. Neri, P. & Levi, D.M. Spatial resolution for feature binding is impaired in peripheral
by learning a sparse code for natural images. Nature 381, 607–609 (1996). and amblyopic vision. J. Neurophysiol. 96, 142–153 (2006).
17. Kapadia, M.K., Ito, M., Gilbert, C.D. & Westheimer, G. Improvement in visual 36. Itti, L. & Koch, C. Computational modeling of visual attention. Nat. Rev. Neurosci.
sensitivity by changes in local context: parallel studies in human observers and in 2, 194–203 (2001).
V1 of alert monkeys. Neuron 15, 843–856 (1995). 37. Diamond, M.R., Ross, J. & Morrone, M.C. Extraretinal control of saccadic
18. Stettler, D.D., Das, A., Bennett, J. & Gilbert, C.D. Lateral connectivity and contextual suppression. J. Neurosci. 20, 3449–3455 (2000).
interactions in macaque primary visual cortex. Neuron 36, 739–750 (2002). 38. Campbell, F.W. & Wurtz, R.H. Saccadic omission: why we do not see a grey-out
19. Sigman, M., Cecchi, G.A., Gilbert, C.D. & Magnasco, M.O. On a common circle: during a saccadic eye movement. Vision Res. 18, 1297–1303 (1978).
natural scenes and Gestalt rules. Proc. Natl. Acad. Sci. USA 98, 1935–1940 39. Wurtz, R.H. Neuronal mechanisms of visual stability. Vision Res. 48, 2070–2089
(2001). (2008).
20. Geisler, W.S., Perry, J.S., Super, B.J. & Gallogly, D.P. Edge co-occurrence in 40. García-Pérez, M.A. & Peli, E. Intrasaccadic perception. J. Neurosci. 21, 7313–7322
natural images predicts contour grouping performance. Vision Res. 41, 711–724 (2001).
(2001). 41. De Pisapia, N., Kaunitz, L. & Melcher, D. Backward masking and unmasking across
21. Ito, M. & Gilbert, C.D. Attention modulates contextual influences in the primary saccadic eye movements. Curr. Biol. 20, 613–617 (2010).
visual cortex of alert monkeys. Neuron 22, 593–604 (1999). 42. Watson, T.L. & Krekelberg, B. The relationship between saccadic suppression and
22. Gilbert, C., Ito, M., Kapadia, M.K. & Westheimer, G. Interactions between attention, perceptual stability. Curr. Biol. 19, 1040–1043 (2009).
context and learning in primary visual cortex. Vision Res. 40, 1217–1226 43. Castet, E., Jeanjean, S. & Masson, G.S. Motion perception of saccade-induced
(2000). retinal translation. Proc. Natl. Acad. Sci. USA 99, 15159–15163 (2002).
23. Deubel, H. & Schneider, W.X. Saccade target selection and object recognition: 44. Gattass, R., Gross, C.G. & Sandell, J.H. Visual topography of V2 in the macaque.
evidence for a common attentional mechanism. Vision Res. 36, 1827–1837 J. Comp. Neurol. 201, 519–539 (1981).
(1996). 45. Piñon, M.C., Gattass, R. & Sousa, A.P. Area V4 in Cebus monkey: extent and
24. Motter, B.C. Crowding and object integration within the receptive field of V4 neurons. visuotopic organization. Cereb. Cortex 8, 685–701 (1998).
J. Vis. 2, 274 (2002). 46. Motter, B.C. Central V4 receptive fields are scaled by the V1 cortical magnification
25. Ben-Shahar, O. & Zucker, S. Geometrical computations explain projection patterns and correspond to a constant-sized sampling of the V1 surface. J. Neurosci. 29,
of long-range horizontal connections in visual cortex. Neural Comput. 16, 445–476 5749–5757 (2009).
(2004). 47. Whitney, D. & Levi, D.M. Visual crowding: a fundamental limit on conscious
26. Bosking, W.H., Zhang, Y., Schofield, B. & Fitzpatrick, D. Orientation selectivity and perception and object recognition. Trends Cogn. Sci. 15, 160–168 (2011).
the arrangement of horizontal connections in tree shrew striate cortex. J. Neurosci. 48. Larsson, J. & Heeger, D.J. Two retinotopic visual areas in human lateral occipital
17, 2112–2127 (1997). cortex. J. Neurosci. 26, 13128–13142 (2006).
npg

ONLINE METHODS between successive fixations in degrees of visual angle), T the duration and v(t)
Geometry of V1. We assumed a columnar architecture of the primary visual the velocity profile of a saccade. v(t) must satisfy the following conditions50
cortex (V1) with the cortical hypercolumns packed hexagonally in cortical space
(Fig. 2a). The computational units in each hypercolumn have receptive fields v(0) = 0
centered at the same retinal location, but each was tuned to different orientations
v(T ) = 0
and spatial scales. For our purposes, we were concerned mainly with the orienta (5)
tion tuning of these units. We used eight broadband oriented filters to extract
the corresponding orientation energy in the image patch under the receptive field
( )
v peak = v T2 = k A
p dv
(q = [0, 1, …, 7] × radians; Fig. 4a). The average diameter of the receptive fields =0
8 dt t = T
increase linearly with eccentricity with a slope of 0.1 (ref. 24). 2
We found that this geometry captures the logarithmic cortical magnification
as follows. Let d0 be the receptive field diameter (in degrees) at eccentricity φ = 0, where k is a constant. A sinusoidal velocity profile of the following form satisfies
s be the slope of the linear function relating receptive field diameters to eccentric the constraints in equation (5) in the range 0 ≤ t ≤ T
ity and γ be the proportion of receptive field diameter overlap between adjacent
hypercolumns (assumed constant across all eccentricities). In a V1 hypercolumn, pt
v(t ) = k A sin (6)
the eccentricity φ of an receptive field at the cortical position p, measured in units T
of hypercolumns (p = 1 represents the center of the fovea), is given by
T
Given that A = ∫ v(t ) dt , we have
0
a
f( p) = (b p −1 − 1) (1)
b −1 p (7)
T= A
2k
where α = d0(1 − γ ) and β = 1 + s(1 − γ ). For our model, d0 = 0.01, s = 0.1
(ref. 24) and γ = 0.3 (average value in ref. 49). Away from the immediate vicinity The distance traversed, D(τ), in time τ is therefore given by
of the center of gaze, equation (1) can be written in approximate form as

t
D(t ) = ∫ v(t ) dt
a 0
f( p) ≈ bp (2) (8)
b (b − 1) A 2kt 
=  1 − cos 
2 A
Inverting the function, we can express p as a function of φ as
The distribution of saccade amplitudes along the radial axis from the fovea
ln(f ) (3) was modeled as an exponential distribution31 with the following probability
p(f ) = −K 1
ln(b ) density function: f (x ) = le − l x, l = . The distribution along the iso-eccentric
axis was assumed to be uniform. 7 .6
where  a 
ln  Eye-movement simulations and image statistics. Using the distribution of
 b (b − 1)  .
K = saccade amplitudes and the corresponding velocity profile described above, we
ln(b )
simulated saccadic eye movements in which the visual stimulus presented to the
Equation (3) thus gives the logarithmic cortical magnification outside of the system was a random clutter of uppercase letters (Palatino font) at various sizes
immediate center of gaze. and orientations. For computational tractability, we calculated the outputs of the
If we assume the critical spacing in the visual field for crowding to be Df = bf , set of eight broadband oriented filters in each hypercolumn at discrete time points
where b is Bouma’s constant (about 0.5), then the critical spacing in the cortex is in the interval [0…T ]. Each filter measures the contrast energy along a given
orientation in the image patch that is in the receptive field of the hypercolumn.
npg
D p = p(f + Df ) − p(f ) Let r(t) denote the response of one of the filters at time t. The cumulative response
ln(f + D f ) − ln(f ) of the filter over the time course of the eye movement is
=
ln(b ) T −
t
(4)
=
ln(f + bf ) − ln(f ) r= ∑ r[t ]e l (9)
ln(b ) t =0
ln(1 + b) where the modulation of spatial attention during its overlap with saccadic eye
=
ln(b ) movement (Fig. 2b) is modeled as an exponential decay function with a time
constant λ, a free parameter of the model. Such a characterization captures the
Using b = 0.5, we get ∆p ≈ 6. That is, the critical spacing for crowding corre probabilistic distribution of the overlap period. For the purpose of calculating
sponds to six hypercolumns in V1, independent of eccentricity. This is consistent joint image statistics, the cumulative filter response, r, is first converted into a fir
with the anatomical extent of lateral (long-range horizontal) connections in V1 ing probability p with a saturating nonlinearity, with k as an arbitrary constant
(ref. 18) and with the estimated extent of combining fields in V1 (ref. 10) if each
hypercolumn is roughly 1 mm on the cortex. p = tanh(kr ) (10)
Conversely, equation (4) shows that, if a computational unit in a particular
hypercolumn has lateral connections to all computational units in neighboring Let Θi,R be a random variable associated with a filter with orientation θi in
hypercolumns up to an isotropic extent of a constant number of hypercolumns the reference hypercolumn R. Θi,R is equal to 1 if the cell has fired within a
on the cortex, then the resulting spatial interaction in the visual field must temporal window, else it is zero. For simplicity, the temporal window used in
follow Bouma’s law of linear scaling (Df = bf ). We refer to the set of hypercolumns our simulations was the entire duration of a saccade. The joint probability dis
(Fig. 2a) to which a reference hypercolumn (Fig. 2a) has lateral connections as tribution P(Θi,R, Θj,N), between the oriented filter in the reference hypercolumn
the lateral interaction zone. In our model, we set the radius of the lateral inter and another oriented filter in a neighboring hypercolumn (N) can be calcu
action zone equal to six hypercolumns. lated by accumulating and averaging the joint firing probabilities across many
eye-movement traces (30,000 in our simulations). To obtain robust estimates
Saccadic eye movements. For the eye-movement simulations, the saccadic veloc of the joint probability distribution, we used the bootstrap procedure. For any
ity profile was modeled as follows. Let A be the saccade amplitude (the distance saccade trace, the probabilities are accumulated only if both the reference and
nature NEUROSCIENCE doi:10.1038/nn.3021

the neighboring hypercolumn are under the spotlight of attention. Finally, the term of the normalized difference between saccade-confounded and veridical
statistical dependence between Θi,R and Θj,N can be calculated in terms of the mutual information
pair-wise mutual information ISC (R ; N) − I V (R ; N)
∆I (R ; N) = (13)
P (Θi , R , Θ j, N ) I V (R ; N)
I (Θ i , R ; Θ j , N ) = ∑ P (Θ i , R , Θ j, N ) log 2
P (Θi , R )P (Θ j, N ) (11) This normalized difference when plotted in visual space for all neighboring
Θ i , R ={0, 1}
Θ j , N ={0, 1} hypercolumns maps the amplitude and spatial extent of inappropriate feature
integration for a reference hypercolumn. Image features from hypercolumns with
The mutual information is zero when the two random variables are negative difference (reduced mutual information in the saccade-confounded
statistically independent. statistics as compared with the true statistics) would have weaker interactions
For a reference hypercolumn R, pooled mutual information (pooled across all and thus be only loosely bound to the reference features, leading to an under-
orientations) between R and a neighboring hypercolumn N is defined as integration of features. Conversely, features from hypercolumns with positive
difference (excessive mutual information in the saccade-confounded statistics)
ISC (R ; N) = ∑ ∑ ISC (Θ i , R ; Θ j, N ) would strongly influence the reference, leading to excessive and possibly errone
i j ous feature integration.
(12)
I V (R ; N) = ∑ ∑ I V (Θ i , R ; Θ j, N )
i j
49. Wilson, J.R. & Sherman, S.M. Receptive-field characteristics of neurons in cat
striate cortex: changes with visual field eccentricity. J. Neurophysiol. 39, 512–533
where ISC and IV are the pair-wise mutual information for the saccade- (1976).
confounded and veridical conditions respectively (equation (11)). We express 50. Lebedev, S., Gelder, P.V. & Tsui, W.H. Square-root relations between main saccadic
the gross difference between the saccade-confounded and veridical statistics in parameters. Invest. Ophthalmol. Vis. Sci. 37, 2750–2758 (1996).
npg
doi:10.1038/nn.3021 nature NEUROSCIENCE

Saccade-Confounded Image Statistics Explain Visual Crowding

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Saccade-Confounded Image Statistics Explain Visual Crowding

Uploaded by

Copyright:

Available Formats

a r t ic l e s

Saccade-confounded image statistics explain

nature NEUROSCIENCE VOLUME 15 | NUMBER 3 | MARCH 2012 463

Figure 1 Characteristics of crowding in peripheral vision. a

natural images in peripheral vision is confounded by eye movements.

b t0 t1 t2 t3 Figure 2 The interaction of spatial attention and saccades. (a) Typical

464 VOLUME 15 | NUMBER 3 | MARCH 2012 nature NEUROSCIENCE

Geometry of lateral interactions Saccade-confounded image statistics

nature NEUROSCIENCE VOLUME 15 | NUMBER 3 | MARCH 2012 465

Figure 4 Pair-wise image statistics. (a) Each

information between the reference filters (green

466 VOLUME 15 | NUMBER 3 | MARCH 2012 nature NEUROSCIENCE

10 Normalized arrangements (Supplementary Fig. 4a and Supplementary Note).

nature NEUROSCIENCE VOLUME 15 | NUMBER 3 | MARCH 2012 467

468 VOLUME 15 | NUMBER 3 | MARCH 2012 nature NEUROSCIENCE

nature NEUROSCIENCE VOLUME 15 | NUMBER 3 | MARCH 2012 469

of the center of gaze, equation (1) can be written in approximate form as

nature NEUROSCIENCE doi:10.1038/nn.3021

doi:10.1038/nn.3021 nature NEUROSCIENCE

You might also like