Breaking the Visual Stimulus into Parts


 

Norma Graham

Published in Current Directions in Psychological Science, 1992, 1, 55-61.

Notes: This paper is a summary of my book Visual Pattern Analyzers, published in 1989 by Oxford University Press.

This web page was made from the typescript of the final version as submitted. A number of minor changes were made by the editors.


 

 

To perceive is to know "what is where." To perceive visually is to obtain this knowledge through the eyes. At a global level visual perception may be thought of as a two-fold process: The visual system first breaks the information that is contained in the visual stimulus into parts; then the visual system puts the information back together again. But why take it apart in the first place? Because the proximal stimulus -- the light falling on the retina -- bears little direct resemblance to the important aspects of the world that must be perceived, that is, to the distal stimulus. The lack of resemblance between the proximal and distal stimuli makes the task of visual perception inherently difficult. Presumably, the information in the proximal stimulus is analyzed into parts to make economical and feasible the reassembly of the information into a representation of the distal stimulus.

This review deals with the elementary parts into which visual information is initially analyzed, in particular with the elementary parts that are relevant for seeing patterns in space and time. (Color and three-dimensionality are not discussed, but much is known about them as well.) Three or four decades ago we knew very little about the elementary parts of visual patterns; we still know very little about the processes putting the information from the parts back together again.

Two events were crucial in initiating the successful approach to discovering visual patterns' elementary parts. One was the discovery of the specialized receptive fields of the neurons in primary visual cortex (area V1), along with the discovery that the parameters of these receptive fields varied in orderly ways from neuron to neuron (Hubel and Wiesel, 1962, 1977). The receptive field of a visual neuron is the area on the retina where light patterns can drive its firing (by way of neural pathways that connect the cortical neuron to that small region of the retina). Within the receptive field of a neuron in the visual cortex, there are excitatory and inhibitory subareas. These subareas form adjacent elongated stripes, as shown in panel A of Figure 1. Light falling on an excitatory subarea increases the neuron's firing; light falling on an inhibitory subarea decreases it.

(Figure 1)
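
As a concrete sketch of this arrangement, the toy model below (in Python) treats a receptive field as a small matrix of excitatory and inhibitory weights arranged in vertical stripes, and models the response to a light pattern as the weighted sum of the light over the field. All weights and patterns are invented for illustration; they are not fit to any real neuron.

```python
import numpy as np

# A toy receptive field: a central excitatory stripe flanked by
# inhibitory stripes, as in panel A of Figure 1 (weights invented).
receptive_field = np.array([
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
])

# A bright bar falling on the excitatory stripe...
bar_on_center = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])
# ...and the same bar shifted onto an inhibitory stripe.
bar_on_flank = np.roll(bar_on_center, 1, axis=1)

# Response relative to baseline: weighted sum of light over the field.
print(np.sum(receptive_field * bar_on_center))  # 6.0: firing increases
print(np.sum(receptive_field * bar_on_flank))   # -3.0: firing decreases
```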

The second crucial event was the application of Fourier or linear systems analysis to the problem of quantitatively defining pattern sensitivity both neurophysiologically and psychophysically (Campbell and Robson, 1968). This application introduced the use of sinusoidal gratings ("fuzzy stripes" -- see example in Fig. 1 panel B) as elementary stimuli because, according to the theorems underlying Fourier analysis, any visual pattern is equivalent to the sum of many sinusoidal grating stimuli.
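
A minimal numerical illustration of that theorem (in Python; the square-wave profile is just a convenient stand-in for an arbitrary one-dimensional pattern): Fourier analysis expresses the pattern as a set of sinusoidal components, and summing those components recovers the pattern exactly.

```python
import numpy as np

# A one-dimensional luminance profile: a square-wave grating.
x = np.arange(256)
square_wave = np.where((x // 32) % 2 == 0, 1.0, -1.0)

# Analysis: express the pattern as sinusoidal components...
components = np.fft.rfft(square_wave)

# ...synthesis: summing those sinusoids recovers the pattern.
reconstructed = np.fft.irfft(components, n=256)
print(np.allclose(square_wave, reconstructed))  # True
```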

These two events led to the suggestion that different neurons -- or, in the more physiologically neutral language preferred by psychophysicists, different analyzers (sometimes called detectors, channels, units, or pathways) -- were selectively sensitive to different aspects of spatial and temporal patterns. The dimensions along which the selectivity occurs are the dimensions that arise naturally in discussions of Fourier analysis -- the orientations of the stripes, their spatial frequencies (widths), and their temporal frequencies (how rapidly a pattern oscillates or drifts with time). Thus, the fuzzy stripes that form the elementary stimuli in Fourier analysis were just what was required to define the selective sensitivity to local light patterns that is conferred on neurons in the primary visual cortex by the pattern of excitatory and inhibitory subregions in their receptive fields. Using these patterns, psychophysical and electrophysiological work has fleshed out the hypothesis that a fundamental stage of visual pattern processing is a stage of multiple analyzers acting in parallel. This stage breaks the proximal visual stimulus down into parts because each analyzer responds only to a limited range of spatial frequencies, orientations, etc. (Although the parts responded to by these low-level analyzers are like sinusoidal patches and do serve as elementary units of visual processing, they clearly do not represent elementary units in the finished percept. Visual scenes do not appear to be composed of many small striped patches!)

These original events inspired an enormous amount of research, both neurophysiological and psychophysical. (Early, seminal psychophysical studies include Blakemore and Campbell, 1969; Pantle and Sekuler, 1968a; Sachs, Nachmias, and Robson, 1971; and Thomas, 1970.) The reports of these studies were scattered over the pages of many different journals, and many of the researchers themselves had little idea of how coherent the whole story was becoming. In fact, when in 1980 I started writing a survey of this vast and dispersed literature, I planned to briefly describe the overall framework of multiple-analyzers models and then go on to point out at length all the problems with the overall picture (the discrepancies between different sets of results, the holes in existing models, etc.). Instead, I was surprised to discover that almost all of the difficulties and inconsistencies were in psychophysical studies done with high-contrast patterns (suprathreshold patterns). If you considered only the psychophysical studies using patterns of near-threshold contrast (contrast so low that the pattern is imperfectly discriminable from a steady blank field of the same space-average luminance), there were still hundreds of studies, employing a wide variety of psychophysical tasks, and the conclusions from different investigators using different methods agreed remarkably well. (A reason for this difference between near-threshold and suprathreshold results is considered briefly below.)

 

THE MULTIPLE-ANALYZER MODEL

The receptive fields of analyzers could differ in the size of their sub-regions and, therefore, in preferred spatial frequency. (The largest size of subregion and, therefore, the lowest preferred spatial frequency are shown in panels D and E of Fig. 1; the smallest size and, therefore, the highest spatial frequency are shown in panel C.) The receptive fields could differ as well in orientation (vertical in panels A, D, and E, oblique in C). In general, they could differ in a number of other properties, for example, in symmetry (e.g. the even-symmetric field in panel D versus the odd-symmetric field in panel E), spatial position (the location in the visual field of the receptive field center), and spatial extent (how long each sub-region is relative to its width, how many sub-regions there are). Similarly, although harder to diagram, the temporal responses of neurons or analyzers can differ in many properties. The rows of Table 1 give the dimensions necessary to specify the spatiotemporal receptive field of a neuron or analyzer.
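
These dimensions map naturally onto the parameters of a Gabor function (a sinusoid under a Gaussian envelope), one standard idealization of such receptive fields, although nothing in the argument here depends on that particular functional form. In the sketch below all parameter values are invented for illustration.

```python
import numpy as np

def gabor(xs, ys, frequency, orientation, phase, sigma, center=(0.0, 0.0)):
    """A sinusoid under a Gaussian envelope. Each parameter corresponds
    to a dimension of Table 1: spatial frequency, orientation, symmetry
    (phase: 0 gives an even-symmetric field, pi/2 an odd-symmetric one),
    spatial extent (sigma), and spatial position (center)."""
    x, y = xs - center[0], ys - center[1]
    u = x * np.cos(orientation) + y * np.sin(orientation)  # axis across the stripes
    carrier = np.cos(2 * np.pi * frequency * u + phase)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return carrier * envelope

xs, ys = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
even_vertical = gabor(xs, ys, frequency=4, orientation=0.0, phase=0.0, sigma=0.3)
odd_vertical = gabor(xs, ys, frequency=4, orientation=0.0, phase=np.pi / 2, sigma=0.3)
```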


Alternatively, the dimensions in Table 1 can be described as the dimensions necessary to characterize a particular visual stimulus -- a sinusoidal patch like that in panel B of Fig. 1 (which may be sinusoidal in space and/or time).

(see the dimension-labeling rows of Table 1)

Adequately explaining why these dimensions might be basic dimensions for pattern vision is beyond our scope here. Briefly, however, as mentioned earlier, any visual stimulus is equivalent to the sum of many sinusoidal-patch stimuli. Thus, if you knew how a visual system responded to sinusoidal patches, you might hope to be able to compute its response to any stimulus at all. This would be strictly possible if the system were what is known as a linear system, that is, if the response to a sum of two stimuli were exactly the sum of the responses to each stimulus alone. One can hope that this would be approximately true for the visual system we actually have or, at least, for some of its sub-parts. As it turns out, the multiple analyzers are usually assumed to be linear, but strict linearity is not crucial for the conclusions here.
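
The superposition property is easy to state in code. In the sketch below (a hypothetical one-dimensional receptive-field profile, with a linear analyzer modeled as convolution), the response to a sum of two stimuli equals the sum of the responses to each alone.

```python
import numpy as np

rng = np.random.default_rng(0)
receptive_field = np.array([-1.0, 2.0, -1.0])  # hypothetical profile

def response(stimulus):
    # A linear analyzer: convolution of the stimulus with the field.
    return np.convolve(stimulus, receptive_field, mode="same")

a = rng.standard_normal(100)  # two arbitrary stimuli
b = rng.standard_normal(100)

# Superposition: response(a + b) == response(a) + response(b).
print(np.allclose(response(a + b), response(a) + response(b)))  # True
```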

To connect the outputs of these multiple analyzers to the observer's responses in psychophysical experiments, a very simple decision rule is typically used. The kind of rule obviously depends on the kind of experiment since the form of the observer's responses varies from one type of experiment to the next. For detection-threshold experiments (where the task of the observer is simply to say whether a non-blank pattern of near-threshold contrast has been presented or not), the most common rule is probably the maximum-output rule, namely: The observer says "yes, a pattern is present" if and only if the largest of all the analyzers' outputs is greater than some criterion.
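
The maximum-output rule itself takes only a few lines to write down; the analyzer outputs below are invented for illustration.

```python
import numpy as np

def says_yes(analyzer_outputs, criterion):
    """Maximum-output rule: say 'yes, a pattern is present' if and only
    if the largest of the analyzers' outputs exceeds the criterion."""
    return np.max(analyzer_outputs) > criterion

outputs = np.array([0.1, 0.3, 1.7, 0.2, 0.4])  # one trial's outputs
print(says_yes(outputs, criterion=1.0))  # True: one analyzer responded strongly
```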

In the last twenty years, it has become increasingly likely that the physiological substrate for the multiple analyzers is cortical area V1 (and perhaps V2). V1 is the primary visual receiving area of the cortex, that is, it is the first place in the cortex that receives information about the visual stimulus, as shown at the left of Fig. 2. Over the last twenty years, it has also become increasingly clear that V1 and V2 are only two of the many different areas in the cerebral cortex concerned with vision; some of the other areas are also shown in Fig. 2. Recent estimates suggest that 25-40% of human cerebral cortex is concerned with processing the information from visual stimuli, but only about 10-20% of that volume is thought to be V1.

(Figure 2)

The simple decision rule in the psychophysical model is thus a vastly oversimplified representation of the other 80 or 90% of visual cortex -- of all the many visual cortical areas higher than V1 or V2! Nonetheless, a model consisting of multiple analyzers coupled with this kind of simple decision rule is sufficient to explain at a quantitative level the results of near-threshold psychophysical experiments. It is as if the simplicity of the near-threshold experimental situation has made all the higher levels of visual processing transparent, allowing the properties of the low-level multiple analyzers to shine through.

In response to near-threshold stimuli, only a very small minority of the analyzers send above-baseline responses upstream. Perhaps this limits the kinds of processing that the higher levels can perform. Simple decision rules might well be inadequate for almost any suprathreshold experiment, however, since, in response to a suprathreshold stimulus, many of the analyzers will send above-baseline responses upstream, thus opening up many kinds of possible processing to the higher levels. It is perhaps no wonder that initial attempts to explain the results of suprathreshold experiments, using models that emphasized the multiple-analyzer stage without trying to build sophisticated later stages, were inadequate.

The results of near-threshold studies are summarized in Table 1 in terms of four questions, each of which can be answered by near-threshold experiments. This section briefly describes these questions and their answers. The logic by which one can answer these questions about multiple analyzers on the basis of psychophysical experiments (some of which is referred to as multidimensional signal-detection theory) is itself an area in which a good deal of progress has been made over the last two decades (e.g. Graham, Kramer, and Yager, 1987; Graham, 1989; Klein, 1985; Macmillan and Creelman, 1990; Nachmias, 1981; Pelli, 1985; and Thomas, 1985).

Are there multiple analyzers on a given dimension?

Experimental results in which two stimuli having values close together on some dimension (e.g. two grating patches of very similar orientation) interact more than stimuli having far-apart values (e.g. two grating patches of perpendicular orientation) are taken as evidence for multiple analyzers on that dimension. The kind of interaction demonstrated depends on the kind of experiment in question.

In an adaptation experiment, for example, an observer might adapt to a vertical grating by looking at it for a period of some minutes while moving his or her eyes in order to prevent conventional afterimages. Then the observer would be tested with gratings of a number of different orientations. Typically, the detection thresholds for test gratings similar in orientation to the adapting grating (e.g. somewhat tilted off vertical) would be elevated after adaptation, while the detection thresholds for test gratings very different in orientation (e.g. horizontal) would not be affected. This value-selective behavior is explained by assuming (i) that analyzers sensitive to different orientations exist, and (ii) that those analyzers that are sensitive to the adapting orientation were fatigued or inhibited in some manner by the adaptation period.
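
A toy simulation of this two-part explanation is sketched below. The Gaussian tuning shape, the bandwidth, and the size of the fatigue effect are all assumptions made for illustration, but the qualitative outcome matches the experiments: thresholds are elevated only for test orientations near the adapted one.

```python
import numpy as np

preferred = np.arange(0.0, 180.0, 15.0)  # analyzers' preferred orientations (deg)

def sensitivity(test_orientation, gains, bandwidth=20.0):
    # Detection is governed by the most responsive analyzer.
    tuning = np.exp(-0.5 * ((test_orientation - preferred) / bandwidth) ** 2)
    return np.max(gains * tuning)

gains_before = np.ones_like(preferred)
# Adapting to vertical (90 deg) fatigues analyzers tuned near 90 deg.
gains_after = 1.0 - 0.5 * np.exp(-0.5 * ((preferred - 90.0) / 20.0) ** 2)

for test in (90.0, 105.0, 0.0):  # vertical, slightly tilted, horizontal
    elevation = sensitivity(test, gains_before) / sensitivity(test, gains_after)
    print(test, round(elevation, 2))  # elevated near 90 deg, unchanged at 0 deg
```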

In summation experiments, the degree of interaction between two values is the degree to which the detectability of a compound stimulus containing both values (e.g. a superposition of two orientations) exceeds the detectability of each component. For example, a compound pattern composed of two gratings of very similar orientations (which probably both stimulate much the same analyzers) is much more detectable than a compound composed of perpendicular orientations (which probably stimulate different analyzers).

Even this latter compound containing perpendicular orientations is somewhat more detectable than its components, however, an effect that is usually explained as "probability summation". To understand probability summation, it may help to think about throwing coins, where each coin represents a set of analyzers and getting a head on the coin represents the analyzers' detection of the stimulus. For the compound stimulus containing two far-apart orientations, two sets of analyzers each have a chance to detect the compound (those sensitive to one component and those sensitive to the other), which is like throwing two coins in order to get at least one head (a probability of 0.75 for fair coin tossing). But for a stimulus containing only one of the orientations, only one set of analyzers has a chance to detect it, which is like throwing one coin in order to get a head (a probability of 0.50, lower than the 0.75 for the compound). Thus, if there is probability summation among analyzers, the compound should be somewhat more likely to be detected than either component even though the components are not stimulating the same analyzers. Probability summation is an example of behavior considered by multidimensional signal detection theory, behavior that needs careful consideration before psychophysical experiments can be properly interpreted.
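
The coin arithmetic is simple to verify; the detection probability below is the invented value from the analogy.

```python
p_component = 0.5             # one set of analyzers detects its own component

# The compound gives two independent sets of analyzers a chance to detect:
p_compound = 1 - (1 - p_component) ** 2
print(p_compound)             # 0.75: more detectable than either component alone
```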

In identification experiments, the degree of interaction between two values is the degree to which stimuli of those two values are confusable (e.g. a slightly off-vertical line being confused with a vertical line) as measured in any of several experimental paradigms. The confusion presumably occurs because both values stimulate the same analyzers. Here we consider only the identification of patterns which are themselves near-threshold, that is, imperfectly discriminable from a blank field.

In uncertainty experiments, an observer's performance is measured both when uncertain about which stimulus will be presented and when certain. If, for example, two stimuli are very far apart in orientation and an observer does not know which of the two will be presented on a given trial, the observer cannot detect either stimulus as well as he or she can when certain as to its orientation. This decrement due to uncertainty is presumed to occur because, when the observer is uncertain as to which of two far-apart orientations will be presented, the observer has to monitor two different sets of analyzers; when the observer is certain, however, he or she can monitor just one set. If, however, the two stimuli are quite similar in orientation, ignorance of which will be presented does not deleteriously affect performance, presumably because the observer is always monitoring just the one set of analyzers sensitive to both of the close-together orientations. (Uncertainty effects of the magnitude found in these near-threshold experiments do not appear to be due to limitations in attention capacity, however, but rather to independent noisiness in the analyzers, another example of the behavior considered by multidimensional signal detection theory.) In uncertainty experiments, in short, interaction between values occurs when uncertainty does not cause a decrement in performance.
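
A small simulation of this account is sketched below, assuming independent unit-variance Gaussian noise in each analyzer and an invented signal strength and criterion. Monitoring two noisy analyzers instead of one raises the false-alarm rate more than it raises the hit rate, producing the uncertainty decrement without any appeal to attentional capacity.

```python
import numpy as np

rng = np.random.default_rng(1)
trials, signal_strength, criterion = 100_000, 1.5, 1.0

noise = rng.standard_normal((trials, 2))  # independent noise in two analyzers
signal = noise.copy()
signal[:, 0] += signal_strength           # the stimulus drives analyzer 0 only

def yes_rate(outputs, n_monitored):
    return np.mean(np.max(outputs[:, :n_monitored], axis=1) > criterion)

for n in (1, 2):  # certain: monitor one analyzer; uncertain: monitor both
    hits, false_alarms = yes_rate(signal, n), yes_rate(noise, n)
    print(n, round(hits - false_alarms, 3))  # performance drops when n = 2
```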


Table 1 indicates on which dimensions value-selectivity has been found and, therefore, on which dimensions multiple analyzers are thought to exist. (Two notes of caution: The exact interpretation of each of these experiments is still incompletely understood and, therefore, bandwidth estimates may be revised. And answering unambiguously the question of multiple analyzers along any one dimension sometimes involves definitional issues spanning two or more dimensions. See, e.g., Graham, 1989.)

(see the answers in the cells of Table 1)

One conclusion about the existence of multiple analyzers along different dimensions is worth special mention. Although spatial frequency and temporal frequency are in many ways formally identical, the value-selectivity found on the spatial-frequency dimension is much more pronounced than that on the temporal-frequency dimension; in other words, each analyzer must respond to a much narrower range of values on the spatial-frequency dimension than on the temporal-frequency dimension.

Are the outputs of the multiple analyzers labeled?

To have a "labeled" output means that the higher stages of processing keep track of which output comes from which analyzer. For example, if the analyzers' outputs are labeled, the observer might be assumed to identify which of several stimuli was presented on a trial by finding out which analyzer had the biggest output on that trial. Although value-specific behavior in near-threshold summation and adaptation experiments can be explained without assuming that the multiple analyzers' outputs are labeled, value-specific behavior in uncertainty and identification experiments seems to require labeled outputs in its explanation (although the labeling may be fuzzy or otherwise imperfect).
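
If the outputs are labeled, identification can be as simple as reading off the label of the most active analyzer. A minimal sketch (labels and outputs invented):

```python
import numpy as np

# Labels: higher stages know which output came from which analyzer.
preferred_orientations = np.array([0, 45, 90, 135])  # degrees

def identify(analyzer_outputs):
    """Report the orientation whose labeled analyzer was most active."""
    return preferred_orientations[np.argmax(analyzer_outputs)]

print(identify(np.array([0.2, 0.1, 1.3, 0.4])))  # 90: read off the label
```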

Notice that on all but one of the dimensions in Table 1 where multiple analyzers exist, those multiple analyzers are labeled. The one exception is the eye-of-origin dimension. Although some analyzers respond better to one eye than the other, the higher stages of visual processing do not keep track of which analyzer prefers which eye well enough to enable an observer to identify which eye a monocular stimulus was shown to. Nor does uncertainty about which eye is stimulated affect performance. (The absence of labeling in these tasks may seem particularly surprising since, in order to do stereopsis computations, some higher stages of visual processing presumably are taking account of the eye each analyzer responds best to.)

Are the multiple analyzers probabilistically independent?

Are the outputs of different analyzers variable? If so, is the variability in the output of different analyzers probabilistically independent, so that random variability in the output of analyzers sensitive to one range of values does not correlate with the random variability in analyzers sensitive to a different range of values? Or, in other words, are the noise sources in different analyzers independent? Probabilistic independence among the outputs of multiple analyzers can show its effects in near-threshold summation experiments (as probability summation for compounds containing far-apart values), in uncertainty experiments, and in some forms of identification experiments. Experimental results suggest that, wherever there are multiple analyzers, they seem to be probabilistically independent (although the interpretation is complicated on the temporal-frequency dimension by the breadth of tuning of individual analyzers and, on the eye dimension, by the same factor and also by the lack of labeling).
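
A quick simulation contrasts independent with fully correlated analyzer noise (all values invented). Only independent noise gives two separate chances to exceed the criterion, the extra chance that appears psychophysically as probability summation.

```python
import numpy as np

rng = np.random.default_rng(2)
trials, criterion = 100_000, 1.0

independent = rng.standard_normal((trials, 2))  # separate noise sources
correlated = np.repeat(rng.standard_normal((trials, 1)), 2, axis=1)  # one shared source

for name, outputs in (("independent", independent), ("correlated", correlated)):
    p_either = np.mean(np.max(outputs, axis=1) > criterion)
    print(name, round(p_either, 3))
# independent ~0.29 (two chances), correlated ~0.16 (effectively one chance)
```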

Is there mutual inhibition among the analyzers?

When stimuli that are far apart on the dimension of interest produce an effect that is opposite to the effect found when stimuli are close together, mutual inhibition among analyzers is a possible interpretation. This kind of evidence for inhibition has occasionally been reported in adaptation, summation, and identification experiments. Another kind of evidence for inhibition is to find that adapting to a compound stimulus produces less effect than adapting to one of the components by itself. For both kinds of evidence, however, reasonable alternative explanations not requiring inhibition have also been put forth. Consequently, the psychophysical evidence for mutual inhibition from near-threshold experiments is not compelling. On the other hand, such inhibition is found in cortical physiology, and appears to be useful in the explanation of suprathreshold results. Thus a "perhaps" is listed in Table 1 for mutual inhibition whenever evidence consistent with inhibition has been observed in near-threshold psychophysical experiments.

 

Conclusion

The many hundreds of near-threshold pattern-vision experiments published in the last three decades form an impressive and compelling body of evidence for a model of pattern vision in which a fundamental stage is a set of multiple analyzers, acting in parallel, with different ranges of sensitivity along each of a number of different dimensions.

Considered together with the neurophysiological literature of the last three decades, this evidence makes it seem likely that the physiological substrate of these analyzers is area V1 (the lowest level of cortical visual processing) and perhaps V2.

As is consistent with this presumed physiological substrate, these multiple analyzers apparently sit at a relatively low level in the full stream of visual processing (although coming after a number of other processes, e.g. light adaptation). It seems clear that much complicated computation intervenes between the analyzers' outputs and observers' perceptions.

In other words, the last three decades have told us much about how our visual systems analyze the proximal visual stimulus into parts. A major challenge for the future is to find out how the parts that result from this analysis are "put back together" into a perception that generally corresponds very well to the distal stimulus -- the objects arranged in the world that the perceiver must know about and interact with in order to survive, the "what" is "where." In trying to do this task, we can build on the precise quantitative knowledge about the multiple analyzers that we have gained over the last three decades from both physiology and psychophysics, particularly from near-threshold psychophysics.

References