Seminar:             University Seminar on Cognitive and Behavioral Neuroscience (603)

 

Date:                March 27, 2003

 

Title:                The Biological Basis of Speech: Talking to the Animals and Listening to the Evidence

 

Speaker:           J.D. Trout, Ph.D., Philosophy Department and Parmly Hearing Institute, Loyola University of Chicago

 

Participants:            Herb Terrace, Co-Chair

                        Peter Balsam, Co-Chair

                        Robert Remez

Jon Allen

Peter Balsam

Colin Beer

Bill Benzon

Gina Cardillo

Joseph Cesario

Martha Chaiken

Josh Davis

Bridgid Finn

Jessica Goldberg

Nate Kornell

Robert Krauss

Jeffrey Loewenstein

Dustin Merritt

Ezequiel Morsella

Tammy Moscrip

Rebecca Passonneau

Lois Putnam

Kelley Remole

John Saxman

Eric Schoenberg

Ann Senghas

Yaakov Stern

Michael Studdert-Kennedy

Andrew Sunshine

Robert Thompson

Athena Vouloumanos

Cynthia Yang

 

 

Rapporteur:             Jennifer Pardo

 

Summary:

 

Roughly half of this talk comes from a review paper and the other half from new work on specialization of speech processes. In the interest of full disclosure, if I seem negative toward primitive attempts at comparative research on mammals, it could be because my son didn’t compare favorably to a 3-year-old Congo Gray on some object permanence tasks at my 11-year-old niece’s science fair. The speech is special (SiS) thesis has a long history, and we’ll see throughout that this thesis is sometimes confounded with a broader thesis about language being special, but I want to defend the thesis that speech is special. We can construe it as the doctrine that production and perception of speech are uniquely human adaptations, rooted in human biology. The history of this notion goes back a long way, to Eric Lenneberg and Al Liberman, early and mid-career Noam Chomsky, and Steve Pinker.

Auditorism is offered as an alternative to the hypothesis that there are specialized mechanisms for processing sequences in language. Auditorism is the implicit hypothesis that only general auditory mechanisms are required to explain the distinctive achievements of speech perception. I say implicit in part because there is no programmatic or systematic defense of auditorism so named, but its proponents are identifiable. The hypothesis trades on the idea that we share auditory capacities, such as frequency analysis, with organisms beyond our evolutionary lineage. That hypothesis fuels another: that if you can make non-human organisms perform similarly to humans on selected speech tasks, then there is no reason to postulate specialized speech mechanisms.

There is diverse biological evidence for the SiS hypothesis. PET and fMRI evidence shows that different brain areas are used when people perform auditory tasks, like pitch judgments, than when they perform speech tasks, like phoneme identification. The area that activates most distinctively for the speech task is Broca’s area, which is associated with preparation for articulation. In making even a simple phonetic judgment, one is in effect preparing for articulation. Next, there is a well-established critical period for language and speech acquisition, but not for audition. This is a general phenomenon, found not just in speakers but also in signers. If signers don’t begin before the end of this period, they end up with correlative forms of agrammatism and other sorts of disorders. Like speakers, they can still learn after that, but they deploy general problem-solving strategies that are much more onerous. In addition, aphasia is autonomous from auditory function: aphasic patients typically show normal audiograms, so there is no necessary auditory basis for a speech-relevant disorder. Moreover, dichotic listening tasks demonstrate a right-ear advantage for speech tasks. The effect is small and complicated, but nevertheless real. A recent article by Hickok makes it clear that a certain amount of speech processing is done in both hemispheres, but the connections are better on the left side. That places some restrictions on the kinds of animals that can be used in a comparative paradigm, because some animals’ brains aren’t even lateralized.

In terms of speech perception research, the most prominent paradigm is categorical perception, and this is the task that researchers at Wisconsin and Texas used with quail to form the basis for auditorist theses. In this paradigm, a series of sounds varying along a single acoustic dimension, such as voice onset time (VOT), is created so that there are items with intermediate values. Listeners do not identify these intermediate items as different from the endpoint items; they label each one as either “ba” or “pa.” The idea was that if you could train quail to respond as humans do, then they too would be displaying categorical perception.
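The identification-and-discrimination logic of this paradigm can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular study: the 0 to 60 ms VOT continuum, the 25 ms boundary, and the logistic slope are all assumed values. The discrimination prediction uses the classic labeling-based formula for ABX accuracy, 0.5 × (1 + (pA − pB)²), which assumes listeners can discriminate tokens only through the covert labels they assign.

```python
import math

# Assumed, illustrative parameters (not taken from the talk or any study)
BOUNDARY_MS = 25.0  # VOT at which "ba" and "pa" labels are equally likely
SLOPE = 0.5         # per ms; larger values give a steeper, more "categorical" function

def p_pa(vot_ms: float) -> float:
    """Probability of labeling a token "pa" (logistic identification function)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

def predicted_abx(vot_a: float, vot_b: float) -> float:
    """Labeling-based prediction of ABX accuracy: 0.5 * (1 + (pA - pB)**2)."""
    return 0.5 * (1.0 + (p_pa(vot_a) - p_pa(vot_b)) ** 2)

continuum = list(range(0, 61, 10))  # 0, 10, ..., 60 ms VOT

# Identification: each step gets a probability of "pa" and a modal label
for vot in continuum:
    label = "pa" if p_pa(vot) >= 0.5 else "ba"
    print(f"VOT {vot:2d} ms  P(pa)={p_pa(vot):.3f}  label={label}")

# Discrimination of adjacent steps, predicted from the labels alone
for a, b in zip(continuum, continuum[1:]):
    print(f"pair {a:2d}-{b:2d} ms  predicted ABX={predicted_abx(a, b):.3f}")
```

With these assumed parameters, labels switch from mostly “ba” to mostly “pa” between 20 and 30 ms, and predicted discrimination stays near chance (0.5) for within-category pairs while peaking for the pair that straddles the boundary. That boundary-centered pattern is what makes perception “categorical.”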

There is also auditory-visual speech perception evidence, which is absent from animal research except for a rough orientation to the face. In humans, the ability to comprehend running speech is enhanced by access to the movement of the lips. In the standard experiment, researchers degrade either the sound (by adding noise) or the visual image (using sanded plexiglass), shifting listeners’ reliance toward vision or audition. Others manipulate cognitive load as well: one set of researchers presented a passage from Kant’s First Critique, spoken by a talker whom listeners either could or could not see, and the listeners had to shadow the passage. There were fewer shadowing errors in the audio-visual condition. The integration appears to occur early in perception, not as the result of independent auditory and visual perception. Finally, there is the characteristic symmetry between speech perception and speech production: there is a relationship between being able to perceive certain categories of speech and being able to produce them. This finding is corroborated by some of the PET findings connecting phoneme identification with the area associated with preparation for articulation.

The main effort to debunk SiS and the motor theory of speech perception began with the categorical perception task. The method was the operant training of quail. Another version was done earlier with chinchilla by Miller and Kuhl, but their conclusions were much more hedged, and they interpreted the similarity in the performance functions as an evolutionary predisposition to mark category boundaries for perceptual reasons. In the quail studies, the training proceeds by presenting the animals with CV combinations in which the F1 onset frequency is manipulated to create a continuum of variable CVs between the two relevant speech sounds. The dependent variable is pecking behavior, and the animals are reinforced when they identify the item correctly. They are trained on the clear cases, then given a generalization task where they are offered a few novel intermediary versions. The quail do show similar identification performance functions to humans in this kind of task.

There are a couple of points to keep in mind about the task. There is a lot of individual variability, believe it or not, among the quail in their ability to perform the task. Some birds are given up on, which is always sad. Some quail require three times as much training as others. Each quail in any experiment requires a minimum of 4,000 trials over relatively brief periods of explicit reinforcement. In the 1987 study, in which there were only 3 or 4 quail, one quail required over 12,000 trials. In the 1991 paper, only one required as few as 4,000 trials.

The SiS rejoinder to this debunking design is as follows. Humans learn to speak and to comprehend speech, and an aspect of this knowledge is phonetic. Quail are trained to identify sound patterns outside of a linguistic domain. Human psychophysical performance reflects only a small part of human competence in language. Quail psychophysical performance in these studies is their only contact with language. Therefore, the suggested analogy between human and quail psychophysical performance is inaccurate, and the comparison is unilluminating.

I refer to the auditorist projects as refutation projects. The structure of the argument and the motivation for the experiments, as stated in these articles, have nothing to do with understanding the quail auditory system any better, nor with understanding quail perceptual organization. It is an argument for reorganizing and reorienting the research program on the contemporary scene in speech perception. That is, we should shift attention to the mechanisms of human hearing because we know a lot more about hearing than about speech mechanisms, if there be such. That promises to allow us to solve problems in the relatively near future.

The auditorist refutation project states something like the following:  Nonhuman animals display the same response patterns as humans when presented with CV syllables. Parsimony demands that we infer that similar mechanisms are responsible for similar response patterns. Therefore, the same mechanisms are employed by humans, chinchilla, and quail. This argument is invalid without an additional premise: that the only evidence bearing on this hypothesis is the behavioral evidence in these experiments. However, if you include any of the developmental evidence, any of the biological evidence, or much of the behavioral evidence in humans, you are not driven to the conclusion that auditory mechanisms are the only ones at play in speech processing. And even if that additional premise were true, the argument would be unsound, because other premises are false.

The structure of the refutation argument looks like the standard behaviorist argument that similarity of behavior indicates sameness of mechanism. But similar behavior does not imply similar function or biological mechanism, yet that is the conclusion that’s drawn (the title of the 1991 paper uses the word “mechanism”). As the famous Breland & Breland paper illustrates, mere behavioral isomorphisms come cheap, too cheap to earn their explanatory keep. There are elephants that have been trained to walk bipedally, horses that can count in base 10, and fleas that pull carriages, and nobody suggests that these show that humans don’t have specialized anatomy for bipedal gait. Yet that is the logical structure of the auditorist refutation argument.

Compounded with these problems is what you could call misleading attributions. When the individuals who do this research describe the quail’s behavior, they often say things like “for birds trained to peck to CVs with low F3 onset frequencies” and “peck rates increased for CV syllables following /al/,” descriptions that paper over the difference in perspective that one might say exists between the quail and the human. Everybody would agree that it’s misleading, at least in a certain sense, to say that Oedipus wanted to marry his mother. He wanted to marry Jocasta, and it happened that Jocasta was his mother, but that fact was opaque to him. So transparent attributions are overly generous: they make the subject of the study appear wiser than it in fact is. CVs don’t appear in the electrical vocabulary of quail, so this attribution makes it appear as if they are conceptualizing the pattern in some way.

The fact is, we don’t really know what it is about the stimuli that they are responding to; they may be responding to some acoustic feature that they are really good at picking up on. I think that’s the key to a really interesting experiment that could be run:  train quail to perform on speech materials better than humans can. Then there couldn’t possibly be a speech-based explanation for why they would have that adaptation, and it would be obvious that some other feature of their auditory system is available for them to draw on in performing the categorical perception task. It should also be noted that data collection begins as soon as the quail reach 90% performance in training, which is considerably worse than what a child could do under similar circumstances.

Doctor Allen:  Are you arguing against perceptual features in speech? Because that’s what it sounds like. If you take a particular feature and the fact that a quail can recognize it, it is an argument for the primitiveness of that feature, but there are many other ways of getting at this question, you don’t have to use quail. I would say that there’s a lot of evidence that there are some basic features that are pre-phonemic that are recognized, and this goes back to Miller and Nicely and some others. You’re arguing against them.

Professor Trout:  No, my point is that if there are these basic features, they are features that may be useful to the quail in their environment, but they are certainly not features in the sense of phonological features, for example. It might be that animals statistically track changes in distributional information, and some of them can do that relatively well, but what they can’t do very well is learn the homorganic constraint, or anything else that requires the recognition of a phonological feature or a hierarchical relationship.

Doctor Allen:  They can use pitch to communicate, they have primitive pitch codes. It’s obvious to me, I don’t know if you agree, that that is a form of a feature. Also, durations and things like that, in some kind of a binary representation.

Professor Trout:  Yes, I should just make clear that I don’t deny that there’s some ability that these quail are using, but when humans make the discriminations and identifications, it’s likely, given the specialized processing that I’ve been talking about, that they’re making it on a different basis, a different mechanism is involved.

Doctor Allen:  I think that there’s strong evidence to indicate that that’s not the case, like Miller and Nicely.

Professor Trout:  As long as it turns out that the chinchilla are exploiting a mammalian feature that other mammals have as well, an innate predisposition of some sort, we may share a common basis. But there are many linguistic contrasts that we mark that are not of that sort, that are not just a pitch change or something like that. And there’s a reason for that as well. When we listen to language, it’s easy for us humans to track formants in the sense that we’re sensitive to formant changes at particular points, but it’s very hard for us to track F2 alone, for example. That’s because the linguistic nature of speech suppresses our efforts to hear it as a mere acoustic property. Yes, there are acoustic features that we may be able to track, like pitch contours for pragmatic implicatures, but much of the information that we do track is higher-level linguistic information that is above the fray of quail.

Here are some examples of the attempt to paper over the differences by making transparent attributions. The more grotesque versions would be to say something like: when I come home, my dog is happy to greet the lover of earthy poetry. He is responding to something in me, but he’s not conceptualizing me under that mode of presentation. In the same sense, if you could train quail or starlings to discriminate frequencies in a musical score, it wouldn’t necessarily show that they conceive of it as anything other than a sequence of acoustic properties; they are not appreciating any subtleties of Puccini. It is important to try to represent the mode of presentation under which the organism is capable of understanding the stimuli presented.

There are cognitive approaches that I regard as largely above the fray of this dispute. Many of the studies on numerosity in apes, on object naming by Alex (Pepperberg’s Congo Gray), and of the sort that Hauser is doing on tracking hierarchical information in humans compared to tamarins don’t suffer from the same problems. They may suffer from methodological problems, but if they do, those problems are of a different sort. First of all, there is no explicit effort in those studies to draw behaviorist inferences of the sort that I’m discussing, from similarity of performance to sameness of mechanism. Second, they have wisely chosen species for which an argument from homology could be made. Finally, they involve largely cognitive tasks; again, there is no effort to eliminate appeal to cognitive mechanisms, as in traditional behaviorism.

I want to consider a defense of auditorism that rests on the distinction between speech and language. This move comes early in any such dispute: the claim that all of the difficult problems lie with language. Among these are the hierarchical relations in syntax that other animals clearly aren’t able to appreciate, whereas speech, on this view, could be understood in terms of hearing. Speech is a waveform that the auditory system is suited to process, but language concerns higher-level cognitive or central processes. One reply is simply to point out that it is not clear what the distinction is, or where it is drawn between speech and language, especially when you have phenomena like coarticulation effects or hierarchical relationships among even short sequences of speech. The homorganic constraint does not range over very long sequences of speech; it is relatively local. Consider the rule that a nasal consonant and an immediately following obstruent must share the same place of articulation [speaks these examples:  ankle, amble, antler, angle]. If one considers a phonological rule of this sort determining the articulation of such short sequences, it becomes difficult to distinguish a proprietary speech stimulus from a proprietary language stimulus.
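A toy check of the homorganic constraint makes the locality point concrete: the rule only ever inspects two adjacent segments. The sketch below is hypothetical and simplified. The ASCII phoneme transcriptions, the small place-of-articulation table, and the restriction to stops (rather than all obstruents) are assumptions made for illustration, and “*amkle” is an invented violation, not a word.

```python
# Simplified, illustrative place-of-articulation table (assumption, not exhaustive)
PLACE = {
    "m": "bilabial", "p": "bilabial", "b": "bilabial",
    "n": "alveolar", "t": "alveolar", "d": "alveolar",
    "ng": "velar", "k": "velar", "g": "velar",
}
NASALS = {"m", "n", "ng"}
STOPS = {"p", "b", "t", "d", "k", "g"}  # stops only; fricatives are ignored here

def homorganic_ok(phones) -> bool:
    """True if every nasal immediately followed by a stop shares the stop's place."""
    for first, second in zip(phones, phones[1:]):
        if first in NASALS and second in STOPS and PLACE[first] != PLACE[second]:
            return False
    return True

# Rough ASCII transcriptions (hypothetical); "@" stands in for a reduced vowel
examples = {
    "ankle": ["ae", "ng", "k", "@", "l"],
    "amble": ["ae", "m", "b", "@", "l"],
    "antler": ["ae", "n", "t", "l", "@", "r"],
    "angle": ["ae", "ng", "g", "@", "l"],
    "*amkle": ["ae", "m", "k", "@", "l"],  # invented violation: bilabial nasal + velar stop
}

for word, phones in examples.items():
    print(word, homorganic_ok(phones))
```

The check needs no more than a sliding window of two segments, yet what it tests is a phonological fact about place features, which is the sense in which such a rule blurs any neat line between a “speech” stimulus and a “language” stimulus.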

There are other reasons to distinguish between speech and language, and one of them just has to do with good housekeeping. We need a taxonomy to work from, and working with one set of stimuli may be more appropriate for some purposes than for others, but be clear that it’s a conventional taxonomy, and that you needn’t expect the natural world to cooperate with the taxonomy that you are imposing. You have to be open to discovering that the world does not respect the contours of your theory.

In a very recent article in Science, Hauser, Chomsky, and Fitch construct a new theory to approach the problems of comparative method and offer a new explanation for the distribution of these psychological features in language. They propose that there is a faculty of language in the broad sense (FLB), on which the basis for speech perception is shared with non-human animals. The faculty of language in the narrow sense (FLN) would be a subset of that. FLB includes 1) a sensorimotor system, 2) a conceptual-intentional system, and 3) computational mechanisms for recursion. They think that #3 is the only uniquely human feature of language, and that #1 is so well established as shared among humans and non-human animals that it deserves presumptive favor, or status as the null hypothesis. It is worth asking in a situation like this: what are the alternatives to the hypothesis that deserves presumptive favor here?

An interesting aspect of the paper is that they are very clear that elaborate training of non-human subjects is a threat to validity; even at the close of the article, they note that, importantly, they are running non-human animals (tamarins in one case) with a minimum of training. It turns out that there are some linguistic tasks that the tamarins just cannot perform, even after a lot of training. If elaborate training is a threat to validity, why is #1 asserted as the null hypothesis? All of the evidence that we have from non-human animals is the result of elaborate training. Another point is that, if one considers the fMRI or PET evidence, and areas devoted to articulation are activated as part of a sensorimotor system, that must be a disanalogy with quail, because they do not have the motor system for producing speech.

Doctor Allen:  You seem to be ignoring all the neurophysiological evidence of what goes on in the auditory cortex. It’s obvious that if you stick an electrode in a cat, you can get a lot of information that is clearly similar to what is going on in humans. There is a lot of overlap based on neurophysiological evidence, and that’s also being supported in the PET and other sorts of imaging studies.

Professor Trout:  That’s right, and even in the Hickok and Poeppel research that I mentioned earlier, the auditory system is very richly involved in speech processing, along with other areas of the brain. You don’t get the same kind of activation if you use sub-lexical items that you get if you use entire words. I’m going to argue that there are important contributions that audition makes, and you can see where those contributions are important when you look at the reading disorders literature as well, which I will discuss later. I’m not ignoring the contributions that audition makes. I’m just saying that if your argument is that humans don’t have specialized speech mechanisms because quail have a similar performance function on a categorical perception task, or if you’re going to argue that the basis for speech perception is shared with non-human animals’ sensorimotor systems, then it is peculiar that the motor area gets activated in humans, since there couldn’t be any correlate in quail for the motor system associated with speech production.

The faculty of language in the narrow sense (FLN) includes only the computational mechanisms for recursion, and it is the only uniquely human component of the language faculty. These mechanisms are responsible for some of the more syntactic features of language, namely, discrete infinity, embeddedness, and hierarchical structure. Hauser et al. argue that FLN could have evolved independently to serve the adaptive advantage that might be conferred on an organism that can pursue Machiavellian social schemes, an ability that might allow such organisms to survive and reproduce more of their kind. The problem is that they offer little more detail, so without more constraining information you have to worry that the evolutionary explanations are “just so stories.”

Attempting to constrain such evolutionary explanations is a healthy project, in part because it allows one to be clearer about whether performance across different species can really be explained by biological homologies, or whether there are homoplasies, or analogies, between organisms. In the classic comparative example, the human eye and the octopus eye are analogies rather than homologies: they evolved independently, because the environmental constraints on focusing a clear image onto a sheet of cells converge on similar solutions. This is one way to think about different species having similar performance without having homologous mechanisms for solving the problem.

The auditorist position hasn’t been defended in much positive detail, though there are allusions to hearing being relatively better understood than speech, along with auditorist promissory notes about how reorienting the research program from speech perception toward the auditorist view would be healthy. The reason most often given is that speech perception is a tractable problem on the auditorist view: if we understand so much about hearing mechanisms, and speech perception is reducible to hearing, then it’s at least possible to solve some of the problems, like that of acoustic variability. If you are searching for the content of the claim that speech perception is a tractable problem, one thing it might mean is that speech perception is a natural process, and if it is a natural process, then of course it is tractable in a certain respect. It is part of the world, to be understood with experimental methods and theoretical inquiry, and it would fail to be tractable only if the real solutions to problems of speech perception permanently outstripped our computational capacities. Perhaps more is meant than that.

One thing that might be meant is that, understood in terms of audition, whose mechanisms are well documented, speech perception is a tractable problem. This trades on the understanding that we have of hearing mechanisms. Another possibility, and I think this is getting closer to the bone, is that clinical problems could be solvable in 10 years or so. The suggestion is more that we’re at the horizon of solving some stubborn clinical problems that require the development of distinctive technologies. It’s easier to imagine how those problems could be ameliorated with clinical solutions if speech perception is reducible to hearing, because there is a lot of technology for hearing. Part of the thrust behind the view of speech perception as a tractable problem may also have to do with hopes for the construction and patenting of prosthetics.

One finding that has often given solace to the auditorist understanding of speech perception is evidence of an auditory basis for selected reading disorders. The finding most frequently mentioned in the reading disorder literature is a difficulty in processing duration information. In these studies, researchers identify people with specific reading disorders and examine their performance on auditory tasks. These subjects also perform poorly on triad tone tests, in which each tone takes one of two durations (250 ms or 500 ms) and the task is to report the sequence of short and long tones. Such subjects have difficulty processing short-term auditory information. Some auditorists have recruited this finding as evidence that speech perception is an auditory process; in particular, the phonological encoding that’s crucial to reading is said to be hampered by duration processing disorders. On that view, however, the basis for reading is not identical with the basis for hearing, because if x (a hearing disorder) causes y (a reading disorder), then x cannot be identical with y.
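The structure of the triad tone task can be sketched as follows. Only the two durations (250 ms and 500 ms) come from the description above; the S/L response code and the all-or-nothing scoring of each reported sequence are assumptions for illustration, and actual studies may construct and score trials differently.

```python
from itertools import product

SHORT_MS, LONG_MS = 250, 500  # the two tone durations described in the talk

def all_triads():
    """Every possible sequence of three tones, each short or long (2**3 = 8)."""
    return list(product((SHORT_MS, LONG_MS), repeat=3))

def encode(triad):
    """Render a triad as the S/L string a listener would report, e.g. 'SLS'."""
    return "".join("S" if d == SHORT_MS else "L" for d in triad)

def score(triad, response):
    """Hypothetical scoring rule: 1 if the reported order matches exactly, else 0."""
    return int(encode(triad) == response.upper())

triads = all_triads()
print(len(triads), [encode(t) for t in triads])
print(score((250, 500, 250), "sls"), score((250, 500, 250), "SSL"))
```

Because the listener must report the order of all three durations, a single misjudged tone makes the whole trial wrong under this scoring rule, which is one way such a task can expose difficulty with short-term auditory information.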

Any experimental manipulation is done in the context of a particular theoretical and hypothetical construct, so one cannot simply resort to striking the pose of pure empiricism. In a well-ordered science, there are priorities set for what the big questions are and how they should be funded, among other things. In many domains, including cellular development, there are payoffs expected in some unknown future, and the decision to fund basic research cannot hinge on what those might be because they are unknown. Yet, decisions about the scientific significance of particular theoretical approaches are typically made in terms of how the resources would be used and how much success could be expected based on those allocations. There is a certain value that could be applied to the goal of truth in theoretical understanding, but it may be that funding priorities under limited resources really drive people’s theoretical rhetoric about which positions are most plausible.

If you have a view that you can hold out that might contribute to patents and might contribute to certain kinds of clinical solutions, it’s not only good press, but it is also good for the closing paragraph of a grant application. Some of it may be true, but the point here is that there is no reason why a theory of biological specialization for speech should perform much worse at that task than an auditorist theory. In addition, some care and responsibility must be exercised when you promote views based on the hope for patents and clinical promise because you’re talking about a vulnerable population that is desperate for solutions. We should be concerned that the standards of peer review are different from the standards of accreditation.

Those are some of the reasons that the SiS dispute may matter. It may help define what an ordered science in the area would look like—what a unified theory would look like that is general, that draws on diverse fields, rather than simply audition, and that takes a proper perspective on the relation between basic research and technological application.

In summary, the cross-species behavioral similarities recruited by auditorism are unintelligible until we know more about what the quail is actually experiencing. The structure of the refutation project, with its premature appeal to parsimony, is invalid. Most support for SiS is untested on non-human animals. By contrast with auditorism, the theoretical perspective of SiS is extremely complicated, which may make it less attractive for politically interesting reasons. The auditorist perspective is comparatively simple in casting these long-standing problems in speech as practical. The impact on funding priorities, such as big questions versus local applications, may drive the discussion, but SiS can promise to organize the big questions.

[Jen’s rejoinder:  If the problem is so tractable, and we already know so much about the auditory system, then why don’t we have a speech recognizer that can spare me the 8 hours I spent transcribing this talk? Not that I mind listening to the talk, but I would have preferred to be playing with my kids for the other 7 hours….]

APPLAUSE

Questions

Professor Krauss:  This is a question that I would only ask a philosopher. That is, not whether SiS raises a question that’s important, but whether it is a question that’s answerable. Unless you’re a creationist, you have to assume that at some remove at least, there is continuity between what we do and what other species do, and if you believe that the species is identified by the fact that it is in some respects unique, it’s unlikely that you’re going to find another creature that is going to do exactly what we do. So, to say that speech is special in some sense is a no-brainer because we don’t see very much speech in other species, in the same way that virtually everything about any species is special. Then, the really interesting possible question is what is there about humans that enables this facility that isn’t present in other species? I want to know how special does it have to be in order to be interesting?

Professor Trout:  I don’t know whether this is largely a rhetorical effect, but the thought that it is would probably be cynical. Evidence from brain localization that some find persuasive does not simply suggest that these mechanisms are innate. Rather, over the process of learning we may develop an ability to do something that is species-specific. It is not just that we come endowed with certain mechanisms, but that those mechanisms are time-locked in an interesting way. One quick answer to your question would be if you could find mechanisms associated with developmental features that seem to be significant only for humans, but the kind of evidence there is thus far for speech cases relies on behavioral isomorphisms.

Professor Krauss:  I just want to make two points, and then let someone else get in. One point is that you are able to make that argument without referring to any other species, which I think is fine. The other point is that saying that there is a structure in humans that seems to be associated with speech and that doesn’t occur in other species isn’t, I think, convincing evidence that that structure is responsible, or even necessary for speech.

Professor Trout:  That’s right. The second component, what Hauser et al. refer to as the conceptual-intentional system, is something that can be interestingly inquired about across species. You can ask whether an animal is really capable of referring, which is some of the research that Hauser has done. If they can, then they’ve achieved some intentional state. Of course, there’s a question about what constitutes referring, but the idea is that you can look for the same features in human language. Notice, though, that I’ve changed from speech to language. Yet speech has indexical characteristics: you’re using words to pick out objects.

Professor Krauss:  Well, within the domain, vervets do the same thing. That’s why framing the issue in terms of the specialness of speech seems to make it an uninteresting question, or maybe an unanswerable one. What is answerable is what we do with the capacities that we have, and what other species do with what they have.

Professor Trout:  You might use the kinds of predictions the theory makes as a measure of its fecundity. It would be a serious objection to SiS if you were to find that it is vacuous, but it has never been subjected to that critique; rather, the criticism has been that it is unduly substantial. There have been predictions that motor theorists have tested, and the particular motor-theorist response to the quail findings was that there was never any reason to predict that mammalian hearing couldn’t mark boundaries that might be auditorily important, but it is surprising that the boundaries that were auditorily important to the quail happened to mark categorical boundaries in human speech.

Doctor Allen:  I’m greatly confused by what you are saying about SiS. At some point, you have language, and that’s special: there’s a big difference between English and Chinese and French, and those are special. At one end of the spectrum, clearly there’s something that’s special because there’s a difference between French and English, and that’s special. But at the other end, you have the input device, where speech goes into the ear and is processed by the auditory system; that’s one input to language. You could also sit down and read something; that’s another input to language. So you can get your information by reading it off of a page or by hearing it through the auditory system. It seems to me that most of the time you’re talking about speech, meaning the oral form of communication, and then sometimes you mix in the visual cues, and then you end up talking about language, which is special. It seems to me the real question here is at what point it becomes specialized. Clearly the auditory system is not specialized, and actually, speech is a very, very simplistic signal compared with picking out a violin in an entire orchestra. Speech is a very primitive form of auditory signal; you could do a much more complicated task than listening to speech. So, it starts off as an acoustic waveform processing problem for the auditory system, and eventually it is presumably decoded down into some basic units, which may or may not be special. I would argue that they are not special; those are very basic feature detectors, just as vision has very basic feature detectors. Once it gets up into the modality of language, it becomes special, but that doesn’t have anything to do with speech.

Professor Trout:  There are a couple of ways of responding. I don’t think that everything that you’re saying is unfriendly to the proposal that I’m making, but I have to do some assimilation. In the sense of special that I’m using here, the differences between Chinese and French and American English are not special. There are differences across those languages that have to do with tonality and lots of other things, but the traditional arguments for the specialization hypothesis always had to do with things like the argument of the poverty of the stimulus as well as motor theory kinds of arguments. There’s a distinct pace and sequence with which children learn syntax, and it can happen in a relatively impoverished linguistic environment, whether you’re Chinese, French, or English-speaking. I had a grandma in Buffalo who was always saying, “No, no, no, it’s ‘I ain’t going to the store,’” so this is really bad feedback. So the differences between languages are not special, but other aspects are special. I don’t recall the second part of your question.

Doctor Allen:  I’m really looking for where is the definition of what’s special? It’s not special at the output of the cochlea, and maybe it’s starting to become special when you get to the auditory cortex, but we don’t even know that. By the time you get into the realm of language, and understanding, and syntax, you know, high-level context, then it can be special, but I don’t think that’s surprising. Nobody’s arguing about that. When you say speech is special, what do you mean is special?

Professor Trout:  One point that I was trying to make was that it’s not clear where you draw the line between speech, where a lot of the sequences are relatively tractable as matters of hearing, and language; it may be unclear whether that boundary is a natural one. In the cases of short sequences that are governed by phonological rules, you have something that’s a relatively simple sequence that is still constrained by higher-order rules that are specialized for linguistic processing. There’s not going to be a simple answer to your question, like “it has to happen before 200 ms if it’s going to be special.” But there are answers in terms of what kinds of predictions you would make about ease of processing that implicate the use of the lexicon. For example, I have just completed a set of studies that any auditorist should love. They use 5-channel noise-band speech (Shannon speech) with easy words (high word frequency, low neighborhood density) and hard words (low word frequency, high neighborhood density). Performance is quite different for these words: recognition rates are much higher for easy words. There’s an independent finding about cochlear implant patients, that the most successful patients are the ones who have a more developed lexicon to begin with. There’s a straightforward proposal to make: if you’re going to create training materials, build them from easy words. Now, the output of the electrical leads in the cochlear implant is noisy, but because of the tonotopic organization of the basilar membrane, it produces a signal that subjects can remap by drawing upon their lexicon, that is, upon the words that they already know. Are they implicating general learning strategies to do that? We’re talking about noisy outputs of electrical leads and the use of the lexicon; we’re not really sure at what stage that interface occurs.

Doctor Allen:  You’re really out on a limb if you’re trying to understand normal human speech recognition by looking at cochlear implants. The articulation index, from 40 years of study at Bell Labs, showed that, starting with the signal-to-noise ratios in just 20 bands, you could predict recognition of nonsense phones, from the nonsense phones you could predict words, and from the words you could predict a certain probability correct on sentences. So the whole thing is a hierarchical, probabilistic description of how you can go from some very, very primitive measurements to predicting human performance on complex speech in noise. This is from something like 50 years’ worth of research.

Professor Trout:  Well, I’m not out on a limb: I can make the argument without pediatric cochlear implant patients, though I would add them to the fold, because I used normal-hearing subjects listening to 5-channel speech. The expectation is that normal individuals who are challenged by 5-channel speech will display the same advantage for easy words, and they do.

Doctor Allen:  Entropy in language is important, and what you’re studying is about the entropy in language. If you use a high-entropy task and a low-entropy task, then you’re going to see a huge change in performance.

Professor Trout:  So, there are a couple of different ways you can go, and some of them may seem like trivializations of the more substantial view that you’re offering, but one way that people might traditionally have made this point in the 1980s and early 1990s is that specialization consists in how precompiled the system is: whether there are very complicated access codes already in place that transform stimuli.

Doctor Allen:  You mean hard-wiring, don’t you? If we were hard-wired, we could only learn one language, right?

Professor Trout:  It depends on the stage at which hard-wiring occurs. People agree that at the sensory stage, humans receive relatively similar input, and normal humans are going to react in similar ways at the sensory periphery, but even though their systems are hard-wired, they don’t lack plasticity, so they can learn different languages.

Doctor Allen:  They can learn different phonemic features, even, just like the quail.

Professor Trout:  If what the quail are learning is phonemic features, as opposed to correlated markers.

Doctor Morsella:  One quick question. Your listing of the fallacies is very well done, because you see those fallacies everywhere in research on function. Another fallacy that may be related is that, as you know, early in behaviorist research, people believed that behavior is due to very simple mechanisms. Then they found several cases in which behaviors could not be explained by simple operant principles, so they said that rats have cognitive maps. What has happened since is not that people now doubt whether such complicated mechanisms are at work in simple things, but that once people find a complicated mechanism, they apply it to the simpler things as well. The issue with the auditory account is the idea that because quail are simpler, the process is simpler.

Professor Trout:  If you have really simple animals, then one could argue that S-R psychology wouldn’t do such a bad job. Two points that I underplayed were the discussion of parsimony and, what I didn’t talk about at all, the principle of total evidence. The principle of parsimony has to be applied very carefully because it is a methodological principle that could be misapplied if you have a really good theory. People are oftentimes opportunistic in the way that they apply it. For example, I teach at Loyola, so I often hear the argument that believing in God is actually more parsimonious than not believing in God, because not believing in God requires a relatively complex account of the world, whereas monotheism commits you to the existence of just one entity. It is parsimonious in ontology, but methodologically, it is pretty compromised. So there are different ways of invoking parsimony arguments, and you have to be careful when you invoke them. In the auditorist case, it’s invoked by saying, “all I know is that these quail respond in the same way that humans do, I’m just a poor boy from the city, it looks like the same performance function, it’s the same mechanism.” Certainly some people believe that what is simple also has to be true, and so they may be drawn to simple explanations, but the widely accepted principle of total evidence says that you should look at the greatest diversity of evidence bearing on the hypothesis.

Doctor Studdert-Kennedy:  Maybe one way of arguing that there’s more to the SiS issue that you seem to be raising throughout is the idea that humans get more or different information out of the speech signal than other animals do. In the same way, I’m quite sure that you could train quail to distinguish a phrase of Mozart from a phrase of Bach; in fact, I’m quite sure that if you worked at it, you could get them to make innumerable distinctions about music, but there would be no question that they’re not getting the same information out of it as a human listener, and I think that exactly the same is true for speech. The amount of information sitting in the speech signal, non-semantic, non-syntactic information, is vastly greater in human comprehension.

Professor Trout:  But, to take the auditorist side, I think the interesting question there is: why would the information that the quail get out of the speech signal lead them to mark a boundary in the same way that humans do? If they are getting different information, or at least a human is getting more information, why would the quail mark the same boundary?

Doctor Allen:  You’re asking the quail to do a one-bit task, and you find out that they can do a one-bit task, but let them do a five-bit task. Can they do a five-bit task? If you use nonsense speech and play it to L1 Chinese and L1 English listeners, they’ll both perform the task quite well if you pick the phone and feature set from their language, and they won’t perform it well if you don’t. So, how much information you get out of speech depends on which language is being spoken.

Doctor Studdert-Kennedy:  Absolutely, as long as you are human, you can always get information out of some language, and the fact that you get different information is not as relevant as the fact that you get the same class of information.

Doctor Allen:  Yes, it’s language, you get knowledge about what the person means.

Doctor Studdert-Kennedy:  No, there’s different information because I get different information about how it’s pronounced in English, and I don’t get that information for Chinese.

Doctor Allen:  I’m certainly not saying that people aren’t getting information from vocal or auditory inputs, it’s just a matter of how much context they can take advantage of, it’s how deeply they can process it. But, this question of SiS, I’m asking, at what point are you trying to say it’s special? Certainly at some point when you get down to certain languages, it’s clearly special, but my question is still, at what point is it supposed to be special, and I still don’t think I have an answer.

Professor Trout:  Well, there are properties that other animals have that humans don’t, and you could ask the same question about those. I probably can’t discriminate above 16 kHz, but crickets apparently can discriminate categorically beyond that.

Doctor Allen:  Aren’t you saying that it’s special at the phonemic level? Your example of the quail uses simple CV stimuli, so that must be central to your argument: speech is special because people can categorize and discriminate CV sounds.

Professor Trout:  You could also talk about lexical items, longer sequences of sounds, or you could talk about parts of the spectrum, shorter than CV sequences. You could run them on noise-band speech or on sinewave analogues; you could do any number of things and see how they react. Just because a sequence is suprasegmental, for example, doesn’t mean that it’s not a good candidate for a specialization experiment.

Professor Remez:  In the introduction to Acoustic Phonetics, by Martin Joos, published in 1948, he says something like this: We’ve been stuck with these articulatory descriptions of the phoneme inventory at least since Jespersen, and now, with the advent of acoustic analysis technology, it will finally be possible to give a rendition of the phoneme inventory in physical acoustical terms. That was the opening fanfare of a campaign of research that essentially failed. That is, the phoneme inventory of no language has been well rationalized by the distributional states of acoustic properties. When I think about the claim that Kluender has made on the basis of the quail, it seems directly to contradict at least 50 years of research showing that the phoneme inventory does not coincide with auditory categories. In fact, the little model case that they considered with place variation in quail is a rather selective presentation of an argument that attempts to refute this tradition, even leaving out all of the localization and developmental evidence for phonemes. I always took the rude version of the claim attempting to refute SiS to be that the quail data proved that, even though we can’t say what those particular acoustic properties are, the quail, having access to no aspect of experience other than the auditory, show us that the categories we call phoneme categories in fact coincide with auditory categories. I take that to be the claim opposing SiS, and if you remove their evidence (and I think their evidence is terrible and generally acknowledged to be terrible), what you are left with is this problem of incommensurability between the categories of auditory experience and the categories that language uses to make distinctive contrasts. What’s wrong with that way of framing the argument?

Professor Trout:  Your question raises another question—Are you supposing that the similarity of performance is an artifact of the training regimen?

Professor Remez:  It’s the same thing as when Tinbergen’s stickleback threatened the postal van. No ethologist would claim that the stickleback was threatening the postman, even though that was the performance. It’s essentially a kind of bait and switch, exactly as you claim.

Professor Trout:  My question is: if the similarity in performance is somewhat more complicated than a threat or non-threat posture, how do you explain the specific character of the similarity in performance?

Professor Remez:  Why do human psychophysics and quail psychophysics look the same? Well, if you gave me a bunch of quail and a function to match, I’d figure out a way to do it. And if you gave me 30,000 trials to do it in, I’d be able to do it there, too. One of the things I tried to find out was what the test items actually were, but I learned that the test items had been lost to the mists of history, so it’s impossible to tell what the quail actually heard. My guess is that the quail were listening to apical release bursts, which are an extremely prominent spectral feature, and that they weren’t hearing place variation at all. So, it was an accident. Not only that, of course, but they put labial and palatal place in the same category.

Professor Trout:  I guess there’s another question as well: why they stopped when they did. With humans in the McGurk effect, you can play the stimuli as long as you want and still find the effect. It would be interesting to see what would happen to the quail if you continued training them beyond the 4000 trials required to get them to 90%. You wonder why the organism is exposed for that particular period, because the identification point at which they stopped doesn’t have human significance.

Doctor Allen:  Do you know the Stevens work where they looked at the auditory nerve of a gerbil using whispered voiced and unvoiced speech? Even in the auditory nerve, the voicing was very clearly represented. This is just as important as an F2 transition, if not a lot more important, and it’s represented early on in very simple creatures. Is that special?

Professor Trout:  It depends on how specialized humans can become on those tasks as well. For example, in an ethology experiment where you are looking at non-human animals responding to some speech contrast, it may be that a significant amount of brain plasticity is required just for learning, even though the learning itself displays highly specialized features. Rather than simply asking, because some non-human organism can do this, whether it is special when a human does it, you have to be able to see what the differences are. Part of the question is whether the organisms are responding to the same features, that is, whether one is responding to acoustic correlates and the other to a linguistic feature.

 

(Notes prepared by Jennifer Pardo)