Seminar: University Seminar on Cognitive and Behavioral Neuroscience (603)
Date: March 27, 2003
Title: The Biological Basis of Speech: Talking to the Animals and Listening to the Evidence
Speaker: J.D. Trout, Ph.D., Philosophy Department and Parmly Hearing Institute, Loyola University of Chicago
Participants: Herb Terrace, Co-Chair; Peter Balsam, Co-Chair; Robert Remez; Jon Allen; Peter Balsam; Colin Beer; Bill Benzon; Gina Cardillo; Joseph Cesario; Martha Chaiken; Josh Davis; Bridgid Finn; Jessica Goldberg; Nate Kornell; Robert Krauss; Jeffrey Loewenstein; Dustin Merritt; Ezequiel Morsella; Tammy Moscrip; Rebecca Passonneau; Lois Putnam; Kelley Remole; John Saxman; Eric Schoenberg; Ann Senghas; Yaakov Stern; Michael Studdert-Kennedy; Andrew Sunshine; Robert Thompson; Athena Vouloumanos; Cynthia Yang.
Rapporteur: Jennifer Pardo
Summary:
Roughly half of this talk comes from a review paper and the other half from new work on the specialization of speech processes. In the interest of full disclosure, if I seem negative toward primitive attempts at comparative research on non-human animals, it could be because my son didn’t compare favorably to a 3-year-old Congo Gray on some object permanence tasks at my 11-year-old niece’s science fair. The speech is special (SiS) thesis has a long history, and we’ll see throughout that it is sometimes confounded with a broader thesis that language is special, but I want to defend the thesis that speech is special. We can construe that doctrine as the claim that the production and perception of speech are uniquely human adaptations, rooted in human biology. The notion goes a long way back, to Eric Lenneberg and Al Liberman, early and mid-career Noam Chomsky, and Steve Pinker.
Auditorism is offered as an alternative hypothesis to the claim that there are specialized mechanisms for processing sequences in language. Auditorism is the implicit hypothesis that only general auditory mechanisms are required to explain the distinctive achievements of speech perception. I say implicit in part because there is no programmatic or systematic defense of auditorism so named, but its proponents are identifiable. The hypothesis trades on the idea that we share auditory capacities, such as frequency analysis, with organisms beyond our evolutionary lineage. That idea fuels another: if you can make non-human organisms perform similarly to humans on selected speech tasks, then there is no reason to postulate specialized speech mechanisms.
There is diverse biological evidence for the SiS hypothesis. PET and fMRI evidence shows that different brain areas are used when people perform auditory tasks, like pitch judgments, than when they perform speech tasks, like phoneme identification. The area that activates most distinctively for the speech task is Broca’s area, which is associated with preparation for articulation. By making a simple phonetic judgment, one is in effect preparing for articulation. Next, there is a well-established critical period for language and speech acquisition, but not for audition. This is a general phenomenon, not just for speakers but also for signers. If signers don’t begin before the end of this period, they end up with correlative forms of agrammatism and other sorts of disorders. They can still learn after that, as speakers can, but they deploy general problem-solving strategies that are much more onerous. In addition, aphasia is dissociable from auditory function: aphasic patients typically show normal audiograms, so a speech-relevant disorder need not have an auditory basis. Moreover, dichotic listening tasks demonstrate a right-ear advantage for speech. The effect is small and complicated, but nevertheless real. A recent article by Hickok makes it clear that a certain amount of speech processing is done in both hemispheres, but the connections are better to the left side. That places some restrictions on the kinds of animals that you can use in a comparative paradigm, because some animals’ brains aren’t even lateralized.
In terms of speech perception research, the most prominent paradigm is categorical perception, and this is the task that researchers at Wisconsin and Texas used with quail to form the basis for auditorist theses. In this paradigm, a series of sounds varying along a single acoustic dimension, such as voice onset time, is created so that there are items of intermediate values, and people didn’t identify these middle versions as different from the extreme versions. They would only identify the items as “ba” or “pa.” The idea was that if you could train quail to respond similarly to humans, then the quail would be displaying categorical perception as well.
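The logic of the identification task can be sketched with an idealized identification function. This is a minimal illustration under stated assumptions, not a model fitted to the human or quail data discussed here: the 0 to 60 ms VOT continuum, the 25 ms boundary, the slope, and the function names are hypothetical choices made only to show the shape of a categorical response pattern.

```python
import math

# Hypothetical VOT continuum in milliseconds (0 to 60 ms in 10 ms steps).
# These values are illustrative assumptions, not stimuli from the studies discussed.
VOT_STEPS_MS = [0, 10, 20, 30, 40, 50, 60]

# Assumed category boundary and slope for an idealized listener.
BOUNDARY_MS = 25.0
SLOPE = 0.5  # larger values give a steeper, more "categorical" function


def prob_pa(vot_ms: float, boundary: float = BOUNDARY_MS, slope: float = SLOPE) -> float:
    """Logistic identification function: probability of labeling a token 'pa'."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))


def label(vot_ms: float) -> str:
    """Forced two-way choice: the listener can only answer 'ba' or 'pa'."""
    return "pa" if prob_pa(vot_ms) >= 0.5 else "ba"


if __name__ == "__main__":
    for vot in VOT_STEPS_MS:
        print(f"VOT {vot:>2} ms  P(pa) = {prob_pa(vot):.3f}  label = {label(vot)}")
```

With a steep slope, the intermediate tokens on either side of the assumed boundary (20 and 30 ms here) are labeled almost as consistently as the endpoints, which is the kind of response pattern the quail training studies tried to reproduce.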
There is also auditory-visual speech perception evidence, which is absent from animal research except for a rough orientation to the face. In humans, the ability to comprehend running speech is enhanced by access to movement of the lips. In the standard experiment, researchers add noise to the sound or degrade the visual image (using sanded plexiglass) and thereby shift the listener’s reliance between audition and vision. Others manipulate cognitive load as well: one set of researchers presented a passage of Kant’s First Critique spoken by a talker whom listeners either could or could not see, and the listeners had to shadow the passage. There were fewer shadowing errors in the audio-visual condition. The integration appears to occur early in perception, not as the result of independent auditory and visual perception. Finally, there is the characteristic symmetry between speech perception and speech production: there is a relationship between being able to perceive certain categories of speech and being able to produce them. This finding is confirmed by some of the PET findings connecting phoneme identification with the area associated with preparation for articulation.
The main effort to debunk SiS and the motor theory of speech perception began with the categorical perception task. The method was the operant training of quail. Another version was done earlier with chinchilla by Miller and Kuhl, but their conclusions were much more hedged: they interpreted the similarity in the performance functions as an evolutionary predisposition to mark category boundaries for perceptual reasons. In the quail studies, training proceeds by presenting the animals with CV combinations in which the F1 onset frequency is manipulated to create a continuum of CVs between the two relevant speech sounds. The dependent variable is pecking behavior, and the animals are reinforced when they identify an item correctly. They are trained on the clear cases, then given a generalization task in which they are offered a few novel intermediate versions. The quail do show identification functions similar to those of humans in this kind of task.
There are a couple of
points to keep in mind about the task. There is a lot of individual
variability, believe it or not, among the quail in their ability to perform the
task. Some birds are given up on, which is always sad. Some quail require three
times as much training as others. Each quail in any experiment requires a
minimum of 4,000 trials over relatively brief periods of explicit
reinforcement. In the 1987 study, in which there were only 3 or 4 quail, one
quail required over 12,000 trials. In the 1991 paper, only one required as few
as 4,000 trials.
The SiS rejoinder to
this debunking design is as follows. Humans learn to speak and to comprehend
speech, and an aspect of this knowledge is phonetic. Quail are trained to
identify sound patterns outside of a linguistic domain. Human psychophysical
performance reflects only a small part of human competence in language. Quail
psychophysical performance in these studies is their only contact with language. Therefore, the suggested analogy
between human and quail psychophysical performance is inaccurate, and the
comparison is unilluminating.
I refer to the auditorist projects as refutation projects. The structure of the argument and the motivation for the experiments, as stated in these articles, have nothing to do with understanding the quail auditory system any better, nor with understanding quail perceptual organization. They are an argument for reorienting the contemporary research program in speech perception. That is, we should shift attention to the mechanisms of human hearing because we know a lot more about hearing than about speech mechanisms, if there be such. That promises to allow us to solve problems in the relatively near future.
The auditorist refutation project states something like the following: Nonhuman animals display the same response patterns as humans when presented with CV syllables. Parsimony demands that we infer that similar mechanisms are responsible for similar response patterns. Therefore, the same mechanisms are employed by humans, chinchilla, and quail. This argument is invalid without an additional premise: that the only evidence bearing on this hypothesis is the behavioral evidence in these experiments. However, if you include any of the developmental evidence, any of the biological evidence, or much of the behavioral evidence in humans, you are not driven to the conclusion that auditory mechanisms are the only ones at play in speech processing. Even if that additional premise were true, the argument would be unsound, because other premises are false.
The refutation argument has the structure of the standard behaviorist argument that similarity of behavior indicates sameness of mechanism. But similar behavior does not imply similar function or biological mechanism, and yet that is the conclusion drawn (the title of the 1991 paper uses the word “mechanism”). As the famous Breland & Breland paper illustrates, mere behavioral isomorphisms come cheap, too cheap to earn their explanatory keep. There are elephants trained to walk bipedally, horses that can count in base 10, and fleas that pull carriages, and nobody takes that to show that humans don’t have specialized anatomy for bipedal gait. But that is the logical structure of the auditorist refutation argument.
Compounding these problems is what you could call misleading attribution. When the quail’s behavior is described by the individuals who do this research, they often say things like “for birds trained to peck to CVs with low F3 onset frequencies” and “peck rates increased for CV syllables following al,” descriptions that paper over the difference in perspective that arguably exists between the quail and the human. Everybody would agree that it’s misleading, at least in a certain sense, to say that Oedipus wanted to marry his mother. He wanted to marry Jocasta, and it happened that Jocasta was his mother, but that fact was opaque to him. So transparent attributions are overly generous; they make the subject of the study appear wiser than it in fact is. CVs don’t appear in the electrical vocabulary of quail, so this attribution makes it appear as if they are conceptualizing the pattern in some way.
The fact is, we don’t really know what it is about the stimuli that the quail are responding to; they may be responding to some acoustic feature that they are really good at picking up on. I think that’s the key to a really interesting experiment that could be run: train quail to perform better than humans on speech materials. Then there could be no explanation of that performance as a speech adaptation; it would be obvious that some other feature of their auditory system is available for them to draw on, one that allows them to perform the categorical perception task. It should also be noted that data collection starts as soon as the quail reach 90% performance in training, which is considerably worse than what a child could do under similar circumstances.
Doctor Allen: Are you arguing against perceptual features in speech? Because that’s what it sounds like. If you take a particular feature and the fact that a quail can recognize it, that is an argument for the primitiveness of that feature, but there are many other ways of getting at this question; you don’t have to use quail. I would say that there’s a lot of evidence that some basic, pre-phonemic features are recognized, and this goes back to Miller and Nicely and some others. You’re arguing against them.
Professor Trout: No, my point is that if there are these basic features, they are features that may be useful to the quail in their environment, but they are certainly not features in the sense of phonological features, for example. It might be that animals statistically track changes in distributional information, and some of them can do that relatively well, but what they can’t do very well is learn the homorganic constraint, or anything that requires something like the recognition of a phonological feature or a hierarchical relationship.
Doctor Allen: They can use pitch to communicate; they have primitive pitch codes. It’s obvious to me, I don’t know if you agree, that that is a form of feature. Also durations and things like that, in some kind of binary representation.
Professor Trout: Yes, I should just make clear that I don’t
deny that there’s some ability that these quail are using, but when humans make
the discriminations and identifications, it’s likely, given the specialized
processing that I’ve been talking about, that they’re making it on a different
basis, a different mechanism is involved.
Doctor Allen: I think that there’s strong evidence to
indicate that that’s not the case, like Miller and Nicely.
Professor Trout: If it turns out that the chinchilla are exploiting a mammalian feature that other mammals have as well, an innate predisposition of some sort, we may share a common basis, but there are many linguistic contrasts that we mark that are not of that sort, that are not just a pitch change or something like that. And there’s a reason for that as well. When we listen to language, it’s easy for us humans to track formants in the sense that we’re sensitive to formant changes at particular points, but it’s very hard for us to track F2 alone, for example. That’s because the linguistic nature of the speech suppresses our efforts to hear it as a mere acoustic property. Yes, there are acoustic features that we may be able to track, like pitch contours for pragmatic implicatures, but much of the information that we do track is higher-level linguistic information that is beyond the reach of quail.
Here are some examples of the attempt to paper over the differences by making transparent attributions. A more grotesque version would be to say something like, when I come home, my dog is happy to greet the lover of earthy poetry. He is responding to something in me, but he’s not conceptualizing me under that mode of presentation. In the same sense, if you could train quail or starlings to discriminate frequencies in a musical score, it wouldn’t necessarily show that they are conceiving of it as anything other than a sequence of acoustic properties; they are not appreciating any subtleties of Puccini. It is important to try to represent the mode of presentation under which the organism is capable of understanding the stimuli presented.
There are cognitive approaches that I regard as largely above the fray of this dispute. Many of these studies, for example those on numerosity in apes, on object naming by Alex (Irene Pepperberg’s Congo Gray), and of the sort that Hauser is doing on tracking hierarchical information in humans compared to tamarins, don’t suffer from the same problems. They may suffer from methodological problems, but if they do, those are of a different sort. First of all, there is no explicit effort in those studies to state behaviorist inferences of the sort that I’m discussing, from similarity of performance to sameness of mechanism. Second, they have wisely chosen species for which an argument from homology could be made. Finally, they involve largely cognitive tasks; again, there is no effort to eliminate appeal to cognitive mechanisms, as in traditional behaviorism.
Consider one defense of auditorism, which memorializes the distinction between speech and language. This move happens early on in any such dispute: all of the difficult problems are said to lie with language. Among these are the hierarchical relations in syntax that other animals clearly aren’t able to appreciate, whereas speech could be understood in terms of hearing. Speech is a waveform that the auditory system is suited to process, but language concerns higher-level cognitive or central processes. One reply is just to point out that it is not clear what the distinction is, or where it is drawn between speech and language, particularly when you have phenomena like coarticulation effects or hierarchical relationships among even short sequences of speech. The homorganic constraint does not range over very wide sequences of speech; it is relatively local. Consider the rule that an obstruent at the end of a syllable sequence that is preceded by a nasal consonant must share its place of articulation [speaks these examples: ankle, amble, antler, angle]. If one considers a phonological rule of this sort determining the articulation of such short sequences, it becomes difficult to distinguish a proprietary speech stimulus from a proprietary language stimulus.
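Because the rule ranges only over adjacent segments, it can be written as a purely local check. The sketch below is a toy illustration, not a phonological model from the talk: the place-of-articulation table, the restriction to stops, the rough segment transcriptions (with “ŋ” for the velar nasal in ankle and angle), and the made-up ill-formed string *amkle are all simplifying assumptions.

```python
# Toy place-of-articulation table (an assumption for illustration only).
PLACE = {
    "m": "bilabial", "p": "bilabial", "b": "bilabial",
    "n": "alveolar", "t": "alveolar", "d": "alveolar",
    "ŋ": "velar", "k": "velar", "g": "velar",
}
NASALS = {"m", "n", "ŋ"}
OBSTRUENTS = {"p", "b", "t", "d", "k", "g"}  # stops only, for simplicity


def homorganic_violations(segments: list[str]) -> list[tuple[str, str]]:
    """Return adjacent nasal+obstruent pairs whose places of articulation differ."""
    violations = []
    for first, second in zip(segments, segments[1:]):
        if first in NASALS and second in OBSTRUENTS and PLACE[first] != PLACE[second]:
            violations.append((first, second))
    return violations


# Rough transcriptions of the talk's examples (vowels written loosely).
EXAMPLES = {
    "ankle": ["a", "ŋ", "k", "l"],
    "amble": ["a", "m", "b", "l"],
    "antler": ["a", "n", "t", "l", "r"],
    "angle": ["a", "ŋ", "g", "l"],
    "*amkle": ["a", "m", "k", "l"],  # hypothetical ill-formed string
}

if __name__ == "__main__":
    for word, segs in EXAMPLES.items():
        bad = homorganic_violations(segs)
        print(word, "ok" if not bad else f"violates constraint: {bad}")
```

The check never looks more than one segment ahead, which is the sense in which the constraint is local; yet what it checks is a shared phonological feature (place), not a raw acoustic property.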
There are other reasons to distinguish between speech and language, and one of them just has to do with good housekeeping. We need a taxonomy to work from, and working with one set of stimuli may be more appropriate for some purposes than for others, but be clear that it’s a conventional taxonomy, and that you needn’t expect the natural world to cooperate with it. You have to be open to discovering that the world does not respect the contours of your theory.
In a very recent article in Science, Hauser, Chomsky, and Fitch construct a new theory to approach the problems of comparative method and offer a new explanation for the distribution of these psychological features in language. They propose a faculty of language in the broad sense (FLB), and on their account the basis for speech perception within it is shared with non-human animals. The faculty of language in the narrow sense (FLN) would be a subset of that. FLB includes 1) a sensorimotor system, 2) a conceptual-intentional system, and 3) computational mechanisms for recursion. They think that #3 is the only uniquely human feature of language, and that #1 is so well established as shared among humans and non-human animals that it deserves presumptive favor, or status as the null hypothesis. It is worth asking, in a situation like this, what the alternatives are to the hypothesis that deserves presumptive favor.
An interesting aspect of the paper is that the authors are very clear that elaborate training of non-human subjects is a threat to validity; even at the close of the article, they mention that, importantly, they are running non-human animals (tamarins in one case) with a minimum of training. It turns out that there are some linguistic tasks that the tamarins just cannot perform, even after a lot of training. If elaborate training is a threat to validity, why is #1 asserted as the null hypothesis? All of the evidence that we have from non-human animals is the result of elaborate training. Another point concerns the fMRI and PET evidence: if areas devoted to articulation are activated as part of a sensorimotor system, that must be a disanalogy with quail, because they do not have the motor system for producing speech.
Doctor Allen: You seem to be ignoring all the neurophysiological evidence of what goes on in the auditory cortex. It’s obvious that if you stick an electrode in a cat, you can get a lot of information that is clearly similar to what is going on in humans. There is a lot of overlap based on neurophysiological evidence, and that’s also being supported in the PET and other sorts of imaging studies.
Professor Trout: That’s right, and even in the Hickok and Poeppel research that I mentioned earlier, the auditory system is very richly involved in speech processing, along with other areas of the brain. You don’t get the same kind of activation with sub-lexical items that you get with entire words. I’m going to argue that audition makes important contributions, and you can see where those contributions matter when you look at the reading disorders literature as well, which I will discuss later. I’m not ignoring the contributions that audition makes. I’m just saying that if your argument is that humans don’t have specialized speech mechanisms because quail have a similar performance function on a categorical perception task, or if you’re going to argue that the basis for speech perception is shared with non-human animals’ sensorimotor systems, then it is peculiar that the motor area gets activated in humans, when there couldn’t be any correlate in quail for the motor system that’s associated with speech production.
The faculty of language in the narrow sense (FLN) includes only the computational mechanisms for recursion, and it is the only uniquely human component of the language faculty. These mechanisms are responsible for some of the more syntactic features of language, namely discrete infinity, embeddedness, and hierarchical structure. Hauser et al. argue that FLN could have evolved independently to serve the adaptive advantage that might be conferred on an organism that can carry out Machiavellian social schemes. This ability might allow such organisms to survive and reproduce more of their kind. The problem is that they offer little more detail, so you have to worry that such evolutionary explanations are “just so stories” in the absence of more constraining information.
Attempting to constrain such evolutionary explanations is a healthy project, in part because it allows one to be clearer about whether performance across different species can really be explained by biological homologies, or whether there are homoplasies, or analogies. In the classic comparative example, the human eye and the octopus eye are analogies rather than biological homologies: they evolved independently because the environmental constraints on an ability to focus a clear image on a sheet of cells converge on similar solutions. This is one way to think about different species having similar performance but not homologous mechanisms for solving the problem.
The auditorist position hasn’t been defended in much positive detail. Beyond allusions to hearing being relatively better understood than speech, there are auditorist promissory notes about how reorienting the research program from speech perception toward the auditorist view would be healthy. The reason most often given is that speech perception is a tractable problem on the auditorist view. If we understand so much about hearing mechanisms, and speech perception is reducible to hearing, then it’s at least possible to solve some of the problems, like that of acoustic variability. If you are searching for the content of the claim that speech perception is a tractable problem, one thing they might mean is that it is a natural process, and if speech perception is a natural process, then of course it is tractable in a certain respect. It is part of the world to be understood with experimental methods and theoretical inquiry, and it would fail to be a tractable problem only if the real solutions to problems of speech perception permanently outstripped our computational capacities. Perhaps more is meant than that.
One thing that might be meant is that, understood in terms of audition, whose mechanisms are well documented, speech perception is a tractable problem. This trades on the understanding that we have of hearing mechanisms. Another possibility, and I think this is getting closer to the bone, is that clinical problems could be solvable in 10 years or so. The suggestion is more that we’re on the horizon of solving some stubborn clinical problems that require the development of distinctive technologies. It’s easier to imagine how those problems could be ameliorated with clinical solutions if speech perception is reducible to hearing, because there is a lot of technology for hearing. Part of the thrust behind the view that speech perception is a tractable problem may also have to do with hopes for the construction and patenting of prosthetics.
One thing that has often given solace to the auditorist’s understanding of speech perception is evidence that there’s an auditory basis for selected reading disorders. The finding most frequently mentioned in the reading disorder literature is a difficulty in processing duration information. In these studies, researchers identify people with specific reading disorders and examine their performance on auditory tasks. These subjects also perform poorly on triad tone tests, in which each tone takes one of two durations (250 ms versus 500 ms) and the task is to report the sequence of short and long tones. Such subjects have difficulty processing short-term auditory information. This finding has led some auditorists to recruit it as evidence that speech perception is an auditory process. In particular, the phonological encoding that’s crucial to reading is hampered by duration processing disorders. On that view, however, the basis for reading is not identical with the basis for hearing, because if x (a hearing disorder) causes y (a reading disorder), then x cannot be identical with y.
Any experimental manipulation is done in the context of a particular theoretical and hypothetical construct, so one cannot simply strike the pose of pure empiricism. In a well-ordered science, priorities are set for what the big questions are and how they should be funded, among other things. In many domains, including cellular development, payoffs are expected in some unknown future, and the decision to fund basic research cannot hinge on what those might be, because they are unknown. Yet decisions about the scientific significance of particular theoretical approaches are typically made in terms of how the resources would be used and how much success could be expected from those allocations. A certain value can be placed on the goal of truth in theoretical understanding, but it may be that funding priorities under limited resources really drive people’s theoretical rhetoric about which positions are most plausible.
If you have a view
that you can hold out that might contribute to patents and might contribute to
certain kinds of clinical solutions, it’s not only good press, but it is also
good for the closing paragraph of a grant application. Some of it may be true,
but the point here is that there is no reason why a theory of biological
specialization for speech should perform much worse at that task than an
auditorist theory. In addition, some care and responsibility must be exercised
when you promote views based on the hope for patents and clinical promise
because you’re talking about a vulnerable population that is desperate for
solutions. We should be concerned that the standards of peer review are different
from the standards of accreditation.
Those are some of the
reasons that the SiS dispute may matter. It may help define what an ordered
science in the area would look like—what a unified theory would look like that
is general, that draws on diverse fields, rather than simply audition, and that
takes a proper perspective on the relation between basic research and
technological application.
In summary, the cross-species behavioral similarities recruited by auditorism are unintelligible until we know more about what the quail is actually experiencing. The structure of the refutation project, with its premature appeal to parsimony, is invalid. Most support for SiS is untested on non-human animals. By contrast with auditorism, the theoretical perspective of SiS is extremely complicated, which may make it less attractive for politically interesting reasons. The auditorist perspective is comparatively simple in casting these long-standing problems in speech as practical ones. The impact on funding priorities, such as big questions versus local applications, may drive the discussion, but SiS can promise to organize the big questions.
[Jen’s rejoinder: If the problem is so tractable, and we
already know so much about the auditory system, then why don’t we have a speech
recognizer that can spare me the 8 hours I spent transcribing this talk? Not
that I mind listening to the talk, but I would have preferred to be playing
with my kids for the other 7 hours….]
APPLAUSE
Questions
Professor Krauss: This is a question that I would only ask a
philosopher. That is, not whether SiS raises a question that’s important, but
whether it is a question that’s answerable. Unless you’re a creationist, you
have to assume that at some remove at least, there is continuity between what
we do and what other species do, and if you believe that the species is
identified by the fact that it is in some respects unique, it’s unlikely that
you’re going to find another creature that is going to do exactly what we do.
So, to say that speech is special in some sense is a no-brainer because we don’t see very much speech in other species, in the same way that virtually everything about any species is special. Then the really interesting question is what there is about humans that enables this facility and isn’t present in other species. I want to know how special it has to be in order to be interesting.
Professor Trout: I don’t know whether this is largely a rhetorical effect,
but the thought that it is would probably be cynical. Evidence from brain
localization that some find persuasive does not simply suggest that these
mechanisms are innate. Rather, over the process of learning we may develop an
ability to do something that is species-specific. It is not just that we come
endowed with certain mechanisms, but that those mechanisms are time-locked in
an interesting way. One quick answer to your question would be if you could
find mechanisms associated with developmental features that seem to be
significant only for humans, but the kind of evidence there is thus far for
speech cases relies on behavioral isomorphisms.
Professor Krauss: I just want to make two points, and then let someone else
get in. One point is that you are able to make that argument without referring
to any other species, which I think is fine. The other point is that saying
that there is a structure in humans that seems to be associated with speech and
that doesn’t occur in other species isn’t, I think, convincing evidence that
that structure is responsible, or even necessary for speech.
Professor Trout: That’s right. The second stage of what Hauser et al. refer
to as the conceptual-intentional process is something that can be interestingly
inquired about across species. You can ask whether an animal is really capable
of referring, which is some of the research that Hauser has done. If they can,
then they’ve achieved some intentional state. Of course, there’s a question
about what constitutes referring, but the idea is that you can look for the
same features in human language, but notice that I’ve changed from speech to
language. Yet speech has indexical characteristics—you’re using words to pick
out objects.
Professor Krauss: Well, within the domain, vervets do the same thing. That’s why framing the issue in terms of the specialness of speech seems to be an uninteresting question, or maybe an unanswerable one. What is answerable is what we do with the capacities that we have, and what other species do with what they have.
Professor Trout: You might use the kinds of predictions the theory makes as a measure of
its fecundity. It would be a serious objection to SiS if you were to find that
it was vacuous, but it has never been subjected to that critique; if anything, the
critique has been that it is unduly substantial. There have been predictions that motor theorists have
tested, and the particular motor theorist response to the quail findings was
that there was never any reason to predict that mammalian hearing couldn’t mark
boundaries that might be auditorily important, but it is surprising that the
ones that were auditorily important to them happened to mark categorical
boundaries in human speech.
Doctor Allen: I’m very confused by what you are
saying about SiS. At some point you have language, and that’s special: there’s
a big difference between English, Chinese, and French, and those differences are special. At
one end of the spectrum, clearly there’s something that’s special because
there’s a difference between French and English. But at the
other end, you have the input device: speech goes into the ear and is
processed by the auditory system, and that’s one input to language. You could also
sit down and read something, and that’s another input to language. So you can get
your information by reading it off a page or by hearing it through the auditory
system. It seems to me that most of the time you’re talking about speech, which
means the oral form of communication, and then sometimes you mix in the
visual cues, and then you end up talking about language, which is special. It
seems to me the real question here is at what point it becomes specialized. Clearly
the auditory system is not specialized, and actually, speech is a very, very
simple signal compared with picking out a violin in an entire orchestra.
Speech is a very primitive form of auditory signal; you can do much more
complicated tasks than listening to speech. So it starts off as an acoustic
waveform processing problem for the auditory system, and eventually it is
presumably decoded into some basic units, which may or may not be special.
I would argue that they are not special; they are very basic feature
detectors, just as vision has very basic feature detectors. Once it gets up
into the modality of language, it becomes special, but that doesn’t have anything
to do with speech.
Professor Trout: There
are a couple of ways of responding. I don’t think that everything that you’re
saying is unfriendly to the proposal that I’m making, but I have to do some
assimilation. In the sense of special that I’m using here, the differences between
Chinese and French and American English are not special. There are differences
across those languages that have to do with tonality and lots of other things,
but the traditional arguments for the specialization hypothesis always had to
do with things like the poverty-of-the-stimulus argument as well as
motor-theory arguments. There’s a distinct pace and sequence with
which children learn syntax, and it can happen in a relatively impoverished
linguistic environment, whether you’re Chinese, French, or English-speaking. I
had a grandma in Buffalo who was always saying, “No, no, no, it’s ‘I ain’t going to the store,’” which is
really bad feedback. So the differences between languages are not special, but
other aspects are special. I don’t recall the second part of your question.
Doctor Allen: I’m really looking for the
definition of what’s special. It’s not special at the output of the cochlea,
and maybe it’s starting to become special when you get to the auditory cortex,
but we don’t even know that. By the time you get into the realm of language,
and understanding, and syntax, you know, high-level context, then it can be
special, but I don’t think that’s surprising. Nobody’s arguing about that. When
you say speech is special, what do you mean is special?
Professor Trout: One
point that I was trying to make was that it’s not clear where you draw the line
between speech, where a lot of the sequences are relatively tractable to issues
of hearing, and language. Even if that is a natural boundary, where it falls may be
unclear. In the cases of short sequences that are governed by phonological
rules, you have something that’s a relatively simple sequence that is still
constrained by higher-order rules that are specialized for linguistic processing.
There’s not going to be a simple answer to your question, such as “it has to
happen before 200 ms if it’s going to be special.” But there are answers in
terms of what kinds of predictions you would make about ease of processing that
implicate the use of the lexicon. For example, I have just completed a set of
studies that any auditorist should love. They use 5-channel
noise-band speech (Shannon speech) with easy (high word frequency, low neighborhood
density) and hard (low word frequency, high neighborhood density) words.
Performance is quite different for these words: recognition rates are much higher
for easy words. There’s an independent finding about cochlear implant patients:
the most successful patients are the ones who have a more developed
lexicon to begin with. There’s a straightforward proposal to make: if you’re
going to create training materials, build them from easy words. Now, the output of
the electrical leads in the cochlear implant is noisy, but because of the
tonotopic organization of the basilar membrane, it produces a signal that
subjects can remap by drawing upon their lexicon, that is, upon the words
they already know. Are they
using general learning strategies to do that? We’re talking about noisy
outputs of electrical leads and the use of the lexicon, and we’re not really sure
at what stage that interface occurs.
Doctor Allen: You’re really out on a limb if you’re trying
to understand normal human speech recognition by looking at cochlear implants.
The articulation index, from 40 years of study at Bell Labs, proved that,
starting with just the signal-to-noise ratios in 20 frequency bands, you could predict
recognition of nonsense phones; from the nonsense phones you could predict words; and from
the words you could predict the probability of getting sentences correct. So the
whole thing is a hierarchical, probabilistic description
of how you can go from some very, very primitive measurements to
predicting human performance on complex speech in noise. This
is from something like 50 years’ worth of research.
Professor Trout: Well, I’m not out on a limb, because I’m making the
argument without relying on pediatric cochlear implant patients (though I would add them to the
fold): I used normal-hearing subjects listening to 5-channel speech. The
expectation is that normal-hearing individuals who are challenged by 5-channel speech
will display the same advantage for easy words, and they do.
Doctor Allen: Entropy in language is important, and what
you’re studying is the entropy in language. If you compare a high-entropy
task with a low-entropy task, you’re going to see a huge change in
performance.
Professor Trout: So there are a couple of different ways you
can go, and some of them may seem like trivializations of a more
substantial view that you’re offering, but one way that people might
traditionally have made this point in the 1980s and early 1990s is that
specialization consists in how precompiled the system is: whether there are very
complicated access codes already in place that transform stimuli.
Doctor Allen: You mean hard-wiring, don’t you? If we were
hard-wired, we could only learn one language, right?
Professor Trout: It depends on the stage at which hard-wiring
occurs. People agree that at the sensory stage humans receive relatively similar
input, and normal humans are going to react in similar ways at the sensory
periphery; but even though their systems are hard-wired, they don’t lack plasticity,
so people can learn different languages.
Doctor Allen: They can learn different phonemic features,
even, just like the quail.
Professor Trout: If what the quail are learning is phonemic
features, as opposed to correlated markers.
Doctor Morsella: One quick question. Your listing of the
fallacies is very well done, because you see those fallacies everywhere in
research on function. Another fallacy that may be related: as you know,
early in behaviorist research, people believed that behavior is due to
very simple mechanisms. Then they found several cases in which behaviors could
not be explained by simple operant principles, so they said rats have
cognitive maps. What has happened since is not that people now doubt whether such
complicated mechanisms are at work in simple things, but that once
people find a complicated mechanism, they apply it to simpler
things as well. The issue with the auditory account is the idea that because
quail are simpler, the process is simpler.
Professor Trout: If you have really simple animals, then one
could argue that S-R psychology wouldn’t do such a bad job. Two points that I
underplayed were parsimony and the principle of total evidence, which I didn’t
talk about at all. The principle of parsimony has to be applied very
carefully because it is a methodological principle that could be
misapplied if you have a really good theory. People are often
opportunistic in the way that they apply it. For example, I teach at Loyola, so
I often hear the argument that believing in God is actually more parsimonious
than not believing in God, because not believing in God requires a relatively
complex account of the world, whereas monotheism commits you to the existence
of just one entity. It is parsimonious in ontology, but methodologically, it is
pretty compromised. So there are different ways to invoke
parsimony arguments, and you have to be careful when you invoke them. In
the auditorist case, it’s invoked by saying, “all I know is that these quail
respond in the same way that humans do, I’m just a poor boy from the city, it
looks like the same performance function, it’s the same mechanism.” Certainly
people believe that what is simple also has to be true, and so they may be
drawn to simple explanations, but a widely accepted principle of
considering the total available evidence says that you should look at the greatest
diversity of evidence bearing on the hypothesis.
Doctor Studdert-Kennedy: Maybe one way of arguing that there’s more to the
SiS issue than you seem to be raising throughout is the idea that humans get
more or different information out of the speech signal than other animals do. By
analogy, I’m quite sure that you could train quail to distinguish a phrase
of Mozart from a phrase of Bach; in fact, I’m quite sure that if you worked at
it, you could get them to make innumerable distinctions about music, but there
would be no question that they’re not getting the same information out of it as
a human listener, and I think that exactly the same is true for speech. The
amount of information that is sitting in the speech signal, non-semantic,
non-syntactic information, is vastly greater in human comprehension.
Professor Trout: But,
to take the auditorist side, I think the interesting question there is, why
would the information that the quail get out of the speech signal lead them to mark
a boundary in the same way that humans do? If they are getting different
information, or at least a human is getting more information, why would the quail
mark the same boundary?
Doctor Allen: You’re asking the quail to do a one-bit
task, and you find out that they can do a one-bit task, but let them do a
five-bit task. Can they do a five-bit task? If you play nonsense speech
to L1 Chinese and L1 English listeners, they’ll both perform the task quite well if
you pick the phone and feature set from their language, and they won’t perform it
well if you don’t. So how much information you get out of speech depends on
which language is being spoken.
Doctor Studdert-Kennedy: Absolutely, as long as you are human, you
can always get information out of some language, and the fact that you get
different information is not as relevant as the fact that you get the same
class of information.
Doctor Allen: Yes, it’s language, you get knowledge about
what the person means.
Doctor Studdert-Kennedy: No, the information is different, because I
get information about how something is pronounced in English, and I don’t get
that information for Chinese.
Doctor Allen: I’m certainly not saying that people aren’t
getting information from vocal or auditory inputs; it’s just a matter of how
much context they can take advantage of, and how deeply they can process it.
But on this question of SiS, I’m asking: at what point are you saying it’s
special? Certainly at some point, when you get down to particular languages, it’s
clearly special, but my question is still at what point it is supposed to be
special, and I still don’t think I have an answer.
Professor Trout: Well,
there are properties that other animals have that humans don’t, and you could
ask the same question about those. I probably can’t discriminate above 16 kHz,
but crickets apparently can discriminate categorically beyond that.
Doctor Allen: Aren’t you saying that it’s special
at the phonemic level? Your quail example uses simple CV
stimuli, so that must be central to your argument: speech
is special because people can categorize and discriminate CV sounds.
Professor Trout: You could also talk about lexical items,
longer sequences of sounds, or you could talk about parts of the spectrum,
shorter than CV sequences. You could run them on noise-band speech or on
sinewave analogues, you could do any number of things and see how they react.
Just because a sequence is suprasegmental, for example, doesn’t mean that it’s
not a good candidate for a specialization experiment.
Professor Remez: In
the introduction to Acoustic Phonetics,
by Martin Joos, published in 1948, he says something like this: We’ve been stuck with these articulatory
descriptions of the phoneme inventory, at least since Jespersen, and now with
the advent of acoustic analysis technology, it will be possible, finally, to
give a rendition of the phoneme inventory in physical acoustical terms. That
was the opening fanfare of a campaign of research that essentially failed. That
is, the phoneme inventory of no language has been well rationalized by the
distributional states of acoustic properties. When I think about the claim that
Kluender has made on the basis of the quail, it seems directly to contradict at
least 50 years of research showing that the phoneme inventory does not coincide with
auditory categories. In fact, the little model case that they considered with
place variation in quail is a rather selective presentation of an argument that
attempts to refute this tradition, and that is leaving aside all of the localization and
developmental evidence for phonemes. I always took the rude version of the
anti-SiS claim to be this: even
though we can’t say what those particular acoustic properties are, the
quail, having access to no aspect of experience other than
auditory experience, show us that in fact the categories that we call phoneme
categories coincide with auditory categories. I take that to be the claim
opposing SiS, and if you remove their evidence (and I think their evidence is
terrible and generally acknowledged to be terrible), what you are left with is
this problem of incommensurability between the categories of auditory experience and
the categories that language uses to make distinctive contrasts. What’s wrong
with that way of framing the argument?
Professor Trout: Your
question raises another question: are you supposing that the similarity of
performance is an artifact of the training regimen?
Professor Remez: It’s the same thing as when Tinbergen’s
stickleback threatened the postal van. No ethologist would claim that the
stickleback was threatening the postman, even though that was the performance.
It’s essentially a kind of bait and switch, exactly as you claim.
Professor Trout: My
question is this: if the similarity in performance is somewhat more complicated
than a threat or non-threat posture, how do you explain the specific character
of the similarity in performance?
Professor Remez: Why do human psychophysics and quail
psychophysics look the same? Well, if you gave me a bunch of quail and a
function to match, I’d figure out a way to do it. And if you gave me 30,000
trials to do it in, I’d be able to do it there, too. One of the things I tried
to find out was what the test items actually were, but it turned out that the
test items had been lost to the mists of history, so it’s impossible to tell
what the quail actually heard. My guess is that the quail were listening to
apical release bursts, which are an extremely prominent spectral feature, and
that they weren’t hearing place variation at all. So it was an accident. Not
only that, but of course they put labial and palatal place in the same category.
Professor Trout: I
guess there’s another question as well: why they stopped when they did. With
humans in the McGurk effect, you can play the stimuli as long as you want and still
find the effect. It would be interesting to see what would happen to the quail
if you continued training them beyond the 4000 trials required to get them to
90%. You wonder why the organism is exposed for that particular period, because
the identification point at which they stopped doesn’t have human significance.
Doctor Allen: Do you know the Stevens work where they
looked at the auditory nerve of a gerbil using whispered voiced and unvoiced
speech? Even in the auditory nerve, the voicing was very clearly
represented. This is just as important as an F2 transition, if not a lot more
important, and it’s represented early on in very simple creatures. Is that
special?
Professor Trout: It
depends on how specialized humans can become on those tasks as well. For
example, in an ethology experiment where you are looking at non-human animals
responding to some speech contrast, it may be that there’s a significant amount
of brain plasticity that is required for just learning, even though the
learning itself displays highly specialized features. Rather than simply asking,
“Some non-human organism can do this, so is it special when a human can do
it?”, you have to be able to see what the differences are. Part of the question
is whether the organisms are responding to the same features, or whether one
is responding to acoustic correlates and the other to a
linguistic feature.
(Notes prepared by
Jennifer Pardo)