4
___

How Language Evolved from Hand to Mouth

A tongue is a tongue

And a lung is a lung

In a tale you can shout or sing

Without the gesture? Nothing!

—from “Gesticulate” in the 1953 musical Kismet

The 1866 ban by the Linguistic Society of Paris on all discussions of the evolution of language seems to have had a prolonged effect. The main difficulty, it seems, was (and to some extent still is) the widespread belief that language is uniquely human, so that there is no evidence to be gained from the study of nonhuman animals. This meant that language must have evolved some time since the split of the hominins from the great apes. In the nineteenth century, at least, there was little to be gleaned from fossil evidence, and any theory on how language evolved was largely a matter of speculation—and no doubt argument. Of course evolution was itself contentious, and was vigorously attacked by the church. In the case of language, the conflict between science and religion would have been exacerbated by the long-standing view that language was a gift from God.

Noam Chomsky’s view of language, summarized in chapter 2, also leads to a somewhat miraculous view of how language evolved. The basis for all language, he proposes, is I-language, or the language of thought. Since I-language has no external referents, and must precede the evolution of E-languages—the languages we actually speak or sign—it cannot have arisen through natural selection. It must therefore have emerged as a singular event, in a single individual. In chapter 2, we were introduced to this individual as Prometheus. Chomsky also remarks, “Roughly 100,000+ years ago … there were no languages,”1 which suggests that Prometheus lived after the actual emergence of our species, presumably in Africa.

The view that language emerged de novo in Homo sapiens has been elaborated by others, and is sometimes called the “big bang” theory. Derek Bickerton once wrote that “true language, via the emergence of syntax, was a catastrophic event, occurring within the first few generations of Homo sapiens sapiens.”2 The idea that language emerged very recently is based in part on the view that so-called modern human behavior, as inferred from the archaeological record and presumed to include language, arose within the past 100,000 years, and perhaps as recently as 50,000 years ago. Richard Klein, for example, writes that it becomes “at least plausible to tie the basic behavioral shift at 50 ka to a fortuitous mutation that created the fully modern brain.”3 Timothy Crow has proposed that a genetic mutation gave rise to the speciation of Homo sapiens, along with such uniquely human attributes as language, cerebral asymmetry, theory of mind, and a vulnerability to psychosis.4 That was some bang.

From a Darwinian perspective this view is deeply implausible. Steven Pinker and Paul Bloom were perhaps the first to question the idea that language emerged in a singular event. Language, they observed, is complex, and “The only successful account of the origin of complex biological structure is the theory of natural selection, the view that the differential reproductive success associated with heritable variation is the primary organizing force in the evolution of organisms.”5 They go on to point out that the emergence of complex structure through natural selection is gradual: “The only way for complex design to evolve is through a sequence of mutations with small effects.”6 On a priori grounds, then, it seems highly unlikely that language was the product of a single mutation in some lone Prometheus.

As we saw in the previous chapter, there is still general agreement that the critical ingredient distinguishing most human languages from other forms of animal communication is recursive grammar. Even this ingredient need not have appeared suddenly and fully formed in our own species, and indeed, as we saw in chapter 2, there may be languages such as that of the Pirahã that do not make use of recursion. The concept of grammaticalization, also discussed in chapter 2, suggests scenarios in which grammar unfolds gradually, rather than in a single step. Instead of supposing that it all happened within the past 100,000 years, it seems much more reasonable to suppose that grammatical language evolved slowly—and variably—over the six or seven million years since the hominins parted company with the chimpanzee line, although I suggest below that it was probably the last two million years that were critical. This must still be regarded as something of a big bang in evolutionary terms, but does give the theorist more breathing space to develop a plausible evolutionary scenario. But of course at least some of the other ingredients of language were no doubt present in our primate forebears, and to understand how language evolved we need to reach back in time to these forebears, then forward to hominin evolution, and then try to determine what gave language its power of limitless expression.

In the previous chapter, I showed that the nearest equivalents of language in nonhuman primates lie in manual systems rather than in vocal calls. Manual activity in primates is intentional and subject to learning, whereas vocalizations appear to be largely involuntary and fixed.7 In attempts to teach language to great apes, much greater success has been achieved through gesture and the use of keyboards than through vocalization, and the bodily gestures of apes in the wild are less constrained by context than are their vocalizations. These observations strongly suggest that language evolved from manual gestures.

The Gestural Origins of Language

The theory that language evolved from manual gestures has a long but checkered history. An early advocate was the eighteenth-century French philosopher Abbé Étienne Bonnot de Condillac. He was interested in how language evolved, but as a priest he was on dangerous ground, since the theological view was that language was a gift from God. In order to express his own heretical view he therefore had to present it as a fable.8 He imagined two abandoned children, a boy and a girl, who had not yet acquired language and were wandering about in the desert after the Flood. In order to communicate they used manual gestures. If the boy wanted something out of his reach, “He did not confine himself to cries or sounds only; he used some endeavors to obtain it, he moved his head, his arms, and every part of his body.” These movements were understood by his companion, who was then able to help. Eventually there grew “a language which in its infancy, probably consisted only in contortions and violent agitations, being thus proportioned to the slender capacity of this young couple.”9

The story goes on to explain how articulated sounds came to be associated with gestures, but “the organ of speech was so inflexible that it could not articulate any other than a few simple sounds.”10 Eventually, though, the capacity to vocalize increased, and “appeared as convenient as the mode of speaking by action; they were both indiscriminately used; till at length articulate sounds became so easy, that they absolutely prevailed.”11

On the surface, at least, this story is not about the evolution of language, but is rather about how two stranded children developed a way of communicating. It is likely, though, that Condillac really intended it to be the story of how language evolved as a human faculty, and it was remarkably prescient.

The idea that language evolved from manual gestures has since been proposed many times, although not always accepted. Condillac’s near contemporary, Jean-Jacques Rousseau, evidently unfazed by religious prohibitions, endorsed the gestural theory more openly in an essay published in 1782. Charles Darwin at least pointed (as it were) to it: “I cannot doubt that language owes its origins to the imitation and modification of various natural sounds, and man’s own distinctive cries, aided by signs and gestures.”12 In 1900, Wilhelm Wundt, the founder of the first laboratory of experimental psychology at Leipzig in 1879, wrote a two-volume work on speech, and argued that a universal sign language was the origin of all languages.13 He wrote, though, under the misapprehension that all deaf communities use the same system of signing, and that signed languages are useful only for basic communication, and are incapable of communicating abstract ideas. We now know that signed languages vary widely from community to community, and can have all of the communicative sophistication of speech.

The British neurologist MacDonald Critchley lamented that his book The Language of Gesture coincided with the outbreak of World War II and was therefore largely ignored, so he wrote a second book called Silent Language, which was published in 1975. “Gesture,” he wrote, “is full of eloquence to the sagacious and vigilant onlooker who, holding the key to its interpretation, knows how and what to observe.”14 Critchley was ambivalent about whether language originated in gesture, being at one point unable to accept that language could have been at one time gestural and voiceless, but later arguing that gesture must have predated speech in human evolution.

Perhaps the first comprehensive case for the gestural theory of language origins in modern times was presented by the anthropologist Gordon W. Hewes in an article published in 1973. Hewes was partly motivated by the discovery that great apes could not be taught to speak, but were reasonably successful at using signs to communicate. The gestural theory was strengthened by the work of Ursula Bellugi and Edward S. Klima, revealing American Sign Language (ASL) to be a full language, affected by specific brain injury in very much the same way that spoken language is.15 The gestural theory then seemed to lie dormant for a while. I picked it up in my 1991 book The Lopsided Ape, which was shortly followed by the 1995 book Gesture and the Nature of Language, by David F. Armstrong, William C. Stokoe, and Sherman E. Wilcox, who approached it from the perspective of signed languages. This was followed by Armstrong’s book Original Signs in 1999. I elaborated my own views in my 2002 book, From Hand to Mouth, and in the previous year William C. Stokoe published Language in Hand, with the self-explanatory subtitle Why Sign Came before Speech.16

The gestural theory received a powerful boost, though, with the remarkable discovery of so-called mirror neurons in the primate brain.

Mirror Neurons

In 2000, the neuroscientist Vilayanur Ramachandran famously remarked that mirror neurons would do for psychology what DNA has done for biology17—a remark that is in danger of being quoted almost as often as mirror neurons themselves are invoked. Mirror neurons were discovered in the monkey brain by the Italian scientist Giacomo Rizzolatti and his colleagues at the University of Parma. The activity of these neurons was recorded from electrodes inserted into a part of the frontal cortex called area F5. They form a subset of a class of neurons that are active when the monkey makes an intentional movement of the hand, such as reaching to grasp an object, like a peanut. To Rizzolatti’s initial surprise, some of these neurons also responded when the monkey observed another individual (such as the researcher) making the same movement. These are the neurons dubbed “mirror neurons,” because perception is mirrored in action. They have also been called “monkey see, monkey do” neurons.

The idea that mirror neurons may have set the stage for the eventual evolution of language was first set out by Michael Arbib and Giacomo Rizzolatti.18 The main points are as follows.

First, area F5 is homologous to an area in the human brain known as Broca’s area, which plays a critical role in speech and language. More precisely, Broca’s area can be divided into two areas, known as Brodmann areas 44 and 45, and area 44 is considered the true analogue of area F5. In humans, it is now evident that area 44 is involved not only in speech, but also in motor functions unrelated to speech, including complex hand movements, and sensorimotor learning and integration.19 In the course of human evolution, then, it seems that vocalization must have been incorporated into the system, which explains why language can be either vocal, as in speech, or manual, as in signed languages.20

Second, mirror neurons are now understood to be part of a larger network, called the mirror system. In the monkey it includes, besides area F5, more posterior areas such as the superior temporal sulcus and the inferior parietal lobule.21 This system largely overlaps the corresponding regions in the human brain that have to do with the more general functions of language. Besides Broca’s area, these regions include the other well-known language area, Wernicke’s area, in the posterior part of the superior temporal sulcus, although language areas are probably distributed more widely than these two classic areas.22 The overlap has led to the notion that language grew out of the mirror system itself, an idea developed in some detail by Michael Arbib.23

Figure 7. Sites of Broca’s area and motor cortex in the human brain (left), and of mirror neurons and motor cortex in the macaque brain (right). Copyright © 2010 W. Tecumseh Fitch. Reprinted with the permission of Cambridge University Press.

Third, Rizzolatti and colleagues proposed that the mirror system in the monkey is in essence a system for understanding action. That is, the monkey understands the actions of others in terms of how it would itself perform those actions. This is the basic idea underlying what has been called the motor theory of speech perception, which holds that we perceive speech, not in terms of the acoustic patterns it creates, but in terms of how we ourselves would articulate it. This theory arose from the work of Alvin Liberman and others at the Haskins Laboratories in the United States, who sought the acoustic principles underlying the basic units of sound that make up our speech.24 For example, b sounds in words like battle, bottle, beer, bug, rabbit, Beelzebub, or flibbertigibbet probably sound much the same to you, but the actual acoustic streams created by these b sounds vary widely, to the point that they have virtually nothing in common.25 The same is true of other speech sounds, especially the plosive sounds d, g, p, t, and k; the acoustic signals vary widely depending on the contexts in which they are embedded. Liberman and colleagues concluded that we hear each sound as the same in each case because we “hear” it in terms of how we produce it.

Fourth, we saw in the previous chapter that vocalization in primates seems to be largely involuntary and, for the most part at least, impervious to learning. The mirror system, in contrast to the primate vocalization system, has to do with intentional action, and is clearly modifiable through experience. For example, mirror neurons in the monkey brain respond to the sounds of certain actions, such as the tearing of paper or the cracking of nuts,26 and these responses can only have been learned. The neurons were not activated, though, by monkey calls, suggesting that vocalization itself is not part of the mirror system in monkeys. In our forebears, then, the mirror system was already set up for processing the sounds caused by manual activity, but not for the processing of vocal sounds.

Of course, nonhuman primates do not have language as we know it, but the mirror system provided a natural platform for language to evolve. Indeed the mirror system is now well documented in humans, and involves characteristics that are more language-like than those in the monkey. For example, in the monkey, mirror neurons respond to transitive acts, as in reaching for an actual object, but do not respond to intransitive acts, where a movement is mimed and involves no object.27 In humans, in contrast, the mirror system responds to both transitive and intransitive acts, and the incorporation of intransitive acts would have paved the way to the understanding of acts that are symbolic rather than object-related.28 More directly, though, functional magnetic resonance imaging (fMRI) shows that the mirror-neuron region of the human premotor cortex is activated not only when people watch movements of the foot, hand, and mouth, but also when they read phrases pertaining to these movements.29 Somewhere along the line, the mirror system became interested in language.

An Evolutionary Scenario

Let us suppose, then, that intentional communication grew out of action understanding in our primate forebears. In the previous chapter, we saw that a number of great apes have learned language-like gestures, suggesting that the incorporation of intransitive gesture had already occurred in our great-ape forebears. These gestures lack grammar, and are therefore not true language, but it is reasonable to suppose that similar activity was a precursor to language in our hominin forebears who separated from the line leading to chimpanzees and bonobos some six or seven million years ago.

Unlike their great-ape cousins, the hominins were bipedal, which would have freed the hands for the further development of expressive manual communication. The body and hands are free to move in four dimensions (three of space and one of time), and so mimic activity in the external world. The hands can also assume, at least approximately, the shapes of objects or animals, and the fingers can mimic the movement of legs and arms. The movements of the hands can also mimic the movement of objects through space, and facial expressions can convey something of the emotions of events being described. Mimesis persists in dance, ballet, and mime, and we all resort to mime when trying to communicate with people who speak a language different from our own. Once, in Russia, I was able to successfully request a bottle opener by miming the action of opening a beer bottle, to the vast amusement of the people at the hotel desk.

Although predominantly bipedal, the early hominins were still partially adapted to arboreal life, and walked only clumsily on two legs. This is known as facultative bipedalism. Merlin Donald, in his 1991 book Origins of the Modern Mind, suggested that what he called “mimetic culture” did not evolve until the emergence of Homo ergaster from around two million years ago. In H. ergaster and the later members of the genus Homo, moreover, bipedalism shifted from facultative to obligate—that is, it became obligatory, and assumed a more free-striding gait. More critically, perhaps, brain size began to increase dramatically with the emergence of the genus Homo, which might be taken as evidence of selection for more complex communication, and perhaps the beginnings of grammar.

Even in modern humans, mimed action activates the brain circuits normally thought of as dedicated to language. In one experiment, for example, brain activity was recorded while people attended to video clips of a person performing pantomimes of actions, such as threading a needle, or what are called emblems, such as lifting a finger to the lips to indicate quiet. Activity was also recorded while the subjects gave spoken descriptions of these actions. All three tasks elicited activity in the left side of the brain in frontal and posterior areas—including Broca’s and Wernicke’s areas—that have been identified since the nineteenth century as the core of the language system. The authors of this study conclude that these areas have to do, not just with language, but with the more general linking of symbols to meaning, whether the symbols are words, gestures, images, sounds, or objects.30

We also know that the use of signed language in the profoundly deaf activates the same brain areas that are activated by speech,31 and indeed modern sign languages are also partly dependent on mime. It has been estimated, for example, that in Italian Sign Language some 50 percent of the hand signs and 67 percent of the bodily locations of signs stem from iconic representations, in which there is a degree of spatiotemporal mapping between the sign and its meaning.32 In American Sign Language, too, some signs are arbitrary, but many more are iconic. For example, the sign for erase resembles the action of erasing a blackboard, and the sign for play piano mimics the action of actually playing a piano.33 But of course signs need not be transparently iconic, and the meanings of even iconic symbols often cannot be guessed by naïve observers, or even by those using a different sign language.34 They also tend to become less iconic and more arbitrary over time, in the interests of speed, efficiency, and grammatical constraints. This process is known as conventionalization.35

The Swiss linguist Ferdinand de Saussure wrote of the “arbitrariness of the sign” as a defining property of language,36 and on this basis it is sometimes supposed that signed languages, with their strong basis in iconic representations, are not true languages. Although most words in spoken languages are indeed arbitrary—the words cat and dog in no way resemble those friendly animals or the sounds that they make—there are of course some words that are onomatopoeic. One such word is zanzara, which is the evocative Italian word for mosquito, and Steven Pinker notes a number of newly minted examples: oink, tinkle, barf, conk, woofer, tweeter.37 We should perhaps add twitter. Speech can also mimic visual properties in subtle ways; for example, it has been shown that speakers tend to raise the pitch of their voice when describing an object moving upwards, and lower it in describing a downward movement.38 The arbitrariness of words (or morphemes) is not so much a necessary property of language, though, as a matter of expedience, and of the constraints imposed by the particular language medium.

Figure 8. Signs for TREE in different sign languages. Although all are fundamentally iconic, they differ markedly. Iconic representations are not always transparent to a viewer not conversant with the language (author’s drawing).

Speech, for example, requires that the information be linearized, piped into a sequence of sounds that are necessarily limited in terms of how they can capture the spatial and physical natures of what they represent. The linguist Charles Hockett put it this way:

When a representation of some four-dimensional hunk of life has to be compressed into the single dimension of speech, most iconicity is necessarily squeezed out. In one-dimensional projections, an elephant is indistinguishable from a woodshed. Speech perforce is largely arbitrary; if we speakers take pride in that, it is because in 50,000 years or so of talking we have learned to make a virtue of necessity.39

Signed languages are clearly less constrained. The hands and arms can mimic the shapes of real-world objects and actions, and to some extent lexical information can be delivered in parallel instead of being forced into rigid temporal sequence. With the hands, it is almost certainly possible to distinguish an elephant from a woodshed, in purely visual terms. Even so, conventionalization allows signs to be simplified and speeded up, to the point that many of them lose most or all of their iconic aspect. For example, in American Sign Language the sign for home was once a combination of the sign for eat, which is a bunched hand touching the mouth, and the sign for sleep, which is a flat hand on the cheek. Now it consists of two quick touches on the cheek, both with a bunched handshape, so the original iconic components are effectively lost.40

Although signing has been recorded from as early as 360 BC,41 modern signed languages have short pedigrees, arising independently among different deaf communities. This somewhat contaminates the comparison between signed and spoken languages, which have evolved, albeit with modification and divergence, over tens of thousands of years. One interesting exception is Turkish Sign Language, which has highly schematized morphology and an exceptionally large proportion of arbitrary, noniconic signs.42 Turkish Sign Language may go back over 500 years. Visitors to the Ottoman court in the sixteenth century observed that mute servants, most of them deaf, were favored in the court, probably because they could not be bribed for court secrets. These servants developed a sign language, which was also acquired by many of the courtiers. A photograph published in 1917 shows two servants still using sign language. It is not known for sure whether modern Turkish Sign Language is related to that of the Ottoman court, but if it is, it supports the view that the passage of time is the critical element in the loss of iconic representation.43

Language, then, may have evolved from mime, with the arbitrary nature of some words or signs deriving from the drive to greater economy and from the constraints of the medium through which language is expressed. Conventionalization of course depends on the power of the brain to form associations, since the iconic or onomatopoeic component that may serve to indicate meaning is effectively lost. We know from the studies of Kanzi and the other “linguistic apes” that the ability to form such associations is not unique to humans, although the human capacity to do so may far outstrip that of the ape. Kanzi has learned a few hundred symbols, but the average literate human has a vocabulary on the order of 50,000 words, most of them neither iconic nor onomatopoeic.44 This may be one reason why the human brain is some three times larger relative to body size than that of the great apes.45 We simply need a much larger dictionary.

Once conventionalization sets in, there is no reason why language need be restricted to the visual domain. Spoken words will do as well as signed ones. But this of course raises the question of why language switched, at least in the majority of humans, from manual gesture to speech.

The Switch

This indeed has proven to be the most contentious issue for the gestural theory. The linguist Robbins Burling, for example, writes

[T]he gestural theory has one nearly fatal flaw. Its sticking point has always been the switch that would have been needed to move from a visual language to an audible one.46

In another recent book, Peter F. MacNeilage expresses similar concerns,47 but I will argue that the switch was probably a relatively simple and natural one.

The first step from manual gesture to speech may have been the incorporation of facial gestures. Even in the monkey, manual and facial gestures are closely linked neurophysiologically, as well as behaviorally.48 Some neurons in area F5 fire when the animal makes movements to grasp an object with either the hand or the mouth. An area in the monkey brain that is considered homologous to Broca’s area is involved in control of the orofacial musculature—though not of speech itself.49 These neural links between hand and mouth may be related to eating rather than communication, perhaps involved in preparing the mouth to hold an object after the hand has grasped it, but later adapted for gestural and finally vocal language.

The connection between hand and mouth can also be demonstrated in human behavior. In one experiment, people were instructed to open their mouths while grasping objects with their hands, and the size of the mouth opening increased with the size of the grasped object. Conversely, when they opened their hands while grasping objects with their mouths, the size of the hand opening also increased with the size of the object.50 Grasping movements of the hand also affect the way we utter sounds. If people are asked to say ba while grasping an object, or even while watching someone else grasping an object, the syllable itself is affected by the size of the object grasped. The larger the object, the wider the opening of the mouth, with consequent effects on the speech formants.51 These effects can also be observed in one-year-old infants.52

The link between hand and mouth suggests that early communicative gestures may have involved facial gestures as well as gestures of the hands. Indeed, speech itself retains a strong visual component. This is illustrated by an effect attributed to the psychologist Harry McGurk,53 in which a speech sound is dubbed onto film of a mouth saying something different; the viewer/listener often reports what the speaker is seen to be saying rather than the speech sound itself, or sometimes a blend of the two. Deaf people often become highly proficient at lipreading, and even in individuals with normal hearing the brain areas involved in speech production are activated when they view speech-related lip movements.54 Ventriloquists project their own voices onto the face of a dummy by synchronizing the mouth movements of the dummy with their own pursed-lipped utterances.

The signed languages of the deaf also depend at least as much on movements of the face as of the hands. Facial expressions and head movements can alter the mood or polarity of a sentence. For example, a question in American Sign Language is signaled by raising the eyebrows, and shaking the head while signing a sentence turns it from affirmative to negative. Movements of the mouth are especially important, to the point that linguists are beginning to identify the rules that govern the formation of mouthed signs.55 Mouth gestures can also serve to disambiguate hand gestures, and as part of more general facial gestures provide the visual equivalent of prosody in speech.56 Recordings of eye movements show that users of British Sign Language watch the face more often than they watch the hands or body when watching a person signing a story.57

The first part of the switch, then, was probably the increasing involvement of the face. The idea that movements of the face played a role in the evolution of language was anticipated by Friedrich Nietzsche in his 1878 book Human, All Too Human. Aphorism 216 from that book reads in part as follows:

Imitation of gesture is older than language, and goes on involuntarily even now, when the language of gesture is universally suppressed, and the educated are taught to control their muscles. The imitation of gesture is so strong that we cannot watch a face in movement without the innervation of our own face (one can observe that feigned yawning will evoke natural yawning in the man who observes it). The imitated gesture led the imitator back to the sensation expressed by the gesture in the body or face of the one being imitated. This is how we learned to understand one another; this is how the child still learns to understand its mother. In general, painful sensations were probably also expressed by a gesture that in its turn caused pain (for example, tearing the hair, beating the breast, violent distortion and tensing of the facial muscles). Conversely, gestures of pleasure were themselves pleasurable and were therefore easily suited to the communication of understanding (laughing as a sign of being tickled, which is pleasurable, then served to express other pleasurable sensations).

As soon as men understood each other in gesture, a symbolism of gesture could evolve. I mean, one could agree on a language of tonal signs, in such a way that at first both tone and gesture (which were joined by tone symbolically) were produced, and later only the tone.

This remarkable extract also anticipates the discovery of the mirror system.

The final act, then, was the incorporation of vocalization. Part of the reason for this may have been that facial gestures increasingly involved movements of the tongue that are invisible, and activation of the vocal cords simply made these invisible gestures accessible to the listener. Speech might be described as facial gesture half swallowed, with sound added. Along with the incorporation of vocal sound, the vocal tract itself changed, and control of the tongue was enhanced, enabling a wider diversity of sounds. The sound itself could be turned on or off to create the distinction between voiced sounds, such as the consonants /b/, /d/, and /g/, and their unvoiced equivalents, /p/, /t/, and /k/.

Speech itself can be viewed as gestural rather than acoustic.58 The motor theory of speech perception, described above, is based on the idea that perceiving speech sounds depends on the mapping of those sounds on to the articulatory gestures that produce them. This has led to what is known as articulatory phonology,59 in which speech is understood as gestures produced by six articulatory organs, the lips, the velum, the larynx, and the blade, body, and root of the tongue. In the context of speech understood as gesture, then, the incorporation of vocal gestures into the mirror system may have been a relatively small step for mankind.

But how and when did it happen?

FOXP2

A possible clue comes from genetics. About half of the members of three generations of an extended family in England, known as the KE family, are affected by a disorder of speech and language. The disorder is evident from the affected child’s first attempts to speak and persists into adulthood.60 The disorder is now known to be due to a point mutation on the FOXP2 gene (forkhead box P2) on chromosome 7.61 For normal speech to be acquired, you need two functional copies of this gene.

The FOXP2 gene, then, may have played a role in incorporating vocalization into the mirror system, thus allowing speech to develop as an intentional, learnable system.62 As we saw earlier, one of the main brain centers for the production of speech is Broca’s area, which is also part of the mirror system. Typically, Broca’s area is activated when people generate words. A brain-imaging study revealed, though, that the members of the KE family affected by the mutation, unlike their unaffected relatives, showed no activation in Broca’s area while silently generating verbs.63 The FOXP2 gene may also play a role in other species. In songbirds, knockdown of the FOXP2 gene impairs the imitation of song,64 and insertion of the FOXP2 point mutation found in the KE family into the mouse critically alters synaptic plasticity and motor learning.65 On the other hand, insertion of the normal human variant of FOXP2 into the mouse gives rise to qualitatively different ultrasonic vocalizations.66 The role of FOXP2 in foxes is unknown.

FOXP2 is highly conserved in mammals, and in humans the gene differs in only three places from that in the mouse. Nevertheless it underwent two mutations since the split between hominin and chimpanzee lines. According to one theoretical estimate, the more recent of these occurred “not more than” 100,000 years ago,67 although the error associated with this estimate makes it not unreasonable to suppose that it coincided with the emergence of Homo sapiens around 170,000 years ago. This conclusion is seemingly contradicted, though, by recent evidence that the mutation is also present in the DNA of a 45,000-year-old Neandertal fossil,68 suggesting that it goes back at least 700,000 years to the common ancestor of humans and Neandertals.69

But this is challenged in turn by a more recent phylogenetic dating of the haplotype, in which the time of the most recent common ancestor carrying the FOXP2 mutation was estimated at 42,000 years ago, with 95 percent confidence that it occurred between 38,000 and 45,500 years ago.70 Even allowing for distortions in assumptions underlying this estimate, it is much more consistent with the earlier molecular-based estimate than with that based on the Neandertal fossil. It is possible that the Neandertal DNA was contaminated, or that there was some degree of inbreeding between Neandertal and Homo sapiens, who overlapped in Europe for some 20,000 years. Recent evidence suggests that microcephalin, a gene involved in regulating brain size, may have entered the human gene pool through interbreeding with Neandertals,71 so the reverse possibility of FOXP2 entering the late Neandertal gene pool from Homo sapiens is not completely ruled out. We might have been slightly chummier with the Neandertals than is generally thought.72

It may well be that the FOXP2 gene holds the secret of the emergence of speech. It may nevertheless be a chimera, since it is known to be involved in a wide range of areas, not only in the brain, but also in the heart, lung, and gut.73 Moreover, the mutated gene in the KE family is not the same as that in the last common ancestor of chimpanzee and human. Even so, it continues to attract interest as a likely clue to how speech evolved.74

Anatomical Changes

Be that as it may, there is other evidence that the final touches making autonomous speech possible may not have been complete in the Neandertal. One requirement for articulate speech was the lowering of the larynx, creating a right-angled vocal tract that allows us to produce the wide range of vowels that characterize speech. Philip Lieberman has argued that this modification was incomplete even in the Neandertals, a species of Homo with brains as large as those of Homo sapiens75—slightly larger, in fact, but let’s not go there. We share a common ancestry with the Neandertals going back some 700,000 years, but parted company some 370,000 years ago before again sharing territory in Europe from around 50,000 years ago.76 The Neandertals were driven to extinction around 30,000 years ago, suggesting that we humans may have talked them out of existence—perhaps a more palatable notion, as it were, than that we simply slaughtered them.

Lieberman’s views on this are controversial.77 In direct contradiction, Ian Tattersall writes that “a vocal tract had … been achieved among humans well over half a million years before we have any independent evidence that our forebears were using language or speaking.”78 The emergence of language, in Tattersall’s view as in Chomsky’s, was thus a later event, unrelated to the capacity for speech. Lieberman has nevertheless received support from his son Daniel Lieberman, who has shown that the structure of the cranium underwent changes after we split with the Neandertals. One such change is the shortening of the sphenoid, the central bone of the cranial base from which the face grows forward, resulting in a flattened face.79 This flattening may have been part of the change that created the right-angled vocal tract, with horizontal and vertical components of equal length.80 This is the modification that allowed us the full range of vowel sounds, from ah to oo. Another adaptation unique to H. sapiens is neurocranial globularity, defined as the roundness of the cranial vault in the sagittal, coronal, and transverse planes,81 which is likely to have increased the size of the temporal and frontal lobes relative to other parts of the brain.

Other anatomical evidence suggests that the requirements for fully articulate speech were probably not complete until late in the evolution of Homo. For example, the hypoglossal canal is much larger in humans than in great apes, suggesting that the hypoglossal nerve, which innervates the tongue, is also much larger in humans, perhaps reflecting the importance of tongue gestures in speech. The evidence suggests that the size of the hypoglossal canal in early australopithecines, and perhaps in Homo habilis, was within the range of that in modern great apes, while that of the Neandertal and early H. sapiens skulls fell well within the modern human range,82 although this has been disputed.83 A further clue comes from the finding that the thoracic region of the spinal cord is relatively larger in humans than in nonhuman primates, probably because breathing during speech involves extra muscles of the thorax and abdomen. Fossil evidence indicates that this enlargement was not present in the early hominins or even in Homo ergaster, dating from about 1.6 million years ago, but was present in several Neandertal fossils.84

Emboldened by such evidence, and no doubt heartened by familial support, Philip Lieberman has recently made the radical claim that “fully human speech anatomy first appears in the fossil record in the Upper Paleolithic (about 50,000 years ago) and is absent in both Neandertals and earlier humans.”85 This provocative statement suggests that articulate speech emerged even later than the arrival of Homo sapiens some 150,000 to 200,000 years ago. While this may be an extreme conclusion, the bulk of evidence does suggest that autonomous speech emerged very late in the human repertoire. Perhaps the critical question is whether the capacity to speak was unique to humans, or whether we shared it with the Neandertals. I return to this question in chapter 12.

Why the Switch?

It has become clear that the signed languages of the deaf have all of the linguistic sophistication of spoken languages. At Gallaudet University, in Washington, DC, instruction is entirely in American Sign Language, and includes all the usual academic disciplines, even poetry. In some respects, signed languages may have an advantage, since the greater iconic component can provide extra clues to meaning.

This aside, then, the advantages of speech over manual language are likely to be practical rather than linguistic. Let’s consider what these advantages might be.

Spatial Reach

Sound reaches into areas inaccessible to sight. You can talk to a person who is hidden from sight, whereas signed language requires visual contact. This has the important advantage of allowing communication at night, especially in times before artificial lighting, save perhaps for the campfire. The San, a modern hunter-gatherer society, are known to talk late at night, sometimes all through the night, to resolve conflict and share knowledge.86 Mary Kingsley, the noted British explorer of the late nineteenth century, made the following observation of tribes she encountered in Africa:

African languages [are not elaborate enough] to enable a native to state his exact thought. Some of them are very dependent upon gesture. When I was with the Fans they frequently said “We will go to the fire so that we can see what they say,” when any question had to be decided after dark, and the inhabitants of Fernando Po, the Bubis, are quite unable to converse with each other unless they have sufficient light to see the accompanying gestures of the conversation.87

While this may seem condescending, it may well be the case that some cultures make more use of gesture than others,88 through cultural rather than biological necessity.

Vocal language can carry over longer distances than signed language, but its reach is not simply one of distance. You can talk to someone in the next room, or even while the person’s eyes are closed. Many an intelligent question after a talk has come from a member (usually elderly) of the audience who has appeared to be asleep throughout the talk. Signed language requires that the recipient is actually looking at the signer, whereas speech enters the ears regardless of how the listener is oriented or placed relative to the speaker. We speak of the line of sight, but sound travels in an envelope—and a three-dimensional envelope at that. Speech works over longer distances than signing, and can more readily command attention. You can wake a sleeping audience by shouting, but no amount of silent gesture will do the trick.

Speech gains these physical advantages with little cost compared to gesture. Teachers of sign language have been known to report needing regular massages in order to meet the sheer physical demands of sign-language expression. In contrast, the physiological costs of speech are so low as to be nearly unmeasurable.89 In terms of expenditure of energy, speech adds little to the cost of breathing, which we must do anyway to sustain life.

Diversity and the Language Fortress

In chapter 2, I noted that there is huge diversity in the world’s languages, allowing language to serve as a badge of a culture’s distinctiveness. There may also have been pressure to establish languages that were impenetrable to outsiders, thereby enhancing group membership and keeping out intruders and freeloaders. The absence of any iconic component in sound-based languages reduces penetrability, and the sheer diversity of possible sound-based systems adds further to the fortress-like nature of language. Although there are diverse sign languages, the diversity is almost certainly tiny compared with that possible in the deployment of speech sounds, even allowing for the fact that existing signed languages have a much shorter pedigree.

The impenetrability of speech is well illustrated by an anecdote from World War II. The outcome of the war in the Pacific hinged to a large extent on the ability of each side to crack codes developed by the other side. Early in the war, the Japanese were easily able to crack the codes developed by the Allies, but the American military then developed a code that proved essentially impenetrable. They simply employed Navajo speakers to communicate openly in their own language via walkie-talkie. To the Japanese, Navajo simply sounded like a “strange, gurgling” sound, unrecognizable even as language.90

Nicholas Evans asserts that there are well over 1,500 possible speech sounds, but no language uses more than 10 percent of them for its inventory of phonemes. As we have seen, phonemes do not map exactly onto actual sounds, but some measure of the variation in sounds used can be gained by comparing the number of phonemes identified for different languages. The most parsimonious of languages appears to be that spoken by women of the Pirahã, the small hunter-gatherer Amazonian tribe in Brazil that we encountered in chapter 2. The Pirahã language as spoken by women has only seven consonants and three vowels.91 As though to compensate for a male lack of verbal fluency, the men are permitted eight consonants and three vowels; with 11 phonemes, they tie with Hawaiian for the second most parsimonious language.92 New Zealand Maori has only 14 phonemes, but Maori are known for fine oratory. In Maori society, as in other traditionally oral societies, speech implies power and status; the New Zealand scholar Anne Salmond writes that, among the Maori, “oratory is the prime qualification for entry into the power game.”93 The most phonologically diverse language may be !Xóõ, spoken by about 4,000 people in Botswana and Namibia, which has somewhere between 84 and 159 consonants.94 English bumbles along with about 44 phonemes.
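
The scale of the diversity available to sound-based systems is easy to underestimate. As a back-of-envelope sketch of my own (an illustration, not a calculation from Evans or any other source cited here), take Evans’s figure of roughly 1,500 possible speech sounds and an English-sized inventory of about 44 phonemes, and count how many distinct inventories could in principle be assembled:

```python
# Back-of-envelope sketch: how many distinct phoneme inventories could be
# assembled from the pool of possible speech sounds? The input figures come
# from the text (Evans's ~1,500 possible sounds; English's ~44 phonemes);
# the combinatorial calculation itself is purely illustrative.
import math

POSSIBLE_SOUNDS = 1500  # Evans: "well over 1,500 possible speech sounds"
INVENTORY_SIZE = 44     # roughly the English phoneme count

inventories = math.comb(POSSIBLE_SOUNDS, INVENTORY_SIZE)
print(f"Distinct {INVENTORY_SIZE}-sound inventories: about 10^{len(str(inventories)) - 1}")
# Prints a number on the order of 10^85 -- ample room for mutually
# impenetrable sound systems, quite apart from phonotactics and prosody.
```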

Different languages, then, can in principle use inventories of sounds that are totally distinct, or very nearly so. Evans gives the example of the |Gui language of Botswana, which includes a large number of click sounds unpronounceable to those who don’t speak the language. Conversely, a Japanese colleague of Evans found the |Gui quite unable to pronounce her own name, which is Mimmi—straightforward enough to speakers of Western languages. Even within Western languages, there are sounds in one language that speakers of other languages cannot produce. Japanese people have difficulty distinguishing between l- and r-sounds, creating difficulties with words like parallel or Dylan Thomas’s Llareggub.95 Speakers of English have comparable difficulty distinguishing the different t-sounds of Hindi, and I never could quite get the precise form of the French r-sound—aarrghh!

Paradoxically, none of the sounds used in speech poses a problem to children exposed from an early age. There is no reason to suppose a child born to a |Gui family but raised from birth in New York would have any difficulty learning to speak New York English, which to me sounds only slightly peculiar, and would no doubt eventually lose any ability to learn |Gui. On the other hand, Woody Allen would have easily learned |Gui had he been exposed to it from infancy. It has been claimed that very young infants can discriminate between many, if not all, of the phonemes of the world’s languages,96 but by the age of about one they can discriminate only the phonemes of the language or languages they have been exposed to.

Little babies have the potential to crack the language barrier erected by different groups, but by the time we are adults, language makes outsiders of us all.

Freeing the Hands

Charles Darwin wrote: “We might have used our fingers as efficient instruments [of communication], for a person with practice can report to a deaf man every word of a speech rapidly delivered at a public meeting; but the loss of our hands, while thus employed, would have been a serious inconvenience.”97 Darwin made this point to account for why we evolved speech rather than signed language, but the argument holds equally as an explanation for a switch from manual gesture to speech.

The switch, then, would have freed the hands for other activities, such as carrying and manufacture. It also allows people to speak and use tools at the same time. It might be regarded, in fact, as an early example of miniaturization, whereby gestures are squeezed from the upper body to the mouth. It also allows the development of pedagogy, enabling us to explain skilled actions while at the same time demonstrating them, as in a modern television cooking show.98 The freeing of the hands and the parallel use of speech may have led to significant advances in technology,99 and help explain why humans eventually predominated over other large-brained hominins, including the Neandertals, who died out some 30,000 years ago. I take up this theme in more detail in chapter 12, but suffice to say here that our newfound manual freedom may have been primarily responsible for the evolution of what anthropologists call “modernity,” which makes us so markedly different from all of our hominin cousins.

And Still We Gesture

Of course there are also advantages to visual language over vocal language. Vocal language is denied to those unable to hear or to speak, and signed languages form a natural substitute. Visual language still permits an iconic element, and most people resort to gesture, or drawing, when trying to communicate with those who speak a different language, or even when trying to explain spatial concepts, such as a spiral. Some sort of manual gesture is necessary even for the acquisition of speech; in learning the names of objects, for example, there must be some means of indicating which object has which name. Even adults gesture as they speak, and their gestures can add a significant component of meaning.100 For example, people regularly point to indicate directions: He went that way …, accompanied by pointing. Language has still not fully escaped its manual origins. Indeed, speech could scarcely exist without gesture, some way to relate words to the physical world. As Ludwig Wittgenstein put it, “It isn’t the colour red that takes the place of the word red, but the gesture that points to a red object.”101

And of course what people do is often as eloquent as the things they say. In Shakespeare’s Henry VIII, Norfolk says this of Cardinal Wolsey:

Some strange commotion

Is in his brain; he bites his lip and starts;

Stops on a sudden, looks upon the ground,

Then, lays his finger on his temple; Straight,

Springs out into a fast gait; then, stops again,

Strikes his breast hard; and anon, he casts

His eye against the moon: In most strange postures

We have seen him set himself.

In this chapter, we have come a long way from Prometheus, that solitary pioneer who suddenly found himself possessed of language. Even so, we have seen some evidence of late changes, perhaps restricted to Homo sapiens, that gave rise to the power of speech. This may even have involved mutations, such as that of the FOXP2 gene. But this does not mean of course that language emerged late. Along the way, though, as we progressed from manual gesture to speech, there is still one missing ingredient—grammar. How and when did language, whether spoken or signed, acquire its distinctive property of generativity, the power to create a potentially infinite number of different sentences? For an answer to that, we need to digress first through two other recursive capacities arguably unique to humans, episodic memory and mental time travel. We shall then find one likely source of grammatical language in the human ability to transcend time.
