Vocal learning explained

Vocal learning is the ability to modify acoustic and syntactic sounds, acquire new sounds via imitation, and produce vocalizations. "Vocalizations" in this case refers only to sounds generated by the vocal organ (mammalian larynx or avian syrinx) as opposed to by the lips, teeth, and tongue, which require substantially less motor control.^[1] A rare trait, vocal learning is a critical substrate for spoken language and has only been detected in eight animal groups despite the wide array of vocalizing species: humans, bats, cetaceans, pinnipeds (seals and sea lions), elephants, and three distantly related bird groups (songbirds, parrots, and hummingbirds). Vocal learning is distinct from auditory learning, the ability to form memories of sounds heard, a relatively common trait that is present in all vertebrates tested. For example, a dog can be trained to understand the word "sit" even though the human word is not in its innate auditory repertoire (auditory learning). However, the dog cannot imitate and produce the word "sit" itself, as vocal learners can.

Classification

Historically, species have been classified into the binary categories of vocal learner or vocal non-learner based on their ability to produce novel vocalizations or imitate other species, with evidence from social isolation, deafening studies, and cross-fostering experiments. However, vocal learners exhibit a great deal of plasticity or variation between species, resulting in a spectrum of ability. The vocalizations of songbirds and whales have a syntactic-like organization similar to that of humans but are limited to Finite-State Grammars (FSGs), where they can generate strings of sequences with limited structural complexity.^[2] Humans, on the other hand, show deeper hierarchical relationships, such as the nesting of phrases within others, and demonstrate compositional syntax, where changes in syntactic organization generate new meanings, both of which are beyond the capabilities of other vocal learning groups.^[3]

Vocal learning phenotypes also differ within groups, and closely related species do not necessarily display the same abilities. Within avian vocal learners, for example, zebra finch songs only contain strictly linear transitions that go through different syllables in a motif from beginning to end, yet mockingbird and nightingale songs show element repetition within a range of legal repetitions, non-adjacent relationships between distant song elements, and forward and backward branching in song element transitions.^[4] Parrots are even more complex as they can imitate the speech of heterospecifics like humans and synchronize their movements to a rhythmic beat.^[5]
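The contrast between strictly linear and branching song syntax can be illustrated with a small finite-state sketch. The syllable labels (A, B, C) and both transition tables below are hypothetical and chosen only for illustration; they are not taken from recordings of any species, and the sketch does not model bounds on repetition or non-adjacent dependencies.

```python
import random

# Hypothetical linear motif: each syllable has exactly one successor.
LINEAR_MOTIF = {"start": ["A"], "A": ["B"], "B": ["C"], "C": ["end"]}

# Hypothetical branching grammar: allows repetition (B -> B),
# forward branching (A -> B or C) and backward branching (B -> A).
BRANCHING = {
    "start": ["A"],
    "A": ["B", "C"],
    "B": ["B", "C", "A"],
    "C": ["end"],
}

def sing(grammar, rng, max_len=20):
    """Generate one song by walking the transition table from 'start'."""
    state, song = "start", []
    while len(song) < max_len:
        state = rng.choice(grammar[state])
        if state == "end":
            break
        song.append(state)
    return song

def is_legal(grammar, song):
    """Check that every transition in the song is allowed and it can end."""
    state = "start"
    for syllable in song:
        if syllable not in grammar.get(state, []):
            return False
        state = syllable
    return "end" in grammar.get(state, [])
```

Under these assumptions the linear table can only ever yield the sequence A, B, C, whereas the branching table accepts sequences such as A, B, B, A, C. Both remain finite-state: neither can express the nested phrase structure attributed to human language.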

Continuum hypothesis

Further complicating the original binary classification is evidence from recent studies suggesting that there is greater variability in a non-learner's ability to modify vocalizations based on experience than previously thought. Findings in suboscine passerine birds, non-human primates, mice, and goats have led to the proposal of the vocal learning continuum hypothesis by Erich Jarvis and Gustavo Arriaga. Based on the apparent variations seen in various studies, the continuum hypothesis reclassifies species into non-learner, limited vocal learner, moderate vocal learner, complex vocal learner and high vocal learner categories, where higher tiers have fewer species. Under this system, previously identified non-human vocal learners like songbirds are considered complex learners while humans fall under the “high” category; non-human primates, mice, and goats, which are traditionally classified as non-learners, are considered limited vocal learners. Recent work, while generally acknowledging the usefulness of this richer view of vocal learning, has pointed out conceptual and empirical limitations of the vocal learning continuum hypothesis, suggesting more species and factors should be taken into account.^[6] ^[7]

Evidence of vocal learning in various species

Known vocal learners

Birds

See also: avian brain. The most extensively studied model organisms of vocal learning are found in birds, namely songbirds, parrots, and hummingbirds. The degree of vocal learning varies between species. While many parrots and certain songbirds like canaries can imitate and spontaneously combine learned sounds during all periods of their life, other songbirds and hummingbirds are limited to certain songs learned during their critical period.

Bats

The first evidence for audio-vocal learning in a non-human mammal was produced by Karl-Heinz Esser in 1994. Hand-reared infant lesser spear-nosed bats (Phyllostomus discolor) were able to adapt their isolation calls to an external reference signal. Isolation calls in a control group that had no reference signal did not show the same adaptation.^[8]

Further evidence for vocal learning in bats appeared in 1998 when Janette Wenrick Boughman studied female greater spear-nosed bats (Phyllostomus hastatus). These bats live in unrelated groups and use group contact calls that differ among social groups. Each social group has a single call, which differs in frequency and temporal characteristics. When individual bats were introduced to a new social group, the group call began to morph, taking on new frequency and temporal characteristics, and over time, calls of transfer and resident bats in the same group more closely resembled their new modified call than their old calls.^[9]

Cetaceans

Whales

Male humpback whales (Megaptera novaeangliae) sing as a form of sexual display while migrating to and from their breeding grounds. All males in a population produce the same song which can change over time, indicating vocal learning and cultural transmission, a characteristic shared by some bird populations. Songs become increasingly dissimilar over distance and populations in different oceans have dissimilar songs.

Whale songs recorded along the east coast of Australia in 1996 showed the introduction of a novel song by two foreign whales that had migrated from the west Australian coast to the east Australian coast. In just two years, all members of the population had switched songs. This new song was nearly identical to ones sung by migrating humpback whales on the west Australian coast, and the two foreign singers are hypothesized to have introduced it to the population on the east Australian coast.^[10]

Vocal learning has also been seen in killer whales (Orcinus orca). Two juvenile killer whales, separated from their natal pods, were seen mimicking cries of California sea lions (Zalophus californianus) that were near the region they lived in. The composition of the calls of these two juveniles was also different from that of their natal groups, reflecting more of the sea lion calls than those of the whales.^[11]

Dolphins

Captive bottlenose dolphins (Tursiops truncatus) can be trained to emit sounds through their blowhole in open air. Through training, these vocal emissions can be altered from natural patterns to resemble sounds like the human voice, measurable through the number of bursts of sound emitted by the dolphin. In 92% of exchanges between humans and dolphins, the number of bursts was within ±1 of the number of syllables spoken by the human.^[12] Another study used an underwater keyboard to demonstrate that dolphins are able to learn various whistles in order to do an activity or obtain an object. Complete mimicry occurred within ten attempts for these trained dolphins.^[13] Other studies of dolphins have given even more evidence of spontaneous mimicry of species-specific whistles and other biological and computer-generated signals.^[14]
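The burst-matching measure described above amounts to counting the exchanges in which the dolphin's burst count falls within one of the human's syllable count. A minimal sketch of that calculation follows; the function names and the sample exchanges are invented for illustration and are not data from the cited study.

```python
def within_one(syllables, bursts):
    """True when the burst count differs from the syllable count by at most 1."""
    return abs(bursts - syllables) <= 1

def match_rate(exchanges):
    """Fraction of (syllables, bursts) exchanges that match within +/-1."""
    if not exchanges:
        raise ValueError("at least one exchange is required")
    hits = sum(within_one(s, b) for s, b in exchanges)
    return hits / len(exchanges)

# Invented example: (human syllables, dolphin bursts) per exchange.
example_exchanges = [(3, 3), (4, 5), (2, 2), (6, 3)]
rate = match_rate(example_exchanges)  # three of the four exchanges match
```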

Such vocal learning has also been identified in wild bottlenose dolphins. Each bottlenose dolphin develops a distinct signature whistle in the first few months of life, which is used to identify it and distinguish it from other individuals. This individual distinctiveness could have been a driving force for evolution by providing higher species fitness since complex communication is largely correlated with increased intelligence. However, vocal identification is present in vocal non-learners as well. Therefore, it is unlikely that individual identification was a primary driving force for the evolution of vocal learning. Each signature whistle can be learned by other individuals for identification purposes and is used primarily when the dolphin in question is out of sight. Bottlenose dolphins use their learned whistles in matching interactions, which are likely to be used while addressing each other, signalling alliance membership to a third party, or preventing deception by an imitating dolphin.^[15]

Mate attraction and territory defense have also been seen as possible contributors to vocal learning evolution. Studies on this topic point out that while both vocal learners and non-learners use vocalizations to attract mates or defend territories, there is one key difference: variability. Vocal learners can produce a more varied arrangement of vocalizations and frequencies, which studies show may be more preferred by females. For example, Caldwell^[16] observed that male Atlantic bottlenose dolphins may initiate a challenge by facing another dolphin, opening its mouth, thereby exposing its teeth, or arching its back slightly and holding its head downward. This behavior is more along the lines of visual communication but still may or may not be accompanied by vocalizations such as burst-pulsed sounds. The burst-pulsed sounds, which are more complex and varied than the whistles, are often utilized to convey excitement, dominance or aggression such as when they are competing for the same piece of food.^[17] The dolphins also produce these forceful sounds when in the presence of other individuals moving towards the same prey. On the sexual side, Caldwell saw that dolphins may solicit a sexual response from another by swimming in front of it, looking back, and rolling on its side to display the genital region.^[18] These observations provide yet another example of visual communication where dolphins exhibit different postures and non-vocal behaviors to communicate with others that also may or may not be accompanied by vocalizations. Sexual selection for greater variability, and thus in turn vocal learning, may then be a major driving force for the evolution of vocal learning.

Seals

Captive harbor seals (Phoca vitulina) were recorded mimicking human words such as "hello", "Hoover" (the seal's own name) and producing other speech-like sounds. Most of the vocalizations occurred during the reproductive season.^[19]

More evidence of vocal learning in seals occurs in southern elephant seals (Mirounga leonina). Young males imitate the vocal cries of successful older males during their breeding season. Northern and southern elephant seals have a highly polygynous mating system with a vast disparity in mating success. In other words, a few males guard huge harems of females, eliciting intense male-male competition. Antagonistic vocal cries play an important role in inter-male competitions and are hypothesized to demonstrate the resource-holding potential of the emitter. In both species, antagonistic vocal cries vary geographically and are structurally complex and individually distinct. Males display unique calls, which can be identified by the specific arrangement of syllables and syllable parts.

Harem holders frequently vocalize to keep peripheral males away from females, and these vocalizations are the dominant component in a young juvenile's acoustic habitat. Successful vocalizations are heard by juveniles, who then imitate these calls as they get older in an attempt to obtain a harem for themselves. Novel vocal types expressed by dominant males spread quickly through populations of breeding elephant seals and are even imitated by juveniles in the same season.

Genetic analysis indicated that successful vocal patterns were not passed down hereditarily, indicating that this behavior is learned. Progeny of successful harem holders do not display their father's vocal calls and the call that makes one male successful often disappears entirely from the population.^[20]

Elephants

Mlaika, a ten-year-old adolescent female African elephant, has been recorded imitating truck sounds coming from the Nairobi-Mombasa highway three miles away. Analysis of Mlaika's truck-like calls shows that they are different from the normal calls of African elephants, and that her calls are a general model of truck sounds, not copies of the sounds of trucks recorded at the same time as the calls. In other words, Mlaika's truck calls are not imitations of the trucks that she hears, but rather a generalized model she developed over time.

Other evidence of vocal learning in elephants occurred in a cross-fostering situation with a captive African elephant. At the Basel Zoo in Switzerland, Calimero, a male African elephant, was kept with two female Asian elephants. Recordings of his cries show evidence of chirping noises, typically only produced by Asian elephants. The duration and frequency of these calls differ from recorded instances of chirping calls from other African elephants and more closely resemble the chirping calls of Asian elephants.^[21]

Controversial or limited vocal learners

The following species are not formally considered vocal learners, but some evidence has suggested they may have limited abilities to modify their vocalizations. Further research is needed in these species to fully understand their learning abilities.

Non-human primates

Early research asserted that primate calls are fully formed at an early age in development, yet recently some studies have suggested these calls are modified later in life.^[22] In 1989, Masataka and Fujita cross-fostered Japanese and rhesus monkeys in the same room and demonstrated that foraging calls were learned directly from their foster mothers, providing evidence of vocal learning.^[23] However, when another independent group was unable to reproduce these results, Masataka and Fujita's findings were questioned. Adding to the evidence against vocal learning in non-human primates is the suggestion that regional differences in calls may be attributed to genetic differences between populations and not vocal learning.^[24]

Other studies argue that non-human primates do have some limited vocal learning ability, demonstrating that they can modify their vocalizations in a limited fashion through laryngeal control and lip movements.^[25] ^[26] For example, chimpanzees both in captivity and in the wild have been recorded producing novel sounds to attract attention. By puckering their lips and making a vibrating sound, they can make a "raspberry" call, which has been imitated by both naïve captive and wild individuals. There is also evidence of an orangutan learning to whistle by copying a human, an ability previously unseen in the species. A cross-fostering experiment with marmosets and macaques showed convergence in pitch and other acoustic features in their supposedly innate calls, demonstrating the ability, albeit limited, for vocal learning.

Mice

Mice produce long sequences of vocalizations or "songs" that are used for both isolation calls in pups when cold or removed from nest and for courtship when males sense a female or detect pheromones in their urine. These ultrasonic vocalizations consist of discrete syllables and patterns, with species-specific differences. Males tend to use particular syllable types that can be used to differentiate individuals.^[27]

There has been intense debate on whether these songs are innate or learned. In 2011, Kikusui et al. cross-fostered two strains of mice with distinct song phenotypes and discovered that strain-specific characteristics of each song persisted in the offspring, indicating that these vocalizations are innate.^[28] However, a year later work by Arriaga et al. contradicted these results as their study found a motor cortex region active during singing, which projects directly to brainstem motor neurons and is also important for keeping songs stereotyped and on pitch. Vocal control by forebrain motor areas and direct cortical projections to vocal motor neurons are both features of vocal learning. Furthermore, male mice were shown to depend on auditory feedback to maintain some ultrasonic song features, and sub-strains with differences in their songs were able to match each other's pitch when cross-housed under competitive social conditions.^[29]

In 2013, Mahrt et al. showed that genetically deafened mice produce calls of the same types, number, duration, and frequency as normal-hearing mice. This finding shows that mice do not require auditory experience to produce normal vocalizations, suggesting that mice are not vocal learners.^[30]

With this conflicting evidence, it remains unclear whether mice are vocal non-learners or limited vocal learners.

Goats

When goats are placed in different social groups, they modify their calls to show more similarity to that of the group, which provides evidence they may be limited vocal learners according to Erich Jarvis' continuum hypothesis.^[31]

Evolution

As vocal learning is such a rare trait that evolved in distant groups, there are many theories to explain the striking similarities between vocal learners, especially within avian vocal learners.

Adaptive advantage

There are several proposed hypotheses that explain the selection for vocal learning based on environment and behavior. These include:^[32]

Individual identification: In most vocal-learning species, individuals have their own songs which serve as a unique signature to differentiate themselves from others in the population, which some suggest has driven selection of vocal learning. However, identification by voice, rather than by song or name, is present in vocal non-learners as well. Among vocal learners, only humans and maybe bottlenose dolphins actually use unique names. Therefore, it is unlikely that individual identification was a primary driving force for the evolution of vocal learning.
Semantic communication: Semantic vocal communication associates specific vocalizations with animate or inanimate objects to convey a factual message. This hypothesis asserts that vocal learning evolved to facilitate enhanced communication of these specific messages as opposed to affective communication, which conveys emotional content. For example, humans are able to shout "watch out for that car!" when another is in danger while crossing the street instead of just making a noise to indicate urgency, which is less effective at conveying the exact danger at hand. However, many vocal non-learners, including chickens and vervet monkeys, have been shown to use their innate calls to communicate semantic information such as "a food source" or "predator". Further discrediting this hypothesis is the fact that vocal learning birds also use innate calls for this purpose and only rarely use their learned vocalizations for semantic communication (for example, the grey parrot can mimic human speech and the black-capped chickadee uses calls to indicate predator size). As learned vocalizations rarely convey semantic information, this hypothesis also does not fully explain the evolution of vocal learning.
Mate attraction and territory defense: While both vocal learners and non-learners use vocalizations to attract mates or defend territories, there is one key difference: variability. Vocal learners can produce more varied syntax and frequency modulation, which have been shown to be preferred by females in songbirds. For example, canaries use two voices to produce large frequency modulation variations called "sexy syllables" or "sexy songs", which are thought to stimulate estrogen production in females. When vocal non-learner females were presented with artificially increased frequency modulations in their innate vocalizations, more mating was stimulated. Sexual selection for greater variability, and thus in turn vocal learning, may then be a major driving force for the evolution of vocal learning.
Rapid adaptation to sound propagation in different environments: Vocal non-learners produce their sounds best in specific habitats, making them more susceptible to changes in the environment. For example, pigeons' low-frequency calls travel best near the ground, and so communication higher in the air is much less effective. In contrast, vocal learners can change voice characteristics to suit their current environment, which presumably allows for better group communication.

Predatory pressure

With the many possible advantages outlined above, it still remains unclear as to why vocal learning is so rare. One proposed explanation is that predatory pressure applies a strong selective force against vocal learning. If mates prefer more variable vocalizations, predators may also be more strongly attracted to more variable vocalizations. As innate calls are typically constant, predators quickly habituate to these vocalizations and ignore them as background noise. In contrast, the variable vocalizations of vocal learners are less likely to be ignored, possibly increasing the predation rate among vocal learners. In this case, relaxed predation pressure or some mechanism to overcome increased predation must first develop to facilitate the evolution of vocal learning. Supporting this hypothesis is the fact that many mammalian vocal learners including humans, whales, and elephants have very few major predators. Similarly, several avian vocal learners have behaviors that are effective in avoiding predators, from the rapid flight and escape behavior of hummingbirds to predator mobbing in parrots and songbirds.

While little research has been done in this area, some studies have supported the predation hypothesis. One study showed that Bengalese finches bred in captivity for 250 years without predation or human selection for singing behavior show greater variability in syntax than their conspecifics in the wild. A similar experiment with captive zebra finches demonstrated the same result: captive birds had increased song variability, which was then preferred by females.^[33] Although these studies are promising, more research is needed in this area to compare predation rates across vocal learners and non-learners.

Phylogeny

Birds

Modern birds supposedly evolved from a common ancestor around the Cretaceous-Paleogene boundary at the time of the extinction of dinosaurs, about 66 million years ago. Out of the thirty avian orders, only three evolved vocal learning and all have incredibly similar forebrain structures despite the fact that they are distantly related (for example, parrots and songbirds are as distantly related as humans and dolphins). Phylogenetic comparisons have suggested that vocal learning evolved among birds at least two or three independent times, in songbirds, parrots, and hummingbirds. Depending on the interpretation of the trees, there were either three gains in all three lineages or two gains, in hummingbirds and the common ancestor of parrots and songbirds, with a loss in the suboscine songbirds. There are several hypotheses to explain this phenomenon:

Independent convergent evolution: All three avian groups evolved vocal learning and similar neural pathways independently (not through a common ancestor). This suggests that there are strong epigenetic constraints imposed by the environment or morphological needs, and so this hypothesis predicts that groups that newly evolve vocal learning will also develop similar neural circuits.
Common ancestor: This alternative hypothesis suggests that vocal learning birds evolved the trait from a distant common ancestor, which was then lost four independent times in interrelated vocal non-learners. Possible causes include high survival costs of vocal learning (predation) or weak adaptive benefits that did not induce strong selection for the trait for organisms in other environments.
Rudimentary structures in non-learners: This alternative hypothesis states that avian non-learners actually do possess rudimentary or undeveloped brain structures necessary for song learning, which were enlarged in vocal learning species. Significantly, this concept challenges the current assumption that vocal nuclei are unique to vocal learners, suggesting that these structures are universal even in other groups such as mammals.
Motor theory: This hypothesis suggests that cerebral systems that control vocal learning in distantly related animals evolved as specializations of a pre-existing motor system inherited from a common ancestor. Thus in avian vocal learners, each of the three groups of vocal learning birds evolved cerebral vocal systems independently, but the systems were constrained by a previous genetically determined motor system inherited from the common ancestor that controls learned movement sequencing. Evidence for this hypothesis was provided by Feenders and colleagues in 2008, who found that EGR1, an immediate early gene associated with increases in neuronal activity, was expressed in forebrain regions surrounding or directly adjacent to song nuclei when vocal learning birds performed non-vocal movement behaviors such as hopping and flying. In non-learners, comparable areas were activated, but without the adjacent presence of song nuclei.^[34] EGR1 expression patterns were correlated with the amount of movement, just as its expression typically correlates with the amount of singing performed in vocal birds. These findings suggest that vocal learning brain regions developed from the same cell lineages that gave rise to the motor pathway, which then formed a direct projection onto the brainstem vocal motor neurons to provide greater control.

Currently, it remains unclear as to which of these hypotheses is the most accurate.

Primates

In primates, only humans are known to be capable of complex vocal learning. Similar to the first hypothesis relating to birds, one explanation is that vocal learning evolved independently in humans. An alternative hypothesis suggests evolution from a primate common ancestor capable of vocal learning, with the trait subsequently being lost at least eight other times. Considering the most parsimonious analysis, it seems unlikely that the number of independent gains (one in humans) would be exceeded so greatly by the number of independent losses (at least eight), which supports the independent evolution hypothesis.
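The parsimony argument above can be made explicit by counting the evolutionary changes each scenario requires. The sketch below is only a worked tally under stated assumptions: the scenario labels are illustrative, the ancestral scenario is assumed to include one gain in the common ancestor, and eight is used as the lower bound on losses given in the text.

```python
def total_events(gains, losses):
    """Number of evolutionary changes a scenario requires."""
    return gains + losses

# Assumed event counts; eight losses is the stated lower bound.
scenarios = {
    "independent gain in humans": total_events(gains=1, losses=0),
    "ancestral trait with later losses": total_events(gains=1, losses=8),
}

# The most parsimonious scenario is the one requiring the fewest changes.
most_parsimonious = min(scenarios, key=scenarios.get)
```

With these counts the independent-gain scenario requires one change against at least nine for the ancestral scenario, which is the sense in which it is favored.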

Neurobiology

Neural pathways in avian vocal learners

As avian vocal learners are the most amenable to experimental manipulations, the vast majority of work to elucidate the neurobiological mechanisms of vocal learning has been conducted with zebra finches, with a few studies focusing on budgerigars and other species. Despite variation in vocal learning phenotype, the neural circuitry necessary for producing learned song is conserved in songbirds, parrots, and hummingbirds. As opposed to their non-learner avian counterparts such as quail, doves, and pigeons, these avian vocal learners contain seven distinct cerebral song nuclei, or distinct brain areas associated with auditory learning and song production defined by their gene expression patterns. As current evidence suggests independent evolution of these structures, the names of each equivalent vocal nucleus are different per bird group, as shown in the table below.

Parallel Song Nuclei in Avian Vocal Learners
Songbirds	Parrots	Hummingbirds
HVC: a letter based name	NLC: central nucleus of the lateral nidopallium	VLN: vocal nucleus of the lateral nidopallium
RA: robust nucleus of the arcopallium	AAC: central nucleus of the anterior arcopallium	VA: vocal nucleus of the arcopallium
MAN: magnocellular nucleus of anterior nidopallium	NAOc: oval nucleus of the anterior nidopallium complex
Area X: area X of the striatum	MMSt: magnocellular nucleus of the anterior striatum
DLM: medial nucleus of dorsolateral thalamus	DMM: magnocellular nucleus of the dorsomedial thalamus
MO: oval nucleus of the mesopallium	MOc: oval nucleus of the mesopallium complex

Vocal nuclei are found in two separate brain pathways, which will be described in songbirds as most research has been conducted in this group, yet connections are similar in parrots^[35] and hummingbirds.^[36] Projections of the anterior vocal pathway in the hummingbird remain unclear and so are not listed in the table above.

The posterior vocal pathway (also known as the vocal motor pathway), involved in the production of learned vocalizations, begins with projections from a nidopallial nucleus, the HVC in songbirds. The HVC then projects to the robust nucleus of the arcopallium (RA). The RA connects to the midbrain vocal center DM (dorsal medial nucleus of the midbrain) and the brainstem (nXIIts) vocal motor neurons that control the muscles of the syrinx, a direct projection similar to the projection from LMC to the nucleus ambiguus in humans.^[37] The HVC is considered the syntax generator while the RA modulates the acoustic structure of syllables. Vocal non-learners do possess the DM and twelfth motor neurons (nXIIts), but lack the connections to the arcopallium. As a result, they can produce vocalizations, but not learned vocalizations.

The anterior vocal pathway (also known as the vocal learning pathway) is associated with learning, syntax, and social contexts, starting with projections from the magnocellular nucleus of the anterior nidopallium (MAN) to the striatal nucleus Area X. Area X then projects to the medial nucleus of dorsolateral thalamus (DLM), which ultimately projects back to MAN in a loop.^[38] The lateral part of MAN (LMAN) generates variability in song, while Area X is responsible for stereotypy, or the generation of low variability in syllable production and order after song crystallization.

Despite the similarities in vocal learning neural circuits, there are some major connectivity differences between the posterior and anterior pathways among avian vocal learners. In songbirds, the posterior pathway communicates with the anterior pathway via projections from the HVC to Area X; the anterior pathway sends output to the posterior pathway via connections from LMAN to RA and medial MAN (MMAN) to HVC. Parrots, on the other hand, have projections from the ventral part of the AAC (AACv), the parallel of the songbird RA, to the NAOc, parallel of the songbird MAN, and the oval nucleus of the mesopallium (MO). The anterior pathway in parrots connects to the posterior pathway via NAOc projections to the NLC, parallel of the songbird HVC, and AAC. Thus, parrots do not send projections to the striatal nucleus of the anterior pathway from their posterior pathway as do songbirds. Another crucial difference is the location of the posterior vocal nuclei among species. Posterior nuclei are located in auditory regions for songbirds, laterally adjacent to auditory regions in hummingbirds, and are physically separate from auditory regions in parrots. Axons must therefore take different routes to connect nuclei in different vocal learning species. Exactly how these connectivity differences affect song production and/or vocal learning ability remains unclear.^[39]
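The songbird connections described above can be summarized as a small directed graph, which makes the anterior loop and the route to the vocal motor neurons easy to check. This is a simplified sketch, not an anatomical model: LMAN and MMAN are collapsed into a single "MAN" node as a simplifying assumption, only projections named in the text are included, and the function name is hypothetical.

```python
from collections import deque

# Directed projections in songbirds as described in the text.
# Assumption: LMAN (-> RA) and MMAN (-> HVC) are merged into one "MAN" node.
SONGBIRD_PROJECTIONS = {
    "HVC": ["RA", "Area X"],
    "RA": ["DM", "nXIIts"],
    "MAN": ["Area X", "RA", "HVC"],
    "Area X": ["DLM"],
    "DLM": ["MAN"],
    "DM": [],
    "nXIIts": [],
}

def reachable(graph, src, dst):
    """Breadth-first search: can dst be reached from src along at least one projection?"""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

In this representation `reachable(SONGBIRD_PROJECTIONS, "MAN", "MAN")` is true, reflecting the MAN, Area X, DLM loop, while `reachable(SONGBIRD_PROJECTIONS, "RA", "HVC")` is false because RA only projects downstream to DM and nXIIts.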

An auditory pathway that is used for auditory learning brings auditory information into the vocal pathway, but the auditory pathway is not unique to vocal learners. Ear hair cells project to cochlear ganglia neurons to auditory pontine nuclei to midbrain and thalamic nuclei and to primary and secondary pallial areas. A descending auditory feedback pathway exists projecting from the dorsal nidopallium to the intermediate arcopallium to shell regions around the thalamic and midbrain auditory nuclei. Remaining unclear is the source of auditory input into the vocal pathways described above. It is hypothesized that songs are processed in these areas in a hierarchical manner, with the primary pallial area responsible for acoustic features (field L2), the secondary pallial area (fields L1 and L3 as well as the caudal medial nidopallium or NCM) determining sequencing and discrimination, and the highest station, the caudal mesopallium (CM), modulating fine discrimination of sounds. Secondary pallial areas including the NCM and CM are also thought to be involved in auditory memory formation of songs used for vocal learning, but more evidence is needed to substantiate this hypothesis.

Critical period

The development of the sensory modalities necessary for song learning occurs within a “critical period” of development that varies among avian vocal learners. Closed-ended learners such as the zebra finch and the Aphantochroa hummingbird can learn only during a limited time period and subsequently produce highly stereotyped, non-variable vocalizations consisting of a single, fixed song which they repeat for their entire lives. In contrast, open-ended learners, including canaries and various parrot species, display significant plasticity and continue to learn new songs throughout the course of their lives.^[40]

In the male zebra finch, vocal learning begins with a period of sensory acquisition, or auditory learning, in which juveniles are exposed to the song of an adult male “tutor” at about posthatch day 30 to 60.^[41] During this stage, juveniles listen to and memorize the song pattern of their tutor and produce subsong, characterized by highly variable syllables and syllable sequences. Subsong is thought to be analogous to babbling in human infants. Subsequently, during the sensorimotor learning phase at posthatch day 35 to 90, juveniles practice the motor commands required for song production and use auditory feedback to alter vocalizations to match the song template. Songs during this period are plastic: specific syllables begin to emerge but are frequently in the wrong sequence, errors similar to the phonological mistakes made by young children learning a language. As the bird ages, its song becomes more stereotyped until, at posthatch day 120, the song syllables and sequence are crystallized, or fixed. At this point, the zebra finch can no longer learn new songs and thus sings this single song for the duration of its life.^[42]
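The overlapping phases in this timeline are easier to see when laid out as intervals. The sketch below is illustrative only: the day ranges are the approximate figures quoted in the paragraph above, and the names `PHASES` and `phases_at` are hypothetical, introduced just for this example.

```python
import math

# Approximate zebra finch song-learning phases, in posthatch days, as quoted
# in the paragraph above. Boundaries are inclusive and only approximate.
PHASES = [
    ("sensory acquisition", 30, 60),
    ("sensorimotor learning", 35, 90),
    ("crystallized song", 120, math.inf),
]


def phases_at(day):
    """Return the names of all phases whose day range includes the given posthatch day."""
    return [name for name, start, end in PHASES if start <= day <= end]


print(phases_at(40))   # ['sensory acquisition', 'sensorimotor learning']
print(phases_at(100))  # []
print(phases_at(150))  # ['crystallized song']
```

The empty result for day 100 reflects that the text names no separate phase between day 90 and day 120, describing only a song that grows progressively more stereotyped until crystallization.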

The neural mechanisms behind the closing of the critical period remain unclear, but early isolation of juveniles from their adult tutors has been shown to extend the critical period of song acquisition.^[43] “Synapse selection” theories hypothesize that synaptic plasticity during the critical period is gradually reduced as dendritic spines are pruned through activity-dependent synaptic rearrangement.^[44] The pruning of dendritic spines in the LMAN song nucleus was delayed in isolated zebra finches with extended critical periods, suggesting that this form of synaptic reorganization may be important in closing the critical period.^[45] However, other studies have shown that normally reared birds and isolated juveniles have similar levels of dendritic pruning despite an extended critical period in the latter group,^[46] indicating that this theory does not completely explain critical period modulation.

Previous research has suggested that the length of the critical period may be linked to differential gene expression within song nuclei, thought to be caused by neurotransmitter binding to receptors during neural activation.^[47] One key area is the LMAN song nucleus, part of the specialized cortical-basal-ganglia-thalamo-cortical loop in the anterior forebrain pathway, which is essential for vocal plasticity.^[38] While inducing deafness in songbirds usually disrupts the sensory phase of learning and leads to production of highly abnormal song structures, lesioning of LMAN in zebra finches prevents this song deterioration,^[48] leading to the earlier development of stable song. One of the neurotransmitter receptors shown to affect LMAN is the N-methyl-D-aspartate glutamate receptor (NMDAR), which is required for learning and activity-dependent gene regulation in the post-synaptic neuron. Infusions of the NMDAR antagonist APV (R-2-amino-5-phosphonopentanoate) into the LMAN song nucleus disrupt the critical period in the zebra finch.^[49] NMDAR density and mRNA levels of the NR1 subunit also decrease in LMAN during early song development.^[50] When the song becomes crystallized, expression of the NR2B subunit decreases in LMAN and NMDAR-mediated synaptic currents shorten.^[51] It has been hypothesized that LMAN actively maintains RA microcircuitry in a state permissive for song plasticity and that, during normal development, it regulates HVC-RA synapses.

In humans

Humans seem to have analogous anterior and posterior vocal pathways which are implicated in speech production and learning. Parallel to the avian posterior vocal pathway mentioned above is the motor cortico-brainstem pathway. Within this pathway, the face motor cortex projects to the nucleus ambiguus of the medulla, which then projects to the muscles of the larynx. Humans also have a vocal pathway that is analogous to the avian anterior pathway. This pathway is a cortico-basal ganglia-thalamic-cortico loop which begins at a strip of the premotor cortex, called the cortical strip, which is responsible for speech learning and syntax production. The cortical strip spans five brain regions: the anterior insula, Broca's area, the anterior dorsal lateral prefrontal cortex, the anterior pre-supplementary motor area, and the anterior cingulate cortex. This cortical strip projects to the anterior striatum, which projects to the globus pallidus, then to the anterior dorsal thalamus, and back to the cortical strip. All of these regions are also involved in syntax and speech learning.^[52]

Genetic applications to humans

In addition to the similarities in the neurobiological circuits necessary for vocalizations between animal vocal learners and humans, there are also a few genetic similarities. The most prominent of these genetic links are the FOXP1 and FOXP2 genes, which code for forkhead box (FOX) proteins P1 and P2, respectively. FOXP1 and FOXP2 are transcription factors which play a role in the development and maturation of the lungs, heart, and brain,^[53] and are also highly expressed in brain regions of the vocal learning pathway, including the basal ganglia and the frontal cortex. In these regions, FOXP1 and FOXP2 are thought to be essential for brain maturation and the development of speech and language.^[54]

Orthologues of FOXP2 are found in a number of vertebrates including mice and songbirds, and have been implicated in modulating plasticity of neural circuits. In fact, although mammals and birds are very distant relatives and diverged more than 300 million years ago, the protein encoded by FOXP2 in zebra finches differs from that in mice at only five amino acid positions, and from that in humans at only eight amino acid positions. In addition, researchers have found that patterns of expression of FOXP1 and FOXP2 are strikingly similar in the human fetal brain and the songbird.
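Counts of differing amino acid positions such as those above are typically obtained by comparing aligned protein sequences position by position. The sketch below shows that counting step only. The function name `differing_positions` is hypothetical, and the two short strings are made-up placeholders for illustration, not real FOXP2 sequences.

```python
def differing_positions(seq_a, seq_b):
    """Return the 1-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        index + 1
        for index, (residue_a, residue_b) in enumerate(zip(seq_a, seq_b))
        if residue_a != residue_b
    ]


# Toy placeholder sequences (NOT real FOXP2 data), already aligned.
toy_a = "MKTLQNSA"
toy_b = "MKSLQDSA"

positions = differing_positions(toy_a, toy_b)
print(positions)       # [3, 6]
print(len(positions))  # 2
```

A real comparison would first require aligning the full-length orthologous sequences, a step this sketch assumes has already been done.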

These similarities are especially interesting in the context of the aforementioned avian song circuit. FOXP2 is expressed in the avian Area X, and is especially highly expressed in the striatum during the critical period of song plasticity in songbirds. In humans, FOXP2 is highly expressed in the basal ganglia, frontal cortex, and insular cortex, all thought to be important nodes in the human vocal pathway. Thus, mutations in the FOXP2 gene are proposed to have detrimental effects on human speech and language, including impaired grammar, language processing, and movement of the mouth, lips, and tongue,^[55] as well as potential detrimental effects on song learning in songbirds. Indeed, FOXP2 was the first gene to be implicated in speech and language, through study of a family of individuals with a severe speech and language disorder.

Additionally, it has been suggested that, because FOXP1 and FOXP2 expression overlaps in songbirds and humans, mutations in FOXP1 may also result in speech and language abnormalities like those seen in individuals with mutations in FOXP2.

These genetic links have important implications for studying the origin of language because FOXP2 is so similar among vocal learners and humans, as well as important implications for understanding the etiology of certain speech and language disorders in humans.

Currently, no other genes have been linked as compellingly to vocal learning in animals or humans.

Notes and References

Petkov CI, Jarvis ED . Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates . Frontiers in Evolutionary Neuroscience . 4 . 12 . 2012 . 22912615 . 3419981 . 10.3389/fnevo.2012.00012 . free .
Okanoya K . The Bengalese finch: a window on the behavioral neurobiology of birdsong syntax . Annals of the New York Academy of Sciences . 1016 . 724–35 . June 2004 . 1 . 15313802 . 10.1196/annals.1298.026 . 2004NYASA1016..724O . 21327383 .
Book: Hurford JR . The Origins of Grammar: Language in the Light of Evolution. 2012. Oxford University Press. Oxford.
Honda E, Okanoya K . Acoustical and syntactical comparisons between songs of the white-backed Munia (Lonchura striata) and its domesticated strain, the Bengalese finch (Lonchura striata var. domestica). Zool. Sci.. 1999. 16. 2. 319–326. 10.2108/zsj.16.319. 85306560. free.
Schachner A, Brady TF, Pepperberg IM, Hauser MD . Spontaneous motor entrainment to music in multiple vocal mimicking species . Current Biology . 19 . 10 . 831–6 . May 2009 . 19409786 . 10.1016/j.cub.2009.03.061 . 2009CBio...19..831S . free . 1721.1/96194 . 6150861 .
Lattenkamp EZ, Vernes SC . Vocal learning: a language-relevant trait in need of a broad cross-species approach . Current Opinion in Behavioral Sciences . 21 . 209–215 . June 2018 . 10.1016/j.cobeha.2018.04.007 . 2352-1546 . 13809803 . 21.11116/0000-0001-237D-C . free .
Martins PT, Boeckx C . Vocal learning: Beyond the continuum . PLOS Biology . 18 . 3 . e3000672 . March 2020 . 32226012 . 7145199 . 10.1371/journal.pbio.3000672 . free .
Esser KH . Audio-vocal learning in a non-human mammal: the lesser spear-nosed bat Phyllostomus discolor . NeuroReport . 5 . 14 . 1718–20 . September 1994 . 7827315 . 10.1097/00001756-199409080-00007 .
Boughman JW . Vocal learning by greater spear-nosed bats . Proceedings. Biological Sciences . 265 . 1392 . 227–33 . February 1998 . 9493408 . 1688873 . 10.1098/rspb.1998.0286 .
Noad MJ, Cato DH, Bryden MM, Jenner MN, Jenner KC . Cultural revolution in whale songs . Nature . 408 . 6812 . 537 . November 2000 . 11117730 . 10.1038/35046199 . 2000Natur.408..537N . 4398582 . free .
Foote AD, Griffin RM, Howitt D, Larsson L, Miller PJ, Hoelzel AR . Killer whales are capable of vocal learning . Biology Letters . 2 . 4 . 509–12 . December 2006 . 17148275 . 1834009 . 10.1098/rsbl.2006.0525 .
Lilly JC . Vocal Mimicry in Tursiops: Ability to Match Numbers and Durations of Human Vocal Bursts . Science . 147 . 3655 . 300–1 . January 1965 . 17788214 . 10.1126/science.147.3655.300 . 1965Sci...147..300L . 2038221 .
Reiss D, McCowan B . Spontaneous vocal mimicry and production by bottlenose dolphins (Tursiops truncatus): evidence for vocal learning . Journal of Comparative Psychology . 107 . 3 . 301–12 . September 1993 . 8375147 . 10.1037/0735-7036.107.3.301 .
Herman LM, Richards DG, Wolz JP . Comprehension of sentences by bottlenosed dolphins . Cognition . 16 . 2 . 129–219 . March 1984 . 6540652 . 10.1016/0010-0277(84)90003-9 . 43237011 .
Janik VM . Whistle matching in wild bottlenose dolphins (Tursiops truncatus) . Science . 289 . 5483 . 1355–7 . August 2000 . 10958783 . 10.1126/science.289.5483.1355 . 2000Sci...289.1355J . Comparative studies indicate that whistles vary in structure across populations and species (Steiner 1981; Ding et al. 1995; Rendell et al. 1999), with whistle divergence perhaps facilitating species recognition and speciation (Podos et al. 2002). .
Book: Caldwell DK . How animals communicate. 1977.
Janik VM, Sayigh LS . Communication in bottlenose dolphins: 50 years of signature whistle research . Journal of Comparative Physiology A: Neuroethology, Sensory, Neural & Behavioral Physiology . 199 . 6 . 479–89 . June 2013 . 23649908 . 10.1007/s00359-013-0817-7 . 15378374 .
Book: Caldwell DK . How Animals Communicate. 794.
Ralls K, Fiorelli P, Gish S . 1985 . Vocalizations and vocal mimicry in captive harbour seals, Phoca vitulina . Canadian Journal of Zoology . 63 . 5. 1050–1056 . 10.1139/z85-157.
Sanvito S, Galimberti F, Miller EH . 2007 . Observational evidences of vocal learning in southern elephant seals: a longitudinal study . Ethology . 113 . 2. 137–146 . 10.1111/j.1439-0310.2006.01306.x. 2007Ethol.113..137S .
Poole JH, Tyack PL, Stoeger-Horwath AS, Watwood S . Animal behaviour: elephants are capable of vocal learning . Nature . 434 . 7032 . 455–6 . March 2005 . 15791244 . 10.1038/434455a . 2005Natur.434..455P . 4369863 .
Cheney DL, Owren MJ, Dieter JA, Seyfarth RM . 'Food' calls produced by adult female rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques, their normally-raised offspring, and offspring cross-fostered between species. . Behaviour . January 1992 . 120 . 3–4 . 218–31 . 10.1163/156853992X00615 .
Masataka N, Fujita K . Vocal learning of Japanese and rhesus monkeys. . Behaviour . January 1989 . 109 . 3–4 . 191–9 . 10.1163/156853989X00222 .
Snowdon, CT. (2009). "Plasticity of communication in non-human primates," in Advances in the Study of Behavior, eds M. Naguib and V. M. Janik (Burlington, NJ: Academic Press), 239–276.
Hopkins WD, Taglialatela J, Leavens DA . Chimpanzees Differentially Produce Novel Vocalizations to Capture the Attention of a Human . Animal Behaviour . 73 . 2 . 281–286 . February 2007 . 17389908 . 1832264 . 10.1016/j.anbehav.2006.08.004 .
Wich SA, Swartz KB, Hardus ME, Lameira AR, Stromberg E, Shumaker RW . A case of spontaneous acquisition of a human sound by an orangutan . Primates; Journal of Primatology . 50 . 1 . 56–64 . January 2009 . 19052691 . 10.1007/s10329-008-0117-y . 708682 .
Holy TE, Guo Z . Ultrasonic songs of male mice . PLOS Biology . 3 . 12 . e386 . December 2005 . 16248680 . 1275525 . 10.1371/journal.pbio.0030386 . free .
Kikusui T, Nakanishi K, Nakagawa R, Nagasawa M, Mogi K, Okanoya K . Cross fostering experiments suggest that mice songs are innate . PLOS ONE . 6 . 3 . e17721 . March 2011 . 21408017 . 3052373 . 10.1371/journal.pone.0017721 . 2011PLoSO...617721K . free .
Arriaga G, Jarvis ED . Mouse vocal communication system: are ultrasounds learned or innate? . Brain and Language . 124 . 1 . 96–116 . January 2013 . 23295209 . 3886250 . 10.1016/j.bandl.2012.10.002 .
Mahrt EJ, Perkel DJ, Tong L, Rubel EW, Portfors CV . Engineered deafness reveals that mouse courtship vocalizations do not require auditory experience . The Journal of Neuroscience . 33 . 13 . 5573–83 . March 2013 . 23536072 . 3691057 . 10.1523/jneurosci.5054-12.2013 .
Arriaga G, Zhou EP, Jarvis ED . Of mice, birds, and men: the mouse ultrasonic song system has some features similar to humans and song-learning birds . PLOS ONE . 7 . 10 . e46610 . 2012 . 23071596 . 346858 . 10.1371/journal.pone.0046610 . 2012PLoSO...746610A . free .
Jarvis ED . Bird Song Systems: Evolution. Encyclopedia of Neuroscience. 2009. 2. 217–225. 10.1016/b978-008045046-9.00935-9. 9780080450469. 10161/11242. free.
Book: Zann RA . The Zebra Finch: A Synthesis of Field and Laboratory Studies. 1996. Oxford University Press. New York.
Feenders G, Liedvogel M, Rivas M, Zapka M, Horita H, Hara E, Wada K, Mouritsen H, Jarvis ED . 6 . Molecular mapping of movement-associated areas in the avian brain: a motor theory for vocal learning origin . PLOS ONE . 3 . 3 . e1768 . March 2008 . 18335043 . 2258151 . 10.1371/journal.pone.0001768 . 2008PLoSO...3.1768F . free .
Jarvis ED, Mello CV . Molecular mapping of brain areas involved in parrot vocal communication . The Journal of Comparative Neurology . 419 . 1 . 1–31 . March 2000 . 10717637 . 2538445 . 10.1002/(sici)1096-9861(20000327)419:1<1::aid-cne1>3.0.co;2-m .
Jarvis ED, Ribeiro S, da Silva ML, Ventura D, Vielliard J, Mello CV . Behaviourally driven gene expression reveals song nuclei in hummingbird brain . Nature . 406 . 6796 . 628–32 . August 2000 . 10949303 . 2531203 . 10.1038/35020570 . 2000Natur.406..628J .
Jarvis ED . Learned birdsong and the neurobiology of human language . Annals of the New York Academy of Sciences . 1016 . 749–77 . June 2004 . 1 . 15313804 . 2485240 . 10.1196/annals.1298.038 . 2004NYASA1016..749J .
Scharff C, Nottebohm F . A comparative study of the behavioral deficits following lesions of various parts of the zebra finch song system: implications for vocal learning. J Neurosci. 1991. 11. 9. 2896–2913. 10.1523/JNEUROSCI.11-09-02896.1991. 1880555. 6575264. free.
Wild JM . Neural pathways for the control of birdsong production. Journal of Neurobiology. 1997. 33. 5. 653–670. 10.1002/(sici)1097-4695(19971105)33:5<653::aid-neu11>3.0.co;2-a. 9369465.
Brenowitz EA, Beecher MD . Song learning in birds: diversity and plasticity, opportunities and challenges . Trends in Neurosciences . 28 . 3 . 127–32 . March 2005 . 15749165 . 10.1016/j.tins.2005.01.004 . 14586913 .
Eales LA . Song learning in zebra finches: some effects of song model availability on what is learnt and when. Anim Behav. 1985. 33. 4. 1293–1300. 10.1016/s0003-3472(85)80189-5. 54229759.
Jones AE, Ten Cate C, Slater PJ . Early experience and plasticity of song in adult male Zebra Finches (Taeniopygia guttata). . Journal of Comparative Psychology . December 1996 . 110 . 4 . 354–69 . 10.1037/0735-7036.110.4.354 .
Aamodt SM, Nordeen EJ, Nordeen KW . Early isolation from conspecific song does not affect the normal developmental decline of N-methyl-D-aspartate receptor binding in an avian song nucleus . Journal of Neurobiology . 27 . 1 . 76–84 . May 1995 . 7643077 . 10.1002/neu.480270108 .
Bischof HJ, Geissler E, Rollenhagen A . Limitations of the sensitive period for sexual imprinting: Neuroanatomical and behavioral experiments in the zebra finch (Taeniopygia guttata). Behavioural Brain Research. 2002. 133. 2. 317–322. 10.1016/s0166-4328(02)00016-5. 12110465. 28662586.
Wallhäusser-Franke E, Nixdorf-Bergweiler BE, DeVoogd TJ . Song isolation is associated with maintaining high spine frequencies on zebra finch lMAN neurons . Neurobiology of Learning and Memory . 64 . 1 . 25–35 . July 1995 . 7582809 . 10.1006/nlme.1995.1041 . 19096150 . free .
Heinrich JE, Nordeen KW, Nordeen EJ . Dissociation between extension of the sensitive period for avian vocal learning and dendritic spine loss in the song nucleus lMAN . Neurobiology of Learning and Memory . 83 . 2 . 143–50 . March 2005 . 15721798 . 10.1016/j.nlm.2004.11.002 . 8953614 .
Wada K, Sakaguchi H, Jarvis ED, Hagiwara M . Differential expression of glutamate receptors in avian neural pathways for learned vocalization . The Journal of Comparative Neurology . 476 . 1 . 44–64 . August 2004 . 15236466 . 2517240 . 10.1002/cne.20201 .
Brainard MS, Doupe AJ . Postlearning consolidation of birdsong: stabilizing effects of age and anterior forebrain lesions. J Neurosci. 2001. 21. 7. 2501–2517. 10.1523/JNEUROSCI.21-07-02501.2001. 11264324. 6762407. free.
Basham ME, Nordeen EJ, Nordeen KW . Blockade of NMDA receptors in the anterior forebrain impairs sensory acquisition in the zebra finch (Poephila guttata) . Neurobiology of Learning and Memory . 66 . 3 . 295–304 . November 1996 . 8946423 . 10.1006/nlme.1996.0071 . 9592089 . free .
Heinrich JE, Singh TD, Sohrabji F, Nordeen KW, Nordeen EJ . Developmental and hormonal regulation of NR2A mRNA in forebrain regions controlling avian vocal learning . Journal of Neurobiology . 51 . 2 . 149–59 . May 2002 . 11932956 . 10.1002/neu.10046 .
Livingston FS, Mooney R . Development of intrinsic and synaptic properties in a forebrain nucleus essential to avian song learning. J Neurosci. 17. 1997. 23. 8997–9009. 10.1523/JNEUROSCI.17-23-08997.1997. 9364047. 6573603. free.
Jarvis ED . Neural systems for vocal learning in birds and humans: a synopsis . Journal of Ornithology . 148 . 1 . 35–44 . December 2007 . 19684872 . 2726745 . 10.1007/s10336-007-0243-0 .
Pariani MJ, Spencer A, Graham JM, Rimoin DL . A 785kb deletion of 3p14.1p13, including the FOXP1 gene, associated with speech delay, contractures, hypertonia and blepharophimosis . European Journal of Medical Genetics . 52 . 2–3 . 123–7 . 2009 . 19332160 . 2853231 . 10.1016/j.ejmg.2009.03.012 .
Vargha-Khadem F, Gadian DG, Copp A, Mishkin M . FOXP2 and the neuroanatomy of speech and language . Nature Reviews. Neuroscience . 6 . 2 . 131–8 . February 2005 . 15685218 . 10.1038/nrn1605 . 2504002 .
Takahashi K, Liu FC, Hirokawa K, Takahashi H . Expression of Foxp2, a gene involved in speech and language, in the developing and adult striatum . Journal of Neuroscience Research . 73 . 1 . 61–72 . July 2003 . 12815709 . 10.1002/jnr.10638 . 31971989 .