Perception of Emotion in Speech: The Role of Content and Tempo
Human speech is an auditory medium through which emotion can be portrayed (e.g., Steinbeis & Koelsch, 2008a). Speech has emotional content, denoted by the literal meaning of the emotion words (joy, contentment, happiness, mournfulness, melancholy, etc.) used in an utterance. Interestingly, even when low-pass filtering is applied to produce utterances that lack denotative meaning, listeners can still classify these utterances as conveying different emotions according to the speaker's prosodic tone (e.g., Pell, Jaywant, Monetta, & Kotz, 2011). In studies to date, however, researchers have mostly examined the role of prosody on its own; content has almost always been ignored, eliminated, obfuscated, or masked through low-pass filtering or clever experimental design (e.g., Mozziconacci, 2001; Banse & Scherer, 1996). Further, when the interactive effects of prosody and content have been studied, the different elements of prosody (e.g., loudness, pitch level, timbre, or tempo) have been confounded (e.g., Mozziconacci, 2001). The purpose of the present research is to examine how content interacts with a single but prominent element of prosody, tempo, in modulating the degree of emotion conveyed in speech utterances. How tempo, when isolated in this way, modulates emotion has already been explored in music, and the results show that faster tempos evoke happier emotions and slower tempos evoke sadder emotions (e.g., Thompson & Ilea, 2006). Additionally, although music has no inherent semantic meaning, it has structural meaning that is conveyed through musical elements, including modality (e.g., Meyer, 1956). In terms of emotion, for instance, minor excerpts often evoke sadder emotions and major excerpts often evoke happier emotions (Kastner & Crowder, 1990; Hunter, Schellenberg, & Schimmack, 2010).
Researchers have found interactive effects of tempo and mode in music such that when these cues are consistent (e.g., fast tempo, major mode), ratings of the target emotion (both perceived and felt) are particularly pronounced (Hunter, Schellenberg, & Schimmack, 2010). Of interest in the current study is whether the effects of varying tempo alone in music will replicate in speech excerpts and, further, whether these effects will also depend on content established by semantically meaningful words. In Experiment 1, we used both speech and musical excerpts, varying tempo (slow, normal, fast) and content (happy or sad for speech; major or minor mode for music), and found that both factors affected emotion ratings and that they interacted with one another. Speech and music showed the same pattern of results: sad material became happier as tempo increased, but happy material showed no difference across tempos. With musical materials, we also replicated previous findings in that major mode was rated happier than minor mode, fast tempos were rated happier than slow tempos, and music with consistent sad cues elicited particularly sad ratings, although we did not replicate the finding that music with consistent happy cues elicits particularly happy ratings (cf. Hunter, Schellenberg, & Schimmack, 2010). In Experiment 2, we used only speech excerpts, exaggerated the variations in tempo to explore the boundary conditions of its effects in evoking emotions, and added neutral content to the sad and happy contents. Again, both tempo and content mattered, and they interacted with one another in that exaggerated increases in tempo still made sad content happier without exception, and still made happy content less happy. The results are discussed within the framework of the perception of emotions in general.