Developing tools for analyzing spoken vocal performance

MICHELLE GORE / AGGIE

NEH grant supports UC Davis humanities research

The National Endowment for the Humanities recently awarded a $75,000 Digital Humanities Advancement Grant to a project co-led by University Writing Program lecturer Marit MacArthur.

The project aims to develop more advanced tools for analyzing sound recordings of spoken vocal performance, especially poetry readings and Orson Welles’ famous radio plays. The research spans several universities across the globe. MacArthur focuses on poetry readings, while her co-leader, Professor Neil Verma of Northwestern University’s department of radio, television and film, focuses on Welles.

“I returned to UC Davis in the 2014-15 school year on an [American Council of Learned Societies] digital innovation fellowship, so that started this current research,” MacArthur said. “I was interested in linguistic approaches to analyzing performance styles in poetry recordings so it took off from there, and then I started collaborating with other people here.”

MacArthur explained that some of the tools already exist, and that the grant will make them easier to use and more widely adopted so that they can generate more knowledge and lead to more applications.

“Some of these tools I did already develop with the ACLS fellowship but not many people know about them or use them widely,” MacArthur said. “So, this grant will develop them further and disseminate them and train more humanities scholars to use them in their research on speech recordings.”

One main tool that MacArthur employs is called Gentle, which captures the timing of someone’s speech: how a speaker’s pace, rhythm and use of silence unfold over time.

“[Gentle] is a forced aligner that takes a media file and aligns it with a transcript so you get precise timing information, basically how quickly people are talking and how long their pauses are,” MacArthur said. “It uses speech recognition algorithms that were developed at Johns Hopkins. It’s pretty good at guessing what was said, as well, when a transcript isn’t available, and sometimes the few mistakes it makes can be very funny.”
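To make the idea of "precise timing information" concrete, here is a minimal Python sketch of how speaking rate and pause lengths could be computed once each word has a start and end time. The `(word, start, end)` tuples, the function names and the 0.1-second pause threshold are illustrative assumptions, not Gentle's actual output format or settings.

```python
from typing import List, Tuple

# Hypothetical aligned word: (word, start_seconds, end_seconds).
# This layout is an assumption for illustration, not Gentle's real output.
Word = Tuple[str, float, float]


def speaking_rate_wpm(words: List[Word]) -> float:
    """Words per minute, measured from the first word's onset to the last word's offset."""
    if not words:
        return 0.0
    duration = words[-1][2] - words[0][1]
    return len(words) * 60.0 / duration if duration > 0 else 0.0


def pauses(words: List[Word], min_pause: float = 0.1) -> List[float]:
    """Silent gaps between consecutive words lasting at least min_pause seconds.

    The 0.1 s default is an arbitrary illustrative threshold.
    """
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        gap = next_start - prev_end
        if gap >= min_pause:
            gaps.append(gap)
    return gaps


# Made-up timings for three words, with a one-second silence before the last.
example = [("so", 0.0, 0.5), ("much", 0.5, 1.0), ("depends", 2.0, 3.0)]
rate = speaking_rate_wpm(example)
silences = pauses(example)
```

In this made-up example, three words spread over three seconds give a rate of 60 words per minute, and the only pause long enough to count is the one-second gap before the final word.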

The other main tool, Drift, tracks pitch, measuring how quickly the voice rises and falls over time.

“I use these tools to investigate performance styles because there are a lot of highly conventional ways of speaking from film, to broadcast news, to poetry readings, to stand up comedy, and we recognize them when we hear them,” MacArthur said. “But what exactly are the performers doing with their voices?”

These tools can collect many types of data about the measurable features of a performer’s speech patterns, which can then be analyzed to learn what certain types of speakers tend to do.

“These tools can generate a lot of data about pitch and timing, like pitch range,” MacArthur said. “A really expressive speaker might use two octaves. But pitch range alone doesn’t tell you everything. There’s also pitch speed and pitch acceleration; more expressive speakers seem to change their pitch more rapidly. Then there’s how long people pause. A really dramatic speaker might have longer pauses. Conversational speech, for instance, is typically characterized by a faster speaking rate.”
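The measures MacArthur lists can be illustrated with a short Python sketch. It assumes a pitch track given as evenly spaced frequency readings in hertz, all from voiced speech. The function names and the exact definitions of "speed" and "acceleration" (average absolute change in octaves per second, and the average absolute change of that change) are assumptions for illustration, not necessarily how Drift defines them.

```python
import math
from typing import List


def derivative(values: List[float], step_s: float) -> List[float]:
    """Change per second between consecutive, evenly spaced samples."""
    return [(b - a) / step_s for a, b in zip(values, values[1:])]


def mean_abs(values: List[float]) -> float:
    """Average absolute value, or 0.0 for an empty list."""
    return sum(abs(v) for v in values) / len(values) if values else 0.0


def pitch_range_octaves(f0_hz: List[float]) -> float:
    """Distance in octaves between the lowest and highest pitch."""
    return math.log2(max(f0_hz) / min(f0_hz)) if f0_hz else 0.0


def pitch_speed(f0_hz: List[float], step_s: float) -> float:
    """Assumed definition: mean absolute pitch change, in octaves per second."""
    octaves = [math.log2(f) for f in f0_hz]
    return mean_abs(derivative(octaves, step_s))


def pitch_acceleration(f0_hz: List[float], step_s: float) -> float:
    """Assumed definition: mean absolute change in pitch velocity, in octaves per second squared."""
    octaves = [math.log2(f) for f in f0_hz]
    return mean_abs(derivative(derivative(octaves, step_s), step_s))


# Made-up pitch track: one reading per second, rising from 100 Hz to 400 Hz.
track = [100.0, 200.0, 400.0, 400.0]
span = pitch_range_octaves(track)
speed = pitch_speed(track, step_s=1.0)
accel = pitch_acceleration(track, step_s=1.0)
```

Because each doubling of frequency is one octave, the rise from 100 Hz to 400 Hz in this made-up track spans the two octaves MacArthur associates with a really expressive speaker. Real pitch tracks are sampled far more often than once per second and include unvoiced stretches, which this sketch does not handle.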

Because speakers vary pitch, speed and rhythm and use silence in distinctive ways, speech can be analyzed in many of the same ways as music. However, MacArthur pointed out a key difference that makes speech more difficult to analyze.

“The tools that exist for studying the voice in music are much better and much more widely used, at least outside of linguistics,” MacArthur said. “This difference has to do with speech versus singing. When you sing, your vocal cords vibrate really regularly and it’s really easier to pick up the pitch, but in speech they vibrate irregularly and pitches are harder to track, especially in noisy, older recordings that humanists frequently want to study.”

Owen Marshall, a postdoctoral scholar of science and technology studies, is a user-tester for the project, meaning he will test the tools and provide feedback. He described his background studying sound and how this project’s approach differs from his earlier work.

“I study the history and sociology of sound technology, particularly technologies of the voice,” Marshall said. “For example, I’ve studied how signal processing tools like Auto-Tune changed recording engineering by making the voice legible in a new way. This project uses similar tools for pitch-time tracking but lets us apply it to archives of recorded voices instead of just analyzing them one at a time.”

Cindy Shen, an associate professor in the Department of Communication at UC Davis, is also a user-tester for this project. She explained what she sees as a possible application of the tools.

“My academic background is in social media and games research,” Shen said. “I often use digital trace data (or ‘big data’) in my research. The tools developed might help my research on various social aspects of gaming, as we can gain more understanding of the content and context of gamer communications with each other.”

MacArthur explained how she became involved with studying sound recordings and why she finds it so fascinating to take an empirical approach to analyzing something that seems so subjective on the surface.

“I’m originally trained as a poetry scholar, and I’ve been to a ton of poetry readings,” MacArthur said. “I developed opinions about how people were reading, what was engaging and what was boring. We have a strong response to intonation patterns, apart from content. Like the voice of the Peanuts’ teacher. You can make something boring sound really interesting and make something interesting sound really boring, depending on the intonation, so I wanted to look at how we respond to the voice, musically in a way, but in speech.”

Written by: Benjamin Porter — features@theaggie.org