Lip Syncing Better Than Most Pop Stars

In our previous post, we talked about acquiring and managing VO audio files in Merek’s Market. This time, let’s look at how we built a lip syncing system. The same goals apply: a quality outcome at low cost and with a small time investment.

We started with a few talking animations and built a simple system that picked one of them at random and played it while the audio was playing. The result was a flappy mouth that had nothing to do with the words being spoken. Not great.

Hmmm. The answer to this problem is lip syncing: getting your characters’ mouths to match the audio that’s playing. An animator would take weeks to manually create a talk animation for each audio clip. Instead, we wrote a little system that reads the audio in real time and matches the character’s mouth to it as best it can. We think it looks pretty good.

We started by taking a look at what blend shapes we had for our characters, then thinking about what we could do with them. Here’s Merek, one of our characters, split up into the different blend shapes that we’ve created for his face.

Luckily we’d already created these for other animations. We picked out shapes that looked like the character could be mid-conversation. These 3.

These blend shapes can be tweened up, down and blended together to give the effect we’re looking for. The small shape is good for low volume muttering and the larger open mouth is for loud words or shouting.

To power this, we wanted to be able to play an audio file and then query it for loudness. In Unity, we can call AudioSource.GetSpectrumData to get an array of amplitudes, one for each frequency band. We then sum the bands that cover the range heard in dialogue (150 Hz at the low end to 2000 Hz at the high end) and average them to give us one number for the loudness of the audio. Thanks to this post for helping us understand the concept.
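To make the band averaging concrete, here is a rough sketch of the idea in plain Python rather than Unity C#. It is not the code from the game. The function name, the 48000 Hz sample rate and the assumption that the spectrum bins are evenly spaced from 0 Hz up to half the sample rate are all ours, for illustration only.

```python
def band_loudness(spectrum, sample_rate=48000, low_hz=150.0, high_hz=2000.0):
    """Average the amplitudes of the spectrum bins between low_hz and high_hz.

    `spectrum` is a list of amplitudes, one per frequency bin.
    Assumption: bins are evenly spaced from 0 Hz to sample_rate / 2.
    """
    if not spectrum:
        return 0.0

    # Width of one bin in Hz under the even-spacing assumption.
    bin_width = (sample_rate / 2) / len(spectrum)

    # Indices of the first and last bins inside the dialogue range.
    first = max(0, int(low_hz / bin_width))
    last = min(len(spectrum) - 1, int(high_hz / bin_width))
    if last < first:
        return 0.0

    band = spectrum[first:last + 1]
    return sum(band) / len(band)
```

With 1024 bins at 48000 Hz each bin is roughly 23 Hz wide, so the dialogue range covers only a small slice near the bottom of the array. Anything outside that slice, such as low rumble or high hiss, does not move the mouth.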

Nearly there! With a loudness value being calculated every frame, we just needed to hook this value up to the blend shapes. As the volume increases we wanted to cycle through the blend shapes, bringing in larger shapes and dropping off smaller ones. Something like this:
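One way to express that crossfade in code is sketched below, again in plain Python and not taken from the game. The function name, the 0 to 1 weight range and the `max_loudness` parameter are assumptions for illustration. Each shape peaks at its own point along the loudness scale and fades out as its neighbour fades in.

```python
def blend_weights(loudness, max_loudness=1.0, shape_count=3):
    """Return one weight (0.0 to 1.0) per mouth shape, smallest shape first.

    Assumption: `max_loudness` is the loudness at which the largest
    shape should be fully open. It would need tuning per project.
    """
    if max_loudness <= 0 or shape_count < 1:
        raise ValueError("max_loudness must be > 0 and shape_count >= 1")

    # Normalise loudness to 0..1, then stretch it across the shapes.
    x = min(max(loudness / max_loudness, 0.0), 1.0)
    position = x * shape_count  # shape i is fully on at position i + 1

    weights = []
    for i in range(shape_count):
        # Triangular falloff: full weight at the peak, zero one step away.
        weights.append(max(0.0, 1.0 - abs(position - (i + 1))))
    return weights
```

At silence every weight is zero and the mouth stays closed. Halfway up the scale the small and medium shapes share the load, and at full volume only the largest shape is active. An engine that expects blend shape weights on a 0 to 100 scale would need these values multiplied by 100.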

For a little extra smoothing, we averaged the volume over a 0.1s period and tweened over a few frames to the target value.
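Both smoothing steps can be sketched in a few lines. This is an illustrative Python version, not the game's code. The class name, the helper name and the `max_step` parameter are assumptions. The first part keeps a rolling average of recent loudness samples, and the second moves a value toward its target by a limited amount each frame.

```python
from collections import deque


class LoudnessSmoother:
    """Rolling average of loudness over the last `window_seconds`."""

    def __init__(self, window_seconds=0.1):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, loudness) pairs
        self._time = 0.0

    def add(self, loudness, dt):
        """Record this frame's loudness and return the windowed average."""
        self._time += dt
        self._samples.append((self._time, loudness))

        # Drop samples that have fallen out of the window.
        cutoff = self._time - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

        if not self._samples:  # only possible with a zero-length window
            return loudness
        return sum(value for _, value in self._samples) / len(self._samples)


def move_towards(current, target, max_step):
    """Step `current` toward `target` by at most `max_step`."""
    delta = target - current
    if abs(delta) <= max_step:
        return target
    return current + max_step if delta > 0 else current - max_step
```

Each frame, the averaged loudness would set the target weights, and `move_towards` would be applied to each blend shape weight so the mouth eases into place instead of snapping. A smaller `max_step` gives a softer, slower mouth. A larger one follows the audio more tightly.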

That’s it! It sounds quite involved but, at its core, it’s only 3 blend shapes and a few lines of code to achieve something close to what an animator can do in a few weeks. If you’d like to learn more about Merek’s Market then please sign up for our newsletter here or keep checking back in future for similar dev posts.