Why VR Needs Better Audio - OneBonsai - Blog


In a 1992 New York Times article, George Lucas said that “sound is half the experience in seeing a film.” And who would deny the importance of sound after hearing the music of Star Wars or Indiana Jones, both franchises George Lucas created?

Sound is more important than we realize. Hearing is one of the first senses a fetus develops in the uterus. Compare this to sight: a baby hardly sees anything at birth. It takes months before their eyes are fully developed. Despite this, hearing plays second fiddle to sight in today’s society.

Sound contributes enormously to our feeling of presence in virtual reality. But realistic sound simulation for VR hasn’t progressed nearly as fast as VR visuals have. Most VR simulations today still sound somewhat flat. That’s because we don’t hear sounds in VR the same way we hear them in real life.

How We Hear in Real Life

Humans are very good at detecting where a sound is coming from. We can place almost everything we hear in the real world, even when we’re not looking in its direction. This ability is called sound localization.

A sound’s loudness and reverberation are general cues our brain uses to judge its distance, regardless of its direction. All else being equal, we place a loud sound closer to us than a quiet one, and a sound that reaches us mostly directly, with little reverberation, closer than one dominated by echoes from the room.
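To put rough numbers on these two cues, here’s a minimal Python sketch under simplifying assumptions: the direct sound follows the inverse square law (about 6 dB quieter per doubling of distance), while the room’s reverberation is treated as a diffuse field with roughly the same level everywhere. The function names and the -12 dB reverberant level are hypothetical, chosen for illustration only.

```python
import math

def direct_level_db(distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Level of the direct sound relative to its level at ref_distance_m.

    Inverse square law: intensity ~ 1/r^2, i.e. 20*log10(r_ref / r) dB,
    about -6 dB for every doubling of distance.
    """
    return 20.0 * math.log10(ref_distance_m / distance_m)

def direct_to_reverberant_db(distance_m: float,
                             reverberant_level_db: float = -12.0) -> float:
    """Direct-to-reverberant ratio in dB.

    Assumption (hypothetical value): the reverberant field sits at a fixed
    -12 dB everywhere in the room, so only the direct sound changes with
    distance. A lower ratio means "more reverberant", which we hear as farther.
    """
    return direct_level_db(distance_m) - reverberant_level_db

for r in (1.0, 2.0, 4.0, 8.0):
    print(f"{r:>4.0f} m: direct {direct_level_db(r):6.1f} dB, "
          f"direct-to-reverberant {direct_to_reverberant_db(r):6.1f} dB")
```

In this toy room the direct sound dominates by 12 dB at 1 m, the two are roughly equal at 4 m (the so-called critical distance), and beyond that the reverberation takes over.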

Our brain uses two different audio cues to figure out whether a sound is coming from the left or the right. The first is the interaural time difference (ITD): the difference between the times at which a sound arrives at each ear. The second is the interaural level difference (ILD): the difference in sound pressure level, or loudness, between the two ears, which arises largely because the head shadows the ear farther from the source.

The time difference of a sound between two ears helps us understand where a sound is coming from
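The ITD can be estimated with Woodworth’s classic spherical-head approximation, ITD = (a / c)(θ + sin θ), where a is the head radius, c the speed of sound, and θ the source’s angle from straight ahead. The Python sketch below assumes an average head radius of 8.75 cm and a speed of sound of 343 m/s; real heads aren’t spheres, so treat the results as ballpark figures.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # air at roughly 20 °C
HEAD_RADIUS_M = 0.0875      # assumed "average" head radius

def itd_seconds(azimuth_rad: float,
                head_radius_m: float = HEAD_RADIUS_M,
                speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Interaural time difference for a distant source in the horizontal plane.

    azimuth_rad: 0 = straight ahead, +pi/2 = right, -pi/2 = left, pi = behind.
    Positive result: the sound reaches the right ear first.
    """
    # The formula only depends on how far to the side the source is,
    # so fold rear directions onto the front (asin(sin(x)) maps 150° to 30°).
    lateral = math.asin(math.sin(azimuth_rad))
    return (head_radius_m / speed_m_s) * (lateral + math.sin(lateral))

for deg in (0, 30, 90, 150, 180):
    print(f"{deg:>3} deg: {itd_seconds(math.radians(deg)) * 1e6:6.1f} microseconds")
```

The maximum, for a source directly to one side, comes out at roughly 0.66 ms. Note also that 30° and 150° give identical results: ITD alone can’t tell a source in front from its mirror image behind.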

That covers lateral sounds, i.e. sounds coming from the left or right. Sounds coming from in front of or behind us are more complex: a sound directly ahead and one directly behind reach both ears at the same time and at the same level, so ITD and ILD alone can’t tell them apart. Here, our brain relies on tiny changes in a sound as it bounces off the unique geometry of our bodies, especially our outer ears. Because each individual’s head, ears, neck, torso, and so on differ in size and shape, these cues differ from person to person.

The way our ears receive a sound from a given point in space is described by the head-related transfer function (HRTF). Think of the HRTF as a sound fingerprint, unique to every individual.
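In practice, an HRTF is usually applied in its time-domain form: a pair of head-related impulse responses (HRIRs), one per ear. The renderer convolves a mono source with each HRIR to get the left and right headphone signals. The minimal Python sketch below shows that step with plain lists; the two three-tap HRIRs are made-up toy values standing in for a source on the listener’s right, whereas measured HRIRs typically run to hundreds of samples.

```python
from typing import List, Tuple

def convolve(signal: List[float], impulse_response: List[float]) -> List[float]:
    """Full linear convolution, computed directly (O(n*m), fine for a demo)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

def render_binaural(mono: List[float], hrir_left: List[float],
                    hrir_right: List[float]) -> Tuple[List[float], List[float]]:
    """Filter one mono source through a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Hypothetical toy HRIRs for a source on the listener's right: the right ear
# hears it at once and at full level, the left ear one sample later, quieter
# and slightly smeared (a crude stand-in for the head's shadow).
TOY_HRIR_LEFT = [0.0, 0.5, 0.1]
TOY_HRIR_RIGHT = [1.0, 0.0, 0.0]

click = [1.0, 0.0, 0.0, 0.0]
left, right = render_binaural(click, TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
print("left: ", left)
print("right:", right)
```

Real renderers do the same thing, just with FFT-based convolution for speed.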

A Brief History of Sound Quality

The Jazz Singer was a 1927 musical drama and the first feature-length film with a synchronized music score and lip-synchronous singing and speech. It was a big commercial success that demonstrated the potential of “talkies”, or movies with sound, and an important milestone for audio in entertainment.

The Jazz Singer was a commercial and, at the time, a critical success

The next big development for audio was stereo sound, still the most widely used audio format today. It succeeded mono sound, in which the same signal is played through every output channel (whether speakers or headphones). Stereo sound instead uses two independent channels, left and right.

When wearing headphones, this means you can hear one sound in your left ear and another in your right. Or you can hear the same sound in both ears, but a bit louder in one than in the other. Sound became richer and more vibrant.
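Making the same sound louder in one ear than the other is what a stereo pan control does. One common approach is the constant-power pan law, sketched below in Python: the left and right gains follow a quarter cosine and sine curve, so their squared sum, and with it the perceived loudness, stays constant as the sound moves across. The function names and the -1 to +1 pan range are assumptions for illustration.

```python
import math
from typing import List, Tuple

def constant_power_gains(pan: float) -> Tuple[float, float]:
    """Left/right gains for pan in [-1, 1] (-1 hard left, 0 centre, +1 hard right)."""
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

def pan_mono(samples: List[float], pan: float) -> List[Tuple[float, float]]:
    """Turn a mono signal into (left, right) sample pairs."""
    gain_l, gain_r = constant_power_gains(pan)
    return [(s * gain_l, s * gain_r) for s in samples]

print(constant_power_gains(0.0))  # both about 0.707 (-3 dB each)
print(constant_power_gains(0.5))  # louder on the right
```

A simple linear crossfade (gains 1 - p and p) would instead dip by about 3 dB in the centre, which is why the constant-power curve is a common choice.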

The story is similar for speakers: different speakers can play different sounds. A surround sound system takes this further by surrounding you with speakers (usually five or seven, plus a subwoofer) for a richer audio experience.

A surround sound setup

However, keep in mind that sound coming from speakers always interacts with the room they’re placed in and the objects in it, creating a less authentic audio experience. That’s one reason many audiophiles and sound engineers turn to headphones for critical listening.

Binaural audio, sometimes called 3D audio, is the next big improvement in sound quality. The change is in the recording process: binaural audio is recorded through two microphones placed inside cavities shaped like our outer ears, often on a dummy head. This way, the recording captures the ITD and ILD cues our brain relies on.

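Because the ITD and ILD cues are baked into the recording, binaural audio only works properly with headphones. If you listened to it through speakers, you’d effectively hear two sets of ITD/ILD cues: those in the recording and those your own head and ears add as the sound travels from the speakers. On top of that, each ear would hear both speakers, blurring the left/right separation.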

The video below is a binaural recording of the Falcon Heavy launch. It’s my favorite binaural video on YouTube (and there are already thousands of them). Put on headphones, turn up the volume, and hear how much better sound can be.

While binaural audio is an impressive step up from stereo sound, it still isn’t enough for VR. One piece of the puzzle is missing. Here’s the problem: a recording is static, captured from a single, fixed listening position. But in VR you move around, or at the very least you move your head.

With a static recording, a sound coming from your left keeps coming from your left, even after you turn your head to face it. This breaks immersion. Luckily, VR developers and sound engineers have been working on a few ways to solve this problem.

Better Audio in VR

Firstly, specific sounds can be tied to specific objects in VR. This is called object-based audio. A bleeping robot will sound farther away after you kick it across the room. The footsteps of enemies will sound different as they come closer. Dolby Atmos and DTS:X are two examples of technologies that support binaural and object-based audio.
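At its simplest, object-based audio means storing a position with each sound and letting the renderer work out, every frame, where that sound sits relative to the listener and how loud it should be. The Python sketch below is a hypothetical minimal version, not how Dolby Atmos or DTS:X work internally: it computes a direction and a simple inverse-distance (1/r) gain for one object, using a 2D top-down view for brevity.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AudioObject:
    """Hypothetical minimal sound object: a name and a position in metres."""
    name: str
    x: float  # positive = to the listener's right
    y: float  # positive = in front of the listener

def render_params(obj: AudioObject, listener_x: float = 0.0,
                  listener_y: float = 0.0,
                  ref_distance: float = 1.0) -> Tuple[float, float]:
    """Return (azimuth_rad, gain) for one object; call it again every frame.

    azimuth: 0 = straight ahead, positive = to the right.
    gain: inverse-distance (1/r) attenuation, capped at 1.0 inside ref_distance.
    """
    dx, dy = obj.x - listener_x, obj.y - listener_y
    azimuth = math.atan2(dx, dy)
    gain = ref_distance / max(math.hypot(dx, dy), ref_distance)
    return azimuth, gain

robot = AudioObject("robot", x=0.0, y=2.0)
print(render_params(robot))  # (0.0, 0.5): straight ahead, 2 m away
robot.y = 4.0                # you kick it, and it lands twice as far away
print(render_params(robot))  # (0.0, 0.25): about 6 dB quieter
```

Because the sound carries its position rather than a fixed left/right mix, the same object renders correctly wherever it or the listener moves.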

Secondly, head-tracking headphones are another innovation VR could seriously benefit from. These headphones report the position and orientation of the user’s head to the application, which can then update the audio to match.
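The core of head tracking is a change of reference frame: sources stay fixed in the virtual world, and each frame the renderer subtracts the head’s current orientation to find where each source is relative to the ears. Here’s a minimal Python sketch covering yaw (turning left and right) only; the function names and the angle convention (radians, 0 straight ahead, positive to the right) are assumptions for illustration.

```python
import math

def wrap_angle(rad: float) -> float:
    """Wrap an angle into the range [-pi, pi)."""
    return (rad + math.pi) % (2.0 * math.pi) - math.pi

def head_relative_azimuth(source_azimuth: float, head_yaw: float) -> float:
    """Direction of a world-fixed source as seen from the listener's head.

    Both angles in radians in the same world frame: 0 = straight ahead,
    positive = to the right. head_yaw would come from the head tracker.
    """
    return wrap_angle(source_azimuth - head_yaw)

robot = -math.pi / 2                                         # robot on the listener's left
print(head_relative_azimuth(robot, head_yaw=0.0))            # -pi/2: still on the left
print(head_relative_azimuth(robot, head_yaw=-math.pi / 2))   # 0.0: now straight ahead
```

The head-relative angle is what then drives the binaural rendering (for example, which HRTF to apply), so a sound on your left ends up in front of you once you turn to face it.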

Binaural audio through head-tracking headphones will be a major improvement for VR sound. However, while it’ll work wonders for sounds coming from the left or right, it’ll still struggle with sounds coming from behind or in front of us. That’s where generalized HRTF databases come in.

Although every person has their own HRTF (their own sound fingerprint), measuring it for every individual would be entirely impractical. It would require everyone to sit in an anechoic chamber with microphones in their ears while sounds are played from all angles.
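A renderer that can’t measure your personal HRTF can instead pick, for each sound, the closest measured direction from a generalized HRTF database. The Python sketch below does a nearest-neighbour lookup by great-circle angle; the four-entry database and its one-sample “HRIRs” are made-up placeholders, and real systems typically use many measured directions and interpolate between neighbours rather than snapping to the nearest one.

```python
import math
from typing import Dict, List, Tuple

Direction = Tuple[float, float]            # (azimuth_deg, elevation_deg), +azimuth = right
HrirPair = Tuple[List[float], List[float]]  # (left HRIR, right HRIR)

def angular_distance(a: Direction, b: Direction) -> float:
    """Great-circle angle in radians between two directions."""
    az1, el1 = map(math.radians, a)
    az2, el2 = map(math.radians, b)
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.acos(max(-1.0, min(1.0, cos_d)))  # clamp rounding error

def nearest_hrir(database: Dict[Direction, HrirPair], direction: Direction) -> HrirPair:
    """Return the HRIR pair measured closest to the requested direction."""
    key = min(database, key=lambda d: angular_distance(d, direction))
    return database[key]

# Hypothetical placeholder database: four directions, one-sample "HRIRs".
TOY_DATABASE: Dict[Direction, HrirPair] = {
    (0.0, 0.0): ([1.0], [1.0]),      # front
    (90.0, 0.0): ([0.3], [1.0]),     # right
    (-90.0, 0.0): ([1.0], [0.3]),    # left
    (180.0, 0.0): ([0.8], [0.8]),    # behind
}

print(nearest_hrir(TOY_DATABASE, (80.0, 10.0)))   # the "right" entry
print(nearest_hrir(TOY_DATABASE, (-170.0, 0.0)))  # the "behind" entry
```

Measuring the angle on the sphere, rather than comparing azimuths directly, keeps directions like -170° and 180° correctly treated as neighbours.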

Individual HRTFs differ, but they’re similar enough that generalized HRTF databases, measured on other listeners or on dummy heads, can still create an incredibly immersive audio experience. The video below gives you an idea of the difference these HRTF databases can make.

Tying it Together

Audio is a very important contributor to feeling present in a VR simulation. However, audio in VR hasn’t progressed as much as visuals have. Most VR simulations still sound somewhat flat, particularly if you know how much better everything could sound.

Slowly, however, developers and innovators in VR seem to be waking up to the importance of sound. Head-tracking headphones paired with binaural audio and generalized HRTF databases would represent a huge improvement in the quality of sound for VR simulations.