Sound all around

Toby Howard

This article first appeared in Personal Computer World magazine, June 1997.

SINCE ITS EARLIEST DAYS, the human-computer interface has been almost entirely visual. Until fairly recently, audio was limited to strangulated beeps telling you something had gone wrong, or sound effects that soon became tedious. Then came multimedia, and sound stopped playing second fiddle to images. Now, with the appearance of a new generation of software which can create realistic 3D audio environments using standard soundcards, true virtual worlds are one step closer.

There have been many attempts to make artificial acoustic spaces using loudspeaker and headphone technology, and the field is awash with various claims for systems offering "3D sound". It is certainly possible to process conventional stereo signals to give an enhanced audio field, and multi-speaker "surround sound" systems have been around in various incarnations for years. But these systems do not create true 3D audio environments, in which the listener is able to perceive sounds coming from arbitrary points in space. This is extremely hard to achieve, but in recent years progress has been quite remarkable.

Perhaps unsurprisingly, it's the shape of our outer ears that plays the biggest part in our 3D audio perception. These convoluted wiggles of cartilage and skin interfere with the incoming sound in extremely complex ways, filtering the frequencies before the sound passes along the ear canal to the eardrum, where the process of sound perception begins.

It's easy to demonstrate the effect of our outer ears with a few experiments -- but first make sure nobody's watching. Find a sound source with a fairly full frequency range, like the sound of a PC fan, or a TV tuned to an unused channel. With your hands shaped like you're about to give karate chops, place them at each side of your head in front of your ears, with the backs of your hands facing forwards. Notice how the quality of the sound changes. Now bend your hands back to cup them over your ears. Finally, push your ears out at right angles to your head.

In each of those experiments, you should have heard changes in the tonal colour of the sound. You may also have noticed differences in the sense of where the sound was coming from, or how "spread out" it was. The actual effects differ from person to person. I've found that if I pull on my earlobes when playing Quake, as well as attracting strange looks, I can sometimes hear monsters behind me.

Because the shape of the outer ear is so complex, it has a different filter response for each possible direction of incoming sound. This is the key to spatial hearing. The technical name for the filter response of the outer ear, which also takes into account the shadowing and reflection effects of the head, torso and shoulders, is the "Head-Related Transfer Function", or HRTF. Researchers can measure HRTFs by inserting tiny microphones deep into the ear canals of volunteers. The response of the microphones is measured for a number of sound sources in carefully calibrated positions around the listener. If we now take a raw sound, filter it with a particular HRTF, and play the result direct to the ear with headphones (to bypass the outer ears), we "hear" the sound coming from whatever spatial position was used to compute its HRTF. If we have a table of HRTFs measured across a range of locations, we can interpolate their values to compute approximate HRTFs for every point in the space around a listener.
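At its core, applying an HRTF is a filtering operation: the raw sound is convolved with a pair of measured impulse responses, one per ear. Here is a minimal sketch in Python with NumPy, assuming the HRTF is available in its time-domain form (a pair of head-related impulse responses) for one source position. The impulse response values below are invented purely for illustration, not real measurements.

```python
import numpy as np

def spatialise(mono, hrir_left, hrir_right):
    """Filter a mono signal with a pair of head-related impulse
    responses, giving a stereo signal for headphone playback."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)  # one column per ear

# Hypothetical impulse responses for a source off to the listener's
# right: the right ear receives the sound earlier and louder, while
# the left ear's version is delayed and attenuated by head shadowing.
hrir_right = np.array([0.9, 0.3, 0.1, 0.0, 0.0])
hrir_left  = np.array([0.0, 0.0, 0.4, 0.2, 0.05])

mono = np.random.default_rng(0).standard_normal(1000)  # raw source
stereo = spatialise(mono, hrir_left, hrir_right)
print(stereo.shape)  # (1004, 2)
```

A real system would hold a table of such impulse-response pairs, interpolate between the nearest measured positions as the source moves, and do the filtering fast enough to keep up with the audio stream.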

Systems using HRTFs to create spatialised sound have been around for a while. The first to appear was the gloriously named "Convolvotron", developed in the late 1980s for NASA and later marketed by Crystal River Engineering. It comprised two PC cards with digital signal processing chips which could apply HRTFs to sound in real time. It was very expensive. Subsequent hardware developments have brought the cost down, and there are a number of manufacturers selling PC cards which spatialise sound in real time using HRTFs. But it's still quite a specialised market.

Happily, increasing processor power now means that the HRTF filtering algorithms can be migrated from hardware into software. One of the best software spatialisers is Intel's "3D Realistic Sound Experience" software (3D RSX). A name like that takes some living up to, but 3D RSX is surprisingly good, incorporating reverberation and doppler effects too. It also supports the new spatialised audio nodes in VRML 2.0. If you have Windows 95, Netscape and a fast processor, you can experiment right now. You'll need the Cosmo VRML viewer plug-in from Silicon Graphics (www.sgi.com/Products/cosmo) and 3D RSX (www.intel.com/IAL/rsx), both freely downloadable. A particularly good VRML audio site is "Music of the Spheres" at codelab.siegelgale.com/solutions/spheres_index.html, which although simple, gives a flavour of the future of 3D sound on the Web.

Spatialised audio can also help blind or partially-sighted people use the Internet. One proposal is based on the idea of "Cascading Style Sheets" for Web pages. By changing the style sheet file associated with a set of pages, a designer can change the way the pages are displayed, without altering the actual pages. For example, the "H1" HTML tag might be defined to display its text in a 12 point bold sans-serif font, or under the new audio proposal, to speak its text slowly in a soft female voice located to the far-right and above the listener. The possibilities are intriguing.
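To make the idea concrete, an audio style rule for that "H1" example might look something like the following sketch. No audio style sheet syntax has been finalised, so the property names here are illustrative rather than definitive:

```css
H1 {
  voice-family: female;   /* spoken in a soft female voice */
  speech-rate: slow;      /* read out slowly */
  azimuth: far-right;     /* placed to the listener's far right */
  elevation: above;       /* and above head height */
}
```

The appeal of the approach is the same as for visual style sheets: the pages themselves need not change at all, only the style sheet attached to them.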

Done properly, true 3D spatialised audio can be a compelling experience, leaving you wondering how your humble stereo ever managed to convince you that Oasis were playing unplugged in your living room. With 3D sound you can hear extraordinary things, but it's strange to know you're hearing them through someone else's ears.

Toby Howard teaches at the University of Manchester.