32-Bit Audio Recording – Do You Need It?

Short answer: no, you do not need to record 32-bit audio. The only benefit is moot when hiring a professional mixer. It’s entirely superfluous when you consider how sound is captured digitally.

Fine, I’ll explain why this is the case.

What Is 32-Bit Audio Anyway?

To explain it, I need to go over how digital audio works. With analog audio, the microphone translates the sound into electrical voltage that travels down a wire and is recorded on an analog medium, like a tape. Essentially, it takes a photograph of a sound wave so it can be played back on a speaker, and now you know why it used to be called a phonograph.

Computers don’t work like that. They only see numbers, so sound has to be reproduced as a series of numbers. The numbers act as instructions so the computer can reproduce the sounds exactly. There are two values that need to be considered for this: Sample Rate and Bit-Depth.

Sample Rate

If you work in any video post production capacity, the number you will see most often is 48 kHz. If you master CDs for a living, everything is encoded at 44.1 kHz. Modern recorders can usually go as high as 192 kHz. What do these numbers mean? Since I do sound for video, let’s focus on 48 kHz since that’s the only one that matters in my line of work. This means that in one second, 48,000 samples of sound pressure are taken, hence the rate of 48 kilohertz.

FUN FACT – To reproduce a desired frequency, the sample rate must be at least double the frequency. Therefore, 48 kHz audio can reproduce frequencies up to 24 kHz, 4,000 hertz beyond the range of human hearing. This is known as the Nyquist Theorem.

Sound is created by rapid changes in air pressure that our ears can translate into audio. Human ears can typically pick up sounds as low as 20 pressure changes per second (expressed as hertz) and as high as 20,000 changes per second, though most adult ears can’t hear squat above 16,000 hertz.

So in one second, the computer is taking 48,000 samples. For sample one, the computer says, “What is the sound pressure level?” and it records the number representing it. For sample two, it asks that again, records that number, and keeps going 48,000 times every second. So in essence, a 48 kHz audio file is a series of numbers, specifically 48,000 of them for every second of audio, that the computer can reproduce as sound. And you thought your job was tedious.
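If it helps to see that as code, here is a minimal sketch of the idea, not how any real recorder is written. The helper names sample_sine and nyquist_limit are made up for illustration, and a pure sine tone stands in for the microphone signal.

```python
import math

SAMPLE_RATE = 48_000  # samples per second (48 kHz)

def sample_sine(freq_hz, seconds=1.0, sample_rate=SAMPLE_RATE):
    """Take evenly spaced readings of a sine tone (hypothetical helper)."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

def nyquist_limit(sample_rate=SAMPLE_RATE):
    """Highest frequency, in hertz, that the sample rate can represent."""
    return sample_rate / 2

samples = sample_sine(1_000)   # one second of a 1 kHz tone
print(len(samples))            # 48000
print(nyquist_limit())         # 24000.0
```

One second of audio really is just a list of 48,000 numbers, and half the sample rate is the ceiling on the frequencies it can hold.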

Bit-Depth

Bit-Depth, also known as the quantization rate, is the number of bits used to store each sample. The bits determine the range of numbers that can be noted in each sample. So if you’re recording 16-bit audio, which is standard for CDs, then each sample recorded has one of 65,536 possible values. In practice these are stored as signed numbers from -32,768 to 32,767, where 0 denotes dead silence and the extremes denote the maximum level possible to record.

You’re really going to make me explain basic computer science, aren’t you?

Okay, computers can only see numbers based on the values of individual two-state switches that can be on or off, or a 1 or 0 where 1 is on and 0 is off. The bits refer to how many of these switches there are. 1 bit has two possible values, OFF or ON. This is expressed as 0 or 1. 2 bits have four possible values, 00, 01, 10, and 11, alternatively 0, 1, 2, 3. So a 16-bit quantization rate has 65,536 possible values, or 2^16 (16 bits, or binary switches, each with two possible choices; I hope this is making sense now). For 24-bit audio, the current standard, there are 16,777,216 possible values. This number represents the amplitude of each sample, or how loud it is.
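The counting above is just powers of two, so it can be checked in a few lines. This is a small sketch; possible_values is a name I made up for illustration.

```python
def possible_values(bits):
    """Number of distinct values that a given number of on/off switches can hold."""
    return 2 ** bits

for bits in (1, 2, 16, 24):
    print(bits, possible_values(bits))

# The four patterns that 2 bits can make, written out as switches.
patterns = [format(value, "02b") for value in range(possible_values(2))]
print(patterns)  # ['00', '01', '10', '11']
```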

Why Does The Bit Depth Matter?

Precision. Let’s say you want to record someone’s voice. You’re recording 16-bit, and when the recorder tries to record a sample, the volume of their voice falls halfway between two values! Computers can only record whole integers, so it has to round up or down. This is known as a quantization error. Because 16-bit audio has 65,536 possible sample values, the error is very, very small. How small? Utterly imperceptible, specifically 0.00007 decibels. Do not ask me to explain that math as I don’t understand it; I looked up the figure in a book. No human, hell, no bat can perceive a difference of seven hundred-thousandths of a decibel. 16-bit recording allows us to accurately record sounds with a maximum dynamic range of 96 decibels, which is the acoustic difference between some leaves rustling outside and sitting at the bar in a happening club.
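For anyone who does want to poke at the math, here is a rough sketch that lands on both figures. It rests on an assumption of mine, since the book figure comes with no derivation: that the error quoted is half of one step, measured against a full-scale signal. The function names are made up for illustration.

```python
import math

def dynamic_range_db(bits):
    """Ratio between the largest value and one step, in decibels."""
    return 20 * math.log10(2 ** bits)

def worst_rounding_error_db(bits):
    """Level change caused by being off by half a step at full scale (assumed model)."""
    half_step = 0.5 / (2 ** bits)
    return 20 * math.log10(1 + half_step)

print(round(dynamic_range_db(16), 1))   # 96.3
print(worst_rounding_error_db(16))      # roughly 0.00007
```

Passing 24 instead of 16 gives the 24-bit versions of the same two numbers.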

Remember, 16-bit is the standard for CDs, and nobody has ever complained about the sound quality of a well-mastered CD (besides vinyl enthusiasts).

2-bit quantization

Example of 2-bit quantization to illustrate the quantization errors. The red line represents the analogue waveform while the blue line is the digitized signal. Note how inaccurate it is.
Source: Wikipedia

3 bit quantization

Example of quantization at 3-bit. Note how it’s a bit more accurate than 2-bit recording. By the time you get to 16-bit, it’s quite precise.
Source: Wikipedia
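The same comparison can be run as numbers instead of pictures. This is a sketch under my own assumptions: the signal is a sine wave between -1.0 and 1.0, the levels are evenly spaced across that range, and quantize and max_error are illustrative names rather than anything from a real audio library.

```python
import math

def quantize(x, bits):
    """Round x in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    return round((x + 1.0) / step) * step - 1.0

def max_error(bits, points=1000):
    """Largest gap between one cycle of a sine wave and its quantized copy."""
    wave = [math.sin(2 * math.pi * n / points) for n in range(points)]
    return max(abs(x - quantize(x, bits)) for x in wave)

for bits in (2, 3, 16):
    print(bits, max_error(bits))
```

The worst-case gap shrinks by roughly half with every added bit, which is why 16-bit is already so tight.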

So Why Don’t We Just Use 16-Bit Audio Recording?

If you’re just recording sound and then releasing it as is, 16-bit is more than adequate, and it’s served really well as a music delivery format for years. But in modern video production, there are always a lot of effects added, noise reduction applied, midranges boosted. The computer does this with some very complicated math, and well, the quantization errors become much more noticeable as the sound goes through the post-production process. Quantization errors get multiplied and divided, quickly introducing all kinds of digital processing artifacts. For this reason, 16-bit audio makes much more sense as a delivery format than as a recording format.

So What About 24-Bit?

For 24-bit recording, again the standard all sound mixers use, there are 16,777,216 possible sample values. This gives us a maximum dynamic range of 144 decibels, or the acoustic difference between a worm rubbing a cotton ball from one football field away and a jet engine firing up right next to your face. This also means the maximum quantization error is 0.00000026 decibels. Now 26 hundred-millionths of a decibel is not only imperceptible to Superman, but it’s 256 times more precise than 16-bit. The error is small enough that for all intents and purposes, it’s an exact duplicate of the analog sound wave. Quantization errors are so trivial at that point that they don’t matter in post production. An error of 26 hundred-millionths of a decibel is so tiny that it would not compound into anything noticeable no matter how much post processing you put it through.
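To make the post-production point concrete, here is a toy experiment, not a model of any real editing software. It assumes a single processing step (turn the signal down 40 dB, save it as whole numbers, turn it back up 40 dB) and compares how far the result drifts at each bit depth. The names store and round_trip_error are made up for illustration.

```python
import math

def store(x, bits):
    """Round x in [-1.0, 1.0] to a signed integer sample, then back to a float."""
    full_scale = 2 ** (bits - 1) - 1
    return round(x * full_scale) / full_scale

def round_trip_error(bits, gain=0.01, points=1000):
    """Worst error after turning a sine down 40 dB, saving it, and turning it back up."""
    worst = 0.0
    for n in range(points):
        x = math.sin(2 * math.pi * n / points)
        quiet = store(x * gain, bits)   # turned down and written as integers
        restored = quiet / gain         # turned back up in post
        worst = max(worst, abs(restored - x))
    return worst

err16 = round_trip_error(16)
err24 = round_trip_error(24)
print(err16, err24)
```

The rounding error gets amplified along with the signal, so the 16-bit version comes back visibly rougher while the 24-bit version stays far below anything you would hear.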

So Can We Talk About 32-Bit Audio Now?

Right! That thing this blog post is ostensibly about. So with 32-bit audio, there are 4,294,967,296 possible sample values, right?

No.

See, 32-bit audio uses floating point computation rather than a fixed integer value. You know how large numbers are noted by powers of ten? Instead of saying 4,295,000,000, I could write 4.295E9. Floating point works much the same: instead of noting an integer value, it notes a mathematical formula for the computer to figure out the value after the fact.

Using this approach, a sample in a 32-bit floating point recording can be as large as roughly 340 undecillion. What’s an undecillion? It’s a number with 36 zeros. Quantization errors here are absolutely nil; specifically, I don’t know how to calculate a logarithm that small. This also gives the file a maximum dynamic range of 1,528 decibels, which is the acoustic difference between an amoeba farting and the sun going supernova. For those of you who think you might need that dynamic range, I’d like to point out that the maximum possible sound pressure level on Earth is 210 decibels.
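Here is a rough way to check the scale of those numbers. It assumes 32-bit float audio uses the standard IEEE 754 single-precision layout, and it measures the range from the largest finite value down to the smallest normal value, which is my choice of endpoints. The helper float32_from_bits is made up for illustration.

```python
import math
import struct

def float32_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

largest = float32_from_bits(0x7F7FFFFF)          # largest finite value
smallest_normal = float32_from_bits(0x00800000)  # smallest normal value

range_db = 20 * math.log10(largest / smallest_normal)
print(largest)          # about 3.4e38
print(smallest_normal)  # about 1.18e-38
print(round(range_db))  # 1529
```

Computed this way the range comes out within a decibel or so of the 1,528 figure above, so the exact number depends on how you round.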

So what is gained by adding that much range on top of something that is essentially 100 percent precise, and by being able to theoretically record sounds which violate the laws of physics? Turns out, there is one advantage.

Clipping

Clipping is what happens when audio hits the maximum value the bit depth allows. Because 24- and 16-bit audio denote samples in integers, it is impossible to record above this level, so you get a very harsh and distorted sound. When recording 32-bit audio, you are recording equations rather than integer values, so if you clip you can simply lower the gain in post and your audio will be perfectly reconstructed.
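A tiny sketch of the difference, with some loud assumptions: it only models the file format, not the preamp or converter in front of it, and the peak at twice full scale is a made-up value. The helpers to_int24 and to_float32 are illustrative names.

```python
import struct

INT24_MAX = 2 ** 23 - 1
INT24_MIN = -(2 ** 23)

def to_int24(x):
    """Store a sample as a 24-bit integer; anything past full scale is clamped."""
    return max(INT24_MIN, min(INT24_MAX, round(x * INT24_MAX)))

def to_float32(x):
    """Store a sample as a 32-bit float by round-tripping it through 4 bytes."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

hot_peak = 2.0     # hypothetical peak at twice full scale (about 6 dB over)
gain_down = 0.25   # pulled down about 12 dB in post

fixed = to_int24(hot_peak) / INT24_MAX * gain_down
floating = to_float32(hot_peak) * gain_down

print(fixed)     # 0.25, the peak was flattened to full scale before the gain change
print(floating)  # 0.5, the peak kept its true height
```

The integer version has already thrown away everything above full scale, so lowering the gain afterwards just gives a quieter flat top.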

However, this is largely unnecessary. A good sound mixer is listening and making adjustments to the mix so the recorded tracks don’t clip. The ISOs are recorded at a much lower level in case the mix track clips. Because 24-bit audio has a dynamic range far greater than what any speakers are even capable of and high quality recorders from Zaxcom and Sound Devices have really quiet preamps, you can add gain in post without introducing any noise.

But That Physics Defying Dynamic Range…

Yeah about that…

The maximum sound pressure level that an industry standard Sanken COS-11D lavalier is capable of is 127 decibels. DPA’s 4017B, an 1800 dollar microphone, is rated at 146 decibels. Most professional microphones are somewhere in the 110-130 decibel range.

Not only that, the microphones need to go through some kind of preamp and Analog to Digital Converter (ADC). These essentially give the mic enough power for it to transfer the sound to the ADC where it is converted into a digital file. All these components generate noise that limits the dynamic range, but really expensive units are very quiet. Zaxcom’s Neverclip technology gives their recorders a maximum dynamic range of 138 decibels, and Sound Devices’ brand new Mix Pre II series are only rated to 142 decibels, well below the 24-bit limit of 144 despite being able to record 32-bit files. Unless you are trying to record gunshots or explosions at unsafe proximities, you aren’t maxing out our current technology’s capabilities on any job.

In other words, all the other components that go into recording audio aren’t even capable of taking advantage of the 32-bit dynamic range and even if they could, it’s still superfluous. Human voices are typically about 80 decibels, only reaching 110 if they’re shouting right into your ear. If you gain stage properly, there is no reason you need 32-bit files over 24-bit. I mean, 96 kHz sample rates have existed for a while, yet the industry has decided 48 kHz is plenty.

High bit depth doesn’t automatically make a recording sound good. There are so many other components, such as mic placement, analog gain staging, and the quality of the preamps. You can buy an 8k camera and terrible lenses, then shoot a poorly lit scene. You’ll still end up with a terrible image, albeit a high resolution terrible image. Audio works much the same way.

FUN HEADACHE – Decibels, or dB, are not units of sound pressure, but in fact ratios of one power value to another on a logarithmic scale. With sound pressure, it is 10 × log10 of the ratio of sound energy expressed in watts per square meter over the reference point, which is the minimum threshold of human hearing (10^-12 watts per square meter). In general, an increase by 10 dB is a doubling of perceived loudness and ten times the power. So a 10 dB increase is twice the volume of the quietest thing we can hear and ten times the power. A 20 dB increase is four times the volume and a hundred times the power, etc. This calculator will help you figure these out and offers further explanation.
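If the headache needs a second opinion, the relationships above fit in a few lines. This is only a sketch: the function names are made up, and the loudness doubling is the rule of thumb from the paragraph above, not a precise law.

```python
import math

I0 = 1e-12  # watts per square meter, the threshold of human hearing

def intensity_db(intensity):
    """Decibels relative to the threshold of hearing."""
    return 10 * math.log10(intensity / I0)

def power_ratio(db_change):
    """How many times the power a given dB increase represents."""
    return 10 ** (db_change / 10)

def loudness_ratio(db_change):
    """Rule of thumb: every 10 dB roughly doubles perceived loudness."""
    return 2 ** (db_change / 10)

print(intensity_db(1e-11))                   # about 10
print(power_ratio(20), loudness_ratio(20))   # 100.0 4.0
```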

The dBFS scale, which is used for many digital audio recorders, is the ratio against the maximum peak level the recorder is capable of. This is why it is measured in negative dB values. The dBu scale used in analog recorders measures a signal’s voltage relative to a reference voltage. Don’t worry, I don’t think the good people who designed the decibel scale understand it either.
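Both scales are the same formula with a different reference, which a short sketch makes clearer. One detail here is mine rather than the post's: the dBu reference of roughly 0.7746 volts. The function names are illustrative.

```python
import math

DBU_REFERENCE_VOLTS = 0.7746  # approximate dBu reference voltage (not stated in the post)

def dbfs(peak, full_scale):
    """Level relative to the largest value the recorder can store."""
    return 20 * math.log10(abs(peak) / full_scale)

def dbu(volts):
    """Analog level relative to the dBu reference voltage."""
    return 20 * math.log10(volts / DBU_REFERENCE_VOLTS)

full_scale_24 = 2 ** 23 - 1
print(dbfs(full_scale_24, full_scale_24))      # 0.0, the very top of the scale
print(dbfs(full_scale_24 / 2, full_scale_24))  # about -6
print(dbu(1.228))                              # about +4
```

Since nothing can be stored above full scale, every dBFS reading is zero or negative.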

Why Stick With 24?

You can totally use 32-bit audio, but if you hire a professional sound mixer, there is no advantage you will be able to see in your end product. Most recorders in service only go up to 24-bit, including Sound Devices’ brand new Scorpio and Zaxcom’s upcoming Nova recorders.

32-bit float has been advertised as eliminating the need for gain staging. This is not a good precedent to set. A sound mixer’s job is to record the cleanest audio possible and make the post production process as simple as possible. I’ll admit that being able to unclip audio that has clipped is pretty awesome, but it is still an extra step. There’s always some busywork for the editor, like syncing sound and picture, and would you rather add extra time unclipping the audio or cutting the piece together?

I try and record my mix tracks at a good level. This minimizes busywork and offers a good guide for the editor regardless. My ISOs are recorded much lower where there’s no chance of clipping if they are needed. My Zaxcom recorders’ preamps are so quiet that no noise is introduced if you increase the ISO levels. But they are just that, a safeguard. I try and make the live mix as close to the finished product as possible, so I want to worry about clipping!

The one advantage you do gain with 32-bit audio is entirely moot if you hire a professional sound mixer. We know how to avoid clipping and build in safeguards. Plus, 32-bit files are 33 percent larger than their 24-bit counterparts.
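The file size difference is simple arithmetic. This sketch assumes uncompressed audio at 48 kHz, ignores file headers and metadata, and defaults to a single track; bytes_per_second is a made-up helper name.

```python
def bytes_per_second(bits, sample_rate=48_000, channels=1):
    """Raw audio data written per second, ignoring headers and metadata."""
    return sample_rate * channels * bits // 8

size_24 = bytes_per_second(24)
size_32 = bytes_per_second(32)
extra_percent = (size_32 - size_24) / size_24 * 100

print(size_24, size_32)         # 144000 192000
print(round(extra_percent, 1))  # 33.3
```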

Oh, and current technology can barely take full advantage of 24-bit, even with recorders capable of writing 32-bit audio files.

For now, I don’t see the industry switching over as there’s simply no compelling reason to do so.

UPDATE: As of October 9th 2019, I had some more thoughts that I decided to add in their own blog post.

More Thoughts On 32-Bit

Sources: