It happens to most of us sooner or later. You put headphones on and start overdubbing a new track into a computer-based recorder, but you can’t catch the feel because the headphone mix is distracting. There’s a time lag between when the stick hits the head and when you hear it in the headphones.
This annoying phenomenon is caused by the latency of the computer recording system. In this article we’ll explain what causes latency and tell you how to beat it. We’ll also cover some related concepts in computer audio. If you’re recording in a home studio, knowing the technical details will be your first line of defense against certain frustrating problems. Understanding how a computer deals with time may not make you a better percussionist — but then again, if you can edit out your sloppy timing in a computer, you sure can sound like a better percussionist.
Computers record sound in digital form. That is, they convert sound into numbers. In order to explain how computers deal with time, we need to start with a bit of background on digital audio.
Computer Audio And Sampling. If you know anything about how film and video work, you know that the human eye can be fooled by a series of still pictures flashing past at 24 or 30 frames per second. The eye blurs those separate images into an illusion of continuous motion. The human ear is much more discriminating. In order to record digital audio in such a way that it will sound good on playback, we need to take thousands of momentary “snapshots” (called samples) of the sound pressure level during each second of time.
When you plug a mike into a computer’s digital audio interface to record your drums, the mike sends a continuous analog signal to the interface. The interface then slices the signal up into thousands of discrete samples, which are manipulated and stored as numbers. On playback, the numbers are sent back to the interface, which then reconstructs an analog signal that you can listen to.
The number of samples taken for each second of audio is called the sampling rate. The sampling rate is measured in kilohertz, abbreviated kHz (see the sidebar on “Units”). A standard audio CD has a sampling rate of 44.1kHz (that is, 44,100 samples per second). This rate is used by many computer audio interfaces, but recording at 48kHz, 96kHz, and even 192kHz is common.
Sampling rate is only one of the factors that can affect the quality of digital audio. For the purposes of this article, though, it’s enough to say that many people believe a higher sampling rate will produce better audio quality. However, a higher sampling rate definitely puts more demands on your computer. If you double the sampling rate from 48kHz to 96kHz, you’re doubling the amount of data that your computer has to shuffle around.
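To put rough numbers on that data shuffle, here is a minimal Python sketch. The 24-bit sample size and the two-channel (stereo) setup are assumptions made for illustration; the function name is hypothetical, and your own session may use different settings.

```python
def bytes_per_second(sample_rate_hz, bit_depth=24, channels=2):
    """Raw (uncompressed) audio data rate in bytes per second.

    bit_depth and channels are illustrative assumptions, not values
    taken from the article.
    """
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels


for rate in (44_100, 48_000, 96_000, 192_000):
    megabytes = bytes_per_second(rate) / 1_000_000
    print(f"{rate / 1000:g}kHz: {megabytes:.2f} MB per second")
```

Whatever bit depth and track count you plug in, the data rate scales in step with the sampling rate, so going from 48kHz to 96kHz always doubles it.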
Computers today are very fast, but everything that a computer does takes time. When the audio interface samples the signal and passes the resulting data down the line to the computer’s CPU (central processing unit), that takes time. When the computer retrieves the data from the hard drive and sends it out to the interface for playback, that takes more time.
The more data the computer has to shuffle per second, the harder the CPU has to work. That’s why a computer that’s being used for recording needs a fast CPU — the faster the better (see the sidebar on “CPU Headroom”). If the CPU gets so busy that there’s a data logjam, even for a split second, the audio stream will be interrupted. Your recording software probably has a meter that shows approximately how hard the CPU is working. If the meter pegs, it means the CPU is working as hard as it possibly can, and there’s no extra time left for more processing. At that point, you’ll hear audio glitches — anything from tiny clicks and pops to huge grinding and stuttering sounds. So making sure the CPU has enough power to handle all of the audio data in real time is essential.
There are several strategies for keeping the CPU happy. You can record at a lower sampling rate, use fewer plug-in effects, or use your recorder’s “track freeze” function to put effects and software synthesizers off-line. Another strategy is to increase the buffer size.
What’s the buffer size? Glad you asked.
FIG. 1. The control panel for the M-Audio FireWire 410 audio interface. The buffer size (left) has been set to 512 samples.
Buffer Size And Latency. Your computer doesn’t handle one bit, or byte, of audio data at a time. Computing audio in chunks is a lot more efficient. Each chunk is called a buffer, and the buffer size is the number of samples in each chunk. Buffer sizes of 128, 256, 384, 512, 768, and 1,024 samples are common. The buffer size is usually set in the control panel for your audio interface (FIG. 1). The setting you make here will affect all of the audio programs that use the interface. They’ll all use the same buffer size so that they can shuffle data to and from the interface in an efficient way.
When you lower the buffer size, the CPU has to churn out more buffers per second, which means it has to work harder. If you’re pushing the CPU hard, you may be able to avoid audio glitches by increasing the buffer size.
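Here is a quick sketch of how many buffers the CPU has to deliver each second at the common buffer sizes. It assumes a 48kHz sampling rate, and the function name is just illustrative.

```python
def buffers_per_second(sample_rate_hz, buffer_size):
    """How many buffers must be processed every second to keep audio flowing."""
    return sample_rate_hz / buffer_size


SAMPLE_RATE_HZ = 48_000  # assumed for this example

for size in (128, 256, 384, 512, 768, 1024):
    count = buffers_per_second(SAMPLE_RATE_HZ, size)
    print(f"{size:>5} samples: {count:.1f} buffers per second")
```

Each buffer carries some fixed overhead no matter how many samples it holds, which is one reason a smaller buffer setting makes the CPU work harder.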
The downside of this is that when the buffer size is larger, the computer needs more time to “think” about each chunk of audio. The time that it spends thinking is the main source of latency in the audio system. This is a vital concept; so let’s take a moment to look closely at it.
Latency is the amount of time that passes between when you’d like to hear something and when you actually hear it. Latency is usually measured in milliseconds, or thousandths of a second (see “Units”). A millisecond may not seem like much, but milliseconds can add up.
Suppose you’re recording at 48kHz and you’ve set the buffer size to 1,024 samples. Each buffer contains about 1/47 second of audio, which is roughly 21ms. When you plug a mike into the interface and start recording, the interface takes 21ms to fill the buffer, and then sends the data on to the CPU. On playback, the CPU computes one 21ms chunk of audio, sends it down the pipeline to the audio interface for output, and then starts computing the next 21ms chunk.
If you’re just listening to the playback of tracks that you recorded earlier, 21ms of delay doesn’t matter. Click the start button and 21ms later the music starts playing. So what? Latency becomes a problem only when you’re doing something live (such as playing your drums) and also monitoring your playing as the signal passes through the computer. In this scenario, the signal from the drum mikes takes 21ms to get into the computer and another 21ms to come back out. You’ll hear your drums in the headphones about 42ms late, and that’s enough time lag to be very distracting.
Fortunately, most up-to-date computer systems don’t require that large a buffer size. If the buffer is set to only 128 samples, the in-to-out latency will drop to about 5ms, which is a lot better. That’s why a lower buffer setting matters when you’re monitoring through the computer. But if you have a lot of tracks that are using a lot of effects, you may hear pops and crackles unless you boost the buffer to 256 or 384.
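The arithmetic behind these latency figures is simple enough to sketch in a few lines of Python. Treat it as a back-of-the-envelope model rather than a measurement: it assumes the only delay is one buffer on the way in and one buffer on the way out, and it ignores any extra delay added by the converters or drivers. The function names are illustrative.

```python
def buffer_latency_ms(buffer_size, sample_rate_hz):
    """Time needed to fill (or play out) one buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate_hz


def round_trip_ms(buffer_size, sample_rate_hz):
    """In-to-out latency: one buffer going in plus one buffer coming out."""
    return 2 * buffer_latency_ms(buffer_size, sample_rate_hz)


SAMPLE_RATE_HZ = 48_000  # the rate used in the example above

for size in (128, 256, 384, 512, 768, 1024):
    one_way = buffer_latency_ms(size, SAMPLE_RATE_HZ)
    both_ways = round_trip_ms(size, SAMPLE_RATE_HZ)
    print(f"{size:>5} samples: {one_way:5.1f}ms one way, {both_ways:5.1f}ms in-to-out")
```

The results land close to the round numbers quoted above. Real-world latency is usually a little higher than this model suggests, so use the table for comparing settings, not as a guarantee.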
Compensations. There are three ways to avoid in-to-out latency entirely while recording. First, you can record and monitor through an analog mixer. To do this, you’ll need a mixer with enough inputs and outputs that it can give you a headphone mix of both your mikes and the computer’s playback, while sending your mikes on to the computer (via an aux send, perhaps), without including the computer’s playback in the signal. The mike signal reaching your headphones won’t be delayed, because it never passes through the computer before you hear it. You’ll also need to make sure that the track you’re recording is muted in the recorder — that is, that it has no output back to the interface. If the recorded track is being output, you’ll hear your performance twice with a slap-back echo, first via the analog mixer, and again after it passes through the computer.
Second, some audio interfaces include a signal path that eliminates in-to-out latency. Look for an interface that has zero-latency monitoring. The signal path is essentially the same as what you’d set up with an analog mixer: the mike inputs are sent back to the audio outputs directly, without being routed through the computer first.
The third method is the simplest: just don’t monitor your mikes in the headphones while recording. If you’re playing reasonably loud, you probably don’t need to.
FIG. 2. Using the scissors tool in Steinberg Cubase 4 to snip an audio clip into two pieces. The pop-up data display by the scissors shows the position of the cut in bars, beats, sixteenths, and clock ticks.
Pushing And Dragging. Let’s look at a different scenario. You’ve recorded the track, you like the take, but you were pushing or dragging a little in one particular section. Or maybe it’s a single hit that’s just a tad early or late and you’d like to tighten it up.
Computers excel at this type of editing. Essentially, all you need to do is grab the scissors tool, click the tool before and after the offending audio region in order to isolate it (FIG. 2), and then grab the region with the mouse and slide it forward or backward until your performance locks with the rest of the tracks. Your owner’s manual will provide details on the mouse tools and how to use them.
If you recorded your drums using multiple mikes, you’ll need to move all of the drum tracks exactly the same amount. To do this, you’ll need to select all of the drum audio before using the scissors tool so that the cuts will be in the same places. You’ll also need to select all of the regions before dragging them. If you don’t, you’re pretty much guaranteed to end up with a mess.
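Under the hood, moving all of the drum tracks together amounts to adding one offset to every region’s start position. The sketch below models that idea in Python; the track names, the start positions, and the 48kHz rate are all made up for illustration, and it isn’t how any particular recording program stores its regions.

```python
def ms_to_samples(milliseconds, sample_rate_hz):
    """Convert a time offset in milliseconds to a whole number of samples."""
    return round(milliseconds * sample_rate_hz / 1000)


def shift_regions(region_starts, offset_samples):
    """Return new start positions with every track moved by the same offset."""
    return {track: start + offset_samples
            for track, start in region_starts.items()}


# Hypothetical region start positions, in samples, for one snipped-out section.
drum_regions = {
    "kick": 96_000,
    "snare": 96_000,
    "overhead_left": 96_000,
    "overhead_right": 96_000,
}

offset = ms_to_samples(-10, 48_000)  # pull the section 10ms earlier
moved = shift_regions(drum_regions, offset)
print(moved)
```

Because a single offset is applied to every track, the mikes keep their timing relationship to one another, which is exactly what selecting all of the regions before dragging accomplishes.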
Rather than relying on your ears, you may find it helpful to view the waveforms of other tracks while dragging audio. If a kick is a bit early, you may want to align it with the corresponding bass note. Position the tracks next to one another, zoom them in far enough to see exactly where the bass note starts, and then drag the drum region forward or backward so that the two waveforms line up.
When you start cutting and sliding tracks, you may find that you’re introducing new audio glitches. Your first line of defense is to always cut audio regions apart at zero crossings. A zero crossing is a point where the waveform crosses the horizontal line running along the middle of the track, which represents zero signal level (FIG. 3). If you haven’t snipped the waveform at a zero crossing, the signal level jumps abruptly at the edit point when you move the region, and you’ll hear a click on playback.
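If you’re curious how software can hunt for a zero crossing, here is a minimal Python sketch. It treats the audio as a plain list of sample values (positive above the center line, negative below) and looks for the places where the sign flips. The function names and the sample data are invented for the example; an actual editor will have its own way of doing this.

```python
def zero_crossings(samples):
    """Return each index i where the waveform reaches or crosses zero
    between sample i - 1 and sample i."""
    crossings = []
    for i in range(1, len(samples)):
        previous, current = samples[i - 1], samples[i]
        if (previous < 0 <= current) or (previous > 0 >= current):
            crossings.append(i)
    return crossings


def nearest_zero_crossing(samples, position):
    """Return the zero crossing closest to the requested cut position,
    or None if the audio never crosses zero."""
    crossings = zero_crossings(samples)
    if not crossings:
        return None
    return min(crossings, key=lambda i: abs(i - position))


# A tiny made-up waveform: dips below zero, then comes back up.
wave = [0.5, 0.2, -0.1, -0.4, -0.2, 0.3, 0.6]
print(zero_crossings(wave))
print(nearest_zero_crossing(wave, 4))
```

Many editors offer a “snap to zero crossing” option that does the equivalent job for you, so check your manual before hunting for crossings by eye.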
FIG. 3. A stereo drum track loaded into Steinberg WaveLab 6 for editing. The left channel is in the top half of the view area and the right channel is in the bottom half. The zero crossings are the spots where the waveform (the squiggly line) crosses the dotted line. The area in inverse video (at right) has been selected for an edit operation; I’ve chosen a zero crossing for the start of the edit in order to avoid clicks and pops.
With stereo or multitrack recordings, you probably won’t be able to find a zero crossing at the same spot on all of the tracks. In this case, you can apply a quick fade-out and fade-in (no more than a millisecond or two) to the edges of the audio regions to eliminate the clicks.
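A short fade simply ramps the volume up from silence at the start of the region and back down to silence at the end. The following Python sketch applies a straight-line (linear) fade to a list of sample values. The linear shape, the 1ms default, and the function name are assumptions for illustration; your software may offer other fade curves.

```python
def apply_edge_fades(samples, sample_rate_hz, fade_ms=1.0):
    """Return a copy of the region with a linear fade-in at the start
    and a linear fade-out at the end."""
    n = len(samples)
    # Never let the two fades overlap, even on a very short region.
    fade_len = min(int(sample_rate_hz * fade_ms / 1000), n // 2)
    faded = list(samples)
    for i in range(fade_len):
        gain = i / fade_len          # ramps from 0.0 up toward 1.0
        faded[i] *= gain             # fade-in at the front edge
        faded[n - 1 - i] *= gain     # fade-out at the back edge
    return faded


# A made-up region: 10ms of constant full-level signal at 48kHz.
region = [1.0] * 480
faded = apply_edge_fades(region, 48_000, fade_ms=1.0)
print(faded[0], faded[24], faded[240], faded[-1])
```

Because both edges of the region now start and end at zero, the cut no longer produces a sudden jump in level, and the fade is too short to hear as a change in volume.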
When snipping audio apart with the scissors tool or dragging it forward or backward in a track, you need to be aware of the timing resolution with which edits are handled. With some software, zooming in on the waveform image will allow you to make very fine adjustments. Other programs may “snap” the audio segment to the smallest increments in the time ruler (see the discussion of PPQ in the “Units” sidebar). If the audio segment jumps when you try to nudge it only slightly, check the manual and look in the program’s Preferences box for a way to change that setting.
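To see how coarse a tick-based grid can be, here is a small Python sketch that converts clock ticks to samples and snaps a position to the nearest tick. PPQ stands for pulses (ticks) per quarter-note. The tempo of 120 bpm, the resolution of 480 PPQ, and the 48kHz rate are example values only, and the function names are hypothetical.

```python
def samples_per_tick(sample_rate_hz, tempo_bpm, ppq):
    """Length of one clock tick, measured in samples."""
    samples_per_quarter_note = sample_rate_hz * 60 / tempo_bpm
    return samples_per_quarter_note / ppq


def snap_to_tick(position_samples, sample_rate_hz, tempo_bpm, ppq):
    """Move a position (in samples) to the nearest tick on the grid."""
    tick_len = samples_per_tick(sample_rate_hz, tempo_bpm, ppq)
    nearest_tick = round(position_samples / tick_len)
    return round(nearest_tick * tick_len)


# Example values only: 48kHz audio, 120 bpm, 480 ticks per quarter-note.
tick_len = samples_per_tick(48_000, 120, 480)
print(tick_len)                                   # samples in one tick
print(snap_to_tick(1_030, 48_000, 120, 480))      # where sample 1,030 lands
```

With these example settings, one tick is about a millisecond long, so a region that snaps to ticks can only move in steps of roughly that size. A lower PPQ or a slower tempo makes the steps bigger.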
Cop A Feel. The point of these computer audio shenanigans is to produce tracks that feel right. The bad news is that the computer turns the question of musical feel into a series of mathematical abstractions. The good news is that if you’re prepared to deal with the abstractions, computer recorders put some amazing tools at your fingertips.