Audio File Formats Demystified

I know what we need. It’s an ADA (another damn acronym)! It seems like the technology geeks who are reengineering our world know how to program and speak in code. The goal of this guide is to help you find your way through the maze of the most common file formats used in digital music making.

Uncompressed Audio Formats

When audio is digitized, it is transformed from analog sound pressure levels to numbers. Your computer, stereo system, or music player needs to understand the numbers and eventually turn all the little zeros and ones back into signals that your ears can hear and your brain can understand. This is where the world of audio file formats comes into play.

Audio files fall into two broad groups: uncompressed and compressed. Uncompressed file formats are used when maximum audio quality is desired. While analog purists will argue that digitizing any signal results in a loss in quality, uncompressed audio files keep all the digitized data in its full form. In general terms, you’ll want to use uncompressed (or losslessly compressed) formats when the audio may be further edited or passed through other computer processing. Files that have been compressed with lossy codecs will only be degraded further if they are subjected to additional computer modifications. Compressed formats are used when the highest-quality audio is not necessary or when storage space and transfer time are important considerations. We’ll first take a look at the two most common uncompressed audio formats.

AIFF

(Audio Interchange File Format). These files will often use the .aif extension. This is a noncompressed format that is used to store audio files primarily on Apple computers. However, most PC software can read AIFF files without a problem. The AIFF format was developed by Apple (some folks believe that the acronym stands for Apple Interchange File Format) and was based on the Electronic Arts Interchange File Format. One of the advantages of this format is the large amount of nonaudio data that can be stored along with the audio information. AIFF files often contain information such as name, author, copyright, and comments.

WAV

(Waveform Audio Format). The common extension for this type of file is .wav. This is a file format designed to store audio data on PCs, yet most Mac software will also open WAV files. This format is a close cousin to AIFF, but it stores its numbers in little-endian byte order, the native order of Intel CPUs, whereas AIFF uses big-endian order. WAV files can hold audio that has been compressed, but they are most often used to house noncompressed audio. Because the format records its sizes as 32-bit numbers, WAV files can’t be larger than 4GB, and some software stops at 2GB.
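To make the WAV layout a little less abstract, here is a minimal Python sketch that writes one second of silence into memory with the standard `wave` module and then peeks at the first 12 bytes of the result. The variable names are made up for illustration. The point is that the file opens with a RIFF header whose size is a 32-bit little-endian number, so it can count at most 4,294,967,295 bytes; software that reads that field as a signed number stops at about half that.

```python
import io
import struct
import wave

# Write one second of 16-bit stereo silence at 44.1 kHz into memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(2)
    wav_out.setsampwidth(2)        # bytes per sample: 2 bytes = 16 bits
    wav_out.setframerate(44_100)
    wav_out.writeframes(b"\x00" * (44_100 * 2 * 2))

data = buffer.getvalue()

# The first 12 bytes: "RIFF", a 32-bit little-endian size, then "WAVE".
chunk_id, riff_size, form_type = struct.unpack("<4sI4s", data[:12])
print(chunk_id, form_type)           # b'RIFF' b'WAVE'
print(riff_size == len(data) - 8)    # True: size excludes the first 8 bytes

# Read the audio parameters back from the header.
with wave.open(io.BytesIO(data), "rb") as wav_in:
    print(wav_in.getnchannels(), wav_in.getsampwidth() * 8, wav_in.getframerate())
    # 2 16 44100

# Largest value the 32-bit size field can hold.
max_riff_size = 2**32 - 1
print(max_riff_size)                 # 4294967295
```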

Compressed Audio Formats

Because uncompressed audio produces large files, the popularity of working with compressed file formats has grown. CD-quality sound (44.1kHz, 16-bit stereo) eats approximately 10MB per minute. That makes a 10-minute uncompressed audio file about 100MB in size. Some audio compression schemes can reduce file size by a factor of ten or more. Without audio compression, it wouldn’t be possible to get all those songs on your iPod!
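The 10MB-per-minute figure is easy to check for yourself. The short Python sketch below multiplies sample rate, bytes per sample, channel count, and duration; the function name `pcm_bytes` is just a label chosen for this example, and the result ignores the small header that a real file adds.

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of raw PCM audio, ignoring any file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

one_minute = pcm_bytes(44_100, 16, 2, 60)
ten_minutes = pcm_bytes(44_100, 16, 2, 600)

print(one_minute)                         # 10584000
print(ten_minutes)                        # 105840000
print(round(one_minute / 1_000_000, 1))   # 10.6 (MB per minute)
```

Plug in 96,000 and 24 to see why high-resolution sessions fill a hard drive so quickly.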

All compression formats use a codec of some sort. An audio codec is a process or a software program that compresses (co-) and then decompresses (-dec) an audio signal based on a number of rules and assumptions about how we perceive music. Compressed audio is further divided into codecs that are lossy and those that are lossless. With lossless audio compression formats, the size of the file is reduced but the quality of the music remains the same. In theory, the decompressed signal is exactly the same as the original signal. While lossy compression formats can offer a huge reduction in the file size, you can expect only about a 50-percent reduction when using lossless codecs.
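What “lossless” means in practice is that the round trip gives back every bit. The sketch below shows that idea in Python, with one big caveat: it uses zlib, a general-purpose compressor from the standard library, as a stand-in. It is not an audio codec, and the size it produces says nothing about what FLAC or ALAC would achieve. The test tone and all names are assumptions made for the example.

```python
import math
import struct
import zlib

# One second of a 440 Hz sine wave as 16-bit mono PCM (a made-up test signal).
sample_rate = 44_100
samples = [
    int(12_000 * math.sin(2 * math.pi * 440 * n / sample_rate))
    for n in range(sample_rate)
]
original = struct.pack(f"<{len(samples)}h", *samples)

# Compress and decompress. zlib stands in for a lossless audio codec here.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))   # sizes before and after compression
print(restored == original)             # True: nothing was thrown away
```

A lossy codec would fail that last comparison by design, since it discards data it judges to be inaudible.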

AAC

(Advanced Audio Coding). This is a compressed and lossy file format that was developed to be an improvement over MP3 and designed for audio streaming. Some of the features that make AAC more robust than MP3 are support for sampling rates from 8 kHz up to 96 kHz, the use of up to 48 channels, and higher coding efficiencies for stationary and transient signals. The AAC file format was first published in 1997 but came into the popular consciousness when Apple’s iPod began using this file format in 2003.

ASF

(Advanced Streaming Format; later changed to Advanced Systems Format). This file format is patented in the US by Microsoft and is used by Windows Media for both audio and video. ASF files are more like containers that can hold data compressed by a number of different codecs. WMA (Windows Media Audio) and WMV (Windows Media Video) are the two most common file types held inside an ASF file.

ALAC

(Apple Lossless Audio Codec). This format is also called Apple Lossless Encoder, or ALE for short. Strictly speaking, ALAC is a codec rather than a container: the compressed audio is stored inside an MPEG-4 container, and the resulting files have the extension .m4a. This format was first introduced in a QuickTime upgrade in 2004 and implemented in Apple’s iTunes 4.5.

ATRAC

(Adaptive Transform Acoustic Coding). This algorithm was developed in 1991 by Sony as a file format for their MiniDisc recorders. Two more recent versions, ATRAC3-LP2 and ATRAC3-LP4, enable the long play modes featured on many current MiniDisc recorders. These two modes double and quadruple the recording time, giving up to 320 minutes on an 80-minute MiniDisc.

CAF

(Core Audio Format). This is the newest addition to the audio format family and is supported natively in the Macintosh OS X operating system. Apple claims that this file format can record as many as a thousand channels of audio for as long as a thousand years (now that’s going to be a big file!). Like its cousins, it can serve as a container that can house both compressed and uncompressed files.

FLAC

(Free Lossless Audio Codec). The “free” part of this file format’s name means that the FLAC process is not covered by any patent. Like other lossless compression formats, FLAC can reduce an audio file by 30 to 50 percent. FLAC files are becoming more popular as a way to archive CD collections and as a way to transfer high-quality audio over the Internet. FLAC files can be read by a number of different programs on all major computer platforms.

m4a

(MPEG-4 Audio Standard). This is the file extension of an audio file that uses MPEG-4 Advanced Audio Coding. Apple made this format famous by using it for their iTunes software. Most software that will read MPEG-4 files will support the m4a format.

MP3

(Moving Picture Experts Group-1 Audio Layer 3). Without a doubt, this is currently the most popular audio compression format on the planet. The MP3 audio compression format was first developed as early as 1987 and finalized in 1992. The first commercial software to use the MP3 format was Winplay3 in 1995. Since then, MP3 files have spread across the Internet, and as they say, “the rest is history.” Today, the number of applications that can read or write MP3 files is enormous. It is the ubiquitous standard for sharing audio files over the Internet, and for good reason. It’s fast, it’s free (for the user), and it can drastically reduce the size of an audio file without totally destroying the quality. Like other lossy compression schemes, it does this by removing portions of the audio image that the algorithm determines can’t easily be heard. MP3 encoders can work at a number of different bit rates, which set the trade-off between file size and audio quality. Common bit rates range between 32 and 320 kilobits per second (kbps), with 128 kbps and 192 kbps being the most popular compromises. For most civilians, an MP3 file at 128 kbps will sound just fine. If you’re planning on listening to music files in a moving car, 128 kbps might satisfy your ear. But for most musicians who are familiar with the sonic fingerprint of high-quality instruments, 192 kbps or higher is often necessary.
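Bit rate translates directly into file size, which makes the trade-off easy to estimate. The Python sketch below assumes a constant bit rate and ignores tags and other overhead; the function name `encoded_megabytes` is invented for the example.

```python
def encoded_megabytes(bitrate_kbps, seconds):
    """Approximate size in MB of constant-bit-rate audio, ignoring tags and headers."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000

for rate in (32, 128, 192, 320):
    print(f"{rate} kbps: {encoded_megabytes(rate, 60):.2f} MB per minute")
# 32 kbps: 0.24 MB per minute
# 128 kbps: 0.96 MB per minute
# 192 kbps: 1.44 MB per minute
# 320 kbps: 2.40 MB per minute
```

At 128 kbps a minute of music takes a little under 1MB, roughly a tenth of the space the same minute needs as CD-quality audio.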

OGG

(Ogg Vorbis). Vorbis is an audio codec that is often placed inside an Ogg container. Together, they use the .ogg file extension. This format has been gaining popularity with those who support open source software, and it’s been showing up on several web sites around the world.

WMA

(Windows Media Audio). WMA files are compressed files that are played using Microsoft’s Windows Media Player. Originally, the format was developed to be a competitor to MP3; it’s now a competitor to Apple’s AAC format. The newest version of WMA is 9.1 and contains codecs for multichannel surround sound and lossless support. WMA files are most often contained inside an ASF file.

There are a number of other lossy and lossless file formats, and it seems that new ones are being developed at a surprising pace. Some, like Monkey’s Audio’s APE format, are only supported on the PC platform. Others, like RealPlayer’s RealAudio Lossless format, are designed to be read by a single host program (although other software may be able to import the files).

MIDI Format

MIDI

is itself actually another acronym and stands for Musical Instrument Digital Interface. MIDI is a communication protocol rather than a file type, but files containing MIDI information are common.

SMF

(Standard MIDI File). The SMF specification was developed by the MIDI Manufacturers Association (MMA) as a way to store the information contained in a MIDI performance. While software sequencers use proprietary file formats to handle their own files internally, just about any MIDI software program will let the user export to the SMF file type. Once saved, these files can be opened or imported into most music software programs. Using Standard MIDI Files, it’s possible to share a MIDI performance between a sequencing program and a notation program. It’s also possible to use the SMF format to transfer information between different computer platforms. SMF files are a common way for composers to share their compositions and transcriptions of performances on the Web.

There are three different MIDI file formats. Type 1 files are the most common, as they can contain multiple tracks and all the necessary information to reproduce an entire multitrack MIDI song performance. Type 0 files are much less common, as they contain all the MIDI messages inside a single track. Type 2 files, which hold a series of independent single-track patterns, are rarer still. If you want to send MIDI files over the Internet or collaborate with other musicians by trading MIDI files, you’ll want to use the Type 1 SMF format for the most musical flexibility.
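You can tell which type a MIDI file is by reading its first 14 bytes. Every Standard MIDI File begins with a header chunk: the four letters MThd, a length of 6, and then three 16-bit big-endian numbers giving the type, the number of tracks, and the timing division. The Python sketch below reads that header; the function name and the sample bytes are made up for the example rather than taken from a real file.

```python
import struct

def read_smf_header(data):
    """Return (smf_type, track_count, division) from the start of an SMF byte string."""
    chunk_id, length = struct.unpack(">4sI", data[:8])
    if chunk_id != b"MThd" or length < 6:
        raise ValueError("not a Standard MIDI File header")
    smf_type, track_count, division = struct.unpack(">HHH", data[8:14])
    return smf_type, track_count, division

# A hand-built header: Type 1, three tracks, 480 ticks per quarter note.
example = b"MThd" + struct.pack(">IHHH", 6, 1, 3, 480)
print(read_smf_header(example))   # (1, 3, 480)
```

To try it on a real file, read the first 14 bytes from disk in binary mode and pass them to the same function.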

Plug-In Formats

A plug-in is an additional computer program that runs inside another program, often called the “host” program. Plug-ins give a program additional features or expand the software’s functionality. If you’ve used image-editing programs, you may be familiar with plug-ins that alter the source image in some way. For example, you might run an image through a plug-in that is designed to make a photograph look like a watercolor painting or an old-time sepia tint photograph.

Audio plug-ins can be synths, samplers, signal processors, or other tools that help you work with or process digital audio. For example, plug-ins such as DFH Superior, Battery, or Absynth are sound generators that work within sequencing programs. Native Instruments’ Kontakt 2 is a popular soft sampler plug-in that operates in a similar manner. Other plug-ins, like iZotope’s Trash or Camel Audio’s CamelSpace, serve to modify the audio from the host program or even from other plug-ins.

When working with plug-ins, the host program provides a way for data to be passed back and forth between the host program and the plug-in. The term API stands for Application Programming Interface, and it’s the set of rules that one piece of software uses to communicate with another piece of software. Below are some of the most common plug-in formats.
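The host-and-plug-in relationship boils down to an agreement: the host promises to hand over blocks of audio, and the plug-in promises to hand them back processed. The Python sketch below is a toy version of that agreement. It does not follow any real plug-in format; `AudioPlugin`, `Gain`, and `Host` are hypothetical names invented for the example.

```python
from typing import List, Protocol

class AudioPlugin(Protocol):
    """The 'rules' of this toy API: a plug-in must offer a process() method."""
    def process(self, block: List[float]) -> List[float]: ...

class Gain:
    """A trivial effect plug-in that scales every sample."""
    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, block: List[float]) -> List[float]:
        return [sample * self.factor for sample in block]

class Host:
    """A toy host that passes each audio block through its loaded plug-ins in order."""
    def __init__(self) -> None:
        self.chain: List[AudioPlugin] = []

    def load(self, plugin: AudioPlugin) -> None:
        self.chain.append(plugin)

    def render(self, block: List[float]) -> List[float]:
        for plugin in self.chain:
            block = plugin.process(block)
        return block

host = Host()
host.load(Gain(0.5))
host.load(Gain(0.5))
print(host.render([1.0, -1.0, 0.5]))   # [0.25, -0.25, 0.125]
```

The host never needs to know what a plug-in does internally, only that it honors the agreed interface. That is why a plug-in written for one API won’t load in a host that expects another.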

AU

(Audio Units). This plug-in format was developed by Apple Computer to add functionality to software running on their machines. It’s a system-level plug-in that is part of Apple’s Core Audio system in Mac OS X.

DirectX

The DirectX format (originally called Game SDK) was designed to handle game programming on Microsoft Windows machines. Today, there are several DirectX-compatible music software applications and quite a few developers that offer their plug-ins in this format. The most current version is now DirectX 9.0c.

DSSI

(Disposable Soft Synth Interface). This is the plug-in architecture for virtual instruments running under the Linux operating system. Closely related to LADSPA, the DSSI label is used primarily for instruments, while LADSPA is used most often for other sound-processing applications.

LADSPA

(Linux Audio Developer’s Simple Plug-In API). LADSPA plug-ins are designed to run in host programs under the Linux operating system. Finalized in 2000, the format has quite a few plug-in developers. With the popularity of Linux expanding every year, you’re sure to see more plug-ins built for Linux in the future.

MAS

(MOTU Audio System). MAS is a proprietary format of the company Mark of the Unicorn (commonly known as MOTU). MAS plug-ins include virtual instruments and audio effects.

RTAS

(Real Time Audio Suite). RTAS is a plug-in format that is used in the Pro Tools environment. While designed for Pro Tools, other host applications, like Digital Performer 4.5 or higher, are able to use RTAS plug-ins.

TDM

(Time-Division Multiplexing). Originally invented for the telephone industry, this is the way that Pro Tools communicates with external input and output sources. It is also the name of the plug-in format used by plug-ins that run with Pro Tools as the host program.

VST

(Virtual Studio Technology). The VST format was developed by Steinberg for use in their Cubase software. Today, VST plug-ins (and there are hundreds of them) can be used in nearly every audio program on both the PC and Mac platforms. VST can be viewed as the de facto standard for audio plug-ins. VST plug-ins can either be soft instruments, such as synths or samplers, or effects such as reverbs, compressors, distortion boxes, and other methods of processing audio. Instruments are often given the VSTi acronym to distinguish themselves from effect plug-ins. The VST 2 format, released around 1999, added more flexibility to the original format.

If your host software and a particular plug-in aren’t compatible, all is not lost. There are a few companies that are making adapters or “wrappers” so that plug-ins written in one format can be used in a host that uses another. The FXpansion folks make popular programs that allow VST plug-ins to be used in host programs that support RTAS, AU, or ReWire.

If you’re interested in learning more about plug-ins, check out kvraudio.com. It is the #1 web site for all audio plug-ins — instruments, effects, and hosts.

Sample Formats

If you’re at all involved with creating electronic music, at some point you’re sure to use sample libraries. It’s a fact that samplers (hardware and software) all read and play digital audio files, but how these files are incorporated into the sampler and how the sampler saves its files to a storage medium can be very different. For instance, if you’ve got a great drumkit in the NN-XT sampler inside Propellerhead’s Reason, you won’t be able to open that file directly into Cakewalk’s Project 5. Manufacturers have recently realized that it’s to their advantage to offer a product that is capable of reading a variety of file formats. Today, you can find soft samplers that claim to be “universal” in their ability to read a large number of proprietary formats and “interpret” them so that they can be played by the host sampler. Native Instruments’ Kontakt 2, for example, will natively read and interpret at least 20 different file formats.

It’s possible to purchase or download audio files in .wav and .aif formats that can be used to construct banks of sounds from scratch in any sampler format. However, if you don’t want to do all the work yourself, you can buy packages that have everything ready to go. Just copy the bank to your hard drive and then open it in your hardware or software sampler. The most popular sample formats are for Acid, Akai, E-mu, EXS24, Gigastudio, HALion, Kontakt, Kurzweil, Reason, Roland, RMX, SampleCell, and SoundFont.

One particular type of file is so unique that it deserves special attention. REX files (most often loops) are created by Propellerhead’s ReCycle software. The original version of ReCycle exported REX files so that they could be used inside their Reason software package. An updated version, ReCycle 2.0, creates RX2 format files that use a compression codec to reduce the size of the file. REX files have been sliced into a number of different parts that can then be played back slice-by-slice. Using this technique, it’s possible to change the tempo of a loop and to rearrange the order of its slices. REX files have become so popular that many software programs now read them natively.
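The slicing idea is simple enough to sketch in a few lines of Python. This is not the REX file format, and it is not how ReCycle works internally; it is a hypothetical illustration in which a “loop” is a plain list of sample values and the slice points are chosen by hand. All of the function names are invented for the example.

```python
def slice_loop(samples, slice_starts):
    """Cut a loop into slices that begin at the given sample positions."""
    bounds = list(slice_starts) + [len(samples)]
    return [samples[bounds[i]:bounds[i + 1]] for i in range(len(slice_starts))]

def reorder(slices, order):
    """Build a new loop by playing the slices back in a different order."""
    return [sample for index in order for sample in slices[index]]

def retimed_starts(slice_starts, old_bpm, new_bpm):
    """Move each slice's start position to suit a new tempo, leaving the slices untouched."""
    return [round(start * old_bpm / new_bpm) for start in slice_starts]

loop = list(range(8))                 # stand-in for eight audio samples
starts = [0, 2, 5]                    # hand-picked slice points

slices = slice_loop(loop, starts)
print(slices)                         # [[0, 1], [2, 3, 4], [5, 6, 7]]
print(reorder(slices, [2, 0, 1]))     # [5, 6, 7, 0, 1, 2, 3, 4]
print(retimed_starts(starts, 120, 60))  # [0, 4, 10]
```

Because only the start positions move when the tempo changes, each slice plays back at its original pitch. That is the appeal of sliced loops over simply speeding a recording up or slowing it down.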

Conversion Software

With so many different sampler formats, you might think that sharing samples between machines or software would be impossible. But just as there are programs that will convert one audio format into another, there are software programs designed to convert one sampler format into another. Two that this author has used and can recommend are CDXtract by Bernard Chavonnet and Translator by Chicken Systems. Both programs have a pretty wide matrix of source file formats to destination file formats.