Programming the audio component of a game is a lot easier these days, thanks to the many powerful audio libraries that are available. These libraries ease the burden on developers by taking care of most of the low-level implementation details. While this is a good thing, it also makes it easier to dismiss the need to understand sound theory. For instance, we can easily play a sound file without knowing anything about its representation in memory.
However, even when we are using an audio library, there are still situations that will require some theoretical knowledge. For instance, we will often find parameters and function names related to the theory, such as the frequency of a sound, or the bit depth of an audio buffer. Knowing the meaning of these concepts is important to ensure that we are using them properly.
The goal of this chapter is to serve as a light introduction to the concepts that we will need the most during the course of this book.
Sound is created by the vibrations of objects. These vibrations produce variations in atmospheric pressure, which propagate away from the objects in the form of sound waves. Our ears are capable of detecting incoming sound waves and converting them into nerve signals that our brain interprets as sound.
One way to visualize sound is to draw a graph of the variations in atmospheric pressure at each moment in time. However, understanding how these graphs relate to what we hear can be extremely complex. For that reason, we usually start by studying the simplest type of wave, the sine wave.
The sine wave is interesting for educational purposes, because we can easily identify two of the main properties of sound from it: volume and pitch. Most audio libraries allow us to control both of these properties for any sounds that we play.
Volume: This property corresponds to how loud or quiet the sound is. It depends directly on the amplitude (or the height) of the sound wave, as measured on the vertical axis. The main unit of volume is the decibel (dB), but most audio libraries use a scale between zero (silence) and one (full volume).
Pitch: This property determines how high or low the sound is. It depends on the frequency of the sound wave, which is the number of times that it repeats every second. The unit of frequency is the hertz (Hz). Two things that you should know about frequency are that the human ear can only hear frequencies within the 20 Hz to 20,000 Hz range, and that most sounds you hear are actually a combination of several different frequencies. The sketch below shows how both properties map onto a generated sine wave.
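To make these two properties concrete, here is a minimal C++ sketch, not tied to any particular audio library, that fills a buffer with a sine wave; the generateSine function and its parameters are illustrative names of our own:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Generate 'seconds' of a sine wave as floating-point samples in [-1, 1].
    // 'amplitude' (zero to one) maps to volume; 'frequency' (in Hz) maps to pitch.
    std::vector<float> generateSine(float frequency, float amplitude,
                                    float seconds, std::size_t sampleRate)
    {
        const float twoPi = 6.2831853f;
        std::vector<float> samples(static_cast<std::size_t>(seconds * sampleRate));
        for (std::size_t i = 0; i < samples.size(); ++i)
        {
            float t = static_cast<float>(i) / sampleRate; // time in seconds
            samples[i] = amplitude * std::sin(twoPi * frequency * t);
        }
        return samples;
    }

    int main()
    {
        // One second of a 440 Hz tone (the note A4) at half volume.
        std::vector<float> tone = generateSine(440.0f, 0.5f, 1.0f, 44100);
        (void)tone; // a real program would hand this buffer to an audio library
    }

Raising the amplitude makes the tone louder; raising the frequency makes it higher pitched.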
Now that we know what sound is, let us turn our thoughts towards recording sound and storing it on a computer. The first step in this process is to convert the sound wave into an electrical signal. When we use a continuous signal to represent another signal of a different quantity, we call it an analog signal, or, in the case of a sound wave, an analog audio signal. You are probably already familiar with the devices that perform these conversions: microphones, which convert sound waves into electrical signals, and speakers, which perform the opposite conversion.
Analog signals have many uses, but most computers cannot work with them directly. Computers can only operate on sequences of discrete binary numbers, also known as digital signals. We need to convert the analog signal recorded by the microphone into a digital signal, that is, digital audio, before the computer can understand it.
The most common method used to represent analog signals digitally is pulse code modulation (PCM). The general idea of PCM is to sample (or measure) the amplitude of the analog signal at fixed time intervals, and to store the results as an array of numbers (called samples). Since the original signal is continuous and numbers on a computer are discrete, each sample must be rounded to the nearest available number, in a process known as quantization. Samples are usually stored as integer numbers, but it is also possible to use floating-point numbers, as shown in the following example:
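Here is a minimal C++ sketch of both representations, turning floating-point samples in the -1 to 1 range into 16-bit integers; the quantize16 name is ours, not part of any library, and the rounding step is the quantization just described:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Quantize floating-point samples in the [-1, 1] range to 16-bit integers.
    // Scaling and rounding to the nearest representable value is exactly the
    // quantization step described above.
    std::vector<std::int16_t> quantize16(const std::vector<float>& input)
    {
        std::vector<std::int16_t> output;
        output.reserve(input.size());
        for (float s : input)
        {
            float clamped = std::clamp(s, -1.0f, 1.0f);
            output.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
        }
        return output;
    }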
There are two ways to control the quality of the sampled audio:
Sampling rate: Also known as the sampling frequency, this is the number of samples taken for each second of audio. According to the Nyquist sampling theorem, the sampling rate should be at least twice the highest frequency of the analog signal, in order to allow a proper reconstruction. You will usually work with values of 44,100 Hz or 48,000 Hz. The following figure compares sampling at different rates:
Bit depth: Also known as the resolution, this is the number of bits used to represent a single sample. It controls the number of possible discrete values that each sample can take, and needs to be high enough to avoid quantization errors. You will usually work with bit depths of 16 or 24 bits, stored as integer numbers, or 32 bits, stored as floating-point numbers. The following figure compares sampling at different resolutions, and the sketch after this list puts numbers on both settings:
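Using only the values quoted above, a few lines of C++ can show what the two settings buy you (this is plain arithmetic, not library code):

    #include <cstdio>

    int main()
    {
        // Nyquist: a 44,100 Hz sampling rate can represent frequencies up to
        // half of it, which comfortably covers the ~20,000 Hz limit of hearing.
        const int sampleRate = 44100;
        std::printf("Highest representable frequency: %d Hz\n", sampleRate / 2);

        // Bit depth: each extra bit doubles the number of discrete values a
        // sample can take, shrinking the rounding error of quantization.
        for (int bits : {8, 16, 24})
        {
            std::printf("%2d bits -> %lld possible values\n", bits, 1LL << bits);
        }
    }

With 16 bits there are already 65,536 possible values per sample, which is why 16-bit audio is sufficient for most game audio work.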
Another aspect that we should talk about is that many audio systems have more than one output. By sending different audio signals to separate outputs (called channels), it is possible to produce the illusion of directionality and space. The number of channels on these systems varies from one (mono) or two (stereo) to several more on surround sound systems.
The PCM format described earlier can store audio for multiple channels at once, by interleaving one sample from each channel in the correct order. The following figure shows an example of this for a stereo system, and the sketch after it shows the same idea in code:
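As a minimal sketch, assuming two equally long mono buffers of 16-bit samples, the interleaving could be done by hand like this (the interleaveStereo name is illustrative):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Interleave two mono buffers into a single stereo PCM buffer:
    // L0 R0 L1 R1 L2 R2 ...
    std::vector<std::int16_t> interleaveStereo(const std::vector<std::int16_t>& left,
                                               const std::vector<std::int16_t>& right)
    {
        const std::size_t frames = std::min(left.size(), right.size());
        std::vector<std::int16_t> stereo;
        stereo.reserve(frames * 2);
        for (std::size_t i = 0; i < frames; ++i)
        {
            stereo.push_back(left[i]);  // left channel sample
            stereo.push_back(right[i]); // right channel sample
        }
        return stereo;
    }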
Besides volume and pitch, which we examined earlier, there is another property that you will find in almost every audio library, called panning. Panning applies to stereo systems, and allows you to simulate the position of the sound, placing it anywhere between the left and the right channels. For positioning in configurations with more than two channels, you normally use other, more advanced features, such as 3D sound.
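Panning can be implemented in more than one way; a common choice is a constant-power pan law, sketched below for a pan value between -1 (full left) and 1 (full right). This is a generic illustration, not the formula used by any particular library:

    #include <cmath>

    // Constant-power panning: map pan in [-1, 1] to an angle in [0, pi/2],
    // so the combined power of the two channels stays roughly constant as
    // the sound moves from left to right.
    void applyPan(float pan, float& leftGain, float& rightGain)
    {
        const float quarterPi = 0.78539816f;    // pi / 4
        float angle = (pan + 1.0f) * quarterPi; // 0 = full left, pi/2 = full right
        leftGain = std::cos(angle);
        rightGain = std::sin(angle);
    }

At a pan of zero, both gains come out to about 0.707, so the sound appears centered without a drop in perceived loudness.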
There are so many different file formats for storing audio on a computer that it is easy to feel overwhelmed at first. Thankfully, you will only use a couple of them in your games, most of the time. Audio file formats usually fall into one of the following categories:
Uncompressed audio files: These are audio files where the data is stored in its original state (normally PCM). This means that their data is already prepared for playback without any further processing. The downside is that they take up a lot of space on disk (approximately 10 MB for one minute of audio; the sketch after this list shows where that figure comes from). Examples include WAV and AIFF.
Lossless compression: These are audio files where the data is encoded using compression algorithms that only perform reversible changes, so that no information is permanently lost. These files can be as small as half the size of the uncompressed formats, but need the computer to decode them before playback. Examples include FLAC and APE.
Lossy compression: These are audio files where the data is encoded using compression algorithms for which some loss of information is acceptable. These algorithms use heuristics to determine which parts of the data are least likely to be audible, in order to discard them. File sizes can be as small as 10 percent of the original size, although sound quality can suffer considerably if the compression is too strong. Examples include MP3, WMA, and OGG.
Sequenced music: There are some formats that do not fit into any of the earlier mentioned categories. For example, MIDI files only store information about how the music should be played, but do not contain any sound data, leaving it to the computer to decide how they should be interpreted. For this reason, they are extremely small, but their sound quality is limited, and varies from system to system. There are also hybrid formats, such as MOD files (also known as module or tracker files), which are in many ways similar to MIDI files, but also contain any sound data that is required to play them (known as instruments).
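As for the 10 MB figure quoted for uncompressed audio, it falls out of a simple multiplication of the sampling rate, the sample size, the channel count, and the duration, as this small sketch shows:

    #include <cstdio>

    int main()
    {
        // Uncompressed size = sampling rate * bytes per sample * channels * seconds.
        // One minute of CD-quality stereo audio (44,100 Hz, 16-bit, 2 channels):
        const long long bytes = 44100LL * 2 * 2 * 60;
        std::printf("One minute of PCM audio: %.1f MB\n",
                    bytes / (1024.0 * 1024.0));
    }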
Be aware that despite its popularity, MP3 is a patented format, and you cannot use it commercially without paying royalties (refer to http://mp3licensing.com/ for more information). For this book, we will be using OGG files for long sounds, and WAV files for small sound effects.
In this chapter, we have seen that sound is a series of variations in atmospheric pressure, travelling in the form of sound waves. We also saw that sound waves have properties such as amplitude and frequency, which control how loud and how high the sound is, and that you can represent a sound wave using electrical signals (analog audio) or a series of numbers (digital audio). We learned that when converting an analog signal into a digital signal, you need to control the sampling rate and the bit depth. Finally, we saw that many audio systems have more than one output, and that there are many different types of audio file formats.