In this article, by Patrik Lechner, the author of Multimedia Programming Using Max/MSP and TouchDesigner, focuses on the audio-specific examples.
We will take a look at the following audio processing and generation techniques:
- Additive synthesis
- Subtractive synthesis
- Sampling
- Wave shaping
Nearly every example provided here might be understood very intuitively or taken apart in hours of math and calculation. It's up to you how deep you want to go, but in order to develop some intuition; we'll have to be using some amount of Digital Signal Processing (DSP) theory. We will briefly cover the DSP theory, but it is highly recommended that you study its fundamentals deeper to clearly understand this scientific topic in case you are not familiar with it already.
(For more resources related to this topic, see here.)
Basic audio principles
We already saw and stated that it's important to know, see, and hear what's happening along a signal way. If we work in the realm of audio, there are four most important ways to measure a signal, which are conceptually partly very different and offer a very broad perspective on audio signals if we always have all of them in the back of our heads. These are the following important ways:
- Numbers (actual sample values)
- Levels (such as RMS, LUFS, and dB FS)
- Transversal waves (waveform displays, so oscilloscopes)
- Spectra (an analysis of frequency components)
There are many more ways to think about audio or signals in general, but these are the most common and important ones. Let's use them inside Max right away to observe their different behavior. We'll feed some very basic signals into them: DC offset, a sinusoid, and noise. The one that might surprise you the most and get you thinking is the constant signal or DC offset (if it's digital-analog converted). In the following screenshot, you can see how the different displays react:

In general, one might think, we don't want any constant signals at all; we don't want any DC offset. However, we will use audio signals a lot to control things later, say, an LFO or sequencers that should run with great timing accuracy. Also, sometimes, we just add a DC offset to our audio streams by accident. You can see in the preceding screenshot, that a very slowly moving or constant signal can be observed best by looking at its value directly, for example, using the [number~] object. In a level display, the [meter~] or [levelmeter~] objects will seem to imply that the incoming signal is very loud, in fact, it should be at -6 dB Full Scale (FS). As it is very loud, we just can't hear anything since the frequency is infinitely low. This is reflected by the spectrum display too; we see a very low frequency at -6 dB. In theory, we should just see an infinitely thin spike at 0 Hz, so everything else can be considered an (inevitable but reducible) measuring error.
Audio synthesis
Awareness of these possibilities of viewing a signal and their constraints, and knowing how they actually work, will greatly increase our productivity. So let's get to actually synthesizing some waveforms. A good example of different views of a signal operation is Amplitude Modulation (AM); we will also try to formulate some other general principles using the example of AM.
Amplitude modulation
Amplitude modulation means the multiplication of a signal with an oscillator. This provides a method of generating sidebands, which is partial in a very easy, intuitive, and CPU-efficient way. Amplitude modulation seems like a word that has a very broad meaning and can be used as soon as we change a signal's amplitude by another signal. While this might be true, in the context of audio synthesis, it very specifically means the multiplication of two (most often sine) oscillators. Moreover, there is a distinction between AM and Ring Modulation. But before we get to this distinction, let's look at the following simple multiplication of two sine waves, and we are first going to look at the result in an oscilloscope as a wave:

So in the preceding screenshot, we can see the two sine waves and their product. If we imagine every pair of samples being multiplied, the operation seems pretty intuitive as the result is what we would expect. But what does this resulting wave really mean besides looking like a product of two sine waves? What does it sound like? The wave seems to have stayed in there certainly, right? Well, viewing the product as a wave and looking at the whole process in the time domain rather than the frequency domain is helpful but slightly misleading. So let's jump over to the following frequency domain and look what's happening with the spectrum:

So we can observe here that if we multiply a sine wave a with a sine wave b, a having a frequency of 1000 Hz and b a frequency of 100 Hz, we end up with two sine waves, one at 900 Hz and another at 1100 Hz. The original sine waves have disappeared. In general, we can say that the result of multiplying a and b is equal to adding and subtracting the frequencies. This is shown in the Equivalence to Sum and difference subpatcher (in the following screenshot, the two inlets to the spectrum display overlap completely, which might be hard to see):

So in the preceding screenshot, you see a basic AM patcher that produces sidebands that we can predict quite easily.
Multiplication is commutative; you will say, 1000 + 100 = 1100, 1000 - 100 = 900; that's alright, but what about 100 - 1000 and 100 + 1000? We get -900 and 1100 once again? It still works out, and the fact that it does has to do with negative frequencies, or the symmetry of a real frequency spectrum around 0.
So you can see that the two ways of looking at our signal and thinking about AM lend different opportunities and pitfalls. Here is another way to think about AM: it's the convolution of the two spectra. We didn't talk about convolution yet; we will at a later point. But keep it in mind or do a little research on your own; this aspect of AM is yet another interesting one.
Ring modulation versus amplitude modulation
The difference between ring modulation and what we call AM in this context is that the former one uses a bipolar modulator and the latter one uses a unipolar one. So actually, this is just about scaling and offsetting one of the factors. The difference in the outcome is yet a big one; if we keep one oscillator unipolar, the other one will be present in the outcome. If we do so, it starts making sense to call one oscillator on the carrier and the other (unipolar) on the modulator. Also, it therefore introduces a modulation depth that controls the amplitude of the sidebands. In the following screenshot, you can see the resulting spectrum; we have the original signal, so the carrier plus two sidebands, which are the original signals, are shifted up and down:

Therefore, you can see that AM has a possibility to roughen up our spectrum, which means we can use it to let through an original spectrum and add sidebands.
Tremolo
Tremolo (from the Latin word tremare, to shake or tremble) is a musical term, which means to change a sound's amplitude in regular short intervals. Many people confuse it with vibrato, which is a modulating pitch at regular intervals. AM is tremolo and FM is vibrato, and as a simple reminder, think that the V of vibrato is closer to the F of FM than to the A of AM.
So multiplying the two oscillators' results in a different spectrum. But of course, we can also use multiplication to scale a signal and to change its amplitude. If we wanted to have a sine wave that has a tremolo, that is an oscillating variation in amplitude, with, say, a frequency of 1 Hertz, we would again multiply two sine waves, one with 1000 Hz for example and another with a frequency of 0.5 Hz. Why 0.5 Hz? Think about a sine wave; it has two peaks per cycle, a positive one and a negative one.
We can visualize all that very well if we think about it in the time domain, looking at the result in an oscilloscope. But what about our view of the frequency domain? Well, let's go through it; when we multiply a sine with 1000 Hz and one with 0.5 Hz, we actually get two sine waves, one with 999.5 Hz and one with 100.5 Hz. Frequencies that close create beatings, since once in a while, their positive and negative peaks overlap, canceling out each other. In general, the frequency of the beating is defined by the difference in frequency, which is 1 Hz in this case. So we see, if we look at it this way, we come to the same result again of course, but this time, we actually think of two frequencies instead of one being attenuated.
Lastly, we could have looked up trigonometric identities to anticipate what happens if we multiply two sine waves. We find the following:

Here, φ and θ are the two angular frequencies multiplied by the time in seconds, for example:

This is the equation for the 1000 Hz sine wave.
Feedback
Feedback always brings the complexity of a system to the next level. It can be used to stabilize a system, but can also make a given system unstable easily. In a strict sense, in the context of DSP, stability means that for a finite input to a system, we get finite output. Obviously, feedback can give us infinite output for a finite input. We can use attenuated feedback, for example, not only to make our AM patches recursive, adding more and more sidebands, but also to achieve some surprising results as we will see in a minute. Before we look at this application, let's quickly talk about feedback in general.
In the digital domain, feedback always demands some amount of delay. This is because the evaluation of the chain of operations would otherwise resemble an infinite amount of operations on one sample. This is true for both the Max message domain (we get a stack overflow error if we use feedback without delaying or breaking the chain of events) and the MSP domain; audio will just stop working if we try it. So the minimum network for a feedback chain as a block diagram looks something like this:

In the preceding screenshot, X is the input signal and x[n] is the current input sample; Y is the output signal and y[n] is the current output sample. In the block marked Z-m, i is a delay of m samples (m being a constant). Denoting a delay with Z-m comes from a mathematical construct named the Z-transform. The a term is also a constant used to attenuate the feedback circle. If no feedback is involved, it's sometimes helpful to think about block diagrams as processing whole signals. For example, if you think of a block diagram that consists only of multiplication with a constant, it would make a lot of sense to think of its output signal as a scaled version of the input signal. We wouldn't think of the network's processing or its output sample by sample. However, as soon as feedback is involved, without calculation or testing, this is the way we should think about the network. Before we look at the Max version of things, let's look at the difference equation of the network to get a better feeling of the notation. Try to find it yourself before looking at it too closely!

In Max, or rather in MSP, we can introduce feedback as soon as we use a [tapin~] [tapout~] pair that introduces a delay. The minimum delay possible is the signal vector size. Another way is to simply use a [send~] and [receive~] pair in our loop. The [send~] and [receive~] pair will automatically introduce this minimum amount of delay if needed, so the delay will be introduced only if there is a feedback loop. If we need shorter delays and feedback, we have to go into the wonderful world of gen~. Here, our shortest delay time is one sample, and can be introduced via the [history] object. In the Fbdiagram.maxpat patcher, you can find a Max version, an MSP version, and a [gen~] version of our diagram. For the time being, let's just pretend that the gen domain is just another subpatcher/abstraction system that allows shorter delays with feedback and has a more limited set of objects that more or less work the same as the MSP ones. In the following screenshot, you can see the difference between the output of the MSP and the [gen~] domain. Obviously, the length of the delay time has quite an impact on the output. Also, don't forget that the MSP version's output will vary greatly depending on our vector size settings.

Let's return to AM now. Feedback can, for example, be used to duplicate and shift our spectrum again and again. In the following screenshot, you can see a 1000 Hz sine wave that has been processed by a recursive AM to be duplicated and shifted up and down with a 100 Hz spacing:

In the maybe surprising result, we can achieve with this technique is this: if we the modulating oscillator and the carrier have the same frequency, we end up with something that almost sounds like a sawtooth wave.
    
        Unlock access to the largest independent learning library in Tech for FREE!
        
            
                Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
                Renews at $19.99/month. Cancel anytime
             
            
         
     
 
Frequency modulation
Frequency modulation or FM is a technique that allows us to create a lot of frequency components out of just two oscillators, which is why it was used a lot back in the days when oscillators were a rare, expensive good, or CPU performance was low. Still, especially when dealing with real-time synthesis, efficiency is a crucial factor, and the huge variety of sounds that can be achieved with just two oscillators and very few parameters can be very useful for live performance and so on. The idea of FM is of course to modulate an oscillator's frequency. The basic, admittedly useless form is depicted in the following screenshot:

While trying to visualize what happens with the output in the time domain, we can imagine it as shown in the following screenshot. In the preceding screenshot, you can see the signal that is controlling the frequency. It is a sine wave with a frequency of 50 Hz, scaled and offset to range from -1000 to 5000, so the center or carrier frequency is 2000 Hz, which is modulated to an amount of 3000 Hz.

You can see the output of the modulated oscillator in the following screenshot:

If we extend the upper patch slightly, we end up with this:

Although you can't see it in the screenshot, the sidebands are appearing with a 100 Hz spacing here, that is, with a spacing equal to the modulator's frequency. Pretty similar to AM right? But depending on the modulation amount, we get more and more sidebands.
Controlling FM
If the ratio between F(c) and F(m) is an integer, we end up with a harmonic spectrum, therefore, it may be more useful to rather control F(m) indirectly via a ratio parameter as it's done inside the SimpleRatioAndIndex subpatcher. Also, an Index parameter is typically introduced to make an FM patch even more controllable. The modulation index is defined as follows:

Here, I is the index, Am is the amplitude of the modulation, what we called amount before, and fm is the modulator's frequency. So finally, after adding these two controls, we might arrive here:

FM offers a wide range of possibilities, for example, the fact that we have a simple control for how harmonic/inharmonic our spectrum is can be useful to synthesize the mostly noisy attack phase of many instruments if we drive the ratio and index with an envelope as it's done in the SimpleEnvelopeDriven subpatcher. However, it's also very easy to synthesize very artificial, strange sounds. This basically has the following two reasons:
- Firstly, the partials appearing have amplitudes governed by Bessel functions that may seem quite unpredictable; the partials sometimes seem to have random amplitudes.
- Secondly, negative frequencies and fold back. If we generate partials with frequencies below 0 Hz, it is equivalent to creating the same positive frequency. For frequencies greater than the sample rate/2 (sample rate/2 is what's called the Nyquist rate), the frequencies reflect back into the spectrum that can be described by our sampling rate (this is an effect also called aliasing). So at a sampling rate of 44,100 Hz, a partial with a frequency of -100 Hz will appear at 100 Hz, and a partial with a frequency of 43100 kHz will appear at 1000 Hz, as shown in the following screenshot:
 
 
So, for frequencies between the Nyquist frequency and the sampling frequency, what we hear is described by this:

Here, fs is the sampling rate, f0 is the frequency we hear, and fi is the frequency we are trying to synthesize. Since FM leads to many partials, this effect can easily come up, and can both be used in an artistically interesting manner or sometimes appear as an unwanted error. In theory, an FM signal's partials extend to even infinity, but the amplitudes become negligibly small. If we want to reduce this behavior, the [poly~] object can be used to oversample the process, generating a bit more headroom for high frequencies. The phenomenon of aliasing can be understood by thinking of a real (in contrast to imaginary) digital signal as having a symmetrical and periodical spectrum; let's not go into too much detail here and look at it in the time domain:

In the previous screenshot, we again tried to synthesize a sine wave with 43100 Hz (the dotted line) at a sampling rate of 44100 Hz. What we actually get is the straight black line, a sine with 1000 Hz. Each big black dot represents an actual sample, and there is only one single band-limited signal connecting them: the 1000 Hz wave that is only partly visible here (about half its wavelength).
Feedback
It is very common to use feedback with FM. We can even frequency modulate one oscillator with itself, making the algorithm even cheaper since we have only one table lookup. The idea of feedback FM quickly leads us to the idea of making networks of oscillators that can be modulated by each other, including feedback paths, but let's keep it simple for now. One might think that modulating one oscillator with itself should produce chaos; FM being a technique that is not the easiest to control, one shouldn't care for playing around with single operator feedback FM. But the opposite is the case. Single operator FM yields very predictable partials, as shown in the following screenshot, and in the Single OP FBFM subpatcher:

Again, we are using a gen~ patch, since we want to create a feedback loop and are heading for a short delay in the loop. Note that we are using the [param] object to pass a message into the gen~ object. What should catch your attention is that although the carrier frequency has been adjusted to 1000 Hz, the fundamental frequency in the spectrum is around 600 Hz. What can help us here is switching to phase modulation.
Phase modulation
If you look at the gen~ patch in the previous screenshot, you see that we are driving our sine oscillator with a phasor. The cycle object's phase inlet assumes an input that ranges from 0 to 1 instead of from 0 to 2π, as one might think. To drive a sine wave through one full cycle in math, we can use a variable ranging from 0 to 2π, so in the following formula, you can imagine t being provided by a phasor, which is the running phase. The 2π multiplication isn't necessary in Max since if we are using [cycle~], we are reading out a wavetable actually instead of really computing the sine or cosine of the input.

This is the most common form of denoting a running sinusoid with frequency f0 and phase φ. Try to come up with a formula that describes frequency modulation!
Simplifying the phases by setting it to zero, we can denote FM as follows:

This can be shown to be nearly identical to the following formula:

Here, f0 is the frequency of the carrier, fm is the frequency of the modulator, and A is the modulation amount.
Welcome to phase modulation. If you compare it, the previous formula actually just inserts a scaled sine wave where the phase φ used to be. So phase modulation is nearly identical to frequency modulation. Phase modulation has some advantages though, such as providing us with an easy method of synchronizing multiple oscillators. But let's go back to the Max side of things and look at a feedback phase modulation patch right away (ignoring simple phase modulation, since it really is so similar to FM):

This gen~ patcher resides inside the One OP FBPM subpatcher and implements phase modulation using one oscillator and feedback. Interestingly, the spectrum is very similar to the one of a sawtooth wave, with the feedback amount having a similar effect of a low-pass filter, controlling the amount of partials. If you take a look at the subpatcher, you'll find the following three sound sources:
- Our feedback FM gen~ patcher
- A [saw~] object for comparison
- A poly~ object
We have already mentioned the problem of aliasing and the [poly~] object has already been proposed to treat the problem. However, it allows us to define the quality of parts of patches in general, so let's talk about the object a bit before moving on since we will make great use of it. Before moving on, I would like to tell you that you can double-click on it to see what is loaded inside, and you will see that the subpatcher we just discussed contains a [poly~] object that contains yet another version of our gen~ patcher.
Summary
In this article, we've finally come to talking about audio. We've introduced some very common techniques and thought about refining them and getting things done properly and efficiently (think about poly~). By now, you should feel quite comfortable building synths that mix techniques such as FM, subtractive synthesis, and feature modulation, as well as using matrices for routing both audio and modulation signals where you need them.
Further resources on this subject: