Audio Generation with NSynth and GANSynth

In this chapter, we'll be looking into audio generation. We'll first provide an overview of WaveNet, an existing model for audio generation that is especially effective in text-to-speech applications. In Magenta, we'll use NSynth, a WaveNet autoencoder model, to generate small audio clips that can serve as instruments for a backing MIDI score. NSynth also enables audio transformations such as scaling, time stretching, and interpolation. We'll also use GANSynth, a faster approach based on a Generative Adversarial Network (GAN).

The following topics will be covered in this chapter:

  • Learning about WaveNet and temporal structures for music
  • Neural audio synthesis with NSynth
  • Using GANSynth as a generative instrument

Technical requirements

In this chapter, we'll use the following tools:

  • The command line or Bash to launch Magenta from the Terminal
  • Python and its libraries to write music generation code using Magenta
  • Magenta to generate audio clips
  • Audacity to edit audio clips
  • Any media player to listen to the generated WAV files

In Magenta, we'll make use of the NSynth and GANSynth models. We'll explain these models in depth, but if you feel you need more information, the models' README files in Magenta's source code (github.com/tensorflow/magenta/tree/master/magenta/models) are a good place to start. You can also take a look at Magenta's code, which is well documented. We also provide additional content in the Further reading section.

The code for this chapter is in this book's GitHub repository in the Chapter05 folder, located at github.com/PacktPublishing...

Learning about WaveNet and temporal structures for music

In the previous chapters, we've been generating symbolic content such as MIDI. In this chapter, we'll be looking at generating sub-symbolic content, such as raw audio. We'll be using the Waveform Audio File Format (WAVE or WAV, stored in a .wav file), a format containing uncompressed audio content, usable on pretty much every platform and device. See Chapter 1, Introduction on Magenta and Generative Art, for more information on waveforms in general.
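To make the idea of raw, sub-symbolic audio concrete, here is a minimal sketch (assuming SciPy and NumPy are installed; the file name example.wav is a placeholder) that loads a WAV file into an array of amplitude samples:

```python
import numpy as np
from scipy.io import wavfile

# Read an uncompressed WAV file: returns the sample rate (Hz)
# and an array with one amplitude value per sample (per channel).
sample_rate, samples = wavfile.read("example.wav")

# Convert 16-bit integer samples to floats in [-1.0, 1.0],
# the range most neural audio models work with.
if samples.dtype == np.int16:
    samples = samples.astype(np.float32) / 32768.0

print(f"{sample_rate} Hz, {samples.shape[0]} samples, "
      f"{samples.shape[0] / sample_rate:.2f} seconds")
```

At 16,000 samples per second, even a short clip is a long sequence, which is why generating raw audio is so much more demanding than generating symbolic MIDI.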

Generating raw audio using neural nets is a rather recent feat, following the 2016 WaveNet paper, A Generative Model for Raw Audio. Other network architectures also perform well in audio generation, such as SampleRNN, also released in 2016 and used since to produce music tracks and albums (see Dadabots for an example).

As stated in Chapter 2, Generating Drum Sequences...

Neural audio synthesis with NSynth

In this section, we'll be combining different audio clips together. We'll learn how to encode audio, optionally save the resulting encodings to disk, mix (add) them, and then decode the summed encodings back into a sound clip.

We'll be handling 1-second audio clips only. There are two reasons for this: first, processing raw audio is computationally costly, and second, we want to generate instrument notes in the form of short audio clips. The latter is interesting for us because we can then sequence the audio clips using MIDI generated by the models we've used in previous chapters. In that sense, you can view NSynth as a generative instrument, and the previous models, such as MusicVAE or Melody RNN, as a generative score (partition) composer. With both elements, we can generate full tracks, with both audio and structure.
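As a preview of that encode/mix/decode flow, here is a minimal sketch based on Magenta's NSynth WaveNet fastgen module. The checkpoint path and input file names (clip1.wav, clip2.wav) are placeholders, and exact flags may differ slightly between Magenta versions:

```python
from magenta.models.nsynth import utils
from magenta.models.nsynth.wavenet import fastgen

SAMPLE_RATE = 16000
SAMPLE_LENGTH = 16000  # 1 second of audio at 16 kHz
CHECKPOINT = "checkpoints/wavenet-ckpt/model.ckpt-200000"  # placeholder path

# Load two 1-second clips as float arrays.
audio1 = utils.load_audio("clip1.wav", sample_length=SAMPLE_LENGTH, sr=SAMPLE_RATE)
audio2 = utils.load_audio("clip2.wav", sample_length=SAMPLE_LENGTH, sr=SAMPLE_RATE)

# Encode each clip into its temporal embedding.
encoding1 = fastgen.encode(audio1, CHECKPOINT, SAMPLE_LENGTH)
encoding2 = fastgen.encode(audio2, CHECKPOINT, SAMPLE_LENGTH)

# Mix the two sounds by averaging their encodings.
encoding_mix = (encoding1 + encoding2) / 2.0

# Decode (synthesize) the mixed encoding back into a WAV file.
fastgen.synthesize(encoding_mix,
                   save_paths=["mix.wav"],
                   checkpoint_path=CHECKPOINT)
```

The mixing happens in the embedding space, not by summing waveforms, which is what gives NSynth its characteristic blended timbres.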

To generate sound...

Using GANSynth as a generative instrument

In the previous section, we used NSynth to generate new sound samples by combining existing sounds. You may have noticed that the audio synthesis process is very time-consuming. This is because autoregressive models such as WaveNet generate one audio sample at a time, conditioned on the previous samples, which makes reconstructing the waveform very slow since the samples have to be processed sequentially.

GANSynth, on the other hand, uses upsampling convolutions, which makes it possible to train and generate in parallel for the entire audio clip. This is a major advantage over autoregressive models such as NSynth, since those algorithms tend to be I/O bound on GPU hardware.
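To illustrate how GANSynth can act as a generative instrument, here is a minimal sketch of sampling new instruments from its latent space, based on Magenta's gansynth module. The checkpoint directory is a placeholder, and exact function signatures may vary between Magenta versions:

```python
import numpy as np
from magenta.models.gansynth.lib import flags as lib_flags
from magenta.models.gansynth.lib import generate_util as gu
from magenta.models.gansynth.lib import model as lib_model

BATCH_SIZE = 16
CHECKPOINT_DIR = "checkpoints/acoustic_only"  # placeholder path

# Load a pre-trained GANSynth model.
flags = lib_flags.Flags({"batch_size_schedule": [BATCH_SIZE]})
model = lib_model.Model.load_from_path(CHECKPOINT_DIR, flags)

# Sample a few random latent vectors, one per "instrument".
z_instruments = model.generate_z(4)

# Generate a note (MIDI pitch 60, middle C) for each instrument,
# all in a single parallel pass through the network.
pitches = np.array([60] * 4)
audio_notes = model.generate_samples_from_z(z_instruments, pitches)

# Save each generated note as a WAV file.
for i, note in enumerate(audio_notes):
    gu.save_wav(note, "instrument_{}.wav".format(i))
```

Because the whole clip is produced in one forward pass, generating a batch of notes like this takes seconds rather than the minutes-per-clip synthesis time of the WaveNet autoencoder.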

The results of GANSynth are impressive:

  • Training on the NSynth dataset converges in ~3-4 days on a single V100 GPU. For comparison, the NSynth WaveNet model converges in 10 days on 32...

Summary

In this chapter, we looked at audio generation using two models, NSynth and GANSynth, and produced many audio clips by interpolating samples and generating new instruments. We started by explaining what WaveNet models are and why they are used in audio generation, particularly in text-to-speech applications. We also introduced the WaveNet autoencoder, an encoder-decoder network capable of learning its own temporal embedding, and talked about visualizing audio using rainbowgrams based on the reduced dimensions of the latent space.

Then, we introduced the NSynth dataset and the NSynth neural instrument. Using an example that combines pairs of sounds, we learned how to mix two different encodings together and then synthesize the result into new sounds. Finally, we looked at the GANSynth model, a more performant model for audio generation. We showed the example of generating...

Questions

  1. Why is generating audio hard?
  2. What makes the WaveNet autoencoder interesting?
  3. What are the different colors in a rainbowgram? How many are there?
  4. How would you timestretch an audio clip, slowing it down by 2 seconds, using NSynth?
  5. Why is GANSynth faster than NSynth?
  6. What code is required to sample 10 instruments from the GANSynth latent space?

Further reading
