Audio Generation with NSynth and GANSynth

In this chapter, we'll be looking into audio generation. We'll first provide an overview of WaveNet, an existing model for audio generation that is especially effective in text-to-speech applications. In Magenta, we'll use NSynth, a WaveNet autoencoder model, to generate small audio clips that can serve as instruments for a backing MIDI score. NSynth also enables audio transformations such as scaling, time stretching, and interpolation. We'll also use GANSynth, a faster approach based on a Generative Adversarial Network (GAN).

The following topics will be covered in this chapter:

  • Learning about WaveNet and temporal structures for music
  • Neural audio synthesis with NSynth
  • Using GANSynth as a generative instrument

Technical requirements

In this chapter, we'll use the following tools:

  • The command line or Bash to launch Magenta from the Terminal
  • Python and its libraries to write music generation code using Magenta
  • Magenta to generate audio clips
  • Audacity to edit audio clips
  • Any media player to listen to the generated WAV files

In Magenta, we'll make use of the NSynth and GANSynth models. We'll explain these models in depth, but if you feel you need more information, the models' README files in Magenta's source code (github.com/tensorflow/magenta/tree/master/magenta/models) are a good place to start. You can also take a look at Magenta's code, which is well documented. We also provide additional content in the Further reading section.

The code for this chapter is in this book's GitHub repository in the Chapter05 folder, located at github.com/PacktPublishing...

Learning about WaveNet and temporal structures for music

In the previous chapters, we've been generating symbolic content such as MIDI. In this chapter, we'll be looking at generating sub-symbolic content, such as raw audio. We'll be using the Waveform Audio File Format (WAVE or WAV, stored in a .wav file), a format containing uncompressed audio content, usable on pretty much every platform and device. See Chapter 1, Introduction on Magenta and Generative Art, for more information on waveforms in general.
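To make the idea of raw, sub-symbolic audio concrete, here is a minimal sketch (assuming SciPy and NumPy are installed; the file name example.wav is a placeholder) that loads a WAV file into an array of amplitude samples:

```python
import numpy as np
from scipy.io import wavfile

# Read an uncompressed WAV file: returns the sample rate (Hz)
# and an array with one amplitude value per sample (per channel).
sample_rate, samples = wavfile.read("example.wav")

# Convert 16-bit integer samples to floats in [-1.0, 1.0],
# the range most neural audio models work with.
if samples.dtype == np.int16:
    samples = samples.astype(np.float32) / 32768.0

print(f"{sample_rate} Hz, {samples.shape[0]} samples, "
      f"{samples.shape[0] / sample_rate:.2f} seconds")
```

At 16,000 samples per second, even a short clip is a long sequence, which is why generating raw audio is so much more demanding than generating symbolic MIDI.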

Generating raw audio using neural nets is a rather recent feat, following the 2016 WaveNet paper, A Generative Model for Raw Audio. Other network architectures also perform well in audio generation, such as SampleRNN, also released in 2016 and used since to produce music tracks and albums (see Dadabots for an example).

As stated in Chapter 2, Generating Drum Sequences...

Neural audio synthesis with NSynth

In this section, we'll be combining different audio clips together. We'll learn how to encode audio, optionally save the resulting encodings to disk, mix (add) them, and then decode the summed encodings back into a sound clip.

We'll be handling 1-second audio clips only. There are two reasons for this: first, processing raw audio is computationally costly, and second, we want to generate instrument notes in the form of short audio clips. The latter is interesting for us because we can then sequence the audio clips using MIDI generated by the models we've used in previous chapters. In that sense, you can view NSynth as a generative instrument, and the previous models, such as MusicVAE or Melody RNN, as a generative score (partition) composer. With both elements, we can generate full tracks, with both audio and structure.
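As a preview of that encode/mix/decode flow, here is a minimal sketch based on Magenta's NSynth WaveNet fastgen module. The checkpoint path and input file names (clip1.wav, clip2.wav) are placeholders, and exact flags may differ slightly between Magenta versions:

```python
from magenta.models.nsynth import utils
from magenta.models.nsynth.wavenet import fastgen

SAMPLE_RATE = 16000
SAMPLE_LENGTH = 16000  # 1 second of audio at 16 kHz
CHECKPOINT = "checkpoints/wavenet-ckpt/model.ckpt-200000"  # placeholder path

# Load two 1-second clips as float arrays.
audio1 = utils.load_audio("clip1.wav", sample_length=SAMPLE_LENGTH, sr=SAMPLE_RATE)
audio2 = utils.load_audio("clip2.wav", sample_length=SAMPLE_LENGTH, sr=SAMPLE_RATE)

# Encode each clip into its temporal embedding.
encoding1 = fastgen.encode(audio1, CHECKPOINT, SAMPLE_LENGTH)
encoding2 = fastgen.encode(audio2, CHECKPOINT, SAMPLE_LENGTH)

# Mix the two sounds by averaging their encodings.
encoding_mix = (encoding1 + encoding2) / 2.0

# Decode (synthesize) the mixed encoding back into a WAV file.
fastgen.synthesize(encoding_mix,
                   save_paths=["mix.wav"],
                   checkpoint_path=CHECKPOINT)
```

The mixing happens in the embedding space, not by summing waveforms, which is what gives NSynth its characteristic blended timbres.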

To generate sound...

Using GANSynth as a generative instrument

In the previous section, we used NSynth to generate new sound samples by combining existing sounds. You may have noticed that the audio synthesis process is very time-consuming. This is because autoregressive models such as WaveNet generate one audio sample at a time, conditioned on the previous samples, which makes reconstructing the waveform very slow since the samples have to be processed sequentially.

GANSynth, on the other hand, uses upsampling convolutions, which makes it possible to train and generate in parallel for the entire audio clip. This is a major advantage over autoregressive models such as NSynth, since those algorithms tend to be I/O bound on GPU hardware.
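To illustrate how GANSynth can act as a generative instrument, here is a minimal sketch of sampling new instruments from its latent space, based on Magenta's gansynth module. The checkpoint directory is a placeholder, and exact function signatures may vary between Magenta versions:

```python
import numpy as np
from magenta.models.gansynth.lib import flags as lib_flags
from magenta.models.gansynth.lib import generate_util as gu
from magenta.models.gansynth.lib import model as lib_model

BATCH_SIZE = 16
CHECKPOINT_DIR = "checkpoints/acoustic_only"  # placeholder path

# Load a pre-trained GANSynth model.
flags = lib_flags.Flags({"batch_size_schedule": [BATCH_SIZE]})
model = lib_model.Model.load_from_path(CHECKPOINT_DIR, flags)

# Sample a few random latent vectors, one per "instrument".
z_instruments = model.generate_z(4)

# Generate a note (MIDI pitch 60, middle C) for each instrument,
# all in a single parallel pass through the network.
pitches = np.array([60] * 4)
audio_notes = model.generate_samples_from_z(z_instruments, pitches)

# Save each generated note as a WAV file.
for i, note in enumerate(audio_notes):
    gu.save_wav(note, "instrument_{}.wav".format(i))
```

Because the whole clip is produced in one forward pass, generating a batch of notes like this takes seconds rather than the minutes-per-clip synthesis time of the WaveNet autoencoder.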

The results of GANSynth are impressive:

  • Training on the NSynth dataset converges in ~3-4 days on a single V100 GPU. For comparison, the NSynth WaveNet model converges in 10 days on 32...

Summary

In this chapter, we looked at audio generation using two models, NSynth and GANSynth, and produced many audio clips by interpolating samples and generating new instruments. We started by explaining what WaveNet models are and why they are used in audio generation, particularly in text-to-speech applications. We also introduced the WaveNet autoencoder, an encoder-decoder network capable of learning its own temporal embedding, and talked about visualizing audio using rainbowgrams based on the reduced dimensions of the latent space.

Then, we introduced the NSynth dataset and the NSynth neural instrument. Using an example that combines pairs of sounds, we learned how to mix two different encodings together and then synthesize the result into new sounds. Finally, we looked at the GANSynth model, a more performant model for audio generation. We showed the example of generating...

Questions

  1. Why is generating audio hard?
  2. What makes the WaveNet autoencoder interesting?
  3. What are the different colors in a rainbowgram? How many are there?
  4. How would you timestretch an audio clip, slowing it down by 2 seconds, using NSynth?
  5. Why is GANSynth faster than NSynth?
  6. What code is required to sample 10 instruments from the GANSynth latent space?

Further reading
