Assessments

Chapter 1: Introduction to Magenta and Generative Art

  1. Randomness.
  2. Markov chain.
  3. Algorave.
  4. Long short-term memory (LSTM).
  5. Autonomous systems generate music without operator input; assistive music systems complement an artist during their creative work.
  6. Symbolic: sheet music, MIDI, MusicXML, ABC notation. Sub-symbolic: raw audio (waveform), spectrogram.
  7. "Note On" and "Note Off" timing, pitch between 1 and 127 kHz, velocity, and channel.
  8. At a sample rate of 96 kHz, the Nyquist frequency is 96 kHz / 2 = 48 kHz, so the representable frequency range is 0 to 48 kHz. This is no better for listening, since the top 28 kHz of that range is lost on the ear (anything over 20 kHz cannot be heard), and that sampling rate is not properly supported by much audio equipment. It is useful in recording and audio editing, though (see the second sketch after this list).
  9. A single musical note, A4, is played for 1 second loudly.
  10. Drums, voice (melody...
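To make answer 7 concrete, here is a minimal sketch of the two note messages using the mido library; A4 is MIDI pitch 69, and the specific velocity, channel, and time values are arbitrary examples:

```python
import mido

# A "Note On" followed by a "Note Off": each message carries a pitch
# (A4 is MIDI pitch 69), a velocity, and a channel; in a MIDI file,
# time is the delta in ticks since the previous message.
note_on = mido.Message('note_on', note=69, velocity=100, channel=0)
note_off = mido.Message('note_off', note=69, velocity=0, channel=0, time=480)
print(note_on)
print(note_off)
```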
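And the Nyquist arithmetic from answer 8, as a worked computation:

```python
sample_rate = 96_000          # samples per second
nyquist = sample_rate / 2     # 48,000 Hz, the highest representable frequency
inaudible = nyquist - 20_000  # 28,000 Hz of the range lies above human hearing
print(nyquist, inaudible)     # 48000.0 28000.0
```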

Chapter 2: Generating Drum Sequences with the Drums RNN

  1. Given the current sequence, predict a score for each possible next note, choose the next note from those scores, then repeat the prediction for each step you want to generate (see the first sketch after this list).

  2. (1) RNNs operate on sequences of vectors for both the input and the output, which is a good fit for sequential data such as a music score, and (2) they keep an internal state composed of the previous output steps, which is good for making a prediction based on past inputs, not only the current input.
  3. (1) First, the hidden layer receives h(t + 1), the output of the previous hidden layer, and (2) it also receives x(t + 2), the input of the current step.
  4. The number of bars generated will be 2 bars, or 32 steps, since we have 16 steps per bar. At 80 QPM, each step takes 0.1875 seconds, because you take the number of seconds in a minute, divide by the QPM, then divide by the number of steps per quarter note: 60 / 80 / 4 = 0.1875 (see the second sketch after this list).
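A minimal sketch of the generation loop from answer 1, with a hypothetical model_scores placeholder standing in for the trained Drums RNN:

```python
import numpy as np

def model_scores(sequence, vocab_size=512):
    # Placeholder: a trained Drums RNN would return a score per event class here.
    rng = np.random.default_rng(len(sequence))
    return rng.random(vocab_size)

def generate(primer, num_steps):
    sequence = list(primer)
    for _ in range(num_steps):
        scores = model_scores(sequence)
        probabilities = np.exp(scores) / np.exp(scores).sum()  # softmax
        next_event = int(np.argmax(probabilities))  # greedy; sampling also works
        sequence.append(next_event)
    return sequence

print(generate(primer=[36], num_steps=32))  # 2 bars at 16 steps per bar
```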
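And the timing arithmetic from answer 4, as a worked computation:

```python
qpm = 80                 # quarter notes per minute
steps_per_quarter = 4    # 16 steps per bar in 4/4
seconds_per_step = 60 / qpm / steps_per_quarter
num_steps = 2 * 16       # 2 bars
print(seconds_per_step, num_steps * seconds_per_step)  # 0.1875 6.0
```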

Chapter 3: Generating Polyphonic Melodies

  1. Vanishing gradients (values get multiplied by small values at each RNN step) and exploding gradients are common RNN problems that occur during the backpropagation step of training. LSTM alleviates those problems by providing a dedicated cell state that is modified by forget, input, and output gates.

  2. Gated recurrent units (GRUs) are simpler but less expressive memory cells, where the forget and input gates are combined into a single update gate.

  3. For a 3/4 time signature, you have 3 quarter notes per bar times 4 steps per quarter note, which equals 12 steps per bar. The binary step counter still needs 5 bits (as for 4/4 time), but it will only count to 12. For 3 lookbacks, you'll need to look at the past 3 bars, with each bar being 12 steps, so you have [36, 24, 12] (see the sketch after this list).
  4. The resulting vector is the sum of the previous...
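The bar and lookback arithmetic from answer 3, as a short computation:

```python
steps_per_quarter = 4
quarters_per_bar = 3  # 3/4 time signature
steps_per_bar = steps_per_quarter * quarters_per_bar  # 12
lookback_distances = [steps_per_bar * bars for bars in (3, 2, 1)]
print(steps_per_bar, lookback_distances)  # 12 [36, 24, 12]
```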

Chapter 4: Latent Space Interpolation with MusicVAE

  1. The main use is dimensionality reduction: forcing the input through a smaller hidden layer makes the network learn important features, which make it possible to reconstruct the original input. The downside of an AE is that the latent space represented by the hidden layer is not continuous, which makes it hard to sample, since the decoder won't be able to make sense of some of the points.

  2. The reconstruction loss penalizes the network when it creates outputs that are different from the input.
  3. In a VAE, the latent space is continuous and smooth, making it possible to sample any point of the space and to interpolate between two points. This is achieved by having the latent variables follow a probability distribution P(z), often a Gaussian distribution (see the sketch after this list).
  4. The KL divergence measures how much two probability distributions diverge from each other. When combined with the reconstruction loss...
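A minimal sketch of sampling and interpolating in such a latent space, assuming a 256-dimensional Gaussian latent like MusicVAE's; z_start and z_end stand in for the encodings of two real sequences, and the decoder call is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
z_start = rng.normal(size=256)  # any point of the space decodes to a valid output
z_end = rng.normal(size=256)

num_outputs = 5
for i in range(num_outputs):
    t = i / (num_outputs - 1)
    z = (1 - t) * z_start + t * z_end  # interpolation between the two points
    # A real decoder would turn each z back into a note sequence here.
    print(f"t={t:.2f}", z[:3])
```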

Chapter 5: Audio Generation with NSynth and GANSynth

  1. You have to handle 16,000 samples per second (at least) and keep track of the general structure at a bigger time scale.
  2. NSynth is a WaveNet-style autoencoder that learns its own temporal embedding, making it possible to capture long-term structure and providing access to a useful hidden space.
  3. The colors in the rainbowgram are the 16 dimensions of the temporal embedding.
  4. Check the timestretch method in the audio_utils.py file in the chapter's code.

  5. GANSynth uses upsampling convolutions, making it possible to process the entire audio sample in parallel during training and generation.
  6. You need to sample a random normal distribution using np.random.normal(size=[10, 256]), where 10 is the number of sampled instruments and 256 is the size of the latent vector (given by the latent_vector_size configuration); see the sketch below.
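As a short illustration of that sampling step (only the NumPy call from answer 6; feeding the vectors to the GANSynth generator is omitted):

```python
import numpy as np

# 10 instruments, each a 256-dimensional latent vector drawn from a
# standard normal distribution (latent_vector_size=256).
latent_vectors = np.random.normal(size=[10, 256])
print(latent_vectors.shape)  # (10, 256)
```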

...

Chapter 6: Data Preparation for Training

  1. MIDI is not a text format, so it is harder to read and modify, but it is extremely common. MusicXML is rather rare and cumbersome, but has the advantage of being a text format. ABC notation is also rather rare, but has the advantage of being a text format that is closer to sheet music.
  2. Use the code from chapter_06_example_08.py, and change the program=43 in the extraction (see the sketch after this list).
  3. There are 1,116 rock songs in LMD and 3,138 songs for jazz, blues, and country. Refer to chapter_06_example_02.py and chapter_06_example_03.py to see how to make statistics with genre information.
  4. Use the RepeatSequence class in melody_rnn_pipeline_example.py.
  5. Use the code from chapter_06_example_09.py. Yes, we can train a quantized model with it since the data preparation pipeline quantizes the input.
  6. For small datasets, data augmentation plays an essential role in creating...
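A minimal sketch of the program-based extraction idea from answer 2, assuming the pretty_midi library and a hypothetical local file song.mid; the chapter's example file implements the full pipeline:

```python
import pretty_midi

midi = pretty_midi.PrettyMIDI('song.mid')  # hypothetical input file
# Keep only the instruments whose MIDI program is 43.
matching = [inst for inst in midi.instruments if inst.program == 43]
for inst in matching:
    print(inst.name, len(inst.notes))
```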

Chapter 7: Training Magenta Models

  1. See chapter_07_example_03.py.
  2. A network that underfits is a network that hasn't reached its optimum, meaning it won't predict well on the evaluation data because it fits the training data poorly (for now). It can be fixed by letting it train long enough, by adding more network capacity, or by adding more data.

  3. A network that overfits is a network that has learned to predict the input but cannot generalize to values outside of its training set. It can be fixed by adding more data, by reducing the network capacity, or by using regularization techniques such as dropout.
  4. Early stopping (see the sketch after this list).
  5. Read On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, which explains that a larger batch size leads to sharp minimizers, which in turn lead to poorer generalization. Therefore it is worse in terms of efficiency, but might...
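A minimal sketch of early stopping (answer 4), with hypothetical train_step, evaluate, and save_checkpoint placeholders standing in for a real training loop:

```python
import random

def train_step():
    pass  # one optimizer update in a real training loop

def evaluate():
    return random.random()  # evaluation loss in a real training loop

def save_checkpoint():
    pass  # persist the best weights in a real training loop

best_loss = float('inf')
patience, bad_checks = 3, 0
for step in range(10_000):
    train_step()
    if step % 100 == 0:
        loss = evaluate()
        if loss < best_loss:
            best_loss, bad_checks = loss, 0
            save_checkpoint()
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break  # stop: the evaluation loss has not improved for a while
```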

Chapter 8: Magenta in the Browser with Magenta.js

  1. We can train models using TensorFlow.js, but we cannot train models using Magenta.js. We need to train the models in Magenta using Python and import the resulting models in Magenta.js.
  2. The Web Audio API enables audio synthesis in the browser using audio nodes for generation, transformation, and routing. The easiest way to use it is to use an audio framework such as Tone.js.
  3. The method is randomSample and the argument is the pitch of the generated note. As an example, using 60 will result in a single note at MIDI pitch 60, or C4 in letter notation. This is also useful as a reference for pitching the note up or down using Tone.js.

  4. The method is sample and the number of instruments depends on the model that is being used. In our example, we've used the trio model, which generates three instruments. Using a melody model will...

Chapter 9: Making Magenta Interact with Music Applications

  1. A DAW has more functions geared toward music production, such as recording, audio and MIDI editing, effects and mastering, and song composition. A software synthesizer like FluidSynth has fewer features, but has the advantage of being lightweight and easy to use.

  2. Most music software won't open MIDI ports by itself, so to send sequences back and forth between two programs we have to open the ports manually (see the sketch after this list).
  3. See the code in chapter_09_example_05.py in this chapter's code.
  4. Because re-syncing two pieces of software that have drifted apart requires restarting them; a MIDI clock keeps them in sync by sending a timing signal on each beat.
  5. Because Magenta Studio integrates with existing music production tools such as DAWs and doesn't require any technical knowledge, it makes AI-generated music available to a greater audience, which is ultimately...
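A minimal sketch of manually opening a MIDI port with mido (answer 2); the port name is an arbitrary example, and virtual ports require a backend that supports them, such as RtMidi:

```python
import mido

# Open a virtual output port that other software (a DAW, FluidSynth, ...)
# can connect to, then send a "Note On" / "Note Off" pair through it.
with mido.open_output('magenta_out', virtual=True) as port:
    port.send(mido.Message('note_on', note=60, velocity=100, channel=0))
    port.send(mido.Message('note_off', note=60, channel=0))
```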
