In this tutorial, we will learn to build both simple and deep convolutional GAN models using the TensorFlow and Keras deep learning frameworks.
This article is an excerpt taken from the book Mastering TensorFlow 1.x written by Armando Fandango.
Simple GAN with TensorFlow
To build the GAN with TensorFlow, we build three networks, two discriminator models and one generator model, using the following steps:
Start by adding the hyper-parameters for defining the network:
# graph hyperparameters
g_learning_rate = 0.00001
d_learning_rate = 0.01
n_x = 784 # number of pixels in the MNIST image
n_z = 256 # number of noise dimensions (z_in is (None, 256) in the GAN summary)
# number of hidden layers for generator and discriminator
g_n_layers = 3
d_n_layers = 1
# neurons in each hidden layer
g_n_neurons = [256, 512, 1024]
d_n_neurons = [256]
# define parameter dictionaries
d_params = {}
g_params = {}
activation = tf.nn.leaky_relu
w_initializer = tf.glorot_uniform_initializer
b_initializer = tf.zeros_initializer
Next, define the generator network:
z_p = tf.placeholder(dtype=tf.float32, name='z_p',
                     shape=[None, n_z])
layer = z_p
# add generator network weights, biases and layers
with tf.variable_scope('g'):
    for i in range(0, g_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        g_params[w_name] = tf.get_variable(
            name=w_name,
            shape=[n_z if i == 0 else g_n_neurons[i - 1],
                   g_n_neurons[i]],
            initializer=w_initializer())
        b_name = 'b_{0:04d}'.format(i)
        g_params[b_name] = tf.get_variable(
            name=b_name, shape=[g_n_neurons[i]],
            initializer=b_initializer())
        layer = activation(
            tf.matmul(layer, g_params[w_name]) + g_params[b_name])
    # output (logit) layer
    i = g_n_layers
    w_name = 'w_{0:04d}'.format(i)
    g_params[w_name] = tf.get_variable(
        name=w_name,
        shape=[g_n_neurons[i - 1], n_x],
        initializer=w_initializer())
    b_name = 'b_{0:04d}'.format(i)
    g_params[b_name] = tf.get_variable(
        name=b_name, shape=[n_x], initializer=b_initializer())
    g_logit = tf.matmul(layer, g_params[w_name]) + g_params[b_name]
    g_model = tf.nn.tanh(g_logit)
Next, define the weights and biases for the two discriminator networks that we shall build:
with tf.variable_scope('d'):
    for i in range(0, d_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        d_params[w_name] = tf.get_variable(
            name=w_name,
            shape=[n_x if i == 0 else d_n_neurons[i - 1],
                   d_n_neurons[i]],
            initializer=w_initializer())
        b_name = 'b_{0:04d}'.format(i)
        d_params[b_name] = tf.get_variable(
            name=b_name, shape=[d_n_neurons[i]],
            initializer=b_initializer())
    # output (logit) layer
    i = d_n_layers
    w_name = 'w_{0:04d}'.format(i)
    d_params[w_name] = tf.get_variable(
        name=w_name, shape=[d_n_neurons[i - 1], 1],
        initializer=w_initializer())
    b_name = 'b_{0:04d}'.format(i)
    d_params[b_name] = tf.get_variable(
        name=b_name, shape=[1], initializer=b_initializer())
Now using these parameters, build the discriminator that takes the real images as input and outputs the classification:
# define discriminator_real
# input real images
x_p = tf.placeholder(dtype=tf.float32, name='x_p',
                     shape=[None, n_x])
layer = x_p
with tf.variable_scope('d'):
    for i in range(0, d_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        b_name = 'b_{0:04d}'.format(i)
        layer = activation(
            tf.matmul(layer, d_params[w_name]) + d_params[b_name])
        layer = tf.nn.dropout(layer, 0.7)
    # output (logit) layer
    i = d_n_layers
    w_name = 'w_{0:04d}'.format(i)
    b_name = 'b_{0:04d}'.format(i)
    d_logit_real = tf.matmul(layer,
                             d_params[w_name]) + d_params[b_name]
    d_model_real = tf.nn.sigmoid(d_logit_real)
Next, build another discriminator network with the same parameters, but with the output of the generator as its input:
# define discriminator_fake
# input generated fake images
z = g_model
layer = z
with tf.variable_scope('d'):
    for i in range(0, d_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        b_name = 'b_{0:04d}'.format(i)
        layer = activation(
            tf.matmul(layer, d_params[w_name]) + d_params[b_name])
        layer = tf.nn.dropout(layer, 0.7)
    # output (logit) layer
    i = d_n_layers
    w_name = 'w_{0:04d}'.format(i)
    b_name = 'b_{0:04d}'.format(i)
    d_logit_fake = tf.matmul(layer,
                             d_params[w_name]) + d_params[b_name]
    d_model_fake = tf.nn.sigmoid(d_logit_fake)
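The two discriminator graphs differ only in their input: both read the same entries of d_params. The NumPy sketch below illustrates that idea outside TensorFlow. The helper names (leaky_relu, discriminator_forward), the random stand-in weights and inputs, and the omission of dropout are all assumptions made for illustration; this is not the notebook's code.

```python
import numpy as np

rng = np.random.RandomState(0)
n_x, n_hidden = 784, 256

# One set of discriminator parameters, created once (random stand-ins here)
d_params = {
    'w_0000': rng.normal(scale=0.05, size=(n_x, n_hidden)),
    'b_0000': np.zeros(n_hidden),
    'w_0001': rng.normal(scale=0.05, size=(n_hidden, 1)),
    'b_0001': np.zeros(1),
}

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU with slope alpha for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def discriminator_forward(x, params):
    """One hidden layer plus a sigmoid output, reading weights from params."""
    hidden = leaky_relu(x @ params['w_0000'] + params['b_0000'])
    logit = hidden @ params['w_0001'] + params['b_0001']
    return 1.0 / (1.0 + np.exp(-logit))

x_real = rng.uniform(-1.0, 1.0, size=(4, n_x))  # stand-in for real images
x_fake = rng.uniform(-1.0, 1.0, size=(4, n_x))  # stand-in for generator output

# Same parameters, two inputs: the analogue of d_model_real and d_model_fake
d_real = discriminator_forward(x_real, d_params)
d_fake = discriminator_forward(x_fake, d_params)
```

Because both calls read from the same dictionary, any update to the discriminator's weights changes how it scores real and generated images alike.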
Now that we have the three networks built, the connection between them is made using the loss, optimizer and training functions. While training the generator, we only train the generator's parameters and while training the discriminator, we only train the discriminator's parameters. We specify this using the var_list parameter to the optimizer's minimize() function. Here is the complete code for defining the loss, optimizer and training function for both kinds of network:
g_loss = -tf.reduce_mean(tf.log(d_model_fake))
d_loss = -tf.reduce_mean(tf.log(d_model_real) + tf.log(1 - d_model_fake))
g_optimizer = tf.train.AdamOptimizer(g_learning_rate)
d_optimizer = tf.train.GradientDescentOptimizer(d_learning_rate)
g_train_op = g_optimizer.minimize(g_loss,
                                  var_list=list(g_params.values()))
d_train_op = d_optimizer.minimize(d_loss,
                                  var_list=list(d_params.values()))
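To get a feel for how these two losses behave, the sketch below evaluates the same formulas in NumPy on made-up discriminator outputs. The function names and the probability values are hypothetical and chosen only for illustration.

```python
import numpy as np

def g_loss_fn(d_fake):
    """Generator loss: -mean(log D(G(z)))."""
    return -np.mean(np.log(d_fake))

def d_loss_fn(d_real, d_fake):
    """Discriminator loss: -mean(log D(x) + log(1 - D(G(z))))."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

# Hypothetical discriminator outputs (probability that an image is real)
d_real = np.array([0.9, 0.8, 0.95])         # real images scored as real
d_fake_caught = np.array([0.1, 0.2, 0.05])  # fakes the discriminator rejects
d_fake_fooled = np.array([0.9, 0.8, 0.95])  # fakes the discriminator accepts

print('g_loss:', g_loss_fn(d_fake_caught), g_loss_fn(d_fake_fooled))
print('d_loss:', d_loss_fn(d_real, d_fake_caught),
      d_loss_fn(d_real, d_fake_fooled))
```

The generator loss is lower when the fakes receive high scores, while the discriminator loss is lower when the fakes receive low scores, which is the opposing pressure that the two training operations apply.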
Now that we have defined the models, we have to train them. Training follows this algorithm:
For each epoch:
    For each batch:
        get real images x_batch
        generate noise z_batch
        train discriminator using z_batch and x_batch
        generate noise z_batch
        train generator using z_batch
The complete code for training from the notebook is as follows:
n_epochs = 400
batch_size = 100
n_batches = int(mnist.train.num_examples / batch_size)
n_epochs_print = 50
with tf.Session() as tfs:
    tfs.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        epoch_d_loss = 0.0
        epoch_g_loss = 0.0
        for batch in range(n_batches):
            x_batch, _ = mnist.train.next_batch(batch_size)
            x_batch = norm(x_batch)
            z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
            feed_dict = {x_p: x_batch,z_p: z_batch}
            _,batch_d_loss = tfs.run([d_train_op,d_loss],
                                     feed_dict=feed_dict)
            z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
            feed_dict={z_p: z_batch}
            _,batch_g_loss = tfs.run([g_train_op,g_loss],
                                     feed_dict=feed_dict)
            epoch_d_loss += batch_d_loss
            epoch_g_loss += batch_g_loss
        if epoch%n_epochs_print == 0:
            average_d_loss = epoch_d_loss / n_batches
            average_g_loss = epoch_g_loss / n_batches
            print('epoch: {0:04d} d_loss = {1:0.6f} g_loss = {2:0.6f}'
                  .format(epoch,average_d_loss,average_g_loss))
            # predict images using generator model trained
            x_pred = tfs.run(g_model,feed_dict={z_p:z_test})
            display_images(x_pred.reshape(-1,pixel_size,pixel_size))
We printed the generated images every 50 epochs:
As we can see, the generator was producing just noise in epoch 0, but by epoch 350 it had learned to produce much better shapes of handwritten digits. You can try experimenting with epochs, regularization, network architecture and other hyper-parameters to see if you can produce even faster and better results.
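The training loop relies on a few notebook helpers that this excerpt does not show: norm, z_test and pixel_size. The sketch below gives plausible stand-ins so that the loop's inputs are well defined. All three definitions are assumptions: norm assumes the MNIST pixels arrive in the range [0, 1] and rescales them to [-1, 1] to match the generator's tanh output, the batch of 8 test vectors is an arbitrary choice, and n_z = 256 is read off the z_in row of the GAN summary.

```python
import numpy as np

n_z = 256        # assumed noise dimension (z_in is (None, 256) in the summary)
pixel_size = 28  # 28 x 28 = 784 = n_x

def norm(x):
    """Rescale pixel values from [0, 1] to [-1, 1] (hypothetical helper)."""
    return (x - 0.5) / 0.5

# A fixed noise batch, reused at every print step so that the generated
# images from different epochs can be compared (hypothetical helper data)
rng = np.random.RandomState(123)
z_test = rng.uniform(-1.0, 1.0, size=[8, n_z])
```

Keeping z_test fixed means that any change in the displayed digits comes from the generator's weights rather than from new noise.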
Simple GAN with Keras
Now let us implement the same model in Keras:
The hyper-parameter definitions remain the same as in the last section:
# graph hyperparameters
g_learning_rate = 0.00001
d_learning_rate = 0.01
n_x = 784 # number of pixels in the MNIST image
# number of hidden layers for generator and discriminator
g_n_layers = 3
d_n_layers = 1
# neurons in each hidden layer
g_n_neurons = [256, 512, 1024]
d_n_neurons = [256]
Next, define the generator network:
# define generator
g_model = Sequential()
g_model.add(Dense(units=g_n_neurons[0],
                  input_shape=(n_z,),
                  name='g_0'))
g_model.add(LeakyReLU())
for i in range(1,g_n_layers):
    g_model.add(Dense(units=g_n_neurons[i],
                      name='g_{}'.format(i)
                      ))
    g_model.add(LeakyReLU())
g_model.add(Dense(units=n_x, activation='tanh',name='g_out'))
print('Generator:')
g_model.summary()
g_model.compile(loss='binary_crossentropy',
                optimizer=keras.optimizers.Adam(lr=g_learning_rate)
                )
This is what the generator model looks like:
In the Keras example, we do not define two discriminator networks as we defined in the TensorFlow example. Instead, we define one discriminator network and then stitch the generator and discriminator network into the GAN network. The GAN network is then used to train the generator parameters only, and the discriminator network is used to train the discriminator parameters:
# define discriminator
d_model = Sequential()
d_model.add(Dense(units=d_n_neurons[0],
                  input_shape=(n_x,),
                  name='d_0'
                  ))
d_model.add(LeakyReLU())
d_model.add(Dropout(0.3))
for i in range(1,d_n_layers):
    d_model.add(Dense(units=d_n_neurons[i],
                      name='d_{}'.format(i)
                      ))
    d_model.add(LeakyReLU())
    d_model.add(Dropout(0.3))
d_model.add(Dense(units=1, activation='sigmoid',name='d_out'))
print('Discriminator:')
d_model.summary()
d_model.compile(loss='binary_crossentropy',
                optimizer=keras.optimizers.SGD(lr=d_learning_rate)
                )
This is what the discriminator model looks like:
Discriminator:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
d_0 (Dense) (None, 256) 200960
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU) (None, 256) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
d_out (Dense) (None, 1) 257
=================================================================
Total params: 201,217
Trainable params: 201,217
Non-trainable params: 0
_________________________________________________________________
Next, define the GAN network and set the trainable property of the discriminator model to False, since the GAN network is only used to train the generator:
# define GAN network
d_model.trainable=False
z_in = Input(shape=(n_z,),name='z_in')
x_in = g_model(z_in)
gan_out = d_model(x_in)
gan_model = Model(inputs=z_in,outputs=gan_out,name='gan')
print('GAN:')
gan_model.summary()
gan_model.compile(loss='binary_crossentropy',
                  optimizer=keras.optimizers.Adam(lr=g_learning_rate)
                  )
This is what the GAN model looks like:
GAN:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
z_in (InputLayer) (None, 256) 0
_________________________________________________________________
sequential_1 (Sequential) (None, 784) 1526288
_________________________________________________________________
sequential_2 (Sequential) (None, 1) 201217
=================================================================
Total params: 1,727,505
Trainable params: 1,526,288
Non-trainable params: 201,217
_________________________________________________________________
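The parameter counts in the summaries can be reproduced by hand from the hyper-parameters, which is a quick way to check that a network has the shape you intended. The sketch below does this in plain Python. The helper names are hypothetical, and n_z = 256 is read off the z_in row of the summary rather than from a definition in this excerpt.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(sizes):
    """Total parameters of a stack of dense layers with the given sizes."""
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))

n_z, n_x = 256, 784             # n_z taken from the z_in row of the summary
g_n_neurons = [256, 512, 1024]
d_n_neurons = [256]

g_total = mlp_params([n_z] + g_n_neurons + [n_x])  # generator
d_total = mlp_params([n_x] + d_n_neurons + [1])    # discriminator

print(g_total, d_total, g_total + d_total)
# prints: 1526288 201217 1727505
```

The generator's count matches the trainable parameters of the GAN model and the discriminator's count matches its non-trainable parameters, which confirms that only the generator is updated when the GAN model is trained.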
Now that we have defined the three models, we have to train them. Training follows this algorithm:
For each epoch:
    For each batch:
        get real images x_batch
        generate noise z_batch
        generate images g_batch using generator model
        combine g_batch and x_batch into x_in and create labels y_out
        set discriminator model as trainable
        train discriminator using x_in and y_out
        generate noise z_batch
        set x_in = z_batch and labels y_out = 1
        set discriminator model as non-trainable
        train gan model using x_in and y_out,
            (effectively training generator model)
For the labels, we use 0.9 for real images and 0.1 for fake images. Generally, it is suggested that you use label smoothing by picking a random value from 0.0 to 0.3 for fake data and 0.8 to 1.0 for real data.
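The two labeling schemes mentioned here can be sketched in NumPy as follows. The function names are hypothetical, and the ordering (real labels first, fake labels second) is an assumption that has to match the order in which the real and generated batches are concatenated.

```python
import numpy as np

def fixed_labels(batch_size, real=0.9, fake=0.1):
    """Constant soft labels: one batch of real labels, then one of fake."""
    return np.concatenate([np.full(batch_size, real),
                           np.full(batch_size, fake)])

def random_smooth_labels(batch_size, rng):
    """Real labels drawn from [0.8, 1.0), fake labels from [0.0, 0.3)."""
    y_real = rng.uniform(0.8, 1.0, size=batch_size)
    y_fake = rng.uniform(0.0, 0.3, size=batch_size)
    return np.concatenate([y_real, y_fake])

rng = np.random.RandomState(42)
batch_size = 100
y_fixed = fixed_labels(batch_size)                # 0.9 / 0.1 scheme
y_random = random_smooth_labels(batch_size, rng)  # randomized scheme
```

Either array could be passed as y_out when training the discriminator on a combined batch of real and generated images.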
Here is the complete code for training from the notebook:
n_epochs = 400
batch_size = 100
n_batches = int(mnist.train.num_examples / batch_size)
n_epochs_print = 50
for epoch in range(n_epochs+1):
    epoch_d_loss = 0.0
    epoch_g_loss = 0.0
    for batch in range(n_batches):
        x_batch, _ = mnist.train.next_batch(batch_size)
        x_batch = norm(x_batch)
        z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
        g_batch = g_model.predict(z_batch)
        x_in = np.concatenate([x_batch,g_batch])
        y_out = np.ones(batch_size*2)
        y_out[:batch_size]=0.9
        y_out[batch_size:]=0.1
        d_model.trainable=True
        batch_d_loss = d_model.train_on_batch(x_in,y_out)
        z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
        x_in=z_batch
        y_out = np.ones(batch_size)
        d_model.trainable=False
        batch_g_loss = gan_model.train_on_batch(x_in,y_out)
        epoch_d_loss += batch_d_loss
        epoch_g_loss += batch_g_loss
    if epoch%n_epochs_print == 0:
        average_d_loss = epoch_d_loss / n_batches
        average_g_loss = epoch_g_loss / n_batches
        print('epoch: {0:04d} d_loss = {1:0.6f} g_loss = {2:0.6f}'
              .format(epoch,average_d_loss,average_g_loss))
        # predict images using generator model trained
        x_pred = g_model.predict(z_test)
        display_images(x_pred.reshape(-1,pixel_size,pixel_size))
We printed the results every 50 epochs, up to 350 epochs:
The model slowly learns to generate good quality images of handwritten digits from the random noise. There are so many variations of GANs that it would take another book to cover them all. However, the implementation techniques are much the same as what we have shown here.
Deep Convolutional GAN with TensorFlow and Keras
In DCGAN, both the discriminator and generator are implemented using a Deep Convolutional Network. In this example, we decided to implement the generator as the following network:
Generator:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
g_in (Dense) (None, 3200) 822400
_________________________________________________________________
g_in_act (Activation) (None, 3200) 0
_________________________________________________________________
g_in_reshape (Reshape) (None, 5, 5, 128) 0
_________________________________________________________________
g_0_up2d (UpSampling2D) (None, 10, 10, 128) 0
_________________________________________________________________
g_0_conv2d (Conv2D) (None, 10, 10, 64) 204864
_________________________________________________________________
g_0_act (Activation) (None, 10, 10, 64) 0
_________________________________________________________________
g_1_up2d (UpSampling2D) (None, 20, 20, 64) 0
_________________________________________________________________
g_1_conv2d (Conv2D) (None, 20, 20, 32) 51232
_________________________________________________________________
g_1_act (Activation) (None, 20, 20, 32) 0
_________________________________________________________________
g_2_up2d (UpSampling2D) (None, 40, 40, 32) 0
_________________________________________________________________
g_2_conv2d (Conv2D) (None, 40, 40, 16) 12816
_________________________________________________________________
g_2_act (Activation) (None, 40, 40, 16) 0
_________________________________________________________________
g_out_flatten (Flatten) (None, 25600) 0
_________________________________________________________________
g_out (Dense) (None, 784) 20071184
=================================================================
Total params: 21,162,496
Trainable params: 21,162,496
Non-trainable params: 0
The generator is a stronger network having three convolutional layers followed by tanh activation. We define the discriminator network as follows:
Discriminator:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
d_0_reshape (Reshape) (None, 28, 28, 1) 0
_________________________________________________________________
d_0_conv2d (Conv2D) (None, 28, 28, 64) 1664
_________________________________________________________________
d_0_act (Activation) (None, 28, 28, 64) 0
_________________________________________________________________
d_0_maxpool (MaxPooling2D) (None, 14, 14, 64) 0
_________________________________________________________________
d_out_flatten (Flatten) (None, 12544) 0
_________________________________________________________________
d_out (Dense) (None, 1) 12545
=================================================================
Total params: 14,209
Trainable params: 14,209
Non-trainable params: 0
_________________________________________________________________
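The shapes and parameter counts in the two DCGAN summaries can be checked with a little arithmetic. The sketch below assumes 5 x 5 convolution kernels and padding that preserves the spatial size. Neither is stated in this excerpt; both are inferred because they reproduce the numbers in the summaries. The helper names are hypothetical.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

K = 5  # assumed kernel size; it reproduces the counts in both summaries

# Generator: Dense -> reshape to 5x5x128 -> three (upsample x2, conv) stages
# taking the feature map from 5x5 to 10x10, 20x20 and 40x40 -> Dense
g_counts = [
    dense_params(256, 5 * 5 * 128),    # g_in
    conv2d_params(K, 128, 64),         # g_0_conv2d
    conv2d_params(K, 64, 32),          # g_1_conv2d
    conv2d_params(K, 32, 16),          # g_2_conv2d
    dense_params(40 * 40 * 16, 784),   # g_out
]

# Discriminator: one conv on 28x28x1, pooling down to 14x14, then Dense
d_counts = [
    conv2d_params(K, 1, 64),           # d_0_conv2d
    dense_params(14 * 14 * 64, 1),     # d_out
]

print(g_counts, sum(g_counts))
print(d_counts, sum(d_counts))
```

Almost all of the generator's parameters sit in the final dense layer that maps the flattened 40 x 40 x 16 feature map back to 784 pixels, while the discriminator is small by comparison.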
The GAN network is composed of the discriminator and generator as demonstrated previously:
GAN:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
z_in (InputLayer) (None, 256) 0
_________________________________________________________________
g (Sequential) (None, 784) 21162496
_________________________________________________________________
d (Sequential) (None, 1) 14209
=================================================================
Total params: 21,176,705
Trainable params: 21,162,496
Non-trainable params: 14,209
_________________________________________________________________
When we run this model for 400 epochs, we get the following output:
As you can see, the DCGAN is able to generate high-quality digits as early as epoch 100. The DCGAN has been used for style transfer, generation of images and titles and for image algebra, namely taking parts of one image and adding that to parts of another image.
We built a simple GAN in TensorFlow and Keras and applied it to generate images from the MNIST dataset. We also built a DCGAN where the generator and discriminator consisted of convolutional networks.
Do check out the book Mastering TensorFlow 1.x to explore advanced features of TensorFlow 1.x and obtain in-depth knowledge of TensorFlow for solving artificial intelligence problems.