Coursera

Deep Convolutional GAN (DCGAN)

Goal

In this notebook, you’re going to create another GAN using the MNIST dataset. You will implement a Deep Convolutional GAN (DCGAN), a very successful and influential GAN model developed in 2015.

Note: here is the paper if you are interested! It might look dense now, but soon you’ll be able to understand many parts of it :)

Learning Objectives

Get hands-on experience making a widely used GAN: Deep Convolutional GAN (DCGAN).
Train a powerful generative model.

Generator architecture

Figure: Architectural drawing of a generator from DCGAN from Radford et al (2016).

Getting Started

DCGAN

Here are the main features of DCGAN (don’t worry about memorizing these, you will be guided through the implementation!):

Use convolutions without any pooling layers
Use batchnorm in both the generator and the discriminator
Don’t use fully connected hidden layers
Use ReLU activation in the generator for all layers except for the output, which uses a Tanh activation.
Use LeakyReLU activation in the discriminator for all layers except for the output, which does not use an activation

You will begin by importing some useful packages and data that will help you create your GAN. You are also provided a visualizer function to help see the images your GAN will create.

import torch
from torch import nn
from tqdm.auto import tqdm
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision.utils import make_grid
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
torch.manual_seed(0) # Set for testing purposes, please do not change!


def show_tensor_images(image_tensor, num_images=25, size=(1, 28, 28)):
    '''
    Function for visualizing images: Given a tensor of images, number of images, and
    size per image, plots and prints the images in an uniform grid.
    '''
    image_tensor = (image_tensor + 1) / 2
    image_unflat = image_tensor.detach().cpu()
    image_grid = make_grid(image_unflat[:num_images], nrow=5)
    plt.imshow(image_grid.permute(1, 2, 0).squeeze())
    plt.show()

Generator

The first component you will make is the generator. You may notice that instead of passing in the image dimension, you will pass the number of image channels to the generator. This is because with DCGAN, you use convolutions which don’t depend on the number of pixels on an image. However, the number of channels is important to determine the size of the filters.

You will build a generator using 4 layers (3 hidden layers + 1 output layer). As before, you will need to write a function to create a single block for the generator’s neural network.

Since in DCGAN the activation function will be different for the output layer, you will need to check what layer is being created. You are supplied with some tests following the code cell so you can see if you’re on the right track!

At the end of the generator class, you are given a forward pass function that takes in a noise vector and generates an image of the output dimension using your neural network. You are also given a function to create a noise vector. These functions are the same as the ones from the last assignment.

Optional hint for make_gen_block

You’ll find nn.ConvTranspose2d and nn.BatchNorm2d useful!

# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: Generator
class Generator(nn.Module):
    '''
    Generator Class
    Values:
        z_dim: the dimension of the noise vector, a scalar
        im_chan: the number of channels in the images, fitted for the dataset used, a scalar
              (MNIST is black-and-white, so 1 channel is your default)
        hidden_dim: the inner dimension, a scalar
    '''
    def __init__(self, z_dim=10, im_chan=1, hidden_dim=64):
        super(Generator, self).__init__()
        self.z_dim = z_dim
        # Build the neural network
        self.gen = nn.Sequential(
            self.make_gen_block(z_dim, hidden_dim * 4),
            self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
            self.make_gen_block(hidden_dim * 2, hidden_dim),
            self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True),
        )

    def make_gen_block(self, input_channels, output_channels, kernel_size=3, stride=2, final_layer=False):
        '''
        Function to return a sequence of operations corresponding to a generator block of DCGAN, 
        corresponding to a transposed convolution, a batchnorm (except for in the last layer), and an activation.
        Parameters:
            input_channels: how many channels the input feature representation has
            output_channels: how many channels the output feature representation should have
            kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
            stride: the stride of the convolution
            final_layer: a boolean, true if it is the final layer and false otherwise 
                      (affects activation and batchnorm)
        '''

        #     Steps:
        #       1) Do a transposed convolution using the given parameters.
        #       2) Do a batchnorm, except for the last layer.
        #       3) Follow each batchnorm with a ReLU activation.
        #       4) If its the final layer, use a Tanh activation after the deconvolution.

        # Build the neural block
        if not final_layer:
            return nn.Sequential(
                #### START CODE HERE ####
                nn.ConvTranspose2d(in_channels = input_channels, out_channels = output_channels, kernel_size = kernel_size, stride = stride),
                nn.BatchNorm2d(num_features = output_channels),
                nn.ReLU(),
                #### END CODE HERE ####
            )
        else: # Final Layer
            return nn.Sequential(
                #### START CODE HERE ####
                nn.ConvTranspose2d(in_channels = input_channels, out_channels = output_channels, kernel_size = kernel_size, stride = stride),
                nn.Tanh(),
                #### END CODE HERE ####
            )

    def unsqueeze_noise(self, noise):
        '''
        Function for completing a forward pass of the generator: Given a noise tensor, 
        returns a copy of that noise with width and height = 1 and channels = z_dim.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, z_dim)
        '''
        return noise.view(len(noise), self.z_dim, 1, 1)

    def forward(self, noise):
        '''
        Function for completing a forward pass of the generator: Given a noise tensor, 
        returns generated images.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, z_dim)
        '''
        x = self.unsqueeze_noise(noise)
        return self.gen(x)

def get_noise(n_samples, z_dim, device='cpu'):
    '''
    Function for creating noise vectors: Given the dimensions (n_samples, z_dim)
    creates a tensor of that shape filled with random numbers from the normal distribution.
    Parameters:
        n_samples: the number of samples to generate, a scalar
        z_dim: the dimension of the noise vector, a scalar
        device: the device type
    '''
    return torch.randn(n_samples, z_dim, device=device)

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
'''
Test your make_gen_block() function
'''
gen = Generator()
num_test = 100

# Test the hidden block
test_hidden_noise = get_noise(num_test, gen.z_dim)
test_hidden_block = gen.make_gen_block(10, 20, kernel_size=4, stride=1)
test_uns_noise = gen.unsqueeze_noise(test_hidden_noise)
hidden_output = test_hidden_block(test_uns_noise)

# Check that it works with other strides
test_hidden_block_stride = gen.make_gen_block(20, 20, kernel_size=4, stride=2)

test_final_noise = get_noise(num_test, gen.z_dim) * 20
test_final_block = gen.make_gen_block(10, 20, final_layer=True)
test_final_uns_noise = gen.unsqueeze_noise(test_final_noise)
final_output = test_final_block(test_final_uns_noise)

# Test the whole thing:
test_gen_noise = get_noise(num_test, gen.z_dim)
test_uns_gen_noise = gen.unsqueeze_noise(test_gen_noise)
gen_output = gen(test_uns_gen_noise)

Here’s the test for your generator block:

# UNIT TESTS
assert tuple(hidden_output.shape) == (num_test, 20, 4, 4)
assert hidden_output.max() > 1
assert hidden_output.min() == 0
assert hidden_output.std() > 0.2
assert hidden_output.std() < 1
assert hidden_output.std() > 0.5

assert tuple(test_hidden_block_stride(hidden_output).shape) == (num_test, 20, 10, 10)

assert final_output.max().item() == 1
assert final_output.min().item() == -1

assert tuple(gen_output.shape) == (num_test, 1, 28, 28)
assert gen_output.std() > 0.5
assert gen_output.std() < 0.8
print("Success!")

Success!

Discriminator

The second component you need to create is the discriminator.

You will use 3 layers in your discriminator’s neural network. Like with the generator, you will need create the function to create a single neural network block for the discriminator.

There are also tests at the end for you to use.

Optional hint for make_disc_block

You’ll find nn.Conv2d, nn.BatchNorm2d, and nn.LeakyReLU useful!

# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: Discriminator
class Discriminator(nn.Module):
    '''
    Discriminator Class
    Values:
        im_chan: the number of channels in the images, fitted for the dataset used, a scalar
              (MNIST is black-and-white, so 1 channel is your default)
    hidden_dim: the inner dimension, a scalar
    '''
    def __init__(self, im_chan=1, hidden_dim=16):
        super(Discriminator, self).__init__()
        self.disc = nn.Sequential(
            self.make_disc_block(im_chan, hidden_dim),
            self.make_disc_block(hidden_dim, hidden_dim * 2),
            self.make_disc_block(hidden_dim * 2, 1, final_layer=True),
        )

    def make_disc_block(self, input_channels, output_channels, kernel_size=4, stride=2, final_layer=False):
        '''
        Function to return a sequence of operations corresponding to a discriminator block of DCGAN, 
        corresponding to a convolution, a batchnorm (except for in the last layer), and an activation.
        Parameters:
            input_channels: how many channels the input feature representation has
            output_channels: how many channels the output feature representation should have
            kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
            stride: the stride of the convolution
            final_layer: a boolean, true if it is the final layer and false otherwise 
                      (affects activation and batchnorm)
        '''
        #     Steps:
        #       1) Add a convolutional layer using the given parameters.
        #       2) Do a batchnorm, except for the last layer.
        #       3) Follow each batchnorm with a LeakyReLU activation with slope 0.2.
        #       Note: Don't use an activation on the final layer
        
        # Build the neural block
        if not final_layer:
            return nn.Sequential(
                #### START CODE HERE #### #
                nn.Conv2d(in_channels = input_channels, out_channels = output_channels, kernel_size = kernel_size, stride = stride),
                nn.BatchNorm2d(num_features = output_channels),
                nn.LeakyReLU(negative_slope = .2)
                #### END CODE HERE ####
            )
        else: # Final Layer
            return nn.Sequential(
                #### START CODE HERE #### #
                nn.Conv2d(in_channels = input_channels, out_channels = output_channels, kernel_size = kernel_size, stride = stride),
                #### END CODE HERE ####
            )

    def forward(self, image):
        '''
        Function for completing a forward pass of the discriminator: Given an image tensor, 
        returns a 1-dimension tensor representing fake/real.
        Parameters:
            image: a flattened image tensor with dimension (im_dim)
        '''
        disc_pred = self.disc(image)
        return disc_pred.view(len(disc_pred), -1)

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
'''
Test your make_disc_block() function
'''
num_test = 100

gen = Generator()
disc = Discriminator()
test_images = gen(get_noise(num_test, gen.z_dim))

# Test the hidden block
test_hidden_block = disc.make_disc_block(1, 5, kernel_size=6, stride=3)
hidden_output = test_hidden_block(test_images)

# Test the final block
test_final_block = disc.make_disc_block(1, 10, kernel_size=2, stride=5, final_layer=True)
final_output = test_final_block(test_images)

# Test the whole thing:
disc_output = disc(test_images)

Here’s a test for your discriminator block:

# Test the hidden block
assert tuple(hidden_output.shape) == (num_test, 5, 8, 8)
# Because of the LeakyReLU slope
assert -hidden_output.min() / hidden_output.max() > 0.15
assert -hidden_output.min() / hidden_output.max() < 0.25
assert hidden_output.std() > 0.5
assert hidden_output.std() < 1

# Test the final block

assert tuple(final_output.shape) == (num_test, 10, 6, 6)
assert final_output.max() > 1.0
assert final_output.min() < -1.0
assert final_output.std() > 0.3
assert final_output.std() < 0.6

# Test the whole thing:

assert tuple(disc_output.shape) == (num_test, 1)
assert disc_output.std() > 0.25
assert disc_output.std() < 0.5
print("Success!")

Success!

Training

Now you can put it all together! Remember that these are your parameters:

criterion: the loss function
n_epochs: the number of times you iterate through the entire dataset when training
z_dim: the dimension of the noise vector
display_step: how often to display/visualize the images
batch_size: the number of images per forward/backward pass
lr: the learning rate
beta_1, beta_2: the momentum term
device: the device type

criterion = nn.BCEWithLogitsLoss()
z_dim = 64
display_step = 500
batch_size = 128
# A learning rate of 0.0002 works well on DCGAN
lr = 0.0002

# These parameters control the optimizer's momentum, which you can read more about here:
# https://distill.pub/2017/momentum/ but you don’t need to worry about it for this course!
beta_1 = 0.5 
beta_2 = 0.999
device = 'cuda'

# You can tranform the image values to be between -1 and 1 (the range of the tanh activation)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

dataloader = DataLoader(
    MNIST('.', download=False, transform=transform),
    batch_size=batch_size,
    shuffle=True)

Then, you can initialize your generator, discriminator, and optimizers.

gen = Generator(z_dim).to(device)
gen_opt = torch.optim.Adam(gen.parameters(), lr=lr, betas=(beta_1, beta_2))
disc = Discriminator().to(device) 
disc_opt = torch.optim.Adam(disc.parameters(), lr=lr, betas=(beta_1, beta_2))

# You initialize the weights to the normal distribution
# with mean 0 and standard deviation 0.02
def weights_init(m):
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    if isinstance(m, nn.BatchNorm2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
        torch.nn.init.constant_(m.bias, 0)
gen = gen.apply(weights_init)
disc = disc.apply(weights_init)

Finally, you can train your GAN! For each epoch, you will process the entire dataset in batches. For every batch, you will update the discriminator and generator. Then, you can see DCGAN’s results!

Here’s roughly the progression you should be expecting. On GPU this takes about 30 seconds per thousand steps. On CPU, this can take about 8 hours per thousand steps. You might notice that in the image of Step 5000, the generator is disproprotionately producing things that look like ones. If the discriminator didn’t learn to detect this imbalance quickly enough, then the generator could just produce more ones. As a result, it may have ended up tricking the discriminator so well that there would be no more improvement, known as mode collapse: MNIST Digits Progression

n_epochs = 50
cur_step = 0
mean_generator_loss = 0
mean_discriminator_loss = 0
for epoch in range(n_epochs):
    # Dataloader returns the batches
    for real, _ in tqdm(dataloader):
        cur_batch_size = len(real)
        real = real.to(device)

        ## Update discriminator ##
        disc_opt.zero_grad()
        fake_noise = get_noise(cur_batch_size, z_dim, device=device)
        fake = gen(fake_noise)
        disc_fake_pred = disc(fake.detach())
        disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
        disc_real_pred = disc(real)
        disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
        disc_loss = (disc_fake_loss + disc_real_loss) / 2

        # Keep track of the average discriminator loss
        mean_discriminator_loss += disc_loss.item() / display_step
        # Update gradients
        disc_loss.backward(retain_graph=True)
        # Update optimizer
        disc_opt.step()

        ## Update generator ##
        gen_opt.zero_grad()
        fake_noise_2 = get_noise(cur_batch_size, z_dim, device=device)
        fake_2 = gen(fake_noise_2)
        disc_fake_pred = disc(fake_2)
        gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
        gen_loss.backward()
        gen_opt.step()

        # Keep track of the average generator loss
        mean_generator_loss += gen_loss.item() / display_step

        ## Visualization code ##
        if cur_step % display_step == 0 and cur_step > 0:
            print(f"Epoch {epoch}, step {cur_step}: Generator loss: {mean_generator_loss}, discriminator loss: {mean_discriminator_loss}")
            show_tensor_images(fake)
            show_tensor_images(real)
            mean_generator_loss = 0
            mean_discriminator_loss = 0
        cur_step += 1

Epoch 15, step 7500: Generator loss: 0.7426057936549177, discriminator loss: 0.68354387485981

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))






HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 17, step 8000: Generator loss: 0.7370726807713508, discriminator loss: 0.6845435484647745

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 18, step 8500: Generator loss: 0.7324315364360812, discriminator loss: 0.6876152647733689

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 19, step 9000: Generator loss: 0.7303068262338642, discriminator loss: 0.6900968568325042

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 20, step 9500: Generator loss: 0.726132873713971, discriminator loss: 0.6905966656208038

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 21, step 10000: Generator loss: 0.72223328757286, discriminator loss: 0.6911074345111841

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 22, step 10500: Generator loss: 0.7235446470379824, discriminator loss: 0.6927497286796566

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 23, step 11000: Generator loss: 0.7199815992712977, discriminator loss: 0.6929034396409992

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 24, step 11500: Generator loss: 0.7160641835331916, discriminator loss: 0.6977822486162186

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 25, step 12000: Generator loss: 0.7114899269342421, discriminator loss: 0.6967020390033714

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 26, step 12500: Generator loss: 0.7128051674962034, discriminator loss: 0.695447943329811

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 27, step 13000: Generator loss: 0.7135791792273521, discriminator loss: 0.6946035917997354

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 28, step 13500: Generator loss: 0.7135360001325612, discriminator loss: 0.6953468122482303

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 29, step 14000: Generator loss: 0.7078500325679774, discriminator loss: 0.6962512389421459

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 30, step 14500: Generator loss: 0.7084810738563546, discriminator loss: 0.6968788719177247

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 31, step 15000: Generator loss: 0.7087363355159758, discriminator loss: 0.6960488370656966

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))






HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 33, step 15500: Generator loss: 0.7080996999144555, discriminator loss: 0.6972092391252517

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 34, step 16000: Generator loss: 0.7029097426533701, discriminator loss: 0.6978738325834267

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 35, step 16500: Generator loss: 0.7055498111248015, discriminator loss: 0.6973779677152641

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 36, step 17000: Generator loss: 0.7040870054960251, discriminator loss: 0.6967932003736488

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 37, step 17500: Generator loss: 0.7035507672429089, discriminator loss: 0.6979363918304441

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 38, step 18000: Generator loss: 0.7009520027637484, discriminator loss: 0.6969191544055939

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 39, step 18500: Generator loss: 0.7014745416641242, discriminator loss: 0.69747545671463

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 40, step 19000: Generator loss: 0.7009152213335038, discriminator loss: 0.6969891947507859

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 41, step 19500: Generator loss: 0.7008637129068376, discriminator loss: 0.6968104602098467

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 42, step 20000: Generator loss: 0.700638920426369, discriminator loss: 0.6977266001701352

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 43, step 20500: Generator loss: 0.6996113978624351, discriminator loss: 0.6960013154745098

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 44, step 21000: Generator loss: 0.6997355434894567, discriminator loss: 0.6955853649377824

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 45, step 21500: Generator loss: 0.700611741423606, discriminator loss: 0.6962998285293576

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 46, step 22000: Generator loss: 0.6978706670999527, discriminator loss: 0.6955201606750482

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 47, step 22500: Generator loss: 0.6993815381526947, discriminator loss: 0.6956112550497053

png

HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))






HBox(children=(FloatProgress(value=0.0, max=469.0), HTML(value='')))


Epoch 49, step 23000: Generator loss: 0.6990457600355149, discriminator loss: 0.6951236095428471

png