
Week 2 Assignment: CIFAR-10 Autoencoder

For this week, you will create a convolutional autoencoder for the CIFAR-10 dataset. You are free to choose the architecture of your autoencoder, provided that the output image has the same dimensions as the input image.

After training, your model should meet the loss and accuracy requirements when evaluated on the test dataset. You will then download the model and upload it to the classroom for grading.

Let’s begin!

Important: This colab notebook has read-only access so you won’t be able to save your changes. If you want to save your work periodically, please click File -> Save a Copy in Drive to create a copy in your account, then work from there.

Imports

# Install packages for compatibility with the autograder
!pip install tensorflow==2.8.0 --quiet
!pip install keras==2.8.0 --quiet
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
import tensorflow_datasets as tfds

from keras.models import Sequential
Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.

Load and prepare the dataset

The CIFAR-10 dataset already has train and test splits, and you can use those in this exercise. The general steps are outlined in the code comments below:

# preprocessing function
def map_image(image, label):
  image = tf.cast(image, dtype=tf.float32)
  image = image / 255.0

  return image, image # the label is discarded; an autoencoder's target is the input image itself

# parameters
BATCH_SIZE = 128
SHUFFLE_BUFFER_SIZE = 1024


### START CODE HERE (Replace instances of `None` with your code) ###

# use tfds.load() to fetch the 'train' split of CIFAR-10
train_dataset = tfds.load("cifar10", as_supervised=True, split="train")

# preprocess the dataset with the `map_image()` function above
train_dataset = train_dataset.map(map_image)

# shuffle and batch the dataset
train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)


# use tfds.load() to fetch the 'test' split of CIFAR-10
test_dataset = tfds.load("cifar10", as_supervised=True, split="test")

# preprocess the dataset with the `map_image()` function above
test_dataset = test_dataset.map(map_image)

# batch the dataset
test_dataset = test_dataset.batch(BATCH_SIZE)

### END CODE HERE ###
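
As an optional sanity check (not part of the graded code), you can take one batch and confirm that both the inputs and the targets are normalized 32x32x3 images:

# optional sanity check: both elements of a batch should be (BATCH_SIZE, 32, 32, 3)
for images, targets in train_dataset.take(1):
  print(images.shape, targets.shape)
  print(float(tf.reduce_min(images)), float(tf.reduce_max(images))) # values should lie in [0, 1]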

Build the Model

Create the autoencoder model. As shown in the lectures, you will want to downsample the image in the encoder layers, then upsample it in the decoder path. Note that the output layer should have the same dimensions as the original image. Your input images will have the shape (32, 32, 3). If you deviate from this, your model may not be recognized by the grader and may fail.

We included a few hints for the Sequential API below, but feel free to remove them and use the Functional API just like in the ungraded labs if you're more comfortable with it. Another reason to use the latter is if you want to visualize the encoder output: as shown in the ungraded labs, it is easier to define multiple outputs with the Functional API. That is not required for this assignment, though, so you can just stack layers sequentially if you want a simpler solution.
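
For illustration, here is a minimal Functional API sketch of that idea. The layer sizes and the names (sketch_inputs, encoder, autoencoder) are just for this example, not something the grader expects:

# minimal sketch only: expose the bottleneck through a second model so it can be visualized
sketch_inputs = tf.keras.layers.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(64, (3,3), activation='relu', padding='same')(sketch_inputs)
x = tf.keras.layers.MaxPooling2D((2,2))(x)
sketch_bottleneck = tf.keras.layers.Conv2D(128, (3,3), activation='relu', padding='same')(x)
x = tf.keras.layers.UpSampling2D((2,2))(sketch_bottleneck)
sketch_outputs = tf.keras.layers.Conv2D(3, (3,3), activation='sigmoid', padding='same')(x)

autoencoder = tf.keras.Model(sketch_inputs, sketch_outputs) # the model you would train and grade
encoder = tf.keras.Model(sketch_inputs, sketch_bottleneck)  # shares weights; use it to inspect encodings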

# suggested layers to use. feel free to add or remove as you see fit.
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D

# use the Sequential API (you can remove if you want to use the Functional API)
# model = Sequential()

### START CODE HERE ###
# use `model.add()` to add layers (if using the Sequential API)

# Encoder
inputs = tf.keras.layers.Input(shape=(32, 32, 3))

conv_1 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')(inputs)
max_pool_1 = MaxPooling2D(pool_size=(2,2))(conv_1)

conv_2 = Conv2D(filters=128, kernel_size=(3,3), activation='relu', padding='same')(max_pool_1)
max_pool_2 = MaxPooling2D(pool_size=(2,2))(conv_2)

# Bottleneck
bottleneck = Conv2D(filters=256, kernel_size=(3,3), activation='sigmoid', padding='same')(max_pool_2)

# Decoder
conv_3 = Conv2D(filters=128, kernel_size=(3,3), activation='relu', padding='same')(bottleneck)
up_1 = UpSampling2D(size=(2,2))(conv_3)

conv_4 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')(up_1)
up_2 = UpSampling2D(size=(2,2))(conv_4)

autoencoder_output = Conv2D(filters=3, kernel_size=(3,3), activation='sigmoid', padding='same')(up_2)

model = tf.keras.Model(inputs=inputs, outputs=autoencoder_output)
### END CODE HERE ###

model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv2d (Conv2D)             (None, 32, 32, 64)        1792      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 64)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 16, 16, 128)       73856     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 8, 8, 128)        0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 8, 8, 256)         295168    
                                                                 
 conv2d_3 (Conv2D)           (None, 8, 8, 128)         295040    
                                                                 
 up_sampling2d (UpSampling2D  (None, 16, 16, 128)      0         
 )                                                               
                                                                 
 conv2d_4 (Conv2D)           (None, 16, 16, 64)        73792     
                                                                 
 up_sampling2d_1 (UpSampling  (None, 32, 32, 64)       0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 32, 32, 3)         1731      
                                                                 
=================================================================
Total params: 741,379
Trainable params: 741,379
Non-trainable params: 0
_________________________________________________________________
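
Since the grader checks that the reconstruction matches the input dimensions, an optional programmatic check can catch shape mistakes early:

# optional: fail fast if the decoder does not restore the (32, 32, 3) input shape
assert model.output_shape == (None, 32, 32, 3), model.output_shape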

Configure training parameters

We have already provided the optimizer, metrics, and loss in the code below. Mean squared error is a natural fit here because the model reconstructs pixel values in [0, 1], matching the sigmoid output layer.

# Please do not change the model.compile() parameters
model.compile(optimizer='adam', metrics=['accuracy'], loss='mean_squared_error')

Training

You can now use model.fit() to train your model. You will pass in the train_dataset and you are free to configure the other parameters. As with any training, you should see the loss generally going down and the accuracy going up with each epoch. If not, please revisit the previous sections to find possible bugs.

Note: If you get a "dataset length is infinite" error, check how you defined train_dataset. You might have included a method that repeats the dataset indefinitely.
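
As a sketch of both the problem and the fix (assuming the pipeline defined earlier):

# problematic: .repeat() with no argument makes the dataset infinitely long
# train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).repeat()

# fix: drop .repeat(), or tell fit() how many batches make up one epoch
# (the CIFAR-10 train split has 50,000 images):
# model.fit(train_dataset, epochs=50, steps_per_epoch=50000 // BATCH_SIZE)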

# parameters (feel free to change these)
train_steps = 100 # not used below; pass steps_per_epoch=train_steps to fit() to shorten each epoch
val_steps = 50 # number of test batches to run per validation pass

### START CODE HERE ###
model.fit(train_dataset,
          validation_data=test_dataset,
          validation_steps=val_steps,
          epochs=50)
### END CODE HERE ###
Epoch 1/50
391/391 [==============================] - 43s 71ms/step - loss: 0.0153 - accuracy: 0.5625 - val_loss: 0.0065 - val_accuracy: 0.7189
Epoch 2/50
391/391 [==============================] - 12s 30ms/step - loss: 0.0058 - accuracy: 0.7490 - val_loss: 0.0050 - val_accuracy: 0.7557
Epoch 3/50
391/391 [==============================] - 12s 31ms/step - loss: 0.0048 - accuracy: 0.7721 - val_loss: 0.0043 - val_accuracy: 0.7954
Epoch 4/50
391/391 [==============================] - 12s 30ms/step - loss: 0.0041 - accuracy: 0.7893 - val_loss: 0.0038 - val_accuracy: 0.8054
Epoch 5/50
391/391 [==============================] - 12s 32ms/step - loss: 0.0037 - accuracy: 0.7988 - val_loss: 0.0035 - val_accuracy: 0.8173
Epoch 6/50
391/391 [==============================] - 12s 31ms/step - loss: 0.0034 - accuracy: 0.8034 - val_loss: 0.0032 - val_accuracy: 0.7947
Epoch 7/50
172/391 [============>.................] - ETA: 5s - loss: 0.0032 - accuracy: 0.8051


(Training interrupted manually with a KeyboardInterrupt during epoch 7.)

Model evaluation

You can use this code to test your model locally before uploading it to the grader. To pass, your model needs to meet the grader's loss and accuracy thresholds:

result = model.evaluate(test_dataset, steps=10)
10/10 [==============================] - 0s 20ms/step - loss: 0.0031 - accuracy: 0.8158
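
model.evaluate() returns the metrics in the same order they were set in model.compile(), so you can unpack and print them before submitting:

# result is [loss, accuracy], matching the order in model.compile()
loss, accuracy = result
print(f"loss: {loss:.4f}, accuracy: {accuracy:.4f}")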

If you did some visualization like in the ungraded labs, then you might see something like the gallery below. This part is not required.

import numpy as np
import matplotlib.pyplot as plt
def display_one_row(disp_images, offset, shape=(32, 32, 3)):
  '''Display sample outputs in one row.'''
  for idx, test_image in enumerate(disp_images):
    plt.subplot(3, 10, offset + idx + 1)
    plt.xticks([])
    plt.yticks([])
    test_image = np.reshape(test_image, shape)
    plt.imshow(test_image, cmap='gray')


def display_results(disp_input_images, disp_encoded, disp_predicted, enc_shape=(8,4)):
  '''Displays the input, encoded, and decoded output values.'''
  plt.figure(figsize=(15, 5))
  display_one_row(disp_input_images, 0, shape=(32,32,3))
  display_one_row(disp_encoded, 10, shape=enc_shape)
  display_one_row(disp_predicted, 20, shape=(32,32,3))
# take 1 batch of the dataset
test_dataset = test_dataset.take(1)

# grab the input images from the batch as a numpy array
output_samples = []
for input_image, image in tfds.as_numpy(test_dataset):
  output_samples = input_image

# pick 10 indices
idxs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# prepare test samples as a batch of 10 images
conv_output_samples = np.array(output_samples[idxs])
conv_output_samples = np.reshape(conv_output_samples, (10, 32, 32, 3))

# get the "encoded" row to display. note: model.predict() returns the full
# autoencoder output; to show the real bottleneck, build a separate encoder
# model with the Functional API as sketched earlier
encoded = model.predict(conv_output_samples)

# get the reconstructed images
predicted = model.predict(conv_output_samples)

# display the samples, encodings and decoded values!
display_results(conv_output_samples, encoded, predicted, enc_shape=(32, 32, 3))

[Output image: a three-row gallery of input images, encodings, and decoded reconstructions]

Save your model

Once you are satisfied with the results, you can now save your model. Please download it from the Files window on the left and go back to the Submission portal in Coursera for grading.

model.save('mymodel.h5')
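
If you prefer not to use the Files pane, Colab can also trigger the download from code (Colab-only API):

# optional, Colab-only: download the saved model directly
from google.colab import files
files.download('mymodel.h5')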

Congratulations on completing this week’s assignment!