I am training a deep autoencoder that maps human faces to a 128-dimensional latent space and then decodes them back to their original 128x128x3 format.
After training, I was hoping to 'slice' off the second half of the autoencoder, i.e. the decoder network that maps the latent space (128,) to the image space (128, 128, 3), using the Keras functional API and autoenc_model.get_layer().
Here are the relevant layers of my model:
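from keras.layers import (Input, Conv2D, BatchNormalization, MaxPooling2D, Flatten,
                          Dropout, Dense, Reshape, UpSampling2D)
from keras.models import Model
INPUT_SHAPE = (128, 128, 3)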
input_img = Input(shape=INPUT_SHAPE, name='enc_input')
#1
x = Conv2D(64, (3, 3), padding='same', activation='relu')(input_img)
x = BatchNormalization()(x)
# ... many more Conv2D, BatchNormalization and MaxPooling2D layers ...
#Flatten
fc_input = Flatten(name='enc_output')(x)
y = Dropout(DROP_RATE)(fc_input)
y = Dense(128, activation='relu')(y)
y = Dropout(DROP_RATE)(y)
fc_output = Dense(128, activation='linear')(y)
#Reshape
decoder_input = Reshape((8, 8, 2), name='decoder_input')(fc_output)
#Decoder part
#UnPooling-1
z = UpSampling2D()(decoder_input)
# ... many more Conv2D, BatchNormalization and UpSampling2D layers ...
#16
decoder_output = Conv2D(3, (3, 3), padding='same', activation='linear', name='decoder_output')(z)
autoenc_model = Model(input_img, decoder_output)
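To experiment with slicing without the full notebook, a scaled-down stand-in for the architecture above can help. This is a hypothetical miniature, not the original model: the filter counts, the number of blocks (four poolings take 128 down to 8, four upsamplings take 8 back up to 128), `DROP_RATE = 0.2`, and the name `enc_fc_output` on the last Dense layer are all assumptions.

```python
import numpy as np
from tensorflow import keras

layers = keras.layers
INPUT_SHAPE = (128, 128, 3)
DROP_RATE = 0.2  # assumed value; the original does not state it


def build_autoencoder():
    """Hypothetical miniature of the question's autoencoder."""
    input_img = keras.Input(shape=INPUT_SHAPE, name="enc_input")
    x = input_img
    # Encoder: 4 conv/BN/pool blocks, 128 -> 64 -> 32 -> 16 -> 8 spatially.
    for _ in range(4):
        x = layers.Conv2D(8, (3, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D()(x)
    fc_input = layers.Flatten(name="enc_output")(x)
    y = layers.Dropout(DROP_RATE)(fc_input)
    y = layers.Dense(128, activation="relu")(y)
    y = layers.Dropout(DROP_RATE)(y)
    # Named here (assumption) so the encoder output can be looked up later.
    fc_output = layers.Dense(128, activation="linear", name="enc_fc_output")(y)
    # Decoder: the 128 latent values are reshaped to 8x8x2, then upsampled to 128x128.
    z = layers.Reshape((8, 8, 2), name="decoder_input")(fc_output)
    for _ in range(4):
        z = layers.UpSampling2D()(z)
        z = layers.Conv2D(8, (3, 3), padding="same", activation="relu")(z)
        z = layers.BatchNormalization()(z)
    decoder_output = layers.Conv2D(3, (3, 3), padding="same", activation="linear",
                                   name="decoder_output")(z)
    return keras.Model(input_img, decoder_output)


autoenc_model = build_autoencoder()
print(tuple(autoenc_model.get_layer("decoder_input").input.shape))    # (None, 128)
print(tuple(autoenc_model.get_layer("decoder_output").output.shape))  # (None, 128, 128, 3)

dummy = np.zeros((1, 128, 128, 3), dtype="float32")
print(autoenc_model.predict(dummy, verbose=0).shape)  # (1, 128, 128, 3)
```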
Here is the notebook containing the entire model architecture.
To extract the decoder network from the trained autoencoder, I have tried:
dec_model = Model(inputs=autoenc_model.get_layer('decoder_input').input, outputs=autoenc_model.get_layer('decoder_output').output)
and
dec_model = Model(autoenc_model.get_layer('decoder_input'), autoenc_model.get_layer('decoder_output'))
Neither of these works.
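For reference, `get_layer(name)` returns the layer object itself, while `keras.Model(inputs=..., outputs=...)` is built from tensors, so the second attempt passes the wrong kind of object. A layer's `.input` and `.output` attributes are the symbolic tensors it was wired to when the model was built, and a sub-model that starts at the model's real `Input` can be cut this way. A minimal sketch on a hypothetical toy model (the names `toy_in`, `hidden` and `toy_out` are illustrative only):

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy model, only to illustrate get_layer() and its attributes.
inp = keras.Input(shape=(4,), name="toy_in")
h = keras.layers.Dense(3, name="hidden")(inp)
out = keras.layers.Dense(2, name="toy_out")(h)
toy = keras.Model(inp, out)

layer = toy.get_layer("hidden")
# get_layer() returns a Layer object, not a tensor.
print(isinstance(layer, keras.layers.Layer))  # True
# .input / .output are symbolic tensors inside the toy model's graph.
print(tuple(layer.input.shape), tuple(layer.output.shape))  # (None, 4) (None, 3)

# Cutting from the model's original Input up to an inner tensor works.
sub = keras.Model(toy.input, toy.get_layer("hidden").output)
print(sub.predict(np.zeros((1, 4), dtype="float32"), verbose=0).shape)  # (1, 3)
```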
I need to extract the decoder layers from the autoencoder because I want to train the entire autoencoder first and then use the encoder and the decoder independently.
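One common workaround, sketched here under the assumption that everything from `decoder_input` to `decoder_output` is a simple chain of single-input layers (as in the listing above): create a fresh `Input(shape=(128,))` and call the already-trained decoder layers on it in order. Calling a layer again reuses its weights, so the new decoder shares them with the autoencoder. The encoder can be cut directly, because it starts at the model's real `Input`. The miniature model, the helper `extract_decoder`, and the layer name `enc_fc_output` are hypothetical stand-ins, not the notebook's code.

```python
import numpy as np
from tensorflow import keras

layers = keras.layers


def build_autoencoder(drop_rate=0.2):
    """Hypothetical miniature of the question's model (not the original)."""
    inp = keras.Input(shape=(128, 128, 3), name="enc_input")
    x = inp
    for _ in range(4):  # 128 -> 8 spatially
        x = layers.Conv2D(8, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D()(x)
    y = layers.Flatten(name="enc_output")(x)
    y = layers.Dropout(drop_rate)(y)
    y = layers.Dense(128, activation="relu")(y)
    y = layers.Dropout(drop_rate)(y)
    y = layers.Dense(128, activation="linear", name="enc_fc_output")(y)
    z = layers.Reshape((8, 8, 2), name="decoder_input")(y)
    for _ in range(4):  # 8 -> 128 spatially
        z = layers.UpSampling2D()(z)
        z = layers.Conv2D(8, 3, padding="same", activation="relu")(z)
        z = layers.BatchNormalization()(z)
    out = layers.Conv2D(3, 3, padding="same", activation="linear",
                        name="decoder_output")(z)
    return keras.Model(inp, out)


def extract_decoder(model, first="decoder_input", last="decoder_output"):
    """Replay the (assumed sequential) layers first..last on a new Input."""
    names = [layer.name for layer in model.layers]
    start, stop = names.index(first), names.index(last)
    latent = keras.Input(shape=(128,), name="latent_input")
    x = latent
    for layer in model.layers[start:stop + 1]:
        x = layer(x)  # reuses the trained weights of each layer
    return keras.Model(latent, x, name="decoder")


autoenc_model = build_autoencoder()
# ... compile and train autoenc_model here ...
encoder = keras.Model(autoenc_model.input,
                      autoenc_model.get_layer("enc_fc_output").output)
decoder = extract_decoder(autoenc_model)

images = np.random.rand(2, 128, 128, 3).astype("float32")
codes = encoder.predict(images, verbose=0)           # latent vectors, shape (2, 128)
reconstructions = decoder.predict(codes, verbose=0)  # images, shape (2, 128, 128, 3)
```

If the decoder contains branches or skip connections, this layer-by-layer replay does not apply as-is; building the encoder and decoder as separate `Model`s from the start and composing them (`Model(inp, decoder(encoder(inp)))`) is the usual alternative.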
I could not find a satisfactory answer anywhere else. The Keras blog article on building autoencoders only covers how to extract the decoder of a two-layer autoencoder.
The decoder's input and output shapes should be (128,) and (128, 128, 3), which are the input shape of the 'decoder_input' layer and the output shape of the 'decoder_output' layer, respectively.