I am training a simple convolutional neural network for regression, where the task is to predict the (x,y) location of a box in an image, e.g.:
The output of the network has two nodes, one for x, and one for y. The rest of the network is a standard convolutional neural network. The loss is a standard mean squared error between the predicted position of the box, and the ground truth position. I am training on 10000 of these images, and validating on 2000.
The problem I am having, is that even after significant training, the loss does not really decrease. After observing the output of the network, I notice that the network tends to output values close to zero, for both output nodes. As such, the prediction of the box's location is always the centre of the image. There is some deviation in the predictions, but always around zero. Below shows the loss:
I have run this for many more epochs than shown in this graph, and the loss still never decreases. Interestingly here, the loss actually increases at one point.
So, it seems that the network is just predicting the average of the training data, rather than learning a good fit. Any ideas on why this may be? I am using Adam as the optimizer, with an initial learning rate of 0.01, and relus as activations
If you are interested in some of my code (Keras), it is below:
# Create the model
model = Sequential()
model.add(Convolution2D(32, 5, 5, border_mode='same', subsample=(2, 2), activation='relu', input_shape=(3, image_width, image_height)))
model.add(Convolution2D(64, 5, 5, border_mode='same', subsample=(2, 2), activation='relu'))
model.add(Convolution2D(128, 5, 5, border_mode='same', subsample=(2, 2), activation='relu'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='linear'))
# Compile the model
adam = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='mean_squared_error', optimizer=adam)
# Fit the model
model.fit(images, targets, batch_size=128, nb_epoch=1000, verbose=1, callbacks=[plot_callback], validation_split=0.2, shuffle=True)