Background
I have created a neural network that supports n inputs, n hidden layers of n neurons each, and n outputs. When using it for handwriting recognition with the Kaggle dataset (a 76 MB text file of 28x28 matrices of 0-255 values for handwritten digits), the results suggest that something, somewhere, must be wrong. In this case, I am using 784 inputs (one per pixel of the 28x28 image), 1 hidden layer of 15 neurons, and an output layer of 10 neurons.
Output guesses are a vector like this: [0,0,0,1,0,0,0,0,0,0] - which would mean it's guessing a 3. This is based on http://neuralnetworksanddeeplearning.com/chap1.html#a_simple_network_to_classify_handwritten_digits (same principles and setup).
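For reference, this is the kind of encoding I mean - a quick sketch with made-up helper names (`one_hot`, `predicted_digit`), not my actual code:

```python
def one_hot(digit, n_classes=10):
    """Return a one-hot target vector, e.g. one_hot(3) -> [0,0,0,1,0,0,0,0,0,0]."""
    v = [0] * n_classes
    v[digit] = 1
    return v

def predicted_digit(output_layer):
    """The network's guess is the index of the largest activation."""
    return max(range(len(output_layer)), key=lambda i: output_layer[i])

print(one_hot(3))                                                # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(predicted_digit([0.05, 0.2, 0.9, 0.1, 0, 0, 0, 0, 0, 0]))  # 2
```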
I am assuming my problem is somewhere within the back propagation - and because my program has a completely flexible network size in all dimensions (number of layers, length of each layer, etc.), my algorithm for back propagating is quite complex. It is based on the chain rule explained here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ - essentially, the derivative of the total error is calculated with respect to each output-layer weight, and for hidden layers, the weighted sum of the deltas from the layer after them is used.
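To make that concrete, these are the chain-rule deltas as I understand them from that tutorial, for a sigmoid network trained on squared error E = 1/2 * sum((target - out)^2). The function names are my own illustration, not my actual code:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output_delta(out, target):
    # dE/dnet for an output neuron: (out - target) * out * (1 - out)
    return (out - target) * out * (1.0 - out)

def hidden_delta(act, next_deltas, next_weights):
    # A hidden neuron sums the deltas of the layer after it, weighted by
    # the connecting weights, then multiplies by its own sigmoid derivative.
    return sum(d * w for d, w in zip(next_deltas, next_weights)) * act * (1.0 - act)
```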
When using a learning rate of 0.5, e_total starts at 2.252, gets to 0.4462 within a minute, and within 5 minutes gets no lower than 0.2.
This makes me think something must be working. But when I output the desired outputs and the output guesses, they rarely match, even after 5 minutes of iteration/learning. I would hope to see results like this:
output layer: [0.05226,0.0262,0.03262,0.0002, 0.1352, 0.99935, 0.00, etc]
output desired: [0,0,0,0,0,1,0, etc]
(all < 0.1 except the correct guess value should be > 0.9)
but instead i get things like
output layer: [0.15826,0.0262,0.33262,0.0002, 0.1352, 0.0635, 0.00, etc]
output desired: [0,1,0,0,0,0,0, etc]
(nothing is anywhere near 0.9, so there is no clear classification, let alone an accurate one.)
I even added a line of code to output 'correct' when the guess and the desired value match - and even though, as I said, e_total decreases, 'correct' only appeared about 1 time in 10 - which is no better than random!
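To be clear about what that check does, this is roughly the logic I mean (a sketch with an invented `is_correct` helper, not my actual line of code): the guess counts as correct when the largest output sits at the same index as the 1 in the desired vector.

```python
def is_correct(output_layer, desired):
    """True when the index of the largest output matches the index of the 1
    in the desired one-hot vector (not when the raw values are equal)."""
    guess = max(range(len(output_layer)), key=lambda i: output_layer[i])
    label = desired.index(1)
    return guess == label

print(is_correct([0.15, 0.02, 0.33, 0.0, 0.13, 0.06, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # False: argmax is 2, label is 1
print(is_correct([0.1, 0.9, 0.05], [0, 1, 0]))     # True
```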
I have tried different hidden layer lengths and all sorts of different learning rates - but no good.
I've given more information in the comments, which may help.
UPDATE:
As recommended, I have used my system to try to learn the XOR function - with 2 inputs, 1 hidden layer of 2 neurons, and 1 output. This means the desired_list is now a single-element array, either [1] or [0]. Output values seem to be random, > 0.5 and < 0.7, with no clear relation to the desired output. Just to confirm: I have manually tested my feed-forward and back prop many times, and they definitely work as explained in the tutorials I've linked.
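For comparison, here is a tiny self-contained XOR network of the same 2-2-1 shape (sigmoid activations, squared error, plain gradient descent), written from scratch as a reference point - this is only my sketch of the setup described above, not my actual code, and XOR with only 2 hidden neurons can still get stuck in a local minimum for some random initialisations:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(epochs=20000, lr=0.5, seed=1):
    """Train a 2-2-1 sigmoid network on XOR; returns (start_error, end_error, outputs)."""
    random.seed(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # hidden weights
    b_h = [random.uniform(-1, 1) for _ in range(2)]                      # hidden biases
    w_o = [random.uniform(-1, 1) for _ in range(2)]                      # output weights
    b_o = random.uniform(-1, 1)                                          # output bias

    def forward(x):
        h = [sigmoid(w_h[j][0] * x[0] + w_h[j][1] * x[1] + b_h[j]) for j in range(2)]
        o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + b_o)
        return h, o

    def total_error():
        return sum(0.5 * (t - forward(x)[1]) ** 2 for x, t in data)

    start = total_error()
    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x)
            # chain-rule deltas for squared error with sigmoid activations
            d_o = (o - t) * o * (1 - o)
            d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
            for j in range(2):
                w_o[j] -= lr * d_o * h[j]
                b_h[j] -= lr * d_h[j]
                for i in range(2):
                    w_h[j][i] -= lr * d_h[j] * x[i]
            b_o -= lr * d_o
    return start, total_error(), [round(forward(x)[1], 3) for x, _ in data]

start, end, outputs = train_xor()
print(start, end, outputs)
```

If this converges where a given implementation does not, the difference is worth diffing step by step against that implementation's forward and backward passes.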