
Background

I have created a neural network that can have n inputs, n hidden layers of n length, and n outputs. When using it for handwriting recognition with the Kaggle dataset (a 76 MB text file of 28x28 matrices of 0-255 values for handwritten digits), the results show that somewhere, something must be wrong. In this case, I am using 784 inputs (one per pixel of the 28x28 image), 1 hidden layer of 15 neurons, and an output layer of 10 neurons.

Output guesses are a vector like this: [0,0,0,1,0,0,0,0,0,0], which would mean it's guessing a 3. This is based on http://neuralnetworksanddeeplearning.com/chap1.html#a_simple_network_to_classify_handwritten_digits (same principles and setup).

I am assuming my problem is somewhere within the backpropagation. Because my program has a completely flexible network size in all dimensions (number of layers, length of each layer, etc.), my backpropagation algorithm is quite complex, and it is based on the chain rule explained here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ Essentially, the total error is differentiated with respect to each weight of the output layer, and for hidden layers, the sum of the weight changes already computed in the layers closer to the output is used.
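For reference, this is roughly the shape of the update I am describing, written as a minimal numpy sketch (not my actual code, which is structured differently; the name TinyNet is just for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class TinyNet:
        """Fully connected net with arbitrary layer sizes, sigmoid units,
        squared-error loss, trained one example at a time."""

        def __init__(self, sizes, seed=0):
            rng = np.random.default_rng(seed)
            # one weight matrix / bias vector per layer transition
            self.weights = [rng.uniform(-1, 1, (n_out, n_in))
                            for n_in, n_out in zip(sizes[:-1], sizes[1:])]
            self.biases = [np.zeros(n_out) for n_out in sizes[1:]]

        def forward(self, x):
            # return the activation of every layer, input included
            acts = [np.asarray(x, dtype=float)]
            for W, b in zip(self.weights, self.biases):
                acts.append(sigmoid(W @ acts[-1] + b))
            return acts

        def backprop(self, x, target, lr=0.5):
            # one gradient-descent step on a single example; returns e_total
            acts = self.forward(x)
            target = np.asarray(target, dtype=float)
            delta = (acts[-1] - target) * acts[-1] * (1 - acts[-1])  # output-layer delta
            for layer in reversed(range(len(self.weights))):
                grad_W = np.outer(delta, acts[layer])
                grad_b = delta
                if layer > 0:
                    # delta for the layer below, using the weights *before* the update
                    delta = (self.weights[layer].T @ delta) * acts[layer] * (1 - acts[layer])
                self.weights[layer] -= lr * grad_W
                self.biases[layer] -= lr * grad_b
            return 0.5 * np.sum((acts[-1] - target) ** 2)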

When using a learning rate of 0.5, e_total starts at 2.252, gets to 0.4462 within a minute, and within 5 minutes gets no lower than 0.2.

This makes me think something must be working. But when I output the desired outputs and the output guesses, they rarely match, even after 5 minutes of iteration/learning. I would hope to see results like this:

output layer: [0.05226,0.0262,0.03262,0.0002, 0.1352, 0.99935, 0.00, etc]
output desired: [0,0,0,0,0,1,0, etc]

(all < 0.1 except the correct guess, which should be > 0.9)

But instead I get things like:

output layer: [0.15826,0.0262,0.33262,0.0002, 0.1352, 0.0635, 0.00, etc]
output desired: [0,1,0,0,0,0,0, etc] 

(all < 0.1, so no clear classification, let alone an accurate one.)

I even added a line of code to output 'correct' when the guess and desired values match. Even though, as I said, the e_total decreases, 'correct' only happened about 1 in 10 times, which is no better than random!
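(The 'correct' check is essentially comparing which output unit fired strongest against the position of the 1 in the one-hot target, something like this illustrative snippet:)

    import numpy as np

    def is_correct(output, desired):
        # a guess counts as correct when the largest output unit is in the
        # same position as the 1 in the one-hot desired vector
        return int(np.argmax(output) == np.argmax(desired))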

I have tried different hidden layer lengths and all sorts of different learning rates, but no good.

I've given more information in the comments, which may help.

UPDATE:

As recommended, I have used my system to try to learn the XOR function, with 2 inputs, 1 hidden layer of 2 neurons, and 1 output. This means the desired_list is now a single-element array, either [1] or [0]. Output values seem to be random, > 0.5 and < 0.7, with no clear relation to the desired output. Just to confirm, I have manually tested my feed forward and backprop many times, and they definitely work as explained in the tutorials I've linked.
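The XOR test itself looks roughly like this (again just a sketch, reusing the TinyNet class from the sketch above, not my actual code):

    import numpy as np

    xor_data = [
        (np.array([0.0, 0.0]), np.array([0.0])),
        (np.array([0.0, 1.0]), np.array([1.0])),
        (np.array([1.0, 0.0]), np.array([1.0])),
        (np.array([1.0, 1.0]), np.array([0.0])),
    ]

    net = TinyNet([2, 2, 1])            # 2 inputs, 1 hidden layer of 2 neurons, 1 output
    for epoch in range(100000):
        for x, t in xor_data:
            net.backprop(x, t, lr=0.5)

    for x, t in xor_data:
        print(x, net.forward(x)[-1], "desired:", t)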

  • When I was debugging my neural net code, I simply built some simple examples and performed the computations by hand, then compared them with the results given by the code. Tedious, but it worked ;)
    – BlackBear
    Commented Nov 26, 2016 at 13:58
  • debugging a neural network is a nightmare! I have tried doing exactly that, and as far as I can tell, it does work. So I worry I am doing something conceptually wrong. Commented Nov 26, 2016 at 14:00
  • Good luck getting an answer. I'll check back and see what kind of feedback you get. Commented Nov 26, 2016 at 14:02
  • thank you - I'll be very impressed if someone manages to answer this one! Commented Nov 26, 2016 at 14:03
  • did you try modifying the learning rate? I wouldn't be surprised if 0.5 is way too high, and would expect something like 0.01 or 0.001 to work better Commented Nov 26, 2016 at 18:45

2 Answers


You used one hidden layer in this example. Error backpropagation is capable of correctly learning one or two hidden layers. In the comments you say the weights are initialized from the interval 0-1. Once I tried recognizing paper from a picture and got miserable results; I had weight initialization from the interval 0-1. When I changed it to -1 to 1, the results were excellent.
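A minimal sketch of what I mean, assuming numpy and a 784-15 first layer (the exact shapes are just an example):

    import numpy as np

    n_inputs, n_hidden = 784, 15
    rng = np.random.default_rng()

    # uniform init in [-1, 1]; weights that are all positive (0-1) make it
    # much harder for hidden units to learn different features
    W1 = rng.uniform(-1.0, 1.0, size=(n_hidden, n_inputs))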

OK. Your parameters:
784 15 10

Parameters of a deep network doing the same task:
784 500 500 2000 10

Error backpropagation is capable of doing this task. Try using two hidden layers with more neurons.

For example, something like this:
784 2000 1500 10 :)

Also, you should normalize the input from 0-255 to 0-1.
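For example (a sketch, assuming one row of the Kaggle CSV has already been parsed into raw pixel values):

    import numpy as np

    raw_row = np.array([0, 37, 255, 128])          # e.g. a few pixel values from one CSV row
    inputs = raw_row.astype(np.float64) / 255.0    # now every input lies in [0, 1]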

Update: XOR training takes a long time. Please try 100,000 epochs. If the results are still bad, something is wrong in the BP implementation. Please initialize the weights from -1 to 1 for the XOR problem, and the hidden and output units must have a bias.
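By "must have a bias" I mean that each unit's pre-activation should include a bias term, roughly like this sketch:

    import numpy as np

    def unit_output(weights, bias, inputs):
        # z = w . x + b, then squash; without b the unit's decision boundary
        # is forced through the origin, which makes XOR much harder to learn
        z = np.dot(weights, inputs) + bias
        return 1.0 / (1.0 + np.exp(-z))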

  • I have already tried -1,1, which gave even worse results. What do you mean about the hidden layer count and the ability of backprop? Edit your answer if you can't comment :) Commented Nov 28, 2016 at 11:38
  • In short, the vanishing gradient problem. You compute the gradient on the output units, then on the hidden layer units, then on the next hidden layer's units, and so on... and then you have zero error signal and the first layers aren't learning anything.
    – viceriel
    Commented Nov 28, 2016 at 11:47
  • And maybe this: you are trying to classify handwritten characters with 784 input units, 15 hidden, 10 output.
    – viceriel
    Commented Nov 28, 2016 at 12:09
  • A deep net solving the same task has parameters: 1. 784 inputs 2. 500 in the first hidden layer
    – viceriel
    Commented Nov 28, 2016 at 12:09
  • Are you saying you do not recommend that structure? Commented Nov 28, 2016 at 12:10

You don't need to reinvent the wheel... You can use the pybrain module, which provides optimized supervised-learning features like back-propagation, R-Prop, etc. (and also has unsupervised learning, reinforcement learning, and black-box optimization algorithms).
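From memory of the PyBrain documentation, the usual workflow looks roughly like this (shown on XOR just to illustrate the API; adapt the dataset and layer sizes to your digits problem):

    from pybrain.datasets import SupervisedDataSet
    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.supervised.trainers import BackpropTrainer

    # dataset: 2 inputs, 1 target per sample
    ds = SupervisedDataSet(2, 1)
    ds.addSample((0, 0), (0,))
    ds.addSample((0, 1), (1,))
    ds.addSample((1, 0), (1,))
    ds.addSample((1, 1), (0,))

    # 2-3-1 network with bias units
    net = buildNetwork(2, 3, 1, bias=True)

    trainer = BackpropTrainer(net, ds, learningrate=0.1)
    for _ in range(1000):
        trainer.train()            # one pass over the dataset, returns the error

    print(net.activate((0, 1)))    # should approach 1 after training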

You may find here an example of how to use the pybrain module to do OCR with a 10×9 input array (just adapt it to your 28x28 needs).

If you definitely want to reinvent the wheel... you can do some introspection of the pybrain source code (because the backprop version in pybrain works) in order to explain/double-check why your version of the code is not working.

As NN debugging is a difficult task, you may also publish more code and share any resources related to your code...

Regards
