19 events
when toggle format what by license comment
Nov 30, 2016 at 15:20 history edited viceriel CC BY-SA 3.0
added 239 characters in body
Nov 29, 2016 at 16:12 history undeleted Matt
Nov 29, 2016 at 15:41 history edited user1228 CC BY-SA 3.0
You most likely had this answer deleted because you implied it should have been a comment. But this is definitely worth an answer. I'm editing to fix layout and remove that comment (fyi, two spaces at the end of a line == newline). Also flagging to undelete
Nov 28, 2016 at 17:44 history deleted Martijn Pieters via Vote
Nov 28, 2016 at 14:32 review Low quality answers (completed Nov 28, 2016 at 15:22)
Nov 28, 2016 at 13:48 comment added viceriel I don't know. But if you want to test your BP algorithm, try it on the XOR problem. The net parameters can be 2-2-1.
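The XOR sanity check suggested above can be sketched in a few lines. This is a minimal illustration, not the asker's code: a 2-2-1 net (2 inputs, 2 hidden sigmoid units, 1 sigmoid output) trained with plain batch backprop on squared error. The layer sizes come from the comment; the learning rate, seed, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # the four XOR inputs
y = np.array([[0.], [1.], [1.], [0.]])                   # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 2 hidden -> 1 output
W1 = rng.normal(0.0, 1.0, (2, 2)); b1 = np.zeros(2)
W2 = rng.normal(0.0, 1.0, (2, 1)); b2 = np.zeros(1)

lr = 1.0
losses = []
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)               # forward: hidden layer
    out = sigmoid(h @ W2 + b2)             # forward: output layer
    losses.append(float(np.mean((out - y) ** 2)))
    d_out = (out - y) * out * (1 - out)    # backprop through output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)     # backprop through hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print("loss: %.4f -> %.4f" % (losses[0], losses[-1]))
```

A correct backprop implementation should drive the loss down steadily here; if it stalls immediately on a problem this small, the gradient computation itself is suspect (note that a 2-2-1 net can occasionally land in a local minimum, so a rerun with a different seed is a fair retry).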
Nov 28, 2016 at 13:36 comment added harry lakins It takes 30 secs to train one epoch with that many! How come, in the tutorial I linked, they get 98% success within minutes using 784-15-10?
Nov 28, 2016 at 13:32 comment added viceriel OK, once again: the net trained with the better approach uses a total of 4,000 hidden units for this classification, while your net uses 80. Yes, more neurons means slower training :/
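A back-of-the-envelope calculation clarifies the speed discussion: the cost of one epoch in a fully connected net is roughly proportional to the number of weights. The sketch below compares the tutorial's 784-15-10 net against a wider two-hidden-layer net; the 500-unit layer sizes are illustrative, taken from the "500 - first hidden" figure mentioned in the comments.

```python
def n_weights(sizes):
    """Parameter count of a fully connected net: a weight for every
    connection between adjacent layers, plus one bias per non-input unit."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

small = n_weights([784, 15, 10])        # the tutorial's net
wide = n_weights([784, 500, 500, 10])   # a wider two-hidden-layer net (illustrative)
print(small, wide, round(wide / small, 1))
```

The wider net has roughly 50x the parameters, so a 50x-slower epoch is expected, not a bug.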
Nov 28, 2016 at 13:26 comment added harry lakins Having two hidden layers of 40 neurons each results in an error of 2.24999 that never improves. It is also dramatically slower (I have tried many different learning rates). I'm pretty sure there is something wrong with my actual algorithm.
Nov 28, 2016 at 12:14 history edited viceriel CC BY-SA 3.0
added 59 characters in body
Nov 28, 2016 at 12:13 comment added harry lakins Will do, and will get back to you. How many neurons on each do you suggest?
Nov 28, 2016 at 12:12 comment added viceriel Yes, use two hidden layers and many more hidden neurons.
Nov 28, 2016 at 12:12 history edited viceriel CC BY-SA 3.0
added 254 characters in body
Nov 28, 2016 at 12:10 comment added harry lakins Are you saying you do not recommend that structure?
Nov 28, 2016 at 12:09 comment added viceriel A deep net solving the same task has these parameters: layer 1 (input) is 784 units, layer 2 (first hidden) is 500.
Nov 28, 2016 at 12:09 comment added viceriel And maybe it's this: you are trying to classify handwritten characters with 784 input units, 15 hidden, 10 output.
Nov 28, 2016 at 11:47 comment added viceriel In short: the vanishing gradient problem. You compute the gradient on the output units, then on the hidden-layer units, then on the next hidden layer, and so on... until the error signal is nearly zero and the first layers aren't learning anything.
Nov 28, 2016 at 11:38 comment added harry lakins I have already tried -1,1, which gave even worse results. What do you mean about the hidden layer count and the ability of backprop? Edit your answer if you can't comment :)
Nov 28, 2016 at 11:35 history answered viceriel CC BY-SA 3.0