
I am trying to write my first neural network but have been completely stuck on this problem for over a week now. I am following Andrew Ng's machine learning course and implemented the following functions in Python.

    forwardPropogate()  # does forward propagation
    backwardPropogate() # computes the gradients using backpropagation
    costFunction()      # takes all the parameters of the network as a single rolled-up array and computes its cost
    gradientDescend()   # tries to minimise the cost using gradient descent

When I tried training the network, I found that it was giving me very bad results. When I couldn't figure out what was wrong with the code, I downloaded the MATLAB version of the code and compared it with my own.

To ensure my implementation was correct, I ran the MATLAB code, took the parameters from it, and ran them through my backwardPropogate() and costFunction().

Running backwardPropogate(), I plotted the gradient as given by the MATLAB code and by my own code (one plot per implementation). As you can see, they are very similar. In addition, I have done enough manual inspection of the two outputs to convince me that my backwardPropogate() is implemented correctly. I also did numerical gradient checking, and that also matches up pretty well.
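For reference, the numerical check I did is along these lines (a rough standalone sketch, not my exact code; `costFunction` here stands in for my rolled-up cost function and `numericalGradient` is just an illustrative helper name):

    import numpy as np

    def numericalGradient(costFunction, theta, eps=1e-4):
        # central-difference approximation of the gradient, one parameter at a time
        numGrad = np.zeros_like(theta)
        for i in range(len(theta)):
            perturb = np.zeros_like(theta)
            perturb[i] = eps
            numGrad[i] = (costFunction(theta + perturb) - costFunction(theta - perturb)) / (2 * eps)
        return numGrad

    # compared against the output of backwardPropogate(), e.g.:
    # diff = np.linalg.norm(numGrad - analyticGrad) / np.linalg.norm(numGrad + analyticGrad)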

The cost of the parameters as found by the MATLAB code is J = 0.14942, and Python gives J = 0.149420032652. I am convinced that costFunction() and backwardPropogate() are implemented correctly (should I not be?).

When I run my gradientDescend(), I get this plot of cost values against the number of iterations (J per iteration). This again looks good.

I cannot understand why the code is still giving me bad values. The success rate is only about 10%, even on the training set.

Here is my gradient descent function and the call to it.

    def gradientDescend(self,gradientFunction,thetaValues):

        JValues = np.zeros(MAX_ITER)

        for i in range(0,MAX_ITER):            
            thetaValues = thetaValues - ALPHA * gradientFunction(thetaValues)

            J = self.costFunction(thetaValues)
            JValues[i] = J

            print 100.0 * i / MAX_ITER #show percentage completed

        return thetaValues,JValues

    def train(self):

        thetaValues = (np.random.rand(NoTheta1+NoTheta2,1) * (2 * INIT_EPSILON)) - INIT_EPSILON 

        trainedThetas,JVals = self.gradientDescend(self.getGradients,thetaValues)        
        self.theta1,self.theta2 = self.unrollParameters(thetaValues)

        xaxis = np.arange(0,len(JVals))
        plt.plot(xaxis,JVals)
        plt.show()

        return self.theta1,self.theta2 

Upon further inspection, I found out that the initial random values of the parameters were doing just as badly as my trained ones! Of all things, this is what I understand least. The cost function seems to be decreasing from the start of the loop to the end, so even if the final parameters are not good, they should, at the very least, be doing better than the initial ones. I do not know where to go from here. Any suggestions would be welcome.

  • Where do you use trainedThetas? It looks like unrollParameters() just takes the original random thetaValues. Commented May 9, 2017 at 5:08
  • Wow. That was the problem. I have been killing myself over this. Thank you
    – Ananda
    Commented May 9, 2017 at 5:27
  • Glad I could help! I moved my comment into an actual answer, see below. Please accept it by clicking the check mark if this solved your problem. Commented May 9, 2017 at 5:34

1 Answer


In train(), the output of your gradientDescend() call, trainedThetas, isn't actually used. On the following line, self.unrollParameters(thetaValues) takes the original random vector thetaValues. That's why your final parameters perform no better than the initial random ones, even though the cost decreases during training.

Replace thetaValues with trainedThetas in the call to unrollParameters() and you'll be good to go.
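In other words, something like this in train() (only the second line changes; everything else stays as you have it):

        trainedThetas,JVals = self.gradientDescend(self.getGradients,thetaValues)
        self.theta1,self.theta2 = self.unrollParameters(trainedThetas)  # use the trained parameters, not the initial random ones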
