I am doing Twitter sentiment classification. For that I am using an LSTM with pretrained 50-dimensional GloVe word embeddings (not training them as of now; I might in the future).
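Roughly, the model setup looks like the sketch below (simplified; `vocab_size`, `lstm_units`, and the zero-initialized weight matrix are placeholders standing in for my real values, and I'm assuming a binary sentiment target):

```python
import numpy as np
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential

vocab_size = 10000                              # placeholder vocabulary size
lstm_units = 64                                 # placeholder
embedding_matrix = np.zeros((vocab_size, 50))   # filled from the GloVe 50d file in practice

model = Sequential([
    Embedding(input_dim=vocab_size,
              output_dim=50,                    # 50d GloVe vectors
              weights=[embedding_matrix],       # pretrained embeddings
              trainable=False,                  # frozen for now
              mask_zero=True),                  # padding zeros are masked
    LSTM(lstm_units),
    Dense(1, activation='sigmoid'),             # binary sentiment output
])
```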
The tweets have variable lengths, ranging from 1 to 250 tokens. I therefore sorted the tweets by length and divided them into batches of roughly similar lengths, then zero-padded each batch with maxLen equal to the longest tweet in that particular batch; in the embedding layer I have set mask_zero=True (as in the sketch above).
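The bucketing/padding step looks roughly like this (continuing the sketch above; `tokenized_tweets`, `labels`, and `batch_size` are placeholder names with dummy values):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenized_tweets = [[5, 12, 7], [3, 9], [4, 8, 2, 6, 1]]  # placeholder token ids
labels = [1, 0, 1]                                        # placeholder labels
batch_size = 2

# Sort tweets (and their labels) by length so each batch holds similar lengths.
order = sorted(range(len(tokenized_tweets)), key=lambda i: len(tokenized_tweets[i]))
tweets_sorted = [tokenized_tweets[i] for i in order]
labels_sorted = [labels[i] for i in order]

batches = []
for start in range(0, len(tweets_sorted), batch_size):
    chunk = tweets_sorted[start:start + batch_size]
    y = np.array(labels_sorted[start:start + batch_size])
    # Zero-pad only up to the longest tweet in this particular batch.
    x = pad_sequences(chunk, maxlen=max(len(t) for t in chunk), padding='post')
    batches.append((x, y))
```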
I am using the Adam optimizer with default values, and fit_generator to train the model.
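Continuing the sketch, training looks roughly like this (the generator just cycles through the pre-built batches; the epoch count is a placeholder):

```python
def batch_generator(batches):
    # Yield the pre-padded batches indefinitely, in sorted order.
    while True:
        for x, y in batches:
            yield x, y

model.compile(optimizer='adam',                 # Adam with default settings
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit_generator(batch_generator(batches),
                    steps_per_epoch=len(batches),
                    epochs=10)                  # placeholder epoch count
```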
The first issue I observed: in each epoch the accuracy initially increases, climbing to about 80%, but around halfway through the training set it starts decreasing and falls to 74-point-something. This happens in every epoch.
To address this I shuffled the order of the batches before feeding them to the model, as I thought the model might be adjusting itself to the shorter tweets (which were being fed first) and hence not generalizing to the longer ones.
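Concretely, the only change was reshuffling the batch order each epoch, something like:

```python
import random

def shuffled_batch_generator(batches):
    # Same batches as before; only their order changes each epoch,
    # so every batch still contains tweets of similar length.
    while True:
        random.shuffle(batches)
        for x, y in batches:
            yield x, y
```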
After doing this, the accuracy fluctuates randomly during the epoch and again lands at 74-something by the end of each epoch.
I don't know why I am getting stuck at what seems to be a local minimum. Any help is much appreciated. :)
I can attach the actual code if required.