
I am doing Twitter sentiment classification. For that I am using an LSTM with pretrained 50-dimensional GloVe word embeddings (kept frozen for now; I might fine-tune them in the future).

The tweets have variable lengths, ranging from 1 to 250 tokens. I therefore sorted the tweets by length and divided them into batches of roughly similar length, then zero-padded each batch up to the length of the longest tweet in that particular batch. In the embedding layer I have set 'mask_zero = True'.
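For reference, a minimal Keras sketch of this setup might look like the following (the vocabulary size, LSTM width, and the way the embedding matrix is built are illustrative assumptions, not taken from my actual code):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    vocab_size = 20000                             # assumed vocabulary size
    embedding_matrix = np.zeros((vocab_size, 50))  # in practice filled row by row
                                                   # from the 50d GloVe file

    model = Sequential([
        Embedding(input_dim=vocab_size,
                  output_dim=50,
                  weights=[embedding_matrix],
                  trainable=False,      # embeddings frozen for now
                  mask_zero=True),      # index 0 is reserved for padding
        LSTM(64),
        Dense(1, activation='sigmoid')  # binary sentiment output
    ])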

I am using the Adam optimizer with default values and fit_generator to train the model.
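Continuing from the model sketch above, the bucketed batch generator and the training call might look roughly like this (the generator name, batch size, and the dummy data are illustrative stand-ins for my actual tokenized tweets):

    import numpy as np

    def batch_generator(sequences, labels, batch_size=32):
        # sort by length so each batch holds tweets of similar length
        order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
        batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
        while True:
            for idx in batches:
                # zero-pad each batch only up to its own longest tweet
                max_len = max(len(sequences[i]) for i in idx)
                x = np.zeros((len(idx), max_len), dtype='int32')
                for row, i in enumerate(idx):
                    x[row, :len(sequences[i])] = sequences[i]
                y = np.array([labels[i] for i in idx], dtype='float32')
                yield x, y

    # dummy stand-ins for the tokenized tweets and their labels
    train_sequences = [[1, 5, 2], [3, 4, 7, 7, 1], [2, 9], [6, 1, 3, 8]]
    train_labels = [1, 0, 1, 0]

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    steps = int(np.ceil(len(train_sequences) / 32))
    # fit_generator is deprecated in newer Keras; model.fit accepts a generator directly
    model.fit_generator(batch_generator(train_sequences, train_labels, batch_size=32),
                        steps_per_epoch=steps, epochs=10)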

The first issue I observed was that within each epoch the accuracy initially increases and reaches about 80%, but around halfway through the training set it starts decreasing and falls to about 74%. This happens in every epoch.

To address this I shuffled the batches before feeding them to the model, since I thought the model might be adapting to the shorter tweets (which were being fed first) and therefore not generalizing to the longer ones.

After doing this, the accuracy fluctuates randomly and again ends up at around 74% at the end of each epoch.

I don't know why I am getting stuck in a local minimum. Any help is much appreciated. :)

I will attach some code if required...


2 Answers


First of all, this is a binary classification problem (positive sentiment / negative sentiment), correct? And the dataset is roughly balanced? What were you trying to achieve by sorting them by length? Why are you using a generator and not just running fit on the dataset?

Don't worry about local minima. What you're looking at is a model/optimizer combination that converges to around 74% on this dataset. The one being fed sorted data has a higher accuracy at the start of the epoch because short tweets are easier to classify than long ones. And I wouldn't even worry about the batch-by-batch accuracy, just the epoch-by-epoch curve.
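For example, padding the whole dataset once to a single length and calling fit directly could look like this (a sketch; train_sequences / train_labels stand for the tokenized tweets and labels, and model is the compiled model from the question):

    import numpy as np
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # pad every tweet to the length of the longest one; with mask_zero=True in the
    # Embedding layer the LSTM skips the padded timesteps anyway
    x_train = pad_sequences(train_sequences, padding='post')
    y_train = np.array(train_labels, dtype='float32')

    model.fit(x_train, y_train,
              batch_size=32,
              epochs=10,
              shuffle=True,           # per-example shuffling is handled for you
              validation_split=0.1)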

  • Yes, the dataset is roughly balanced (65%, 35%). I am sorting the data by tweet length to minimize the padding in each batch; otherwise the batches may contain lots of zeros. I am using fit_generator so that I can feed batches with different numbers of time steps (I don't know if that can be done with fit; if yes, please tell me how). What worries me is that every epoch ends with the same accuracy, which could mean my model is not learning at all.
    – boredaf
    Commented Sep 21, 2019 at 7:12
  • Update: I tried without sorting the tweets, but the accuracy is still the same for each epoch. I also checked the weights of the LSTM layer; they are identical after every epoch...
    – boredaf
    Commented Sep 21, 2019 at 8:31

The score you see updating is the running score after each batch, and at the end of the epoch it settles at 0.74. Don't read too much into the per-batch scores: depending on how the data is shuffled, one batch may contain examples your network handles well (so the score goes up), the next may contain examples it handles poorly (so the score goes down), and so on. Also keep in mind that accuracy is a "human" metric; the real indicator of the network's performance on this dataset is the loss value. The loss can keep decreasing even after 1000 epochs while the metric has been stuck since, say, epoch 20 and never changes (this is especially common with accuracy).
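To see this, it is enough to compare the recorded loss and accuracy per epoch from the training history (a sketch; train_gen and steps are placeholders for your generator and step count):

    history = model.fit_generator(train_gen, steps_per_epoch=steps, epochs=20)

    # the loss may keep improving while the accuracy looks stuck
    for epoch, (loss, acc) in enumerate(zip(history.history['loss'],
                                            history.history['accuracy']),  # 'acc' in older Keras
                                        start=1):
        print(f"epoch {epoch:2d}  loss={loss:.4f}  accuracy={acc:.4f}")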

If the accuracy really has bottomed out, maybe your data is not stratified, or some other fundamental improvement is needed?
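A stratified split is easy to set up with scikit-learn, for example (a sketch; texts and labels are placeholder names for your raw tweets and sentiment labels):

    from sklearn.model_selection import train_test_split

    # keep the positive/negative ratio the same in the train and validation sets
    train_texts, val_texts, train_labels, val_labels = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42)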

