
I'm trying to solve a very simple problem (supposedly simple, but it's giving me nightmares).

My data looks like this:

   0.64900194,  2.32144675,  4.36117903,  6.8795263 ,  8.70335759,
   10.52469321, 12.50494439, 14.92118469, 16.31657096, 18.69954666,
   20.653336  , 22.08447934, 24.29878371, 26.01567801, 28.3626067 ,
   30.75065028, 32.81166691, 34.52029737, 36.90956918, 38.55743122

and the corresponding target for the above sequence of data is 40.24253

As you can see, it's a simple LSTM sequence-prediction problem: the input is the past 20 values of a 2's multiplication sequence, and the target is the next number in the sequence plus a small uniform random number (to add a little noise).

Sample input and target shapes are (batch_size, 20, 1) and (batch_size,).

This is the code I'm using for prediction:

import random

import numpy as np
import tensorflow as tf


def univariate_data(dataset, start_index, end_index, history_size, target_size):
    data = []
    labels = []

    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i-history_size, i)
        # Reshape data from (history_size,) to (history_size, 1)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i+target_size])
    return np.array(data), np.array(labels)
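
As a quick sanity check of the windowing (not part of the original code), here is what univariate_data produces on a tiny array:

toy = np.arange(10, dtype=float)  # [0., 1., ..., 9.]
x, y = univariate_data(toy, 0, None, history_size=3, target_size=0)
print(x.shape, y.shape)           # (7, 3, 1) (7,)
print(x[0].ravel(), '->', y[0])   # [0. 1. 2.] -> 3.0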



uni_data = np.array([(i*2)+random.random() for i in range(0,400000)])


TRAIN_SPLIT = 300000

uni_train_mean = uni_data[:TRAIN_SPLIT].mean()
uni_train_std = uni_data[:TRAIN_SPLIT].std()


uni_data = (uni_data-uni_train_mean)/uni_train_std


univariate_past_history = 20
univariate_future_target = 0

x_train_uni, y_train_uni = univariate_data(uni_data, 0, TRAIN_SPLIT,
                                           univariate_past_history,
                                           univariate_future_target)
x_val_uni, y_val_uni = univariate_data(uni_data, TRAIN_SPLIT, None,
                                       univariate_past_history,
                                       univariate_future_target)

print('Single window of past history')
print(x_train_uni.shape)
print('\nTarget to predict')
print(y_train_uni.shape)

BATCH_SIZE = 256
BUFFER_SIZE = 10000

train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()



simple_lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(8, input_shape=x_train_uni.shape[-2:]),
    tf.keras.layers.Dense(1)
])

simple_lstm_model.compile(optimizer='adam', loss='mae')

for x, y in val_univariate.take(1):
    print(simple_lstm_model.predict(x).shape)

EVALUATION_INTERVAL = 200
EPOCHS = 10

simple_lstm_model.fit(train_univariate, epochs=EPOCHS,
                      steps_per_epoch=EVALUATION_INTERVAL,
                      validation_data=val_univariate, validation_steps=50)

Predictions for any given sequence are way off the actual value; any suggestions would help.

Some previous searches suggested normalizing and standardizing; I've tried both. I also tried varying the number of LSTM layers, and tried SimpleRNN and GRU instead. I tried different activation functions ('tanh', 'relu'), and tried using the past 10, 30, and 50 values instead of the past 20. None of it helped. I believe I'm making a very simple mistake; any guidance would help a lot. Thanks and stay safe!!

  • Do you apply inverse scaling to the target when computing predictions on test/validation and evaluating performance? Commented May 6, 2020 at 18:44
  • @MarcoCerliani Yes. Predictions are like: if it's supposed to be 12382, it comes out as 12273. So you can see it's not waaay off the actual value, but it's not quite what I want. Commented May 7, 2020 at 5:30
  • Add the code where you apply inverse scaling to the predictions, thanks. Commented May 7, 2020 at 6:43
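
For reference, the inverse scaling the comments ask about is just the reverse of the standardization applied above; a minimal sketch using the variables already defined in the question:

pred_scaled = simple_lstm_model.predict(x_val_uni[:1])
# Undo (x - mean) / std with pred * std + mean, using the training statistics
pred = pred_scaled * uni_train_std + uni_train_mean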

1 Answer


So I finally figured out the solution.

The problem with the above approach is that the mean and std of my train and validation data were very different. The training set covered values from roughly 0 to 600,000 (the first 300,000 samples), while the validation set covered values from roughly 600,000 to 800,000, a range the model had never seen. On top of that, the standard deviation of the training data is around 173,250, and it's very difficult for any model to predict accurately when trained on data with such a high standard deviation.
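
A quick way to see the mismatch is to compare the statistics of the two slices of the raw (unscaled) data directly; a minimal sketch:

import random
import numpy as np

raw = np.array([(i * 2) + random.random() for i in range(0, 400000)])
print(raw[:300000].mean(), raw[:300000].std())  # ~300000, ~173205 (train)
print(raw[300000:].mean(), raw[300000:].std())  # ~700000, ~57735 (validation)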

The solution is that instead of feeding the raw values into the model, you feed the differences of consecutive elements. For example, instead of feeding the data p = [0, 2, 4, 6, 8, 10, 12], feed the data q = [2, 2, 2, 2, 2, 2], where q is obtained by q[i] = p[i] - p[i-1]. Now if we feed the model the data q, of course the model will predict 2, as it has only ever seen inputs of 2, and we can just add that to the last actual value to obtain the result.
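
In code, the idea looks roughly like this sketch (q.mean() just stands in for whatever the trained model would predict):

import numpy as np

p = np.array([0., 2., 4., 6., 8., 10., 12.])
q = np.diff(p)                       # [2. 2. 2. 2. 2. 2.] -- consecutive differences
predicted_diff = q.mean()            # stand-in for the model's predicted next difference
next_value = p[-1] + predicted_diff  # add back the last actual value
print(next_value)                    # 14.0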

So the basic problems with the model are the high standard deviation of the training data and the unseen value range in the test set, and the solution is to feed it the differences of consecutive values.

But another question is how to do this if we want to predict the next element of 2**x, i.e. an exponential sequence. Differencing helps less here: the differences of an exponential are themselves exponential (2**x - 2**(x-1) = 2**(x-1)), so even given data of type q the model will eventually see values with a very high mean and std that it was never trained on, as the sketch below shows.
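
To see why, apply the same differencing to powers of 2; the differenced series is itself exponential:

import numpy as np

p = 2.0 ** np.arange(10)  # [1., 2., 4., ..., 512.]
q = np.diff(p)            # [1., 2., 4., ..., 256.] -- still doubling
print(q)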

Lastly, I read somewhere that an LSTM isn't meant for extrapolating to regions of the input space it hasn't been exposed to; there are other models for extrapolating data, but LSTM isn't one of them.
