I am a data analyst trying to improve my knowledge of machine learning.
I've completed a model for a time-series dataset, where each point is one day apart, with no gaps. The specific model I have attempted is a multi-layered, auto-regressive bi-directional LSTM built with TensorFlow's Keras; see the model-specific code below:
from tensorflow import keras
from tensorflow.keras.layers import Bidirectional, LSTM

model = keras.Sequential()
# Stacked bidirectional LSTMs; only the first layer needs input_shape
model.add(Bidirectional(LSTM(units=128, input_shape=(X_train.shape[1], X_train.shape[2]),
                             return_sequences=True)))
model.add(Bidirectional(LSTM(units=64, return_sequences=True)))
model.add(Bidirectional(LSTM(units=32, return_sequences=True)))
model.add(Bidirectional(LSTM(units=16, return_sequences=False)))
model.add(keras.layers.Dense(16))
model.add(keras.layers.Dropout(rate=0.5))
model.add(keras.layers.Dense(1))

model.compile(loss='mean_squared_error', optimizer='Adam')

history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=128,
    validation_split=0.2,
    shuffle=False
)

model.summary()
I've been told by a senior member of staff that this is likely overkill for this specific learning task, but I wanted to include it for full transparency. See the summary below:
Layer (type)                        Output Shape           Param #
=================================================================
bidirectional (Bidirectional)       (None, 50, 256)        133120
bidirectional_1 (Bidirectional)     (None, 50, 128)        164352
bidirectional_2 (Bidirectional)     (None, 50, 64)         41216
bidirectional_3 (Bidirectional)     (None, 32)             10368
dense (Dense)                       (None, 16)             528
dropout (Dropout)                   (None, 16)             0
dense_1 (Dense)                     (None, 1)              17
=================================================================
Total params: 349,601
Trainable params: 349,601
Non-trainable params: 0
_________________________________________________________________
After 100 epochs, the model reports the following loss values (mean squared error):
loss: 0.0040 - val_loss: 0.0050 (Overfit)
The RMSE values are derived with math.sqrt(mean_squared_error(y_train, train_predict)) and math.sqrt(mean_squared_error(y_test, test_predict)), using the mean_squared_error function from sklearn.metrics.
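For completeness, the full evaluation step looks roughly like this (a minimal sketch; the model.predict calls producing train_predict and test_predict are an assumption, as that step is not shown above):

import math
from sklearn.metrics import mean_squared_error

# Predictions for both splits (assumed step, not shown above)
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# RMSE is the square root of the mean squared error
print("Train RMSE:", math.sqrt(mean_squared_error(y_train, train_predict)))
print("Test RMSE:", math.sqrt(mean_squared_error(y_test, test_predict)))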
Train RMSE: 28.795422522129595
Test RMSE: 34.17014386085355
And for a graphical representation:
This finally brings me to my question: how do I better fit my model so that it more closely represents the variation within the data, as I believe this is what is causing the high RMSE values? I have looked into attention mechanisms, in the hope that I might be able to highlight specific peaks and troughs within the data, but it seems these are best suited to image/text prediction models. I could try training over more epochs, but the model is already slightly overfitting, so this would only exacerbate that particular issue.
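If I do experiment with more epochs, I would probably pair it with Keras's EarlyStopping callback so that training halts once val_loss stops improving; a rough sketch (the patience value and epoch cap are arbitrary):

from tensorflow import keras

# Stop when validation loss stops improving, and keep the best weights
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,               # arbitrary; would need tuning
    restore_best_weights=True
)

history = model.fit(
    X_train, y_train,
    epochs=300,                # upper bound; early stopping usually halts sooner
    batch_size=128,
    validation_split=0.2,
    shuffle=False,
    callbacks=[early_stop]
)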
I understand this is a fairly open-ended question, but I have tried my best to "show my working". Thank you in advance.
Hi!
The task sounds like function extrapolation.
Could you describe what inputs/outputs the model uses? Is it some sort of sliding window?
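For example, a sliding window would look something like this, where each input is the previous N days and the target is the following day (a rough sketch; the window length of 50 is an assumption based on the sequence length in your model summary):

import numpy as np

def make_windows(series, window=50):
    """Turn a 1-D series into (samples, window, 1) inputs and next-day targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # previous `window` days
        y.append(series[i + window])     # the following day
    X = np.array(X).reshape(-1, window, 1)
    y = np.array(y)
    return X, y

# Hypothetical usage:
# X_train, y_train = make_windows(train_series)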