I created a binary classification model with TensorFlow. My data is time series, in 100-day windows with 4 features per day, so my input has shape (221000, 100, 4). I am using a model with Conv1D and LSTM layers, and I have tried both the Adam and RMSprop optimizers. My data is imbalanced, but not too badly: the training set has 200,000 negative and 21,000 positive examples, and the test set has 50,000 negative and 8,500 positive examples.
After one or two epochs, the model starts assigning everything to the negative class.
I experimented with class weights of 0.2 for the negative class and 1.5 for the positive class.
Any suggestions on things I should look at to generalize the model better?
Thank you
Hey @Ozgur_Yilmazel,
Welcome to the community. Using class weights is indeed a nice approach. Other simple approaches you could look at are under-sampling and over-sampling. I have only applied these techniques to image and text data, which have no temporal aspect, but I am sure a quick Google search will turn up easy ways to apply them to time-series data as well.
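On the class-weight side, here is a minimal sketch of one common heuristic, which weights each class by total / (number of classes * class count). The function name `balanced_class_weights` is my own, and this is only one reasonable starting point, not the only way to pick weights. For reference, your 0.2 / 1.5 weights give a ratio of 7.5, while the class counts are in a ratio of roughly 9.5.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count).

    `counts` maps a class label to its number of training examples.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Training-set counts from the question: 200,000 negative, 21,000 positive.
class_weight = balanced_class_weights({0: 200_000, 1: 21_000})
print(class_weight)

# A dict like this can be passed to Keras, for example:
#   model.fit(X_train, y_train, class_weight=class_weight)
# (model, X_train and y_train are placeholders for your own objects.)
```

With this heuristic the positive class ends up weighted about 9.5 times more heavily than the negative class, so each class contributes roughly equally to the total loss.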
As far as under-sampling is concerned, one of the easiest methods is to randomly sample from the majority class. In your case, you can keep all 21,000 positive examples and randomly pick 21,000 negative examples to form a balanced training set. I hope this helps.
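A rough sketch of that random under-sampling with NumPy could look like the following. The helper name `undersample_majority` and the small stand-in arrays are made up for illustration, and the sketch assumes labels are 0 (negative) and 1 (positive) with the negative class being the larger one. Each 100-day window is kept or dropped as a whole, so the order of days inside a window is untouched.

```python
import numpy as np


def undersample_majority(X, y, seed=0):
    """Keep all positive windows and an equal number of random negatives."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # Draw negatives without replacement so no window is duplicated.
    keep_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    keep = np.concatenate([pos_idx, keep_neg])
    rng.shuffle(keep)  # mix the classes so batches are not single-label
    return X[keep], y[keep]


# Small stand-in for the real (221000, 100, 4) training array.
X = np.zeros((2210, 100, 4), dtype=np.float32)
y = np.array([0] * 2000 + [1] * 210)

X_bal, y_bal = undersample_majority(X, y)
print(X_bal.shape, int(y_bal.sum()))
```

I would apply this only to the training set and leave the test set at its original class ratio, so the evaluation still reflects the real distribution.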
Regards,
Elemento