What should be the LSTM model architecture in order to forecast disease probability?

I am working on disease (sepsis) forecasting using deep learning (an LSTM). The sepsis data is EHR time-series data, where the target variable is SepsisLabel: 0 represents no sepsis and 1 represents sepsis. Each patient's data is converted to a fixed-length tensor. I want to build an LSTM model that takes these tensors, trains on them, and forecasts the sepsis probability. The threshold is 0.5: patients with probability > 0.5 are classified as sepsis and patients with probability < 0.5 as no sepsis.

The data is in the form of fixed-length tensors (sample not shown).
The model architecture:

# imports (assuming TensorFlow/Keras)
import numpy as np
from tensorflow.keras.layers import Input, Masking, LSTM, BatchNormalization, Dense, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

# construct inputs
x = Input((None, x_train.shape[-1]), name='input')
mask = Masking(mask_value=0.0, name='input_masked')(x)  # mask post-padded zero timesteps

# stack LSTMs
lstm_kwargs = {'dropout': 0.20, 'recurrent_dropout': 0.1, 'return_sequences': True, 'implementation': 2}
lstm1 = LSTM(200, name='lstm1', **lstm_kwargs)(mask)
lstm2 = LSTM(200, name='lstm2', **lstm_kwargs)(lstm1)
lstm3 = LSTM(200, name='lstm3', **lstm_kwargs)(lstm2)

btch = BatchNormalization()(lstm3)

dns = Dense(50, name='dense')(btch)  # note: no activation specified, so this is a linear projection

# output: sigmoid layer
output = TimeDistributed(Dense(1, activation='sigmoid'), name='output')(dns)
model = Model(inputs=x, outputs=output)

# compile model
optimizer = RMSprop(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='binary_crossentropy')

# uniform sample weights (currently a no-op)
sw = np.ones(shape=(len(y_train),))

history = model.fit(x_train, y_train, sample_weight=sw, batch_size=128, epochs=500, verbose=1)

What model architecture should I use? What optimizer and loss function should I use? I am considering the architecture above but am unsure about the choice of loss function and optimizer. The model trained with the current architecture gives AUROC = 0.75. How can I achieve a higher AUROC?
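For context, a minimal sketch of how I understand AUROC and the 0.5 threshold interact for per-timestep predictions (the toy arrays below are hypothetical stand-ins for the output of `model.predict` on a held-out set; AUROC itself is computed with scikit-learn's `roc_auc_score` and does not depend on the threshold):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# toy per-timestep probabilities for 2 patients x 3 timesteps
# (stand-in for model.predict(x_val), shape (patients, timesteps, 1))
y_prob = np.array([[[0.1], [0.2], [0.9]],
                   [[0.8], [0.3], [0.7]]])
y_true = np.array([[[0], [0], [1]],
                   [[1], [0], [1]]])

# AUROC is threshold-free: it measures how well probabilities rank
# positive timesteps above negative ones
auroc = roc_auc_score(y_true.ravel(), y_prob.ravel())

# the 0.5 threshold only matters for the final class decision
y_pred = (y_prob.ravel() > 0.5).astype(int)
```

Here every sepsis timestep is ranked above every non-sepsis timestep, so the toy AUROC is 1.0; improving AUROC means improving that ranking, independent of where the decision threshold sits.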

Need suggestions.

Hi, @huzaifa_arshad!

There are always a couple of settings you can play with to improve the metrics:

  1. First of all, figure out where you are failing. Is your model underfitting or overfitting? Are your train metrics higher than, or approximately the same as, your test metrics? If underfitting, try a more complex model, say, with more layers, more neurons, etc.
  2. Binary cross-entropy seems just right for this binary classification task. Regarding the optimizer, I’ve always had good performance with Adam, so it’s my first choice, but RMSprop could work well too.
  3. Transformers have been doing really well with time series. They were introduced for NLP but it might be worth giving them a try.
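For points 1 and 2, here is a minimal sketch (assuming TensorFlow/Keras; the toy data shapes, the tiny LSTM, and the 20% validation split are arbitrary choices for illustration) of compiling with Adam and using a validation split so you can compare train vs. validation curves and diagnose over/underfitting:

```python
import numpy as np
from tensorflow.keras.layers import Input, Masking, LSTM, TimeDistributed, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import AUC

# toy padded sequences: 20 patients, 10 timesteps, 4 features
x_train = np.random.rand(20, 10, 4)
y_train = np.random.randint(0, 2, size=(20, 10, 1)).astype('float32')

inp = Input((None, x_train.shape[-1]))
h = Masking(0.0)(inp)                               # skip zero-padded timesteps
h = LSTM(16, return_sequences=True)(h)
out = TimeDistributed(Dense(1, activation='sigmoid'))(h)
model = Model(inp, out)

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=[AUC(name='auc')])

# validation_split holds out the last 20% of samples; comparing
# history.history['auc'] vs. history.history['val_auc'] shows whether
# the model is under- or overfitting
history = model.fit(x_train, y_train, validation_split=0.2,
                    batch_size=8, epochs=2, verbose=0)
```

If train metrics keep improving while validation metrics stall or degrade, you are overfitting (regularize, or get more data); if both stay poor, you are underfitting (add capacity).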

Thanks. I will try the Adam optimizer. Regarding Transformers, I have heard of them but never used them; I will explore them. Thanks 🙂