I am working on disease (sepsis) forecasting using deep learning (LSTM). The sepsis data is EHR time-series data, where the target variable is `SepsisLabel`: `0` represents `no-sepsis` and `1` represents `sepsis`. Each patient's data is converted to a fixed-length `tensor`. I want to build an LSTM model that trains on these tensors and forecasts the sepsis probability. The threshold is `0.5`: patients with probability > 0.5 are classified as `sepsis` and patients with probability < 0.5 as `no-sepsis`.
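To make the setup concrete, here is a minimal, dependency-free sketch of the post-padding and the 0.5 thresholding described above. The helper names `pad_patient` and `to_label`, the toy values, and the choice to treat a probability of exactly 0.5 as no-sepsis are assumptions for illustration, not part of my actual pipeline.

```python
from typing import List

PAD_VALUE = 0.0   # assumed padding value, matching post-padding with zeros
THRESHOLD = 0.5   # decision threshold from the description above


def pad_patient(rows: List[List[float]], max_len: int) -> List[List[float]]:
    """Post-pad (or truncate) one patient's rows to exactly max_len rows."""
    n_features = len(rows[0])
    kept = [list(r) for r in rows[:max_len]]
    padding = [[PAD_VALUE] * n_features for _ in range(max_len - len(kept))]
    return kept + padding


def to_label(probability: float) -> int:
    """Map a predicted sepsis probability to 1 (sepsis) or 0 (no-sepsis)."""
    # Assumption: a probability of exactly 0.5 is treated as no-sepsis.
    return 1 if probability > THRESHOLD else 0


# Made-up example: one patient with 3 timesteps and 2 features.
patient = [[36.8, 80.0], [37.9, 95.0], [38.4, 110.0]]
padded = pad_patient(patient, max_len=5)       # two all-zero rows are appended
labels = [to_label(p) for p in (0.12, 0.50, 0.81)]
```

Because the padding value is zero, a real timestep whose features are all exactly zero would be indistinguishable from padding, which is worth keeping in mind when a masking layer is used.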

The data in the form of tensors:

The model architecture:

```
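import numpy as np
from tensorflow.keras.layers import (BatchNormalization, Dense, Input, LSTM,
                                     Masking, TimeDistributed)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

# x_train / y_train are the padded patient tensors and labels, defined elsewhere
# construct inputs
x = Input((None, x_train.shape[-1]), name='input')
# Masking layer because data is post-padded with zeros
mask = Masking(mask_value=0.0, name='input_masked')(x)
# stack LSTMs; return_sequences=True keeps one output per timestep
lstm_kwargs = {'dropout': 0.20, 'recurrent_dropout': 0.1, 'return_sequences': True, 'implementation': 2}
lstm1 = LSTM(200, name='lstm1', **lstm_kwargs)(mask)
lstm2 = LSTM(200, name='lstm2', **lstm_kwargs)(lstm1)
lstm3 = LSTM(200, name='lstm3', **lstm_kwargs)(lstm2)
btch = BatchNormalization()(lstm3)
dns = Dense(50, name='Dense')(btch)  # no activation given, so this layer is linear
# output: sigmoid layer
output = TimeDistributed(Dense(1, activation='sigmoid'), name='output')(dns)
model = Model(inputs=x, outputs=output)
# compile model
optimizer = RMSprop(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='binary_crossentropy')
# uniform per-patient weights (all ones), i.e. no reweighting
sw = np.ones(shape=(len(y_train),))
history = model.fit(x_train, y_train, sample_weight=sw, batch_size=128, epochs=500, verbose=1)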
```

What model architecture should I use? Which optimizer and which loss function should I use? I am considering the architecture above but am unsure about the choice of loss function and optimizer. The model trained on the current architecture gives `AUROC=0.75`. How can I achieve a higher AUROC?
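For reference, here is a rough sketch of how AUROC could be computed over timesteps while skipping the zero-padded ones. It assumes per-timestep evaluation and that each patient's true sequence length is available; the function name `masked_auroc`, the `lengths` argument, and the toy values are hypothetical.

```python
from typing import List, Sequence


def masked_auroc(y_true: Sequence[Sequence[int]],
                 y_prob: Sequence[Sequence[float]],
                 lengths: Sequence[int]) -> float:
    """AUROC over all real (non-padded) timesteps, via pairwise comparison.

    lengths[i] is the number of real timesteps for patient i (an assumed input);
    everything after that index is padding and is skipped.
    """
    pos: List[float] = []
    neg: List[float] = []
    for labels, probs, n in zip(y_true, y_prob, lengths):
        for label, prob in zip(labels[:n], probs[:n]):
            (pos if label == 1 else neg).append(prob)
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative step")
    # Probability that a random positive step outranks a random negative step;
    # ties count as half. O(P*N): fine for a sanity check, slow for large data.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: two patients, four timesteps each, the last step is padding.
y_true = [[0, 0, 1, 0], [0, 1, 1, 0]]
y_prob = [[0.1, 0.4, 0.8, 0.9], [0.3, 0.2, 0.7, 0.6]]
lengths = [3, 3]
score = masked_auroc(y_true, y_prob, lengths)
```

If padded timesteps are included in the metric, they all count as negatives and can shift the AUROC, so the number is only comparable across runs when the same masking is applied.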

Any suggestions would be appreciated.