Course 3 Week 4: Why were a kernel regularizer and a Bi-LSTM used?

Optional HW on LSTM and Shakespeare.

(1) A kernel regularizer was used, and that's fine; I have no problem with using it. But how does one know when to use it and when not to? For example, the instructor did not use a kernel regularizer in ANY of the previous lectures, homeworks, or courses in this TF Specialization, so how are we supposed to magically know that it is needed for this problem but not for others?

(2) Why a bidirectional LSTM after the Embedding layer, and not a 1-way LSTM?
(3) Why is the 2nd LSTM layer a 1-way LSTM and not bidirectional?

For each of the above choices, what is the intuition behind it? I am seeking a richer explanation/intuition beyond an empirical "because it works and gives good accuracy" type of answer. Thanks


Hi @adityahpatel,
"Because it works and gives good accuracy" is actually a legitimate reason, both for fine-tuning and for bigger architectural choices. That said, here are a couple of pointers on the topics you're asking about:

  1. A kernel regularizer keeps the weights under control by applying a penalty to the layer's kernel (the weights), not the bias. You mainly reach for it when your network shows unstable training or starts to overfit (see the sketch after this list).

  2. A bidirectional LSTM is something to try when you have the whole sentence at your disposal from the start, since the layer can then read it left-to-right and right-to-left and give each timestep context from both sides (also in the sketch below).
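
To make this concrete, here is a minimal Keras sketch of where both choices sit in a model of this shape. The sizes (`vocab_size`, `embedding_dim`, `max_len`, the layer widths, and the `l2(0.01)` strength) are placeholders, not the assignment's actual values, so adjust them to your own tokenizer and corpus:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Hypothetical sizes -- replace with values from your own tokenizer / corpus
vocab_size = 1000
embedding_dim = 64
max_len = 20

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embedding_dim),
    # The full input sequence is available up front, so a bidirectional LSTM
    # can read it left-to-right and right-to-left, giving every timestep
    # context from both directions.
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # The second LSTM only has to summarize features that are already
    # bidirectional, so a cheaper 1-way LSTM is a common choice here.
    layers.LSTM(32),
    # kernel_regularizer adds an L2 penalty on this layer's weights (not the
    # bias) to the loss, discouraging large weights when training is unstable
    # or the model overfits.
    layers.Dense(vocab_size, activation='softmax',
                 kernel_regularizer=regularizers.l2(0.01)),
])

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.summary()
```

If training looks stable and validation accuracy tracks training accuracy, you can usually drop the `kernel_regularizer` entirely; it is one of the knobs you add when you observe a problem, not something every layer needs.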

Best
