Optional HW on LSTM and Shakespeare.
(1) A kernel regularizer was used, and that’s OK; I have no problem using it. But how does one know when to use it and when not to? The instructor did not use a kernel regularizer in any of the previous lectures, homeworks, or courses in this TF Specialization, so how are we supposed to know that we should use it for this problem but not for another?
(2) Why is there a bidirectional LSTM after the Embedding layer and not a one-way LSTM?
(3) Why is the 2nd LSTM layer a one-way LSTM and not a bidirectional one?
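For reference, the layer stack I am asking about looks roughly like the sketch below. This is only my reconstruction: the vocabulary size, sequence length, unit counts, L2 strength, and the placement of the regularizer on the hidden Dense layer are placeholder assumptions, not the assignment's exact values.

```python
import tensorflow as tf

# All sizes below are assumed placeholders, not the assignment's values.
VOCAB_SIZE = 1000   # number of distinct tokens (assumption)
EMBED_DIM = 64      # embedding width (assumption)
SEQ_LEN = 10        # padded input length (assumption)


def build_model():
    """Embedding -> bidirectional LSTM -> one-way LSTM -> Dense layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(SEQ_LEN,)),
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        # Question (2): first recurrent layer reads the sequence in both directions.
        # return_sequences=True so the next LSTM receives one vector per time step.
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(32, return_sequences=True)
        ),
        # Question (3): second recurrent layer is one-way and returns only its last state.
        tf.keras.layers.LSTM(16),
        # Question (1): L2 penalty on this layer's weights (placement is an assumption).
        tf.keras.layers.Dense(
            VOCAB_SIZE // 2,
            activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(0.01),
        ),
        # Next-word prediction over the vocabulary.
        tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
    ])


model = build_model()
model.summary()
```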
For each of the above choices, what is the intuition behind it? I am looking for a richer explanation than an empirical “because it works and gives good accuracy” type of answer. Thanks.