Course3 Week4. Why were a kernel regularizer and a Bi-LSTM used?

adityahpatel · December 20, 2021, 9:47pm

Optional HW on LSTM and Shakespeare.

(1) A kernel regularizer was used, and I have no problem with that. But how does one know when to use it and when not to? The instructor did not use a kernel regularizer in ANY of the previous lectures, homeworks, or courses in this TF Specialization, so how are we supposed to know that it belongs in this problem but not in that one?

(2) Why a bidirectional LSTM after the Embedding layer and not a one-way LSTM?
(3) Why was the 2nd LSTM layer a one-way LSTM and not bidirectional?

For each of the above choices, what is the intuition behind the choice? I am seeking a richer explanation/intuition beyond an empirical “because it works and gives good accuracy” type of answer. Thanks

maurizioscibilia · December 21, 2021, 12:56pm

Hi @adityahpatel,
“because it works and gives good accuracy” is actually a good reason, both for fine-tuning and for bigger model choices. Here are a couple of pointers on the topics you raise:

A kernel regularizer keeps the weights under control by applying a penalty to the layer’s kernel (its weights), not to the bias. You mainly use it when your network’s training is unstable.
A bidirectional LSTM is something to try when you have the whole sentence at your disposal from the start.
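
To make the two points above concrete, here is a minimal pure-Python sketch, not the course's model. It assumes an L2 penalty (in Keras this would correspond to passing something like `kernel_regularizer=tf.keras.regularizers.l2(0.01)` to a layer) and replaces the LSTM with a toy scalar recurrent cell. The function names, weights, and the `data_loss` value are all hypothetical and chosen only for illustration.

```python
import math

def l2_kernel_penalty(kernel, l2=0.01):
    """L2 penalty on the kernel only; the bias is deliberately not an argument."""
    return l2 * sum(w * w for row in kernel for w in row)

def run_rnn(xs, w_in=0.5, w_rec=0.8):
    """Toy scalar recurrent cell (a stand-in for an LSTM), one state per step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def run_bidirectional(xs):
    """Pair each forward state with the state of a second pass run in reverse."""
    forward = run_rnn(xs)
    backward = run_rnn(xs[::-1])[::-1]  # re-align so index t is the same token
    return list(zip(forward, backward))

# Regularized loss = data loss + penalty on the kernel weights.
kernel = [[0.5, -1.0], [2.0, 0.0]]
data_loss = 1.25  # hypothetical value, for illustration only
total_loss = data_loss + l2_kernel_penalty(kernel, l2=0.01)

# Two sequences that differ only in their LAST token.
a = run_bidirectional([1.0, 2.0, 3.0])
b = run_bidirectional([1.0, 2.0, -3.0])

# The forward state at position 0 is identical, the backward state is not.
print(a[0][0] == b[0][0], a[0][1] == b[0][1])  # prints: True False
```

The penalty grows with the size of the weights, so minimizing the total loss pushes the kernel toward smaller values. The last line shows why the backward direction only helps when the full sequence is available: its state at the first position already depends on the final token.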

Best
