W3A2 Trigger_word_detection_v2a

What is the intuition behind the model (figure 3) used in this assignment? Specifically, why are two GRU layers used instead of one?

Hi @Fusseldieb

This question probably belongs in another course (there is no trigger word detection in the NLP Specialization), but I can try to answer it because the answer is the same for all RNNs (the application does not matter).

In general, you stack RNNs (GRUs in your case) when you want to extract more complexity from the input patterns (in other words, the inputs are complex and you want to capture that). A single layer might not be enough to capture the complexity of the sequence.

The sequence information is the reason for another GRU rather than a Linear (Dense) layer. It is somewhat similar to using a deep neural network (stacked Dense layers) to represent a single example in other applications (like predicting an MNIST digit) in order to capture hierarchical levels of features. Likewise with multiple RNN layers: you try to capture hierarchical levels of features, but with the sequence in mind, where previous inputs can influence the representation of the next input in the sequence.

The last layer is usually a Linear (Dense) layer for application (classification/regression) purposes.
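To make the stacking concrete, here is a minimal Keras sketch of two GRU layers feeding a Dense output at every time step. The input length, feature count, and layer sizes are placeholders I made up for illustration, not the exact architecture from the assignment:

```python
# Minimal sketch: two stacked GRU layers followed by a per-time-step Dense output.
# Shapes and layer sizes are illustrative assumptions, not the assignment's values.
import tensorflow as tf
from tensorflow.keras import layers, Model

Tx, n_freq = 5511, 101  # assumed: number of time steps and features per step

inputs = layers.Input(shape=(Tx, n_freq))
# First GRU: returns the full sequence of hidden states so the next GRU sees every time step.
x = layers.GRU(128, return_sequences=True)(inputs)
# Second GRU: builds higher-level sequence features on top of the first layer's outputs.
x = layers.GRU(128, return_sequences=True)(x)
# Dense sigmoid output applied at every time step for the classification task.
outputs = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The key detail is `return_sequences=True` on the first GRU, so the second GRU receives the whole sequence of hidden states rather than only the final one.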

Cheers


Hi,

Thanks for the answer. It’s true, the topic should have been “Deep Learning Specialization”, not “NLP Specialization”.

It seems that, just as with deep NNs, the number of layers used is somewhat arbitrary. Is this always trial and error? That doesn’t seem very scientific!

Cheers

Hi @Fusseldieb

It’s not a totally random approach; it’s a somewhat systematic search (the scientific part), and it depends on the problem at hand and your experience (the art part).

The art part is that you have already tried different numbers of layers and nodes on similar datasets, so you have a “feeling” for how many layers, what type, etc. you need in order to start iterating.

The scientific part is that there are different hyperparameter optimization techniques.

For example, one simple approach for illustration:

  • just take the mean (or mode) of the labels (the predicted variable) and predict that: the dumbest model possible, against which you measure all further models (on accuracy or whatever metric is most important);
  • then try the simplest possible configuration, a linear regression (one layer, with the number of nodes equal to the number of features), and check how much you improved;
  • then start increasing the number of layers (and nodes) and check the rate of improvement; the increase in layers could be exponential (like 2, 4, 8, 16) so you find the upper limit early and don’t waste compute. The complexity of your data might indicate whether you need a deeper network or a wider one; a rough sketch of such a sweep follows this list.
    This step (iterating over different architectures) is not very straightforward (the reason for “might” in the previous sentence), because sometimes there are jumps (non-linear increases) in performance when you train for “long enough”.
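As a rough sketch of what such a sweep could look like (toy data, made-up layer sizes, and accuracy as the metric, purely for illustration):

```python
# Hypothetical sketch of the procedure above: a trivial baseline, then a
# linear model (0 hidden layers), then exponentially deeper networks.
# Dataset, sizes, and training settings are all illustrative assumptions.
import numpy as np
from tensorflow.keras import Sequential, layers

# Toy data (assumed): 1000 examples, 20 features, binary labels.
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")
X_train, X_val = X[:800], X[800:]
y_train, y_val = y[:800], y[800:]

# Baseline: always predict the most common label (the mode of the labels).
mode = float(round(y_train.mean()))
print(f"baseline accuracy: {(y_val == mode).mean():.3f}")

# Sweep depth exponentially: 0 hidden layers (a linear/logistic model), then 2, 4, 8.
for n_hidden in [0, 2, 4, 8]:
    model = Sequential()
    model.add(layers.Input(shape=(20,)))
    for _ in range(n_hidden):
        model.add(layers.Dense(20, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=10, verbose=0)
    _, acc = model.evaluate(X_val, y_val, verbose=0)
    print(f"{n_hidden} hidden layers -> validation accuracy {acc:.3f}")
```

The point is only the shape of the loop: establish a baseline first, then grow the architecture and watch the rate of improvement rather than the absolute numbers.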

Of course, the approaches differ for tabular data vs. vision vs. natural language, etc. For example, from experience you might know that you would use XGBoost for tabular data, CNNs for vision, and Transformers for natural language. And of course research/science vs. production/commerce is also a big factor in what you might try.

So, it’s a combination of a scientific approach and a “creative” (or random) one.

Cheers


Just noticed this thread and you’re right that it is about DLS Course 5 Week 3, so I moved the thread category by using the little “edit pencil” on the title.