Week 3, Question about architecture used in Programming Assignment: Trigger Word Detection

Regarding the programming assignment, we are instructed to build a very specific network, but no explanation or intuition is given for why we would expect that architecture to work. In particular, why use two consecutive GRU layers? Why isn't one good enough? Why not three or four of them?

The architecture is based on experimentation. You are welcome to try different numbers of layers on your own (see the sketch below), but to pass the grader you should follow the instructions as given.
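To make "two consecutive GRU layers" concrete, here is a rough Keras sketch of that style of model: a Conv1D front end followed by a stack of GRU layers and a per-timestep sigmoid output. The layer sizes, dropout rates, and input shape below are placeholders I picked for illustration, not necessarily the exact values the assignment specifies, and `n_gru_layers` is just a hypothetical knob for experimenting with depth:

```python
# Rough sketch only -- hyperparameters are placeholders, not the assignment's solution.
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     Dropout, GRU, TimeDistributed, Dense)
from tensorflow.keras.models import Model

def build_trigger_word_model(input_shape, n_gru_layers=2, gru_units=128):
    """Conv1D front end, a stack of GRUs, then a per-timestep sigmoid output."""
    x_in = Input(shape=input_shape)                            # (time steps, spectrogram freqs)
    x = Conv1D(filters=196, kernel_size=15, strides=4)(x_in)   # shortens the time dimension
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.2)(x)

    # Stacked GRUs: return_sequences=True outputs a full sequence so the
    # next GRU (and the per-timestep Dense head) can consume every timestep.
    for _ in range(n_gru_layers):
        x = GRU(gru_units, return_sequences=True)(x)
        x = Dropout(0.2)(x)
        x = BatchNormalization()(x)

    # One sigmoid unit per output timestep: "was the trigger word just said?"
    y = TimeDistributed(Dense(1, activation='sigmoid'))(x)
    return Model(inputs=x_in, outputs=y)

# Illustrative input shape (the assignment uses a spectrogram of roughly this size).
model = build_trigger_word_model(input_shape=(5511, 101), n_gru_layers=2)
model.summary()
```

With a sketch like this you can change `n_gru_layers` and compare training/validation behavior yourself, which is essentially what "based on experimentation" means in practice. But again, the grader expects the exact architecture described in the notebook.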


Thank you for your reply. I am still wondering if there is a more theoretical answer in addition to “this is based on experimentation”.

It's possible that there is, but I don't recall Prof Ng ever mentioning anything about that. Please note that the course authors and Prof Ng are not really listening here; it's just your fellow students. The mentors are fellow students who have completed the course in question successfully, but that doesn't mean we are academic-level experts in the field. We are also volunteers, meaning we don't get paid to do this, so you are not guaranteed an answer to any given question.

As an example of how theory and practice work in ML, there is the Universal Approximation Theorem (a common informal statement is below). It only applies to fully connected feed-forward networks, not Sequence Models, but I hope we can use it as an analogy. What it tells us is that feed-forward neural networks can approximate any function from a very large class. The problem is that it gives you exactly zero guidance on how to actually construct such networks in practice. It's very useful to know that we're not fundamentally wasting our time by trying to use NNs to approximate complex functions, but beyond that the UAT is not really all that helpful from a practical standpoint.
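For reference, here is a common informal, one-hidden-layer statement of the theorem (my paraphrase, not something from the course):

```latex
% For a suitable nonlinearity \sigma, any continuous f on a compact set
% K \subset \mathbb{R}^n, and any \varepsilon > 0, there exist N and
% parameters w_i \in \mathbb{R}^n, b_i, v_i \in \mathbb{R} such that
\left|\, f(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left( w_i^{\top} x + b_i \right) \right| < \varepsilon
\quad \text{for all } x \in K .
```

Note that the statement gives no bound on how large N must be and no procedure for finding the weights, which is exactly the practical gap described above.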

Maybe this is yet another instance in which the famous A. Einstein quote applies: “In theory, theory and practice are the same. In practice, they’re not.” :nerd_face:


Thank you for your very informative message.