I’ve been stuck for a while on Course 5, Week 4, on EncoderLayer(), like many others in this forum. It seems obvious that this week should have been removed or reworked, because there is very little guidance or help available for it.
I partially agree with you on this. Prof Andrew indeed spends a little less time teaching the ins and outs of Transformers compared to the other concepts in this specialization, but at the same time he wants learners to practise figuring out a concept by themselves. Assuming a learner has worked through all the previous courses and the earlier weeks of the 5th course meticulously, the expectation that interested learners will delve deeper into Transformers on their own makes complete sense to me. Also, there is only so much Prof Andrew can cover in a single course; otherwise, learners would lose interest and never finish it anyway.
As for this, the reason is pretty straightforward. If we posted solutions in the public forums, learners would simply refer to them whenever they got stuck, and no real learning would happen. If that were the goal, why have the assignments in the first place? We could simply have included notebooks with all the code already filled in.
Now, coming to your query: the error asserts that your implementation of the EncoderLayer is wrong. Can you please DM me your implementation so I can take a look?
I was able to fix the problem. For anyone who runs into the same issue, pay close attention to this hint from the notebook:
# apply layer normalization on sum of the output from multi-head attention (skip connection) and ffn output to get the
# output of the encoder layer (~1 line)
These forums are supposed to have answers. I am at least the third person to have this problem, and this thread says nothing about how to solve it. I have the exact same problem as Rafael. Could you also leave a clue or an answer here so others who hit this problem can understand why, please?
For anyone who is having the same problem, I found the solution: for the second layer normalization, I was adding self_mha_output to ffn_output instead of adding skip_x_attention to ffn_output.
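To make the fix concrete, here is a minimal sketch of an encoder layer in TensorFlow/Keras, assuming the setup used in the assignment. The variable names self_mha_output, skip_x_attention, and ffn_output follow the notebook's hints; the constructor arguments and layer names are illustrative, not the exact graded code.

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    """Sketch of a Transformer encoder layer: self-attention + feed-forward,
    each followed by a skip connection and layer normalization."""

    def __init__(self, embedding_dim=128, num_heads=8, fully_connected_dim=512,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training=False, mask=None):
        # Self-attention on the input (query = value = key = x)
        self_mha_output = self.mha(x, x, x, attention_mask=mask)

        # First skip connection: add the layer input x to the attention output,
        # then apply the first layer normalization
        skip_x_attention = self.layernorm1(x + self_mha_output)

        # Feed-forward block, with dropout applied during training
        ffn_output = self.ffn(skip_x_attention)
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # Second skip connection: add skip_x_attention (NOT self_mha_output)
        # to the ffn output, then apply the second layer normalization
        encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)
        return encoder_layer_out
```

The key point is that each residual connection wraps only its own sub-block: the second one adds the output of the first layer norm (skip_x_attention) to the feed-forward output, which is exactly the line the hint above refers to.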
Hey @william27,
Apologies for the delay and for your poor experience, and thanks a ton for sharing these insights with the community.
Please note that we try our best to answer as many queries as we can, but with the volume of questions, we as mentors sometimes miss some of them, for various reasons. We hope you won’t have to go through the same experience again.