I’m confused about the model architecture used for sentiment prediction in Week 1 of the 3rd NLP course (NLP with Sequence Models).
Specifically, I’m trying to understand whether the dense layer is linear or non-linear.
In previous DNN classes and the literature, "dense" usually refers to the connectivity between two layers (so z(n+1) = M*a(n) + b, where M connects all units in layer n to all units in layer n+1), but you still need to specify the activation function.
I don’t see any argument that tells tl.Dense how to calculate the activations a from z. My best guess is that Trax represents what is usually shown as a single non-linear layer as two layers: first the linear (affine) map, then the activation function. If that is the case, it seems like this should be made more explicit.
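To make that guess concrete, here is a minimal sketch in plain Python (not Trax itself) of what I think is going on. The helper names `dense`, `relu`, and `serial` and all the numbers are made up for illustration; the assumption is that the dense step only computes z = M*a + b and that the non-linearity is a separate step chained after it.

```python
def dense(a, M, b):
    """Affine map only: z = M*a + b. No activation is applied here."""
    return [sum(m_ij * a_j for m_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(M, b)]


def relu(z):
    """Element-wise activation, kept as its own step."""
    return [max(0.0, z_i) for z_i in z]


def serial(*layers):
    """Chain layers so the output of one feeds the next (hypothetical helper)."""
    def run(x):
        for layer in layers:
            x = layer(x)
        return x
    return run


# Illustrative weights, bias, and input (not taken from the course).
M = [[1.0, -2.0],
     [0.5, 0.5]]
b = [0.0, 1.0]
a_n = [1.0, 1.0]

z = dense(a_n, M, b)   # linear output, can be negative
a_next = relu(z)       # non-linearity applied as a second "layer"

# The usual "single non-linear dense layer" is then just the two chained.
model = serial(lambda a: dense(a, M, b), relu)

print(z)           # [-1.0, 2.0]
print(a_next)      # [0.0, 2.0]
print(model(a_n))  # [0.0, 2.0]
```

If this reading is right, a dense layer with nothing after it is purely linear, and the model only becomes non-linear where an activation layer is explicitly listed next to it.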
More generally, I don’t see any diagram or description of the model architecture in the lectures, only in the weekly exercise (and even there the diagram is incomplete: it omits the mean layer).