Understanding Deep Sentiment Model NLP C3W1

I’m confused about the model architecture used for sentiment prediction in Week 1 of the 3rd NLP course (NLP with Sequence Models).

Specifically, I’m trying to understand if the dense layer is linear or non-linear.

In previous DNN classes and in the literature, "dense" usually refers to the connectivity between two layers (so z(n+1) = M*a(n) + b, where M connects all units in layer n to all units in layer n+1), but you still need to specify the activation function.
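To make the distinction concrete, here is a minimal pure-Python sketch (no framework assumed; the weights, sizes, and the choice of sigmoid as the activation are illustrative only). It separates the two steps: the dense connectivity computes z(n+1) = M*a(n) + b, and a separately chosen activation g then produces a(n+1) = g(z(n+1)).

```python
import math

def dense_linear(M, a, b):
    """Fully connected (dense) step: z = M a + b.

    M has one row per unit in layer n+1 and one column per unit in layer n.
    """
    return [sum(m_ij * a_j for m_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(M, b)]

def sigmoid(z):
    """Element-wise logistic function, one possible choice for the activation g."""
    return [1.0 / (1.0 + math.exp(-z_i)) for z_i in z]

# Layer n has 3 units, layer n+1 has 2 units (illustrative values).
M = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
b = [0.0, -1.0]
a_n = [2.0, 1.0, 2.0]

z_next = dense_linear(M, a_n, b)   # [0.0, 1.5]
a_next = sigmoid(z_next)           # [0.5, ~0.818]
```

Nothing in `dense_linear` depends on g, so any activation (or none at all) can follow it.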

I don’t see any arguments that tell tl.Dense how to calculate the activations a from z. My best guess is that Trax represents what is usually shown as a single non-linear layer as two layers: first a linear one, then the activation function. If that is the case, it seems like this should be made more explicit.

More generally, I don’t see any diagram or description of the model architecture in the lectures, only in the weekly exercise (and even there, the diagram is incomplete: it omits the mean layer).
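For reference, here is a framework-free sketch of the forward pass, assuming the exercise's model is an embedding lookup, a mean over the token axis, a linear dense layer, and then a log-softmax output. The log-softmax output layer, the tiny vocabulary, the embedding width, and all weight values are assumptions for illustration, not the course's actual parameters. In Trax these stages would be separate layers composed with `tl.Serial`; the sketch only spells out what each stage computes.

```python
import math

def embed(token_ids, table):
    """Embedding layer: look up one vector per token id."""
    return [table[t] for t in token_ids]

def mean_over_tokens(vectors):
    """Mean layer: average the token vectors into one vector per example."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def dense(x, W, b):
    """Linear dense layer: W x + b, with no activation."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def log_softmax(z):
    """Assumed output layer: numerically stable log-softmax over the classes."""
    m = max(z)
    log_sum = m + math.log(sum(math.exp(zi - m) for zi in z))
    return [zi - log_sum for zi in z]

# Hypothetical sizes: vocabulary of 4 tokens, embedding width 2, 2 classes.
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, -1.0],    # row 0: score for class 0
     [-1.0, 1.0]]    # row 1: score for class 1
b = [0.0, 0.0]

example = [1, 3, 3]                                # token ids of one input
pooled = mean_over_tokens(embed(example, table))   # approx. [1.0, 0.667]
log_probs = log_softmax(dense(pooled, W, b))       # class 0 scores higher
```

Note that the only non-linearity here comes from the final log-softmax; the dense layer itself stays linear.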

Hi @David_Fox

No, tl.Dense is a plain linear layer without any activation function.

Same thing here, but the “2)” point is not mandatory. In TensorFlow you can pass an activation function to a Dense layer, but the argument is optional and defaults to None (docs), which means no activation is applied, i.e. the layer is linear. Trax is similar, except that activations are separate layers (more like PyTorch).
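A minimal pure-Python sketch of the two conventions described above. All names here (`linear`, `relu`, `keras_style_dense`, `serial`) are hypothetical stand-ins, not the actual Trax, TensorFlow, or PyTorch APIs: one dense function takes an optional activation that defaults to None (and is then purely linear), while the other style keeps the activation as its own layer and composes the two in sequence.

```python
def linear(x, W, b):
    """Plain linear (dense) layer: W x + b, no activation."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(z):
    """Element-wise ReLU, written as its own layer."""
    return [max(0.0, zi) for zi in z]

def keras_style_dense(x, W, b, activation=None):
    """Hypothetical dense layer with an optional activation argument.
    With the default None, the layer is purely linear."""
    z = linear(x, W, b)
    return z if activation is None else activation(z)

def serial(*layers):
    """Hypothetical combinator: apply single-argument layers left to right."""
    def run(x):
        for layer in layers:
            x = layer(x)
        return x
    return run

W = [[1.0, -2.0], [-1.0, 0.5]]
b = [0.0, 0.0]
x = [3.0, 1.0]

# Separate-layer style: the activation is its own layer after the linear one.
model = serial(lambda v: linear(v, W, b), relu)

linear_out = keras_style_dense(x, W, b)                  # [1.0, -2.5]
fused_out = keras_style_dense(x, W, b, activation=relu)  # [1.0, 0.0]
serial_out = model(x)                                    # [1.0, 0.0]
```

Both styles compute the same function; the difference is only whether the activation is an argument of the dense layer or a layer of its own.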

Cheers