In Exercise 2 - alpaca_model:
Why do we use linear activation, not sigmoid?
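Because when we compile the model we use the BinaryCrossentropy loss with the from_logits=True argument. That tells the loss function to apply the sigmoid itself, so the model's final layer should output raw logits (linear activation) rather than probabilities. This is a very common practice: computing the sigmoid and the log together inside the loss is more numerically stable, and it’s less code for us to write.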
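To see why folding the sigmoid into the loss helps, here is a minimal sketch in plain Python. It is not TensorFlow's actual code. The function names naive_bce and bce_from_logits are made up for this illustration, and the second one uses the standard algebraic rearrangement of binary cross-entropy that works directly on the logit.

```python
import math

def naive_bce(logit, label):
    # Apply the sigmoid first, then take logs of the resulting probability.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def bce_from_logits(logit, label):
    # Same loss, rearranged so the probability is never formed explicitly:
    # max(z, 0) - z * y + log(1 + exp(-|z|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# For a moderate logit, both versions agree.
print(naive_bce(0.5, 1.0))
print(bce_from_logits(0.5, 1.0))

# For a large logit with the "wrong" label, the sigmoid rounds to exactly 1.0,
# so the naive version ends up taking log(0) and fails.
print(bce_from_logits(40.0, 0.0))
try:
    naive_bce(40.0, 0.0)
except ValueError as err:
    print("naive version failed:", err)
```

One consequence of this setup: the model's predictions are logits, so a sigmoid still has to be applied to them whenever actual probabilities are needed.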