For this week’s lab assignment, exercise 1, when using the Keras Sequential model with Dense layers to construct the network described, why is it necessary to specify that the last layer, Dense(1, activation='sigmoid'), has a sigmoid activation function?
In the Keras Sequential model documentation there are examples very similar to this (as shown below), but the last layer doesn’t have an activation function specified. **How can we choose whether or not to specify an activation function for a specific layer?** Please let me know.
The key is in how you invoke the loss function. You have two choices: you can explicitly include sigmoid or softmax as the output layer activation (depending on whether it’s binary or multiclass classification), or you can omit the output activation and pass the from_logits=True argument to tell the loss function to perform the activation computation together with the loss internally. The two methods are logically equivalent, but the latter is preferable: slightly less code to write, and it is more numerically stable, which gives more accurate results. Here’s a thread which discusses that and explains more about it.
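To make the two choices concrete, here is a minimal sketch of both styles for binary classification. The hidden-layer sizes (25 and 15 units) are illustrative assumptions, not taken from the assignment:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Option 1: explicit sigmoid on the output layer.
model_a = Sequential([
    Dense(25, activation='sigmoid'),
    Dense(15, activation='sigmoid'),
    Dense(1, activation='sigmoid'),   # output is already a probability
])
model_a.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer='adam')

# Option 2: linear output (raw logits) plus from_logits=True.
model_b = Sequential([
    Dense(25, activation='sigmoid'),
    Dense(15, activation='sigmoid'),
    Dense(1),                         # no activation: outputs logits
])
model_b.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                optimizer='adam')

# With Option 2, apply the sigmoid yourself at prediction time
# if you want probabilities: tf.nn.sigmoid(model_b(x))
```

Note that with Option 2 the model’s raw output is a logit, so you apply tf.nn.sigmoid (or tf.nn.softmax in the multiclass case) to the predictions when you need probabilities.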
Mind you, I am not a mentor for this particular course, so I don’t know if the assignment here has any requirements for which way you implement it in this particular case. You’ll need to consult the instructions.
Following @paulinpaloalto's detailed explanation, I'd just like to add that the requirement for this exercise is to have 3 layers with sigmoid as the activation function. Please see below:
The neural network you will use in this assignment is shown in the figure below.
This has three dense layers with sigmoid activations.
Recall that our inputs are pixel values of digit images.
Since the images are of size 20×20, this gives us 400 inputs.
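Putting that together, a sketch of the network described above might look like the following. The hidden-layer unit counts here are placeholders only; the exercise instructions specify the exact sizes to use:

```python
import tensorflow as tf
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense

model = Sequential([
    Input(shape=(400,)),              # 20x20 pixel images flattened to 400 inputs
    Dense(25, activation='sigmoid'),  # hidden layer sizes are illustrative
    Dense(15, activation='sigmoid'),
    Dense(1, activation='sigmoid'),   # output layer from the question
])
model.summary()
```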