In this lecture slide, we can see a neural network being trained with the more numerically accurate option. Here the logits are a vector Z, where each element z is a raw score rather than a probability.

In that case, why are we passing Z into tf.nn.sigmoid? Shouldn’t we always use tf.nn.softmax, so that each element is normalized against all the others, e.g. e^z1 / (e^z1 + e^z2 + … + e^zn) for the first element?
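For reference, the softmax calculation described above can be sketched in plain Python. This is only an illustration: the `softmax` helper and the logit values are made up here, and it is not the code from the lecture.

```python
import math

def softmax(z):
    """Return e^z_i / (e^z_1 + ... + e^z_n) for every element of z."""
    # Subtracting the largest logit leaves the result unchanged
    # but keeps exp() from overflowing for large inputs.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up logits, one per class
probs = softmax(logits)

print(probs)       # one probability per class
print(sum(probs))  # the probabilities sum to 1 (up to rounding)
```

Each output depends on all the logits, which is why softmax needs one output unit per class.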

From the screenshot, the model has a dense layer of 1 unit as its output layer, so it is actually a simple logistic regression that predicts whether the sample belongs to a class or not, instead of a more general multi-class classification which has more than one unit in its output layer.

The simple binary logistic regression uses a sigmoid, whereas the multi-class uses a softmax. Here it is a simple binary logistic regression.
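One way to see why a single sigmoid unit is enough in the binary case: a sigmoid on one logit z gives the same probability as a two-class softmax whose second logit is fixed at 0. The sketch below checks this numerically; `sigmoid`, `softmax2`, and the value of `z` are illustrative names and numbers, not anything from the course code.

```python
import math

def sigmoid(z):
    """Probability of the positive class from a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z_a, z_b):
    """Two-class softmax, returning (p_a, p_b)."""
    m = max(z_a, z_b)  # shift for numerical safety
    e_a, e_b = math.exp(z_a - m), math.exp(z_b - m)
    return e_a / (e_a + e_b), e_b / (e_a + e_b)

z = 1.5  # made-up logit from a 1-unit output layer
p_sigmoid = sigmoid(z)
p_softmax, p_other = softmax2(z, 0.0)

print(p_sigmoid, p_softmax)    # the two values agree
print(1.0 - p_sigmoid, p_other)  # and so do the "other class" values
```

So the 1-unit sigmoid output already carries both class probabilities: p for the class and 1 - p for its complement.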

I have a follow-up question about the original question and its answer. In one of the lecture slides shown here (class 2, week 2, “Improved implementation of softmax”):

It shows the code for what the professor says is a logistic regression. The video then edits the code to use the more numerically accurate option (the output activation is ‘linear’ and from_logits=True is set in the loss passed to .compile). However, the output layer still has units = 10. Questions on this:
1.) If we are doing a binary logistic regression, I thought the output layer had to have 1 unit, whereas a multiclass classification problem would have as many output units as there are possible categories. Did I miss something?
2.) Side question: If we wanted to, could we pass the kernel_regularizer=tf.keras.regularizers.l2(0.1) argument to a layer to regularize a binary logistic regression? (I only recall seeing it used with multiclass classification problems later in this course.)
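On the side question, a rough sketch may help. An L2 kernel regularizer adds a penalty of the form lambda * sum(w^2) over a layer's weights to the loss, and that term does not depend on the number of output units or on which activation is used, so it should apply equally to a 1-unit binary model. The plain-Python example below mimics this for a single-unit logistic regression with the loss computed from the logit (the from_logits=True style). All function names, weights, and data values are hypothetical and chosen only for illustration.

```python
import math

def bce_from_logit(z, y):
    """Binary cross-entropy computed directly from logit z and label y (0 or 1).

    Numerically stable rewrite of
    -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights (bias excluded)."""
    return lam * sum(w * w for w in weights)

def regularized_loss(weights, bias, x, y, lam=0.1):
    """Loss for one example: data term plus L2 penalty on the weights."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias  # linear output, 1 unit
    return bce_from_logit(z, y) + l2_penalty(weights, lam)

# Made-up weights and a single made-up training example.
weights, bias = [0.5, -1.0], 0.2
x, y = [1.0, 2.0], 1

plain = regularized_loss(weights, bias, x, y, lam=0.0)
penalized = regularized_loss(weights, bias, x, y, lam=0.1)

print(plain)
print(penalized - plain)  # the extra cost contributed by the L2 term
```

In Keras the equivalent would presumably be to pass the same kernel_regularizer argument to the Dense layer that has units=1, but that part is an assumption rather than something shown in the slide.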