Mushi — June 25, 2024, 2:06pm
Hi, my question is: why did we change the output layer's activation function to linear? We had a multiclass classification problem with 10 possible output classes. How are we going to predict our output label y? Also, what does `from_logits=True` do, and what does it mean?
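To make the prediction part of the question concrete, here is a minimal numpy sketch (the logit values are made up, and 3 classes stand in for the course's 10): with a "linear" output layer the model returns raw logits, so you apply softmax yourself to get probabilities, then take the argmax for the label. In TensorFlow this is what `tf.nn.softmax` on the model's output does.

```python
import numpy as np

# Hypothetical raw logits from a model with a "linear" output layer,
# for two examples over 3 classes (10 classes works identically).
logits = np.array([[2.0, -1.0, 0.5],
                   [0.1,  3.0, 0.2]])

# Softmax recovers probabilities; subtracting the row max first is the
# usual numerical-stability shift and does not change the result.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# The predicted label y is the argmax. Softmax is monotonic, so taking
# argmax over the raw logits directly gives the same answer.
y_hat = probs.argmax(axis=1)  # -> array([0, 1])
```

So the linear output only changes *where* softmax is computed (inside the loss during training, explicitly at prediction time), not what the model predicts.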

(Linked lecture preview: "Improved Implementation of Softmax" — video created by DeepLearning.AI and Stanford University for the course "Advanced Learning Algorithms", covering how to train your model in TensorFlow and other important activation functions besides the sigmoid.)

If you search the forum a bit, you'll find that one of our mentors, @rmwkwok, has written a great post about this. Check it out:

From C2 W2 “Improved implementation of softmax”, we know that, for a binary classification problem, the following approach A is more stable than the approach B:
| Approach | Output layer's activation | Loss function |
|---|---|---|
| A | `"linear"` | `tf.keras.losses.BinaryCrossentropy(from_logits=True)` |
| B | `"sigmoid"` | `tf.keras.losses.BinaryCrossentropy(from_logits=False)` |
This post will show the maths reason, beginning with the following slide:
[image]
The lecture replaces the middle equation (Approach B) with th…
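Since the quoted post is truncated, here is a minimal numpy sketch of the idea behind `from_logits=True` (this is not the mentor's actual derivation, and the logit values are made up): computing softmax/sigmoid first and then taking the log (Approach B) can overflow, while folding the activation into the loss via the log-sum-exp trick (what `from_logits=True` effectively does) stays finite.

```python
import numpy as np

# Hypothetical logits for one example; the huge value mimics a very
# confident model, which is exactly where Approach B breaks down.
z = np.array([1000.0, 2.0, -5.0])
y = 0  # true class index

with np.errstate(over="ignore", invalid="ignore"):
    # Approach B in two steps: softmax, then log.
    # exp(1000) overflows to inf, so p[y] becomes inf/inf = nan.
    p = np.exp(z) / np.sum(np.exp(z))
    naive_loss = -np.log(p[y])  # nan

# Approach A: combine log and softmax with the log-sum-exp trick.
# Subtracting the max keeps every exponent <= 0, so nothing overflows.
m = np.max(z)
log_softmax = z - (m + np.log(np.sum(np.exp(z - m))))
stable_loss = -log_softmax[y]  # finite (here ~0, as the model is "right")
```

The same cancellation is why the lecture pairs a `"linear"` output layer with `from_logits=True`: TensorFlow then computes the loss from the raw logits in one stable step instead of two fragile ones.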