Question - Model Structure

dmslava · January 24, 2023, 1:43am

Hello everyone,

I have a general question about the model structure that is proposed in the assignment.

Since we are trying to predict a binary outcome, why do we use a final layer with 2 nodes and a softmax activation function? Wouldn't it be better to use a single node with a sigmoid activation function?

The official Trax documentation also has an example with a binary target, 2 nodes, and softmax, so I assume there is a good reason. As I understand it, the sigmoid can be considered a special case of the softmax; am I thinking about it correctly? But even if so, why not use the simpler structure instead?

Finally, I couldn’t find any examples/tutorials for Trax with sigmoid on the internet. I tried to implement it following the same code/framework as in the assignment (changing the final Dense layer to 1 unit, replacing softmax with sigmoid, and using binary cross-entropy loss), but I’m getting an error that I can’t debug. So if anyone can point to some code snippets, that would be much appreciated.

Thank you in advance!

reinoudbosch · February 2, 2023, 3:21pm

Hi dmslava,

Good questions.

My guess is that softmax is used because it makes it easier to generalize to multi-class classification. I would read the use of softmax rather than sigmoid in the Trax docs as indicating that any additional compute cost is negligible in practice.
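On the question of whether sigmoid is a special case of softmax: for two classes, the softmax probability of class 1 equals the sigmoid of the difference between the two logits, so the two output layers describe the same family of models. Here is a minimal sketch in plain Python (not Trax code; the helper names `softmax`, `sigmoid`, and `max_gap` are my own, made up for illustration) that checks this numerically:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def max_gap(pairs):
    # Largest absolute difference between softmax([z0, z1])[1] and sigmoid(z1 - z0).
    return max(abs(softmax([z0, z1])[1] - sigmoid(z1 - z0)) for z0, z1 in pairs)

# Arbitrary example logit pairs (illustrative values only).
logit_pairs = [(0.0, 0.0), (1.5, -0.5), (-3.0, 2.0), (10.0, 10.5)]
gap = max_gap(logit_pairs)
print(gap)  # expected to be at floating-point rounding level
```

So the 2-node softmax head has one redundant degree of freedom (only the difference of the logits matters), but it is not modelling anything different from a 1-node sigmoid head.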

Perhaps relatedly, it’s difficult to find code examples that implement sigmoid with Trax for binary classification. One related code example is this one. As is done in that example, you could try using trax.fastmath.ops.expit. But you might as well stick with softmax if you run into too much trouble.
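If you do go the sigmoid route, the loss side should not change what is being optimized. As a rough sketch in plain Python (again not Trax code, and not a fix for your specific error, which I can't see; the function names are hypothetical), binary cross-entropy on a single logit `z` matches 2-class softmax cross-entropy when the logits are `[0, z]`:

```python
import math

def binary_cross_entropy_from_logit(z, y):
    # -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))], rewritten in the
    # numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def softmax_cross_entropy(logits, label):
    # Negative log-softmax of the true class, using a stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

# Compare the two losses on a few illustrative (logit, label) combinations.
diffs = [
    abs(binary_cross_entropy_from_logit(z, y) - softmax_cross_entropy([0.0, z], y))
    for z in (-2.0, 0.0, 3.5)
    for y in (0, 1)
]
print(max(diffs))  # expected to be at floating-point rounding level
```

One thing to watch when switching: with a single sigmoid output the targets are usually a 0/1 value per example with the same shape as the output, whereas a softmax cross-entropy loss typically takes integer class indices. A mismatch there is a common source of shape errors, though I can't say whether that is what you are hitting.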
