Output layer: why a linear activation function instead of a ReLU?

Right! Here’s a thread that discusses the reasons for using `from_logits=True` mode and what that means. And here’s one from Raymond that gives a much more complete explanation of the math behind it.
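For anyone landing here later, here is a minimal sketch of what that looks like in Keras. The layer sizes and the 400-feature input are just placeholders; the point is that the output layer stays linear (raw logits) and the loss is told to expect logits.

```python
import tensorflow as tf

# Linear output layer -> the model emits raw logits, not probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(400,)),
    tf.keras.layers.Dense(10, activation='linear'),  # no softmax here
])

# from_logits=True tells the loss to apply the softmax internally,
# in a more numerically stable way than doing it in the output layer.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# At prediction time, apply softmax explicitly if you need probabilities.
probs = tf.nn.softmax(model(tf.random.normal((1, 400))))
```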