Output layer: why a linear activation function instead of a ReLU?

Right! Here’s a thread that discusses the reasons for using `from_logits=True` mode and what that means. And here’s one from Raymond that gives a much more complete explanation of the math behind it.
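For anyone landing here later, here is a minimal sketch of what that looks like in Keras. The layer sizes and the 400-feature input are just placeholders; the point is that the output layer stays linear (raw logits) and the loss is told to expect logits.

```python
import tensorflow as tf

# Linear output layer -> the model emits raw logits, not probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(400,)),
    tf.keras.layers.Dense(10, activation='linear'),  # no softmax here
])

# from_logits=True tells the loss to apply the softmax internally,
# in a more numerically stable way than doing it in the output layer.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# At prediction time, apply softmax explicitly if you need probabilities.
probs = tf.nn.softmax(model(tf.random.normal((1, 400))))
```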