Here’s a thread which discusses the point about the prediction outputs being “logits” instead of activation values.
Here’s another thread which lists the most common mistakes people make with that function.