I am just wondering why we do not do activations for the last linear Z3 to compute to loss,
or e.g why to pass this argument logits – output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
1 Like
1 Like