Derivative of ReLU in output layer

This is probably covered in Rashmi’s link, but there looks to be a definite pattern in the wrong answers: they are all exactly 0, and they seem to alternate with the correct answers. It’s worth analyzing the inputs to see whether anything distinguishes the ones that give bad results from the ones that give good results.

The other thing to consider is that ReLU may not be a great choice for the output layer activation. ReLU only outputs exactly 0 when its input is zero or negative, so the reason you are getting zero answers must be that those predictions were negative at the linear activation level, right? And since the derivative of ReLU is 0 for negative inputs, no gradient flows back through those outputs, so training has a hard time pulling them back up. Try using Leaky ReLU and see whether that gives negative predictions for some inputs. I assume negative values would not make sense in your application. Swish is another possibility. Or, if any output value between $-\infty$ and $+\infty$ makes sense, just eliminate the output activation altogether and use a linear output.
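To make the point about negative linear activations concrete, here is a minimal NumPy sketch comparing the candidate output activations on a few example values of `z` (the linear activation before the output function). The function names, the sample `z` values and the Leaky ReLU slope `alpha=0.01` are illustrative assumptions, not taken from your code or any particular course. It shows that ReLU maps every negative `z` to 0 with a derivative of 0, while Leaky ReLU, Swish and a plain linear output all keep the sign information.

```python
import numpy as np

def relu(z):
    # Clips every negative input to 0
    return np.maximum(0.0, z)

def relu_grad(z):
    # 1 where z > 0, 0 otherwise (0 at z == 0 by convention), so negative z gets no gradient
    return (z > 0).astype(float)

def leaky_relu(z, alpha=0.01):
    # Negative inputs are scaled by a small slope instead of being zeroed (alpha is an assumed value)
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    # Gradient is alpha (not 0) for negative inputs, so those units can still learn
    return np.where(z > 0, 1.0, alpha)

def swish(z):
    # swish(z) = z * sigmoid(z); slightly negative for negative z
    return z / (1.0 + np.exp(-z))

def linear(z):
    # "No output activation": the prediction is just the linear activation
    return z

# Hypothetical linear activations from the output layer for four samples
z = np.array([-2.0, -0.5, 0.5, 2.0])

print("relu:      ", relu(z))
print("relu grad: ", relu_grad(z))
print("leaky:     ", leaky_relu(z))
print("leaky grad:", leaky_relu_grad(z))
print("swish:     ", swish(z))
print("linear:    ", linear(z))

# Indices of samples whose ReLU output is exactly 0, i.e. whose z was <= 0;
# comparing these inputs against the others is one way to look for the pattern
print("zero-output samples:", np.flatnonzero(relu(z) == 0))
```

Running the same `np.flatnonzero(relu(z) == 0)` check on your real output-layer `z` values would tell you which training examples are landing in the flat part of ReLU.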