I am working on a regression problem whose inputs and outputs are all nonnegative. I am using ‘linear’ activation in the output layer of a neural network, and it predicts negative values for outputs that should actually be zero. This happens when I use the data at a larger scale; at a smaller scale it doesn’t happen.
Now I’m wondering: could I use ‘relu’ in the output layer instead of ‘linear’ to resolve this issue? Or should an appropriate model predict nonnegative values even when ‘linear’ activation is used?

Real-valued outputs can take any value; it depends on what the model learns from the data set. There is no way (nor should you try) to force the outputs to stay within the range of the input values.

No, do not use ReLU in the output layer.

If you wish, you can artificially clip the negative values to zero, but I don’t recommend it. It adds no real value; it only makes the outputs look better artificially.
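For concreteness, such post-hoc clipping is a one-liner with NumPy (the prediction values here are made up for illustration):

```python
import numpy as np

# Hypothetical raw predictions from a model with a linear output layer.
preds = np.array([3.2, -0.4, 0.0, 7.1, -1.3])

# Replace negative predictions with zero. Note this changes nothing
# about the model itself; its loss was still computed on the raw outputs.
clipped = np.clip(preds, 0.0, None)
```

The clipped array is nonnegative everywhere, but any systematic bias that produced the negative values is still in the model.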

Yes, that’s what you want/need if the output values in your training set are all nonnegative. So this is driven by your loss function, right? In a regression problem, you are typically using some form of Euclidean distance as the loss; usually the squared distance, because it has nicer mathematical properties in terms of the gradients. If the label for a given sample is positive but the model predicts a negative value, the loss is even greater than it would be if you used ReLU to “clip” the negative prediction to zero. That means the gradients push even more strongly to correct the model’s outputs during training on the samples that produce negative predictions.

At least that’s what I would intuitively expect, but this is an experimental science: try it both ways (linear activation vs ReLU at the output layer) and see what you notice in terms of how long the training takes and the quality (accuracy) of the results you get with the trained model.
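Running that experiment is cheap to set up. A minimal Keras sketch, assuming TensorFlow/Keras and placeholder layer sizes (8 input features, one hidden layer), would look like:

```python
import tensorflow as tf

def build_model(output_activation):
    """Identical architecture except for the output activation."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(8,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation=output_activation),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

linear_model = build_model("linear")
relu_model = build_model("relu")

# Train both on the same data with the same validation split, then
# compare validation MSE and how many epochs each takes to converge.
```

Keeping everything else fixed (data, seed, optimizer) makes the comparison attributable to the output activation alone.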