There are two separate issues here:
- Which activation function to use at the output layer
- Which activation function(s) to use in the hidden layers of the network
For the output layer, the choice is determined by what your network is predicting. If it is a classification problem, then you use sigmoid for binary classification (cat/not a cat) and softmax for multiclass classification (cat, dog, zebra, horse, kangaroo …). Note also that there is a loss function that pairs naturally with sigmoid and softmax: the cross-entropy (“log loss”) loss function. You can think of softmax as the multiclass generalization of sigmoid.
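For concreteness, here’s a minimal PyTorch sketch of the two classification setups (the layer sizes, batch size, and class count are arbitrary placeholders). One detail worth knowing: PyTorch folds the activation into the loss for numerical stability, so `BCEWithLogitsLoss` applies the sigmoid internally and `CrossEntropyLoss` applies log-softmax internally, and the output layer itself just produces raw logits.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16)  # batch of 8 examples, 16 features each (placeholder)

# Binary case (cat / not a cat): one output unit + sigmoid,
# paired with binary cross entropy. BCEWithLogitsLoss applies
# the sigmoid internally, so the head outputs a raw logit.
binary_head = nn.Linear(16, 1)
binary_targets = torch.randint(0, 2, (8, 1)).float()
binary_loss = nn.BCEWithLogitsLoss()(binary_head(x), binary_targets)

# Multiclass case (cat, dog, zebra, ...): one output unit per class
# + softmax, paired with cross entropy. CrossEntropyLoss applies
# log-softmax internally and takes class indices as targets.
num_classes = 5
multi_head = nn.Linear(16, num_classes)
multi_targets = torch.randint(0, num_classes, (8,))
multi_loss = nn.CrossEntropyLoss()(multi_head(x), multi_targets)

# At inference time you apply sigmoid / softmax explicitly
# to turn the logits into probabilities:
probs_binary = torch.sigmoid(binary_head(x))
probs_multi = torch.softmax(multi_head(x), dim=1)
```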
For “regression” problems where you are predicting a continuous numeric value (stock price, temperature, …), you’re right that it usually makes sense to use a linear output (i.e. no activation), or ReLU in the case that a negative output value does not make sense. For that type of problem you want a distance-based loss function, so typically that would be either MSE (mean squared error) or perhaps MAE (mean absolute error).
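And a matching sketch for the regression case, again in PyTorch with placeholder shapes (in PyTorch, MAE is called `L1Loss`):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16)       # batch of 8 examples, 16 features (placeholder)
targets = torch.randn(8, 1)  # continuous targets

# Linear output: no activation, so the head can produce any real number.
linear_head = nn.Linear(16, 1)
pred = linear_head(x)

# If negative predictions make no sense (e.g. a price),
# pass the output through ReLU instead.
nonneg_pred = torch.relu(linear_head(x))

# Distance-based losses:
mse = nn.MSELoss()(pred, targets)  # mean squared error
mae = nn.L1Loss()(pred, targets)   # mean absolute error
```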
For the hidden layers of the network, you have a lot more freedom. Here’s a thread which discusses that.