If so, what’s the activation function?

If the activation function is simply linear, a network with multiple hidden layers collapses to a single linear function, since the composition of linear maps is itself linear
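To see this concretely, here is a small NumPy sketch (the layer sizes are made up for illustration) showing that two stacked linear layers compute exactly the same function as one suitably chosen linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "hidden" layers with purely linear activations (invented sizes).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both linear layers.
two_layer = W2 @ (W1 @ x + b1) + b2

# Collapse into one equivalent linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True
```

So without a non-linearity between them, the extra layer adds no expressive power.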

But the point is that the hidden layers are not linear, right? So this is not “linear regression”: it is applying a DNN to solve a regression problem. This is perfectly possible: you just have to choose an appropriate output-layer activation (as you say) and then a cost function. Depending on the nature of your output values (e.g. whether they can be negative) you can either just use the linear output or apply ReLU at the output layer. Then you’ll probably want to use a distance-based loss function like MSE.
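As a minimal NumPy sketch of that setup (the sizes and data here are invented, and this only shows the forward pass and loss, not training): a non-linear hidden layer, a linear output layer, and MSE as the distance-based loss.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(0.0, z)

# Tiny invented dataset: 8 samples, 3 features, 1 real-valued target each.
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))

# One hidden layer with a non-linear activation...
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
# ...and a linear output layer (no activation), so predictions
# can be any real number.
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

hidden = relu(X @ W1 + b1)
pred = hidden @ W2 + b2            # linear output layer

mse = np.mean((pred - y) ** 2)     # distance-based loss
print(mse)
```

In a real framework you would then minimize the MSE with gradient descent; the structure (non-linear hidden layers, linear output, MSE) is the same.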

Please show me some examples of the activation functions for linear regression.

At the output layer you can use ReLU if the values need to be non-negative, or no activation function at all if negative values are meaningful for whatever quantity you are trying to predict.

If there is no activation function at all at the output layer, the NN becomes linear regression, no matter how many hidden layers it has.

That is not true: we are only talking about the *output* layer here, right? The point is that there are non-linear activations in all the hidden layers. You can choose the functions to use, e.g. sigmoid, tanh, swish, ReLU, Leaky ReLU etc. The point is that it is only at the *output* layer that we would consider just using the linear output in this type of case.
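For reference, the activations mentioned above can all be written as one-liners. This is just an illustrative NumPy sketch of their definitions (the 0.01 slope for Leaky ReLU is one common default, not the only choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def swish(z):
    return z * sigmoid(z)          # also known as SiLU

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):     # alpha = negative-side slope
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, swish, relu, leaky_relu):
    print(f.__name__, f(z))
```

Any of these can serve as the hidden-layer non-linearity; the choice at the output layer is the separate question discussed here.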

sigmoid, tanh, swish, ReLU, Leaky ReLU etc. restrict the output to a range like (-1, 1) or to positive values.

Do we have activation functions that can output both large positive and large negative values?

The range of Leaky ReLU is (-∞, ∞). Or if you simply don’t use an activation function at the output layer, then the range is also (-∞, ∞).
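A quick check of that claim (a NumPy sketch with a made-up 0.01 slope): Leaky ReLU scales negative inputs rather than clipping them to zero, so its output is unbounded in both directions.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Negative inputs are scaled by alpha, not clipped,
    # so the output range is all real numbers.
    return np.where(z > 0, z, alpha * z)

z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(leaky_relu(z))
```

Positive inputs pass through unchanged; negative inputs come out negative (just scaled), unlike plain ReLU, which maps them all to 0.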