Actually, this wouldn't be "linear" regression, just regression. When you do linear regression, you find parameters a1, a2, …, an so that the hyperplane a1 x1 + a2 x2 + … + an xn + b = 0 fits your n-dimensional data. But if your data can't be fitted well by a hyperplane, you may want to find some unknown, more complex function that a neural network can express. Indeed, when you stack layers you end up with one big composed function built from the consecutive weight matrices and the activations g(y) applied between them.
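As a minimal sketch of that last point (shapes, the tanh activation, and all names here are arbitrary choices for illustration, not from the original answer): without g between the layers, the composition of matrices collapses back into a single affine map, i.e. plain linear regression; with g, you get a genuinely non-linear function.

```python
import numpy as np

# Hypothetical sizes: 3 input features, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def g(y):
    """An activation function; tanh is just one possible choice."""
    return np.tanh(y)

def stacked_without_activation(x):
    # Composing affine maps stays affine: W2 (W1 x + b1) + b2
    # is still of the form A x + c, i.e. linear regression.
    return W2 @ (W1 @ x + b1) + b2

def two_layer_net(x):
    # Inserting g between the consecutive matrices breaks linearity,
    # so the network can express more complex functions.
    return W2 @ g(W1 @ x + b1) + b2

x = rng.normal(size=3)
print(stacked_without_activation(x), two_layer_net(x))
```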
So:

- you can try any activation function in the hidden layers and see what you get;
- the output node is to be compared with whatever you want to predict, and whether to use an activation function there depends on your goal. If, for instance, you fit your data using the first (n-1) features to predict the n-th one, and that target can take any floating-point value, you can leave the output layer without any activation; if instead it lives in a bounded interval [a, b], you could try an appropriately rescaled arctan/sigmoid, for instance a + sigmoid(z) * (b - a) (see the sketch after this list).
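Here is a small sketch of the two output choices (the target range [a, b] = [-2, 5] is a made-up example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a, b = -2.0, 5.0  # hypothetical bounds of the target values

def output_unbounded(z):
    # No activation on the output node: fine when the predicted
    # value can be any floating-point number.
    return z

def output_bounded(z):
    # Rescaled sigmoid: squashes z into (0, 1), then maps it to (a, b),
    # i.e. a + sigmoid(z) * (b - a).
    return a + sigmoid(z) * (b - a)

z = np.linspace(-6, 6, 5)
print(output_unbounded(z))
print(output_bounded(z))  # all values lie strictly inside (-2, 5)
```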
If I remember correctly, this topic comes up again with recurrent neural networks.