Residual analysis for neural networks

Hi,

I’ve been delving into neural network models with a linear activation function in the output layer and ReLU in the hidden layers. Several AI assistants (ChatGPT, among others) suggested that when evaluating the model, one can perform a residual analysis, as is done for classic GLMs. My question is: does it make sense to perform this residual analysis for a neural network like the one I described? If so, should it be done in the same way as for a linear model, or should it be analyzed differently?

Thank you very much!

Hello, @emi2025,

I have not used residual analysis for a long time, and I am not experienced in applying it to neural networks, so I can only share some views here.

Residual analysis lets us discover potentially missing terms in our model, such as a quadratic term if the residual plot looks quadratic. In that case we can certainly do some feature engineering to add quadratic terms and look for improvement, which I think is good if we already know what those terms should be. However, if we don’t know, or if the missing terms are not in any simple mathematical form, then, since the hidden layers add non-linearity, the plot could just as well be a signal for a larger network instead of a signal for adding certain terms. However, we have also learned about bias and variance (in MLS Course 2), so a high bias might also have hinted at using a larger network. The challenge, then, is to decide whether the current J_{train} (refer to the C2W3 lecture “Diagnosing bias and variance”) is too high.
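For reference, the plot itself is made the same way as for a linear model: residuals against the fitted values. A minimal sketch, assuming a trained Keras regression model `model` and held-out arrays `X_val`, `y_val` (all hypothetical names):

```python
import matplotlib.pyplot as plt

# Hypothetical: `model` is a trained Keras regressor with a linear output layer,
# and X_val, y_val are held-out features and labels (NumPy arrays).
y_pred = model.predict(X_val).ravel()
residuals = y_val - y_pred

plt.scatter(y_pred, residuals, s=8, alpha=0.5)
plt.axhline(0.0, color="red", linestyle="--")  # residuals should scatter around 0
plt.xlabel("predicted value")
plt.ylabel("residual (y - y_hat)")
plt.title("Residuals vs. predictions")
plt.show()
```

A curved band in this plot is the quadratic-looking pattern described above.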

On the other hand, if, instead of suggesting additional terms, the plot shows a possible “unequal variance” (i.e. heteroscedasticity) situation, where the spread of the residuals grows with the fitted values,

then, traditionally, it could hint at “stabilizing” these variances with some transformation of the label, such as \sqrt{y}, \frac{1}{y}, and so on. These transformations, of course, might also be discovered from your domain knowledge, if not from the residual plots. Again, if, for example, \sqrt{y} works, then because \sqrt{y} = f_1(x) \implies y = f_1(x)^2 \implies y = f_2(x), where f_1, f_2 are some neural networks, we might also say that it is possible to change our NN from f_1 to f_2 to take care of that transformation.
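A minimal sketch of that \sqrt{y} idea (assuming all labels are non-negative; the model and array names are hypothetical):

```python
import numpy as np

# Hypothetical: `model` is an untrained Keras regressor, and X_train, y_train,
# X_val, y_val are the usual splits with non-negative labels.
model.fit(X_train, np.sqrt(y_train), epochs=50, verbose=0)  # fit f_1 on sqrt(y)

y_pred = model.predict(X_val).ravel() ** 2  # invert the transformation: y = f_1(x)^2
residuals = y_val - y_pred                  # then re-check the residual plot
```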

Having said the above, it seems that residual analysis is not strictly necessary; but it is a tool, and if you know the tool well, it may reveal details that your other tools (e.g. bias and variance) can’t show directly. If I knew that a quadratic term and \sqrt{y} were required, I would apply them right away instead of just guessing what my next NN should be, because I could run into a lot of problems along the way, and wouldn’t it be wonderful to get rid of the obvious ones first?

Lastly, as I said, I have not used it for some time. I can share some views based on my understanding, but it also means that a fuller understanding of the tool than mine would be needed to give you the best answer.

Cheers,
Raymond

Hello @rmwkwok,

Thank you for your helpful response—it really guided me in focusing my analysis better. I’ve been working with a neural network with the characteristics I mentioned earlier, and after evaluating different alternatives, I arrived at a model with low bias and low variance. Since I didn’t have a baseline for evaluating the final metrics of the chosen model, I relied on the Mean Absolute Error (MAE) divided by the mean of the target variable and the Root Mean Squared Error (RMSE) divided by the mean. Additionally, the model achieved a high R-squared value.
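In code, those normalized metrics look roughly like this (a minimal sketch with scikit-learn; `y_val` and `y_pred` are hypothetical names for the validation labels and predictions):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical: y_val are the true labels, y_pred the model's predictions.
mae = mean_absolute_error(y_val, y_pred)
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
mean_y = np.mean(y_val)

print(f"MAE / mean(y):  {mae / mean_y:.3f}")
print(f"RMSE / mean(y): {rmse / mean_y:.3f}")
print(f"R^2:            {r2_score(y_val, y_pred):.3f}")
```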

However, during the residual analysis, I noticed that for the lowest values (approximately the first 30% of the data), the model tends to produce negative residuals. I believe this behavior might be due to the cost function being the Mean Squared Error (MSE), which prioritizes minimizing larger residuals. As a result, the model may be less accurate for smaller values of the predicted variable.
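A simple way to check this kind of range-dependent behavior is to look at the mean residual per decile of the true value (a minimal sketch; array names are hypothetical):

```python
import numpy as np

# Hypothetical: y_val are the true labels, y_pred the model's predictions.
residuals = y_val - y_pred
order = np.argsort(y_val)             # sort samples by true label value
deciles = np.array_split(order, 10)   # ten roughly equal-size bins

for i, idx in enumerate(deciles):
    print(f"decile {i}: mean residual = {residuals[idx].mean():+.3f}")
```

If the first three deciles consistently show negative means, that matches the pattern I described.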

I’ve decided that this limitation is acceptable for now; whether it really is ultimately depends on the specific goals and requirements of the project. I’ve chosen to move forward and continue studying other topics, but I plan to revisit this project later to refine it further.

If you have any additional insights or comments on my analysis, I’d greatly appreciate your input!

Thanks again for all your help!

Cheers,

Emiliano

Hello, Emiliano @emi2025,

I dare not call these insights, because I have not examined anything myself, but I think it’s reasonable to say that MSE pays more attention to samples with larger label values. If you trained the same model with MAE as the cost and then saw a corresponding change in the residuals, that would support your explanation. The question left would then be: what didn’t change when going from MSE as the cost to MAE as the cost?
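For example, such a comparison could look something like this (a minimal Keras sketch; the architecture, hyperparameters, and variable names are hypothetical, not your actual model):

```python
import numpy as np
import tensorflow as tf

# Hypothetical architecture and data splits, purely for illustration.
def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="linear"),
    ])

low = y_val <= np.quantile(y_val, 0.3)  # mask for the lowest ~30% of labels

for loss in ("mse", "mae"):
    model = build_model()
    model.compile(optimizer="adam", loss=loss)
    model.fit(X_train, y_train, epochs=50, verbose=0)
    residuals = y_val - model.predict(X_val).ravel()
    print(f"{loss}: mean residual in lowest 30% = {residuals[low].mean():+.3f}")
```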

If the model tends to produce negative residuals, regardless of where in the range it happens, it might be a sign that there is room for improvement; perhaps it’s a sign of high bias in that range, because the model is consistently over- or under-estimating there. But again, I can’t be sure without examining it myself.

I don’t want to generate too many questions while you are moving forward. If you come back to this project in the future, and if I’m around to see your next post about it, and if you don’t mind sharing your work here, and if you are still taking the residual-analysis approach, then perhaps we can find and go through some online materials about it together.

Cheers,
Raymond

Thanks @rmwkwok!
I agree with you.

Cheers,
Emiliano
