How does these techniques satisfy orghogonalization?

Sara · May 26, 2021, 12:57pm

Thinking about the different techniques to reduce bias/variance, I don’t understand how they satisfy orthogonalization.

The techniques we learned to reduce avoidable bias include the following, and some of them seem to affect variance:

bigger model → would increase variance as well?
train longer → possibility of over-fitting, thus, might affect variance
better optimization algorithm (seems innocent enough)
different architecture / hyperparameters (already answered here)

And the techniques we learned to reduce variance include:

more data (seems innocent enough)
regularization → would increase bias
different architecture / hyperparameters (already answered here)

manifest · May 27, 2021, 2:40pm

Hey Sara,

I believe this post contains the answer to your question.

Note that

A bigger model and a different architecture can both be seen as ways of changing model capacity.
A hyperparameter is any parameter that we fix during the training process. In that sense, the number of layers is a hyperparameter that changes model capacity, the regularization coefficient of a norm penalty is a hyperparameter that lets us increase or decrease regularization, and so on.
An optimization algorithm definitely has an effect on learning, and its learning rate is probably the most important hyperparameter, but we do not consider it an instrument for reducing bias or variance. It determines how fast we converge to some solution.

Sara · May 28, 2021, 7:39am

I think I got confused because I thought orthogonalization means taking actions to decrease bias/variance that would not affect variance/bias (from the analogy Andrew mentioned, something like tuning the height/width without affecting the width/height of the TV).

But from the post you linked to, it seems like we cannot be sure of addressing one problem without affecting the other. So it seems like there really isn’t orthogonalization in training ML models. At the beginning, we just have to address the bias problem first, then variance. And maybe after some preliminary results, we choose which problem to address by comparing avoidable bias and variance.

Is my understanding correct?

Also, I realized I spelled the title wrong. Is there a way to change it?

manifest · May 28, 2021, 2:34pm

We just can’t do otherwise, because our judgement about high variance is based on an estimate of the difference between the train and dev set errors. That said, we need to achieve a low error on the train set first (that is, address a high bias problem).
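To make that ordering concrete, here is a minimal sketch of the comparison in Python. The function name `diagnose`, the `baseline_error` argument (an estimate of the best achievable error, such as human-level error) and the example numbers are all assumptions for illustration, not something defined in this thread.

```python
def diagnose(train_error, dev_error, baseline_error=0.0):
    """Return (avoidable_bias, variance, focus) for a trained model.

    baseline_error is an assumed estimate of the best achievable error
    (for example, human-level error); the thread does not specify one.
    """
    # Avoidable bias: how far the train error is from the assumed baseline.
    avoidable_bias = train_error - baseline_error
    # Variance estimate: the gap between dev and train set errors.
    variance = dev_error - train_error
    # Address whichever gap is larger; ties go to bias, since the variance
    # estimate is only meaningful once the train error is low.
    focus = "bias" if avoidable_bias >= variance else "variance"
    return avoidable_bias, variance, focus


# Hypothetical numbers: 8% train error, 10% dev error, 1% baseline.
bias, var, focus = diagnose(train_error=0.08, dev_error=0.10, baseline_error=0.01)
print(f"avoidable bias={bias:.2f}, variance={var:.2f}, focus on {focus}")
```

With these made-up numbers the train error is still far from the baseline, so the sketch points at bias first, which matches the ordering described above.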

To my understanding, orthogonalization is just a technique that allows us to observe the effect of each of our actions in isolation.

For example, if we changed model capacity by adding new layers and also used additional features, we wouldn’t be able to tell which of these actions decreased bias. But if we only changed model capacity, we would be able to estimate the effect it caused. The same goes for additional features. That is the high-level idea behind orthogonalization.
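A small sketch of that one-change-at-a-time idea, assuming a toy setup: synthetic data, a least-squares polynomial model where the degree stands in for "model capacity", and a second input `z` standing in for "additional features". None of these names or numbers come from the course; they only illustrate the procedure.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends on x and z.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
z = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.5 * z + 0.05 * rng.normal(size=200)


def train_mse(degree, use_extra_feature):
    """Fit least squares on polynomial features of x and return train MSE."""
    cols = [x ** k for k in range(degree + 1)]  # 1, x, ..., x^degree
    if use_extra_feature:
        cols.append(z)  # the "additional feature"
    X = np.column_stack(cols)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))


baseline = {"degree": 1, "use_extra_feature": False}
# Each experiment changes exactly one setting relative to the baseline.
changes = {
    "more capacity": {"degree": 5},
    "extra feature": {"use_extra_feature": True},
}

base_err = train_mse(**baseline)
effects = {
    name: base_err - train_mse(**{**baseline, **change})
    for name, change in changes.items()
}
for name, drop in effects.items():
    print(f"{name}: train error reduced by {drop:.4f}")
```

Because each run differs from the baseline in a single setting, the drop in train error can be attributed to that setting alone. Changing both at once would only show their combined effect.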

manifest · May 28, 2021, 2:36pm

I can’t change it either. It will stay there forever.
