I am not entirely sure how adding a regularisation term helps achieve disentanglement. The lecture doesn’t discuss the intuition behind it either.

Can somebody explain this?


Hi Richeek,

I will try to give you a very naive intuition just for the sake of understanding.

Without any regularization, the model may find it easier to represent the data by focusing on a few dominant factors. For example, it may use a single component in the learned vector space to encode most of the information related to face shape, expression, and hair color. This leads to a highly entangled representation in which it becomes challenging to isolate and control individual attributes.

By adding a regularization term to the training process, the model is encouraged to spread the influence of different factors across the representation. For instance, L2 regularization penalizes large weights, shrinking them toward zero and discouraging any single component from dominating. This pushes the model to distribute information across multiple components rather than relying on one dominant component. (Note that it is L1 regularization, not L2, that encourages actual sparsity, i.e. weights that are exactly zero.) And that’s why regularization is introduced here.
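To make the "spreading" intuition concrete, here is a tiny numerical sketch (the vectors and the penalty weight are made up for illustration, not taken from the lecture). Two latent codes carry the same total magnitude in the L1 sense, but one packs it into a single dominant component while the other distributes it; the squared-L2 penalty is strictly smaller for the distributed code, so gradient descent under this penalty favors spreading:

```python
import numpy as np

def l2_penalty(z, lam=1.0):
    # Standard squared-L2 regularization term: lam * sum_i z_i^2
    return lam * np.sum(z ** 2)

# Both codes have the same L1 norm (total magnitude = 2.0).
concentrated = np.array([2.0, 0.0, 0.0, 0.0])  # one dominant component
spread = np.array([0.5, 0.5, 0.5, 0.5])        # influence distributed evenly

print(l2_penalty(concentrated))  # 4.0
print(l2_penalty(spread))        # 1.0  -> spreading is cheaper under L2
```

So for a fixed amount of "information mass", the L2 term makes the concentrated encoding four times as expensive as the spread one, which is the sense in which it discourages a single dominant component.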

Hope you get what I’m trying to say. If not, feel free to post your queries.

Regards,

Nithin

Thanks so much Nithin. Makes sense