Hello @ajallooe,

That is because the “scalar chain rule” does not apply here.

This post has some references to wikipedia, and also an example.

In the example, step 1 to 3 is for setting up the example; step 4 shows an expected result we want to derive; step 5 assumed the chain rule applied but turned out unable to calculate the answer; step 6 does it with the right way which is not following our chain rule.

If you have accepted that the scalar chain rule does not apply but were looking for a different version of chain rule that applies, I am not familiar with that topic. However, as I have also quoted in that post

Matrix calculus refers to a number of different notations that use matrices and vectors to collect the derivative of each component of the dependent variable with respect to each component of the independent variable.

I think this is the key condition that you can base on to verify any rule that you make for the situation of your interest. The wikipedia page has listed many identities but if it does not cover what you are interested in, you might resort to this condition and try to establish some identities yourself. My example in that post was just trying to verify it that way.

Cheers,

Raymond