The problem is that your cost value ends up being a 3 x 3 matrix in the first test case and a 4 x 4 matrix in the second. So how could that happen? Note that in the first test case, both Y and A are 1 x 3 vectors, so Y^T is 3 x 1. If you dot 3 x 1 with 1 x 3, you get a 3 x 3 matrix, right? I don’t think that is what you want for the cost, which should be a single scalar value.
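To see the shape issue concretely, here is a minimal NumPy sketch. The values in `Y` and `A` are made up for illustration (only the 1 x 3 shapes match the first test case), and this is not the full cost formula, just the dot product in both orders:

```python
import numpy as np

# Hypothetical values; only the (1, 3) shapes mirror the first test case.
Y = np.array([[1, 0, 1]])
A = np.array([[0.8, 0.2, 0.6]])

# (3, 1) dot (1, 3): every element of Y times every element of A.
outer = np.dot(Y.T, A)

# (1, 3) dot (3, 1): multiply matching elements and sum them.
inner = np.dot(Y, A.T)

print(outer.shape)  # (3, 3)
print(inner.shape)  # (1, 1)
```

Transposing the second operand instead of the first collapses the sample dimension, so the result is 1 x 1, which you can then reduce to a scalar (for example with `np.squeeze`).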
Here’s a previous thread that works through some examples that are relevant to the problem here.