C3_W1 quiz 1 answer doesn't match the hint

It seems like the answer to the first quiz in week 1 of course 3 doesn’t match the hint.
Maybe the answer is wrong?

In multi-label classification the model can identify several classes at its output, whereas in multi-class classification only one class is identified at the output. So when the question says "identify all", it means many labels at the output.

@Po-Yu_Liang ,

I'll just leave my two cents here with an intuitive explanation. Correct me if you find anything wrong.

First of all, when I first saw this question I would also have selected True. After a while I understood the question; the confusion was really down to my own reading of the language.

Suppose you have a mini-batch from the training dataset:

X is two rows of flattened images:

[[.1,.2,.3,.4,.4],
  [.1,.2,.3,.4,.4]]

Y holds one label for each of them. Suppose the images fall into two classes, so

[[1],
  [1]]

A possible Y_hat might look like this, or similar:

[[0.97],
  [0.88]]

This is multi-class. It is the classic coin-toss game.

max{p(img_class_1=1|v) + p(img_class_2=1|v)}

On the other hand, for multi-label, Y needs to be

[[0, 1],
  [0, 1]]

A possible Y_hat might look like this, or similar:

[[0.23, 0.97],
  [0.11, 0.88]]

This is multi-label. <------ This covers the statement: identify all different items.
So Y_hat has two units that don’t exclude each other; they are independent events. The algorithm (it looks like a sigmoid on each of the two units) is trying to reach

max{p(img_class_1=1|v) * p(img_class_2=1|v)}

The descriptions seem mostly right to me, but the equations don’t.

For multi-class, the probabilities sum to 1 and you choose the class with the maximum probability, not the max of the sum.

For multi-label, each output can range from 0 to 1 (if, of course, you use sigmoid), and there should be no need to take any maximum, because every class is independent of the others: one class does not exclude another. This is my understanding.
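
To make the difference concrete, here is a minimal NumPy sketch. The logits are made-up numbers and the 0.5 threshold is just a common default, not something taken from the course. It only shows that softmax outputs sum to 1 and you pick one class with argmax, while sigmoid outputs are thresholded one by one.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability; the result is unchanged.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores (logits) for 2 images and 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 3.0, 2.5]])

# Multi-class: each row sums to 1, exactly one class is chosen per image.
multi_class_probs = softmax(logits)
predicted_class = multi_class_probs.argmax(axis=-1)

# Multi-label: each unit is an independent probability in (0, 1),
# so any number of labels can be switched on for the same image.
multi_label_probs = sigmoid(logits)
predicted_labels = (multi_label_probs >= 0.5).astype(int)  # assumed threshold

print(predicted_class)   # one index per image
print(predicted_labels)  # one 0/1 flag per image and label
```

Note that the sigmoid rows do not sum to 1, which is exactly why no maximum across classes is needed in the multi-label case.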

Good to know.

I will take these questions with me as I continue reading the documentation. I'd really appreciate your feedback. Right or wrong, advice is always welcome, thanks.

In real projects, in my experience, multi-label seems more useful, especially when several attributes apply to the same object, e.g.
x: face photo; y: woman/man, with/without mask, long/short hair
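
As a rough illustration of that face example, the target could be encoded as a multi-hot vector with one slot per attribute. The attribute names, the `encode`/`decode` helpers, and the prediction values below are all hypothetical, chosen only for this sketch.

```python
import numpy as np

# Hypothetical attribute order; each slot is an independent yes/no label.
ATTRIBUTES = ["woman", "with_mask", "long_hair"]

def encode(present):
    """Turn a set of attribute names into a multi-hot target vector."""
    return np.array([1 if name in present else 0 for name in ATTRIBUTES])

def decode(probs, threshold=0.5):
    """List the attributes whose predicted probability passes the threshold."""
    return [name for name, p in zip(ATTRIBUTES, probs) if p >= threshold]

# A woman with long hair and no mask.
y = encode({"woman", "long_hair"})

# Made-up sigmoid outputs for the same photo.
y_hat = np.array([0.91, 0.08, 0.77])

print(y)              # multi-hot target
print(decode(y_hat))  # attributes predicted as present
```

Several slots can be 1 at once, which a single multi-class label could not express.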


Definitely a good way to learn. I am also learning here, and when I give my opinion it may not always be correct, but it's a way of discussing points of view and delving deeper into the subject.

Hey, I quickly checked page 9 of http://cs229.stanford.edu/notes2020fall/notes2020fall/cs229-notes2.pdf . It describes Naive Bayes, which addresses the case of multiple independent events.

I think that for multi-label, the goal of the algorithm is to reach

eq := maxOf{p(img_class_1=1|v) * p(img_class_2=1|v) * .......}

According to the homework solution at stanford-CS229/3_Gaussian_Discriminant_Analysis.ipynb at master · ccombier/stanford-CS229 · GitHub,
Bayes' rule for a single event can be rewritten in logistic form.

If multi-label is applied with, say, 3 labels, I would put 3 sigmoid units in the last layer, in other words 3 logistic functions. Each unit tries to reach its maximum, so that eq reaches its maximum too.

WDYT? Maybe I am wrong, I just want to exchange views on this.
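
Here is a small sketch of that idea under one reading of eq: treat each of the 3 sigmoid units as an independent Bernoulli, so the joint probability of the observed labels is the product of the per-label probabilities (p where the label is 1, 1 - p where it is 0). The labels and logits are made-up numbers. Taking the negative log of that product gives the sum of per-unit binary cross-entropies, so pushing each unit toward its own target also pushes the product up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def label_likelihood(y, p):
    # Bernoulli probability of the observed label: p if y == 1, else 1 - p.
    return np.where(y == 1, p, 1.0 - p)

# Hypothetical targets and logits for one image with 3 labels.
y = np.array([0, 1, 1])
logits = np.array([-1.2, 3.5, 2.0])
p = sigmoid(logits)

# Joint likelihood, assuming the labels are independent given the input.
joint = label_likelihood(y, p).prod()

# Sum of binary cross-entropies over the 3 units.
bce_sum = -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()

# Improve only the third unit: the joint likelihood goes up as well.
better_logits = np.array([-1.2, 3.5, 4.0])
joint_better = label_likelihood(y, sigmoid(better_logits)).prod()

print(joint, np.exp(-bce_sum))  # the two values agree
print(joint_better > joint)
```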


No, their equation is definitely right, I can't argue with that. It also makes sense: independently, each label’s probability tends to become higher, so their product becomes higher too.