It seems like the answer to the first quiz of week 1 of course 3 doesn't match the hint.
Maybe the answer is wrong?
In multi-label classification you can have multiple classes identified at the output of the model, but in multi-class only one class will be identified at the output. So when the question says "identify all", it means many labels at the output.
I'll just leave my two cents here as an intuitive description. Correct me if you find anything wrong.
First of all, if I had seen this question I would also have selected True. After a while I understood the question; it was really a language issue on my part.
Suppose you have a mini-batch from the training dataset. X is two rows of flattened images:
[[.1, .2, .3, .4, .4],
 [.1, .2, .3, .4, .4]]
Y is the two labels for them; suppose the images come from two classes, so:
[[1],
 [1]]
A possible Y_hat might look like this or similar:
[[0.97],
 [0.88]]
This is multi-class; it's like the classic coin-flip game.
max{p(img_class_1=1|v) + p(img_class_2=1|v)}
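As a minimal NumPy sketch of that setup (the weights W and bias b below are made up just to produce plausible numbers):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The two flattened images from above.
X = np.array([[.1, .2, .3, .4, .4],
              [.1, .2, .3, .4, .4]])

# Hypothetical weights and bias for a single sigmoid output unit.
W = np.full((5, 1), 0.5)
b = 2.0

Y_hat = sigmoid(X @ W + b)  # one probability per image, here ~[[0.94], [0.94]]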
On the other side, for multi-label:
The Y needs to be:
[[0, 1],
 [0, 1]]
A possible Y_hat might look like this or similar:
[[0.23, 0.97],
 [0.11, 0.88]]
This is multi-label. <------ This covers the statement: identify all the different items.
So Y_hat has two units, and they don't exclude each other; they are independent events. The algorithm (it looks like a sigmoid on each of the two units) is trying to get
max{p(img_class_1=1|v) * p(img_class_2=1|v)}
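A tiny NumPy sketch of that idea (the logits are made-up numbers, chosen to roughly reproduce the Y_hat above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits for two images and two independent label units.
logits = np.array([[-1.2, 3.5],
                   [-2.1, 2.0]])

Y_hat = sigmoid(logits)             # ~[[0.23, 0.97], [0.11, 0.88]]
joint = Y_hat[:, 0] * Y_hat[:, 1]   # product of the independent label probabilities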
The descriptions seem mostly right to me, but the equations don't seem right.
For multi-class the probabilities sum to 1, and you choose the maximum, not the max of a sum.
For multi-label each output can range (if, of course, you use sigmoid) from 0 to 1, and there should be no need to take any maximum, because every class is independent of the others; none excludes the others. This is my understanding.
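To make that contrast concrete, here is a small sketch (with made-up logits) of softmax-plus-argmax versus independent sigmoids:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for three classes/labels

# Multi-class: softmax normalizes the scores to sum to 1, then pick the argmax.
softmax = np.exp(logits) / np.exp(logits).sum()
predicted_class = np.argmax(softmax)        # exactly one class wins

# Multi-label: each sigmoid output is independent, so threshold each one separately.
sigmoids = 1.0 / (1.0 + np.exp(-logits))
predicted_labels = sigmoids > 0.5           # any subset of labels can be active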
Good to know.
I will take these questions with me as I continue to read some of the documentation. I'd really appreciate some feedback from you. Right or wrong, it's good to give advice. Thanks.
In real projects, as I have experienced, multi-label seems more useful, especially when there are different classes on the same object, e.g.
x: face photo; y: woman/man, with/without mask, long/short hair
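For such a face photo, the target is just a binary vector, one bit per attribute; the attribute names here are hypothetical:

# Hypothetical encoding: [is_woman, wears_mask, has_long_hair]
y = [1, 0, 1]  # a woman, no mask, long hair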
Definitely a good way to learn. I am also learning here, and when I give my opinion it may not always be correct, but it's a way of discussing points of view and delving deeper into the subject.
Hey, I quickly checked page 9 in http://cs229.stanford.edu/notes2020fall/notes2020fall/cs229-notes2.pdf ; it describes Naive Bayes, which addresses the story of multiple independent events.
I think for multi-label, the goal of the algorithm is to reach
eq := max{p(img_class_1=1|v) * p(img_class_2=1|v) * ...}
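One step worth making explicit: maximizing that product is the same as minimizing the sum of per-label negative log-likelihoods, i.e. the usual binary cross-entropy. A quick NumPy check, reusing the first row of the multi-label example above:

import numpy as np

y_true = np.array([0.0, 1.0])    # first row of Y from the multi-label example
y_hat  = np.array([0.23, 0.97])  # first row of Y_hat

# Probability the model assigns to the whole label vector,
# treating the labels as independent events.
p = np.prod(np.where(y_true == 1, y_hat, 1 - y_hat))

# Binary cross-entropy summed over the labels.
bce = -np.sum(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))

assert np.isclose(-np.log(p), bce)  # maximizing p == minimizing bce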
According to the homework solution (stanford-CS229/3_Gaussian_Discriminant_Analysis.ipynb at master · ccombier/stanford-CS229 · GitHub), Bayes' rule for just one event can, in other words, be derived into logistic form.
If multi-label is applied, say we have 3 labels, I would put 3 sigmoid units in the last layer, or let's say 3 logistic functions. Each unit tries to reach its max so that the eq above reaches its max, as in the sketch below.
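A minimal Keras sketch of that last layer (the input and hidden sizes are made up; only the 3-unit sigmoid output and the binary cross-entropy loss are the point):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),               # made-up input size
    tf.keras.layers.Dense(16, activation='relu'),    # made-up hidden layer
    tf.keras.layers.Dense(3, activation='sigmoid'),  # 3 independent logistic units
])

# Binary cross-entropy treats each of the 3 labels independently.
model.compile(optimizer='adam', loss='binary_crossentropy')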
WDYT? Maybe I am wrong; I just want to exchange thoughts on this.
No, their equation is definitely right, I cannot argue with that. It also makes sense, because independently each label's probability tends to become higher, so the product of them will be higher.