What about all CNNs? Are they all one-to-one architectures?
I think no one answered this question because it belongs in Course 4, but you posted it in Course 5. (CNNs and YOLO are both covered in Course 4.)
It’s an interesting question, but I think it mostly comes down to what “one to one” and “one to many” actually mean. My take is that the way Prof Ng uses those terms, they are specific to Sequence Models. The whole point of RNNs is that “one sample” is a sequence. That is not true for CNNs or the FC networks we learned about in Courses 1 and 4.
That said, YOLO does differ from simpler classification systems in that it takes one input (an image) and produces many distinct outputs: potentially many bounding boxes, each with object classification probabilities. So you could call it a “one to many” architecture, but I don’t think the concept means the same thing in the CNN case. The network takes in one sample and produces one “classification”; that output just happens to be complicated, with a lot of elements in it, rather than a simple “yes/no” answer. Is that the same thing as an RNN that takes in an English sentence with 5 words and outputs a French sentence with 7 words? Hmmmm, I’m not sure that’s really getting at the same concept.
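To make the contrast concrete, here is a minimal sketch in Python. Both functions are hypothetical stand-ins, not real models, and the sizes (a 608x608 image, a 19x19 grid, 5 anchor boxes, 80 classes) are assumptions picked only for illustration. The point is that the YOLO-style output is one tensor whose shape is fixed by the architecture, while the sequence model's output is a sequence whose length is not tied to the input length.

```python
import numpy as np

# Illustrative sizes only (assumptions, not taken from this thread).
GRID = 19      # grid cells per side
ANCHORS = 5    # anchor boxes per grid cell
CLASSES = 80   # object classes


def yolo_like_output(image):
    """Hypothetical stand-in for a YOLO-style CNN.

    One image in, one tensor out. The output shape depends only on the
    constants above, never on what the image contains.
    """
    rng = np.random.default_rng(0)
    # Per anchor: 1 objectness score + 4 box coordinates + class scores.
    return rng.random((GRID, GRID, ANCHORS, 5 + CLASSES))


def translate_like_output(source_tokens):
    """Hypothetical stand-in for a sequence-to-sequence RNN.

    Toy rule, not a real translator: emit two more tokens than were read,
    just to show that output length can differ from input length.
    """
    return [f"tok{i}" for i in range(len(source_tokens) + 2)]


image = np.zeros((608, 608, 3))
out = yolo_like_output(image)
print(out.shape)  # (19, 19, 5, 85)

sentence = "the cat sat down quietly".split()
translated = translate_like_output(sentence)
print(len(sentence), len(translated))  # 5 7
```

In the first case "many" just describes how many numbers are packed into a single prediction; in the second it describes a sequence of steps, which is what the "one to many" and "many to many" labels are about in the Sequence Models course.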