I have removed my last reply, since it contained some discrepancies, which were pointed out by Balaji. Let me write a new reply from scratch, incorporating the points Balaji conveyed. We are going to consider this as the reference lecture video.
First of all, let me begin by quoting some excerpts from this lecture video:
You’re already familiar with the image classification task where an algorithm looks at this picture and might be responsible for saying this is a car. So that was classification.
… classification with localization. Which means not only do you have to label this as say a car but the algorithm also is responsible for putting a bounding box, or drawing a red rectangle around the position of the car in the image. So that’s called the classification with localization problem. Where the term localization refers to figuring out where in the picture is the car you’ve detected.
… detection problem where now there might be multiple objects in the picture and you have to detect them all and localize them all.
So in the terminology we’ll use this week, the classification and the classification with localization problems usually have one object. Usually one big object in the middle of the image that you’re trying to recognize or recognize and localize. In contrast, in the detection problem there can be multiple objects. And in fact, maybe even multiple objects of different categories within a single image.
After reading these excerpts, we can easily define some of the concepts:
- Classification → Determining the category for the object in a given image. This category can be referred to as “class” or “label”; the two terms are used interchangeably.
- Classification + Localization (or simply Localization) → Determining the category for the object and drawing a bounding box around it. For the bounding box, the model predicts the box coordinates, so this part is a regression task as well.
- Detection → Determining the categories and drawing the bounding boxes around all (multiple) the objects in a given image.
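To make the classification + localization output concrete, here is a minimal sketch of the kind of target vector such a model predicts. The exact layout below (object-presence flag, four box coordinates, one-hot class indicator) is an assumption made for illustration; the lecture series introduces a similar encoding later on:

```python
import numpy as np

# Hypothetical target vector for classification + localization,
# laid out as y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]:
#   p_c        -> is there an object at all? (1 or 0)
#   b_x, b_y   -> bounding-box centre (relative to image size)
#   b_h, b_w   -> bounding-box height and width
#   c_1..c_3   -> one-hot class indicator (e.g. car, cat, dog)
y = np.array([1.0, 0.5, 0.6, 0.3, 0.4, 1.0, 0.0, 0.0])

p_c = y[0]            # classification part: object present?
bbox = y[1:5]         # regression part: box coordinates
class_onehot = y[5:]  # classification part: which class?
```

The point of the split is visible in the slices: one vector carries both a classification target (`p_c` and `class_onehot`) and a regression target (`bbox`), which is why localization is both a classification and a regression task.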
To help us determine which of the previous tasks fall under the umbrella of single-task settings, and which ones fall under multi-task settings, let’s define some more concepts:
- Binary Classification → Determine whether the object belongs to a particular class or not. For example, “Cat or Not-cat classifier”, “Spam or Not-spam classifier”, etc.
- Multi-class Classification → Determines the class, from a set of classes, for the given object. Note that the image can contain an object belonging to only one of the given classes, not more than one. For example, “Cat vs Dog vs Horse classifier”, “1 or 2 or 3 or 4 or 5 classifier”, etc.
- Multi-label Classification → Determines the class(es), from a set of classes, for the given object(s). Note that the image may contain more than one object, and the different objects may belong to different classes.
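The practical difference between these three settings shows up in the output activation: one sigmoid, a softmax, or one sigmoid per class. A minimal numpy sketch (the logit values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])  # hypothetical scores for 3 classes

# Binary: a single score through a sigmoid -> P(cat) vs P(not-cat)
p_binary = sigmoid(2.0)

# Multi-class: softmax over all scores -> probabilities sum to 1,
# and exactly one class is chosen (the argmax)
p_multiclass = softmax(logits)

# Multi-label: an independent sigmoid per class -> probabilities need
# not sum to 1; every class above a threshold is predicted
p_multilabel = sigmoid(logits)
predicted_labels = p_multilabel > 0.5
```

Notice that in the multi-label case more than one entry can exceed the threshold at once, which is exactly what detection needs when several objects of different categories appear in the same image.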
Now, let me quote an excerpt from Wikipedia as well, for our reference:
Further examples of settings for MTL (Multi-task learning) include multiclass classification and multi-label classification.
Now, let’s determine which of the aforementioned tasks are single-task and/or multi-task settings:
- Classification - Based on which type of classification we are doing, it can be either of them.
- Classification + Localization - It’s a single-task setting only if we use 2 different NNs to perform the individual tasks and the classification is binary; otherwise, it’s a multi-task setting.
- Detection - It’s a multi-task setting, since it employs multi-label classification.
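When localization is treated as a multi-task setting with a single network, the two tasks are usually trained together through one combined loss. A minimal sketch, assuming cross-entropy for the class and squared error for the box (the weighting factor `alpha` is an assumption; real systems tune it):

```python
import numpy as np

def multitask_loss(class_probs, true_class, bbox_pred, bbox_true, alpha=1.0):
    """Weighted sum of a classification loss and a box-regression loss."""
    cls_loss = -np.log(class_probs[true_class])      # cross-entropy term
    reg_loss = np.sum((bbox_pred - bbox_true) ** 2)  # squared-error term
    return cls_loss + alpha * reg_loss

# Usage: a perfect box prediction leaves only the classification term.
loss = multitask_loss(
    class_probs=np.array([0.9, 0.05, 0.05]),
    true_class=0,
    bbox_pred=np.array([0.5, 0.5, 0.3, 0.4]),
    bbox_true=np.array([0.5, 0.5, 0.3, 0.4]),
)
```

Because a single set of parameters is optimized against both terms at once, this is what makes the setting genuinely multi-task rather than two independent single-task problems.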
Well, that’s the entire crux. @balaji.ambresh, @canxkoz and @Charalampos_Inglezos, please do let me know if I have overlooked anything, or mentioned something incorrect. I would be more than happy to rectify it, so that this thread can help other learners as well.
P.S. - @Charalampos_Inglezos, I hope that this resolves your issue.