C3 W2 Quiz Question 14

The question says “To recognize a stop sign you use the following approach: First, we localize any traffic sign in an image. After that, we determine if the sign is a stop sign or not. We are using multi-task learning. True/False?”

I answered True, but the grader marked it incorrect with the comment “Multi-task learning is about joining several tasks that can benefit from each other.” I thought that the image we are analyzing may contain multiple signs, so we have multiple tasks to solve at the same time, including detecting the stop signs. I understand the binary nature of “is it a stop sign or not?”, but we may have to deal with multiple signs per image at the same time. Why can’t multi-task learning be used here?

Hey @Charalampos_Inglezos,
That’s an interesting question indeed. I just revisited the lecture video once to answer this question better.

I believe that a “task” in “multi-task learning” here means recognizing a different type of object, not different instances of the same object. You may ask, how do I know this? If you watch the “Multi-task learning” lecture video, you will find that Prof Andrew keeps the label as a 4-dimensional binary vector, i.e., whether there is a car or not, whether there is a pedestrian or not, and so on. The label doesn’t contain how many cars there are, or even their bounding boxes.
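
To make that label format concrete, here is a minimal sketch of such a target vector, assuming one binary entry per object type and one logistic loss per entry. The task names other than car and pedestrian, and the helpers `make_label` and `multi_task_loss`, are my own assumptions for illustration, not code from the course.

```python
import math

# One binary entry per task; names other than car/pedestrian are assumed.
TASKS = ["pedestrian", "car", "stop_sign", "traffic_light"]

def make_label(objects_present):
    """Build a 4-dimensional binary label: 1 if the object type appears at all."""
    return [1 if task in objects_present else 0 for task in TASKS]

def multi_task_loss(y_true, y_prob, eps=1e-12):
    """Sum of per-task binary cross-entropy losses (one logistic loss per entry)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

# Three cars and one pedestrian still give a single 1 per object type.
label = make_label(["car", "car", "car", "pedestrian"])
print(label)  # [1, 1, 0, 0]
```

Note that the number of cars never enters the label, which is why several instances of the same object do not create several tasks.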

Now, as per the quiz question, we just need to determine whether there is a stop sign or not. There may be multiple traffic signs in a single image, out of which, there definitely could be more than one stop sign. But, we still have only one task to train for, i.e., “Determining whether there is a stop sign or not”. So, by definition, this is not an example of multi-task learning.

P.S. - In the 4th course of this specialization, you will learn about techniques that identify multiple instances of the same object in an image using a single network trained for a single task, so I believe “Why is this not an example of multi-task learning?” will make more sense to you then.

Let us know if this resolves your query.

Cheers,
Elemento

I have completed Course 4 as well, and yes, I think I understand it better now; you’re right.
Considering the image detection lectures in C4, what would be a multi-task problem in image detection? Isn’t image detection a multi-task problem itself by nature? Multiple objects multiple classes simultaneously, plus bounding boxes. For example what is multi-task in Course 3, in Course 4 seems to be just image detection with one ConvNet - one task with… multiple objectives?! Seems kind of counterintuitive.

Hello @Charalampos_Inglezos ,
Here are a few images to describe the differences between single task and multitask learning.

Multi-Task Learning: In multi task learning, especially in the field of Medical Imaging, two different losses are combined to do segmentation.


Single-Task Learning: One loss function is used to complete the task. Single U-Net for example.

Here are descriptions of Binary and Multi-class classification which should be observed differently from the terms above:

Binary Classification: is a type of supervised machine learning where the goal is to predict one of two possible outcomes.
The two possible outcomes are often referred to as classes, labels, or categories. For example, a binary classification model could be used to predict whether a customer will churn (cancel their subscription) or not. In this case, the two classes would be “churn” and “not churn”.

Multi-Class Classification: is a type of supervised machine learning where the goal is to predict one of more than two possible outcomes. The possible outcomes are often referred to as classes, labels, or categories.
For example, a multi-class classification model could be used to predict the breed of a dog. In this case, the possible classes would be the different breeds of dogs.

I hope the definitions of the terms above help you with figuring out your issue.
Please feel free to share a followup,
Regards,
Can Koz

I think the confusion is between multi-class and multi-label. What you describe in the second problem is multi-class because all the objects are instances of a single label, an animal, either a cat or dog etc.

But if I got it right, multi-task refers to multi-label, not only multi-class. For example in the self-driving car, the multi-task problem is to detect simultaneously cars, lanes, people, animals, trees, signs and everything else, which are multiple labels.

So the question becomes “is multi-task = multi-label or multi-task = multi-class?”

But I thought that classes is a different thing than labels. So we have 3 types of classification: 1) binary (1 label & 2 classes), 2) multi-class (1 label & multiple classes) and 3) multi-label (multiple labels and multiple classes). In the general image detection problem, the most common scenario is to recognize multiple objects of different type probably, which means multiple labels and hence multi-label. Does multi-task mean multi-label?

Hey @Charalampos_Inglezos,
I have removed my last reply, since there were some discrepancies in that, which were pointed out by Balaji. Let me re-write a new reply from scratch, in which I will add some of the things conveyed by Balaji. We are going to consider this as the reference lecture video.


First of all, let me begin by quoting some excerpts from this lecture video:

You’re already familiar with the image classification task where an algorithm looks at this picture and might be responsible for saying this is a car. So that was classification.

… classification with localization. Which means not only do you have to label this as say a car but the algorithm also is responsible for putting a bounding box, or drawing a red rectangle around the position of the car in the image. So that’s called the classification with localization problem. Where the term localization refers to figuring out where in the picture is the car you’ve detected.

… detection problem where now there might be multiple objects in the picture and you have to detect them all and localize them all.

So in the terminology we’ll use this week, the classification and the classification with localization problems usually have one object. Usually one big object in the middle of the image that you’re trying to recognize or recognize and localize. In contrast, in the detection problem there can be multiple objects. And in fact, maybe even multiple objects of different categories within a single image.


After reading these excerpts, we can easily define some of the concepts:

  1. Classification → Determining the category for the object in a given image. This category can be referred to as “class” or “label”, both are good.
  2. Classification + Localization (or simply Localization) → Determining the category for the object and drawing a bounding box around it. For the bounding box, it predicts the bounding box co-ordinates, and hence, it is a regression task as well.
  3. Detection → Determining the categories of, and drawing the bounding boxes around, all of the (possibly multiple) objects in a given image.

To help us determine which of the previous tasks fall under the umbrella of single-task settings, and which ones fall under multi-task settings, let’s define some more concepts:

  • Binary Classification → Determine whether the object belongs to a particular class or not. For example, “Cat or Not-cat classifier”, “Spam or Not-spam classifier”, etc.
  • Multi-class Classification → Determines the class from a set of classes for the given object. Note that the image can contain an object belonging to only one of the given classes, and not more than one. For example, “Cat vs Dog vs Horse classifier”, “1 or 2 or 3 or 4 or 5 classifier”, etc.
  • Multi-label classification - Determines the class(es) from a set of classes for the given object(s). Note that the image may contain more than one object, and the different objects may belong to different classes.
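
As a rough sketch of how these three settings usually differ at the output layer, assuming the common convention of a sigmoid for binary, a softmax for multi-class, and independent sigmoids for multi-label. The logits and the class order (cat, dog, horse) are made up for illustration.

```python
import math

def sigmoid(z):
    """Squash one score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, -1.0]  # hypothetical scores for cat, dog, horse

binary = sigmoid(logits[0])                  # P(cat) vs not-cat
multi_class = softmax(logits)                # mutually exclusive classes
multi_label = [sigmoid(z) for z in logits]   # each class judged independently

print(round(sum(multi_class), 6))  # 1.0
print(sum(multi_label) > 1)        # True: several labels can be likely at once
```

The softmax forces the classes to compete, while the independent sigmoids let several classes be present in the same image.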

Now, let me quote an excerpt from Wikipedia as well, for our reference:

Further examples of settings for MTL (Multi-task learning) include multiclass classification and multi-label classification.


Now, let’s determine which of the aforementioned tasks are single-task and/or multi-task settings:

  1. Classification - Based on which type of classification we are doing, it can be either of them.
  2. Classification + Localization - If we are using 2 different NNs to perform the individual tasks, and we are performing binary classification, then only it’s a single-task setting, otherwise a multi-task setting.
  3. Detection - It’s a multi-task setting, since it employs multi-label classification.

Well, that’s the entire crux. @balaji.ambresh, @canxkoz and @Charalampos_Inglezos, please do let me know if I have overlooked anything, or mentioned something incorrect. I would be more than happy to rectify it, so that this thread can help other learners as well.

P.S. - @Charalampos_Inglezos, I hope that this resolves your issue :nerd_face:

Cheers,
Elemento

Thanks for the post, @Elemento.

Here are a few points:

Multi-label classification - Determines the class(es) from a set of classes for the given object(s). Note that the image may contain more than one object, and the different objects may belong to different classes.

Feedback: Given an input image, output is a vector of label probabilities for that image. There is no object localization / detection involved here.

Classification + Localization - If we are using 2 different NNs to perform the individual tasks, and we are performing binary classification, then only it’s a single-task setting, otherwise a multi-task setting.

Feedback: Localization has to do with figuring out the bounding box coordinates. Localization+Classification is a problem where classification can be multi-class / binary / multi-label. We have a multi-task learning setting only if both classification and bounding box predictions share the same underlying network.
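
A toy sketch of what sharing the same underlying network can look like, assuming one shared feature extractor feeding a classification head and a bounding box head, trained with a single combined loss. All function names, the toy arithmetic, and the `box_weight` parameter are hypothetical stand-ins, not a real architecture.

```python
import math

def shared_trunk(image_features):
    """Stand-in for shared conv layers: both heads consume this output."""
    return [max(0.0, x) for x in image_features]  # ReLU as a toy extractor

def class_head(h):
    """Binary classification head: probability that an object is present."""
    return 1.0 / (1.0 + math.exp(-sum(h)))

def box_head(h):
    """Regression head: toy (bx, by, bh, bw) prediction from shared features."""
    mean = sum(h) / len(h)
    return [mean, mean, mean / 2, mean / 2]

def joint_loss(p, box, y, box_true, box_weight=1.0):
    """Cross-entropy for the class plus squared error for the box."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
    ce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    mse = sum((b - t) ** 2 for b, t in zip(box, box_true)) / len(box)
    return ce + box_weight * y * mse  # box error counts only if an object exists

h = shared_trunk([0.4, -0.2, 0.6])
loss = joint_loss(class_head(h), box_head(h), y=1, box_true=[0.5, 0.5, 0.2, 0.2])
print(loss > 0)  # True
```

Because both heads read `h`, a gradient from either loss term would update the shared trunk. With two separate networks there is no such shared part.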

Detection - It’s a multi-task setting, since it employs multi-label classification.

Feedback: Detection has to do with identifying multiple objects within the input image and classifying each object. Again, the classification part of the problem can be binary, multi-class or multi-label in this problem setting.

Multi-task does not mean multi-label.
You can do binary segmentation using multitask learning.

Thank you @Elemento, @balaji.ambresh for your detailed feedback!
So to come back to the quiz question, we have an image that perhaps contains multiple instances of road objects (pedestrians, cars, traffic lights, signs etc) and we need to classify them as stop signs or not. Since we possibly have multiple labels in the images, isn’t it a multi-task problem of multiple object detection with classification, even though we only care about “is it a stop sign or not” for each one of the objects we come across? Because what the algorithm has to do is probably to locate and identify each one of the objects simultaneously in case there are multiple objects for example, and tell us if there are any stop signs among them or not.

Here’s the current version of the quiz question:

To recognize a stop sign you use the following approach:
First, we localize any traffic sign in an image. After that, we determine if the sign is a stop sign or not. We are using multi-task learning. True/False?

The question states that we localize i.e. predict bounding box co-ordinates for ONLY 1 traffic sign within the input image. The input image might have 0+ traffic signs.

Once this is done, think of a function that takes the original image and the co-ordinates of the bounding box as inputs and returns the smaller image within the bounding box.

This smaller image is fed to another neural network to make a binary class prediction if the class is stop sign or not.
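
A minimal sketch of this two-stage pipeline, assuming the image is a nested list of pixel values and the box is given as (top, left, height, width). The functions `localize_sign` and `is_stop_sign` are hypothetical placeholders for the two separately trained networks; only `crop` does real work here.

```python
def crop(image, box):
    """Return the sub-image inside box = (top, left, height, width), in pixels."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]

def localize_sign(image):
    """Hypothetical stage-1 network: returns one bounding box (hard-coded here)."""
    return (1, 1, 2, 2)

def is_stop_sign(patch):
    """Hypothetical stage-2 network: toy rule standing in for a binary classifier."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels) > 0.5

image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
patch = crop(image, localize_sign(image))
print(patch)                # [[1, 1], [1, 1]]
print(is_stop_sign(patch))  # True
```

The two stages only communicate through the cropped patch, so in this sketch they share no parameters.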

This is the suggested correction of the quiz question (waiting for staff approval) to make things clear:

To recognize a stop sign you use the following approach:
First, we use a NN to predict bounding box co-ordinates around any one traffic sign within the input image. After that, we determine if the sign is a stop sign or not using a different NN. We are using multi-task learning. True/False?

@balaji.ambresh
“To recognize a stop sign you use the following approach:
First, we use a NN to predict bounding box co-ordinates around any one traffic sign within the input image. After that, we determine if the sign is a stop sign or not using a different NN. We are using multi-task learning. True/False?”

Does “We are using multi-task learning” refer to the first NN or the second NN? If it refers to the second one, then no; if it refers to the first one, then yes, because of the “any traffic sign” part. For our system as a whole, I would say yes, we use multi-task learning, since the localization stage detects multiple objects.

We draw only 1 box as part of the localization stage. How is it multi-task?

You say “First, we use a NN to predict bounding box co-ordinates around any one traffic sign within the input image”

Bounding box for any sign it detects inside the image. If we have 5 signs, then the system shall detect 5 different signs, localize them with bounding boxes and classify each one of them. How is it that we draw only 1 box in case there are multiple signs in the input image?

Please watch this video again. Andrew says that object localization predicts bounding box co-ordinates of 1 object.


In this case we don’t have only 1 object per image, we probably have multiple signs per image since in general on the road we can encounter multiple signs together. In the quiz question we have to deal with object detection, which localizes multiple objects (multiple bounding boxes) and classifies them (and stop signs is what we’re looking for).

You are correct. We do have to predict bounding boxes of all traffic signs.

Hope this clears things:

To recognize a stop sign you use the following approach:
First, we use a NN to predict bounding box co-ordinates around all traffic signs within the input image. After that, we determine if each of the traffic signs is a stop sign or not using a different NN. We are using multi-task learning. True/False?

And is the answer True or False? In the original quiz question it was False (and that’s why I started this thread, because I believed the grader’s False answer was incorrect). But after all the above, I think we have concluded that since it is a detection problem (multiple objects), we are dealing with multi-task learning, so is it True now? Does the sentence “We are using multi-task learning” refer to the first NN or the second NN? If it refers to the first one, it is multi-task; if it refers to the second one, then it is not multi-task. As a combined system, the total NN (which consists of the two NNs) is using multi-task learning.

You are correct IMO. The words different NN are missing when it comes to classification side of the problem. Guess we’ll have to wait for the staff to share their take on this.

Thanks for bringing this up btw.