Course 3 Week 2 Quiz

Hi Paul
I am just going through this quiz for the first time and am really struggling with the wording of the questions and with the reasoning behind what counts as a right answer.
A typical example is question 9, where the question explains at length that 7.2% of the overall dev set corresponds to 47% of the dev set error, yet the three options offered then refer to 7.2% of the dev set error or of the test set error.
The question also says “occluded” (meaning blocked from view) when it clearly means ‘obscured by the sun’.
There are several other questions in the quiz that don’t make real sense, or where the answers offered don’t really connect with the question.
Can you please discuss this with the quiz-setting team? At the moment I feel that the best way to answer both of the quizzes on this course is by random selection.
Maybe the real purpose of the quiz is to get us to code a model that deep learns the optimum answers :laughing:


Hello @Ian_Proffitt ,
I am sorry you got no public answer for a while, but note that Paul is just one mentor and cannot cover every question (though I see him being super-active trying to do so).
Is your concern still relevant? Or have you already discussed this in private messaging?

I noticed several confusing questions as well:

  1. There’s a question near the end of the exam which refers to “Strategy A” instead of “Strategy B” (from the previous question). However, the previous question does not mention an “A” or a “B”. (This question comes after the one about cattle crossing the road.)

  2. Another question begins with a table containing 2%, 2.3%, 1.3%, 1.1% error (for training, training-dev, dev, and test errors). “Based on the information given you conclude that the Bayes error for the dev/test distribution is higher than for the train distribution. True/False?”
    Please double-check that the grader has the correct answer for this question: when I answer it “incorrectly”, the feedback it provides contradicts what the grader itself considers the correct answer.

  3. Another question: “One of your colleagues at the startup is starting a project to classify road signs as stop, dangerous curve, construction ahead, dead-end, and speed limit signs. He has approximately 30,000 examples of each and 30,000 images without a sign. This task could benefit from using multi-task learning. True/False?” I might be missing the point of this question, but it seemed important to know whether these new images can contain more than one object per image, or only one. (From the wording of the question, I assumed that each image contains at most one street sign.)
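On point 2 above, it may help to write out the gap decomposition from the lectures using the percentages quoted in that question. This is just a sketch of the arithmetic, not the grader's official rationale; the signs of the gaps are what matter:

```python
# Error-gap decomposition from the Course 3 lectures, applied to the
# percentages quoted in the quiz question above (point 2).
train_err     = 2.0   # training error (%)
train_dev_err = 2.3   # training-dev error (%)
dev_err       = 1.3   # dev error (%)
test_err      = 1.1   # test error (%)

variance_gap = train_dev_err - train_err   # ~0.3: variance
mismatch_gap = dev_err - train_dev_err     # ~-1.0: data mismatch gap, negative!

# A negative data-mismatch gap means the model does *better* on dev-
# distribution data than on held-out training-distribution data, which
# suggests the dev/test distribution is easier -- i.e. its Bayes error
# is plausibly *lower* than the training distribution's, not higher.
print(variance_gap, mismatch_gap)
```

If the grader's feedback implies the opposite conclusion, that would indeed look like an answer-key inconsistency worth reporting.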

In addition, there was another question which was graded correctly, but students might complain it is tricky to answer because the test also wants us to justify the reason for the answer (it’s not enough just to select True or False). The student also has to agree with the reasoning stated in the answer (e.g. “since 4.1 + 3.0 + 1.0 = 8.1 > 7.2”). I learned to read these questions more carefully.

There was a question (2 or 3) in the quiz that asked about the activation function for multi-task learning. In the video, Prof. Andrew Ng only mentioned the difference between multi-task learning and the softmax activation function; I didn’t see him mention which activation function multi-task learning uses. What is the answer to this question? And can I have a brief explanation of why that activation function is used?

I believe he mentioned that you don’t want to use softmax, because you want each output to be able to be 0 or 1 independently of the others. Softmax would scale all of the outputs so they sum to 1.
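A minimal numpy sketch of that difference (the logits and label names below are made up for illustration):

```python
import numpy as np

def softmax(z):
    # Softmax couples all outputs: they always sum to 1,
    # so two labels can never both be near 1 at the same time.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    # Sigmoid scores each output independently in (0, 1),
    # so several labels can be "on" at once -- what you want
    # for multi-task (multi-label) outputs.
    return 1.0 / (1.0 + np.exp(-z))

# Made-up logits for 4 labels, e.g. (pedestrian, car, stop sign, traffic light)
z = np.array([3.0, 2.5, -1.0, 2.8])

print(softmax(z))   # sums to 1, forcing a single dominant label
print(sigmoid(z))   # three labels can all be confidently present
```

With the same logits, softmax gives no label a score above 0.5, while sigmoid lets three labels each score above 0.9 independently.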

ReLU is not used as an output activation - only for hidden layers.

Yes, there were 2 questions: one just asked whether we should use softmax for the output layer, and the other gave 4 options (sigmoid, linear, …) and asked which activation function to choose for the output layer, which confused me since the multi-task learning video did not say anything about that.