C3 W2 Quiz Question 15 Bug - No point given

The question is: "An end-to-end approach doesn't require that we hand-design useful features, it only requires a large enough model. True/False?"

Obviously the first half is correct, but the second half is False because of the word "model". I believe the correct word would be "dataset". I answered False for that reason, but it was graded 0/1 points with the explanation: "This is one of the major characteristics of deep learning models, that we don't need to hand-design the features." I obviously agree with that part, but the word "model" is not correct; for the statement to be True it should say "dataset". "Model" refers to something else (layers, units, etc.), not to training data. The theory says that a large dataset is required for end-to-end learning, not a large model. For example, if we have many layers and many units per layer but only a few hundred training examples, end-to-end would not work, even though we would have a large network/model.

Is there any way for this to be fixed on the spot and the quiz be automatically regraded? This is my last attempt for today (the third one) and I don’t want to wait 24 hours again. I can attach screenshot to mod/mentor if necessary.

Hi Charalampos_Inglezos,

At 0:24-0:37 in the video "Whether to use End-to-end Deep Learning", Professor Ng states:
“So if you have enough X,Y data then whatever is the most appropriate function mapping from X to Y, if you train a big enough neural network, hopefully the neural network will figure it out”.
So it’s not just about having enough data, it’s also about having a big enough model.

I believe the critical prerequisite here is not only the size of the model, but mostly the amount of data provided. For example, if we have many layers and many units per layer but only a few hundred training examples, end-to-end would not work, even though we would have a large network/model.

I strongly believe that the wording of the question is wrong at two points: 1) the word "model" (the correct one would be "dataset"), and 2) the word "only", since it implies that the amount of data provided is irrelevant as long as we have a large model/network.

Because of the above, I am asking for my submission to be regraded as correct, after fixing the question.

The point is that the requirement of a large enough model is not false. Replacing the word ‘model’ by ‘dataset’ would in fact be incorrect, because the large enough model is needed. The word ‘only’ can be interpreted to refer to the process between the input data and the output. So going from data to output, you do not need feature engineering, you only need a large enough model.

I thought I answered exactly that with my previous comment: if we have many layers and many units per layer but only a few hundred training examples, end-to-end would not work, even though we would have a large network/model. And the reason would be the small amount of data.

Also, everywhere in the lectures, the slides, and the PDFs of the material provided about end-to-end, what is constantly highlighted is the importance of a large dataset for end-to-end to work, not the nature of the network itself.

Is the general idea of the end-to-end lecture about a large model or a large dataset? What is the one thing Prof Ng would like us, the students, to learn about end-to-end? That end-to-end can actually work only if we have a large dataset, or that a large network alone is enough for it to work? I strongly believe that the large dataset is the main idea of the lecture, and this is what Prof Ng keeps telling us. Also, the word "only" seems open to different interpretations, and this alone should be enough to invalidate the question. I understood it as "a large model is the only thing required for an end-to-end approach" (I think the majority of students would read it the same way), while you understood it differently, in a way I disagree with.

A more general remark: what is very frustrating in the quizzes is the wording. One word can decide the grade. This does not evaluate the student's understanding of the material, but rather how observant they are. Here the text could simply state "large enough dataset" and everyone would be happy. Why complicate it in such a weird way?

Hello @Charalampos_Inglezos,

Reading your posts, I started to think about what I could try when I have a large model but not a large dataset:

  1. Data augmentation

  2. Transfer learning (in other words, initialization of my model's parameters to something known to be useful, given, of course, that my model and the transferred model share some common architecture)

  3. Regularization
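As a toy illustration of idea 1, here is a minimal NumPy-only sketch (not from the course materials; the function name `augment_with_flips` is my own) of how data augmentation can stretch a tiny dataset by appending horizontally flipped copies of each image:

```python
import numpy as np

def augment_with_flips(images, labels):
    """Double a small dataset by appending horizontally flipped copies.

    images: array of shape (n, height, width); labels: array of shape (n,).
    Flipping is label-preserving for most natural-image tasks, so each
    flipped image reuses its original label.
    """
    flipped = images[:, :, ::-1]  # mirror each image left-to-right
    return (np.concatenate([images, flipped], axis=0),
            np.concatenate([labels, labels], axis=0))

# A toy "dataset" of 3 tiny 2x2 images
X = np.arange(12, dtype=float).reshape(3, 2, 2)
y = np.array([0, 1, 0])

X_aug, y_aug = augment_with_flips(X, y)
print(X_aug.shape)  # (6, 2, 2) -- twice the original number of examples
```

Real pipelines combine several such label-preserving transforms (flips, crops, rotations, color jitter), but the principle is the same: more effective training examples without collecting more data.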

If I may quote from one of Andrew’s letters in The Batch (full text here),

Thanks to … I suspect that scaling up neural networks — these days, I don’t hesitate to train a 20 million-plus parameter network (like ResNet-50) even if I have only 100 training examples — has also made them more robust.

Data Augmentation is covered in Course 4, Transfer Learning in Courses 3 and 4, and Regularization is first introduced in Course 2, with more in subsequent courses. Some also suggested that a carefully designed training schedule can help; that definitely requires more reading, but the concepts of training schedules and optimization algorithms are covered in Course 2.

Among all of the above ideas, I think Regularization is the easiest to understand, because it combats overfitting, which can be a result of a small dataset plus a large model.
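To make that concrete, here is a minimal sketch (NumPy only; the function name and numbers are illustrative, not from the course) of L2 regularization: a penalty proportional to the sum of squared weights is added to the data loss, which discourages large weights and so restrains a big model trained on a small dataset:

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, lam):
    """Add an L2 penalty, lam * sum of squared weights, to a base loss.

    data_loss: the unregularized loss (a float).
    weights:   a list of weight arrays, one per layer.
    lam:       the regularization strength (lambda hyperparameter).
    """
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return data_loss + penalty

# Toy weights for a two-"layer" model
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([3.0])]

# Sum of squares = 1 + 4 + 0.25 + 0 + 9 = 14.25,
# so total loss = 0.25 + 0.01 * 14.25 = 0.3925
loss = l2_regularized_loss(0.25, weights, lam=0.01)
print(round(loss, 4))  # 0.3925
```

Larger `lam` pulls the optimum toward smaller weights, trading a bit of training-set fit for better generalization, which is exactly the trade a large model on a small dataset needs.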

A large dataset will certainly make life easier, but it seems to me that we cannot rule out the chance that a small dataset can survive when we need a large model to do something a small model can't. As a learner, I hope you are ready to dive into that content in Course 4, perhaps into some online articles or papers discussing experiences with small datasets, and, best of all, to experiment yourself!

Cheers,
Raymond

Yes, of course a large model is very important, but the end-to-end concept here in C3 W2 treats a large dataset (not a large model, alone or instead) as the critical point for end-to-end to work, and that is what the quiz question asks about.

Also, the "only" part of the question's formulation does not include data augmentation, regularization, etc. It refers to the number of layers and units, at least to the extent of my understanding and within the scope of Course 3.

@paulinpaloalto
What I realize is that the wording and formulation of the question are not proper, because they leave room for misinterpretation. You, the mentors, need to take into account what students will think when they see the question, not other senior deep-learning professionals.

I think we are all saying the same thing. Yes, of course a large network is very important and can be more effective in end-to-end, but then why, in Prof Ng's lecture video, is the network not pointed out, while the amount of data is constantly highlighted?

The easiest solution would be either to clarify this in the lectures or simply to reformulate the question by removing the word "only" and adding "instead" or something similar, conveying that a large network is one thing required instead of hand-picked features, but also that a large network is not the only requirement for end-to-end to work properly.

Update: @paulinpaloalto Did you delete your comment?

Also, trying the quiz again, I stumbled upon two other interesting questions about end-to-end. I think you will find the grader explanations very interesting; they essentially support what I was saying yesterday.

  1. Correct. To get good results when using an end-to-end approach, it is necessary to have a big dataset.
  2. My selected answer was about "having enough data to train a big neural network", and the grader said: "Correct. This might be the most important factor when deciding whether to use an end-to-end approach."