Ng mentioned that artificial data synthesis carries a potential risk: the NN may overfit to a specific subset of the synthesized data. He illustrated this with the example of synthesized car images, which may seem very real to humans but not to an NN.
And my question is: doesn’t the correct answer to question 10 totally ignore that fact?
I think it’s because in the lecture Andrew was talking about audio noise, while in the quiz you’re training an image network. I’m not 100% sure though, I’m also confused about this.
In the course video, Andrew described two cases. Both are different from the one we have in the quiz.
With audio, we can’t rely on human judgment, because humans can only detect sounds in a frequency range from about 20 Hz to 20 kHz. Human visual perception is totally different: humans are quite good at it.
The distribution of artificially generated car images is quite different from the distribution of real car images. At the same time, the distribution of real foggy images found on the Internet should be close to the distribution of images from the front-facing camera of a car.
In the quiz, we have quite a few foggy images, and their distribution should be close to the distribution of images from the front-facing camera of a car, so the answer options seem fine.
Aah, thanks for the answer, manifest! When I was watching that video I thought that human failure to detect different noises was a general thing, not just an example for audio, but it makes sense that for images it’s not as big of an issue.
I still can’t wrap my head around this. So the issue with the car audio is that we, as humans, couldn’t be sure that it was “random” enough?
For example, if we had the same 10k hours of clean speech and 1h of car noise, but we had an expert mechanic saying “this 1h of car noise encompasses most noises a car can make” would that reduce the risk of overfitting to that specific noise track? Assuming the process of creating the synthetic data didn’t introduce another systematic error.
Good to know that I was not the only one confused by the answer.
But I have a slightly different take on the process of augmentation. We can’t randomly mix pictures of fog and road to come up with a substitute for foggy images, can we? The lighting will be all wrong in the two images, so how do we objectively determine whether the images are “natural”? There are examples where images look unchanged to the human eye whereas an NN classifies them as two different things. This example is usually cited in the context of adversarial attacks, but I’m using it to ask whether we can be truly confident, as the question suggests, in the synthetic data.
Hi, thanks for this answer, but I find it not entirely satisfying. As far as I understand, Andrew was making the point in his lecture video that with only 1h of car noise, the neural net can start adapting itself to the limited variability of the noise. For us it sounds random enough, but for the neural net it’s too repetitive/limited. To be honest, the foggy images problem is, for me, very similar. If you only used a limited number of “foggy” filters, the neural net would, by the same argument as for the noise, fit to the characteristics of the fog pictures. I don’t think it has anything to do with how well the human ear can hear noise or how well the human eye can see. It’s more a question of whether the data is varied enough (random enough) to prevent the neural net from overfitting to it. So I am still confused by the answer that such a limited set of foggy pictures would be enough to use for the augmented fog data set. Is there some other reason I am missing?
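To put a rough number on the "too repetitive" point, here is a minimal Python sketch, assuming the 10k hours of clean speech and 1h of car noise discussed in this thread. The function names, the wrap-around cropping, and the `noise_gain` parameter are illustrative assumptions of mine, not anything taken from the lecture or the quiz.

```python
import random


def mix_with_random_noise_crop(clean, noise, rng, noise_gain=0.5):
    """Add a randomly positioned crop of `noise` to `clean`.

    Both inputs are lists of float samples. The crop wraps around the end
    of the noise track. Returns (mixed_samples, start_offset).
    noise_gain is an arbitrary illustrative value.
    """
    start = rng.randrange(len(noise))
    mixed = [
        c + noise_gain * noise[(start + i) % len(noise)]
        for i, c in enumerate(clean)
    ]
    return mixed, start


def average_reuse(clean_hours, noise_hours):
    """Average number of times each second of noise is reused when every
    second of clean speech is paired with some second of the noise track."""
    if noise_hours <= 0:
        raise ValueError("noise_hours must be positive")
    return clean_hours / noise_hours


# Toy data: silence as "clean speech" and a 4-sample "noise track".
rng = random.Random(0)
clean = [0.0] * 8
noise = [1.0, -1.0, 0.5, -0.5]
mixed, start = mix_with_random_noise_crop(clean, noise, rng)

print(average_reuse(10_000, 1))  # 10000.0
```

With these numbers, each second of the noise track ends up under roughly 10,000 different seconds of speech on average. Random offsets change where a crop starts, but they do not add any new noise material, which is the kind of repetition the net could adapt to.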
It might be true that if one was able to synthesize a convincing (to a human) mapping from “true image” to “foggy image” the data could be used. BUT the process proposed in the quiz is so oversimplified that - in my opinion - this would not lead to useful artificial data.
Adding fog to a scene is not the same as simply superimposing a cloud onto the original image, even if naively it might seem to work. In reality, depth perception is totally altered: beyond a certain distance it becomes impossible, for instance, to make out any features. This distance dependence is not included in any way in the proposed synthesis mechanism.
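The difference between the two approaches can be sketched for a single grayscale pixel. This is only an illustration under stated assumptions: the depth-aware version uses the standard atmospheric scattering model (transmission decaying exponentially with distance), it needs a per-pixel depth that the process in the quiz does not have, and the values of `fog_color`, `alpha`, and `beta` are arbitrary choices of mine.

```python
import math


def naive_fog(pixel, fog_color=0.8, alpha=0.5):
    """Uniform overlay: the same blend everywhere, regardless of distance."""
    return (1 - alpha) * pixel + alpha * fog_color


def depth_aware_fog(pixel, depth_m, fog_color=0.8, beta=0.05):
    """Atmospheric scattering model: I = J * t + A * (1 - t),
    with transmission t = exp(-beta * depth).

    beta (fog density) and fog_color are illustrative assumptions.
    """
    t = math.exp(-beta * depth_m)
    return pixel * t + fog_color * (1 - t)


dark_object = 0.1  # intensity in [0, 1]

# The naive overlay gives the same result for a near and a far object,
# because it never looks at depth.
near_naive = naive_fog(dark_object)
far_naive = naive_fog(dark_object)

# The depth-aware version keeps the near object mostly visible and
# pushes the far object almost entirely to the fog color.
near_depth = depth_aware_fog(dark_object, depth_m=5)
far_depth = depth_aware_fog(dark_object, depth_m=200)

print(near_naive, far_naive)
print(near_depth, far_depth)
```

In the depth-aware version the far object is essentially indistinguishable from the fog itself, which is the "cannot make out features beyond a certain distance" effect that a flat overlay does not reproduce.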