Preface: We didn't have to deal with this in this assignment, but I am presuming the initialization of the weights here happens the same way as in Course 2? (I.e. small and random, not all zeros, to avoid the symmetry problem.)
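For concreteness, here is roughly what I picture, a minimal NumPy sketch with made-up shapes (not the assignment's actual code):

```python
import numpy as np

# Hypothetical shapes for a first conv layer: 3x3 filters, 3 input
# channels (RGB), 8 output channels (not the assignment's values).
f, n_C_prev, n_C = 3, 3, 8

# Small random weights break symmetry: each filter starts from a
# different point, so gradients can push them toward different features.
W = np.random.randn(f, f, n_C_prev, n_C) * 0.01

# Biases can safely start at zero once symmetry is broken by W.
b = np.zeros((1, 1, 1, n_C))
```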
Throughout the lecture it is rather clear what is happening when Andrew uses explicit filters (i.e. horizontal and vertical edge detectors). Perhaps I even had this question in the back of my mind during the first few courses, but now it comes into focus since we can literally 'see it'. It particularly stands out here because convolutions and pooling serve almost as a kind of 'compression'.
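To make "explicit filters" and the "compression" concrete, here is a small sketch of the lecture's hand-designed vertical edge filter applied to a toy image of my own making (note that DL-style "convolution" is really cross-correlation, hence correlate2d):

```python
import numpy as np
from scipy.signal import correlate2d

# The hand-designed vertical edge filter from the lecture.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# Toy 6x6 image: bright left half, dark right half.
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

# 'valid' cross-correlation shrinks 6x6 to 4x4 (the 'compression'),
# and the large values mark the vertical edge down the middle.
edges = correlate2d(image, vertical_edge, mode='valid')
print(edges)  # columns read [0, 30, 30, 0] in every row
```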
And moving away from explicit filters, the idea is to 'let the network figure it out'.
What I am having a hard time wrapping my mind around, though, especially for the first few layers, is this: where is our assurance that the features the network picks up on are actually useful? Granted, overall we are trying to minimize the cost function, though I guess that also assumes there even is a smooth function over the images we are considering?
Sorry if I am not expressing this well (which is why it is a question/confusion), but let's say the first layer picks up on 'something' that minimizes cost for a time, yet in the end turns out not to be the most defining feature of the image; now all the deeper layers are dependent on it.
How do we kick ourselves out of that 'feedback loop'?
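To make the worry concrete: as far as I understand, the thing that is supposed to kick us out is that backprop keeps sending gradients all the way down to the first layer on every step, so early filters are never actually frozen in. A minimal TensorFlow sketch (my own toy model, not the assignment's) of what I mean:

```python
import tensorflow as tf

# A tiny two-conv-layer model; shapes and sizes are made up.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, activation='relu', input_shape=(8, 8, 1)),
    tf.keras.layers.Conv2D(4, 3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

x = tf.random.normal((2, 8, 8, 1))  # dummy batch of 2 images
y = tf.constant([[0.0], [1.0]])     # dummy labels

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, model(x)))

# The gradient reaches every layer, including the first conv layer's
# filters, so 'early' features keep being revised on every update.
grads = tape.gradient(loss, model.trainable_variables)
print(tf.reduce_max(tf.abs(grads[0])))  # nonzero gradient on layer-1 W
```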