For MLP networks, random initialization barely affects the final parameter values after convergence (given sufficient iterations). I wonder whether random initialization affects the filter parameters after convergence in a CNN. For example, suppose the filter value [1, 0, -1] captures an important feature for a CNN: with one random initialization this value may appear in the 5th filter, but with another random initialization it may appear in the 10th filter.
The random initialization prevents all of the hidden units in a layer from learning the exact same features.
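To see why, here is a minimal NumPy sketch (illustrative only, with made-up data): if every weight in a layer starts at the same value, every hidden unit computes the same activation and receives the exact same gradient, so they can never differentiate into distinct features.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 samples, 3 inputs (toy data)
y = rng.normal(size=(8, 1))          # toy targets

# Symmetric (non-random) init: every hidden unit starts identical.
W1 = np.full((3, 4), 0.1)            # 4 hidden units, all the same
W2 = np.full((4, 1), 0.5)

h = np.tanh(X @ W1)                  # hidden activations: identical columns
out = h @ W2
grad_out = out - y                   # squared-error gradient at the output
grad_W1 = X.T @ ((grad_out @ W2.T) * (1 - h**2))

# Every hidden unit's weight vector gets the exact same update,
# so the units stay identical forever:
print(np.allclose(grad_W1[:, :1], grad_W1))
```

Random initialization breaks this symmetry, which is why each unit can end up specializing on a different feature.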
The NN cost function is not convex, so every random initialization may give slightly different final results.
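A toy illustration of that point (a hand-picked 1-D non-convex function, not an actual NN loss): gradient descent lands in different local minima depending on where it happens to start, which is exactly what different random initializations do.

```python
def grad(x):
    # Derivative of the non-convex function f(x) = x**4 - 3*x**2 + x,
    # which has two distinct local minima.
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    # Plain gradient descent from a given starting point.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_a = descend(2.0)    # one "initialization"
x_b = descend(-2.0)   # another "initialization"
print(round(x_a, 3), round(x_b, 3))   # two different local minima
```

Both runs converge (the gradient goes to zero), yet they stop at different points with different loss values, mirroring how different seeds can give slightly different final networks.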
The key is to assess whether a solution is “good enough”, not necessarily the absolute optimum.
The course covers how to decide what “good enough” means.
Yes, I think that can happen: the network learns how to recognize the features, but there is no guarantee that the same filter (in a CNN) or the same neuron (in an MLP) will learn any particular feature if you run the complete training more than once with different random initializations. Note that the courses here do something somewhat artificial: they always set the random seed to a specific value, so that the results are predictable. That is for ease of grading and of writing unit tests; in a real application you would not do that, and you would then have the chance to see the effect you are describing.
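One way to see why the position of a feature is arbitrary: permuting the hidden units of a layer (together with the matching output weights) gives a network that computes exactly the same function. A small sketch with made-up weights:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))                  # toy inputs
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

def forward(W1, b1, W2, b2):
    # 1-hidden-layer MLP: tanh hidden layer, linear output.
    return np.tanh(X @ W1 + b1) @ W2 + b2

perm = rng.permutation(4)                    # shuffle the 4 hidden units
out_orig = forward(W1, b1, W2, b2)
out_perm = forward(W1[:, perm], b1[perm], W2[perm, :], b2)

# Identical outputs: which unit holds which feature is interchangeable.
print(np.allclose(out_orig, out_perm))
```

Since any permutation of the units yields the same function, there are many equally good optima that differ only in where each feature sits, and the random seed decides which one training reaches. The same argument applies to the filters of a CNN layer.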