Why does backpropagation work for convolutional layers?

It seems intuitive that minimizing a cost function could result in some sort of feature extraction, even though the features themselves cannot be predicted ahead of time. With convolution, though, we expect the features to be of a very specific kind (e.g., edges).

Is there something inherent in the convolution function that guarantees that the only way to minimize cost is to find edges?

I’m hoping this question helps build my intuition, and I accept that it might expose some gaps in my understanding.

A convolution doesn’t necessarily find edges; that depends entirely on the filter.
The filter’s weights are learned by gradient descent so that the cost is minimized.
Often the result is a filter that detects edges, but that’s not known in advance, and it is not true in all situations. Nothing in the convolution operation itself guarantees edge detection; edges simply turn out to be useful features for many image tasks, so cost minimization often discovers them.
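To make this concrete, here is a minimal sketch (in NumPy, with made-up sizes and learning rate) of learning a convolution filter purely by gradient descent on a squared-error cost. The target here is deliberately a box blur, not an edge detector, to show that the learned filter is whatever minimizes the cost:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, k):
    """Valid 2-D cross-correlation (what deep-learning 'convolution' usually is)."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# Target filter: a 3x3 box blur -- deliberately NOT an edge detector.
target_k = np.full((3, 3), 1.0 / 9.0)

img = rng.standard_normal((16, 16))
y = conv2d_valid(img, target_k)          # "ground truth" outputs

# Learn a filter from scratch by minimizing 0.5 * sum((pred - y)^2).
k = rng.standard_normal((3, 3)) * 0.1
lr = 0.001                               # assumed step size for this toy problem
for _ in range(2000):
    pred = conv2d_valid(img, k)
    err = pred - y
    # Backprop for a conv layer: the gradient of the loss w.r.t. the
    # kernel is itself a correlation of the input with the error signal.
    grad = np.zeros_like(k)
    for i in range(3):
        for j in range(3):
            grad[i, j] = np.sum(img[i:i + err.shape[0], j:j + err.shape[1]] * err)
    k -= lr * grad

print(np.round(k, 3))
```

Under these assumptions the learned filter should approach the blur target, because that is what the cost demands; swap the target for a Sobel kernel (or a real classification loss) and gradient descent would find something edge-like instead. The convolution machinery is the same either way.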