Why does the number of filters keep increasing as we go deeper in a CNN?

In many CNN architectures, I have observed that the number of filters (that is, the depth of each layer's output) keeps increasing as we go deeper into the network. Is there a rationale or intuition behind designing a CNN this way, or was it simply an empirical choice in the first architectures that the community then followed without bothering to change it? To me the second case seems much less plausible, and I think there must be some concrete reason for CNN designers to keep increasing (or at least keep constant) the number of filters as we go deeper in the network.
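To make the pattern concrete, here is a minimal sketch of what I mean. It assumes VGG-16-style stage widths (64, 128, 256, 512, 512) and a 224x224 input, with one 2x2 max-pool per stage. The names `STAGE_FILTERS` and `feature_map_shapes` are made up for illustration and do not come from any library.

```python
# Illustrative VGG-16-style stage widths (an assumed, representative example).
STAGE_FILTERS = [64, 128, 256, 512, 512]


def feature_map_shapes(input_size=224, stage_filters=STAGE_FILTERS):
    """Return (height, width, channels) after each stage.

    Assumes 'same'-padded convolutions (spatial size unchanged) followed by
    one 2x2 max-pool with stride 2 at the end of every stage.
    """
    shapes = []
    size = input_size
    for filters in stage_filters:
        size //= 2  # pooling halves each spatial dimension
        shapes.append((size, size, filters))
    return shapes


if __name__ == "__main__":
    for stage, (h, w, c) in enumerate(feature_map_shapes(), start=1):
        print(f"stage {stage}: {h}x{w} spatial, {c} filters")
```

Under these assumptions the spatial resolution shrinks at every stage while the number of filters grows or stays constant, which is the trend I am asking about.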

