As explained in the 5th video of week 4, a shallower network may need exponentially more units in a single layer to compute the same function.

Taking the extreme example from the video, computing x1 XOR x2 XOR … XOR xn: why would a 1-hidden-layer network need on the order of 2^n units? What does each unit compute?
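For concreteness, here is a brute-force sketch of the usual 1-hidden-layer parity construction (my own illustration, not taken from the video): one hidden unit per odd-parity input pattern, so 2^(n-1) = O(2^n) units. I use exact pattern matching in place of threshold units purely for clarity.

```python
from itertools import product

def shallow_xor(bits):
    """1-hidden-layer 'network' for n-bit XOR (parity), written as plain
    Python so the unit count is explicit. Each hidden unit detects exactly
    one input pattern; the output unit fires if any odd-parity detector
    fires. (Exact matching stands in for threshold units here.)"""
    n = len(bits)
    # one hidden unit per odd-parity pattern: 2^(n-1) units in total
    odd_patterns = [p for p in product([0, 1], repeat=n) if sum(p) % 2 == 1]
    hidden = [int(tuple(bits) == p) for p in odd_patterns]  # unit activations
    return int(any(hidden)), len(odd_patterns)              # (output, unit count)

out, units = shallow_xor([1, 0, 1, 1])  # parity of 4 bits, 2^3 = 8 hidden units
```

So each hidden unit memorises one specific bit combination, which is why the count blows up exponentially with n.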

To my understanding of the example in the video, if a "shallow" network needs n units to compute the XOR function, a deep neural network can do the same computation with hidden layers of [n/2, n/4, n/8, …] units. In this way, the total number of required units is reduced compared to the original n units.

I don't think so. If we assume XOR units with two inputs, we would need n/2 + n/4 + n/8 + … + 1 = n − 1 units, which is still O(n) units.
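That pairwise-tree count is easy to verify with a short sketch (the function name and return values are mine): XOR adjacent pairs layer by layer, and count units and depth.

```python
def deep_xor(bits):
    """XOR tree with layers of roughly n/2, n/4, … two-input XOR units.
    Returns (result, total_units, depth) so the n-1 unit count and
    O(log n) depth can be checked directly."""
    layer, units, depth = list(bits), 0, 0
    while len(layer) > 1:
        # XOR adjacent pairs; with an odd count, the last element passes through
        nxt = [layer[i] ^ layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        units += len(nxt)
        if len(layer) % 2:
            nxt.append(layer[-1])
        layer, depth = nxt, depth + 1
    return layer[0], units, depth

result, units, depth = deep_xor([1, 0, 1, 1, 0, 1, 0, 0])
# for n = 8: 4 + 2 + 1 = 7 = n - 1 units, depth 3 = log2(8)
```

So the deep version still uses O(n) units; the saving relative to the shallow network is that n − 1 units in O(log n) layers replace the exponentially many units a single hidden layer would need.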

It seems that each unit in the shallow network represents a particular combination of bits, but I do not understand why we should compare a shallow and a deep network built from different kinds of units. Has anyone worked out the intuition?