I really don’t get the point here. Can anyone help? Thanks!
I doubt it’s legitimate to use XOR as the node/function in the deeper network to illustrate the point. If XOR is allowed as a unit, I guess one could just as well use it as the single output unit of a one-layer network, in which case the 2^n - 1 hidden units wouldn’t be needed at all?
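To make the comparison concrete, here is a rough sketch of how I understand the two constructions. It is not taken from the original material. It assumes the deep network is restricted to 2-input XOR gates and the shallow one to a single hidden layer of AND-style units feeding an OR output. The names `parity_tree` and `parity_shallow` are made up for illustration.

```python
from itertools import product


def parity_tree(bits):
    """Parity of n >= 1 bits via a balanced tree of 2-input XOR gates.

    Returns (value, gates_used, depth).
    """
    layer = list(bits)
    gates = 0
    depth = 0
    while len(layer) > 1:
        nxt = []
        # Pair up neighbours; each pair costs one 2-input XOR gate.
        for i in range(0, len(layer) - 1, 2):
            nxt.append(layer[i] ^ layer[i + 1])
            gates += 1
        # An unpaired last element is passed through unchanged.
        if len(layer) % 2:
            nxt.append(layer[-1])
        layer = nxt
        depth += 1
    return layer[0], gates, depth


def parity_shallow(bits):
    """Parity via one hidden layer of AND units and a single OR output.

    Assumption: one hidden unit per input pattern with odd parity,
    each unit firing only when the input equals its pattern.
    Returns (value, hidden_units).
    """
    n = len(bits)
    patterns = [p for p in product((0, 1), repeat=n) if sum(p) % 2 == 1]
    hidden = [int(all(b == q for b, q in zip(bits, p))) for p in patterns]
    return int(any(hidden)), len(patterns)


if __name__ == "__main__":
    for n in (2, 4, 8):
        _, gates, depth = parity_tree([1] * n)
        _, units = parity_shallow([1] * n)
        print(f"n={n}: tree uses {gates} gates at depth {depth}, "
              f"shallow uses {units} hidden units")
```

Under these assumptions the tree needs n - 1 gates while the shallow version enumerates every odd-parity input pattern, so its hidden layer grows exponentially in n. If a single n-input XOR unit were permitted, neither construction would be necessary, which is exactly the concern raised above.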