Andrew says we need to examine all 2^n possibilities of the input features, and then adds that we technically need 2^(n-1) hidden units.

I can’t quite follow this. Since we are learning the parameters of the neural net, why do we need to examine all the possibilities? Does he mean we need more training examples? If so, why would the deep net work without looking at all the possibilities?

When he uses one layer, he wants to check x1 XOR against every other input up to xn, then x2 XOR against every other input, and so on. That is a matter of permutations and combinations: each element has to be checked against all the other elements, because with only 1 layer the network cannot build on intermediate results. This costs a lot of calculations, and the running time is bounded by the number of calculations, O(n^2), which is the maximum if we use 1 layer.

If we use many layers, we can compute the XOR of every two elements, then the XOR of every two of those results, and so on. That leads to fewer calculations, O(log(n)), so with more layers we can do more complex calculations than with one or 2 layers.

I hope I answered your question.

Please feel free to ask any questions.

Thanks,

Abdelrahman

Hi @AbdElRhaman_Fakhry, thanks for replying. This is what I got from your answer; correct me if I am wrong.

Okay, suppose we take the analogy of actually computing the XOR of all the inputs. Then in the deep network the depth of the tree (not the number of computations) will be O(log n), and the number of computations will be the total number of nodes in the tree, which is O(n), since the operation has to be applied to each and every input.

Since each unit in the deep network always takes an input array of size 2, the number of computations will be significantly lower.
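To make the tree picture concrete, here is a minimal sketch of the pairwise reduction, assuming each XOR of two values counts as one "unit" (in a real network each XOR would itself be a small sub-network of constant size). The function name `xor_tree` and its return format are my own, just for illustration.

```python
def xor_tree(bits):
    """XOR neighbouring values pairwise, layer by layer, until one value is left.

    Assumes a non-empty list of 0/1 values.
    Returns (parity, number_of_xor_units, depth_of_tree).
    """
    level = list(bits)
    units = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        # Each unit in this layer looks at exactly two values.
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])
            units += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # leftover value passes through unchanged
        level = nxt
        depth += 1
    return level[0], units, depth


for n in (4, 8, 16):
    parity, units, depth = xor_tree([1] * n)
    print(n, units, depth)
```

For these sizes the unit count comes out as n - 1 and the depth as log2(n), which matches the O(n) computations and O(log n) depth described above.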

In the single-layer network, by contrast, every hidden unit takes all the inputs x1 to xn, which is significantly costlier. But how does that come to O(2^n)?
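One way to see where an exponential count could come from is the sketch below. It is only an illustration under a strong assumption: each hidden unit is treated as a detector for one exact input pattern, and the output unit just ORs the hidden units together. With that restriction you need one hidden unit per input pattern that has an odd number of ones, and there are 2^(n-1) of those. The function names are hypothetical, and I am not claiming this is exactly the construction from the lecture.

```python
from itertools import product


def odd_parity_patterns(n):
    """All n-bit input patterns whose XOR (parity) is 1."""
    return [p for p in product((0, 1), repeat=n) if sum(p) % 2 == 1]


def shallow_xor(bits, patterns):
    """One hidden layer: hidden unit k fires only if the input equals patterns[k].

    The output unit is an OR over all hidden units.
    """
    x = tuple(bits)
    hidden = [int(x == p) for p in patterns]
    return int(any(hidden))


for n in (2, 3, 4, 5):
    patterns = odd_parity_patterns(n)
    print(n, len(patterns), 2 ** (n - 1))
```

The printed hidden-unit count doubles every time n grows by one, which is the exponential growth in question under this pattern-detector assumption.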

We are assuming that our network actually computes the XOR, in the hope that by giving it enough examples it will figure out that it has to compute the XOR of all the inputs.

Another doubt comes up with this: if our network is smart enough to figure out the XOR function, then why can’t it figure out that computing the XOR in just a single unit will suffice for our problem?