This is the video: https://www.coursera.org/learn/neural-networks-deep-learning/lecture/rz9xJ/why-deep-representations

At minute 8, Andrew gives the example of an n-fold XOR computed by a neural network. He states that if you used only 1 hidden layer, that layer would need on the order of 2^N units. Can someone maybe explain the reasoning behind that? So you have n inputs in layer 0, going to 2^N hidden units in layer 1, and then 1 unit in layer 2.

But why is that?

PS I understand that the truth table of that n-fold XOR would have 2^n rows, since those are all the possible input combinations, but why/how that translates to the neural network, I find hard to understand.

This point is really not that big a deal in the grand scheme of things. He just mentions it in passing and then never refers to these ideas again. So it's not really worth spending too much mental energy on this, but with that said:

You've got the math; the question is just what he means by the single-hidden-layer XOR network. You connect all inputs to all hidden neurons. Each "node" effectively matches the inputs against one specific combination, so you need one node per possible combination, 2^n in total. For any given input, exactly one of them will match. You can set things up so the matching node outputs 0, or invert the logic so that each node detects the NOT of its pattern and outputs 1 as the criterion for "yes". I guess the latter is more intuitive.
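To make this concrete, here is a minimal sketch in Python using hand-set threshold (step) units rather than trained weights. The function name `make_xor_net` and the ±1 weighting scheme are my own illustration, not from the lecture: each hidden unit fires only on one specific input pattern, and the output unit ORs together the units whose pattern has odd parity.

```python
from itertools import product

def step(z):
    """Heaviside threshold unit: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def make_xor_net(n):
    """Build a 1-hidden-layer threshold network computing n-fold XOR,
    using one hidden unit per input pattern (2^n units total)."""
    patterns = list(product([0, 1], repeat=n))

    def forward(x):
        hidden = []
        for p in patterns:
            # Weight +1 on the pattern's 1-bits, -1 on its 0-bits.
            w = [1 if bit else -1 for bit in p]
            k = sum(p)  # number of 1-bits in the pattern
            # Pre-activation peaks at k exactly when x == p,
            # so thresholding at k - 0.5 makes this unit a pattern detector.
            z = sum(wi * xi for wi, xi in zip(w, x)) - (k - 0.5)
            hidden.append(step(z))
        # Output unit: OR over the hidden units whose pattern has odd parity.
        z_out = sum(h for h, p in zip(hidden, patterns) if sum(p) % 2 == 1) - 0.5
        return step(z_out)

    return forward

# Sanity check: the network agrees with parity on every 3-bit input.
net = make_xor_net(3)
for x in product([0, 1], repeat=3):
    assert net(list(x)) == sum(x) % 2
```

The key point is that this construction cannot get away with fewer detectors: each hidden unit only "knows" one row of the truth table, so you pay 2^n width for having just one hidden layer.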


Ah, thank you. Thanks to your explanation I made a simple drawing of a 3-input, 8-node hidden layer network and made the connection. So say the truth table has x1, x2, x3 as columns, and 8 rows of combinations. Then each row in the truth table just "corresponds" to one of the nodes in your network. I just missed that. Thanks! I know this was a detail of a detail in the grand scheme of things, but until now I understood everything, so I didn't want to miss this one. Cheers.