This is the video: https://www.coursera.org/learn/neural-networks-deep-learning/lecture/rz9xJ/why-deep-representations
At minute 8, Andrew gives the example of an n-fold XOR computed by a neural network. He states that if you used only 1 hidden layer, that layer would need 2^N units (or rather 2^(N-1)). Can someone maybe explain the reasoning behind that? So you have n inputs in layer 0, going to 2^N hidden units in layer 1, and then 1 unit in layer 2.
But why is that?
PS: I understand that the truth table of that n-fold XOR would have 2^n rows, since those are all possible input combinations, but why/how that translates to the neural network, I find hard to understand.
This point is really not that big a deal in the grand scheme of things. He just mentions it in passing and never refers to these ideas again, so it's not worth spending too much mental energy on. But with that said:
You've got the math; the question is just what he means by the single-hidden-layer XOR network. You connect all inputs to all hidden neurons. Each "node" effectively XORs the inputs against one specific combination, so you need one node per possible combination. For a given input, only the node whose combination matches will output all 0s, right? Or you can invert the logic: compare against the NOT of each pattern in each node and use 1 as the criterion for "yes". I guess the latter is more intuitive.
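To make the "one node per combination" idea concrete, here's a minimal sketch with NumPy and step-activation units (my own illustration, not code from the course). Each hidden unit is wired so it fires only when the input exactly matches its assigned pattern; wiring up only the odd-parity patterns gives the 2^(N-1) count, and the output unit just ORs the detectors together:

```python
import numpy as np
from itertools import product

def step(z):
    # Threshold activation: 1 if z >= 0, else 0
    return (np.asarray(z) >= 0).astype(int)

def build_shallow_parity(n):
    """One hidden layer with 2^(n-1) units, one per odd-parity input pattern."""
    patterns = [np.array(p) for p in product([0, 1], repeat=n)]
    odd = [p for p in patterns if p.sum() % 2 == 1]
    # Detector for pattern p: weight +1 where p_i = 1, weight -1 where p_i = 0,
    # bias -sum(p). The pre-activation is 0 only when x == p, negative otherwise.
    W = np.array([2 * p - 1 for p in odd])
    b = -np.array([p.sum() for p in odd])
    return W, b

def parity_net(x, W, b):
    h = step(W @ x + b)        # at most one detector fires, iff x has odd parity
    return step(h.sum() - 1)   # output 1 if any detector fired (an OR gate)
```

For n = 3 this builds 4 hidden units (the 2^(3-1) odd-parity rows); running `parity_net` over all 8 inputs reproduces the 3-fold XOR truth table.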
Ah, thank you. Thanks to your explanation I made a simple drawing of a 3-input, 8-node hidden layer network and made the connection. So, say the truth table has x1, x2, x3 as columns and 8 rows of combinations. Then each row of the truth table just "corresponds" to one of the nodes in your network. I just missed that. Thanks! I know this was a detail of a detail in the grand scheme of things, but until now I understood everything, so I didn't want to miss this one. Cheers.
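The row-to-node mapping for the 3-input case can be enumerated in a couple of lines (just an illustration of the correspondence, with node indices of my own choosing):

```python
from itertools import product

# For n = 3, the truth table has 2^3 = 8 rows; give each row its own hidden node.
rows = list(product([0, 1], repeat=3))
for node, row in enumerate(rows):
    print(f"node {node}: detects input {row}, parity = {sum(row) % 2}")
```

The 4 nodes whose rows have odd parity are the ones the XOR output actually needs, which is where the 2^(N-1) figure comes from.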