Deep learning and shallow learning

Could you please help me understand: I did not grasp why a shallower NN requires exponentially more units in the case of the XOR operation. As I see it, the XOR operation on n values requires n operations, not 2^n.
I will be grateful for an answer.

My best regards, Vasyl.


Hey @vasyl.delta,

Well, the reason shallow neural networks struggle with XOR operations lies in their inability to capture the non-linear relationships present in the XOR function efficiently.

The XOR (exclusive or) operation outputs true only when the number of true inputs is odd. In the case of a single XOR operation (two inputs), it’s relatively simple. However, when you extend it to multiple inputs, the complexity increases.
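To see why “an odd number of true inputs” is the right way to think about multi-input XOR, here is a minimal Python check (my own illustration, not from the course): chaining pairwise XORs is the same as testing parity.

```python
from functools import reduce
from operator import xor

bits = [1, 0, 1, 1]          # three 1s -> odd parity
print(reduce(xor, bits))     # 1: chained pairwise XORs
print(sum(bits) % 2)         # 1: equivalent parity check
```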

A single-layer perceptron (a shallow neural network) can only learn linear decision boundaries. The XOR operation, with its non-linear nature, cannot be learned by a single-layer perceptron. To handle the XOR operation with a shallow neural network, you would need exponentially many nodes in the single hidden layer to capture the non-linear combinations effectively.
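As a side note (my own sketch, not from the course), one classic hand-set construction shows why a single hidden layer is enough for the two-input case: with ReLU units, XOR(x1, x2) = relu(x1 + x2) - 2 * relu(x1 + x2 - 1).

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def xor2(x1, x2):
    """Two-input XOR with one hidden layer of two ReLU units."""
    h1 = relu(x1 + x2)        # counts the ones: 0, 1, or 2
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are 1
    return h1 - 2 * h2        # 0, 1, 1, 0 on the four input pairs

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor2(x1, x2))
```

The exponential blow-up only appears when you scale this idea up to n inputs, as discussed further down the thread.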

I hope this makes sense now.
Regards,
Jamal


Thank you so much for the answer!
Yes, now I see that implementing the non-linear XOR with a NN is not trivial.
It would still be interesting to know how this is done by shallow NNs, and where the 2^n complexity arises.


As Prof Ng says in the lecture you are referencing here, this example is just for intuition, and it comes from circuit theory, which is not really connected to neural networks. The little circles in the two diagrams represent logic gates that each perform some operation. On the left side of the diagram, each circle is an XOR operation. As Prof Ng comments, there is no such thing as an XOR gate, but you can create a small circuit using AND, OR and NOT gates to perform XOR on the two inputs. On the left, we see a binary tree, and the depth of a binary tree is O(log(n)).

Then think about what the “shallow” network on the right has to do: you only have one layer of gates, so each circle there is getting all n inputs and needs to recognize one specific combination and then “fire” to give the answer for that case. If you have n inputs and each one can be either 0 or 1, then how many possible combinations are there? Two choices for each of n inputs gives you 2^n possibilities, right?
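To make the counting concrete, here is a small Python sketch (my own illustration, not from the lecture) contrasting the two circuit shapes: the tree gets away with n - 1 pairwise XOR gates arranged in O(log n) layers, while the flat version needs one pattern detector per input combination, 2^n of them.

```python
from itertools import product

def xor_tree(bits):
    """Tree of pairwise XORs: n - 1 gates in O(log n) layers."""
    layer = list(bits)
    while len(layer) > 1:
        pairs = [layer[i] ^ layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:               # odd element passes through untouched
            pairs.append(layer[-1])
        layer = pairs
    return layer[0]

def xor_flat(bits):
    """One flat layer of "gates": one detector per pattern, 2^n in total.
    Each detector fires only on its own pattern, with the answer preloaded."""
    for pattern in product((0, 1), repeat=len(bits)):   # 2^n detectors
        if tuple(bits) == pattern:                      # this detector "fires"
            return sum(pattern) % 2                     # preloaded XOR answer

for bits in product((0, 1), repeat=4):
    assert xor_tree(bits) == xor_flat(bits) == sum(bits) % 2
```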

Prof Ng never mentions circuit theory again, at least in any of the DLS courses, so this is sort of a “throwaway” example just meant to give intuition: trees of layers are more powerful than flat layers. By using “tree-like” architectures with layers that feed forward into the next layer, you can compute an equally powerful function with far fewer nodes (neurons in our context, or logic gates in the circuit theory example).


Why does the right diagram need to enumerate all the possible combinations, but not the left diagram? For example, on the left one, why don’t x1 and x2 point to two nodes for 0 and 1? I don’t get the inconsistency.

The point is that in the “tree” diagram, one XOR is performed at each “node” between just its two inputs. Of course, after the first layer, those inputs are the compound results of the previous XORs.

If you use the one-layer architecture, then the only way to do it is what I described in my earlier reply: each node gets all n inputs and is “programmed” to fire only on one of the 2^n unique possible combinations, with the preloaded answer for that case.
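If it helps, here is a hand-built sketch of exactly that construction (my own, assuming simple threshold units, not anything from the course): one hidden unit per odd-parity input pattern, so the hidden layer holds 2^(n-1) units and grows exponentially with n.

```python
import numpy as np
from itertools import product

def step(z):
    return (np.asarray(z) >= 0).astype(int)

def shallow_parity_net(n):
    """One-hidden-layer threshold network for n-bit XOR (parity).
    One hidden unit per odd-parity pattern -> 2^(n-1) units."""
    odd = [p for p in product((0, 1), repeat=n) if sum(p) % 2 == 1]
    # Unit for pattern p: weight +1 where p has a 1, -1 where it has a 0.
    W = np.array([[1 if bit else -1 for bit in p] for p in odd])
    # The pre-activation peaks at sum(p) only when the input equals p exactly,
    # so this bias makes each unit fire on its own pattern and nothing else.
    b = np.array([-(sum(p) - 0.5) for p in odd])
    def net(x):
        h = step(W @ np.array(x) + b)    # hidden layer: pattern detectors
        return int(step(h.sum() - 0.5))  # output: did any detector fire?
    return net, len(odd)

net, units = shallow_parity_net(4)
print("hidden units for n = 4:", units)  # 8, i.e. 2^(4-1)
for x in product((0, 1), repeat=4):
    assert net(list(x)) == sum(x) % 2
```

The tree of XORs, by contrast, reuses intermediate results layer by layer, which is exactly the efficiency the left diagram is meant to capture.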


Thank you!
Now it is clear.