Why are nonlinear activation functions needed?

In the XOR problem, the neural network does not appear to use a nonlinear activation function. Why, then, is it able to learn the more complicated XNOR function with 2 layers?

I took a quick skim through the lecture video you posted, and I think it is implied that the sigmoid activation function is used for the XOR problem as well.
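To make that concrete, here is a minimal sketch of a 2-layer sigmoid network that computes XNOR. The weights are hand-picked for illustration (I am assuming values of the kind typically used in such lectures, not quoting the video), and the names `sigmoid` and `xnor_net` are my own.

```python
import math

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def xnor_net(x1, x2):
    # Hidden layer (illustrative, hand-picked weights):
    a1 = sigmoid(-30 + 20 * x1 + 20 * x2)   # approximately x1 AND x2
    a2 = sigmoid(10 - 20 * x1 - 20 * x2)    # approximately (NOT x1) AND (NOT x2)
    # Output layer: approximately a1 OR a2, which is XNOR.
    return sigmoid(-10 + 20 * a1 + 20 * a2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor_net(x1, x2)))
```

Every unit here passes its weighted sum through the sigmoid, so the nonlinearity is present in both layers even if the lecture does not spell it out each time.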

For any non-trivial neural network, a non-linear activation function is usually required. Otherwise, because a composition of linear layers is itself linear, the whole network can be simplified to a single linear combination of its inputs (i.e. it is equivalent to having just one neuron).
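Here is a rough sketch of that collapse, assuming a tiny 2-2-1 network with no activation function. The weights are arbitrary made-up numbers and the function names are hypothetical; the point is only that the two layers and the single neuron give the same output for every input.

```python
# Made-up weights for a 2-input, 2-hidden-unit, 1-output network
# with NO activation function between the layers.
W1 = [[1.0, -2.0], [0.5, 3.0]]   # hidden layer weights (one row per hidden unit)
b1 = [0.5, -1.0]                 # hidden layer biases
W2 = [2.0, -1.0]                 # output layer weights
b2 = 0.25                        # output layer bias

def two_linear_layers(x):
    # Layer 1: plain weighted sums, no nonlinearity applied.
    h = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Layer 2: another plain weighted sum.
    return sum(w * hi for w, hi in zip(W2, h)) + b2

# Collapse both layers into one neuron: w = W2 * W1, b = W2 . b1 + b2.
w = [sum(W2[j] * W1[j][i] for j in range(2)) for i in range(2)]
b = sum(W2[j] * b1[j] for j in range(2)) + b2

def single_neuron(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

for x in ([0, 0], [0, 1], [1, 0], [1, 1], [2.5, -4.0]):
    print(x, two_linear_layers(x), single_neuron(x))
```

Adding more linear layers only repeats the same algebra, so depth buys nothing without a nonlinearity in between.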

