In the XOR problem, the neural network does not use a nonlinear activation function. Why is it able to learn the more complicated XNOR function after 2 layers?

I took a quick skim through the lecture video you posted, and I think it is implied that the sigmoid activation function is used for the XOR problem as well.

For any non-trivial neural network, a non-linear activation function is usually required; otherwise the whole network can be simplified to a single linear combination of its inputs (i.e. it is equivalent to having just one neuron).
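To illustrate the point about linear layers collapsing, here is a minimal sketch in plain Python. The weights and biases are made-up values chosen only for illustration (they do not come from the lecture), and the helper names are hypothetical. It builds two stacked layers with no activation function and checks that a single layer with W = W2*W1 and b = W2*b1 + b2 gives the same output.

```python
# Two stacked layers with NO activation function (identity activation).
# All weights and biases are made-up illustrative values (assumption).
W1 = [[2.0, -1.0], [0.5, 3.0]]  # hidden layer: 2 neurons, 2 inputs
b1 = [1.0, -2.0]
W2 = [[1.5, -0.5]]              # output layer: 1 neuron, 2 hidden units
b2 = [0.25]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def two_layer_linear(x):
    """Forward pass through both layers, with no nonlinearity in between."""
    hidden = add(matvec(W1, x), b1)
    return add(matvec(W2, hidden), b2)


# Collapse the two layers into one: W = W2*W1, b = W2*b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)


def one_layer(x):
    """Forward pass through the single collapsed layer."""
    return add(matvec(W, x), b)


if __name__ == "__main__":
    for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
        print(x, two_layer_linear(x), one_layer(x))
```

Since the collapsed network is a single linear unit, and XOR/XNOR are not linearly separable, such a network cannot represent them no matter how many linear layers are stacked.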

At what time mark does the video say the above?

Would you like to google and share how to achieve XNOR with NNs?
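As a possible starting point, here is a minimal sketch of one commonly shown hand-weighted construction: a hidden layer with an AND-like neuron and a NOR-like neuron, followed by an OR-like output neuron, all using the sigmoid activation. The specific weights below are illustrative assumptions, not values confirmed by the lecture video, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, bias, inputs):
    """One sigmoid neuron: sigmoid(bias + sum of weight * input)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))


def xnor(x1, x2):
    """Two-layer sigmoid network with hand-picked (assumed) weights."""
    a1 = neuron([20, 20], -30, [x1, x2])    # close to 1 only if x1 AND x2
    a2 = neuron([-20, -20], 10, [x1, x2])   # close to 1 only if neither is set (NOR)
    return neuron([20, 20], -10, [a1, a2])  # close to 1 if a1 OR a2


if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, round(xnor(x1, x2)))
```

The output is close to 1 when both inputs are equal and close to 0 otherwise. The construction depends on the sigmoid between the layers; with the activation removed, the two layers would reduce to a single linear unit.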

Cheers,

Raymond