Why do we need activation functions?

Hi! I can’t understand how you got wx + b at the end, as shown in this screenshot from the video “Why do we need activation functions?”

Honestly, I didn’t understand this video well. Only one message is clear: that we can’t use linear regression for neurons on complex tasks.

The slide shows that if you only have linear units in the hidden layer, then mathematically your entire NN is equivalent to plain old linear regression: composing linear (affine) maps just produces another linear (affine) map of the form wx + b.
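You can verify this collapse numerically. Below is a minimal sketch with made-up layer sizes and random weights (none of these values come from the video): a two-layer network whose hidden layer has no non-linearity gives exactly the same output as a single linear map with W = W2·W1 and b = W2·b1 + b2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 3 inputs, 4 hidden units, 1 output.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

x = rng.normal(size=3)

# Two-layer network with a *linear* "activation" (the identity):
hidden = W1 @ x + b1
out_two_layer = W2 @ hidden + b2

# Algebraically this collapses to a single affine map W x + b:
W = W2 @ W1          # combined weight matrix
b = W2 @ b1 + b2     # combined bias
out_one_layer = W @ x + b

print(np.allclose(out_two_layer, out_one_layer))  # True
```

So the hidden layer adds parameters but no expressive power: the network can still only draw the same straight-line fits as plain linear regression.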

Using a non-linear activation function in the hidden layer is essential for an NN to learn complex combinations of the input features.