Course 1: Week 3 (modeling intuition)

Hi, I just completed Course 1: Week 3 and its assignment. During the assignment, I was surprised by the difference in plots between standard logistic regression and our NN model (i.e., the number of sectional “cuts”: 1 for logistic regression vs. k for our NN).

Course 1: Week 3: Assignment. #5.2 scatter plot vs. #2 scatter plot.

I wanted to make sure I’m understanding the intuition right (putting aside the NN’s architecture). If we manually expand A2 (y_hat) as a function of the input A0, ignoring the activations, we get a linear expression: A2 = W2(W1·A0 + b1) + b2 = W2W1·A0 + W2b1 + b2. With more layers the same thing happens, and all the constant terms can be simplified to a single bias, so it collapses to W·A0 + B with W the product of all the weight matrices and B one combined constant.
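That collapse is easy to sanity-check numerically. A minimal numpy sketch (made-up shapes and random values, not the assignment’s actual weights) showing that two stacked linear layers equal one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=(4, 1))  # hidden layer: 4 units, 2 inputs
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))  # output layer
A0 = rng.normal(size=(2, 5))                               # 5 example inputs

# Two linear layers applied in sequence (no activation in between)
Z = W2 @ (W1 @ A0 + b1) + b2

# ...collapse to a single linear layer with W = W2 W1 and B = W2 b1 + b2
W, B = W2 @ W1, W2 @ b1 + b2
print(np.allclose(Z, W @ A0 + B))  # True
```

So without a non-linearity between the layers, the two-layer net can only ever draw the same single straight boundary logistic regression draws.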

Am I right that the reason the NN model does better is that we basically have MULTIPLE linear equations (vs. one in logistic regression) contributing to the prediction, and the loss being minimized is a function of their sum? Meaning the regions where y = 1, for example, come from the summed contribution of those equations.

If I’m wrong, what’s the intuition behind the sectional cuts we’re getting in the plot?
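One way to see where the cuts come from: with a tanh between the layers, the pre-sigmoid output (the logit) is no longer an affine function of the input, so the decision boundary (logit = 0) no longer has to be one straight line. A toy check with fixed illustrative weights (not the assignment’s trained parameters):

```python
import numpy as np

# Fixed toy weights: 2 inputs, 2 tanh hidden units, 1 output
W1, b1 = np.eye(2), np.zeros((2, 1))
W2, b2 = np.ones((1, 2)), np.zeros((1, 1))

def logit(A0):
    """Pre-sigmoid output of the one-hidden-layer net."""
    return W2 @ np.tanh(W1 @ A0 + b1) + b2

# Any affine map f satisfies f((x+y)/2) == (f(x)+f(y))/2; with tanh it fails
x, y = np.array([[2.0], [0.0]]), np.array([[0.0], [0.0]])
mid, avg = logit((x + y) / 2), (logit(x) + logit(y)) / 2
print(mid.item(), avg.item())  # ≈ 0.762 vs ≈ 0.482 -- not equal
```

Each tanh unit contributes one (soft) cut, and the output layer blends the k cuts into a single probability surface.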

Hi, @xmkhan. You are in the right neighborhood. My recent response to another question may help with the intuition.

Let me know if it helps!


Thanks for replying. I think I understand now: the Universal Approximation Theorem. What was missing in my explanation above is that we introduce non-linear activations (the sigmoid above), and that is ultimately what lets us predict y = f(x) in a more flexible manner.
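A concrete way to see the Universal Approximation idea: the difference of two shifted tanh units already forms a localized “bump”, and a hidden layer can scale and shift such bumps to tile an arbitrary target function. A toy sketch (the parameter names are just illustrative):

```python
import numpy as np

def bump(x, center, half_width=0.5, steep=20.0):
    # Difference of two shifted tanh units: a step up followed by a step down
    return 0.5 * (np.tanh(steep * (x - center + half_width))
                  - np.tanh(steep * (x - center - half_width)))

# Near its center the bump is ~1; far away it is ~0
print(bump(0.0, 0.0))  # ~1.0
print(bump(5.0, 0.0))  # ~0.0
```

Summing many such bumps with different centers and heights is one intuition for how even a single hidden layer can approximate a wide class of functions.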

More importantly, a hidden layer has been introduced, with a hyperbolic-tangent activation. The sigmoid function in the output layer was already there in the original case. It’s a natural choice in binary classification problems because it delivers an output that can be properly interpreted as a probability. (Technically speaking, it is a cumulative distribution function.)
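That last parenthetical can be checked numerically: the sigmoid is the CDF of the standard logistic distribution, so integrating its density, s(z)(1 − s(z)), from −∞ up to x recovers sigmoid(x). A small sketch (the truncation point −30 is an arbitrary stand-in for −∞):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-30.0, 2.0, 200001)      # sigmoid(-30) is ~1e-13, negligible
density = sigmoid(z) * (1.0 - sigmoid(z))  # derivative of sigmoid

# Trapezoid rule: integral of the density up to 2.0 should equal sigmoid(2.0)
h = z[1] - z[0]
integral = h * (density.sum() - 0.5 * (density[0] + density[-1]))
print(integral, sigmoid(2.0))  # both ≈ 0.8808
```

Being a CDF is exactly what makes the output safe to read as P(y = 1 | x): it is monotone and bounded in (0, 1).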