Regularized Logistic Regression Cost Function

In the lecture, Prof. Ng uses a complicated model for z (something like z = w_1x_1 + w_2x_2 + w_3x_1^2x_2 + \cdots + b). The gradient-descent derivation for the regularized cost function requires \partial J/\partial w_j. For this, Prof. Ng writes the following:

\begin{equation} \frac{\partial J}{\partial w_j}= \frac{1}{m}\sum_{i=1}^m \left[(f_{w,b}(x^{(i)})-y^{(i)})\,x_j^{(i)}\right] + \frac{\lambda}{m}w_j \end{equation}
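For concreteness, here is a small NumPy sketch of that gradient (my own illustration, with hypothetical function names, not code from the slide), assuming z really is linear in w, i.e. z = X @ w + b:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_gradient(X, y, w, b, lambda_):
    """
    Gradient of the regularized logistic cost w.r.t. w and b,
    assuming the linear model z = X @ w + b.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    w: (n,) weights, b: scalar bias, lambda_: regularization strength.
    """
    m = X.shape[0]
    f = sigmoid(X @ w + b)                        # f_{w,b}(x^{(i)}) for every example
    err = f - y                                   # the (f - y) factor from the chain rule
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w   # plus the (lambda/m) w_j regularization term
    dj_db = np.sum(err) / m                       # the bias b is conventionally not regularized
    return dj_dw, dj_db
```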

However, I think this is only the case if z is a linear combination of the w's and x's, unlike the complicated model given at the start of the lecture. Take w_3 from that example model. The partials would be:

\begin{equation} \frac{\partial z}{\partial w_3} = x_1^2 x_2 \\ \frac{\partial f}{\partial z} = f_{w,b}(x)\left(1-f_{w,b}(x)\right) \\ \frac{\partial J}{\partial f} = -\frac{1}{m}\sum_{i=1}^m \left[\frac{y^{(i)}}{f_{w,b}(x^{(i)})}-\frac{1-y^{(i)}}{1-f_{w,b}(x^{(i)})}\right] \end{equation}

When put all together via the chain rule, you should end up with:
\begin{equation} \frac{\partial J}{\partial w_3}= \frac{1}{m}\sum_{i=1}^m \left[(f_{w,b}(x^{(i)})-y^{(i)})\,(x_1^{(i)})^2 x_2^{(i)}\right] + \frac{\lambda}{m}w_3 \end{equation}
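Spelling out the chain rule inside the sum (writing f^{(i)} for f_{w,b}(x^{(i)}), with the \frac{\lambda}{m}w_3 term coming from the regularization part of J):

\begin{equation}
\frac{\partial J}{\partial w_3}
= -\frac{1}{m}\sum_{i=1}^m \left[\frac{y^{(i)}}{f^{(i)}}-\frac{1-y^{(i)}}{1-f^{(i)}}\right] f^{(i)}\left(1-f^{(i)}\right)(x_1^{(i)})^2 x_2^{(i)} + \frac{\lambda}{m}w_3
= \frac{1}{m}\sum_{i=1}^m \left(f^{(i)}-y^{(i)}\right)(x_1^{(i)})^2 x_2^{(i)} + \frac{\lambda}{m}w_3
\end{equation}

using the simplification y(1-f) - (1-y)f = y - f.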

Am I missing something? The regularized cost function still seems very useful if z is modeled as a complicated linear function with many parameters, but I don't think the formula presented is generally correct for non-linear models of z.

Your analysis is right.

I have added one more step to the slide that can get us back to the linear form:

The step that I have added is a commonly used trick to reduce it back to a linear form: construct (engineer) new features x_3, x_4, x_5, … from the existing features. The concept of polynomial feature engineering was also discussed in MLS Course 1 Week 2. The difference is that here we are doing it "backward", whereas in the polynomial feature engineering lectures it is "forward".
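For example (a minimal sketch with hypothetical data, not code from the course), engineering x_3 = x_1^2 x_2 turns the non-linear term into an ordinary feature, after which the standard formula for \partial J/\partial w_j applies unchanged:

```python
import numpy as np

# Two raw features per example (hypothetical data).
X_raw = np.array([[1.0, 2.0],
                  [0.5, 3.0],
                  [2.0, 1.0]])

# "Backward" feature engineering: give the non-linear term its own column,
# x_3 = x_1^2 * x_2, so that z = w_1*x_1 + w_2*x_2 + w_3*x_3 + b is linear in w.
x3 = (X_raw[:, 0] ** 2) * X_raw[:, 1]
X_engineered = np.column_stack([X_raw, x3])

# The usual regularized gradient formula now applies, with x_3^{(i)} playing
# the role of just another feature x_j^{(i)}.
print(X_engineered)
```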

Cheers,
Raymond
