Hi everyone,
I’m working through the derivation of the cost function in linear regression, specifically where W and X are vectors in the equation WX + B. I understand how differentiation works with scalars, but I’m having trouble grasping how it applies to vectors.
Could someone explain how vector calculus is applied here? Any resources or explanations would be greatly appreciated! Image attached below.
Thanks!
In calculus, you are familiar with finding the derivative of a function with respect to a variable. When dealing with vectors, the principles are similar, but you must consider how each element in the vector affects the function.
1. Gradient with Respect to w:
The gradient with respect to the weight vector w is computed as:
\frac{\partial J(w, b)}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( w \cdot x^{(i)} + b - y^{(i)} \right) \cdot x^{(i)}
Here’s how it works:
- Step 1: Start from the cost function J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( w \cdot x^{(i)} + b - y^{(i)} \right)^2, which is a sum of squared errors over the m training examples.
- Step 2: Apply the chain rule. Differentiating the squared term \left( w \cdot x^{(i)} + b - y^{(i)} \right)^2 gives 2 \left( w \cdot x^{(i)} + b - y^{(i)} \right); this factor of 2 cancels the \frac{1}{2} in \frac{1}{2m}, which is why no 2 appears in the final gradient.
- Step 3: The derivative of the inner linear term w \cdot x^{(i)} + b - y^{(i)} with respect to w is x^{(i)}, since x^{(i)} is data and is therefore constant with respect to w.
Multiplying these pieces and averaging over the m examples gives the gradient above: a vector whose j-th component is the partial derivative of the cost with respect to w_j, as in the sketch below.
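To make that concrete, here is a minimal NumPy sketch of how this gradient could be computed. The variable names (X, y, w, b, grad_w) and the toy numbers are my own for illustration, not from the course material:

```python
import numpy as np

def grad_w(w, b, X, y):
    """Gradient of J(w, b) with respect to the weight vector w.

    X has shape (m, n): one row per training example x^(i).
    Each error (w . x^(i) + b - y^(i)) scales its own x^(i),
    and the results are averaged over the m examples.
    """
    m = X.shape[0]
    errors = X @ w + b - y      # shape (m,): the error for every example
    return (X.T @ errors) / m   # shape (n,): one partial derivative per w_j

# Toy data (made-up numbers, only to show the shapes)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])      # m = 3 examples, n = 2 features
y = np.array([5.0, 11.0, 17.0])
w = np.array([0.5, 1.5])
b = 0.0

print(grad_w(w, b, X, y))       # a length-2 vector, matching the shape of w
```

Note that the result has the same shape as w, which is exactly the "one partial derivative per component" idea.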
2. Gradient with Respect to b:
Similarly, the gradient with respect to the bias term b is computed as:
\frac{\partial J(w, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( w \cdot x^{(i)} + b - y^{(i)} \right)
The key difference is that differentiating the term b with respect to b yields 1 rather than x^{(i)}, so no feature vector multiplies the error and the result is a single scalar.
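As a companion to the previous sketch, the bias gradient could look like this in NumPy (again, the names and toy numbers are mine):

```python
import numpy as np

def grad_b(w, b, X, y):
    """Gradient of J(w, b) with respect to b: simply the mean error.

    Because the derivative of b with respect to b is 1, no x^(i)
    factor multiplies the error, and the result is a single scalar.
    """
    errors = X @ w + b - y
    return errors.mean()

# Same toy data as in the previous sketch
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([5.0, 11.0, 17.0])
w = np.array([0.5, 1.5])
b = 0.0

print(grad_b(w, b, X, y))   # a single number, matching the scalar b
```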
- Vector Calculus Application: When you differentiate a scalar function with respect to a vector (like w), you obtain a vector where each component is the partial derivative of the function with respect to one of the components of w.
- Chain Rule in Vector Form: The chain rule applies just as in scalar calculus, but it acts component by component across the vector. This is why the x^{(i)} term multiplies the error (w \cdot x^{(i)} + b - y^{(i)}) in the gradient with respect to w; a small numerical check of this is sketched after this list.
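To see the "vector of partial derivatives" idea numerically, here is a small finite-difference check, a sketch using my own made-up data and helper names: each component of the analytic gradient should match the numerical partial derivative obtained by nudging one component of w at a time.

```python
import numpy as np

def cost(w, b, X, y):
    """J(w, b) = (1/(2m)) * sum_i (w . x^(i) + b - y^(i))^2"""
    errors = X @ w + b - y
    return (errors ** 2).sum() / (2 * X.shape[0])

def grad_w(w, b, X, y):
    """Analytic gradient of J with respect to w (the formula above)."""
    errors = X @ w + b - y
    return (X.T @ errors) / X.shape[0]

# Made-up data, just for the check
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)
b = 0.1

analytic = grad_w(w, b, X, y)

# Numerical partials: nudge one component of w at a time
eps = 1e-6
numerical = np.zeros_like(w)
for j in range(w.size):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    numerical[j] = (cost(w_plus, b, X, y) - cost(w_minus, b, X, y)) / (2 * eps)

print(np.allclose(analytic, numerical))  # should print True
```

If the two agree, you have verified that the analytic vector gradient really is just the collection of scalar partial derivatives you already know how to compute.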
To deepen your understanding, check out "The Matrix Calculus You Need For Deep Learning" by Terence Parr and Jeremy Howard, which is an excellent resource for how derivatives work with matrices and vectors, especially in machine learning.
Thank you so much, this was very helpful.