Concept explanation

Hi community,
I am following a computer vision class and am trying to implement the naive SVM loss. The aim is to compute the gradient of the SVM term of the loss function, accumulating the derivative at the same time as the loss is computed.
Here is the function code:


def svm_loss_naive(
    W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float
):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                #######################################################################
                # TODO:                                                               #
                # Compute the gradient of the SVM term of the loss function and store #
                # it on dW. (part 1) Rather than first computing the loss and then    #
                # computing the derivative, it is simple to compute the derivative    #
                # at the same time that the loss is being computed.                   #
                #######################################################################
                # Replace "pass" statement with your code
                pass
                #######################################################################
                #                       END OF YOUR CODE                              #
                #######################################################################

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function w.r.t. the regularization term  #
    # and add it to dW. (part 2)                                                #
    #############################################################################
    # Replace "pass" statement with your code
    pass
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    return loss, dW

In the class here is how the loss and the gradient with respect to the loss are defined:

I wrote this for the first placeholder:

perturbation = 0.0001
perturbated_loss = 0.0
perturbated_W = W[y[j]] + perturbation
perturbated_margin = perturbated_W - correct_class_score + 1
perturbated_loss += perturbated_margin
dW = (perturbated_loss - loss) / perturbation # Shape: 10
               

However, I don’t really understand how to get the i-th element in this case to compute the loss.
Thank you for your assistance.

Difficult to help with this, as SVM is not used very much now.

Why SVM for this task?

Thank you @TMosh for your reply. I’m new to computer vision, so I want to learn and understand.

Computer vision often uses a Convolutional Neural Network. These are covered in the Deep Learning Specialization.

To compute the gradient of the SVM term while the loss is being computed, accumulate the gradient with respect to the weights W inside the same loops over training examples and classes. Your finite-difference approach (perturbing the weights and measuring the change in loss) is not necessary here, because the gradient can be computed directly from the loss function’s formula. Here is the complete implementation with detailed explanations:

def svm_loss_naive(
    W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float
):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # Compute the gradient of the SVM term of the loss function
                dW[:, j] += X[i]  # Increase the weight for incorrect class
                dW[:, y[i]] -= X[i]  # Decrease the weight for correct class

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train  # Average out the gradient as well

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    # Compute the gradient of the loss function w.r.t. the regularization term
    dW += 2 * reg * W  # Gradient of regularization term

    return loss, dW

Explanation:

  1. Gradient Calculation During Loss Computation:

    • When the margin is greater than 0, indicating a violation of the SVM constraint, the loss is incremented by the margin.
    • At the same time, the gradient is updated:
      • For the incorrect class j, we add X[i] to dW[:, j]: scores[j] is the dot product of W[:, j] and X[i], so the derivative of the margin with respect to W[:, j] is X[i].
      • For the correct class y[i], we subtract X[i] from dW[:, y[i]]: the correct-class score enters the margin with a minus sign, so the derivative of the margin with respect to W[:, y[i]] is -X[i].
  2. Averaging the Loss and Gradient:

    • The computed loss is divided by the number of training examples to get the average loss.
    • Similarly, the gradient dW is divided by the number of training examples.
  3. Regularization:

    • The regularization term reg * sum(W * W) is added to the loss.
    • The gradient of the regularization term with respect to W is 2 * reg * W, which is added to dW.
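
The three steps above can also be written out as formulas. This is a reconstruction from the code in this thread, not a copy of the lecture slide; the symbols s_j (score of class j), x_i (row X[i]), w_j (column W[:, j]) and λ (the reg argument) are notation assumed here for illustration.

```latex
% Per-example loss, with scores s = W^T x_i and delta = 1
L_i = \sum_{j \neq y_i} \max\bigl(0,\; s_j - s_{y_i} + 1\bigr),
\qquad s_j = w_j^{\top} x_i

% Gradient with respect to an incorrect-class column (j \neq y_i)
\frac{\partial L_i}{\partial w_j}
  = \mathbb{1}\bigl[s_j - s_{y_i} + 1 > 0\bigr]\, x_i

% Gradient with respect to the correct-class column
\frac{\partial L_i}{\partial w_{y_i}}
  = -\Bigl(\sum_{j \neq y_i} \mathbb{1}\bigl[s_j - s_{y_i} + 1 > 0\bigr]\Bigr)\, x_i

% Full loss and gradient, averaged over N examples, no 1/2 on the regularizer
L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda \sum_{d,c} W_{dc}^{2},
\qquad
\nabla_W L = \frac{1}{N} \sum_{i=1}^{N} \nabla_W L_i + 2 \lambda W
```

Each positive margin contributes one +x_i to column j and one -x_i to column y_i, which is exactly what the two in-loop updates to dW do.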

This accumulates the gradient in the same loops that compute the loss, so no separate pass over the data is needed.
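
One way to sanity-check an analytic gradient like this is to compare it against centered finite differences on a small random problem. The sketch below is only an illustration and is not the course's own gradient checker: the helper name numeric_grad, the step size h, the problem sizes, and the use of float64 are all assumptions made here. The loss function is a condensed copy of the implementation above so that the block runs on its own.

```python
import torch


def svm_loss_naive(W, X, y, reg):
    """Loop-based multiclass SVM loss and analytic gradient (delta = 1)."""
    dW = torch.zeros_like(W)
    loss = 0.0
    num_train = X.shape[0]
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(W.shape[1]):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i]      # d(margin)/dW[:, j]    =  X[i]
                dW[:, y[i]] -= X[i]   # d(margin)/dW[:, y[i]] = -X[i]
    loss = loss / num_train + reg * torch.sum(W * W)
    dW = dW / num_train + 2 * reg * W
    return loss, dW


def numeric_grad(f, W, h=1e-6):
    """Hypothetical helper: centered finite differences, one entry at a time."""
    grad = torch.zeros_like(W)
    flat_W, flat_grad = W.view(-1), grad.view(-1)
    for idx in range(W.numel()):
        old = flat_W[idx].item()
        flat_W[idx] = old + h
        f_plus = float(f(W))
        flat_W[idx] = old - h
        f_minus = float(f(W))
        flat_W[idx] = old  # restore the original entry
        flat_grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


torch.manual_seed(0)
N, D, C = 5, 4, 3  # tiny sizes, chosen only to keep the check fast
W = 0.01 * torch.randn(D, C, dtype=torch.float64)
X = torch.randn(N, D, dtype=torch.float64)
y = torch.randint(0, C, (N,))
reg = 0.1

loss, dW = svm_loss_naive(W, X, y, reg)
dW_num = numeric_grad(lambda w: svm_loss_naive(w, X, y, reg)[0], W)
max_abs_err = (dW - dW_num).abs().max().item()
print(f"max abs difference: {max_abs_err:.2e}")
```

float64 is used because finite differences in float32 are noisy enough to look like a mismatch. The hinge also has kinks where a margin is exactly 0, so a check that lands near one can disagree even when the analytic gradient is right.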
GPT4o’s answer

@TMosh yes indeed, but this is still my case. @vdt yes, we can compute the loss while iterating over each batch of training samples (backprop?!). The output of the implementation you provided (from ChatGPT) doesn’t match the gradient check, and I’m trying to do it as described in the class.

Which class uses SVM?

Sorry for the confusion, I mean during the lecture, not a class like

public class AClass{}