In the “compute_gradient_logistics” function definition, the outer loop is over m (the number of examples) and the inner loop is over n (the number of features). Given the definition of the gradient in (2), shouldn’t the order be flipped, with the outer loop over n and the inner loop over m?

Hi @EagleEdge ,

Welcome to the community.

Let me try to answer with some examples to create intuition:

As you said, ‘m’ is the number of samples, and ‘n’ is the number of features in each sample.

If we inverted the loops as you propose, we would traverse one feature across every sample. For instance, if the dataset is of cats, the first feature is ‘Has pointy ears’, and we have 1000 samples, we would traverse the 1000 samples of the ‘Has pointy ears’ feature at once, right?

Thinking of it that way, what do you learn about each sample? Is it a cat or not? Wouldn’t you need to look at all the features of a given sample to try to determine whether it is a cat?

Let’s say we as humans go through the same process. I receive, as a human, a list of 1000 yes-or-no answers on an attribute called ‘Has pointy ears’. Could I, as a human, determine from that alone whether each one is a cat or not?

As a human, I would need to look at all the features at once, for each sample, to determine whether it is a cat or not. The same goes for the machine learning model.
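To make this concrete: since the gradient is a sum over samples, both loop orders actually produce the same numbers; the outer-loop-over-m version is just more natural because the per-sample prediction error is computed once and then reused for every feature. Here is a minimal sketch (not the course’s exact code — `X`, `y`, `w`, `b` and the function names are illustrative) comparing the two orderings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_outer_m(X, y, w, b):
    """Outer loop over samples (m), inner loop over features (n)."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        # Error for sample i is computed ONCE and reused for every feature j.
        err = sigmoid(np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] += err * X[i, j]
        dj_db += err
    return dj_dw / m, dj_db / m

def gradient_outer_n(X, y, w, b):
    """Outer loop over features (n), inner loop over samples (m)."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for j in range(n):
        for i in range(m):
            # Same error term, but recomputed for every feature j.
            err = sigmoid(np.dot(X[i], w) + b) - y[i]
            dj_dw[j] += err * X[i, j]
    for i in range(m):
        dj_db += sigmoid(np.dot(X[i], w) + b) - y[i]
    return dj_dw / m, dj_db / m
```

Running both on the same data gives identical gradients; the difference is only efficiency and readability, since the inverted version recomputes the prediction error m × n times instead of m times.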

Does it make sense?

Hi Juan,

Thank you for the example. Yes, it makes perfect sense. My struggle is more with the Python implementation of (2). I tried to write out what the given Python code does for dJ/dw_j and I got a different mathematical expression than the one given in (2). Am I missing something here?

Never mind. I got it now. Thanks.


Excellent! Nothing better than finding the solution ourselves.