You are correct - what is being asked for is the sum of the weights. And yes, the comment is strangely worded; I think it should have been “Calculate” instead of “Count”.
What is this y_weights in compute_accuracy? I don’t remember seeing that in the lecture so far.
Isn’t this information specific to this Trax API?
What is the motivation for it? In the past, when we calculated the accuracy, we would take the “number of correct predictions” divided by the “total number of predictions.” What is the motivation for the weight? Are we saying that some test instances matter more than others?
The most obvious and common one is <pad> tokens, which usually have a weight of 0, meaning that predictions at those positions, right or wrong, do not contribute to the loss and therefore do not influence the model’s parameters.
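To make the weighting concrete, here is a minimal sketch of a weighted accuracy in NumPy (the function and variable names are hypothetical, not the actual Trax implementation):

```python
import numpy as np

def compute_accuracy(preds, targets, weights):
    """Weighted accuracy: each position's correctness is multiplied by its
    weight, and the total is normalized by the sum of the weights.
    Positions with weight 0 (e.g. <pad>) drop out entirely."""
    correct = (preds == targets).astype(np.float32)
    return float(np.sum(correct * weights) / np.sum(weights))

preds   = np.array([3, 7, 7, 0, 0])
targets = np.array([3, 7, 2, 0, 0])
weights = np.array([1, 1, 1, 0, 0])  # last two positions are <pad>

# 2 correct predictions out of 3 weighted positions -> 2/3
print(compute_accuracy(preds, targets, weights))
```

Note that the padded positions happen to “match” (both predict 0), but because their weight is 0 they neither help nor hurt the score.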
But there are other use cases, depending on the application: old data can be weighted less than “fresh” data (stock market prediction, etc.); different data sources can be weighted differently (examples from less precise or reliable instruments might be down-weighted, or in NLP, “chat forum” data vs. Wikipedia); and subjective “expert” input can be encoded (data with higher variance might be down-weighted because it is more likely to contain outliers).
In other words, there are many cases when it’s useful to not treat every data point equally.
I’m not sure what you mean by that. These (that I mentioned) are different use cases, and the idea (different weights for different samples) is widely used. One of the earliest uses of it that I remember predates Deep Learning and was/is used in Reinforcement Learning as a discount factor (for example in Q-Learning). The discount factor represents the preference for immediate rewards over future rewards; in simpler terms, receiving $1 today is preferred over receiving the same amount two years from now.
By the way, there’s another important use case I forgot to mention - class imbalance. This occurs when one or more classes are more frequent than others. In such cases, it becomes essential to weigh the loss differently for various samples, based on whether they belong to the majority or minority classes. Essentially, we want to assign a higher weight to the loss encountered by samples associated with the minority classes.
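A common heuristic for this is inverse-frequency weighting (the same idea behind scikit-learn’s “balanced” class weights); a minimal sketch, with a hypothetical `class_weights` helper:

```python
import numpy as np

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class).
    Rare classes get larger weights, common classes smaller ones."""
    classes, counts = np.unique(labels, return_counts=True)
    w = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), w.tolist()))

labels = np.array([0, 0, 0, 0, 1])  # class 0 is 4x more frequent
print(class_weights(labels))
# class 0 -> 5 / (2 * 4) = 0.625, class 1 -> 5 / (2 * 1) = 2.5
```

Each sample’s loss would then be multiplied by the weight of its class, so a mistake on the rare class costs the model more.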
I am talking about math or algebra. E.g., let’s take the idea of a norm. Usually, we think of it as |x dot x|. A generalization might be |Ax dot x| for a linear operator A.
It’s just a simple element-wise product (or, to use a fancier term, the Hadamard product). For example, if the output is [0.9, 0.3, -2.1, 3.5] and the mask is [1, 1, 1, 0], the result after applying the mask would be [0.9, 0.3, -2.1, 0], i.e. the masked position is zeroed out. You can easily extend this concept from vectors to matrices.
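In NumPy this is just the `*` operator between arrays of the same shape:

```python
import numpy as np

output = np.array([0.9, 0.3, -2.1, 3.5])
mask   = np.array([1, 1, 1, 0])

# Element-wise (Hadamard) product: the masked position becomes 0.
masked = output * mask
print(masked)  # [ 0.9  0.3 -2.1  0. ]
```

The same line works unchanged for 2-D arrays (e.g. a batch of sequences with a per-token mask), thanks to element-wise semantics.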