I am trying to understand the negative effects of not balancing the dataset with extra positive labels.
To me it sounds like it could introduce complexity (e.g., what if the audio stream ends before all 50 extra labels fit?).
I assume that without these extra labels the model would learn to predict 0 all the time, since the number of false negatives would be tiny compared to the number of true negatives. But I was wondering whether this would be better solved by tweaking the cost function to place a high cost on false negatives (see the sketch below), instead of creating a data requirement (a data contract) that can lead to inconsistencies.
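Something like a weighted binary cross-entropy is what I had in mind. This is just a rough sketch, assuming a Keras/TensorFlow setup; `fn_weight` and `weighted_bce` are names I made up for illustration, not anything from the assignment:

```python
import tensorflow as tf

# Sketch of the idea: make a missed positive label (a false negative)
# cost much more than a false alarm, instead of adding extra 1-labels.
def weighted_bce(fn_weight=25.0):
    def loss(y_true, y_pred):
        eps = 1e-7
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # The positive term is up-weighted, so predicting 0 on a true 1
        # is penalized more heavily than predicting 1 on a true 0.
        pos_term = -fn_weight * y_true * tf.math.log(y_pred)
        neg_term = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)
        return tf.reduce_mean(pos_term + neg_term)
    return loss

# model.compile(optimizer="adam", loss=weighted_bce(fn_weight=25.0))
```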
Am I missing other important reasons to balance the labels?
I assume you are talking about inserting ones?
It actually doesn’t matter much if the stream ends before the 50 steps are up; you just label as many steps as remain.
The thing is, here we are working with sequence models, so you should see the output as a whole sequence of predictions rather than isolated point-wise predictions.
Also, I guess having just a single 1 makes your algorithm not sensitive at all: most of the time the trigger word would be missed, and for this application that would be bad.
The following may not be totally accurate, but I hope it can give you some intuition:
You may visualize the label signal as a “moving average”: as long as the trigger word falls inside this “moving-average span”, the model should predict 1.
Another way is to see those 50 steps as the maximum “delay” you allow your model before it detects the trigger word after “hearing” it.
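As a rough illustration (the function name and array sizes below are mine, not from the assignment), the label construction could look like this: after the step where the trigger word ends, set the next 50 label steps to 1, clipping at the end of the sequence so a stream that ends early is handled naturally.

```python
import numpy as np

def insert_positive_labels(y, segment_end, span=50):
    """Set the `span` label steps after `segment_end` to 1, clipped to the sequence length."""
    Ty = y.shape[1]
    start = segment_end + 1
    end = min(start + span, Ty)   # clip if the stream ends before all 50 steps fit
    y[0, start:end] = 1
    return y

y = np.zeros((1, 1375))                          # illustrative number of output steps
y = insert_positive_labels(y, segment_end=1370)  # trigger word ends near the end of the clip
print(y[0, 1368:1375])                           # only the remaining steps get labeled 1
```

With labels built this way, the network has a 50-step window in which predicting 1 counts as correct, which is exactly the “maximum delay” reading above.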