I am trying to understand the negative effects of not balancing the dataset with extra positive labels.
To me it sounds like it could introduce complexity (e.g., what if the audio stream ends before all 50 extra labels fit?).
I assume that without these extra labels the model would learn to predict 0 all the time, since the number of false negatives would be tiny compared to the number of true negatives. But I was wondering whether this would be better solved by tweaking the cost function to place a high cost on false negatives (see the sketch below), instead of creating a data requirement (a data contract) that can lead to inconsistencies.
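Something like a weighted binary cross-entropy is what I had in mind. This is just a rough sketch, assuming a Keras/TensorFlow setup; `fn_weight` and `weighted_bce` are names I made up for illustration, not anything from the assignment:

```python
import tensorflow as tf

# Sketch of the idea: make a missed positive label (a false negative)
# cost much more than a false alarm, instead of adding extra 1-labels.
def weighted_bce(fn_weight=25.0):
    def loss(y_true, y_pred):
        eps = 1e-7
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # The positive term is up-weighted, so predicting 0 on a true 1
        # is penalized more heavily than predicting 1 on a true 0.
        pos_term = -fn_weight * y_true * tf.math.log(y_pred)
        neg_term = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)
        return tf.reduce_mean(pos_term + neg_term)
    return loss

# model.compile(optimizer="adam", loss=weighted_bce(fn_weight=25.0))
```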
Am I missing other important reasons to balance the labels?
I assume you are talking about inserting ones?
It actually doesn’t matter much if the stream ends before the 50 steps are up; you just label as many steps as remain.
The thing is, here we are working with sequence models, so you should see the output as a whole sequence of predictions rather than isolated point-wise predictions.
Also, I guess having just a single 1 makes your algorithm not sensitive at all: most of the time the trigger word would be missed, and for this application that would be bad.
The following may not be totally accurate, but I hope it can give you some intuition:
You may visualize the label signal as a “moving average”: as long as the trigger word falls inside this “moving-average span”, the model should predict 1.
Another way is to see those 50 steps as the maximum “delay” you allow your model before it detects the trigger word after “hearing” it.
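As a rough illustration (the function name and array sizes below are mine, not from the assignment), the label construction could look like this: after the step where the trigger word ends, set the next 50 label steps to 1, clipping at the end of the sequence so a stream that ends early is handled naturally.

```python
import numpy as np

def insert_positive_labels(y, segment_end, span=50):
    """Set the `span` label steps after `segment_end` to 1, clipped to the sequence length."""
    Ty = y.shape[1]
    start = segment_end + 1
    end = min(start + span, Ty)   # clip if the stream ends before all 50 steps fit
    y[0, start:end] = 1
    return y

y = np.zeros((1, 1375))                          # illustrative number of output steps
y = insert_positive_labels(y, segment_end=1370)  # trigger word ends near the end of the clip
print(y[0, 1368:1375])                           # only the remaining steps get labeled 1
```

With labels built this way, the network has a 50-step window in which predicting 1 counts as correct, which is exactly the “maximum delay” reading above.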