C5W3: Trigger word exercise

zhexingli · May 13, 2023, 6:59am

Near the end of the exercise, past all the graded cells, there’s a part of the code that overlays a chime sound on the audio whenever it detects the “activation” word. The instructions say we want to insert a chime at most once every 75 steps to avoid having two chimes for a single activation. But the code only counts “consecutive steps” up to 20 time steps. Why 20 here? Earlier in the exercise we set the output to 1 for 50 time steps after an activation. With a threshold of 20 time steps, couldn’t we get 2 chimes per activation? I’m a little confused by this and would appreciate any help. Thanks.

balaji.ambresh · May 16, 2023, 4:47am

Thanks for bringing this up. The staff have been notified regarding the markdown hinting that exactly 1 chime should be present in a 75 timestep output window.

zhexingli · May 16, 2023, 5:56am

Thanks. It just occurred to me that maybe the 20 consecutive steps in the given code apply to the output, which has a different dimension than the input? Maybe a conversion is happening here?

balaji.ambresh · May 16, 2023, 6:12am

I don’t follow you. Please elaborate keeping this markdown text in mind:

So we will insert a chime sound at most once every 75 output steps

zhexingli · May 16, 2023, 6:29am

The input and output of the CNN have different dimensions. For example, if the 75 steps in the text are meant along the input, which has 5511 time steps, then 75 steps is simply 75 steps in that dimension. But the CNN reduces the output to 1375 time steps, so 75 input steps correspond to only ~18 output steps, which is close to the 20 steps in the code. This is my guess; I’m not 100% sure.
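For reference, the arithmetic behind this guess can be checked quickly. This is a small sketch assuming the 5511 input time steps and 1375 output time steps quoted in this thread; the constant and function names are illustrative only.

```python
# Assumed sizes, as quoted in this thread:
# 5511 input (spectrogram) time steps, 1375 output time steps.
TX_INPUT_STEPS = 5511
TY_OUTPUT_STEPS = 1375


def input_to_output_steps(n_input_steps: int) -> float:
    """Map a span of input time steps onto the coarser output time axis."""
    return n_input_steps * TY_OUTPUT_STEPS / TX_INPUT_STEPS


print(round(input_to_output_steps(75), 1))  # prints 18.7
```

Each output step covers roughly 4 input steps (5511 / 1375 ≈ 4.01), so 75 input steps shrink to about 18.7 output steps.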

balaji.ambresh · May 16, 2023, 7:00am

That doesn’t sound like the correct interpretation. If your interpretation were correct, we would have to check for consecutive_timesteps == 19 or 18 instead of consecutive_timesteps > 20.

Per my understanding, 20 was chosen as a threshold via experimentation, since the model may not predict exactly 50 consecutive 1s after the end of the trigger word. To reduce the number of chimes, one chime per window of 75 output steps was used.

I’ve asked other mentors to comment on this topic as well.

zhexingli · May 16, 2023, 8:00am

Thanks. Let’s see what other mentors say about this.

rmwkwok · May 16, 2023, 8:44am

Hi @zhexingli,

{Edited}

To begin with, I think we all agree that there are 1375 output steps.

With the current version of the code, the number 20 carries two meanings:

A. threshold for identifying it as an activation signal and thus adding a chime
B. minimum time interval between two chimes.

We can set both of them to the same value (A = B), or have B > A. In the current version of the code, it uses A = B = 20.

If you experiment with the code by changing the number 20 to 75, no chime is added after the trigger word in the first example, so 75 isn’t a good value when a single number serves as both A and B (also supported by @balaji.ambresh’s comment), even though it would match the description.
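To make the two roles concrete, here is a simplified sketch of the counting logic described in this thread. It is not the notebook’s exact code: the pydub overlay is left out, and the function name `chime_positions_single`, the 0.5 threshold, and the idealized prediction sequence (exactly 50 consecutive 1s within 1375 output steps) are assumptions for illustration.

```python
from typing import List, Sequence


def chime_positions_single(predictions: Sequence[float],
                           threshold: float = 0.5,
                           min_steps: int = 20) -> List[int]:
    """Hypothetical sketch: return the output steps where a chime would go.

    A single number, min_steps, plays both roles:
    A: how many consecutive above-threshold steps count as an activation;
    B: the minimum spacing between chimes (the counter restarts after a chime).
    """
    chimes: List[int] = []
    consecutive = 0
    for i, p in enumerate(predictions):
        # Count consecutive output steps above the threshold; reset otherwise.
        consecutive = consecutive + 1 if p >= threshold else 0
        if consecutive > min_steps:
            chimes.append(i)
            consecutive = 0  # restart counting after a chime
    return chimes


# Idealized output for one activation: exactly 50 consecutive 1s
# inside 1375 output steps (an assumption, not real model output).
ideal = [0.0] * 100 + [1.0] * 50 + [0.0] * (1375 - 150)

print(chime_positions_single(ideal, min_steps=20))  # [120, 141]
print(chime_positions_single(ideal, min_steps=75))  # []
```

On this idealized output, a value of 20 produces two chimes for one activation, while 75 produces none because the run of 1s never gets that long. A value of 26 gives exactly one chime here; real predictions are noisier, so any such value would still need to be checked on actual model output.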

Therefore, we might keep A = B = 20, or change it to A = 20 and B = 75, and then make sure the text is consistent with the code.

The above is my comment, and since I didn’t make this part of the notebook, the course staff should have the final say on what changes to make.

Your hypothesis would alter the meaning of “output steps”. Still, it’s a nice one, because you can’t propose a hypothesis without having thought it through, and I appreciate that.

Cheers,
Raymond

balaji.ambresh · May 16, 2023, 11:07am

@rmwkwok
The model is trained to predict 50 consecutive 1s starting at the end of the trigger word. That would make checking for 75 consecutive 1s a poor choice. Thoughts? Could you please check the code on the ticket filed on the repo?
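For context, the 50-step labeling can be sketched as follows. This is a hypothetical helper, not the notebook’s function; it assumes 10-second clips, 1375 output steps, and that the trigger word’s end time is given in milliseconds.

```python
import numpy as np


def make_labels(segment_end_ms: int, ty: int = 1375, clip_ms: int = 10_000,
                n_ones: int = 50) -> np.ndarray:
    """Hypothetical helper: label n_ones output steps as 1 after the trigger word ends."""
    y = np.zeros(ty, dtype=np.int8)
    # Convert the trigger word's end time (ms) into an output-step index.
    end_step = int(segment_end_ms * ty / clip_ms)
    # Mark the following n_ones steps, clipped at the end of the output.
    y[end_step + 1 : min(end_step + 1 + n_ones, ty)] = 1
    return y


y = make_labels(segment_end_ms=4000)
print(int(y.sum()))  # 50
```

Since the target run is at most 50 steps long (shorter if it is clipped at the end of the output), a check for more than 75 consecutive 1s would never fire on a model that matches its labels.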

Thanks.

rmwkwok · May 16, 2023, 11:31am

Hello @balaji.ambresh

Agreed.

Both that solution and my suggestion of replacing the description’s 75 with 20 prevent a double chime within a time interval. The difference is that that solution separates the time interval from the threshold, making them two configurable values (75 and 20 respectively), which is good.
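Here is a sketch of separating the two values, assuming A is the minimum run length (`min_run`, default 20) and B is the minimum spacing between chimes (`min_gap`, default 75). The ticket’s actual code isn’t quoted in this thread, so this is only one possible way to do it; the function name and the idealized inputs are hypothetical.

```python
from typing import List, Optional, Sequence


def chime_positions(predictions: Sequence[float], threshold: float = 0.5,
                    min_run: int = 20, min_gap: int = 75) -> List[int]:
    """Hypothetical sketch: A (min_run) and B (min_gap) as separate knobs."""
    chimes: List[int] = []
    consecutive = 0
    last_chime: Optional[int] = None
    for i, p in enumerate(predictions):
        # A: count consecutive output steps above the threshold.
        consecutive = consecutive + 1 if p >= threshold else 0
        # B: only chime if the previous chime is at least min_gap steps back.
        far_enough = last_chime is None or i - last_chime >= min_gap
        if consecutive > min_run and far_enough:
            chimes.append(i)
            last_chime = i
            consecutive = 0
    return chimes


# Idealized outputs (assumptions): one activation, and two activations 150 steps apart.
ideal = [0.0] * 100 + [1.0] * 50 + [0.0] * (1375 - 150)
two = [0.0] * 100 + [1.0] * 50 + [0.0] * 100 + [1.0] * 50 + [0.0] * (1375 - 300)

print(chime_positions(ideal))  # [120]
print(chime_positions(two))    # [120, 270]
```

With the spacing check, the idealized 50-step run yields a single chime, while two activations 150 steps apart each still get one.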

Cheers,
Raymond

balaji.ambresh · May 16, 2023, 12:24pm

Given that the model is expected to predict 50 consecutive 1s for a trigger word, wouldn’t there be multiple chimes for the same trigger word if the model is well trained and we check just for consecutive_timesteps > 20?

rmwkwok · May 16, 2023, 12:36pm

@balaji.ambresh

Absolutely possible!

I think the values of A (and of B, if it can be configured separately from A), as defined in this post, are empirical. Setting A > 50/2 sounds reasonable, but it is better determined experimentally.

It’s like adjusting the threshold of a logistic regression model to trade off precision against recall: it’s empirical.

Raymond

PS: I have a feeling that your argument was going to support the solution in the ticket, and I totally agree, because having A = 20 and B = 75 eliminates that possibility.
