C5W3: Is real-time detection really possible?

In ‘Assignment 2: Trigger Word Detection’, I noticed the following statement:

However, the trigger word detection system that we’ve trained in this assignment can only take 10-second audio clips as input, which means we can only get the input and output every 10 seconds anyway.

And only then can we figure out when we said the trigger word (i.e., where the ones are in the output).

So how can we “detect the trigger word almost immediately after it is said”, as the statement above implies?

Hi wong-1994,

If you listen to one of the audio snippets, you will notice that there can be a few seconds of other sound after the trigger word is said. So it does make a difference whether the system responds immediately or not.

But the statement can also be taken as a more general explanation of why in a case such as this, a unidirectional RNN makes more sense than a bidirectional RNN.
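To make that point concrete, here is a minimal numpy sketch (with hypothetical toy weights, not the assignment’s trained model) of why direction matters: a unidirectional RNN can emit an output at each timestep using only past inputs, while a bidirectional RNN needs the whole clip before it can produce anything.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical tiny RNN weights, purely for illustration.
Wx = rng.normal(size=(4, 3))
Wh = rng.normal(size=(4, 4))

def uni_rnn_step(h, x):
    """One forward step: depends only on the past state and the current input."""
    return np.tanh(Wx @ x + Wh @ h)

x_stream = rng.normal(size=(10, 3))   # 10 timesteps arriving one at a time
h = np.zeros(4)
outputs = []
for x_t in x_stream:                  # outputs can be emitted as audio streams in
    h = uni_rnn_step(h, x_t)
    outputs.append(h)

# A bidirectional RNN's backward pass at time t would need x_stream[t:],
# so no output could be emitted until the whole clip has been recorded.
```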

I hope this clarifies.


Hi Bosch,

Thanks for replying, but I’m still confused.

If you’re talking about the chime sound, I did hear that. But still, the chime sound was manually added after the output was generated.

To my understanding, the pipeline is:

  1. Get the 10-second audio window;
  2. Feed the audio into the trigger word detection system;
  3. Get the output;
  4. Find where the “ones” are located, then add the chime.
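The batch pipeline above could be sketched like this (the `detect` function is a hypothetical stand-in for the trained model, not the assignment’s exact code; I’m using Ty = 1375 output timesteps as in the assignment):

```python
import numpy as np

CLIP_SECONDS = 10
TY = 1375  # number of model output timesteps per 10-second clip

def detect(audio_clip):
    """Hypothetical stand-in for the trained model: returns one
    trigger-word probability per output timestep."""
    probs = np.zeros(TY)
    probs[700:750] = 0.9   # pretend the trigger word ended around second ~5
    return probs

audio_clip = np.zeros(441000)            # step 1: a full 10-second clip at 44.1 kHz
probs = detect(audio_clip)               # steps 2-3: run the model, get the output
ones = np.where(probs > 0.5)[0]          # step 4: find where the "ones" are...
chime_time = ones[0] * CLIP_SECONDS / TY # ...and convert to seconds for the chime
```

The point of the sketch is that `chime_time` is only known after the whole clip has been recorded and processed.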

But in the real world, suppose the device starts listening at second 0 and I say “activate” at second 5. The device has to wait until second 10 to get an input of the right shape.

And although it can later determine when I said “activate” during those 10 seconds, it’s just impossible to go back in time and play a chime at second 5.

So in the real world, how can a device react right after the trigger word?

Hi wong-1994,

You are right that this is not implemented in the assignment. You can have a look here to see an example of how to extend this to real-time trigger word detection.
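For reference, one common way to get near-real-time behavior (a sketch under my own assumptions, not necessarily what the linked example does) is a sliding window: keep a rolling buffer of the last 10 seconds, re-run the model every fraction of a second as new audio arrives, and only inspect the output steps corresponding to the newest audio. The reaction delay then becomes the hop size plus inference time, not 10 seconds. The `detect` function below is again a hypothetical stand-in for the trained model.

```python
import numpy as np

FS = 44100          # sample rate
WINDOW = 10 * FS    # the model still sees a full 10-second window
HOP = FS // 2       # slide forward every 0.5 s of new audio
TY = 1375           # model output timesteps per window

def detect(window):
    """Hypothetical stand-in for the trained model."""
    probs = np.zeros(TY)
    if window.max() > 0:     # pretend any loud audio is the trigger word
        probs[-10:] = 0.9
    return probs

buffer = np.zeros(WINDOW)    # rolling 10-second buffer, silence-padded at start
detections = []
for step in range(8):        # simulate 4 s of incoming audio, 0.5 s at a time
    new_audio = np.ones(HOP) if step == 5 else np.zeros(HOP)
    buffer = np.concatenate([buffer[HOP:], new_audio])  # drop oldest, append newest
    probs = detect(buffer)
    # Only look at the output steps covering the newest hop of audio, so a
    # trigger is reported ~0.5 s after it is said rather than 10 s later.
    if probs[-(TY * HOP // WINDOW):].max() > 0.5:
        detections.append(step)
```

The trade-off is running inference every hop instead of once per clip; smaller hops mean faster reactions but more compute.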