Hi!
In the lesson we saw that we add a few ‘1’ labels after the trigger word ‘attention’. Expanding on this concept, if I wanted to process more general audio, is it correct to say that the training data would be labeled by marking each word in the audio file with an index?
For instance, if the vocabulary is “I”, “AM”, “JANE” and the dictionary is, say, (1:“AM”), (2:“I”), (3:“JANE”), … then the audio clip “I AM JANE” would be labeled:
00220000110000330
… I … AM … JANE…
Or is labeling like this only needed for trigger word cases?
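To make the labeling I have in mind concrete, here is a rough Python sketch. It is only my guess at how this could be done, not something from the lesson: the function name `label_clip`, the `n_mark` parameter, and the assumption that we already know the time step at which each word ends are all mine.

```python
# Hypothetical dictionary from my example; 0 is reserved for "no word".
WORD_TO_INDEX = {"AM": 1, "I": 2, "JANE": 3}


def label_clip(word_ends, n_steps, n_mark=2):
    """Build a per-time-step label sequence for one audio clip.

    word_ends: list of (word, step) pairs, where step is the first output
               time step after the word has been spoken (assumed known).
    n_steps:   number of output time steps for the clip.
    n_mark:    how many consecutive steps receive the word's index.
    """
    labels = [0] * n_steps
    for word, step in word_ends:
        index = WORD_TO_INDEX[word]
        # Mark a short run right after the word, clipped to the clip length.
        for t in range(step, min(step + n_mark, n_steps)):
            labels[t] = index
    return labels


labels = label_clip([("I", 2), ("AM", 8), ("JANE", 14)], n_steps=17)
print("".join(str(v) for v in labels))  # 00220000110000330
```

With the trigger word setup, the dictionary would contain a single entry and every marked step would just be ‘1’, so this would be the same scheme with more classes.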
Thanks!
Juan