[Week 3: Trigger Word Detection] `modelf` architecture

Hey guys!

I’m unsure whether I understand the architecture of `modelf` in the Trigger Word Detection assignment correctly: after the convolutional layer, each sample consists of 1375 time steps, with each time step having 196 features.
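If it helps, the 1375 follows from the standard "valid" 1-D convolution length formula. This is just a sketch assuming the assignment's hyperparameters (5511 input spectrogram frames, kernel size 15, stride 4, 196 filters):

```python
# Output length of a 1-D convolution with no padding ("valid"):
# floor((n_steps - kernel_size) / stride) + 1
def conv1d_output_length(n_steps, kernel_size, stride):
    return (n_steps - kernel_size) // stride + 1

# Assumed assignment values: 5511 frames in, kernel 15, stride 4.
n_out = conv1d_output_length(5511, kernel_size=15, stride=4)
print(n_out)  # 1375 time steps; each has 196 features (one per filter)
```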
After that, batch normalization is applied along the last axis (i.e. the feature axis). So normalization is done per feature, using the same parameters for every time step, right?
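Here is a minimal numpy sketch of what I think `BatchNormalization(axis=-1)` does in this setting: statistics and the learned scale/offset are per feature (196 of each), shared across the batch and all 1375 time steps (the shapes are from the assignment; `1e-3` is Keras's default epsilon):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 1375, 196))        # (batch, time steps, features)

# One mean/variance per feature, pooled over batch and time axes.
mean = x.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 196)
var = x.var(axis=(0, 1), keepdims=True)

gamma = np.ones(196)                       # 196 learned scales, one per feature
beta = np.zeros(196)                       # 196 learned offsets

x_norm = gamma * (x - mean) / np.sqrt(var + 1e-3) + beta
print(x_norm.shape)                        # (2, 1375, 196): shape is unchanged
```

So there are only 196 parameters of each kind, not 1375 × 196, which matches "same parameters for each time step".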
And regarding dropout: I understand dropout within a normal dense layer (some units of the layer are ‘switched off’), but I don’t understand it in this context. I think for each sample, 20% of the 1375 × 196 activations are simply not passed to the next layer. But it’s just 20% of the activations in total, with no attention paid to spreading them evenly, i.e. it does not drop exactly 20% of the 1375 time steps or 20% of the 196 features, correct?
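To make my question concrete, here is a numpy sketch of what I believe `Dropout(0.2)` does with its default settings: each of the 1375 × 196 activations is kept or dropped independently with probability 0.2, so roughly 20% are dropped overall, with no structure across time steps or features (survivors are scaled by 1/0.8, as in inverted dropout):

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.2
x = np.ones((1375, 196))                 # activations after the conv layer

# Independent Bernoulli keep/drop decision for every single element.
mask = rng.random(x.shape) >= rate       # True = keep
y = np.where(mask, x / (1 - rate), 0.0)  # inverted dropout: scale survivors

dropped_fraction = 1 - mask.mean()
print(round(dropped_fraction, 3))        # close to 0.2, but not exactly 0.2
```

So the drops are only ~20% in expectation per element, not an exact, evenly distributed 20% per time step or per feature. Is that the right picture?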

Best, Elke