Hi everyone. I'm building a custom keyword detection system for a personal project, and I'm having trouble with the model architecture. I convert the audio samples to spectrograms and feed them to a Conv1D + RNN model, but I can't get decent accuracy. Could anyone point me to articles or tutorials on building a custom keyword detector from scratch?
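
For context, here is a rough sketch of the shape bookkeeping for this kind of pipeline (spectrogram frames going into a Conv1D layer, then an RNN). It uses only the usual no-padding framing formula and the standard 1-D convolution output-length formula. The sample rate, window, hop, kernel size, and stride are illustrative assumptions, not actual project settings, and the helper names `num_frames` and `conv1d_out_len` are made up for this example.

```python
def num_frames(num_samples: int, win_length: int, hop_length: int) -> int:
    """Number of full spectrogram frames when the signal is not padded."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def conv1d_out_len(length: int, kernel_size: int, stride: int = 1,
                   padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution applied along the time axis."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Assumed settings: 1 s clips at 16 kHz, 25 ms window (400 samples),
# 10 ms hop (160 samples).
SAMPLE_RATE = 16_000
frames = num_frames(num_samples=SAMPLE_RATE, win_length=400, hop_length=160)

# Assumed Conv1D front end: kernel size 5, stride 2, no padding.
rnn_steps = conv1d_out_len(frames, kernel_size=5, stride=2)

# frames is the time dimension of the spectrogram; rnn_steps is the
# sequence length the RNN would see after the convolution.
print(frames, rnn_steps)
```

With these assumed numbers, the script prints how many time steps the spectrogram has and how many remain for the RNN after the convolution, which is handy to check before looking at the model itself.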