What are the key points in doing this? Should I care about file extension, recording length, and frequency in general?
speech_commands | TensorFlow Datasets
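
For context, here is a minimal sketch of how those properties could be checked for a single recording, using only Python's standard `wave` module. The helper name `describe_wav` is hypothetical, and it assumes the recordings are uncompressed PCM WAV files; other extensions (MP3, FLAC, etc.) would need a different reader. As far as I know, the Speech Commands clips are roughly one-second, 16 kHz, mono WAV files, so those are the values to compare against, but treat that as an assumption and check the dataset page.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()  # samples per second (Hz)
        n_frames = wav.getnframes()       # number of sample frames
        return {
            "sample_rate_hz": sample_rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": n_frames / sample_rate,
        }
```

Calling `describe_wav("my_recording.wav")` on each file (the filename is a placeholder) would show whether the recordings share a sample rate and length before they are mixed with the dataset.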