How do I add new audio examples under a new label to an audio dataset?

What are the key points to get right when doing this?
Should I care about the file extension, recording length, and sampling frequency in general?

speech_commands | TensorFlow Datasets
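
To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the new clips should match what I understand the Speech Commands format to be: 16 kHz, mono, 16-bit WAV clips of at most about one second. Those target values are my assumptions and should be verified against the dataset documentation. The helper name `check_clip` and the constants are hypothetical, not part of TensorFlow Datasets.

```python
import wave

# Assumed targets; verify against the dataset documentation before relying on them.
TARGET_RATE = 16000        # samples per second
TARGET_CHANNELS = 1        # mono
TARGET_SAMPLE_WIDTH = 2    # bytes per sample (16-bit PCM)
MAX_SECONDS = 1.0          # maximum clip length


def check_clip(path):
    """Return a list of mismatches between a WAV file and the assumed targets."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        seconds = wav.getnframes() / rate

    problems = []
    if rate != TARGET_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected {TARGET_CHANNELS}")
    if width != TARGET_SAMPLE_WIDTH:
        problems.append(f"sample width is {width} bytes, expected {TARGET_SAMPLE_WIDTH}")
    if seconds > MAX_SECONDS:
        problems.append(f"duration is {seconds:.2f} s, expected at most {MAX_SECONDS} s")
    return problems
```

Calling `check_clip` on a new recording returns an empty list when the file matches the assumed targets, and a list of human-readable mismatches otherwise. It only reads WAV headers via the standard-library `wave` module, so other file extensions would need converting first.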