Error in Custom Trigger Word Detection

Hey guys. So i tried to do Trigger Word Detection. First, i tried the normal dataset from Coursera, and it succeed. Then i tried to do it with my own custom dataset, but somehow, it throws some error which you can see below.


I also have to actually put this in the code which is X = np.array(X, dtype=object), Y = np.array(Y, dtype=object) instead of X = np.array(X) and X = np.array(Y), because there are warning which is,
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify ‘dtype=object’ when creating the ndarray.

This is not recommended. You may be better to relook into your data.

Your data is so called “ragged” data. There are some inconsistencies in size, length, or whatever in your data, which do not fit to ndarray. It is also known as non-rectangular data, irregular matrix, etc.

If you uses the same model, it was created with these parameters.

Tx = 5511 # The number of time steps input to the model from the spectrogram
n_freq = 101 # Number of frequencies input to the model at each time step of the spectrogram

So, our data for training is (32, 5551, 101).

I think it is better to adjust your data, and input_shape for a model to fit to your objectives.

Hey, thanks for the reply. But how do i check my data? Most of them on average are 1 second. And also, for the background and negative, i use the example of the assignment files. So i didn’t have a custom data for background and negative.

Oh, it turns out that the shape of X is much different. The original one is (32, 5511, 101), but when i print(X.shape), it only printed (32, ). The thing that i’m confused is what is wrong? Because nearly my custom dataset has average of 1 second. Also, when i print for each x, it printed:


The values is either (101, 5511) or (101, 5998).

Can i know what software are you guys using to create the dataset? Because i tried everything, i tried having the same length as the activate dataset, i tried having almost the exact same sizes, but nothing works.

Hey @BryanEL,
The training samples are generated in the notebook only, as you must have seen already. Now, if we consider the raw audio samples, then I don’t think any specific software is required. You can use any audio recording software. If it supports the output files in .wav format, well in good, otherwise, you can use one of the many software to convert the extension from yours to .wav. Now, in the notebook, nothing has been mentioned regarding the specific recording software and/or the conversion software (if one is used), so, I don’t think it’s of much significance. Let me know if this helps.

Cheers,
Elemento