C5W3: Trigger word detection learning question

In the trigger word detection problem, I noticed that the defined architecture has 522425 trainable parameters, and that the loaded pre-trained model was trained on only 4000 examples. I wonder why the training fits so well, considering that 522425 >> 4000 (the number of unknowns is about two orders of magnitude greater than the number of examples). Moreover, in the first courses of the Deep Learning Specialization we were told that the number of examples should be very large for deep learning. Is this a special case for trigger word detection? Is it usual for sequence models? Or is this not a deep learning example?
Thanks in advance for your answer.

If you have more features (or, as here, more trainable parameters) than examples, it’s a ready-made situation for overfitting. The model can reach a low training error, but it won’t generalize to new data very well.
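To make that concrete, here is a minimal sketch of the same effect on a toy problem. It is not the trigger word model: it is a plain linear regression fitted with NumPy, and the sizes (20 training examples, 100 features), the noise level, and the "true" weights are all made-up assumptions chosen only for illustration.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions, not from the course):
# far more features than training examples.
rng = np.random.default_rng(0)
n_train, n_test, n_features = 20, 200, 100

# Hypothetical "true" weights: only the first three features matter.
true_w = np.zeros(n_features)
true_w[:3] = [1.0, -2.0, 0.5]

X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ true_w + rng.normal(scale=0.5, size=n_train)
y_test = X_test @ true_w + rng.normal(scale=0.5, size=n_test)

# With more unknowns than equations, least squares returns the
# minimum-norm solution, which fits the training data (noise included).
w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_mse = np.mean((X_train @ w_hat - y_train) ** 2)
test_mse = np.mean((X_test @ w_hat - y_test) ** 2)

print(f"train MSE: {train_mse:.3e}")
print(f"test MSE:  {test_mse:.3f}")
```

In this sketch the training error should come out essentially zero while the test error stays large, which is the pattern described above. A deep network is not a linear model, so treat this as an analogy rather than a prediction about the assignment's model.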

One reason the training set is small in the exercise: training on a bigger set (which you would use in a real application) would take an extremely long time.