Hello everyone,
This is a practical question, not a theoretical one.
In the course, Mr. Ng explains that you can combine clean audio input with car noise to create more realistic training input for voice recognition in a car.
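To make sure I understand the technique, here is roughly how I picture that augmentation step, as a minimal sketch (the `mix_noise` name and the `snr_db` parameter are my own choices, not from the course):

```python
import numpy as np

def mix_noise(clean, noise, snr_db=10.0):
    """Mix clean speech with background noise at a target SNR (in dB).

    clean, noise: 1-D float arrays recorded at the same sample rate.
    snr_db: desired signal-to-noise ratio; lower means noisier.
    """
    # Loop the noise clip so it covers the whole clean clip.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]

    # Scale the noise so the mixture hits the requested SNR.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```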
This made me wonder about the generalization abilities of deep learning algorithms.
Surely, if you are in a car from the '70s, a recent high-end car, or an electric car, the surrounding noise captured by the mic will be very different. In fact, even the type of microphone will produce vastly different input. The audio captured by a bad mic will probably be about as easy for a human to understand as the audio from a good one (because mics are designed for human ears). For an AI, however, the two audio inputs might be completely dissimilar.
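For illustration, here is a toy sketch of what I mean: a crude low-pass filter standing in for a cheap mic barely changes what a human would hear, but it can change the raw samples a model ingests quite a lot (the `cheap_mic` helper, the 4 kHz cutoff, and the synthetic "speech" signal are arbitrary choices of mine):

```python
import numpy as np
from scipy.signal import butter, lfilter

def cheap_mic(audio, sr, cutoff_hz=4000):
    """Crude stand-in for a low-quality mic: a 4th-order low-pass filter.

    Real mics also differ in distortion and noise floor; this only
    models the frequency-response part.
    """
    b, a = butter(4, cutoff_hz / (sr / 2), btype="low")
    return lfilter(b, a, audio)

sr = 16000
t = np.arange(sr) / sr
# Toy "speech": a mix of a few harmonics.
speech = sum(np.sin(2 * np.pi * f * t) for f in (200, 800, 2400, 6000))

degraded = cheap_mic(speech, sr)
# Relative change in the raw samples the model would see; typically
# substantial even though both versions sound alike to a human.
rel_diff = np.linalg.norm(speech - degraded) / np.linalg.norm(speech)
print(f"relative input change: {rel_diff:.2f}")
```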
Therefore, I have two questions:
- How well can an AI trained only on surrounding sounds from a '70s car with a bad mic generalize to a good mic in an electric car? And vice versa?
- If two inputs sound similar to a human, does that mean they will also sound similar to the AI? Or, on the contrary, should one be careful not to assume things are similar when they are actually only similar for us and not for an AI?
Best regards