I am doing a project on deepfake audio detection. Could you please help me modify a pre-trained model for this purpose? Any help from someone with experience in this area would be appreciated.
Sure, could you please provide more details about the specific issue?
@Rachit_Rawal Unfortunately, I cannot think of a pretrained model to suggest to you. However, in the fourth course of the DL Specialization, on convolutional networks, we get into one-shot learning for facial recognition; naively, might not similar methods be applicable for voice ‘deep fake’ detection?
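To make that concrete, here is a minimal sketch of what a verification-style setup might look like for audio: one shared encoder maps two clips (as spectrograms) to embeddings, and their similarity decides "same source" or not. Everything here is an assumption on my part; `AudioEncoder`, `same_source_score`, the layer sizes, the embedding dimension, and any threshold are placeholders, not a reference implementation:

```python
# A rough sketch of the one-shot / verification idea carried over from
# face recognition to audio. All names and sizes here are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Tiny ConvNet that turns a (1, n_mels, time) spectrogram into an embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # collapse the frequency/time axes
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)  # unit-length embedding

def same_source_score(encoder: AudioEncoder,
                      spec_a: torch.Tensor,
                      spec_b: torch.Tensor) -> float:
    """Cosine similarity between two clips; higher means more likely the same source."""
    with torch.no_grad():
        ea, eb = encoder(spec_a), encoder(spec_b)
        return F.cosine_similarity(ea, eb).item()

# Toy usage with random "spectrograms" of shape (batch, 1, n_mels, frames):
encoder = AudioEncoder()
a = torch.randn(1, 1, 80, 200)
b = torch.randn(1, 1, 80, 200)
print(same_source_score(encoder, a, b))   # where to set the decision threshold is up to you
```

The only point is the structure: a single encoder shared between both inputs, the same way the face-verification Siamese network shares weights.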
My intuition, at least, suggests that ConvNets alone are nice, but to really understand what is going on here you need a solid grounding in DSP (which I am only just venturing into myself), since applying learned filters to audio is basically what your ConvNets are doing anyway.
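By the DSP front end, I mean something like a log-mel spectrogram, which is the usual "image" a ConvNet consumes for audio. A minimal sketch, assuming librosa; the sample rate, FFT size, hop length, and mel-band count below are arbitrary choices, not recommendations:

```python
# Turn a waveform into a log-mel spectrogram, the typical ConvNet input for audio.
import librosa
import numpy as np

def log_mel_spectrogram(path: str, sr: int = 16000, n_mels: int = 80) -> np.ndarray:
    """Load an audio file and return a (n_mels, frames) log-mel spectrogram in dB."""
    y, sr = librosa.load(path, sr=sr)                # load and resample to 16 kHz
    mel = librosa.feature.melspectrogram(y=y, sr=sr,
                                         n_fft=1024, hop_length=256,
                                         n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)      # convert power to log (dB) scale

# spec = log_mel_spectrogram("clip.wav")
# spec[np.newaxis, np.newaxis] then has shape (1, 1, n_mels, frames) for a ConvNet.
```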
You’d also have to think quite a bit about what data to collect and how. The most obvious data set would contrast someone’s ‘real voice’ with their ‘deep fake’ voice reading the exact same statement. Not that you’d train the network on that pair; you’d train it on completely different utterances, but measuring on the same utterance would give you your differential, or difference.
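As a sketch of that ‘differential’ idea: embed the genuine recording and the suspect recording of the same sentence with whatever encoder you trained (on different utterances), and take the distance between the embeddings as the score. Here `embed()` is just a random stand-in for that trained encoder, so the numbers mean nothing; it only shows where the measurement would plug in, and `deepfake_differential` is a name I made up:

```python
# Measure the "same utterance" differential: distance between a speaker's
# genuine recording and a suspect recording of the same sentence.
import numpy as np

def embed(waveform: np.ndarray) -> np.ndarray:
    """Placeholder for a trained encoder; returns a unit-length embedding."""
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def deepfake_differential(real_wave: np.ndarray, suspect_wave: np.ndarray) -> float:
    """Cosine distance between real and suspect renditions of one sentence."""
    e_real, e_suspect = embed(real_wave), embed(suspect_wave)
    return float(1.0 - e_real @ e_suspect)   # 0 = identical, 2 = opposite

# Toy usage: two random "waveforms" standing in for the matched recordings.
real = np.random.randn(16000)
suspect = np.random.randn(16000)
print(deepfake_differential(real, suspect))  # a large distance would suggest a fake
```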
Who would actually give you that data?
Unfortunately, even as these types of networks greatly improve, I think we also need to use our old-school flesh brains to make at least some of the distinctions: would person ‘X, Y, Z’ actually say that?
At least, luckily, thus far, those who tend to leverage these technologies without any sophisticated understanding behind them…
Well… let’s just say they tend to be ‘not that smart’.