I love the music of the 1970s, but the poor sound quality of many recordings bothers me.
AFAIK the original multi-track recordings are lost in most cases.
So I wonder whether AI could:
- split the audio into stems, recovering the original individual vocal and instrument tracks (see the first sketch after this list);
- nice to have: write a lead sheet and/or tablature (see the second sketch);
- replace poor period microphones with modern ones (for vocals and acoustic instruments), so that the track sounds as if it had been recorded with modern equipment;
- electric instruments are already ideal sources and would not need to be replaced;
- remaster to stereo, binaural (dummy-head) stereo, and surround sound.
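On the stem-splitting point, off-the-shelf tools already get partway there. Here is a minimal sketch using Deezer's open-source Spleeter (the file name and output directory are placeholders); its 4-stem model also illustrates the weakness I complain about below, because guitars and keyboards get lumped into a generic "other" stem:

```python
from spleeter.separator import Separator

# Load Spleeter's pretrained 4-stem model: vocals, drums, bass, "other".
# Guitars and keyboards all land in the undifferentiated "other" stem --
# exactly the limitation of current splitters.
separator = Separator('spleeter:4stems')

# 'song.wav' and 'stems/' are placeholder paths; one WAV per stem
# is written to stems/song/.
separator.separate_to_file('song.wav', 'stems/')
```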
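For the lead-sheet/tablature wish, automatic music transcription is its own research area, but note-level melody-to-MIDI already works reasonably on clean stems. A sketch using Spotify's open-source Basic Pitch (the input path is a placeholder, and a real lead sheet with chord symbols would still need manual work on top):

```python
from basic_pitch.inference import predict

# Run Basic Pitch's note-transcription model on a separated vocal stem
# ('stems/song/vocals.wav' is a placeholder path).
model_output, midi_data, note_events = predict('stems/song/vocals.wav')

# midi_data is a PrettyMIDI object; write it out and open it in any
# notation editor to get a rough lead sheet.
midi_data.write('vocals.mid')
```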
My issues are:
- is there even a market for this? The people who listen to 1970s music are literally dying out;
- currently available audio stem splitters, even AI-based ones, perform well for vocals and drums but poorly for guitar and keyboards;
- what would the neural network look like? Convolutional and recurrent? A classification model where the individual tracks are an intermediate result? (See the architecture sketch after this list.)
- the number of possible classes is endless: think of an orchestra with all those different instruments, or of choirs; should individual artists be recognised, too? Or famous instruments like a Stradivarius? What about sound effects? Or ambient sounds (wind, rain, leaves, audience, noise)?
- data preparation is a nightmare, as all those category labels would have to be attached to each and every audio sample (though see the data sketch after this list for a common workaround);
- another hurdle is re-recording the instruments: anyone can beat a drum, but not necessarily make it sound like the original artist;
- one upside: the amount of potential training data is immense.
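On the architecture question: as far as I know, current separators are not classifiers but regression models that estimate a soft time-frequency mask per source, typically combining convolutional layers (local spectral patterns) with a recurrent layer (longer musical context), as in the Open-Unmix/Demucs family. A minimal PyTorch sketch of that idea (layer sizes are arbitrary, not tuned values):

```python
import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    """Toy conv + recurrent separator: the CNN picks up local
    time-frequency patterns, a bidirectional LSTM adds temporal
    context, and the output is one soft mask per source that is
    multiplied onto the mixture spectrogram."""

    def __init__(self, n_freq=513, n_sources=4, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.rnn = nn.LSTM(n_freq, hidden, batch_first=True,
                           bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_freq * n_sources)
        self.n_freq, self.n_sources = n_freq, n_sources

    def forward(self, spec):                    # spec: (batch, time, freq)
        x = self.conv(spec.unsqueeze(1)).squeeze(1)
        x, _ = self.rnn(x)
        masks = torch.sigmoid(self.out(x))
        masks = masks.view(spec.shape[0], -1, self.n_freq, self.n_sources)
        # Each source estimate is mask * mixture, so the stems really are
        # an intermediate result you can listen to -- no classification step.
        return masks * spec.unsqueeze(-1)

model = MaskEstimator()
mix = torch.rand(2, 100, 513)     # fake magnitude spectrograms
stems = model(mix)                # (2, 100, 513, 4): one layer per source
```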
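And on the data-preparation nightmare: for the separation part at least, there is a standard workaround. Because mixing is (roughly) additive, training pairs can be synthesised by summing isolated stems, so the ground truth comes for free; this is how datasets like MUSDB18 are used in practice. A sketch, assuming directories of mono WAVs per source (the directory layout and the soundfile dependency are my assumptions):

```python
import random
from pathlib import Path

import numpy as np
import soundfile as sf

def random_training_pair(stem_dirs, seconds=10, sr=44100):
    """Build one (mixture, stems) pair by summing randomly chosen
    isolated tracks. Assumes mono WAVs at `sr` that are at least
    `seconds` long; stem_dirs might be ['vocals/', 'drums/', ...]."""
    n = seconds * sr
    stems = []
    for d in stem_dirs:
        f = random.choice(sorted(Path(d).glob('*.wav')))
        audio, _ = sf.read(f, frames=n, dtype='float32')
        stems.append(audio)
    stems = np.stack(stems)        # (n_sources, n_samples)
    mixture = stems.sum(axis=0)    # what the model hears as input
    return mixture, stems          # network input, ground-truth targets
```

This is also why I count the amount of training data as an upside: any catalogue of isolated stems multiplies into an effectively unlimited number of synthetic mixtures.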
Bottom line: it would be hugely expensive and take decades to complete, and there is probably not even a market for it. But it would be so awesome to hear Bessie Smith as her original live audience did, or your favourite band in surround sound.
What do you think?