If the output is an audio file, how should it be processed?

Golden_Hwang · November 12, 2022, 7:11pm

I’m trying to build a model that takes two numeric inputs (A) and produces an audio file as its output (B).
(I’m not trying to make a TTS.)
I plan to use only 2 or 3 neural networks.

When training the model, how should the audio file be processed and presented? Can I just put the file in?

Note: I completed the course https://www.coursera.org/learn/machine-learning-course/ on Coursera.

Juan_Olano · November 12, 2022, 7:39pm

Hi @Golden_Hwang ,

Welcome to the community! This is your first post

Regarding your question: “how should the audio file be processed and presented? Can I just put the file in?”

I am a beginner in audio, so all I can do right now is give you some general ideas about what you are trying to do.

First, note that audio files cannot be fed directly into the model.
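To make "not directly" concrete: a WAV file is a container of encoded bytes, and before a model can use it those bytes have to be decoded into an array of numbers. Here is a minimal sketch, assuming 16-bit PCM WAV files and using only Python's standard `wave` module plus NumPy. `wav_to_array` is a hypothetical helper name, not part of any library.

```python
import io
import wave

import numpy as np


def wav_to_array(path_or_file):
    """Decode a 16-bit PCM WAV file into a float32 array in [-1, 1).

    Returns (samples, sample_rate), where samples has shape
    (num_frames, num_channels).
    """
    with wave.open(path_or_file, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wf.getframerate()
        num_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # Interleaved little-endian int16 samples -> (frames, channels)
    ints = np.frombuffer(raw, dtype="<i2").reshape(-1, num_channels)
    samples = ints.astype(np.float32) / 32768.0
    return samples, sample_rate


# Usage: build a tiny stereo WAV in memory, then decode it.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(np.array([[0, 16384], [-32768, 32767]], dtype="<i2").tobytes())
buf.seek(0)
samples, sr = wav_to_array(buf)
print(samples.shape, sr)  # (2, 2) 8000
```

Other formats (MP3, FLAC, 24-bit WAV) need a decoding library, but the result is the same idea: a numeric array plus a sample rate.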

Just as with images, the audio files need to go through a series of transformations before they are given to the model.

In the image case, we take the image, reshape it into a vector of pixel values, normalize those values, and maybe crop it to a fixed size, etc.
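For comparison, the image steps can be sketched in NumPy. This assumes a 2-D grayscale `uint8` image; the 28-pixel crop size is an arbitrary example, and the crop is done before flattening because cropping is easier on the 2-D array. `preprocess_image` is a hypothetical name.

```python
import numpy as np


def preprocess_image(img, size=28):
    """Center-crop an (H, W) grayscale image to size x size,
    scale 0-255 pixel values to [0, 1], and flatten to a vector."""
    h, w = img.shape
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    top = (h - size) // 2
    left = (w - size) // 2
    cropped = img[top:top + size, left:left + size]
    return (cropped.astype(np.float32) / 255.0).reshape(-1)


img = np.full((32, 30), 255, dtype=np.uint8)
vec = preprocess_image(img)
print(vec.shape)  # (784,)
```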

With audio we have to follow a similar process of transformations:

We read the audio file and convert it to a standard number of channels, for example 1 audio channel (mono) or 2 audio channels (stereo).
Then we standardize the sampling rate so that every file uses the same rate in Hz. This means, for instance, that every second of audio maps to the same number of array entries.
Next, we resize each array so that all arrays have the same length, either by truncating them or by padding them with zeros.
Finally, we create a spectrogram, which shows how the frequency content of the audio changes over time and captures its main features.
Before and/or after the spectrogram step, we can apply some data augmentation.
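The steps in the list above could look roughly like this in NumPy. This is a minimal sketch under several assumptions: the audio has already been decoded into a float array of shape (frames, channels); the 16 kHz rate, 1-second clip length, and STFT sizes are arbitrary example values; the linear-interpolation resampling is a simplification (libraries such as librosa or torchaudio use proper band-limited resampling); and the two augmentations (a time shift before the spectrogram, a frequency mask after it) are just examples. All function names are hypothetical.

```python
import numpy as np

# Illustrative values only, not requirements.
TARGET_SR = 16000      # sample rate every clip is converted to
TARGET_SECONDS = 1.0   # every clip becomes exactly this long
N_FFT = 512            # samples per STFT frame
HOP = 128              # samples between consecutive frames


def to_mono(samples):
    """(frames, channels) -> (frames,) by averaging the channels."""
    return samples.mean(axis=1) if samples.ndim == 2 else samples


def resample_linear(x, sr, target_sr=TARGET_SR):
    """Naive linear-interpolation resampling (no anti-aliasing filter)."""
    if sr == target_sr:
        return x
    n_out = int(round(len(x) * target_sr / sr))
    t_in = np.arange(len(x)) / sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, x)


def fix_length(x, n=int(TARGET_SR * TARGET_SECONDS)):
    """Truncate, or pad with zeros at the end, to exactly n samples."""
    if len(x) >= n:
        return x[:n]
    return np.pad(x, (0, n - len(x)))


def spectrogram(x, n_fft=N_FFT, hop=HOP):
    """Log-magnitude STFT (Hann window), shape (n_fft // 2 + 1, num_frames)."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(num_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag).T


def time_shift(x, max_fraction=0.1, rng=None):
    """Augmentation on the waveform: circularly shift by a random offset."""
    rng = np.random.default_rng() if rng is None else rng
    limit = int(len(x) * max_fraction)
    return np.roll(x, rng.integers(-limit, limit + 1))


def mask_frequencies(spec, max_bins=8, rng=None):
    """Augmentation on the spectrogram: zero out a random band of frequency bins."""
    rng = np.random.default_rng() if rng is None else rng
    spec = spec.copy()
    width = rng.integers(0, max_bins + 1)
    start = rng.integers(0, spec.shape[0] - width + 1)
    spec[start:start + width, :] = 0.0
    return spec


def preprocess(samples, sr, rng=None):
    x = to_mono(samples)
    x = resample_linear(x, sr)
    x = fix_length(x)
    x = time_shift(x, rng=rng)
    spec = spectrogram(x)
    return mask_frequencies(spec, rng=rng)


# Usage: 0.5 s of stereo noise at 22.05 kHz -> mono, 16 kHz, padded to 1 s.
rng = np.random.default_rng(0)
stereo = rng.uniform(-1, 1, size=(11025, 2))
spec = preprocess(stereo, sr=22050, rng=rng)
print(spec.shape)  # (257, 122)
```

Because every clip ends up with the same shape, the spectrograms can be stacked into one batch array for training.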

At this point the audio file has been transformed and is ready to be fed to the model.

So this is, in general terms, how you’d prepare an audio file to be used by a model.
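In the original question the audio is the output (B) rather than the input, so the same representation is used in the other direction: each target clip becomes a fixed-length array the network learns to predict, and at prediction time the array the network produces is written back to an audio file. A minimal sketch of that last step, assuming the model outputs a mono waveform scaled to [-1, 1] (if it predicted a spectrogram instead, an extra inversion step such as Griffin-Lim would be needed first, not shown here). `array_to_wav` is a hypothetical helper and the sine wave is only a stand-in for a real prediction.

```python
import io
import wave

import numpy as np


def array_to_wav(samples, sample_rate, file):
    """Write a mono float waveform in [-1, 1] as a 16-bit PCM WAV."""
    clipped = np.clip(np.asarray(samples, dtype=np.float64), -1.0, 1.0)
    ints = np.round(clipped * 32767).astype("<i2")
    with wave.open(file, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(ints.tobytes())


# Stand-in for a model prediction: 0.25 s of a 440 Hz sine at 16 kHz.
sr = 16000
t = np.arange(sr // 4) / sr
predicted = 0.5 * np.sin(2 * np.pi * 440 * t)

buf = io.BytesIO()  # a file path such as "out.wav" works here too
array_to_wav(predicted, sr, buf)
buf.seek(0)
with wave.open(buf, "rb") as wf:
    n_frames, rate = wf.getnframes(), wf.getframerate()
print(n_frames, rate)  # 4000 16000
```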

I hope this gets you started. I'll share more as I learn more about this specific topic.

Good luck in your project! Please share any findings!

Juan

Golden_Hwang · November 12, 2022, 7:55pm

Thank you for your kind and detailed reply.

Are there any Coursera courses where I can practice on this?
The process seems more complicated than I thought…

Juan_Olano · November 12, 2022, 8:03pm

@Golden_Hwang ,

Yes, I was also surprised about the requirements to process audio. In fact, even now that I know a bit more, I still struggle with my specific use case.

I could not find this information on Coursera. One of the best sources of information I have found is this one:

https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5

You’ll find a step-by-step guide right there.

I hope this helps

Juan

Golden_Hwang · November 12, 2022, 11:15pm

Thank you so much again.
