How are text-to-music models trained?

Hello. I am a developer living in Korea.

Recently, I have seen a lot of models that generate music from a text prompt.
I'm curious about how these models are trained and how they work.

I tried searching to find out how these models work, but every blog post I found was an advertisement promoting them.

If anyone knows how these models work, please recommend an article or blog post.


Did you find an answer to your question?

No… please help me.