Data provider - how to build on the fly

I have audio files from which I want to randomly select portions and feed those as input for training/testing. I know how to build a matrix of these portions, store it, and then use it. But I was wondering whether it would be possible to generate those batches on the fly instead, saving disk space.
I am looking for resources on learning how to do this.
Could you please share links about this topic?
Thanks a lot.

You should check out TensorFlow Datasets (TFDS); it can help you create datasets for ML pipelines.
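
If it helps to see the idea independent of any framework, here is a minimal sketch of generating random crops on the fly with a plain Python generator. It assumes the files are 16-bit mono WAV files; the function names `random_crop` and `batch_generator` and their parameters are made up for illustration and are not part of TFDS or any library.

```python
import array
import random
import sys
import wave


def random_crop(path, crop_frames, rng):
    """Read one random window of `crop_frames` frames from a WAV file.

    Assumption: the file is 16-bit mono PCM.
    """
    with wave.open(path, "rb") as wav:
        n_frames = wav.getnframes()
        if n_frames < crop_frames:
            raise ValueError(f"{path} has fewer than {crop_frames} frames")
        start = rng.randint(0, n_frames - crop_frames)
        wav.setpos(start)  # seek first, so only the crop is read from disk
        raw = wav.readframes(crop_frames)
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def batch_generator(paths, crop_frames, batch_size, seed=None):
    """Yield batches of random crops forever; nothing is precomputed or stored."""
    rng = random.Random(seed)
    while True:
        yield [
            random_crop(rng.choice(paths), crop_frames, rng)
            for _ in range(batch_size)
        ]
```

Each call to `next()` on the generator reads only the selected windows from disk, so no matrix of crops is ever written out. If you are using TensorFlow, a generator like this can probably be wrapped with `tf.data.Dataset.from_generator` to plug it into a training loop.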