Reading accuracy

Hello, I need to evaluate and compare the accuracy of readings using recordings. Is it more advisable to transcribe speech to text and then evaluate with AI or evaluate the audio directly with AI?

What sort of recordings and readings are you referring to?

The reading aloud of a story.

There are lots of speech-to-text tools already.
Then you could run the transcript through another AI tool for analysis.

And which type of AI would you recommend for comparing the accuracy of the readings?

What two things are you comparing?

I’m comparing two recordings of a reading.

I’ll assume that you first use voice-to-text to generate a transcript of each recording.
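
Once you have transcripts, one common way to score them is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the transcript into the original story text, divided by the number of words in the story. Here is a minimal sketch, assuming you already have the story and both transcripts as plain text. The names `tokenize`, `word_error_rate`, `story`, `reading_a` and `reading_b` are made up for illustration, and the sample sentences are placeholders, not real data.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the
    number of words in the reference."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference text is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the reading
                curr[j - 1] + 1,     # extra word in the reading
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder texts: the story as written, and two transcripts of it.
story = "The quick brown fox jumps over the lazy dog"
reading_a = "the quick brown fox jumps over the lazy dog"
reading_b = "the quick fox jumped over a lazy dog"

print("Reading A WER:", word_error_rate(story, reading_a))
print("Reading B WER:", word_error_rate(story, reading_b))
```

A lower score means a more accurate reading. In this toy example reading A matches the story exactly, while reading B has three word errors against nine story words. Keep in mind that the speech-to-text step makes mistakes of its own, so the score mixes the reader's errors with the transcriber's.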

A quick internet search turned up this method:

Thank you, I’m going to check it.