Speech to Speech system

milindkopi · January 24, 2025, 11:10pm

Hi:
I am trying to build a Speech to Speech system (Take in Audio from microphone → Convert Speech to Text → Analyze Text → Generate Response → Convert Text to Speech). Has anyone tried something like this? Any tips on how to get started? I was planning on using Whisper and OpenAI GPT 4o for the text analysis. But would love to get tips on any other system I should use.
Thanks,

Samer_Attrah · January 24, 2025, 11:38pm

Hello,

I am just curious, what type of analysis do you plan to do on the text?

milindkopi · January 25, 2025, 1:08am

Its to simulate a conversation. So responding to a question for example.
Looking for any examples of similar projects that anyone may have done.

Samer_Attrah · January 25, 2025, 1:40am

I don’t have an example because my work is mostly on computer vision and not speech or text, but I would argue, why won’t you use a speech model?

A few days ago I had a nice conversation with Microsoft Copilot and it takes audio and gives audio as output, so if you use an API of such a model, it would be less complex for you to implement something.

I don’t know about Whisper or GPT 4o if they have audio-to-audio but I am sure there are multi-modal models. I suggest checking HugginFace for that.

I hope that helps.

Topic		Replies	Views
Creating Personalised AI assistant that can ans my questions and assist me to do some great things ChatGPT Prompt Engineering for Developers	1	103	August 16, 2023
STT for stroke victims and unrecognisable speech AI Discussions ai-discussions , project	7	52	September 9, 2024
Speech to text - Open models for transfer learning AI Discussions	1	49	May 18, 2023
Project idea help AI Discussions	1	50	December 22, 2023
I want to make a vioce to voice translation app in multiple language in real time AI Discussions ai-discussions , careers , data-centric	2	161	April 21, 2024

Speech to Speech system

Related topics