In video 2 of AI4E - Week 3 - Case Study: Smart Speaker, step 2 is “Speech recognition”, where we do an A->B mapping from an audio file to the text “tell me a joke”. My doubt is that a user can ask for this, or rephrase it, in many different ways. So do we need to build the dataset and train the model with all such possible statements that have the same meaning as “tell me a joke”? The same issue comes up in the next step, step 3, “Intent recognition”: one thing can be asked in multiple ways. Please explain; I am not able to visualise how this will work.
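To make my question concrete, here is a rough sketch (my own toy code, not from the course) of what I imagine Intent recognition doing: many phrasings all mapping to one intent label. Is this roughly the idea, with the hard-coded dictionary replaced by a trained model?

```python
# Toy sketch of "many phrasings -> one intent" (my own example, not
# from the course). A real system would learn this mapping from
# labelled examples instead of using a hard-coded dictionary.

TRAINING_EXAMPLES = {
    "tell me a joke": "tell_joke",
    "make me laugh": "tell_joke",
    "say something funny": "tell_joke",
    "i want to hear a joke": "tell_joke",
}

def recognize_intent(text: str) -> str:
    """Map the text from step 2 (Speech recognition) to an intent label."""
    return TRAINING_EXAMPLES.get(text.lower().strip(), "unknown")

print(recognize_intent("Make me laugh"))   # -> tell_joke
print(recognize_intent("Tell me a joke"))  # -> tell_joke
```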
A follow-up question on the same session … the steps shared for smart speakers are:
- Wakeword detection
- Speech recognition
- Intent recognition
- Execute Joke
Can we also replace step 2, i.e. Speech recognition, with the following?
- Wakeword detection (via A->B mapping)
- NLP - Speech-to-text (to skip/avoid the task-specific A->B mapping in Speech recognition and still get the text)
- NLP - Text translation to English if the speech is in another language
- Pass the extracted text to Intent recognition
- Execute Joke
Is this also correct?
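To show what I mean, here is a toy sketch of this alternative pipeline. To keep it runnable, the “audio” is faked as a plain string, and every function here is my own placeholder, not any real speech API:

```python
# Toy sketch of the alternative pipeline I am proposing (all names are
# my own placeholders; "audio" is faked as a plain transcript string).

def wakeword_detected(audio: str) -> bool:
    """Step 1: wakeword detection (the A->B mapping I would keep)."""
    return audio.lower().startswith("hey device")

def speech_to_text(audio: str) -> str:
    """Step 2: generic speech-to-text instead of a task-specific mapping."""
    return audio.lower().removeprefix("hey device").strip()

def translate_to_english(text: str) -> str:
    """Step 3: translate to English if needed (toy: a tiny lookup table)."""
    phrases = {"raconte-moi une blague": "tell me a joke"}
    return phrases.get(text, text)

def recognize_intent(text: str) -> str:
    """Step 4: map the extracted text to an intent label."""
    return "tell_joke" if "joke" in text else "unknown"

def execute_joke() -> None:
    """Step 5: carry out the recognised intent."""
    print("Why did the model cross the road? To minimise the loss.")

def handle(audio: str) -> None:
    if not wakeword_detected(audio):
        return
    text = translate_to_english(speech_to_text(audio))
    if recognize_intent(text) == "tell_joke":
        execute_joke()

handle("Hey device raconte-moi une blague")  # prints the joke
```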