How is LSTM connected with image captioning?

someone555777 · July 17, 2023, 2:25pm

Is it something like an OCR system? If so, I still don’t understand how it is connected with LSTMs.

nilosreesengupta · July 17, 2023, 4:12pm

An LSTM is used as the decoder in an image captioning system.
As the decoder, the LSTM takes the features extracted from an image by a convolutional neural network (CNN) as input and generates a sequence of words that describes the image.
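
To make the data flow concrete, here is a minimal sketch in plain Python. It is not a real captioning model: the weights are random and untrained, so the words it emits are meaningless. The vocabulary `VOCAB`, the class `ToyLSTMDecoder`, and the 4-number feature vector standing in for the CNN output are all made up for illustration, and biases are omitted for brevity. It only shows the shape of the idea: image features set the initial state, then the LSTM emits one word per step until it produces `<end>` or hits a length limit.

```python
import math
import random

# Toy vocabulary (an assumption for this sketch, not from any real dataset).
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
HIDDEN = 8  # size of the LSTM hidden and cell state


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyLSTMDecoder:
    """Hypothetical, untrained LSTM decoder; biases omitted for brevity."""

    def __init__(self, feat_dim, seed=0):
        rng = random.Random(seed)
        in_dim = len(VOCAB) + HIDDEN  # one-hot previous word + previous hidden state
        # One weight matrix per gate: input, forget, output, candidate.
        self.W = {g: rand_matrix(HIDDEN, in_dim, rng) for g in "ifog"}
        self.W_init = rand_matrix(HIDDEN, feat_dim, rng)   # image features -> initial state
        self.W_out = rand_matrix(len(VOCAB), HIDDEN, rng)  # hidden state -> word scores

    def step(self, word_id, h, c):
        x = [1.0 if k == word_id else 0.0 for k in range(len(VOCAB))] + h
        i = [sigmoid(z) for z in matvec(self.W["i"], x)]
        f = [sigmoid(z) for z in matvec(self.W["f"], x)]
        o = [sigmoid(z) for z in matvec(self.W["o"], x)]
        g = [math.tanh(z) for z in matvec(self.W["g"], x)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def caption(self, image_features, max_len=10):
        # The image enters only here, through the initial hidden state.
        h = [math.tanh(z) for z in matvec(self.W_init, image_features)]
        c = [0.0] * HIDDEN
        word_id = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            h, c = self.step(word_id, h, c)
            scores = matvec(self.W_out, h)
            word_id = max(range(len(VOCAB)), key=scores.__getitem__)  # greedy pick
            if VOCAB[word_id] == "<end>":
                break
            words.append(VOCAB[word_id])
        return words


decoder = ToyLSTMDecoder(feat_dim=4)
print(decoder.caption([0.1, 0.2, 0.3, 0.4]))  # stand-in for CNN features
```

In a real system the feature vector would come from a pretrained CNN and the decoder weights would be learned from image and caption pairs; feeding the image in as the initial state is just one common design choice among several.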

With regards,
Nilosree Sengupta

someone555777 · July 17, 2023, 8:51pm

So, OK, as I thought, it is something like OCR. But why an LSTM specifically? As I understand it, any model can be used after the CNN, can’t it?

nilosreesengupta · July 17, 2023, 10:55pm

Hello @someone555777 ,

There are several reasons for preferring LSTMs:

LSTMs capture long-term dependencies, which matters for captioning because each word in a caption depends on the words generated before it, not only on the image.
By capturing both short- and long-term dependencies, an LSTM can keep track of what has already been described and produce an appropriate description.
It can handle sequences of arbitrary length.
It can generate natural, fluent descriptions with correct grammar.
In practice, it tends to perform better than a plain RNN decoder.
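
The point about long-term dependencies can be illustrated with a small arithmetic sketch. This is a hand-picked toy, not a trained model: the function names, the recurrent weight of 0.5, and the forget-gate value of 0.99 are assumptions chosen to make the effect visible. It compares how much of an initial value survives 20 steps in a plain tanh RNN versus in an LSTM cell state whose forget gate stays close to 1.

```python
import math


def rnn_memory(initial, weight, steps):
    """Plain RNN with no new input: h_t = tanh(weight * h_{t-1})."""
    h = initial
    for _ in range(steps):
        h = math.tanh(weight * h)
    return h


def lstm_memory(initial, forget_gate, steps):
    """LSTM cell state with the input gate closed: c_t = forget_gate * c_{t-1}."""
    c = initial
    for _ in range(steps):
        c = forget_gate * c
    return c


steps = 20
rnn_left = rnn_memory(1.0, weight=0.5, steps=steps)          # shrinks toward 0
lstm_left = lstm_memory(1.0, forget_gate=0.99, steps=steps)  # stays close to 1
print(rnn_left, lstm_left)
```

With these values the plain RNN has almost nothing left of the initial signal, while the LSTM cell state keeps most of it. The learned gates are what let an LSTM decide when to hold on to information like this and when to drop it.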

Hope this helps.

With regards,
Nilosree Sengupta

someone555777 · August 17, 2023, 2:00pm

So, is it about cases where there is a lot of text in the image?
