Understanding the loss of this many-to-many architecture with an LSTM layer

I believe I have a problem understanding the basic structure of the architecture used in the week 3 graded assignment that tackles NER.

When one input (an array of shape (1, max_batch_length, embedding_dimensions)) is fed into the network, does every vector (corresponding to a word in the input) produce an output all the way through the dense layer and the LogSoftmax layer?

If so, shouldn’t the labels also be of shape (batch_size, max_batch_length, tag_map)? How is every output, which is a 17-dimensional vector, compared against its label?

Hi, @Kalana_Induwara_Wije.

This is a good question. Let me help you understand the shapes. Let’s say:
batch_size = 5
max_len = 30
embedding_dim = 50
hidden_dim = 50
Your input X1 is then of shape (5, 30): 5 sentences, each padded to a maximum of 30 words.

  1. You pass X1 to the embedding layer:
    you get emb_out of shape (5, 30, 50) # batch_size, seq_len, emb_dim
  2. You pass emb_out to the LSTM layer:
    you get lstm_out of shape (5, 30, 50) # batch_size, seq_len, hidden_dim
  3. You pass lstm_out to the dense layer:
    you get dense_out of shape (5, 30, 17) # batch_size, seq_len, num_tags
  4. You pass dense_out to LogSoftmax:
    you get logsoftmax_out of shape (5, 30, 17) # batch_size, seq_len, num_tags
    (a runnable sketch of these four steps follows right after this list)
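
Here is a minimal sketch of those four steps, written in PyTorch purely to make the shapes concrete; the assignment itself may use a different framework, and the layer names and vocab_size below are my own assumptions, not taken from the course:

```python
# Illustrative shape walkthrough (PyTorch); vocab_size is an assumed value.
import torch
import torch.nn as nn

batch_size, max_len = 5, 30
vocab_size, embedding_dim, hidden_dim, num_tags = 10000, 50, 50, 17

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
dense = nn.Linear(hidden_dim, num_tags)
log_softmax = nn.LogSoftmax(dim=-1)

X1 = torch.randint(0, vocab_size, (batch_size, max_len))  # (5, 30) padded token ids

emb_out = embedding(X1)                  # (5, 30, 50)  batch_size, seq_len, emb_dim
lstm_out, _ = lstm(emb_out)              # (5, 30, 50)  batch_size, seq_len, hidden_dim
dense_out = dense(lstm_out)              # (5, 30, 17)  batch_size, seq_len, num_tags
logsoftmax_out = log_softmax(dense_out)  # (5, 30, 17)  batch_size, seq_len, num_tags
print(logsoftmax_out.shape)              # torch.Size([5, 30, 17])
```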

LogSoftmax changes nothing about which tag gets predicted, but the loss is computed from its output, so I had to mention it.
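
A quick self-contained check of that claim (again PyTorch, as an illustration): log-softmax is a monotonic transform over the tag dimension, so the argmax is the same before and after it; only the loss needs the log-probabilities.

```python
import torch
import torch.nn as nn

scores = torch.randn(5, 30, 17)            # raw dense-layer outputs (random here)
log_probs = nn.LogSoftmax(dim=-1)(scores)  # log-probabilities per tag

# The highest raw score and the highest log-probability pick the same tag.
print(torch.equal(scores.argmax(dim=-1), log_probs.argmax(dim=-1)))  # True
```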

So now to your question: you are right that the model outputs shape batch_size x seq_len x num_tags, but when we make a prediction (the predict function) we take the argmax over the last dimension (the tags) and get predictions of shape (5, 30).
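
And on the labels: in the usual cross-entropy / negative-log-likelihood setup they stay as integer tag ids of shape (batch_size, max_len), and the loss simply looks up the log-probability of the true tag at every position, so you never need one-hot labels of shape (batch_size, max_len, num_tags). A hedged sketch in PyTorch (the assignment’s own loss and padding handling may differ; pad_tag below is a hypothetical id):

```python
import torch
import torch.nn.functional as F

batch_size, max_len, num_tags = 5, 30, 17
pad_tag = 0  # hypothetical tag id used for padded positions

log_probs = torch.randn(batch_size, max_len, num_tags).log_softmax(dim=-1)
labels = torch.randint(0, num_tags, (batch_size, max_len))  # integer tag ids, (5, 30)

# Loss: negative log-likelihood of the true tag at every position.
# nll_loss expects (N, C) inputs, so flatten the batch and sequence dimensions.
loss = F.nll_loss(log_probs.view(-1, num_tags), labels.view(-1),
                  ignore_index=pad_tag)   # padded positions are masked out

# Prediction: argmax over the tag dimension gives shape (5, 30).
predictions = log_probs.argmax(dim=-1)
print(loss.item(), predictions.shape)     # scalar loss, torch.Size([5, 30])
```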

Cheers!
