Hey @20020069_Le_Thai_S_n,
Welcome, and we are glad that you could become a part of our community!
First, I would like to thank you for creating this thread. I also learnt something new while putting together an answer to it.
This is because the model outputs the probabilities corresponding to each of the positions, i.e., if we have `padded_length = 100`, it will output the probabilities corresponding to each of the 100 positions (or tokens). The model is structured this way deliberately: in order to compute the loss for each of the tokens during training, we need the probabilities corresponding to each of the tokens. I hope this makes sense now.
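Just to make the shapes concrete, here is a minimal sketch of this idea. It is written in PyTorch purely for illustration (the course materials may use a different framework), and every number and name here, including `pad_token_id`, is made up: the model produces one probability vector per position, and the per-token training loss simply ignores the padding positions.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: padded_length = 100, vocabulary of 8000 tokens.
batch_size, padded_length, vocab_size = 32, 100, 8000
pad_token_id = 0  # assumed id of the padding token

# What a decoder typically returns: one logit vector per position.
logits = torch.randn(batch_size, padded_length, vocab_size)  # (32, 100, 8000)
probs = F.softmax(logits, dim=-1)                            # probabilities for each of the 100 positions

# Target token ids, padded to the same length.
targets = torch.randint(0, vocab_size, (batch_size, padded_length))

# The training loss is computed per token; padding positions are ignored.
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),  # (32 * 100, 8000)
    targets.reshape(-1),             # (32 * 100,)
    ignore_index=pad_token_id,
)
print(probs.shape, loss.item())
```

At inference time you would typically keep only the probabilities at the positions you actually care about and discard the ones produced for the padding positions.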
I don’t think it should be an issue. It’s quite analogous to CNNs running inference on multiple examples simultaneously. We just need to make sure that the `padded_length` for each of the examples in a single batch is the same, and that should be it (see the sketch below).
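To make the batching point concrete, here is a minimal, framework-free sketch of padding every example in a batch to a common `padded_length`; the pad id and the token-id lists are invented for illustration.

```python
pad_token_id = 0  # assumed id of the padding token

def pad_batch(sequences, padded_length):
    """Right-pad each token-id list to padded_length and build a matching mask."""
    padded, mask = [], []
    for seq in sequences:
        seq = seq[:padded_length]                      # truncate if too long
        pad_len = padded_length - len(seq)
        padded.append(seq + [pad_token_id] * pad_len)  # pad with the pad id
        mask.append([1] * len(seq) + [0] * pad_len)    # 1 = real token, 0 = padding
    return padded, mask

batch = [[5, 17, 42], [8, 3, 99, 12, 7]]
ids, attention_mask = pad_batch(batch, padded_length=8)
# ids -> [[5, 17, 42, 0, 0, 0, 0, 0], [8, 3, 99, 12, 7, 0, 0, 0]]
```

Once every example shares the same length, the batch can be stacked into a single tensor, just as images of equal size are stacked for a CNN.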
As for speed comparisons between transformers and CNNs, I am not sure this is a question worth dwelling on. CNNs are designed to exploit spatial structure, which is not the defining attribute of natural language. Sequence models and transformers, on the other hand, are designed to exploit temporal (sequential) structure, which is a key attribute of every natural language application.
If you still think this is a concern, feel free to train a CNN-based architecture and a transformer-based architecture on any natural language task, and decide for yourself whether the speed of the CNN-based model justifies the large drop in performance. Please do share your results with the community.
Honestly speaking, I never thought about this at all prior to your question. I checked this article out, and it does list Text Summarization as an application of the Encoder-Decoder architecture. So, what are we missing here? It turns out @arvyzukai has already posted an answer to this question, which you can find here.
Let us know if this helps.
Cheers,
Elemento