Why do we need to include the padding token as a possible output in a text generation model?

See the example here:

# Define the total words. You add 1 for the index 0 which is just the padding token.
total_words = len(tokenizer.word_index) + 1

And this as well:

# Ignore if index is 0 because that is just the padding.
if predicted != 0:

@nitulkukadia that's a good question!
Padding is a preprocessing step applied to all the input sequences. Usually we pad each sequence to the length of the longest sequence in the input. This padding is represented with zeros, so we need to reserve 0 as the padding token when creating our dictionary. This is similar to reserving special tokens like [UNK] (unknown), [SOS] (start of sequence), or [EOS] (end of sequence) in the dictionary, depending on our needs.
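
For illustration, here is a minimal sketch (with a toy corpus, not the course dataset) of how the Keras Tokenizer assigns word indices starting at 1 while pad_sequences fills shorter sequences with 0:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ["deep learning is fun", "learning text generation is fun too"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
print(tokenizer.word_index)      # word indices start at 1; 0 is never assigned to a word

sequences = tokenizer.texts_to_sequences(corpus)
padded = pad_sequences(sequences)   # shorter sequences are pre-padded with 0s
print(padded)

total_words = len(tokenizer.word_index) + 1   # +1 reserves index 0 for the padding token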

Hi @jyadav202, thanks for your prompt response.

With respect to the dictionary, we need an entry for padding since we are passing padded sequences along with the sentences.
But why do we need it in the Dense layer, which predicts the probability of the next word? There is no need to predict the probability of the padding token, right?

Sample code from the git repository:
# Build the model
model = Sequential([
    Embedding(total_words, 64, input_length=max_sequence_len-1),
    Bidirectional(LSTM(20)),
    Dense(total_words, activation='softmax')
])

I am thinking we should change the Dense layer to total_words - 1:

# Build the model
model = Sequential([
    Embedding(total_words, 64, input_length=max_sequence_len-1),
    Bidirectional(LSTM(20)),
    Dense(total_words - 1, activation='softmax')
])

Hi @nitulkukadia! Thanks for your question.

A more technical reason for not using total_words - 1 is:
The Dense layer needs to have as many units as there are categories (the last dimension of ys.shape); otherwise the model won't fit the data. Since this is a multiclass problem, we created a one-hot array (ys) and compiled the model with categorical_crossentropy to compute the loss, so the model output and the labels must have the same shape.
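
As a quick sketch of that shape constraint (assuming the labels are one-hot encoded with to_categorical, which is an assumption on my part; the vocabulary size and label values below are made up):

import numpy as np
from tensorflow.keras.utils import to_categorical

total_words = 100                    # assume 99 real words plus padding at index 0
labels = np.array([5, 17, 3])        # hypothetical next-word indices for three examples

ys = to_categorical(labels, num_classes=total_words)
print(ys.shape)                      # (3, 100): the last dimension equals total_words

# categorical_crossentropy compares ys with the softmax output element-wise,
# so a Dense(total_words - 1) layer would output shape (3, 99) and raise a shape mismatch.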

Because of the above reason, the padding token will also be assigned a probability during next-word prediction. To avoid appending it to our seed text, we check if predicted != 0 and skip it.
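
For context, here is a rough sketch of such a generation loop. It assumes tokenizer, model, and max_sequence_len are already defined as in the notebook; the variable names are my guesses and may differ from the course code:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

seed_text = "help me obi wan"
next_words = 10

for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
    probabilities = model.predict(token_list, verbose=0)
    predicted = np.argmax(probabilities, axis=-1)[0]

    # Index 0 is reserved for padding, so skip it instead of appending a word
    if predicted != 0:
        seed_text += " " + tokenizer.index_word[predicted]

print(seed_text)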

This assignment is a primitive way of doing next-word prediction, as you can see that the predictions are not great. For better results, we need other mechanisms, such as a special end-of-sequence [EOS] token.
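
As a rough illustration (not part of the assignment), one simple way to introduce such a token is to append a marker word to each training line and stop generating once it is predicted:

from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical: append an end-of-sequence marker to every training line
corpus = ["help me obi wan kenobi eos", "you are my only hope eos"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)        # 'eos' gets its own index, just like any other word
eos_index = tokenizer.word_index["eos"]

# During generation, stop once the model predicts the EOS token, e.g.:
# if predicted == eos_index:
#     break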

Thanks @jyadav202, I am good with the clarification provided.