NLP C3 W1 Assignment E6 GenerativeModel

Hi,

I have a question about the provided implementation of generate_n_chars(), which calls generate_one_step() repeatedly to generate multiple output characters from the input.

The code is

    def generate_n_chars(self, num_chars, prefix):
        """
        Generate a text sequence of a specified length, starting with a given prefix.

        Args:
            num_chars (int): The length of the output sequence.
            prefix (string): The prefix of the sequence (also referred to as the seed).

        Returns:
            str: The generated text sequence.
        """
        states = None
        next_char = tf.constant([prefix])
        result = [next_char]
        for n in range(num_chars):
            next_char, states = self.generate_one_step(next_char, states=states)
            result.append(next_char)

        return tf.strings.join(result)[0].numpy().decode('utf-8')

I think I implemented generate_one_step() correctly, but I am getting an error from generate_n_chars().

When executing generate_one_step():

    tf.random.set_seed(272)
    gen = GenerativeModel(model, vocab, temperature=0.5)

    next_char, states = gen.generate_one_step(" ")
    print(next_char)
    print(states.shape)

=>

    tf.Tensor([b'h'], shape=(1,), dtype=string)
    (1, 512)

But when executing generate_n_chars():

    tf.random.set_seed(272)
    gen = GenerativeModel(model, vocab, temperature=0.5)

    print(gen.generate_n_chars(32, " "), '\n\n' + '_'*80)

=>

...

    ValueError: Exception encountered when calling layer 'grulm_3' (type GRULM).
    
    in user code:
    
        File "/tmp/ipykernel_14/3241785185.py", line 46, in call  *
            x, states = self.gru(x, initial_state=states, training=training)
        File "/usr/local/lib/python3.8/dist-packages/keras/src/layers/rnn/base_rnn.py", line 626, in __call__  **
            return super().__call__(inputs, **kwargs)
        File "/usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
            raise e.with_traceback(filtered_tb) from None
        File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/input_spec.py", line 235, in assert_input_compatibility
            raise ValueError(
    
        ValueError: Input 0 of layer "gru_3" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (1, None, None, 256)
    
    
    Call arguments received by layer 'grulm_3' (type GRULM):
      • inputs=tf.RaggedTensor(values=tf.RaggedTensor(values=Tensor("string_lookup/Identity:0", shape=(None,), dtype=int64), row_splits=Tensor("StringsByteSplit/RaggedFromValueRowIds/RowPartitionFromValueRowIds/concat:0", shape=(None,), dtype=int64)), row_splits=Tensor("RaggedExpandDims/RaggedFromUniformRowLength/RowPartitionFromUniformRowLength/mul:0", shape=(2,), dtype=int64))
      • states=None
      • return_state=True
      • training=False

My guess is that there is some type mismatch in next_char when it is fed back to the model, but I am not sure whether the fix belongs in the provided code or in my generate_one_step().

Any help would be greatly appreciated!!

I just found that after modifying the line to

            next_char, states = self.generate_one_step(next_char[0], states=states)

it seems to work as expected. Please let me know whether this is the intended code.

Without that [0] you have a shape mismatch, right. It's a little tricky because the line is outside the START/END code section, so it likely could have been addressed earlier, but I think the change is legit as long as you get the “why”.
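
For anyone who wants to see the "why" concretely, here is a rough pure-Python sketch of the rank bookkeeping. It is not the assignment's code. It assumes, based on the StringsByteSplit and RaggedExpandDims ops in the traceback, that generate_one_step() splits its input into characters and then adds a batch axis itself before the embedding and GRU. The helpers split_chars, embed, ndim and rank_seen_by_gru are made up for illustration, and nested lists stand in for tensors.

```python
EMBED_DIM = 256  # matches the last dimension in the error message


def split_chars(x):
    """Replace every string leaf with a list of its characters (adds one axis)."""
    if isinstance(x, str):
        return list(x)
    return [split_chars(item) for item in x]


def embed(x):
    """Replace every character leaf with a dummy vector (adds one axis)."""
    if isinstance(x, str):
        return [0.0] * EMBED_DIM
    return [embed(item) for item in x]


def ndim(x):
    """Nesting depth of a list, i.e. the rank of the equivalent tensor."""
    return 1 + ndim(x[0]) if isinstance(x, list) else 0


def rank_seen_by_gru(inputs):
    # Assumed behaviour of generate_one_step (hypothetical, see lead-in):
    # split into characters, add a batch axis, then embed.
    chars = split_chars(inputs)
    batched = [chars]
    return ndim(embed(batched))


scalar_input = " "    # like the literal " " or next_char[0]: rank 0
vector_input = [" "]  # like next_char with shape (1,): rank 1

print(rank_seen_by_gru(scalar_input))  # 3, what the GRU expects
print(rank_seen_by_gru(vector_input))  # 4, one axis too many
```

Under that assumption, a scalar string ends up as (batch, time, features), while a shape-(1,) tensor carries one extra axis all the way to the GRU, which lines up with "expected ndim=3, found ndim=4". Indexing with [0] in the loop removes that axis; squeezing or reshaping the input at the top of generate_one_step() would be another way to get the same effect.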