Hi,
I have a question about the provided implementation of `generate_n_chars()`, which calls `generate_one_step()` repeatedly to generate multiple output characters from the input.
The code is:
```python
def generate_n_chars(self, num_chars, prefix):
    """
    Generate a text sequence of a specified length, starting with a given prefix.

    Args:
        num_chars (int): The length of the output sequence.
        prefix (string): The prefix of the sequence (also referred to as the seed).

    Returns:
        str: The generated text sequence.
    """
    states = None
    next_char = tf.constant([prefix])
    result = [next_char]
    for n in range(num_chars):
        next_char, states = self.generate_one_step(next_char, states=states)
        result.append(next_char)
    return tf.strings.join(result)[0].numpy().decode('utf-8')
```
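To make the loop's input/output contract explicit, here is a framework-free sketch of the same control flow. Plain Python lists stand in for string tensors (`[prefix]` plays the role of `tf.constant([prefix])`), and `fake_step` is a hypothetical stand-in for `generate_one_step()`, not the assignment's code.

```python
from typing import Callable, List, Optional, Tuple

# A "batch" of strings is modelled as a plain list; [" "] plays the role of tf.constant([" "]).
Batch = List[str]
StepFn = Callable[[Batch, Optional[int]], Tuple[Batch, int]]


def generate_n_chars(step: StepFn, num_chars: int, prefix: str) -> str:
    """Mirror of the provided loop: every call to `step` receives a batch of one string."""
    states = None
    next_char: Batch = [prefix]  # same rank as tf.constant([prefix]), i.e. shape (1,)
    result = [next_char]
    for _ in range(num_chars):
        next_char, states = step(next_char, states)  # the output is fed straight back in
        result.append(next_char)
    # Join element 0 of every batch, like tf.strings.join(result)[0].
    return "".join(batch[0] for batch in result)


def fake_step(batch: Batch, states: Optional[int]) -> Tuple[Batch, int]:
    """Hypothetical stand-in for generate_one_step(): must accept AND return a batch of one."""
    assert isinstance(batch, list) and len(batch) == 1, "expected a batch of one string"
    count = 0 if states is None else states
    return ["ab"[count % 2]], count + 1


print(generate_n_chars(fake_step, 4, " "))  # prints " abab" (note the leading space)
```

What it illustrates: inside `generate_n_chars()`, `generate_one_step()` receives a batch of one string (shape `(1,)`) on every call, including the first one, whereas calling `gen.generate_one_step(" ")` directly passes a scalar string.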
I think I implemented `generate_one_step()` correctly, but I get an error from `generate_n_chars()`.
When I run `generate_one_step()` directly:
```python
tf.random.set_seed(272)
gen = GenerativeModel(model, vocab, temperature=0.5)
next_char, states = gen.generate_one_step(" ")
print(next_char)
print(states.shape)
```
=>
```
tf.Tensor([b'h'], shape=(1,), dtype=string)
(1, 512)
```
But when I run `generate_n_chars()`:
```python
tf.random.set_seed(272)
gen = GenerativeModel(model, vocab, temperature=0.5)
print(gen.generate_n_chars(32, " "), '\n\n' + '_'*80)
```
=>
```
...
ValueError: Exception encountered when calling layer 'grulm_3' (type GRULM).
in user code:
    File "/tmp/ipykernel_14/3241785185.py", line 46, in call *
        x, states = self.gru(x, initial_state=states, training=training)
    File "/usr/local/lib/python3.8/dist-packages/keras/src/layers/rnn/base_rnn.py", line 626, in __call__ **
        return super().__call__(inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/input_spec.py", line 235, in assert_input_compatibility
        raise ValueError(
    ValueError: Input 0 of layer "gru_3" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (1, None, None, 256)
Call arguments received by layer 'grulm_3' (type GRULM):
  • inputs=tf.RaggedTensor(values=tf.RaggedTensor(values=Tensor("string_lookup/Identity:0", shape=(None,), dtype=int64), row_splits=Tensor("StringsByteSplit/RaggedFromValueRowIds/RowPartitionFromValueRowIds/concat:0", shape=(None,), dtype=int64)), row_splits=Tensor("RaggedExpandDims/RaggedFromUniformRowLength/RowPartitionFromUniformRowLength/mul:0", shape=(2,), dtype=int64))
  • states=None
  • return_state=True
  • training=False
```
My guess is that `next_char` has a different type or shape when it is fed back to the model (the GRU expects a 3-D input but receives a 4-D one), but I'm not sure whether the fix belongs in the provided code or in my `generate_one_step()`.
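Two details in the traceback may narrow this down. `states=None` suggests the failure happens on the very first call, where the input is `tf.constant([prefix])` (shape `(1,)`) rather than the scalar `" "` used in the direct test. The `RaggedExpandDims` op in the input also suggests an extra batch axis is being added after the characters are split. The NumPy sketch below only mimics that shape bookkeeping under those assumptions; `VOCAB`, `to_ids` and the zero embedding table are hypothetical stand-ins, not the assignment's code. Splitting a scalar string gives a 1-D array and splitting a batch of one gives a 2-D array, so the same "expand_dims, then embed" step yields rank 3 in one case and rank 4 in the other, matching `expected ndim=3, found ndim=4`.

```python
import numpy as np

EMBED_DIM = 256                                # last dimension reported in the traceback
VOCAB = [" ", "a", "h"]                        # hypothetical toy vocabulary
EMBEDDING = np.zeros((len(VOCAB), EMBED_DIM))  # hypothetical stand-in for the Embedding weights


def to_ids(inputs):
    """Split strings into characters and map them to ids.

    Like tf.strings.unicode_split, a scalar string gives a 1-D result and a batch
    of strings adds one leading dimension. (NumPy needs equal-length strings here;
    TensorFlow returns a RaggedTensor instead.)
    """
    arr = np.asarray(inputs)
    if arr.ndim == 0:
        return np.array([VOCAB.index(c) for c in arr.item()])
    return np.array([to_ids(s) for s in arr])


def embed_always_expand(inputs):
    """Suspected pattern: unconditionally add a batch axis, then embed."""
    ids = np.expand_dims(to_ids(inputs), 0)
    return EMBEDDING[ids]


def embed_batch_aware(inputs):
    """Possible fix: add the batch axis only when the input is a bare string."""
    ids = to_ids(inputs)
    if ids.ndim == 1:
        ids = ids[np.newaxis, :]
    return EMBEDDING[ids]


print(embed_always_expand(" ").shape)              # (1, 1, 256): ndim 3, what the GRU accepts
print(embed_always_expand(np.array([" "])).shape)  # (1, 1, 1, 256): ndim 4, as in the error
print(embed_batch_aware(np.array([" "])).shape)    # (1, 1, 256)
```

If `generate_one_step()` follows a similar pattern, testing it with `tf.constant([" "])` instead of `" "` should reproduce the error outside the loop, which would point to `generate_one_step()` rather than the provided code.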
Any help would be greatly appreciated!!