How to measure the performance of an RNN model

I have replicated the dinosaur name generation assignment using Keras. The model I use is defined by the following function,

import tensorflow as tf

def create_model(n_x, n_a, n_y):
    # n_x: input size (vocabulary size for one-hot characters)
    # n_a: number of GRU units; n_y: output vocabulary size

    x = tf.keras.layers.Input(shape=(None, n_x))

    a, c = tf.keras.layers.GRU(n_a, return_sequences=True, return_state=True)(x)
    y = tf.keras.layers.Dense(n_y, activation='softmax')(a)

    model = tf.keras.Model(x, y)

    return model

I train it like this,

opt = tf.keras.optimizers.Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999)

model.compile(
    optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"]
)

model.fit(
    X_train,
    Y_train,
    batch_size=128,
    epochs=1000
)

Although I only get an accuracy of around 0.25 after 1,000 epochs, the model generates plausible dinosaur names.

My questions are,

  • Since I am training a GRU layer, is Keras measuring the accuracy on the whole sample sequences (i.e. complete names) as opposed to character by character? Is this the reason for the low accuracy I get?
  • What would be the appropriate metric to measure the performance of this model? Accuracy does not seem to be a good one. My intuition is that the performance should be measured with something like the likelihood that the generated names belong to the same distribution as the training set.
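To clarify what I mean by character by character, here is a toy sketch of a per-character accuracy (the shapes and numbers are made up, just to illustrate the computation):

```python
import numpy as np

# Toy shapes: 2 names, 3 characters each, vocabulary of 4 characters.
# y_true is one-hot encoded; y_pred holds softmax probabilities.
y_true = np.array([
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
    [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]],
])
y_pred = np.array([
    [[0.9, 0.05, 0.03, 0.02], [0.1, 0.6, 0.2, 0.1], [0.3, 0.3, 0.2, 0.2]],
    [[0.1, 0.1, 0.1, 0.7], [0.2, 0.5, 0.2, 0.1], [0.1, 0.7, 0.1, 0.1]],
])

# Per-character accuracy: compare the argmax at EVERY timestep and
# average over all characters, not over whole names.
acc = np.mean(np.argmax(y_pred, axis=-1) == np.argmax(y_true, axis=-1))
print(acc)  # 4 of the 6 characters match
```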

Thanks in advance.

Training an RNN like this means minimizing the errors in the character predictions on the training set, i.e. the per-character cross-entropy. The natural metric is therefore the loss (or equivalently the perplexity), not the accuracy.
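As a minimal sketch (with made-up probabilities), the per-character negative log-likelihood and its exponential, the perplexity, can be computed directly from the model's predicted distributions:

```python
import numpy as np

# probs[t, c] is the model's predicted probability of character c at
# timestep t; targets[t] is the true character index. Toy numbers.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])
targets = np.array([0, 1, 2])

# Mean negative log-likelihood per character (the training loss).
nll = -np.mean(np.log(probs[np.arange(len(targets)), targets]))

# Perplexity: exp of the mean NLL; lower is better, 1.0 is a perfect model.
perplexity = np.exp(nll)
print(nll, perplexity)
```

Evaluating this on a held-out set of names gives exactly the likelihood-based measure you describe: how probable the model finds unseen names drawn from the same distribution as the training set.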