Dinosaur Island Char Model

I have coded up the solution for the assignment, but when training the character-level model the test won't pass. I have checked my code over and over; it looks good and produces the expected results at each step, yet training produces the wrong dinosaur name. Can anyone assist?

Hi @Matt_Gerhold

Passing the unit tests is a good sign, but those tests are not exhaustive. There could be a problem in the model-training part of the code. If you can post the wrong names produced by your training, it would help to diagnose what the problem might be.

Iteration: 22000, Loss: 19.746477


AssertionError Traceback (most recent call last)
      1 parameters, last_name = model(data.split("\n"), ix_to_char, char_to_ix, 22001, verbose = True)
----> 3 assert last_name == 'Trodonosaurus\n', "Wrong expected output"
      4 print("\033[92mAll tests passed!")

AssertionError: Wrong expected output

Thanks Kic. I've attached the notebook in a private message so you can see my training routine. I've double-checked but I can't see why it would produce the wrong dino name.

Hi @Matt_Gerhold ,

The problem is in creating the list of input characters: you need to prepend the list [None] in front of the list of input characters. Here is the instruction:

Create the list of input characters: X
  • rnn_forward uses the None value as a flag to set the input vector as a zero-vector.
  • Prepend the list [None] in front of the list of input characters.
  • There is more than one way to prepend a value to a list. One way is to add two lists together: ['a'] + ['b']
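The steps in that instruction can be sketched as follows. The `char_to_ix` mapping and the example name here are illustrative stand-ins for the assignment's actual variables:

```python
# Sketch of building X and Y for one training example.
# char_to_ix maps '\n' -> 0 and 'a'..'z' -> 1..26, like the assignment.
char_to_ix = {ch: i for i, ch in enumerate("\nabcdefghijklmnopqrstuvwxyz")}

single_example = "trex"
single_example_ix = [char_to_ix[c] for c in single_example]

# Prepend [None]: rnn_forward treats None as "use a zero input vector".
X = [None] + single_example_ix
# Y is X shifted left by one, with the newline index appended as the stop token.
Y = X[1:] + [char_to_ix["\n"]]

print(X)  # [None, 20, 18, 5, 24]
print(Y)  # [20, 18, 5, 24, 0]
```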

Thanks so much for the advice. :+1:

I ran into the same issue. What is the reason we need to prepend the [None] to the input? I thought that was just for the initial activations, since we of course don't have activations for the first RNN cell. It also seems weird that the loss of 19.7 (which you get without the [None] part) is less than the expected output's loss of 22.7. Shouldn't the expected loss be better than the loss the code gets without appending [None]? It seems like it learns just fine without that?

To clarify, I had my algorithm set to be one letter ahead of x, but just skipping the first letter of x. For example:

X = 'BigBoy'
Y = 'igBoy' + '\n'

as opposed to what the assignment wants:

X = None + 'BigBoy'
Y = 'BigBoy' + '\n'

I suppose prepending Y with [None] and giving it all of X (plus newline) might give it more to train on about X. But it seems like we're kind of giving the model a wrong example, i.e. we're telling it "when X is all zeros, you should predict basically a random first letter of a dinosaur name." Which I guess makes sense if we give it nothing to start with. Whereas, without that first letter of X, we're giving it real examples that happen in the data, i.e. "when you see this letter after all these other letters, you should predict this letter."

I suppose we're teaching it realistic/common first letters of actual dinosaur names, so maybe that's the answer to my question then. It could learn that "x" is a pretty uncommon first letter, for instance. And maybe it's just kind of a coincidence that the loss got lower for this particular dataset/problem.

Hi @John_Schottler ,

The implementation note explained the reason for prepending None to the input vector:

Create the list of input characters: X
  • rnn_forward uses the None value as a flag to set the input vector as a zero-vector.

I couldn’t comment why your loss is less than the expected loss without knowing what is going on.

Hello @John_Schottler,

There are a few points that I would like us to consider:

  1. Although it is premature to say, because I have not done any serious experiment, it sounds reasonable that the loss for without-None is lower: None is the first character and there is no pretext, so essentially we are asking the model to make a wild guess at the first letter, and it can't do better than guessing what number you would get by rolling a 26-sided die.

  2. I took a quick look at the lessons again and realized that evaluation methods for generative models don't seem to be covered (I may have overlooked something), which means we have to resort to Googling. The point I want to make is: consider with-None and without-None as two different ways of modeling the problem; it might not be appropriate to compare their performance using their training losses. We probably want to establish a proper evaluation metric before making that comparison. So, based on what I have seen so far in this thread, I don't think either with-None or without-None is clearly doing a better job.

  3. I don't mean to ignore your point. I believe your point is that the without-None approach is sufficient and reasonable, and you questioned why we want that None. @Kic showed us the assignment's stated reason, though it does not explain why None makes anything superior. At the very least, though, we can say that the with-None approach lets us avoid deciding what the first character is when generating a new dinosaur name, and that is something your without-None approach can't do.

  4. The existence of None has TWO effects. The first has been covered by our conversation: it denotes the beginning of the name. The second, which might have been missed, is that since some RNNs (like the LSTM) can "remember" some pretext, it allows the name-generation process to realize: "Oh, I still remember None; we are perhaps generating the first few characters of a name, and maybe there is some pattern to follow because of that pretext." Therefore, the role of None may not be limited to being an input at the current timestep; it can also influence future timesteps as pretext.

Therefore, if I claim that the names generated by my model are 100% from the model, including the first letters, then my model is a with-None model. However, if I have proof that getting rid of None improves my evaluation metric (which is definitely not training loss), then I should probably go for a without-None model.
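To illustrate the "100% from the model" point: in a with-None setup, sampling can start from a zero input vector, so the model itself chooses the first letter. Below is a minimal sampling sketch; the parameter names (Wax, Waa, Wya, b, by) follow the assignment's convention, but the random weights are placeholders, not the assignment's trained model:

```python
import numpy as np

# Minimal character-sampling sketch for a vanilla RNN with random weights.
np.random.seed(0)
vocab_size, n_a = 27, 50
Wax = np.random.randn(n_a, vocab_size) * 0.01
Waa = np.random.randn(n_a, n_a) * 0.01
Wya = np.random.randn(vocab_size, n_a) * 0.01
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

# With-None modeling: the first input is a zero vector, so the model
# picks the first character instead of us supplying one.
x = np.zeros((vocab_size, 1))
a_prev = np.zeros((n_a, 1))

indices = []
newline_ix = 0                                 # '\n' maps to index 0
for _ in range(50):                            # cap the name length
    a = np.tanh(Wax @ x + Waa @ a_prev + b)
    z = Wya @ a + by
    y = np.exp(z) / np.sum(np.exp(z))          # softmax over characters
    idx = np.random.choice(vocab_size, p=y.ravel())
    indices.append(idx)
    if idx == newline_ix:                      # newline ends the name
        break
    x = np.zeros((vocab_size, 1))              # one-hot the sampled character
    x[idx] = 1
    a_prev = a

print(indices)
```

With untrained weights this just emits near-uniform random characters; after training, the same loop produces plausible names, first letter included.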


Yeah, my last comment was basically me figuring out your main points here, but I appreciate the extra details. It makes sense now why the None is passed in: so the model can learn how to start a dinosaur name properly, rather than guessing completely at random. Thanks for taking a look.

Ah, the loss was reducing quicker because it’s a simpler problem to fit, so that makes sense. Each X example we train our RNN cell on has fewer data points to train on.

This probably also resulted in the algorithm learning the way that dinosaur names end relatively quickly, so the results looked decent, but training the correct algorithm for longer would likely produce better results. Whereas continuing to train this one would likely result in good endings, but poor beginnings of dinosaur names.

I wonder if the instructions should explain that when telling us to prepend the [None]? And, in fact, would it even make sense to just explicitly pass np.zeros((vocab_size, 1))? After all, that is what the underlying utils.py rnn_forward function ultimately does in the if block where it checks whether to set a 1 in the one-hot encoding or not.

That way it's similar to how we initialize the activations and the memory cell. I think it makes intuitive sense mathematically: with the activations and x set to zeros, they're ignored, so it's up to the activation biases and the y prediction's weights and biases to learn how to start off when building a new dinosaur name.
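The None-to-zero-vector convention being discussed can be sketched like this. Note this is a guess at the shape of the logic inside rnn_forward, not the actual utils.py source:

```python
import numpy as np

# Sketch of the None-as-zero-vector convention rnn_forward relies on
# (illustrative helper, not the actual utils.py code).
def one_hot(index, vocab_size):
    x = np.zeros((vocab_size, 1))
    if index is not None:      # None stays a zero vector, exactly like
        x[index] = 1           # passing np.zeros((vocab_size, 1)) directly
    return x

print(one_hot(None, 5).ravel())  # [0. 0. 0. 0. 0.]
print(one_hot(2, 5).ravel())     # [0. 0. 1. 0. 0.]
```

So passing None and passing an explicit zero vector are equivalent in effect; the None flag just keeps the X list uniform (a list of indices).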

Hi, I also have a problem with the assertion in the model-training exercise. I tried to find out what the problem is, but I am somehow stuck.

All the earlier exercises (the functions I use in the model training) pass their tests without problems. The model also trains without problems, but the last dinosaur name does not match the expected output …

Can you please give me some hints on typical errors? I did not change any of the random seeds etc.

Thank you so much!



Sorry, I don’t understand what this means.

Can you post a screen capture that shows the error or assert message?

Yes, as Tom says, seeing the actual output would help. If all your code passes the previous tests, one common mistake is discussed on this recent thread. See if your output matches what is shown there.

Hi paulinpaloalto and TMosh,

thank you for your feedback. That actually solved it!! Thank you :slight_smile:

For completeness and others who may encounter the problem:


Iteration: 22000, Loss: 20.578871


AssertionError Traceback (most recent call last)
      1 parameters, last_name = model(data.split("\n"), ix_to_char, char_to_ix, 22001, verbose = True)
----> 3 assert last_name == 'Trodonosaurus\n', "Wrong expected output"
      4 print("\033[92mAll tests passed!")

AssertionError: Wrong expected output

Expected output

Iteration: 22000, Loss: 22.728886


The cause was probably using the non-shuffled input data:

single_example = data_x[idx]

Change it to use the shuffled input data:

single_example = examples[idx]
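In context, the fix looks roughly like the sketch below. The variable names (data_x, examples, idx) follow the thread; the name list and shuffle call are illustrative stand-ins for the assignment's actual setup:

```python
import random

# Stand-in for the real list of dinosaur names.
data_x = ["trex", "raptor", "anky"]

random.seed(0)
examples = list(data_x)
random.shuffle(examples)          # `examples` is the shuffled copy

# Inside the training loop: index into the SHUFFLED list,
# not the original data_x, or the test's expected name won't match.
idx = 0 % len(examples)
single_example = examples[idx]
```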

For more info see: Here