Week 1, Programming assignments (#2, #3)

I have a question about the Week 1 assignments for Sequence Models. I understand that for the generation tasks, we need to feed the output from the previous time step t as the input to the next one, so to create a new word or character we use the ones generated in the previous step. However, I am not sure how to implement that in the code, particularly in the model function that generates dinosaur names: I am not confident which slices of X we use for Y when optimizing the model. Do we use all of X created so far and treat that as the predictions? Please help me understand the code for this section.

Thank you,

I recommend you read the instructions in the notebook very carefully.

The instructions say: "Set the list of labels (integer representation of the characters): Y"

  • The goal is to train the RNN to predict the next letter in the name, so the labels are the list of characters that are one time-step ahead of the characters in the input X.
    • For example, Y[0] contains the same value as X[1]
  • The RNN should predict a newline at the last letter, so add ix_newline to the end of the labels.
    • Append the integer representation of the newline character to the end of Y.
    • Note that append is an in-place operation.
    • It might be easier for you to add two lists together.
So, in the function “model” in the Dinosaur Island assignment, the model iterates over the names in the dataset, converts each name into a list of character indices, and uses that single example as the training data for that optimization step. Basically, it learns from each name in turn and updates its parameters based on that name’s sequence of characters. X is the list of character indices and Y should be the same list one step ahead (see the sketch below). When I put it in words, I understand it, but the whole process still does not click for me. I still have some difficulty understanding the character-level optimization; maybe it is because it is stochastic gradient descent.
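
Roughly, the X/Y setup for one name looks something like this. This is only a minimal sketch, assuming `examples` is the notebook’s list of lowercase names, `idx` is the current example index, and `char_to_ix` maps each character (including `"\n"`) to an integer index:

```python
# Minimal sketch of building one training example (X, Y) for the character-level model.
name = examples[idx]

# X starts with None, which the forward pass treats as the zero input vector x<1>.
X = [None] + [char_to_ix[ch] for ch in name]

# Y is X shifted one step ahead: the label at time t is the character at time t+1,
# with the newline index appended so the model learns when to end the name.
ix_newline = char_to_ix["\n"]
Y = X[1:] + [ix_newline]

# Example: for "abc", X = [None, a, b, c] and Y = [a, b, c, "\n"] (all as indices).
```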

Yes, the instructions here are very detailed and it sounds like you have understood them correctly.

I’m not sure I see why the size of the batch for GD affects the intuition here. It’s just a question of whether you are averaging the gradients over multiple samples or not. The learning should end up being statistically the same, although the exact path you take along the solution surface to get there may vary a bit.
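
To make that concrete, here is a rough sketch of the two update styles. The names `compute_gradients` and `params` are hypothetical and only for illustration; they are not the notebook’s functions:

```python
import numpy as np

def sgd_step(params, example, compute_gradients, lr=0.01):
    """Stochastic GD: update from the gradient of a single example."""
    grads = compute_gradients(params, [example])
    return {k: params[k] - lr * grads[k] for k in params}

def minibatch_step(params, batch, compute_gradients, lr=0.01):
    """Mini-batch GD: average the per-example gradients, then update once."""
    grads_list = [compute_gradients(params, [ex]) for ex in batch]
    avg_grads = {k: np.mean([g[k] for g in grads_list], axis=0) for k in params}
    return {k: params[k] - lr * avg_grads[k] for k in params}
```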

Maybe the other key thing to keep in mind here is the difference between training and inference mode. In inference mode, they also introduce randomness at the level of the individual characters just to keep things more interesting, meaning that the model doesn’t always generate the same name every time.
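
For example, at each sampling step the next character index can be drawn from the softmax distribution rather than taken as the argmax. A small sketch of that idea (with `y` standing in for the softmax output at one time step and `vocab_size` for the number of characters):

```python
import numpy as np

vocab_size = 27                     # 26 letters + newline, as in the assignment
y = np.random.rand(vocab_size, 1)   # placeholder for the softmax output at one step
y = y / y.sum()                     # normalize so it is a valid probability distribution

# Sampling: pick the next character index at random according to y,
# so repeated runs can produce different names.
idx_sampled = np.random.choice(np.arange(vocab_size), p=y.ravel())

# Greedy alternative: always pick the most likely character,
# which would generate the same name every time.
idx_greedy = int(np.argmax(y))
```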

My apologies if I have just missed the full subtlety of what you are saying here. :nerd_face:

It is resolved. Thank you. :slight_smile:
