A question about the Week 1 Assignment

Dear Mentor,

In the Week 1 Assignment, “Dinosaurus Island - Character level language model”:

  • Section 3.2 Training the model
    – Exercise 4 - model

The loss is initialized using this formula:

import numpy as np

def get_initial_loss(vocab_size, dino_names):
    # -np.log(1.0 / vocab_size) is the cross-entropy of predicting one
    # character from a uniform distribution over the vocabulary
    return -np.log(1.0 / vocab_size) * dino_names

loss = get_initial_loss(vocab_size, dino_names)

Could you please explain the reasoning behind this formula?

Thank you.


That’s a good question (and one I have not seen on the forum before). Offhand, I don’t know where that formula comes from.

I’ll do a little research and see if I can find out.

FYI, if you’re going to post code examples on the forum, please remember to enclose them in the “preformatted text” tag. That preserves the indentation and makes the code more readable.

Just a reminder for clarity - you should not post your code for the assignments, but posting examples (or error messages) is fine.

Yes, it is an interesting point that they don’t really say much about in the assignment. I guess there is already so much new material going on that they don’t make a point of it. But it is paired with this function, which we can see them use to “smooth” the loss:

def smooth(loss, cur_loss):
    # Exponentially weighted average: 99.9% previous value, 0.1% current loss
    return loss * 0.999 + cur_loss * 0.001

So what that says is that they are computing an EWA (exponentially weighted average) of the loss. They just make the brief comment that this smoothing helps convergence and don’t explain any further, so I guess we just have to take their word for it. Well, this is an experimental science, so you could initialize the loss to 0 and then remove the smooth call and see what happens. And to complete the experiment, also try smoothing with the 0 initialization, and see which of the three strategies gives you the best convergence.
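Just to make that experiment concrete, here is a minimal sketch comparing the three tracking strategies. The stream of cur_loss values is entirely made up (a noisy curve decaying toward 10), purely to drive the loop; in the real experiment you would of course use the losses produced by your training iterations:

import numpy as np

def get_initial_loss(vocab_size, dino_names):
    return -np.log(1.0 / vocab_size) * dino_names

def smooth(loss, cur_loss):
    return loss * 0.999 + cur_loss * 0.001

# Made-up per-iteration losses: a noisy curve decaying toward 10
rng = np.random.default_rng(0)
cur_losses = 10 + 13 * np.exp(-np.arange(2000) / 500) + rng.normal(0, 0.5, 2000)

loss1 = get_initial_loss(27, 7)   # strategy 1: smoothing, initialized by the formula
loss2 = 0.0                       # strategy 2: smoothing, initialized at 0
for cur_loss in cur_losses:
    loss1 = smooth(loss1, cur_loss)
    loss2 = smooth(loss2, cur_loss)

print("smoothed, formula init:", loss1)
print("smoothed, zero init:   ", loss2)
print("raw (no smoothing):    ", cur_losses[-1])   # strategy 3: just the current loss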

You can see the values of vocab_size and dino_names among the default arguments in the declaration of model, so it would be easy to compute the initial loss by hand, but I just added a print statement and ran that cell:

initial loss 23.070858062030304
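For what it’s worth, that matches the formula exactly if you plug in the defaults from the declaration of model, vocab_size = 27 (the 26 letters plus the newline character) and dino_names = 7:

import numpy as np

# Defaults from the declaration of model: vocab_size = 27, dino_names = 7
print(-np.log(1.0 / 27) * 7)   # ~ 23.0709, the value printed above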

Remember that when Prof Ng explained EWAs back in DLS Course 2, he mentioned that you could start the sequence at 0 and just let it gradually stabilize, do bias correction to compensate for the startup period, or initialize to a sensible value, but I don’t have anything in my notes about typical initialization strategies. Notice that the weight on the current loss is pretty small (0.001, which is β = 0.999 in Prof Ng’s notation, i.e. an average over roughly the last 1000 values), so a 0 start with no correction would take quite a while to stabilize to a realistic value. How they came up with that specific formula I don’t know. We’d have to go back and review the lectures on EWAs in DLS Course 2.
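For reference, the bias correction from those lectures divides the running average by (1 − β^t), so that a 0-initialized EWA is unbiased from the first step. Here is a minimal sketch, assuming β = 0.999 to match the smooth function above and a made-up constant loss purely for illustration:

import numpy as np

beta = 0.999                       # weight on the previous average
v = 0.0                            # EWA started at 0, no initialization
cur_losses = np.full(100, 23.07)   # pretend the true loss is constant

for t, cur_loss in enumerate(cur_losses, start=1):
    v = beta * v + (1 - beta) * cur_loss
    v_corrected = v / (1 - beta ** t)   # compensates for the 0 start

print("raw EWA after 100 steps:       ", v)            # still far below 23.07
print("bias-corrected after 100 steps:", v_corrected)  # ~ 23.07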
