MATLAB code for hidden layer assignment DLS1 week3

Hi guys,

In order to understand this assignment better, I decided to recreate the same process in MATLAB. However, I am getting quite different results, and it is getting difficult to follow.

One of the salient differences between MATLAB and Python is that indexing in MATLAB is 1-based while in Python it's 0-based, right? Of course, in MATLAB you have the beautiful property that vectors and matrices are a natural part of the language, so you don't have to deal with NumPy as a separate library.
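For anyone porting the notebook code, here is a minimal sketch of the indexing difference (NumPy side, with the MATLAB equivalents in the comments; the array values are just for illustration):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Python/NumPy is 0-based: the first column is X[:, 0]
first_col = X[:, 0]    # MATLAB (1-based): X(:, 1)

# Negative indices count from the end, which MATLAB writes with the `end` keyword
last_col = X[:, -1]    # MATLAB: X(:, end)
```

Off-by-one bugs in a port between the two usually come from exactly this difference.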


I took a quick look at your code. There must be a better way to handle the parameters than passing them around as lists. I used to know MATLAB pretty well, but haven't written any in a few years. Doesn't it have a keyed data structure like Python dictionaries, e.g. structs or containers.Map? You also shouldn't need to pass m around, since you can deduce it from the shapes of the X, A, or Z values, right?
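For comparison, this is roughly how the course notebook organizes things on the Python side (the layer sizes here are just illustrative, not from the original post): the parameters live in one dictionary keyed by name, and m falls straight out of the shape of X:

```python
import numpy as np

n_x, n_h, n_y = 2, 4, 1   # illustrative layer sizes

# One dictionary holds all parameters, keyed by name
parameters = {
    "W1": np.zeros((n_h, n_x)),
    "b1": np.zeros((n_h, 1)),
    "W2": np.zeros((n_y, n_h)),
    "b2": np.zeros((n_y, 1)),
}

X = np.zeros((n_x, 5))    # 5 training examples, stacked as columns
m = X.shape[1]            # number of examples: no need to pass m separately
```

A MATLAB struct (`parameters.W1 = ...`) plays the same role as the dictionary, and `size(X, 2)` plays the same role as `X.shape[1]`.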

The one thing I can see that is clearly different is your initialization function:

You multiply by 0.001 instead of 0.01. But the other issue is that you are using rand, which samples the uniform distribution on [0, 1], right? In the course notebook, we use the normal distribution with μ = 0 and σ = 1 (randn in MATLAB). Big difference! Initialization matters …
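To illustrate the difference (NumPy here; MATLAB's rand/randn behave analogously, and the 4×2 shape is just an example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform on [0, 1): every entry is non-negative, centered around 0.5
W_uniform = rng.random((4, 2)) * 0.01

# Standard normal (mu = 0, sigma = 1), then scaled: what the notebook does
W_normal = rng.standard_normal((4, 2)) * 0.01
```

The uniform version can never produce a negative weight, so the hidden units all start biased in the same direction, which is why the choice of distribution matters here.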

Also you have a “copy/paste” error in the comments in your tanh function. :nerd_face: :laughing:


Thanks for that! I guess you are right about the initialization of the parameters. Let me make some adjustments. I was playing with the 0.01 value because, although the first value of the cost function matches the exercise, everything turns bad later on.

Yes, the rand function samples a uniform distribution. But I am getting this crazy iteration curve.
I have edited this reply with a more suitable learning rate value of 0.05. I will try to run the code with other data; perhaps the way I import it is incorrect.
[image: Learning rate 0.05]

Hmmm, there can be some oscillation with that high a learning rate, but that does look kind of crazy. The first question is whether your back prop logic is correct. My take is that you have not implemented the derivative of tanh correctly. Have a look and compare that code to what you wrote in the notebook.
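For reference, the derivative back prop needs is g'(z) = 1 − tanh(z)², which is easy to sanity-check numerically; a minimal sketch:

```python
import numpy as np

def tanh_derivative(z):
    # d/dz tanh(z) = 1 - tanh(z)**2
    return 1.0 - np.tanh(z) ** 2

# Sanity-check against a central finite-difference approximation
z = 0.5
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
assert abs(tanh_derivative(z) - numeric) < 1e-8
```

A quick check like this against your own MATLAB implementation would catch the kind of mistake suspected here.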


Yeah, that could be the reason…

Yep sir! A little mistake in the derivative of tanh!
[image: Learning rate 0.05_V2]

Very cool! That’s a nice looking convergence curve!

Just out of curiosity, is that curve with the learning rate of 1.2 or with a lower rate?

That one is with the lower rate of 0.05. With 1.2, the convergence is quite steep. I am happy with the results. Thank you so much for taking the time to check my code!