W1 | copy.deepcopy for w_in, not b_in | Question about the optional lab on gradient descent

Hi there! My question is about the following line in the function gradient_descent.

I’d like to know why a deep copy is made for the initial value of parameter “w”:

w = copy.deepcopy(w_in) # avoid modifying global w_in

I understand the concept of a deep copy, but why is one made for w_in and not for b_in?
Besides, the initial value w_in is not needed later on.

Thanks in advance!
Cheers from Sao Paulo, Brazil.

Regarding the difference in treatment between the weights and the bias: as a general practice, the weights are initialized to random numbers while the bias is initialized to 0.

This is required for the weights to break their symmetry and let the learning algorithm take each weight along a different path, thereby facilitating the learning of different new features. This is a little out of scope for where we are right now in Course 1.

This technique does not carry any importance for a Linear Regression model. However, it becomes crucial for more advanced models such as Neural Networks, where the model starts out with the input features and then learns or creates new features in subsequent layers. There it is extremely important to start the learning algorithm with weights that have different initial values, so that their final values are also different by the time the learning algorithm converges or stops.
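
As a rough sketch of what that initialization looks like in code (the layer sizes and the 0.01 scale are illustrative assumptions, not values from the lab):

```python
import numpy as np

n_features, n_units = 4, 3   # illustrative sizes, not from the lab

# Weights: small random values so each unit starts on a different path
w = np.random.randn(n_features, n_units) * 0.01

# Bias: zeros are fine, since the random weights already break the symmetry
b = np.zeros(n_units)
```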

In the Course 1 Optional Lab, we can look at it more like a trial run, where the intent was to get us accustomed to this best practice right from the outset of implementing a learning algorithm. In Course 2, where we will get to deal with Neural Networks, it will have a real and significant impact.

Hello @oliveiraph17 , from programming perspective,

When Python passes a variable to a function, it does not create a new copy of that variable's data; the function receives a reference to the same object. So mutating the data inside the function changes it everywhere, because there is only ever one copy of that data. This can be bad. To avoid it, we make a new copy and modify only the new copy inside the function, so it will not affect the outside world. A nice stackoverflow discussion here.
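
Here is a minimal sketch of that behaviour, assuming w_in is a NumPy array (the function names are made up for illustration):

```python
import copy
import numpy as np

def bad_update(w_in):
    w = w_in                 # w and w_in refer to the same array
    w[0] = 999.0             # this mutates the caller's data too
    return w

def safe_update(w_in):
    w = copy.deepcopy(w_in)  # w is an independent copy
    w[0] = 999.0             # the caller's array is untouched
    return w

w_in = np.array([1.0, 2.0])
bad_update(w_in)
print(w_in)                  # [999.   2.] -- the original was modified

w_in = np.array([1.0, 2.0])
safe_update(w_in)
print(w_in)                  # [1. 2.] -- the original is preserved
```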

I think the best practice is to do the same for both w_in and b_in whenever we want to modify them inside the function without affecting the rest of the world.

It is OK that the variable w_in is not used later on, because its content has been copied to w. Remember, we make the copy precisely because we do not want to change anything in w_in.

However, the real problem shows up if you look at the highlighted line:

[Screenshot of the lab's gradient_descent code, with the line w = w_in highlighted]

w is later reassigned to w_in, which means the effect of making a new copy is gone! We are still modifying w_in, because now both w and w_in point to the same data.
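
Put as a runnable sketch (a reconstruction of the pattern in the screenshot, not the exact lab code):

```python
import copy
import numpy as np

def gradient_descent_sketch(w_in):
    w = copy.deepcopy(w_in)  # w starts as an independent copy of w_in
    w = w_in                 # ...but this rebinds w to the original object,
                             # throwing the copy away
    w[0] = -1.0              # so we are mutating the caller's array again
    return w

w_in = np.array([1.0, 2.0])
gradient_descent_sketch(w_in)
print(w_in)                  # [-1.  2.] -- w_in changed despite the deepcopy
```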

Having said that, is this a problem in this particular exercise?

No, because our code in the outside world does not need to preserve the w_in value. Also, we only need copy.deepcopy when the variable is, for example, a NumPy array, a pandas DataFrame, or a similarly complex object. In our case, w_in is merely an int, which we do not need to worry about.
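
To see why a plain number is safe, note that rebinding an int inside a function can never touch the caller's variable (illustrative names):

```python
def update_scalar(w_in):
    w = w_in        # no copy needed: ints are immutable
    w = w - 0.1     # this creates a brand-new number and rebinds w;
                    # the caller's w_in cannot be affected
    return w

w_in = 5
w_out = update_scalar(w_in)
print(w_in, w_out)  # 5 4.9 -- the original int is untouched
```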

So on every occasion where we have “w = copy.deepcopy(w_in)”, the later “w = w_in” is just redundant?

Hello @poco

I have explained why we may or may not need deepcopy, which means it depends.

Using deepcopy all the time seems fine, but it also costs additional memory, which can be a problem when working with a large amount of data.

Sometimes you might want to design your program so that all functions update the same weights object, in order to keep memory use at the same level.
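
For instance, with NumPy the weights can be updated in place, so every function keeps working on the same buffer; this sketch assumes we deliberately want the caller's array to change:

```python
import numpy as np

def gradient_step_inplace(w, grad, alpha=0.01):
    # In-place update: no new array is allocated, so memory use stays flat,
    # but the caller's array is deliberately modified
    w -= alpha * grad
    return w

w = np.array([1.0, 2.0])
gradient_step_inplace(w, np.array([10.0, 10.0]))
print(w)  # [0.9 1.9] -- the same array was updated in place
```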

Therefore, the answer really depends. There can be a reason for an assignment notebook to use deepcopy, and there can also be a reason for your own model notebook not to use it at all.

Cheers,
Raymond
