Which layer matters for .get_weights(...) in a neural network?

Hello -

If we use the following code for a neural network:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model2 = Sequential(
    [Dense(6, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.1), name='layer_1'),
     Dense(6, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.1), name='layer_2'),
     Dense(1, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(0.1), name='layer_3')],
    name='out_layer_is_sigmoid')

model2.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(0.001))

Then, I can use the following code to get the weights for each layer (and the biases):

layer1_final_weights = model2.get_layer("layer_1").get_weights()
layer2_final_weights = model2.get_layer("layer_2").get_weights()
layer3_final_weights = model2.get_layer("layer_3").get_weights()

I get three different sets of weights. I assume that only the weights from the last layer (‘layer_3’ in this case) matter.

Questions:
1.) Which layer’s weights are the ones that “matter”? For example, if I wanted to use the weights from the model above with .set_weights(…), which layer’s weights would I use?
2.) If we only use one layer’s weights for set_weights() (assuming that is true), then what can be learned by looking at the weights from the other layers? Is there something insightful/useful that we should look for?

Thanks.

Hi @naveadjensen ,

You get different weights because each layer has to have its own weights.
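Each Dense layer stores its own kernel and bias, sized by that layer’s inputs and units, which is why the three calls return different things. A minimal sketch of the shapes, assuming the model is built with 2 input features (a hypothetical choice, just for illustration):

# Minimal sketch: each layer's get_weights() returns a [kernel, bias] pair.
# The 2 input features below are assumed only for illustration.
model2.build(input_shape=(None, 2))
for name in ("layer_1", "layer_2", "layer_3"):
    kernel, bias = model2.get_layer(name).get_weights()
    print(name, kernel.shape, bias.shape)
# layer_1 (2, 6) (6,)
# layer_2 (6, 6) (6,)
# layer_3 (6, 1) (1,)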

Actually, the weights in all layers matter, not only those in the last layer. I am curious: why do you think only the weights in layer_3 matter?

You would not do that ‘by hand’. Setting, or better, updating the weights is work done by backpropagation. Maybe the only moment when you would set the weights yourself is at initialization, and even then you would use a random or other initialization technique.
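For instance, Keras lets you choose that initialization technique per layer through kernel_initializer, rather than setting weights by hand (a minimal sketch; HeNormal is just one common choice for relu layers):

# Minimal sketch: pick an initializer instead of calling set_weights() yourself.
layer = Dense(6, activation='relu',
              kernel_initializer=tf.keras.initializers.HeNormal())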

In general, you would not be looking inside the weights. They are just ‘a bunch of numbers’ being updated by the model during training. Well, when I say just a bunch I am not being fair, because this ‘bunch of numbers’ actually is the learning; it is like the secret ingredient that makes the model work once it is trained.


Thanks @Juan_Olano for the response. Your answers have cleared things up for me and forced me to think more deeply about what I was asking, which brought up a few new thoughts/questions.

I was naively assuming that the weights would somehow be useful to us in our analysis, similar to an old-fashioned linear regression where the weights can be analyzed and tell us something about the associated input (a feature, coffee roasting time for example). That is why I thought that only the weights from the last layer “mattered”, in the sense that they are the ones that tell us what the model ended up with. I was also thinking that the last-layer weights would have some meaning relative to the neuron they were representing, but then I realized that by layer 3 a neuron is no longer really representing the input feature it started with and is instead something else (that I don’t quite understand). Here are a few more questions I’m hoping you or someone could shed light on.

1.) Is there a reason to use set_weights (besides at initialization, as you mentioned)?
2.) Is there a reason to use get_weights? I understand that the weights carry the learning that occurs within the layers, but as a human looking at them, is there any information I can gain?
3.) Maybe I am focusing on the weights too much, and what I should be focusing on is the performance measures of the model, such as error and the ROC curve - any thoughts?

Thanks!

Hi @naveadjensen ,

I am glad that things are clearer now.

Regarding your new questions:

1 and 2: Is there a reason to use get_weights / set_weights? One scenario where I see the need for these functions is when you want to pass the weights from a layer of one model to a same-shape layer in another model, a sort of ‘quick transfer learning’ (see the sketch below). Another use of get_weights could be when you want to explore/research/analyze the evolution or state of the weights. I am thinking, for example, of convolutional neural networks, where some researchers have shown how the model picks up features from images and how this evolves through the model. I would guess they use some version of get_weights to see what the model is doing.
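A minimal sketch of that ‘quick transfer learning’ idea, assuming model_a and model_b are hypothetical models whose named layers have identical shapes:

# Minimal sketch: copy trained weights from one model's layer into a
# same-shape layer of another model (model_a and model_b are hypothetical).
donor_weights = model_a.get_layer("layer_1").get_weights()  # [kernel, bias]
model_b.get_layer("layer_1").set_weights(donor_weights)     # shapes must match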

@rmwkwok have you ever used get_weights / set_weights? If so, can you share a bit of your experience?

3: At this point in your learning process, I would recommend that you learn and understand very well forward propagation, backward propagation, how you can model an NN by defining its architecture (layers, units, types of layers), and also the types of models that Prof Ng presents in Course 2 and onward.

MLS has a good level of abstraction that allows you to get started in Machine Learning. Once you finish this specialization, you may want to consider going deeper into the ‘behind the scenes’ of ML in the Deep Learning Specialization (DLS).

Hello @Juan_Olano, @naveadjensen,

I think we want to use them when saving and reading a checkpoint, especially when the model is a combination of two or more models, we only have separate checkpoint files for each of them, and we want to reuse the current model object. Then, overwriting the weight values of the existing model object is a way out!
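A minimal sketch of that scenario, assuming encoder and decoder are hypothetical sub-models that make up the combined model, each with its own checkpoint file:

# Minimal sketch: restore each sub-model from its own checkpoint, reusing
# the existing model objects (model names and file paths are hypothetical).
encoder.load_weights("encoder_ckpt")
decoder.load_weights("decoder_ckpt")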

When we troubleshoot. Commonly known problems such as exploding or vanishing gradients (covered in the Deep Learning Specialization) can be hinted at by abnormal prediction results, but have to be verified by looking at the weight values.
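A minimal sketch of such a check, printing each layer’s weight magnitudes (what counts as ‘abnormal’ depends on the model, so no fixed threshold is implied):

import numpy as np

# Minimal sketch: scan a trained model's weights for values that have blown
# up (exploding gradients) or collapsed toward zero (vanishing gradients).
for layer in model2.layers:
    for w in layer.get_weights():
        print(layer.name, 'max |w| =', np.abs(w).max(), 'mean |w| =', np.abs(w).mean())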

When our training algorithm needs it. We perform a “soft update” in a Reinforcement Learning model (covered in MLS Course 3 Week 3), which requires us to take the weights out of one model and blend them into another model.
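A minimal sketch of a soft update, assuming q_network and target_network are hypothetical same-architecture models and TAU is a small blending rate:

TAU = 0.001  # hypothetical soft-update rate

# Minimal sketch: blend the Q-network's weights into the target network.
soft_weights = [TAU * q_w + (1.0 - TAU) * t_w
                for q_w, t_w in zip(q_network.get_weights(),
                                    target_network.get_weights())]
target_network.set_weights(soft_weights)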

When we inspect our model. Although this is more research-oriented, it is not limited to professional researchers, is it? Sometimes we need to do our own research to understand better. On Google, we can find papers describing how to analyze the amount of information learnt by an NN, and of course, those analyses were done on the weights.

Cheers,
Raymond
