Residual Nets Question

See the code below:

def __init__(self, filters, kernel_size):

    super(IdentityBlock, self).__init__(name='')

    self.conv1 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
    self.bn1 = tf.keras.layers.BatchNormalization()

    self.conv2 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
    self.bn2 = tf.keras.layers.BatchNormalization()

    self.act = tf.keras.layers.Activation('relu')
    self.add = tf.keras.layers.Add()

def call(self, input_tensor):
    x = self.conv1(input_tensor)
    x = self.bn1(x)
    x = self.act(x)

    x = self.conv2(x)
    x = self.bn2(x)

    x = self.add([x, input_tensor])
    x = self.act(x)
    return x

self.conv1/self.bn1 and self.conv2/self.bn2 have exactly the same definition, and in the call definition they get stacked one set after the other. My thinking is this: had the same conv-layer and norm-layer combo been stacked twice, it would have created some kind of loop in which the second pass resumes exactly from the departure point of the first one (the input to that layer would effectively be its own earlier latent state, instead of the layer meeting a fresh input with its state at norm initialisation). So the reason to have two different layers that do exactly the same thing is for each to have its own latent state / bias, rather than the same layer "training twice". Am I correct in thinking that?

Then, why is the exact same instantiation of the activation layer used twice? Why doesn't this layer create the loop that reusing the exact same conv/norm combo would have created, which is why those had to be declared twice? Thank you :)

Maybe I’m simply not understanding the question, but once you instantiate a TF/Keras “Layer” function, that just gives you a function. What actually happens in the compute graph is determined by what input you feed to the function and what you do with the output, right? So just because I invoke the same function twice in the same graph does not mean it’s a “loop”: the inputs and outputs are different each time. Of course the interesting thing about the Residual Net case is that the graph is not “simply connected”.

But given that, maybe the question is why it’s necessary to go to the bother of defining two functions which do the same thing. I think it’s just a coding clarity or “style” issue and maybe they want to reserve the right to alter the parameters in one of the instances so that it’s not the same function.
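To make that concrete, here is a minimal standalone sketch (the names here are made up for illustration, not from the assignment): one Dense layer instance is invoked twice in the same graph, with a different input tensor each time, so what you get is a straight chain of two applications of the same function, not a loop.

import tensorflow as tf

# One layer instance, used as a plain callable.
shared_dense = tf.keras.layers.Dense(8, activation='relu')

inp = tf.keras.Input(shape=(8,))
a = shared_dense(inp)   # first invocation: input is the model input
b = shared_dense(a)     # second invocation: input is the output of the first call

# The resulting graph is the chain inp -> a -> b; 'a' and 'b' are distinct tensors.
model = tf.keras.Model(inputs=inp, outputs=b)
model.summary()  # the single Dense instance appears once, even though it is invoked twice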


Right, yes, thank you! It got me a bit confused, but I'm back on the right track now.

I agree with Paul here, it's used for clarity and flexibility.

Yes, I’m with you (and thank you!)

So these two pieces of code would work exactly the same:

def call(self, input_tensor):
    
    x = self.conv1(input_tensor)
    x = self.bn1(x)
    x = self.act(x)

    x = self.conv2(x)
    x = self.bn2(x)
    x = self.act(x)

and

def call(self, input_tensor):
    x = self.conv1(input_tensor)
    x = self.bn1(x)
    x = self.act(x)

    x = self.conv1(x)
    x = self.bn1(x)
    x = self.act(x)

Sometimes pandas, for example, requires making an explicit copy of a DataFrame, because if you simply assign it to a different variable without declaring a copy, the object in memory is the same one. Anyhow, glad to hear there's none of that here… or is there? :)

That’s not just a pandas thing: it’s a generic python thing. You always have to be conscious of the semantics of passing objects to functions in python. They are passed by reference, not by value, so it’s up to the code in the called function to copy any object before doing an “in place” operation on it. Here’s a thread which gives a classic example of that issue. But here we are dealing with predefined TF/Keras “Layer” functions and those are written to return independent objects and not to modify their arguments.
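Here's a small standalone Python sketch of that distinction (the function names are made up for illustration, nothing TF-specific in it):

def scale_in_place(values, factor):
    # Mutates the caller's list: 'values' is a reference to the same object.
    for i in range(len(values)):
        values[i] *= factor
    return values

def scale_copy(values, factor):
    # Builds and returns a new list; the caller's list is untouched.
    return [v * factor for v in values]

data = [1, 2, 3]
scale_in_place(data, 10)
print(data)     # [10, 20, 30] -- the original object was modified

data = [1, 2, 3]
scaled = scale_copy(data, 10)
print(data)     # [1, 2, 3]   -- the original is unchanged
print(scaled)   # [10, 20, 30]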

Hello guys, I might be interpreting it wrongly, but it seems like a repeated call to the same layer does produce a loop; the plot produced was something like this:

The code to reproduce the plot is below; the plotting code follows the approach from the thread Plotting for Subclass Model.

class IdentityBlock(tf.keras.Model):
    def __init__(self, filters, kernel_size):
        super(IdentityBlock, self).__init__(name='')

        self.conv1 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn1 = tf.keras.layers.BatchNormalization()

        self.conv2 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn2 = tf.keras.layers.BatchNormalization()

        self.act = tf.keras.layers.Activation('relu')
        self.add = tf.keras.layers.Add()
    
    def call(self, input_tensor):
        x = self.conv1(input_tensor)
        x = self.bn1(x)
        x = self.act(x)

        x = self.conv2(x)
        x = self.bn2(x)

        x = self.add([x, input_tensor])
        x = self.act(x)
        return x
    
    def build_graph(self):
        x = tf.keras.Input(shape=(100,100,64))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

id1a = IdentityBlock(64, 3)

id1a.build_graph().summary()
tf.keras.utils.plot_model(id1a.build_graph())

For a more precise proof, I created dummy models with and without the repeated call, and the results show that the one with the repeated call has 20 fewer trainable parameters than the other.

The code to reproduce this is below.
Dummy Model With Repeated Call:

class DummyModelWithRepeatedCall(tf.keras.Model):
    def __init__(self):
        super(DummyModelWithRepeatedCall, self).__init__()
        self.dense1 = tf.keras.layers.Dense(10, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10, activation='relu')

        self.bn1 = tf.keras.layers.BatchNormalization()

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.bn1(x)
        x = self.dense2(x)
        x = self.bn1(x)

        return x
    
    def build_graph(self):
        x = tf.keras.Input(shape=(10,))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

model = DummyModelWithRepeatedCall()

model.build_graph().summary()

Dummy Model Without Repeated Call:

class DummyModelWithoutRepeatedCall(tf.keras.Model):
    def __init__(self):
        super(DummyModelWithoutRepeatedCall, self).__init__()
        self.dense1 = tf.keras.layers.Dense(10, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10, activation='relu')

        self.bn1 = tf.keras.layers.BatchNormalization()
        self.bn2 = tf.keras.layers.BatchNormalization()

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.bn1(x)
        x = self.dense2(x)
        x = self.bn2(x)

        return x
    
    def build_graph(self):
        x = tf.keras.Input(shape=(10,))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

Plotting

modelwithout = DummyModelWithoutRepeatedCall()
modelwith = DummyModelWithRepeatedCall()

modelwithout.build_graph().summary()
modelwith.build_graph().summary()
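For what it's worth, here is a rough accounting of where that difference comes from (assuming, as in the code above, a 10-dimensional input, Dense(10) layers, and BatchNormalization over 10 features):

# Dense(10) on a 10-dim input: 10*10 weights + 10 biases
dense_params = 10 * 10 + 10                 # 110 per Dense layer

# BatchNormalization over 10 features: gamma + beta are trainable,
# moving mean + moving variance are not.
bn_trainable = 2 * 10                       # 20 per BatchNormalization layer
bn_non_trainable = 2 * 10                   # 20 per BatchNormalization layer

# Without the repeated call: two Dense layers and two BatchNormalization layers.
trainable_without = 2 * dense_params + 2 * bn_trainable   # 260

# With the repeated call: two Dense layers but only one (reused) BatchNormalization layer.
trainable_with = 2 * dense_params + 1 * bn_trainable      # 240

print(trainable_without - trainable_with)   # 20 -- the difference in trainable parameters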

Please let me know if there is any mistake. Thank you