Inheriting from Layer vs Model for recurring building blocks

In the lecture, Laurence talks about defining custom Model classes for recurring building blocks / sub-networks in a larger model architecture. But in the ResNet-like example, he uses Layer as the base class for CNNResidual and DNNResidual. At first sight, I found a Layer itself containing potentially many Layers a little counter-intuitive, but admittedly it seems arbitrary to worry about sub-Layers within Layers while accepting the concept of sub-Models within Models…

To the point: what are good reasons, or rules of thumb, for choosing one base class (Layer) over the other (Model) for such building blocks?
I would say it depends on how you think you might (re)use them. In the class hierarchy, a Model is-a Layer, a Layer is-a Module, and thus a Model is a Module too. However, a Layer is NOT a Model: a Layer has no compile() or fit() API, for example. In this sense Model extends Layer and adds behavior. Model also extends the state of Layer - it has an attribute that is a collection of Layers. Layer, by default, does not, though you can add one to a Layer you define yourself.
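You can verify the hierarchy directly:

```python
import tensorflow as tf

print(issubclass(tf.keras.Model, tf.keras.layers.Layer))  # True:  a Model is-a Layer
print(issubclass(tf.keras.layers.Layer, tf.Module))       # True:  a Layer is-a Module
print(issubclass(tf.keras.layers.Layer, tf.keras.Model))  # False: a Layer is NOT a Model
```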
The more likely you are to use the object standalone as a container of multiple Layers, and do training on it as a single unit, the more you should reuse what Model brings along. The more likely you are to use the object as a stackable component in a larger context, especially if you will use more than one instance in that context, and train it only as part of a larger whole, the more you would prefer Layer.
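For example, a residual block written as a Layer subclass (a minimal sketch in the spirit of the lecture's CNNResidual, not the exact lab code) is exactly such a stackable component:

```python
import tensorflow as tf

class CNNResidual(tf.keras.layers.Layer):
    # A stackable building block: no compile()/fit(); trained only as part of a larger whole
    def __init__(self, n_layers, n_filters, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [
            tf.keras.layers.Conv2D(n_filters, 3, activation='relu', padding='same')
            for _ in range(n_layers)
        ]

    def call(self, inputs):
        x = inputs
        for layer in self.hidden:
            x = layer(x)
        return inputs + x  # residual add; assumes inputs already have n_filters channels
```

Does that make sense?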
Yes, that makes a lot of sense. So, avoid the extra baggage that Model brings with it (on top of Layer) unless it’s actually needed. And think of a Layer more freely as some encapsulated structural unit, rather than literally as “one layer”. Anyway, plenty of food for thought. Thanks!
This is a similar question to the one we have a bit further down on this page:
Glad we both came to the same conclusion in both places.
I like this comment string from the Model class source on GitHub:
`Model` groups layers into an object with training and inference features.
Here is the corresponding description from the Layer class:
This is the class from which all layers inherit.
A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves *computation*, defined in the `call()` method, and a *state* (weight variables).
I also notice that while both Layer and Model accept one or more tensor inputs upon which they perform operations to produce one or more tensor outputs, Model requires a specific type of input tensor: one created by an instance of the InputLayer class.
I can really give myself a headache reading this code. It looks to me like keras.Input() is a factory method that builds instances of InputLayer. It actually returns the InputLayer’s outputs:
```python
def Input():
    ...
    input_layer = InputLayer(**input_layer_config)
    ...
    outputs = input_layer._inbound_nodes[0].outputs
    if isinstance(outputs, list) and len(outputs) == 1:
        return outputs[0]
    else:
        return outputs
```
So a Model’s inputs are the output(s) of an InputLayer. While a Layer’s inputs, as best as I can tell, can be any tensor, or dict/list/tuple of tensors.
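A quick check of what Input() hands back confirms this:

```python
import tensorflow as tf

x = tf.keras.Input(shape=(32,))
print(type(x))  # a symbolic KerasTensor (the InputLayer's output), not the InputLayer itself
```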
This is a very detailed and interesting explanation of the question, thank you for your effort. It’s just wonderful that your investigation went into this depth!
Thank you both for the insightful discussion, and sorry for missing the related earlier discussion (When to use Layers or Model as parent class) – I went straight for the “Week 4” sub-forum and overlooked the general one…
Interesting point about the requirement for a Model’s inputs to come from InputLayers – even though that seems to apply primarily when using Model “neat”, within the context of the Functional API. It is no longer the case for the custom Model subclasses derived in the W4 lectures/lab to construct ResNet (given the specific way the init and call methods are defined). So at least, custom Model subclasses can be made to stack/re-combine as flexibly as custom Layers (but of course with much additional functionality/overhead that may not actually be needed).
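For concreteness, here is a minimal sketch (not the exact lab code) of what I mean by a Model subclass being stacked inside another model, which works precisely because a Model is-a Layer:

```python
import tensorflow as tf

class DNNResidual(tf.keras.Model):  # subclassing Model rather than Layer
    def __init__(self, n_layers, n_units, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [tf.keras.layers.Dense(n_units, activation='relu')
                       for _ in range(n_layers)]

    def call(self, inputs):
        x = inputs
        for layer in self.hidden:
            x = layer(x)
        return inputs + x  # assumes inputs already have n_units features

# Because a Model is-a Layer, it stacks like any other layer:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(64,)),
    DNNResidual(2, 64),
    tf.keras.layers.Dense(1),
])
```

Does that sound fair?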
Generally speaking, it is fair to say that (though I am not sure there are many practical applications of “Model subclasses can be made to stack/re-combine”).
I just came across an interesting use case in the NLP domain. Here, a Layer is instantiated and then executed inside a user-defined function…it isn’t part of a Model at all. Notionally, it looks like this…
```python
import tensorflow as tf

def vectorize_text(vectorize_layer, text):
    text = tf.expand_dims(text, -1)
    return vectorize_layer(text)  # <== invokes vectorize_layer.call()

# custom_standardization, max_features, and sequence_length are defined elsewhere
my_vectorization_layer = tf.keras.layers.TextVectorization(
    standardize=custom_standardization,
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=sequence_length)

some_text_as_vector = vectorize_text(my_vectorization_layer, some_text)
```
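One detail the sketch glosses over: a TextVectorization layer has to learn its vocabulary before it can map tokens, typically via adapt() on the raw training text:

```python
# train_text is assumed here to be a tf.data.Dataset (or array) of raw strings
my_vectorization_layer.adapt(train_text)
```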
The TensorFlow doc for Layer starts its narrative with “A layer is a callable object…”. So you can use the instance call() method to do something, e.g. vectorize text, perform a convolution, etc., without ever building or training a full model.
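For instance, a convolution can be run directly on a tensor with no model anywhere in sight:

```python
import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=4, kernel_size=3)
result = conv(tf.random.normal((1, 28, 28, 3)))  # weights are built on this first call
print(result.shape)  # (1, 26, 26, 4)
```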
In the NLP case, I had a Sequential model that was defined to accept vectorized strings into its first layer, an Embedding layer. That means the inputs look like this…
```
Vectorized review (<tf.Tensor: shape=(1, 250), dtype=int64, numpy=
array([[  86,   17,  260,    2,  222,    1,  571,   31,  229,   11, 2418,
           1,   51,   22,   25,  404,  251,   12,  306,  282,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
        ...]])>
```
which is not particularly human-readable. But by defining a new model that wraps the old one and adds the text vectorization layer, you can pass in strings directly…
```python
from tensorflow.keras import layers

# max_features and embedding_dim are defined elsewhere
requires_vectorized_input_model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, embedding_dim),
    layers.Dropout(0.2),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1)])

accepts_unvectorized_strings_model = tf.keras.Sequential([
    my_vectorization_layer,
    requires_vectorized_input_model,
    layers.Activation('sigmoid')
])
```
I take the first model, which requires vectorization, and stack it below the instance of tf.keras.layers.TextVectorization that I built above. So there are some interesting ideas here: creating a Layer as a Layer, then combining it with an existing Model, treated as a Layer, to make a new Model. The standalone Layer instance’s call() method can also be invoked in procedural code regardless of whether it has ever been used in a model or had its weights trained.
Now I can pass in human-readable text strings and transparently see how the sentiment analysis turns them into floating point values, driven primarily by the encoding in the Embedding layer.
```python
examples = [
    "The movie was great. Hilarious. Beautiful!",
    "The movie was pretty good",
    "The movie was okay.",
    "The movie was terrible",
    "The worst most awful ridiculous pathetic movie ever!"
]

accepts_unvectorized_strings_model.predict(examples)
```

which outputs:

```
array([[0.7906576 ],
       [0.5475819 ],
       [0.4600771 ],
       [0.37825677],
       [0.08336403]], dtype=float32)
```
Interesting stuff, and an interesting application. It seems surprisingly easy to predict sentiment accurately using a simple notebook, without complex transformations.
I think that many interesting combinations can be created with the use of layers for different applications. I had the idea that layers can be used independently of a model to, say, transform data that is then used further down the processing stream; but I still think that in order to achieve a full learning cycle you need the properties of the model.
Exactly. There is no learning when using a Layer outside a model…just computation.
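A tiny check makes the point concrete: calling a Layer repeatedly computes outputs but never changes its weights, since no optimizer is involved:

```python
import numpy as np
import tensorflow as tf

dense = tf.keras.layers.Dense(2)
_ = dense(tf.ones((1, 3)))             # first call builds the (randomly initialized) weights
w_before = dense.kernel.numpy().copy()
_ = dense(tf.ones((1, 3)))             # calling again is pure computation
assert np.array_equal(dense.kernel.numpy(), w_before)  # weights unchanged: no learning
```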