Is the Input Layer considered its own layer

When you are counting the layers of a Neural Network, do you include the input layer in the count?

Typically you don’t count it as “a layer”, since it really isn’t, right? But you need to know the dimensions that it presents, of course. A “layer” is where actual processing happens, be it Feed Forward, Conv, Pooling or RNN/LSTM/GRU or …


Thanks, yet it is named the “input layer” and has its own interface; you cannot make a meaningful network without it. You can make a network with just an input and an output layer.

The naming convention feels confusing.

Nevertheless, I will match your convention.
But in my heart, the “input layer” will always be the first (where everything starts.)


My point was that no processing actually happens in the “input layer”: it is there only to declare the shape of the input. You need that, of course, but it’s just a question of what you call it.

And what does it matter whether you call your network a 99-layer network or a 100-layer network? What matters, e.g. in the case of the layers_dims array in DLS C1 W4 A1 and A2, is that you need an entry in that array for the dimension of the input. But there are no weights or bias values corresponding to the input layer, so the number of parameter arrays is 2 * (L - 1) (one weight matrix and one bias vector per processing layer), if L is the number of entries in layers_dims.
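To make the counting concrete, here is a minimal sketch. It is not the assignment's code: the function name `initialize_parameters`, the seed argument, and the layer sizes are assumptions chosen for illustration. It only shows that the input dimension gets an entry in layers_dims but no W or b of its own.

```python
import numpy as np

def initialize_parameters(layers_dims, seed=0):
    """Hypothetical helper: build one W and one b per processing layer."""
    rng = np.random.default_rng(seed)
    parameters = {}
    L = len(layers_dims)  # number of entries, including the input dimension
    # Start at 1: index 0 is the input dimension, which owns no parameters.
    for l in range(1, L):
        parameters[f"W{l}"] = rng.standard_normal(
            (layers_dims[l], layers_dims[l - 1])
        ) * 0.01
        parameters[f"b{l}"] = np.zeros((layers_dims[l], 1))
    return parameters

layers_dims = [5, 4, 3, 1]  # illustrative sizes: input, two hidden, output
parameters = initialize_parameters(layers_dims)
num_arrays = len(parameters)  # 2 * (L - 1) = 6 for these sizes
```

With four entries in layers_dims, the dictionary holds W1, b1, W2, b2, W3, b3, and there is no W0 or b0.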

I hear you. The layer does not count literally (because it does not process). But it does process; it can do batch processing. Anyway, I just wanted to know what convention the class uses. If I ever teach or write, I will count it.

Sure, when you are the Professor, you can do it your way. But here it’s Prof Andrew making the decisions and we just need to understand his approach. :nerd_face:

I respect that response.
Yet no matter how much I like and appreciate the professor (and I do a lot), logic and reason must stand on their own.

My favorite philosopher is Socrates. (OK, asking so many questions did not end well for him, but I still like him.)

Hi guys,

From my point of view, there is no need to include the input layer, or even the output layer for that matter, in the layer count.

It doesn’t sound logical or natural to me, since it’s an implicit layer that you need in order to build a neural network.

After all, without the input layer we don’t build the neural network, right?

Anyway, I find this an interesting discussion about the naming conventions used these days.

Best regards

You realize that you are referring to it as a “layer” or “input layer” and that I can create a neural network with an input layer and output. By your logic, “there is no need to add the input layer or even the output layer, it was the case, in the layer count,” then a shallow neural network with only an input layer and an output layer has no layer, thus has 0 layers.

Reductio ad absurdum

While technically accurate, it’s important to recognize that the term “layer” in this context primarily emphasizes the distinct stages of data flow, even though no additional computational transformation occurs between the input and output.

With that in mind, the utilization of the term “layer” helps maintain consistency in the nomenclature across various network architectures, irrespective of their depth and complexity. Nonetheless, the fundamental difference lies in the absence of hidden layers, which play a vital role in capturing hierarchical representations and handling more intricate patterns in deep neural networks. As such, understanding the distinction between shallow and deep neural networks aids in choosing the appropriate architecture for specific problem domains.

I do wholeheartedly accept that “the distinction between shallow and deep neural networks aids in choosing the appropriate architecture for specific problem domains.”

It is the manner of counting the layers that feels odd to me.


From my perspective, the “layers” are counted like this because each layer represents a distinct stage of computation, and data flows sequentially through these layers. The depth of a neural network refers to the number of hidden layers it contains, while the overall architecture and the number of neurons in each layer impact the network’s capacity to learn complex patterns from data.

Hi @ajeancharles

Let me offer my perspective - the input layer is not a “layer”; it’s just input data.

Hidden layers are the layers where processing happens. And even here the “layer” count is not agreed upon universally. For example, in this course, trax counts activations as a “layer” (ReLU is a layer because it changes the values of the tensors). Other libraries consider “layers” only those that have weights (even though LayerNorm and others internally also include weights, but of a different kind).
Another point is that most consider an LSTM layer a single layer even though there is much processing going on inside (multiple weights (or linear layers), multiple activations, etc.).

Output layer is considered as a different kind since it produces the results of shape that you need (even though technically it could be the same kind as a hidden layer (Linear for example)).

So I wouldn’t worry much about “layer” count because layers are not created equal :slight_smile: and you can call input data as input data and not a “layer”.
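As a small illustration of how the count depends on the convention, here is a sketch in plain Python. The `network` list and the three counting functions are hypothetical and do not come from any library; they just tally the same toy network three ways.

```python
# Hypothetical description of a small network, one dict per stage.
network = [
    {"name": "input", "kind": "input", "has_weights": False},
    {"name": "dense_1", "kind": "linear", "has_weights": True},
    {"name": "relu_1", "kind": "activation", "has_weights": False},
    {"name": "dense_2", "kind": "linear", "has_weights": True},
    {"name": "sigmoid", "kind": "activation", "has_weights": False},
]

def count_weighted(net):
    """Convention A: only stages that own weights are layers."""
    return sum(1 for stage in net if stage["has_weights"])

def count_operations(net):
    """Convention B: every stage that transforms tensors is a layer."""
    return sum(1 for stage in net if stage["kind"] != "input")

def count_everything(net):
    """Convention C: the input is counted as a layer too."""
    return len(net)

counts = (
    count_weighted(network),
    count_operations(network),
    count_everything(network),
)
```

The same network comes out as a 2-layer, 4-layer, or 5-layer network depending on which rule you pick, which is why the number on its own says little.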



You can “batch process” in the input layer. We have a whole exercise in the course about it (see “C3_W1_Assignment: Creating a Batch Generator”).

“Batch processing” means that you process not a single example but a number of them at a time (more concretely: “mini-batch processing” for a certain number of examples at a time, “batch processing” for the whole dataset, though the latter is often used loosely to mean “mini-batch” processing). That has nothing to do with the input data/layer, and for that matter even with the hidden layers (they don’t “care” if you process a single example or a bunch of them). In other words, “batching” is not “processing” per se.
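Here is a rough NumPy sketch of that point. The layer sizes, the ReLU activation, and the name `dense_relu` are assumptions for illustration, and examples are stored as columns. The same weights handle one example or eight, and the result for a given example does not change when it is batched with others.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # 4 input features -> 3 units
b = rng.standard_normal((3, 1))

def dense_relu(X):
    """One processing layer; X has one example per column."""
    return np.maximum(0, W @ X + b)

x_single = rng.standard_normal((4, 1))
# Batch of 8 examples whose first column is the single example above.
X_batch = np.hstack([x_single, rng.standard_normal((4, 7))])

out_single = dense_relu(x_single)  # shape (3, 1)
out_batch = dense_relu(X_batch)    # shape (3, 8)

# The first column of the batch output matches the single-example output.
same_first_column = np.allclose(out_single[:, 0], out_batch[:, 0])
```

The layer's parameters W and b are identical in both calls; only the number of columns passed in differs.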

On the other hand, you can do multiple operations on inputs before the neural network processing, like pre-processing (stemming, punctuation removal, etc.), different tokenization techniques and other stuff, but that has nothing to do with the neural network architecture (and hence its layers).


“multiple operations on inputs before the neural network processing, like pre-processing”, that says everything.

Yes @ajeancharles, :slight_smile: exactly - the input layer (or input data) is not a part of the neural network but what happens before, so it is not considered a layer in a neural network.

Nobody considers plugging a USB flash drive into a PC, then copying the contents to a hard drive, etc., as an input layer, even though this operation needs “processing”…


That was a very good explanation @arvyzukai

Thank you for contributing.


I like your example a lot; thank you for offering it.
I can run my computer without a USB drive; I can’t run a neural network without an input layer.

Thank you for the suggestion.
