In which situations can the identity blocks have the same weights?

Even if two identity blocks have the same weights, that does not mean they extract the same information. The blocks are executed in sequence, so the second block receives the output of the first as its input, and their effects build on each other. Sharing weights may also make the network less complex, and there are so many parameters that having some of them be identical does not necessarily reduce the network's effectiveness. When dropout is used, for example, some units are randomly ignored during training anyway.
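
To make the point about sequential execution concrete, here is a minimal sketch in plain Python. It assumes a toy identity block of the form `y = x + relu(W x)`. The helper names, the 2x2 matrix `W`, and the input `x` are made up for illustration and are not taken from any particular network.

```python
def relu(v):
    # Element-wise ReLU on a list of floats
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # Matrix-vector product for a list-of-rows matrix
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def identity_block(x, W):
    # Toy identity (residual) block: skip connection plus a transform of x
    fx = relu(matvec(W, x))
    return [a + b for a, b in zip(x, fx)]

# Hypothetical weights, shared by both blocks
W = [[0.5, -1.0],
     [1.0, 0.5]]
x = [1.0, 2.0]

h1 = identity_block(x, W)   # first block sees the raw input
h2 = identity_block(h1, W)  # second block sees the first block's output

# What each block added on top of its own input
delta1 = [a - b for a, b in zip(h1, x)]
delta2 = [a - b for a, b in zip(h2, h1)]

print(h1, h2)          # [1.0, 4.0] [1.0, 7.0]
print(delta1, delta2)  # [0.0, 2.0] [0.0, 3.0]
```

Both blocks use exactly the same `W`, yet they contribute different updates (`delta1` vs `delta2`) because each one operates on a different input.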
