In the Understanding Residual networks video, the instructor said:
"Why should you have three blocks of code here to go through the data when you could instead have a loop that runs the data through residual type 2 three times? Also, it could be the same weights in each block, so instead of each of the three blocks learning independent weights separately, you get one block that is learned and executed three times."
I did not completely understand why some blocks could have the same weights. In which situations can the identity block have the same weights?
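If I understand the instructor correctly, weight sharing here just means calling the same layer instance repeatedly. Here is a minimal sketch of that idea in Keras (my own illustration, not the course's code; `ResidualBlock`, `SharedResNet`, and the single Dense layer inside the block are assumptions):

```python
import tensorflow as tf

class ResidualBlock(tf.keras.layers.Layer):
    """One identity-shortcut block: output = x + f(x)."""
    def __init__(self, units):
        super().__init__()
        self.dense = tf.keras.layers.Dense(units, activation="relu")

    def call(self, x):
        # Assumes the input feature dimension equals `units`,
        # so the skip addition is shape-compatible.
        return x + self.dense(x)

class SharedResNet(tf.keras.Model):
    """ONE block instance, executed `repeats` times with the same weights."""
    def __init__(self, units=64, repeats=3):
        super().__init__()
        self.block = ResidualBlock(units)
        self.repeats = repeats

    def call(self, x):
        for _ in range(self.repeats):
            x = self.block(x)  # same weights reused on every pass
        return x
```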
Even if they have the same weights, it does not mean they extract the same information, because they are executed in sequence and so have an additive effect on each other. Maybe this is also less complex for the network, and in any case there are so many parameters that sharing some of them does not reduce the network's effectiveness. When dropout is used, for example, some weights are not even taken into consideration.
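A small check of that intuition, reusing the hypothetical `ResidualBlock` and `SharedResNet` from the sketch above: three independent blocks carry roughly three times the parameters of one shared block, yet the shared block still applies three sequential transformations.

```python
class UnsharedResNet(tf.keras.Model):
    """Three independent blocks, each learning its own weights."""
    def __init__(self, units=64, repeats=3):
        super().__init__()
        self.blocks = [ResidualBlock(units) for _ in range(repeats)]

    def call(self, x):
        for block in self.blocks:
            x = block(x)
        return x

x = tf.zeros((1, 64))              # dummy input just to build the weights
shared, unshared = SharedResNet(), UnsharedResNet()
shared(x), unshared(x)

print(shared.count_params())       # parameters of one block
print(unshared.count_params())     # roughly three times as many
```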