I read the paper “Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction”, but based on everything said in the introduction I didn’t fully understand the motivation for the paper, and I have some questions/clarifications.
What is an identity mapping? (It was mentioned in CNN Week 2 while talking about ResNets.)
“Unsupervised initializations tend to avoid local minima and increase the network’s performance ability”: since we usually initialize weights randomly, I don’t understand this sentence.
An identity mapping means that a tensor is passed along as is. An added layer can be problematic if it does not add the complexity needed for the network’s performance, but it can be useful to bypass a layer if that additional layer worsens performance due to its added complexity. A residual layer allows for an identity mapping by making it possible to bypass that complexity through a skip connection.
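To make the skip-connection idea concrete, here is a minimal NumPy sketch (the function names and sizes are illustrative, not from the paper): a residual block computes y = F(x) + x, so if the learned transform F contributes nothing, the block reduces to the identity mapping y = x.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Two-layer learned transform F(x) with a skip connection adding x back."""
    f = relu(x @ W1) @ W2   # the learned transform F(x)
    return f + x            # skip connection: the identity path

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# With zero weights, F(x) == 0 and the block is an exact identity mapping:
W1 = np.zeros((8, 8))
W2 = np.zeros((8, 8))
assert np.allclose(residual_block(x, W1, W2), x)
```

The point is that the network can fall back to passing the tensor through unchanged rather than being forced through a transform that hurts performance.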
As discussed in the paper on unsupervised pre-training, unsupervised pre-training has been found to have a regularization effect that helps avoid local minima.
This discussion of unsupervised pre-training may also clarify, as may this discussion of the effect of unsupervised pre-training.