Based on the above slide, I understand that inputs/outputs may be of varying size, but assuming we ignore this issue, I don’t understand the second bullet point…
Could we not build a convolutional network whose input has dimensions (m x Tx_max x 10000 x 1)? If so, what prevents this ConvNet from sharing features across different positions? And if something does prevent it, why do normal ConvNets succeed in sharing features across positions of an image?
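To make the shape I am asking about concrete, here is a minimal NumPy sketch of what I have in mind. The sizes `m = 2` and `Tx_max = 5`, the filter width `k = 3`, and the random word indices are my own assumptions for illustration; only the vocabulary size of 10000 comes from the input dimensions above. It is not meant as a real model, just a picture of one filter sliding along the time axis.

```python
import numpy as np

# Hypothetical sizes for illustration; 10000 is the vocabulary size from the question.
m, Tx_max, vocab = 2, 5, 10000
rng = np.random.default_rng(0)

# Random word indices, one-hot encoded into shape (m, Tx_max, vocab, 1).
word_ids = rng.integers(0, vocab, size=(m, Tx_max))
x = np.zeros((m, Tx_max, vocab, 1))
x[np.arange(m)[:, None], np.arange(Tx_max)[None, :], word_ids, 0] = 1.0

# One filter covering k time steps and the whole vocabulary axis (assumed design).
k = 3
w = rng.standard_normal((k, vocab, 1))

# "Valid" sliding window along the time axis: the same weights w are applied
# at every position t.
out = np.stack(
    [np.sum(x[:, t:t + k] * w, axis=(1, 2, 3)) for t in range(Tx_max - k + 1)],
    axis=1,
)

print(x.shape)    # input tensor shape
print(out.shape)  # one response per example and window position
```

In this sketch the single weight tensor `w` is reused for every window of `k` consecutive words, which is the kind of position sharing I would expect a ConvNet to give.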