\underline{\textbf{Implementation}} \textbf{:}

Say we have a 4\times4 image, which in matrix form, is \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8\\ 9 & 10 & 11 & 12\\ 13 & 14 & 15 & 16 \end{bmatrix}.

Its horizontally flipped image is the matrix given as \hspace{1pt} \begin{bmatrix} 4 & 3 & 2 & 1 \\ 8 & 7 & 6 & 5 \\ 12 & 11 & 10 & 9\\ 16 & 15 & 14 & 13 \end{bmatrix}.
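As a quick sanity check, a horizontal flip is just a reversal of each row; a minimal sketch with NumPy (the variable names are my own):

```python
import numpy as np

# The 4x4 example image from the text, with pixels numbered 1..16 row by row.
image = np.arange(1, 17).reshape(4, 4)

# A horizontal flip reverses each row; NumPy provides fliplr for exactly this.
flipped = np.fliplr(image)

print(flipped)
# [[ 4  3  2  1]
#  [ 8  7  6  5]
#  [12 11 10  9]
#  [16 15 14 13]]
```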

When the image and its horizontally flipped version are fed into a NN, the output should be **exactly the same** for both, since they belong to the same class. This is possible only if the outputs from the previous layer are also exactly the same. Extending this argument backwards through the NN, every neuron of the NN should produce identical outputs for the image and its flipped version.

Now let w_{ij} be the weight connecting the j^{th} input pixel (with the image flattened row by row) to the i^{th} neuron in the first layer. The output of every such neuron should be the same for the input image and its flipped version.

That is, before the non-linear activation for the i^{th} neuron, we require

w_{i1}(1)+w_{i2}(2)+w_{i3}(3)+w_{i4}(4)+\dots = w_{i1}(4)+w_{i2}(3)+w_{i3}(2)+w_{i4}(1)+\dots

The above equality should hold for any image, not just the example image discussed above. This is possible only if w_{i4}=w_{i1} and w_{i3}=w_{i2}.

Similarly, the following conditions should also be satisfied:

\begin{eqnarray}w_{i8}&=&w_{i5}, w_{i7}=w_{i6}\\w_{i12}&=&w_{i9}, w_{i11}=w_{i10}\\w_{i16}&=&w_{i13}, w_{i15}=w_{i14}\end{eqnarray}

So, during initialization of weights, we randomly initialize **only** the weights w_{i1}, w_{i2}, w_{i5}, w_{i6}, \dots and set the remaining weights equal to these initialized weights as per the constraints above. That is, we initially set w_{i4} equal to w_{i1}, w_{i3} equal to w_{i2}, and so on.
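This initialization scheme can be sketched as follows; the helper name and shapes are my own choices for illustration:

```python
import numpy as np

def init_tied_weights(rng, n_neurons=8, rows=4, cols=4):
    """Randomly initialize only the left half of each row's weights,
    then mirror them so that w[i, r, c] == w[i, r, cols-1-c]
    (a hypothetical helper sketching the scheme in the text)."""
    half = cols // 2
    free = rng.standard_normal((n_neurons, rows, half))
    # Mirror each half onto the right side to enforce the constraints.
    return np.concatenate([free, free[..., ::-1]], axis=-1)

rng = np.random.default_rng(0)
W = init_tied_weights(rng)
print(np.allclose(W, W[..., ::-1]))  # True: every neuron's weights are mirror-symmetric
```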

During backpropagation as well, we update **only** w_{i1}, w_{i2}, w_{i5}, w_{i6}, \dots using gradient descent. The remaining weights are then set as per the constraints above. That is, the updated w_{i4} is set equal to the updated w_{i1}, the updated w_{i3} to the updated w_{i2}, and so on.
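A sketch of one such update step, following the scheme described above (update the free left-half weights with their own gradients, then copy them into the mirrored positions; the function and shapes are my own illustration):

```python
import numpy as np

def tied_update(W, grad, lr=0.1):
    """One gradient-descent step per the scheme in the text:
    only the free (left-half) weights are updated with their gradients,
    and the result is copied to the mirrored right-half positions."""
    half = W.shape[-1] // 2
    free = W[..., :half] - lr * grad[..., :half]
    return np.concatenate([free, free[..., ::-1]], axis=-1)

rng = np.random.default_rng(1)
raw = rng.standard_normal((4, 4))
W = np.concatenate([raw[..., :2], raw[..., :2][..., ::-1]], axis=-1)  # tied init
grad = rng.standard_normal((4, 4))  # stand-in for a backprop gradient

W_new = tied_update(W, grad)
print(np.allclose(W_new, W_new[..., ::-1]))  # True: the symmetry survives the update
```

By construction, the mirror symmetry of the weights is preserved after every update, so the flip-invariance argued for above holds throughout training.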

Generalization is discussed in the next comment.