Hello @gauravv,

In short, the neurons in a layer learn different things because they start out different. When building a model with TensorFlow, we need to initialize the weights of each neuron to some values, and by default they are initialized randomly, so it is practically impossible for any two neurons to start with the same weights. This initial diversity lets the neurons follow different learning paths during gradient descent and end up learning different things.
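
If you would like to see this for yourself, here is a minimal sketch in plain Python (not TensorFlow). Everything in it is my own assumption for illustration: a tiny network with 2 inputs, 2 tanh hidden neurons and 1 linear output, a toy XOR-like dataset, and the learning rate and step count. It trains the same network twice, once with both hidden neurons starting from identical weights and once with random weights.

```python
import math
import random

def train(w, v, data, lr=0.1, steps=200):
    """Plain gradient descent on a 2-input, 2-hidden-neuron (tanh), 1-output net.
    w[j] holds the input weights of hidden neuron j; v[j] is its output weight.
    The loss is the mean of 0.5 * (prediction - target)^2 over the data."""
    w = [row[:] for row in w]  # copy so the caller's lists are not modified
    v = v[:]
    n = len(data)
    for _ in range(steps):
        gw = [[0.0, 0.0], [0.0, 0.0]]  # accumulated gradients for w
        gv = [0.0, 0.0]                # accumulated gradients for v
        for x, y in data:
            h = [math.tanh(w[j][0] * x[0] + w[j][1] * x[1]) for j in range(2)]
            err = v[0] * h[0] + v[1] * h[1] - y
            for j in range(2):
                gv[j] += err * h[j]
                dh = err * v[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                gw[j][0] += dh * x[0]
                gw[j][1] += dh * x[1]
        for j in range(2):
            v[j] -= lr * gv[j] / n
            w[j][0] -= lr * gw[j][0] / n
            w[j][1] -= lr * gw[j][1] / n
    return w, v

# Toy dataset (an assumption for this demo): XOR-like targets.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

# Case 1: both hidden neurons start with exactly the same weights.
w_same, v_same = train([[0.5, -0.3], [0.5, -0.3]], [0.2, 0.2], data)

# Case 2: random initial weights, so the two neurons start out different.
random.seed(0)
w0 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
v0 = [random.uniform(-1, 1) for _ in range(2)]
w_rand, v_rand = train(w0, v0, data)

print("identical init, neurons still identical:", w_same[0] == w_same[1])
print("random init, neurons ended up different:", w_rand[0] != w_rand[1])
```

In the first case the two neurons receive exactly the same gradient at every step, so they can never become different from each other. In the second case their gradients differ from the very first step, and that is what lets them learn different things.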

Above is how I would explain this without maths, but if you want some simple maths and an example to persuade yourself, you may read this.

Cheers,

Raymond