Could you kindly explain, in simple words, why tanh is better than the sigmoid function?

The instructor said that tanh keeps the mean of your data closer to 0 rather than 0.5, and that this makes learning easier for the next layer. Could you kindly elaborate on this point?

The sigmoid function is useful in the output layer for classification tasks, in which one tries to estimate the probability that an object (e.g. an image) belongs to a specific class (e.g. a cat). In terms of probability, it has the form of a valid cumulative distribution function: it is monotonically increasing, and its values lie between zero and one. As such, its output can be interpreted as a probability. It is important to note that it evaluates to 0.5 at Z = 0.

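The tanh function has the same S-shape as the sigmoid, but it ranges between -1 and 1 and evaluates to 0 at Z = 0. This is a useful property for the hidden layers, because the data and the inputs are usually normalized so that they have zero mean and unit standard deviation (for a number of reasons which will become clear in the second course). With tanh, the activations passed to the next layer stay roughly centered around 0 as well, whereas sigmoid activations are centered around 0.5. This is what the instructor means by making learning easier for the next layer.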
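To make the "mean closer to 0 rather than 0.5" point concrete, here is a small sketch using only the Python standard library. It assumes the values of Z are drawn from a standard normal distribution (zero mean, unit standard deviation), which is a stand-in for normalized data and not something taken from the course; the names `z_values`, `tanh_mean`, and `sigmoid_mean` are made up for this illustration.

```python
import math
import random
import statistics


def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Assumption: Z is zero-mean with unit standard deviation,
# standing in for normalized inputs to a hidden layer.
random.seed(0)
z_values = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Average activation that the next layer would receive.
tanh_mean = statistics.fmean(math.tanh(z) for z in z_values)
sigmoid_mean = statistics.fmean(sigmoid(z) for z in z_values)

print(f"mean of tanh(Z):    {tanh_mean:.3f}")
print(f"mean of sigmoid(Z): {sigmoid_mean:.3f}")
```

Under this assumption, the tanh activations should average out close to 0, while the sigmoid activations should average out close to 0.5, so only tanh hands the next layer inputs that are still roughly zero-centered.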

The S-shape implies that as the linear activation Z moves further from zero, the activation becomes stronger, either negatively or positively, and the gradient becomes smaller.
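The shrinking gradient can be checked numerically. The sketch below uses the standard derivative formulas, tanh'(Z) = 1 - tanh(Z)^2 and sigmoid'(Z) = sigmoid(Z) * (1 - sigmoid(Z)); the helper names `tanh_grad` and `sigmoid_grad` and the sample points 0, 2, and 5 are chosen here purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(z)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


# Illustrative sample points: at zero, moderately far, and far from zero.
for z in (0.0, 2.0, 5.0):
    print(f"Z = {z}: tanh' = {tanh_grad(z):.4f}, sigmoid' = {sigmoid_grad(z):.4f}")
```

At Z = 0 the tanh gradient is 1 and the sigmoid gradient is 0.25, and both shrink toward zero as Z moves away from the origin in either direction.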