I was learning about different model training algorithms and found that many of them assume the dataset's features are normally distributed; otherwise, the results can be inaccurate.
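For instance, Gaussian Naive Bayes bakes the normal pdf directly into how it scores each feature, so a feature that is far from normally distributed makes those scores misleading. A minimal sketch of that scoring, with made-up class statistics:

```python
import math

def gaussian_pdf(x, mean, std):
    # Normal probability density: the distributional assumption itself.
    coeff = 1.0 / (std * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

# Per-class (mean, std) of one feature, estimated from training data.
# These numbers are invented purely for illustration.
stats = {"healthy": (120.0, 10.0), "sick": (160.0, 15.0)}

x = 155.0  # a new observation of the feature
scores = {cls: gaussian_pdf(x, m, s) for cls, (m, s) in stats.items()}
print(max(scores, key=scores.get))  # prints "sick": x is most likely under that Gaussian
```

If the feature's real distribution is skewed or multimodal, the fitted mean and std describe it poorly and these likelihoods stop reflecting reality.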
I researched it but couldn't get an intuitive understanding of why this assumption matters.
Kindly help.
I think this is related to having a balanced dataset: if the dataset fed to the model is not balanced, the model will most probably favor whichever outcome occurs more often than the others during the training phase.
For example, the presence of a disease is rare (depending on the disease, of course), but if the model has seen only a few positive cases during training, it will most probably predict "not present" at inference time. After all, these are probabilistic models.
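The failure mode described above can be shown with a tiny simulation (the 2% prevalence and the majority-class "model" are made up for illustration):

```python
import random
from collections import Counter

random.seed(0)

# Simulated imbalanced labels: ~2% "disease present" (1), ~98% absent (0).
labels = [1 if random.random() < 0.02 else 0 for _ in range(10_000)]

# A trivial "model" that learns only the majority class from training data
# and always predicts it at inference time.
majority = Counter(labels).most_common(1)[0][0]
predictions = [majority] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the rare class: fraction of true positives the model catches.
positives = sum(1 for y in labels if y == 1)
recall = sum(1 for p, y in zip(predictions, labels)
             if y == 1 and p == 1) / max(positives, 1)

print(f"majority class: {majority}")       # 0, i.e. "not present"
print(f"accuracy: {accuracy:.2%}")         # looks impressively high
print(f"rare-class recall: {recall:.2%}")  # 0% - rare cases are never caught
```

High accuracy here is an illusion: the model never identifies a single diseased case, which is exactly why imbalanced data needs metrics beyond accuracy (or resampling/class weighting during training).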