I am not sure which publication Mahmud (2021) refers to.
In a 2021 paper, Mahmud, Morshed, and Hasan (https://arxiv.org/pdf/2107.02543) cite an article by Loshchilov and Hutter (2019) (https://arxiv.org/pdf/1711.05101). Loshchilov and Hutter in turn cite Hanson and Pratt (1988) as their source for the idea of weight decay. Their reference is in fact incorrect: it should point to the 2nd International Conference on Neural Information Processing Systems:
Stephen José Hanson and Lorien Y Pratt. Comparing biases for minimal network construction with back-propagation. In Proceedings of the 2nd International Conference on Neural Information Processing Systems, pp. 177–185, 1988. (https://dl.acm.org/doi/10.5555/2969735.2969756)
Hanson and Pratt in turn cite a personal communication from David Rumelhart (1987) as the source of the idea (see the Wikipedia article on David Rumelhart).