In WGAN, why not include the gradient penalty (GP) term in the gen_loss calculation?

The minimax cost function has a penalty term that includes x_hat = epsilon * real + (1 - epsilon) * g(noise), so the generator function seems to be part of it.

GP can be used; nothing prevents it. When a WGAN suffers from exploding or vanishing gradients, GP has been shown to make optimization easier and to help convergence. This paper (NeurIPS 2017) explains it in detail.
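For illustration, here is a minimal PyTorch sketch of how x_hat and the penalty are often computed. The toy `gen` and `crit` networks, the tensor sizes, and the helper name `gradient_penalty` are assumptions made up for this example, and `c_lambda = 10` is just a commonly used weight. In many implementations the penalty is added only to the critic loss, with the fake batch detached inside the penalty, while gen_loss is simply the negated critic score on the fakes. This is one common pattern, not the only possible one.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy networks, for illustration only.
gen = nn.Sequential(nn.Linear(4, 8), nn.Tanh())            # noise (4) -> sample (8)
crit = nn.Sequential(nn.Linear(8, 16), nn.LeakyReLU(0.2),  # sample (8) -> score (1)
                     nn.Linear(16, 1))


def gradient_penalty(crit, real, fake):
    """Penalize the critic when its gradient norm at x_hat differs from 1."""
    # One epsilon per example, broadcast over the feature dimension.
    epsilon = torch.rand(real.size(0), 1)
    # Detach fake: here the penalty is treated as a regularizer on the critic only.
    x_hat = epsilon * real + (1 - epsilon) * fake.detach()
    x_hat.requires_grad_(True)

    scores = crit(x_hat)
    # Gradient of the critic scores with respect to the interpolated inputs.
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=x_hat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # keep the graph so the penalty can be backpropagated
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()


real = torch.randn(16, 8)   # stand-in for a batch of real data
noise = torch.randn(16, 4)
fake = gen(noise)

c_lambda = 10  # penalty weight (assumed value)
gp = gradient_penalty(crit, real, fake)

# Critic loss: Wasserstein estimate plus the weighted penalty.
crit_loss = crit(fake.detach()).mean() - crit(real).mean() + c_lambda * gp

# Generator loss: only the critic's score on fakes, no penalty term.
gen_loss = -crit(fake).mean()
```

Calling `crit_loss.backward()` in this sketch updates only the critic's gradients, and `gen_loss.backward()` is what sends gradients to the generator.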

Thank you for the nice information!