The result of embedding + positional encoding would become large in this case.
Scale of inputs is important for faster convergence.
The result of embedding + positional encoding would become large in this case.
Scale of inputs is important for faster convergence.