Why do the embeddings cluster?

Are we supposed to understand why the embedding vectors learned by training the movie review classifier appear to cluster in the visualization tool? Is there an explanation for exactly why this happens? I understand that this is what happens when minimizing the loss, but my question is why.

Please start here to understand how the embeddings are trained.
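
As a rough intuition for the "why", here is a minimal NumPy sketch. It is not the course's model: the six-word vocabulary, the toy reviews, and the tiny architecture (average the word embeddings, then logistic regression) are all assumptions made up for illustration. In this setup, the gradient that reaches each word's embedding is the prediction error times the same weight vector `w`. Words that mostly appear in positive reviews are therefore all pushed one way along `w`, and words from negative reviews are pushed the opposite way, so they end up in two groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary: the first three words only appear in
# positive reviews, the last three only in negative reviews.
vocab = ["great", "loved", "fun", "awful", "boring", "bad"]
pos_ids = np.array([0, 1, 2])
neg_ids = np.array([3, 4, 5])

# Toy "reviews": each one is 3 word ids drawn from its class's words.
n_per_class, review_len, dim = 50, 3, 2
X = np.vstack([
    rng.choice(pos_ids, size=(n_per_class, review_len)),
    rng.choice(neg_ids, size=(n_per_class, review_len)),
])
y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

# Parameters: an embedding table plus a logistic-regression head.
E = rng.normal(scale=0.01, size=(len(vocab), dim))  # embeddings start near 0
w = rng.normal(size=dim)
b = 0.0
lr = 0.5

for _ in range(300):
    h = E[X].mean(axis=1)                    # (N, dim): average word embedding
    p = 1.0 / (1.0 + np.exp(-(h @ w + b)))   # predicted P(positive)
    err = (p - y) / len(y)                   # d(mean BCE loss) / d(logit)

    grad_w = h.T @ err
    grad_b = err.sum()

    # Every word in a review receives the same gradient: err * w / review_len.
    # np.add.at accumulates correctly when a word id occurs more than once.
    grad_E = np.zeros_like(E)
    np.add.at(grad_E, X, err[:, None, None] * w / review_len)

    E -= lr * grad_E
    w -= lr * grad_w
    b -= lr * grad_b

# Project each word's embedding onto the classifier direction w.
proj = E @ w
for word, value in zip(vocab, proj):
    print(f"{word:>7s}  {value:+.3f}")
```

The printed projections should show the positive words on one side and the negative words on the other. In a real classifier the picture is noisier, since words appear in both kinds of reviews and the layers after the embedding are nonlinear, but the same mechanism applies: words that affect the loss in similar ways receive similar updates.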