What is the difference between the two outputs in the transformer?
And why, in the decoder, does the output at the bottom become an input?
The question is not totally clear (what exactly are you asking about?), but in both cases the outputs are probabilities (the output of the softmax layer); it all depends on the context in which these models are used and on "their" dataset.
For example, the top one is probably taken from the original paper, where the model was trained to translate text. So the inputs in the left branch were the sentences in one language, while the inputs in the right branch (labelled "Outputs", i.e. the outputs "so far") were the translated text so far. The outputs are probabilities for the next word in the target language.
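To make the "Outputs (shifted right)" arrow concrete, here is a minimal sketch (token IDs and special tokens are made up for illustration) of how the decoder input is built from the target during training: the model sees the target shifted right by one position and is trained to predict the unshifted target.

```python
BOS, EOS = 1, 2  # hypothetical start/end-of-sequence token IDs

def shift_right(target_ids):
    """Decoder input: <BOS> followed by all target tokens except the last.
    The model then learns to predict target_ids from this shifted input."""
    return [BOS] + target_ids[:-1]

target = [17, 42, 9, EOS]          # e.g. token IDs of the translated sentence
decoder_input = shift_right(target)
print(decoder_input)               # -> [1, 17, 42, 9]
```

At position t the decoder sees tokens 1..t of the shifted input and must predict target token t, which is exactly the "outputs so far" behaviour described above.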
The bottom picture could be for any generative model (one that attempts to generate text exactly as it appeared in the dataset, usually with some added randomness), for example a chatbot.
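This also answers why the output at the bottom becomes an input: at inference time generation is autoregressive, so each sampled token is appended to the sequence and fed back in for the next step. A toy sketch (the `toy_model` below is a stand-in returning made-up probabilities, not a real transformer):

```python
import random

def toy_model(tokens):
    # Stand-in for the softmax output: fake probabilities
    # over a 5-token vocabulary (made up for illustration).
    return [0.1, 0.1, 0.5, 0.2, 0.1]

def generate(prompt, steps, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = toy_model(tokens)
        # "Added randomness": sample from the distribution instead of argmax.
        next_tok = rng.choices(range(len(probs)), weights=probs)[0]
        tokens.append(next_tok)  # the output becomes the next step's input
    return tokens

print(generate([0], steps=3))
```

The feedback loop in the loop body is the diagram's bottom arrow: yesterday's output is today's input.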