In the middle of this assignment, a “What you should remember” box says: “Run one step of forward propagation to get 𝑎⟨1⟩ (your first character) and 𝑦̂⟨1⟩ (probability distribution for the following character).”

But why is 𝑎⟨1⟩ the first character? Doesn’t the character come from the 𝑦̂⟨1⟩ probabilities?
I don’t think the character has anything to do with the a value.
Or am I misunderstanding something?

The ‘a’ values are the probabilities for each possible letter.
The ‘y’ value is the prediction - the letter for the ‘a’ value that has the highest probability.

Just above this “What you should remember” box, the notebook says clearly that y is the probability distribution and a is just the hidden state: y = softmax(W*a + b). The character is then chosen according to y, for example the one with the highest probability.
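To make the distinction concrete, here is a minimal NumPy sketch of one forward step, assuming a plain tanh RNN cell. The function name `sample_step`, the sizes (27 characters, 50 hidden units), and the random weights are all made up for illustration and are not taken from the assignment. It shows that a is only a hidden-state vector, while the character index is drawn from y.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sample_step(x, a_prev, Wax, Waa, Wya, b, by, rng):
    """One forward step (hypothetical helper, not the assignment's code)."""
    a = np.tanh(Wax @ x + Waa @ a_prev + b)   # hidden state, NOT a character
    y_hat = softmax(Wya @ a + by)             # distribution over the vocabulary
    # The character comes from y_hat. Here it is sampled; np.argmax(y_hat)
    # would instead pick the single most likely character.
    idx = rng.choice(len(y_hat), p=y_hat.ravel())
    return a, y_hat, idx

# Assumed sizes and random weights, purely for illustration.
vocab_size, n_a = 27, 50
rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, vocab_size)) * 0.01
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wya = rng.standard_normal((vocab_size, n_a)) * 0.01
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

x0 = np.zeros((vocab_size, 1))   # dummy all-zero first input
a0 = np.zeros((n_a, 1))          # initial hidden state

a1, y1, idx1 = sample_step(x0, a0, Wax, Waa, Wya, b, by, rng)
print(a1.shape, y1.shape, int(idx1))
```

Note that a1 has n_a entries while y1 has one entry per character, so a1 cannot be read as a character by itself.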