I just watched the RNN video where Andrew talks about the parameters W_a and b_a, and he uses them on every x. My question is: are these parameters the same for every word, or are they different?

There are 2 inputs to an RNN layer. They are:

- a^{<t-1>}, the activation that comes from the previous timestep
- x^{<t>}, the data from the current timestep

W_a = [W_{aa}; W_{ax}], i.e. the two matrices concatenated horizontally.

b_a is the bias vector.

Using the parameters above, we compute a^{<t>} = g(W_{aa} \cdot a^{<t-1>} + W_{ax} \cdot x^{<t>} + b_a), where g is the activation function (typically tanh).
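As a minimal NumPy sketch of that one step (the sizes here are hypothetical, not from the course), you can check that applying the concatenated W_a to the stacked inputs gives the same pre-activation as using W_{aa} and W_{ax} separately:

```python
import numpy as np

# Hypothetical sizes: n_a hidden units, n_x input features.
n_a, n_x = 4, 3
rng = np.random.default_rng(0)

W_aa = rng.standard_normal((n_a, n_a))   # recurrent weights, multiply a^{<t-1>}
W_ax = rng.standard_normal((n_a, n_x))   # input weights, multiply x^{<t>}
b_a = np.zeros((n_a, 1))                 # bias vector

# Horizontal concatenation: W_a = [W_aa; W_ax], shape (n_a, n_a + n_x)
W_a = np.concatenate([W_aa, W_ax], axis=1)

a_prev = rng.standard_normal((n_a, 1))   # a^{<t-1>}
x_t = rng.standard_normal((n_x, 1))     # x^{<t>}

# The two forms compute the same pre-activation:
z1 = W_aa @ a_prev + W_ax @ x_t + b_a
z2 = W_a @ np.concatenate([a_prev, x_t], axis=0) + b_a
assert np.allclose(z1, z2)

a_t = np.tanh(z1)                        # a^{<t>} = g(...)
```

The concatenated form is just a notational convenience: stacking the inputs vertically and the weights horizontally collapses the two matrix products into one.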

So W_{aa} and W_{ax} are different for every new input, right?

No, the matrices are shared across all timesteps. You’ll see more about this in the lectures and programming exercises on backpropagation for RNNs and LSTMs.

It is precisely because the internal parameters are shared across all timesteps that the term BPTT (backpropagation through time) exists: the gradients from every timestep are accumulated into the same shared W_{aa}, W_{ax}, and b_a.
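To make the sharing concrete, here is a sketch of a forward pass over several timesteps (sizes and inputs are made up): note that the loop reuses the same W_aa, W_ax, and b_a at every step, while a^{<t>} and x^{<t>} change.

```python
import numpy as np

# Hypothetical sizes: n_a hidden units, n_x input features, T timesteps.
n_a, n_x, T = 4, 3, 5
rng = np.random.default_rng(1)

# ONE parameter set, created once, outside the time loop.
W_aa = rng.standard_normal((n_a, n_a))
W_ax = rng.standard_normal((n_a, n_x))
b_a = np.zeros((n_a, 1))

a = np.zeros((n_a, 1))                          # a^{<0>}
xs = [rng.standard_normal((n_x, 1)) for _ in range(T)]

for x_t in xs:
    # Same W_aa, W_ax, b_a at every timestep; only a and x_t change.
    a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)

print(a.shape)  # (4, 1)
```

If the parameters were different at each timestep, the network could not handle sequences of arbitrary length; sharing them is what lets one RNN cell be "unrolled" over any number of words.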