Why is it important to use the same weights to predict all the words in the sentence?

In the Jazz assignment (Course 5, Week 1), the LSTM layer is shared across the time steps. Why is it important to use the same weights to predict all the words in the sentence?

If you used different weights for each word, the computational cost of training would be huge.

1 Like