Sampling Novel Sequences

Hello Team,

In the RNN shown in the lecture, a<0> and x<1> are initialized to zeros at the start, and the first time step predicts yhat<1>, a probability vector whose length equals the size of the dictionary. So we need to select the word with the highest probability, which becomes the next word.
The model will give bad results at the start of training but will improve eventually. Does this make sense? And how does sampling from the output distribution help, when a word selected at random could be one with a low probability of being the next word?

In this example, we use random sampling of the output in order to get a wider variety of outputs.
If you only ever take the output with the highest probability, you get the same single prediction every time, and that's not very interesting if you're trying to generate a lot of possible names.
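
To make that concrete, here is a minimal sketch (with a made-up 5-word vocabulary and hypothetical probabilities, not a trained model) contrasting greedy argmax decoding with sampling from the softmax output:

```python
import numpy as np

# Made-up 5-word vocabulary and a hypothetical softmax output y_hat
# from one RNN time step (these numbers are illustrative, not trained).
vocab = ["harry", "potter", "arora", "kane", "<EOS>"]
y_hat = np.array([0.05, 0.55, 0.15, 0.20, 0.05])

# Greedy decoding: the same distribution always yields the same word.
print("greedy:", vocab[np.argmax(y_hat)])  # always "potter"

# Sampling: words are drawn in proportion to their probability,
# so rerunning produces varied (but still plausible) outputs.
for _ in range(5):
    idx = np.random.choice(len(vocab), p=y_hat)
    print("sampled:", vocab[idx])
```

Rerunning the sampling loop gives different sequences each time; rerunning the argmax line always gives the same one.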

The point I wanted to mention was: if the first sampled word yhat<1> were "harry", then the next word should be "potter" to be the best fit. We don't want "harry arora" or "harry kane", right? Suppose there are only 3 words in the entire dictionary.

Whether "potter" is the best fit depends entirely on what your training data looks like, right? Does it contain any other last names paired with "harry"? The point is that the dictionary is not all that matters: that's just your vocabulary. The training corpus tells you how that vocabulary is combined to form sentences, and that's what is used to train the weights in the RNN cell.
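
As an illustration of "the corpus decides", here is a toy example with made-up sentences; the bigram counts it induces are a crude stand-in for the conditional distribution the RNN's weights actually learn:

```python
from collections import Counter, defaultdict

# Tiny made-up corpus: the counts here, not the dictionary, decide
# which last name "fits" after "harry".
corpus = ["harry potter", "harry potter", "harry kane",
          "harry potter", "harry arora"]

# Count how often each word follows each other word (bigram counts).
next_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for w, nxt in zip(words, words[1:]):
        next_counts[w][nxt] += 1

total = sum(next_counts["harry"].values())
for word, c in next_counts["harry"].most_common():
    print(f"P({word} | harry) = {c / total:.2f}")
# -> potter 0.60, kane 0.20, arora 0.20
```

With a different corpus, "kane" could just as easily dominate; the vocabulary is identical in both cases.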

So the solution you provided is not in the context of next-word prediction, because next-word prediction works by taking the highest probability. Is that right?

Sorry, I think we are talking at cross purposes here. Yes, this is all based on selecting high-probability next words: that's what the model is trained to do. But Tom's earlier point is that when you use the trained model to do the sampling, another level of randomness is introduced by drawing the next word at random according to the output probabilities, so that high-probability words are the most likely to be chosen but we don't get the same result if we rerun with the same inputs. The model doesn't just always select the single highest-probability next word (or next letter, in the case of the dinosaur names).
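
For reference, here is a sketch of what that sampling loop looks like, in the spirit of the dinosaur-names exercise. The weights (Wax, Waa, Wya, b, by, following the course's notation) are random stand-ins here, not a trained model; in the actual assignment they come from training on the name corpus:

```python
import numpy as np

# Character-level sampling loop sketch. The weights are random
# stand-ins, NOT a trained model.
vocab = list("abcdefghijklmnopqrstuvwxyz") + ["\n"]  # "\n" plays the role of <EOS>
V, H = len(vocab), 50
rng = np.random.default_rng(0)
Wax = rng.standard_normal((H, V)) * 0.01
Waa = rng.standard_normal((H, H)) * 0.01
Wya = rng.standard_normal((V, H)) * 0.01
b, by = np.zeros((H, 1)), np.zeros((V, 1))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

a = np.zeros((H, 1))   # a<0> = zeros, as in the lecture
x = np.zeros((V, 1))   # x<1> = zeros, as in the lecture
chars = []
for _ in range(30):                             # cap the name length
    a = np.tanh(Wax @ x + Waa @ a + b)          # hidden state update
    y_hat = softmax(Wya @ a + by)               # distribution over characters
    idx = np.random.choice(V, p=y_hat.ravel())  # sample, don't argmax
    if vocab[idx] == "\n":                      # sampled end-of-name
        break
    chars.append(vocab[idx])
    x = np.zeros((V, 1))
    x[idx] = 1                                  # sampled char is the next input
print("".join(chars))
```

The key line is the `np.random.choice(..., p=...)` call: it samples according to the model's probabilities rather than always taking the argmax, which is exactly why rerunning generates different names.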

Got it. Thanks @paulinpaloalto