Hi, I haven’t been able to figure out the first programming assignment of week 1. It says that the mini-batch is composed of 20 examples, that n_x = 5000 (vocabulary size), and that the total number of time steps is 10. What do these 20 training examples mean? Are they 20 words or 20 sentences? I am confused because it says that for each time step t, you’ll use a 2D slice of shape (5000, 20).
The shape of the mini-batch is (n_x, m, T_x). This means:
- Each entry in the mini-batch is made of 10 timesteps, i.e. 10 words, where each word is one-hot encoded over a vocabulary of size 5000
- The mini-batch is made of 20 such entries (see the sketch below).
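Here is a minimal NumPy sketch of that layout and of the 2D slice the assignment refers to (the sizes are taken from the assignment description; the array here is just zeros for illustration):

```python
import numpy as np

n_x, m, T_x = 5000, 20, 10       # vocabulary size, batch size, number of timesteps
x = np.zeros((n_x, m, T_x))      # the whole mini-batch

t = 3                            # any timestep index
xt = x[:, :, t]                  # the 2D slice fed to the RNN cell at timestep t
print(xt.shape)                  # (5000, 20)
```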
Thinking in terms of TensorFlow may make this easier, since the batch dimension comes first there. In TensorFlow, such a problem is represented as (BATCH_SIZE, NUM_TIMESTEPS, NUM_FEATURES_PER_TIMESTEP)
i.e. (20, 10, 5000)
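For comparison, converting between the course layout and a batch-first layout is just a transpose; a small sketch, again using a zero array for illustration:

```python
import numpy as np

x = np.zeros((5000, 20, 10))           # course layout: (n_x, m, T_x)
x_tf = np.transpose(x, (1, 2, 0))      # batch-first layout: (m, T_x, n_x)
print(x_tf.shape)                      # (20, 10, 5000)
```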
There are 20 training examples, meaning 20 such sentences. In each time step, there is 1 word, and each word is one-hot encoded over the 5000-word vocabulary. So the shape of a time step (x_t) is (20, 5000). Isn’t that wrong?
Shouldn’t the shape of x_t be something like (200, 5000), since across the 20 training examples we will have around 200 words, and each word is a 5000-dimensional one-hot vector?
If you want to think in terms of a language, think of a batch as 20 sentences, where each sentence has 10 words and each word is one-hot encoded based on a vocabulary of size 5000.
x_t refers to the input to an RNN cell at a specific point in time. Remember that the same RNN cell is used to process the entire sentence (i.e. 10 timesteps), one timestep at a time, as the sketch below shows. Since the batch is in the last dimension, the shape of x_t given in the assignment is correct.
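A minimal sketch of "the same cell, one timestep at a time" (the weight names Wax, Waa, ba follow the course notation; the hidden-state size n_a and the random initialization are placeholders chosen only for illustration, not the assignment's exact helper):

```python
import numpy as np

n_x, n_a, m, T_x = 5000, 100, 20, 10        # n_a is a hidden-state size chosen for illustration
x = np.zeros((n_x, m, T_x))                 # mini-batch of inputs
a = np.zeros((n_a, m))                      # initial hidden state

# A single set of weights, reused at every timestep
Wax = np.random.randn(n_a, n_x) * 0.01
Waa = np.random.randn(n_a, n_a) * 0.01
ba = np.zeros((n_a, 1))

for t in range(T_x):
    xt = x[:, :, t]                         # (5000, 20): one word per example at timestep t
    a = np.tanh(Wax @ xt + Waa @ a + ba)    # same cell (same weights) applied at every t
```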
I’m sorry but 200 doesn’t make sense to me. Can you please elaborate?
Would you please elaborate on the shape of x_t?
As there are 10 timesteps for each example, isn’t the dimension (10, 5000) for that example, since each word is encoded over a 5000-word vocabulary?
In ML/DL, we process a batch of data at a time. So the batch dimension (here, the last one) can be set aside while reasoning about the rest of the dimensions.
Consider a single sentence of 10 words where the vocabulary has size 5000. x_t refers to the input to the RNN cell at timestep t (see this lecture as well).
Each timestep consists of one word in one-hot encoded form as the input to the RNN cell, so we input a 5000-dimensional vector.
Bringing the batch dimension back into the picture, we now have (5000, 20).
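Put differently, one timestep of the batch is just 20 one-hot columns stacked side by side. A small sketch (the word indices here are made up):

```python
import numpy as np

vocab_size, m = 5000, 20
word_ids = np.random.randint(0, vocab_size, size=m)   # one (made-up) word index per example

xt = np.zeros((vocab_size, m))
xt[word_ids, np.arange(m)] = 1                        # column j is the one-hot vector for example j
print(xt.shape)                                       # (5000, 20)
```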
Hi there. This is a totally different question. From working through the dinosaur_character_level_language_model exercise, I think that in the sequence generation task the RNN is set up so that the initial activation and the input are both zero vectors, and then we progress by feeding the softmax activation (the character with high probability) into the next input, giving the impression that the sequence is generated without actually providing an input. Is that true?
Your understanding is correct. a_prev and x start out as zeros, and the output of each timestep is carried into the next timestep when generating a sequence of characters.
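A minimal NumPy sketch of that sampling loop (vocabulary size 27 as in the dinosaur assignment; the parameter names follow the course notation, but the random values and the hidden-state size here are placeholders, and the next character is drawn from the softmax distribution rather than always taking the most probable one):

```python
import numpy as np

vocab_size, n_a = 27, 50                   # 26 letters plus newline; n_a chosen for illustration
Wax = np.random.randn(n_a, vocab_size) * 0.01
Waa = np.random.randn(n_a, n_a) * 0.01
Wya = np.random.randn(vocab_size, n_a) * 0.01
ba, by = np.zeros((n_a, 1)), np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))              # first input is the zero vector
a_prev = np.zeros((n_a, 1))                # initial activation is also the zero vector

sampled = []
for _ in range(10):
    a_prev = np.tanh(Wax @ x + Waa @ a_prev + ba)
    z = Wya @ a_prev + by
    p = np.exp(z) / np.sum(np.exp(z))                  # softmax over the vocabulary
    idx = np.random.choice(vocab_size, p=p.ravel())    # draw the next character
    x = np.zeros((vocab_size, 1))
    x[idx] = 1                                         # feed the sampled character back as the next input
    sampled.append(idx)
```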