Shape of mini-batches of input x

Hi.
In the first code assignment of week 1 in Sequence Models, it says the shape of a batch of input x is (n_x, m, Tx). According to my uploaded photos, this does not make sense to me in terms of NumPy indexing, because I think the first dimension should cover each token of each example, the second dimension should cover the whole sequence of timesteps, and the last dimension should cover all the examples in the batch.



Look at rnn_forward and rnn_cell_forward and it’ll become clear that slicing along the last dimension makes things easy for this assignment.

Different frameworks prefer arranging sequence data differently.
TensorFlow aligns with your abstraction of sequence data, since it expects input data to be of shape (batch size, number of timesteps, number of features per timestep).
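
For what it’s worth, converting between the two layouts is just a transpose (the array names here are only for illustration):

>>> import numpy as np
>>> x = np.zeros((3, 10, 4))           # assignment layout: (n_x, m, T_x)
>>> x_tf = np.transpose(x, (1, 2, 0))  # Keras-style layout: (m, T_x, n_x)
>>> x_tf.shape
(10, 4, 3)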

What matters is to use the same weights to perform forward propagation one timestep at a time and finally to update the RNN layer via backpropagation through time (BPTT).
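
Here is a minimal sketch of that per-timestep loop, assuming the assignment’s layout x: (n_x, m, T_x) and illustrative parameter names Wax, Waa, Wya, ba, by (these follow the course convention, but the exact variables in the graded functions may differ):

import numpy as np

np.random.seed(1)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 4
x = np.random.randn(n_x, m, T_x)        # one mini-batch of inputs
a_prev = np.zeros((n_a, m))             # initial hidden state

Wax = np.random.randn(n_a, n_x)
Waa = np.random.randn(n_a, n_a)
Wya = np.random.randn(n_y, n_a)
ba = np.zeros((n_a, 1))
by = np.zeros((n_y, 1))

y_pred = np.zeros((n_y, m, T_x))
for t in range(T_x):
    xt = x[:, :, t]                                 # (n_x, m): slice along the last axis
    a_prev = np.tanh(Wax @ xt + Waa @ a_prev + ba)  # same weights reused at every timestep
    zt = Wya @ a_prev + by                          # (n_y, m) logits
    y_pred[:, :, t] = np.exp(zt) / np.sum(np.exp(zt), axis=0, keepdims=True)  # softmax over features

Because the features axis comes first, each slice x[:, :, t] is already an (n_x, m) matrix that can be multiplied by Wax directly, which is what rnn_cell_forward works with in the assignment.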

Thanks. I have just one question. When we use this arrangement (n_y, m, Ty) for the prediction y, after the calculation for all timesteps in the mini-batch, if we want to know the first token of the first example in the batch, is this indexing right: y[:, 0, 0]?

For example, say we have a batch with 10 examples, n_y = 2 and T_x = 4. After computing and filling in the whole matrix y, I want to know what the first word of the 5th example is. I show it in the photo. Is that right?


Your understanding of y[:, 0, 0] is correct with respect to the 0th example. Softmax along the features axis will yield the probabilities of the next character. In your picture, the highlighted numbers correspond to the 5th example, since indexing starts at 0.

Here’s an example:

>>> import numpy as np
>>> np.random.seed(1)
>>> ys = np.random.normal(size=(2, 10, 4))
>>> ys
array([[[ 1.62434536, -0.61175641, -0.52817175, -1.07296862],
        [ 0.86540763, -2.3015387 ,  1.74481176, -0.7612069 ],
        [ 0.3190391 , -0.24937038,  1.46210794, -2.06014071],
        [-0.3224172 , -0.38405435,  1.13376944, -1.09989127],
        [-0.17242821, -0.87785842,  0.04221375,  0.58281521],
        [-1.10061918,  1.14472371,  0.90159072,  0.50249434],
        [ 0.90085595, -0.68372786, -0.12289023, -0.93576943],
        [-0.26788808,  0.53035547, -0.69166075, -0.39675353],
        [-0.6871727 , -0.84520564, -0.67124613, -0.0126646 ],
        [-1.11731035,  0.2344157 ,  1.65980218,  0.74204416]],

       [[-0.19183555, -0.88762896, -0.74715829,  1.6924546 ],
        [ 0.05080775, -0.63699565,  0.19091548,  2.10025514],
        [ 0.12015895,  0.61720311,  0.30017032, -0.35224985],
        [-1.1425182 , -0.34934272, -0.20889423,  0.58662319],
        [ 0.83898341,  0.93110208,  0.28558733,  0.88514116],
        [-0.75439794,  1.25286816,  0.51292982, -0.29809284],
        [ 0.48851815, -0.07557171,  1.13162939,  1.51981682],
        [ 2.18557541, -1.39649634, -1.44411381, -0.50446586],
        [ 0.16003707,  0.87616892,  0.31563495, -2.02220122],
        [-0.30620401,  0.82797464,  0.23009474,  0.76201118]]])
>>> softmax = lambda logits: np.exp(logits) / np.sum(np.exp(logits))
>>> first_word_fifth_example = ys[:, 4, 0]
>>> first_word_fifth_example
array([-0.17242821,  0.83898341])
>>> probabilities = softmax(first_word_fifth_example)
>>> probabilities
array([0.26670369, 0.73329631])
>>> np.sum(probabilities)
1.0
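
As a side note, the softmax lambda above normalizes over every entry of its input, which is fine for a 1-D slice like first_word_fifth_example. If you wanted probabilities for the whole (n_y, m, T_y) array at once, one way (the names softmax_over_features and probs are just illustrative) is to normalize along the features axis:

>>> softmax_over_features = lambda logits: np.exp(logits) / np.sum(np.exp(logits), axis=0, keepdims=True)
>>> probs = softmax_over_features(ys)   # shape (2, 10, 4), normalized along axis 0
>>> probs[:, 4, 0]                      # same numbers as above for the 5th example, 1st timestep
array([0.26670369, 0.73329631])
>>> np.allclose(np.sum(probs, axis=0), 1.0)
True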

This is driving me crazy too. The structure is not intuitive or aligned with the lecture (which loosely drilled down from batch → sample → Tx), and I don’t see any explanation of what it is or why it was chosen. I want to know this stuff top-down. Anyone know this well enough to help a confused old dude :)?

The challenge is that it’s really difficult to convey a 3-dimensional dataset on a flat 2D monitor.

Hey @TMosh. Nice to hear from you again. Yeah, I get it; I have several pages of pencil doodles trying to embed this in me lil brain. @mahdi_khoshmaramzade’s pics are pretty good and helped convince me that what was stated is what was intended. But the layout seems counter-intuitive and I don’t know WHY it is crafted this way. I’ll poke through the exercise and see if it clarifies things. Thanks, Tom.