Shape of mini-batches of input x

Hi.
In the first code assignment of week 1 in Sequence Models, it says the shape of a batch of input x is (n_x, m, Tx). According to my uploaded photos, this does not make sense to me in terms of NumPy indexing, because I think the first dimension should cover each token of each example, the second dimension should cover the whole sequence of timesteps, and the last dimension should cover all the examples in the batch.



Look at rnn_forward and rnn_cell_forward and it’ll become clear that slicing along the last dimension makes things easy for this assignment.

Different frameworks prefer arranging sequence data differently.
TensorFlow aligns with your abstraction of sequence data, since it expects input data to be of shape (batch size, number of timesteps, number of features per timestep).
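
For what it’s worth, converting between the two layouts is just a transpose (the array names here are only for illustration):

>>> import numpy as np
>>> x = np.zeros((3, 10, 4))           # assignment layout: (n_x, m, T_x)
>>> x_tf = np.transpose(x, (1, 2, 0))  # Keras-style layout: (m, T_x, n_x)
>>> x_tf.shape
(10, 4, 3)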

What matters is to use the same weights to perform forward propagation one timestep at a time and finally to update the RNN layer via backpropagation through time (BPTT).
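
Here is a minimal sketch of that per-timestep loop, assuming the assignment’s layout x: (n_x, m, T_x) and illustrative parameter names Wax, Waa, Wya, ba, by (these follow the course convention, but the exact variables in the graded functions may differ):

import numpy as np

np.random.seed(1)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 4
x = np.random.randn(n_x, m, T_x)        # one mini-batch of inputs
a_prev = np.zeros((n_a, m))             # initial hidden state

Wax = np.random.randn(n_a, n_x)
Waa = np.random.randn(n_a, n_a)
Wya = np.random.randn(n_y, n_a)
ba = np.zeros((n_a, 1))
by = np.zeros((n_y, 1))

y_pred = np.zeros((n_y, m, T_x))
for t in range(T_x):
    xt = x[:, :, t]                                 # (n_x, m): slice along the last axis
    a_prev = np.tanh(Wax @ xt + Waa @ a_prev + ba)  # same weights reused at every timestep
    zt = Wya @ a_prev + by                          # (n_y, m) logits
    y_pred[:, :, t] = np.exp(zt) / np.sum(np.exp(zt), axis=0, keepdims=True)  # softmax over features

Because the features axis comes first, each slice x[:, :, t] is already an (n_x, m) matrix that can be multiplied by Wax directly, which is what rnn_cell_forward works with in the assignment.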

Thanks. I have just one question. When we use this arrangement (n_y, m, Ty) for the prediction y, after the calculation for all timesteps in the mini-batch, if we want to know the first token of the first example in the batch, is this indexing right: y[:, 0, 0]?

For example, say we have a batch with 10 examples, n_y = 2 and T_x = 4. After computing and filling in the whole matrix y, I want to know what the first word of the 5th example is. I show it in the photo. Is that right?


Your understanding of y[:, 0, 0] is correct with respect to the 0th example. Softmax along the features axis will yield the probabilities of the next character. In your picture, the highlighted numbers correspond to the 5th example, since indexing starts at 0.

Here’s an example:

>>> import numpy as np
>>> np.random.seed(1)
>>> ys = np.random.normal(size=(2, 10, 4))
>>> ys
array([[[ 1.62434536, -0.61175641, -0.52817175, -1.07296862],
        [ 0.86540763, -2.3015387 ,  1.74481176, -0.7612069 ],
        [ 0.3190391 , -0.24937038,  1.46210794, -2.06014071],
        [-0.3224172 , -0.38405435,  1.13376944, -1.09989127],
        [-0.17242821, -0.87785842,  0.04221375,  0.58281521],
        [-1.10061918,  1.14472371,  0.90159072,  0.50249434],
        [ 0.90085595, -0.68372786, -0.12289023, -0.93576943],
        [-0.26788808,  0.53035547, -0.69166075, -0.39675353],
        [-0.6871727 , -0.84520564, -0.67124613, -0.0126646 ],
        [-1.11731035,  0.2344157 ,  1.65980218,  0.74204416]],

       [[-0.19183555, -0.88762896, -0.74715829,  1.6924546 ],
        [ 0.05080775, -0.63699565,  0.19091548,  2.10025514],
        [ 0.12015895,  0.61720311,  0.30017032, -0.35224985],
        [-1.1425182 , -0.34934272, -0.20889423,  0.58662319],
        [ 0.83898341,  0.93110208,  0.28558733,  0.88514116],
        [-0.75439794,  1.25286816,  0.51292982, -0.29809284],
        [ 0.48851815, -0.07557171,  1.13162939,  1.51981682],
        [ 2.18557541, -1.39649634, -1.44411381, -0.50446586],
        [ 0.16003707,  0.87616892,  0.31563495, -2.02220122],
        [-0.30620401,  0.82797464,  0.23009474,  0.76201118]]])
>>> softmax = lambda logits: np.exp(logits) / np.sum(np.exp(logits))
>>> first_word_fifth_example = ys[:, 4, 0]
>>> first_word_fifth_example
array([-0.17242821,  0.83898341])
>>> probabilities = softmax(first_word_fifth_example)
>>> probabilities
array([0.26670369, 0.73329631])
>>> np.sum(probabilities)
1.0
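
As a side note, the softmax lambda above normalizes over every entry of its input, which is fine for a 1-D slice like first_word_fifth_example. If you wanted probabilities for the whole (n_y, m, T_y) array at once, one way (the names softmax_over_features and probs are just illustrative) is to normalize along the features axis:

>>> softmax_over_features = lambda logits: np.exp(logits) / np.sum(np.exp(logits), axis=0, keepdims=True)
>>> probs = softmax_over_features(ys)   # shape (2, 10, 4), normalized along axis 0
>>> probs[:, 4, 0]                      # same numbers as above for the 5th example, 1st timestep
array([0.26670369, 0.73329631])
>>> np.allclose(np.sum(probs, axis=0), 1.0)
True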

This is driving me crazy too. The structure is not intuitive or aligned with the lecture (which loosely drilled down from batch → sample → Tx), and I don’t see any explanation of what it is or why it was chosen. I want to know this stuff top-down. Anyone know this well enough to help a confused old dude :)?

The challenge is that it’s really difficult to convey a 3-dimensional dataset on a flat 2D monitor.

Hey @TMosh. Nice to hear from you again. Yeah, I get it; I have several pages of pencil doodles trying to embed this in me lil brain. @mahdi_khoshmaramzade’s pics are pretty good and helped convince me that what was stated is what was intended. But the layout seems counter-intuitive and I don’t know WHY it is crafted this way. I’ll poke through the exercise and see if it clarifies things. Thanks, Tom.