I had a doubt about why we represent the input as (nx, m, Tx) and the output as (ny, m, Ty), and why nx and ny can be different. I am confused about what advantage we get by representing it like this instead of (m, nx, Tx). Also, if I want the feature vector of a word at a particular time step, how do I extract it?
The (nx, m, Tx) and (ny, m, Ty) conventions are common in RNN implementations because they align well with how data is processed during both the forward and backward passes. Keeping m (the batch size) as the middle dimension allows efficient memory access during computation and makes it easy to grab a specific time step or batch; for example, to get the features of the first word over the entire batch, you just slice along the time dimension.

Having nx and ny as separate dimensions gives flexibility when the input and output feature sizes differ, for instance in machine translation, where the input and output vocabulary sizes may not match.

With (m, nx, Tx), the batch size comes first, which is common in fully connected networks but less so in RNNs, because of the need to handle sequences efficiently. A drawback of (m, nx, Tx) is that accessing the features at a given time step over all examples becomes less intuitive, and it may not fit as well with the computational optimizations of many deep learning libraries.

To extract the feature vectors of the words at a given time step t, you would typically do xt = x[:, :, t], which gives you a matrix of shape (nx, m). Here, xt contains the feature vectors for all words at time step t across all examples in the mini-batch.
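A minimal NumPy sketch of that slicing, with illustrative sizes I picked for the example (nx, m, Tx are not from any specific assignment):

```python
import numpy as np

# Hypothetical sizes: nx = feature size (e.g. one-hot vocab), m = batch size, Tx = sequence length
nx, m, Tx = 5000, 32, 10
x = np.random.randn(nx, m, Tx)

# Feature vectors of all words at time step t, across the whole mini-batch
t = 3
xt = x[:, :, t]
print(xt.shape)  # (5000, 32), i.e. (nx, m)
```

The same slice in the (m, nx, Tx) layout would be x[:, :, t] as well, but slicing out "all features for one example" or feeding the (nx, m) matrix directly into a matrix multiply with the weight matrices is what the (nx, m, Tx) convention makes direct.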
Sorry to interrupt, but (batch_size, timesteps, features) is in wide usage now. I'm guessing the underlying library has an influence on the input format; please see this link for how the oneDNN library encouraged PyTorch to support the channels-last approach.
Thanks for bringing this up! The choice of input format depends on the framework and the underlying hardware optimizations. You’re right, in PyTorch, the (batch_size, timesteps, features) format is widely used.
What will the size of the vector n_a be, and does it depend on any parameters of the RNN, for example the size of the vocab, or is it the designer's choice? Please clarify.
na is considered a hyperparameter of the RNN and is typically chosen by experimentation or using heuristics based on similar tasks. It does not directly depend on the other parameters. While nx and ny are determined by the problem at hand (e.g., vocabulary size, embedding size), na is chosen independently based on how much “memory” or “context” the RNN needs to keep to perform well on the task.
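To make the dependence concrete, here is a sketch of one RNN forward step in NumPy. The sizes are illustrative choices of mine, not from the course: na is picked freely, while nx and ny would be fixed by the task.

```python
import numpy as np

na, nx, ny, m = 64, 5000, 5000, 32  # na is a free hyperparameter; nx, ny come from the task

# Parameter shapes show where each dimension enters
Waa = np.random.randn(na, na)   # hidden-to-hidden: depends only on na
Wax = np.random.randn(na, nx)   # input-to-hidden: couples na with nx
Wya = np.random.randn(ny, na)   # hidden-to-output: couples ny with na
ba = np.zeros((na, 1))
by = np.zeros((ny, 1))

a_prev = np.zeros((na, m))      # previous hidden state
xt = np.random.randn(nx, m)     # input at one time step

# a<t> = tanh(Waa a<t-1> + Wax x<t> + ba), y<t> = Wya a<t> + by
a = np.tanh(Waa @ a_prev + Wax @ xt + ba)
y = Wya @ a + by
print(a.shape, y.shape)  # (64, 32) (5000, 32)
```

Notice that changing na only resizes the weight matrices; the input and output shapes stay (nx, m) and (ny, m), which is why na can be tuned independently.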
The first link I shared is from the TensorFlow docs. Both platforms support the channels-last format.
Thanks for pointing this out! TensorFlow and PyTorch often prefer the (batch_size, timesteps, features) format, mainly due to the performance benefits of optimizations such as those provided by the oneDNN library.