Regarding the description of the lab in the intro, is the schematic (included below) correct? Shouldn't the input to the Dense layer before the LogSoftmax be the predicted ŷ instead of the hidden state h_t as depicted, since we are trying to predict the next character at each step?
In that regard, no (though it depends on terminology): an RNN “layer” (GRU included) usually returns the hidden state after each step. So a more accurate schema of what happens would be:
Also note that x_0, x_1, …, x_t here are vectors (after embedding), not scalars.
In this assignment the model outputs a prediction at every position, but we take just the one we need.
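For reference, here is a minimal sketch of how such a model can be assembled in Trax. The layer names and sizes match the trace below, but the exact assignment code may differ:

import numpy as np
from trax import layers as tl, shapes

# Character-level LM: 256 byte values, 512-dim embeddings and GRU states.
model = tl.Serial(
    tl.ShiftRight(1),                             # feed the previous character
    tl.Embedding(vocab_size=256, d_feature=512),  # Embedding_256_512
    tl.GRU(n_units=512),                          # GRU_512
    tl.GRU(n_units=512),                          # GRU_512
    tl.Dense(256),                                # Dense_256: one logit per byte
    tl.LogSoftmax(),                              # log-probabilities per position
)
model.init(shapes.ShapeDtype((1, 33), dtype=np.int32))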
Here is an example of predicting the next character when the input is ‘t’:
====================================================================================
# starting with letter 't', predict up to 32 characters
#print(predict(32, "t"))
# the input 't' is changed to integer
# inp:
# [116]
# input is changed to array
DeviceArray([116, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
# added batch dimension
DeviceArray([[116, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
#inputs.shape
#(1, 33)
# this batched array is the `inputs` fed to the model
====================================================================================
# ShiftRight layer
Serial[
ShiftRight(1)
]
# outputs
DeviceArray([[ 0, 116, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
# outputs.shape
(1, 33)
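In isolation, ShiftRight just prepends a padding token and drops the last position, so the model at position i only sees characters 0..i-1. A minimal sketch, assuming Trax's tl.ShiftRight in its default 'train' mode:

import numpy as np
from trax import layers as tl, shapes

x = np.array([[116] + [0] * 32], dtype=np.int32)  # 't' plus padding, shape (1, 33)
shift = tl.ShiftRight(1)
shift.init(shapes.signature(x))
print(shift(x))  # [[0, 116, 0, ..., 0]] -- everything moved one step right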
====================================================================================
# Embedding layer
Embedding_256_512
# layer weights shape
(256, 512)
# layer outputs
DeviceArray([[[ 0.04724933, -0.02226537, -0.07821855, ..., 0.08575993,
-0.00808371, 0.02202921],
[ 0.10795548, -0.03526021, -0.08854008, ..., 0.01529627,
0.0868175 , 0.09092615],
[ 0.04724933, -0.02226537, -0.07821855, ..., 0.08575993,
-0.00808371, 0.02202921],
...,
[ 0.04724933, -0.02226537, -0.07821855, ..., 0.08575993,
-0.00808371, 0.02202921],
[ 0.04724933, -0.02226537, -0.07821855, ..., 0.08575993,
-0.00808371, 0.02202921],
[ 0.04724933, -0.02226537, -0.07821855, ..., 0.08575993,
-0.00808371, 0.02202921]]], dtype=float32)
# note: all `0` tokens get the same embedding row
# outputs.shape
(1, 33, 512)
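The weight shape (256, 512) means one 512-dimensional row per byte value, and the layer is a pure table lookup, which is why every `0` in the input maps to an identical row. A minimal sketch, assuming tl.Embedding:

import numpy as np
from trax import layers as tl, shapes

x = np.array([[0, 116] + [0] * 31], dtype=np.int32)  # the ShiftRight output
emb = tl.Embedding(vocab_size=256, d_feature=512)
emb.init(shapes.signature(x))
e = emb(x)                            # shape (1, 33, 512)
print(np.allclose(e[0, 0], e[0, 2]))  # True: both positions hold token `0`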
====================================================================================
# first GRU layer
GRU_512
# layer outputs
DeviceArray([[[ 0.20887429, 0.42045066, 0.24031094, ..., -0.25590184,
-0.20555982, -0.26257256],
[ 0.6903972 , 0.563846 , 0.6010454 , ..., -0.9051461 ,
-0.506419 , -0.46426502],
[ 0.15103428, 0.814925 , 0.2067267 , ..., -0.29996556,
-0.49159542, -0.5896287 ],
...,
[ 0.73863196, 0.6200977 , 0.63184285, ..., -0.9044281 ,
-0.08520836, -0.6938022 ],
[ 0.73863304, 0.62009877, 0.63184524, ..., -0.9044295 ,
-0.0852062 , -0.69380426],
[ 0.7386341 , 0.62009937, 0.63184786, ..., -0.9044308 ,
-0.0852043 , -0.6938063 ]]], dtype=float32)
# note: the `0` positions now have slightly different values, since the recurrent state keeps evolving
# outputs.shape
(1, 33, 512)
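This is the point made at the top of the thread: the GRU layer returns one hidden state per input position, not just the final state, and the padding positions drift toward similar but not identical values because the state keeps updating. A minimal sketch with stand-in inputs:

import numpy as np
from trax import layers as tl, shapes

e = np.random.randn(1, 33, 512).astype(np.float32)  # stand-in for the embeddings
gru = tl.GRU(n_units=512)
gru.init(shapes.signature(e))
h = gru(e)
print(h.shape)  # (1, 33, 512): the full sequence of hidden states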
======================================================================================
# second GRU layer
GRU_512
# outputs
DeviceArray([[[-0.02699553, 0.68844396, -0.45988116, ..., -0.64575034,
0.99919385, -0.9338472 ],
[-0.13541652, -0.9974171 , -0.46418995, ..., -0.65407974,
0.9975511 , -0.9740795 ],
[-0.62881285, -0.4357384 , -0.4716416 , ..., -0.672858 ,
0.9999947 , -0.9998259 ],
...,
[-1. , 0.87456065, -0.5058002 , ..., -0.7197592 ,
0.9775919 , -0.99944305],
[-1. , 0.87489724, -0.5069449 , ..., -0.72155267,
0.9775692 , -0.99944377],
[-1. , 0.8752117 , -0.50808483, ..., -0.7233412 ,
0.9775478 , -0.99944407]]], dtype=float32)
# outputs.shape
(1, 33, 512)
======================================================================================
# Linear layer
Dense_256
# outputs
DeviceArray([[[-4.624827 , -2.8908231 , -5.1104207 , ..., -4.9862876 ,
-4.7495956 , -5.9244623 ],
[-3.6607375 , -0.39768016, -4.6184626 , ..., -5.850813 ,
-4.533198 , -5.199797 ],
[-6.8010573 , -2.9996667 , -8.04726 , ..., -6.9786253 ,
-6.5035443 , -8.284581 ],
...,
[-8.779667 , -1.2761917 , -9.385116 , ..., -7.8552837 ,
-8.279278 , -9.4966755 ],
[-8.781777 , -1.2657752 , -9.386472 , ..., -7.8565826 ,
-8.281515 , -9.496905 ],
[-8.783843 , -1.2556825 , -9.387917 , ..., -7.8577847 ,
-8.283754 , -9.497269 ]]], dtype=float32)
# outputs.shape
(1, 33, 256)
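Dense is applied to the last axis only: each 512-dimensional hidden state is mapped independently to 256 logits, one per byte value. This also answers the original question: the Dense layer consumes the hidden state h_t, and its (log-softmaxed) output is the prediction ŷ. A minimal sketch:

import numpy as np
from trax import layers as tl, shapes

h = np.random.randn(1, 33, 512).astype(np.float32)  # stand-in for GRU states
dense = tl.Dense(256)
dense.init(shapes.signature(h))
logits = dense(h)
print(logits.shape)  # (1, 33, 256): 256 logits at every position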
======================================================================================
# LogSoftmax layer
LogSoftmax
# outputs
DeviceArray([[[-13.580183, -11.84618 , -14.065777, ..., -13.941645,
-13.704952, -14.879819],
[-15.388636, -12.125579, -16.346361, ..., -17.578712,
-16.261097, -16.927696],
[-17.529985, -13.728596, -18.776188, ..., -17.707554,
-17.232473, -19.01351 ],
...,
[-19.89362 , -12.390145, -20.49907 , ..., -18.969238,
-19.39323 , -20.61063 ],
[-19.894333, -12.378332, -20.499027, ..., -18.96914 ,
-19.394072, -20.609463],
[-19.895145, -12.366985, -20.49922 , ..., -18.969088,
-19.395058, -20.608572]]], dtype=float32)
# outputs.shape
(1, 33, 256)
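LogSoftmax only normalizes along the last axis, so each position now holds valid log-probabilities: exponentiating and summing gives 1. A quick check:

import numpy as np
from trax import layers as tl, shapes

z = np.random.randn(1, 33, 256).astype(np.float32)  # stand-in for the logits
log_softmax = tl.LogSoftmax()
log_softmax.init(shapes.signature(z))
log_probs = log_softmax(z)
print(np.exp(log_probs).sum(axis=-1))  # ~1.0 at every position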
======================================================================================
# make a prediction from the output log-probs plus scaled noise g
# np.argmax(log_probs + g * temperature, axis=-1)
# next_char
DeviceArray(111, dtype=int32)
# add prediction to current input
# inp += [int(next_char)]
# inp
[116, 111]
# which translates to 'o'
# result.append(chr(int(next_char)))
# result
['t', 'o']
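The `g` in the line above is Gumbel noise, so this is the Gumbel-max trick: adding scaled Gumbel noise to the log-probabilities and taking the argmax draws a sample from the (temperature-adjusted) distribution. A minimal sketch of such a sampler; the exact assignment helper may differ:

import numpy as np

def gumbel_sample(log_probs, temperature=1.0):
    """Draw one categorical sample via the Gumbel-max trick."""
    u = np.random.uniform(low=1e-6, high=1.0 - 1e-6, size=log_probs.shape)
    g = -np.log(-np.log(u))  # Gumbel(0, 1) noise
    return np.argmax(log_probs + g * temperature, axis=-1)

With temperature 0 this reduces to greedy argmax; with temperature 1 it samples from the model's distribution. The slice it is applied to is presumably log_probs[0, len(inp)] (position 1 here), since thanks to ShiftRight that output position predicts the character following the prefix.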
======================================================================================
# the next input for the whole sequence becomes
DeviceArray([116, 111, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
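Putting it all together, here is a hedged reconstruction of the decoding loop; `gumbel_sample` and `model` are the sketches from above, and the details may differ from the notebook's actual predict:

import numpy as np

def predict(num_chars, prefix, model, temperature=1.0):
    """Generate num_chars characters after prefix, one at a time."""
    inp = [ord(c) for c in prefix]
    result = [c for c in prefix]
    max_len = len(prefix) + num_chars
    for _ in range(num_chars):
        # Pad to the fixed length and add the batch dimension.
        cur_inp = np.array(inp + [0] * (max_len - len(inp)), dtype=np.int32)
        log_probs = model(cur_inp[None, :])
        # Position len(inp) is where the next character will go.
        next_char = gumbel_sample(log_probs[0, len(inp)], temperature)
        inp.append(int(next_char))
        result.append(chr(int(next_char)))
    return "".join(result)

With trained weights, one pass of this loop reproduces exactly the walkthrough above: pad, batch, run the stack, sample, append, repeat.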
Cheers