Coding concerning stacking GRUs

What’s the difference between the two? I tried to implement the layers using the second way, and I get errors in the assignment where the matrix sizes don’t match at the end, when making predictions from test samples (while training and evaluation work), whereas the first way has no issue.

Hey @roger.lee,
The answer to your query lies in how trax initializes the layer. When you write tl.GRU(n_units=model_dimension), trax initializes this layer with random weights, and each time you initialize it, the layer will have different weights. This holds true even for GRUCell, since the GRU layer is simply a GRUCell wrapped in the Scan combinator, which we discussed in one of the ungraded labs.

For a better understanding, you may have a look at the init_weights_and_state method of the GRUCell class here. I have also attached some code below for your understanding.


n_layers = 2
model_dimension = 100

gru_layer = tl.GRU(n_units=model_dimension)
a = [gru_layer for _ in range(n_layers)]
b = [gru_layer] * n_layers
print(a == b, a[0] == a[1], b[0] == b[1])

gru_layer_1 = tl.GRU(n_units=model_dimension)
gru_layer_2 = tl.GRU(n_units=model_dimension)
print(gru_layer_1 == gru_layer_2)

gru_cell_1 = tl.GRUCell(n_units=model_dimension)
gru_cell_2 = tl.GRUCell(n_units=model_dimension)
print(gru_cell_1 == gru_cell_2)


True True True
False
False

Now, if you take a look at tl.GRU()'s implementation, I don’t suppose it allows us to initialize different layers with the same weights. I tried searching for it online, but to no avail.

Hey @arvyzukai, can you please let us know if there is a way in trax, that allows us to manually set the weights for tl.GRU()? Thanks in advance.



Thank you so much for your clear explanation, I didn’t expect the reply to be so quick!

There is one more follow-up question I have regarding the two implementations.

If I use the first implementation, it produces no error.
However, if I use the second implementation for GRU, I get an error when running UNQ_C6.

Error code as follows:

TypeError: dot_general requires contracting dimensions to have the same shape, got [512] and [1024].

It seems the two implementations are not the same, apart from the differences in weight initialisation.
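As an aside, the mismatch can be reproduced in miniature outside trax. Below is a sketch with a hypothetical ToyDense class (not a trax layer) that, like trax layers, creates its weights lazily to fit the first input it sees; sharing one such object across a stack produces exactly this kind of contraction-dimension error:

```python
import numpy as np

class ToyDense:
    """Hypothetical toy layer (not trax): weights are created lazily on the
    first call, sized to match whatever input the layer first sees."""
    def __init__(self, n_units):
        self.n_units = n_units
        self.w = None

    def __call__(self, x):
        if self.w is None:
            self.w = np.random.randn(x.shape[-1], self.n_units)
        return x @ self.w  # fails if x's last dim != self.w's first dim

stack = [ToyDense(512)] * 2        # ONE layer object, referenced twice
h = stack[0](np.ones((1, 1024)))   # initializes w with shape (1024, 512)
try:
    stack[1](h)                    # h is (1, 512), but the shared w expects 1024
except ValueError as e:
    print("shape mismatch:", e)

# with distinct layers, each initializes its own weights and the stack works
fresh = [ToyDense(512) for _ in range(2)]
out = fresh[1](fresh[0](np.ones((1, 1024))))
print(out.shape)  # (1, 512)
```

This is only a loose analogy; trax's actual initialization is more involved, but the failure mode (a shared layer whose weights were sized for a different input) is the same.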

Hey @roger.lee,
I believe you have missed an important point, regarding the distinction between the first implementation and the second implementation.

In the first implementation, the two GRU layers in the list are different from each other, whereas in the second implementation there is essentially a single GRU layer, which has been repeated 2 times to fill the list. Hence, in this case, if you pass any input to the first element, the effects on the first element will be passed on to the second element as well, since both elements are, after all, the same GRU layer.

For your reference, I have attached a small piece of code:


lst1 = [{2} for _ in range(2)]
lst2 = [{2}] * 2

print(id(lst1[0]), id(lst1[1]))
print(id(lst2[0]), id(lst2[1]))


140278346922912 140278345848640
140278345850656 140278345850656

As you can see, in the first case the addresses of the two elements are different, indicating distinct objects, but in the second case both elements have the same address, indicating that any change made to the first element will be reflected in the second element as well.
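The aliasing can also be seen by mutating the shared element directly (a quick sketch; note that the comprehension builds a fresh set on each iteration, whereas sequence repetition copies the same reference):

```python
# one set object referenced twice: mutating it via one name shows through the other
shared = [{2}] * 2
shared[0].add(3)
assert shared[1] == {2, 3}

# two independently created sets: mutating one leaves the other untouched
distinct = [{2} for _ in range(2)]
distinct[0].add(3)
assert distinct[1] == {2}
```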

I hope this helps.



Hey @Elemento

My apologies for the late response :slight_smile:

  • if you want just a copy, a simple way is to save to a file and load:
# print(embedded.shape)   # (2, 10, 512)

gru_1 = tl.GRU(512)
gru_1.init_weights_and_state(embedded)  # initialize weights with `embedded` to let trax know the input shape
gru_1.save_to_file('test_gru_1.pkl.gz', input_signature=embedded)  # save the weights so they can be loaded below

gru_2 = tl.GRU(512)
gru_2.init_from_file('test_gru_1.pkl.gz', input_signature=embedded)
# note: this initializes the weights as numpy arrays, not DeviceArray
(gru_1(embedded) == gru_2(embedded))  # True
  • if you want to set weights manually for some concrete layer:
gru_3 = tl.GRU(512)
gru_3.weights = gru_1.weights
(gru_1(embedded) == gru_3(embedded))  # True
  • if you want to set the weights manually for some concrete part of the GRU (update, reset etc.), then you need to know the structure of the layer. For example, for GRU:
gru_1_weights = gru_1.weights

# First - weights for the Branch combinator (Select[0,0]_out2, Parallel_in2_out2)
gru_1_weights[0]
# ((), ((), ()))  # len(gru_1_weights[0]) == 2 (an empty tuple, and a tuple of two empty tuples for Parallel)

# Second - GRU cell weights
gru_1_weights[1][0]  # 4 weights for the GRU in trax (len(gru_1_weights[1][0]) == 4)
# input_shape = concatenation of x and h
gru_1_weights[1][0][0]  # w1 (input_shape x 2 * n_units) - update and reset  (1024, 1024)
gru_1_weights[1][0][1]  # b1 (2 * n_units)  (1024,)
gru_1_weights[1][0][2]  # w2 (input_shape x n_units) - candidate  (1024, 512)
gru_1_weights[1][0][3]  # b2 (n_units) - candidate  (512,)

# Third - weights for Select: an empty tuple
gru_1_weights[2]
# ()

For example, to change the “Candidate” weights:

change_cand = np.random.randn(1024, 512)
# since a tuple is immutable, you need to create a new one
new_weights = (((), ((), ())),
               ((gru_1_weights[1][0][0], gru_1_weights[1][0][1], change_cand, gru_1_weights[1][0][3]),),
               ())
gru_3.weights = new_weights
(gru_1(embedded) == gru_3(embedded))  # False (weights changed)
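To make the nested structure above concrete, here is a small self-contained sketch where mock NumPy arrays stand in for the real trax weights (the shapes follow the listing above; `mock_weights` is a hypothetical stand-in, not a real GRU's weights):

```python
import numpy as np

n_units = 512
input_dim = 2 * n_units  # x (512) concatenated with hidden state h (512)

# mock weights mirroring the nested structure described above
w1 = np.zeros((input_dim, 2 * n_units))  # update + reset gates
b1 = np.zeros((2 * n_units,))
w2 = np.zeros((input_dim, n_units))      # candidate
b2 = np.zeros((n_units,))
mock_weights = (((), ((), ())), ((w1, b1, w2, b2),), ())

# swap in new candidate weights; tuples are immutable, so rebuild the nesting
new_w2 = np.ones((input_dim, n_units))
branch, (cell,), select = mock_weights
new_cell = (cell[0], cell[1], new_w2, cell[3])
new_weights = (branch, (new_cell,), select)

assert new_weights[1][0][2].shape == (1024, 512)   # candidate kernel replaced
assert (new_weights[1][0][2] == 1).all()           # ...with the new values
```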


Hey @arvyzukai,
Thanks a lot for the detailed response. I will surely bookmark it for future reference :nerd_face: