Understanding GRU trax code

Iaroslav_Iatsenko · August 31, 2023, 8:44am

Hi, I’m trying to understand the source code for GRU.
Could you please help me understand how different GRU cells are combined?
Since a GRU can handle sequences of any length, the number of GRU cells must not be tied to the input sequence length.
So things change both over time (X_1, then X_2) and across multiple GRU cells. I understand the idea of taking each time step and producing y_hat for that step. But how do multiple GRU cells interact at that moment?
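A minimal NumPy sketch (not the trax source) may make the time axis concrete. It assumes the textbook GRU equations and uses hypothetical helpers `init_params`, `gru_step` and `run_gru`: one set of weights, sized only by `n_units` and the input width, is reused at every time step, so the same layer handles sequences of any length and returns one output per step (here simply the hidden state).

```python
import numpy as np

def init_params(d_in, n_units, seed=0):
    """Hypothetical initializer: one weight set for the whole layer, independent of sequence length."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("z", "r", "h"):  # update gate, reset gate, candidate
        p["W" + g] = rng.normal(scale=0.1, size=(d_in, n_units))     # input -> units
        p["U" + g] = rng.normal(scale=0.1, size=(n_units, n_units))  # previous state -> units
        p["b" + g] = np.zeros(n_units)
    return p

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(p, x_t, h_prev):
    """One time step: (x_t, h_prev) -> h_t, computing all n_units values at once."""
    z = sigmoid(x_t @ p["Wz"] + h_prev @ p["Uz"] + p["bz"])             # update gate
    r = sigmoid(x_t @ p["Wr"] + h_prev @ p["Ur"] + p["br"])             # reset gate
    h_cand = np.tanh(x_t @ p["Wh"] + (r * h_prev) @ p["Uh"] + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand

def run_gru(p, xs):
    """Apply the SAME gru_step to X_1, X_2, ..., X_T, carrying the hidden state forward."""
    n_units = p["bz"].shape[0]
    h = np.zeros(n_units)      # initial state
    outputs = []
    for x_t in xs:             # loop over time; T can be any length
        h = gru_step(p, x_t, h)
        outputs.append(h)      # per-step output is just the new state here
    return np.stack(outputs)   # shape (T, n_units)

params = init_params(d_in=4, n_units=3)
out_short = run_gru(params, np.ones((2, 4)))  # T = 2
out_long = run_gru(params, np.ones((7, 4)))   # T = 7, same params
print(out_short.shape, out_long.shape)  # (2, 3) (7, 3)
```

Because the first two inputs are identical in both calls, `out_short` equals `out_long[:2]`: each step's output depends only on the inputs seen so far, and the weights do not change with T.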
Also, the source code uses cb.Branch, cb.Scan and cb.Select to tie everything together. Could you please describe what each of them does in the GRU?
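Assuming the trax layer is assembled roughly as `Serial(Branch([], zero_state), Scan(GRUCell(n_units), axis=1), Select([0], n_in=2))` (worth checking against the trax version you are reading), its data flow can be mimicked in plain Python. The helpers `branch`, `scan` and `select_first` below are hypothetical stand-ins, not the trax API, and `toy_cell` is a running sum standing in for GRUCell. Branch keeps the input and adds a zero initial state, Scan applies the cell step by step while carrying the state, and Select keeps only the per-step outputs and drops the final state.

```python
def branch(x, make_zero_state):
    """Like Branch([], zero_state): keep the input unchanged and add an initial state."""
    return x, make_zero_state(x)

def scan(cell, xs, state):
    """Like Scan(cell): apply cell(x_t, state) -> (y_t, new_state) at every step."""
    ys = []
    for x_t in xs:
        y_t, state = cell(x_t, state)
        ys.append(y_t)
    return ys, state  # per-step outputs AND the final state

def select_first(stack):
    """Like Select([0], n_in=2): keep element 0 (the outputs), drop the final state."""
    return stack[0]

def toy_cell(x_t, state):
    """Stand-in for GRUCell: the new state is a running sum, and the output equals the new state."""
    new_state = state + x_t
    return new_state, new_state

def toy_layer(xs):
    xs, h0 = branch(xs, lambda _: 0)  # stack: (inputs, zero state)
    stack = scan(toy_cell, xs, h0)    # stack: (outputs, final state)
    return select_first(stack)        # stack: (outputs,)

print(toy_layer([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

Swapping `toy_cell` for a real GRU step changes what is computed at each step, but not how the three combinators move data around.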

Thank you in advance

arvyzukai · September 1, 2023, 10:27am

Hi @Iaroslav_Iatsenko

You might find this thread helpful. TL;DR: there is often confusion between GRU “cells” and the GRU “layer” (dimension / units).

They don’t. Each “cell” produces a result, and many “cells” produce many results, which together form the output of the “layer” (loosely speaking, many dimensions = many cells).
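One way to see that the “cells” are the layer’s width rather than its time steps is to count parameters. This sketch assumes the textbook GRU formulation with one bias vector per block (some libraries add extra bias terms), and `gru_param_count` is a hypothetical helper. Each of the three blocks (update gate, reset gate, candidate) has an input matrix (d_in × n_units), a recurrent matrix (n_units × n_units) and a bias (n_units). The sequence length never appears.

```python
def gru_param_count(d_in, n_units):
    """Parameters of one textbook GRU layer: 3 blocks (update gate, reset gate, candidate),
    each with an input matrix, a recurrent matrix and a bias vector."""
    per_block = d_in * n_units + n_units * n_units + n_units
    return 3 * per_block

# Doubling the number of units ("cells") changes the size of the layer...
print(gru_param_count(d_in=4, n_units=3))  # 3 * (12 + 9 + 3) = 72
print(gru_param_count(d_in=4, n_units=6))  # 3 * (24 + 36 + 6) = 198
# ...while the sequence length is not an argument at all.
```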

Also, this thread might help you understand the underlying calculations.

As for cb.Branch, cb.Scan and cb.Select, there is documentation for them:

In short, these functions are needed to build the “layer” on top of “GRUCell” (that is the code you might want to spend more time on, because it contains the essence of the GRU). This post and this post might also help you with that.
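For reference while reading GRUCell, these are the textbook GRU equations for one time step, with x_t the input and h the hidden state of size n_units. Conventions vary between papers and libraries (some swap the roles of z_t and 1 - z_t in the last line, and some fuse the weight matrices), so treat this as a guide for comparing against the trax code rather than a transcription of it. In a plain GRU the per-step output is usually h_t itself.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{update gate} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{reset gate} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{candidate state} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{new state}
\end{aligned}
```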

Cheers
