Understanding GRU trax code

Hi @Iaroslav_Iatsenko

You might find this thread helpful. TL;DR: people often confuse GRU “cells” with the GRU “layer” (its dimension / number of units).

They don’t. Each “cell” produces one result, and many “cells” produce many results, which together form the output of the “layer” (loosely speaking, many dimensions = many cells).
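
To make the “many dimensions = many cells” picture concrete, here is a minimal sketch of a single GRU time step in plain Python. It is not the Trax code: the helper names (`init_params`, `affine`, `gru_step`), the random weights and the update convention `h = u * c + (1 - u) * h_prev` are my assumptions for illustration, and some libraries swap the roles of `u` and `1 - u`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, inputs):
    """Return one pre-activation per unit: weights[k] . inputs + bias[k]."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_params(n_units, n_features, seed=0):
    """Hypothetical helper: random weights and zero biases for the three gates."""
    rng = random.Random(seed)

    def matrix():
        # One row per unit; each row sees the concatenation [h_prev, x].
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_units + n_features)]
                for _ in range(n_units)]

    return {name: (matrix(), [0.0] * n_units)
            for name in ("update", "reset", "candidate")}


def gru_step(params, h_prev, x):
    """One time step for all units at once; returns the new hidden state."""
    hx = h_prev + x  # list concatenation [h_prev, x]
    u = [sigmoid(z) for z in affine(*params["update"], hx)]
    r = [sigmoid(z) for z in affine(*params["reset"], hx)]
    rhx = [ri * hi for ri, hi in zip(r, h_prev)] + x
    c = [math.tanh(z) for z in affine(*params["candidate"], rhx)]
    # Assumed convention: the update gate u weights the candidate c.
    return [ui * ci + (1.0 - ui) * hi for ui, ci, hi in zip(u, c, h_prev)]


n_units, n_features = 4, 3
params = init_params(n_units, n_features)
h = gru_step(params, [0.0] * n_units, [1.0, -2.0, 0.5])
print(len(h))  # 4: one value per unit
```

Each entry of `h` comes from one row of each weight matrix, so you can read each row as one “cell” in the loose sense above, and the whole list as the output of the “layer” for that time step.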

This thread might also help you understand the underlying calculations.

As for cb.Branch, cb.Scan and cb.Select, there is documentation for them:

In short, these functions are needed to build the “layer” on top of “GRUCell”. GRUCell is the code you might want to spend more time on, because that is where the essence is. This post and this post might also help you with that.
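
If it helps, here is a rough plain-Python analogue of the three roles. The functions `branch`, `scan` and `select` below are my own simplified stand-ins, not the Trax combinators, and `toy_cell` is a hypothetical placeholder for a real recurrent cell, so treat this as a sketch of the idea rather than how Trax implements it.

```python
def branch(x, *fns):
    """Fan-out: feed the same input to every function and return all results."""
    return tuple(fn(x) for fn in fns)


def scan(step, xs, state):
    """Apply `step` to each time step in order, threading the state through."""
    ys = []
    for x in xs:
        y, state = step(x, state)
        ys.append(y)
    return ys, state


def select(items, indices):
    """Keep only the items at the given positions."""
    return tuple(items[i] for i in indices)


def toy_cell(x, state):
    """Hypothetical stand-in for a recurrent cell: a running average."""
    new_state = 0.5 * state + 0.5 * x
    return new_state, new_state  # (output for this step, state for the next)


def toy_layer(xs):
    # 1. Branch: keep the inputs and create a zero initial state next to them.
    inputs, state0 = branch(xs, lambda v: v, lambda v: 0.0)
    # 2. Scan: run the cell over the sequence, one time step at a time.
    outputs, final_state = scan(toy_cell, inputs, state0)
    # 3. Select: keep the per-step outputs and drop the final state.
    (outputs,) = select((outputs, final_state), [0])
    return outputs


print(toy_layer([1.0, 1.0, 1.0]))  # [0.5, 0.75, 0.875]
```

The cell only knows how to do one step. Everything about initial state, looping over time and tidying up the outputs lives in the wrapper, which is the part the combinators take care of.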