In the GRU summary output, I was trying to calculate the number of parameters displayed for each layer. I am not sure how the first layer arrives at 228,864.
Each of the 3 weight matrices has size 256 * (256 + 40).
Each of the 3 bias vectors has size 256.
That only adds up to 3 * 256 * (256 + 40) + 3 * 256 = 228,096.
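The arithmetic above can be reproduced with a short script. This is only a sketch of the count described in the question. It assumes 256 units and 40 input features, read off the sizes above, and one weight matrix plus one bias vector for each of the GRU's three gates (update, reset, candidate).

```python
# Reproduce the parameter count described above.
# Assumption: 256 units, 40 input features, read off the sizes in the question.
units = 256
input_dim = 40
gates = 3  # update gate, reset gate, candidate state

# One (units x (units + input_dim)) weight matrix per gate.
weights = gates * units * (units + input_dim)
# One bias vector of length `units` per gate.
biases = gates * units
total = weights + biases

reported = 228_864  # value shown in the summary for the first layer
missing = reported - total

print(f"weights = {weights:,}")
print(f"biases  = {biases:,}")
print(f"total   = {total:,}")
print(f"missing = {missing:,} ({missing // units} * {units})")
```

The printed gap is 768, which equals exactly one more bias vector of length 256 for each of the three gates.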
Does anyone know why I am short by 768 (which happens to be 3 * 256)?