Actual Purpose of dropout_shared_mask

In the week 3 assignment, I saw a lot of references to dropout_shared_mask in the code.

dropout_shared_axes (int): axes on which to share dropout mask. Defaults to None.

Could anyone provide more context about this?

Hi user342,

As I understand it, the dropout mask is shared along the given axes to make dropout more effective, as discussed in this paper, and sharing can also save memory, as stated in the trax docs here.
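To make the idea concrete, here is a minimal NumPy sketch of what sharing a mask along an axis could look like. It is not the trax implementation: the function name dropout_with_shared_axes and its parameters are made up for illustration, and it assumes the usual "inverted dropout" scaling by 1 / (1 - rate). The mask is sampled with size 1 on each shared axis and then broadcast, so every position along that axis drops the same units, and the mask itself is smaller.

```python
import numpy as np


def dropout_with_shared_axes(x, rate, shared_axes=None, rng=None):
    """Inverted dropout whose mask is shared (broadcast) along shared_axes.

    Hypothetical helper for illustration only, not a trax function.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask_shape = list(x.shape)
    for axis in shared_axes or ():
        # Size 1 here means a single mask value is reused along this axis.
        mask_shape[axis] = 1
    keep_prob = 1.0 - rate
    keep = rng.random(tuple(mask_shape)) < keep_prob
    # Broadcasting stretches the small mask over the shared axes.
    return np.where(keep, x / keep_prob, 0.0)


# Assumed layout: (batch, sequence, features), mask shared over the sequence axis.
x = np.ones((2, 4, 3))
y = dropout_with_shared_axes(x, rate=0.5, shared_axes=[1],
                             rng=np.random.default_rng(0))
```

In this example the mask has shape (2, 1, 3), so only 6 random values are drawn instead of 24, and for each batch element the same features are dropped at every sequence position.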

