Never mind, I have solved this problem. It originated in UNQ_C5, where I mistakenly set d_head to d_feature in both the compute_attention_heads_closure and compute_attention_output_closure functions. It has been resolved.
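For anyone hitting the same issue: the key point is that the reshape inside those closures must use the per-head size d_head, not the full model size d_feature (= n_heads * d_head). Below is a minimal sketch of what the two closures generally do; the function bodies here are my own illustration of the standard head split/merge, not the assignment's exact solution, and the signatures in your notebook may differ.

```python
import numpy as np

def compute_attention_heads_closure(n_heads, d_head):
    """Return a function that splits the feature axis into attention heads.

    The reshape must use d_head (per-head size), not d_feature
    (= n_heads * d_head) -- mixing these up was the bug described above.
    """
    def compute_attention_heads(x):
        batch, seqlen, _ = x.shape
        # (batch, seqlen, n_heads * d_head) -> (batch, seqlen, n_heads, d_head)
        x = np.reshape(x, (batch, seqlen, n_heads, d_head))
        # -> (batch, n_heads, seqlen, d_head) -> (batch * n_heads, seqlen, d_head)
        x = np.transpose(x, (0, 2, 1, 3))
        return np.reshape(x, (-1, seqlen, d_head))
    return compute_attention_heads

def compute_attention_output_closure(n_heads, d_head):
    """Return a function that merges the heads back; again it uses d_head."""
    def compute_attention_output(x):
        total, seqlen, _ = x.shape
        batch = total // n_heads
        # (batch * n_heads, seqlen, d_head) -> (batch, n_heads, seqlen, d_head)
        x = np.reshape(x, (batch, n_heads, seqlen, d_head))
        # -> (batch, seqlen, n_heads, d_head) -> (batch, seqlen, n_heads * d_head)
        x = np.transpose(x, (0, 2, 1, 3))
        return np.reshape(x, (batch, seqlen, n_heads * d_head))
    return compute_attention_output
```

If you pass d_feature where d_head is expected, the reshape either fails with a size mismatch or silently produces tensors with the wrong head dimension, which is why the unit tests for UNQ_C5 complain.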