Question on Fused Kernel and Teacher Forcing (Module 2)

I don't fully follow the parts where Sharon talks about teacher forcing and fused kernels, although I did try asking ChatGPT for the meaning of those two terms:

  1. Teacher forcing - during training, feed the model the ground-truth (true) tokens from the previous steps as input, instead of the model's own previous outputs.
  2. Fused kernel - a single kernel that combines several operations into one pass; e.g., cross-entropy loss often uses a fused kernel that merges the softmax and log-probability steps for efficiency.
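
To check my own understanding of point 1, here's a toy sketch I wrote (the "model" and the numbers are made up by me, not from the lecture). The idea I took away: with teacher forcing, every step is conditioned on the true previous token, so one wrong prediction doesn't poison the next step; without it (free running, like at inference), errors compound.

```python
def toy_model(prev_token):
    # Stand-in "model": predicts prev + 1, but is deliberately wrong
    # when it sees token 2, so we can watch errors compound.
    return 9 if prev_token == 2 else prev_token + 1

ground_truth = [1, 2, 3, 4, 5]

# Teacher forcing: each prediction conditions on the TRUE previous token.
teacher_forced = [toy_model(t) for t in ground_truth[:-1]]

# Free running (inference-style): feed the model's own output back in.
free_running, prev = [], ground_truth[0]
for _ in range(len(ground_truth) - 1):
    prev = toy_model(prev)
    free_running.append(prev)

print("teacher forced:", teacher_forced)   # one isolated mistake
print("free running: ", free_running)      # the mistake snowballs
```

With teacher forcing the single error at step 2 stays isolated; in free-running mode it derails every later step. Am I reading her point correctly?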

I'm a bit confused about where/how Sharon connects those two concepts to the example she walks through in her video. Did anyone catch that?
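
For the fused-kernel part, here's a numeric sketch of what I think the softmax + log-prob fusion means (pure Python, obviously not an actual GPU kernel; the logits are just numbers I picked). Instead of materializing the softmax probabilities and then taking a log, the fused form computes log-softmax directly as `logsumexp(logits) - logits[target]` in one numerically stable pass:

```python
import math

logits = [2.0, -1.0, 0.5, 3.0]
target = 3  # index of the true class

# Unfused: build the full softmax, then take log of the target prob.
exps = [math.exp(z) for z in logits]
softmax = [e / sum(exps) for e in exps]
loss_unfused = -math.log(softmax[target])

# "Fused": one pass, subtract the max for stability, never form softmax.
m = max(logits)
logsumexp = m + math.log(sum(math.exp(z - m) for z in logits))
loss_fused = logsumexp - logits[target]

# Both give the same cross-entropy loss.
assert abs(loss_unfused - loss_fused) < 1e-9
print(loss_fused)
```

If that's the right picture, I still don't see how it ties into her specific example, so any pointers appreciated.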