Hi, I felt this course was a little too brief, but it was still interesting and informative.
To be honest, I have no personal interest in building a video generation model, though I am rather curious about the underlying theory, applications, and methods.
It strikes me that this must be something like "sustained attention", but over a sequence of images, which must be pretty tough.
My thinking is that these methods could be useful for other applications. Could anyone point me to papers or similar resources that explain how video generation models actually pull this off?