There are lots of moving parts here and many ways to go off the rails. The hardest part is keeping the “input space” and the “output space” straight. Here’s a thread which gives a really nice description of the algorithms in words.
The key point is that the loops run over the output space, and we must touch every position in it. The “striding” happens in the input space: the stride only decides where each input window starts. That means all the for loops have a “step” of 1, right?
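To make that concrete, here is a minimal sketch in plain Python, assuming a single channel, no padding, and a square stride. The names `conv2d_single_channel`, `vert_start` and `horiz_start` are just illustrative, not taken from any particular assignment or library. Notice that every `range(...)` has the default step of 1, and the stride only shows up when an output index is converted into an input index.

```python
def conv2d_single_channel(x, w, stride=1):
    """Naive 'valid' cross-correlation of 2-D list x with 2-D filter w."""
    n_h, n_w = len(x), len(x[0])
    f_h, f_w = len(w), len(w[0])

    # Output size with no padding (an assumption of this sketch).
    out_h = (n_h - f_h) // stride + 1
    out_w = (n_w - f_w) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]

    # Loops are over the OUTPUT space, one position at a time (step of 1).
    for i in range(out_h):
        for j in range(out_w):
            # The stride is applied here, in the INPUT space:
            # it picks the top-left corner of the window for output (i, j).
            vert_start = i * stride
            horiz_start = j * stride

            total = 0.0
            for a in range(f_h):
                for b in range(f_w):
                    total += x[vert_start + a][horiz_start + b] * w[a][b]
            out[i][j] = total
    return out


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
ones = [[1, 1],
        [1, 1]]

print(conv2d_single_channel(x, ones, stride=2))
# [[14.0, 22.0], [46.0, 54.0]]
```

With a 4x4 input, a 2x2 filter and a stride of 2, the output is 2x2, so the two outer loops each run twice. The windows start at input rows and columns 0 and 2, which is where the stride takes effect.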