About spatial transformer networks (STN)

I’m reading a paper about STNs, and one point is puzzling me.
The paper talks about grids, and my understanding is that every feature map has a grid of its own. The network takes input feature maps and produces output feature maps.
Is it correct that the output feature map has a regular grid?
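
To make the question concrete, here is a minimal sketch of the two grids as I read them in the paper: a regular grid over the output feature map, and the sampling grid in the input obtained by transforming it. It assumes a 2x3 affine transform and coordinates normalized to [-1, 1]; the function names `regular_grid` and `sampling_grid` are hypothetical and come from neither the paper nor any library.

```python
import numpy as np

def regular_grid(height, width):
    """Homogeneous coordinates (x_t, y_t, 1) of a regular output grid, shape (3, H*W)."""
    ys = np.linspace(-1.0, 1.0, height)   # normalized row coordinates (assumption)
    xs = np.linspace(-1.0, 1.0, width)    # normalized column coordinates (assumption)
    x_t, y_t = np.meshgrid(xs, ys)        # each has shape (H, W)
    ones = np.ones_like(x_t)
    return np.stack([x_t.ravel(), y_t.ravel(), ones.ravel()], axis=0)

def sampling_grid(theta, height, width):
    """Map every output grid point to a source location in the input feature map.

    theta is a 2x3 affine matrix; the result has shape (2, H, W), where
    result[0] holds the x_s coordinates and result[1] the y_s coordinates.
    """
    target = regular_grid(height, width)  # regular grid on the output
    source = theta @ target               # (2, H*W) sampling points in the input
    return source.reshape(2, height, width)

# Identity transform: the sampling grid coincides with the regular output grid.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
grid = sampling_grid(identity, 3, 4)

# Rotation by 30 degrees: the output grid is still regular,
# but the points sampled from the input are no longer axis-aligned.
angle = np.deg2rad(30.0)
rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                     [np.sin(angle),  np.cos(angle), 0.0]])
rotated = sampling_grid(rotation, 3, 4)
```

In this sketch the regular grid always lives on the output side; only the sampling locations in the input change with `theta`. The output values would then be read from the input at those locations, for example by bilinear interpolation, which is left out here.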