Hi there, can someone explain why flattening the last 2 values makes it simpler, and why it works?
After flattening, this step doesn't seem as straightforward as it was before flattening.
TL;DR: I'm having trouble visualising how flattening works in this context.
This section of the notebook is doing a task called image segmentation, and it is doing it at the grid-cell level, so it wants the output to be 19x19. Note that YOLO v2 can predict different classes at the anchor-box level of granularity, but this is harder to visualize. So this section just takes the highest-probability class across all the anchor boxes within each grid cell. The flattening is just a way to reduce the granularity of the output from 19x19x5 to 19x19.
Note that this is only done for the image segmentation task, and isn’t really part of YOLO itself. When you predict bounding box locations further down in the notebook, it’s at the 19x19x5 granularity.
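If it helps to see the shapes, here is a minimal NumPy sketch of the idea. It is not the notebook's code: the array `scores`, its random values, and the choice of 80 classes are assumptions made up for illustration. It shows that taking the max over classes and then over anchor boxes gives the same 19x19 result as flattening the last two axes and doing a single max, which is why the flattened form can be simpler to work with.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class scores (assumption): 19x19 grid, 5 anchor boxes, 80 classes.
scores = rng.random((19, 19, 5, 80))

# Two-step reduction: best class score per anchor box, then best box per cell.
best_per_box = scores.max(axis=-1)          # shape (19, 19, 5)
best_per_cell = best_per_box.max(axis=-1)   # shape (19, 19)

# One-step reduction: flatten the last two axes, then a single max/argmax.
flat = scores.reshape(19, 19, 5 * 80)       # shape (19, 19, 400)
flat_idx = flat.argmax(axis=-1)             # shape (19, 19)

# Recover which anchor box and which class each flat index refers to.
box_idx, class_idx = np.unravel_index(flat_idx, (5, 80))

print(best_per_cell.shape, class_idx.shape)
```

Here `class_idx` is the 19x19 map you would color for the visualization, and `box_idx` tells you which of the 5 anchor boxes that winning class came from.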
Oh, so it's almost like finding the most probable class out of the 5 bounding boxes per grid cell and then displaying that value. Am I right to say so?
Yes. That is this grid-cell-level image segmentation in a nutshell.