I think I don’t fully understand how object detection works… In the course Andrew says that the mid-point of the object is detected. How is that done? Also, if an object is bigger than the grid cell, how does the net know that the pieces in different cells are parts of the same object?
However, I still don’t understand how the “mid-point” of the object is defined… When we train the network, we don’t input such “mid-point”, so how is the network able to determine if the mid-point is in the cell or not?
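The mid-points are assigned when the training set is created. Someone looks at each image, clicks a mouse on what they think is the center of the object, and draws a box around it. So the mid-point is never an input to the network. It is part of the label: the grid cell that contains the mid-point is the one whose target output says "there is an object here", along with the box position and size.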
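To make this concrete, here is a minimal sketch of how one labeled box could be turned into a training target. It assumes the YOLO-style encoding described in the course (mid-point given relative to its grid cell, width and height given in units of one cell). The function name `encode_box`, its parameters, and the 3x3 grid are only for illustration and do not come from any library.

```python
def encode_box(box, img_w, img_h, grid=3):
    """Convert a labeled box in pixels to (row, col, (bx, by, bw, bh)).

    box is (x_min, y_min, x_max, y_max) as drawn by the labeler.
    Assumed encoding: bx, by are the mid-point offsets inside the cell
    (between 0 and 1); bw, bh are measured in cell widths/heights and
    may exceed 1.
    """
    x_min, y_min, x_max, y_max = box

    # The mid-point is computed from the labeled box.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0

    cell_w = img_w / grid
    cell_h = img_h / grid

    # The cell containing the mid-point is responsible for the object.
    col = min(int(cx // cell_w), grid - 1)
    row = min(int(cy // cell_h), grid - 1)

    # Mid-point position relative to the top-left corner of that cell.
    bx = cx / cell_w - col
    by = cy / cell_h - row

    # Box size in units of one grid cell.
    bw = (x_max - x_min) / cell_w
    bh = (y_max - y_min) / cell_h

    return row, col, (bx, by, bw, bh)


# A 200x100 pixel box in a 300x300 image with a 3x3 grid.
row, col, target = encode_box((50, 120, 250, 220), 300, 300, grid=3)
print(row, col, target)
```

In this example the box is two cells wide, yet only the single cell holding the mid-point gets the "object present" label. The other cells the object overlaps are labeled as containing no object mid-point.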
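The system learns to associate specific patterns of pixels with a box of the size it has seen for similar objects in the training set. The box size is just a number the network outputs, so it is not limited to the size of the grid cell. An object larger than one cell is still predicted once, by the cell that contains its mid-point.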
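As a rough illustration of the output side, the sketch below turns one cell's predicted numbers back into a pixel box. It assumes a YOLO-style encoding in which the mid-point is given relative to the cell and the width and height are given in units of one cell. The name `decode_box` and its parameters are hypothetical, chosen only for this example.

```python
def decode_box(row, col, pred, img_w, img_h, grid=3):
    """Convert one cell's prediction (bx, by, bw, bh) to a pixel box.

    Assumed encoding: bx, by are offsets of the mid-point inside the
    cell at (row, col); bw, bh are in units of one cell and may be
    larger than 1.
    """
    bx, by, bw, bh = pred
    cell_w = img_w / grid
    cell_h = img_h / grid

    # Mid-point in pixels: cell origin plus the offset inside the cell.
    cx = (col + bx) * cell_w
    cy = (row + by) * cell_h

    # Box size in pixels; nothing restricts it to the cell size.
    w = bw * cell_w
    h = bh * cell_h

    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


# The center cell of a 3x3 grid predicts a box two cells wide.
box = decode_box(1, 1, (0.5, 0.7, 2.0, 1.0), 300, 300, grid=3)
print(box)
```

Here a single cell describes a box that spans neighboring cells, which is why the network does not need to stitch together parts of the object from different cells.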