Hi Mentors and friends,
I just finished the section on "End-to-end Deep Learning". In this section, the Prof mentioned face detection: given a picture, the NN finds the region containing the face (kind of drawing a square around it) and zooms in.
If this is really done by an NN, may I ask what the output of this NN looks like? From what I have learned so far, the output is just a yes-or-no label (e.g., yes or no it's a cat, yes or no there is a car in the pic), but for face detection the output should be a location (like X, Y?).
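To make my guess concrete, here is a rough sketch of the two kinds of output I have in mind. None of it comes from the lecture: the function names, the 0.5 threshold, and the (x, y, w, h) box layout are my own assumptions, and the "predicted" box is just a hard-coded placeholder.

```python
# Hypothetical illustration only; names and box layout are my own guesses.

def classifier_output(score: float) -> int:
    """The kind of output I have seen so far: 1 = yes (e.g. cat), 0 = no."""
    return 1 if score >= 0.5 else 0


def crop_box(image, box):
    """Crop a region given a guessed (x, y, w, h) box.

    image is a list of pixel rows; (x, y) is the top-left corner,
    w and h are the width and height in pixels.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


# Toy 4x6 "image" where each pixel value encodes its row and column.
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# Pretend the network output four numbers instead of a single label.
guessed_box = (2, 1, 3, 2)
face = crop_box(image, guessed_box)
```

So my question is basically whether the real network outputs something like `guessed_box` instead of a single label.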
I understand this is a little out of scope, so I just need the very basic concept here; a tutorial link would be good as well.