Building a CNN for Hat Identification

Hello, I am new to CNNs and have run into an issue with my first project. My objective is to train a model that can detect whether a person in an image is wearing a hat. I am using the CelebA dataset, and for now I am training on the first 10,000 images.

The issue I am running into is that during training, the model's accuracy and loss are actually getting worse.

I am new to this field and to troubleshooting, so any advice would be very helpful.

Here is the GitHub link to the notebook I have been working on so far.

Interesting! Well, I have not tried using the CelebA dataset directly before, but they do use it in one of the GANs courses.

How much investigation have you done to make sure your logic for loading the images and labels is correct? Have you printed the file names you are getting and the label values?
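A minimal sketch of that kind of sanity check, assuming the labels come from a text file laid out like CelebA's attribute list (line 1 is the image count, line 2 is the attribute names, then one row per image with a +1/-1 value per attribute). The `SAMPLE` text below is a hypothetical, heavily trimmed stand-in with only three columns, and `load_attribute` is an illustrative helper, not part of any library. Printing the raw values is exactly the step that would reveal labels of -1.

```python
import io

# Hypothetical, trimmed sample in the layout of CelebA's attribute file:
# line 1 = image count, line 2 = attribute names, then "<file> <+1/-1 per attribute>"
SAMPLE = """3
Smiling Wearing_Hat Young
000001.jpg  1 -1  1
000002.jpg -1  1  1
000003.jpg  1 -1 -1
"""

def load_attribute(f, attribute="Wearing_Hat", limit=None):
    """Return (filenames, raw_labels) for one attribute column, unmodified."""
    f.readline()                         # line 1: total image count (unused here)
    names = f.readline().split()         # line 2: attribute names
    col = names.index(attribute) + 1     # +1 because column 0 is the file name
    filenames, labels = [], []
    for line in f:
        parts = line.split()
        if not parts:                    # skip blank lines
            continue
        filenames.append(parts[0])
        labels.append(int(parts[col]))
        if limit is not None and len(filenames) >= limit:
            break                        # e.g. limit=10000 for the first 10,000 images
    return filenames, labels

filenames, labels = load_attribute(io.StringIO(SAMPLE))
for name, label in zip(filenames, labels):
    print(name, label)                   # eyeball file names against label values
print("distinct label values:", sorted(set(labels)))
print("hat count:", labels.count(1), "of", len(labels))
```

With the real file you would pass an open file handle instead of `io.StringIO(SAMPLE)` (the path depends on where you unpacked the dataset). It is also worth opening a few of the listed images to confirm the label matches what you see.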

Also note that it looks like you are using 1 as the label for “hat” and -1 for “no hat”. That’s not the normal way of doing binary classification, and the loss function is not going to understand it: it expects the labels to act basically like Booleans (1 for true and 0 for false). The predictions from the network are the output of a sigmoid, so the values will be between 0 and 1, right? Look at the formula for binary cross entropy loss, L = -(y * log(ŷ) + (1 - y) * log(1 - ŷ)), and ask yourself what happens when the label y is -1.
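To make that concrete, here is a small NumPy sketch of the binary cross entropy formula above (the clipping epsilon is an assumption, similar in spirit to what frameworks do to avoid log(0)). With y = 0 the loss is positive and shrinks toward 0 as ŷ goes to 0. With y = -1 the formula becomes log(ŷ) - 2 * log(1 - ŷ), which is negative and keeps decreasing without a lower bound (until clipping), so the reported loss stops meaning anything. The `thresholded_accuracy` helper is a hypothetical stand-in for a typical 0/1 accuracy metric: since a thresholded prediction is always 0 or 1, it can never match a label of -1.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)), with p clipped away from 0 and 1."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

def thresholded_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of examples where (p > threshold) as 0/1 equals the label."""
    return float(np.mean((y_pred > threshold).astype(float) == y_true))

# Sigmoid outputs close to 0, i.e. the network says "no hat"
preds = np.array([0.1, 0.01, 0.001])

# Correct 0/1 labels: small positive loss, perfect accuracy
print(binary_cross_entropy(np.zeros(3), preds), thresholded_accuracy(np.zeros(3), preds))

# Labels of -1: negative "loss", and accuracy is stuck at 0 for these examples
print(binary_cross_entropy(-np.ones(3), preds), thresholded_accuracy(-np.ones(3), preds))
```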

Also, at first glance, the CNN you are using is pretty simple. Have you taken DLS Course 4 about CNNs? They give lots of examples of successful nets there, and most of them have many more than 3 conv layers. That said, even if your network is not as powerful as you really need, you still wouldn’t expect it to get worse with training. So my bet is that the issues I mentioned earlier are the first things to sort out; then see how you do with the network as it stands.

Hi, I took another look today and you were right: I forgot to change the labels from -1 to 0.
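For anyone hitting the same problem, a minimal sketch of the remapping, assuming the raw labels are CelebA-style +1/-1 values held in a NumPy array (the `raw` values here are made-up example data). Comparing against 1 rather than computing (raw + 1) / 2 means any unexpected value also ends up as 0 instead of producing something fractional.

```python
import numpy as np

raw = np.array([-1, 1, -1, -1, 1])       # example +1/-1 attribute values
labels = (raw == 1).astype(np.float32)   # 1.0 = hat, 0.0 = no hat
print(labels)                            # [0. 1. 0. 0. 1.]
```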

I am still new to deep learning, so I was trying to create a basic model that “works”. I will take a look at the Deep Learning Specialization, but I just wanted to mess around with NNs a bit first.

Thank you for your help!