Hi everyone! I am starting an ambitious multi-class image classification project inspired by Dressing Your Truth by Carol Tuttle, where facial profiling is used to classify four types of energy or beauty (Type 1, Type 2, Type 3, or Type 4).
Going from structured assignments with conveniently packaged data sets to real data is going to be a multi-phase journey. (Luckily I’m stubborn.)
Any tips for facial image classification would be greatly appreciated!
DETAILS:
I’m using celebrity images that have been typed and will progressively pre-process them to see if results improve…
a) Backgrounds
b) Color (since shape is the main determiner)
c) Hair
d) Jewelry, microphones and other potentially distracting elements
My baseline results seemed impressive with just 30–40 unprocessed images per type, but that was only with “aggressive” background removal that left nothing but the facial outline and eyes…
I was typed as 2 (primary) and 4 (secondary). Initial results predicted both my primary and secondary types, but in reverse order.
Results got progressively worse without the background removal (my primary type became the least-predicted class).
So, more images with significant processing are required.
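On the “more images” front, one way to keep labeling cheap is to sort the images into one folder per type and let tf.keras infer the labels from the directory names. A sketch, with folder names that are my own invention:

```python
import tensorflow as tf

def load_dataset(root, image_size=(128, 128), batch_size=8):
    # Infers integer labels (0-3) from the four subdirectory names,
    # e.g. root/type_1, root/type_2, root/type_3, root/type_4.
    # Images are resized to image_size on the fly.
    return tf.keras.utils.image_dataset_from_directory(
        root,
        label_mode="int",
        image_size=image_size,
        batch_size=batch_size,
    )
```

This keeps the labeling step down to dragging files into folders, which also makes it easy to re-run the same pipeline after each round of pre-processing.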
I’m in course 3 of the TensorFlow certification. My use-case was all about text, but it was super helpful to start with images.
For phase 1: I used code to pre-process the images.
For phases 2 and beyond, I’ve been manually pre-processing the images using bulk background removers, croppers, and “spot” item removers for things like earrings.
Phases 3 and beyond might involve removing hair to see if that yields better results.
It will also be interesting to see if going from color to grayscale improves performance as well.
For this project, I’m embracing an experimental approach from one of my son’s favorite memes, “Flock around and find out.”
I recommend trying a Convolutional Neural Network (CNN) for any task that involves images. It is capable of developing filters that automatically ignore characteristics in the images that don’t correlate with the labels. Then you don’t really need to pre-process individual images.
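As a concrete illustration, a minimal 4-class CNN in tf.keras could look like the sketch below. The layer sizes and the 128×128 input are my assumptions, not values tuned to this dataset:

```python
import tensorflow as tf

def build_model(input_shape=(128, 128, 3), num_classes=4):
    # Small conv/pool stack ending in a 4-way softmax, one output
    # per type; sizes here are illustrative, not tuned.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),  # normalize pixels to [0, 1]
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer labels 0-3
        metrics=["accuracy"],
    )
    return model
```

The softmax head gives a probability over the four types, so you can read off the primary and secondary predictions from the two highest scores.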
I’m using the CNN structure I learned in the first two courses of the TensorFlow certification.
The labels are simply Type 1, Type 2, Type 3, and Type 4, so I’m not sure how the model will know to only read facial shapes and ignore anything extraneous like backgrounds, hair, earrings, microphones, etc.
If there are shortcuts, I’m definitely interested in hearing them!
Even with bulk editors, image processing is taking a while.
Right, image processing is not a lightweight task.
First thing to try is using the original images converted to grayscale; dropping from three color channels to one cuts the input size (and the first convolutional layer’s computation) by roughly two-thirds.
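That conversion is a one-liner in TensorFlow; a sketch, assuming the images arrive as RGB tensors:

```python
import tensorflow as tf

# Collapse 3 RGB channels into 1 using standard luminance weights;
# batch shape goes (N, H, W, 3) -> (N, H, W, 1).
rgb_batch = tf.random.uniform((8, 128, 128, 3))  # placeholder batch
gray_batch = tf.image.rgb_to_grayscale(rgb_batch)
```

Applied inside the input pipeline (e.g. via `dataset.map`), it converts everything on the fly with no need to re-export the image files.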
You’ll also need a boatload of labeled images, and some sort of paid account so you can use a lot of GPU time for training.
Hand-masking the training images is a losing game. The CNN will automatically learn small weight values for any features of the images that don’t correlate to the labels. That’s largely the whole purpose of deep learning.
Detailed pre-processing of the dataset is to be avoided unless you can’t get good results without it.