A small face recognition model

Greetings!!
Need your advice here:
I need to demonstrate a face recognition model that can be quickly retrained (transfer learning) to identify new faces and transported over a low-data-rate (1 Mbps) wireless network to a Raspberry Pi 4 device in real time. The whole process of retraining and transporting should not take more than 3 minutes.
The objective of this exercise is to show that face recognition models can be transported over low-data-rate wireless networks such as IEEE 802.15.4 and run on tiny, resource-constrained devices.

Please advise how I can go about this. Thank you!

Hi there,

Regarding your specific problem, I cannot judge whether the outlined non-functional requirements can be met (quickly) with these approaches or whether there are better alternatives, but this is how I would tackle the problem:

Regarding your model, feel free to check out the following:

I guess you already have the data set for finetuning the model, right?

So concept-wise, in order to meet your non-functional requirements, my understanding is that you should:

  • store as much information as possible, such as your model's weights, on your edge device (Raspberry Pi)
  • and only transfer the weights of the trainable layer (after successful retraining and validation) as updates, to minimize data transfer during deployment.

Regarding deployment, TFLite might be worth a look for your Raspberry Pi. I believe fundamental knowledge about automating the solution will also be required:

Is this going in the right direction, @David00?

Best regards
Christian

Thank you, @Christian_Simonis, for your insightful response. Yes, what you are writing makes a lot of sense.
A couple of questions:

  1. How can I extract just the weights from the top layers?
  2. How can I incorporate those parameters on the receiver side after receiving the new set of parameters?

Concerning your questions:

  1. You can get the weights of your NN with the get_weights() function in TensorFlow, which returns the current weights of the layer as NumPy arrays. You can then take only the finetuned weights of your trainable layer and send them to your edge device (see the sketch after this list); see also: tf.keras.layers.Layer  |  TensorFlow v2.11.0

  2. After sending your finetuned weights to the edge, you can update the TFLite model with the set_tensor() function and then invoke your model; see also: tf.lite.Interpreter  |  TensorFlow v2.11.0
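
For illustration, here is a minimal sketch of these two steps on the Keras side. Note that instead of patching the TFLite model directly with set_tensor(), this sketch applies the received weights to the Keras model with set_weights() before conversion to TFLite; the model path and the layer name top_dense are hypothetical:

```python
import numpy as np
import tensorflow as tf

# --- Sender side: extract only the finetuned trainable layer's weights ---
model = tf.keras.models.load_model("face_model")      # hypothetical saved model
weights = model.get_layer("top_dense").get_weights()  # hypothetical layer name
np.savez("top_weights.npz", *weights)                 # small payload for the radio link

# --- Receiver side: the same architecture already lives on the Pi; apply the update ---
edge_model = tf.keras.models.load_model("face_model")
with np.load("top_weights.npz") as data:
    new_weights = [data[f"arr_{i}"] for i in range(len(data.files))]
edge_model.get_layer("top_dense").set_weights(new_weights)
```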

I have not personally carried out exactly the steps you want to do, but a couple of months ago I was given the AIY Vision Kit as a present, where deployment of models on a Raspberry Pi is also relevant, and I played around with it a bit. I just noticed that there is also a face detection API, see: aiy.vision.models — AIY Projects 2018-11-16 documentation

Here you can also find a tutorial on how to retrain a classification model for the AIY kit with TF: Google Colab

Feel free to check this out. I would be curious how it goes and whether you can reach your goals with this approach. Please keep us posted, @David00!

Best regards
Christian

In addition to @Christian_Simonis' comments: because you have well-predefined constraints, I think a good approach is to run tests with a series of models and hyperparameters and see which combination works best for your scenario. Maybe making use of AutoML can help accelerate your research and contribute to finding the best solution to your problem.

Keep learning!

Hi @carlosrl

I guess @David00's requirements (retraining and transport within 3 minutes) would maybe not allow extensive hyperparameter tuning every time. But it can of course be done once at the very beginning, to tailor the model to the finetuning task and outperform the default hyperparameters of the pretrained model. Thanks for your hint, @carlosrl!

Best regards
Christian

Thank you, @Christian_Simonis and @carlosrl for your inputs.
Based on my limited understanding of face recognition, I think I don't need to rebuild the whole model or change the hyperparameters every time.
Let me explain the problem more: let's say all the units (RPis with cameras) are trained to recognize a set of users. These units can be distributed around large buildings. If I need to add a small set of new faces (to grant or block access) to the set of recognized faces, then I need to update the model for just this purpose. Would just changing the weights in the top layer work in this case? How many images of each new face would I require for retraining? Please advise. Thank you!


With the approach I provided, I believe yes! Did you check the resources and links mentioned above? Are the steps feasible and useful from your perspective, @David00?

The classes you are trying to predict would grow over time (like you start with 5 classes but eventually would like to get to 12 classes for example), right?

I think the number of training examples cannot be answered quickly with a single number ("at least 20" or so), because it highly depends on the quality and size of the visible faces and on more context, such as the angle of the pictures. For example, on my iPhone the software seems to need maybe a handful to get good results, but in the end it really depends on more factors. In case this number of training examples is not crucial for your business application, I would just suggest starting and checking iteratively, by trial and error, how many you end up with. Otherwise, you should also give some more thought to your labelling strategy, to get sufficient and representative labels to enable your ML system.

Best regards
Christian

Yes, I read these links. Thank you for sharing them!
Based on your inputs, I am thinking of the following:

  • On the training side, use public image databases to train an established model such as VGG-16.

  • Using transfer learning, train the model to recognize local faces.

  • Transfer the TensorFlow model to the receiver RPi 4 by writing to the SD card, over SCP, or over USB.

  • When a new face has to be recognized, train the saved model with multiple images of that face. While training, keep only the top layers trainable.

  • Get the weights of the top layers using the following steps:
    – Use the get_layer function of the model to get all the top layers that were retrained
    – For each of those layers, use get_weights to get their weights, and save the weights in a file

  • Transport those weights to the receiver over the aforementioned wireless network

  • Load the saved TensorFlow model on the receiver side

  • Update the weights of the layers with the newly received weights

  • Convert the model to a TFLite one

  • Then run the updated TFLite model in the tf.lite interpreter (a sketch of these last two steps follows below)
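
For the conversion and inference steps, a minimal sketch (the model path is hypothetical, and the input is a dummy tensor standing in for a preprocessed face crop):

```python
import numpy as np
import tensorflow as tf

# Convert the updated Keras model to TFLite
model = tf.keras.models.load_model("updated_face_model")  # hypothetical path
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Run inference with the TFLite interpreter
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the model's expected shape; a real face crop goes here
face = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], face)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
```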

Is my understanding correct?
In the approach I have described, a couple of things to note:

  1. I am converting the model to TFLite only just before running it for inference. That means I am transporting the TF model and its weights, not the TFLite model and its weights. Sorry, I do not understand how the set_tensor function can be used to modify weights.
  2. It is not possible to change the model itself, say to add/delete/modify layers. But that is OK.

The wireless network needs an RPi 4, so I will use that for the hardware.

Thank you for your continued support and advice.


I am open to buying this course and material to learn, but I am trying to see the table of contents.


Overall the plan sounds reasonable to me; I believe a couple of points are not certain to work out 100%, and of course you also have some trade-offs. Here are my comments:

  • you should work with a validation (or dev) and test set in addition to your train set to make sure you would not overfit on the train set, see also: How and why do training and cross validations sets wear out in time? - #3 by Christian_Simonis
  • depending on how close you are to your performance limits, you can think of model pruning before sending the weights to the edge device (see the sketch after this list); see also: Pruning in Keras example  |  TensorFlow Model Optimization
  • you can change the weights with the set_tensor() function in TFLite if you want to minimize data transfer and exploit prior knowledge of the frozen layers. Alternatively, you can learn the new weights in the finetuned model and then convert the whole model to a TFLite model; this is probably the easier way to get your system running in a first step. But if you need to reduce data transfer, probably not the whole model but rather only the new weights should be transferred, in zipped form, considering also the communication protocol overall.
  • when you change the model architecture (e.g., modifying the trainable layers), you need to think about the consequences with respect to data transfer, to make sure that your Raspberry Pi as edge device has all the information to interpret the new model correctly (besides the correct TF version, also knowledge about the current architecture, e.g. number of neurons, activation functions, …).
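
Regarding the pruning point above, a minimal sketch following the linked Keras pruning example (the model path and sparsity schedule values are illustrative, and the training call is only indicated):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.models.load_model("face_model")  # hypothetical saved model

# Wrap the model for magnitude-based pruning; 50% final sparsity is illustrative
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)

pruned.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Fine-tune with the pruning callback so the sparsity schedule is applied:
# pruned.fit(x_train, y_train, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before export, so the model zips down small for transfer
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```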

As you see, some of your decisions on how to design your system really depend on how much margin you have in reaching your non-functional requirements. But overall I believe you have a good rough action plan, which you can refine with new learnings along the journey!

Would be quite interested if your project works as planned. Good luck and happy learning!

Best regards
Christian


One question, @Christian_Simonis:
The FaceNet model used in the course is based on older versions of Python and TF. If I try to use it with the current versions, I get errors, as others have seen: https://community.deeplearning.ai/search?q=bad%20marshal%20data.

My question is: can I retrain the model from scratch using the new versions, or would that be too expensive? Is it even possible, given that the performance depends on the hyperparameters? Please advise. Thank you!

Yes, in theory it is possible, but in practice it's too expensive, and after all you want to benefit from that already pre-trained model. So I would rather advise going for finetuning, as outlined in the posts above. Here is another repo you can check out for a more recent version: GitHub - R4j4n/Face-recognition-Using-Facenet-On-Tensorflow-2.X

Best regards
Christian

How about using a Siamese architecture? I may be wrong, but I believe they are quite fast and efficient for image similarity, and also good for one-shot image recognition, which is what it sounds like you're after. Here is an implementation from Keras and a paper on one-shot recognition.
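
For illustration, a minimal sketch of a Siamese setup in Keras: a shared embedding network applied to two inputs, with the L2 distance between the embeddings as the similarity score (the architecture, input shape, and embedding size are all illustrative):

```python
import tensorflow as tf

def make_embedding_net(input_shape=(160, 160, 3)):  # shape is illustrative
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128),  # embedding size is illustrative
    ])

embed = make_embedding_net()  # shared weights for both branches
img_a = tf.keras.Input(shape=(160, 160, 3))
img_b = tf.keras.Input(shape=(160, 160, 3))

# L2 distance between the two embeddings: small distance = same identity
distance = tf.keras.layers.Lambda(
    lambda t: tf.norm(t[0] - t[1], axis=1, keepdims=True)
)([embed(img_a), embed(img_b)])

siamese = tf.keras.Model(inputs=[img_a, img_b], outputs=distance)
```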

Thank you, @Christian_Simonis!
I implemented face recognition based on the approach taken in GitHub - R4j4n/Face-recognition-Using-Facenet-On-Tensorflow-2.X. It works correctly and identifies faces. I am also able to transfer just the encodings of a new face to the receiver.
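
For reference, the matching on the receiver side is essentially a nearest-neighbour comparison of embeddings, roughly along the lines of this sketch (the threshold value and names are illustrative, not my exact code, and would need tuning on validation data):

```python
import numpy as np

def recognize(embedding, known_encodings, threshold=1.0):
    """Return the closest known identity, or "unknown" if nothing is near enough.

    known_encodings: dict mapping name -> stored FaceNet embedding (NumPy array).
    threshold is illustrative and should be tuned on a validation set.
    """
    best_name, best_dist = "unknown", threshold
    for name, ref in known_encodings.items():
        dist = np.linalg.norm(embedding - ref)  # L2 distance between embeddings
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist
```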

However, one thing I notice is that MTCNN is very slow (it takes more than a minute to react) when running on an RPi 4 with 8 GB RAM. Is there a way to speed it up? Please advise. Thank you!


Glad to hear that, @David00! Congrats on that achievement!

Some hints:

Best regards
Christian

Thank you for the quick response!

Side topic: This recent article from The Batch might be interesting for some followers of this thread, since it touches on another use case of neural networks deployed on a Raspberry Pi: How to Run PilotNet on a Raspberry Pi Pico Microcontroller

Best regards
Christian

Thank you for kindly remembering and forwarding this! Gratefully appreciated.


I am confused about re-training the weights. The whole point of models like FaceNet is to leverage one-shot learning. Why would someone want to re-train the model with new individuals as class labels? Prof. Andrew discussed this in detail in Week 4 of Course 4 of the DLS. So, why? Kindly explain this in detail.

Thank you.