Beginner planning a computer vision project – need guidance on what to learn and which courses to take

Hi everyone!
I’m a beginner planning to do my thesis in computer vision. My project idea is to build a hand gesture detection system using CNNs and then integrate it into an application that works on both laptops and mobile devices (Android/iOS).

I’m looking for advice on:

  • What topics should I study to be ready for this project?
  • What tools and libraries should I learn (e.g., TensorFlow, OpenCV)?
  • Are there any recommended online courses or learning paths (Coursera, Udemy, etc.)?
  • How should I approach the cross-platform app development part (e.g., use Flutter, React Native, or something else)?
  • Any general tips for combining machine learning models with apps?

I’m comfortable with basic Python and just getting started with ML and deep learning.

Any suggestions or learning resources would be greatly appreciated!

Thanks in advance!

Hello, @MinHeinKhant,

Since you are working on hand gesture detection, I suppose your data is video, so the model will likely be either a CNN with 3D convolutional layers or a combination of a CNN and a sequence model such as an LSTM.
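To make the difference between the two options concrete, here is a minimal sketch in plain Python (no deep learning library needed) that compares output shapes. It uses the standard convolution output-size formula; the clip size of 16 frames at 64x64 pixels, the kernel size of 3, and the helper names are assumptions chosen only for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_shape(frames, height, width, kernel=3, stride=1, padding=0):
    """A 3D convolution slides over time as well as over height and width."""
    return tuple(conv_out(d, kernel, stride, padding)
                 for d in (frames, height, width))


def per_frame_conv2d_shape(frames, height, width, kernel=3, stride=1, padding=0):
    """A 2D convolution applied to each frame leaves the frame count unchanged,
    so a sequence model (e.g. an LSTM) can still read one step per frame."""
    return (frames,
            conv_out(height, kernel, stride, padding),
            conv_out(width, kernel, stride, padding))


# Hypothetical clip: 16 frames of 64x64 pixels (illustrative numbers only).
clip = (16, 64, 64)
print("3D conv output (frames, h, w):       ", conv3d_shape(*clip))
print("per-frame 2D conv output (frames, h, w):", per_frame_conv2d_shape(*clip))
```

With a 3D convolution the time axis shrinks together with the spatial axes, because the kernel mixes neighbouring frames. With the CNN-plus-LSTM route, each frame is processed on its own and the temporal modelling is left to the sequence model.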

Two relevant courses offered by DLAI are the beginner-level Machine Learning Specialization (MLS) and the more advanced Deep Learning Specialization (DLS). If you have time, you may finish MLS Courses 1 and 2 first, before starting the DLS. Because it is a beginner course, the MLS takes two courses to cover what the first part of DLS Course 1 covers. So if for any reason you feel the need to skip the MLS, that is entirely feasible: the MLS is not a prerequisite for the DLS.

DLS Course 4 covers CNNs and Course 5 covers sequence models, so they will be the most relevant. However, I am not suggesting that you skip the first three courses, because they cover many fundamentals that can be important for your work and for defending your thesis.

For the actual work, TensorFlow can do the job, and the courses above use TensorFlow as well. There is another library, PyTorch, that I believe is more popular in academia, but I recommend that you stick to the one used by the courses.

I used Flutter for Android development, and I know it supports desktop as well, so it should be a good candidate to begin with. Since we mostly develop models on a desktop, a model can easily become too large for a mobile phone. If so, you may want to explore “TensorFlow Lite” for compressing your model.
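As a rough illustration of why model size matters on a phone, the sketch below estimates the storage needed for the weights alone at different numeric precisions. Lower-precision weights (quantization) are one common way to shrink a model. The 5,000,000-parameter count and the helper name are assumptions for illustration, and a real model file also carries metadata, so treat the numbers as a lower bound rather than an exact result.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def estimated_size_mb(num_params, dtype):
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1_000_000


# Hypothetical model with 5 million parameters (illustrative number only).
num_params = 5_000_000
for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype}: ~{estimated_size_mb(num_params, dtype):.1f} MB")
```

Counting the parameters of your own trained model and running the same arithmetic gives a quick first check of whether compression is worth exploring before you start on the app.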

Thinking about all of this again, it seems to me that you may want to learn the fundamentals to help you train models and defend your work, TensorFlow to make it happen, and Flutter to build your apps, so I think time management will also be a challenge for you.

Cheers,
Raymond

Hi, I am also new to computer vision, and I have completed the DLS. Have you completed your project?