Help Needed: Developing a Deep Learning Algorithm for Action Recognition

Hi everyone,

I’m a PhD student working on “Multimodal Egocentric Action Recognition Based on Context Information,” and I’m new to this research area.

My background is in Mechatronics and Control Engineering. Recently, I completed the Deep Learning Specialization courses, which gave me a basic understanding of deep learning concepts. However, I’m finding the concepts around sequence models quite challenging.

My current goal is to develop a deep learning algorithm that recognizes actions in the Assembly101 dataset (https://assembly-101.github.io). One of my colleagues has already developed an action recognition algorithm that uses self-supervised learning on sequence data, and I aim to extend it to video data. However, I’m unsure where to start and don’t feel very confident about it.

I’m looking for guidance and would like to connect with people who have experience in this area, so that I can start on the right track with this algorithm.

Any help or advice would be greatly appreciated.

Thank you!