I have been learning AI/ML for a few months, and I want to build some projects on real-life data and contribute to open source projects. How should I approach it?
@manu2576a so, I’ve heard some mixed opinions about this (including, and particularly, from programmers who are much more experienced than I am [i.e. those who maintain the Linux kernel]).
Thus, I would say probably the best way to get started in open source is to 1) Get yourself a GitHub account 2) Find a problem you find interesting and write something that tackles it 3) Release it on GitHub under whichever license you prefer (GNU GPL, MIT, etc.)!
You’re now part of the open source community.
A second opportunity is to find a project on GitHub you like and fork it (rather than opening a pull request), then make your improvements. Perhaps you will get noticed, and if enough people like what you’ve done, they may decide to add you to the main repo.
The reason I say not a pull request, and what the very serious programmers have expressed: unless there is a direct ask to be part of a project, someone has to maintain the codebase. And especially if you are new and just starting out… well, to be honest, the maintainers just don’t have the time (or perhaps the will) to examine every piece of novice-level code.
So, for some people this can be a big turn off to deal with.
However, this is only IMHO, and based upon what I’ve heard expressed by others.
I agree with @Nevermnd. If you establish your reputation first, your path will be more open to collaboration on other projects.
Pick a project or field or area you wanna work on, get experience in, or understand. Can be as broad as Python, as generic as AI, and as narrow as agent skills or memory.
Find good first issues by filtering projects against your interests and looking for labels like "good first issue" or "help wanted" (some repos have custom labels like "skill level low" or "beginner").
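To make the label filtering concrete, here is a minimal sketch that builds a GitHub issue-search URL from a language and a label. The function name `build_issue_search_url` and its parameters are made up for illustration; the `is:`, `no:assignee`, `label:` and `language:` qualifiers are standard GitHub search syntax. Restricting the search to unassigned issues is my own assumption about what a newcomer usually wants.

```python
from urllib.parse import urlencode


def build_issue_search_url(language, label="good first issue", topic_words=""):
    """Return a GitHub issue-search API URL for open, unassigned issues.

    Hypothetical helper: it only builds the URL, it does not call the API.
    """
    qualifiers = [
        "is:issue",                # issues only, not pull requests
        "is:open",                 # still unresolved
        "no:assignee",             # assumption: skip issues someone already took
        f'label:"{label}"',        # quoted because the label may contain spaces
        f"language:{language}",    # primary language of the repository
    ]
    # Free-text words (if any) go first, followed by the qualifiers.
    parts = ([topic_words] if topic_words else []) + qualifiers
    return "https://api.github.com/search/issues?" + urlencode({"q": " ".join(parts)})


if __name__ == "__main__":
    print(build_issue_search_url("python", topic_words="agent memory"))
```

You can paste the same qualifiers straight into the search box on GitHub; the code is just a way to see exactly what is being filtered on.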
Read the issues you cherry-pick against the repo: README, contributing guide, license, code of conduct, AI policies, etc. Understand the architecture and the ripple effects. E.g. if you change a script A, a test related to it might also have to change, even if that is not part of the issue. Or, for example, if a test changes, a doc related to that test might need to change with it. Be proactive and learn to keep a bird's-eye view of what your changes affect.
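One rough way to spot those ripple effects is to search the repo for every file that mentions the thing you are changing. The sketch below does a plain text search; `files_mentioning` is a hypothetical helper, and a substring match is only a first approximation (it will miss indirect dependencies and can produce false hits).

```python
from pathlib import Path


def files_mentioning(root, name, suffixes=(".py", ".md")):
    """Return sorted relative paths under root whose text contains name."""
    hits = []
    for path in Path(root).rglob("*"):
        # Only look at source and doc files; skip directories and other types.
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if name in text:
                hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


if __name__ == "__main__":
    # Example: which tests and docs mention "script_a" in the current checkout?
    for hit in files_mentioning(".", "script_a"):
        print(hit)
```

Anything this turns up under a tests or docs folder is worth opening before you submit your change.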
Learn git, or use an AI assistant to generate the commands for your machine, directories and GitHub account, but try to type them manually at the start so you understand what each git command does and why.
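As a sketch of what "the commands for your account" might look like, the snippet below prints a typical fork-and-branch sequence. The function `fork_workflow_commands` and the example user, owner, repo and branch names are placeholders I made up; the git commands themselves are standard, but a given project may ask for a different flow in its contributing guide.

```python
def fork_workflow_commands(your_user, upstream_owner, repo, branch):
    """Return the usual git commands for working on a fork, in order."""
    return [
        # Copy your fork (created on GitHub beforehand) to your machine.
        f"git clone https://github.com/{your_user}/{repo}.git",
        f"cd {repo}",
        # Remember the original project so you can pull its updates later.
        f"git remote add upstream https://github.com/{upstream_owner}/{repo}.git",
        # Do the work on a separate branch rather than on the default one.
        f"git checkout -b {branch}",
        # Review and stage your changes piece by piece.
        "git add -p",
        'git commit -m "Describe the change"',
        # Publish the branch to your fork.
        f"git push -u origin {branch}",
    ]


if __name__ == "__main__":
    # All four arguments are hypothetical placeholders.
    for command in fork_workflow_commands("your-user", "some-owner", "some-repo", "fix-docs-typo"):
        print(command)
```

Run them one at a time and check `git status` in between; that is the quickest way to see what each step actually changed.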
Before opening a PR, comment on the issue and kindly ask if maintainers could assign it to you. If you see issues already assigned or issues with open PRs, skip them. You can monitor a repo you like for new issues and get email notifications for when new issues come out so you can grab them first.
Finally, have fun doing it. Don’t overthink it. Your first PRs probably won’t be accepted immediately, but most open source repos have nice maintainers who will point out your errors and give you constructive feedback, so you can learn and iterate on your PR.
If you’re looking for easy good first issues tailored for AI-assisted resolution, check ARPAHLS/skillware (a Python framework for modular, self-contained skill management for machines) and ARPAHLS/rooms (a secure, local-first Python framework for orchestrating complex multi-agent think tanks with dynamic expertise-weighted routing), both on GitHub.
Note that this thread has been cold for two years.