Hey Guys,
I am an undergrad from India, and during my undergrad, one of my colleagues and I published a paper on Data-Centric AI as part of our 6th Semester Mini Project.
This was my first paper, and I came up with the idea and the workflow myself, so I won't claim it presents ground-breaking research. But as most of you probably know, getting your first research paper published is a phenomenal feeling. That being said, I would love to get as much constructive feedback on our work as possible.
The paper is titled Efficient Solution to Develop an Exercising Algorithm using Model-Centric and Data-Centric Artificially Intelligent Approaches (a title I wish were more concise), and can be found here. Looking forward to hearing from you guys.
I am not familiar with this research field, so this is just my observation after reading your paper: the model has hyperparameters to tune, but don't some of those data augmentation approaches also have hyperparameters? They don't seem to be mentioned in Figure 1 or Algorithm 1. We know how to tune model-related hyperparameters (e.g. by checking for overfitting / underfitting), but is there any established approach for tuning data-related hyperparameters?
Congratulations! The page is asking me to sign up or to sign in from an educational institution. If you can send me the PDF of the paper in private, I would be more than happy to give you my feedback.
Hi @Elemento, that’s great news! I tried to read it, but it is behind a paywall. I created an account, but it still won’t let me read it. Any chance you could send us the PDF?
Hey @arosacastillo and @Juan_Olano,
Thanks a lot for taking the time to review my paper. I have attached the paper in PDF format below for your reference.
That’s absolutely correct. Since we provided the source code for all our kernels, we thought listing the hyper-parameters we used in the paper would be redundant, so we excluded them.
During our literature survey, we didn’t come across any such established approach, so we adopted a simple and intuitive methodology: we kept the model fixed, along with all its hyper-parameters, and used signs of over-fitting and under-fitting to tune the data-related hyper-parameters.
I would, however, like to highlight one additional point. Since the amount of hyper-parameter tuning that can be done is practically endless for most of the methods, it would be hard for us to compare two methods tuned in different ways. So we kept the same hyper-parameter settings for all our methods, so that we could compare the methods in their raw form. I hope this makes our approach a bit clearer.
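To make that idea concrete, here is a minimal sketch of what "fix the model, tune the data" can look like, assuming a scikit-learn style workflow. The classifier, the synthetic dataset, and the Gaussian-noise augmentation knob are purely illustrative and not taken from the paper; the point is only that the model and its hyper-parameters never change while a single data-related knob is swept and the train/validation gap is watched for over- or under-fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Fixed model: architecture and hyper-parameters stay identical across all runs.
def make_model():
    return MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)

# Illustrative data-related hyper-parameter: the strength of Gaussian-noise augmentation.
def augment(X, y, noise_std, copies=1, seed=0):
    rng = np.random.default_rng(seed)
    X_aug = [X] + [X + rng.normal(0.0, noise_std, X.shape) for _ in range(copies)]
    y_aug = [y] * (copies + 1)
    return np.vstack(X_aug), np.concatenate(y_aug)

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Sweep only the data-related knob; the train/validation gap signals over-/under-fitting.
for noise_std in [0.0, 0.1, 0.3, 0.5]:
    X_a, y_a = augment(X_tr, y_tr, noise_std)
    model = make_model().fit(X_a, y_a)
    train_acc = model.score(X_a, y_a)
    val_acc = model.score(X_val, y_val)
    print(f"noise_std={noise_std:.1f}  train={train_acc:.3f}  "
          f"val={val_acc:.3f}  gap={train_acc - val_acc:.3f}")
```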
Sorry for the late response, but I was busy organizing an event and did not have time until now to read your paper. First of all, congrats on the publication; this is a very hot topic in AI. It is just a coincidence that I am watching these videos about Data-Centric AI from MIT:
The main thing I see in your comparison is that the advantage of a data-centric approach depends heavily on the dataset you choose. To do a fair comparison, I would choose two types of datasets:
a dataset with very few noisy labels
a dataset with a lot of noisy labels
Then the results might be different: the accuracy you measure with a model-centric approach might be high, yet be a very misleading metric if it only shows that the model learned to predict the wrong labels very well. I love the concept they explain in lecture 2 about so-called “confident learning” and the related ideas that come up when comparing both approaches.
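To illustrate why accuracy measured against noisy labels can mislead, here is a small self-contained sketch; the synthetic dataset, the 30% flip rate, and the choice of a decision tree are hypothetical and not taken from the paper or the lectures. A high-capacity model fit on noisy labels can memorise them and look nearly perfect against those same labels, while its accuracy against the true labels is far lower.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Clean binary-classification data, then flip 30% of the labels to simulate label noise.
X, y_true = make_classification(n_samples=2000, n_features=20, random_state=0)
flip = rng.random(len(y_true)) < 0.30
y_noisy = np.where(flip, 1 - y_true, y_true)

# An unconstrained decision tree fit on the noisy labels will happily memorise them.
model = DecisionTreeClassifier(random_state=0).fit(X, y_noisy)

# Measured against the noisy labels, accuracy looks nearly perfect...
print("accuracy vs noisy labels:", model.score(X, y_noisy))
# ...but against the true labels it drops to roughly the fraction of labels left unflipped.
print("accuracy vs true labels: ", model.score(X, y_true))
```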
This is indeed an amazing field. Looking forward to reading more of your papers in the future.
I completely agree with the above point, but we were working against a deadline: we had to submit the paper before our thesis presentation so that we could include it there. We hope to extend this work sometime in the future.