My first Data-Centric AI publication

Hey Guys,
I am an undergrad from India, and one of my colleagues and I published a paper on Data-Centric AI as part of our 6th Semester Mini Project.

Since this was my first paper, and one for which I came up with the idea and the workflow, I won't claim it presents ground-breaking research. But as most of you might know, the feeling of getting your first research paper published is phenomenal. That being said, I would love to get as much constructive feedback as possible on our work.

The paper is entitled Efficient Solution to Develop an Exercising Algorithm using Model-Centric and Data-Centric Artificially Intelligent Approaches (a title I wish were more concise :joy:), and can be found here. Looking forward to hearing from you guys :nerd_face:

Cheers,
Elemento

@Elemento

I am not familiar with this research field, so this is just my observation after reading your paper: there are hyperparameters in the model to tune, and aren't there also hyperparameters in some of those data augmentation approaches? They don't seem to be mentioned in Figure 1 or Algorithm 1. We know how to tune model-related hyperparameters (e.g., by checking for overfitting / underfitting), but is there any established approach for tuning data-related hyperparameters?
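Just to make concrete what I mean by data-related hyperparameters, here is a purely hypothetical Keras-style augmentation config (not taken from your paper); each of these settings looks like something that needs tuning:

```python
# Hypothetical augmentation config: every value here is a data-related
# hyperparameter, analogous to a learning rate or layer size on the model side.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,       # degrees of random rotation
    width_shift_range=0.1,   # fraction of width for horizontal shifts
    height_shift_range=0.1,  # fraction of height for vertical shifts
    zoom_range=0.2,          # random zoom factor
    horizontal_flip=True,    # whether to mirror images
)
```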

Cheers
Raymond

Congratulations! The page is asking me to sign up or to log in through an educational institution. If you can send me the PDF of the paper privately, I would be more than happy to give you my feedback :slight_smile:

Best, and celebrate it!

Rosa

Hi @Elemento, that's great news! I tried to read it, but it is behind a paywall. I created an account, but it still won't let me read it. Any chance you could send us the PDF?

@arosacastillo is also interested in reading it.

Thanks,

Juan

Hey @arosacastillo and @Juan_Olano,
Thanks a lot for taking the time to review my paper. I have attached the paper in PDF format below for your reference.

Efficient_Solution_to_Develop_an_Exercising_Algorithm_using_Model-Centric_and_Data-Centric_Artificially_Intelligent_Approaches.pdf (1.1 MB)

Hey @rmwkwok,
Thanks a lot for your feedback.

That's absolutely correct. Since we provided the source code for all our kernels, we felt that listing the hyper-parameters we used in the paper would be redundant, so we excluded them.

During our literature survey, we didn't come across any such established approach, so we adopted a simple and intuitive methodology: we kept the model fixed, along with all its hyper-parameters, and used over-fitting and under-fitting to tune the data-related hyper-parameters.
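To make that concrete, here is a minimal sketch of the idea (not the exact code from our kernels; the dataset, model, and hyper-parameter values below are just placeholders): the model and its hyper-parameters stay fixed, a single data-related hyper-parameter is swept, and the train/validation gap is used to judge over-fitting or under-fitting.

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder dataset, just for illustration (MNIST instead of our exercise data).
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # add a channel dimension, scale to [0, 1]
x_val = x_val[..., None] / 255.0

def build_fixed_model():
    """Same architecture and model hyper-parameters for every run."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Sweep one data-related hyper-parameter (rotation range of the augmentation).
for rotation in [0, 5, 15, 30]:
    datagen = ImageDataGenerator(rotation_range=rotation)
    model = build_fixed_model()
    history = model.fit(datagen.flow(x_train, y_train, batch_size=64),
                        validation_data=(x_val, y_val),
                        epochs=5, verbose=0)
    train_acc = history.history["accuracy"][-1]
    val_acc = history.history["val_accuracy"][-1]
    # A large train/val gap suggests over-fitting; low accuracy on both
    # suggests under-fitting, and the data hyper-parameter is adjusted accordingly.
    print(f"rotation_range={rotation}: train={train_acc:.3f}, "
          f"val={val_acc:.3f}, gap={train_acc - val_acc:.3f}")
```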

I would also like to highlight one additional point. Since the hyper-parameter tuning that can be done is practically endless for most of the methods, it would be hard to compare two methods that received different kinds of tuning. So we kept the same hyper-parameter settings for all the methods, so that we could compare them in their raw form. I hope this makes our approach a bit clearer.

Cheers,
Elemento

Thanks @Elemento!

Raymond

Hi Elemento,

sorry for the late response, but I was busy organizing an event and did not have time until now to read your paper. First of all, congrats on the publication, because this is a very hot topic in AI. It is just a coincidence that I am currently watching these videos about Data-Centric AI from MIT:

The main thing I see in your comparison is that the advantage of a data-centric approach is totally dependent on the dataset that you choose. To do a fair comparison, I would choose two types of datasets:

  1. dataset with very few noisy labels
  2. dataset with a lot of noisy labels

Then the results might be different: the accuracy you measure with a model-centric approach might be high, but it could be a very misleading metric if it only shows that the model learned to predict wrong labels very well. I love the concept they explain in lecture 2 about so-called "Confident Learning" and the related ideas for comparing both approaches.
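If you ever extend the work in that direction, here is a rough sketch of the kind of experiment I have in mind, assuming scikit-learn plus the cleanlab library (which implements Confident Learning); the dataset, classifier, and the 20% noise rate are only placeholders. The idea is to flip a fraction of the labels synthetically and then see how many of them Confident Learning flags:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues  # Confident Learning

# Placeholder dataset and classifier, just to illustrate the experiment.
X, y = load_digits(return_X_y=True)

# Synthetically corrupt 20% of the labels (some flips may land on the original class).
rng = np.random.default_rng(0)
noisy_y = y.copy()
flip_idx = rng.choice(len(y), size=int(0.2 * len(y)), replace=False)
noisy_y[flip_idx] = rng.integers(0, 10, size=len(flip_idx))

# Out-of-sample predicted probabilities from any classifier.
pred_probs = cross_val_predict(LogisticRegression(max_iter=1000), X, noisy_y,
                               cv=5, method="predict_proba")

# Confident Learning flags the examples whose given labels look most suspicious.
issues = find_label_issues(labels=noisy_y, pred_probs=pred_probs,
                           return_indices_ranked_by="self_confidence")
overlap = len(set(issues) & set(flip_idx))
print(f"Flagged {len(issues)} suspected label issues; "
      f"{overlap} of them are among the {len(flip_idx)} labels we actually flipped.")
```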
This is indeed an amazing field. Looking forward to reading more of your future published papers :wink:

Best,

Rosa

Congrats, @Elemento! I am also looking forward to your future publications. :slightly_smiling_face:

Thanks also for sharing the PDF!

Best
Christian

Hey @arosacastillo and @Christian_Simonis,
Thanks a lot for your wishes and feedback.

I completely agree with the above point, but the thing was that we were working on a deadline: we had to submit the paper before our thesis presentation so that we could include it there. We hope to extend this work sometime in the future :nerd_face:

Cheers,
Elemento
