Week 2: General question about the order of topics

I am currently in week 2 and have not yet started the lab assignment. However, after the feature selection lecture, I started wondering about two questions.

First: in week 1, why did we drop all columns except the three we used, seemingly at random? Was that done on purpose to simplify the transformations? Shouldn’t we have done feature engineering first, before deciding what to drop and what to keep?

Second: I have heard from many people in industry that it is recommended to first create a baseline model and only then move on to feature engineering, feature selection, tuning, etc., so that we have a benchmark to check whether we are making progress. That is how I usually do it: plug in a model with minimal tweaks, set up a pipeline, reach a score of, say, 65%, and then work my way up.

PS: I am asking the second question with a real-life scenario in mind. I understand that this is a course and each step is covered in detail where it appears.

Hi @aniruddhaDas

Only these three columns are needed to complete this first lab assignment. As mentioned in point 1.3, "Transform the data", of the notebook, this is done to simplify the task:

“To simplify the task, you will transform the data into a comma-separated value (CSV) file that contains only a review_body, product_category, and sentiment derived from the original data”.
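To make the quoted step more concrete, here is a minimal sketch of that kind of transformation using only Python's standard library. The raw column names (`review_text`, `category`, `star_rating`) and the rating-to-sentiment thresholds are assumptions for illustration; the lab notebook's actual data and mapping may differ.

```python
import csv
import io

# Hypothetical raw column names; the real dataset's headers may differ.
RAW_TEXT, RAW_CATEGORY, RAW_RATING = "review_text", "category", "star_rating"


def to_sentiment(star_rating: int) -> int:
    """Map a 1-5 star rating to -1 (negative), 0 (neutral) or 1 (positive).

    These thresholds are an assumption made for illustration.
    """
    if star_rating <= 2:
        return -1
    if star_rating == 3:
        return 0
    return 1


def transform(raw_csv: str) -> str:
    """Keep only review_body, product_category and a derived sentiment column."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["review_body", "product_category", "sentiment"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        text = (row.get(RAW_TEXT) or "").strip()
        if not text:
            continue  # drop rows without any review text
        writer.writerow({
            "review_body": text,
            "product_category": row[RAW_CATEGORY],
            "sentiment": to_sentiment(int(row[RAW_RATING])),
        })
    return out.getvalue()


if __name__ == "__main__":
    # Made-up toy input: the extra "age" column is simply dropped.
    raw = (
        "review_text,category,star_rating,age\n"
        "Love it,Dresses,5,34\n"
        "Too small,Tops,2,28\n"
        ",Jeans,4,40\n"
    )
    print(transform(raw))
```

Every other column is discarded, which is exactly the simplification the notebook describes; any feature engineering on those discarded columns would have to happen before this step.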

In addition, you can tackle the feature engineering (F.E.) task in the third week's assignment of the course.

You can see here the main goals covered in the first week’s lab.

Hope that helps! ;)


Hi @aniruddhaDas, thank you for these relevant questions! :)

Regarding your second point, you are right. In R&D, it is common to build a baseline model first and then try to improve on it after analyzing the initial results. You will often find this approach in published academic papers as well.
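As a rough illustration of the "baseline first" idea, the sketch below uses a majority-class predictor as the benchmark and accepts a candidate model only if it beats it on held-out data. Both models and the toy data are hypothetical and chosen only to show the comparison; in practice the baseline might be an AutoML model and the candidate a tuned or feature-engineered one.

```python
from collections import Counter
from typing import Callable, Sequence

# A "model" here is just a function from review text to a sentiment label.
Model = Callable[[str], int]


def accuracy(model: Model, texts: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of examples where the model's prediction matches the label."""
    hits = sum(model(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


def majority_baseline(train_labels: Sequence[int]) -> Model:
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _text: most_common


def keyword_model(text: str) -> int:
    """Hypothetical 'improved' model: a crude keyword rule, for illustration only."""
    lowered = text.lower()
    if any(w in lowered for w in ("love", "great", "perfect")):
        return 1
    if any(w in lowered for w in ("small", "poor", "returned")):
        return -1
    return 0


if __name__ == "__main__":
    # Made-up toy data; labels are -1 (negative), 0 (neutral), 1 (positive).
    train_labels = [1, 1, 1, 0, -1]
    test_texts = ["Love this dress", "Runs small, returned it", "It is okay", "Great fit"]
    test_labels = [1, -1, 0, 1]

    base_acc = accuracy(majority_baseline(train_labels), test_texts, test_labels)
    cand_acc = accuracy(keyword_model, test_texts, test_labels)
    print(f"baseline={base_acc:.2f} candidate={cand_acc:.2f}")
    # Keep a change only if it beats the benchmark set by the baseline.
    print("improvement" if cand_acc > base_acc else "no improvement")
```

The point is the workflow rather than the models: the baseline fixes a reference score, and each later step (feature engineering, selection, tuning) is judged against it on the same held-out data.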

In our case, since in C1 we train and deploy a model built automatically with AutoML, you can treat that built-in model as a baseline and later pursue a custom approach or fine-tune its hyperparameters.

In computer vision and NLP applications, transfer learning from a pre-trained model with minor modifications is generally very effective: you can use an established architecture with small adjustments as a baseline and then improve on it.

In sum, it is very important to take a data-centric approach when building our models, as advocated by Prof. Andrew Ng. The hands-on exercises let us put this into practice.

Keep up with the good work!
