Week 2 Feature importance: SHAP - Quick Model Question

In the video it’s said that a simple model is created behind the scenes. Is this the same Random Forest model mentioned in the video? It’s clear how an arbitrary model processes numerical and categorical data. However, it’s not obvious how raw text data is processed. Is it preprocessed in some way, for example with bag of words, word2vec, or pre-trained neural network embeddings?

Also, in the video it’s said that the SHAP framework is used for feature importance detection, but the AWS docs page says that the Gini importance method is used. Which one is correct?

Hi @vbogach,

Can you give us a bit more detail? What is the title of the Week 2 video you are referring to?

The title is Feature importance: SHAP.

In the video I did indeed say at 7:50 that a random forest model is created in the background to calculate the feature importance.
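
To give a feel for what a SHAP-style attribution computes on top of a fitted model, here is a minimal sketch of exact Shapley values in plain Python. It is only an illustration: `toy_model`, the input `x`, and the all-zeros `baseline` are hypothetical stand-ins, not the random forest or the procedure the service actually runs, and real SHAP libraries use much faster approximations than enumerating every ordering.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average each feature's marginal
    contribution over every possible feature ordering."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature "absent"
        prev = predict(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its real value
            new = predict(current)
            totals[i] += new - prev   # marginal contribution of feature i
            prev = new
    return [t / len(orderings) for t in totals]


# Hypothetical toy model (an assumption for illustration only):
# two additive terms plus an interaction between features 0 and 2.
def toy_model(f):
    return 2.0 * f[0] + 1.0 * f[1] + 3.0 * f[0] * f[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions add up to `toy_model(x) - toy_model(baseline)`, and the interaction term is split evenly between the two features involved in it.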

About the preprocessing, I suggest that you wait until you get to Course 2. You will learn about the use of raw text data.
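
In the meantime, here is a rough sketch of bag of words, one of the options named in the question, just to show how raw text can become numeric columns. This is an assumption-laden illustration and not a statement of what the service does internally; the helper `bag_of_words` and the sample sentences are made up for this example.

```python
from collections import Counter


def bag_of_words(docs):
    """Turn a list of strings into a sorted vocabulary and count vectors."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[word] for word in vocab])  # one column per word
    return vocab, rows


# Made-up sample documents.
docs = ["great product great price", "poor product"]
vocab, rows = bag_of_words(docs)
print(vocab)
print(rows)
```

Each row is then an ordinary numeric feature vector that a tree-based model could consume like any other numeric column.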
