Challenge with the C3_W1_Assignment

Hello everyone,

I’m currently facing a challenge with the C3_W1_Assignment.

My main issue revolves around this particular line of code, which is causing me some trouble:

treatment_model, best_hyperparam_treat = holdout_grid_search(RandomForestClassifier, X_treat_train, y_treat_train, X_treat_val, y_treat_val, hyperparams)

The problem is that the X_treat_train dataset no longer contains the TRTMT column, as it was removed by the treatment_dataset_split function.

However, the c_statistic function (the estimator_score), which is a part of my holdout_grid_search function, requires the TRTMT column to identify the best model.

Here are the parameters needed for the function c_statistic:

c_statistic(pred_rr, y, w, random_seed=0)

where w is an array of true treatments.

The conundrum deepens because, even if X_treat_train included the TRTMT column, filled entirely with ones, it wouldn’t be feasible to split it into two arrays within the c_statistic function - one with TRTMT==1 and another with TRTMT==0 - since we only have TRTMT==1 data.
Consequently, the c_for_benefit_score function, which is a component of c_statistic, fails to work as intended.
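
To make the issue concrete, here is a toy sketch with made-up numbers. The helper split_by_treatment is hypothetical and is not the assignment's code; it only assumes that the score needs both a treated and an untreated group to compare:

```python
def split_by_treatment(pred_rr, y, w):
    """Hypothetical helper: split (prediction, outcome) pairs by treatment flag."""
    treated = [(p, o) for p, o, t in zip(pred_rr, y, w) if t == 1]
    control = [(p, o) for p, o, t in zip(pred_rr, y, w) if t == 0]
    return treated, control

# Made-up predictions and outcomes; w is all ones, as it would be
# for data filtered to TRTMT == 1.
pred_rr = [0.10, 0.20, -0.05, 0.30]
y = [0, 1, 0, 1]
w = [1, 1, 1, 1]

treated, control = split_by_treatment(pred_rr, y, w)

# Assumption: treated and untreated patients are compared one-to-one,
# so the number of usable pairs is limited by the smaller group.
n_pairs = min(len(treated), len(control))
print(len(treated), len(control), n_pairs)
```

With only treated patients the untreated group is empty, so there is nothing to compare against.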

This has left me quite perplexed, and I would greatly appreciate any guidance or assistance on this matter.

Best regards,
Alex

Hi @Learning_Guy

Welcome! Good to see you on the community.

I'm not sure whether you have already got past this problem, and frankly I don't fully understand it yet.

X_(anything) is typically the feature set and y_(anything) is typically the label column.

So here…
X_treat_train = X_train[X_train["TRTMT"]==1]
X_treat_train = X_treat_train.drop('TRTMT', axis=1)
is indeed the correct statement as it removes TRTMT.

And the label column is:
y_treat_train = y_train[X_train["TRTMT"]==1]
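
As a minimal sketch of those statements on toy data (the "age" column and all values are made up for illustration; only TRTMT comes from the assignment, and pandas is assumed):

```python
import pandas as pd

# Toy data: "age" and the values are invented, only TRTMT is a real column name.
X_train = pd.DataFrame({
    "TRTMT": [1, 0, 1, 0],
    "age": [61, 54, 70, 48],
})
y_train = pd.Series([1, 0, 0, 1], index=X_train.index)

# Boolean mask selecting the treatment arm.
treated = X_train["TRTMT"] == 1

# Features: keep treated rows, then drop the now-constant TRTMT column.
X_treat_train = X_train[treated].drop("TRTMT", axis=1)

# Labels: select the same rows so features and labels stay aligned.
y_treat_train = y_train[treated]

print(X_treat_train.columns.tolist(), y_treat_train.tolist())
```

The mask is computed from the original X_train, so it can be reused for y_train even after TRTMT has been dropped from the features.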

Best Regards.
Jaidev

Hello Jaidev,

Thank you very much for your message!

The problem still exists, so I'd like to describe it in more detail.

My task involves completing the function c_statistic(pred_rr, y, w, random_seed=0), where w represents an array of true treatments. This function is integral to my assignment and needs to be incorporated into another function, holdout_grid_search.

The holdout_grid_search function requires the following parameters:

  • RandomForestClassifier
  • X_treat_train
  • y_treat_train
  • X_treat_val
  • y_treat_val
  • hyperparams

The challenge arises because X_treat_train and X_treat_val no longer include the w parameter. This leads to my question: How can I effectively integrate c_statistic(pred_rr, y, w, random_seed=0) with X_treat_train and X_treat_val in the holdout_grid_search function?

I greatly appreciate your time and any guidance you can provide on this matter.

Best regards,
Alex

Hi, @Learning_Guy

Let me confirm the intent of your question.
Is your question how to supply the argument w to c_statistic when w is NOT among the arguments of holdout_grid_search?
In that case, it would indeed be difficult to call c_statistic inside holdout_grid_search.

If so, can you tell us the relevant part of the code?
We would like to see where c_statistic is called in the holdout_grid_search function.
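
For reference, here is a minimal sketch of a holdout grid search with a pluggable scorer. It is not the assignment's implementation: ThresholdClassifier, accuracy, and holdout_grid_search_sketch are hypothetical names, and the sketch assumes the scorer only needs the validation labels and the predictions, so no w is involved.

```python
import itertools

class ThresholdClassifier:
    """Hypothetical stand-in for a real estimator such as RandomForestClassifier."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if row[0] > self.threshold else 0 for row in X]

def accuracy(y_true, y_pred):
    """Toy scorer that needs only labels and predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def holdout_grid_search_sketch(clf_class, X_train, y_train, X_val, y_val,
                               hyperparams, score_fn):
    """Try every hyperparameter combination and keep the best validation score."""
    best_model, best_params, best_score = None, None, float("-inf")
    names = list(hyperparams)
    for values in itertools.product(*(hyperparams[n] for n in names)):
        params = dict(zip(names, values))
        model = clf_class(**params).fit(X_train, y_train)
        score = score_fn(y_val, model.predict(X_val))
        if score > best_score:
            best_model, best_params, best_score = model, params, score
    return best_model, best_params

# Made-up data; the same toy set is reused for training and validation.
X_toy = [[0.2], [0.4], [0.6], [0.8]]
y_toy = [0, 0, 1, 1]
model, params = holdout_grid_search_sketch(
    ThresholdClassifier, X_toy, y_toy, X_toy, y_toy,
    {"threshold": [0.1, 0.5, 0.7]}, accuracy,
)
print(params)
```

If the scorer inside your function has this shape, w never has to be passed in; if it really calls c_statistic, seeing that call will help us understand where w is expected to come from.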

Kind regards,
Nakamura

Hello Nakamura,

Thank you for your message. I appreciate your prompt response. Would it be convenient for you if I send the code via a private message?

Best regards,
Alex

Hi, @Learning_Guy

Sorry for the late response.
Sure.

Best regards,
Nakamura