First, you don’t need to recalculate y_train: it is passed to you as a parameter, so use it as it comes.
Now, to get y_treat_train, extract from y_train the entries at the indexes of the X_train rows that received treatment (TRTMT == 1).
Review these tips and try rewriting your code. Let me know how it goes.
In other words, you want the subset of y_train at the indexes of the treated rows you selected from X_train. This is much simpler than what you tried last time; it looks almost exactly like what you did in your original post.
Remember what was shown in one of the exercises: to select the rows of X_train that have not been treated, you can write X_train[X_train.TRTMT == 0]. The comparison X_train.TRTMT == 0 on its own returns a boolean mask (True for each row that meets the condition), and indexing X_train with that mask returns those rows.
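A minimal sketch of the difference between the condition and the selection. The data are made up, and the treatment column is assumed to be called TRTMT; check the actual column name in your notebook.

```python
import pandas as pd

# Toy stand-in for X_train; the column name "TRTMT" and all values are assumptions.
X_train = pd.DataFrame(
    {"age": [60, 45, 71, 52], "TRTMT": [1, 0, 1, 0]},
    index=[10, 3, 7, 1],  # non-default index, as a random train/test split leaves it
)

# The comparison alone gives a boolean Series (one True/False per row), not the rows.
untreated_mask = X_train.TRTMT == 0
print(untreated_mask.tolist())  # [False, True, False, True]

# Passing the mask back into X_train keeps only the rows where it is True.
X_control_train = X_train[untreated_mask]
print(X_control_train.index.tolist())  # [3, 1]
```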
You can subset the y variables the same way, with boolean indexing. For example, if X_train has a column that indicates whether a patient received treatment (let’s call it ‘treatment’), you can use that column to subset the y_train labels like this:
y_treat_train = y_train[X_train['treatment'] == 1]
This will select only the rows from y_train where the corresponding ‘treatment’ value in X_train is equal to 1.
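To see that a mask built from X_train picks out the matching labels in y_train, here is a small sketch with made-up data. It assumes y_train is a pandas Series sharing X_train’s index (what splitting a single DataFrame normally produces) and uses the assumed column name TRTMT.

```python
import pandas as pd

# Made-up data; "TRTMT", the labels, and the shuffled index are assumptions.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0]}, index=[10, 3, 7, 1])
y_train = pd.Series([0, 1, 1, 0], index=[10, 3, 7, 1], name="outcome")

# Build the mask from X_train and use it to select the matching entries of y_train.
treated_mask = X_train.TRTMT == 1
y_treat_train = y_train[treated_mask]

print(y_treat_train.index.tolist())  # [10, 7]
print(y_treat_train.tolist())        # [0, 1]
```

The selected labels keep their original index (10 and 7), which is what lets them be matched back to the treated rows of X_train.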
It is also important to make sure that the index of y_treat_train matches the index of X_treat_train. If they don’t match, the test case will fail. One way to guarantee this is to select from y_train using the index of X_treat_train:
y_treat_train = y_train.loc[X_treat_train.index]
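A quick way to confirm the indexes line up before running the test cases, sketched with made-up data (only the .loc line comes from the post; the column name TRTMT and the values are assumptions):

```python
import pandas as pd

# Made-up training data with a shuffled index.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0]}, index=[10, 3, 7, 1])
y_train = pd.Series([0, 1, 1, 0], index=[10, 3, 7, 1])

X_treat_train = X_train[X_train.TRTMT == 1]

# Select the labels using the index labels of the treated rows.
y_treat_train = y_train.loc[X_treat_train.index]

# Index.equals is True only if both indexes hold the same labels in the same order.
indexes_match = X_treat_train.index.equals(y_treat_train.index)
print(indexes_match)  # True
```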
Note that .loc selects rows by index label, which is what you want here. .iloc selects rows by integer position, so it only gives the right rows if you pass positions rather than the labels in X_treat_train.index.
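A small sketch of why the difference matters. With a shuffled index (made-up here, as a random split typically leaves it), passing the same numbers to .iloc silently selects different rows than .loc does:

```python
import pandas as pd

# Made-up labels with a shuffled index.
y_train = pd.Series([0, 1, 1, 0], index=[2, 0, 3, 1])
treated_labels = pd.Index([2, 3])  # stand-in for X_treat_train.index

# .loc looks up index labels: it returns the values labelled 2 and 3.
by_label = y_train.loc[treated_labels]
print(by_label.tolist())  # [0, 1]

# .iloc looks up integer positions: positions 2 and 3 are different rows,
# so the same numbers silently select the wrong labels.
by_position = y_train.iloc[treated_labels]
print(by_position.tolist())  # [1, 0]

# If you really want .iloc, convert the labels to positions first.
positions = y_train.index.get_indexer(treated_labels)
print(y_train.iloc[positions].tolist())  # [0, 1]
```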
Hope this answers your question.
Regards
Muhammad John Abbas