UNQ_C11 treatment_dataset_split

This seems to be a problem many are having …
I am unsure how to subset the y variables. E.g.:

 # From the training set, get the labels of patients who received treatment
y_treat_train = None

I’ve tried multiple options, such as getting the outcomes for patients that had the treatment:

y_train = pd.DataFrame(y_train, index = X_train.index)
y_treat_train = y_train.loc[X_treat_train.index]

The unit test still fails:

Error: Wrong output.
 2  Tests passed
 1  Tests failed

Any help appreciated :slight_smile:

Hi @wdduncan ,

Regarding getting y_treat_train:

First, you don’t have to recalculate y_train, as this is given to you via parameters. You should use y_train as it comes.

Now, to get y_treat_train, you will want to extract from y_train the items at those indexes that, according to X_train, have received treatment (TRTMT == 1).

Review these tips and try to reframe your command. Let me know how it goes.

Juan

Hi Juan,
Do you mean like this?:

y_treat_train = y_train[X_treat_train.index]

This throws an index out of bounds error. E.g., here is the head of X_treat_train:

    ID
6    6
99  99
82  82
60  60
80  80

This results in the following error when trying to subset y_treat_train:

IndexError: index 99 is out of bounds for axis 0 with size 75

Given the way X_treat_train is indexed, the error makes sense.
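
A minimal sketch of this mismatch, assuming y_train is a NumPy array (the "size 75" in the error suggests a container indexed by position) while X_train keeps the row labels of the full dataset. The toy values are made up for illustration and are not the assignment data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: X_train keeps row labels from the full dataset,
# while y_train is a plain NumPy array addressed by position 0..3.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0]}, index=[6, 99, 82, 60])
y_train = np.array([0, 1, 1, 0])

# Treated rows keep their original labels (6 and 82).
X_treat_train = X_train[X_train.TRTMT == 1]

try:
    # NumPy reads the labels 6 and 82 as positions, which do not exist here.
    y_train[X_treat_train.index]
    raised = False
except IndexError:
    raised = True

print(raised)
```

Under these assumptions the labels in X_treat_train.index are larger than the length of y_train, so they cannot be used as positions.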

Almost like that :slight_smile: you are pretty close.

You have to do a couple of refinements:

  • First, try using the original X_train
  • Second, from that X_train, select the entries that have treatment.

Ugghhh … I tried this, but it is not passing the unit tests :frowning:

y_treat_train = [y_train[i] for i, v in enumerate(X_train.TRTMT) if v == 1]

So you are trying to get from y_train the subset of entries that correspond to the treated items in X_train. It is much simpler than your last attempt, and it looks almost exactly like what you did in your original post.

Remember what was shown in one of the exercises: if you want to select the elements of X that have not been treated, you can do something like X_train.TRTMT == 0, and this gives you a boolean mask that selects the elements meeting this condition.

You are right :slight_smile:
I can shorten what I posted previously to:

y_treat_train = y_train[X_train.TRTMT == 1]

However, this doesn’t pass the unit tests :frowning:

@Juan_Olano Any hints/ideas as to why my unit tests are still failing after the change I made?

@wdduncan , can you please send me your code via DM? I’ll look at it in detail and try to provide more hints.

Sure thing. Just did.

Hi @wdduncan

You can subset the y variables by using boolean indexing. For example, if you have a column in your X_train dataframe that indicates whether a patient received treatment (let’s call it ‘treatment’), you can use that column to subset the y_train labels like this:

y_treat_train = y_train[X_train['treatment'] == 1]

This will select only the rows from y_train where the corresponding ‘treatment’ value in X_train is equal to 1.
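
As a hedged illustration of the boolean indexing described here, the sketch below uses made-up data and the placeholder column name 'treatment' from this post. It covers two assumed cases, since the thread does not say whether y_train is a pandas Series or a NumPy array:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 'treatment' is the placeholder column name used above.
X_train = pd.DataFrame({"treatment": [1, 0, 1, 0]}, index=[6, 99, 82, 60])

# Boolean Series with one True/False entry per row of X_train.
mask = X_train["treatment"] == 1

# Case 1 (assumption): y_train is a Series sharing X_train's index.
y_series = pd.Series([0, 1, 1, 0], index=X_train.index)
y_treat_series = y_series[mask]

# Case 2 (assumption): y_train is a NumPy array, so pass a plain boolean array
# and the selection is done by position rather than by label.
y_array = np.array([0, 1, 1, 0])
y_treat_array = y_array[mask.to_numpy()]
```

In both cases the mask has exactly one entry per row of X_train, so no row labels are ever interpreted as positions.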

It is also important to make sure that the index of y_treat_train matches the index of X_treat_train. If they don’t match, the test case will fail.

y_treat_train = y_train.loc[X_treat_train.index]

You can also use the .loc indexer (label-based) or the .iloc indexer (position-based) to subset the y variables using the rows that correspond to the treatment.

Again, check that the indexes of y_treat_train and X_treat_train match; if they don’t, the test case will fail.
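
One way to check this is sketched below. It assumes y_train is a pandas Series that shares its index with X_train, which may not hold in the graded notebook, and it uses made-up data:

```python
import pandas as pd

# Hypothetical toy data: y_train is assumed to carry the same labels as X_train.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0]}, index=[6, 99, 82, 60])
y_train = pd.Series([0, 1, 1, 0], index=X_train.index)

# Select the treated rows, then pull the matching labels out of y_train.
X_treat_train = X_train[X_train.TRTMT == 1]
y_treat_train = y_train.loc[X_treat_train.index]

# Index.equals compares the two indexes element by element, in order.
same_index = y_treat_train.index.equals(X_treat_train.index)
print(same_index)
```

If y_train is a NumPy array instead, it has no index to compare, and .loc is not available on it.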

Hope this answers your question.
Regards
Muhammad John Abbas

Thanks @Muhammad_John_Abbas !