UNQ_C11 treatment_dataset_split

wdduncan · January 17, 2023, 6:48pm

This seems to be a problem many are having …
I am unsure how to subset the y variables. E.g.:

 # From the training set, get the labels of patients who received treatment
y_treat_train = None

I’ve tried multiple options, such as getting the outcomes for patients that had the treatment:

y_train = pd.DataFrame(y_train, index = X_train.index)
y_treat_train = y_train.loc[X_treat_train.index]

The unit test still fails:

Error: Wrong output.
 2  Tests passed
 1  Tests failed

Any help appreciated

Juan_Olano · January 17, 2023, 8:00pm

Hi @wdduncan ,

Regarding getting y_treat_train:

First, you don’t have to recalculate y_train, as this is given to you via parameters. You should use y_train as it comes.

Now, to get y_treat_train, you will want to extract from y_train the items at the positions that, according to X_train, received treatment (TRTMT == 1).

Review these tips and try to reframe your command. Let me know how it goes.

Juan

wdduncan · January 17, 2023, 8:28pm

Hi Juan,
Do you mean like this?:

y_treat_train = y_train[X_treat_train.index]

This throws an index out of bounds error. E.g., here is the head of X_treat_train:

This results in the following error when trying to subset y_treat_train:

IndexError: index 99 is out of bounds for axis 0 with size 75

Given the way X_treat_train is indexed, the error makes sense.
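
A minimal sketch of why this error can occur, assuming y_train is a positional NumPy array while X_train keeps non-contiguous row labels from an earlier split. The toy data below is hypothetical and is not the assignment's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: X_train keeps shuffled row labels, y_train is a plain array.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0]}, index=[99, 3, 2, 1])
y_train = np.array([1, 0, 0, 1])

X_treat_train = X_train[X_train.TRTMT == 1]  # row labels 99 and 2

# Using row *labels* as array *positions* fails: label 99 is not a valid position.
try:
    y_train[X_treat_train.index.to_numpy()]
except IndexError as err:
    message = str(err)
    print(message)

# A boolean mask is positional, so the labels never come into play.
mask = (X_train.TRTMT == 1).to_numpy()
y_treat_train = y_train[mask]
print(y_treat_train)
```

The mask has one entry per row of X_train in the same order as y_train, which is why it works regardless of how the index is labeled.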

Juan_Olano · January 17, 2023, 8:40pm

Almost like that; you are pretty close.

You have to do a couple of refinements:

First, try using the original X_train
Second, from that X_train, select the entries that have treatment.

wdduncan · January 17, 2023, 10:50pm

Ugghhh … I tried this, but it is not passing the unit tests

y_treat_train = [y_train[i] for i, v in enumerate(X_train.TRTMT) if v == 1]

Juan_Olano · January 18, 2023, 12:11am

So you are trying to get from y_train the subset of entries that correspond to the selected X_train items. It is much simpler than your last attempt. It looks almost exactly like what you did in your original post.

Remember what was shown in one of the exercises: If you want to select the elements of X that have not been treated, you can do something like X_train.TRTMT == 0, and this gives you a boolean mask that selects the elements meeting this condition.
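
As an illustration of that tip, here is a small sketch using a hypothetical toy frame (the "age" column and the values are made up; only the TRTMT column name comes from this thread):

```python
import pandas as pd

# Hypothetical toy frame with a binary treatment column.
X_train = pd.DataFrame({"age": [61, 45, 70, 52], "TRTMT": [1, 0, 0, 1]})

# The comparison yields a boolean Series with one entry per row.
control_mask = X_train.TRTMT == 0

# Indexing with the mask keeps only the rows where it is True.
X_control_train = X_train[control_mask]

# Negating the mask gives the treated rows, assuming TRTMT is strictly 0 or 1.
X_treat_train = X_train[~control_mask]

print(control_mask.tolist())
print(X_control_train)
```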

wdduncan · January 18, 2023, 1:07am

You are right
I can shorten what I posted previously to:

y_treat_train = y_train[X_train.TRTMT == 1]

However, this doesn’t pass the unit tests

wdduncan · January 19, 2023, 3:25pm

@Juan_Olano Any hints/ideas as to why my unit tests are still failing after the change I made?

Juan_Olano · January 19, 2023, 3:35pm

@wdduncan , can you please send me your code via DM? I’ll look at it in detail and try to provide more hints.

wdduncan · January 19, 2023, 4:05pm

Sure thing. Just did.

Muhammad_John_Abbas · January 21, 2023, 11:57am

Hi @wdduncan

You can subset the y variables by using boolean indexing. For example, if you have a column in your X_train dataframe that indicates whether a patient received treatment (let’s call it ‘treatment’), you can use that column to subset the y_train labels like this:

y_treat_train = y_train[X_train['treatment'] == 1]

This will select only the rows from y_train where the corresponding ‘treatment’ value in X_train is equal to 1.

It is also important to make sure that the index of y_treat_train matches the index of X_treat_train. If they don’t match, the test case will fail.

y_treat_train = y_train.loc[X_treat_train.index]

You can also use .loc (label-based) or .iloc (position-based) to subset the y variables based on the rows that correspond to the treatment.
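
A minimal sketch of that index check, assuming y_train is a pandas Series that shares X_train's index (this is an assumption; if y_train is a NumPy array, .loc is not available). The toy data is hypothetical:

```python
import pandas as pd

# Hypothetical toy data; y_train is assumed to share X_train's row labels.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0]}, index=[99, 3, 42, 7])
y_train = pd.Series([1, 0, 0, 1], index=X_train.index)

X_treat_train = X_train[X_train.TRTMT == 1]

# Two equivalent selections when the indexes are aligned:
by_mask = y_train[X_train.TRTMT == 1]          # boolean mask
by_label = y_train.loc[X_treat_train.index]    # label lookup

# Confirm the labels line up with the treated feature rows.
print(by_mask.equals(by_label))
print(by_label.index.equals(X_treat_train.index))
```

Both approaches rely on the labels of y_train matching those of X_train; the boolean mask is the safer choice when that is uncertain.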

I hope this answers your question.
Regards
Muhammad John Abbas

wdduncan · January 21, 2023, 1:13pm

Thanks @Muhammad_John_Abbas !
