AI4M Course 3 Week 1 UNQ_11

MrHuanwang · October 12, 2022, 4:40pm

Hi,

I’m stuck on the Course 3 Week 1 assignment, exercise 11 (# UNQ_C11, def treatment_dataset_split), at the part below.

 # From the training set, get the labels of patients who received treatment
    y_treat_train = y_train[X_treat_train.index]

Frankly I’m not 100% sure exactly what the question is asking for.

Is it just to subset the input y_train numpy array into the output array y_treat_train WITHOUT adding another column for labels? More specifically, is y_treat_train expected to have 1 column?
If y_treat_train is expected to have an additional column for labels, then what should the labels be? The text cell below uses a column called ID, but in later cells the actual data X_dev doesn’t have an ID column. So it doesn’t seem correct to hardcode the ID column as the labels of y_treat_train inside the function. This is VERY confusing to me.
I tried using the index as shown in the snippet above, but it failed because the input y_train is an array with no index, unlike the input X_train. Normally, as in the rest of the assignment, the inputs X and y should share the same index, and both should be DataFrames.
Based on the description of the function treatment_dataset_split, the output y_treat_train is expected to be an np.array. However, in later cells the output of treatment_dataset_split is used as input for holdout_grid_search, which expects y_treat_train to be a 1-column DataFrame. So this is another confusing inconsistency to me.
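
The indexing failure described above can be reproduced with a minimal sketch. The toy data, the TRTMT column name, and the index labels are assumptions made for illustration and are not taken from the assignment. When the DataFrame's index labels are not 0..n-1, passing them to a plain NumPy array treats them as positions, which can raise an IndexError; a boolean mask built from X_train is positional and avoids the problem.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: index labels are not 0..n-1, as can happen after a shuffled split.
X_train = pd.DataFrame(
    {"TRTMT": [1, 0, 1, 0], "age": [50, 61, 47, 70]},
    index=[10, 3, 7, 22],
)
y_train = np.array([1, 0, 0, 1])  # plain array: only positions 0..3 exist

# Rows of X_train for treated patients (assumed indicator column "TRTMT").
X_treat_train = X_train[X_train["TRTMT"] == 1]

# NumPy interprets the index labels (10 and 7) as positions, so this lookup fails.
try:
    y_train[X_treat_train.index]
    label_lookup_failed = False
except IndexError:
    label_lookup_failed = True

# A boolean mask selects by position, so it does not depend on the index labels.
treat_mask = (X_train["TRTMT"] == 1).to_numpy()
y_treat_train = y_train[treat_mask]
```

Here y_treat_train stays a 1-dimensional np.array, with no extra label column added.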

Hopefully I’m making some sense here. This truly is quite frustrating. I’d appreciate any clarification.

Thanks

Huan

MrHuanwang · October 12, 2022, 7:07pm

I seem to have made it work by adding two lines at the beginning that force the inputs y_train / y_val to be pandas DataFrames with the same index as the corresponding X_train / X_val:

y_train = pd.DataFrame(y_train, index = X_train.index)
y_val = pd.DataFrame(y_val, index = X_val.index)

After this, I can successfully generate y_treat/control_train/val, for example:

 y_treat_train = y_train.loc[X_treat_train.index]

This way, it works for the unit test, which uses arrays as input for y_train/val, and also for the later cell, which uses DataFrames as input.
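
The workaround above can be sketched as a small helper. The name align_labels, the toy data, and the TRTMT and outcome column names are hypothetical and only serve the illustration. The sketch assumes that an array y has its rows in the same order as X.

```python
import numpy as np
import pandas as pd

def align_labels(X, y):
    """Hypothetical helper: return y as a DataFrame that can be indexed with X's labels."""
    if isinstance(y, (pd.Series, pd.DataFrame)):
        # Already labeled: keep the existing index instead of overwriting it.
        return pd.DataFrame(y)
    # Plain array: rows are assumed to be in the same order as the rows of X.
    return pd.DataFrame(np.asarray(y), index=X.index)

# Toy features with non-contiguous index labels and an assumed treatment column.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1]}, index=[5, 2, 9])
X_treat_train = X_train[X_train["TRTMT"] == 1]

# The same labels supplied once as an array and once as a DataFrame.
y_from_array = align_labels(X_train, np.array([0, 1, 1]))
y_from_frame = align_labels(
    X_train, pd.DataFrame({"outcome": [0, 1, 1]}, index=X_train.index)
)

# Label-based selection now works for both input types.
y_treat_a = y_from_array.loc[X_treat_train.index]
y_treat_b = y_from_frame.loc[X_treat_train.index]
```

Note that with this approach the selected labels come back as a DataFrame rather than an np.array, so the output type differs from the one named in the function description.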

However, the autograder is still failing.

I’d appreciate it if someone could look into this mysterious bug.

Thanks

Huan

Mubsi · October 13, 2022, 12:46pm

Hi @MrHuanwang,

I have replied to your failing autograder post.

Cheers,
Mubsi
