Hi,
I’m stuck on Course 3, Week 1 assignment, Exercise 11 (`# UNQ_C11 def treatment_dataset_split`), specifically the part below:
```python
# From the training set, get the labels of patients who received treatment
y_treat_train = y_train[X_treat_train.index]
```
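A small sketch with made-up data (the index values and array contents are hypothetical) shows why the snippet above can fail: `X_treat_train.index` holds the DataFrame's row *labels*, and a NumPy array has no labels, so it reads those values as *positions*. If a label is larger than the array length you get an `IndexError`; if the labels happen to fall in range, you silently get the wrong rows instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the DataFrame index keeps the original row labels,
# which a shuffled train/test split leaves out of order and non-contiguous.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0]}, index=[7, 2, 5, 0])
y_train = np.array([1, 0, 1, 1])  # positionally aligned with X_train's rows

X_treat_train = X_train[X_train["TRTMT"] == 1]
print(X_treat_train.index.tolist())  # labels, not positions: [7, 5]

# NumPy treats the Index values as positions. Label 7 is out of bounds
# for a length-4 array, so this raises IndexError.
try:
    y_train[X_treat_train.index]
    failed = False
except IndexError as err:
    failed = True
    print("IndexError:", err)
```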
Frankly I’m not 100% sure exactly what the question is asking for.
- Is it just to subset the input `y_train` numpy array into the output array `y_treat_train` WITHOUT adding another column for labels? More specifically, is `y_treat_train` expected to have 1 column?
- If `y_treat_train` is expected to have an additional column for labels, then what should the labels be? The text cell below uses a column called `ID`, but in later cells the actual data `X_dev` doesn’t have an `ID` column. So hardcoding the function to use the `ID` column as the labels of `y_treat_train` doesn’t seem correct. This is VERY confusing to me.
- I tried using `index` as shown in the snippet above, but it failed because the input `y_train` is an array without an index, unlike the input `X_train`. Normally, as in the rest of the assignment, the inputs X and y should share the same index, and both should be dataframes.
- Based on the description of `def treatment_dataset_split`, the output `y_treat_train` is expected to be an `np.array`. However, later cells use the output of `treatment_dataset_split` as input to `holdout_grid_search`, which expects `y_treat_train` to be a 1-column dataframe. This is another confusing inconsistency.
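One way around the index mismatch, sketched with hypothetical toy data (the `AGE` column and the `y` column name are made up, and this is not presented as the official solution), is to build a boolean mask from `X_train` once and apply it to both `X_train` and `y_train`. Because the mask is positional, it selects the same rows whether `y_train` is a NumPy array or a pandas Series, and the result stays a plain 1-D array with no extra label column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the assignment's X_train / y_train.
X_train = pd.DataFrame({"TRTMT": [1, 0, 1, 0], "AGE": [60, 55, 70, 48]},
                       index=[7, 2, 5, 0])
y_train = np.array([1, 0, 1, 1])

# One positional boolean mask, computed from X_train, selects matching rows in both.
treat_mask = (X_train["TRTMT"] == 1).to_numpy()

X_treat_train = X_train[treat_mask].drop(columns="TRTMT")
y_treat_train = y_train[treat_mask]  # still a 1-D np.ndarray, no label column added

print(y_treat_train)        # [1 1]
print(y_treat_train.shape)  # (2,)

# Only if a downstream function insists on a one-column DataFrame: wrap the
# array and reuse X's index so the two stay aligned ("y" is a made-up name).
y_treat_df = pd.DataFrame(y_treat_train, index=X_treat_train.index, columns=["y"])
print(y_treat_df.shape)     # (2, 1)
```

Whether the wrapping step is needed depends on what `holdout_grid_search` actually accepts, which is worth checking against its docstring.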
Hopefully I’m making some sense here. This is truly quite frustrating, and I’d appreciate any clarification.
Thanks
Huan