UNQ_C11 treatment_dataset_split - How to define y_treat/control_train/val?

Hello,

I have been working on this for several hours and have searched Discourse, the Coursera forum, and the wider internet, but I am still stuck.

In UNQ_C11 of C3M1_Assignment, in the function treatment_dataset_split, could someone please point out how to define the y_treat/control_train/val variables correctly?

I can't seem to get the list of outcome values that corresponds to the patients in the filtered (treatment yes / no) dataframes. Currently y_filtered does not have the same number of items as X_filtered, which leads to a shape error in the last couple of cells of the assignment.

I have tried to pick the values for y_filtered either via the (reset) index of X_filtered, or by adding y to X as a column and then converting X_filtered["y"] to a list to get y_filtered. Neither has worked so far.
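To make the idea concrete, here is a rough sketch of keeping X and y aligned by building one boolean mask and applying it to both. This is not the assignment's code: the column name "TRTMT", the toy values, and the assumption that y is a NumPy array in the same row order as X are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data. The column name "TRTMT" and all values are assumptions
# for illustration only, not taken from the assignment.
X = pd.DataFrame(
    {"TRTMT": [1, 0, 1, 0, 1], "age": [61, 54, 47, 70, 38]},
    index=[10, 11, 12, 13, 14],  # deliberately not 0..n-1
)
y = np.array([0, 1, 1, 0, 1])  # outcomes, assumed to be in the same row order as X

# Build the mask once as a plain boolean array, so it selects by position
# and does not depend on the DataFrame's index labels.
treated = (X["TRTMT"] == 1).to_numpy()

# Apply the same mask (or its negation) to both X and y.
X_treat, y_treat = X.loc[treated], y[treated]
X_control, y_control = X.loc[~treated], y[~treated]

print(X_treat.shape[0], y_treat.shape[0])
print(X_control.shape[0], y_control.shape[0])
```

Because the mask is positional, resetting or not resetting the index of X should not change which outcomes are selected, as long as X and y start out in the same row order.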

What is a good strategy to choose values from y for y_filtered? What methods could one use?
Any help would be much appreciated!
Thanks in advance

My LabID: tekoaqlt

I would suggest K-fold cross-validation (KFold in scikit-learn).

As an alternative to train-test split, K-fold provides a mechanism to use all data points in your dataset as both the training data and test data.
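A minimal sketch of what that looks like with scikit-learn's KFold, using made-up toy arrays (the data, n_splits=5, and random_state=0 are arbitrary choices for illustration). Each fold yields index arrays that are applied to both X and y, so the features and outcomes stay the same length.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data for illustration only: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Count how often each sample lands in a validation fold.
val_counts = np.zeros(len(X), dtype=int)

for train_idx, val_idx in kf.split(X):
    # The same index arrays are used for X and y, keeping them aligned.
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    val_counts[val_idx] += 1

print(val_counts)
```

Every sample is used for validation in exactly one fold and for training in the remaining folds.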