C2_W3_Assignment UNQ_C3 HomemadeKM

Hello,
I am unable to complete the for loop of the HomemadeKM function.
The problem is that when computing n_t = sum((df['Time'] >= t)), I get "Invalid comparison between dtype=int64 and str". Assuming df['Time'] is an array of strings, I don't understand how others in the community completed the comparison using only df['Time'] >= t, since strings cannot be compared this way. Thanks!
My lab code is yumlvzizuecs

Hi @Alberto_Zacchini,

You are getting an error because in your code you are summing df['Time'], which is an array of dates! Try getting the length of the array instead. Give it a try and let me know.

Regards,
Samuel

Thanks, but it still doesn't work. I get the same error:

    # compute n_t, number of people who survive to time t
    n_t = len(df[df['Time'] >= t])

    # compute d_t, number of people who die at time t
    d_t = len(df[(df['Event'] == 1) & (df['Time'] == t)])

Invalid comparison between dtype=int64 and str

The problem seems to occur only with the ">=" operator.

Hi @Alberto_Zacchini,

Your code above is correct and the tests should pass. Maybe there is an error in the rest of your code. Can you please send me a message with your code for UNQ_C3 so that I can check it?

regards,
Samuel
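
To illustrate why the two n_t snippets above are fine on their own, here is a minimal sketch on a made-up three-row dataframe (the values are assumed for illustration and are not the assignment data). When t is an integer, the sum form and the len form give the same count. The error appears only when an integer column is ordered against a string, and an equality test against a string returns all False instead of raising, which would explain why only ">=" seems affected.

```python
import pandas as pd

# Toy data, assumed for illustration only (not the assignment dataset).
df = pd.DataFrame({"Time": [5, 10, 15], "Event": [0, 1, 0]})

t = 10  # an integer time, like the values stored in df["Time"]

# Both forms count the rows with Time >= t.
n_t_sum = int((df["Time"] >= t).sum())
n_t_len = len(df[df["Time"] >= t])

# Equality against a string does not raise; it is simply False everywhere.
any_equal = bool((df["Time"] == "Time").any())

# An ordering comparison between an int64 column and a string raises TypeError.
try:
    df["Time"] >= "Time"
    error_raised = False
except TypeError:
    error_raised = True
```

If t turns out to be a string inside the loop, the thing being iterated over is worth checking before the comparison itself.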

Hi Samuel,
Thanks! This is the code:

    # UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
    def HomemadeKM(df):
        """
        Return KM estimate evaluated at every distinct
        time (event or censored) recorded in the dataset.
        Event times and probabilities should begin with
        time 0 and probability 1.

        Example:

        input:

             Time  Censor
        0       5       0
        1      10       1
        2      15       0

        correct output:

        event_times: [0, 5, 10, 15]
        S: [1.0, 1.0, 0.5, 0.5]

        Args:
            df (dataframe): dataframe which has columns for Time
                            and Event, defined as usual.

        Returns:
            event_times (list of ints): array of unique event times
                                        (begins with 0).
            S (list of floats): array of survival probabilities, so that
                                S[i] = P(T > event_times[i]). This
                                begins with 1.0 (since no one dies at time
                                0).
        """
        # individuals are considered to have survival probability 1
        # at time 0
        event_times = [0]
        p = 1.0
        S = [p]

        # START CODE HERE (REPLACE INSTANCES OF 'None' with your code)

        # get collection of unique observed event times
        observed_event_times = pd.Series.unique(df['Time'])

        # sort event times
        observed_event_times = df.sort_values(by='Time')

        # iterate through event times
        for t in observed_event_times:

            # compute n_t, number of people who survive to time t
            n_t = len(df[df['Time'] >= t])

            # compute d_t, number of people who die at time t
            d_t = len(df[(df['Event'] == 1) and (df['Time'] == t)])

            # update p
            p *= 1-(d_t/n_t)

            # update S and event_times (ADD code below)
            # hint: use append
            event_times.append(t)
            S.append(p)

        # END CODE HERE

        return event_times, S

Hi @Alberto_Zacchini ,

In the following code:


    # sort event times
    observed_event_times = df.sort_values(by='Time')

you are sorting the entire dataframe, not the event times as requested.
Correct this and let me know if all works fine.

Regards,
Samuel
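
The effect of sorting the whole dataframe can be reproduced on a toy example. This is only a sketch: the data is made up, and np.sort is one assumed way to sort the unique times, not necessarily the one the assignment expects. Iterating over a dataframe yields its column labels, which are strings, so t becomes 'Time' and then 'Event' inside the loop. That is where the int64 versus str comparison comes from.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only (not the assignment dataset).
df = pd.DataFrame({"Time": [15, 5, 10, 5], "Event": [0, 0, 1, 1]})

# Iterating over a DataFrame (sorted or not) yields column labels, not rows.
labels = list(df.sort_values(by="Time"))

# Sorting the array of unique times instead yields the integer times.
observed_event_times = np.sort(df["Time"].unique())
times = [int(t) for t in observed_event_times]
```

Here labels holds the two column names, while times holds the distinct times in ascending order.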

Hi Samuel,
thank you so much! It works very well now!
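
For anyone landing on this thread later, the docstring example can be checked end to end with a short sketch. The function name km_sketch is hypothetical, the second column is named Event as in the Args section, and this is one possible way to write the loop, not the official solution. It sorts the unique times and combines the two conditions element-wise with & (Python's and does not work on a pandas Series).

```python
import numpy as np
import pandas as pd

def km_sketch(df):
    """Kaplan-Meier estimate at every distinct recorded time (sketch only)."""
    event_times = [0]
    p = 1.0
    S = [p]
    # Iterate over the sorted unique times, not over the dataframe itself.
    for t in np.sort(df["Time"].unique()):
        # n_t: number of people still at risk at time t
        n_t = len(df[df["Time"] >= t])
        # d_t: number of people who die at time t (element-wise &)
        d_t = len(df[(df["Event"] == 1) & (df["Time"] == t)])
        p *= 1 - d_t / n_t
        event_times.append(int(t))
        S.append(p)
    return event_times, S

# The example from the docstring, with the second column named Event.
df = pd.DataFrame({"Time": [5, 10, 15], "Event": [0, 1, 0]})
event_times, S = km_sketch(df)
```

On this input the sketch returns the event_times and S values listed as the correct output in the docstring.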