Hello,
I am unable to complete the for loop of the HomemadeKM function.
The problem is that when computing n_t = sum((df['Time'] >= t)), I get "Invalid comparison between dtype=int64 and str". Assuming that df['Time'] is an array of strings, I can't understand how others in the community were able to complete the comparison using only df['Time'] >= t, because strings cannot be compared this way. Thanks!
My lab code is yumlvzizuecs

You are getting an error because your code sums df['Time'], which is an array of dates. Try getting the length of the array instead. Give it a try and let me know.

Regards,
Samuel

Thanks, but it still doesn't work. I get the same error:

# compute n_t, number of people who survive to time t

n_t = len(df[df['Time'] >= t])

# compute d_t, number of people who die at time t

d_t = len(df[(df['Event'] == 1) & (df['Time'] == t)])

Invalid comparison between dtype=int64 and str

The problem seems to occur only with the ">=" operator.

Your code above is correct and the tests should pass. Maybe there is an error in the rest of your code. Can you please send me a message with your code for UNQ_C3 so that I can check it?
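
In the meantime, here is a small sketch of why only ">=" complains. It assumes a toy dataframe (the values are made up; only the column names come from this thread) and a value of t that has accidentally become a string. With an integer t, both ways of counting give the same number. With a string t, "==" quietly returns all False, while ">=" raises the error quoted above.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the thread.
df = pd.DataFrame({"Time": [5, 10, 15], "Event": [0, 1, 0]})

t = 10  # an integer time, as intended
mask = df["Time"] >= t
n_by_sum = int(mask.sum())   # counts the True entries of the mask
n_by_len = len(df[mask])     # counts the rows that pass the filter

t_bad = "Time"  # assumption: t is a string, e.g. a column label picked up by mistake

# Equality against a string does not raise; it is simply False everywhere.
eq_result = (df["Time"] == t_bad).tolist()

# Ordering comparisons against a string raise a TypeError.
try:
    df["Time"] >= t_bad
    ordering_error = None
except TypeError as exc:
    ordering_error = str(exc)

print(n_by_sum, n_by_len, eq_result, ordering_error)
```

So if the comparison fails, it is worth printing t and type(t) inside the loop to check what is actually being compared.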

regards,
Samuel

Hi Samuel,
Thanks! This is the code:

# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)

"""
Return KM estimate evaluated at every distinct
time (event or censored) recorded in the dataset.
Event times and probabilities should begin with
time 0 and probability 1.

Example:

input:

Time Censor
0 5 0
1 10 1
2 15 0

correct output:

event_times: [0, 5, 10, 15]
S: [1.0, 1.0, 0.5, 0.5]

Args:
df (dataframe): dataframe which has columns for Time
and Event, defined as usual.

Returns:
event_times (list of ints): array of unique event times
(begins with 0).
S (list of floats): array of survival probabilities, so that
S[i] = P(T > event_times[i]). This
begins with 1.0 (since no one dies at time
0).
"""

# at time 0

event_times = [0]
p = 1.0
S = [p]

# get collection of unique observed event times

observed_event_times = pd.Series.unique(df['Time'])

# sort event times

observed_event_times = df.sort_values(by='Time')

# iterate through event times

for t in observed_event_times:

    # compute n_t, number of people who survive to time t

    n_t = len(df[df['Time'] >= t])

    # compute d_t, number of people who die at time t

    d_t = len(df[(df['Event'] == 1) & (df['Time'] == t)])

    p *= 1 - (d_t / n_t)

    # hint: use append

    event_times.append(t)
    S.append(p)

### END CODE HERE

return event_times, S

In the following code:

"# sort event times

observed_event_times = df.sort_values(by='Time')"

you are sorting the entire dataframe, not the event times as requested.
Correct this and let me know if everything works.
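
To see why this leads to the "int64 and str" error, here is a minimal sketch on a made-up dataframe (the values are hypothetical; the column names follow this thread). Looping over a dataframe walks its column labels, so t ends up being a string such as 'Time' rather than a number. Sorting the unique values of the Time column is one possible fix, not necessarily the one the lab expects.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the thread.
df = pd.DataFrame({"Time": [10, 5, 15, 10], "Event": [1, 0, 0, 1]})

# Iterating over a DataFrame yields its column labels, not its rows,
# so these are the values t would take in the loop.
labels_seen = [t for t in df.sort_values(by="Time")]

# One way to get what the loop needs: unique times, sorted ascending.
observed_event_times = sorted(df["Time"].unique())

print(labels_seen)
print(observed_event_times)
```

With the second version, every t in the loop is a number, so df['Time'] >= t is a valid comparison.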

Regards,
Samuel

Hi Samuel,
thank you so much! It works very well now!
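
For anyone who finds this thread later, the example in the docstring can be checked by hand. The sketch below uses plain lists instead of a dataframe, and km_by_hand is a hypothetical helper name, not the lab's function. It follows the same steps as the loop above: count who is still at risk at t, count who dies at t, and multiply the running probability by 1 - d_t/n_t.

```python
def km_by_hand(times, events):
    """Kaplan-Meier estimate from plain lists (illustrative helper only)."""
    event_times = [0]   # the estimate starts at time 0 ...
    p = 1.0             # ... with survival probability 1
    S = [p]
    for t in sorted(set(times)):
        # number of people still at risk at time t
        n_t = sum(1 for x in times if x >= t)
        # number of people who die exactly at time t
        d_t = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        p *= 1 - d_t / n_t
        event_times.append(t)
        S.append(p)
    return event_times, S


# The docstring example: times 5, 10, 15 with a single event at time 10.
event_times, S = km_by_hand([5, 10, 15], [0, 1, 0])
print(event_times)  # [0, 5, 10, 15]
print(S)            # [1.0, 1.0, 0.5, 0.5]
```

At t = 10 two people are still at risk and one of them dies, which is where the 0.5 comes from.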