Question: How can I resolve a blank plot issue when the debugging output is EmptyDataFrame?
When I execute the debugging commands provided by,
# Print the results
print("\nRecent 'Afternoon' rows:")
print(recent_data)
print("\nPrevious rows:")
print(previous_data)
Output:
Enter a date (format: dd/mm/yyyy): 10/09/2023
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
No recent 'Afternoon' rows found
Previous rows:
Patient_ID Order Period Order Date Order Amount Order Quantity
0 432 morning 2023-09-05 3203.84 7
1 433 morning 2023-09-05 1596.74 5
2 434 morning 2023-09-05 1732.09 7
3 435 morning 2023-09-05 3113.25 5
4 436 morning 2023-09-05 2213.22 7
5 437 morning 2023-09-05 2231.64 7
6 438 morning 2023-09-05 3158.68 5
7 439 morning 2023-09-05 785.62 7
8 440 morning 2023-09-05 1778.42 7
9 441 morning 2023-09-05 2161.15 5
10 442 morning 2023-09-05 1173.44 7
11 443 morning 2023-09-05 816.47 4
12 444 morning 2023-09-05 608.74 7
13 445 morning 2023-09-05 2416.47 7
14 446 morning 2023-09-05 2062.04 7
15 13218 morning 2023-09-05 3122.52 7
16 13219 morning 2023-09-05 2520.31 7
17 13220 morning 2023-09-05 2394.10 7
18 13221 morning 2023-09-05 471.54 3
19 13222 morning 2023-09-05 2901.95 5
Image of the data:
This is not the result that I intended. There are four steps that I hope to achieve.
- I hope to call out rows that have a timestamp that is closest to the user’s input date, which may come in two rows as per patient ID.
- I hope to pick out the row that has Order Period == Afternoon. For instance, in the image that I posted. I would pick row 426 for patient ID = 13220, row 429 for patient ID 445, and row 432 for Patient ID = 446. These rows of data points would be grouped as Recent data, as the user’s input date is 10/09/2023 which closely matches with the recent data of timestamp 2023-09-05.
- Next, would be to call out the associated patient ID that has been called out by the user’s closest input date by tracing back to the previous timestamp. Using the image as an example, I would pick row 425 since row 426 has been called out, and the fact that I need rows with Order Period == Afternoon. Row 419 and 422 would also be called out. The same procedure applies to the other rows not shown in the image. Together they would be concatenated as the previous data by weaving together with the respective ID.
- For each unique Patient ID profile, I hope to create a trend plot by integrating the recent and previous data.
I am aware that I made a few errors in the code function, most likely in the select_previous_data, select_recent_afternoon, and find_closest_date functions. How can I get it to work?
My full code:
import pandas as pd
import datetime
import ipywidgets as widgets
import numpy as np
from sklearn.linear_model import LinearRegression
from IPython.display import display
import matplotlib.pyplot as plt
import urllib
import matplotlib.dates as mdates
selected_cols = ['Patient ID', 'Order Period', 'Order Date', 'Order Amount', 'Order Quantity']
df.columns = [col.strip() for col in df.columns]
df = pd.read_excel('Mount Hospital.xlsx', usecols=selected_cols)
df['Order Date'] = pd.to_datetime(df['Order Date'])
print(df.head(50))
# Define a function to find the closest date in the DataFrame to the user's input date
def find_closest_date(df, input_date):
dates = df["Order Date"]
if input_date in dates:
return input_date
else:
closest_date = min(dates, key=lambda x: abs(x - input_date))
return closest_date
# Define a function to select the most recent 'Afternoon' rows for each patient
def select_recent_afternoon(group_data, input_date):
afternoon_rows = group_data[group_data['Order Period'] == 'Afternoon']
if not afternoon_rows.empty:
print(f" Most recent 'Afternoon' row: \n{afternoon_rows.tail(1)}")
return afternoon_rows.tail(1)
else:
print(f" No recent 'Afternoon' rows found")
return pd.DataFrame()
# Define a function to select the most recent row before the input date with the same Patient ID
def select_previous_data(group_data, input_date):
rows_before_input_date = group_data[group_data['Order Date'] < input_date]
if not rows_before_input_date.empty:
afternoon_rows_before_input_date = rows_before_input_date[rows_before_input_date['Order Date'] == 'Afternoon']
if not afternoon_rows_before_input_date.empty:
return afternoon_rows_before_input_date.sort_values(by='Order Date', ascending=False).head(1)
return group_data.iloc[[0]] # Return an empty DataFrame with the same structure as the input DataFrame
# Get user input for the date
input_date_str = input("Enter a date (format: dd/mm/yyyy): ")
input_date = datetime.datetime.strptime(input_date_str, "%d/%m/%Y").date()
# Find the closest date in the DataFrame to the user's input date
input_date = find_closest_date(df, input_date)
# Filter the DataFrame based on the closest date to the user's input date
df_filtered = df[df['Order Date'] == input_date]
# Rename the 'Patient ID' column in df_filtered DataFrame
df_filtered = df_filtered.rename(columns={'Patient ID': 'Patient_ID'})
# Sort the DataFrame by 'Order Date' and 'Order Period' in descending order
df_filtered = df_filtered.sort_values(by=['Order Date', 'Order Period'], ascending=[False, False])
# Group the DataFrame by 'Patient_ID'
recent_data = df_filtered.groupby('Patient_ID').apply(lambda x: select_recent_afternoon(x, input_date)).reset_index(drop=True)
# Get previous data for each patient ID
previous_data = df_filtered.groupby('Patient_ID').apply(select_previous_data, input_date).reset_index(drop=True)
# Print the results
print("\nRecent 'Afternoon' rows:")
print(recent_data)
print("\nPrevious rows:")
print(previous_data)
# Reset the index to avoid MultiIndex
previous_data.reset_index(drop=True, inplace=True)
# Fill NaN values in the 'Patient_ID' column with -1 and convert it to integer type
previous_data['Equipment_ID'] = previous_data['Equipment_ID'].fillna(-1).astype(int)
# Concatenate the most recent data and second most recent data for each patient ID
concatenated_data = pd.concat([recent_data, previous_data], axis=0, sort=False)
# Plot the data for each patient
for index, row in concatenated_data.iterrows():
patient_id = row['Patient ID']
print(f"Plotting data for Patient ID: {patient_id}")
# Plot the data points as a line plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(row['Order Date'], row['Order Amount'], color='gray', label='Most Recent Data')
# Check if second most recent data exists for plotting
if not pd.isna(row['Order Amount']):
ax.plot(row['Order Date'] - pd.DateOffset(1), row['Order Amount'], color='orange', label='Previous Data')
# Customize plot
ax.set_xlabel('Order Date')
ax.set_ylabel('Order Amount')
# Set title with patient ID and input date
ax.set_title(f'Order Amount vs Order Date for Patient {patient_id} (Input Date: {input_date_str})')
# Format x-axis with major formatters
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y'))
plt.xticks(rotation=45)
# Add legend
ax.legend()
# Add grid
ax.grid(True)
# Tight layout
plt.tight_layout()
# Show plot
plt.show()