How can I resolve a blank plot issue when the debugging output is EmptyDataFrame?

Question: How can I resolve a blank plot issue when the debugging output is EmptyDataFrame?

When I execute the debugging commands provided by,

# Print the results
print("\nRecent 'Afternoon' rows:")
print(recent_data)
print("\nPrevious rows:")
print(previous_data)

Output:


Enter a date (format: dd/mm/yyyy): 10/09/2023
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found
 No recent 'Afternoon' rows found

Previous rows:
    Patient_ID    Order Period Order Date  Order Amount  Order Quantity
0            432          morning      2023-09-05     3203.84         7
1            433          morning      2023-09-05     1596.74         5
2            434          morning      2023-09-05     1732.09         7
3            435          morning      2023-09-05     3113.25         5
4            436          morning      2023-09-05     2213.22         7
5            437          morning      2023-09-05     2231.64         7
6            438          morning      2023-09-05     3158.68         5
7            439          morning      2023-09-05      785.62         7
8            440          morning      2023-09-05     1778.42         7
9            441          morning      2023-09-05     2161.15         5
10           442          morning      2023-09-05     1173.44         7
11           443          morning      2023-09-05      816.47         4
12           444          morning      2023-09-05      608.74         7
13           445          morning      2023-09-05     2416.47         7
14           446          morning      2023-09-05     2062.04         7
15         13218          morning      2023-09-05     3122.52         7
16         13219          morning      2023-09-05     2520.31         7
17         13220          morning      2023-09-05     2394.10         7
18         13221          morning      2023-09-05      471.54         3
19         13222          morning      2023-09-05     2901.95         5


Image of the data:

This is not the result that I intended. There are four steps that I hope to achieve.

  1. I hope to call out rows that have a timestamp that is closest to the user’s input date, which may come in two rows as per patient ID.
  2. I hope to pick out the row that has Order Period == Afternoon. For instance, in the image that I posted. I would pick row 426 for patient ID = 13220, row 429 for patient ID 445, and row 432 for Patient ID = 446. These rows of data points would be grouped as Recent data, as the user’s input date is 10/09/2023 which closely matches with the recent data of timestamp 2023-09-05.
  3. Next, would be to call out the associated patient ID that has been called out by the user’s closest input date by tracing back to the previous timestamp. Using the image as an example, I would pick row 425 since row 426 has been called out, and the fact that I need rows with Order Period == Afternoon. Row 419 and 422 would also be called out. The same procedure applies to the other rows not shown in the image. Together they would be concatenated as the previous data by weaving together with the respective ID.
  4. For each unique Patient ID profile, I hope to create a trend plot by integrating the recent and previous data.

I am aware that I made a few errors in the code function, most likely in the select_previous_data, select_recent_afternoon, and find_closest_date functions. How can I get it to work?

My full code:

import pandas as pd
import datetime
import ipywidgets as widgets
import numpy as np
from sklearn.linear_model import LinearRegression
from IPython.display import display
import matplotlib.pyplot as plt
import urllib
import matplotlib.dates as mdates

selected_cols = ['Patient ID', 'Order Period', 'Order Date', 'Order Amount', 'Order Quantity']

df.columns = [col.strip() for col in df.columns]
df = pd.read_excel('Mount Hospital.xlsx', usecols=selected_cols)

df['Order Date'] = pd.to_datetime(df['Order Date'])

print(df.head(50))


# Define a function to find the closest date in the DataFrame to the user's input date
def find_closest_date(df, input_date):
    dates = df["Order Date"]
    if input_date in dates:
        return input_date
    else:
        closest_date = min(dates, key=lambda x: abs(x - input_date))
        return closest_date


# Define a function to select the most recent 'Afternoon' rows for each patient
def select_recent_afternoon(group_data, input_date):
    afternoon_rows = group_data[group_data['Order Period'] == 'Afternoon']
    if not afternoon_rows.empty:
        print(f" Most recent 'Afternoon' row: \n{afternoon_rows.tail(1)}")
        return afternoon_rows.tail(1)
    else:
        print(f" No recent 'Afternoon' rows found")
        return pd.DataFrame()


# Define a function to select the most recent row before the input date with the same Patient ID
def select_previous_data(group_data, input_date):
    rows_before_input_date = group_data[group_data['Order Date'] < input_date]
    if not rows_before_input_date.empty:
        afternoon_rows_before_input_date = rows_before_input_date[rows_before_input_date['Order Date'] == 'Afternoon']
        if not afternoon_rows_before_input_date.empty:
            return afternoon_rows_before_input_date.sort_values(by='Order Date', ascending=False).head(1)
    return group_data.iloc[[0]]  # Return an empty DataFrame with the same structure as the input DataFrame


# Get user input for the date
input_date_str = input("Enter a date (format: dd/mm/yyyy): ")
input_date = datetime.datetime.strptime(input_date_str, "%d/%m/%Y").date()


# Find the closest date in the DataFrame to the user's input date
input_date = find_closest_date(df, input_date)


# Filter the DataFrame based on the closest date to the user's input date
df_filtered = df[df['Order Date'] == input_date]


# Rename the 'Patient ID' column in df_filtered DataFrame
df_filtered = df_filtered.rename(columns={'Patient ID': 'Patient_ID'})


# Sort the DataFrame by 'Order Date' and 'Order Period' in descending order
df_filtered = df_filtered.sort_values(by=['Order Date', 'Order Period'], ascending=[False, False])


# Group the DataFrame by 'Patient_ID'
recent_data = df_filtered.groupby('Patient_ID').apply(lambda x: select_recent_afternoon(x, input_date)).reset_index(drop=True)


# Get previous data for each patient ID
previous_data = df_filtered.groupby('Patient_ID').apply(select_previous_data, input_date).reset_index(drop=True)


# Print the results
print("\nRecent 'Afternoon' rows:")
print(recent_data)
print("\nPrevious rows:")
print(previous_data)


# Reset the index to avoid MultiIndex
previous_data.reset_index(drop=True, inplace=True)

# Fill NaN values in the 'Patient_ID' column with -1 and convert it to integer type
previous_data['Equipment_ID'] = previous_data['Equipment_ID'].fillna(-1).astype(int)

# Concatenate the most recent data and second most recent data for each patient ID
concatenated_data = pd.concat([recent_data, previous_data], axis=0, sort=False)


# Plot the data for each patient
for index, row in concatenated_data.iterrows():
    patient_id = row['Patient ID']
    print(f"Plotting data for Patient ID: {patient_id}")

    # Plot the data points as a line plot
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(row['Order Date'], row['Order Amount'], color='gray', label='Most Recent Data')

    # Check if second most recent data exists for plotting
    if not pd.isna(row['Order Amount']):
        ax.plot(row['Order Date'] - pd.DateOffset(1), row['Order Amount'], color='orange', label='Previous Data')

    # Customize plot
    ax.set_xlabel('Order Date')
    ax.set_ylabel('Order Amount')
    # Set title with patient ID and input date
    ax.set_title(f'Order Amount vs Order Date for Patient {patient_id} (Input Date: {input_date_str})')
    # Format x-axis with major formatters
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y'))
    plt.xticks(rotation=45)

    # Add legend
    ax.legend()

    # Add grid
    ax.grid(True)

    # Tight layout
    plt.tight_layout()

    # Show plot
    plt.show()


Hello @Amelia-Analyst

Based on your image of data and your codes, the error is actually justified. you have created order date from the input date which has a format of format: dd/mm/yyyy): and the image clearly shows the data only include 12 a.m. or 00:00:00, hence there are no afternoon row and gives you an Empty data frame.

can I know why are you looking afternoon row data or basically by dividing the order in morning and afternoon what are you trying to achieve.

Regards
DP

Hi mentor,

This is the format in which the system’s database is displayed in the image that I uploaded. I must process the excel file instead of making any changes to the data that the system generates. My task is to develop the function find_closest_date so that I can filter out the data that has timestamps that are close to the date that the user entered. I selected Order Period == Afternoon because, in order to plot a trend and analyze the patient’s data, which is being called out by the ID, I need to concatenate the most recent data of rows with Order Period == Afternoon using the select_previous_data and select_recent_afternoon functions. Based on the date that is closest to the user date entered, I will analyze the patient dataset. The reason I chose Order Period == Afternoon is that I need to use the select_previous_data and select_recent_afternoon functions to concatenate with the most recent data of rows with Order Period == Afternoon in order to analyze the patient’s data, which is being called out by the ID, based on the date that is closest to the user date entered. I also have to make recommendations individually to the dietitians overseeing the food distribution for the afternoon.

I hope this clears up your queries.

Hello @Amelia-Analyst

The way you have converted order period into afternoon from the original dataset seems not clear.

the original dataset provided by you shows order period with only morning but you used the input date as well as order period to create the afternoon rows but I am not clear if this was assigned on what basis. Although usual way would be using the date format of the input date to assign based on the timeline givens and based on your image it shows there are no afternoon timeline data. So kindly elaborate how the group data step you created is creating afternoon rows, according to me the data frame you used with group data, actually shows you would end up getting empty data frame.

Regards
DP

Mount Hospital Ward.xlsm - Sheet1.csv (36.7 KB)

Since the excel file cannot be uploaded, the CSV file is attached. I don’t exactly grasp what you mean. I try to respond according to my best understanding. Actually, the order date is simply represented in the format chosen by the original vendor. The way the data is presented is not within my control. It’s possible that you saw that the AM portion of the ‘Order Date’ column, as you correctly pointed out. In order to extract data that closely matches the timestamp the user entered for the input, I attempted to develop the find_closest_date function.

I acknowledge that the database creates the information based on two parameters, “Order Date” and “Order Period,” over which I have no control. Using the ‘Order Period’ column as my primary focus, I want to identify rows where the Order Period == Afternoon for both the recent and historical data, then plot the data points collectively to show a trend pattern.

Based on the excel sheet provided, the afternoon and morning column comes under column ID but you have defined afternoon row using the order date which was again formatted using input date based on your muddy hammy format.

Kindly format using column ID to get the afternoon rows.

Regards
DP

I found the mistake and corrected it. I value your advice. Much appreciated it.

Best wishes,
Amelia

1 Like

Kindly explain how you resolved your issue.

To ensure that the datetime format matches the user input_date, I used pd.to_datetime and added this line. pd.to_datetime(df[‘Order Date’], errors=‘coerce’, dayfirst=True)

This is the root cause of the problem.

Best wishes,
Amelia

1 Like