Predicting the end of the company’s compensatory time balance

"Hi everyone!

At the company where I work, we have over 150 employees who clock in and out daily. They log their times for both the morning and afternoon shifts. For example: an employee clocks in at 7:10 AM, clocks out for lunch at 11:30 AM, clocks back in at 1:00 PM, and clocks out for the day at 5:30 PM.

This generates a massive dataset of timesheet records and comp time balances. Every six months, the company settles these balances. If an employee has a positive balance (accumulated overtime), they get paid for those extra hours. If they have a negative balance (undertime), the company deducts the missing hours from their paycheck.

Ideally, the company wants the overall comp time balance to be exactly zero at the end of every six-month period to minimize overtime costs.

With that in mind, I had the idea of creating a machine learning project (I was thinking about time series forecasting) to identify patterns in how employees clock in and out. The expected outcome would be a model that says: ‘If employees keep clocking in and out following their current patterns, the final balance at the end of the six-month period will be X.’

Additionally, I would be able to pinpoint the main employees contributing to excessive positive or negative balances. This would allow us to flag the issue to their managers so they can step in and adjust the work schedules.

Do you guys think it’s possible to build a reliable forecast with this kind of dataset? Does this project make sense?"

I’m struggling to convince myself that 150 records per day for 6 months is a complex enough data set to warrant training and testing a machine learning model. You used historical data to test ideas in a spreadsheet first, right? If so, why was that insufficient?

Identifying individuals who are running over or under plan seems completely straightforward, as does computing the aggregate (and my understanding is that you don’t mind paying overtime to some as long as the net is zero) and comparing that against plan.

Is it really more complicated than simple algebra can handle?
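
As a rough sketch of what that simple algebra could look like: compute each day's balance from the four punches, then project the end-of-period balance by assuming the average daily drift so far continues. The 8-hour standard day, the function names, and the sample numbers are assumptions for illustration and are not taken from the original post.

```python
from datetime import datetime

# Assumption: contracted hours per workday. The thread does not state this value.
STANDARD_DAY_HOURS = 8.0


def worked_hours(punches):
    """Hours worked from four 'HH:MM' punches: in, lunch out, lunch in, out."""
    t = [datetime.strptime(p, "%H:%M") for p in punches]
    morning = (t[1] - t[0]).total_seconds() / 3600
    afternoon = (t[3] - t[2]).total_seconds() / 3600
    return morning + afternoon


def projected_balance(daily_balances, workdays_in_period):
    """Current balance plus average daily drift times the remaining workdays."""
    elapsed = len(daily_balances)
    current = sum(daily_balances)
    average_per_day = current / elapsed
    return current + average_per_day * (workdays_in_period - elapsed)


def project_all(balances_by_employee, workdays_in_period):
    """Per-employee projections plus the company-wide total."""
    per_employee = {
        name: projected_balance(days, workdays_in_period)
        for name, days in balances_by_employee.items()
    }
    return per_employee, sum(per_employee.values())


# The example day from the original post: 7:10-11:30 and 13:00-17:30.
example_day = worked_hours(["07:10", "11:30", "13:00", "17:30"])
example_balance = example_day - STANDARD_DAY_HOURS  # positive means overtime

# Hypothetical daily balances (hours) for two made-up employees.
sample = {"emp_a": [0.5, 0.5, -0.25, 0.25], "emp_b": [-0.5, -0.5, 0.0, 0.0]}
per_employee, company_total = project_all(sample, workdays_in_period=120)
```

Sorting `per_employee` by value would give the list of people to flag, and `company_total` is the net figure to compare against the zero target. If this straight-line projection already tracks the historical six-month results well, that is an argument against needing a trained model.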

Hi @gelsonwj

Nothing is impossible, but before you plan your creative idea, you should probably first create a rough layout covering your achievable targets, your company managers’ (stakeholders’) expectations, and the limitations on employees’ workable shifts.

Remember that you are not dealing with robots here; you are dealing with a human workforce. So will your machine learning model incorporate human variability? For example, for the female workforce, menstrual cycles and pregnancy would be critical factors affecting their workable shifts, whereas for the male workforce the critical factors would be health and family emergencies.

How do you plan to incorporate all of these factors into your idea?

As @ai_curious mentioned, you are also going to handle a huge dataset, so before using it you need to analyse it and split it into different groups that can then be used in your creative outlet.

For example, besides employees whose time balance is zero, you also have those with negative and positive time balances. So create a data (SQL) chart that handles all three variations, then also compare gender-based variation. And don’t forget festivities and weekend factors.
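
A minimal sketch of that first grouping step, written in Python rather than SQL and covering only the three balance groups. The tolerance for what counts as "zero", the function names, and the sample balances are assumptions for illustration.

```python
from collections import defaultdict


def balance_group(balance_hours, tolerance):
    """Label a balance as 'positive', 'negative', or 'zero' within a tolerance."""
    if balance_hours > tolerance:
        return "positive"
    if balance_hours < -tolerance:
        return "negative"
    return "zero"


def group_employees(balances, tolerance=0.25):
    """Map each group label to the employees whose balance falls in it."""
    groups = defaultdict(list)
    for employee, balance in balances.items():
        groups[balance_group(balance, tolerance)].append(employee)
    return dict(groups)


# Hypothetical balances in hours for three made-up employees.
groups = group_employees({"emp_a": 3.0, "emp_b": -2.0, "emp_c": 0.1})
```

The same grouping could then be repeated per month or per weekday to see whether festivities and weekends shift people between groups.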

When you have done the above work, you will know whether this creative pursuit is worth the time and money, or whether it is better suited to a research paper presentation. But mind you, analysing this data can actually uncover critical factors for making your company more work-oriented and comfortable for your employees, since other factors like salary, the behaviour of senior management, and office politics play a more critical role in workable shifts than any of the reasons I mentioned earlier :grimacing::rofl:

Regards

Dr. Deepti

Sorry if I wasn’t clear above. I meant to suggest that one value per day per employee for 6 months (150 * 5 * 4 * 6, or about 18,000 values) is actually a rather small and simple data set. My impression is that the goal as stated is achievable with a simple equation and could be done in Excel. Maybe there is more to it than I understand. Is there seasonal variation, drift, or some other repeating (and thus discoverable and predictable) pattern that targeted statistical analysis cannot reveal (meaning you don’t know what you don’t know)?

I would invest in exploratory data analysis to demonstrate the need before embarking on any real machine learning project. Cheers
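
For instance, a first exploratory pass might just average the daily balance by weekday and by month to see whether any repeating pattern shows up at all. This is only a sketch: the record layout (a date plus that day's balance in hours), the helper name, and the sample values are assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_balance_by(records, key):
    """Average daily balance per bucket, where key maps a date to a bucket."""
    buckets = defaultdict(list)
    for day, balance in records:
        buckets[key(day)].append(balance)
    return {bucket: mean(values) for bucket, values in buckets.items()}


# Hypothetical (date, balance in hours) records for illustration only.
records = [
    (date(2024, 1, 1), 0.5),    # Monday
    (date(2024, 1, 2), -0.25),  # Tuesday
    (date(2024, 1, 8), 1.0),    # Monday
    (date(2024, 2, 5), 0.0),    # Monday
]

by_weekday = mean_balance_by(records, lambda d: d.weekday())  # 0 = Monday
by_month = mean_balance_by(records, lambda d: d.month)
```

If the weekday and month averages are flat apart from noise, a straight-line projection is probably enough; a clear weekly or seasonal shape would be the evidence that something more elaborate is worth trying.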

My response was exactly that: explore the data first before embarking on a machine learning project for this creative idea.

The data is small if you only consider timing factors like in-time, out-time, and lunch time, but that would be incomplete if one doesn’t look into all the factors that affect or explain the workable shifts.

regards

Dr. Deepti