I have a dataset of leads. Every lead, along with other features, has a responsible agent. Some of the leads are successful and some are unsuccessful.
I want to build an algorithm that distributes leads to those managers so that the conversion rate from lead to successful deal increases.
I don’t even know where to dig. I’m thinking about clustering: group leads into clusters, then calculate each manager’s success rate for each cluster. Or build a decision tree model (random forest or XGBoost).
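A minimal sketch of the second half of that idea, assuming each lead already carries a cluster id from whatever clustering step is used (k-means, for example). The records, the agent names `ann` and `bob`, and the helpers `success_rates` and `best_agent` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical history: (cluster_id, agent, won) with won = 1 or 0.
leads = [
    (0, "ann", 1), (0, "ann", 1), (0, "ann", 0),
    (0, "bob", 0), (0, "bob", 1), (0, "bob", 0),
    (1, "ann", 0), (1, "ann", 0),
    (1, "bob", 1), (1, "bob", 1),
]

def success_rates(records):
    """Return {(cluster, agent): share of won leads}."""
    counts = defaultdict(lambda: [0, 0])  # (cluster, agent) -> [wins, total]
    for cluster, agent, won in records:
        counts[(cluster, agent)][0] += won
        counts[(cluster, agent)][1] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}

def best_agent(rates, cluster):
    """Pick the agent with the highest historical rate in this cluster."""
    candidates = {a: r for (c, a), r in rates.items() if c == cluster}
    return max(candidates, key=candidates.get)

rates = success_rates(leads)
print(best_agent(rates, 0))  # ann
print(best_agent(rates, 1))  # bob
```

With only a handful of leads per (cluster, agent) pair these rates are very noisy, so in practice it probably makes sense to also look at the counts behind each rate before trusting it.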
I think clustering is a pretty popular approach. You may also want to search for recommendation systems, such as movie-user recommendation, and adapt the idea to lead-agent recommendation.
PS: the Machine Learning Specialization Course 3 Week 2 is about recommendation. You may audit the course to watch the videos for some ideas.
The agent becomes the Y. Success or failure becomes one more attribute of the X; let’s call it ‘Won/Lost’, where 1 = won and 0 = lost. Then you feed a neural network that ends in a SoftMax with n units, each unit representing an agent.
Train the model with your data.
Next time you have a lead, run it through the model, setting the won/lost attribute to ‘1’ (won) - the SoftMax output should then point to the agent with the highest chance of closing the lead with a sale.
I would really like to know what you end up doing - I think this is a very interesting case.
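Here is a rough sketch of that setup in NumPy, under some simplifying assumptions. Instead of a full neural network it uses a single softmax layer trained by plain gradient descent, and it adds a hand-made interaction feature (segment × won) to stand in for what a hidden layer would learn. The toy history, the single `segment` feature, and the names `featurize` and `agent_scores` are invented for the example.

```python
import numpy as np

# Hypothetical history as (segment, agent, won) rows.
# "segment" stands in for any lead feature, e.g. an income bracket.
history = (
    [(0, 0, 1)] * 8 + [(0, 0, 0)] * 2 +   # agent 0, segment 0: 8 won, 2 lost
    [(0, 1, 1)] * 3 + [(0, 1, 0)] * 7 +   # agent 1, segment 0: 3 won, 7 lost
    [(1, 0, 1)] * 2 + [(1, 0, 0)] * 8 +   # agent 0, segment 1: 2 won, 8 lost
    [(1, 1, 1)] * 7 + [(1, 1, 0)] * 3     # agent 1, segment 1: 7 won, 3 lost
)
n_agents = 2

def featurize(segment, won):
    # bias, segment, won flag, and their interaction; without the
    # interaction a single linear layer cannot tie "won" to the segment
    return [1.0, segment, won, segment * won]

X = np.array([featurize(s, w) for s, _, w in history], dtype=float)
y = np.array([a for _, a, _ in history])   # the agent is the label
Y = np.eye(n_agents)[y]                    # one-hot targets

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((X.shape[1], n_agents))
for _ in range(2000):                      # full-batch gradient descent
    P = softmax(X @ W)
    W -= 0.5 * X.T @ (P - Y) / len(X)      # gradient of mean cross-entropy

def agent_scores(segment):
    """Softmax over agents for a new lead with the won flag forced to 1."""
    x = np.array([featurize(segment, 1)], dtype=float)
    return softmax(x @ W)[0]

print(agent_scores(0).argmax())  # 0
print(agent_scores(1).argmax())  # 1
```

One thing to keep in mind: the softmax here estimates P(agent | lead, won), which by Bayes’ rule also depends on how many leads of that kind each agent received in the past, not only on how well they close them.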
Leads have attributes like Country, Language, Source, and Google tags like Gender, Income (rich, poor), etc. Actually, it’s a pretty rich dataset per lead.
Actually, leads and sales agents are both vectors with some mutual features, like language, country, etc. We see that some agents are better at closing certain types of leads, like rich clients.
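To make the "both are vectors" idea concrete, here is one possible sketch of a content-based match: each agent gets a profile equal to the average feature vector of the leads they have won, and a new lead is scored against each profile by cosine similarity. The four one-hot features, the agents `ann` and `bob`, and the function names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical one-hot lead features:
# [speaks_en, speaks_de, income_rich, income_poor]
won_leads = {
    "ann": np.array([[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 1, 0]], dtype=float),
    "bob": np.array([[0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 0, 1]], dtype=float),
}

# Agent profile = mean feature vector of the leads that agent has won.
profiles = {agent: rows.mean(axis=0) for agent, rows in won_leads.items()}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_agents(lead):
    """Agents ordered from most to least similar to the lead."""
    scores = {agent: cosine(lead, p) for agent, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)

new_lead = np.array([1, 0, 1, 0], dtype=float)  # English-speaking, rich
print(rank_agents(new_lead))  # ['ann', 'bob']
```

This only looks at won leads and ignores losses, so it is closer to "who usually handles leads like this" than to a real success estimate.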
Be cautious about assuming too early which factor is predictive or causal. Are some leads more likely to result in a close regardless of which agent gets them? Just because one agent closed a lead (i.e. success) doesn’t mean another agent couldn’t have also closed that opportunity, done more cross- or upselling, or closed sooner. Your client may also be interested in scaling, and just giving all the leads to the one agent who closes the most doesn’t scale. So how can you elicit what differentiates success from failure other than just the agent’s id? Look for root causes. Maybe what really impacts conversion is better-qualified leads before an agent is even assigned, etc.
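A small made-up example of why the "regardless of agent" question matters: compare each agent’s rate inside a segment with that segment’s baseline rate, instead of comparing overall rates. The segments, agents, and counts below are invented, and `rate_by` is just a helper for this sketch.

```python
from collections import defaultdict

# Hypothetical history: (segment, agent, won).
records = (
    [("rich", "ann", 1)] * 6 + [("rich", "ann", 0)] * 2 +
    [("rich", "bob", 1)] * 2 +
    [("poor", "ann", 0)] * 2 +
    [("poor", "bob", 1)] * 2 + [("poor", "bob", 0)] * 6
)

def rate_by(rows, key):
    """Share of won leads, grouped by key(row)."""
    wins, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        k = key(row)
        wins[k] += row[2]
        totals[k] += 1
    return {k: wins[k] / totals[k] for k in totals}

overall = rate_by(records, key=lambda r: r[1])        # per agent
baseline = rate_by(records, key=lambda r: r[0])       # per segment, any agent
pair = rate_by(records, key=lambda r: (r[0], r[1]))   # per (segment, agent)

# Lift: how much better or worse an agent does than the segment baseline.
lift = {(s, a): rate - baseline[s] for (s, a), rate in pair.items()}

print(overall["ann"] > overall["bob"])                  # True
print(lift[("rich", "bob")] > lift[("rich", "ann")])    # True
print(lift[("poor", "bob")] > lift[("poor", "ann")])    # True
```

In this toy data ann looks better overall only because she received mostly rich leads, which close more often for everyone; inside each segment bob does better.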
I would certainly try the idea of turning the agents into the units of a softmax, and moving all other attributes to the input X of a model.
One challenge will be to complete the missing data. Many of the attributes that Google, Facebook, and other platforms provide are incomplete. For instance, gender: I’ve seen that a high percentage of cases have N/A for gender. The same goes for other attributes.
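One simple option, sketched below, is to avoid guessing the missing values and instead give N/A its own "unknown" slot when one-hot encoding. This is only one possible treatment; the `MISSING` set, the `encode` helper, and the category list are assumptions for the example.

```python
# Values treated as "missing"; extend to match what the platform exports.
MISSING = {None, "", "N/A"}

def encode(value, categories):
    """One-hot encode value over categories plus a trailing 'unknown' slot.

    Missing values and values outside the known categories both land
    in the 'unknown' slot.
    """
    label = "unknown" if value in MISSING or value not in categories else value
    slots = list(categories) + ["unknown"]
    return [1 if slot == label else 0 for slot in slots]

genders = ["female", "male"]
print(encode("male", genders))  # [0, 1, 0]
print(encode("N/A", genders))   # [0, 0, 1]
print(encode(None, genders))    # [0, 0, 1]
```

Keeping "unknown" as a category lets the model learn whether the absence of a tag is itself informative, which may or may not be the case for your data.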