Obtaining Labels for Fine-Tuning LLMs

In practice, it can be difficult and time-consuming to manually label a dataset of hundreds of task-specific examples for fine-tuning an LLM. What are some ways or best practices to obtain these labels when there is a shortage of labelers, or there are simply too many examples to label?


Yes, it is. That’s why crowd-sourcing became such a popular method.

I see. What about the typical company setting, where a team wishes to fine-tune a model using the data they have gathered for their task, but only perhaps 1–3 people are available to manually label the data? Let us also assume there are no existing datasets out there and no one else in the company has the time to help with this endeavor.

They’re going to have to work long hours, or recruit some new team members, or manage a crowd-sourced labeling project.

I think you may be already suspecting it: The company has to either have these 1-3 people work on this, or hire people to do it. Or both.

And since we are on the topic: labeling is both an art and a science, and requires a very clear set of instructions. Human labelers have different backgrounds, biases, understandings, and points of view, so the same sample can be one class for some labelers and another class for others. So it is very important to have very detailed instructions, and to cross-check labels (the same sample labeled by multiple labelers, so you can measure inter-annotator agreement).
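As a concrete sketch of that cross-check, agreement between two labelers on the same samples is commonly quantified with Cohen's kappa, which corrects raw agreement for chance. Below is a minimal, dependency-free version; the example labels are made up for illustration:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers over the same samples."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of samples where both labelers agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each labeled at random with their own class rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_observed - p_expected) / (1 - p_expected)

# Two labelers disagree on one of four samples.
kappa = cohens_kappa(["pos", "neg", "pos", "pos"],
                     ["pos", "neg", "neg", "pos"])
print(kappa)  # 0.5
```

A kappa near 1 means the instructions are working; a low kappa on a pilot batch is a signal to rewrite the labeling guidelines before scaling up. (The labeling platforms mentioned below often compute this for you.)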

There are platforms that assist in the labeling process, and I think it is important to use one of them, because they already encode a lot of ‘knowledge’ about how labeling should be done.

One other factor to consider: there is currently a trend in data labeling to rely on chatbots to automate the labeling process.

This is fraught with peril, as chatbots are simply language models. Their results can pollute your data set with hallucinations or outright lies.

The industry is struggling with this dilemma, as it can also impact a crowd-sourcing project if the participants resort to chatbots to do their work.
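One common hedge against this failure mode, whether the votes come from crowd workers, in-house labelers, or model-assisted labeling, is to accept a label only when independent labelers agree, and route everything else to a trusted human reviewer. A minimal sketch (the threshold and label names here are illustrative assumptions, not a standard):

```python
from collections import Counter

def triage_label(votes, min_agreement=0.8):
    """Return the majority label if agreement clears the threshold,
    otherwise None to flag the sample for human review."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

print(triage_label(["spam", "spam", "spam", "ham"], min_agreement=0.7))  # spam
print(triage_label(["spam", "ham"], min_agreement=0.7))                  # None
```

The point of the design is that automated or low-trust labels never enter the fine-tuning set on their own: disagreement is treated as a signal that the sample is ambiguous and needs a human decision.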