How to tackle a problem with an NLP dataset

I have a text data frame with 10,000 rows for a regression task.
The data frame's columns are text and value.
I tried training a transformer to predict the value for a given text, but because the dataset is small and biased toward the middle range of values, its performance is very bad for texts with extreme values.
How can I tackle this issue, given that the dataset is one of a kind…

Hi @karthik_rathod

Welcome to the community.

I was wondering whether a classification technique might be more suitable for your task.

Could you clarify your business problem so that I can understand why you are using a regression approach?

Remember, LLMs are language models.


Hi @elirod
The aim of my project is to predict the VAD (emotion) values of a sentence, which range from 0 to 5 on a continuous scale, e.g. 0.4, 0.7, 2.3, 2.6, 3.4, 4.5… Since they are continuous, classification cannot help.

Hi @karthik_rathod

Thanks for your update.

I have to be honest with you, this is quite a challenging task.

Modifying the loss function to penalize errors on extreme values is one valuable approach, I guess.
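
For example, here is a minimal sketch of that idea in plain Python. It assumes targets in the 0 to 5 range and weights each sample by the inverse frequency of its target bin, so rare extreme values count more in the loss. The helper names, the 10 equal-width bins, and the weighting scheme are my own assumptions for illustration, not part of any library. In a real training loop you would apply the same per-sample weights to your framework's loss.

```python
from collections import Counter


def bin_index(value, low=0.0, high=5.0, n_bins=10):
    """Map a target in [low, high] to an equal-width histogram bin index."""
    width = (high - low) / n_bins
    idx = int((value - low) / width)
    # Clamp so that value == high (or out-of-range values) stay in a valid bin.
    return min(max(idx, 0), n_bins - 1)


def inverse_frequency_weights(targets, n_bins=10):
    """Per-sample weights proportional to 1 / (bin count), normalized to mean 1."""
    bins = [bin_index(t, n_bins=n_bins) for t in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def weighted_mse(predictions, targets, weights):
    """Weighted mean squared error: sum(w * err^2) / sum(w)."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights))
    return total / sum(weights)


# Hypothetical toy targets: four mid-range values and two extremes.
targets = [2.5, 2.6, 2.7, 2.8, 0.2, 4.8]
weights = inverse_frequency_weights(targets)

# The same 1.0 error costs more on an extreme sample than on a mid-range one.
miss_middle = [t + 1.0 if i == 0 else t for i, t in enumerate(targets)]
miss_extreme = [t + 1.0 if i == 4 else t for i, t in enumerate(targets)]
print(weighted_mse(miss_middle, targets, weights))
print(weighted_mse(miss_extreme, targets, weights))
```

The bin count controls how aggressive the reweighting is. With only 10,000 rows, very fine bins can give a few samples huge weights, so it is probably worth capping the weights or trying a softer scheme such as the inverse square root of the bin count.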

I worked on a similar project before, but I used a Word2Vec package.

Nevertheless, I wonder whether a transformer is the ideal architecture for this task.

Take a look at this paper. Maybe it can help you with your project:

So sorry I wasn't able to help you more quickly.

I’m very interested in the results of your work. If you don’t mind sharing how you solve the problem and which approaches you tried before achieving better results, I would be very grateful.

Best regards