After completing the week 2 course, I would like to fine-tune an LLM that can extract clinical concepts (NER) from medical documents written in free text. How can I prepare a dataset to fine-tune an openly available clinical model such as ClinicalBERT for this task? Any example of fine-tuning LLMs for an NER task would be helpful.
First thought is: most probably it already exists.
Second thought: to do it, you will need to create (or find) a robust dataset (thousands of samples) where each sample has at least these two parts: “sentences from the medical document written in free text” and “the medical terms in that sentence”.
If you find or create such a dataset, you can fine-tune a model like DistilBERT or others and get good results (probably around 85% precision). But it will all depend on the quality and size of the dataset.
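As a minimal sketch of what that dataset preparation might look like for token classification, here is plain Python that turns (sentence, medical-terms) pairs into BIO tags. The whitespace tokenizer, the punctuation stripping, and the `B-PROBLEM`/`I-PROBLEM` label names are illustrative assumptions; a real pipeline would use the model's own tokenizer and your chosen label set.

```python
def bio_tag(sentence, terms):
    """Align each term against whitespace tokens and emit BIO labels."""
    tokens = sentence.split()
    # Strip trailing punctuation only for matching; keep the original tokens.
    clean = [t.strip(".,;").lower() for t in tokens]
    tags = ["O"] * len(tokens)
    for term in terms:
        term_tokens = term.lower().split()
        n = len(term_tokens)
        for i in range(len(tokens) - n + 1):
            if clean[i:i + n] == term_tokens:
                tags[i] = "B-PROBLEM"           # first token of the span
                for j in range(i + 1, i + n):
                    tags[j] = "I-PROBLEM"       # continuation tokens
    return list(zip(tokens, tags))

tagged = bio_tag(
    "Patient admitted with abdominal pain and nausea.",
    ["abdominal pain", "nausea"],
)
# tagged[3] -> ("abdominal", "B-PROBLEM"), tagged[4] -> ("pain", "I-PROBLEM")
```

From pairs like these you can build the `tokens`/`ner_tags` columns that Hugging Face token-classification examples expect.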
As for the HOW: you can probably explore this fine-tuning using PEFT with LoRA, as shown in the course.
Hope this helps!
Let me know how it goes or if you have more questions.
Thank you @Juan_Olano for the suggestions, I will look into this
Hi @Juan_Olano ,
Once again, thank you for the previous response. I have recently been able to collect a dataset for clinical information extraction from medical free text. The following is one sample from my dataset.
[Input] Clinical text: “Patient admitted with abdominal pain, nausea, and denied any chills or fever. Patient also has history of appendectomy.”
[Output] Clinical concepts: I would like the output in the following JSON format, and I have been able to prepare my entire training dataset accordingly:
[
  {"concept": "abdominal pain", "history": "false", "sentiment": "positive"},
  {"concept": "nausea", "history": "false", "sentiment": "positive"},
  {"concept": "chills", "history": "false", "sentiment": "negative"},
  {"concept": "fever", "history": "false", "sentiment": "negative"},
  {"concept": "appendectomy", "history": "true", "sentiment": "positive"}
]
Now I would like to prepare these samples for fine-tuning Llama 2, but I cannot figure out how to convert this dataset into the format accepted by the Hugging Face TRL library. Can you please help me with this? I mean, I need these samples in a format that Llama 2 can accept as a training dataset.
As you said, there may already be models available for this, but I want to do it using Llama 2.
Thanks
Krishna Reddy.
This is looking interesting!
Check out this dataset:
This dataset is in the format accepted by Llama2 for fine-tuning.
Just take the first record of the dataset and you’ll discover that the syntax goes along these lines:
<s>[INST] <INPUT> [/INST] <OUTPUT></s>
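As a sketch, the clinical samples earlier in the thread could be wrapped into that template like this (plain Python; the exact instruction wording is my assumption, not something the template mandates):

```python
import json

def to_llama2_example(clinical_text, concepts):
    """Wrap one (text, concept list) pair in the Llama 2 instruction format."""
    instruction = f"Extract clinical concepts from the text: {clinical_text}"
    output = json.dumps(concepts)  # serialize the label list as the response
    return f"<s>[INST] {instruction} [/INST] {output}</s>"

sample = to_llama2_example(
    "Patient admitted with abdominal pain, nausea.",
    [{"concept": "abdominal pain", "history": "false", "sentiment": "positive"},
     {"concept": "nausea", "history": "false", "sentiment": "positive"}],
)
```

Mapping this function over your dataset gives the single text column that TRL's supervised fine-tuning trainer consumes.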
Try this and let me know how it goes!
Dear @Juan_Olano,
I want to express my gratitude for your valuable suggestions regarding dataset preparation. Following your guidance, I have successfully prepared the dataset. It is clear that for fine-tuning, adherence to a standard format is essential. While the format need not be specific, consistency throughout the training dataset samples is imperative.
I attempted fine-tuning using the Llama 2 7B model. Unfortunately, the results did not meet our expectations. I recall hearing that fine-tuning may not enable the acquisition of entirely new domain knowledge but rather focuses on altering the response format. Could you kindly confirm whether this is accurate? Furthermore, do you recommend trying a larger model, such as the 13B or 70B, for potentially improved results?
Your insights on this matter would be greatly appreciated.
I am glad that you were able to move in the right direction! I understand that there’s still a long way to go. This would be my advice:
1. Fine-tuning. This will improve several aspects of the model that will definitely impact the quality of the output: format, style, and to a certain extent domain knowledge. But don’t rely too heavily on that last one: the model mostly acquires the ‘flavor’ of the domain, and the result will depend greatly on the quality of your training set. Testing with larger models may bring some benefits, but I would stick with the 7B until all the options are tested here. What other options? See items 2 and 3 below.
2. Prompt engineering. Now that you have a fine-tuned model, your prompts will behave better. Make sure you provide the appropriate context on every call, and try different prompts to guide your model toward the expected outputs.
3. Information retrieval. This one is fundamental. With your data, create a vector database. This will provide the perfect context for item 2 above (prompting).
This is the perfect trifecta. If/when you combine these 3 tools, you will get the best possible result from your LLM system.
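To illustrate item 3, here is a toy retrieval sketch: store document chunks, embed them, and pull the closest chunk as context for the prompt. The bag-of-words “embedding” below is a deliberate stand-in for a real embedding model (e.g. one from sentence-transformers) so the idea runs without external dependencies.

```python
from collections import Counter
import math

def embed(text):
    # Toy embedding: word-count vector. A real system would call an
    # embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks):
    """Return the chunk most similar to the query."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

chunks = [
    "Appendectomy is the surgical removal of the appendix.",
    "Nausea is a sensation of unease in the stomach.",
]
context = retrieve("patient reports nausea", chunks)
```

The retrieved `context` is then prepended to the prompt from item 2, which is what ties the trifecta together.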
Please ask any question and I’ll be happy to dig deeper.
Thanks,
Juan
Can you provide a link to the dataset so that others can use it?
Hi @Adeel_Hasan, this dataset currently contains patient-sensitive information, so it is not possible for us to open-source it at the moment.
Hi @Juan_Olano
I have a question about training and preparing data for a model. Specifically, when our training and validation datasets are organized as a single text column in a standard format, e.g. <s>[INST] <INPUT> [/INST] <OUTPUT></s>, how does the model understand and train on this single column as input? It seems like the model needs to know what the input is and what the corresponding output should be in order to compare results. Am I missing something here?
Please check the following notebook about training Llama 2 7B on a public Q&A dataset. In the script, they combine both question and answer in a single prompt and use this for training. But in our course, we prepared separate input and label columns to check quality measures using the ROUGE score. How do we calculate ROUGE scores for this notebook? My concern is: while the model is training, how does it validate, if the dataset contains both input and output in the same column?
https://github.com/krishnareddyML/llm-notebooks/blob/main/sft_llama2.ipynb
Thanks,
Krishna Reddy
When you train an LLM, the input of the training is text and a label. The text format is usually dependent on the specific model you are training: Llama 2, BERT, and FLAN-T5 each have a preferred way to receive their prompts for training. These can be reviewed in their respective documentation on Hugging Face. And as with any model that is trained, you have the label. In the case of a transformer being trained for a classification task, the label will be the class of the prompt (the ground truth). In the training loop, you pass the training data (in batches) and the model responds with its inference. This output (the inference) is used to calculate the loss and adjust the weights. If you are using a Hugging Face library (the transformers library), all of this is done for you behind the scenes.
You are in control of this. You load the dataset and you create the data loader. In the data loader you specify how the training data is formatted. Check out this section in the notebook of the lesson. You will see there how the data is prepared for the model.
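To make the single-column question concrete: in causal-LM fine-tuning the labels are the input ids themselves (the model learns to predict each next token), and a common refinement is masking the prompt portion with -100 so the loss covers only the response. Here is a toy sketch; the whitespace “tokenizer” and sequential ids are stand-ins for the real tokenizer.

```python
def build_labels(text):
    """Derive (input_ids, labels) from one combined prompt+response string."""
    tokens = text.split()
    input_ids = list(range(len(tokens)))   # stand-in ids, one per token
    labels = input_ids.copy()              # labels start as a copy of inputs
    sep = tokens.index("[/INST]")          # everything up to here is prompt
    for i in range(sep + 1):
        labels[i] = -100                   # -100 is ignored by the loss
    return tokens, input_ids, labels

tokens, input_ids, labels = build_labels(
    "<s> [INST] Extract concepts: fever [/INST] fever </s>"
)
# Only the tokens after [/INST] keep real label ids; the prompt is masked.
```

This is why one text column suffices: the trainer's collator derives both the inputs and the labels from that same column.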
Thank you @Juan_Olano, I get it now: the input text will be used for both the input and the labels.
Just for the sake of further clarification, anyone interested can check the following link:
Hi Krishna, thanks for sharing the notebook; it really helped me. Could you please guide me on becoming proficient at fine-tuning models, with a good understanding of how things work, so I can easily understand problems, tune fine-tuning parameters, and prepare a dataset for a specific problem?
I sent you a direct message; please check it. I hope that helps.