I am looking for a way to extract highlights from a long body of text. The input is a full document, and the output is a sequence of references into the text (e.g. lines 100-130 and lines 200-220, …), that represent the most interesting/relevant/important parts of the document (based on some subjective relevance criterion that can be trained for specifically, or expressed in a text prompt).
Can this be achieved with GPT-4 by some clever prompt engineering? Are there any task-specific models for this?
MIT and Columbia researchers found that the best way to obtain a high-quality summary of a text, is to ask the LLM to improve its own output by adding information that corresponds to certain criteria. All of this is done by applying one, single prompt.
Article: [Insert article here]
You will generate increasingly concise entity-dense summaries of the above article. Repeat the following two steps 5 times.
Step 1: Identify 1-3 informative entities (delimited) from the article which are missing from the previously generated summary.
Step 2: Write a new denser summary of identical length which covers every entity and detail from the previous summary plus the missing entities.
A missing entity is:
- Relevant: to the main stories.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not in the previous summary.
- Faithful: present in the article.
- Anywhere: located in the article.
Guidelines:
The first summary should be long (4-5 sentences, ~100 words), yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., “this article discusses”) to reach ~200 words.
- Make every word count. Rewrite the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like “the article discusses”.
- The summaries should become highly dense and concise, yet self-contained, e.g., easily understood without the article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.
Remember: Use the exact same number of words for each summary.
Prompt 1: My goal is to understand the most interesting/relevant/important parts of the document. Write 10 different prompts for me to reach this goal, and the outputs that are generated from these prompts. Then, evaluate these prompts out of 100 (100 is highest quality, 0 is lowest), according to the following criteria: relevance to a professional in the [field], tangibility, and clarity.
_After result, try this second prompt: Prompt 2: Based on this output, generate 3 prompts and their outputs, aiming to maximise the score in all three criteria.
This method offers multiple advantages:
Reduces your cognitive load: You can start with only a vague idea of what you want, and you’ll only need to write the starting prompt, Speed: It drastically accelerates how fast you can find great prompts, Quality: The LLM will experiment and learn how to design prompts to get the best results.