How to design a writing-assistance system that checks text against a style guide, the way Grammarly checks grammar

I have a style guide that specifies punctuation usage, line length, and how to break up a line (e.g. before prepositions). I also have a dataset that pairs each original text with a version that conforms to this style guide.

Now, how do I design a system that recommends to users how to make a given text conform to the style guide? More specifically,

  • How should I frame this task? Is it a type of sequence-to-sequence modeling task?
  • Is fine-tuning an LLM on my data a good approach? How do I deal with hallucinations?
  • How can the system also tell users which style guide rules each recommendation applies?

Any pointers to how such a system is designed, or to relevant research papers, would be much appreciated!

Hello @Wayne_Yang - Welcome to the community :slight_smile:

  1. Task Framing:
  • This task can be framed as a sequence-to-sequence modeling task where the input sequence is the original text and the output sequence is the text conformed to the style guide. The two sequences do not have to align token by token: the model reads the tokenized input (typically subword units) and generates the full rewritten text, which may insert, delete, or move punctuation and line breaks.
  • You can also frame it as a text generation task, where the system generates the recommended revisions directly based on the style guide rules.
  2. Approach Considerations:
  • Fine-tuning a Large Language Model (LLM) such as GPT on your dataset is a reasonable approach. LLMs perform well across many natural language processing tasks and can capture complex patterns in text.
  • To deal with hallucinations, you can impose constraints during training that encourage the model to generate outputs adhering to the style guide rules. For example, you can penalize outputs that violate punctuation rules or line length constraints. If the style guide only changes punctuation and line breaks, you can also reject any output whose words differ from the input.
  3. Handling Style Guide Rules:
  • Encode the style guide rules as constraints or conditions during training. For example, you can provide the model with examples of how to break up lines before prepositions and encourage it to generate similar outputs.
  • Post-process the generated text to identify and highlight which style guide rules were applied in the recommendations. This could be done by comparing the input with the generated text and annotating each change with a tag naming the rule it follows.
  4. System Design:
  • The system could consist of several components, including:
    • Preprocessing: Tokenization, data cleaning, and formatting.
    • Model Training: Fine-tuning the LLM with your dataset and style guide rules.
    • Inference: Generating recommendations for conforming text to the style guide.
    • Post-processing: Identifying and highlighting the style guide rules that were applied.
  • You’ll also need a user interface that lets users input text and receive recommendations.
  5. Research and Resources:
  • Look into research papers on sequence-to-sequence modeling, text generation, and style transfer for relevant techniques and methodologies.
  • Explore existing writing assistance tools and their methodologies for inspiration and insights.
  • Consider studying papers on constraint-based generation and rule-based text processing for handling style guide rules.
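
To make the rule-reporting part concrete, here is a minimal Python sketch of a rule-based checker that attaches a rule identifier to every finding, so the interface can tell users which rule a recommendation comes from. Everything specific in it is an assumption for illustration: the 40-character limit, the short preposition list, the rule names, and the `Finding` structure are hypothetical and not taken from any real style guide.

```python
from dataclasses import dataclass

# Hypothetical rule parameters; a real style guide would supply its own values.
MAX_LINE_LENGTH = 40
PREPOSITIONS = {"in", "on", "at", "of", "for", "with", "to", "from", "by"}


@dataclass
class Finding:
    line_no: int   # 1-based line number in the input text
    rule: str      # identifier of the rule that fired
    message: str   # explanation that can be shown to the user


def check_line_length(line_no, line):
    # Flag lines longer than the configured limit.
    if len(line) > MAX_LINE_LENGTH:
        yield Finding(line_no, "line-length",
                      f"Line has {len(line)} characters; limit is {MAX_LINE_LENGTH}.")


def check_trailing_preposition(line_no, line):
    # A line whose last token is a preposition was broken after it,
    # not before it, which the (assumed) rule forbids.
    tokens = line.split()
    if tokens and tokens[-1].lower() in PREPOSITIONS:
        yield Finding(line_no, "break-before-preposition",
                      f"Line ends with '{tokens[-1]}'; break before the preposition.")


# Each rule is a function, so adding a rule means adding one entry here.
RULES = [check_line_length, check_trailing_preposition]


def check_text(text):
    """Run every rule on every line and collect the findings in order."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule in RULES:
            findings.extend(rule(line_no, line))
    return findings


if __name__ == "__main__":
    sample = ("The committee met on\n"
              "Tuesday to review the proposal submitted by the team.\n"
              "Short line.")
    for finding in check_text(sample):
        print(f"line {finding.line_no} [{finding.rule}]: {finding.message}")
```

A checker like this can run on the user's text directly, or on the model's output to verify that a suggestion actually satisfies the rules it claims to apply.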

By following these guidelines and considering the specific requirements of your task, you can design an effective system for recommending how to bring text into conformance with a style guide.
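
On the hallucination question, one inexpensive guard can be sketched in Python under a stated assumption: that the style guide only changes punctuation and line breaks, never the words themselves. In that case any model suggestion whose word sequence differs from the input can be rejected, and the remaining differences can be listed with the standard-library `difflib` module so each one can later be matched to a rule. The function names below are hypothetical.

```python
import difflib
import re

# Split text into words, single punctuation marks, and whitespace runs.
TOKEN_PATTERN = r"\w+|[^\w\s]|\s+"


def words_only(text):
    # Keep word tokens; drop punctuation, spaces, and line breaks.
    return re.findall(r"\w+", text)


def is_faithful(original, suggestion):
    """True when the suggestion has exactly the original's words, in order."""
    return words_only(original) == words_only(suggestion)


def changed_spans(original, suggestion):
    """Return (operation, original_fragment, suggested_fragment) per edit."""
    a = re.findall(TOKEN_PATTERN, original)
    b = re.findall(TOKEN_PATTERN, suggestion)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(op, "".join(a[i1:i2]), "".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]


if __name__ == "__main__":
    original = "We met on Monday in the hall"
    suggestion = "We met on Monday,\nin the hall"
    if is_faithful(original, suggestion):
        for op, before, after in changed_spans(original, suggestion):
            print(op, repr(before), "->", repr(after))
    else:
        print("Rejected: the suggestion changes the wording.")
```

The word-level check is deliberately strict. If the real style guide also allows changes such as capitalization or hyphenation, the comparison in `words_only` would need to be relaxed accordingly.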

Thanks for your reply, @jayambe36.
Am I talking to ChatGPT 4 here? :stuck_out_tongue:

I would guess so, given the characteristic formatting (numbered bold subjects that summarize topics using 2- to 3-item bulleted lists, with a summary that provides no additional information).
