Hi everybody, I have a question.
I’m trying to fine-tune Stable Diffusion XL to take in a sketch or AutoCAD image of a building, along with a prompt specifying the style, materials, and building type. The goal is to generate a realistic image that matches the prompt while preserving the structure of the input sketch.
From my understanding, I can fine-tune the base Stable Diffusion XL model to improve prompt adherence and fine-tune ControlNet to better capture the input image.
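For the sketch side, whatever conditioning I feed ControlNet at training time should match what I feed it at inference time. As a rough sketch of that preprocessing (using Pillow's built-in edge filter just for illustration; a Canny detector via OpenCV is the more common choice for ControlNet conditioning, and the function name and target size here are my own assumptions):

```python
from PIL import Image, ImageFilter, ImageOps

def sketch_to_control_image(path, size=(1024, 1024)):
    """Turn a CAD export or scanned sketch into an edge map for ControlNet conditioning.

    Illustrative only: Pillow's FIND_EDGES is a cheap stand-in for a proper
    Canny edge detector; 1024x1024 matches SDXL's native resolution.
    """
    img = Image.open(path).convert("L")           # grayscale
    img = ImageOps.autocontrast(img)              # normalize line contrast
    edges = img.filter(ImageFilter.FIND_EDGES)    # simple edge extraction
    return edges.resize(size).convert("RGB")      # ControlNet expects a 3-channel image
```

The key point is consistency: apply the same transform to the training pairs and to the user's sketch at inference.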
My plan is:

1. Fine-tune the Stable Diffusion XL model with LoRA to improve its prompt adherence.
2. Fine-tune the ControlNet model so it accurately interprets the input sketch.
3. At inference, load the SDXL base model together with the fine-tuned LoRA and ControlNet weights to generate the final output.
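For step 3, the inference setup can be sketched with the `diffusers` library's SDXL ControlNet pipeline. This is only a minimal outline: the checkpoint paths for the fine-tuned ControlNet and LoRA are placeholders, and actually running it needs `torch`, `diffusers`, and a GPU.

```python
def build_pipeline(controlnet_path, lora_path):
    """Assemble SDXL base + fine-tuned ControlNet + LoRA for inference.

    Sketch under assumptions: `controlnet_path` and `lora_path` are
    placeholders for your own fine-tuned checkpoints. Requires a GPU.
    """
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    # Load the fine-tuned ControlNet separately, then attach it to the base model.
    controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    # LoRA weights are loaded on top of the frozen base UNet/text encoders.
    pipe.load_lora_weights(lora_path)
    return pipe.to("cuda")
```

Usage would then look like `pipe(prompt, image=control_image)`, where `control_image` is the preprocessed sketch and `controlnet_conditioning_scale` can be lowered if the structure constraint fights the prompt.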
Do you think this is a sound approach?