How to bring NLP and AI into this project

Hello, I’m building something I’d love you guys’ perspective on.

It’s a document analysis engine: it ingests PDFs, articles, and web pages and outputs structured action briefs with scored recommendations. Not a chatbot. One input, one brief. I’ve got the ingestion pipeline and a rule-based analysis layer working end-to-end (FastAPI, OCR, web scraping). That is the architecture our boss asked us to build.
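Since the later phases will replace parts of the rule-based layer with NLP models, it may help to pin down the contract between stages first. Below is a minimal sketch of "one input, one brief" as typed data plus a keyword rule table. Everything here is hypothetical: `Recommendation`, `ActionBrief`, `RULES`, `analyze`, the example keywords, and the assumed 0.0 to 1.0 score scale are illustrations, not the poster's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recommendation:
    text: str
    score: float  # assumed scale: 0.0 to 1.0, higher = more urgent/relevant


@dataclass
class ActionBrief:
    source: str
    recommendations: List[Recommendation] = field(default_factory=list)


# Hypothetical rule table standing in for the existing rule-based layer:
# keyword -> (recommendation text, base score)
RULES = {
    "deadline": ("Confirm the compliance deadline with legal.", 0.9),
    "capital": ("Review capital adequacy implications.", 0.7),
    "tailings": ("Flag tailings-dam exposure for the risk team.", 0.8),
}


def analyze(source: str, text: str) -> ActionBrief:
    """One input document in, one brief out."""
    lowered = text.lower()
    brief = ActionBrief(source=source)
    for keyword, (rec_text, score) in RULES.items():
        if keyword in lowered:
            brief.recommendations.append(Recommendation(rec_text, score))
    # Highest-scoring recommendations first
    brief.recommendations.sort(key=lambda r: r.score, reverse=True)
    return brief
```

For example, `analyze("circular.pdf", "The new capital rules take effect before the deadline.")` yields the deadline recommendation (0.9) followed by the capital one (0.7). Because the brief shape is fixed, a later embedding- or NER-based scorer can produce the same `ActionBrief` without touching the API layer; in FastAPI these dataclasses could also be swapped for Pydantic models so the response is validated and serialized automatically.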

The phase I’m in now needs real NLP: sentence embeddings to compute semantic similarity between documents, spaCy NER to extract entities and source authorities, and eventually fine-tuning on finance/regulatory content.
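To make the two NLP pieces concrete, here is a stdlib-only sketch of their shape. It is a stand-in under stated assumptions, not the real models: `embed` is a toy bag-of-words counter where, in practice, something like `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)` from sentence-transformers would return dense vectors, and `find_authorities` is a hypothetical gazetteer playing the role that a spaCy `entity_ruler` pattern list or a trained pipeline such as `en_core_web_sm` (iterating `doc.ents`) would play. The `AUTHORITIES` entries and labels are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words term-count vector.
    A real sentence-embedding model would return a dense float vector instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical gazetteer of source authorities -> label, standing in for
# entity-ruler patterns or a fine-tuned NER model.
AUTHORITIES: Dict[str, str] = {
    "basel committee": "REGULATOR",
    "federal reserve": "REGULATOR",
    "international council on mining and metals": "INDUSTRY_BODY",
}


def find_authorities(text: str) -> List[Tuple[str, str]]:
    """Return (authority, label) pairs mentioned in the text, in gazetteer order."""
    lowered = text.lower()
    return [(name, label) for name, label in AUTHORITIES.items() if name in lowered]
```

Two paraphrases of the same regulatory news should score higher than an unrelated mining headline, e.g. `cosine(embed("Basel Committee publishes new capital requirements"), embed("New capital requirements from the Basel Committee"))` is about 0.77, while either against `embed("Copper output fell at the mine")` is 0.0. The toy vector only matches exact words; the point of real sentence embeddings is that they also match paraphrases with no shared vocabulary, which is why the swap matters.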

I’m learning the tools myself but would really value your guidance on key architectural decisions — model selection, pipeline structure, that kind of thing.

It’s a real product for a company, targeting banking and mining professionals, and it’s the project I want to kick off my ML career with.

Could any of you help guide me on how to go about bringing both RAG and context engineering into this project?
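One way to read the two terms for this pipeline: RAG retrieves the passages from ingested documents most relevant to the brief being written, and context engineering decides what goes into the model's window, in what order, and under what budget. The sketch below shows both steps under loud assumptions: chunking is by word count, the toy bag-of-words similarity stands in for real embeddings plus a vector store, the budget counts words rather than model tokens, and the instruction text is illustrative. `chunk`, `retrieve`, and `build_context` are hypothetical names, and no LLM call is shown.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; swap in real sentence embeddings in practice
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 50) -> List[str]:
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query: str, chunks: List[Tuple[str, str]], k: int = 3) -> List[Tuple[str, str, float]]:
    """Retrieval step: rank (source, chunk) pairs by similarity to the query, keep the top k."""
    q = embed(query)
    scored = [(src, text, cosine(q, embed(text))) for src, text in chunks]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


def build_context(query: str, hits: List[Tuple[str, str, float]], budget_words: int = 200) -> str:
    """Context-engineering step: instructions first, then numbered, cited evidence
    (best match first) capped at `budget_words` words, then the task last."""
    parts = [
        "You write action briefs for banking and mining professionals.",
        "Use only the numbered evidence below and cite it as [n].",
        "",
    ]
    used = 0  # words spent on evidence lines so far
    for n, (src, text, _score) in enumerate(hits, start=1):
        line = f"[{n}] ({src}) {text}"
        cost = len(line.split())
        if used + cost > budget_words:
            break  # drop lower-ranked evidence rather than overflow the budget
        parts.append(line)
        used += cost
    parts += ["", f"Task: {query}"]
    return "\n".join(parts)
```

In use, each ingested document would be chunked and tagged with its source, e.g. `corpus = [("circular.pdf", c) for c in chunk(text)]`, then `build_context(task, retrieve(task, corpus))` produces the prompt. Keeping the source name on every evidence line lets the brief cite where each recommendation came from, which matters for regulatory audiences.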

4_5852803276298263723.docx (33.8 KB)

4_5852803276298263722.pdf (88.4 KB)