Hello, I’m building something I’d love your perspective on.
It’s a document analysis engine: it ingests PDFs, articles, and web pages and outputs structured action briefs with scored recommendations. Not a chatbot. One input, one brief. I’ve got the ingestion pipeline and a rule-based analysis layer working end-to-end (FastAPI, OCR, web scraping). This is how my boss asked me to build it.
The phase I’m in now needs real NLP: sentence embeddings to compute semantic similarity between documents, spaCy NER to extract entities and source authorities, and eventually fine-tuning for finance/regulatory content.
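As I understand it, the similarity step roughly comes down to comparing embedding vectors with cosine similarity and ranking documents by that score. That ranking is also the basic retrieval step in a typical RAG setup. Here’s a minimal, stdlib-only sketch under some assumptions: the vectors are made-up toy values (in practice they would come from a sentence-embedding model, e.g. the sentence-transformers library, which isn’t in my pipeline yet), and `cosine_similarity` and `top_k` are hypothetical helper names, not part of any library.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank documents in `corpus` (doc id -> vector) by similarity to `query`, highest first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real sentence embeddings
# usually have hundreds of dimensions.
doc_a = [0.2, 0.8, 0.1]
doc_b = [0.25, 0.7, 0.05]
doc_c = [0.9, -0.1, 0.4]

if __name__ == "__main__":
    corpus = {"doc_b": doc_b, "doc_c": doc_c}
    print(top_k(doc_a, corpus, k=2))
```

Cosine similarity ignores vector length and only compares direction, which is why it’s a common default for comparing sentence embeddings; at real scale the brute-force loop in `top_k` would presumably be replaced by a vector index.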
I’m learning the tools myself but would really value your guidance on key architectural decisions — model selection, pipeline structure, that kind of thing.
It’s a real product for a company, targeting banking and mining professionals. It’s also the project I want to kick off my ML career with.
Could any of you help guide me on how to bring both RAG and context engineering into this project?
4_5852803276298263723.docx (33.8 KB)
4_5852803276298263722.pdf (88.4 KB)