Working with multiple datasources

Looking at the code, this seems feasible. For example, I could load, split, and tokenize information from all my data sources and put it into a single Chroma vector DB. Then I could point to it for querying and retrieval.

Before trying it out, I want to know about any technical caveats with this approach, e.g. how well splits from different data sources work together.

Please let me know.
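
For illustration, here is a minimal sketch of the idea, assuming plain strings stand in for loaded documents and a Python list stands in for the vector DB. The names `split_text`, `build_records`, and `filter_by_source` are hypothetical helpers, not LangChain or Chroma APIs. The point is that every split keeps a `source` tag in its metadata, so splits from different data sources can share one store and still be told apart at query time.

```python
def split_text(text, chunk_size=200, overlap=20):
    """Split text into fixed-size character chunks with some overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def build_records(sources, chunk_size=200, overlap=20):
    """Turn {source name: [document text, ...]} into one flat list of records.

    Each record carries a unique id and metadata naming where it came from.
    """
    records = []
    for source, docs in sources.items():
        for doc_idx, doc in enumerate(docs):
            for chunk_idx, chunk in enumerate(split_text(doc, chunk_size, overlap)):
                records.append({
                    "id": f"{source}-{doc_idx}-{chunk_idx}",
                    "text": chunk,
                    "metadata": {"source": source, "doc": doc_idx, "chunk": chunk_idx},
                })
    return records


def filter_by_source(records, source):
    """Keep only the records that came from one data source."""
    return [r for r in records if r["metadata"]["source"] == source]


# Made-up sample data; real content would come from the loaders.
sources = {
    "confluence": ["Sprint retro notes. " * 5],
    "jira": ["PROJ-1: fix login bug."],
}
records = build_records(sources, chunk_size=40, overlap=10)
jira_only = filter_by_source(records, "jira")
```

Records shaped like this (parallel ids, texts, and metadata) should map fairly directly onto a single Chroma collection, but a chunk size that suits one source may not suit another, so that part probably needs tuning per data source.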

Please change the way your question is framed.

There are a lot of things that can go wrong when using even a single data source.
It’s better to ask a specific question along the lines of “I did X and observed Y instead of Z. Here’s more information about it:”. That way, someone with specific knowledge of the topic can help you out.

Good luck.

@balaji.ambresh good point. I am thinking of a use case of generating weekly / monthly / quarterly / annual reports for employees using data sources like Confluence, Jira (I know LangChain still doesn’t have a loader for this), email, SharePoint (I need to investigate the available loaders), and a SQL DB.

However, as you have rightly pointed out, there are issues with even a single data source (e.g. see my other post on the PDF resume). At this point I want to check whether my use case is overambitious or whether I should continue investigating, e.g. data source by data source, and whether you have any other suggestions.

Python has libraries for interacting with email servers. Jira offers a REST API. I don’t know about SharePoint.
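
As a rough sketch of the email part, the standard library's `email` package can turn a raw message into text plus metadata. Fetching the raw message from a server (for instance with the standard `imaplib` module) is left out here, and the sample message, the `email_to_record` helper, and the record layout are all made up for illustration.

```python
from email import message_from_string
from email.policy import default

# Made-up raw message; in practice this would be fetched from the mail server.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Weekly status
Content-Type: text/plain; charset=utf-8

Closed three tickets this week.
"""


def email_to_record(raw):
    """Parse one raw email into a text body plus source metadata."""
    msg = message_from_string(raw, policy=default)
    # Prefer the plain-text part; HTML-only mails would need extra handling.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content().strip() if body is not None else ""
    return {
        "text": text,
        "metadata": {
            "source": "email",
            "subject": str(msg["Subject"]),
            "sender": str(msg["From"]),
        },
    }


record = email_to_record(RAW)
```

The resulting `text` could then go through the same splitting step as any other data source.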

Sorry to mention this, but your question is too broad at an architectural level. Please consider hiring a freelancer to bootstrap your project.