Programming Assignment: Question Answering

Sorry, I have a question.
How can I add personal data for training?
The personal data comes from Excel files, PDF files, PowerPoint files, SQL data,…

Hi @Tr_ng_Hoang_M_nh

That’s a very broad question, and the answer depends heavily on your approach and application. The category you are asking about is called “Data Engineering”, and there are many aspects to working with datasets (even a full-blown Specialization course would not cover the majority of it, let alone a forum post).

But to answer your question quickly, I would suggest:

  • Excel: the Python pandas library has a read_excel function, which should cover most cases.
  • PDF: there are many Python libraries (PyPDF2, pdfplumber, pdfminer to name a few) for parsing PDFs. In my experience they behave very differently, and the results depend a lot on the document itself (language, non-text elements, whether it is secured, etc.). I would suggest trying a couple of them and seeing which one parses your documents best.
  • PowerPoint: I have no experience with this, but I’m sure there are options for it; just search the internet.
  • SQL: again, it depends on the database system at hand (MS Access, MySQL, SQLite or others), and there are a lot of different solutions for each of them. For example, you can read an MS Access database through an ODBC connection, and there are similar approaches for MySQL and SQLite.
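
To make the suggestions above a bit more concrete, here is a minimal sketch, not a definitive implementation. The helper names (rows_to_text, load_sqlite_rows, load_excel_rows, load_pdf_text) and the demo table are made up for illustration. The Excel and PDF helpers assume pandas (plus openpyxl for .xlsx files) and pdfplumber are installed, and they are not called in the demo; only the SQLite part runs with the standard library alone.

```python
import sqlite3


def rows_to_text(rows):
    """Turn a list of dict records into one plain-text line per record."""
    return [
        "; ".join(f"{key}: {value}" for key, value in row.items())
        for row in rows
    ]


def load_sqlite_rows(connection, query):
    """Run a query and return each result row as a dict (standard library only)."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(query)]


def load_excel_rows(path):
    """Read the first sheet of an Excel file with pandas.read_excel."""
    import pandas as pd  # third-party; .xlsx files also need openpyxl

    return pd.read_excel(path).to_dict(orient="records")


def load_pdf_text(path):
    """Extract text page by page with pdfplumber (one of several PDF parsers)."""
    import pdfplumber  # third-party

    with pdfplumber.open(path) as pdf:
        # extract_text() can return nothing for pages without a text layer
        return [page.extract_text() or "" for page in pdf.pages]


# Demo with an in-memory SQLite database; the table and its row are made up.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE reports (id INTEGER, title TEXT)")
demo.execute("INSERT INTO reports VALUES (1, 'Pump failure analysis')")
records = rows_to_text(load_sqlite_rows(demo, "SELECT id, title FROM reports"))
print(records)
```

Whatever the source, the idea is the same: get everything into plain text records first, then decide how to feed them into training. For scanned PDFs without a text layer, a parser like this will return empty strings, so those documents would need a different approach.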

Cheers

Thanks for your answer.
I want to build a chatbot that supports engineers with problem solving.
The bot should be able to answer any question the way ChatGPT does, but
the training data would be company data: technical reports, previous problems, previous reports,…
I know I could use an API such as the OpenAI API, but at my company connecting to it is prohibited.

If you can’t use existing tools, you’re going to re-create a very expensive wheel.