ChatGPT Document Processor

This project is a Flask-based web application that implements a document retrieval and question-answering system driven by a language model. It processes uploaded PDF documents, indexes their text, and answers user queries about their contents.

Key Features:

  1. Document Upload and Processing:
  • Users can upload documents through the web interface.
  • The system handles document uploads using Flask’s file handling capabilities.
  • Upon upload, the system extracts the document’s text using PyPDFLoader and splits it into chunks for efficient processing.
  2. Text Embeddings and Vector Storage:
  • The system generates text embeddings with OpenAIEmbeddings, which calls an OpenAI embedding model. GPT-3.5 Turbo is a chat model, so it can serve as the answering LLM but does not produce the embeddings.
  • The embeddings are stored in a vector store built with the FAISS library.
  • The vector store enables efficient document retrieval based on semantic similarity.
  3. Document Retrieval and QA:
  • Users query the system with a natural language question or phrase.
  • The question is passed to a retrieval QA chain backed by the OpenAI language model (LLM).
  • The chain uses the FAISS-based vector store to find the chunks most similar to the question, and the LLM generates an answer from their text.
  4. Web Interface:
  • The web interface includes an index page (“index.html”) that serves as the main entry point.
  • Users interact with the system through a web browser, uploading documents and submitting queries.
  5. Persistence and Local Storage:
  • Uploaded documents are saved in a designated directory (“uploaded_documents”).
  • The vector store is serialized and saved locally as “faiss_index_test” for reuse across application runs.
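
The chunk, embed, and retrieve steps listed above can be illustrated with a small, dependency-free sketch. It is not the application's code: the word-count "embedding" stands in for the OpenAI embedding model, the brute-force cosine search stands in for FAISS, and every name in it (split_into_chunks, embed, ToyVectorStore) is hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (assumption; the app uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Brute-force similarity search (stand-in for a FAISS index)."""

    def __init__(self):
        self._entries = []  # list of (chunk_text, vector) pairs

    def add_texts(self, chunks):
        for chunk in chunks:
            self._entries.append((chunk, embed(chunk)))

    def similarity_search(self, query, k=2):
        query_vector = embed(query)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine_similarity(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add_texts([
        "Flask serves the web pages.",
        "FAISS stores the vectors.",
        "The loader extracts text from a PDF.",
    ])
    # Print the chunk most similar to the question.
    print(store.similarity_search("Which part extracts text from the PDF?", k=1))
```

In the real pipeline the retrieved chunks would be handed to the LLM, which writes the final answer. The toy store only shows why similarity search narrows the document down to the relevant passages first.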

How to Use:

  1. Document Upload:
  • Access the application through the provided Flask web server (default port 8080).
  • Use the index page to upload documents. The system processes and stores uploaded documents.
  2. Query Processing:
  • Submit queries through the web interface.
  • The system uses the OpenAI language model and the stored vector store to retrieve relevant information and provide answers.
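
The two steps above map naturally onto two Flask endpoints. The following is a minimal sketch under stated assumptions, not the project's actual code: the route names (/upload, /ask), the form field names (document, question), and the answer_question placeholder are all hypothetical. Only the “uploaded_documents” directory name comes from the description above.

```python
import os

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
# Directory name taken from the description above; kept in config so it can be changed.
app.config["UPLOAD_DIR"] = "uploaded_documents"


def answer_question(question):
    """Hypothetical placeholder for the retrieval QA chain."""
    return f"(no index loaded) You asked: {question}"


@app.route("/upload", methods=["POST"])  # hypothetical route name
def upload():
    uploaded = request.files.get("document")  # hypothetical form field name
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="no file provided"), 400
    # Strip path separators and other unsafe characters from the client-supplied name.
    filename = secure_filename(uploaded.filename)
    if not filename:
        return jsonify(error="invalid file name"), 400
    os.makedirs(app.config["UPLOAD_DIR"], exist_ok=True)
    uploaded.save(os.path.join(app.config["UPLOAD_DIR"], filename))
    return jsonify(saved=filename)


@app.route("/ask", methods=["POST"])  # hypothetical route name
def ask():
    question = request.form.get("question", "").strip()  # hypothetical field name
    if not question:
        return jsonify(error="empty question"), 400
    return jsonify(answer=answer_question(question))
```

To serve the sketch on the port mentioned above, one could call app.run(port=8080) at the end of the script. In a full application the upload handler would also trigger text extraction and indexing, and answer_question would call the QA chain.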

Dependencies:

  • Flask
  • PyPDFLoader
  • CharacterTextSplitter
  • OpenAIEmbeddings (OpenAI embedding model)
  • FAISS
  • RetrievalQA
  • OpenAI Language Model (LLM)

Note:

  • Ensure that the necessary dependencies are installed before running the application.
  • This project provides a foundation for a language-driven document retrieval and QA system and can be customized to suit specific requirements.

You can try the app at the link below:
https://pratikskarnik.pythonanywhere.com/