Web Scraping

Hi everyone,

I’m trying to scrape data from the CPWD eTender portal, but I’ve been hitting multiple roadblocks. I’m looking for guidance on how to extract data (like awarded tenders or live tenders) from pages such as:

:cross_mark: What I’ve tried:

  1. Requests:
  • Works only for the homepage (<Response [200]>).
  • Other URLs, such as /TenderswithinOneday.html, return <Response [405]> (Method Not Allowed).
  2. Adding fake User-Agent headers and proxies:
  • Still getting 405 responses, or redirects to pages with no useful content.
  3. Selenium (with headless Chrome):
  • Can load the homepage.
  • But it then gets blocked by a JavaScript alert:
    “To access the application, please install CPWD Signer…”
  • This raises UnexpectedAlertPresentException.
  4. undetected-chromedriver:
  • Fails with ModuleNotFoundError: No module named 'distutils' on Python 3.12.
  • Installing distutils fails because it was removed from the standard library in Python 3.12.
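
For reference, the Requests attempts in items 1 and 2 can be sketched roughly as below. This is only an illustration, not the exact script: `fetch_with_session`, `base_url`, and the default User-Agent string are hypothetical names and values, and the portal's real base URL is deliberately left as a parameter. The helper loads the homepage first so the session picks up any cookies, then requests the inner page with a Referer header. It also reports the `Allow` header, which a 405 response is supposed to carry and which would hint at the HTTP method the server expects.

```python
from urllib.parse import urljoin


def fetch_with_session(session, base_url, path, user_agent="Mozilla/5.0"):
    """Visit the homepage, then request `path` with the same session.

    `session` is expected to behave like requests.Session (an assumption);
    any object with a compatible .get() method works.
    """
    # First request: the homepage, so the session can store cookies.
    home = session.get(base_url, headers={"User-Agent": user_agent}, timeout=30)

    # Second request: the inner page, with the homepage as Referer.
    inner_headers = {"User-Agent": user_agent, "Referer": base_url}
    resp = session.get(urljoin(base_url, path), headers=inner_headers, timeout=30)

    return {
        "home_status": home.status_code,
        "status": resp.status_code,
        # Methods the server says it accepts, if it sent the header at all.
        "allow": resp.headers.get("Allow"),
        "length": len(resp.content),
    }
```

With Requests installed, this would be called as `fetch_with_session(requests.Session(), base_url, "/TenderswithinOneday.html")`. If `allow` comes back as something like `POST`, the page is probably meant to be reached by a form submission rather than a plain GET.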

:package: Goal:

I only want to extract publicly visible tender information; I am not trying to log in or submit anything. If there’s a workaround, API endpoint, or public dump, I’m happy to use that instead of full automation.

:speech_balloon: Questions:

  • Has anyone successfully scraped public tender data from this portal?
  • Is there a way to bypass or handle the CPWD Signer alert without actually installing it?
  • Would switching to an older version of Python or using Playwright help?
  • Are there alternative data sources (like APIs or open datasets) for CPWD tenders?
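
On the alert question, one approach that might be worth trying in Selenium is to switch to the alert and dismiss it explicitly, instead of letting the next command raise UnexpectedAlertPresentException. The sketch below is untested against the portal: `dismiss_alert_if_present` is a hypothetical helper name, and whether any tender data becomes visible once the alert is dismissed is not known.

```python
import time

try:
    # Real exception class, used when Selenium is installed.
    from selenium.common.exceptions import NoAlertPresentException
except ImportError:
    # Fallback so the helper can be exercised with a stub driver.
    class NoAlertPresentException(Exception):
        pass


def dismiss_alert_if_present(driver, timeout=5.0, poll=0.25):
    """Dismiss a JavaScript alert if one shows up within `timeout` seconds.

    Returns the alert text, or None if no alert appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            alert = driver.switch_to.alert  # raises if no alert is open
            text = alert.text
            alert.dismiss()
            return text
        except NoAlertPresentException:
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll)
```

The idea would be to call `dismiss_alert_if_present(driver)` right after `driver.get(...)` and before reading anything from the page.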

Feel free to reach me at paharisunandan@gmail.com