Hi everyone,
I’m trying to scrape data from the CPWD eTender portal, but I’ve been hitting multiple roadblocks. I’m looking for guidance on how to extract data (like awarded tenders or live tenders) from pages such as:
- https://etender.cpwd.gov.in/TenderswithinOneday.html
- https://etender.cpwd.gov.in/TenderAwardPublishSearch.jsp
- https://etender.cpwd.gov.in
What I’ve tried:
- Requests:
  - Works only for the homepage (<Response [200]>).
  - Other URLs like /TenderswithinOneday.html return <Response [405]>.
- Adding fake User-Agent headers and proxies:
  - Still getting 405 or redirected pages with no useful content.
- Selenium (with headless Chrome):
  - Can load the homepage.
  - But then gets blocked by a JavaScript alert: “To access the application, please install CPWD Signer…”
  - This causes UnexpectedAlertPresentException.
  - Tried undetected-chromedriver, but ran into ModuleNotFoundError: No module named 'distutils' on Python 3.12.
  - Installing distutils fails since it was removed from the standard library in Python 3.12.
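
For anyone who wants to reproduce the status codes, here is a minimal sketch of the kind of check described in the list above. It uses the standard library's urllib instead of Requests so it runs without extra packages. The helper names (build_request, fetch_status, probe_all) and the User-Agent value are placeholders made up for illustration, not something the portal is known to require. Note that 405 normally means "Method Not Allowed", so the server is rejecting the request method itself rather than just the client identity.

```python
import urllib.error
import urllib.request

BASE_URL = "https://etender.cpwd.gov.in"
PATHS = ["/", "/TenderswithinOneday.html", "/TenderAwardPublishSearch.jsp"]

# Placeholder value (an assumption); any browser-like string could be used here.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(path, user_agent=USER_AGENT):
    """Build a plain GET request for one portal path with a custom User-Agent."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"User-Agent": user_agent},
        method="GET",
    )


def fetch_status(request, opener=urllib.request.urlopen, timeout=15):
    """Return the HTTP status code, including error codes such as 405."""
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still the information we want.
        return err.code


def probe_all(paths=PATHS):
    """Map each path to the status code the server returned for it."""
    return {path: fetch_status(build_request(path)) for path in paths}
```

Calling probe_all() performs the real network requests; the opener parameter exists only so the status handling can be exercised without hitting the site.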
Goal:
I just want to extract publicly visible tender information — not login or submit anything. If there’s a workaround, API endpoint, or public dump, I’m happy to use that instead of full automation.
Questions:
- Has anyone successfully scraped public tender data from this portal?
- Is there a way to bypass or handle the CPWD Signer alert without actually installing it?
- Would switching to a lower version of Python or using Playwright help?
- Are there alternative data sources (like APIs or open datasets) for CPWD tenders?
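
On the alert question, one option that might be worth checking is WebDriver's unhandledPromptBehavior capability, which Selenium 4 exposes as the unhandled_prompt_behavior attribute on its Options classes. The default ("dismiss and notify") is what surfaces UnexpectedAlertPresentException; "dismiss" closes the prompt silently instead. The sketch below is only that, a sketch: configure_options and make_driver are hypothetical helper names, and it is an untested assumption that the portal shows any tender data once the alert is dismissed.

```python
# Values defined for unhandledPromptBehavior in the W3C WebDriver specification.
PROMPT_BEHAVIORS = {
    "dismiss",
    "accept",
    "dismiss and notify",
    "accept and notify",
    "ignore",
}


def configure_options(options, behavior="dismiss"):
    """Configure a Selenium Options object for headless use with silent prompt handling."""
    if behavior not in PROMPT_BEHAVIORS:
        raise ValueError(f"unsupported prompt behavior: {behavior!r}")
    options.add_argument("--headless=new")
    # "dismiss" closes JavaScript alerts without raising an exception.
    options.unhandled_prompt_behavior = behavior
    return options


def make_driver(behavior="dismiss"):
    """Start headless Chrome with the options above (requires: pip install selenium)."""
    from selenium import webdriver  # imported lazily; third-party dependency

    options = configure_options(webdriver.ChromeOptions(), behavior)
    return webdriver.Chrome(options=options)
```

With a driver from make_driver(), driver.get(...) on one of the URLs above should no longer raise on the alert itself; whether the page behind it is usable without CPWD Signer is still the open question.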