Help Needed: Setting Up a Virtual Environment on Ubuntu for OCR Workflows

adminravi · June 17, 2025, 1:46am

Good morning friends,

I need your guidance in setting up a virtual environment on my Linux system (Ubuntu 22.04 LTS – Jammy Jellyfish) that can handle OCR tasks efficiently without any package conflicts. Below is my laptop configuration:

Acer Laptop Configuration

Model: Acer ALG AL15G-53
Processor: 13th Gen Intel® Core™ i5-13420H (8-core, 12-thread) @ 2.10 GHz
Graphics: Dedicated 6 GB NVIDIA GPU (likely GeForce RTX 3050)
RAM: 16 GB DDR4 (3200 MHz)
Storage: 512 GB SSD

Use Case / Workflow

My job primarily involves:

Downloading scanned PDFs (public domain).
Converting them into JPG images.
Extracting text using OCR — mostly English, but also Indian languages like Hindi, Malayalam, Bengali, etc.
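
For illustration, the three steps above could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested pipeline for this laptop: it assumes the `pdf2image` and `pytesseract` packages are installed, along with the system packages they wrap (on Ubuntu, `poppler-utils`, `tesseract-ocr`, and the language packs `tesseract-ocr-hin`, `tesseract-ocr-mal` and `tesseract-ocr-ben`). The names `ocr_pdf`, `lang_string` and `TESSERACT_LANGS` are hypothetical helpers invented for this example.

```python
from pathlib import Path

# Tesseract language codes for the languages mentioned in the post.
TESSERACT_LANGS = {
    "English": "eng",
    "Hindi": "hin",
    "Malayalam": "mal",
    "Bengali": "ben",
}


def lang_string(names):
    """Join language names into the 'eng+hin' form that Tesseract accepts."""
    return "+".join(TESSERACT_LANGS[name] for name in names)


def ocr_pdf(pdf_path, out_dir, languages=("English", "Hindi"), dpi=300):
    """Render each PDF page to a JPG, OCR it, and return one text string per page."""
    # Imported here so lang_string() still works before the packages are installed.
    from pdf2image import convert_from_path
    import pytesseract

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    texts = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for number, page in enumerate(pages, start=1):
        # Step 2 of the workflow: keep a JPG copy of every page.
        page.save(out_dir / f"page_{number:04d}.jpg", "JPEG")
        # Step 3: run Tesseract with all requested languages at once.
        texts.append(pytesseract.image_to_string(page, lang=lang_string(languages)))
    return texts
```

A call such as `ocr_pdf("scan.pdf", "pages", languages=("English", "Malayalam"))` would write the JPGs into `pages/` and return the recognised text; the file names here are placeholders.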

What I Need

I’d like to create a virtual environment (preferably using conda or venv) that includes:

OCR libraries (like PaddleOCR or Tesseract)
Support for Indian languages
GPU acceleration (CUDA & cuDNN compatible with my NVIDIA GPU)
PDF/image tools (like pdf2image, PIL, OpenCV, etc.)

I want to avoid version conflicts and make the setup future-proof. If anyone has a similar setup or can guide me step-by-step, I’d be really grateful.
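
One possible way to keep such a setup isolated and reproducible is to create the environment with Python's standard `venv` module and then record the exact installed versions with `pip freeze`. The sketch below assumes a Linux layout (`bin/python` inside the environment) and that the `python3-venv` package is present on Ubuntu. The package list and the helper names (`create_env`, `install_command`, `freeze_command`) are assumptions made for this example, not a verified conflict-free set.

```python
import venv
from pathlib import Path

# Assumed package list based on the tools named in the post; versions are
# deliberately left unpinned here and captured afterwards with pip freeze.
PACKAGES = ["pdf2image", "pillow", "opencv-python", "pytesseract", "paddleocr"]


def create_env(env_dir):
    """Create an isolated environment with pip and return its Python path."""
    env_dir = Path(env_dir)
    venv.EnvBuilder(with_pip=True).create(env_dir)
    return env_dir / "bin" / "python"  # Linux layout


def install_command(python_path, packages=PACKAGES):
    """Build the pip command that installs the packages into that environment."""
    return [str(python_path), "-m", "pip", "install", *packages]


def freeze_command(python_path):
    """Build the command that lists exact installed versions for later reuse."""
    return [str(python_path), "-m", "pip", "freeze"]
```

To use it, call `create_env("ocr-env")`, pass the result of `install_command(...)` to `subprocess.run(..., check=True)`, and save the output of `freeze_command(...)` to a `requirements.txt`. Reinstalling from that file later reproduces the same versions, which is about as close to "future-proof" as a pip-based setup gets.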

Thank you so much in advance!

Regards,
Ravi Verma

TMosh · June 17, 2025, 3:42am

That is entirely impossible. The AI industry moves way too fast, and puts no value on backward compatibility.

adminravi · June 17, 2025, 3:44am

Can I have an ideal workshop (virtual environment) to perform the OCR extraction work?

TMosh · June 17, 2025, 5:47am

Sorry, I do not know.
“Ideal” is a very high standard.

adminravi · June 17, 2025, 6:46am

Minimum possible?

TMosh · June 17, 2025, 7:10am

We’ll have to wait and see if someone from the community has the information you need.

carlosrl · June 18, 2025, 3:06am

Hey @adminravi, let me see if I can help you.
Your hardware should be able to handle OCR tasks, which in practice means working with DL or ML solutions. One point worth watching is your 512 GB of storage, which I think is too small. It would be better to use an external SSD, starting at 1 TB.
Regarding the virtual environment, some time ago I created a repo of envs here that you can just download and use.
Yes, you can also use Docker, but it is a package that will eat more of your limited memory.
Hope this can help you.
Keep learning!

adminravi · June 18, 2025, 4:41am

Thank you so much. I will need GPU support for PaddleOCR, plus Tesseract with multiple languages enabled. I want this venv to have everything needed for OCR extraction from images.

adminravi · June 18, 2025, 4:55am

Can you please give me the pip install path for PaddlePaddle GPU version compatible with CUDA 11.7?
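
Whichever wheel ends up being installed, a quick check along these lines can confirm whether the NVIDIA driver is visible and whether the installed PaddlePaddle build was compiled with CUDA. It is only a sketch: `gpu_report` is a hypothetical helper name, and the function degrades to a partial report when `nvidia-smi` or `paddle` is missing.

```python
import shutil
import subprocess


def gpu_report():
    """Collect basic facts about the NVIDIA driver and the PaddlePaddle build."""
    report = {"nvidia_smi": None, "paddle_installed": False, "paddle_cuda_build": None}

    # Ask the driver for the GPU name and driver version, if nvidia-smi exists.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            report["nvidia_smi"] = result.stdout.strip()

    # PaddlePaddle is optional here; report what we can without it.
    try:
        import paddle
    except ImportError:
        return report

    report["paddle_installed"] = True
    # True only for the GPU build (paddlepaddle-gpu), False for the CPU build.
    report["paddle_cuda_build"] = paddle.device.is_compiled_with_cuda()
    return report
```

Printing `gpu_report()` inside the activated environment shows at a glance whether the CPU-only wheel was installed by mistake. For a fuller self-test, PaddlePaddle also provides `paddle.utils.run_check()`.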

carlosrl · June 29, 2025, 6:21pm

At this link you will find instructions on how to install PyTorch with CUDA 11.x support using pip.
