How do I download the helper.py file used in the course lab for 'Document AI: From OCR to Agentic Doc Extraction'?

Hi there,

I am building a tool to extract financial statements from PDFs. Two questions:

  1. How do I access the helper.py files used in Lab 4? I am working on my laptop and would like to use them there.

  2. ADE extraction is messy when I send a 100-page PDF: the balance sheet isn’t read well. However, when I send a one-page PDF that contains only the balance sheet, the extraction works well. How can I fix this?

Try the “File” → “Open” menu command. You should find the files there.

Sorry, I have no idea.

Thanks TMosh. I found the helper file.

Thanks @TMosh for your reply to @Pranay7. Hope you’re enjoying the course with LandingAI / AWS!

Hi @Pranay7 !
Thank you for taking the course and sharing these questions.

About 1: it is resolved! Remember that in the top menu on the left side, under “File”, you can “Open” the lab files.

Now, about 2 (“ADE extraction is messy when I send a 100 page pdf”):

We recommend getting a free LandingAI license worth 1000 credits (a good fit for a 100-page PDF) and experimenting there.

We also suggest two other APIs for this kind of task:

  • Split. Breaks a large document into smaller documents before processing.
  • Parse Jobs. Handles large batches that run in the background.
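To illustrate the idea behind splitting (this is not the LandingAI Split API, only a local sketch), here is one way to plan how a long PDF could be cut into smaller page ranges before each piece is sent for extraction. The function name `page_chunks` and the chunk size of 10 are assumptions made for the example; actually cutting the PDF would need a PDF library of your choice.

```python
from typing import List, Tuple


def page_chunks(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (start, end) page ranges covering a document.

    Hypothetical helper for illustration only; it plans the split and does
    not read or write any PDF file.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


# Plan a 100-page report as ten 10-page pieces.
chunks = page_chunks(100, 10)
for start, end in chunks:
    print(f"pages {start}-{end}")
```

Each range can then be extracted on its own, which is closer to the one-page case that already works well for the balance sheet.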

Happy to help!

Lesly