ModuleNotFoundError: No module named 'unstructured'

Hello all, I am trying the course “Preprocessing Unstructured Data for LLM Applications” on my Colab. When I run the code below:

from IPython.display import JSON
import json
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError
from unstructured.partition.html import partition_html
from unstructured.partition.pptx import partition_pptx
from unstructured.staging.base import dict_to_elements, elements_to_json

I get this error:
ModuleNotFoundError: No module named 'unstructured'

I already tried: !pip install unstructured_client

Any help?

Thanks

You are getting this error because the files and libraries required to run the labs are missing from your local environment, so you need to download them. DLAI short courses don't let you download all the files at once, but you can try File > Open, select the required file or folder, and click "Download as".

Regards
DP

Many thanks!

It worked. There is a file named requirements.txt that lists many libraries, and it worked once I installed all of them. The file is under View/File Browser. Thanks Deepti_Prasad.
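
For anyone hitting the same error: unstructured_client and unstructured are separate packages, so installing one does not provide the other. Below is a minimal sketch for checking which top-level modules are still missing before running the notebook. The helper name missing_modules and the list of names are illustrative assumptions, not part of the course files.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative list: the top-level modules imported by the notebook and Utils.py.
required = ["unstructured", "unstructured_client", "dotenv", "panel"]
print("Missing modules:", missing_modules(required))
```

An empty list means every module was found. Anything listed still needs to be installed into the same environment the notebook kernel uses.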

Hi, I tried to look for requirements.txt in View/File Browser, but I don't see that file. All I can see are the .ipynb file, two files containing the static images, and one utils.py. Would you mind reminding me where specifically you found that file?

Hello @deadpan

Did you get the file? @mmarques, you could list all the required libraries here in a comment, so other learners can get help from your post.

But just to be clear @deadpan, you need to install the libraries, at the right versions, yourself in your local Jupyter environment or Python terminal.

Regards
DP

Hi, good idea. Here are the libraries that need to be installed:

chromadb==0.4.22
langchain==0.1.5
langchain-community==0.0.17
langchain-core==0.1.19
langchain-openai==0.0.5
openai==1.11.1
tiktoken==0.5.2
#"unstructured[md,pdf,pptx]"
unstructured-client==0.16.0
unstructured==0.12.3
unstructured-inference==0.7.23
unstructured.pytesseract==0.3.12
urllib3==1.26.18
python-dotenv==1.0.1
panel==1.3.8
ipython==8.18.1
python-pptx==0.6.23
pdf2image==1.17.0
pdfminer==20191125
opencv-python==4.9.0.80

pikepdf==8.13.0
pypdf==4.0.1
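
If you want to confirm that your environment matches these pins, here is a rough sketch using only the standard library. The function names parse_pins and check_pins are my own, and the parser only understands plain "name==version" lines (it skips comments and blank lines), so treat it as a quick check rather than a full requirements parser.

```python
from importlib import metadata


def parse_pins(text):
    """Collect 'name==version' pins, skipping blank lines and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Map each package to (wanted version, installed version or None)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        report[name] = (wanted, found)
    return report


sample = "unstructured-client==0.16.0\nunstructured==0.12.3\n"
for name, (wanted, found) in check_pins(parse_pins(sample)).items():
    print(f"{name}: wanted {wanted}, installed {found}")
```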

And here is the code that needs to be in your environment (Utils.py):

import os
import sys
from dotenv import load_dotenv, find_dotenv
import panel as pn
pn.extension()

class Utils:
    def __init__(self):
        pass

    def get_dlai_api_key(self):
        # Load variables from the nearest .env file, then read the key
        _ = load_dotenv(find_dotenv())
        return os.getenv("DLAI_API_KEY")

    def get_dlai_url(self):
        _ = load_dotenv(find_dotenv())
        return os.getenv("DLAI_API_URL")

class upld_file():
    def __init__(self):
        self.widget_file_upload = pn.widgets.FileInput(accept='.pdf,.ppt,.png,.html', multiple=False)
        self.widget_file_upload.param.watch(self.save_filename, 'filename')

    def save_filename(self, _):
        if len(self.widget_file_upload.value) > 2e6:
            print("file too large. 2 M limit")
        else:
            self.widget_file_upload.save('./example_files/' + self.widget_file_upload.filename)
        #print(f"filename_ = {self.widget_file_upload.filename}")
        #print(f"length of value {len(self.widget_file_upload.value)}")

Hello,

Here is my list after View/FileBrowser:

[image: file browser listing]

You need to install the libraries from requirements.txt (look at my other post, I listed the names) and you will need the file Utils.py in some directory of your environment.

best,

Pretty small list :rofl::joy: @mmarques
All the best @deadpan downloading them :saluting_face:

Thank you!

I get the following error when I try to install requirements.txt: ERROR: No matching distribution found for unstructured==0.12.3

Could you share a screenshot of the complete error you got?

pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting chromadb==0.4.22 (from -r requirements.txt (line 1))
Using cached chromadb-0.4.22-py3-none-any.whl.metadata (7.3 kB)
Collecting langchain==0.1.5 (from -r requirements.txt (line 2))
Using cached langchain-0.1.5-py3-none-any.whl.metadata (13 kB)
Collecting langchain-community==0.0.17 (from -r requirements.txt (line 3))
Using cached langchain_community-0.0.17-py3-none-any.whl.metadata (7.9 kB)
Collecting langchain-core==0.1.19 (from -r requirements.txt (line 4))
Using cached langchain_core-0.1.19-py3-none-any.whl.metadata (6.0 kB)
Collecting langchain-openai==0.0.5 (from -r requirements.txt (line 5))
Using cached langchain_openai-0.0.5-py3-none-any.whl.metadata (2.5 kB)
Collecting openai==1.11.1 (from -r requirements.txt (line 6))
Using cached openai-1.11.1-py3-none-any.whl.metadata (18 kB)
Collecting tiktoken==0.5.2 (from -r requirements.txt (line 7))
Using cached tiktoken-0.5.2-cp312-cp312-win_amd64.whl.metadata (6.8 kB)
Collecting unstructured-client==0.16.0 (from -r requirements.txt (line 9))
Using cached unstructured_client-0.16.0-py3-none-any.whl.metadata (4.9 kB)
ERROR: Ignored the following yanked versions: 0.8.3, 0.10.19.dev18
ERROR: Ignored the following versions that require a different python version: 0.12.0 Requires-Python >=3.9.0,<3.12; 0.12.2 Requires-Python >=3.9.0,<3.12; 0.12.3 Requires-Python >=3.9.0,<3.12; 0.12.4 Requires-Python >=3.9.0,<3.12; 0.12.5 Requires-Python >=3.9.0,<3.12; 0.12.6 Requires-Python >=3.9.0,<3.12; 0.13.0 Requires-Python <3.12,>=3.9.0; 0.13.1 Requires-Python <3.12,>=3.9.0; 0.13.2 Requires-Python <3.12,>=3.9.0
ERROR: Could not find a version that satisfies the requirement unstructured==0.12.3 (from versions: 0.0.1.dev0, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6.dev1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.6, 0.4.7, 0.4.8, 0.4.9, 0.4.10, 0.4.11, 0.4.12, 0.4.13, 0.4.14, 0.4.15, 0.4.16, 0.5.0, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.6, 0.5.7, 0.5.8, 0.5.9, 0.5.10, 0.5.11, 0.5.12, 0.5.13, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.6.7, 0.6.8, 0.6.9, 0.6.10, 0.6.11, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7.5, 0.7.6, 0.7.7, 0.7.8, 0.7.9, 0.7.10, 0.7.11, 0.7.12, 0.8.0, 0.8.1, 0.8.4, 0.8.5, 0.8.6, 0.8.7, 0.8.8, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.10.0, 0.10.1, 0.10.2, 0.10.4, 0.10.5, 0.10.6, 0.10.7, 0.10.8, 0.10.9, 0.10.10, 0.10.11, 0.10.12, 0.10.13, 0.10.14, 0.10.15, 0.10.16, 0.10.18, 0.10.19, 0.10.20, 0.10.21, 0.10.22, 0.10.23, 0.10.24, 0.10.25, 0.10.26, 0.10.27, 0.10.28, 0.10.29, 0.10.30, 0.11.0, 0.11.1, 0.11.2, 0.11.4, 0.11.5, 0.11.6, 0.11.7, 0.11.8)
ERROR: No matching distribution found for unstructured==0.12.3

Hello @abhat_sctist23

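Your Python version does not match what unstructured==0.12.3 requires. The log shows that 0.12.3 needs Python >=3.9.0,<3.12, while the cp312 wheel picked for tiktoken indicates you are running Python 3.12.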

Regards
DP

I second this. It happened to me once because I was using Python 3.12. I downgraded to Python 3.11 and it worked.
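
To check this before running pip, here is a small sketch based on the constraint printed in the log above (Requires-Python >=3.9.0,<3.12 for unstructured 0.12.3). The function name python_supported is just an illustrative choice.

```python
import sys


def python_supported(version_info=sys.version_info, minimum=(3, 9), below=(3, 12)):
    """True if the interpreter's major.minor version is in the range [minimum, below)."""
    current = tuple(version_info[:2])
    return minimum <= current < below


if python_supported():
    print("This Python version is inside the range required by unstructured==0.12.3")
else:
    print("This Python version is outside >=3.9,<3.12; use e.g. Python 3.11 instead")
```

Run it with the same interpreter that the notebook kernel uses, since that is the one pip needs to match.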

Thanks Deepti, it worked with a different Python version.

How can I get an API key for pdf? Could you please provide a link?

I don't think API keys can be provided for short-course assignments.

Whatever API keys are used come with the metadata/utils file of the assignment you are doing.

Just to be sure
@Mubsi can this be provided?

Regards
DP

The Utils file calls find_dotenv(), which automatically looks for a .env file, and that file contains the API key.

    def get_dlai_api_key(self):
        _ = load_dotenv(find_dotenv())
        return os.getenv("DLAI_API_KEY")

So how do we get DLAI_API_KEY and DLAI_API_URL?

I know, and that is what I am trying to say: DLAI API keys cannot be shared.

Regards
DP
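
Since the course keys cannot be shared, running the notebook locally presumably means putting your own values into those two variables (this is an assumption on my part, not something the course staff confirmed in this thread). A minimal sketch of a helper that fails with a clear message when a variable is missing, instead of silently returning None the way os.getenv does. The name require_env and the placeholder value are hypothetical.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


# Demo with a plain dict standing in for the real environment (placeholder value).
demo_env = {"DLAI_API_KEY": "your-own-key"}
print(require_env("DLAI_API_KEY", demo_env))
```

In the notebook you would call it after load_dotenv(find_dotenv()) has run, so values from the .env file are already in os.environ.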