How to install additional Python packages not included in the PyTorch image

Dear mentors, I am trying to run my own train.py in SageMaker using my personal AWS account.
My train.py needs both nltk and PyTorch.
In a SageMaker Studio notebook, I set up my PyTorch estimator as below:

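# Assumption: PyTorchEstimator is an alias, e.g. from sagemaker.pytorch import PyTorch as PyTorchEstimator
estimator = PyTorchEstimator(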
    entry_point='train_sagemaker.py',
    source_dir='src',
    role=role,
    instance_count=train_instance_count,
    instance_type=train_instance_type,
    py_version='py3', # dynamically retrieves the correct training image (Python 3)
    framework_version='1.6.0', # dynamically retrieves the correct training image (PyTorch)
    hyperparameters=hyperparameters,
    input_mode=input_mode
)

Then I used

estimator.fit(
    inputs=data_channels, 
    wait=False
)

to execute the training. However, I got a “module not found” error for nltk.
So I think the default PyTorch image does not have nltk installed.
Can you advise me how to install the extra packages needed by my train.py so that the training runs successfully on SageMaker? Thanks in advance!

Dear mentors, I have found the solution on this page: “Use PyTorch with the SageMaker Python SDK — sagemaker 2.59.5 documentation”.
The problem was that my requirements file in the “src” directory was named “requirements_train.txt” instead of “requirements.txt”, so the estimator did not recognize it.
After I renamed it to “requirements.txt”, the nltk package was installed successfully. Thanks anyway! :slight_smile:
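
In case it helps anyone else, here is a rough sketch of a check I could have run in the notebook before calling estimator.fit(). It only looks at the local source_dir and does not call SageMaker at all. The helper name check_requirements is my own invention, not part of the SageMaker SDK, and it assumes the file has to sit at the top level of source_dir under the exact name “requirements.txt”, which is what fixed it for me.

```python
from pathlib import Path


def check_requirements(source_dir):
    """Hypothetical helper (not part of the SageMaker SDK).

    Returns (ok, lookalikes):
      ok         -- True only if a file named exactly 'requirements.txt'
                    exists at the top level of source_dir.
      lookalikes -- sorted names of other top-level files starting with
                    'requirements' (e.g. 'requirements_train.txt') that
                    are probably misnamed.
    """
    src = Path(source_dir)
    if not src.is_dir():
        # Nothing to inspect if the directory itself is missing.
        return False, []

    ok = (src / "requirements.txt").is_file()
    lookalikes = sorted(
        p.name
        for p in src.glob("requirements*")
        if p.is_file() and p.name != "requirements.txt"
    )
    return ok, lookalikes
```

With my original layout, check_requirements('src') would have returned False together with ['requirements_train.txt'], which points straight at the naming problem. The requirements.txt itself only needs one package per line, in my case a single line saying nltk.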

Awesome, @thicc_fart! :slight_smile: Thanks for sharing!!