Issues in Lab: "No module named 'torch.distributed'"

Hi. I keep hitting the error "No module named 'torch.distributed'". I've tried all the suggested solutions, installing the required packages one by one as advised in similar cases, but the error persists. I've already spent the full 2-hour lab session troubleshooting it. I'd really appreciate any help resolving this error. Thank you!

ModuleNotFoundError: No module named 'torch.distributed'

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
Cell In[25], line 2
1 from datasets import load_dataset
----> 2 from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
3 import torch
4 import time

File <frozen importlib._bootstrap>:1075, in _handle_fromlist(module, fromlist, import_, recursive)

File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1116, in _LazyModule.__getattr__(self, name)
1114 value = self._get_module(name)
1115 elif name in self._class_to_module.keys():
-> 1116 module = self._get_module(self._class_to_module[name])
1117 value = getattr(module, name)
1118 else:

File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1128, in _LazyModule._get_module(self, module_name)
1126 return importlib.import_module("." + module_name, self.__name__)
1127 except Exception as e:
-> 1128 raise RuntimeError(
1129 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
1130 f" traceback):\n{e}"
1131 ) from e

RuntimeError: Failed to import transformers.training_args because of the following error (look up to see its traceback):
No module named 'torch.distributed'

Hi @Giorun ,

Which course are you referring to?

Hi Kic, the course is “Generative AI with Large Language Models”. The error is from Lab 2. Thank you.

@Giorun Just curious, have you run the !pip install statements in the notebook?

Hi chuall. Yes, all of them.

@Giorun That seems weird. I completed the course without hitting this error, and I can't reproduce the issue by rerunning the notebook in the lab environment. I've moved your question to the course Q&A section to see if anyone can help.

I am having the same problem.

I have found the solution to this issue…

Do as the prompt says: restart the kernel and try again.
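
After the restart, a quick check along these lines can confirm whether the kernel actually sees a complete PyTorch install before you rerun the imports. This is only a sketch and not part of the lab notebook; `check_module` is a helper name made up for illustration, and it relies on nothing beyond the standard library's `importlib`.

```python
import importlib
import sys


def check_module(name):
    """Try to import `name` in the current kernel.

    Returns (True, file_path) on success, or (False, error_text) on failure.
    `check_module` is a hypothetical helper, not something the lab provides.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # ModuleNotFoundError, or a broken partial install
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", None) or "(built-in)"


if __name__ == "__main__":
    # Shows which Python interpreter the kernel is really running.
    print("Interpreter:", sys.executable)
    for name in ("torch", "torch.distributed", "transformers"):
        ok, detail = check_module(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} -> {detail}")
```

If `torch` reports OK but `torch.distributed` reports FAILED, that would suggest the kernel is picking up an incomplete or stale install, which is consistent with a fresh kernel and fresh files fixing it.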
My steps that worked →

  1. Stopped the kernels (using the Stop icon below the Folder icon).
  2. Deleted all the Python notebooks and locally downloaded files.
  3. Re-copied the files and started a new kernel.
  4. Execute