Hi. I keep running into "No module named 'torch.distributed'". I've tried all the suggested fixes I could find, installing the required packages one by one as advised in similar threads, but I still can't resolve the error. I've already spent the entire 2-hour lab session troubleshooting this, so I'd really appreciate any help. Thank you!
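For reference, a quick check that should show whether the torch build in this environment actually contains torch.distributed (just a diagnostic sketch; it assumes the notebook kernel is the /opt/conda Python that appears in the traceback):

```python
import importlib.util

import torch

print(torch.__version__)  # which torch the kernel actually sees
print(torch.__file__)     # where it is installed (should be under /opt/conda)

# find_spec() returns None when the submodule cannot be located,
# which would match the "No module named 'torch.distributed'" error below.
print(importlib.util.find_spec("torch.distributed"))
```

The full traceback from the notebook is below: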
```
ModuleNotFoundError: No module named 'torch.distributed'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[25], line 2
      1 from datasets import load_dataset
----> 2 from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
      3 import torch
      4 import time

File <frozen importlib._bootstrap>:1075, in _handle_fromlist(module, fromlist, import_, recursive)

File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1116, in _LazyModule.__getattr__(self, name)
   1114     value = self._get_module(name)
   1115 elif name in self._class_to_module.keys():
-> 1116     module = self._get_module(self._class_to_module[name])
   1117     value = getattr(module, name)
   1118 else:

File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1128, in _LazyModule._get_module(self, module_name)
   1126     return importlib.import_module("." + module_name, self.__name__)
   1127 except Exception as e:
-> 1128     raise RuntimeError(
   1129         f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1130         f" traceback):\n{e}"
   1131     ) from e

RuntimeError: Failed to import transformers.training_args because of the following error (look up to see its traceback):
No module named 'torch.distributed'
```
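For what it's worth, the traceback suggests the failure is triggered by transformers lazily importing its training utilities, not by my own `import torch` line, so this single import should reproduce the same RuntimeError in the same environment (a minimal sketch based on the traceback above):

```python
# Importing TrainingArguments makes transformers load transformers.training_args,
# which in turn needs torch.distributed, so this one line should hit the same error.
from transformers import TrainingArguments
```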