Issues in Lab "No module named 'torch.distributed'"

Giorun · January 26, 2024, 11:14am

Hi. I keep running into the error “No module named ‘torch.distributed’”. I’ve tried all the suggested solutions, installing the required packages one by one as advised in similar cases, but nothing has resolved it. I’ve already spent the entire 2-hour lab session troubleshooting this problem. I’d really appreciate any help resolving this error. Thank you!

ModuleNotFoundError: No module named 'torch.distributed'

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
Cell In[25], line 2
1 from datasets import load_dataset
----> 2 from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
3 import torch
4 import time

File :1075, in _handle_fromlist(module, fromlist, import_, recursive)

File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1116, in _LazyModule.__getattr__(self, name)
1114 value = self._get_module(name)
1115 elif name in self._class_to_module.keys():
-> 1116 module = self._get_module(self._class_to_module[name])
1117 value = getattr(module, name)
1118 else:

File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1128, in _LazyModule._get_module(self, module_name)
1126 return importlib.import_module("." + module_name, self.__name__)
1127 except Exception as e:
-> 1128 raise RuntimeError(
1129 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
1130 f" traceback):\n{e}"
1131 ) from e

RuntimeError: Failed to import transformers.training_args because of the following error (look up to see its traceback):
No module named 'torch.distributed'

Kic · January 26, 2024, 11:35am

Hi @Giorun ,

Which course are you referring to?

Giorun · January 26, 2024, 11:48am

Hi Kic, the course is “Generative AI with Large Language Models”. The error is from Lab 2. Thank you.

chuaal · January 26, 2024, 2:12pm

@Giorun Just curious, have you executed the !pip install statements on the notebook?
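
One way to confirm the installs actually landed in the running kernel is to query the installed versions from a notebook cell. This is only a minimal sketch, not part of the lab notebook: the package names are assumed from the import lines in the traceback above, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names assumed from the notebook's import cell.
for pkg in ("torch", "transformers", "datasets"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```

If a package shows as installed here but the import still fails, the kernel may be holding on to an older or partially installed copy.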

Giorun · January 26, 2024, 3:21pm

Hi chuaal. Yes, all of them.

chuaal · January 26, 2024, 3:58pm

@Giorun That seems weird. I completed the course without the error. I am also unable to replicate the issue by rerunning the notebook in the environment. I moved your question to the course Q&A section to see if anyone can help.

alexturtleneckk · June 4, 2024, 5:57am

I am having the same problem

hiteshn97 · November 17, 2024, 6:44pm

I have found the solution to this issue…

Do as the prompt says: restart the kernel and try again.
The steps that worked for me:

1. Stopped the kernels (from the Stop icon below the Folder icon).
2. Deleted all the Python notebooks and locally downloaded files.
3. Re-copied the files and started a new kernel.
4. Execute
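
After restarting the kernel, a quick check like the one below can show whether the failing import now works before rerunning the whole notebook. This is a sketch under the assumption that the problem is a stale or partially installed package in the running kernel; `can_import` is a hypothetical helper, not something from the lab.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if `module_name` imports cleanly in the current kernel."""
    try:
        importlib.import_module(module_name)
        return True
    except Exception as err:  # covers ModuleNotFoundError and other import-time failures
        print(f"{module_name}: {err}")
        return False


# Module names taken from the traceback in the first post.
for name in ("torch", "torch.distributed", "transformers"):
    print(name, "OK" if can_import(name) else "FAILED")
```

If `torch` imports but `torch.distributed` does not, that points to an incomplete torch install rather than a problem in the notebook code itself.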
