C1M2_Ungraded_Lab_1 : HTTPError: HTTP Error 403: Forbidden

When running the second cell:

from sklearn.datasets import fetch_20newsgroups

# Load the 20 Newsgroups dataset
newsgroups_train = fetch_20newsgroups(subset='train', shuffle=True, random_state=42)

I get the error: HTTPError: HTTP Error 403: Forbidden

Hi @xzhang14,

I have already reported this issue to the staff. Please follow the thread below for updates.

Until it is fixed, replace the failing cell with the following code:

from datasets import load_dataset
import pandas as pd

# 1. Load the dataset

dataset = load_dataset("SetFit/20_newsgroups")
df = pd.DataFrame(dataset['train'])

# 2. Rename 'label' to 'category'
df = df.rename(columns={'label': 'category'})

# 3. Create the target_names mapping manually from the data
# We sort by category ID (0-19) to make sure the index matches the ID

mapping = df[['category', 'label_text']].drop_duplicates().sort_values('category')
category_names_list = mapping['label_text'].tolist()

# 4. Re-create the Mock object

class MockNewsgroups:
    def __init__(self, target_names):
        self.target_names = target_names

newsgroups_train = MockNewsgroups(category_names_list)

# 5. Final check

print(f"Success! Dataset Size: {df.shape}")
print(f"Number of categories: {len(newsgroups_train.target_names)}")
print(f"Category 0 is: {newsgroups_train.target_names[0]}")
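One caveat: the mock above only carries target_names. If later cells in the lab also read newsgroups_train.data or newsgroups_train.target (I have not checked whether they do), a slightly fuller stand-in may help. Below is a minimal sketch, not the lab's official fix. The helper name build_newsgroups_like and the "text" column name are my assumptions; "category" and "label_text" come from the code above. It also assumes category IDs run from 0 upward with no gaps.

```python
from types import SimpleNamespace


def build_newsgroups_like(rows, text_key="text", label_key="category", name_key="label_text"):
    """Hypothetical helper: build an object with .data, .target and .target_names.

    `rows` is any iterable of dicts, e.g. df.to_dict("records").
    The "text" key is an assumed column name; adjust it to your DataFrame.
    """
    data, target, names_by_id = [], [], {}
    for row in rows:
        data.append(row[text_key])
        target.append(row[label_key])
        names_by_id[row[label_key]] = row[name_key]
    # Order names by category ID; target_names[i] matches category i
    # only if the IDs are 0..n-1 with no gaps (an assumption here).
    target_names = [names_by_id[i] for i in sorted(names_by_id)]
    return SimpleNamespace(data=data, target=target, target_names=target_names)


# Tiny hand-made rows standing in for df.to_dict("records")
sample_rows = [
    {"text": "The shuttle launch was delayed.", "category": 1, "label_text": "sci.space"},
    {"text": "Is there a god?", "category": 0, "label_text": "alt.atheism"},
    {"text": "Orbit insertion burn.", "category": 1, "label_text": "sci.space"},
]
sample = build_newsgroups_like(sample_rows)
print(sample.target_names)
```

With the real data you would call it as newsgroups_train = build_newsgroups_like(df.to_dict("records")) after the rename step, so the rest of the notebook can keep using the same variable name.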