def generate_data_for_breed(breed, features, n_samples, params):
    """
    Generate synthetic data for a specific breed of dogs based on given features and parameters.

    Parameters:
        - breed (str): The breed of the dog for which data is generated.
        - features (list[str]): List of features to generate data for (e.g., "height", "weight", "bark_days", "ear_head_ratio").
        - n_samples (int): Number of samples to generate for each feature.
        - params (dict): Dictionary containing parameters for each breed and its features.

    Returns:
        - df (pandas.DataFrame): A DataFrame containing the generated synthetic data.
          The DataFrame will have columns for each feature and an additional column for the breed.
    """
    df = pd.DataFrame()

    for feature in features:
        match feature:
            case "height" | "weight":
                df[feature] = gaussian_generator(params[breed][feature].mu, params[breed][feature].sigma, n_samples)
            case "bark_days":
                df[feature] = binomial_generator(params[breed][feature].n, params[breed][feature].p, n_samples)
            case "ear_head_ratio":
                df[feature] = uniform_generator(params[breed][feature].a, params[breed][feature].b, n_samples)

    df["breed"] = breed

    return df

# Generate data for each breed
df_0 = generate_data_for_breed(breed=0, features=FEATURES, n_samples=1200, params=breed_params)
df_1 = generate_data_for_breed(breed=1, features=FEATURES, n_samples=1350, params=breed_params)
df_2 = generate_data_for_breed(breed=2, features=FEATURES, n_samples=900, params=breed_params)

# Concatenate all breeds into a single dataframe
df_all_breeds = pd.concat([df_0, df_1, df_2]).reset_index(drop=True)

# Shuffle the data
df_all_breeds = df_all_breeds.sample(frac=1)

# Print the dataframe
df_all_breeds.head(10)

The code above, which was already provided, gives me:
ValueError: Length of values (1) does not match length of index (1200).
I am not sure what is going on and would appreciate some help!
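
For reference, here is a minimal sketch that reproduces a message of the same shape. It rests on an assumption that the post does not confirm: that one of the generator functions returns a length-1 array instead of `n_samples` values. The names `good_generator` and `bad_generator` are hypothetical stand-ins, not the actual `gaussian_generator`, `binomial_generator`, or `uniform_generator`.

```python
import numpy as np
import pandas as pd


def good_generator(n_samples):
    # Returns n_samples values, which is what the provided code expects.
    return np.random.normal(0.0, 1.0, n_samples)


def bad_generator(n_samples):
    # Hypothetical bug: n_samples is ignored, so a length-1 array comes back.
    return np.random.normal(0.0, 1.0, 1)


df = pd.DataFrame()
df["height"] = good_generator(1200)  # first column fixes the index at 1200 rows

try:
    # A length-1 array cannot be aligned with a 1200-row index.
    df["weight"] = bad_generator(1200)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

If this assumption holds, the thing to check is whether each generator passes `n_samples` through as the size of the sample it draws. Note that a generator returning a plain scalar would not raise this error, because pandas would broadcast the scalar to every row.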