C3W1 assignment exercise 6: incompatible shapes for broadcasting: (8, 2), (40, 2)

I’m having issues with Exercise 6 (train_model) in the assignment.
I’m passing all the tests up to Exercise 6, but then I get the shape error below.
This is after implementing tl.Mean(axis=1). Has anyone figured this out?

LayerError: Exception passing through layer Serial (in pure_fn):
layer created in file […]/trax/supervised/training.py, line 1033
layer input shapes: (ShapeDtype{shape:(8, 13), dtype:int32}, ShapeDtype{shape:(40,), dtype:int32}, ShapeDtype{shape:(16,), dtype:int32})

File […]/trax/layers/combinators.py, line 88, in forward
outputs, s = layer.pure_fn(inputs, w, s, rng, use_cache=True)

LayerError: Exception passing through layer CrossEntropyLoss (in pure_fn):
layer created in file […]/, line 12
layer input shapes: (ShapeDtype{shape:(8, 2), dtype:float32}, ShapeDtype{shape:(40,), dtype:int32}, ShapeDtype{shape:(16,), dtype:int32})

File […]/trax/layers/combinators.py, line 88, in forward
outputs, s = layer.pure_fn(inputs, w, s, rng, use_cache=True)

LayerError: Exception passing through layer _CrossEntropy (in pure_fn):
layer created in file […]/, line 12
layer input shapes: (ShapeDtype{shape:(8, 2), dtype:float32}, ShapeDtype{shape:(40,), dtype:int32})

File […]/trax/layers/base.py, line 743, in forward
raw_output = self._forward_fn(inputs)

File […]/trax/layers/base.py, line 784, in _forward
return f(*xs)

File […]/trax/layers/metrics.py, line 581, in f
return -1.0 * jnp.sum(model_output * target_distribution, axis=-1)

File […]/site-packages/jax/core.py, line 506, in mul
def mul(self, other): return self.aval._mul(self, other)

File […]/_src/numpy/lax_numpy.py, line 5819, in deferring_binary_op
return binary_op(self, other)

File […]/_src/numpy/lax_numpy.py, line 431, in fn
return lax_fn(x1, x2) if x1.dtype != bool else bool_lax_fn(x1, x2)

File […]/_src/lax/lax.py, line 348, in mul
return mul_p.bind(x, y)

File […]/site-packages/jax/core.py, line 264, in bind
out = top_trace.process_primitive(self, tracers, params)

File […]/jax/interpreters/ad.py, line 274, in process_primitive
primal_out, tangent_out = jvp(primals_in, tangents_in, **params)

File […]/jax/interpreters/ad.py, line 449, in standard_jvp
val_out = primitive.bind(*primals, **params)

File […]/site-packages/jax/core.py, line 264, in bind
out = top_trace.process_primitive(self, tracers, params)

File […]/jax/interpreters/partial_eval.py, line 1059, in process_primitive
out_avals = primitive.abstract_eval(*avals, **params)

File […]/_src/lax/lax.py, line 2125, in standard_abstract_eval
return ShapedArray(shape_rule(*avals, **kwargs), dtype_rule(*avals, **kwargs),

File […]/_src/lax/lax.py, line 2221, in _broadcasting_shape_rule
raise TypeError(msg.format(name, ', '.join(map(str, map(tuple, shapes)))))

TypeError: mul got incompatible shapes for broadcasting: (8, 2), (40, 2).


Hi @sunnyjooey

Thanks for reaching out.

Can you share the code where you are getting this error?

Also, looking at the error: mul here is JAX’s element-wise multiplication, so the two operands have to be broadcast-compatible. (8, 2) against (40, 2) means the leading (batch) dimensions disagree: the model output comes from a batch of 8, while the one-hot targets come from a batch of 40. That usually points at the data generator producing inputs and targets of different lengths. As I haven’t seen the code yet, this is my first guess, and might not be true.
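
To see why, here is a minimal sketch that reproduces the failing line from your traceback (the shapes come from the trace; everything else is illustrative):

import jax
import jax.numpy as jnp

model_output = jnp.zeros((8, 2))                  # log-probs for a batch of 8
targets = jnp.zeros((40,), dtype=jnp.int32)       # labels for a batch of 40
target_distribution = jax.nn.one_hot(targets, 2)  # one-hot targets: (40, 2)

# The same expression as trax/layers/metrics.py in your trace; the
# element-wise multiply cannot broadcast (8, 2) against (40, 2):
loss = -1.0 * jnp.sum(model_output * target_distribution, axis=-1)
# TypeError: mul got incompatible shapes for broadcasting: (8, 2), (40, 2)

So transposing won’t fix it; the two batch sizes themselves have to agree.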

If the above doesn’t pin it down, please share your code so that I can help you better. :slight_smile:


This is my train_model:

# UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: train_model
def train_model(classifier, train_task, eval_task, n_steps, output_dir):
    '''
    Input: 
        classifier - the model you are building
        train_task - Training task
        eval_task - Evaluation task. Received as a list
        n_steps - the number of training steps
        output_dir - folder to save your files
    Output:
        trainer -  trax trainer
    '''
    rnd.seed(31) # Do NOT modify this random seed. This makes the notebook easier to replicate
    
    ### START CODE HERE (Replace instances of 'None' with your code) ###          
    training_loop = training.Loop( 
                                classifier, # The learning model
                                train_task, # The training task
                                eval_tasks=eval_task, # The evaluation task
                                output_dir=output_dir, # The output directory
                                random_seed=31 # Do not modify this random seed in order to ensure reproducibility and for grading purposes.
    ) 

    training_loop.run(n_steps = n_steps)
    ### END CODE HERE ###
    
    # Return the training_loop, since it has the model.
    return training_loop

But I think the issue might be in the classifier:

# UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: classifier

def classifier(vocab_size=9088, embedding_dim=256, output_dim=2, mode='train'):
    
    ### START CODE HERE (Replace instances of 'None' with your code) ###
        
    # create embedding layer
    embed_layer = tl.Embedding( 
        vocab_size=vocab_size, # Size of the vocabulary
        d_feature=embedding_dim # Embedding dimension
    ) 
    
    # Create a mean layer, to create an "average" word embedding
    mean_layer = tl.Mean(axis = 1)
    
    # Create a dense layer, one unit for each output
    dense_output_layer = tl.Dense(n_units = output_dim)
    
    # Create the log softmax layer (no parameters needed)
    log_softmax_layer = tl.LogSoftmax()
    
    # Use tl.Serial to combine all layers
    # and create the classifier
    # of type trax.layers.combinators.Serial
    model = tl.Serial( 
      embed_layer, # embedding layer
      mean_layer, # mean layer
      dense_output_layer, # dense output layer
      log_softmax_layer # log softmax layer
    ) 
    ### END CODE HERE ###
    
    # return the model of type trax.layers.combinators.Serial
    return model
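
One way to sanity-check the classifier on its own (assuming the usual trax init pattern; the dummy shapes below just mirror the traceback) is to run it on a fake batch:

import numpy as np
from trax import shapes

model = classifier()
fake_batch = np.zeros((8, 13), dtype=np.int32)  # (batch, seq_len), as in the trace
model.init(shapes.signature(fake_batch))
print(model(fake_batch).shape)                  # expect (8, 2): batch dim preserved

If that prints (8, 2), the model keeps the batch dimension, and the 8-vs-40 mismatch would be coming from the batches fed to CrossEntropyLoss rather than from these layers.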

Thank you!


I am also facing a similar issue with Exercise 6.


Any general advice on this issue? Mine is (mis)behaving similarly:

LayerError: Exception passing through layer Serial (in pure_fn):
  layer created in file [...]/trax/supervised/training.py, line 1033
  layer input shapes: (ShapeDtype{shape:(8, 13), dtype:int32}, ShapeDtype{shape:(16,), dtype:int32}, ShapeDtype{shape:(16,), dtype:int32})
...


TypeError: mul got incompatible shapes for broadcasting: (8, 2), (16, 2).

The 16 seems to be derived from the batch size. I assume there is a Transpose missing somewhere, but I don’t see it, and the unit tests above all passed (for what that’s worth).

I’m a pretty experienced debugger, but I don’t see an obvious way forward here :frowning:


I don’t remember exactly, but I think it was something in one of the earlier functions that wasn’t quite producing the right shape. Use reshape with -1 whenever possible (see the sketch below), so the batch dimension is inferred from the data instead of hard-coded. Also, if you know you have the right answer but a particular test isn’t passing, try submitting anyway (I first did this out of pure frustration); sometimes the submission grader will pass you even though the individual notebook test failed.
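
A minimal illustration of the reshape(-1) habit (the array name here is hypothetical):

import numpy as np

targets = np.array([0, 1, 1, 0, 1, 0, 0, 1])  # a batch of 8 labels
# Hard-coding the batch size breaks as soon as the batch changes:
# targets.reshape(40)                         # ValueError for any batch != 40
# Letting numpy infer the dimension keeps the shape tied to the data:
targets = targets.reshape(-1)                 # shape (8,) for any batch size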


@sunnyjooey, appreciate the reply. I ended up fixing a couple of things in that file. One problem was that I had ignored this comment

        # Using the same batch list, start from neg_index and increment i up to n_to_take

and started the loop iteration from 0 instead of continuing from neg_index; the general pattern is sketched below.
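
For anyone hitting the same thing, the mistake looked roughly like this (the variable names are hypothetical stand-ins, not the notebook’s actual code):

neg_examples = list(range(100))  # hypothetical pool of negative examples
neg_index, n_to_take = 8, 4      # hypothetical generator state

# Buggy: restarting at 0 reuses the same examples every batch
buggy = [neg_examples[i] for i in range(0, n_to_take)]

# Intended, per the comment: start from neg_index and walk forward
fixed = [neg_examples[i] for i in range(neg_index, neg_index + n_to_take)]

print(buggy, fixed)  # [0, 1, 2, 3] [8, 9, 10, 11]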

The other, more insidious problem was that I had copied this call from an ungraded cell

compute_accuracy(preds=tmp_pred, y=tmp_targets, y_weights=tmp_example_weights)

into a graded cell where we were also asked to compute accuracy. That meant the notebook-global variables tmp_pred, tmp_targets, and tmp_example_weights were referenced inside my graded function. I hope you never personally experience this, but the grader is really not happy with that situation.
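
The fix was simply to use the graded function’s own arguments instead of the globals. A sketch (the signature and the accuracy helper below are my stand-ins, not the notebook’s exact code):

import numpy as np

def compute_accuracy(preds, y, y_weights):
    # stand-in for the notebook's accuracy helper
    return np.average(np.argmax(preds, axis=-1) == y, weights=y_weights)

def graded_step(preds, y, y_weights):
    # Buggy: referencing globals copied from the ungraded cell, e.g.
    #   compute_accuracy(preds=tmp_pred, y=tmp_targets, y_weights=tmp_example_weights)
    # Fixed: use the arguments that were actually passed in
    return compute_accuracy(preds=preds, y=y, y_weights=y_weights)

print(graded_step(np.array([[0.2, 0.8], [0.9, 0.1]]), np.array([1, 0]), np.array([1.0, 1.0])))  # 1.0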

Appreciate the help so long after you completed the exercise. Hope this thread helps others, too.

No problem! I can’t access my code anymore, so I can’t take a look (should’ve saved it somewhere). But yes, hope this helps others!

Same issue here! Any updates?

Where is the admin? This issue seems persistent! Can we get any updates?!
