Output for QA ungraded assignment

In the QA ungraded assignment, the model’s final Dense layer has the config pasted below. The Dense layer has 2 units. When I use the model to compute the output, I get 2 tensors of shape (1, 26), which represent the logits for the start position and end position of the answer. Does this mean that each unit in the Dense layer is producing a (1, 26) tensor?

If so, I did not know that the units in a Dense layer can produce a vector instead of a scalar. How does one set up such a structure?

{'name': 'qa_outputs',
 'trainable': True,
 'dtype': 'float32',
 'units': 2,
 'activation': 'linear',
 'use_bias': True,
 'kernel_initializer': {'class_name': 'TruncatedNormal',
                        'config': {'mean': 0.0, 'stddev': 0.02, 'seed': None}},
 'bias_initializer': {'class_name': 'Zeros', 'config': {}},
 'kernel_regularizer': None,
 'bias_regularizer': None,
 'activity_regularizer': None,
 'kernel_constraint': None,
 'bias_constraint': None}

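The input to qa_outputs has shape (batch_size, seq_length, dim) and is part of the DistilBERT output. As you said, qa_outputs is a Dense layer with 2 units, so its output has shape (batch_size, seq_length, 2). A Dense layer acts only on the last axis, so each unit still produces one scalar per token position, not a vector. To separate the start and end logits, the model applies tf.split() along the last axis and then tf.squeeze() to drop the resulting size-1 axis, which gives two tensors of shape (batch_size, seq_length). Here is the source code.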
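To make the shapes concrete, here is a minimal sketch in NumPy rather than TensorFlow, so np.split and np.squeeze stand in for tf.split() and tf.squeeze(). The random arrays are placeholders for the real DistilBERT output and the trained weights, and the sizes are assumptions for illustration (seq_length = 26 to match the question, dim = 768).

```python
import numpy as np

# Assumed sizes for illustration; seq_length=26 matches the shapes in the question.
batch_size, seq_length, dim = 1, 26, 768

rng = np.random.default_rng(0)
# Placeholder for the DistilBERT hidden states, not real model output.
hidden = rng.normal(size=(batch_size, seq_length, dim))

# A Dense layer with 2 units holds a (dim, 2) kernel and a (2,) bias.
kernel = rng.normal(scale=0.02, size=(dim, 2))
bias = np.zeros(2)

# Dense acts on the last axis only: each token vector maps to 2 scalars.
logits = hidden @ kernel + bias                            # (1, 26, 2)

# Split along the last axis, then drop the size-1 axis.
start_logits, end_logits = np.split(logits, 2, axis=-1)    # each (1, 26, 1)
start_logits = np.squeeze(start_logits, axis=-1)           # (1, 26)
end_logits = np.squeeze(end_logits, axis=-1)               # (1, 26)

print(logits.shape, start_logits.shape, end_logits.shape)
# (1, 26, 2) (1, 26) (1, 26)
```

So unit 0 contributes the start logit and unit 1 the end logit at every one of the 26 token positions. The (1, 26) tensors come from collecting one unit's scalar across the sequence.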
