Output for QA ungraded assignment

luchungi · May 28, 2021, 1:19am

In the QA ungraded assignment, the model’s final Dense layer has the config pasted below. There are 2 units in the Dense layer. When I use the model to compute the output, I get 2 tensors of shape (1, 26), which represent the logits for the start position and end position of the answer. Does this mean that each unit in the Dense layer is producing a (1, 26) tensor?

If so, I did not know that the units in a Dense layer can produce a vector instead of a scalar. How does one set up such a structure?

{'name': 'qa_outputs',
 'trainable': True,
 'dtype': 'float32',
 'units': 2,
 'activation': 'linear',
 'use_bias': True,
 'kernel_initializer': {'class_name': 'TruncatedNormal',
                        'config': {'mean': 0.0, 'stddev': 0.02, 'seed': None}},
 'bias_initializer': {'class_name': 'Zeros', 'config': {}},
 'kernel_regularizer': None,
 'bias_regularizer': None,
 'activity_regularizer': None,
 'kernel_constraint': None,
 'bias_constraint': None}

edwardyu · May 29, 2021, 1:43am

The input to qa_outputs has shape (batch_size, seq_length, dim) and is part of the DistilBERT outputs. As you said, qa_outputs is a Dense layer with 2 units, so its output has shape (batch_size, seq_length, 2). Each unit still produces a scalar; the layer is simply applied along the last axis, once per sequence position. To separate the start and end positions, the model applies tf.split() and then tf.squeeze() to drop the trailing dimension of size 1. Here is the source code.
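To make the shapes concrete, here is a minimal sketch of the same idea in NumPy rather than TensorFlow. It is not the library's actual code: the random weights, the variable names, and dim = 768 are assumptions for illustration, and np.split / np.squeeze stand in for tf.split() / tf.squeeze(). The sequence length of 26 is taken from the question.

```python
import numpy as np

# Shapes: 26 comes from the question; dim = 768 is an assumed hidden size.
batch_size, seq_length, dim = 1, 26, 768

rng = np.random.default_rng(0)
# Stand-in for the DistilBERT hidden states fed into qa_outputs.
hidden_states = rng.normal(size=(batch_size, seq_length, dim))

# A Dense layer with 2 units holds a (dim, 2) kernel and a (2,) bias.
kernel = rng.normal(scale=0.02, size=(dim, 2))
bias = np.zeros(2)

# The layer acts on the last axis only, so every position in the sequence
# gets its own pair of scalars: (1, 26, 768) @ (768, 2) -> (1, 26, 2).
logits = hidden_states @ kernel + bias

# Split the last axis into two (1, 26, 1) pieces, then drop that axis.
start_logits, end_logits = np.split(logits, 2, axis=-1)
start_logits = np.squeeze(start_logits, axis=-1)  # (1, 26)
end_logits = np.squeeze(end_logits, axis=-1)      # (1, 26)

print(logits.shape, start_logits.shape, end_logits.shape)
```

So unit 0 contributes one scalar per token (the start logit) and unit 1 contributes the other (the end logit); the (1, 26) shape comes from the 26 positions, not from a unit emitting a vector.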
