I am struggling a lot with exercise 2: I cannot get rid of the following error, even after casting all intermediate results to tf.float64 where possible. I noticed that scores.dtype is float64 when I print it. Is that right?
This error is thrown by the line triplet_loss = tf.math.reduce_sum(triplet_loss1, triplet_loss2)
InvalidArgumentError: Value for attr 'Tidx' of double is not in the list of allowed values: int32, int64; NodeDef: {{node Sum}}; Op<name=Sum; signature=input:T, reduction_indices:Tidx -> output:T; attr=keep_dims:bool,default=false; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_QINT16, DT_QUINT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]; attr=Tidx:type,default=DT_INT32,allowed=[DT_INT32, DT_INT64]> [Op:Sum] name:
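For reference, the signature in the message (reduction_indices:Tidx) seems to point at the axis argument of Sum, and a toy call that puts a float tensor in that slot reproduces the same error (the tensor values here are made up, just to show the mechanism):

```python
import tensorflow as tf

a = tf.constant([1.0, 2.0], dtype=tf.float64)
b = tf.constant([3.0, 4.0], dtype=tf.float64)

# The second positional argument of tf.math.reduce_sum is the axis
# (reduction_indices), which must be int32/int64, so a float64 tensor
# here raises the same InvalidArgumentError about attr 'Tidx'.
tf.math.reduce_sum(a, b)
```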
Do you have any idea what I might have done wrong in my function?
Thanks!
scores.dtype is mentioned for mask_exclude_positives, so that is where the correction is required. The instructions below should help you make the correction.
To create the mask, you need to check whether a cell is on the diagonal, by computing tf.eye(batch_size) == 1, or whether an off-diagonal cell is greater than the diagonal entry of its row, with negative_zero_on_duplicate > tf.expand_dims(positive, 1).
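In code, that combined condition might look roughly like the following sketch (names are the ones used in the notebook; batch_size is assumed to still be an integer here, since tf.eye expects an integer size, and the cast dtype is a choice rather than the graded answer):

```python
# Sketch: True where the cell is on the diagonal, or where an off-diagonal
# similarity is greater than the row's diagonal (positive) similarity,
# then cast so the mask can be combined arithmetically with the scores.
mask_exclude_positives = tf.cast(
    (tf.eye(batch_size) == 1)
    | (negative_zero_on_duplicate > tf.expand_dims(positive, 1)),
    scores.dtype,
)
```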
The instruction for closest_negative:
Remember that positive already has the diagonal values. Now you can use tf.math.reduce_max, row by row (axis=1), to select the maximum, which is closest_negative.
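As a sketch of that step: negative_without_positive below is a name I am introducing only for illustration, and the 2.0 offset assumes the scores are cosine similarities bounded by [-1, 1], so subtracting twice the mask pushes the excluded cells below any real similarity before the row-wise maximum is taken.

```python
# Illustration only: push the diagonal / duplicate-positive cells out of
# contention, then take the largest remaining similarity in each row.
negative_without_positive = negative_zero_on_duplicate - 2.0 * mask_exclude_positives
closest_negative = tf.math.reduce_max(negative_without_positive, axis=1)
```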
In my notebook the hint for closest_negative mentions axis=None instead of 1; is that wrong?
I created the mask as tf.cast((tf.eye(batch_size) == 1) | (negative_zero_on_duplicate > tf.expand_dims(positive, 1)), scores.dtype): should I cast the mask to tf.float64 directly rather than using scores.dtype?
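(For what it's worth, when scores.dtype is float64 the two casts should give identical tensors, so I suspect that choice alone cannot change the error; here is a quick check of what I mean:)

```python
mask_bool = (tf.eye(batch_size) == 1) | (negative_zero_on_duplicate > tf.expand_dims(positive, 1))
# If scores.dtype is tf.float64, these two casts should be element-wise identical.
same = tf.reduce_all(tf.cast(mask_bool, scores.dtype) == tf.cast(mask_bool, tf.float64))
```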
Or, since scores.dtype is also used when we define the new batch_size as tf.cast(tf.shape(v1)[0], scores.dtype), is that where I should make the correction?
Yep, the scores are indeed computed that way, using tf.linalg.matmul as recommended in the instructions. I wonder if I made a mistake when computing the new batch_size (I use the number of rows of v1), but if I computed it from scores I should get the same value, because scores is a matrix of shape (batch_size, batch_size), right?
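For what it's worth, a quick check with toy shapes (and assuming the matmul is something like tf.linalg.matmul(v1, v2, transpose_b=True)) confirms that both ways of getting the batch size agree, since scores is square:

```python
import tensorflow as tf

v1 = tf.random.normal((8, 128), dtype=tf.float64)  # made-up shapes, just for the check
v2 = tf.random.normal((8, 128), dtype=tf.float64)
scores = tf.linalg.matmul(v1, v2, transpose_b=True)  # shape (batch_size, batch_size)

# Both print 8: the number of rows of v1 equals the number of rows of scores.
print(tf.shape(v1)[0].numpy(), tf.shape(scores)[0].numpy())
```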