How to decide: one-hot encoding with tf or with numpy?

Hi, in C5, W1, A3
the instructions for ex. 2 ask you to find the indices with the maximum probabilities with tf.math.argmax and convert them to one-hot vectors with tf.one_hot.
But ex. 3 asks you to find the indices with the maximum probabilities with np.argmax and then use to_categorical to convert them to one-hot vectors.
OK, I did as instructed for the sake of the assignment. But in real life, which should we choose?

In general it doesn’t really matter, except when the functions involved are part of the compute graph for a tensor for which you need gradients. TF computes gradients for you automatically using reverse-mode automatic differentiation (“autodiff”), and only TF operations carry that logic: if you insert a numpy function into the compute graph, training will fail with an error about gradients not being available.
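Outside of any gradient computation, the two routes from the assignment really are interchangeable. A minimal sketch (the probability values here are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# A small batch of (hypothetical) softmax outputs: 2 examples, 3 classes.
probs = tf.constant([[0.1, 0.7, 0.2],
                     [0.6, 0.3, 0.1]])

# Ex. 2 style: stay entirely in TF.
tf_onehot = tf.one_hot(tf.math.argmax(probs, axis=-1), depth=3)

# Ex. 3 style: drop to numpy, then use to_categorical.
np_onehot = tf.keras.utils.to_categorical(
    np.argmax(probs.numpy(), axis=-1), num_classes=3)

# Same one-hot vectors either way.
print(np.allclose(tf_onehot.numpy(), np_onehot))
```

Both produce the same one-hot matrix here because nothing downstream needs gradients through these ops (and argmax is not differentiable anyway).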

To see an example of this in action, go back to the TensorFlow Introduction assignment in DLS C2 W3. In the compute_total_loss function, you need to transpose the logits and labels using tf.transpose. Try rewriting the code using .T or np.transpose and watch what happens: you’ll pass the test case for compute_total_loss, but it will throw the error described above when you run the training in the later cell.
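You can reproduce the underlying mechanism in a few lines without rerunning that whole assignment. This is just a minimal illustration with GradientTape, not the assignment’s actual code: the moment a value passes through numpy, the tape loses the connection back to the variable and the gradient comes back as None (in a real training loop, the optimizer then complains that no gradients were provided).

```python
import numpy as np
import tensorflow as tf

x = tf.Variable([[1.0, 2.0], [3.0, 4.0]])

# All-TF version: the tape can trace y back to x.
with tf.GradientTape() as tape:
    y = tf.reduce_sum(tf.transpose(x) * 2.0)
g_tf = tape.gradient(y, x)   # a tensor of 2.0s

# Numpy in the middle: x.numpy() leaves the graph, so the
# tape has no path from y back to x.
with tf.GradientTape() as tape:
    y = tf.reduce_sum(np.transpose(x.numpy()) * 2.0)
g_np = tape.gradient(y, x)   # None

print(g_tf)
print(g_np)
```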

I’m sure you can find ways to reproduce that behavior here in the C5 assignments. But from an experimental standpoint you can just try the numpy functions and see what happens. If it “just works”, then you’re OK, meaning that the computations in question were not part of the compute graph for any gradients.