Course 2 - Week 3 assignment - Exercise 3: one_hot_matrix

For week 3, exercise 3, regarding the function

def one_hot_matrix(label, depth=6), here’s what I have:

{moderator edit - solution code removed}

I have spent an incredible amount of time on this and have looked at the TensorFlow documentation online, but it’s not entirely clear. I know my shape = tf.shape(one_hot) is not right; I tried to print out that shape to get the dimensions, but it didn’t work. The TensorFlow online documentation is not comprehensive. I got

AssertionError: Wrong output. Use tf.reshape as instructed

What is the point of reshaping it to the shape that it already has? That’s what the code you show does, right?

What you need is to make sure that the return value is a “rank 1” tensor (what you’d call a 1D array in numpy) with dimension [depth,]. Try using that latter expression as your target shape on the reshape.
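To see the shape behavior in isolation, here is a minimal sketch with toy values (this is not the graded solution, just an illustration of what tf.one_hot and tf.reshape do):

```python
import tensorflow as tf

depth = 4
label = tf.constant(1)

# tf.one_hot on a scalar label already yields a rank-1 tensor of shape (depth,)
one_hot = tf.one_hot(label, depth, axis=0)

# Reshaping to [depth,] leaves the values alone but guarantees the rank
reshaped = tf.reshape(one_hot, shape=[depth, ])
print(reshaped.shape, reshaped.numpy())  # (4,) [0. 1. 0. 0.]
```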

The other caveat here is that the unit test for this function is not strong enough: it will allow you to do things that will then fail in later cells of this notebook. A bug has been filed about that, but no fix is available yet.

If you look at the assertion that is actually failing, it uses numpy “allclose” to compare your result to the expected output. The problem is that allclose uses broadcasting, so your shapes don’t really have to be correct. They are comparing to a 1D array and all that’s necessary is that your array be “broadcastable” to that shape. So I’m worried that maybe your values are actually wrong as well, although I don’t see how that is possible from the code you show.
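Here’s a small illustration of the broadcasting issue with toy shapes (assuming the grader compares with np.allclose as described):

```python
import numpy as np

expected = np.array([0., 1., 0., 0.])   # rank 1, shape (4,)
result = np.array([[0., 1., 0., 0.]])   # rank 2, shape (1, 4)

# allclose broadcasts (1, 4) against (4,), so the comparison passes
# even though the rank of `result` is wrong
print(np.allclose(expected, result))  # True
```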

If the suggestion in my first reply doesn’t get you there, it would be worth actually showing the full output you get when you run the test cell for one_hot_matrix.

Thanks Paul! I got past it a few minutes after receiving your response! It turns out that [-1,] works for making it a 1-D vector. I’m on Exercise 4 now.
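For reference, a tiny sketch of what the -1 does (toy tensor, not assignment code): it asks tf.reshape to infer that dimension from the total element count.

```python
import tensorflow as tf

t = tf.zeros([2, 3])
flat = tf.reshape(t, [-1, ])  # -1 asks reshape to infer the size: 2 * 3 = 6
print(flat.shape)  # (6,)
```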

I’m on Exercise 6, in def compute_cost(logits, labels), here’s what I have:

loss = tf.keras.losses.categorical_crossentropy(labels, logits)
cost = tf.reduce_mean(loss)

Am I close? I notice that the cost function for softmax is different from the one for binary classification problems, but I assume that the softmax cost function is taken care of by the call

loss = tf.keras.losses.categorical_crossentropy(labels, logits)

AssertionError: Test does not match. Did you get the mean of your cost functions?

The problem is that you are passing in the “logits”, meaning that you have not done the softmax activation function on the outputs yet. So you need to tell the loss function that the inputs are logits, not activations, so that it can do the softmax for you. The way you do that is by using the from_logits argument. Prof Ng always recommends doing it that way. Here’s the docpage for the loss function. It works the same way when you are doing binary classifications and using sigmoid + binary cross entropy loss.
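A minimal sketch of that pattern with made-up logits and labels (rows as samples here just for illustration; the assignment’s orientation may differ, per the transpose note below):

```python
import tensorflow as tf

# Made-up data: 2 samples (rows), 3 classes (columns)
labels = tf.constant([[0., 1., 0.],
                      [0., 0., 1.]])
logits = tf.constant([[1.0, 2.0, 0.5],
                      [0.3, 0.1, 2.2]])

# from_logits=True tells the loss function to apply softmax internally
loss = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)
cost = tf.reduce_mean(loss)
print(cost.numpy())
```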

Also I hope you remembered to transpose both the labels and the logits. You don’t show that code.

Thanks Paul!

I don’t feel that we had enough coverage of TensorFlow. I would find it much easier to do this in Octave or Matlab. For example, I wanted to do a simple thing such as checking a matrix’s dimensions to decide whether I should transpose it before passing it in, but I didn’t see a straightforward way to do that with TensorFlow, nor with NumPy. I think there should be a video to cover these fundamental things.
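(For what it’s worth, both NumPy arrays and TF tensors do expose a .shape attribute that can drive that decision. A sketch, using a hypothetical helper name:)

```python
import tensorflow as tf

def ensure_first_dim(t, depth):
    # Hypothetical helper: transpose t if its first dimension is not `depth`.
    # Both NumPy arrays and TF tensors expose .shape for this kind of check.
    if t.shape[0] != depth:
        t = tf.transpose(t)
    return t

m = tf.zeros([5, 3])
print(ensure_first_dim(m, 3).shape)  # (3, 5)
```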

This is what I have:

predicted_probabilities = tf.keras.activations.softmax(logits, axis=-1)

loss = tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(predicted_probabilities))

cost = tf.reduce_mean(loss)

From your previous post, it seems that you suggested I use this function to compute the loss:

tf.keras.losses.CategoricalCrossentropy(from_logits=False, label_smoothing=0.0, axis=-1, reduction=losses_utils.ReductionV2.AUTO, name='categorical_crossentropy')

but I’m not sure how to pass logits to this function …

You can do the manual application of softmax, although I’m not sure your implementation is correct. There again, you need to understand what that function expects in terms of the order of the dimensions of its input.

But you are doing things the hard way. The point I was hoping you would pick up from the documentation is that saying from_logits = True is how you tell the cost function that the inputs are logits and that it needs to apply softmax for you. That’s way easier than doing it the way you have done. You also don’t need all those extra arguments about label smoothing, axis, reduction and name.
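A toy comparison of the two routes (made-up values; both should produce essentially the same loss when the shapes are right):

```python
import tensorflow as tf

labels = tf.constant([[0., 1., 0.]])    # made-up one-hot label
logits = tf.constant([[1.0, 2.0, 0.5]])

# The manual route: apply softmax yourself, then pass probabilities
probs = tf.keras.activations.softmax(logits, axis=-1)
loss_manual = tf.keras.losses.categorical_crossentropy(labels, probs)

# The recommended route: pass raw logits, let the loss apply softmax
loss_logits = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)

print(loss_manual.numpy(), loss_logits.numpy())  # nearly identical values
```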

Thanks! I got All Tests Passed just as you were sending this message, but I’m looking at your advice closely for the next course. Will courses 3, 4, and 5 involve TensorFlow heavily?

There is no programming in Course 3, but C4 and C5 use TensorFlow quite heavily.

I tried your way, setting from_logits = True and omitting the other parameters, but it didn’t work. It’s easy for you because you are familiar enough with TensorFlow! My way is not really the hard way; one or two extra lines of code are not a big deal, and they help clarify things and make the code easier to follow and remember.

Thanks for everything!

You must have made some other mistake in the from_logits = True case. It works fine for me. Just keep that in mind: that is the way we will always do things going forward. Having the loss function apply the activation also is more efficient and more numerically stable.

OK, I will try this separately after installing TensorFlow. Thanks.

Hello, Paul,

I don’t know how to initiate a question (I’m not lazy, I just can’t find out how), so I’m using this thread, which relates to my question.
Since I’m not allowed to show my code, I’m including just an error screenshot:

From what I understand, the label is the constant 1, which, with a one-hot encoding over 4 categories (i.e., depth = 4), should produce the float vector [0, 1, 0, 0], yet the test calls that the wrong output. (And, of course, I am using tf.one_hot.)
This seems straightforward enough: the call given in the instructions, tf.one_hot(labels, depth, axis=0) (from the tf.one_hot page in the TensorFlow Core v2.9.1 docs), is exactly the code that I understand is required (except that I replace labels with label). I’ve perused the documentation and find nothing else that seems to be required or relevant.
Of course, before bothering you with this issue, I experimented with various constants and depths, and that only seemed to confirm my impression that the code should be working correctly. It’s also possible that it’s a type problem, but I don’t see particularly why.
Any suggestions?
BTW, I went ahead with as much of the rest of the program to see what else I could do, and it looks like I’ll have more questions.
I’m addressing you since you were in this thread, and we’ve corresponded before, but I’ll take whatever advice I can get from whomever cares to offer it.

Thanks in advance.

Aaaand guess what. I just fixed it. Apparently, I was overspecifying the dimensions by requiring a column matrix in the reshape function. Just passing [depth] instead of [depth,1] worked for both tests. I don’t know how I should have known this, except for my meager familiarity with underspecified shape parameters that allow for broadcasting.
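A toy demonstration of why the (depth, 1) column shape fails an allclose-style comparison while (depth,) passes:

```python
import numpy as np

vec = np.array([0., 1., 0., 0.])  # shape (4,): what the test expects
col = vec.reshape(4, 1)           # shape (4, 1): an over-specified column

# Broadcasting turns the comparison into a (4, 4) grid of pairwise checks,
# so the same values in a column shape no longer "match" the row vector
print(np.allclose(vec, vec))  # True
print(np.allclose(vec, col))  # False
```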
But again, don’t be surprised if I have to pipe up again.
Thanks.

Hi, Marshall.

I’m glad to hear that you were able to find the solution under your own power.

It’s fine to tag onto an existing thread on a relevant topic. If you want to create a new thread, first select the appropriate category and subcategory and then you should see a “New Topic” button in the upper right corner as highlighted in the rectangle in this screenshot:

The category and subcategory are highlighted in the oval in the upper left. Note that Discourse has some “safety” rules such that they won’t let new users create threads until they’ve established a level of trust by behaving responsibly. So if you don’t see the “New Topic” button, it might be because you haven’t done enough activity on the forum yet. But you’ve made a number of posts by this point, so I’d be a little surprised if that were the issue. Have a look and see if you can see the “New Topic” link as shown above.

Regards,
Paul

Hi, Paul,
Thanks for the instructions on how to start a new topic. Also, I benefitted from your advice to Khiem_Viet_Ngo in this thread and was able to finish Course 2 with no further ado. However, I do have one suggestion I’d like to offer: perhaps it should be explicitly noted to students that this assignment requires TensorFlow 2.3.0. I have 2.8.0 installed in the virtual environment that I use for courses and projects. As a result, I was getting different results on Assignment #5, which, in retrospect, should not have been that surprising, given that everything up to that point worked fine. Of course, in hindsight, different TensorFlow versions would lead to anomalous behavior despite correct seeding of the random generator; the code, after all, has changed. When I took my work back to the browser, I got the expected results, which was quite a relief, because I didn’t have the slightest clue as to what was wrong.

Hi, Marshall.

Thanks for your thoughts on this. Yes, it is definitely the case that everything mutates very rapidly in this space, and in Python packages in general, and that can cause “versionitis” problems. By contrast, it is in the nature of online courses like this that they are published at a particular point in time and get “major” upgrades that would deal with things like changing package versions typically only every couple of years at most. The last such upgrade for DLS was in April of 2021, when they rewrote things using TF2 instead of TF1. I can try suggesting that they add something about this, but it sort of opens a can of worms: how far do they have to go in addressing what it takes to get things to actually work in every possible student’s environment? It would not surprise me if they elected just to drop the subject. In fact, I don’t remember that they ever say anything about running the notebooks in a different environment, do they? Please correct me if I just missed it …

There have been a number of threads on Discourse around the question of how to run things in your own environment. E.g., this one or this one or this one.