Week 3 programming assignment: compute_total_loss question

Sorry, I have a question which I did not find answered on this board.
I am transposing labels and logits
I am feeding them both to CategoricalCrossentropy.
Then I am taking tf.reduce_sum.
My observation: whether I do tf.reduce_sum or tf.reduce_mean or no tf.reduce* at all, I get the same result, which is half of the expected value. It is as if there were just one number, so whether you sum it, average it, or do nothing, you get the same thing…
Even funnier, if I “cheat” and do:
total_loss = 2 * tf.reduce_sum(total_loss)

I pass the grader and get 100%. I am happy to pass, but I still want to get to the bottom of this puzzle! Thank you.
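For context, here is a minimal sketch of what I mean, with made-up toy tensors rather than the assignment code: two samples, three classes, stored with one column per sample as in the notebook.

import tensorflow as tf

# Toy data, NOT the assignment values: shape (3 classes, 2 samples),
# transposed before the loss call as in the notebook.
labels = tf.constant([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
logits = tf.constant([[2.0, 0.5], [1.0, 3.0], [0.1, 0.2]])

# The class-based loss already reduces to a scalar by default, so
# reduce_sum / reduce_mean / nothing all give the same number.
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
loss = loss_fn(tf.transpose(labels), tf.transpose(logits))
print(loss.shape)                    # () -- already a scalar
print(tf.reduce_sum(loss).numpy())   # same value
print(tf.reduce_mean(loss).numpy())  # same value again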

Did you also use from_logits = True?

Yes, I did use from_logits = True.


I am happy to provide my code, but there is a rule against that.

To close the loop here on the public thread, the error was using

tf.keras.losses.CategoricalCrossentropy

instead of what was clearly given in the instructions:

tf.keras.losses.categorical_crossentropy

Of course one might innocently suppose that those two are equivalent, and I totally admit that the TF documentation is not at all clear about the distinction, but it turns out that the first one (a loss class) returns the scalar mean of the loss values by default. The second (a plain function) returns the loss values as a vector with one entry per sample.

Given that there are two samples here, that explains why the “hack” of multiplying by 2 works. The frankly shocking thing is that the hack solution also passes the grader, meaning that the grader test case also has exactly two samples. I’ve filed a bug about this deficiency in the test cases.
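To make the difference concrete, here is a small illustration with made-up tensors (two samples, three classes); the numbers are arbitrary and are not the assignment values.

import tensorflow as tf

# Toy tensors already transposed to (2 samples, 3 classes).
labels_T = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
logits_T = tf.constant([[2.0, 1.0, 0.1], [0.5, 3.0, 0.2]])

# Function form: one loss value per sample, then sum them yourself.
per_sample = tf.keras.losses.categorical_crossentropy(labels_T, logits_T, from_logits=True)
total = tf.reduce_sum(per_sample)          # shape (2,) -> scalar sum

# Class form with the default reduction: the mean over the batch.
mean_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(labels_T, logits_T)

# With exactly 2 samples, mean * 2 == sum, which is why the "hack" passed.
print(total.numpy(), (2 * mean_loss).numpy())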

One has to wonder what the Keras authors were thinking when designing this API.

Exactly. And the documentation is not so easy to understand either, especially given that this is our very first exposure to TF. I had to read it a couple of times until I figured out that it is the reduction parameter there that does the mean by default. We could try using that API with reduction = None and maybe then it’s equivalent.

But they did specifically tell us which function to use in the instructions :nerd_face:

Ok, you can get it to work with the function tf.keras.losses.CategoricalCrossentropy by using the reduction parameter:

reduction = tf.keras.losses.Reduction.NONE

gets you behavior equivalent to the intended function: a vector of losses, one per sample.

reduction = tf.keras.losses.Reduction.SUM

gets you the reduce_sum of the losses, so you don’t need a separate call to get that.
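Putting that together, a quick sketch with the same sort of toy tensors as above (again, made-up values, and assuming a TF 2.x setup where tf.keras.losses.Reduction is available):

import tensorflow as tf

labels_T = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # toy (2 samples, 3 classes)
logits_T = tf.constant([[2.0, 1.0, 0.1], [0.5, 3.0, 0.2]])

# Reduction.NONE: per-sample losses, like tf.keras.losses.categorical_crossentropy.
cce_none = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
print(cce_none(labels_T, logits_T).numpy())   # vector of 2 losses

# Reduction.SUM: already the summed loss, so no extra tf.reduce_sum needed.
cce_sum = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.SUM)
print(cce_sum(labels_T, logits_T).numpy())    # single summed value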

New learnings about the deep waters of TF! :nerd_face:

Thank you, Paul, both for your help and for your persistence in getting to the very bottom of this issue and more. tf.keras.losses.categorical_crossentropy worked perfectly for me after you pointed out my basic mistake. :slight_smile:
Dennis

By the way, I am doing the CNN class now and expect to finish the DL specialization within 2 weeks. I want to explore the next course/specialization to enroll in. Any suggestions? Is there a discussion board for this kind of question?

Actually, I am good with math: I have two undergrad degrees, in math and physics, and a Ph.D. in electrical engineering from Berkeley, not far from Palo Alto. I hope to take courses at a more advanced level, if offered. I am also taking some Udemy courses, but they are more oriented towards hands-on tool usage and are shallower; I like OpenAI more even though it costs more $$$.
Thank you

Hi, Dennis.

Thanks for the description of your background. I was also a math major “back in the day”. Did undergrad at Stanford and then went to Univ of Wisconsin Madison with the intent of getting a PhD, but then decided that the academic job market was a disaster at that point. I took my “consolation prize” masters degree and came back to California. Turns out there was a very nice job market for people with a technical background who liked to program computers! :nerd_face:

There are lots of choices for what to take next. You can also use the “general discussion” forums to get more input on this, e.g. AI Discussions. My approach would be to say that it depends on what your goals are. If you’re just curious and exploring for interest, the GANs Specialization is very interesting. If you’re interested in NLP, then DLS C5 is a good start, but you can then try the NLP specialization. I’ve only taken the first course of that one, which is more background. The real “meat” of NLP is the material in C3 and C4 which covers the same topics as DLS C5, but (I assume) goes into more detail. But that’s all with the proviso that I have not actually taken those courses.

If you have some specific area in which you want to apply ML/DL/AI, then take a look at the catalog here and see if there are any courses more specific to your proposed type of application.

If you want to get a flavor for the latest and greatest LLM action that’s totally “happening” at the moment, have a look at some of the new Short Courses. From what I’ve seen, those are just about how to use LLMs by building apps on top of them. The DLS C5 and NLP are how you actually build an LLM, although those courses are just the beginning. Actually doing what the real “players” are doing right now (OpenAI, Google, Meta, …) is like the difference between taking a first year graduate course in some area of physics, versus doing leading edge research that could get you a Nobel Prize someday. :grinning:

Hi Paul
I am coming back from China in a year, and rather than doing the same old stuff (analog design) I want to do ML/AI as real work. I am not an expert about specializations etc., but I hope to understand more through courses and later reading. Basically, you are saying GANs vs. the NLP specialization; each of them has 4 courses, so which one first? Maybe if I do both eventually, it would not really matter? :slight_smile: I also hope to get into some project work; maybe first get more familiarity with the TF tools? My TF skills are worse than my Python skills, which also still need to be beefed up a lot.
Thanks!

GANs and NLP are completely unrelated and not dependent on each other, so you could do them in either order.

Yes, if your goal is to make a career change into ML/AI, then getting more TF expertise would definitely be a plus. There is a whole family of TF specializations here, but I have not personally taken any of them. Of course realize that which “platform” you use is analogous to the choice of programming language, so the more you have on your resume, the better. TF has historically been more dominant in commercial applications, whereas PyTorch is more dominant in research, but I’ve heard it said that PyTorch is getting more popular over time. I have never actually had a paying job in this space, so my input is of limited value on this point.

One thing that I can say with concrete knowledge is that the GANs specialization is the only one here on DeepLearning.AI that uses PyTorch instead of TF. So one side benefit you get from taking GANs is a nice intro to PyTorch.