Output layer `category to unit mapping` determination

p_s_rathore · November 4, 2023, 10:54am

As in ML Specialisation -> Course 2 -> Week 2 Lab Assignment, I’ve created a neural network to read handwritten digits from images. It has 3 layers:
L1 = 25 units
L2 = 15 units
L3 = 10 units, each representing a category (a digit in this case)
How do I know which unit in the final layer gives the probability for which handwritten digit? For example, how do I know that unit 1 doesn’t correspond to the probability for digit 7?

As per ChatGPT:
“To know which unit represents which digit, you can refer to the order of classes when the neural network was trained. Often, the order is assigned based on numerical order, as in the example above. However, it’s crucial to verify this order and ensure it matches the expected class labels in your specific application. If the order is not as expected, you may need to adjust it accordingly.”

Is there a more definitive rule or way of determining this category to unit mapping?

rmwkwok · November 4, 2023, 11:09am

Hey @p_s_rathore,

It’s determined by how you label them. If you label digit seven as 1, then unit 1 represents digit seven. The algorithm only looks at the labels (which have to start from 0), and you are responsible for how labels are assigned to digits.
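To make that concrete, here is a minimal sketch of reading a prediction back through a labeling scheme. The `label_to_digit` table and the probability values are made up for illustration and are not taken from the lab; in the lab itself the labels simply equal the digits.

```python
import numpy as np

# Hypothetical labeling scheme chosen by us (not from the lab):
# digit 7 is deliberately given label 1, and digit 1 is given label 7.
label_to_digit = {0: 0, 1: 7, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 1, 8: 8, 9: 9}

# Made-up output of the 10-unit softmax layer for one image.
probs = np.array([0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.01])

unit = int(np.argmax(probs))             # index of the most probable output unit
predicted_digit = label_to_digit[unit]   # translate the unit index back to a digit

print(unit, predicted_digit)
```

With this table, unit 1 means "digit seven" only because the training labels said so. If the labels equal the digits, the lookup table is just the identity.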

Raymond

TMosh · November 4, 2023, 4:44pm

I recommend you not use ChatGPT for programming advice, or for help in working on the assignments.

It’s very likely to contain incorrect information.

trandromeda · November 4, 2023, 10:21pm

I had this exact same question and thought experiment in my head today. To take it a step further, recall in lecture:

a_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3} + e^{z_4}} = P(y = 1 \mid \vec{x})

Why is a_1 the probability of y = 1 (the label 1) in the first place? Is there something about the formula that makes it so? Digging into it more, it seems this is more a design convention, but please correct me if I’m wrong. And I think it’s also by how we do back propagation and train the model, because we need to calculate the loss and to calculate loss, we need to have some “yardstick” each unit measures its outputs against (like when y = 1, or y = 2, and so on). I have not gotten to the back propagation lectures yet, but I hope this gets covered there and that it’ll all make more sense.
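A small NumPy sketch may help separate the two pieces here: the softmax formula, which is symmetric in its units, and the loss, which is where the label picks a unit. The logits are made up, the helper names `softmax` and `loss` are my own, and the code uses 0-based indices where the lecture formula uses 1-based ones.

```python
import numpy as np

def softmax(z):
    # Subtracting the max is for numerical stability; it does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(a, y):
    # Cross-entropy for one example with integer label y:
    # only the activation of unit y enters the loss.
    return -np.log(a[y])

z = np.array([2.0, 1.0, 0.5, -1.0])   # made-up logits for 4 output units
a = softmax(z)

# Nothing in softmax ties a unit to a class: reordering the logits
# just reorders the activations in the same way.
a_reversed = softmax(z[::-1])

# The label decides which unit is "measured": the loss is small when
# the unit indexed by y has a high activation, large otherwise.
loss_if_label_0 = loss(a, 0)
loss_if_label_3 = loss(a, 3)
```

Under these assumptions, training on examples labeled `y` pushes `a[y]` up, which is how unit `y` comes to behave like the probability of label `y`.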

rmwkwok · November 5, 2023, 12:04am

Yes, @trandromeda, I think it is right to say that this is a design convention, and it is the design of how to calculate loss, as you said.

Therefore, speaking of implementation design, if we trace the source code of tf.keras.losses.SparseCategoricalCrossentropy, we will get to this line:

cost = math_ops.negative(array_ops.gather(log_probs, labels, batch_dims=1))

It does the “y-dependent selection of losses” with array_ops.gather, which treats labels as array indices into log_probs.
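As a rough NumPy analogue of that line (a sketch for intuition, not the TensorFlow implementation), `np.take_along_axis` can play the role of the batched gather. The probabilities and labels below are made up.

```python
import numpy as np

# Made-up softmax outputs for a batch of 2 examples and 3 classes.
log_probs = np.log(np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]))

# Integer labels, one per example, starting from 0.
labels = np.array([0, 2])

# For each row i, pick log_probs[i, labels[i]]: the label is used as a column index.
picked = np.take_along_axis(log_probs, labels[:, None], axis=1)[:, 0]

# Negate to get the per-example cost, mirroring math_ops.negative in the quoted line.
cost = -picked
```

Here the first example contributes `-log(0.7)` and the second `-log(0.8)`, so the label value alone determines which output unit's activation is scored.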

See if you can follow the above and jump to the answer of your question.

Cheers,
Raymond
