Confused on C2_W2_SoftMax Lab

Hi team,

I am quite confused about the objective of this lab. I would greatly appreciate it if someone could provide a detailed explanation for the following questions.

Q1: What are we trying to predict here? Is it four different classifications (0, 1, 2, 3)?

Q2: X_train is (2000 by 2), so why is the output 2 by 4?

Q3: The content below says "the outputs are not probabilities, but can range from large negative numbers to large positive numbers." Shouldn't -log(e^z / sum(e^z)) always be greater than 0, so that the outputs range from 0 to large positive numbers?

Q4: The content below says: "To select the most likely category, the softmax is not required. One can find the index of the largest output using [np.argmax()]." What does this mean? Does it mean we can just use np.argmax() to make predictions instead of using softmax? (numpy.argmax — NumPy v1.26 Manual)

Many thanks for your help! I look forward to your responses.

Christina

A1) The training dataset consists of 2000 examples, where each example has two coordinates, and the values are clustered around four different coordinate pairs:
centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
These clusters are numbered 0, 1, 2, and 3 by the ‘y_train’ values.
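Here is a minimal sketch of how such a dataset might be generated (this assumes sklearn's make_blobs; the lab's exact call and random seed may differ):

```python
# Sketch only: generate 2000 two-coordinate examples clustered around four centers.
from sklearn.datasets import make_blobs

centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
X_train, y_train = make_blobs(
    n_samples=2000, centers=centers, cluster_std=1.0, random_state=30
)
print(X_train.shape)  # (2000, 2) -- two coordinates per example
print(y_train[:10])   # integer cluster labels 0..3
```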

However, since we want the model to assign the best class to each example, we treat this as a classification task rather than directly trying to predict the integer values 0 through 3 (which would be a regression approach).

This is why the model is compiled with SparseCategoricalCrossentropy and either a softmax output or "from_logits = True". Either way, the loss treats the integer y_train values as one-hot targets over the four outputs (one for each class), as seen in the model definition here:
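A rough sketch of that kind of model definition (the layer sizes here are illustrative, not necessarily the lab's exact ones):

```python
# Linear (logit) output layer plus from_logits=True in the loss:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(15, activation='relu'),
    tf.keras.layers.Dense(4, activation='linear'),   # 4 outputs, one per class
])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(0.001),
)
model.fit(X_train, y_train, epochs=10)
```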

A2) The size (2 x 4) output gives the four output values for each of two selected examples.
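For example, predicting on just two examples produces one row per example and one column per class (using the sketch model above):

```python
# Two input examples in, a (2, 4) array of raw outputs (logits) out.
p = model.predict(X_train[:2])
print(p.shape)  # (2, 4)
print(p)        # values are logits, not probabilities
```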

A3) When the model uses a linear output, the values can range from -Inf to +Inf. The softmax function re-scales these values to between 0.0 and 1.0.
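A small numeric illustration with made-up logits, showing that the softmax values lie between 0 and 1 while their negative log (the loss term from Q3) is always non-negative:

```python
import numpy as np

logits = np.array([-3.2, 0.5, 4.1, -0.7])        # unbounded linear outputs
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
print(probs)           # each value lies in (0, 1)
print(probs.sum())     # ~1.0
print(-np.log(probs))  # >= 0, as Q3 notes, even though the logits themselves are unbounded
```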

I created a plot of the X_train data, color-coded by the y_train values:

Thank you, Tom, for your detailed responses. Much appreciated.

I could not see a response for Q4; could you please provide an answer to it as well?

Q4: The content below says: "To select the most likely category, the softmax is not required. One can find the index of the largest output using [np.argmax()]." I don't get what this means. Does it mean we can just use np.argmax() to make predictions instead of using softmax? (numpy.argmax — NumPy v1.26 Manual)

Since the softmax function is monotonic, it doesn't change which output has the highest value.

So if you just need to find the output that has the highest value, you don’t need softmax at all.
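A quick check with made-up logits, showing that argmax picks the same index whether or not softmax is applied:

```python
import numpy as np

logits = np.array([-3.2, 0.5, 4.1, -0.7])        # made-up raw model outputs
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax of those outputs

print(np.argmax(logits))  # 2
print(np.argmax(probs))   # 2 -- same winner, because softmax preserves the ordering
```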

Brilliant, thanks Tom.