I would like to know how softmax was able to place digits in categories for the digit recognition model. We only gave the input (pixels) and the target (actual value), but did not specify that digit 0 should be in category (index) 0 or that digit 9 should be in category 9 for the predictions.

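The mapping of the images to their labels is handled when the data set is created, via the values assigned to the ‘y’ outputs. An image of a 2 is stored with y = 2, and during training the loss is computed from the predicted probability at index y, so training pushes output 2 to be the largest for images of a 2.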
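A minimal NumPy sketch of why index 2 ends up meaning digit 2. It assumes integer labels and a sparse categorical cross-entropy loss (a loss commonly paired with softmax when labels are integers; the course's exact setup may differ). The logits are made-up numbers, and the gradient step is applied to the logits directly as a stand-in for updating the network's weights.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sparse_cross_entropy(z, y):
    # The label y is used directly as an index: only the probability at position y matters
    return -np.log(softmax(z)[y])

def grad_wrt_logits(z, y):
    # Gradient of softmax + cross-entropy with respect to the logits: p - one_hot(y)
    p = softmax(z)
    one_hot = np.zeros_like(p)
    one_hot[y] = 1.0
    return p - one_hot

# Hypothetical raw outputs (logits) of a 10-unit layer for one image labeled y = 2
z = np.array([0.5, 1.0, 0.2, 0.8, 1.5, 0.1, 0.0, 0.3, 0.9, 0.4])
y = 2

# One gradient step (stand-in for updating the weights)
learning_rate = 1.0
z_new = z - learning_rate * grad_wrt_logits(z, y)

print(softmax(z)[y], softmax(z_new)[y])                      # probability at index 2 goes up
print(sparse_cross_entropy(z, y), sparse_cross_entropy(z_new, y))  # loss goes down
```

The gradient is negative only at index y and positive everywhere else, so every step raises output y and lowers the others. Nothing about the "size" of the digit is involved; the label value simply selects which output gets rewarded.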

For each example, we make a prediction for each label.

Then, whichever output has the highest value is used as the prediction.
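To illustrate the step above with a minimal sketch (made-up logits for one image, NumPy assumed): softmax only exponentiates and normalizes the outputs position by position. It has no notion of which digit a position stands for, and because it preserves order, the largest raw output is also the largest probability.

```python
import numpy as np

def softmax(z):
    # Exponentiate and normalize; each value stays at its own index
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical logits from a 10-unit output layer for one image
logits = np.array([-1.2, 0.3, 4.1, 0.8, -0.5, 1.0, 0.2, -2.0, 0.6, 0.1])
probs = softmax(logits)

prediction = int(np.argmax(probs))  # index of the highest probability
print(prediction)  # 2
```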

Does it evaluate the size of the target and use that to make a category? If so, how does it create a category when the targets are not integers and their values can’t be estimated?

For example, suppose the prediction corresponds to index 2. Is softmax evaluating the size of the digits, and is that why the probability for 2 lands exactly on index 2 and not on index 4 or somewhere else?

I’m not sure what you mean by “the size of the target”.

Since machine learning is a mathematical process, everything must be represented as a numeric value.
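When the targets are not integers, they have to be turned into numbers before training. One common approach, sketched here with hypothetical string labels (not from the digit data set), assigns each distinct label an integer index; the output unit at that index then plays the role of that label.

```python
# Hypothetical non-integer targets
targets = ["cat", "dog", "bird", "dog", "cat"]

# Assign each distinct label an integer index (sorted so the mapping is reproducible)
class_names = sorted(set(targets))  # ['bird', 'cat', 'dog']
label_to_index = {name: i for i, name in enumerate(class_names)}

# These integers are the 'y' values the model would actually be trained on
y = [label_to_index[t] for t in targets]
print(y)  # [1, 2, 0, 2, 1]
```

Keeping class_names around lets a predicted index be translated back into the original label after training.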

For each example, you’ll get a prediction for each label. The predictions are stored in a vector (one example) or a matrix (one row per example, one column per label). The column with the highest value is the best prediction, and the column indexes are mapped to the labels.
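The matrix case can be sketched as follows, assuming NumPy and made-up logits for a batch of three images. Here class_names is a hypothetical lookup list where column j simply means digit j; replacing it with any list of 10 label names would return those names instead, while the argmax step stays the same.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax: one row per example, one column per label
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Labels the columns stand for; for the digit model, column j means digit j
class_names = list(range(10))

# Hypothetical logits for 3 images (3 rows x 10 columns)
logits = np.array([
    [0.1, 0.2, 3.0, 0.0, 0.5, 0.1, 0.3, 0.2, 0.0, 0.1],
    [2.5, 0.1, 0.0, 0.3, 0.2, 0.4, 0.1, 0.0, 0.2, 0.3],
    [0.0, 0.1, 0.2, 0.3, 0.1, 0.0, 0.2, 0.1, 0.4, 2.8],
])
probs = softmax(logits)

best_columns = np.argmax(probs, axis=1)  # highest column in each row
predicted_labels = [class_names[j] for j in best_columns]
print(predicted_labels)  # [2, 0, 9]
```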