In the digit recognizer example, how do we know that the index of an output activation corresponds to a particular symbol (digit)? That is, how do we know, or how did TF organize the units, such that the second index of the output activation vector corresponds to the z/probability of the symbol being “2”? What if the classes were “dog”, “cats”, “other” rather than digits?
It’s based on prior knowledge of how the labels were applied to the data set.
For this assignment, it’s just lucky that the information conveyed in the images of the digits happens to match some handy numerical codes.
If you were trying to identify other sorts of things, you’d have to know the mapping between the object names and their numerical codes.
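As a small illustration of that mapping, here is a sketch with made-up values. Both class_names and probabilities are hypothetical; the only real rule is that position k in the output lines up with whatever label code k meant when the data set was labelled.

```python
# Hypothetical mapping chosen by whoever labelled the data: code -> name.
class_names = ["dog", "cat", "other"]

# Hypothetical softmax output for one image (one value per class).
probabilities = [0.10, 0.85, 0.05]

# The index of the largest output is the predicted numerical code.
predicted_code = max(range(len(probabilities)), key=probabilities.__getitem__)

# Translate the code back to a name using the same mapping used for labelling.
predicted_name = class_names[predicted_code]
print(predicted_code, predicted_name)
```

The network itself never sees the names. It only learns to make the output at position k large for examples whose label was k.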
Thanks for the quick response! I’m still not sure I completely understand. Is TF creating a set of unique label values in the background, so that the order in which these labels are discovered becomes the order of the output vector values? For example, say the labels were the strings 0…9 and the data set just happened to be ordered such that the first unique label discovered was “9” and the last was “0”. Would the first element of a be assumed to relate to “9”, or is TF doing some other sorting of the labels?
The labels are applied by whoever prepared the data set. TensorFlow is just a method for performing machine learning - it doesn’t do anything to create data sets.
Fundamentally, for each example X[i], someone created a corresponding y[i] value as its label. Then they saved the data set (X and y) as data files, and packaged it with the notebook.
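Roughly, the labelling step might look like the sketch below. The names name_to_code and raw_labels are invented for illustration and are not taken from the assignment's data files. The point is that the codes come from a mapping fixed up front, not from the order in which labels happen to show up.

```python
# Hypothetical fixed mapping decided up front by the person preparing the data.
name_to_code = {"dog": 0, "cat": 1, "other": 2}

# Hypothetical raw annotations, in whatever order the examples happen to appear.
raw_labels = ["other", "cat", "dog", "cat"]

# y[i] is simply the code for example i; the order of appearance plays no role.
y = [name_to_code[name] for name in raw_labels]
print(y)
```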
You might want to take a look at the documentation for tf.keras.utils.image_dataset_from_directory (TensorFlow v2.11.0). For a main_directory containing the subdirectories class_a and class_b, it says:
image_dataset_from_directory(main_directory, labels='inferred') will return a tf.data.Dataset that yields batches of images from the subdirectories class_a and class_b, together with labels 0 and 1 (0 corresponding to class_a and 1 corresponding to class_b).
Works for larger sets of classes, too. You can do that every time and read the images into X and have TensorFlow create the labels, Y, automagically. Or, do it once and write your X and Y back out as .npy files or TF datasets, ready for quick reload. HTH
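To make the label order concrete, here is a standard-library-only imitation of the idea. It is not TensorFlow's actual implementation. It assumes, as the documentation describes for labels='inferred', that class indices follow the alphanumeric order of the subdirectory names. The function infer_class_indices and the folder names are made up for this sketch.

```python
import os
import tempfile


def infer_class_indices(main_directory):
    """Map each subdirectory name to an integer label, in sorted name order."""
    subdirs = sorted(
        entry for entry in os.listdir(main_directory)
        if os.path.isdir(os.path.join(main_directory, entry))
    )
    return {name: index for index, name in enumerate(subdirs)}


# Build a throwaway directory tree; creation order is deliberately not alphabetical.
with tempfile.TemporaryDirectory() as main_directory:
    for name in ["dog", "cat", "other"]:
        os.makedirs(os.path.join(main_directory, name))
    class_indices = infer_class_indices(main_directory)

print(class_indices)
```

With the real function, the returned dataset's class_names attribute should list the names in the order of their label indices, so it is worth printing that rather than guessing.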