Video: Inference in Code (TensorFlow implementation)

Hi All,

I have a question about the handwritten digit classification example using TensorFlow. It was explained earlier that the typical input is an 8x8 matrix of pixel intensity values. Why, then, do we define just 25 units in Layer 1? Could you please explain?


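The number of units in a layer is a hyperparameter that should be tuned based on the dataset and the problem constraints. It does not have to match the number of input features: every unit in a Dense layer receives all 64 pixel values. This is a demo, so do play with the configuration to see how the results vary.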
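To illustrate why the unit count is independent of the input size, here is a minimal plain-Python sketch (not code from the course) that works out the weight and bias shapes of a stack of Dense layers. The 64 inputs and the 25 units in Layer 1 come from this thread; the 15 and 10 units for the later layers and the helper name `dense_shapes` are assumptions made only for illustration.

```python
def dense_shapes(n_inputs, units_per_layer):
    """Return a (weight_shape, bias_shape) pair for each Dense layer, in order.

    A Dense layer with `units` units fed by `fan_in` values has a weight
    matrix of shape (fan_in, units) and a bias vector of shape (units,).
    """
    shapes = []
    fan_in = n_inputs
    for units in units_per_layer:
        shapes.append(((fan_in, units), (units,)))
        fan_in = units  # this layer's outputs are the next layer's inputs
    return shapes


n_pixels = 8 * 8             # 8x8 image flattened to 64 input features
layer_units = [25, 15, 10]   # 25 is from the thread; 15 and 10 are assumed

shapes = dense_shapes(n_pixels, layer_units)
for i, (w_shape, b_shape) in enumerate(shapes, start=1):
    n_params = w_shape[0] * w_shape[1] + b_shape[0]
    print(f"Layer {i}: W {w_shape}, b {b_shape}, {n_params} parameters")
```

The input size only fixes the number of rows in the first weight matrix (64 here). The number of columns, i.e. the number of units, is a free choice, so Layer 1 with 25 units simply has 64 * 25 + 25 = 1625 parameters.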

Thanks, Balaji. I had expected Layer 1 to specify 64 units, one per input pixel, but it has just 25.

A few more questions:

  1. It looks like the values of Wj and Bj change during forward propagation. In the TensorFlow/Keras framework, is gradient descent computed during forward propagation (based on the activation of the previous layer), with the updated w and b values passed on to the layers to the right so as to minimize the cost function? Please chime in.

  2. What is the rationale for deciding how many Dense layers are needed for the final prediction? For example, 2 layers were used for the Coffee Roasting prediction, whereas 3 layers were used for the Digit Recognition prediction.

Model parameters (i.e. weights and biases) are updated only during training, using the gradients computed in the backward pass. The forward pass leaves them unchanged: it computes the model's prediction, from which the loss is calculated.
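The separation between the two passes can be seen in a tiny hand-written sketch. This is not how Keras is implemented internally; it is a single linear unit with a squared-error loss, and the function names (`forward`, `loss`, `gradients`, `sgd_step`), the example values and the learning rate are all assumptions chosen for illustration.

```python
def forward(w, b, x):
    """Forward pass: compute the prediction. Reads w and b, never changes them."""
    return w * x + b


def loss(w, b, x, y):
    """Squared-error loss computed from the forward-pass prediction."""
    return 0.5 * (forward(w, b, x) - y) ** 2


def gradients(w, b, x, y):
    """Backward pass: derivatives of the loss with respect to w and b."""
    err = forward(w, b, x) - y
    return err * x, err


def sgd_step(w, b, x, y, lr):
    """One gradient descent update: the only place the parameters change."""
    dw, db = gradients(w, b, x, y)
    return w - lr * dw, b - lr * db


w, b = 0.5, 0.0      # initial parameters (arbitrary)
x, y = 2.0, 3.0      # one training example (arbitrary)

loss_before = loss(w, b, x, y)        # forward pass only; w and b untouched
w_new, b_new = sgd_step(w, b, x, y, lr=0.1)
loss_after = loss(w_new, b_new, x, y)

print("parameters before:", (w, b), "loss:", loss_before)
print("parameters after: ", (w_new, b_new), "loss:", loss_after)
```

Calling `forward` or `loss` any number of times leaves `w` and `b` as they were; only `sgd_step` produces new values. In Keras the same split shows up as `model.predict`, which runs only the forward pass, versus `model.fit`, which runs the forward pass, the backward pass and the parameter update for each batch.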

The goal is to maximize the predictive power of the model while staying within memory and compute constraints. Hyperparameters such as the number of layers and the number of units per layer are chosen by trial and error.