Order of the input matrix (features): m x n (m = rows/examples; n = columns/features)

My ANN model for a binary classification problem: 3 hidden layers with 1 output layer.

(ReLU) Hidden layer 1: 20 nodes. Hence, the order of the activation matrix from this layer will be m x 20.

(ReLU) Hidden layer 2: 12 nodes. Hence, the order of the activation matrix from this layer will be m x 12.

(ReLU) Hidden layer 3: 6 nodes. Hence, the order of the activation matrix from this layer will be m x 6.

(Logistic) Output layer: 1 node. The final output order shall be m x 1.

Compute Jcostfunction for the m x 1 output from the output layer.

Check -------> if Jcostfunction > Jthreshold is TRUE, then keep iterating from hidden layer 1 through the output layer, updating w and b.

Check -------> if Jcostfunction <= Jthreshold, then stop and return w and b.

Use the returned w and b for prediction on the test set.
PS: Though I don't explicitly mention gradient descent in the flow above, the nuts & bolts of computing Jcostfunction stem from gathering the losses and averaging them.
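The flow above could be sketched in NumPy (a minimal sketch, not a definitive implementation — it assumes binary cross-entropy for Jcostfunction, He initialization, and plain full-batch gradient descent; the 20/12/6/1 layer sizes match the architecture described):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_features, sizes=(20, 12, 6, 1)):
    """He-initialized (W, b) pairs for each layer."""
    params, fan_in = [], n_features
    for fan_out in sizes:
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out))
        b = np.zeros((1, fan_out))
        params.append((W, b))
        fan_in = fan_out
    return params

def forward(X, params):
    """Returns all activations; acts[0] is X (m x n), acts[-1] is m x 1."""
    acts, A = [X], X
    for i, (W, b) in enumerate(params):
        Z = A @ W + b
        A = sigmoid(Z) if i == len(params) - 1 else relu(Z)
        acts.append(A)
    return acts

def cost(y_hat, y):
    """Binary cross-entropy: gather per-example losses, then average."""
    eps = 1e-12
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

def train(X, y, J_threshold=0.1, lr=0.1, max_iters=5000):
    params = init_params(X.shape[1])
    J = np.inf
    for _ in range(max_iters):
        acts = forward(X, params)
        J = cost(acts[-1], y)
        if J <= J_threshold:           # the Jthreshold stop-check from the flow
            break
        m = X.shape[0]
        dZ = (acts[-1] - y) / m        # gradient of mean BCE w.r.t. pre-sigmoid z
        for i in range(len(params) - 1, -1, -1):
            W, b = params[i]
            dW = acts[i].T @ dZ
            db = dZ.sum(axis=0, keepdims=True)
            if i > 0:
                dZ = (dZ @ W.T) * (acts[i] > 0)   # ReLU derivative
            params[i] = (W - lr * dW, b - lr * db)
    return params, J
```

The `J <= J_threshold` check is what the "Check ------->" steps describe: keep updating w and b while the cost is above the threshold, stop and return them once it drops below.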

Any corrections/suggestions very much appreciated!
Thanks for your time!

So essentially you are suggesting a way to stop the gradient descent iterations early when the cost falls below a threshold. I think that sounds like a good idea! Another idea is to measure the cost on a CV set, and if the CV cost stops improving, the iterations can stop.
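That CV-based stopping rule is often implemented as a patience check; here is an illustrative sketch (the `patience` and `min_delta` knobs are made-up names, not from any particular library):

```python
def should_stop(cv_costs, patience=5, min_delta=1e-4):
    """Stop when the CV cost has not improved by at least min_delta
    over the last `patience` recorded checks."""
    if len(cv_costs) <= patience:
        return False                       # not enough history yet
    best_before = min(cv_costs[:-patience])  # best cost seen earlier
    recent_best = min(cv_costs[-patience:])  # best cost in the recent window
    return recent_best > best_before - min_delta
```

You would call `should_stop` once per evaluation of the CV set inside the training loop, alongside (or instead of) the Jthreshold check.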

So, Jthreshold was a generalization from my side.
But, seeing your post, I was thinking of deriving it from an empirical conditional probability value. E.g., given some image analysis attributes (which I derive from a set of 90,000 images), classify each image as belonging to class 1 or class 2 based on a Region of Interest (ROI).

Calculate (cp1) P(class 1 | a specific range of values for all the attributes); likewise calculate (cp2), the conditional probability of class 2.

I can use cp1 or cp2 as Jthreshold to train my model.

My guess is this approach will consume additional computational resources just to arrive at these conditional probability values.
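For what it's worth, estimating such empirical conditional probabilities reduces to a single counting pass over the (event, label) pairs, so the extra cost is modest. A sketch (the event encoding itself is hypothetical — see the discussion below about mapping attribute tuples to events):

```python
from collections import Counter

def conditional_probs(events, labels):
    """Empirical P(class | event) from parallel lists of event ids and class labels."""
    joint = Counter(zip(events, labels))   # counts of (event, class) pairs
    marginal = Counter(events)             # counts of each event
    return {(e, c): joint[(e, c)] / marginal[e] for (e, c) in joint}
```

For 90,000 images this is a one-off linear pass, and since the attributes are fixed, the resulting probabilities can be cached and reused across training runs.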

If maximizing cp1 and cp2 is your project goal, then we need to spend those resources; but since the attributes are static (I assume), you may precompute something to save redundant work during training.

Yes, I have the same idea about fixing Jthreshold to maximize performance. Moreover, each unique conditional probability here, e.g. cp1, which corresponds to P(class 1 | 'a prior event', which in turn is the occurrence of a set of attribute values), shall be input as an extra feature for my ANN.

I am doing a literature search for specific examples where people train their ANN models with the conditional probabilities of sets of image attribute values.

Yes, so if I were you, I would pre-compute all the attribute values for each image, so that each image has a tuple of attributes (1, 0, 0, 1, 3., 2.8, …). It is just like an image having a class label, but now it also has a tuple of attributes.

Then I would convert each tuple to an event (such as defining event = 0 when attribute_0 == 1 and attribute_1 < 2 and …). Then each image has a class label and an event label.
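That tuple-to-event mapping might look like the following sketch (the specific attribute indices and thresholds are placeholders, not a prescription):

```python
def tuple_to_event(attrs):
    """Map a precomputed attribute tuple to a discrete event id.
    The rules below are hypothetical examples of the kind of
    'attribute_0 == 1 and attribute_1 < 2' conditions described."""
    a0, a1 = attrs[0], attrs[1]
    if a0 == 1 and a1 < 2:
        return 0
    if a0 == 1:
        return 1
    return 2   # catch-all event for tuples matching no rule
```

Applied over the whole dataset, this gives each image an event label alongside its class label.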

Then I could pass both the class label and the event label into the model, either as model inputs or as model outputs. And then I would need to custom-define the loss and metric functions to use y_pred, y_true (the class label), and y_event (the event label) properly.
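One way such a custom loss could look, sketched in plain NumPy rather than a specific framework: a binary cross-entropy where each example's contribution is weighted by its event label (the weighting scheme here is purely illustrative — how y_event should actually enter the loss depends on the project goal):

```python
import numpy as np

def event_weighted_bce(y_true, y_pred, y_event, event_weights):
    """Per-example binary cross-entropy, weighted by the example's event id.
    event_weights is a hypothetical dict mapping event id -> loss weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    eps = 1e-12
    per_example = -(y_true * np.log(y_pred + eps)
                    + (1 - y_true) * np.log(1 - y_pred + eps))
    w = np.asarray([event_weights[e] for e in y_event])
    return float(np.mean(w * per_example))
```

With all weights equal to 1 this reduces to ordinary BCE, so it can drop into the training loop as the Jcostfunction without changing the rest of the flow.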