In the function modelf(), after the Conv1D layer the BatchNormalization layer comes before the Dropout layer, while after the first GRU layer Dropout comes before BatchNormalization. Does the order matter, and how should the order be chosen?
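For reference, here is a minimal sketch of the two orderings I am asking about. This is not the assignment's actual modelf(); the input shape, layer sizes, and dropout rates below are placeholder assumptions, only the relative order of BatchNormalization and Dropout is the point.

```python
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization,
                                     Activation, Dropout, GRU)
from tensorflow.keras.models import Model

def sketch_model(input_shape=(1000, 64)):  # illustrative shape, not the assignment's
    X_input = Input(shape=input_shape)

    # Ordering 1: Conv1D -> BatchNormalization -> Dropout
    X = Conv1D(filters=196, kernel_size=15, strides=4)(X_input)
    X = BatchNormalization()(X)   # normalize the conv activations first
    X = Activation("relu")(X)
    X = Dropout(0.2)(X)           # then randomly zero some activations

    # Ordering 2: GRU -> Dropout -> BatchNormalization
    X = GRU(units=128, return_sequences=True)(X)
    X = Dropout(0.2)(X)           # drop first
    X = BatchNormalization()(X)   # then normalize what remains

    return Model(inputs=X_input, outputs=X)
```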
Hi there, I think this is a good question.
From my point of view, the order matters because the two arrangements give you two different network structures, although in practice the difference can sometimes be negligible. I found a post that might be helpful, and I also look forward to other replies with more insight.
I would think it should matter. Even if the outputs are not noticeably different in some empirical experiment, that does not mean differences would not show up at larger scale or with different input data sets.
First, the last layer would usually have one unit for two-class classification. Having L1 with 1 unit and the last layer with 4 units would just be a wrong design.
Second, it would seem that the input data has the most complexity (least ordered, high entropy), and as we go from left to right we reduce that complexity down to the much lower complexity of a two-class classification.
So maybe there is a natural ordering of layer sizes from input to output where the size of layer k >= the size of layer k + 1. I am just guessing.
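As a rough sketch of that funnel idea (the widths below are arbitrary assumptions, not a recommendation, and this is plain Keras rather than the course's modelf()):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense

# Layer widths shrink from input toward the output, ending in a single
# sigmoid unit for two-class classification. Widths are illustrative only.
funnel = Sequential([
    Input(shape=(128,)),             # high-dimensional input features
    Dense(64, activation="relu"),    # each layer no wider than the one before
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid"),  # one unit for binary classification
])
funnel.summary()
```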