Hi - I am finishing up the Week 1 Assignment 2 notebook, and I had a few questions
They aren’t about the solutions per se, more about the content and understanding a bit about what I just ‘did’
Thanks in advance!
In no particular order -
Is there any reason we used ZeroPadding2D in the first example, vs. the padding='same' parameter in the second? Are they just two ways to accomplish the same thing?
Can you provide a bit more info in the vein of "intuition" about why the models were built as they were? For example:
- Why did we add a BatchNorm in the first model but not the second?
- Why the initial choice of parameters (1 conv layer with 32 7x7 filters in exercise 1, vs. 2 conv layers with their parameters as they were set for exercise 2)? And how do you know whether a maxpool makes sense to use, etc.?
The second exercise says its goal is to 'learn' the Functional API, but the example is pretty similar to the first one (i.e. a list of layers), just specified slightly differently. Understood that I can go digging in the docs, but some 'pedagogical' guidance on why/when/how to use it to do more than the simple list would be much appreciated!
Re: the losses, is ~0.3 considered good? It seems high, unless maybe I don't fully understand the units/type of loss Keras is using?
One last question (sorry!). In the lectures, it's mentioned generally that you need a fair amount of data to get good results out of a network like these, but these examples seem to run on 500/1000 images respectively, with decent results. Is there more intuition you could share to help understand how much input is really needed? I assume it would be a bit better with more data in this case, but it's clearly not millions of images either. I assume it has to do with the size of the images and the low number of categories, but a slightly more methodical thought process would be interesting to see.
Apologies again for the long post. I hope it makes sense, but I can clarify as needed if anything I asked isn't clear (or is a dumb question). Also, apologies if some of this will be covered in the upcoming videos, in which case the answer is more about patience.
First Question! Same padding is used when we need an output of the same shape as the input: Keras calculates and adds the padding required so that the shape is the same before and after the convolution. If the padded values are zeros it can be called zero padding, but Keras also has other types of padding, such as causal padding. So there is no real difference between choosing padding='same' (which pads with zeros) and using a ZeroPadding2D layer; in computer vision models we usually use ZeroPadding2D when the data has a 2D shape.
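Here is a minimal sketch of that equivalence (the 64x64x3 input and the 7x7 filter are just example values, not necessarily the notebook's): an explicit ZeroPadding2D followed by a 'valid' convolution produces the same output shape as a single Conv2D with padding='same'.

```python
import tensorflow as tf

# Two ways to get a 7x7 convolution that preserves the 64x64 spatial size.
inputs = tf.keras.Input(shape=(64, 64, 3))

# Option 1: explicit zero padding, then a 'valid' (no padding) convolution.
x1 = tf.keras.layers.ZeroPadding2D(padding=3)(inputs)       # 64x64 -> 70x70
x1 = tf.keras.layers.Conv2D(32, 7, padding='valid')(x1)     # 70x70 -> 64x64

# Option 2: let Conv2D compute and apply the zero padding itself.
x2 = tf.keras.layers.Conv2D(32, 7, padding='same')(inputs)  # 64x64 -> 64x64

print(x1.shape)  # (None, 64, 64, 32)
print(x2.shape)  # (None, 64, 64, 32)
```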
Second Question! We add BatchNormalization in the first model because we are classifying face images, which have a lot of detail in every image (eye colour, etc.), and the images come from somewhat different distributions.
With that much detail the model has more weights to learn, so we use BatchNormalization to make training faster and keep the weight values smaller. In addition, since the images come from different distributions, the BatchNormalization layer brings the activations onto the same distribution, which lets gradient descent move toward the minimum of the loss more directly without oscillating around it. In the hand-sign model, on the other hand, it is enough to divide the pixel values by 255 so the image values lie between 0 and 1. We don't care about details such as skin colour, only about which number the hand is signing; all the images come from the same distribution and each image has less detail, so we don't need a BatchNormalization layer.
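As a tiny illustration of that "same distribution" point (toy numbers, not from the notebook), this is what a BatchNormalization layer does to a batch of activations at training time:

```python
import numpy as np
import tensorflow as tf

# A batch of activations on wildly different scales.
x = np.array([[1.0], [10.0], [100.0], [1000.0]], dtype="float32")

bn = tf.keras.layers.BatchNormalization()
y = bn(x, training=True)       # training=True -> normalize with this batch's statistics
print(y.numpy().round(2))      # roughly zero-mean, unit-variance values
```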
Third Question! I think what I said above gives you a better intuition for why we chose those models and what the difference between them is.
Fourth Question! The loss is bigger in the second model because it is a multiclass classification, not a binary one, so it will naturally be higher than in the first model. But given the size of the training set we have and the simplicity of the neural network, the loss is small and the accuracy is very good. You can also try adding a BatchNormalization layer and see what happens to the accuracy and loss, though I think it will not change as much as you hope. Still, try it!
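To put the ~0.3 loss in perspective, here is a rough back-of-the-envelope check (not from the notebook) of what a "know-nothing" model that guesses uniformly would score under each loss:

```python
import numpy as np

# Baseline loss for a model that guesses uniformly at random:
# binary cross-entropy over 2 classes -> ln(2); categorical cross-entropy
# over 6 classes -> ln(6). A loss of ~0.3 is well below both baselines.
print(np.log(2))  # ~0.693 (binary, 2 classes)
print(np.log(6))  # ~1.792 (multiclass, 6 classes)
```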
Note: the BatchNormalization layer shouldn't go after the MaxPool2D; the best place is right after the convolution and before the activation layer.
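A minimal sketch of that ordering (the filter counts and input shape are just example values, not the assignment's):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 7, padding='same', input_shape=(64, 64, 3)),
    tf.keras.layers.BatchNormalization(axis=3),  # normalize over the channel axis
    tf.keras.layers.ReLU(),                      # activation comes after BatchNorm
    tf.keras.layers.MaxPool2D(),                 # pooling comes after the activation
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.summary()
```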
Last Question! In general, the more data you have, the better the model, because it learns from more varied examples; as I have read before, data is the fuel that feeds neural networks. The 500 or 1000 images here are a small number chosen just for the learning exercise. In my opinion, fewer than ~10,000 images is not sufficient for a big neural network that learns more powerful features. If you only have a small training set, the best approach is to tune the model's hyperparameters as much as possible, and data augmentation is the best way to increase your data.
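For example, a minimal data-augmentation sketch (the parameter values and variable names are just examples, not from the notebook):

```python
import tensorflow as tf

# Randomly shift, rotate, and flip the training images on the fly,
# which effectively enlarges a small training set.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # random horizontal shifts up to 10%
    height_shift_range=0.1,   # random vertical shifts up to 10%
    horizontal_flip=True,     # random left-right flips
)

# Assuming X_train has shape (m, 64, 64, 3) and Y_train holds the labels:
# model.fit(datagen.flow(X_train, Y_train, batch_size=32), epochs=20)
```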
Note: please, if you have a large number of questions, don't post them all in one unified post. That way everyone who searches for a topic can find the different opinions and solutions about it without distraction. Also, please feel free to ask any questions.
In addition to Abdelrhaman’s excellent explanations, here are a few more thoughts and some links that I hope will add to the topic:
For the question of when you use the Keras Sequential API and when you use the Functional API, the key point is that the Functional API is more general. The Sequential API can only handle the case in which the "layers" you are adding are connected in the simplest way possible: each layer takes one input, which is the output of the previous layer. If you have a simple setup like that to build, then you can use either API. But as soon as the case gets more complicated, e.g. one of the layers takes two inputs or the compute graph is not "simply connected", the Sequential API is no longer sufficient and you no longer have a choice. You'll see examples very soon in the Residual Net assignment in Week 2 where you have no choice but to use the Functional API.
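Here is a minimal sketch of that difference (layer sizes are arbitrary): the first two models are the same simple chain written with each API, while the third has a layer that takes two inputs (a residual-style skip connection like the ones coming up in the Week 2 assignment), which only the Functional API can express.

```python
import tensorflow as tf

# Sequential API: a straight chain of layers.
seq_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Functional API: the same chain, written by calling layers on tensors.
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
func_model = tf.keras.Model(inputs, outputs)

# Functional API only: a layer that takes two inputs (a skip connection).
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(8, activation='relu')(inputs)
x = tf.keras.layers.Add()([x, inputs])   # two inputs -> can't be written as Sequential
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
skip_model = tf.keras.Model(inputs, outputs)
```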
Also note that they really don’t give us much background on the two APIs in the notebook. Just a few sketches of examples. You can find it all in the TensorFlow documentation, but the better way to learn more would be to start with this explanatory thread from one of your fellow students which does a great job of explaining in more detail what the two APIs are capable of and how to use them.
On the question of whether the costs are high or low, note that the actual J values don’t really tell you that much. We really only use those as an inexpensive proxy for whether our convergence is working or not. The real way to evaluate the performance of your network at any point in the training is the prediction accuracy: compare the output of the model with the labels on both the training data and the test or cross validation data as appropriate. The percentage of correct predictions is the key evaluation metric for any ML/DL model. That’s all that really matters in the end, right?
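A minimal, self-contained sketch (random placeholder data, not the assignment's) of reading off the accuracy rather than the raw loss:

```python
import numpy as np
import tensorflow as tf

# Random placeholder data, just to make the example runnable.
X_train = np.random.rand(100, 64, 64, 3).astype("float32")
Y_train = np.random.randint(0, 2, size=(100, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, Y_train, epochs=1, verbose=0)

# evaluate() returns the loss plus the metrics you asked for; the accuracy
# (fraction of correct predictions) is the number to watch, not the raw loss.
loss, acc = model.evaluate(X_train, Y_train, verbose=0)
print(f"loss: {loss:.3f}   accuracy: {acc:.3f}")
```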