Hey there! I was going through the Week 1 assignments and was struggling to understand two things:
- This one is more syntactical: what does dW1 = 1. / (…) mean? Why the ‘.’ after 1? What does the ‘.’ do here?
- More conceptual: what exactly is one iteration of my neural network? If you’re plotting cost vs. number of iterations, then what information exactly is passed in one iteration? A single training example with multiple features to all the hidden units of the first layer, OR all the training examples simultaneously — and if it’s the latter, why perform multiple iterations if the input data isn’t changing?
For the first question, it’s about Python’s numeric types. Without the period, “1” is an “int”. With the period, “1.” is a “float”. This sometimes has an impact on the math that follows.
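A quick sketch of the difference (assuming Python 3, where `/` already produces a float; in Python 2 the period mattered even more, because `/` between two ints was integer division):

```python
m = 4

print(type(1))    # <class 'int'>
print(type(1.))   # <class 'float'>

# In Python 3 both divisions give a float, so `1.` mainly signals intent:
print(1 / m)      # 0.25
print(1. / m)     # 0.25

# In Python 2, 1 / 4 would have been 0 (integer division truncates),
# so writing 1. guaranteed floating-point math.
```

So the leading `1.` is a habit that makes the floating-point intent explicit and keeps the code safe across Python versions.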
For the second question, we may need to start with an overview of the entire process.
Here is the data set. If a data set is quite large, we may want to split it into multiple batches.
An “iteration” is one feed of a batch into the NN.
On the other hand, you may have heard the word “epoch”. This means one complete pass of the entire process for the NN, i.e., feed all the data (forward propagation), calculate the loss, and deliver gradients to update the weights in each layer (back-propagation).
So, if the entire data set is split into 5 batches, then it requires 5 iterations to complete one epoch; in each iteration we calculate the loss on that batch and perform back-propagation.
The important thing is that we have a huge number of parameters (weights) to optimize. With one single feed of the entire data, the parameters cannot be fully optimized. What we can do is, again, calculate the cost (the difference between the outputs and the expected values) and run back-propagation to update the weights in all layers. Then we start the next epoch, i.e., another round of iterations over the batches, evaluate the cost again, and back-propagate again. We need to repeat this process until the loss reaches a minimum (zero gradient).
And, to avoid the network learning something that depends on the order of the data, we may want to shuffle the data between epochs, of course. And, depending on the complexity of the network and the data, sometimes we need something like 10,000 epochs to get all the parameters optimized.
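The loop described above can be sketched roughly like this (the toy data, `batch_size`, and `n_epochs` here are illustrative, not values from the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 examples, 3 features (toy data)
Y = rng.integers(0, 2, size=100)       # toy labels

batch_size = 20                        # -> 5 batches per epoch
n_epochs = 3

iterations = 0
for epoch in range(n_epochs):
    perm = rng.permutation(len(X))     # shuffle between epochs
    X_shuf, Y_shuf = X[perm], Y[perm]
    for start in range(0, len(X), batch_size):
        X_batch = X_shuf[start:start + batch_size]
        Y_batch = Y_shuf[start:start + batch_size]
        # forward propagation, loss, back-propagation,
        # and the weight update would go here
        iterations += 1

print(iterations)                      # 15 = 5 iterations x 3 epochs
```

One iteration is one trip through the inner loop; one epoch is one trip through the outer loop.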
Hope this helps.
Who splits the training data into these “batches”? How is the split determined? Does the NN execute this by itself? Also, once a batch has already been fed to the network, is it reused in a future iteration?
Is this the first course that you are taking in this Specialization?
I’m curious, since Course 2 is about hyperparameter tuning and optimization. Some of your questions are part of hyperparameter tuning and relevant to that course, but I think the majority is covered by the first course of this specialization.
Anyway,
- Who splits the training data into these “batches”? → You do.
- How is the split determined? → You need to consider system resources, like memory size, but the batch size is one of the hyperparameters that you need to decide.
- Does the NN execute this by itself? → Again, that is your responsibility.
- Once a batch has already been fed to the network, is it reused in a future iteration? → Yes, it will be used several times, once per epoch.
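A small sketch of the reuse arithmetic (the numbers are purely illustrative; note that the last batch may be smaller when the data set size is not a multiple of the batch size):

```python
import math

m = 103            # illustrative: not a multiple of the batch size
batch_size = 20
n_epochs = 4

batches_per_epoch = math.ceil(m / batch_size)    # 6 (last batch has only 3 examples)
total_iterations = batches_per_epoch * n_epochs  # 24: each batch is fed once per epoch

print(batches_per_epoch, total_iterations)
```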
You need to “respect” the data, and always think about how it can be best utilized to obtain the expected result from the NN. Splitting the data into multiple batches, shuffling it, augmenting it with flips, rotations, … you need to consider all of these to get the expected result.
There are lots of things that you will learn from this course, and from the other courses in this specialization.
Enjoy learning !