Hey guys. I’m a little confused about how a deep network works. During training it does forward prop, computes the cost, does backward prop, and updates the parameters, and it repeats those 4 steps num_iterations times on the training images. When it runs on the test set, does it have to do these 4 steps again num_iterations times? Or can someone correct me? Thanks!
The network performs 4 steps during training to learn the patterns from the data.
During test time you perform the forward pass and compute cost/loss. You don’t perform the backward pass and hence no parameter updates.
Oh, so basically it predicts using the patterns it has learned?
Let me explain with some pseudocode.
Training_data = dataloader(train_file_path)
Test_data = dataloader(test_file_path)
model = loading_model()
for single_epoch in range(total_number_of_epochs):
    for single_batch in batches(Training_data):
        model(single_batch)  # feed the batch to the model for one training step
    # On every batch or epoch, we test the model on the test dataset (forward pass only)
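To make the loop above concrete, here is a minimal runnable sketch. It assumes a single-feature logistic regression instead of a deep network, and a made-up toy dataset. The names forward, compute_cost, train and make_data are hypothetical helpers, not the assignment's functions. The point is only that the weights change inside the training loop, while the test data goes through forward passes alone.

```python
import math
import random

def forward(w, b, x):
    """Forward prop for one feature: sigmoid(w * x + b)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def compute_cost(w, b, data):
    """Average cross-entropy cost. Uses forward passes only and changes nothing."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        a = forward(w, b, x)
        total -= y * math.log(a + eps) + (1 - y) * math.log(1 - a + eps)
    return total / len(data)

def train(train_data, test_data, num_iterations=200, learning_rate=0.5):
    w, b = 0.0, 0.0
    history = []
    for i in range(num_iterations):
        dw, db = 0.0, 0.0
        for x, y in train_data:
            a = forward(w, b, x)   # step 1: forward prop
            dw += (a - y) * x      # step 3: backward prop (gradient of the cost)
            db += (a - y)
        # step 4: update parameters (this happens only on training data)
        w -= learning_rate * dw / len(train_data)
        b -= learning_rate * db / len(train_data)
        if i % 50 == 0 or i == num_iterations - 1:
            # step 2: compute cost for monitoring; the test data is only
            # passed forward, it never influences w or b
            history.append((i,
                            compute_cost(w, b, train_data),
                            compute_cost(w, b, test_data)))
    return w, b, history

def make_data(n):
    """Toy data (an assumption for this sketch): label is 1 when x > 0."""
    data = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        data.append((x, 1 if x > 0 else 0))
    return data

random.seed(0)
train_data = make_data(80)
test_data = make_data(20)
w, b, history = train(train_data, test_data)

# Final test: one forward pass per test example, no iterations, no updates.
test_accuracy = sum((forward(w, b, x) > 0.5) == (y == 1)
                    for x, y in test_data) / len(test_data)
for step, train_cost, test_cost in history:
    print(step, round(train_cost, 4), round(test_cost, 4))
print("test accuracy:", test_accuracy)
```

Evaluating on the test data every few iterations is optional monitoring; the single forward pass at the end is all that prediction itself requires.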
Oh, but when the network is trying to predict the answer, does it follow the patterns it learned from the training set?
Yes. However, the model architecture plays an important role. To learn from the training data in a way that generalizes, choosing the right deep learning architecture is very important. This helps it perform better on data it was not trained on, e.g. test data.
That’s right. You can steer the training of your network with hyperparameters, e.g. to make sure the model complexity is appropriate for your problem (e.g. a regression task) and your data. Options here include L2 regularization or dropout.
More details will follow in the 2nd course of the deep learning specialisation along your learning journey:
Thanks! By the way, following up on my last question: does predicting with the learned patterns, using forward propagation, need to be repeated num_iterations times?
here is my take:
- you can do the prediction as a single feed-forward pass, without any iteration, if you are only interested in the result for your input with this already trained net (evaluating the trained weights)
- in your case, however, when you want to evaluate the performance across epochs, it makes sense to do this iteratively alongside the training to monitor how the training is doing. I think @akkefa’s post is quite nice on this point
Please let me know if this answers your question.
Oh, so in the programming assignment I’ve done, which of the two options you’ve given is implemented for the prediction? And one more thing: will this kind of topic be discussed in upcoming lectures? Thanks!
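As a rough illustration of the first option, the sketch below runs one forward pass through a tiny 2-3-1 network. The weights in PARAMS are invented numbers standing in for "already trained" parameters, and dense and predict are hypothetical helper names. Nothing here loops over num_iterations, and the parameters are the same before and after predicting.

```python
import copy
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

# Invented values standing in for parameters that training already produced.
PARAMS = {
    "W1": [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]],  # 3 hidden units, 2 inputs
    "b1": [0.0, 0.1, -0.1],
    "W2": [[1.2, -0.6, 0.4]],                      # 1 output unit
    "b2": [0.05],
}

def dense(W, b, x, activation):
    """One layer: activation(W @ x + b), written with plain lists."""
    return [activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def predict(params, x):
    """A single forward pass; reads the parameters but never modifies them."""
    hidden = dense(params["W1"], params["b1"], x, relu)
    output = dense(params["W2"], params["b2"], hidden, sigmoid)
    return 1 if output[0] > 0.5 else 0

params_before = copy.deepcopy(PARAMS)
first = predict(PARAMS, [1.0, 2.0])
second = predict(PARAMS, [1.0, 2.0])
print(first, second, PARAMS == params_before)
```

Calling predict twice on the same input gives the same answer because no weight is updated in between, which is why repeating the call num_iterations times would add nothing.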
Yeah, evaluating is done based on the prediction. The good thing is: you have a label as ground truth which helps you in evaluating how good the prediction actually is and adapt your weights based on this.
Of course! In the specialisation you find many more applications and use cases.
Feel free to take a look here if you want to dig deeper.
Thank you for the answer!
Without going into the lion’s share of the work which would be the exploratory analysis, talking to stakeholders to get clear on what goals will make or break your success in the project, identifying and correcting missing and bad data, examining correlations, wrangling the data, deciding on a model, the training and inference might be oversimplified by thinking of it like this:
1–load the data
2–fix the data
3–wrangle the data into the right numeric form
4–normalize it, do feature engineering, PCA, check correlations, etc., as desired.
5–set up your model.
6–split your data into training, validation, and test groups.
7–“fit” the training data to your model. Check the performance
8–“predict” testing your validation set against the model.
(This lets you see how your model performs with data it hasn’t seen.)
9–perhaps adjust your hyperparameters and start over at 7.
Keep doing this until your model’s performance in validation is good enough.
10–“predict” the “test” data and check what the final performance
of your model is to make sure you did not overfit the validation set.
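Steps 6 to 10 of the list above can be sketched roughly like this. The sketch assumes a one-parameter linear model with an L2 penalty and synthetic data, purely to keep it short; fit, predict and mse are hypothetical helpers, and the 60/20/20 split and the candidate penalty values are arbitrary choices for illustration.

```python
import random

def fit(train_set, l2):
    """Closed-form fit of y ~ w * x with an L2 penalty on w (no intercept)."""
    sxy = sum(x * y for x, y in train_set)
    sxx = sum(x * x for x, _ in train_set)
    return sxy / (sxx + l2)

def predict(w, x):
    return w * x

def mse(w, data):
    """Mean squared error of the model's predictions on a dataset."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

# Synthetic data (an assumption): y = 3x plus a little noise.
random.seed(1)
data = []
for _ in range(100):
    x = random.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + random.gauss(0.0, 0.1)))

# Step 6: split into training, validation, and test groups.
random.shuffle(data)
train_set, val_set, test_set = data[:60], data[60:80], data[80:]

# Steps 7 to 9: fit, check on validation, adjust the hyperparameter, repeat.
best_l2, best_w, best_val_error = None, None, float("inf")
for l2 in [0.0, 0.1, 1.0, 10.0]:
    w = fit(train_set, l2)           # step 7: "fit" on training data only
    val_error = mse(w, val_set)      # step 8: "predict" on validation data
    if val_error < best_val_error:   # step 9: keep the best setting so far
        best_l2, best_w, best_val_error = l2, w, val_error

# Step 10: touch the test set once, after the hyperparameter is chosen.
test_error = mse(best_w, test_set)
print("chosen l2:", best_l2, "validation MSE:", best_val_error,
      "test MSE:", test_error)
```

The test set is used only after the loop, so the choice of l2 cannot be tuned to it.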
Your preferred model will not always be a feed-forward neural net. You may find most to be simple linear or logistic regressions or decision trees or random forests. If you’re going the deep learning route, you may be in for a long wait, which can be reduced by finding a network that has already been trained and using transfer learning: freezing the first layers and training only the new last layers. You may need an RNN of some type, an LSTM or GRU. But in almost any of these cases, you’ll still end up “learning” from your training data (perhaps regularizing the model with L1, L2, or dropout).
Now if you have all your data available for learning and don’t need to process it one record at a time as it comes in, you may be able to use batch or mini-batch gradient descent. Most of the time you might prefer mini-batch, perhaps making sure each batch fits within GPU memory just to make things faster. Otherwise, if you have to learn online, you may learn from one record at a time, which is “stochastic” (also called “online”) gradient descent.
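The three variants differ only in how many records go into each parameter update. Here is a small sketch, assuming a one-weight linear model and a made-up dataset; batches and run_epoch are hypothetical helpers (this batches takes an explicit batch_size, which is an assumption of the sketch).

```python
def batches(data, batch_size):
    """Yield consecutive slices of the data with at most batch_size records."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def run_epoch(w, data, batch_size, learning_rate=0.1):
    """One pass over the data for the model y ~ w * x with squared error."""
    updates = 0
    for batch in batches(data, batch_size):
        # Gradient of the mean squared error over this batch only.
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad
        updates += 1
    return w, updates

# Made-up data for the sketch: y = 2x for x = 1/8, 2/8, ..., 8/8.
data = [(i / 8.0, 2.0 * i / 8.0) for i in range(1, 9)]

w_batch, n_batch = run_epoch(0.0, data, batch_size=len(data))  # batch GD
w_mini, n_mini = run_epoch(0.0, data, batch_size=4)            # mini-batch GD
w_sgd, n_sgd = run_epoch(0.0, data, batch_size=1)              # stochastic GD
print(n_batch, n_mini, n_sgd)
print(w_batch, w_mini, w_sgd)
```

With the same data, a smaller batch size means more updates per epoch, each computed from fewer records.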
Perhaps others more experienced will be able to correct me where I have missed something or gotten something wrong, but I believe that is sort of the general idea and I hope it can be helpful or that I’ll be forgiven if I am wrong. I will be thankful if some here could help make sure I have not led anyone astray by mistake.
I’ve never explained it this way, but I’ll try to keep it as simple as possible:
Forward prop is the “prediction” phase: you feed data into the network and the prediction is what comes out at the far end.
When you want to train your network, you need the rest of the phases.
You use the output you got, together with the ground truth, to calculate how far off your network is, with whatever loss function you choose (the ground truth is not needed just to make a prediction; you need it for training, or to score the predictions).
Then you backprop, which computes the gradients used to correct the weights so they hopefully fit better on the next batch (or iteration, if you prefer this term).
Welcome to the community.
You are right, backprop takes the loss you got from the forward pass and uses it to correct the weights. So, in a way, the output you get in the end should match the ground truth you started with.
And here’s a good thread to read shared by Christian Simonis.