What happens inside the function that runs the trained DNN model on the test and training sets? Why does it execute so quickly compared to when you are training the model?
Since evaluating the model does not involve any training, only the "forward propagation" is performed, and it only happens once per example.
You don't have to wonder what happens: we're just using the functions that we wrote in the previous Step by Step assignment. Take a look at the code in the predict function in the previous exercise. As Tom says, when you're using the model to make a prediction, you just run forward propagation once, although you can feed it a matrix of samples so that you get multiple predictions in that one pass of forward propagation.
We just wrote the code to do the training here in this assignment, so remember what it does in each training iteration:
- Forward propagation on all samples.
- Backward propagation on all samples.
- Apply the updates to the parameters.
So each training iteration is roughly 2 or 3 times as expensive as just running predict once. But then we repeat steps 1-3 2500 times. It's those 2500 repetitions (or however many training iterations you do) that are the costly part of training, of course.
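For concreteness, here is a rough sketch of that loop. It assumes the Step-by-Step helper functions (L_model_forward, L_model_backward, update_parameters) are in scope with roughly the signatures used in the assignment; treat the exact signatures and hyperparameters as assumptions, not the official code.

```python
# Illustrative sketch only: assumes the Step-by-Step helpers
# (L_model_forward, L_model_backward, update_parameters) exist in scope
# with signatures similar to the assignment's.

def train(X, Y, parameters, learning_rate=0.0075, num_iterations=2500):
    for i in range(num_iterations):
        AL, caches = L_model_forward(X, parameters)    # 1. forward prop on all samples
        grads = L_model_backward(AL, Y, caches)        # 2. backward prop on all samples
        parameters = update_parameters(parameters,     # 3. apply the parameter updates
                                       grads, learning_rate)
    return parameters
```

Prediction, by contrast, is just step 1 run a single time, with no gradients and no updates.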
So when a model is ready for a real-world application, would only one input ("example") be fed into the model at one time, or would a batch of inputs be fed into the model? Also, do you know what the typical hardware requirements are for one of these runs or "forward propagation" iterations?
Are models retrained when their usefulness in real-world applications somehow declines, or if they need to be adjusted to be applied in a different environment, or say the environment changes? Or is it more that once you train a model, it is set and you release it for a particular use case?
I hope my questions are clear and make sense. I'm just curious about the practical applications of AI in the real world and trying to understand the concepts. Thank you for your help.
Either way - the number of examples doesn't really matter. It depends on how many predictions you want to make.
Yes.
If your training data is no longer valid, or if you obtain more training data, you would re-train the model.
Thanks for getting back to me so quickly. I have several more questions.
Do engineers retrain models in real time for certain applications, say, for example, if new data comes in? Or is this infeasible?
If a model is retrained on new data, is this called "personalization" in some use cases?
And finally, would it be infeasible for a model to be retrained if you only acquired a few more examples versus a large number of examples?
Exactly. You're just invoking the predict function with an input matrix X, which has dimensions n_x x m, where n_x is the number of features in each sample and m is the number of samples. m can be 1 or any other number bigger than that. So the code is exactly the same either way and it just depends on what your goal is.
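In other words, a single sample is just the m = 1 case. A small illustration of the shapes, with made-up numbers (e.g. 64 x 64 x 3 flattened images):

```python
import numpy as np

n_x, m = 12288, 50                 # e.g. 64 * 64 * 3 flattened pixels, 50 samples
X_batch = np.random.rand(n_x, m)   # m samples stacked as columns: shape (n_x, m)
X_single = X_batch[:, 0:1]         # one sample, still 2-D: shape (n_x, 1)

# Either one can be passed to predict(); you get one prediction per column,
# so shape (1, m) for the batch and (1, 1) for the single sample.
```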
Albert, I hope you did as I suggested and went back and read the code for the predict function.
There are ways to do retraining "on the fly", but I have not seen those covered in any of the courses here. Maybe the MLOps specialization would discuss some of the topics there, but I have not taken that and it is currently offline to be revised.
Maybe we will get lucky and someone else who is familiar with some of those techniques will notice this thread and chime in.
But in general, if you have a large initial training set and only a few more examples that don't work well with the existing model, you can try doing the retraining with the augmented training set. What happens will depend on the numbers and how different the new examples are. I don't know of a way to predict in general whether that would make a difference or not. Like many things in ML, the answer is that you have to try it and see what happens.
I didn't see a predict function when I went back and looked over the Building_your_Deep_Neural_Network_Step_by_Step lab.
I did see Figure 1 with the "predict" at the end of the diagram, but couldn't find the function.
Never mind. I found the predict function in the files in the Deep Neural Network-Application lab.
In this assignment, the predictions are computed by L_model_forward().
You can see that in the docstring.
More typically there would be a separate "predict()" function.
Oh, sorry, you're right: it's just given to us in the Week 4 Application exercise. I was thinking of Week 2 (Logistic Regression) and Week 3 (Planar Data), where we wrote a predict function in both those cases. The actual function definitions are slightly different in the 3 cases, but the fundamental behavior is the same: it just runs forward propagation for whatever the model is and then converts the sigmoid probability outputs to 0 or 1.
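As a rough sketch of what such a predict function reduces to (assuming L_model_forward from the Step-by-Step assignment is in scope; this is illustrative, not the official course code):

```python
def predict(X, parameters):
    # One forward pass over all columns of X; no backprop, no updates.
    AL, _ = L_model_forward(X, parameters)   # AL: sigmoid outputs, shape (1, m)
    # Threshold the probabilities at 0.5 to get hard 0/1 predictions.
    return (AL > 0.5).astype(int)
```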
@siral they don't let me play with Enterprise-grade toys yet, so I won't proclaim yet to know all the ins and outs of how all this works, but "on the fly" retraining is more commonly known as "online learning" or "continual learning".
Though much of the discussion is on online inferencing, Chip Huyen (a former student of Prof. Ng, I believe) has a chapter on this (Ch 9 - Continual Learning and Test in Production) in her excellent text Designing Machine Learning Systems.
In addition, she has set up a general discussion Discord to go along with the text. There are some really smart people over there, but one in particular comes to mind, a user that goes by the handle "MattrixOperations".
I think if you asked nicely and respectfully (i.e. not pestering), they would be able to answer/address all your related questions.