Hello -

This is a two-part question. The first part is more of a confirmation of what I think is happening (C2, W3); the second is about how to extend that process to the case where there are multiple hyper-parameters to be tuned. Any thoughts would be hugely helpful. I’ve put together some pieces not explicitly stated in the class and just want to make sure I’m thinking about things correctly.

**Part 1: Confirmation of the process with a single tuning parameter**

In the Class 2, Week 3 assignment, in section “7 – Iterate to find optimal regularization value”, there is a for loop where the models[i] models are created and fit with the .fit(…) function, using a different lambda on each iteration. This gives a different set of learned weights for each lambda. Those learned weights are then used (in the plot_iterate(…) function) to get predictions for both the training data and the cross-validation data. Finally, the err_train[i] and err_cv[i] values from the plot_iterate(…) function are used to make the learning-curve plot.
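To check my understanding of that loop, here is a minimal self-contained sketch of the same logic in plain numpy, using closed-form ridge regression as a stand-in for the assignment's neural network (the names `lambdas`, `err_train`, `err_cv`, and `weights` are my own, not necessarily the assignment's):

```python
import numpy as np

# Synthetic data standing in for the assignment's train / cross-validation sets.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 5))
w_true = rng.normal(size=5)
y_train = X_train @ w_true + rng.normal(scale=0.5, size=80)
X_cv = rng.normal(size=(40, 5))
y_cv = X_cv @ w_true + rng.normal(scale=0.5, size=40)

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0]    # candidate regularization values
err_train, err_cv = [], []
weights = []                               # one set of learned weights per lambda

for lam in lambdas:
    # "fit": closed-form ridge solution, different weights for each lambda
    w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(5),
                        X_train.T @ y_train)
    weights.append(w)
    # "predict" on both sets and record the two errors (what plot_iterate plots)
    err_train.append(np.mean((X_train @ w - y_train) ** 2))
    err_cv.append(np.mean((X_cv @ w - y_cv) ** 2))

best_i = int(np.argmin(err_cv))            # lambda with the lowest CV error
print("chosen lambda:", lambdas[best_i])
```

The key point the sketch is meant to capture: each lambda produces its own weights, and the train/cv errors are both computed from those same per-lambda weights.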

**Questions:**

**1.)** That all makes sense to me. If there is something incorrect with that process/logic, could someone please point it out?

**2.)** The next step after creating the learning curve isn’t explicitly shown, but it seems pretty important. I assume we would use judgement to pick the best lambda, such as a lambda of 0.01 in this case (models[2]). Then we take models[2], which holds both the learned weights and that lambda of 0.01, and run its .predict(…) method on the **test** data, via

```python
probs = tf.nn.softmax(models[2].predict(X_test)).numpy()
```

This could be followed by calculating the error on the test data (using y_test), and that would be considered our final error estimate. **Is that next step / logic that I just described the correct way to go about it?**
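For the error-calculation step, here is a tiny numpy sketch (the `probs` and `y_test` values are made up purely for illustration), computing the classification error as the fraction of misclassified test examples:

```python
import numpy as np

# Hypothetical softmax probabilities for 4 test examples over 3 classes,
# i.e. the shape of what tf.nn.softmax(model.predict(X_test)) would return.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.6, 0.3, 0.1]])
y_test = np.array([0, 1, 2, 0])      # true class labels

yhat = np.argmax(probs, axis=1)      # predicted class = highest probability
test_err = np.mean(yhat != y_test)   # fraction of examples misclassified
print(test_err)                      # 0.25 here: one of the four is wrong
```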

**Part 2: How to deal with multiple hyper-parameters**

The great thing about the C2, W3 assignment is that since there is only one hyper-parameter, it was easy to know which learned weights to use: the ones learned when lambda was equal to the value we chose (0.01).

**Question:**

**1.)** But what if you are tuning lambda **and** alpha (the learning rate)? Here is my thought; can someone please let me know if anything is wrong with this thinking? Make a learning-curve plot of the training and cv errors for each hyper-parameter being varied, one for lambda and one for alpha (2 plots, 2 curves on each plot). Suppose from those plots we chose a lambda of 0.01 and an alpha of 0.001 (for example); the learned weights for those two models would be different from each other, since they are two different models. Would it then be correct to create one additional model using the chosen lambda and alpha values and call .fit(…) on the training data, which would give you new learned weights? The last step would be to do a .predict(…) with this new model on the **test** data (you could do it on the cv data too, just to see, I guess), then calculate the error on the test data. **Is that the correct way to go, am I missing anything?**
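To make the refit-and-evaluate step I described concrete, here is a hedged sketch in plain numpy, using gradient descent on ridge-regularized linear regression as a stand-in for the assignment's TensorFlow model (`chosen_lambda` and `chosen_alpha` are assumed to have come from the two plots; the data here is synthetic):

```python
import numpy as np

# Synthetic data standing in for the training and test sets.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y_train = X_train @ w_true + rng.normal(scale=0.1, size=100)
X_test = rng.normal(size=(50, 4))
y_test = X_test @ w_true + rng.normal(scale=0.1, size=50)

chosen_lambda = 0.01   # assumed: picked from the lambda plot
chosen_alpha = 0.001   # assumed: picked from the alpha plot

# ".fit": one fresh model trained with BOTH chosen hyper-parameters together,
# producing new learned weights (gradient descent on the regularized MSE).
w = np.zeros(4)
for _ in range(20000):
    grad = (X_train.T @ (X_train @ w - y_train)) / len(y_train) + chosen_lambda * w
    w -= chosen_alpha * grad

# ".predict" with the new model on the test data, then the final test error.
test_err = np.mean((X_test @ w - y_test) ** 2)
print("test MSE:", test_err)
```

The point of the sketch: the final weights come from a single fit that uses both chosen values at once, not from either of the two tuning-time models.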

Thanks once again!