I got some mixed messages from week 2’s videos:
In Basic Recipe for Machine Learning, the recipe for improving a deep learning model with low bias and high variance includes trying out regularization. And Andrew also mentioned that unlike traditional machine learning methods, larger networks in deep learning don’t face the bias-variance tradeoff.
==> message 1: a larger network reduces bias but doesn’t increase variance
However, in a later video explaining how regularization helps prevent overfitting, I think it said that regularization makes w small and reduces the effect of individual neurons, which acts like training a simpler network.
==> message 2: a simple (small) network reduces variance, and a complex (large) network increases variance
I imagine the answer has something to do with the fact that we aren’t actually training a smaller network, or that simple isn’t necessarily small. But I need help clearing this up.
Another question is how does increasing the size of the network reduce bias and not increase variance? Wouldn’t more neurons and layers give the network more freedom to make individual decisions for the samples it was trained on?
I’m not sure about the specific context in which Andrew made the first point (I haven’t looked at the Course 2 content in some time), but I don’t think conclusion 1 is what he meant. In fact, his book “Machine Learning Yearning” says:
“Increasing the model size generally reduces bias, but it might also increase variance and the risk of overfitting. However, this overfitting problem usually arises only when you are not using regularization. If you include a well-designed regularization method, then you can usually safely increase the size of the model without increasing overfitting.”
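To make the quote concrete, here is a minimal sketch of the "regularization makes w small" effect. It is not a neural network: it uses a degree-9 polynomial fitted by least squares as a stand-in for a "large" model, and the data, the `lam` value, and the helper names (`poly_features`, `fit_l2`, `mse`) are all made up for illustration. The L2 penalty here also shrinks the constant term, which you would normally leave unpenalized.

```python
import numpy as np

def poly_features(x, degree):
    # Columns are x**0, x**1, ..., x**degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_l2(X, y, lam):
    # Minimize ||X w - y||^2 + lam * ||w||^2 by solving an augmented
    # least-squares problem; lam = 0 gives the unregularized fit.
    n_features = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n_features)])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (an assumption for this sketch): noisy samples of sin(3x).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 20)
x_dev = rng.uniform(-1, 1, 200)
y_train = np.sin(3 * x_train) + 0.3 * rng.normal(size=x_train.shape)
y_dev = np.sin(3 * x_dev) + 0.3 * rng.normal(size=x_dev.shape)

degree = 9  # the "large" model: 10 parameters for 20 training points
X_train = poly_features(x_train, degree)
X_dev = poly_features(x_dev, degree)

for lam in (0.0, 1e-2):
    w = fit_l2(X_train, y_train, lam)
    print(f"lam={lam:g}  ||w||={np.linalg.norm(w):.2f}  "
          f"train MSE={mse(X_train, y_train, w):.3f}  "
          f"dev MSE={mse(X_dev, y_dev, w):.3f}")
```

Two things are guaranteed by the math: the regularized weights have a smaller norm, and the regularized training error is no lower than the unregularized one. Whether the dev error improves depends on the data and on `lam`, which is why the model size stays the same but the fit behaves like a simpler one.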
In the book Andrew explains in more detail techniques to reduce bias and variance. I don’t think I’m allowed to post it here although he made available a draft version of the book online. In any case, the summary is as follows:
Techniques for reducing avoidable bias
- Increase the model size
- Modify input features based on insights from error analysis
- Reduce or eliminate regularization
- Modify model architecture
Techniques for reducing variance
- Add more training data
- Add regularization
- Add early stopping
- Feature selection to decrease number / type of input features
- Decrease the model size
Plus techniques #2 and #4 from the bias-reduction list (modifying the input features and modifying the model architecture).
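As a rough sketch, the two lists above could be turned into a small lookup. The function name `suggest_techniques`, the `threshold` of 0.02, and the default `optimal_err` are my own placeholders, and I am assuming avoidable bias is measured as training error minus the best achievable error and variance as dev error minus training error.

```python
BIAS_FIXES = [
    "Increase the model size",
    "Modify input features based on insights from error analysis",
    "Reduce or eliminate regularization",
    "Modify model architecture",
]

VARIANCE_FIXES = [
    "Add more training data",
    "Add regularization",
    "Add early stopping",
    "Feature selection to decrease number / type of input features",
    "Decrease the model size",
    # Techniques #2 and #4 from the bias list also apply to variance.
    "Modify input features based on insights from error analysis",
    "Modify model architecture",
]

def suggest_techniques(train_err, dev_err, optimal_err=0.0, threshold=0.02):
    """Return the technique lists that match the observed error gaps.

    threshold is a hypothetical cutoff for what counts as a gap worth fixing.
    """
    avoidable_bias = train_err - optimal_err
    variance = dev_err - train_err
    suggestions = {}
    if avoidable_bias > threshold:
        suggestions["avoidable bias"] = list(BIAS_FIXES)
    if variance > threshold:
        suggestions["variance"] = list(VARIANCE_FIXES)
    return suggestions

# Low training error but a large train/dev gap: a variance problem.
print(suggest_techniques(train_err=0.01, dev_err=0.11))
```

A model with 15% training error and 16% dev error would instead get only the bias list back, and a model with both gaps gets both.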
I hope this helps.
Yes, this makes a lot more sense. Thanks! But I have a new question regarding your post. Can you please explain how decreasing the number of input features would help reduce variance?
Hi @Sara, Andrew’s explanation is surely better than mine:
“Feature selection to decrease number/type of input features: This technique might help with variance problems, but it might also increase bias. Reducing the number of features slightly (say going from 1,000 features to 900) is unlikely to have a huge effect. Reducing it significantly (say going from 1,000 features to 100, a 10x reduction) is more likely to have a significant effect, so long as you are not excluding too many useful features. In modern deep learning, when data is plentiful, there has been a shift away from feature selection, and we are now more likely to give all the features we have to the algorithm and let the algorithm sort out which ones to use based on the data. But when your training set is small, feature selection can be very useful.”
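If it helps, here is a small simulation of the variance effect, using plain linear regression rather than a deep network. Everything in it is an assumption for illustration: the synthetic data, the fact that only the first 3 of 25 features carry signal, and the helper name `prediction_variance`. It refits the model on many small training sets and measures how much the prediction at one fixed test point moves around.

```python
import numpy as np

def prediction_variance(n_features_used, n_train=40, n_trials=200, seed=0):
    # Only the first 3 of 25 features carry signal; the rest are pure noise.
    rng = np.random.default_rng(seed)
    n_total, n_useful = 25, 3
    w_true = np.zeros(n_total)
    w_true[:n_useful] = 1.0
    x_test = rng.normal(size=n_total)  # one fixed test point

    preds = []
    for _ in range(n_trials):
        # Draw a fresh small training set each trial.
        X = rng.normal(size=(n_train, n_total))
        y = X @ w_true + rng.normal(size=n_train)
        # Fit ordinary least squares on the first n_features_used columns.
        X_sel = X[:, :n_features_used]
        w, *_ = np.linalg.lstsq(X_sel, y, rcond=None)
        preds.append(x_test[:n_features_used] @ w)
    # Spread of the prediction across training sets = variance of the model.
    return float(np.var(preds))

var_all = prediction_variance(n_features_used=25)
var_selected = prediction_variance(n_features_used=3)
print(f"variance with all 25 features: {var_all:.3f}")
print(f"variance with the 3 useful features: {var_selected:.3f}")
```

With only 40 training examples, the 25-feature model has many more weights to estimate from the same data, so its predictions swing more from one training set to the next. The sketch cheats by knowing which features are useful; dropping a useful one would reduce variance too, but at the cost of extra bias, which is the caveat in the quote.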