Can someone explain in another way what Prof. Ng is describing at 4:09 in his video lesson?
“In the more general case, we’ll actually try not just three values, but multiple values along the X axis. And one convention would be to sort all of the examples according to the weight or according to the value of this feature and take all the values that are mid points between the sorted list of training examples as the values for consideration for this threshold over here. This way, if you have 10 training examples, you will test nine different possible values for this threshold and then try to pick the one that gives you the highest information gain.”
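To make the question concrete, here is my rough reading of the quoted procedure as a minimal Python sketch. The function names, the weights, and the 0/1 labels are all made up for illustration (they are not from the lecture), and I am assuming binary labels, distinct feature values, and the rule "go left if the value is at most the threshold".

```python
import math


def entropy(labels):
    """Entropy in bits of a list of 0/1 labels (0.0 for an empty or pure list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def candidate_thresholds(values):
    """Midpoints between consecutive sorted values: n values give n - 1 candidates."""
    s = sorted(values)
    return [(a + b) / 2 for a, b in zip(s, s[1:])]


def information_gain(values, labels, threshold):
    """Entropy of all labels minus the size-weighted entropy of the two branches."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Candidate midpoint with the highest information gain."""
    return max(
        candidate_thresholds(values),
        key=lambda t: information_gain(values, labels, t),
    )


# Hypothetical data: 10 weights with made-up labels (1 = cat, 0 = not cat).
weights = [7.2, 8.4, 9.2, 10.2, 11.0, 13.0, 15.0, 17.0, 18.0, 20.0]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(len(candidate_thresholds(weights)))  # 9 candidates for 10 examples
print(best_threshold(weights, labels))     # 12.0, the midpoint of 11.0 and 13.0
```

If that is right, then the "nine different possible values" are just the nine gaps between ten sorted examples, and each gap contributes its midpoint as one candidate threshold.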
Thanks.