I am curious whether the basic logistic regression and linear regression fits are hiding somewhere inside the decision tree algorithm. In particular, I wonder about the following two cases:
Consider a decision tree with one continuous feature and a binary output. Take a tree with only a single node, so there is a single threshold used to classify the data. If we instead ran logistic regression on the same problem and classified using a threshold of predicted probability > 0.5, would that produce the same classifier? I am wondering this because the logistic cost function looks a lot like the formula for information gain.
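One way to probe the first case is to compare the two thresholds numerically. The sketch below uses made-up 1-D data (my own example, not from any dataset): it finds the entropy-maximizing stump split by brute force, then fits logistic regression with a hand-rolled gradient descent and solves for the point where the predicted probability crosses 0.5 (i.e. where wx + b = 0).

```python
import math

# Hypothetical 1-D data: feature values (sorted) and binary labels with some overlap
X = [0.5, 1.0, 1.5, 2.0, 2.6, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0,   0,   0,   1,   0,   0,   1,   1,   1,   1]

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Decision stump: pick the midpoint threshold with the largest information gain
best_gain, best_t = -1.0, None
for i in range(len(X) - 1):
    t = (X[i] + X[i + 1]) / 2
    left = [yi for xi, yi in zip(X, y) if xi <= t]
    right = [yi for xi, yi in zip(X, y) if xi > t]
    gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    if gain > best_gain:
        best_gain, best_t = gain, t

# Logistic regression via plain gradient descent on the log-loss
w, b = 0.0, 0.0
for _ in range(20000):
    gw = gb = 0.0
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-(w * xi + b)))
        gw += (p - yi) * xi
        gb += (p - yi)
    w -= 0.1 * gw / len(X)
    b -= 0.1 * gb / len(X)

logit_t = -b / w  # feature value at which the predicted probability crosses 0.5

print("stump threshold:", best_t)
print("logistic 0.5 threshold:", logit_t)
```

On data like this the two thresholds need not coincide: the stump cares only about class counts on each side of the split, while the logistic fit is also pulled by how far each point sits from the boundary.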
Consider a decision tree with a single binary feature and a continuous output variable. Take a tree with a single node (so the predicted output can take only two possible values). If we instead ran linear regression on the same problem, would that produce the same predictor? (Because the input is either 0 or 1, there are again only two possible predictions.) I am wondering this because the decision tree's cost function is based on the variance, which looks very similar to the mean squared error.
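The second case can also be checked directly with a small example (made-up numbers, my own illustration): fit ordinary least squares by hand on a binary feature, and compare its two possible predictions with the leaf means a depth-1 regression tree would produce.

```python
# Hypothetical data: binary feature x, continuous target y
x = [0, 0, 0, 1, 1, 1]
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# OLS by hand: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
n = len(x)
mx = sum(x) / n
my = sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

# A depth-1 regression tree splitting on x predicts the mean of y in each leaf
mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

print("OLS predictions: ", intercept, intercept + slope)
print("tree leaf means: ", mean0, mean1)
```

On this data the OLS predictions for x = 0 and x = 1 come out equal to the two leaf means, which is what I would expect if minimizing within-leaf variance and minimizing squared error amount to the same thing for a single binary feature.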
Thank you for your time and any comments.