Decision trees with 1 node and 1 feature: do they reduce to linear regression or logistic regression?

Hi all!

I am curious whether the basic logistic regression and linear regression fits are hiding somewhere in the decision tree algorithm. In particular, I wonder if:

  1. Consider a decision tree with one continuous feature and a binary output. Take a tree with only a single node, so there is a single threshold for classifying the data. If we ran logistic regression on the same problem and classified by thresholding the predicted probability at 0.5, would that yield the same classifier? I am wondering this because the logistic cost function looks a lot like the information gain formula.

  2. Consider a decision tree with a single binary feature and a continuous output variable. Take a tree with a single node (so the predicted output can only take two possible values). If we ran linear regression on the same problem, would that yield the same predictor? (Because the input is either 0 or 1, there are again only two possible predictions.) I am wondering this because the decision tree's cost function is based on the variance, which looks very similar to the mean squared error.
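Idea 2 can actually be checked with a few lines of arithmetic: a depth-1 regression tree splitting on a binary feature predicts the per-group mean of the target (that's what minimizes the squared error), and an ordinary least-squares line fitted to a 0/1 feature passes through those same two group means. A minimal sketch, with made-up toy data:

```python
import numpy as np

# Hypothetical toy data: binary feature x, continuous target y.
x = np.array([0, 0, 0, 1, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0])

# A depth-1 regression tree that splits on x predicts the mean of y
# within each group (the mean minimizes the squared error / variance).
tree_pred = {0: y[x == 0].mean(), 1: y[x == 1].mean()}

# Ordinary least squares y = b*x + a on a single binary feature:
# the fitted line also passes through the two group means.
slope, intercept = np.polyfit(x, y, 1)
ols_pred = {0: intercept, 1: slope + intercept}

for v in (0, 1):
    assert np.isclose(tree_pred[v], ols_pred[v])
print(tree_pred)  # both models make the same two predictions
```

So for a single binary feature the two methods coincide exactly; the interesting question is whether that intuition survives with more features or deeper trees.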

Thank you for your time and any comments.


Hello @shostakovich,

These are interesting thoughts! I suggest starting your investigation with the following two steps:

  1. Write down, by hand, a simple dataset whose optimized models would exemplify your ideas.

  2. Actually train some models with sklearn on that same dataset.

Then we can compare them and see if we can figure something out. Perhaps you might focus on only one of the two ideas first.
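As a starting point for idea 1, the two steps above might look something like this: a tiny hand-made 1-D dataset (the numbers are made up), a depth-1 `DecisionTreeClassifier`, and a `LogisticRegression`, comparing the decision thresholds each one learns. This assumes scikit-learn is installed:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one continuous feature, binary label.
X = np.array([[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=1).fit(X, y)
logit = LogisticRegression().fit(X, y)

# The stump's threshold is the single split point it learned.
tree_threshold = tree.tree_.threshold[0]

# Logistic regression predicts 1 where w*x + b > 0, so its implied
# classification threshold (probability 0.5) is at x = -b / w.
logit_threshold = -logit.intercept_[0] / logit.coef_[0, 0]

print(tree_threshold, logit_threshold)

# On this separable dataset both classifiers label the points identically,
# even though the two thresholds need not coincide on messier data.
assert (tree.predict(X) == logit.predict(X)).all()
```

Comparing `tree_threshold` and `logit_threshold` on datasets that are *not* cleanly separable is where the two objectives (information gain vs. log loss) should start to diverge.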