Question about Decision trees lab

Hi everybody, first post here!
While doing the Decision Trees optional labs, I stumbled into a “Division by zero” error while solving exercise 3.

I quickly discovered that the problem was one of the split subsets being empty, which made the entropy function crash.
In fact, in my code I pre-calculate p1 like this:

   p1 = count/len(y)

where count is the number of ones in y. If the length of y is zero, this line (obviously) crashes.
I solved the problem (and passed the test) by just adding:

   if len(y) == 0:
       return 0

But I would like to know if I am doing something weird or stupid here - maybe the patch is just masking a mistake made elsewhere.

Thanks!

Hi @kalbun, welcome to the community!

It makes sense to return 0 when the length of y is 0, as it handles the entropy calculation for empty subsets gracefully. In decision trees, it's common to encounter scenarios where a split results in an empty subset, which inherently has no entropy because there are no data points to contribute any uncertainty. In this respect, your approach is not strange or wrong; it's a valid safeguard, and many practical decision-tree implementations include exactly this kind of guard.
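To make the guard concrete, here is a minimal sketch of how a binary-entropy helper might look with that check in place (the function name and binary 0/1 label setup are assumptions based on the lab, not the official solution). Note that a second, closely related guard is also needed: a pure subset (p1 equal to 0 or 1) would otherwise pass 0 into log2.

```python
import numpy as np

def compute_entropy(y):
    """Entropy of a binary (0/1) label array y.

    An empty subset carries no uncertainty, so it gets entropy 0.
    A pure subset (all zeros or all ones) also gets 0, which avoids
    evaluating log2(0).
    """
    y = np.asarray(y)
    if len(y) == 0:               # empty split: no data, no entropy
        return 0.0
    p1 = np.sum(y == 1) / len(y)  # fraction of positive labels
    if p1 == 0 or p1 == 1:        # pure subset: no uncertainty
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
```

With both guards, the mixed case still behaves as expected (e.g. a 50/50 split gives entropy 1), while empty and pure subsets return 0 instead of crashing.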

Hope this helps!
