Example of encoding nonlinearity using feature crossing

I have read in several sources that feature crossing can help turn a nonlinear problem into a linear one.

Since it is also explained in this course, can you please provide an example of it?

Hi @kshashankrao

Here you can find an explanation that I can recommend:

Source: Feature Crosses: Encoding Nonlinearity  |  Machine Learning  |  Google Developers

In this example you see a nonlinear problem that cannot be separated with a linear classifier on the shown dimensions x_1 and x_2 alone. By crafting a new feature from them (also called feature engineering), namely x_3 = x_1 x_2, you make the two classes separable.

In a simplified view of this specific example, assuming symmetry about the axes, the following applies to our new feature:

  • negative * negative turns into positive
  • positive * positive turns into positive
  • negative * positive turns into negative
  • positive * negative turns into negative

All blue dots get a positive x_3.
All orange dots get a negative x_3.

With x_3 you build a feature cross by multiplying two existing features. The model can then learn a weight for x_3. Because x_3 encodes the nonlinear information, the problem becomes solvable with a linear classifier.
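To make this concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration and not taken from the linked article: the small grid data set, the helper names (make_xor_grid, train_perceptron, accuracy) and the choice of a simple perceptron as the linear classifier. It trains the same linear model once on (x_1, x_2) and once on (x_1, x_2, x_3).

```python
from itertools import product


def make_xor_grid():
    """Hypothetical toy data: grid points in all four quadrants.

    Label +1 ("blue") if x1 and x2 share a sign, else -1 ("orange").
    """
    values = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
    points = list(product(values, values))
    labels = [1 if (x1 > 0) == (x2 > 0) else -1 for x1, x2 in points]
    return points, labels


def train_perceptron(rows, labels, epochs=300):
    """Classic perceptron: a purely linear classifier with a bias term."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for row, y in zip(rows, labels):
            score = sum(w * v for w, v in zip(weights, row)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * v for w, v in zip(weights, row)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every point is classified correctly
            break
    return weights, bias


def accuracy(rows, labels, weights, bias):
    """Fraction of points whose predicted sign matches the label."""
    correct = 0
    for row, y in zip(rows, labels):
        score = sum(w * v for w, v in zip(weights, row)) + bias
        prediction = 1 if score > 0 else -1
        correct += prediction == y
    return correct / len(rows)


points, labels = make_xor_grid()

# Without the feature cross: only x1 and x2.
plain = [[x1, x2] for x1, x2 in points]
# With the feature cross: x3 = x1 * x2 as an additional explicit feature.
crossed = [[x1, x2, x1 * x2] for x1, x2 in points]

acc_plain = accuracy(plain, labels, *train_perceptron(plain, labels))
acc_crossed = accuracy(crossed, labels, *train_perceptron(crossed, labels))

print("accuracy without x3:", acc_plain)
print("accuracy with x3:   ", acc_crossed)
```

On this toy data the model without x_3 cannot classify every point, because no straight line separates the four quadrants in this arrangement. With x_3 added, the perceptron ends up classifying all points correctly, since the sign of x_3 alone already separates the classes.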

Here you find another example of how to transform data to get rid of nonlinearity in the data space: Can we start with the circle equation as decision boundary? - #12 by Christian_Simonis

Best regards


Nice, this gave me an idea of how it works.

I was looking into SVMs, and one method to tackle nonlinear classification is the kernel trick, i.e. transforming the data using a Gaussian kernel.

Does this method count as feature crossing?

Glad to hear that.

Feature crossing means using existing features to derive at least one new explicit feature.

Therefore, the kernel trick is not feature crossing, since it utilises:

kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space.


So no new feature is calculated explicitly with the kernel trick, which is described here. Feel free to take a look: Can we start with the circle equation as decision boundary? - #14 by Christian_Simonis

Please let me know if your question is answered, @kshashankrao, or if anything is still open from your end.
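To illustrate the difference, here is a small sketch in plain Python. The function names and the two sample points are made up for illustration. Besides the Gaussian kernel from the question, I use a degree-2 polynomial kernel as an assumption of mine, because its feature space is small enough to write down explicitly (the feature space of the Gaussian kernel is infinite-dimensional, so it cannot be listed).

```python
import math


def gaussian_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel, computed directly from the original inputs."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def poly2_kernel(a, b):
    """Degree-2 polynomial kernel (a . b)^2; no new features are built."""
    return sum(ai * bi for ai, bi in zip(a, b)) ** 2


def poly2_features(a):
    """Explicit feature map that poly2_kernel corresponds to (2D inputs only).

    Note the middle entry: it is the feature cross x1 * x2, up to a constant.
    """
    x1, x2 = a
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


a = (1.0, 2.0)
b = (3.0, -1.0)

# Kernel trick: similarity in the feature space, without computing its coordinates.
implicit = poly2_kernel(a, b)

# Explicit route: build the new features first, then take their dot product.
explicit = sum(u * v for u, v in zip(poly2_features(a), poly2_features(b)))

print("implicit (kernel):     ", implicit)
print("explicit (feature map):", explicit)
print("Gaussian kernel value: ", gaussian_kernel(a, b))
```

Both routes give the same number up to floating-point rounding. The difference is that poly2_features creates explicit new features, which is what feature crossing does, whereas the kernel functions only ever work on the original x_1 and x_2.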

Best regards


Thanks a lot for clarifying it
