Suppose we are given a dataset of house prices and sizes. Then we are given a new house size and asked to predict the price. One way we can do this is with linear regression. But I have learnt (elsewhere) that more complex regressions exist, e.g. polynomial regression. How do we decide whether we should fit a line to our data or a polynomial curve?
My own hunch is that one way would be to have some theoretical reason. If we had a theoretical reason to believe that the relationship between house size and price should be linear, that would be sufficient reason to choose a linear model. In this case it is hard to think of such a theoretical model, but in the natural sciences one could justify a linear model on theoretical grounds. Is this a good way to select a model? And, when such reasoning is available, is it the best way to select a model?
Feature selection/engineering is an art. There are guidelines and recommendations, but (at least so far) there is no reliable way to find the best model for every case. It is usually up to the AI engineer to try different combinations and choose the one that works best.
As an AI engineer, it is a good idea to have a working hypothesis about the type of relationship (linear vs. polynomial) between a feature and the output when feature engineering.
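To make "try different combinations" concrete, here is a minimal sketch that fits both a line and a cubic to the same (entirely made-up) size/price data. It also shows why training error alone cannot settle the choice: the higher-degree fit can never do worse on the data it was fit to.

```python
import numpy as np

# Hypothetical data: house sizes (m^2) and prices (in thousands).
# Both arrays are made up for illustration.
sizes = np.array([50, 70, 90, 110, 130, 150], dtype=float)
prices = np.array([150, 200, 240, 275, 305, 330], dtype=float)

# Fit a line (degree 1) and a cubic (degree 3) to the same data.
linear = np.polyfit(sizes, prices, deg=1)
cubic = np.polyfit(sizes, prices, deg=3)

# Training error shrinks (or stays equal) as the degree grows,
# so it alone cannot tell us which model to pick.
for name, coeffs in [("linear", linear), ("cubic", cubic)]:
    pred = np.polyval(coeffs, sizes)
    mse = np.mean((prices - pred) ** 2)
    print(name, round(mse, 3))
```

The cubic will report the lower (or equal) training MSE here by construction; deciding between the two models requires held-out data or cross-validation, not the training fit.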
What I think is that you should first study the relationships among the features of the data. This includes examining how they are correlated with each other, creating graphs to visualize them, and using other techniques to determine the nature of each relationship (linear, polynomial, etc.). This will help you identify the relevant features and decide on an appropriate type of model.
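A minimal sketch of that first step, assuming a pandas DataFrame with made-up columns (`size`, `bedrooms`, `price`): compute pairwise correlations, then plot each feature against the target to see whether the relationship looks straight or curved.

```python
import pandas as pd

# Hypothetical dataset; values and column names are invented for illustration.
df = pd.DataFrame({
    "size": [50, 70, 90, 110, 130],
    "bedrooms": [1, 2, 2, 3, 4],
    "price": [150, 200, 240, 275, 305],
})

# Pairwise Pearson correlations between all numeric columns.
print(df.corr())

# A scatter plot of each feature against the target helps reveal whether
# the relationship looks linear or curved (requires matplotlib):
# df.plot.scatter(x="size", y="price")
```

Note that Pearson correlation only measures linear association; a strongly curved relationship can still show a high correlation, which is why the scatter plots matter as much as the numbers.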
Data Exploration: Plot the relationship between size and price to spot possible trends.
Model Comparison: Evaluate models with metrics such as R-squared and mean squared error. Cross-validation is essential for assessing generalizability.
Model Complexity: Strike a balance between model fit and simplicity. Overly complicated models may overfit and perform poorly on new data.
Together, these steps help you find the model that best predicts house prices from size, giving you both accuracy and simplicity.
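The steps above can be sketched with scikit-learn: fit polynomial models of increasing degree and compare their cross-validated mean squared error. The data here is synthetic (a roughly linear size/price relationship with noise), so the numbers are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: house sizes in m^2 and prices that are roughly
# linear in size plus noise. Entirely made up for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(40, 200, size=(60, 1))
y = 1.5 * X[:, 0] + rng.normal(0, 10, size=60)

# Compare polynomial degrees by cross-validated MSE; the degree with
# the lowest held-out error balances fit against complexity.
for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print(degree, round(-scores.mean(), 2))
```

On data generated this way, degree 1 will typically have the lowest cross-validated error, matching the intuition that extra polynomial terms only fit the noise.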
Suppose (for the sake of simplicity) that you try to fit a univariate model. You would have only one predictor, so there would be no benefit in looking at correlations between variables. How would you decide whether to use a linear or a polynomial model in such a case?