Statistical tests on linear regression and other ML algorithms

Hi, first of all, thanks for your great passion in teaching this course.
I have some questions about statistical tests for regression, based on statistical references (e.g., Discovering Statistics by Andy Field).
We have statistical tests on regression such as the F-change test, the F-test, and the t-test, which are used, for example, to check whether the explained variance could have arisen by chance, or whether the coefficients (w_i) significantly improve the prediction of the response variable. We should also check the regression assumptions before training on the data.
Why aren't these things, especially statistical tests, considered when creating a model?
Are statistical tests actually used in real-world projects and in reputable academic papers? Are they a must?

Hi @Amir_hosein

Welcome to the community

Statistical tests like F-tests and t-tests are fundamental elements in the analysis of regression models. They help assess the significance of coefficients, validate assumptions, and ensure the reliability of model results. However, their role in model creation varies depending on the specific project goals.
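
For example, here is a minimal sketch (using statsmodels with randomly generated data, purely for illustration) of where those tests show up: the OLS summary reports a t-test for each coefficient and an overall F-test for the regression.

```python
# Sketch only: fit OLS with statsmodels and read the coefficient t-tests
# and the overall F-test from the fitted results. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # two predictors
y = 3.0 + 1.5 * X[:, 0] + rng.normal(size=100)   # only the first predictor matters

X_const = sm.add_constant(X)        # add the intercept column
model = sm.OLS(y, X_const).fit()

print(model.summary())              # full table: coefficients, t-stats, p-values
print(model.fvalue, model.f_pvalue) # overall F-test of the regression
print(model.pvalues)                # per-coefficient t-test p-values
```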

In real-world applications, the emphasis is often on developing practical and effective models that provide accurate predictions or insights. While statistical tests are informative, incorporating an extensive array of tests can complicate the modeling process. Striking a balance between complexity and predictive performance is key.

It’s important to note that some assumptions of regression models, like normality of residuals or constant variance, might not hold in actual datasets. Over-relying on these assumptions and tests could hinder the model’s usefulness. Moreover, intricate models that satisfy every test could lead to overfitting, where noise is learned instead of actual patterns.
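
As an illustration (again just a sketch, reusing the fitted `model` from the snippet above), two common assumption checks might look like this; the interpretation of the p-values is the usual one and the data are still synthetic.

```python
# Sketch only: residual normality (Shapiro-Wilk) and constant variance
# (Breusch-Pagan) checks on the statsmodels fit from the previous snippet.
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

residuals = model.resid

# Normality of residuals: a low p-value suggests non-normal residuals.
shapiro_stat, shapiro_p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")

# Homoscedasticity: a low p-value suggests non-constant residual variance.
bp_lm, bp_p, _, _ = het_breuschpagan(residuals, model.model.exog)
print(f"Breusch-Pagan p-value: {bp_p:.3f}")
```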

In certain cases, domain expertise guides the modeling process. Experts may make informed decisions about variables and interpretations, supplementing or even bypassing certain statistical tests. The focus here is on leveraging contextual knowledge to create meaningful models.

In practical projects, the emphasis might shift towards prediction rather than hypothesis testing. Cross-validation and other techniques aimed at optimizing predictive performance take precedence.
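
A short sketch of that prediction-focused workflow (scikit-learn, with the same illustrative `X` and `y` arrays as above): the model is judged by out-of-sample error rather than by significance tests.

```python
# Sketch only: evaluate a linear regression by cross-validated
# predictive performance instead of hypothesis tests.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print("Mean CV MSE:", -scores.mean())
```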

In essence, statistical tests are valuable tools, but their inclusion in model creation depends on the project’s context and objectives. Striking the right balance between statistical rigor, model complexity, and practicality is crucial for building effective regression models.

I hope this helps.

Best regards
elirod


I believe it is because machine learning methods do not require them, though students with a statistics background are often concerned by this.
