[Seeking Guidance] Heart Disease Classification

Debatreyo_Roy · December 15, 2023, 10:34am

I’m getting these metrics after training
a Logistic Regression model,
a Random Forest model (n_estimators=200, max_features="sqrt", bootstrap=True, oob_score=True),
and a NN (3 hidden layers with 64, 32, and 16 units, ReLU; 1 output layer with sigmoid).

Shouldn’t a Random Forest or a NN usually have better accuracy?

Link to the Jupyter Notebook: Machine-Learning/Logistic Regression_Project/Heart Disease Prediction/Heart Disease Prediction_binary.ipynb at c51e365e1c9fe16e632306e742861f6a6ed7abc8 · debatreyo/Machine-Learning · GitHub

Deepti_Prasad · December 15, 2023, 2:13pm

Hello Roy,

After reviewing your notebook, I came across this link about the error you are getting. It mentions that the error is related to bootstrap being True.

Also, regarding max_features: you used sqrt, so try experimenting with the random sub-features (use the link to understand this better).
RF creates S trees and uses m (= sqrt(M) or = floor(ln M + 1)) random sub-features out of the M possible features to create each tree. This is called the random subspace method.
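To make the two rules of thumb concrete, here is a small sketch that computes m for a given M. The helper name `features_per_split` and the value M = 13 are assumptions for illustration only; the logarithm is the natural log, as written above. Note that in scikit-learn, `max_features` is the number of features considered at each split rather than per tree, and to my understanding `"sqrt"` truncates the square root to an integer, which is what the sketch mimics.

```python
import math


def features_per_split(n_features):
    """Return (sqrt rule, log rule) for the number of random sub-features m."""
    # m = sqrt(M), truncated to an integer and never below 1
    sqrt_rule = max(1, int(math.sqrt(n_features)))
    # m = floor(ln M + 1), using the natural log as written in the post
    log_rule = max(1, math.floor(math.log(n_features) + 1))
    return sqrt_rule, log_rule


M = 13  # assumed number of input features, for illustration only
sqrt_rule, log_rule = features_per_split(M)
print(sqrt_rule, log_rule)
```

With so few features the two rules give very similar values, so it may be worth trying a few explicit integers or fractions for `max_features` as well.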

Check the link below; it might help you.

Also check whether you can use the L2 hyperparameter. (When I say this, I am not saying that L1 is incorrect; it is just to see whether you get a different predictive result.)

Regards
DP

rmwkwok · December 16, 2023, 2:41am

Hello @Debatreyo_Roy,

Random Forest → RF
Logistic Regression → LogR
Neural Network → NN

It is extremely dangerous to put an equal sign between a certain method and a certain performance expectation. We can easily build and configure a NN that is doomed to overfit the data and perform very badly.

Your notebook is a good starting point to see some results, but it is too early to expect to see “complex model beats simple model”.

Did you inspect the other parameters of RF that were not tuned in your GridSearchCV? Are their default values good enough to beat your best LogR?
In the training logs of your NN, the test set loss drops at first and then climbs later. What does this signal? Is training longer always better? Did you get your best NN? If not, why should we be surprised that a bad NN does not perform better than the best LogR?
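A loss that drops and then climbs on held-out data is the usual signature of overfitting, and one common response is early stopping. The sketch below is framework-agnostic and only illustrates the rule; the function name `best_epoch_with_patience` and the loss values are made up for illustration and are not taken from the notebook.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a simple early-stopping rule.

    Training stops once the held-out loss has failed to improve for
    `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stopping rule
    return len(val_losses) - 1, best_epoch


# Made-up losses that drop first and then climb
val_losses = [0.69, 0.55, 0.48, 0.45, 0.46, 0.49, 0.53, 0.60]
stop_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

If the NN is built with Keras, the `tf.keras.callbacks.EarlyStopping` callback (with `monitor`, `patience`, and `restore_best_weights`) applies the same idea during `fit`. Ideally the monitored loss comes from a separate validation split, so that the test set stays untouched for the final comparison.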

@Debatreyo_Roy, my overall impression is that you have kick-started it, which is extremely important, but the notebook has not shown that it has found the best RF or the best NN to compare with your best LogR, and therefore we cannot blame the methods.

I recommend that you focus on RF and NN one at a time, study them thoroughly, and convince yourself that you have got the best RF and the best NN by exploring everything you can configure about them.

For RF, the full list of things you can configure is the list of input arguments for sklearn.ensemble.RandomForestClassifier. However, for NN, there is no such full list. Will you google? Will you find out how others deal with both overfitting and underfitting? Show your research in the notebook ;).
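As one possible starting point for the RF side, here is a minimal sketch of a grid search over a few more `RandomForestClassifier` arguments. Everything specific in it is an assumption: synthetic data from `make_classification` stands in for the heart disease dataset, and the grid values are arbitrary placeholders rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: not the dataset from the notebook)
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Arguments that control tree complexity and per-split feature sampling;
# the candidate values are placeholders chosen to keep the search small
param_grid = {
    "max_depth": [3, None],
    "min_samples_leaf": [1, 5],
    "max_features": ["sqrt", 0.5],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Cross-validated score picks the configuration; the test set is used once
test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(round(search.best_score_, 3), round(test_accuracy, 3))
```

Selecting on the cross-validated score and touching the test set only once keeps the final comparison against LogR fair.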

Good luck!
Raymond

PS: I moved this to the AI Projects category.
