Test Accuracy Higher than Train accuracy?

Noted. I’ll try the MANOVA approach.

Another question: does it cause an issue or bias if not every categorical value of every feature is present in each of the train, validation, and test sets?

For example, say that after splitting the data, the train set contains every categorical value of every feature, as follows:
feature A with categories (0,1,2), B with categories (0,1,2,3,4), and C with categories (0,1,2,3).

However, xval has A(0,1,2), B(0,2,3), C(0,1,2).
xtest has A(0,1), B(0,1,2,3), C(0,2).

Would this lead to poor model performance, or to an inaccurate estimate of its performance?

I am not suggesting that you include all the features. First divide your variables into independent and dependent variables, based on your understanding of the disorder or disease you are addressing. This can be done with p-value-based hypothesis testing, such as the chi-square test you mentioned.

Then, based on the hypothesis test results, the features that score as most relevant to your disorder are the ones to select for whichever type of MANOVA analysis you want to do.
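
As a rough illustration of the chi-square screening step, here is a minimal sketch that computes the Pearson chi-square statistic for one feature against the outcome. The helper `chi_square_statistic` and the counts in `observed` are made up for illustration and are not from this thread. It assumes every row and column total is non-zero, and it stops at the statistic and degrees of freedom; a statistics library is needed to turn those into a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (feature category x outcome class).
    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the feature and the outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are two categories of one feature,
# columns are the two outcome classes.
observed = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

A larger statistic for the same degrees of freedom means stronger evidence against independence, so features can be ranked this way before deciding which ones go into the MANOVA.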

A completely different approach would be K-fold cross-validation.
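
To make the K-fold idea concrete, here is a minimal sketch of how the fold indices can be generated with only the standard library. `k_fold_indices`, the sample count, and the seed are all hypothetical choices for illustration; in practice a library routine would normally be used instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        val_idx = folds[i]
        # Every fold except the i-th one is used for training.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for fold_number, (train_idx, val_idx) in enumerate(splits):
    print(fold_number, sorted(train_idx), sorted(val_idx))
```

Each sample lands in the validation fold exactly once, so the score averaged over folds depends less on one particular split.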

Regards
DP

Yes, this is a big problem. It goes back to my original reply - your train, val, and test sets don’t have the same statistics.
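
A quick way to see the mismatch is to compare the category sets per feature across the splits. This is a minimal sketch using the example values from the question; `missing_categories` is a hypothetical helper written for this post, not a library function.

```python
# Category values per feature in each split, copied from the example above.
train = {"A": {0, 1, 2}, "B": {0, 1, 2, 3, 4}, "C": {0, 1, 2, 3}}
xval = {"A": {0, 1, 2}, "B": {0, 2, 3}, "C": {0, 1, 2}}
xtest = {"A": {0, 1}, "B": {0, 1, 2, 3}, "C": {0, 2}}


def missing_categories(reference, other):
    """Per feature, list the categories found in `reference` but not in `other`.

    Features with nothing missing are left out of the result.
    """
    result = {}
    for feature, values in reference.items():
        missing = values - other.get(feature, set())
        if missing:
            result[feature] = sorted(missing)
    return result


missing_in_val = missing_categories(train, xval)
missing_in_test = missing_categories(train, xtest)
print("in train but not in xval: ", missing_in_val)
print("in train but not in xtest:", missing_in_test)
```

Any non-empty result means that split never exercises the model on those categories, so the score measured on it is not directly comparable to the train score.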