Which model to use?

I am working on predicting the survivors of the Titanic from their data (I know it's an odd choice), but after looking at some graphs and feeding them into logistic regression I don't get an efficient model no matter what I try. Should I learn neural networks and then apply them, or is there anything else you would suggest?

Have you seen the notebooks here?


Hi thethunderstrome,

It’s completely normal to hit a wall with Logistic Regression on the Titanic dataset! It’s a classic problem, and the solution typically lies in data preparation, rather than simply switching to a more complex model.

Before jumping into Neural Networks, I strongly recommend you focus on two key areas:

1. Feature Engineering (The Biggest Gain)

Logistic Regression is a linear model, and it struggles with raw data. You need to create features that expose non-linear relationships.

  • Extract ‘Title’: The title (e.g., Mr., Miss, Master) from the name is incredibly predictive.

  • Create ‘Family Size’: Combine SibSp and Parch. People traveling alone, in small groups, or in very large groups had different survival chances.

  • Improve Age Imputation: Instead of using the overall average, impute missing ‘Age’ based on the passenger’s ‘Title’.
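The three steps above can be sketched in pandas. This is a minimal illustration, assuming the standard Titanic column names (`Name`, `SibSp`, `Parch`, `Age`); the tiny inline DataFrame just stands in for the real CSV:

```python
import pandas as pd

# Toy stand-in for the Titanic data (real data comes from train.csv).
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Palsson, Master. Gosta Leonard",
             "Rice, Master. Eugene"],
    "SibSp": [1, 3, 4],
    "Parch": [0, 1, 1],
    "Age": [22.0, None, 2.0],
})

# 1. Extract 'Title': the word between the comma and the period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

# 2. 'Family Size': siblings/spouses + parents/children + the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# 3. Impute missing Age with the median age of passengers sharing the Title.
df["Age"] = df["Age"].fillna(df.groupby("Title")["Age"].transform("median"))
```

Here the missing Age for a 'Master' is filled with the median age of the other 'Master' passengers, which is far more realistic than the overall average (Master indicates a young boy).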

2. Try Ensemble Methods

These algorithms handle the complexity of the data much better than Logistic Regression without the heavy overhead of Neural Networks.

  • Random Forest: A great starting point. It’s robust, less prone to overfitting, and handles non-linearities automatically.

  • Gradient Boosting (e.g., XGBoost): Gradient-boosted trees are the gold standard for structured, tabular data and will likely give you the highest accuracy.
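For reference, swapping in a tree ensemble is only a few lines with scikit-learn. This sketch uses synthetic data as a stand-in for your engineered Titanic features (the shapes and scores are illustrative, not results from the actual competition):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for engineered
# features like Pclass, Sex, Title, FamilySize, imputed Age, ...
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# Random Forest: robust defaults, handles non-linearities automatically.
clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Always judge the model with cross-validation, not training accuracy.
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

The same pattern works for gradient boosting: replace the classifier with `sklearn.ensemble.GradientBoostingClassifier`, or with `xgboost.XGBClassifier` if you install the separate XGBoost library.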

The takeaway: A well-engineered dataset fed into an XGBoost model will almost certainly outperform raw data fed into a Neural Network on this challenge.