Stuck, need help

I was trying a Kaggle exercise for practice: predicting insurance premiums.

However, I’m stuck in a local minimum. No matter what I do (increasing or decreasing the learning rate, feature scaling, feature engineering), the cost stays stuck around 750,000.

My work:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

model = Sequential([
    Dense(50, activation='relu', input_shape=(X_train.shape[1],), kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(30, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(20, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(40, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(10, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(1, activation='linear')
])

Got similar results with or without the regularization.

Tried playing with learning rate and momentum as well:

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.008, beta_1=0.9, beta_2=0.999), loss='mse', metrics=['mae']) 
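If a fixed learning rate keeps stalling, one alternative (my suggestion, not something tried in the thread) is a decaying schedule; the initial rate and decay constants below are assumptions:

```python
import tensorflow as tf

# Suggestion: decay the learning rate over training instead of fixing it
# at 0.008. All the constants here are assumed, not tuned.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,   # starting LR (assumed)
    decay_steps=10_000,           # steps between decays (assumed)
    decay_rate=0.9)               # multiply the LR by 0.9 every decay_steps

optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule, beta_1=0.9, beta_2=0.999)
```

You would then pass this optimizer to model.compile in place of the fixed-rate Adam above.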

Tried different batch size as well:

history = model.fit(
    X_train_scaled, y_train,
    epochs=50,  # Adjust based on performance
    batch_size=128,  # Larger batch sizes for faster training - model will process 128 samples at a time before updating its weights
    verbose=1)

One thing to note: the dataset is huge, about 1 million training rows.

For this practice exercise, did you intend to use five layers of ReLU units, with fewer units in the middle layers of the model? That seems an unusual design.

Did you try a very simple model first?
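For illustration, a "very simple" first model could be a single Dense unit with no hidden layers, i.e. plain linear regression. The synthetic data below is a made-up stand-in for the real features:

```python
import numpy as np
import tensorflow as tf

# Made-up stand-in data: 256 rows, 8 standardized features, linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8)).astype('float32')
y = (X @ rng.normal(size=(8, 1)) + 5.0).astype('float32')

# One linear unit, no hidden layers: a linear-regression baseline
baseline = tf.keras.Sequential([tf.keras.layers.Dense(1)])
baseline.compile(optimizer='adam', loss='mse')
history = baseline.fit(X, y, epochs=50, batch_size=32, verbose=0)
```

If even a baseline like this plateaus at the same loss on the real data, the problem is more likely in the target or features than in the architecture.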

I’m using Dropout for regularization, and I did try without it. I also tried simpler approaches, like just 2 or 3 layers with a small number of units, but the outcome is the same every time.

Link to the Kaggle page, please.

Links are not allowed here. Go to the Kaggle website and search for this series: playground-series-s4e12

Why is sharing the link disallowed?

The target variable is a continuous value with quite a large range. Here’s the sample data from the website:

id,Premium Amount
1200000,1102.545
1200001,1102.545
1200002,1102.545
etc.

A NN is going to have quite a challenge fitting target values of that scale.
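One common remedy (a suggestion, not something tried in the thread) is to train on a log-transformed target so the network only has to fit a narrow range, then invert the transform on the predictions. A minimal numpy sketch with made-up premium values:

```python
import numpy as np

y_train = np.array([1102.545, 52.0, 9800.0, 250.0])  # hypothetical premiums

y_log = np.log1p(y_train)     # compressed range, roughly 4 to 9
preds_log = y_log             # pretend these are the network's outputs
preds = np.expm1(preds_log)   # invert log1p to recover the premium scale

assert np.allclose(preds, y_train)
```

With targets on a log scale, an MSE loss in the hundreds of thousands becomes a loss of a few units, which also makes learning-rate tuning far less touchy.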

  1. What specializations have you taken so far? (How about the TensorFlow Developer Specialization?)
  2. What are your input features?