I'm working through a Kaggle exercise for practice: a regression model on an insurance dataset.
However, I seem to be stuck in a local minimum. No matter what I do, whether increasing or decreasing the learning rate, feature scaling, or feature engineering, the cost stays stuck around 750,000.
My work:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

model = Sequential([
    Dense(50, activation='relu', input_shape=(X_train.shape[1],), kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(30, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(20, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(40, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(10, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.2),
    Dense(1, activation='linear')  # single linear output for regression
])
I got similar results with or without the regularization.
I also tried playing with the learning rate and momentum:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.008, beta_1=0.9, beta_2=0.999),  # beta_1 plays the momentum-like role in Adam
    loss='mse',
    metrics=['mae']
)
I tried different batch sizes as well:
history = model.fit(
    X_train_scaled, y_train,
    epochs=50,        # adjust based on performance
    batch_size=128,   # larger batch for faster training: 128 samples processed per weight update
    verbose=1
)
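For reference, X_train_scaled comes from a standard scaling step. A minimal sketch of what I mean, assuming scikit-learn's StandardScaler and a plain train/test split (X and y stand for the engineered feature matrix and the target; the exact feature engineering varies between my attempts):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X and y are the engineered feature matrix and the target (assumed names)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training features only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)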
One thing to note: the dataset is large, about 1 million training examples.
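For scale: since the loss is MSE, 750,000 corresponds to an RMSE of roughly 866 in the target's units. One way to judge whether that is actually bad is to compare it against the variance of y_train, i.e. the MSE of a trivial model that always predicts the training mean; a minimal check (not part of the training code above):

import numpy as np

# MSE of the "always predict the training mean" baseline equals the target variance;
# the network should at least beat this number.
baseline_mse = np.mean((np.asarray(y_train) - np.mean(y_train)) ** 2)
print(f"baseline MSE (predict mean): {baseline_mse:.0f}")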