Course 4, Week 1, Programming Assignment 2

I am curious why the batch size is different between the training statements for the Happy model vs. the Sign model:

…, Y_train, epochs=10, batch_size=16)
train_dataset = …, Y_train)).batch(64)


You filed this under Course 3, not Course 4. I modified the title for you by using the little “edit pencil” on the title.

The minibatch size is what Prof Ng calls a “hyperparameter”, meaning a value that you need to choose as the system designer, as opposed to a “parameter”, which can be learned through back propagation. The best choice for a given situation is not always the same, otherwise there would be one “golden” value that everyone always uses. There are just some “rules of thumb”, e.g. the famous quote from Yann LeCun: “Friends don’t let friends use batch sizes greater than 32.” But apparently even that rule doesn’t apply in every situation. :grin: :nerd_face:
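To make the effect of that choice concrete, here is a minimal sketch in plain Python (no TensorFlow) of how the batch size sets the number of gradient updates per epoch. The helper names and the training set size of 600 are assumptions for illustration only; they are not taken from the assignment notebooks.

```python
import math


def iter_minibatches(num_examples, batch_size):
    """Yield (start, stop) index pairs that cover the examples in order."""
    for start in range(0, num_examples, batch_size):
        # The final batch is smaller when batch_size does not divide evenly.
        yield start, min(start + batch_size, num_examples)


def steps_per_epoch(num_examples, batch_size):
    """Number of parameter updates in one pass over the data."""
    return math.ceil(num_examples / batch_size)


num_examples = 600  # hypothetical training set size, for illustration only

for batch_size in (16, 64):
    batches = list(iter_minibatches(num_examples, batch_size))
    last_start, last_stop = batches[-1]
    print(
        f"batch_size={batch_size}: {len(batches)} updates per epoch, "
        f"last batch has {last_stop - last_start} examples"
    )
```

With the smaller batch size you get more, noisier updates per epoch; with the larger one you get fewer, smoother updates. Which works better depends on the model and the data, which is why it has to be tuned rather than learned.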


If you came directly to Course 4 and skipped Course 2, it might be worth looking at some of the lectures in Course 2. The main focus of Weeks 1 and 2 of Course 2 is exploring different hyperparameters and discussing systematic ways of selecting them. I think it’s in the lectures in Week 2 where he introduces and discusses minibatch gradient descent.

Thank you! I wasn’t aware of the hyperparameter vs parameter distinction, but it makes sense. I did take Course 3 (got it confused with 4 - thanks also for correcting my post’s title).

I did take Courses 1-3, but I do find myself referencing them time and again to remind myself.