After running mini-batch gradient descent, the cost is supposed to zigzag, as shown in the lecture. However, this is what I got:
Is this because the batch size is relatively small, so the zigzag is not very pronounced and the curve just looks like a smooth line?
There is no guarantee that the cost will oscillate when you use small mini-batches. It can happen, but it is not guaranteed to happen. It all depends on the properties of your data and the model you have specified. Of course, the values of other hyperparameters, such as the learning rate, are influential here as well.
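To see this for yourself, here is a minimal sketch (not from the course assignment, all names are illustrative): mini-batch gradient descent on a simple synthetic linear-regression problem, where you can vary the batch size and learning rate and watch how much the per-iteration cost actually fluctuates.

```python
import numpy as np

# Illustrative sketch: mini-batch gradient descent on synthetic
# linear-regression data. Depending on the data, the batch size, and
# the learning rate, the cost curve can look noisy or nearly smooth.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=1000)

def run_minibatch_gd(batch_size, lr=0.05, epochs=5):
    w, b = 0.0, 0.0
    costs = []
    n = len(y)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            xb, yb = X[idx, 0], y[idx]
            err = w * xb + b - yb
            costs.append(np.mean(err ** 2) / 2)  # cost on current mini-batch
            w -= lr * np.mean(err * xb)          # gradient step for w
            b -= lr * np.mean(err)               # gradient step for b
    return np.array(costs)

costs_small = run_minibatch_gd(batch_size=16)
costs_large = run_minibatch_gd(batch_size=256)

# Both runs trend downward overall; the small-batch curve usually shows
# more iteration-to-iteration variation, but how visible the zigzag is
# depends on the noise in the data relative to the signal.
print("small-batch cost:", costs_small[0], "->", costs_small[-1])
print("large-batch cost:", costs_large[0], "->", costs_large[-1])
```

Plotting `costs_small` and `costs_large` side by side makes the point concrete: with low-noise data like this, even a batch size of 16 can produce a curve that looks quite smooth when zoomed out.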