If you look at the soup-bowl surface plot (the 1st image), the lowest possible cost appears to be below 100.
If you look at the contour plot (the 2nd image), the lowest possible cost is around 1760.
So unless the soup-bowl graph comes from a different training set (i.e., it is just an illustrative sample unrelated to this data), or its Y-axis (the J axis) uses a multiplier of something like 1000, I don't understand why the lowest J values seem so different in the two graphs.
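One way to check the reasoning: if both images plot the same function J(θ0, θ1) for the same training set, they are just two renderings of the same grid of values, so their lowest cost must match. The sketch below assumes J is the usual squared-error cost, J(θ0, θ1) = 1/(2m) Σ (θ0 + θ1·x − y)², and uses hypothetical toy data (not the course's training set) and hypothetical helper names (`cost`, `grid_min`, `exact_min`). It also shows that rescaling y by a factor k rescales the minimum J by k², so data in different units can give minima that differ by orders of magnitude.

```python
# Hypothetical toy data, NOT the course's training set (roughly y = 1 + 2x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def grid_min(xs, ys, t0_range, t1_range, steps):
    """Evaluate J on the kind of grid a surface or contour plot is drawn from.

    Returns (lowest J on the grid, theta0, theta1) for that lowest value.
    """
    best = None
    for i in range(steps + 1):
        t0 = t0_range[0] + (t0_range[1] - t0_range[0]) * i / steps
        for j in range(steps + 1):
            t1 = t1_range[0] + (t1_range[1] - t1_range[0]) * j / steps
            c = cost(t0, t1, xs, ys)
            if best is None or c < best[0]:
                best = (c, t0, t1)
    return best


def exact_min(xs, ys):
    """Lowest possible J, using the closed-form least-squares fit for one feature."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = y_bar - theta1 * x_bar
    return cost(theta0, theta1, xs, ys)


if __name__ == "__main__":
    grid_c, t0, t1 = grid_min(xs, ys, (0.0, 2.0), (1.0, 3.0), 200)
    print("lowest J on grid:", grid_c, "at", (t0, t1))
    print("exact lowest J:  ", exact_min(xs, ys))
    # Same data expressed in units 1000x larger: minimum J grows by 1000**2.
    print("exact lowest J, y * 1000:", exact_min(xs, [1000 * y for y in ys]))
```

Under these assumptions, the grid minimum can only approach the exact minimum from above, and a surface plot and a contour plot of the same grid share it. A large mismatch between the two images therefore points to different data (or different units), or to a different axis scale, rather than to the plot type.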