Just curious about best practice here:
Optimize typically means finding the parameter values (w, b) that correspond to the minimum value of the cost J. In the exercises, we use a fixed number of iterations rather than iterating until a minimum J is reached (delta-J < some epsilon). I tried a larger number of iterations, and the cost J kept decreasing monotonically.

So, is it common practice to use a fixed number of iterations (which helps control how much we spend on compute) rather than to seek the true minimum? Or is that just for the exercises, and can we “assume” that frameworks/libraries do indeed seek minima?

This is just the first course in the DLS series, and there are more sophisticated ways to manage convergence than what you see here. In this course we are doing the simplest form of Gradient Descent, with a fixed learning rate and a fixed number of iterations. Later we will learn about more adaptive and sophisticated optimization methods, and later still we will hand all of this off to “frameworks” like TensorFlow. The better way to deal with this is to use dynamic or adaptive methods for managing the learning rate, and to define convergence in terms of a threshold on the delta between one step and the next (e.g. stop the process when the decrease in J from one iteration to the next is less than a specified threshold).
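To make the threshold idea concrete, here is a minimal sketch of gradient descent with delta-J stopping, on made-up 1-D linear regression data (the data, learning rate, and epsilon values are all illustrative assumptions, not the course assignment's values). A `max_iters` cap is kept as a safety net in case the threshold is never hit:

```python
import numpy as np

# Hypothetical 1-D data for illustration: y ≈ 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)

def cost(w, b):
    """Mean squared error cost J(w, b) with the usual 1/2 factor."""
    return np.mean((w * x + b - y) ** 2) / 2

def gradient_descent(alpha=0.1, max_iters=10000, eps=1e-9):
    """Run gradient descent until the decrease in J between
    consecutive iterations falls below eps, or until max_iters."""
    w, b = 0.0, 0.0
    J_prev = cost(w, b)
    for i in range(max_iters):
        err = w * x + b - y
        dw = np.mean(err * x)   # dJ/dw
        db = np.mean(err)       # dJ/db
        w -= alpha * dw
        b -= alpha * db
        J = cost(w, b)
        if J_prev - J < eps:    # converged: delta-J below threshold
            return w, b, i + 1
        J_prev = J
    return w, b, max_iters      # hit the safety cap instead

w, b, iters = gradient_descent()
print(w, b, iters)
```

With a fixed learning rate on a convex cost like this, J decreases monotonically (as observed in the question), so delta-J shrinks toward zero and the loop typically stops well before the iteration cap.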