The gradients are the partial derivatives of the cost function with respect to each of the w and b parameters. The derivative of a function can be either positive or negative at a given point, right? If it’s positive, it says that a small increase in the w_i value will increase the cost. If it’s negative, it says that a small increase in the w_i value will decrease the cost.
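
To make the sign idea concrete, here is a minimal sketch with a made-up one-parameter cost, J(w) = (w - 3)^2. The function names `cost` and `dcost_dw` and the cost itself are just assumptions for illustration, not anything from the course code. It compares the sign of the derivative with what actually happens to the cost when w is nudged up a little:

```python
# Hypothetical one-parameter cost, chosen only for illustration: J(w) = (w - 3)^2
def cost(w):
    return (w - 3.0) ** 2


def dcost_dw(w):
    # Analytic derivative of the illustrative cost above
    return 2.0 * (w - 3.0)


eps = 1e-3  # a small increase in w

for w in (5.0, 1.0):
    grad = dcost_dw(w)
    change = cost(w + eps) - cost(w)
    # The sign of the derivative should match the sign of the cost change
    print(f"w={w}: derivative={grad:+.1f}, cost change for w+eps={change:+.6f}")
```

At w = 5 the derivative is positive and the cost goes up when w increases; at w = 1 the derivative is negative and the cost goes down.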

Of course, what Gradient Descent does is move in the direction opposite to the gradient. That’s why the gradients are multiplied by -1 in the update: each parameter has its gradient, scaled by the learning rate, subtracted from it. The gradient points in the direction of the fastest *increase* of the cost and what we want is the fastest *decrease* of the cost, right?
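
Here is a rough sketch of that update, assuming a toy cost J(w, b) = (w - 3)^2 + (b + 1)^2 and a learning rate `alpha` of 0.1. The cost, the starting point, and the helper names `gradients` and `gradient_descent` are all hypothetical, picked only so the effect of the minus sign is easy to see:

```python
# Hypothetical two-parameter cost for illustration: J(w, b) = (w - 3)^2 + (b + 1)^2
def cost(w, b):
    return (w - 3.0) ** 2 + (b + 1.0) ** 2


def gradients(w, b):
    # Partial derivatives of the illustrative cost with respect to w and b
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)


def gradient_descent(w, b, alpha=0.1, steps=50):
    history = [cost(w, b)]
    for _ in range(steps):
        dw, db = gradients(w, b)
        # The minus sign is the "multiply by -1": step against the gradient
        w = w - alpha * dw
        b = b - alpha * db
        history.append(cost(w, b))
    return w, b, history


w, b, history = gradient_descent(w=0.0, b=0.0)
print(f"w={w:.4f}, b={b:.4f}, first cost={history[0]:.4f}, last cost={history[-1]:.6f}")
```

With these assumed values the cost shrinks on every step and w and b drift toward 3 and -1, the minimum of this toy cost. If you flip the minus signs to plus, the same loop climbs the cost instead.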