I was playing around with optimization and gradient descent to understand it fully when I tried to optimize a function that is opaque (i.e. a function whose implementation I don’t know). I’m curious to know more about this case.
An example of this kind of function would be a function from a third-party package that is configurable with some variables, where I want to find the best values of these variables for my training set.

So let’s call this function f. f has one variable w that configures its behavior, and takes a parameter x. I have a cost function J that I want to minimize by finding the best value of w.

So if I don’t know f then I don’t know the “known” derivative of f. That means there is no easy way to calculate the derivative of J. Right?

So the only way I can think of doing that is to “manually” estimate the derivative of f by changing w by a tiny amount and calculating the difference. However, that would be very inefficient, since it adds extra evaluations of f at every step compared with the case where the derivative is “known”.
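To make the “change w by a tiny amount” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `opaque_f` is a hypothetical stand-in for the third-party function (the optimizer only ever calls it, never looks inside), the cost is a mean squared error, and the step size and learning rate are arbitrary choices.

```python
def opaque_f(w, x):
    # Hypothetical stand-in for the third-party function.
    # The optimizer below treats it as a black box.
    return w * x


def cost(w, xs, ys):
    # Mean squared error between f(w, x) and the targets y (assumed cost J).
    return sum((opaque_f(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def numerical_gradient(J, w, eps=1e-5):
    # Central difference: two extra evaluations of J per gradient estimate.
    return (J(w + eps) - J(w - eps)) / (2 * eps)


def minimize(J, w0, lr=0.05, steps=200):
    # Plain gradient descent using the numerical gradient.
    w = w0
    for _ in range(steps):
        w -= lr * numerical_gradient(J, w)
    return w


# Toy training set generated with w = 3, so the best w should be close to 3.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [3.0 * x for x in xs]

w_best = minimize(lambda w: cost(w, xs, ys), w0=0.0)
print(w_best)
```

With a single variable this costs two evaluations of J per step. With n variables it would need 2n evaluations per step, which is where the inefficiency comes from.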

So I’m wondering whether there are other, more efficient algorithms for optimizing this kind of function?

If you use a framework like TensorFlow, you don’t need to calculate derivatives manually, because it offers automatic differentiation, which computes them for you. This can be helpful when optimizing these types of functions, since it gives you the gradient of the function with respect to the parameters. Note that this only works if the function is built from operations the framework can trace, so a truly opaque third-party function may not qualify.

I’ve also heard about the Bayesian optimization algorithm, but I don’t know much about it.

Thanks for the answer. I’ll read about automatic differentiation and Bayesian optimization. That seems to be exactly the kind of answer I was looking for.

You don’t need to know the derivatives. If you know the outputs of the function for a range of input values, you can use those as a training set and learn a model of the function.

This is super helpful, e.g. if you want to model highly complex relationships in high-dimensional parameter spaces, because the gradients of the learned model can then be computed efficiently.

Here, autodiff helps to align the complexity of the parameters in deep learning with physical effects and keep them in sync during an optimization, which is highly complex!
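As a rough sketch of the “learn a model of the function from its outputs” idea, the Python snippet below samples a black-box cost at a few values of w, fits a quadratic to those samples by least squares, and takes the minimum of the fitted curve. The names `opaque_J`, `fit_quadratic` and `det3` are hypothetical, and a quadratic is only an assumed model shape; a real problem would likely need a more flexible model such as a neural network or a Gaussian process.

```python
def opaque_J(w):
    # Hypothetical black-box cost; only its outputs are used below.
    return (w - 2.0) ** 2 + 1.0


def det3(m):
    # Determinant of a 3x3 matrix given as a list of rows.
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(ws, js):
    # Least-squares fit of J(w) ~ a*w^2 + b*w + c via the normal equations,
    # solved with Cramer's rule.
    s = [sum(w ** k for w in ws) for k in range(5)]
    t = [sum((w ** k) * j for w, j in zip(ws, js)) for k in range(3)]
    A = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(A)
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = rhs[r]
        coeffs.append(det3(M) / d)
    return tuple(coeffs)  # (a, b, c)


# "Training set": outputs of the black box over a range of inputs.
ws = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
js = [opaque_J(w) for w in ws]

a, b, c = fit_quadratic(ws, js)

# The fitted model has a known derivative 2*a*w + b, so its minimum is
# available in closed form (valid only when a > 0).
w_star = -b / (2 * a)
print(w_star)
```

The black box is evaluated only once per sample; after that, all the optimization happens on the cheap fitted model, whose derivative is known.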