Does this function even have a unique solution? It looks like a function with complex properties. Do we generally tend to use functions without a theoretical basis just because they were found to work well in practice?

No neural network has a “unique solution”. An easy symmetry argument shows that any given solution has many equivalent duplicates. And finding an actual global minimum of the training loss is not what you want in any case, since it would represent extreme “overfitting”. Here is a paper from Yann LeCun’s group which works through some of the mathematics showing that sufficiently complex neural networks have reasonable solutions even though their loss surfaces are extremely complicated and non-convex.

What do you mean by “theoretical basis” here? We have a function that can be shown to measure something that is useful for the purposes we are trying to achieve, as demonstrated by good performance on test data.

Or maybe to put it another way: “yes, what we care about here is demonstrated performance, even if we can’t fully explain why it works.” If you think about it, we (meaning “we” in the sense of all humans) do this in many areas. The FDA will approve a drug if it succeeds in clinical trials, even if they can’t explain what the “mechanism of action” is. Of course the definition of success in clinical trials is quite rigorous and requires proof of both safety and efficacy.

Here’s a paper which discusses “weight-space symmetry”. You can find more by using that phrase as a search term.
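To make the symmetry argument concrete, here is a minimal sketch in plain Python. The tiny network (3 inputs, 4 tanh hidden units, 1 output), its random weights, and the names `forward` and `perm` are all illustrative assumptions, not taken from either paper. It shows that reordering the hidden units, together with the matching output weights, gives a different point in weight space that computes exactly the same function.

```python
import math
import random

random.seed(0)

# Hypothetical tiny network: 3 inputs -> 4 tanh hidden units -> 1 output.
N_IN, N_HIDDEN = 3, 4
W1 = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b1 = [random.gauss(0, 1) for _ in range(N_HIDDEN)]
w2 = [random.gauss(0, 1) for _ in range(N_HIDDEN)]
b2 = random.gauss(0, 1)


def forward(x, W1, b1, w2, b2):
    """Forward pass of the one-hidden-layer network."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Reorder the hidden units: permute the rows of W1, the entries of b1,
# and the matching entries of w2 in the same way.
perm = [2, 0, 3, 1]
W1_p = [W1[i] for i in perm]
b1_p = [b1[i] for i in perm]
w2_p = [w2[i] for i in perm]

x = [random.gauss(0, 1) for _ in range(N_IN)]
original = forward(x, W1, b1, w2, b2)
permuted = forward(x, W1_p, b1_p, w2_p, b2)

# Same function value, different point in weight space.
same_output = math.isclose(original, permuted, rel_tol=1e-9, abs_tol=1e-12)
different_weights = W1 != W1_p
print(same_output, different_weights)
```

With 4 hidden units there are already 4! = 24 such reorderings, so any set of weights that minimizes the loss comes with at least that many equivalent copies.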

Yeah, but in most cases the solution is locally countable; here it is locally uncountable… You would end up finding a solution that is merely in the neighbourhood of the proclaimed solution. Is this a well-behaved function? I don’t know.

The solution is probably made up of big local pockets of regions in the domain, so a good theoretical approach should be able to explain the properties of these regions, including bounds and the like, not just an arbitrary point inside one of them. How can we use something that we don’t fully understand?

I think you are using the term “countable” in a different way than mathematicians use it. Countable sets do not have to be finite. A set is *countable* if there is a bijection between it and a subset (including a proper subset) of the set of all natural numbers $\mathbb{N}$. E.g. $\mathbb{Z}$, the set of all integers, is countable. $\mathbb{R}$ is uncountable.
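As a small illustration of that definition, here is a sketch of one standard bijection between the integers and the natural numbers, which is what makes the integers countable. The function names `int_to_nat` and `nat_to_int` are made up for this example.

```python
def int_to_nat(z: int) -> int:
    """Interleave the integers: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * z if z >= 0 else -2 * z - 1


def nat_to_int(n: int) -> int:
    """Inverse of int_to_nat: even n maps to n/2, odd n maps to -(n+1)/2."""
    return n // 2 if n % 2 == 0 else -((n + 1) // 2)


# Every integer gets its own natural number, and the round trip is exact.
for z in (0, -1, 1, -2, 2):
    print(z, "->", int_to_nat(z), "->", nat_to_int(int_to_nat(z)))
```

No such enumeration exists for the real numbers, which is what “uncountable” means.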

Did you read the paper from Yann LeCun’s group that I linked above? It addresses exactly these questions about characterizing the solution spaces.

I also recently brought over a thread by Gordon Robinson that explains weight space symmetry in a more approachable way than the other paper that I referenced above.

So by unique I meant locally unique; hence, I brought the concept of countability into the discussion. It’s understood what countability means as used in the common literature.

It’s obvious from an analysis of neural networks that the loss functions are not convex everywhere. I am more interested in the properties of the triplet loss function.

So what does “locally unique” mean? And how does the way you used the term “countable” map to the mathematical definition that I gave? What does “locally countable” mean?

Locally uncountable means that if I were to draw an arbitrary finite region, there is a possibility that I could find uncountably many solutions in it. It could also be a region whose properties are not known, and it could be either open or closed.

There is also a possibility that this could happen with neural networks in general, where a locus of points can be admitted as a local solution, but I don’t think that happens often (I am not a practitioner, so that qualifies as a guess).

Again, this is not a discussion about semantics… I don’t know if such a term as local uncountability exists; it is rather a way to describe the characteristics of the solution. These seem to be local regions or pools whose properties are not well known, and we are just arbitrarily choosing a point from such a pool without any method to it.
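To make the “pool” picture concrete, here is a minimal sketch, assuming the usual hinge form of the triplet loss, `max(d(A, P) - d(A, N) + margin, 0)` with squared Euclidean distances. It works directly on made-up 2-D embeddings rather than on network weights, and the margin of 0.2 is just an example value. Once a triplet satisfies the margin, the loss is exactly zero, and it stays zero when the embeddings are nudged, so the zero-loss set contains a whole region rather than an isolated point.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for a single (anchor, positive, negative)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Made-up embeddings: the positive is close to the anchor, the negative is far.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
negative = [1.0, 0.0]

base_loss = triplet_loss(anchor, positive, negative)

# Nudge the negative embedding around its position: the loss stays at zero.
nudged_losses = [
    triplet_loss(anchor, positive, [1.0 + dx, dy])
    for dx in (-0.05, 0.0, 0.05)
    for dy in (-0.05, 0.0, 0.05)
]
all_zero = all(loss == 0.0 for loss in nudged_losses)

# A triplet that violates the margin gives a strictly positive loss.
violating_loss = triplet_loss(anchor, positive, [0.2, 0.0])
print(base_loss, all_zero, violating_loss)
```

This only shows that flat zero-loss regions exist for a single triplet; it says nothing about how large or well behaved those regions are for a full network and dataset.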

You’re welcome to continue down this rabbit hole, but I think if you’re really that seriously interested in the issues here you should read the Yann LeCun paper.

As I mentioned earlier in this discussion, finding an actual global minimum would not be that desirable in any case. We are training on one set of data, but then the real criterion for success is how well the model does at predicting on the test set. The point being that spending a lot of energy thinking about whether there are unique solutions in convex subregions of the surface is actually not that relevant or useful. You should really read the paper. Mind you, I have not read it beyond the abstract, so you’re on your own.