I'm quite certain I'll have more in-depth questions as I work through the material a few times, but up front, a few points on which I'd appreciate clarification:

I'm still working on fully securing my linear algebra knowledge, but I know the variables 'u' and 'v' can carry a somewhat 'loaded' context (orthogonality?) -- for example in computer graphics. Is there some particular reason we are choosing these to start with in the decomposition? And what if our expression is longer -- can we go u, v, w, etc.? Or is it 'standard practice' to compress whatever derivatives we are dealing with in our lead-up functions to only u and v?
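For what it's worth, a quick sketch of why the names don't matter: u and v are just labels for intermediate results, and a longer expression can introduce as many as you like (u, v, w, ...). The function below, f(x) = sin(x² + 1)³, is a made-up illustration, not one from the course:

```python
import math

# Illustrative example: f(x) = sin(x**2 + 1) ** 3
# Each intermediate is one "node" in the chain; the names are arbitrary.
x = 2.0
u = x ** 2         # first intermediate
v = u + 1.0        # second
w = math.sin(v)    # third
f = w ** 3

# The chain rule threads through every intermediate, whatever they're called:
df_dw = 3 * w ** 2
df_dv = df_dw * math.cos(v)   # dw/dv = cos(v)
df_du = df_dv * 1.0           # dv/du = 1
df_dx = df_du * 2 * x         # du/dx = 2x
```

The point is that u and v in the lecture aren't special objects (and nothing here is the orthogonal-vectors u, v from graphics); they're just the first two letters grabbed for intermediate quantities.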

Along with this, in the first generic example the elements of u and v are broken up in a neat 'order of operations' sort of way -- the equation is simple, so it works out nicely. But when we move on to logistic regression, it seems(?) as if we are taking our derivatives on much larger 'chunks' of the final equation.

Sorry if I'm not being clear on this, but how are we supposed to know where our 'break points' are? Wherever there is a new elementary computation involving an additional variable?
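To make the question concrete, here is one way the single-example logistic regression loss can be broken at each elementary computation. The input and parameter values are made up for illustration; only the structure (linear step, sigmoid, log-loss) follows the usual formulation:

```python
import math

# Made-up inputs, parameters, and label for a single training example.
x1, x2 = 1.0, 2.0
w1, w2, b = 0.5, -0.25, 0.1
y = 1.0

# Forward pass: one line per elementary computation ("break point").
z = w1 * x1 + w2 * x2 + b                            # linear combination
a = 1.0 / (1.0 + math.exp(-z))                       # sigmoid activation
L = -(y * math.log(a) + (1 - y) * math.log(1 - a))   # cross-entropy loss

# Backward pass: one derivative per break point above.
dL_da = -y / a + (1 - y) / (1 - a)
da_dz = a * (1 - a)
dL_dz = dL_da * da_dz      # algebraically simplifies to a - y
dL_dw1 = dL_dz * x1
dL_dw2 = dL_dz * x2
dL_db = dL_dz
```

So one workable answer to "where do I break?" is: at every elementary operation whose local derivative you can write down immediately.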

He continues with this methodology in 'Derivatives of a Computation Graph'.

The formula he starts with, J(a,b,c) = 3(a+bc), technically has three partial derivatives -- nonetheless, the way he decomposes it introduces only two intermediate variables, u and v, which together comprise J.
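For reference, the decomposition in the lecture is u = bc, v = a + u, J = 3v, and all three partials of J fall out of the backward pass even though only two intermediates are named. A minimal sketch (using the lecture's values a=5, b=3, c=2):

```python
# Forward pass for J(a, b, c) = 3 * (a + b*c), using the lecture's
# intermediates: u = b*c, v = a + u, J = 3*v.
a, b, c = 5.0, 3.0, 2.0

u = b * c          # u = 6
v = a + u          # v = 11
J = 3 * v          # J = 33

# Backward pass (chain rule), one step per node of the graph.
dJ_dv = 3.0                # dJ/dv
dJ_du = dJ_dv * 1.0        # dv/du = 1
dJ_da = dJ_dv * 1.0        # dv/da = 1
dJ_db = dJ_du * c          # du/db = c
dJ_dc = dJ_du * b          # du/dc = b

print(dJ_da, dJ_db, dJ_dc)  # 3.0 6.0 9.0
```

Note how dJ/da, dJ/db, and dJ/dc are all recovered: u and v are stepping stones on the way to the three partials, not replacements for them.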

The example is rather a poor one: its goal is to introduce the concept of the computational graph, but the equation it uses, 3*(a + b*c), has nothing to do with any machine learning method.

So the business about u and v can simply be ignored; it's not really part of any useful algorithm.

Computational graphs aren't really used in practice -- the graph is just a nice graphical method of showing how an algebraic computation is performed.

This example makes the topic a lot more complicated than necessary.