Intuitively yes, your understanding is correct.
Using the intermediate-layer activations as “training features” that serve as the “input layer” to the downstream neural network architecture should work (assuming training converges to a global optimum), per the KKT optimality conditions. Loosely, this is like the “optimal path” condition (principle of optimality) in dynamic programming.
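To spell out the reasoning step (my notation, not from your question): if the jointly trained parameters are a global minimum, the downstream parameters must also be optimal for the fixed upstream features, otherwise the joint solution could be improved:

$$
(\theta_f^*, \theta_g^*) \in \arg\min_{\theta_f,\,\theta_g} L\!\left(g\!\left(f(x;\theta_f);\theta_g\right)\right)
\;\Longrightarrow\;
\theta_g^* \in \arg\min_{\theta_g} L\!\left(g(a^*;\theta_g)\right),
\quad a^* := f(x;\theta_f^*),
$$

where $f$ is the upstream (truncated-away) part of the network, $g$ the downstream part, and $L$ the training loss over the data.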
In simple terms, assume we “magically know a transformation” of the input variables that ultimately leads to “features” like a_1 = “affordability”, a_2 = “awareness” and a_3 = “perceived quality” produced by the original network. Assume we truncate the neural network as shown below and train the weights “w” (starting from a random 3x1 matrix) with the same loss and the same activation function:
Assuming the solution converges to a global optimum, the weights w learnt by the truncated network should be exactly equal to the corresponding weights learnt by the original neural network.
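A minimal sketch of that truncated setup (PyTorch, with made-up features and labels, so the numbers are only illustrative): the three “magically known” features are fed straight into a single 3x1 layer, trained with the same binary cross-entropy loss and sigmoid output the full network would use for its final layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Magically known" features a_1..a_3 (affordability, awareness, perceived
# quality) for 200 samples, plus synthetic binary labels.
a = torch.rand(200, 3)
y = (a @ torch.tensor([1.5, -2.0, 1.0]) + 0.1 > 0).float().unsqueeze(1)

# Truncated network: only the final 3x1 layer w (plus bias). The sigmoid is
# applied inside BCEWithLogitsLoss, matching a sigmoid-output original network.
head = nn.Linear(3, 1, bias=True)
opt = torch.optim.SGD(head.parameters(), lr=0.5)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(head(a), y)
    loss.backward()
    opt.step()

# These weights are what you would compare against the original network's
# final-layer weights under the global-optimum assumption.
print("learned w:", head.weight.data)
```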
And finally, yes - ultimately, if the final layer is a softmax or logit, and if you “magically know a transformation” that converts your input vector (say, an image, text, video, …) into the pre-final-layer activation, learning the final layer is a simple “logistic regression” optimization task. Also, yes - the hypothetical dimensions extracted using your procedure for the convnet can be treated as an input vector space for the downstream fully connected layer(s). In fact, a deep conv net is just an abstraction of a deep multi-layer neural network with “weight constraints” (weight sharing and local connectivity).
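As a quick sketch of that last point (PyTorch plus scikit-learn, with a random frozen network standing in for the “magic transformation” and synthetic labels): once the inputs are mapped to fixed pre-final-layer activations, fitting the softmax layer is just a multinomial logistic regression.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)

# Frozen stand-in for the "magic transformation" phi, e.g. the conv/pooling
# stack of a convnet kept fixed after training.
phi = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 4), nn.ReLU())

x = torch.randn(500, 10)                 # stand-in for raw inputs (image/text/...)
with torch.no_grad():
    feats = phi(x)                       # pre-final-layer activations

# Synthetic 3-class labels that depend on the features, so the final layer
# actually has something to learn.
y = (feats @ torch.randn(4, 3)).argmax(dim=1)

# Learning the softmax/final layer on these fixed features is exactly a
# multinomial logistic regression problem.
clf = LogisticRegression(max_iter=1000).fit(feats.numpy(), y.numpy())
print("final-layer weight shape:", clf.coef_.shape)   # (classes, features)
print("training accuracy:", clf.score(feats.numpy(), y.numpy()))
```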