Hello @rmwkwok. Thanks for your input, I appreciate it!
Let me start by answering your questions about np.concatenate() and np.stack(), to give more context on my confusion with @Alireza_Saei's response (although I very much appreciate his insight!) and to describe my improved understanding of these methods.
In my response to @Alireza_Saei, I described np.concatenate and np.stack as if they were fundamentally different methods. With an improved understanding, I now see that they provide similar functionality: they both join a sequence of arrays. The key difference is that np.concatenate joins the arrays along an existing axis, while np.stack joins them along a new axis.
My understanding of this difference is that, for np.concatenate, the input arrays must have the same shape except in the dimension corresponding to the axis parameter (default 0). The returned array has the same number of dimensions as the inputs, with the values concatenated along the given (existing) axis.
np.stack, on the other hand, joins a sequence of arrays along a new axis: the input arrays must all have exactly the same shape. The returned array has one more dimension than the inputs, and the axis parameter gives the index of the new axis (the one being stacked along) in the returned array.
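A minimal sketch of that difference (the arrays here are just small illustrative examples):

```python
import numpy as np

a = np.array([1, 2, 3])   # shape (3,)
b = np.array([4, 5, 6])   # shape (3,)

# concatenate joins along an existing axis: the result stays 1-D
print(np.concatenate([a, b]).shape)     # (6,)

# stack joins along a new axis: the result gains a dimension
print(np.stack([a, b]).shape)           # (2, 3)
print(np.stack([a, b], axis=1).shape)   # (3, 2)
```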
How I would use np.concatenate and np.stack differently, reflecting these behaviors, to create the expected final 2-D array with shape (m, n') comes down to how I prepare the input arrays.
For example:
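Here is a condensed sketch of the code (the values are small illustrative ones and the variable names match my original post; your implementations and the full printed outputs are omitted):

```python
import numpy as np

# Original 2-D feature array X with m = 5 examples and n = 3 features
X = np.arange(15, dtype=float).reshape(5, 3)
x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]              # 1-D feature arrays, shape (5,)

# Square and cubic transformations of each 1-D feature array
x1_sq, x2_sq, x3_sq = x1**2, x2**2, x3**2
x1_cu, x2_cu, x3_cu = x1**3, x2**3, x3**3

# Concatenate approach: stack each feature's columns into a (5, 3) block,
# then concatenate the three blocks along the existing horizontal axis
f1 = np.stack([x1, x1_sq, x1_cu], axis=1)           # shape (5, 3)
f2 = np.stack([x2, x2_sq, x2_cu], axis=1)
f3 = np.stack([x3, x3_sq, x3_cu], axis=1)
X_eng_feat = np.concatenate([f1, f2, f3], axis=1)   # shape (5, 9)

# Stack approach: stack all nine 1-D arrays directly along a new axis=1
X_eng_feat_2 = np.stack(
    [x1, x1_sq, x1_cu, x2, x2_sq, x2_cu, x3, x3_sq, x3_cu], axis=1
)                                                   # shape (5, 9)

print(X_eng_feat.shape, X_eng_feat_2.shape)         # (5, 9) (5, 9)
print(np.array_equal(X_eng_feat, X_eng_feat_2))     # True
```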
The code and outputs above depict my original 2-D feature array (X) from my initial response to @Alireza_Saei, as well as my concatenate (X_eng_feat) and stack (X_eng_feat_2) implementations (your implementations were also included at the bottom of my original code). In terms of shape and dimensionality, the input arrays for concatenation should be 2-D arrays with identical shape (in this case), while for stack they should be 1-D arrays with identical shape, in order to obtain the expected 2-D array with shape (m, n').
In terms of the correct transformed values and orientation for the concatenate method: I performed square and cubic transformations on the initial 1-D feature arrays that make up my original 2-D feature array (X). For each feature, I then stacked these 1-D arrays along a new horizontal axis (axis=1) to create 2-D arrays with shape (5, 3), where axis 0 is the number of examples and axis 1 holds the initial and transformed values for that feature. I then concatenated these three 2-D arrays along the existing horizontal axis to create the expected final 2-D array with shape (m, n'). For the stack method, I took all of the initial and transformed 1-D feature arrays and stacked them together along a new horizontal axis (axis=1), in the orientation I think is ideal, to also create the expected final feature array.
Our implementations of concatenate and stack are slightly different and result in slightly different final arrays. For example, your concatenation implementation works exclusively with 2-D arrays. I tried it, and while it creates a 2-D array with the correct shape, dimensionality, and transformed feature values, the features are not in the orientation I think is ideal for this model. Specifically, your concatenation orders the features as x1, x2, x3, x1_square, x2_square, x3_square, x1_cube, x2_cube, x3_cube. My opinion is that an ordering like x1, x1_square, x1_cube, x2, x2_square, x2_cube, x3, x3_square, x3_cube would be more ideal, as it mirrors how the feature variables are arranged in the model function. With my feature orientation, your output (for that example) would read 0, 10, 1, 11, 2, 12. I'm not sure how much this matters or whether it changes anything, but it seems to correspond more closely to how the model is constructed. As you mentioned, I understand that my algorithm can't read my Python code to find out that certain groups of features have a power-law relation; I just prefer this ordering because it more closely matches the organization of the model. I'm sure there is a clever way to implement concatenation that achieves my preferred feature orientation and works exclusively with 2-D arrays, without stacking before concatenating like I did (I sketch one possibility below). This implementation, although more rugged and less pretty, worked for me, and I used the same feature variables in the stack implementation as well.
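For reference, one possible 2-D-only version (just a sketch, assuming X is the same illustrative (m, 3) feature array as in my example above):

```python
import numpy as np

X = np.arange(15, dtype=float).reshape(5, 3)   # illustrative (m, n) = (5, 3) feature array

# Each column slice X[:, [j]] keeps its 2-D shape (m, 1); broadcasting it
# against the powers [1, 2, 3] gives an (m, 3) block per feature, already
# in the x_j, x_j^2, x_j^3 order, so no stacking step is needed.
powers = np.array([1, 2, 3])
X_eng_feat_3 = np.concatenate(
    [X[:, [j]] ** powers for j in range(X.shape[1])], axis=1
)
print(X_eng_feat_3.shape)                      # (5, 9)
```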
Your stack implementation is similar, but instead of using array slicing to pull 1-D feature arrays out of 2-D feature arrays and stacking them together horizontally, I simply used the 1-D feature array variables I had already defined. Our stacked arrays also differ in feature orientation in the same way. This stack implementation is essentially identical to the one I initially posted, except that I have taken your suggestion and enclosed the input arrays in square brackets in my concatenate and stack calls. I had been enclosing them in parentheses, which seems to yield the same outputs, so I'm not sure this merits much consideration.
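A quick check of that point (a tuple of input arrays vs a list):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Both functions accept any sequence of arrays, so a tuple (parentheses)
# and a list (square brackets) produce the same result
print(np.array_equal(np.stack((a, b), axis=1), np.stack([a, b], axis=1)))   # True
print(np.array_equal(np.concatenate((a, b)), np.concatenate([a, b])))       # True
```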
As an enthusiastic learner who values the learning process, I would have been more than willing to follow your verification process of looking at each number and checking that they are organized in the way you expect, i.e., the correct structure of (m, n'). However, the "correct" structure was not clear to me.
@Alireza_Saei's initial response outlined that I "don't need a 3-D matrix for your features. Instead you should expand your feature matrix by generating polynomial terms for each feature and then concatenate them into a single 2-D matrix. Next step, stack them horizontally to create a new feature matrix with shape (m, n'), where n' includes the original and polynomial features."
Given your insight and implementation, everything I outlined above, and the numpy documentation for these methods: would stacking 2-D feature arrays together return a 3-D feature array? If yes, I hope you can understand that trying to verify a correct (m, n') structure from instructions which do not seem to yield that structure can be slightly misleading, and that this is what prompted my response. That response was admittedly coupled with a loose grasp of the fundamental functionality of some of these tools, but I can definitely say my understanding has improved through this discourse. My confusion could also be a product of semantics, and my reading of his instructions may simply have been lost in translation; I do not doubt @Alireza_Saei's expertise!
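To make the question concrete, here is the behaviour I am referring to (illustrative shapes only):

```python
import numpy as np

A = np.zeros((5, 3))
B = np.ones((5, 3))

# Stacking two 2-D arrays introduces a new axis, giving a 3-D result
print(np.stack([A, B]).shape)                # (2, 5, 3)

# Concatenating the same arrays keeps the result 2-D
print(np.concatenate([A, B], axis=1).shape)  # (5, 6)
```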
Lastly and most importantly: yes, my model would have 9 trainable weights and 1 bias. I say "would" because I have not started training yet. I can make sure I have the correct number of weights by confirming that my weight array w has shape (n',). I can also check that the line in the gradient descent function, w = w - alpha * dj_dw, computes an element-by-element multiplication between alpha and the gradient (not a dot product), so that the shape of the weight array remains constant throughout training. I could likewise ensure that the weight gradient dj_dw has shape (n',) and keeps that shape throughout training, and perhaps print out the weights during training as a sanity check.
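Something like the following is what I have in mind for those checks (a rough sketch; X_eng, w, b, alpha, and dj_dw are placeholder names for my engineered feature matrix, parameters, learning rate, and gradient):

```python
import numpy as np

# Placeholder engineered feature matrix with shape (m, n') = (5, 9)
X_eng = np.arange(45, dtype=float).reshape(5, 9)
m, n_prime = X_eng.shape

w = np.zeros(n_prime)               # 9 trainable weights, shape (n',)
b = 0.0                             # 1 trainable bias
alpha = 0.01                        # learning rate

# Inside gradient descent, after computing the gradient of the cost w.r.t. w:
dj_dw = np.random.randn(n_prime)    # placeholder gradient, shape (n',)

assert w.shape == (n_prime,)        # correct number of weights
assert dj_dw.shape == w.shape       # gradient matches the weights

w = w - alpha * dj_dw               # element-by-element update (not a dot product)
assert w.shape == (n_prime,)        # the update did not change the shape
```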
I also say "would" because it is not clear to me whether this process is ideal. Are you saying that, if I can ensure my model has one trainable weight for each corresponding feature, I can train it with gradient descent with all the initial and polynomially transformed features included? @TMosh mentioned that a neural network could potentially address a lot of these questions, since it can automatically create non-linear combinations of the input features so that the model gives the lowest-cost fit. However, I am not there yet and would like to conduct this analysis as I've described throughout my posts, if possible.
Thanks again Raymond for your insight. I hope we can continue this discourse.
Best Regards,
Matthew