W2_A2_Ex-2_Reshaping the matrix

On Week 2: Logistic_Regression_with_a_Neural_Network_mindset Exercise 2, we are told to use the formula

X_flatten = X.reshape(X.shape[0],-1).T

on train_set_x_orig. However, when I saw this formula, I decided to use the simpler version

X_flatten = X.reshape(-1, X.shape[0])

It seems to me that both of these would give the same result. However, I got an error when I applied the second one. Why is this the case?


Hi @Sahngyoon_Rhee, great question!

The difference between the two reshape operations you mentioned lies in how they arrange the elements of the array in the reshaped matrix.

Let’s break down the two operations:

  1. X_flatten = X.reshape(X.shape[0], -1).T:

    • X.reshape(X.shape[0], -1): This reshapes X into a 2D array where the first dimension is X.shape[0] (the number of training examples) and the second dimension is automatically inferred (-1).
    • .T: This transposes the resulting 2D array. Transposing swaps the rows and columns.
  2. X_flatten = X.reshape(-1, X.shape[0]):

    • This reshapes X into a 2D array where the first dimension is automatically inferred and the second dimension is X.shape[0]. There’s no transpose operation here.
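To make the difference concrete, here is a small sketch using a tiny made-up array (not the actual dataset): the two operations produce arrays of the same shape but with the elements arranged differently.

```python
import numpy as np

# A tiny stand-in for train_set_x_orig: 2 "images", each of shape (2, 2, 1)
X = np.arange(8).reshape(2, 2, 2, 1)

A = X.reshape(X.shape[0], -1).T   # shape (4, 2): each column is one example
B = X.reshape(-1, X.shape[0])     # shape (4, 2): same shape, different contents

print(A)
# [[0 4]
#  [1 5]
#  [2 6]
#  [3 7]]
print(B)
# [[0 1]
#  [2 3]
#  [4 5]
#  [6 7]]
print(np.array_equal(A, B))  # False
```

Both results have shape (4, 2), yet the values land in different positions: NumPy always reads the elements of `X` in row-major order, so changing which dimension comes first changes how those elements are distributed across rows and columns.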

In the context of neural networks, particularly in logistic regression, the usual convention is to have each column of X_flatten represent a training example. This is because neural network libraries often expect the input in this format, where each column is a feature vector for a single example.

  • X.reshape(X.shape[0], -1).T achieves this by first reshaping X with X.shape[0] as the first dimension and then transposing it, making X.shape[0] the second dimension, thus aligning each training example in columns.
  • X.reshape(-1, X.shape[0]), on the other hand, keeps the number of training examples as the second dimension without transposing, which misaligns the data for typical neural network processing.
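The misalignment can be verified directly. In this sketch (again with a hypothetical mini-dataset), the correct version puts all pixels of example j into column j, while the other version mixes pixels from different examples into each column:

```python
import numpy as np

# Hypothetical mini-dataset: 3 examples, each a 2x2 single-channel "image"
X = np.arange(12).reshape(3, 2, 2, 1)

correct = X.reshape(X.shape[0], -1).T   # shape (4, 3)
wrong = X.reshape(-1, X.shape[0])       # shape (4, 3) -- same shape!

# With the correct version, column j holds exactly the pixels of example j:
print(np.array_equal(correct[:, 0], X[0].ravel()))  # True
# The wrong version scatters example 0's pixels across several columns:
print(np.array_equal(wrong[:, 0], X[0].ravel()))    # False
```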

The error you encountered is likely because the reshaped array does not conform to the format expected by subsequent operations in the neural network, leading to dimension mismatches or incorrect processing of the data.
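As a sketch of why the columns-as-examples format matters downstream (the sizes here are made up for illustration), the logistic regression forward pass computes `w.T @ X + b`, which only yields one activation per example when each column of `X_flatten` is one example:

```python
import numpy as np

n_features, m_examples = 12, 5                       # made-up sizes
X_flatten = np.random.rand(n_features, m_examples)   # each column = one example
w = np.zeros((n_features, 1))                        # weight vector
b = 0.0

Z = w.T @ X_flatten + b   # (1, n) @ (n, m) -> (1, m): one value per example
print(Z.shape)            # (1, 5)
```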

If you want to go a little further, print the results of both approaches and compare them to see the differences for yourself.


In addition to Pastor’s explanation, here is a thread which shows a way to visualize why the two methods are not equivalent. Just because the shapes end up the same is not enough.