Either shape can be used; there is no universal standard, and you will find datasets and models with either orientation. So sometimes you have to transpose a matrix for the math to work out.
W is defined as a $2 \times 3$ matrix rather than $3 \times 2$ because of how forward propagation is set up in this example: there are 2 input features and 3 neurons in the current layer. Each column of W holds the weights of one neuron, so multiplying a $1 \times 2$ input row by the $2 \times 3$ matrix yields a $1 \times 3$ row of activations, one per unit in the layer.
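Here is a minimal NumPy sketch of that setup; the input and weight values are made up purely for illustration:

```python
import numpy as np

# (features, units) convention: 2 input features, 3 neurons, so W is (2, 3).
X = np.array([[200.0, 17.0]])        # one example, shape (1, 2)
W = np.array([[1.0, -3.0, 5.0],      # column j holds the weights of neuron j
              [-2.0, 4.0, -6.0]])    # shape (2, 3)
b = np.array([[-1.0, 1.0, 2.0]])     # one bias per neuron, shape (1, 3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# X @ W works because the inner dimensions agree: (1, 2) @ (2, 3) -> (1, 3)
a = sigmoid(X @ W + b)               # one activation per neuron
print(a.shape)                       # (1, 3)
```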
However, as @TMosh mentioned, the orientation of W varies with how forward propagation is defined and with the conventions of different frameworks. Some models or datasets define W as $\text{(units, features)}$ instead of $\text{(features, units)}$, so both orientations are common in practice, and you may need to transpose to match the expected dimensions.
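For comparison, here is a sketch of the opposite $\text{(units, features)}$ orientation, again with made-up values, showing where the transpose comes in:

```python
import numpy as np

# (units, features) convention: the same layer stored as a (3, 2) matrix,
# where row j holds the weights of neuron j.
X = np.array([[200.0, 17.0]])            # shape (1, 2)
W_alt = np.array([[1.0, -2.0],
                  [-3.0, 4.0],
                  [5.0, -6.0]])          # shape (3, 2)
b = np.array([[-1.0, 1.0, 2.0]])         # shape (1, 3)

# X @ W_alt raises a shape error: (1, 2) and (3, 2) are not aligned.
# Transposing makes the inner dimensions match again:
z = X @ W_alt.T + b                      # (1, 2) @ (2, 3) -> (1, 3)
print(z.shape)                           # (1, 3)
```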