As we know, in a neural network y = WX + b, where W is the weights and X is the input matrix of shape (n, m), where n is the number of training examples and m is the size of the vector representation of the words (i.e., the length of the largest sentence in the corpus).
For the matrix multiplication to work, W would then have to be of shape (no. of hidden units, n), where n is the number of training examples.
However, from the output of the testing code for UNQ_C4, I get a different result, suggesting y = XW + b. Here is the screenshot of that:
That is where I think you get it wrong - if the input is of shape (n, m), then W has to be of shape (m, x), where m has to match the input dimensionality and x is the desired output dimensionality.
In your example, if the input is (2, 3), then W is of shape (3, 10) (3 input features, 10 output features), and the output of this layer will be (2, 10) - the number of inputs (2) should never get lost in the batch processing of a neural network. If, for example, the next hidden layer has a W2 of shape (10, 2), then the output will be (2, 2) (for example, for binary classification). Note that the first dimension was not lost.
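You can check this shape logic directly in NumPy. This is just a sketch with random values (the variable names X, W1, W2 are made up for illustration, not from the assignment):

```python
import numpy as np

X = np.random.randn(2, 3)    # input: (n, m) = (2 examples, 3 features)
W1 = np.random.randn(3, 10)  # (m, x): first dim must match the input's m
W2 = np.random.randn(10, 2)  # next layer: 10 features in, 2 out

h = X @ W1                   # (2, 3) @ (3, 10) -> (2, 10)
y = h @ W2                   # (2, 10) @ (10, 2) -> (2, 2)
print(h.shape, y.shape)      # (2, 10) (2, 2)
```

The batch dimension (2) is carried through both layers unchanged, which is exactly why the convention is XW rather than WX here.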
As an additional point, b also has to match the output dimension (and not the number of inputs). In our example, for the first layer b would be of shape (10,), and for the second, b2 would be of shape (2,).