# RNN Shapes Clarification

Hi,

I am having a hard time understanding the 3-D dimensions specified in the assignment.

Can I please get an example of (n_a, m, T_x) and (n_y, m, T_y)?

For example, let’s assume we have the following training set with output (good: 1, bad: 0):

• The movie was good.
• It seemed exciting

len(dict/vocab): 1000

For the first sentence, are (n_a, m, T_x) and (n_x, m, T_x) both (1000, 3, 4)? (This is vectorized.)

Also, what is (n_y, m, T_y)? Is it (2, 3, ?)?

Lastly, what exactly is the “time step” for my example?

Prof Ng spends quite a bit of time on these issues in the lectures. It might be worth watching them again. In Sequence Models, there is quite a bit more flexibility in how you map from inputs to outputs than in DNN or CNN architectures. Look for the lecture in which Prof Ng covers this question, which I wrote in my notes:

What if T_x is different from T_y?

• many to many (same or different)
• many to one
• one to one
• one to many

He then proceeds to give examples of all those types of networks and the kinds of problems they are applicable to. An example of many to many with different input and output lengths would be translating sentences from English into French or vice versa: the “timesteps” in the input are the individual words, but there is no guarantee that the French translation will have the same number of words (it could be more or fewer in different examples). An example of “many to one” would be sentiment classification, where again T_x is the number of words in the input sentence (which varies per sample) and the output is a single value (either “Positive/Negative” or perhaps a softmax output with more choices).
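To make the shape difference concrete, here is a minimal numpy sketch of the tensors for those two architectures. All the sizes here are hypothetical, chosen just for illustration:

```python
import numpy as np

n_x = 1000  # hypothetical English vocab size (one-hot dimension)
n_y = 1200  # hypothetical French vocab size
m = 1       # one training example, for illustration

# many to many, different lengths (translation): T_x != T_y
x_translate = np.zeros((n_x, m, 6))  # 6 English words in
y_translate = np.zeros((n_y, m, 8))  # 8 French words out

# many to one (sentiment): T_x words in, a single output timestep
x_sentiment = np.zeros((n_x, m, 6))
y_sentiment = np.zeros((2, m, 1))    # one softmax over {Positive, Negative}

print(x_translate.shape, y_translate.shape)  # (1000, 1, 6) (1200, 1, 8)
print(x_sentiment.shape, y_sentiment.shape)  # (1000, 1, 6) (2, 1, 1)
```

The only thing that changes between the architectures is the last axis (the number of timesteps); the first two axes (feature size and batch size) work the same way in both.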

To respond more specifically to your question:

n_a is the size of the “hidden state” of your RNN cell; it is a design choice, independent of the input shape. If you mean the shape of the input, that is (n_x, m, T_x), which would be (1000, 2, 4) in your example: the vocabulary has 1000 words, there are m = 2 training sentences (not 3), and T_x = 4 because the longest sentence has 4 words (the shorter one would be padded). The output shape is (n_y, m, T_y), which would be (2, 2, 1) in that case, because there is only one timestep in the output (the sentiment). You could probably get away with (1, 2, 1): a binary output is a special case of softmax with n = 2, so you really only need one value to represent the answer (meaning that a “one hot” vector with two elements is redundant). As for your last question, each “time step” in your example is simply one word of the input sentence.
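Here is a short numpy sketch of those shapes using the numbers from your example (with m = 2 for the two sentences, and n_a picked arbitrarily since it is a free design choice):

```python
import numpy as np

n_x = 1000  # vocabulary size (one-hot dimension)
m = 2       # number of training sentences
T_x = 4     # timesteps = words in the longest sentence (shorter ones padded)
n_a = 64    # hidden-state size of the RNN cell (arbitrary design choice)
n_y = 2     # output classes (good / bad)
T_y = 1     # one output timestep for sentiment (many to one)

# Input: one one-hot column per word, per sentence, per timestep
x = np.zeros((n_x, m, T_x))
x[17, 0, 0] = 1  # e.g. mark vocab index 17 as word 1 of sentence 1

# Initial hidden state carried between timesteps
a0 = np.zeros((n_a, m))

# Output: one softmax vector per sentence, at the single output timestep
y = np.zeros((n_y, m, T_y))

print(x.shape)   # (1000, 2, 4)
print(a0.shape)  # (64, 2)
print(y.shape)   # (2, 2, 1)
```

Note that the hidden state a0 has no timestep axis: it is one vector per example, updated in place as the cell steps through the T_x words.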