How to tokenize data for NER

As I understand it, after tokenization a word in the input sentence can be represented by several tokens. But I have a fixed number of labels, which is determined by the spaces between words. So tokenized sentences and labels can have different shapes before padding. And, as I understand it, this can cause subtle problems after padding: a padding label ([PAD]) can end up aligned with a word whose label was actually assigned by a human.

So, should I worry about this at all?

Also, I would like to know how to prevent shape mismatches. For example, I use a training call ending in ..., validation_data=val_sentences_tags_zipped.padded_batch(64), epochs=3)

And after about 2000 training steps I get an error like:

InvalidArgumentError: Graph execution error:
Node: 'Equal'
required broadcastable shapes
	 [[{{node Equal}}]] [Op:__inference_train_function_900677]

Should this be handled by some kind of anomaly detection mechanism?

So, the error was exactly what I was afraid of. I had an input like
Trüb GmbH & Co. OHG

The labeler produced labels like

But the TextVectorization tokenizer produced a sentence like
[ 1, 1569, 1072, 3119],

And tags like
[3, 3, 3, 3, 3],
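A minimal sketch of how this mismatch can arise, assuming a TF 2.x Keras `TextVectorization` layer with its default settings: the default standardization ("lower_and_strip_punctuation") deletes the "&" character, so the sentence yields 4 tokens while the whitespace-based labels cover 5 words.

```python
import tensorflow as tf

# Default standardization lowercases the text and strips punctuation,
# so "&" and "." are removed before the whitespace split.
vec = tf.keras.layers.TextVectorization()
vec.adapt(["Trüb GmbH & Co. OHG"])

tokens = vec(["Trüb GmbH & Co. OHG"])
print(tokens.shape)  # (1, 4) — one token short of the 5 human labels
```

The exact token ids depend on the adapted vocabulary, but the length mismatch is reproducible regardless of the ids.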

Also, I found very strange behavior of the dataset built with ..., train_tags_vec).padded_batch(64)

It looks like it does not compute a common maximum length across X and Y. It pads each X row to the longest X row in the batch and each Y row to the longest Y row, but it never pads X against Y.
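A toy reproduction of that behavior (the numbers here are made up for illustration): `padded_batch` pads each component of the dataset against its own longest element in the batch, not against the other component.

```python
import tensorflow as tf

def gen():
    # First example: 3 tokens but 4 tags (as when "&" was dropped).
    yield tf.constant([1, 2, 3]), tf.constant([7, 7, 7, 7])
    yield tf.constant([4]),       tf.constant([8])

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(tf.TensorSpec([None], tf.int32),
                      tf.TensorSpec([None], tf.int32)))

# X is padded to the longest X row (3), Y to the longest Y row (4).
x, y = next(iter(ds.padded_batch(2)))
print(x.shape, y.shape)  # (2, 3) (2, 4)
```

So the per-batch widths of inputs and tags can silently diverge, and the mismatch only surfaces later, e.g. in the `Equal` node of the accuracy metric.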

For this reason I got batches like

(<tf.Tensor: shape=(64, 4), dtype=int64, numpy=
  array([[    1,     0,     0,     0],
         [    1,     0,     0,     0],
         [ 1953,     0,     0,     0],
         [    1,     0,     0,     0]])>,
  <tf.Tensor: shape=(64, 5), dtype=int64, numpy=
  array([[1, 0, 0, 0, 0],
         [1, 0, 0, 0, 0],
         [1, 0, 0, 0, 0],
         [5, 5, 0, 0, 0],

So, the inputs and tags do not have the same shape after .padded_batch, because the & symbol was simply dropped by the tokenizer.
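One plausible fix, sketched under the assumption that the labels are aligned to whitespace-separated words: pass `standardize=None` to `TextVectorization` (or a custom standardization that keeps punctuation), so the whitespace split produces exactly one token per labeled word.

```python
import tensorflow as tf

# With standardize=None the text is split on whitespace as-is,
# so "&" and "Co." survive as their own tokens.
vec = tf.keras.layers.TextVectorization(standardize=None)
vec.adapt(["Trüb GmbH & Co. OHG"])

tokens = vec(["Trüb GmbH & Co. OHG"])
print(tokens.shape)  # (1, 5) — one token per whitespace-separated word
```

This trades away the usual lowercasing/punctuation cleanup, so it is only appropriate when the label scheme is defined over raw whitespace tokens, as it is here.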

So, my main question still stands: is there a way to pad sequences without defining exact shapes via the padded_shapes parameter, while keeping the concept of batches with different shapes (dynamic per-batch padding)?