To close the loop on the public thread, there were two mistakes:
The call to the TF NMS function was being passed the full list of boxes for all labels, rather than only the boxes selected for the current label in that iteration of the loop.
The arguments to tf.gather were also reversed, although they were in the correct order at one point, as the exception traces posted earlier in this thread show.
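
For anyone who finds this thread later, here is a minimal sketch of what a loop with both fixes applied can look like. It is not the original code from the thread: the function name per_class_nms, its parameter names, and the sample boxes are made up for illustration, and it assumes TensorFlow 2.x running eagerly with at least one input box. The TF calls used are tf.boolean_mask, tf.image.non_max_suppression, and tf.gather, which takes the tensor to read from first and the indices second.

```python
import tensorflow as tf


def per_class_nms(boxes, scores, labels, max_per_class=10, iou_threshold=0.5):
    """Hypothetical helper: run NMS separately for each label.

    boxes:  [N, 4] float tensor, one (y1, x1, y2, x2) row per box
    scores: [N] float tensor
    labels: [N] int tensor
    """
    kept_boxes, kept_scores, kept_labels = [], [], []
    # Reading the labels back with .numpy() assumes eager execution.
    for label in sorted(set(labels.numpy().tolist())):
        mask = tf.equal(labels, label)
        # Fix 1: select only the boxes and scores for the current label.
        label_boxes = tf.boolean_mask(boxes, mask)
        label_scores = tf.boolean_mask(scores, mask)
        # NMS sees the per-label subset, not the full list of boxes.
        keep = tf.image.non_max_suppression(
            label_boxes,
            label_scores,
            max_output_size=max_per_class,
            iou_threshold=iou_threshold,
        )
        # Fix 2: tf.gather(params, indices), tensor first, indices second.
        # The indices in `keep` refer to rows of the per-label subset.
        kept_boxes.append(tf.gather(label_boxes, keep))
        kept_scores.append(tf.gather(label_scores, keep))
        kept_labels.append(tf.fill(tf.shape(keep), label))
    return (
        tf.concat(kept_boxes, axis=0),
        tf.concat(kept_scores, axis=0),
        tf.concat(kept_labels, axis=0),
    )


# Made-up sample data: two boxes per label.
boxes = tf.constant(
    [
        [0.0, 0.0, 10.0, 10.0],    # label 0
        [1.0, 1.0, 11.0, 11.0],    # label 0, overlaps the box above heavily
        [0.0, 0.0, 10.0, 10.0],    # label 1, same coordinates as the first box
        [50.0, 50.0, 60.0, 60.0],  # label 1, far away from everything else
    ]
)
scores = tf.constant([0.9, 0.8, 0.7, 0.6])
labels = tf.constant([0, 0, 1, 1])

out_boxes, out_scores, out_labels = per_class_nms(boxes, scores, labels)
print(out_labels.numpy().tolist())
```

In the sample data, the third box has the same coordinates as the first but a different label. It survives only because each NMS call sees the boxes for one label; passing the full list on every iteration would let the higher-scoring label 0 box suppress it.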